Showing posts from October, 2024

Silly Things Large Language Models Do With Molecules

“Pay no attention to the man behind the curtain” - The Wizard of Oz

Introduction

Recently, a few groups have proposed general-purpose large language models (LLMs) like ChatGPT, Claude, and Gemini as tools for generating molecules. This idea is appealing because it doesn't require specialized software or domain-specific model training. One can provide the LLM with a relatively simple prompt like the one below, and it will respond with a list of SMILES strings.

You are a skilled medicinal chemist. Generate SMILES strings for 100 analogs of the molecule represented by the SMILES CCOC(=O)N1CCC(CC1)N2CCC(CC2)C(=O)N. You can modify both the core and the substituents. Return only the SMILES as a Python list. Don’t put in line breaks. Don't put the prompt into the reply.

However, when analyzing molecules created by general-purpose LLMs, I'm reminded of my undergraduate Chemistry days. My roommates, who majored in liberal arts, would often assemble random pieces from my mole