A study at Friedrich Schiller University Jena, led by Dr. Kevin M. Jablonka, has evaluated how well modern AI models such as GPT-4 perform in chemistry, using the newly developed "ChemBench" test procedure. The results show that AI can perform convincingly on certain chemical tasks but has clear weaknesses compared with human experts.
Humans and machines compared on more than 2,700 tasks
To evaluate the capabilities of AI in chemistry, the team at the University of Jena developed the "ChemBench" test procedure, which comprises more than 2,700 tasks from various areas of chemistry and covers both basic knowledge and complex problems. The performance of the AI models was compared with that of 19 experienced human experts. While the humans were allowed to use aids, the AI models had to draw their knowledge exclusively from their training data. In addition to the accuracy of the answers, the researchers also evaluated the AI models' own assessment of how reliable their answers were.
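The two measures mentioned above, answer accuracy and self-assessed reliability, can be illustrated with a minimal sketch. It is not the ChemBench implementation: the `Answer` record, the 0-to-1 confidence scale, the `overconfidence_gap` measure and the sample data are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Answer:
    correct: bool      # did the answer match the reference solution?
    confidence: float  # self-reported confidence in [0, 1] (assumed scale)


def accuracy(answers):
    """Fraction of answers that were correct."""
    return sum(a.correct for a in answers) / len(answers)


def mean_confidence(answers):
    """Average self-reported confidence."""
    return sum(a.confidence for a in answers) / len(answers)


def overconfidence_gap(answers):
    """Mean confidence minus accuracy; a positive value means overconfidence."""
    return mean_confidence(answers) - accuracy(answers)


# Invented example data, not results from the study.
model_answers = [
    Answer(True, 0.9),
    Answer(False, 0.9),
    Answer(True, 0.8),
    Answer(False, 1.0),
]

print(f"accuracy: {accuracy(model_answers):.2f}")
print(f"mean confidence: {mean_confidence(model_answers):.2f}")
print(f"overconfidence gap: {overconfidence_gap(model_answers):.2f}")
```

A respondent whose stated confidence closely tracks how often they are right would show a gap near zero; a respondent who answers wrongly with high conviction would show a clearly positive gap.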
AI is faster and more efficient, humans are more reflective and self-critical
The results of the study show a mixed picture, reports Jablonka: "In even very demanding textbook-type questions, some AI models proved to be more efficient than humans." However, while the chemists openly admitted in some cases that they could not answer a question with certainty, the best AI models showed the opposite tendency: they often answered with great confidence, even when the answer was factually wrong.
"Incorrect answers with high conviction can lead to problems"
"This was particularly noticeable in questions relating to the interpretation of chemical structures, such as the prediction of NMR spectra," says Jablonka. Here, the models seemed to provide clear answers, even if they sometimes made fundamental errors. The human experts, on the other hand, hesitated more often and questioned their own conclusions. "This discrepancy is a decisive factor for the practical applicability of AI in chemistry," Jablonka concludes, because: "A model that provides incorrect answers with a high degree of conviction can lead to problems in sensitive areas of research."
"Our research shows that AI can be an important addition to human expertise - not as a replacement, but as a valuable tool that supports the work," summarizes Kevin Jablonka. "Our study thus lays the foundation for closer collaboration between AI and human expertise in chemistry."
Source: Friedrich Schiller University (05/2025)