A team from the University of Jena has used the new "ChemBench" benchmark to compare the chemical expertise of AI models (such as GPT-4) with that of human experts. For laboratory digitization, the study provides important insights into how far automated decisions can be trusted.
An overview of the key findings:
- High efficiency in standard tasks: Across more than 2,700 test questions, AI models in some cases outperformed experienced chemists on complex textbook questions. They work faster and can draw on vast amounts of data as a knowledge base.
- Shortcomings in error handling: While human experts clearly acknowledge uncertainty, AI models tend to produce “highly convincing hallucinations.” Particularly on tasks involving chemical structures (e.g., NMR spectra), they often give incorrect answers with misleading confidence.
- Human-machine collaboration: The study positions AI not as a replacement, but as a complementary tool. For integration into digital laboratory environments (LIMS/ELN), this means that AI can accelerate processes but requires a human validation loop (“human-in-the-loop”).
Conclusion on laboratory digitization: AI offers enormous potential for automated data preprocessing, but requires robust control mechanisms to prevent misinterpretations in sensitive areas of research.
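
As a rough illustration of what such a human validation loop could look like in a LIMS/ELN integration, here is a minimal Python sketch. It is not taken from the study: the task labels, the `AiResult` record, the `ALWAYS_REVIEW` set, and the 0.9 threshold are all hypothetical. Because the study found that models can be wrong with high confidence, the sketch does not rely on self-reported confidence alone. Structure-related tasks always go to a human reviewer.

```python
from dataclasses import dataclass

# Hypothetical task categories that always require human sign-off,
# since self-reported model confidence may be misleading there.
ALWAYS_REVIEW = {"nmr_interpretation", "structure_elucidation"}


@dataclass
class AiResult:
    sample_id: str
    task: str          # hypothetical task label
    answer: str
    confidence: float  # model's self-reported confidence, 0.0 to 1.0


def needs_human_review(result: AiResult, threshold: float = 0.9) -> bool:
    """Return True if a chemist must approve the result before it is stored."""
    if result.task in ALWAYS_REVIEW:
        return True
    return result.confidence < threshold


def route(results):
    """Split results into an auto-accepted list and a human review queue."""
    accepted, review_queue = [], []
    for r in results:
        (review_queue if needs_human_review(r) else accepted).append(r)
    return accepted, review_queue
```

In this sketch, a routine result with high confidence would be accepted automatically, while any NMR-related result would land in the review queue regardless of how confident the model claims to be.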
Source: Friedrich Schiller University (May 2025)
