Assessing the capabilities of AI-based large language models (AI-LLMs) in interpreting histopathological slides and scientific figures: performance evaluation study

Khanisyah Erza Gumilar, Graduate Institute of Biomedical Science, China Medical University, Taichung, Taiwan, R.O.C
Grace Ariani, Department of Pathology Anatomy, Faculty of Medicine, Universitas Airlangga, Surabaya, Indonesia
Priangga A. Wiratama, Department of Pathology Anatomy, Faculty of Medicine, Universitas Airlangga, Surabaya, Indonesia
Rimbun Rimbun, Department of Anatomy, Histology and Pharmacology, Faculty of Medicine, Universitas Airlangga, Surabaya, Indonesia
Tri H. Yuliawati, Department of Anatomy, Histology and Pharmacology, Faculty of Medicine, Universitas Airlangga, Surabaya, Indonesia
Hong Chen, Graduate Institute of Biomedical Science, China Medical University, Taichung, Taiwan, R.O.C
Ibrahim H. Ibrahim, Graduate Institute of Biomedical Science, China Medical University, Taichung, Taiwan, R.O.C

Abstract

Background: Artificial intelligence-based large language models (AI-LLMs) are increasingly recognized in medicine and other scientific domains as tools to support complex tasks, such as interpreting histopathology slides and scientific figures. AI-LLMs can simplify these tasks by providing clearer explanations. By improving accessibility and comprehension, they can assist healthcare professionals with diagnosis and treatment decisions, and they can help students and the public understand complex scientific concepts and images.

Objectives: This study evaluates how well AI-LLMs interpret histopathological slides and scientific images, with the aim of assessing their performance in supporting diagnostics and improving comprehension in the biomolecular sciences.

Methods: The study had two parts: interpreting histopathology slides and interpreting scientific figures. Twelve histopathology images and twelve scientific figures were submitted to each of the three most frequently used chatbots (ChatGPT-4, Gemini Advanced, and Copilot). The chatbot responses were coded and rated blindly by expert raters on five parameters (relevance, clarity, depth, focus, and coherence) using a 5-point Likert scale. Statistical analysis included one-way ANOVA and multiple linear regression.
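The Methods compare Likert ratings across three chatbots with a one-way ANOVA. As a rough illustration of what that test computes, here is a minimal sketch of the F statistic, assuming the ratings are grouped per chatbot. The function name `one_way_anova` and its dictionary input are hypothetical; this is not the authors' analysis code, and it uses no data from the study.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    Hypothetical helper: `groups` maps a group label (e.g. a chatbot name)
    to a list of numeric scores (e.g. Likert ratings).
    """
    samples = [list(scores) for scores in groups.values()]
    if len(samples) < 2 or any(len(s) == 0 for s in samples):
        raise ValueError("need at least two non-empty groups")
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    if n_total <= k:
        raise ValueError("need more scores than groups")

    grand_mean = mean(x for s in samples for x in s)
    # Between-group sum of squares: how far each group mean lies from the grand mean.
    ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in samples)
    # Within-group sum of squares: spread of scores around their own group mean.
    ss_within = sum((x - mean(s)) ** 2 for s in samples for x in s)
    if ss_within == 0:
        raise ValueError("zero within-group variance; F is undefined")

    df_between = k - 1
    df_within = n_total - k
    # F = mean square between groups / mean square within groups.
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

A large F means the groups' mean scores differ by more than their internal spread would explain. In practice, `scipy.stats.f_oneway` returns the same statistic together with a p-value.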

Results: ChatGPT-4 outperformed Gemini Advanced and Copilot in interpreting both histopathology and scientific images (P < 0.001), with significantly higher scores on all five parameters (relevance, clarity, depth, focus, and coherence). This superior performance may reflect ChatGPT-4's advanced algorithms, extensive training data, specialized modules, and use of user feedback.

Conclusions: ChatGPT-4 excels at interpreting histopathology and scientific images, which may help improve diagnostic accuracy and clinical decision-making and reduce pathologists' workload. It can also benefit education by deepening students' understanding of complex images and promoting interactive learning. Overall, ChatGPT-4 shows significant potential to improve patient care and enrich student learning.

Recommended Citation

Gumilar, Khanisyah Erza; Ariani, Grace; Wiratama, Priangga A.; Rimbun, Rimbun; Yuliawati, Tri H.; Chen, Hong; and Ibrahim, Ibrahim H. (2026) "Assessing the capabilities of AI-based large language models (AI-LLMs) in interpreting histopathological slides and scientific figures: performance evaluation study," BioMedicine: Vol. 16, Iss. 1, Article 5.
DOI: 10.37796/2211-8039.1698