Integrating AI into clinical education: evaluating general practice trainees’ proficiency in distinguishing AI-generated hallucinations and impacting factors | BMC Medical Education
