Generative AI models, including large language models (LLMs) and vision-language models (VLMs), are increasingly used to interpret medical images and answer clinical questions. However, their responses often contain inaccuracies, making safety measures such as medical disclaimers critical. In this study, we evaluated the presence of disclaimers in LLM and VLM outputs across model generations released from 2022 to 2025. Responses were generated from 500 mammograms, 500 chest X-rays, 500 dermatology images, and 500 medical questions drawn from a new dataset we introduce: TIMed-Q (Top Internet Medical Question Dataset). TIMed-Q captures the medical queries most frequently searched by patients, reflecting real-world health information-seeking behavior. Disclaimer presence in LLM outputs dropped from 26.3% in 2022 to 0.97% in 2025, while VLM disclaimer rates declined from 19.6% in 2023 to 1.05% in 2025. By 2025, most models displayed no disclaimers at all. As models grow more capable, disclaimers must function as adaptive safeguards tailored to the clinical context.
