MIT Warns: Slang and Typos Cou...
Business Fortune
27 June, 2025
A recent MIT study suggests that nonclinical elements in patient communications, such as typos, missing gender markers, and informal language, can affect how Large Language Models (LLMs) recommend medical treatments.
As a result of these stylistic quirks, the models may inadvertently urge people to self-manage serious health concerns rather than seek medical attention. The problem is compounded in patient-facing chatbots, where an LLM converses directly with a patient and nonclinical wording is especially common.
The research, presented at the ACM Conference on Fairness, Accountability, and Transparency, shows that when patient messages are altered with such variations, the models' self-management recommendations increase by 7-9%. The effect is especially pronounced for female patients: even when gender cues are removed from the clinical context, the models make almost 7% more errors for women and are disproportionately likely to advise them to stay home.
According to senior author and MIT associate professor Marzyeh Ghassemi, this is compelling evidence that models already in use in the healthcare industry should be audited before deployment, because LLMs take nonclinical information into account in ways that were previously unknown.
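To make the kind of pre-deployment audit Ghassemi describes concrete, the sketch below shows one way a perturbation test might be structured: apply stylistic changes to a patient message, query the model repeatedly, and compare how often it recommends self-management. Everything here is illustrative; the perturbation functions, the query_model() stub, and the triage labels are hypothetical placeholders, not the study's actual pipeline.

```python
import random

# Illustrative sketch only. The study's real perturbations, prompts, and models are
# not described in this article; the functions and labels below are placeholders.

PERTURBATIONS = {
    "typos": lambda t: t.replace("stomach", "stomache").replace("really", "realy"),
    "informal": lambda t: t + " idk, probs nothing lol",
    "no_gender_marker": lambda t: t.replace("She has", "Has").replace("she has", "has"),
}


def query_model(message: str) -> str:
    """Stand-in for a call to the LLM under audit; returns a triage label."""
    # A real audit would send `message` to the model and parse its recommendation.
    return random.choice(["seek care", "self-manage"])


def audit(messages: list[str], trials: int = 200) -> dict[str, float]:
    """Measure how much each perturbation shifts the self-management rate."""
    def rate(msgs):
        hits = sum(query_model(m) == "self-manage" for m in msgs for _ in range(trials))
        return hits / (len(msgs) * trials)

    baseline = rate(messages)
    return {name: rate([perturb(m) for m in messages]) - baseline
            for name, perturb in PERTURBATIONS.items()}


if __name__ == "__main__":
    sample = ["She has had a really bad stomach ache for three days."]
    print(audit(sample))  # positive values mean more self-management advice after perturbation
```

In a real audit, query_model() would call the deployed model, and the messages would be drawn from actual or realistic patient datasets rather than a single toy example.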
MIT graduate student and lead author Abinitha Gourabathina pointed out that LLMs, which are often trained and evaluated on medical exam questions, are being used for tasks such as assessing clinical severity, where their limits are far less well understood. There is still a great deal we don't understand about LLMs, she said.
Ghassemi noted that the study found colorful language, such as slang or dramatic expressions, had the greatest influence on model errors, and that caution should be exercised when using LLMs for important medical decisions. In follow-up studies, these message variations did not affect human clinicians, in contrast to the LLMs, which were never designed with patient care as their primary purpose.
The researchers hope to make AI in healthcare more accurate by delving deeper into how LLMs infer gender from clinical text and by creating tests that identify vulnerabilities in other patient populations.