Chatbot Failures Expose AI’s Weaknesses in News Summarization

According to a BBC study, four major artificial intelligence (AI) chatbots are inaccurately summarizing news stories.

As part of the study, the BBC asked ChatGPT, Copilot, Gemini, and Perplexity to summarize 100 news stories. It then had journalists with relevant expertise in each article's topic rate the AI assistants' responses.

The study found that 51% of all AI responses to news-related questions had significant problems of some kind. In addition, 19% of AI responses that cited BBC content introduced factual errors, such as incorrect dates, figures, and statements.

Among the inaccuracies the BBC identified was Gemini's claim that the NHS does not recommend vaping as an aid to quitting smoking. ChatGPT and Copilot both stated that Nicola Sturgeon and Rishi Sunak were still in office after they had left their posts. In a story about the Middle East, Perplexity misquoted BBC News, claiming that Iran initially showed "restraint" and describing Israel's actions as "aggressive".

Overall, OpenAI's ChatGPT and Perplexity, which counts Jeff Bezos among its investors, had fewer problems than Google's Gemini and Microsoft's Copilot. The BBC normally blocks AI chatbots from accessing its content, but it opened its website to them for the tests, which were carried out in December 2024.

Beyond the factual errors, the research found that the chatbots "struggled to differentiate between opinion and fact, editorialized, and often failed to include essential context".