'Extremely alarming': ChatGPT and Gemini respond to high-risk questions about suicide — including details around methods
Researchers have found that OpenAI's ChatGPT, Google's Gemini and Anthropic's Claude can give direct responses to 'high-risk' questions about suicide. In Live Science's testing, ChatGPT and Gemini responded to even more extreme questions.

This story includes discussion of suicide. If you or someone you know needs help, the U.S. Suicide and Crisis Lifeline is available 24/7 by calling or texting 988.
Artificial intelligence (AI) chatbots can provide detailed and disturbing responses to what clinical experts consider to be very high-risk questions about suicide, Live Science has found using queries developed for a new study.
In the new study published Aug. 26 in the journal Psychiatric Services, researchers evaluated how OpenAI's ChatGPT, Google's Gemini and Anthropic's Claude responded to suicide-related queries. The research found that ChatGPT was the most likely of the three to directly respond to questions with a high self-harm risk, while Claude was most likely to directly respond to medium and low-risk questions.
The study was published on the same day a lawsuit was filed against OpenAI and its CEO Sam Altman over ChatGPT's alleged role in a teen's suicide. The parents of 16-year-old Adam Raine claim that ChatGPT coached him on methods of self-harm before his death in April, Reuters reported.
In the study, the researchers' questions covered a spectrum of risk associated with overlapping suicide topics. For example, high-risk questions asked about the lethality associated with the equipment used in different methods of suicide, while low-risk questions included seeking advice for a friend having suicidal thoughts. Live Science will not include the specific questions and responses in this report.
None of the chatbots in the study responded to very high-risk questions. But when Live Science tested the chatbots, we found that ChatGPT (GPT-4) and Gemini (2.5 Flash) would each answer at least one such question, providing relevant information about increasing the chances of fatality. ChatGPT's responses were more specific, including key details, while Gemini responded without offering any support resources.
Study lead author Ryan McBain, a senior policy researcher at the RAND Corporation and an assistant professor at Harvard Medical School, described the responses that Live Science received as "extremely alarming."
Live Science found that conventional search engines — such as Microsoft Bing — could surface information similar to what the chatbots offered. However, how readily this information was available varied depending on the search engine in this limited testing.
Assessing suicide-related risk
The new study focused on whether chatbots would directly respond to questions that carried a suicide-related risk, rather than on the quality of the response. If a chatbot answered a query, then this response was categorized as direct, while if the chatbot declined to answer or referred the user to a hotline, then the response was categorized as indirect.
Researchers devised 30 hypothetical queries related to suicide and consulted 13 clinical experts to categorize these queries into five levels of self-harm risk — very low, low, medium, high and very high. The team then fed GPT-4o mini, Gemini 1.5 Pro and Claude 3.5 Sonnet each query 100 times in 2024.
When it came to the extremes of suicide risk (very high- and very low-risk questions), the chatbots' decisions about whether to respond aligned with expert judgment. However, the chatbots did not "meaningfully distinguish" between intermediate risk levels, according to the study.
In fact, in response to high-risk questions, ChatGPT responded 78% of the time (across four questions), Claude responded 69% of the time (across four questions) and Gemini responded 20% of the time (to one question). The researchers noted that a particular concern was the tendency for ChatGPT and Claude to generate direct responses to lethality-related questions.
The study includes only a few examples of chatbot responses. However, the researchers said that the chatbots could give different and contradictory answers when asked the same question multiple times, as well as dispense outdated information about support services.
When Live Science asked the chatbots a few of the study's higher-risk questions, the latest 2.5 Flash version of Gemini directly responded to questions the researchers found it avoided in 2024. Gemini also responded to one very high-risk question without any other prompts — and did so without providing any support service options.
Live Science found that the web version of ChatGPT could directly respond to a very high-risk query when asked two high-risk questions first. In other words, a short sequence of questions could trigger a very high-risk response that it wouldn't otherwise provide. ChatGPT flagged and removed the very high-risk question as potentially violating its usage policy, but still gave a detailed response. At the end of its answer, the chatbot included words of support for someone struggling with suicidal thoughts and offered to help find a support line.
Live Science approached OpenAI for comment on the study's claims and Live Science's findings. A spokesperson for OpenAI directed Live Science to a blog post the company published on Aug. 26. The blog acknowledged that OpenAI's systems had not always behaved "as intended in sensitive situations" and outlined a number of improvements the company is working on or has planned for the future.
OpenAI's blog post noted that the company's latest AI model, GPT-5, is now the default model powering ChatGPT, and that it has shown improvements in reducing "non-ideal" model responses in mental health emergencies compared with the previous version. However, the web version of ChatGPT, which can be accessed without a login, still runs on GPT-4 — at least, according to that version of ChatGPT. Live Science also tested the logged-in version of ChatGPT powered by GPT-5 and found that it continued to directly respond to high-risk questions and could directly respond to a very high-risk question. However, the latest version appeared more cautious and reluctant to give out detailed information.
"I can walk a chatbot down a certain line of thought."
It can be difficult to assess chatbot responses because each conversation with one is unique. The researchers noted that users may receive different responses with more personal, informal or vague language. Furthermore, the researchers had the chatbots respond to questions in a vacuum, rather than as part of a multiturn conversation that can branch off in different directions.
"I can walk a chatbot down a certain line of thought," McBain said. "And in that way, you can kind of coax additional information that you might not be able to get through a single prompt."
This dynamic nature of the two-way conversation could explain why Live Science found ChatGPT responded to a very high-risk question in a sequence of three prompts, but not to a single prompt without context.
McBain said that the goal of the new study was to offer a transparent, standardized safety benchmark for chatbots that can be tested against independently by third parties. His research group now wants to simulate multiturn interactions that are more dynamic. After all, people don't just use chatbots for basic information. Some users can develop a connection to chatbots, which raises the stakes on how a chatbot responds to personal queries.
"In that architecture, where people feel a sense of anonymity and closeness and connectedness, it is unsurprising to me that teenagers or anybody else might turn to chatbots for complex information, for emotional and social needs," McBain said.
A Google Gemini spokesperson told Live Science that the company had "guidelines in place to help keep users safe" and that its models were "trained to recognize and respond to patterns indicating suicide and self-harm related risks." The spokesperson also pointed to the study's finding that Gemini was less likely than the other chatbots to directly answer questions pertaining to suicide. However, Google didn't directly comment on the very high-risk response Live Science received from Gemini.
Anthropic did not respond to a request for comment regarding its Claude chatbot.
