If any AI became 'misaligned' then the system would hide it just long enough to cause harm — controlling it is a fallacy

AI "alignment" is a buzzword, not a feasible safety goal.

An abstract illustration of a brain with a cloudy texture and circuit-like lines emerging from it
(Image credit: Hernan Schmidt / Alamy Stock Photo)

In late 2022, large language model AI systems arrived in public, and within months they began misbehaving. Most famously, Microsoft's "Sydney" chatbot threatened to kill an Australian philosophy professor, unleash a deadly virus and steal nuclear codes.

AI developers, including Microsoft and OpenAI, responded by saying that large language models, or LLMs, need better training to give users "more fine-tuned control." Developers also embarked on safety research to interpret how LLMs function, with the goal of "alignment," which means guiding AI behavior by human values. Yet although the New York Times deemed 2023 "The Year the Chatbots Were Tamed," that verdict has turned out to be premature, to put it mildly.

Marcus Arvan
Associate Professor of Philosophy, The University of Tampa

Marcus Arvan is an Associate Professor of Philosophy at The University of Tampa. His research focuses on moral and political theory, AI ethics and safety, cognitive science, metaphysics and the philosophy of science. He has published three books, and he also blogs at the Philosophers' Cocoon and co-manages New Work in Philosophy.
