Geek-Guy.com

Tag: guardrails

AI models more vulnerable than claimed when faced with iterative attacks

CISOs relying on LLM runtime guardrails and official safety scores when making security decisions about their organizations’ AI usage and model selection are due for a wake-up call. According to a new study from Cisco, frontier models from OpenAI, Anthropic, Google, xAI, and Amazon have significantly worse risk profiles when pressured in multi-turn attacks compared…

Anthropic announces think tank to examine AI’s effect on economy and society

Fresh from battling the US Department of Defense (DoD) over AI guardrails, Anthropic has returned this week with a new initiative: the company is founding a think tank, the Anthropic Institute, “to confront the most significant challenges that powerful AI will pose to our societies.” Headed by Anthropic co-founder Jack Clark, who will take up…

Elon Musk Pushes AI to Be ‘Unhinged,’ Former Employees Say

As OpenAI, Anthropic, and Google race to fortify their AI guardrails, Elon Musk appears to be loosening his. Former xAI insiders say the billionaire is pushing to make his chatbot “more unhinged,” framing safety measures as censorship rather than protection. According to employees who spoke anonymously, the company’s dedicated safety function has effectively been dismantled,…

Single prompt breaks AI safety in 15 major language models

A single benign-sounding prompt can systematically strip safety guardrails from major language and image models, raising fresh questions about the durability of AI alignment when models are customized for enterprise use, according to Microsoft research. The technique, dubbed GRP-Obliteration, weaponizes a common AI training method called Group Relative Policy Optimization, normally used to make models…