guardrails - Geek-Guy.com

AI models more vulnerable than claimed when faced with iterative attacks

May 27, 2026

CISOs relying on LLM runtime guardrails and official safety scores when making security decisions about their organizations’ AI usage and model selection are due for a wakeup call. According to a new study from Cisco, frontier models from OpenAI, Anthropic, Google, xAI, and Amazon have significantly worse risk profiles when pressured in multi-turn attacks compared…

ClaudeBleed Vulnerability Lets Hackers Hijack Claude Chrome Extension to Steal Data

May 8, 2026

The ClaudeBleed vulnerability allows hackers to bypass Claude for Chrome guardrails to exfiltrate private Google Drive and Gmail data.

Source Code Leaks Highlight Lack of Supply Chain Oversight

April 3, 2026

Or, why the software supply chain should be treated as critical infrastructure with guardrails built in at every layer.

Anthropic announces think tank to examine AI’s effect on economy and society

March 11, 2026

Fresh from battling the US Department of Defense (DoD) over AI guardrails, Anthropic has returned this week with a new initiative: the company is founding a think tank, the Anthropic Institute, “to confront the most significant challenges that powerful AI will pose to our societies.” Headed by Anthropic co-founder Jack Clark, who will take up…

Researchers Discover Major Security Gaps in LLM Guardrails

March 11, 2026

Palo Alto Networks’ Unit 42 has developed a successful attack to bypass safety guardrails in popular generative AI tools.

What’s Really at Stake in the Fight Between Anthropic and the Pentagon

March 1, 2026

The feud goes beyond AI guardrails and revolves around the dream of the nascent technology’s future.

Trump Will End Government Use of Anthropic’s AI Models

February 27, 2026

Move follows weeks of tension between Pentagon and Anthropic over AI guardrails.

Elon Musk Pushes AI to Be ‘Unhinged,’ Former Employees Say

February 16, 2026

As OpenAI, Anthropic, and Google race to fortify their AI guardrails, Elon Musk appears to be loosening his. Former xAI insiders say the billionaire is pushing to make his chatbot “more unhinged,” framing safety measures as censorship rather than protection. According to employees who spoke anonymously, the company’s dedicated safety function has effectively been dismantled,…

Single prompt breaks AI safety in 15 major language models

February 10, 2026

A single benign-sounding prompt can systematically strip safety guardrails from major language and image models, raising fresh questions about the durability of AI alignment when models are customized for enterprise use, according to Microsoft research. The technique, dubbed GRP-Obliteration, weaponizes a common AI training method called Group Relative Policy Optimization, normally used to make models…

Tag: guardrails

AI, Global Security News, Risk Management

AI models more vulnerable than claimed when faced with iterative attacks

May 27, 2026
