Geek-Guy.com

Tag: perform

AI cyberattackers are getting better faster

The ability of AI models to perform end-to-end, multi-stage penetration tests that match the capabilities of humans undertaking the same tasks has improved dramatically in recent months, according to new benchmarks published by the UK government’s AI Security Institute (AISI). In November 2025, the difficulty of cyber tasks the best models could complete was doubling…

Flaw in Claude’s Chrome extension allowed ‘any’ other plugin to hijack victims’ AI

As businesses and governments turn to AI agents to access the internet and perform higher-level tasks, researchers continue to find serious flaws in large language models that can be exploited by bad actors. The latest discovery comes from browser security firm LayerX, involving a bug in the Chrome extension for Anthropic’s Claude AI model that…

AI agents still need humans to teach them

AI agents need skills (specific procedural knowledge) to perform tasks well, but they can’t teach themselves, new research suggests. The authors have developed a new benchmark, SkillsBench, which evaluates agentic AI performance on 84 tasks across 11 domains including healthcare, manufacturing, cybersecurity and software engineering. The researchers looked at…

OpenAI Frontier organizes AI agents under one system

OpenAI introduced Frontier, a platform designed to organize AI agents that perform business tasks within internal systems and workflows. The platform connects data from multiple internal systems including customer relationship management tools, ticketing platforms, and data warehouses. This integration creates a shared knowledge layer that allows AI agents to understand business processes and decision points…