perform – Geek-Guy.com

The ability of AI models to perform end-to-end, multi-stage penetration tests that match the capabilities of humans undertaking the same tasks has improved dramatically in recent months, according to new benchmarks published by the UK government’s AI Security Institute (AISI). In November 2025, the difficulty of cyber tasks the best models could complete was doubling…

AI is ready to take over Python programming, but not much else

May 12, 2026

Tests of how well 19 large language models (LLMs) complete and perform complicated multi-step tasks have shown that they are both error-prone and, in many cases, unreliable. The findings are contained in a preprint paper, LLMs Corrupt Your Documents When You Delegate, written by Microsoft researchers Philippe Laban, Tobias Schnabel and Jennifer Neville based on a…

Flaw in Claude’s Chrome extension allowed ‘any’ other plugin to hijack victims’ AI

May 8, 2026

As businesses and governments turn to AI agents to access the internet and perform higher-level tasks, researchers continue to find serious flaws in large language models that can be exploited by bad actors. The latest discovery comes from browser security firm LayerX, involving a bug in the Chrome extension for Anthropic’s Claude AI model that…

Perplexity’s new Computer agent will run other agents for you

February 27, 2026

Perplexity says its new Perplexity Computer service can perform complex, multi-step tasks on behalf of human users, by organizing the tasks that are needed and creating the software agents required to fulfill the process. Users begin by describing their desired outcome, the company said, then, “Perplexity Computer breaks it into tasks and subtasks, creating sub-agents…

AI agents still need humans to teach them

February 20, 2026

AI agents need skills — specific procedural knowledge — to perform tasks well, but they can’t teach themselves, new research suggests. The authors of the research have developed a new benchmark, SkillsBench, which evaluates agentic AI performance on 84 tasks across 11 domains including healthcare, manufacturing, cybersecurity and software engineering. The researchers looked at…

Android Malware Hijacks Google Gemini to Stay Hidden

February 20, 2026

A new Android malware implant that uses Google Gemini to perform persistence tasks was discovered on VirusTotal and analyzed by ESET.

OpenAI Frontier organizes AI agents under one system

February 5, 2026

OpenAI introduced Frontier, a platform designed to organize AI agents that perform business tasks within internal systems and workflows. The platform connects data from multiple internal systems including customer relationship management tools, ticketing platforms, and data warehouses. This integration creates a shared knowledge layer that allows AI agents to understand business processes and decision points…

Tag: perform

AI, Exploits, Global Security News, Government & Policy, Risk Management

AI cyberattackers are getting better faster

May 18, 2026
