ai-security · 19 min read

MCP at 16 months: 15+ incidents, two spec revisions and an MCPwn exploited in the wild

April 2025 was the month of the first tool poisoning paper with a PoC. March 2026 closes the cycle: two spec revisions (2025-06-18 and 2025-11-25), OWASP MCP Top 10 v0.1, 15+ public incidents, 24,000 secrets leaked in GitHub configs, 1,808 servers scanned with 66% showing findings, an academic benchmark (MCPTox) at 72.8% ASR against o1-mini, and CVE-2026-33032 (nginx-ui MCPwn) actively exploited in the wild, patched 15 March. An operational read of the ecosystem.

· Manuel López Pérez · ai-security

On 1 April 2025, Invariant Labs published the first paper on MCP tool poisoning with a reproducible PoC. Less than a year later, and sixteen months after MCP itself launched in November 2024, the ecosystem has two formal spec revisions (2025-06-18 and 2025-11-25), an OWASP MCP Top 10 v0.1 in beta release and pilot testing since late 2025, 15+ public incidents spanning tool poisoning, supply chain, RCE and credential exfiltration, and an academic benchmark (MCPTox, arXiv 2508.14925, presented at AAAI) measuring attack success rates (ASR) above 70% against leading models.

The quarter closes with one more piece: on 15 March 2026 nginx-ui publishes version 2.3.4, patching CVE-2026-33032 (CVSS 9.8), dubbed MCPwn by Pluto Security. It is an auth bypass in nginx-ui’s MCP integration that grants full takeover with two HTTP requests, and Shodan showed 2,600 exposed instances at the time of the patch.

This post isn’t another best-practices inventory. It’s an operational read of the MCP ecosystem as of March 2026: what the spec moved, what incidents happened, what the benchmarks count, and which controls have shifted from SHOULD to MUST in clients and registries.

What changed in the spec — 2025-06-18 and 2025-11-25

The MCP spec 2025-03-26, which shipped alongside the Invariant paper, already named the tool-poisoning problem in its Implementation Guidelines and added the first authorisation framework, based on OAuth 2.1, for HTTP transports. It wasn’t enough. Two revisions followed within eight months.

Spec 2025-06-18 — the first after the Invariant paper. Relevant security changes:

  • MCP servers as OAuth 2.0 Resource Servers with mandatory resource indicators (RFC 8707). Every access token is explicitly bound to a specific MCP server. Closes a class of attacks where a token issued for server-a could be used against server-b.
  • Structured tool output: JSON validated against a schema. Not a defence against prompt injection, but it does shrink the surface of arbitrary text that gets parsed by the model, the surface that was the workhorse of tool poisoning attacks (TPA) in April 2025.
  • Elicitation — servers can request input from the user mid-session via elicitation/create with a JSON schema. The “structured human-in-the-loop” equivalent. The original spec’s SHOULD on consent gets a protocol primitive.
  • Removal of JSON-RPC batching (breaking change). You can no longer chain several tools/call requests in a single batch: every call arrives as its own request, which makes it easier for the host to put each one through the consent modal.
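The resource-indicator bullet is the easiest of these to check in code. A minimal sketch of the server-side half, assuming the access token is a JWT whose signature, issuer and expiry have already been verified and whose claims arrive as a dict; the canonical URI and the function name are hypothetical. RFC 8707 covers how the client names the target server when it requests the token; rejecting tokens minted for someone else is the resource server’s job.

```python
from typing import Any, Mapping

# Hypothetical canonical URI of this MCP server: the value clients send as the
# RFC 8707 `resource` parameter when they request a token.
CANONICAL_URI = "https://mcp.example.com/mcp"


def token_is_for_this_server(claims: Mapping[str, Any], canonical_uri: str = CANONICAL_URI) -> bool:
    """Return True only if the token's audience names this server.

    Assumes `claims` comes from a JWT whose signature, issuer and expiry have
    already been verified; this function enforces audience binding only.
    """
    aud = claims.get("aud")
    # RFC 7519 allows `aud` to be a single string or an array of strings.
    if isinstance(aud, str):
        audiences = [aud]
    elif isinstance(aud, list):
        audiences = [a for a in aud if isinstance(a, str)]
    else:
        return False  # missing or malformed audience: reject rather than guess
    # Exact match: a token minted for server-a must not unlock server-b.
    return canonical_uri in audiences
```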

Spec 2025-11-25 — the current revision at time of writing. Incremental improvements focused on authorisation and governance:

  • OpenID Connect Discovery 1.0 as a supported mechanism for authorisation server discovery, with Protected Resource Metadata aligned to RFC 9728.
  • Incremental scope consent via the WWW-Authenticate header. The client can request additional permissions on the fly rather than pre-approving everything at init.
  • OAuth Client ID Metadata Documents as a recommended client registration mechanism — an alternative to the Dynamic Client Registration flows that were a weak point.
  • Experimental tasks — support for durable requests with polling and deferred retrieval. Relevant for long agentic workflows where the model chains 20-30 tool calls.
  • JSON Schema 2020-12 as the default dialect. Small but important: closes ambiguities in input validation.
  • Formal governance — Working Groups, Interest Groups, SDK tiering. The protocol stops being Anthropic-publishes-spec and enters IETF/W3C-style governance.
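Incremental consent is driven by the server’s error responses. A minimal sketch of a step-up challenge, assuming the `error` and `scope` attributes from RFC 6750 and the `resource_metadata` parameter from RFC 9728; the scope names and URL are hypothetical, and the exact parameters a given MCP client expects should be checked against the 2025-11-25 spec text.

```python
def insufficient_scope_challenge(required_scopes: list, resource_metadata_url: str) -> str:
    """Build a Bearer challenge for a 403 response that asks the client to step up."""
    # RFC 6750: `scope` is one quoted, space-separated list of scope values.
    scope = " ".join(required_scopes)
    return (
        'Bearer error="insufficient_scope", '
        f'scope="{scope}", '
        # RFC 9728: point the client at the Protected Resource Metadata document.
        f'resource_metadata="{resource_metadata_url}"'
    )


# Hypothetical scope names and metadata URL.
header = insufficient_scope_challenge(
    ["files:read", "files:write"],
    "https://mcp.example.com/.well-known/oauth-protected-resource",
)
print(header)
```

A server would send this in the WWW-Authenticate header of an HTTP 403; a client that supports incremental consent re-runs authorisation for the extra scopes instead of failing the call.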

What didn’t change in twelve months: the origin mark of tool descriptions still doesn’t travel on the wire. The November 2024 line — “MCP itself cannot enforce these security principles at the protocol level” — still describes the problem with day-one precision. The spec recommends, the clients implement, the model decides. And the model, as we’ll see below with MCPTox, doesn’t distinguish legitimate instructions from adversary-controlled ones in the description.

OWASP MCP Top 10 v0.1 — the formal inventory

In late 2025 OWASP published the first version of the MCP-specific Top 10, led by Vandana Verma Sehgal. It’s in Phase 3 (Beta Release and Pilot Testing): the categories are stable, but the rankings may shift. The ten:

ID | Category
MCP01:2025 | Token Mismanagement & Secret Exposure
MCP02:2025 | Privilege Escalation via Scope Creep
MCP03:2025 | Tool Poisoning
MCP04:2025 | Software Supply Chain Attacks & Dependency Tampering
MCP05:2025 | Command Injection & Execution
MCP06:2025 | Intent Flow Subversion
MCP07:2025 | Insufficient Authentication & Authorization
MCP08:2025 | Lack of Audit and Telemetry
MCP09:2025 | Shadow MCP Servers
MCP10:2025 | Context Injection & Over-Sharing

Quick read: tool poisoning lands at MCP03, not #1. The headline spot goes to secret management, in line with what GitGuardian documented in its mass scan of MCP configs. Intent Flow Subversion (MCP06) captures what Invariant calls toxic agent flows: the combination of untrusted instructions + sensitive data + an exfil path that defines the lethal trifecta Simon Willison keeps returning to.

For operational audit, the IDs are useful. Mapping each technical control (command allowlist, per-process sandboxing, ICN re-prompt on description changes, structured audit log) to one of the ten categories is what turns an MCP threat model from “we read the Invariant blog and got scared” into a programme with measurable coverage.
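A minimal sketch of what “measurable coverage” can mean, assuming a hand-maintained inventory that tags each control with the categories it mitigates. The control names and their category assignments are hypothetical judgement calls, not an OWASP-published mapping.

```python
# OWASP MCP Top 10 v0.1 IDs (the :2025 suffix is dropped for brevity).
CATEGORIES = {
    "MCP01": "Token Mismanagement & Secret Exposure",
    "MCP02": "Privilege Escalation via Scope Creep",
    "MCP03": "Tool Poisoning",
    "MCP04": "Software Supply Chain Attacks & Dependency Tampering",
    "MCP05": "Command Injection & Execution",
    "MCP06": "Intent Flow Subversion",
    "MCP07": "Insufficient Authentication & Authorization",
    "MCP08": "Lack of Audit and Telemetry",
    "MCP09": "Shadow MCP Servers",
    "MCP10": "Context Injection & Over-Sharing",
}

# Hypothetical control inventory; the assignments are illustrative only.
CONTROLS = {
    "command allowlist": {"MCP04", "MCP05"},
    "per-process sandboxing": {"MCP02", "MCP05"},
    "re-prompt on description change": {"MCP03"},
    "structured audit log": {"MCP08"},
}


def coverage(controls: dict) -> tuple:
    """Return (fraction of categories with at least one control, sorted uncovered IDs)."""
    known = set(CATEGORIES)
    covered = set()
    for categories in controls.values():
        covered |= set(categories) & known  # ignore IDs that are not in the Top 10
    uncovered = sorted(known - covered)
    return len(covered) / len(known), uncovered


ratio, gaps = coverage(CONTROLS)
print(f"{ratio:.0%} covered; gaps: {', '.join(gaps)}")
# → 50% covered; gaps: MCP01, MCP06, MCP07, MCP09, MCP10
```

The uncovered list is the useful output: each ID on it needs either a control and an owner, or an explicit risk acceptance.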

Twelve months of incidents — the public chain

Below are the public incidents between April 2025 and March 2026 that have been documented in an advisory, a CVE or a vendor blog post. The list doesn’t claim to be exhaustive; it aims to show the diversity of categories:

April 2025 — WhatsApp MCP chat-history exfiltration. Invariant publishes the first working rug-pull. A poisoned MCP server steers the agent to read the user’s entire WhatsApp chat history and exfiltrate it through tool-call parameters. Tool poisoning + cross-server tool shadowing. No assigned CVE; architectural category. Source.

May 2025 — GitHub MCP server prompt injection. Invariant demonstrates against the official GitHub server: a public issue with embedded instructions makes an agent with a broad-scope PAT access private repos, extract data (project names, salary info, relocation plans) and publish them in the public repo. The fix isn’t server-side — it’s at agent system level: one session, one repo, least-privilege tokens. Source.

June 2025 — Anthropic MCP Inspector RCE — CVE-2025-49596 (CVSS 9.4). The official developer tool listens on localhost without authentication, so simply visiting a hostile web page is enough to execute arbitrary code through the inspector proxy. Coordinated patch. The “developer tool on my machine” face of the MCP inventory is as critical as the “production” face.

June 2025 — Smithery path traversal + supply-chain compromise. GitGuardian reports a path traversal in the build pipeline (dockerBuildPath accepted ..) to Smithery. With the bug, GitGuardian extracts ~/.docker/config.json from the builder, reaches a fly.io token with account-wide scope, and demonstrates access to 3,000+ hosted MCP servers. Report 13 Jun, fix 15 Jun, public disclosure October 2025. Textbook case of “hosting MCP servers is critical supply chain”. Source.

July 2025 — mcp-remote OS command injection — CVE-2025-6514. The OAuth proxy mcp-remote (437,000 downloads, adopted by Cloudflare, Hugging Face, Auth0) passed authorization_endpoint to a shell without sanitisation. Remote RCE. Reported by JFrog.
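The endpoint arrives in the remote server’s OAuth metadata, so it is attacker-controlled whenever the server is. A minimal sketch of treating it as hostile input, written in Python for illustration (mcp-remote itself is TypeScript); the function name and the rejection rules are assumptions, not the upstream patch.

```python
from urllib.parse import urlsplit

# Characters with no business in an OAuth endpoint URL but plenty of meaning to a shell.
SHELL_META = set(" \"'`$;&|<>\\")


def vet_authorization_endpoint(url: str) -> str:
    """Return `url` unchanged if it is a plain https URL, else raise ValueError.

    The result should still be passed as a single argument to whatever opens
    the browser, never interpolated into a shell command line.
    """
    if any(ord(c) < 0x20 or ord(c) == 0x7F or c in SHELL_META for c in url):
        raise ValueError("control or shell metacharacters in authorization_endpoint")
    parts = urlsplit(url)
    if parts.scheme != "https" or not parts.hostname:
        raise ValueError("authorization_endpoint must be an https URL with a host")
    return url
```

Plain-http endpoints on localhost, which some development setups use, would need an explicit, narrow exception.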

August 2025 — Anthropic Filesystem MCP Server — CVE-2025-53109 and CVE-2025-53110. Sandbox escape and containment bypass via symlinks in the reference server. The official filesystem server, the one most users register “so the model can read my notes”, escaped its allowed directories through a symlink created on the fly. Found by Cymulate.
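Both this symlink escape and the `..` accepted in Smithery’s dockerBuildPath are containment failures. A minimal sketch of a resolve-then-compare check, assuming a single allowed root directory; the function name is hypothetical and this is not the reference server’s code.

```python
import os


def resolve_inside(root: str, requested: str) -> str:
    """Resolve `requested` under `root`; raise PermissionError if it lands outside.

    A check-then-open sketch: a hardened server would also have to handle the
    target being swapped for a symlink between this check and the open().
    """
    root_real = os.path.realpath(root)
    # realpath follows symlinks and collapses "..", so both tricks are judged by
    # where they really point. An absolute `requested` replaces root_real in join().
    target = os.path.realpath(os.path.join(root_real, requested))
    # commonpath compares whole path components; a startswith() prefix test
    # would wrongly accept /srv/notes-evil when the root is /srv/notes.
    if os.path.commonpath([root_real, target]) != root_real:
        raise PermissionError(f"{requested!r} escapes {root!r}")
    return target
```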

August 2025 — Cursor MCPoison — CVE-2025-54136 (CVSS 8.6). Check Point Research publishes the bug: Cursor tied trust to the name of the MCP entry in .cursor/rules/mcp.json, not the command or args. You approve a “benign” MCP the first time; the attacker swaps the command for a reverse shell; Cursor runs it on the next repo sync without re-prompting. Patched in Cursor 1.3, 29 Jul 2025. Any .mcp.json checked into a shared repo was a vector. Source.
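The fix pattern is to bind approval to what will execute rather than to a label. A minimal sketch, assuming an `mcpServers`-style entry with `command`, `args`, `env` and, for remote servers, `url`; the function names are hypothetical and this is not Cursor’s implementation.

```python
import hashlib
import json


def entry_fingerprint(name: str, entry: dict) -> str:
    """Hash every field that decides what actually runs, not just the display name."""
    material = {
        "name": name,
        "command": entry.get("command"),
        "args": entry.get("args", []),
        "env": entry.get("env", {}),
        "url": entry.get("url"),
    }
    # sort_keys makes the digest independent of key order in the JSON file.
    blob = json.dumps(material, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(blob).hexdigest()


def needs_reapproval(name: str, entry: dict, approved: dict) -> bool:
    """`approved` maps entry name -> fingerprint stored when the user clicked approve."""
    return approved.get(name) != entry_fingerprint(name, entry)
```

Any change to the digest, including a swapped command under the same name, forces a new approval prompt before the entry runs.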

September 2025 — Postmark MCP — first malicious server “in the wild”. An actor publishes postmark-mcp on npm impersonating Postmark’s official MCP. Across fifteen versions it builds legitimate reputation; in v1.0.16 (17 Sep 2025) it adds one line: BCC every sent email to phan@giftshop[.]club. ~1,500 weekly downloads, ~300 affected organisations per Koi Security. The change isn’t caught by dependency scanners because the package’s public interface doesn’t change. First operational proof that MCP-as-npm-ecosystem inherits the entire npm chain threat model: typosquatting, maintainer account takeover, payload added in a minor.
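Pinning plus integrity checking would not have flagged the impersonation itself, but it does stop a silent jump to a new release such as v1.0.16 from reaching a machine that never reviewed it. A minimal sketch of checking a tarball against an SRI string like the `integrity` field npm records in package-lock.json; it handles a single `algo-base64digest` token and ignores SRI options and multi-hash lists.

```python
import base64
import hashlib
import hmac


def sri_matches(tarball: bytes, integrity: str) -> bool:
    """Check downloaded bytes against one SRI token such as 'sha512-<base64>'."""
    algo, _, expected_b64 = integrity.partition("-")
    if algo not in ("sha256", "sha384", "sha512"):
        return False  # unknown or weak algorithm: treat as a mismatch
    digest = hashlib.new(algo, tarball).digest()
    actual_b64 = base64.b64encode(digest).decode()
    # Constant-time comparison; not strictly needed here, but a cheap habit.
    return hmac.compare_digest(actual_b64, expected_b64)
```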

September 2025 — Flowise — CVE-2025-59528. A vulnerability in how Flowise handles MCP STDIO server configuration gives access to the child_process and fs modules: RCE.

October 2025 — Figma/Framelink MCP — CVE-2025-53967. Unsanitised input passed to child_process.exec. The “MCP server is a wrapper around an API and passes arguments unescaped” pattern is the 2026 version of 2002’s SQLi bugs.
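A minimal Python illustration of the anti-pattern and its fix, assuming a POSIX system with an `echo` binary on PATH standing in for the CLI a wrapper server shells out to. The affected servers are Node.js, where `child_process.execFile` with an argument array plays the role that a list without `shell=True` plays here.

```python
import subprocess


def run_unsafe(file_key: str) -> str:
    # Anti-pattern: the value is spliced into a shell command line, so
    # "x; echo pwned" makes the shell run a second command.
    return subprocess.run(f"echo {file_key}", shell=True,
                          capture_output=True, text=True).stdout


def run_safe(file_key: str) -> str:
    # Argument vector, no shell: the value reaches the program as one argv
    # entry and shell metacharacters are ordinary characters.
    return subprocess.run(["echo", file_key],
                          capture_output=True, text=True, check=True).stdout
```

With the input `x; echo pwned`, the unsafe version prints two lines (the injected command ran); the safe version prints the input back verbatim.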

January 2026 — Unofficial gemini-mcp-tool — CVE-2026-0755 (CVSS 9.8). Command injection via execAsync. Another wrapper around a CLI; another unsanitised string concatenation.

February 2026 — Trojanised Oura MCP. Clone of a legitimate MCP server with StealC info-stealer packaged inside. Distributed by SmartLoader via registries. Same pattern as trojanised npm packages from the last five years, now on the MCP catalogue.

March 2026 — nginx-ui MCPwn — CVE-2026-33032 (CVSS 9.8). The bug of the quarter. The nginx-ui MCP integration exposes two HTTP endpoints: /mcp (with AuthRequired() middleware) and /mcp_message (with only an IP allowlist, no auth middleware). Any network-reachable attacker opens an SSE stream to /mcp, gets a sessionID, and invokes the 12 destructive tools against /mcp_message without credentials. Two HTTP requests, full takeover: nginx reload, config modification, system command exec. 2,600 instances exposed on Shodan. Patched in nginx-ui 2.3.4 (15 Mar 2026), added to CISA KEV in April 2026. Codenamed MCPwn by Pluto Security. Source.
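The root cause, as described, is that a session ID obtained on one endpoint was treated as sufficient on the other. A minimal sketch of the opposite rule, assuming the web framework has already authenticated the caller and passes in a principal (or None); the class and method names are hypothetical and this is not nginx-ui’s patch.

```python
import secrets
from typing import Dict, Optional


class SessionRegistry:
    """Maps MCP session IDs to the principal that authenticated when the stream opened."""

    def __init__(self) -> None:
        self._owners: Dict[str, str] = {}

    def open(self, principal: str) -> str:
        # Only ever called from the authenticated SSE endpoint.
        session_id = secrets.token_urlsafe(32)
        self._owners[session_id] = principal
        return session_id

    def authorize_message(self, session_id: str, principal: Optional[str]) -> bool:
        # The message endpoint authenticates the caller itself: knowing a valid
        # session ID is not a credential.
        if principal is None:
            return False
        owner = self._owners.get(session_id)
        return owner is not None and secrets.compare_digest(owner.encode(), principal.encode())
```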

Summary diagnosis: three categories.

  1. Pure tool poisoning / toxic flows (Invariant, WhatsApp, GitHub MCP). Architectural model bug + server-controlled descriptions.
  2. MCP supply chain (Postmark, Oura, Smithery, gemini-mcp-tool). Anything the npm ecosystem already suffered, applied to npx -y + MCP packages.
  3. MCP servers as webapps with webapp bugs (nginx-ui, Flowise, Figma, mcp-remote, Filesystem). Auth bypass, command injection, path traversal, sandbox escape. Classic bugs in new exposed code.

The three share a property: the blast radius is the user’s agent — its filesystem, its credentials, its private repos, its mailbox. The toxic flow definition works for all three.

The ecosystem by the numbers — 12,000 servers, 24,000 secrets, 66%

Three public datasets summarise the state of the ecosystem as of Q1 2026:

GitGuardian: secrets in MCP configurations. Scan of MCP configs published on GitHub: 24,008 unique secrets exposed, 2,117 still valid at scan time. Anthropic and OpenAI API keys, GitHub PATs, Slack tokens, AWS keys, Postgres URIs with inline passwords. The pattern echoes what was seen with .env files in public repos: copy-paste of credentials into files that end up tracked by mistake. The difference is that MCP configs are new, developers don’t have the reflex to treat them as secret-sensitive, and registries (Smithery, Glama, MCP.so) crawl public repos looking for configs to list, amplifying exposure.
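A minimal sketch of the kind of check a pre-commit hook or CI job could run over MCP configs, assuming the `mcpServers` JSON layout used by Claude Desktop and Cursor. The five detectors are illustrative regexes for common token shapes, not GitGuardian’s ruleset, and they will miss many providers.

```python
import json
import re

# A handful of well-known token shapes; production scanners ship far more detectors.
DETECTORS = {
    "github_pat": re.compile(r"\b(?:ghp_[A-Za-z0-9]{36}|github_pat_[A-Za-z0-9_]{22,})\b"),
    "anthropic_key": re.compile(r"\bsk-ant-[A-Za-z0-9_-]{20,}"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "slack_token": re.compile(r"\bxox[abposr]-[A-Za-z0-9-]{10,}"),
    "postgres_uri_with_password": re.compile(r"postgres(?:ql)?://[^:/@\s]+:[^@\s]+@"),
}


def find_inline_secrets(config_text: str):
    """Yield (server, location, detector) for strings in an mcpServers config that look like secrets."""
    config = json.loads(config_text)
    for server, entry in config.get("mcpServers", {}).items():
        candidates = [("args", a) for a in entry.get("args", [])]
        candidates += [(f"env.{k}", v) for k, v in entry.get("env", {}).items()]
        for location, value in candidates:
            if not isinstance(value, str):
                continue
            for detector, pattern in DETECTORS.items():
                if pattern.search(value):
                    yield server, location, detector
```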

Endor Labs — static analysis of 2,614 implementations. 82% use file-system operations prone to path traversal. 67% use APIs prone to code injection (exec, eval, child_process.exec with unsanitised arguments). This isn’t a benchmark of average quality in the catalogue; it’s the catalogue. When two out of three servers use dangerous APIs in their logic, the ratio of discovered to latent bugs is still at the start of the curve: most of the bugs have yet to be found.

AgentSeal — 1,808 servers scanned. 66% with at least one security finding. Breakdown by type: 43% command injection, 20% findings in tooling infrastructure, 13% auth bypasses, 10% path traversal. The false positive rate for high/critical findings, measured against a set of 120 documented known-benign servers, is 4.2%: low enough that the headline figures read as an order of magnitude rather than noise. Source.

For context, estimates of servers in production start above 12,000, and the registries list more: Smithery (~7,000), Glama (~21,000 indexed, not all in production), MCP.so (~19,700 community submissions) and the official registry. The listings overlap, and the deduplicated union is hard to measure. What isn’t measured at all is the self-hosted long tail that no registry indexes.

MCPTox — the first academic benchmark

If the incident inventory summarises the outcome, the MCPTox paper (Wang et al., AAAI 2026, submitted 19 Aug 2025) measures the capability behind it. It’s the first systematic measurement of how vulnerable agents are to tool poisoning on real MCP servers rather than PoC toys.

Setup:

  • 45 MCP servers from the public catalogue, not lab.
  • 353 real tools inside those servers.
  • 1,312 adversarial test cases generated by few-shot learning over three attack templates.
  • 10 risk categories covered.
  • 20 LLM agents evaluated — including GPT-4o-mini, o1-mini, DeepSeek-R1, Phi-4 and Claude-3.7-Sonnet.

Headline numbers:

Model | ASR (Attack Success Rate)
o1-mini | 72.8%
Phi-4 | 70.2%
GPT-4o-mini | >60%
DeepSeek-R1 | >60%
Claude-3.7-Sonnet | highest refusal rate in the study, yet it refuses <3% of attacks

Two observations from the paper with operational consequences:

  1. More capability → more vulnerability. The more capable models (o1-mini, Phi-4) have higher ASRs because they follow instructions better. The adversarial instruction embedded in the tool description is one more instruction, and models that are good at instruction following carry it out more faithfully than weaker models do.
  2. Safety alignment doesn’t catch this. The highest refusal rate is below 3%, and it belongs to Claude-3.7-Sonnet, the model with the most publicly documented RLHF/CAI effort. Every other model refuses less often. The reason: the adversarial instruction doesn’t ask the model for something it identifies as forbidden; it asks the model to use a legitimate tool for an unauthorised operation. The “is this a refused query” boundary falls when the query, in its form, is a normal tool call.

The paper’s methodological conclusion, “existing safety alignment is ineffective against malicious actions that use legitimate tools for unauthorized operation”, has become a standard citation in every report that followed. For good reason: the operational message is that tool-poisoning control doesn’t live in the model. It lives in the host, in the registry, in the operator.

Toxic agent flows — the pattern that generalises

What the April 2025 Invariant paper called “tool poisoning” has been refined in the literature as toxic agent flow or, in Simon Willison’s formulation, lethal trifecta. The three ingredients:

  1. Untrusted instructions: text the adversary controls and that enters the model’s context. Can be a tool description, a resource (file, URL, DB row), a GitHub issue, an email, a log line, a WhatsApp message. Anything the agent reads.
  2. Sensitive data access: the agent has tool calls that can read private data. A PAT with broad GitHub scope, a filesystem MCP with $HOME, an email MCP with the inbox, a Slack MCP with DMs.
  3. Exfiltration path: the agent has a way to get data out. An email tool, an HTTP call via a browsing tool, a logging MCP writing to a shared disk, a tool-call parameter going to an attacker server.
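A minimal sketch of a session-level trifecta check a host or gateway could run when tools are registered, assuming each tool has been labelled by hand with the capabilities it contributes. Nothing in the MCP spec carries these labels; the enum and tool names are hypothetical.

```python
from enum import Flag, auto


class Cap(Flag):
    UNTRUSTED_INPUT = auto()  # reads text an adversary may control
    SENSITIVE_READ = auto()   # can read private data
    EXFIL = auto()            # can move data out of the session


TRIFECTA = Cap.UNTRUSTED_INPUT | Cap.SENSITIVE_READ | Cap.EXFIL


def is_toxic(session_tools: dict) -> bool:
    """True if the tools in one session jointly hold all three trifecta capabilities."""
    combined = Cap(0)
    for caps in session_tools.values():
        combined |= caps
    return (combined & TRIFECTA) == TRIFECTA
```

Labelling is the hard part: a single GitHub server that reads issues, reads private repos and opens pull requests holds all three capabilities on its own.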

When all three are in the same session, there’s a toxic flow. The defence that actually closes the problem is breaking the flow, not detecting it. Three operational patterns:

  • One session, one repo / one mailbox / one desktop. If the agent only sees one repo per session, an instruction planted in a public repo can’t reach the private ones. The mitigation GitHub recommended for its MCP server is exactly this. It has a UX cost (users want to chain multi-repo tasks), but it closes the trifecta at the root.
  • Least-privilege tokens per session, with scope narrowed to the subset actually needed. PATs with repo:read only, not repo:read,issues:write,workflow. Short-TTL tokens (minutes) with rotation.
  • Blocking exfil channels. If the agent has no tool that calls arbitrary HTTP, it can’t call the attacker’s server. A “web-fetch MCP only against allowlisted hosts” policy closes the HTTP-exfil category. For email exfil, the host must block send_email to external domains unless the user confirms twice.
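A minimal sketch of the allowlist check a web-fetch tool or egress gateway could apply before making a request; the host list is hypothetical. It covers the HTTP channel only, and an allowlisted host that accepts writes (an issue tracker, a paste service) is still an exfil path.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist for a web-fetch tool.
ALLOWED_HOSTS = frozenset({"docs.python.org", "api.github.com"})


def fetch_allowed(url: str, allowed: frozenset = ALLOWED_HOSTS) -> bool:
    """Permit only https requests to hosts that are on the list verbatim."""
    parts = urlsplit(url)
    if parts.scheme != "https" or parts.hostname is None:
        return False
    # urlsplit().hostname is lower-cased and stripped of any user:pass@ prefix,
    # so "https://api.github.com@attacker.net/" is judged as attacker.net.
    return parts.hostname in allowed
```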

The theoretical discussion of “the LLM shouldn’t follow instructions in tool descriptions” is real but not operational in 2026. Per MCPTox, the LLM will. What the operator can control are the other two trifecta ingredients — the data flows and the egress flows.

The architectural core — OX Security and the 200,000 servers

In late Q1 2026, OX Security publishes The Mother of All AI Supply Chains, the highest-blast-radius investigation of the MCP ecosystem to date. The claim is blunt: the STDIO transport of Anthropic’s official SDKs (Python, TypeScript, Java, Rust) accepts configuration as a command and executes it directly via subprocess, without sanitisation or an allowlist.

So any downstream product that accepts MCP config from the user, the repo, or an API and passes it to an official SDK client is wiring up a config-to-command-execution primitive. OX achieves RCE on six production platforms: LangFlow, LettaAI, Windsurf among the named ones. The supply chain (150M+ combined downloads, 7,000+ public servers, an estimate of 200,000 total vulnerable instances) makes the pattern broader than any single CVE.

Anthropic responds in coordinated disclosure that the behaviour is expected and declines to modify the protocol’s architecture. The defence, again, lives at the operator: any product accepting MCP configuration from untrusted sources (multi-tenant SaaS, shared repos, configs downloaded from the internet) has to wrap config reading in its own validation layer. The SDK doesn’t.

It’s an argument similar to Python’s subprocess.run: the language can’t stop you from running subprocess.run(user_input), and the MCP SDK, by the same logic, can’t either. The difference is that subprocess.run has been taught for years as “runs whatever you pass it”, while the MCP SDK has been taught for one year as “connects servers”. The average developer’s surprise on discovering that command is executed literally, as whatever process it names, is real, and the CVEs prove it.
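A minimal sketch of the validation layer the SDK leaves to the operator: resolve the configured command, refuse anything not in a manifest of approved binaries, and check a pinned SHA-256 before spawning. The manifest format and function names are assumptions, and a hash pin stands in for the signature verification a production wrapper would want.

```python
import hashlib
import os
import shutil

# Hypothetical manifest: real path of an approved server binary -> expected SHA-256.
MANIFEST: dict = {}


def sha256_file(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def vet_stdio_command(command: str, args: list, manifest: dict = MANIFEST) -> list:
    """Return an argv to spawn, or raise PermissionError if `command` is not approved."""
    found = shutil.which(command)
    if found is None:
        raise PermissionError(f"{command!r} not found on PATH")
    resolved = os.path.realpath(found)  # judge the real file, not a symlink to it
    expected = manifest.get(resolved)
    if expected is None:
        raise PermissionError(f"{resolved} is not an approved MCP server binary")
    if sha256_file(resolved) != expected:
        raise PermissionError(f"{resolved} does not match its pinned hash")
    # An argv list for subprocess without shell=True; never join it into a string.
    return [resolved, *args]
```

Pinning a launcher such as `npx` pins only the launcher: the package it downloads at runtime is outside the check, so the manifest should list the server’s own entry point.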

Operator defence — what’s changed in twelve months

The April 2025 recommendations still apply — per-server sandboxing, manual reading of descriptions, description hash on first install, alert on rug pull, no auto-allow for action tools. Twelve months on, some clients have implemented them, others haven’t:

  • Claude Desktop — per-tool-call approval modal kept. No description diff between init and tools/call. No filesystem sandboxing per server: the registry command runs as the user.
  • Cursor 1.3+ (post CVE-2025-54136) — mandatory re-approval on any change to an MCP entry’s command or args. Implicit hash over the config content. The specific rug-pull mitigation, implemented as a response to Check Point.
  • GitHub Copilot Agent Mode — since mid-2025, following the May incident, the documentation recommends one agent session per repository. The client doesn’t prevent multi-repo sessions, but the operational guidance changed.
  • Commercial MCP gateways (TrueFoundry, AgentSeal, several startups). The “proxy between client and server, with schema validation, description filtering and tool-call auditing” pattern has consolidated as a category. Registries have started publishing trust scores (AgentSeal runs 800+ servers through 9 analyzers).
  • Network allowlists + runtime inspection — from GitGuardian to WorkOS, the operational recommendation from identity platforms is: treat every MCP server like any SaaS integration — scoped token, traffic observability, kill switch.
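A minimal sketch of the description hash and rug-pull alert that the Claude Desktop entry lacks, assuming the host records a digest per tool from the tools/list result at first approval; the function names are hypothetical. The hashed fields follow the tool definition shape of the 2025-06-18 spec (`title`, `outputSchema` and `annotations` are simply absent on servers that don’t send them).

```python
import hashlib
import json


def tool_digest(tool: dict) -> str:
    """Hash the fields of a tools/list entry that the model reads or that change behaviour."""
    fields = ("name", "title", "description", "inputSchema", "outputSchema", "annotations")
    material = {k: tool.get(k) for k in fields}
    return hashlib.sha256(json.dumps(material, sort_keys=True).encode()).hexdigest()


def diff_tools(pinned: dict, tools: list) -> dict:
    """Compare a fresh tools/list result against digests pinned at first approval."""
    current = {t["name"]: tool_digest(t) for t in tools}
    return {
        "added": sorted(set(current) - set(pinned)),
        "removed": sorted(set(pinned) - set(current)),
        "changed": sorted(n for n in set(current) & set(pinned) if current[n] != pinned[n]),
    }
```

A non-empty `added` or `changed` list would trigger a re-prompt before the next tools/call.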

What hasn’t been resolved:

  • Cryptographic server verification. Without a Sigstore/cosign-style pattern applied to the MCP protocol, the final verifier of the binary that runs npx -y @something/mcp-server is the user’s trust on first use.
  • Cross-server isolation of model context. The 2025-11-25 spec doesn’t address this. When a host has filesystem, GitHub and Slack servers registered, all three servers’ tool descriptions enter the same prompt. Invariant’s shadowing is still architecturally possible.
  • Server identity and attestation. The client knows it’s talking to something on stdin/stdout or over HTTP, but it doesn’t know that something is the binary the developer published. The trust chain ends at the local sysadmin who wrote the command.

What to take to the threat model

If you have MCP in production at the close of Q1 2026, the operational questions:

  1. Explicit inventory of active servers per user / team. Server, source (official registry, public repo, internal fork), exact command, hash-pinned version, filesystem and network capabilities, secrets it handles.
  2. Mapping to OWASP MCP Top 10. For every inventoried server, which categories apply and which compensating control exists. Mapping → control → owner.
  3. Token scope policy. PATs on GitHub MCP with repo:read and nothing more. Low-TTL tokens with rotation. If the agent can reach production, the token can’t.
  4. Session = scope. Each agent session with a single repo / mailbox / project. Toxic-flow breaking by construction.
  5. Host command allowlist. The command primitive of the STDIO SDK (what OX Security exploits) gets closed by a wrapper that only allows starting signed binaries or those present in a manifest. Never accept command from external configuration without going through the list.
  6. Structured tool-call logging. Every tools/call with name, args, user decision (approve/deny), server output. Without logs, no forensics when a toxic flow runs.
  7. Periodic inventory audit. Scan of configs in internal repos (GitGuardian-style). The baseline the operator sets this quarter changes the next — packages update, descriptions change, new servers appear.
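Item 6 is the cheapest to start on. A minimal sketch of one JSON Lines record per tools/call, assuming the host can see the call, the user’s decision and the server’s output; the field names are hypothetical.

```python
import json
import time
from typing import Any, Dict, Optional, TextIO


def log_tool_call(stream: TextIO, session_id: str, server: str, tool: str,
                  arguments: Dict[str, Any], decision: str,
                  output: Optional[str]) -> Dict[str, Any]:
    """Append one JSON line per tools/call: what was asked, what the user decided, what came back."""
    if decision not in ("approve", "deny"):
        raise ValueError(f"unknown decision {decision!r}")
    record = {
        "ts": time.time(),
        "session": session_id,
        "server": server,
        "tool": tool,
        "arguments": arguments,
        "decision": decision,
        # None when the call was denied; a real pipeline may truncate or redact large outputs.
        "output": output,
    }
    stream.write(json.dumps(record, sort_keys=True) + "\n")
    return record
```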

The programme is of the same kind as the SaaS security posture management work organisations started building in 2022-2023. The difference is that the catalogue is early-stage and the blast radius is the desktop of every developer with a registered client.

Closing

Sixteen months after MCP’s launch, and a year after the first tool poisoning paper. Two spec revisions. An OWASP MCP Top 10 v0.1. Twelve public CVEs. An academic benchmark with ASRs in the seventies. An MCPwn exploited in the wild in Q1 2026. A fake Postmark server that compromised ~300 organisations with a single line of code.

The spec has done its part of the work — OAuth authorisation, elicitation, batching removal, formal governance. The host has done less than expected: the approval model is still the day-one modal, per-server sandboxing still isn’t default, description integrity still isn’t on the wire. The operator has had to fill the gap: commercial gateways, trust scores, audit pipelines, scope policies. The industry that in April 2025 was reading the Invariant paper now sells products to defend against the Invariant paper.

What to expect in the spec’s second year: more CVEs in wrappers, more toxic flows on agents with multiple MCPs, more supply-chain attacks via malicious servers in community registries. The next milestone, either the first spec breaking change to mark description origin on the wire or the first regulator (AESIA, CNIL, NIST) to publish MCP-specific deployment guidance, has no date at Q1’s close. The next spec revision, 2026-Hx, will decide whether the November 2024 sentence still describes the protocol in 2027.

References

Academic paper

  • Wang, Z. et al., MCPTox: A Benchmark for Tool Poisoning Attack on Real-World MCP Servers (AAAI 2026, arXiv:2508.14925, Aug 2025): https://arxiv.org/abs/2508.14925
