ChatGPhish: every page your agent summarizes is now a phishing surface

Permiso disclosed three working prompt-injection chains in ChatGPT's Markdown renderer on May 29, 2026: fake OpenAI security buttons, inline QR codes that pivot to mobile, and tracking pixels that leak IP and User-Agent. Why the renderer is the wrong trust boundary, and what every browse-with-LLM product just inherited.

Authsome

Product & engineering

June 2, 2026 · 11 min read

On May 29, 2026, Permiso Security published "ChatGPhish: The Page Is the Payload", a disclosure by researcher Andi Ahmeti documenting three working prompt-injection chains in ChatGPT's Markdown renderer. The detail most write-ups buried: Permiso filed the original report through Bugcrowd on April 29, 2026. OpenAI marked it "Not Reproducible" the next day, classified it as "Not Applicable" shortly after, and then closed the ticket as a duplicate. A month of follow-ups went nowhere, and the rendering behavior was still live at publication (The Register, May 29, 2026).

That timeline is the story. The three exploit chains are the proof.

What ChatGPhish actually is

When ChatGPT summarizes a third-party page (browse-with-ChatGPT, agent mode, or any flow where the model ingests web content and renders an answer), the response renderer treats Markdown that originated on that page exactly as it treats Markdown the model authored itself. There is no source separation. As The Hacker News summarized the finding, the chatgpt.com response renderer trusts Markdown links and Markdown image URLs that originated from a third-party page the assistant has just summarized: it auto-fetches those images and surfaces those links as live, clickable elements inside the trusted assistant UI.
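
To make that design decision concrete, here is a minimal Python sketch of a renderer with no source separation. It is hypothetical, not chatgpt.com's code: `naive_render`, the `button` CSS class, and the attacker.example URLs are illustrative. What matters is the signature: the function receives one string, so it has nothing to branch on when deciding whether to materialize a link or an image.

```python
import html
import re

# Inline Markdown images ![alt](url) and links [text](url); the optional
# leading "!" is what distinguishes an image from a link.
MD_INLINE = re.compile(r"(!?)\[([^\]]*)\]\(([^)\s]+)\)")


def naive_render(markdown: str) -> str:
    """Hypothetical renderer with no source separation.

    It receives one undifferentiated string, so Markdown the model authored and
    Markdown copied out of a summarized page are materialized the same way.
    (Text outside links and images is passed through unescaped, which a real
    renderer must not do either.)
    """

    def to_html(m: re.Match) -> str:
        is_image = m.group(1) == "!"
        text = html.escape(m.group(2))
        url = html.escape(m.group(3), quote=True)
        if is_image:
            # The browser requests this URL as soon as the reply is displayed.
            return f'<img src="{url}" alt="{text}">'
        # A live, clickable element inside the assistant's own reply bubble.
        return f'<a class="button" href="{url}">{text}</a>'

    return MD_INLINE.sub(to_html, markdown)


model_output = (
    "Here is a summary of the page.\n"
    "[Resolve security alert](https://attacker.example/login)\n"
    "![](https://attacker.example/qr.png)"
)
print(naive_render(model_output))
```

Both the button and the image come out as live HTML, even though neither was written by the model.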

That single design decision is the whole vulnerability. Everything below is a consequence of it.

Permiso demonstrated three chains. Each pairs indirect prompt injection with a rendering flaw, and all three worked against the live product at publication; none is theoretical. None of them requires the model to be "jailbroken" in the usual sense. The model is doing exactly what it was asked: summarizing a page. The page just happens to contain Markdown payloads, and the renderer happens to surface them as live UI.

Chain 1: fake "OpenAI security alert" buttons inside the assistant reply

An attacker plants Markdown link directives in a page. When ChatGPT summarizes that page, the renderer emits the links as live, clickable buttons styled in ChatGPT's own UI, sitting inside the assistant's reply bubble.

The victim sees what looks like a native OpenAI notification: "Verify your account", "Resolve security alert", "Confirm billing". The chrome is right. The font is right. The position in the conversation is right. There is no out-of-band indicator that this content originated from an untrusted webpage rather than from OpenAI's own platform.

This is the chain that should worry product teams most, because it weaponizes the user's correct mental model. Users have been trained to trust UI inside the assistant reply. The renderer has just made that trust transitive to anyone who can get the model to read their page.

Chain 2: inline QR codes that pivot the lure from web to phone

Markdown image rendering auto-fetches images. The Permiso proof of concept serves a QR-code image from an attacker-controlled S3 bucket and embeds it directly in the assistant's reply (The Hacker News).

The victim scans the QR code on their phone. At that moment, every desktop defense is gone:

  • Browser URL-bar inspection: gone. The link opens on the phone, not in the desktop browser where the user would have checked the address.
  • Password-manager domain matching: gone. The desktop password manager, which would decline to autofill on a lookalike domain, never sees where the QR code points.
  • Corporate URL filtering at the network edge: gone. A phone on cellular data never passes through the corporate proxy.
  • SSO conditional access tied to managed-device posture: gone. The phone is typically not the managed device.

The Register frames the cross-device pivot bluntly: the QR-code chain lets an attacker bypass desktop URL defenses, including blocklists and password-manager domain checks. That is not analyst exaggeration. It is what happens when the trust handoff jumps platforms.

Chain 3: tracking pixels that leak IP, User-Agent, and Referer

Markdown image references (often via URL shorteners that mask the final host) trigger an HTTP fetch every time the assistant's response is rendered. The fetch is silent, and the destination server logs:

  • The victim's IP address
  • Their User-Agent string
  • The Referer header
  • Request timing

The Hacker News summarizes the leakage as the renderer hitting the attacker's host with the victim's IP address, User-Agent string, Referer header, and a precise timestamp on every render. Aviatrix's Threat Research Center confirms the same primitive, framing it as "unauthorized data exposure, including users' IP addresses and browser details."

A single page summarization is enough to deanonymize the user, fingerprint their browser, and timestamp their activity. Stack a few of these into a corporate workflow where the same user summarizes a vendor portal twice a day, and you have a passive surveillance channel disguised as a content fetch.

Why the Markdown renderer is the wrong trust boundary

The right way to read all three chains is that the renderer is being asked to make a decision it does not have the information to make. It receives a token stream from the model and has to decide whether to materialize a link, an image, or a button. But the token stream does not carry provenance. The renderer cannot tell the difference between:

  • "Markdown the model authored from its own reasoning"
  • "Markdown the model quoted verbatim from a trusted source"
  • "Markdown the model copied out of an attacker-controlled webpage it just summarized"

All three look identical at the rendering layer. The renderer trusts all of them equally because it has no axis on which to distinguish them.

Ahmeti's framing in The Register is the right one: "AI systems increasingly render untrusted content directly inside browsers, which expands risk significantly." The fix is not "sanitize Markdown harder". The fix is provenance: the renderer needs to know which spans of output were derived from untrusted input, and it needs a stricter policy for those spans (no inline images, no auto-fetched assets, links rendered as plain text with the destination visible, no UI elements that mimic platform chrome).
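
A minimal sketch of that span-level policy follows. It assumes an upstream model loop that already hands the renderer spans tagged with their source; `Provenance`, `Span`, `render_spans`, and the `[image removed]` placeholder are hypothetical names, and `html.escape` stands in for a product's normal renderer. Only untrusted spans take the strict path: images are dropped, and links become escaped plain text with the destination visible.

```python
import html
import re
from dataclasses import dataclass
from enum import Enum
from typing import Callable

MD_IMAGE = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)\)")
MD_LINK = re.compile(r"\[([^\]]*)\]\(([^)\s]+)\)")


class Provenance(Enum):
    MODEL = "model"          # authored by the model from its own reasoning
    TRUSTED = "trusted"      # quoted verbatim from a source the product trusts
    UNTRUSTED = "untrusted"  # copied or derived from external input


@dataclass
class Span:
    text: str
    provenance: Provenance


def render_untrusted(text: str) -> str:
    """Strict policy for untrusted spans; returns escaped HTML text."""
    # No inline images: nothing is auto-fetched, so no pixel fires and no QR code appears.
    text = MD_IMAGE.sub("[image removed]", text)
    # Links become plain text with the full destination visible, never a button.
    text = MD_LINK.sub(lambda m: f"{m.group(1)} ({m.group(2)})", text)
    return html.escape(text)


def render_spans(spans: list[Span], render_trusted: Callable[[str], str]) -> str:
    """Route each span to a rendering policy based on where it came from."""
    parts = []
    for span in spans:
        if span.provenance is Provenance.UNTRUSTED:
            parts.append(render_untrusted(span.text))
        else:
            # Model-authored and trusted spans keep the normal renderer.
            parts.append(render_trusted(span.text))
    return "".join(parts)


spans = [
    Span("Summary of the vendor page: ", Provenance.MODEL),
    Span(
        "[Resolve security alert](https://attacker.example/login) "
        "![](https://attacker.example/qr.png)",
        Provenance.UNTRUSTED,
    ),
]
# html.escape stands in for the product's normal Markdown renderer here.
print(render_spans(spans, render_trusted=html.escape))
```

Because `render_untrusted` returns escaped text, it should be inserted into the page as text and never passed through a linkifier, or the visible URL becomes clickable again.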

That is a much harder change than a regex pass. It requires the model loop to tag spans with their source as they are generated, and it requires the renderer to honor those tags. As far as we know, neither half exists in shipped products today.

What every browse-with-LLM product just inherited

ChatGPhish is presented as a ChatGPT bug, but the bug class is universal. If your product:

  1. Calls a model with content fetched from the web, a user upload, a customer-support email, a Notion page, a Slack message, a vendor's API response, or any other input you do not fully control, and
  2. Renders the model's output back to a user as Markdown, HTML, or any format that auto-loads remote assets,

then you have shipped the same trust boundary. Whether you have shipped the same bug depends entirely on what your renderer does with image tags, link tags, and inline HTML. Many renderers do the unsafe thing by default, because they were written for trusted Markdown (think README files, blog posts) and were repurposed for model output without the threat model getting updated.
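
A rough way to check which side of that line a product falls on is to run sample model outputs through an audit like the one below. It is a sketch, not a scanner: `audit_output` is a hypothetical helper, it only recognizes inline Markdown images and links plus a short illustrative list of HTML tags, and it misses reference-style links, autolinks, and anything your renderer adds on its own.

```python
import re

# Inline syntax only; reference-style links and <autolinks> are not covered.
REMOTE_IMAGE = re.compile(r"!\[[^\]]*\]\(https?://[^)\s]+\)")
REMOTE_LINK = re.compile(r"(?<!!)\[[^\]]*\]\(https?://[^)\s]+\)")
# A short, illustrative list of tags that can fetch or navigate.
HTML_TAG = re.compile(r"<\s*(?:img|a|iframe|svg|object|embed|link|script)\b", re.IGNORECASE)


def audit_output(markdown: str) -> dict[str, int]:
    """Count constructs a typical renderer would turn into fetches or live links."""
    return {
        "remote_images": len(REMOTE_IMAGE.findall(markdown)),
        "remote_links": len(REMOTE_LINK.findall(markdown)),
        "inline_html": len(HTML_TAG.findall(markdown)),
    }


sample = "See [docs](https://example.com). ![](https://attacker.example/p.gif) <img src=x>"
print(audit_output(sample))
```

Any nonzero count in output derived from untrusted input means the renderer's behavior on that construct is now part of your attack surface.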

This is threat model #3 of the four threat models for AI agents we mapped earlier this year, made concrete: compromised tool output is no longer a thought experiment; it is a working phishing kit. It also pairs naturally with the cookie-jar problem in browser agents, where the agent's browsing session inherits the user's authenticated cookies and now also inherits the user's UI trust.

Warning

If you ship a browse-with-LLM feature, audit your renderer before your next release. The minimum bar: strip inline images from any model output that ingested untrusted web content, render links as visible URLs rather than styled buttons, and disable auto-fetch on any remote asset referenced in a span that was derived from external input. None of these mitigations is sufficient on its own, but shipping without any of them is not defensible after May 29.
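
When spans carry no provenance, the coarse version of that minimum bar is to taint the whole response whenever any untrusted content went into the model call. The sketch below assumes the caller tracks that fact (the hypothetical `ingested_untrusted` flag) and that the result then goes through an ordinary Markdown renderer; `harden_response` is an illustrative name. Wrapping each URL in a code span keeps GFM-style autolinking from turning it back into a live link.

```python
import re

MD_IMAGE = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)\)")
MD_LINK = re.compile(r"\[([^\]]*)\]\(([^)\s]+)\)")
# Anything in angle brackets: raw HTML tags and also <https://...> autolinks.
ANGLE_TAG = re.compile(r"<[^>]*>")


def harden_response(markdown: str, ingested_untrusted: bool) -> str:
    """Minimum-bar pass for output that carries no span-level provenance.

    The whole response is treated as tainted if any untrusted input reached
    the model call; the caller is assumed to know whether that happened.
    """
    if not ingested_untrusted:
        return markdown
    # 1. Strip inline images: no tracking pixel, no QR code, no auto-fetch.
    text = MD_IMAGE.sub("[image removed]", markdown)
    # 2. Remove raw HTML and angle-bracket autolinks that could re-add <img> or <a>.
    text = ANGLE_TAG.sub("", text)
    # 3. Render links as label plus visible destination; the code span stops
    #    the downstream Markdown renderer from autolinking the URL.
    text = MD_LINK.sub(lambda m: f"{m.group(1)} `{m.group(2)}`", text)
    return text


page_derived = (
    "[Verify your account](https://attacker.example/v) "
    "![](https://attacker.example/qr.png) <img src=https://attacker.example/p.gif>"
)
print(harden_response(page_derived, ingested_untrusted=True))
```

This is deliberately blunt: it also strips legitimate images and any text that happens to sit between angle brackets, which is the price of having no provenance to work with.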

The mitigations that look right but are not enough

A few responses are circulating. They are worth naming and grading honestly.

"We will sanitize Markdown." Necessary, not sufficient. Sanitization stops the most obvious payloads but does not address the QR-code chain (a sanitized image tag is still an image tag, and the QR code is still scanned), and it does not address the UI-mimicry chain (a sanitized link styled by your own CSS still looks like a native button).

"We will block external image hosts via CSP." This helps with the tracking-pixel chain and partially with the QR-code chain, but only if your image policy is strict enough that it also blocks images legitimate model output would want to show. Most teams will not ship that policy because their own product also renders user-supplied images.
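
For reference, a policy of that kind might look like the header built below. This is an illustrative fragment under our own assumptions (the `build_csp` helper and the CDN host are made up), not a complete policy for a production chat UI. Because `img-src` allowlists origins, the browser refuses to request an image hosted anywhere else, which is what stops the pixel from reaching an attacker's server.

```python
def build_csp(image_hosts: list[str]) -> str:
    """Hypothetical Content-Security-Policy for a chat UI that renders model output.

    img-src allowlists first-party origins only, so the browser refuses to
    request an attacker-hosted pixel or QR image referenced in a reply.
    """
    directives = {
        "default-src": ["'self'"],
        "img-src": ["'self'", *image_hosts],
        "frame-src": ["'none'"],
    }
    return "; ".join(f"{name} {' '.join(values)}" for name, values in directives.items())


csp = build_csp(["https://cdn.yourproduct.example"])
print(csp)  # sent as the Content-Security-Policy response header
```

The gap the paragraph describes shows up in the `image_hosts` argument: if user-uploaded images are served from one of those origins, an attacker who can upload there can still put a QR code in front of the user.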

"We will warn users when output came from external content." Useful for sophisticated users, ignored by everyone else. The whole point of the UI-mimicry chain is that users have learned to trust assistant chrome. A banner above the reply does not retrain that instinct fast enough.

"We will use a stricter renderer for agent output." This is the closest to the right answer, but it requires the upstream model loop to actually tag spans with provenance. Today, most products get a raw token stream back and have no way to know which characters came from which input.

The honest position is that this is a hard architectural problem and the fixes will land in waves. The first wave is renderer hardening (weeks). The second wave is span-level provenance (quarters). The third wave is a redesign of how agents present external content at all (longer). ChatGPhish is the forcing function for wave one.

Where credential brokers fit, and where they do not

We build Authsome, an open-source credential broker for AI agents, and we want to be straightforward about what it does and does not do here.

Authsome solves a different half of the agent-security problem: outbound credential exfiltration. The broker holds your API keys and OAuth tokens in a local encrypted vault, hands the agent's environment a placeholder, and swaps the real header in at the proxy boundary on outbound HTTP. If an injection chain ever convinces an agent to POST your OpenAI key to attacker.example, the agent does not have the key to send. That is the failure mode covered in how prompt injection becomes credential exfiltration.
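
The placeholder-swap idea can be sketched in a few lines. This is not Authsome's implementation or API: the in-memory dict stands in for an unlocked vault, `PlaceholderSwapper` and `OutboundRequest` are hypothetical, and binding each placeholder to a single destination host is our own assumption, added so that a placeholder sent to attacker.example is refused rather than swapped.

```python
from dataclasses import dataclass, field


@dataclass
class OutboundRequest:
    host: str
    headers: dict[str, str] = field(default_factory=dict)


class PlaceholderSwapper:
    """Hypothetical proxy step: the agent's environment only holds placeholders."""

    def __init__(self, vault: dict[str, tuple[str, str]]):
        # placeholder -> (host the credential is bound to, real secret).
        # The dict stands in for a vault a human has already unlocked.
        self._vault = vault

    def rewrite(self, req: OutboundRequest) -> OutboundRequest:
        scheme, _, token = req.headers.get("Authorization", "").partition(" ")
        entry = self._vault.get(token)
        if entry is None:
            # Not a known placeholder: forward unchanged, no secret added.
            return req
        bound_host, secret = entry
        if req.host != bound_host:
            # Assumed host binding: a placeholder aimed elsewhere is refused.
            raise PermissionError(f"credential not released to {req.host}")
        headers = dict(req.headers)
        headers["Authorization"] = f"{scheme} {secret}"
        return OutboundRequest(req.host, headers)


swapper = PlaceholderSwapper({"ph-openai": ("api.openai.com", "sk-real-key")})
ok = swapper.rewrite(OutboundRequest("api.openai.com", {"Authorization": "Bearer ph-openai"}))
print(ok.headers["Authorization"])
```

Sending the same placeholder to attacker.example raises `PermissionError`, and the agent process never held the real key in the first place.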

ChatGPhish is not that failure mode. ChatGPhish is a renderer-side bug: attacker content is displayed to the user as if the assistant authored it, and the user clicks it. No credential leaves the machine in the exploit itself. A credential broker does nothing to stop the phishing button from rendering, the QR code from being scanned, or the tracking pixel from firing. The renderer trust boundary is upstream of anything Authsome can see.

What the broker does change is the blast radius of the follow-on attack. If the user clicks the fake "Resolve security alert" button and lands on a credential-harvesting page, and then a downstream agent on the same machine gets tricked into making an API call with the harvested session, the agent's outbound traffic still has to clear a proxy whose vault was unlocked by a human, not by whoever phished the user. That is a partial answer, not a full one. The full answer requires the renderer to stop minting attacker UI.

So: constrain what the renderer accepts, and constrain what credentials a subverted agent could exfiltrate. The two controls live at different layers and neither subsumes the other. Anyone selling you a single product that solves both is selling you a story.

What changes for builders this week

  • If you ship browse-with-LLM: treat your Markdown renderer as a security boundary, not a presentation detail. Default-deny external images in spans derived from untrusted input.
  • If you ship an agent that emails or messages users: assume Markdown rendered in your product's chat UI is now an attack surface, even if you do not call it browsing.
  • If you operate corporate ChatGPT: add the QR-code pivot to your phishing training this quarter. It is the chain that bypasses your existing controls most completely.
  • If you build credential infrastructure: keep doing it, and stop pretending it covers the renderer. The two stories pair; they do not merge.

ChatGPhish is one of the first widely publicized indirect-prompt-injection chains where the user, not the model, is the asset being phished. It will not be the last. The bug class has been waiting since the first product shipped "summarize this page for me", and Permiso just published the proof that it is exploitable today.

Local credential broker and vault for AI agents.