GitHub flipped the switch yesterday. As of June 1, 2026, every GitHub Copilot plan bills against a single new unit called a GitHub AI Credit, consumed at the published API rate for whichever model the agent picked. The old "premium request" quota that everyone learned to count against last year is gone, and so is the silent fallback to a cheaper model when you ran out. The announcement post puts it in one line: "Credits will be consumed based on token usage, including input, output, and cached tokens, according to the published API rates for each model."
The most-skipped detail is not the price. It is this sentence from the same post: "Fallback experiences will no longer be available." Under the old model, when your quota ran out, Copilot quietly downshifted to a cheaper model and kept working. Under the new model, when your credits run out, the request either bills overage or fails. There is no soft landing. For a team running a long-horizon coding agent that may spawn sub-agents, that is the single line that rewrites the spend-projection spreadsheet.
This post walks through what actually changed, what the published rates are, where the developer anger is coming from (the primary community thread sits at roughly 900 downvotes against ~20 upvotes as of writing, an unusually lopsided ratio for a GitHub product post), and what builders running coding agents at scale should do about it on a Monday morning. There is one honest credential-and-cost angle at the end. It is not the lead, because the lead is the meter.
What actually changed on June 1
The June 1 changelog entry is short. The substance lives in the announcement post and the docs.
The mechanics, in plain terms:
- Premium Request Units (PRUs) are deprecated. Whatever you used to track on the old quota dashboard is now translated into AI Credits at the corresponding API rate for each model.
- One AI Credit equals $0.01 USD. This is the rate published on the GitHub Copilot usage-based billing docs and repeated across every secondary explainer of the change. It is the unit you will see on the bill.
- Credits do not roll over. Per GitHub's billing FAQ, unused AI Credits do not carry over from month to month. If your team underuses the plan one month, you do not get a buffer the next.
- The fallback model is gone. When the credit pool is exhausted, you either pay overage or you stop. The product no longer silently downgrades you to keep going.
- Copilot code review now consumes GitHub Actions minutes on private repositories, per the April 27 changelog, in addition to AI Credits. That is a second meter, on a different budget line. Public repositories are unaffected.
- Code completions and next edit suggestions stay unmetered. They remain unlimited on all paid plans and do not draw from the credit pool.
- A preview billing tool launched in early May 2026 so admins could simulate the new spend before flip day. Few teams used it at the scale they are about to operate at.
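To make the unit conversion concrete, here is a minimal sketch of how token usage might translate into AI Credits at $0.01 per credit. The per-million-token rates below are placeholders invented for illustration, not GitHub's published prices, and `ModelRate` and `credits_for` are hypothetical names. Substitute the real rates from the docs before trusting any number it prints.

```python
from dataclasses import dataclass

CREDIT_USD = 0.01  # one AI Credit, per the billing docs cited above


@dataclass(frozen=True)
class ModelRate:
    """USD per one million tokens. The values used below are hypothetical."""
    input_per_m: float
    cached_per_m: float
    output_per_m: float


def credits_for(rate: ModelRate, input_tokens: int,
                cached_tokens: int, output_tokens: int) -> float:
    """Convert one run's token usage into AI Credits at the given rate."""
    usd = (
        input_tokens * rate.input_per_m
        + cached_tokens * rate.cached_per_m
        + output_tokens * rate.output_per_m
    ) / 1_000_000
    return usd / CREDIT_USD


# Placeholder rate, NOT a published price.
example_rate = ModelRate(input_per_m=3.00, cached_per_m=0.30, output_per_m=15.00)

run_credits = credits_for(
    example_rate,
    input_tokens=400_000,
    cached_tokens=1_000_000,
    output_tokens=50_000,
)
print(f"{run_credits:.1f} credits (${run_credits * CREDIT_USD:.2f})")
```

With these placeholder numbers the run comes to 225 credits, or $2.25. The point of the sketch is the shape of the formula: input, cached, and output tokens are each priced separately and then summed.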
The plan-level SKUs published on github.com/features/copilot/plans settle into roughly this shape (numbers are from those docs, not the announcement post):
| Plan | Monthly price | Included AI Credits | Overage |
|---|---|---|---|
| Free | $0 | small monthly allotment | none, hard stop |
| Pro | $10 / user | $10 in credits (1,000 credits) | $0.01 / credit |
| Pro+ | $39 / user | $39 in credits | $0.01 / credit |
| Business | $19 / user | $19 in credits | $0.01 / credit |
| Enterprise | $39 / user | $39 in credits | $0.01 / credit |
Two things worth flagging about that table:
- The "$19 in credits on a $19 plan" framing means there is no margin for the buyer between the seat price and the consumption pool. The plan is, at the limit, a prepaid debit account for tokens, with a seat on top.
- Because the credit is priced at API rates, the model choice silently determines the burn rate. A run on a frontier reasoning model costs many multiples of a run on a smaller chat model, even at the same token count, because input and output token prices differ across the catalog.
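A rough way to see how model choice drives the burn rate is to ask how many agent tasks an included pool covers. The pool sizes below follow from the table at $0.01 per credit. The per-task costs (15 and 225 credits) are made-up examples, not measurements, and the function names are hypothetical.

```python
CREDIT_USD = 0.01

# Included credits per user per month, derived from the plan table above.
PLAN_CREDITS = {"Pro": 1_000, "Business": 1_900, "Pro+": 3_900, "Enterprise": 3_900}


def tasks_covered(pool_credits: int, credits_per_task: float) -> int:
    """Whole tasks that fit in the included pool before overage starts."""
    if credits_per_task <= 0:
        raise ValueError("credits_per_task must be positive")
    return int(pool_credits // credits_per_task)


def overage_usd(pool_credits: int, credits_per_task: float, tasks: int) -> float:
    """Dollars billed beyond the included pool for a given number of tasks."""
    used = credits_per_task * tasks
    return max(0.0, used - pool_credits) * CREDIT_USD


# Hypothetical per-task costs for a small model and a frontier model.
for label, per_task in [("small model", 15), ("frontier model", 225)]:
    covered = tasks_covered(PLAN_CREDITS["Pro"], per_task)
    print(f"Pro pool covers {covered} tasks on the {label}")
```

Under these assumed costs, the same Pro pool covers 66 small-model tasks or 4 frontier-model tasks, which is the gap the second bullet describes.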
Annual Pro and Pro+ subscribers stay on the legacy premium-request model until their annual term expires, so the meter does not kick in for them on day one.
The backlash, with receipts
The community discussion thread is the cleanest single window into developer reaction. As of writing, the announcement post inside the thread is sitting at roughly 900 downvotes against ~20 upvotes, with hundreds of comments. That ratio is not normal for a GitHub product announcement.
The recurring complaints in the thread, paraphrased so the wording is not over-claimed:
- A request to bring back the free fallback models, on the grounds that removing them hollows out the value of individual plans.
- Refund requests from annual subscribers who feel the product they bought has materially changed.
- Pro-plan users reporting that single agentic requests consumed double-digit percentages of their monthly bucket, with one report of a single request burning hundreds of credits.
- Pro+ users projecting their full monthly allotment exhausting in days under normal agentic workloads.
These are user-reported numbers, not GitHub-validated telemetry, and they skew toward the worst cases because the people having a good time are not posting. But the consistency of the order of magnitude across hundreds of comments is the signal that matters.
Outside the thread, TechCrunch's coverage from May 30 pulls a Reddit user quoting projections of "$29 will become roughly $750 a month" and "$50 will become roughly $3,000 a month" for the heaviest agent users. That is user math, not GitHub math, and it assumes aggressive agentic sessions on top-tier models. Visual Studio Magazine framed the sentiment as "you will get less, but pay the same price." Directionally, an autonomous agent that reads a repo, writes a patch, runs tests, and self-corrects can burn through a Pro plan's monthly bucket inside a single task.
The Register's morning-after writeup collects the threatened-migration list: Cursor, Windsurf, OpenAI Codex, Claude Code, OpenRouter as a pass-through, and others. Most of these run on the same underlying frontier models, so the migration is partly an emotional reaction to GitHub specifically and partly a genuine search for tools that either ship cheaper open-weight models or absorb more of the consumption inside the seat price.
Why GitHub did this (and why the framing is honest)
The announcement post's argument is, on its own terms, defensible. The line that matters is: "Today, a quick chat question and a multi-hour autonomous coding session can cost the user the same amount." That is true. Under the PRU model, a one-line chat question and a 90-minute agentic refactor that read 400 files both counted as units against the same quota. The unit cost to GitHub of those two events differed by orders of magnitude.
When the cost of one product feature (agent mode) is many multiples of another (a quick chat question) and they share a pricing unit, one of two things happens. Either light users subsidize heavy users and the cheap features get throttled to recover margin, or the pricing unit splits. GitHub picked the second option, and tied the new unit directly to the upstream API rate so that it is not in the position of inventing its own opaque token pricing on top.
That is the polite version. The blunter version is that agent mode is now the load-bearing Copilot feature, agents burn tokens, and the seat-priced model could not sustain it at the previous list prices.
What this actually changes for teams running coding agents
Forget the consumer Pro plan for a moment. The interesting question is what changes operationally for a platform team running coding agents across an organization. Five concrete shifts.
1. Cost per task becomes a first-class metric. Under the seat model, the unit of telemetry that mattered was "active users per month." Under the credit model, the unit is "AI Credits per merged PR" or "AI Credits per closed Linear issue." If you do not start emitting that metric this week, finance will demand it next quarter and you will be stitching it together retroactively from billing CSVs.
2. Sub-agent fan-out becomes a budget question, not just a latency question. When an agent decides at runtime to spawn three sub-agents to parallelize a refactor, that decision now has a direct dollar cost. If you operate orchestrators like Claude Code, Cursor's background agents, or Copilot's agent mode at scale, you need a per-task ceiling that the orchestrator can see and respect, and a kill switch the platform team can pull. The old "premium request" cap was a clumsy version of this. The new model gives you finer-grained accounting and zero training wheels.
3. Model selection moves into policy. Because credits are priced at API rates, letting the agent freely pick a frontier model for every step is the most expensive default possible. A reasonable team policy now looks like: cheap model for retrieval and tool selection, mid-tier model for planning, frontier model only for the final synthesis step or when a confidence threshold is missed. Some competing tools ship this kind of routing as a built-in setting. Copilot leaves more of it to you.
4. The fallback removal changes failure modes. Previously, hitting the quota meant degraded output. Now it means a failed run, mid-flight, on an agent that may have already partially applied changes to a working tree or opened a draft PR. Your wrappers around agent runs need to treat "out of credits" as a first-class error case with a defined cleanup path, the same way you already treat rate limits and 5xx responses.
5. Cost attribution per developer, per repo, per agent identity matters now. Under the seat model, "who is using Copilot" was a per-seat boolean. Under the credit model, "who burned 18,000 credits last week and on which repo" is the question. The answer depends on which Copilot identity (or, in mixed shops, which underlying API key) was attached to the agent at the time of the run.
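Shifts 2 and 3 can be sketched together: a per-task ceiling the orchestrator checks before each step, plus a routing table that keeps the frontier tier for synthesis or low-confidence steps. Everything here is an assumption for illustration. `TaskBudget`, `pick_tier`, the tier names, and the 0.7 threshold are hypothetical and do not correspond to any Copilot setting.

```python
from dataclasses import dataclass
from typing import Optional


class BudgetExceeded(RuntimeError):
    """Raised when a step would push a task past its credit ceiling."""


@dataclass
class TaskBudget:
    ceiling_credits: float
    spent_credits: float = 0.0
    killed: bool = False  # flipped by the platform team to stop the task

    def charge(self, credits: float) -> None:
        """Record spend for one step, refusing it if the ceiling would be crossed."""
        if self.killed:
            raise BudgetExceeded("task stopped by kill switch")
        if self.spent_credits + credits > self.ceiling_credits:
            remaining = self.ceiling_credits - self.spent_credits
            raise BudgetExceeded(
                f"step needs {credits} credits, only {remaining} left"
            )
        self.spent_credits += credits


# Hypothetical tier names; map them to real models in your own config.
STEP_TIER = {
    "retrieval": "small",
    "tool_selection": "small",
    "planning": "mid",
    "synthesis": "frontier",
}


def pick_tier(step: str, confidence: Optional[float] = None,
              threshold: float = 0.7) -> str:
    """Cheap by default; escalate to frontier when confidence misses the threshold."""
    if confidence is not None and confidence < threshold:
        return "frontier"
    return STEP_TIER.get(step, "small")
```

An orchestrator would call `budget.charge(estimated_credits)` before spawning a sub-agent and treat `BudgetExceeded` as a signal to stop fanning out. Charging on an estimate rather than on the final bill is a design choice: it refuses the step before the tokens are spent, at the price of occasionally being wrong about the estimate.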
The honest credential and cost angle
This is the one place where the credential story and the billing story line up cleanly, so we will be brief.
When every agent call costs real money per token, the question "which agent identity is hitting which repo with whose Copilot or API key?" stops being only a security question. It becomes a budget-attribution question with the same answer. If you cannot tell, from the proxy or audit log, which identity made a given outbound call, you also cannot tell which budget line the cost should land on.
Authsome is a local-first credential broker that bundles GitHub OAuth and lets you scope per-account connections with --connection <name>. The honest framing is that it was designed for credential isolation, not cost telemetry. Because every outbound call goes through a local proxy that writes to an append-only JSONL audit log, you get per-account attribution as a side effect of doing the credential side properly. If your team is already wrestling with multiple GitHub accounts per agent or thinking through GitHub token hygiene under the new billing reality, the same primitive answers both questions.
Two things Authsome does not do today, said plainly so this lands honestly:
- It does not ship a per-agent policy engine that decides which agent may use which provider. There is a global allow/deny proxy mode per run. Per-agent policy is not in the product today.
- It does not export OpenTelemetry. The audit log is append-only JSONL on disk. If you want cost-per-task metrics in your existing observability stack, you are writing the exporter or tailing the JSONL yourself.
If your cost problem is "we cannot tell which key paid for which run," the broker pattern helps. If your problem is "we need a real-time budget governor across a fleet of orchestrators," you are building it yourself this quarter, on top of whatever telemetry your agent framework already emits.
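If you do end up tailing the JSONL yourself, the aggregation is small. The sketch below counts outbound calls per connection from JSONL lines. The field names `connection` and `host` are assumptions made for this example; the actual audit log schema is not described here, so check it before relying on any key.

```python
import json
from collections import Counter
from typing import Iterable


def calls_per_connection(lines: Iterable[str]) -> Counter:
    """Count audit-log records per connection name (hypothetical field)."""
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a partially written final line while tailing
        if not isinstance(record, dict):
            continue
        counts[record.get("connection", "unknown")] += 1
    return counts


# Made-up sample records standing in for a real log file.
sample_log = [
    '{"connection": "work", "host": "api.github.com"}',
    '{"connection": "personal", "host": "api.github.com"}',
    '{"connection": "work", "host": "api.github.com"}',
    '{"host": "api.github.com"}',
    '{"connection": "work", "host": "api.gith',
]

print(calls_per_connection(sample_log))
```

Call counts are only a proxy for cost. To get credits per connection you would still need to join these records against per-run usage from your agent framework or the billing CSV.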
What to do this week
For a platform engineer reading this on June 2:
- Pull yesterday's billing CSV as a baseline. Whatever that first day's bill looks like is the floor; long-horizon agent adoption only adds to it.
- Set per-task and per-day credit ceilings in whichever orchestrator you run. Treat the absence of a ceiling as a production incident waiting to happen.
- Audit which model your agents default to. If the default is the most expensive frontier model in the catalog, change it. The cost delta across models is the single biggest lever you have.
- Wire "out of credits" into your agent error handling the same way you wire 429s. The old soft fallback is not coming back.
- Decide your attribution unit now. Per-developer, per-repo, per-team, or per-agent identity. Pick one and start tagging runs at the source. Reconciling against the GitHub bill at month end is much easier if the tags exist on day one.
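The last two items can share one wrapper: tag every run at the source, and give credit exhaustion its own cleanup path. This is a sketch under assumptions. How credit exhaustion actually surfaces to your tooling is not specified here, so `OutOfCreditsError` is a hypothetical exception your own wrapper would raise once it detects that condition, and `run_tagged` is a made-up name.

```python
from typing import Any, Callable, Dict


class OutOfCreditsError(RuntimeError):
    """Hypothetical: raised by your own wrapper when it detects credit exhaustion."""


def run_tagged(run_agent: Callable[[], Any],
               cleanup: Callable[[], None],
               tags: Dict[str, str]) -> Dict[str, Any]:
    """Run an agent task, attach attribution tags, and clean up if credits run out."""
    record: Dict[str, Any] = dict(tags)  # e.g. developer, repo, agent identity
    try:
        record["result"] = run_agent()
        record["status"] = "ok"
    except OutOfCreditsError as exc:
        # The run may have left a dirty working tree or a draft PR behind.
        cleanup()
        record["status"] = "out_of_credits"
        record["error"] = str(exc)
    return record
```

A caller passes the task as a closure, for example `run_tagged(lambda: agent.run(task), reset_worktree, {"repo": "billing-service", "agent": "refactor-bot"})`, where `agent`, `task`, and `reset_worktree` are placeholders for your own code. Unlike a typical 429, an immediate retry is unlikely to help here, which is why the sketch cleans up and reports rather than backing off.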
The credit meter is live. The fallback is gone. The vendor's framing that this is the honest cost showing up on the bill is technically correct. The dev community's reaction that this turns a $10 plan into a variable-cost utility is also correct. Both can be true. The teams that adjust their telemetry and policy fastest will be the ones who get the productivity gains of agent mode without the next quarter's overage line item.