Safe API access for LangChain and LlamaIndex agents

If you build agents with LangChain or LlamaIndex, you have done this dance a hundred times. Set OPENAI_API_KEY in a .env, maybe a GitHub token and a SerpAPI key alongside it, construct your ChatOpenAI or your OpenAI LLM, wire up a few tools, and ship. It works on the first try, which is exactly the problem. The setup that works on the first try is also the setup that leaks every key the first time your agent reads something it shouldn't.

This post is the how-to for loading provider keys safely in a LangChain or LlamaIndex agent. We will start with how keys actually load today, because most guides stop there and call it done. Then we will look at the part nobody priced in: the moment an agent ingests untrusted text, your process environment becomes an exfiltration target. Finally, two concrete patterns that keep the real secret out of the agent's reach.

How LangChain and LlamaIndex load keys today

LangChain's ChatOpenAI reads its key from OPENAI_API_KEY by default. If you do not pass api_key, it is inferred from the environment:

python

import os
from langchain_openai import ChatOpenAI

os.environ["OPENAI_API_KEY"] = "sk-..."   # real secret, now in os.environ
llm = ChatOpenAI(model="gpt-4o-mini")      # inferred from OPENAI_API_KEY

LlamaIndex is the same story. The documented setup is literally an env var assignment followed by the client constructor:

python

import os
from llama_index.llms.openai import OpenAI
from llama_index.core import Settings

os.environ["OPENAI_API_KEY"] = "sk-..."    # real secret in env
Settings.llm = OpenAI(model="gpt-4o-mini") # reads OPENAI_API_KEY, or pass api_key=

Both frameworks let you pass the key explicitly instead of reading the env var. LangChain's api_key parameter is worth a close look, because it accepts not just a string but a sync or async callable that returns a string:

python

# api_key can be a callable that returns a str (langchain-openai reference).
# get_token is a placeholder for your own zero-argument function returning the key.
llm = ChatOpenAI(model="gpt-4o-mini", api_key=get_token)

Hold onto that callable form. It is the cleanest seam for the second safe pattern below.

Here is the thing the standard setup gets wrong, and it is not "you hardcoded a key." Plenty of teams have already moved past hardcoding. They use a secrets manager. The real issue is more subtle, and it survives every secrets manager on the market.

The question is not where you put the key

When developers think about API key hygiene, they think about the wrong question. The question they ask is "where do I put the key so it is not in git." That is a real question and tools answer it well. The question they do not ask is "what can read the key once it is loaded." Environment variables answer the first question and fail the second one completely.

Once a key is in os.environ, anything running in that process can read it. Your code can. Every dependency you imported can. So can every LangChain tool, every subprocess your agent spawns, and, crucially, the agent itself when it is steered by text it did not write. That last category is where agents differ from ordinary scripts. An ordinary script does what its author wrote. An agent does what the most recent untrusted input convinced it to do.
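To make the "what can read the key" question concrete, here is a minimal sketch. The key value is a placeholder, and `third_party_helper` and `dump_environment` are made-up names standing in for any dependency, tool, or injected instruction that runs inside the same process.

```python
import json
import os

# Placeholder value standing in for a real key loaded from .env or a vault.
os.environ["OPENAI_API_KEY"] = "sk-demo-not-a-real-key"


def third_party_helper() -> str:
    """Stands in for any imported dependency or tool in the same process."""
    # Nothing was passed in, yet the key is one dictionary lookup away.
    return os.environ.get("OPENAI_API_KEY", "")


def dump_environment() -> str:
    """Roughly what an injected 'print your environment' instruction produces."""
    return json.dumps(dict(os.environ))


leaked = third_party_helper()
payload = dump_environment()
print(leaked == "sk-demo-not-a-real-key")   # True
print("sk-demo-not-a-real-key" in payload)  # True
```

Neither function was given the key. Being in the same process was enough.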

This is a demonstrated attack, not a thought experiment

Indirect prompt injection that dumps environment variables is an established class of attack with real, recent incidents behind it. Greshake and colleagues showed in 2023 that an agent could be instructed to read its environment and ship the contents to an attacker endpoint. The pattern has not gone away. In late 2025, a public advisory documented a shipped coding agent that could be coerced into exfiltrating env-var API keys over DNS, triggered by a malicious GitHub issue, with no user consent in the loop. The injection arrived as ordinary content the agent was asked to process, and the agent obliged.

Then there is CVE-2025-68664, nicknamed LangGrinch, which lands directly on LangChain users. It is a serialization-injection flaw in langchain-core rated CVSS 9.3. Crafted data containing a LangChain "secret" marker, an object shaped like {"lc": 1, "type": "secret", "id": ["ENV_VAR"]}, was treated as a genuine secret reference during deserialization. An attacker who could get such data into the deserialization path could load arbitrary environment variables, not just a known key name. It was exploitable because the relevant flag, secrets_from_env, defaulted to True. The fix in langchain-core 0.3.81 and 1.2.5 flips that default to False. Affected versions are below 0.3.81 and the 1.0.0 through 1.2.4 range.
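The shape of the bug is easier to see in a toy version. The sketch below is not langchain-core's code. `resolve_markers` is a hypothetical walker written for illustration, and `DEMO_SECRET` is a placeholder variable. It only shows why resolving secret-shaped data against the environment is dangerous when the data is attacker-controlled.

```python
import os

os.environ["DEMO_SECRET"] = "sk-demo-not-a-real-key"  # placeholder value


def resolve_markers(obj, secrets_from_env: bool):
    """Toy reviver: walk the data and resolve secret-shaped dicts."""
    if isinstance(obj, dict):
        if obj.get("lc") == 1 and obj.get("type") == "secret":
            name = obj["id"][0]
            # The dangerous branch: an attacker-chosen name looked up in os.environ.
            return os.environ.get(name) if secrets_from_env else None
        return {key: resolve_markers(value, secrets_from_env) for key, value in obj.items()}
    if isinstance(obj, list):
        return [resolve_markers(item, secrets_from_env) for item in obj]
    return obj


# Attacker-controlled data that merely looks like a secret reference.
untrusted = {"metadata": {"note": {"lc": 1, "type": "secret", "id": ["DEMO_SECRET"]}}}

with_env_lookup = resolve_markers(untrusted, secrets_from_env=True)
without_env_lookup = resolve_markers(untrusted, secrets_from_env=False)
print(with_env_lookup["metadata"]["note"])     # sk-demo-not-a-real-key
print(without_env_lookup["metadata"]["note"])  # None
```

With the lookup enabled, the attacker picks the variable name and the process hands back its value. With it disabled, the marker resolves to nothing.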

LangGrinch is the cleanest evidence that "env-var secrets plus untrusted input in an orchestration loop" is a real exfiltration path. The vulnerability is patched, and you should patch it. But patching one CVE does not change the structural fact underneath it: if the real secret lives in the process environment, then the injection that reaches your process reaches the secret. The next LangGrinch will have a different CVE number.

If you want the full mechanism, we walk through it in How prompt injection becomes credential exfiltration, and the broader taxonomy is in AI agent security in 2026, four threat models.

Why secrets managers do not close this gap

The natural reaction is "fine, I will pull the key from a vault at runtime instead of hardcoding it." This is good practice and you should do it. It just does not fix the problem we are discussing, and it is important to be precise about why.

Look at what these tools actually do. Doppler:

bash

doppler run -- python my_agent.py

Doppler fetches the latest versions of your secrets and injects them as environment variables into the running process. Infisical is the same:

bash

infisical run --env=prod -- python my_agent.py

So is HashiCorp Vault Agent in process-supervisor mode, which renders secrets into environment variables for the child it supervises:

bash

vault agent generate-config -type="env-template" \
  -exec="python my_agent.py" -path="ai-app/openai" agent-config.hcl
vault agent -config=agent-config.hcl

Every one of these tools solves the git-commit problem, the .env-sprawl problem, and the rotation problem. Those are real wins. But notice the shared property: each one delivers the real secret into the child process's environment. An OPENAI_API_KEY pulled fresh from Vault sits in os.environ exactly the same way a hardcoded one does, and it is exactly as readable by an injected prompt. The provenance of the secret does not matter to the attacker. What matters is that the live secret is sitting in a place the agent can read.
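The shared property can be sketched in a few lines. This is an illustration of the general "run a child with secrets merged into its environment" shape, not the implementation of any of the tools above. `run_with_injected_secrets` is a hypothetical helper and the key is a placeholder.

```python
import json
import os
import subprocess
import sys

# The child simply dumps whatever environment it was started with.
CHILD_CODE = "import json, os; print(json.dumps(dict(os.environ)))"


def run_with_injected_secrets(secrets: dict) -> dict:
    """Mimic the shape of '<tool> run -- cmd': merge secrets into the child env."""
    child_env = {**os.environ, **secrets}
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=child_env,
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


seen_by_child = run_with_injected_secrets({"OPENAI_API_KEY": "sk-demo-from-vault"})
print(seen_by_child["OPENAI_API_KEY"])  # sk-demo-from-vault
```

From inside the child there is no way to tell this key came from a vault, and nothing about its origin makes it harder to read.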

We go deeper on this mismatch in AWS Secrets Manager isn't built for AI agents and Stop putting API keys in environment variables. The short version: a secrets manager moves the key from disk to memory. It does not remove the key from the agent's reach.

Pattern 1: run the agent behind a broker proxy

The fix that actually addresses the agent-reads-its-own-env problem is to make sure the real secret never enters the child's environment at all. That is what a credential broker does, as opposed to a secrets manager. The distinction matters: a secrets manager hands you the secret, a broker holds the secret and injects it at the network boundary on your behalf.

Authsome is one open-source, local-first implementation of this. You log in once per provider, then run your agent under a local proxy:

bash

authsome login openai
authsome login github
authsome run -- python my_agent.py

What the child process sees in its environment is a placeholder, not a key:

bash

authsome run -- env | grep -E 'PROXY|OPENAI'
# HTTP_PROXY=http://127.0.0.1:<port>
# HTTPS_PROXY=http://127.0.0.1:<port>
# OPENAI_API_KEY=authsome-proxy-managed

Your LangChain or LlamaIndex code does not change. ChatOpenAI reads OPENAI_API_KEY, sees authsome-proxy-managed, and initializes happily, because most SDKs only check that the variable is non-empty at construction time. When the SDK makes its outbound HTTPS call to api.openai.com, the local proxy matches the destination host and swaps in the real Authorization header on the way out. The same goes for a GitHub call to api.github.com, a SerpAPI call, or any other matched host.

Now replay the attack. A prompt injection convinces the agent to dump os.environ and POST it somewhere. What leaks is OPENAI_API_KEY=authsome-proxy-managed. That string is worth nothing. The real sk-... was never in the process for the injection to find. You have not made the agent un-injectable, but you have made the injection's payout a placeholder instead of a long-lived, broadly-scoped key.
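The host-matching step described above can be sketched as follows. This is a simplified model, not Authsome's implementation. `inject_auth` and the `VAULT` table are hypothetical, the secrets are placeholders, and the Bearer scheme is an assumption that will not fit every provider (some take the key as a query parameter instead).

```python
PLACEHOLDER = "authsome-proxy-managed"

# Hypothetical stand-in for the broker's vault, keyed by destination host.
VAULT = {
    "api.openai.com": "sk-demo-real-key",
    "api.github.com": "ghp-demo-real-token",
}


def inject_auth(host: str, headers: dict, vault: dict) -> dict:
    """Return outbound headers, swapping in the real credential for matched hosts."""
    forwarded = dict(headers)  # never mutate the caller's headers
    secret = vault.get(host)
    if secret is not None:
        forwarded["Authorization"] = f"Bearer {secret}"
    return forwarded


# What the agent process holds, and therefore all an env dump can reveal.
child_env = {"OPENAI_API_KEY": PLACEHOLDER}
outbound = {"Authorization": f"Bearer {child_env['OPENAI_API_KEY']}"}

to_openai = inject_auth("api.openai.com", outbound, VAULT)
to_attacker = inject_auth("attacker.example", outbound, VAULT)

print(to_openai["Authorization"])    # Bearer sk-demo-real-key
print(to_attacker["Authorization"])  # Bearer authsome-proxy-managed
```

The real credential appears only on requests to a matched host. A request to any other destination carries the placeholder, which is also all the agent's environment ever contained.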

This works with the existing tool wrappers too. A LangChain tool that reads its secret from an env var picks up the placeholder, and the proxy injects the real header for it:

python

from langchain_community.utilities import SerpAPIWrapper

# reads SERPAPI_API_KEY from env, which is now the proxy placeholder
search = SerpAPIWrapper()

For a provider Authsome does not bundle, you add a small JSON file describing the host and auth so the proxy can match it. No code change in your agent.

The honest tradeoff

This pattern has a cost, and it would be dishonest to skip it. HTTPS interception requires the proxy's CA certificate to be trusted on the machine running the agent. Without that trust, every HTTPS call fails TLS verification. On a dev box or a controlled runner this is a one-time setup. In some locked-down environments it is a non-starter, which is precisely why the second pattern exists.

Two more honest limits. A broker proxy does not stop an injected agent from misusing a credential it is legitimately allowed to use. If the agent is permitted to call OpenAI and an injection tells it to call OpenAI in a wasteful or harmful way, the proxy will still inject the header and forward the request. The win is narrower and concrete: the static, exfiltratable secret is gone from the process, so a leaked environment is no longer a leaked key. And the proxy only handles HTTP and HTTPS traffic over its connection. Non-HTTP transports slip past it.

Pattern 2: resolve tokens in-process at client construction

When the proxy is not an option, because an SDK pins its TLS certificates and refuses the proxy CA, because you are on a non-HTTP transport like a WebSocket or gRPC stream, or because you need a different account per call inside one process, you drop into library mode. Here you read the credential programmatically and pass it straight into the client. The secret does enter the process and stays in the client object's memory for as long as that client lives, but it never has to sit in os.environ, where every subprocess inherits it and any environment dump finds it.

The library surface is create_auth_service(), which returns an object with get_access_token(provider, connection=...):

python

from authsome.server.dependencies import create_auth_service
from langchain_openai import ChatOpenAI
from langchain_community.tools.github.tool import GitHubAction

auth = create_auth_service()

llm = ChatOpenAI(
    api_key=auth.get_access_token("openai"),
    model="gpt-4o-mini",
)

github = GitHubAction(
    github_access_token=auth.get_access_token("github"),
)

This is where LangChain's callable api_key form can help. Because the parameter accepts a callable, you can hand ChatOpenAI a function that resolves the token instead of a bare string:

python

auth = create_auth_service()
llm = ChatOpenAI(model="gpt-4o-mini", api_key=lambda: auth.get_access_token("openai"))

The auth layer is synchronous, so the simplest and recommended habit is to resolve the token once at client construction and pass the value in. Construct clients per request rather than caching them for the whole run, and re-resolve if your process outlives a token's TTL.
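For a long-lived process, one way to follow the re-resolve advice is a small callable that fetches again once a time limit has passed. This is a sketch under assumptions: `RefreshingToken` is a hypothetical helper, the 60-second TTL is an arbitrary example, and in real use `fetch` would be something like `lambda: auth.get_access_token("openai")`.

```python
import time
from typing import Callable, Optional


class RefreshingToken:
    """Callable that re-resolves a token once its assumed TTL has passed."""

    def __init__(
        self,
        fetch: Callable[[], str],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[str] = None
        self._expires_at = 0.0

    def __call__(self) -> str:
        now = self._clock()
        if self._value is None or now >= self._expires_at:
            self._value = self._fetch()
            self._expires_at = now + self._ttl
        return self._value


# Demo with a fake fetcher and a fake clock so the behavior is visible.
calls = []


def fake_fetch() -> str:
    calls.append(1)
    return f"token-{len(calls)}"


fake_now = [0.0]
token = RefreshingToken(fake_fetch, ttl_seconds=60, clock=lambda: fake_now[0])

first = token()     # first call fetches
second = token()    # still inside the TTL, no new fetch
fake_now[0] = 61.0  # pretend 61 seconds have passed
third = token()     # TTL elapsed, fetches again
print(first, second, third)  # token-1 token-1 token-2
```

Because the instance is a zero-argument callable returning a string, it has the shape LangChain's callable api_key form expects. If calling the auth layer on every request is cheap enough for your workload, skip the wrapper and call it directly.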

LlamaIndex follows the same shape. Pass the resolved token into the LLM or the data reader at construction:

python

from authsome.server.dependencies import create_auth_service
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.readers.github import GithubRepositoryReader, GithubClient

auth = create_auth_service()

Settings.llm = OpenAI(api_key=auth.get_access_token("openai"))

reader = GithubRepositoryReader(
    github_client=GithubClient(auth.get_access_token("github")),
    owner="agentrhq",
    repo="authsome",
)
docs = reader.load_data(branch="main")

Token refresh is handled for you. If your process outlives a token's TTL, call get_access_token again rather than caching the string across the whole run.

Multiple accounts, one process

Library mode is also how you juggle multiple accounts for the same provider, which agents hit constantly when one chain touches a work GitHub org and a personal one in the same run:

python

work = auth.get_access_token("github", connection="work")
personal = auth.get_access_token("github", connection="personal")

We cover that workflow end to end in Managing multiple GitHub accounts for AI agents.
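In practice the connection name usually depends on what the agent is touching. Here is a minimal sketch of that routing, assuming the `get_access_token(provider, connection=...)` surface shown above. `CONNECTION_BY_OWNER`, the `acme-corp` org, and `FakeAuth` are hypothetical and exist only so the example runs on its own.

```python
# Hypothetical mapping from repository owner to a named connection.
CONNECTION_BY_OWNER = {"acme-corp": "work"}
DEFAULT_CONNECTION = "personal"


def connection_for(owner: str) -> str:
    """Pick the connection name for a repository owner (case-insensitive)."""
    return CONNECTION_BY_OWNER.get(owner.lower(), DEFAULT_CONNECTION)


def github_token_for(auth, owner: str) -> str:
    """Resolve the GitHub token for whichever account owns the repository."""
    return auth.get_access_token("github", connection=connection_for(owner))


class FakeAuth:
    """Stand-in for the real auth service so the sketch is self-contained."""

    def get_access_token(self, provider: str, connection: str = "default") -> str:
        return f"{provider}-token-for-{connection}"


auth = FakeAuth()
print(github_token_for(auth, "Acme-Corp"))  # github-token-for-work
print(github_token_for(auth, "octocat"))    # github-token-for-personal
```

Keeping the routing in one function means the tools never decide for themselves which account to use.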

Which pattern, when

Here is the decision in one table.

Situation	Pattern
Agent calls providers over HTTPS, you can trust the proxy CA	Proxy. Lead with this.
You want the real key out of `os.environ` entirely	Proxy.
SDK pins TLS certs and rejects the proxy CA	Library mode.
Non-HTTP transport (WebSocket, gRPC, raw TCP)	Library mode.
Different connection per call inside one process	Library mode, with `get_access_token(..., connection=...)`.
Embedding in a larger orchestrator that manages its own subprocesses	Library mode.

The proxy is the simpler and stronger default because it requires no code change and removes the secret from the process completely. Library mode is the precise tool for the cases the proxy cannot reach. Most agents want the proxy. A minority of SDKs and transports force library mode, and that is fine. Both read from the same local encrypted vault and produce the same audit trail.

What you actually gained

Step back from the mechanics. Before, your LangChain or LlamaIndex agent held a long-lived, broadly-scoped API key in a process that, by design, executes instructions from text it did not author. A single successful injection, or one deserialization bug like LangGrinch, turned that into a stolen key with a wide blast radius and an unmanaged rotation story.

After, with the proxy, the process holds a placeholder and the real key lives in a local broker that injects it only on the outbound request. The worst an injection can exfiltrate from the environment is the string authsome-proxy-managed. With library mode, the key lives only inside the client objects you construct and never settles into os.environ. Neither pattern makes your agent immune to prompt injection. What they do is downgrade the consequence of a successful injection from "attacker now holds your OpenAI and GitHub keys" to "attacker made one request that had to go through the broker." That is the difference between an incident and a footnote.

If you want the wider context on where brokers fit among the 2026 tooling, Agent credential brokers in 2026 maps the landscape.

Next steps

Quickstart

Install authsome, log in to a provider, and run your first agent under the proxy in a few minutes.

LangChain integration

The full LangChain setup: proxy mode, tool wrappers, and library-mode token resolution.

LlamaIndex integration

Use authsome with LlamaIndex LLM clients, data loaders, and retrievers.

Run agents with the proxy

How placeholder env vars and host-based header injection work, plus the TLS tradeoff.

Further reading

How Authsome keeps agent credentials out of your env vars

GitHub token hygiene for AI agents: PATs, fine-grained tokens, GitHub Apps, and OAuth
