what i learned shipping an oauth-protected mcp server

i added a model context protocol server to indebtio, my self-service debt-collection product. modest goal: let users connect claude, chatgpt, or cursor and ask things like “show me my overdue cases” or “which debtors haven’t paid in 30 days” without leaving their ai client.

mcp is a small, transport-agnostic protocol that lets ai clients call typed tools on a remote server — the equivalent of a function-calling api standardised across vendors. the agentcore gateway lesson covers the same shape from the other side: wrapping your own apis and lambdas as mcp tools for an agent to consume. this post is about the server you stand up when you are the third party — and specifically, the auth surface around it.

the protocol surface itself is small — a streaming endpoint and a handful of tools. the interesting work is everywhere else: oauth 2.1 underneath it, the discovery chain ai clients walk before they can talk to you, and the security boundary that has to hold even when the caller is an llm rather than a logged-in user.

i build auth code the same way i do for enterprise: the threat model sits next to the implementation, not after it. so once the protocol round-tripped end-to-end with claude.ai, i ran the surface through a security review (owasp’s api security guidance, the oauth 2.0 security best current practice in rfc 9700, and threat modelling against the dcr flow) before opening it to anyone. it surfaced findings i didn’t expect, and a couple i should have.

what follows is the subset worth writing about. the obvious hardening (helmet, cors policy, rate-limit defaults, deploy hygiene) is left out; this is the stuff that’s specific to the shape of an oauth-protected mcp endpoint being called by ai clients, where the principal isn’t a logged-in user and the consent screen has to defend against threats that don’t exist in the human-facing flow.

the discovery chain is longer than it looks

when an ai client connects to an mcp url, the protocol assumes the client knows nothing. it walks a chain of standardized documents to find the authorization server and registers itself as a client. end to end:

1. POST https://mcp.example.com/mcp with no token returns 401 and a WWW-Authenticate header pointing at the protected-resource metadata.
2. client fetches /.well-known/oauth-protected-resource (rfc 9728) and reads authorization_servers.
3. client fetches /.well-known/oauth-authorization-server on the authorization server (as, rfc 8414) to discover endpoints.
4. client posts to the registration_endpoint (rfc 7591, dynamic client registration) with its name and redirect uris. the as replies with a client_id.
5. client builds an authorization url with pkce, redirects the user.
6. user logs in, sees a consent screen, approves.
7. as redirects back to the client with a one-time code.
8. client exchanges the code at the token_endpoint for an access token and refresh token.
9. client calls /mcp again with the bearer token. the resource server validates audience, scope, and claims, then serves the request.

visualised, with the actors in columns:

  ai client                 mcp server         authorization server
      │                          │                       │
      │  POST /mcp ────────────► │                       │
      │  ◄──── 401 + WWW-Authenticate                    │
      │                                                  │
      │  /.well-known/oauth-protected-resource ─►        │
      │  ◄── { authorization_servers }                   │
      │                                                  │
      │  /.well-known/oauth-authorization-server ───────►│
      │  ◄────────────── endpoints ──────────────────────│
      │                                                  │
      │  POST /register (rfc 7591) ─────────────────────►│
      │  ◄──────────── { client_id } ────────────────────│
      │                                                  │
      │  /authorize + pkce ─────────────────────────────►│
      │       [ user logs in, consents ]                 │
      │  ◄──────────── one-time code ────────────────────│
      │                                                  │
      │  POST /token ───────────────────────────────────►│
      │  ◄────── { access_token, refresh_token } ────────│
      │                                                  │
      │  POST /mcp (bearer) ───► │                       │
      │  ◄──── result            │                       │

every one of those steps is a hop. every hop has its own failure modes. mcp gives you the last step. the other eight you build yourself.
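
the first two hops are small enough to sketch. this is a minimal illustration, not indebtio’s code: the hostnames, the scope list, and the function names are assumptions. the resource_metadata challenge parameter and the resource / authorization_servers fields are the ones rfc 9728 defines.

```typescript
// hypothetical origins; substitute your own mcp server and authorization server.
const RESOURCE = "https://mcp.example.com";
const AUTHORIZATION_SERVER = "https://auth.example.com";
const METADATA_URL = `${RESOURCE}/.well-known/oauth-protected-resource`;

// subset of the rfc 9728 protected-resource metadata document.
interface ProtectedResourceMetadata {
  resource: string;
  authorization_servers: string[];
  scopes_supported: string[];
  bearer_methods_supported: string[];
}

// body served at /.well-known/oauth-protected-resource (step 2).
function protectedResourceMetadata(): ProtectedResourceMetadata {
  return {
    resource: RESOURCE,
    authorization_servers: [AUTHORIZATION_SERVER],
    scopes_supported: ["read"], // assumption: a single read scope
    bearer_methods_supported: ["header"],
  };
}

// value of the WWW-Authenticate header on the 401 for a token-less POST /mcp (step 1).
function unauthorizedChallenge(): string {
  return `Bearer resource_metadata="${METADATA_URL}"`;
}
```

a client that has never heard of the server can get from the 401 to the authorization server using only these two values, which is why a typo in either one fails the whole chain before any of your auth code runs.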

“read-only” is not a security boundary if the backend doesn’t enforce it

the consent screen and the mcp tool catalog both said “read-only access”. the mcp server only exposes list_cases, get_debtor, and similar get-shaped tools. so from the user’s perspective, the ai client cannot write.

that’s a ui promise, not a security boundary. the oauth access token was validated through the same JwtAuthGuard as a session token. a leaked or replayed token could call POST /v1/letters, DELETE /v1/cases/:id, or PATCH /v1/payment-plans/:id because nothing on the backend distinguished “this came from an oauth client” from “this came from the logged-in user”. in owasp api security top 10 terms this is api5, broken function level authorization.

the fix is small but load-bearing. tag tokens at signing time with typ: 'oauth_access' and scope: 'read'. surface those on req.user. then in the auth guard:

const SAFE_METHODS = new Set(['GET', 'HEAD', 'OPTIONS']);

if (user?.typ === 'oauth_access' && !SAFE_METHODS.has(req.method)) {
  throw new ForbiddenException({
    error: 'insufficient_scope',
    scope: 'write',
  });
}

twenty lines, but it’s the difference between a ui claim and an actual api boundary.
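
the signing side of that fix might look roughly like this. only the typ and scope claim values come from the text above; the interfaces, function names, the 15-minute lifetime, and the choice to treat a token without typ as a session are assumptions made for the sketch.

```typescript
// payload handed to the jwt signer for an oauth access token.
interface OAuthAccessClaims {
  sub: string;
  aud: string;
  typ: "oauth_access";
  scope: "read";
  iat: number;
  exp: number;
}

// what the auth guard sees on req.user after verification.
interface RequestUser {
  id: string;
  typ: string;
  scope: string;
}

// tag the token at signing time so the backend can tell it apart from a session.
function buildOAuthAccessClaims(
  userId: string,
  audience: string,
  nowSeconds: number,
  ttlSeconds: number = 900, // hypothetical lifetime
): OAuthAccessClaims {
  return {
    sub: userId,
    aud: audience,
    typ: "oauth_access",
    scope: "read",
    iat: nowSeconds,
    exp: nowSeconds + ttlSeconds,
  };
}

// surface the tags on req.user. assumption: session tokens carry no typ claim.
function toRequestUser(claims: { sub: string; typ?: string; scope?: string }): RequestUser {
  return { id: claims.sub, typ: claims.typ ?? "session", scope: claims.scope ?? "" };
}
```

the guard then only has to look at req.user.typ; it never needs to know which key or algorithm verified the token.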

dynamic client registration is open by spec, and that has consequences

rfc 7591 lets any caller register an oauth client with any name and any redirect uri. the server is supposed to allow this so mcp clients can self-onboard. fine — until you remember that the consent screen renders client.name as the protagonist.

threat model: an attacker registers a client with client_name: "Claude" and redirect_uris: ["https://evil.example/cb"]. they send a victim a crafted authorization url. the user lands on a legitimate indebtio consent screen that says “claude would like to read your data.” user clicks connect. the code goes to evil.example. game over.

the mitigation isn’t to close dcr. it’s to surface a signal the attacker can’t fake. client names are typed in a registration form. hosts on redirect uris are something you actually have to own a dns record for.

so the consent screen now shows the redirect host as a labeled, monospaced field next to the app name, with a short disclaimer:

indebtio does not verify third-party apps. only continue if you trust claude and recognize the host above. anyone can register an app name; the host is harder to fake.

not perfect. it does shift the verification work onto the field that’s most expensive for the attacker to forge.
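
deriving the displayed host is a one-liner, but it is worth doing with a real url parser rather than string slicing. a minimal sketch, with a hypothetical function name:

```typescript
// returns the host to show on the consent screen, or null if the uri does not parse.
function consentHost(redirectUri: string): string | null {
  try {
    // URL.host is the hostname plus any non-default port, with userinfo stripped.
    return new URL(redirectUri).host;
  } catch {
    return null;
  }
}
```

the parser matters for uris like https://claude.ai@evil.example/cb, where everything before the @ is userinfo and the host that actually receives the code is evil.example.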

refresh-token rotation needs replay detection, or it does nothing

the oauth 2.1 draft and the oauth security bcp (rfc 9700, “refresh token protection”) both describe the same rule: when a rotated refresh token is presented a second time, the server can’t tell which party is the thief, so it should revoke the entire token family rather than just the replayed token. my first version revoked only the replayed token and returned invalid_grant. the legitimate descendant kept refreshing forever. so an attacker who stole a refresh token, replayed it, lost the race, and saw invalid_grant had lost nothing: the theft went unnoticed, and whatever route leaked the token was still open for the next attempt. worse, if the attacker won the race, their descendant kept refreshing while the legitimate client was the one that got invalid_grant.

replay detection makes the leak self-healing. the schema change is small: add parentTokenId (self-relation) on the refresh-token table. when you rotate, link the new token to the old. when a revoked token is presented, walk forward through parentTokenId and revoke every descendant.
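
an in-memory sketch of that walk, assuming a hypothetical RefreshTokenStore in place of the real table. only parentTokenId comes from the schema described above; ids, method names, and the null-means-invalid_grant convention are illustrative.

```typescript
interface RefreshToken {
  id: string;
  parentTokenId: string | null; // self-relation: the token this one was rotated from
  revoked: boolean;
}

class RefreshTokenStore {
  private tokens = new Map<string, RefreshToken>();
  private nextId = 1;

  issue(parentTokenId: string | null = null): RefreshToken {
    const token: RefreshToken = { id: `rt_${this.nextId++}`, parentTokenId, revoked: false };
    this.tokens.set(token.id, token);
    return token;
  }

  // returns the rotated token, or null (invalid_grant) if the presented one is unusable.
  rotate(presentedId: string): RefreshToken | null {
    const presented = this.tokens.get(presentedId);
    if (!presented) return null;
    if (presented.revoked) {
      // replay of an already-rotated token: kill everything descended from it.
      this.revokeDescendants(presented.id);
      return null;
    }
    presented.revoked = true;
    return this.issue(presented.id);
  }

  isActive(id: string): boolean {
    const token = this.tokens.get(id);
    return token !== undefined && !token.revoked;
  }

  // walk forward through parentTokenId links, revoking every descendant.
  private revokeDescendants(rootId: string): void {
    const queue: string[] = [rootId];
    while (queue.length > 0) {
      const current = queue.shift() as string;
      this.tokens.forEach((token) => {
        if (token.parentTokenId === current) {
          token.revoked = true;
          queue.push(token.id);
        }
      });
    }
  }
}
```

in a real database the walk would be a recursive query or a familyId column rather than a scan, but the behaviour is the same: one replay anywhere in the chain ends the chain.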

the same move closes the equivalent gap for authorization codes. oauth 2.1 §4.1.3: an exchanged auth code that gets presented a second time should also revoke any tokens that were issued from its lineage. same pattern, different root.

rs256 + jwks, because shared hmac secrets are an own-goal

the mcp server is a different runtime from the backend. it validates oauth access tokens. originally both shared a single JWT_SECRET and verified hs256, which meant a compromise of the mcp container (the smaller, newer, less-hardened of the two) would have yielded the ability to mint backend session tokens with arbitrary sub and isAdmin: true. the mcp container has no business being able to do that.

asymmetric signing fixes the directionality. the backend signs oauth tokens with rs256 using a private key it never exports. it publishes the public half at /.well-known/jwks.json. the mcp server fetches that jwks lazily — jose.createRemoteJWKSet, which gives free key rotation via refresh-on-unknown-kid — and can only ever verify, never mint.
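
the directionality is easy to see with node’s built-in crypto. this sketch deliberately does not use jose or a jwks endpoint: it hand-rolls a compact rs256 token with an in-process key pair, and the function names are hypothetical. a real verifier would also check alg, exp, aud, and kid.

```typescript
import { createSign, createVerify, generateKeyPairSync } from "node:crypto";

// backend side: the private key never leaves this process.
const { privateKey, publicKey } = generateKeyPairSync("rsa", { modulusLength: 2048 });

// mcp side: all it ever receives is the public half (here as pem instead of jwks).
const publicPem = publicKey.export({ type: "spki", format: "pem" });

const b64url = (input: string): string => Buffer.from(input).toString("base64url");

// backend only: sign header.payload with the private key.
function signRs256(payload: Record<string, unknown>): string {
  const header = b64url(JSON.stringify({ alg: "RS256", typ: "JWT" }));
  const body = b64url(JSON.stringify(payload));
  const signature = createSign("RSA-SHA256")
    .update(`${header}.${body}`)
    .sign(privateKey, "base64url");
  return `${header}.${body}.${signature}`;
}

// mcp side: signature check only; there is nothing here that could mint a token.
function verifyRs256(token: string): boolean {
  const parts = token.split(".");
  if (parts.length !== 3) return false;
  const [header, body, signature] = parts;
  return createVerify("RSA-SHA256")
    .update(`${header}.${body}`)
    .verify(publicPem, signature, "base64url");
}
```

with hs256 the verify function would need the same secret the sign function uses, so any process that can check a token can also forge one. here a compromised verifier leaks only a key that was public anyway.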

session jwts still use hs256 with the original secret because there’s only one process minting and verifying them. the single passport-jwt strategy now dispatches on the jwt header alg:

secretOrKeyProvider: (_req, raw, done) => {
  // the header is unverified at this point; use it only to pick a key.
  let alg: unknown;
  try {
    alg = JSON.parse(Buffer.from(raw.split('.')[0], 'base64url').toString()).alg;
  } catch {
    return done(new Error('Malformed JWT header'));
  }
  if (alg === 'RS256') return done(null, oauthPublicPem);
  if (alg === 'HS256') return done(null, sessionSecret);
  return done(new Error(`Unsupported alg ${String(alg)}`));
},

the kid in the jwt header (rfc 7638 thumbprint of the public jwk) lets mcp re-fetch a fresh jwks when keys rotate, with no extra plumbing. (the agentcore identity lesson walks the same key-separation pattern from the agent side, where an agent runtime validates inbound oauth tokens and brokers outbound ones — different actors, same directionality argument.)

the smaller fixes that earned their keep

a pile of smaller items, in no particular order:

restrict redirect_uri to https, with an http://localhost carve-out for developer mcp clients. the default IsUrl({ require_protocol: true }) happily accepts plain http, and auth codes sent to a plain-http redirect leak in cleartext.
whitelist the scope parameter to a known value (@IsIn(['read'])). until a write scope actually ships, accept nothing else — empty, missing, and unknown all collapse to a 400, not a silent grant of “everything.”
two-tier rate limiting on the mcp endpoint: an ip-keyed limiter on every request (drops abuse before auth), and a tighter per-userid limiter behind authentication (drops a single account’s runaway agent without affecting other users). the second tier matters more than i expected — agents loop.
sanitize error logging. console.error('MCP request failed', err) looks fine until you notice that some sdk wrappers attach the original request object to thrown errors — which means the Authorization: Bearer <jwt> header ends up in logs. log err.name and err.message, never the raw object.
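
two of these are small enough to sketch as plain functions. both names are hypothetical, and the redirect check is written against the url parser directly rather than class-validator so it stands alone.

```typescript
// https anywhere, plain http only for exactly "localhost" (any port).
function isAllowedRedirectUri(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false; // not a parseable absolute url
  }
  if (url.protocol === "https:") return true;
  // compare the parsed hostname, so http://localhost.evil.example does not slip through.
  return url.protocol === "http:" && url.hostname === "localhost";
}

// reduce a thrown value to fields that are safe to log.
function safeErrorFields(err: unknown): { name: string; message: string } {
  if (err instanceof Error) return { name: err.name, message: err.message };
  return { name: "NonError", message: String(err) };
}
```

usage would be along the lines of console.error('MCP request failed', safeErrorFields(err)), which drops any request object or headers an sdk has attached to the error.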

what i’d watch out for next time

two things.

threat-model the design, not the implementation. i ran the security review before opening the surface to anyone, which is the bar i meant to meet. but the structural mistake — scope not enforced at the api layer — is the kind of thing a threat model catches at the diagram stage, not in a code review against an existing implementation. the cost of moving it earlier is half an hour. the cost of finding it later is rewriting the auth guard.

open-by-spec is not the same as safe-by-default. rfc 7591 dynamic client registration, rfc 8414 metadata, rfc 9728 protected-resource metadata: all intentionally open. the protections that prevent abuse live in how you display the resulting state to the user, not in closing the endpoints. the redirect-host display on the consent screen is the single highest-impact ui change i made on this whole project.

mcp the protocol is small. the surface around it isn’t. if you’re adding it to something that holds real data, the protocol spec is the smaller half of the reading list. the oauth 2.1 draft, the oauth security bcp (rfc 9700), and the owasp api security top 10 are the other half.

this is a /craft/ build-note. the agent-side counterpart of most of it lives under /ai/.