
Tracking LLM Costs in FastAPI Applications

Instrument a FastAPI service to track LLM spend per endpoint, user, and feature, with async fire-and-forget tracking that never blocks responses.

Install

pip
pip install "costlynx[openai]" fastapi uvicorn

Per-endpoint tracking

The recommended pattern is to call atrack_openai_response() (or atrack_anthropic_response()) after each LLM call. It is fire-and-forget: the call returns without waiting on the network, so the API response to your user is never delayed.

FastAPI endpoint
import os
from fastapi import FastAPI, Request
from openai import AsyncOpenAI
from costlynx import CostLynx

clx = CostLynx(
    ingestion_key=os.environ["COSTLYNX_INGESTION_KEY"],
    default_project="api-service",
    default_environment=os.getenv("ENV", "prod"),
)
openai = AsyncOpenAI()
app = FastAPI()

@app.post("/chat")
async def chat(request: Request):
    body = await request.json()
    response = await openai.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": body["message"]}],
    )
    # Fire-and-forget: does not block the response
    await clx.atrack_openai_response(
        response,
        feature="chat",
        user_identifier=request.headers.get("X-User-Id"),
    )
    return {"reply": response.choices[0].message.content}

Tip

Pass X-User-Id from your authentication layer to user_identifier to get per-user cost breakdowns in the dashboard.

Auto-track all calls with lifespan middleware

For services that make many LLM calls across multiple routes, patch the OpenAI client at startup to automatically track every response.

Lifespan middleware
from contextlib import asynccontextmanager
from typing import AsyncIterator

@asynccontextmanager
async def lifespan(app: FastAPI) -> AsyncIterator[None]:
    original_create = openai.chat.completions.create

    async def _tracked_create(*args, **kwargs):
        response = await original_create(*args, **kwargs)
        await clx.atrack_openai_response(
            response,
            feature=kwargs.get("extra_headers", {}).get("X-Feature"),
        )
        return response

    openai.chat.completions.create = _tracked_create
    yield
    # Restore the unpatched client on shutdown
    openai.chat.completions.create = original_create

app = FastAPI(lifespan=lifespan)
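The patching pattern itself is easy to sanity-check without real OpenAI calls. This sketch wraps a stand-in coroutine the same way the lifespan hook wraps create(); every name here (fake_create, tracked) is illustrative and not part of CostLynx:

```python
import asyncio

tracked = []  # stands in for clx.atrack_openai_response

async def fake_create(**kwargs):
    # Stand-in for openai.chat.completions.create
    return {"model": kwargs.get("model"), "content": "hello"}

original_create = fake_create

async def _tracked_create(*args, **kwargs):
    response = await original_create(*args, **kwargs)
    # Record the response plus the feature label, then pass it through unchanged
    tracked.append((response, kwargs.get("extra_headers", {}).get("X-Feature")))
    return response

response = asyncio.run(
    _tracked_create(model="gpt-4o", extra_headers={"X-Feature": "chat"})
)
```

Because the wrapper returns the original response object unchanged, calling code cannot tell it has been patched.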

Environment configuration

Variable | Description
COSTLYNX_INGESTION_KEY | Your ingestion key from Settings → Configure
ENV | prod, staging, or dev; passed as default_environment
DEBUG_COSTLYNX | Set to 1 to print tracking errors to stderr in development
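Put together, a local development shell might look like the following, assuming the app lives in main.py; the key value is a placeholder:

```shell
export COSTLYNX_INGESTION_KEY="your-key-here"   # placeholder, from Settings → Configure
export ENV=dev                                  # becomes default_environment
export DEBUG_COSTLYNX=1                         # surface tracking errors on stderr
uvicorn main:app --reload
```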