Private model access, behind one endpoint.

Use an issued bearer token with OpenAI-compatible routes. Requests are routed by model name to the right inference service.

Authorization: Bearer <token>