Model versioning for LLMs — best practices
Treat models like any other production dependency: pin, version, and lock. Pin to dated snapshots, not aliases: gpt-4o-2024-08-06, not gpt-4o; claude-3-5-sonnet-20241022, not claude-3-5-sonnet-latest. The alias *will* re-point. Lock the snapshot at deploy time, surface the snapshot string in app config, and log it on every request. Maintain a model registry that records, for each model, the snapshot, provider, intended use, eval scorecard at deployment, and known incompatibilities. Plan for deprecation: notice periods vary by provider and by model (Anthropic has given roughly 6 months' notice on some deprecations, OpenAI varies), so you need a fallback that has already been evaluated on your suite before the notice arrives.
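As a rough illustration of the pin, registry, and logging steps, here is a minimal Python sketch. Everything in it is hypothetical rather than any provider's API: the `RegistryEntry` class, the `is_pinned` heuristic (which assumes a snapshot ID ends in a date stamp), the `fallback_snapshot` field, and the `log_request` helper are names made up for this example, and the scorecard value is a placeholder.

```python
import logging
import re
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

logger = logging.getLogger("llm_requests")

# Assumption: a pinned snapshot ID ends in a date stamp, either
# YYYY-MM-DD (as in gpt-4o-2024-08-06) or YYYYMMDD (as in
# claude-3-5-sonnet-20241022). Aliases such as gpt-4o do not.
_DATED_SUFFIX = re.compile(r"-(\d{4}-\d{2}-\d{2}|\d{8})$")


def is_pinned(model_id: str) -> bool:
    """Return True if the ID looks like a dated snapshot, not an alias."""
    return bool(_DATED_SUFFIX.search(model_id))


@dataclass(frozen=True)
class RegistryEntry:
    """One registry row, with the fields listed in the text above."""
    snapshot: str
    provider: str
    intended_use: str
    eval_scorecard: Dict[str, float]  # scores recorded at deployment
    known_incompatibilities: Tuple[str, ...] = ()
    fallback_snapshot: Optional[str] = None  # hypothetical extra field

    def __post_init__(self) -> None:
        # Refuse aliases for both the primary model and the fallback.
        for model_id in (self.snapshot, self.fallback_snapshot):
            if model_id is not None and not is_pinned(model_id):
                raise ValueError(
                    f"{model_id!r} looks like an alias; pin a dated snapshot"
                )


def log_request(entry: RegistryEntry, request_id: str) -> Dict[str, str]:
    """Log the exact snapshot string alongside each request."""
    fields = {
        "request_id": request_id,
        "provider": entry.provider,
        "model": entry.snapshot,
    }
    logger.info("llm_request %s", fields)
    return fields


# Example entry; the scorecard number is a placeholder, not a real result.
entry = RegistryEntry(
    snapshot="gpt-4o-2024-08-06",
    provider="openai",
    intended_use="support-ticket summarization",
    eval_scorecard={"suite_pass_rate": 0.0},
    fallback_snapshot="claude-3-5-sonnet-20241022",
)
```

Validating in `__post_init__` means an alias fails at config load rather than silently re-pointing in production. The date-suffix check is only a heuristic; an explicit allow-list of snapshot strings would be stricter.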
The piece teams skip is post-pin monitoring: even a pinned snapshot can shift when the serving stack behind it changes. Catching that takes a continuous regression check, not a one-shot eval at deploy. ModelWatch handles the post-pin layer: daily eval on the exact snapshot string your app uses, scorecard versus the alias, alerts on any divergence.
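For a sense of what a continuous regression check involves, here is a generic Python sketch. It is not a description of how ModelWatch works. It assumes you already have an eval harness, passed in as the hypothetical `run_eval` callable, that returns a scorecard (metric name to score) for a given model string. The function names and the 0.02 tolerance are illustrative choices.

```python
from typing import Callable, Dict

Scorecard = Dict[str, float]


def scorecard_divergence(
    reference: Scorecard, candidate: Scorecard, tolerance: float = 0.02
) -> Scorecard:
    """Return signed deltas (candidate - reference) larger than tolerance."""
    missing = reference.keys() - candidate.keys()
    if missing:
        raise ValueError(f"metrics missing from candidate: {sorted(missing)}")
    deltas: Scorecard = {}
    for metric, ref_score in reference.items():
        delta = candidate[metric] - ref_score
        if abs(delta) > tolerance:
            deltas[metric] = delta
    return deltas


def daily_check(
    snapshot: str,
    alias: str,
    baseline: Scorecard,
    run_eval: Callable[[str], Scorecard],  # hypothetical eval harness
    tolerance: float = 0.02,
) -> Dict[str, Scorecard]:
    """Re-run the suite on the pinned snapshot and on the alias.

    Compares today's snapshot scores with the deploy-time baseline, and
    the alias with the snapshot. Empty dicts mean nothing to alert on.
    """
    today = run_eval(snapshot)
    alias_today = run_eval(alias)
    return {
        "snapshot_vs_baseline": scorecard_divergence(baseline, today, tolerance),
        "alias_vs_snapshot": scorecard_divergence(today, alias_today, tolerance),
    }
```

Run on a schedule, a non-empty `snapshot_vs_baseline` result suggests the pinned model's behavior has moved since deployment, while a non-empty `alias_vs_snapshot` result suggests the alias no longer matches what you pinned. The tolerance should reflect the run-to-run noise of your own suite.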