Model versioning for LLMs — best practices

Question

Accepted Answer

Treat models like any other production dependency: pin, version, and lock.

- **Pin to dated snapshots, not aliases.** Use `gpt-4o-2024-08-06`, not `gpt-4o`; `claude-3-5-sonnet-20241022`, not `claude-3-5-sonnet-latest`. The alias *will* re-point.
- **Lock at deploy time.** Surface the snapshot string in app config and log it on every request.
- **Maintain a model registry** with snapshot, provider, intended use, eval scorecard at deployment, and known incompatibilities.
- **Plan for deprecation.** Notice periods vary by provider and by model. Anthropic has given about six months on some models, though its published policy only commits to at least 60 days before retirement; OpenAI varies. Check each provider's deprecations page, and keep a tested fallback that has already been evaluated on your suite.

The piece teams skip is **post-pin monitoring**. The serving stack behind a pinned snapshot can still change, which can shift outputs even though the model ID stays the same. Catching that takes a continuous regression check, not a one-shot eval at deploy. ModelWatch handles the post-pin layer: a daily eval on the exact snapshot string your app uses, a scorecard versus the alias, and alerts on any divergence.
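As a rough illustration of the pinning, registry, and regression-check ideas above, here is a minimal Python sketch. Everything in it is hypothetical: `RegistryEntry`, `is_dated_snapshot`, `log_request`, and `find_regressions` are made-up names, the date-suffix pattern is inferred only from the two snapshot IDs mentioned above, and the 0.02 tolerance is an arbitrary placeholder. It calls no provider API; you would supply the scores from your own eval suite.

```python
import logging
import re
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

logger = logging.getLogger("model_registry")

# Assumption: a dated snapshot ends in YYYY-MM-DD or YYYYMMDD, matching the
# two example IDs in the answer. Other providers may use different schemes.
_DATED_SUFFIX = re.compile(r"-(\d{4}-\d{2}-\d{2}|\d{8})$")


def is_dated_snapshot(model_id: str) -> bool:
    """Return True if the ID ends in a date stamp rather than a floating alias."""
    return bool(_DATED_SUFFIX.search(model_id))


@dataclass(frozen=True)
class RegistryEntry:
    """One row of a hypothetical model registry."""
    snapshot: str
    provider: str
    intended_use: str
    baseline_scores: Dict[str, float]  # eval scorecard recorded at deployment
    fallback: Optional[str] = None     # pre-evaluated replacement snapshot
    known_incompatibilities: Tuple[str, ...] = ()

    def __post_init__(self) -> None:
        # Refuse to register anything that looks like an alias.
        for model_id in (self.snapshot, self.fallback):
            if model_id is not None and not is_dated_snapshot(model_id):
                raise ValueError(
                    f"{model_id!r} looks like an alias; pin a dated snapshot"
                )


def log_request(entry: RegistryEntry, request_id: str) -> None:
    """Record the exact snapshot string alongside each request."""
    logger.info(
        "request=%s model=%s provider=%s",
        request_id, entry.snapshot, entry.provider,
    )


def find_regressions(
    baseline: Dict[str, float],
    current: Dict[str, float],
    tolerance: float = 0.02,  # placeholder threshold; tune per metric
) -> Dict[str, Tuple[float, Optional[float]]]:
    """Return metrics that dropped more than `tolerance` below the baseline.

    A metric missing from `current` is reported too, with None as its score.
    """
    regressions: Dict[str, Tuple[float, Optional[float]]] = {}
    for metric, base in baseline.items():
        now = current.get(metric)
        if now is None or base - now > tolerance:
            regressions[metric] = (base, now)
    return regressions
```

A scheduled job could rerun the eval suite against `entry.snapshot`, pass the fresh scores to `find_regressions(entry.baseline_scores, fresh_scores)`, and alert whenever the result is non-empty.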