When to pin a model version vs use the latest pointer

Question

Accepted Answer

Default to pinned snapshots in production; use `*-latest` aliases only in clearly-scoped non-production contexts. The reasoning: **Pin (`gpt-4o-2024-08-06`, `claude-3-5-sonnet-20241022`) when:** (a) the model serves production user traffic at meaningful scale or stakes; (b) you have downstream contracts on output format (JSON schema, exact tool-use shape, refusal calibration) that a snapshot shift could break; (c) you have an eval suite and rollout process — pinning is what lets you evaluate before adopting, which is the whole point; (d) you're running anything where reproducibility matters (eval research, A/B tests, audit logs). The cost of pinning is small: you're typically 1–2 snapshots behind frontier capability at any moment, in exchange for behavioral stability. **Use latest (`gpt-4o`, `claude-3-5-sonnet-latest`) when:** (a) you're prototyping or exploring; (b) the use case explicitly wants whatever-is-newest with no production guarantees (e.g., a "try the latest model" demo); (c) cost-sensitive batch workloads where any frontier-class model is acceptable and you want provider-managed routing. The hybrid pattern that works for most production: **pin in code, but track latest in monitoring.** Your production app calls `gpt-4o-2024-08-06`. Your daily eval suite calls both `gpt-4o-2024-08-06` (the pinned one) and `gpt-4o` (the alias). When the alias diverges from your pinned snapshot — which happens whenever a new dated snapshot ships — you'll see the delta on your scorecard and can decide whether to upgrade. This is ModelWatch's default configuration on every tracked model.