ModelWatch

When to pin a model version vs use the latest pointer

Default to pinned snapshots in production; use *-latest aliases only in clearly-scoped non-production contexts. The reasoning:

Pin (gpt-4o-2024-08-06, claude-3-5-sonnet-20241022) when: (a) the model serves production user traffic at meaningful scale or stakes; (b) you have downstream contracts on output format (JSON schema, exact tool-use shape, refusal calibration) that a snapshot shift could break; (c) you have an eval suite and rollout process — pinning is what lets you evaluate before adopting, which is the whole point; (d) you're running anything where reproducibility matters (eval research, A/B tests, audit logs). The cost of pinning is small: you're typically 1–2 snapshots behind frontier capability at any moment, in exchange for behavioral stability.

Use latest (gpt-4o, claude-3-5-sonnet-latest) when: (a) you're prototyping or exploring; (b) the use case explicitly wants whatever is newest, with no production guarantees (e.g., a "try the latest model" demo); (c) you're running cost-sensitive batch workloads where any frontier-class model is acceptable and you want provider-managed routing.
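One way to enforce the pin-in-production rule is a small check in CI that flags any configured model ID without a date stamp. The sketch below is a heuristic, not a provider API: it assumes pinned snapshots end in a date written as YYYY-MM-DD or YYYYMMDD, which covers the example IDs above but may not cover every provider's naming. The function names and the config shape (a mapping from a use-case name to a model ID) are hypothetical.

```python
import re

# Heuristic (an assumption, not a provider guarantee): a trailing date stamp
# marks a pinned snapshot. Covers the two ID styles used in the examples
# above: "-YYYY-MM-DD" and "-YYYYMMDD".
_SNAPSHOT_SUFFIX = re.compile(r"-(\d{4}-\d{2}-\d{2}|\d{8})$")


def is_pinned(model_id: str) -> bool:
    """Return True if the model ID ends in a date stamp."""
    return _SNAPSHOT_SUFFIX.search(model_id) is not None


def unpinned_models(config: dict[str, str]) -> list[str]:
    """Return the config keys whose model ID looks like a floating alias."""
    return sorted(name for name, model_id in config.items() if not is_pinned(model_id))
```

A production deploy step could fail when unpinned_models(...) returns a non-empty list, while prototype and demo configs skip the check.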

The hybrid pattern that works for most production deployments: pin in code, but track latest in monitoring. Your production app calls gpt-4o-2024-08-06. Your daily eval suite calls both gpt-4o-2024-08-06 (the pinned snapshot) and gpt-4o (the alias). When the alias diverges from your pinned snapshot, which happens when the provider repoints it to a newer dated snapshot, you'll see the delta on your scorecard and can decide whether to upgrade. This is ModelWatch's default configuration on every tracked model.
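The pinned-versus-alias comparison can be sketched as follows. This is a minimal illustration under stated assumptions, not ModelWatch's actual implementation: call_model is a hypothetical callable (model ID and prompt in, completion text out) that you would back with your provider's client, and the scoring is plain exact match, where a real suite would likely use richer graders.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical signature: (model_id, prompt) -> completion text.
# Wire this to your provider's client; it is not a real library API.
CallModel = Callable[[str, str], str]


@dataclass
class EvalCase:
    prompt: str
    expected: str


def pass_rate(call_model: CallModel, model_id: str, cases: list[EvalCase]) -> float:
    """Fraction of cases whose stripped output exactly matches the expected text."""
    if not cases:
        raise ValueError("eval suite is empty")
    passed = sum(call_model(model_id, c.prompt).strip() == c.expected for c in cases)
    return passed / len(cases)


def alias_delta(
    call_model: CallModel, pinned: str, alias: str, cases: list[EvalCase]
) -> dict[str, float]:
    """Score the same cases on the pinned snapshot and the alias, and report the gap."""
    pinned_score = pass_rate(call_model, pinned, cases)
    alias_score = pass_rate(call_model, alias, cases)
    return {"pinned": pinned_score, "alias": alias_score, "delta": alias_score - pinned_score}
```

Run daily, a delta near zero suggests the alias still resolves to the pinned snapshot (or to one that behaves the same on your suite). A sustained nonzero delta is the signal to review the new snapshot before changing the pin.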