ModelWatch

Do OpenAI and Anthropic silently change their models?

Yes — both providers explicitly document that aliases like gpt-4o or claude-3-5-sonnet-latest point to whatever the current best snapshot is, and the underlying snapshot can be replaced. Even pinned versions (gpt-4-0613, claude-3-5-sonnet-20241022) can have serving-side changes: updated safety classifiers, updated system-prompt defaults, infrastructure-level quantization, or speculative-decoding tweaks. Anthropic publishes a model deprecation policy and changelog; OpenAI publishes model release notes but historically has been less explicit about silent serving changes.

Independent evidence of silent change includes the Chen et al. GPT-4 study (arXiv:2307.09009), recurring r/OpenAI and r/ClaudeAI threads with side-by-side comparisons, and Artificial Analysis benchmark fluctuations on the same model name across weeks. The defensible response is to run your own dated baseline. ModelWatch automates that — daily evals against fixed snapshots and aliases side-by-side, so you can see when *-latest diverges from the pinned snapshot.