Do OpenAI and Anthropic silently change their models?

Question

Accepted Answer

Yes — both providers explicitly document that aliases like `gpt-4o` or `claude-3-5-sonnet-latest` point to whatever the current best snapshot is, and the underlying snapshot can be replaced. Even pinned versions (`gpt-4-0613`, `claude-3-5-sonnet-20241022`) can have serving-side changes: updated safety classifiers, updated system-prompt defaults, infrastructure-level quantization, or speculative-decoding tweaks. Anthropic publishes a model deprecation policy and changelog; OpenAI publishes model release notes but historically has been less explicit about silent serving changes. Independent evidence of silent change includes the Chen et al. GPT-4 study (arXiv:2307.09009), recurring r/OpenAI and r/ClaudeAI threads with side-by-side comparisons, and Artificial Analysis benchmark fluctuations on the same model name across weeks. The defensible response is to run your own dated baseline. ModelWatch automates that — daily evals against fixed snapshots and aliases side-by-side, so you can see when `*-latest` diverges from the pinned snapshot.