Claude 3.5 Sonnet model degradation — is it real?

Question

Accepted Answer

Reports of Claude 3.5 Sonnet getting "worse" surface periodically on r/ClaudeAI, r/Anthropic, and X. Three things are true at once. First, Anthropic *has* shipped explicit snapshot upgrades (`claude-3-5-sonnet-20240620` to `claude-3-5-sonnet-20241022`); those are documented version bumps, not silent drift. Second, the `claude-3-5-sonnet-latest` alias is re-pointed to the newest snapshot, so anything calling the alias switches models without a code change, and behavior on long-context tasks, tool use, and refusal calibration genuinely changes between snapshots. Third, perceived degradation is often distribution drift on the user side (different prompts, new kinds of tasks, shifting expectations) rather than a real regression.

The defensible answer is dated eval data. Aider's public leaderboard, Artificial Analysis, and LMSYS Chatbot Arena all show deltas between Sonnet snapshots over time on coding and reasoning. ModelWatch runs a fixed daily suite against `claude-3-5-sonnet-latest`, both dated snapshots, and Haiku and Opus side by side, and publishes the diffs for free at modelwatch.app.
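If you want your own dated eval data rather than someone else's, the idea of a fixed suite run against pinned snapshots can be sketched roughly as below. This is a minimal illustration, not any leaderboard's or ModelWatch's actual method: `SUITE`, `pass_rate`, `snapshot_delta`, and `fake_ask` are hypothetical names, the substring check is a deliberately crude grader, and `fake_ask` is an offline stub with made-up replies so the block runs without an API key. In real use you would pass a function that calls the API with a dated snapshot ID instead of the `-latest` alias.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical frozen suite: (prompt, substring expected in the reply).
# The point is that it never changes between runs, so only the model varies.
SUITE: List[Tuple[str, str]] = [
    ("What is 17 * 3? Reply with the number only.", "51"),
    ("Reverse the string 'abc'. Reply with the result only.", "cba"),
]

# An "ask" function takes (model_id, prompt) and returns the reply text.
Ask = Callable[[str, str], str]


def pass_rate(ask: Ask, model: str, suite: List[Tuple[str, str]]) -> float:
    """Fraction of suite items whose reply contains the expected substring."""
    passed = sum(1 for prompt, expected in suite if expected in ask(model, prompt))
    return passed / len(suite)


def snapshot_delta(ask: Ask, old: str, new: str,
                   suite: List[Tuple[str, str]] = SUITE) -> Dict[str, float]:
    """Run the same suite against two pinned model IDs and report the difference."""
    old_rate = pass_rate(ask, old, suite)
    new_rate = pass_rate(ask, new, suite)
    return {"old": old_rate, "new": new_rate, "delta": new_rate - old_rate}


def fake_ask(model: str, prompt: str) -> str:
    """Offline stand-in for a real API call. Replies are invented for the demo
    and say nothing about how any real model behaves."""
    if "17 * 3" in prompt:
        return "51"
    # Only the stub's "new" model gets the second item right.
    return "cba" if model == "stub-new" else "abc"


if __name__ == "__main__":
    # With a real client, pass dated IDs such as "claude-3-5-sonnet-20240620"
    # and "claude-3-5-sonnet-20241022" here, and log the date of each run.
    print(snapshot_delta(fake_ask, "stub-old", "stub-new"))
```

Storing each run's output with a timestamp gives you a record you can point to when the "it got worse" feeling shows up: either the numbers moved between snapshots or they did not.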