Claude 3.5 Sonnet model degradation — is it real?
Reports of Claude 3.5 Sonnet getting "worse" surface periodically on r/ClaudeAI, r/Anthropic, and X. Three things are true at once. First, Anthropic *has* shipped explicit snapshot upgrades (claude-3-5-sonnet-20240620 to claude-3-5-sonnet-20241022) — those are documented version bumps, not silent drift. Second, the claude-3-5-sonnet-latest alias re-points across snapshots, and behavior on long-context, tool-use, and refusal calibration genuinely changes between them. Third, perceived degradation is often distribution drift on the user side (different prompts, new task surfaces, vibes) rather than real regression.
The defensible answer is dated eval data. Aider's public leaderboard, Artificial Analysis, and lmsys Chatbot Arena all show Sonnet snapshot deltas over time on coding and reasoning. ModelWatch runs a fixed daily suite against claude-3-5-sonnet-latest, both dated snapshots, and Haiku/Opus side-by-side, and publishes the diff free at modelwatch.app.