pass@1 is a gamble — how ensemble coding enhances AI reliability

Source: DEV Community
You ask Claude to fix a bug. It nails it. You ask it again with slightly different phrasing. It refactors half your module and breaks three unrelated tests. Same model, same task, different result.

This is the fundamental problem with AI coding today: pass@1 — the chance a single attempt succeeds — is a gamble. Running the same task multiple times and picking the best result dramatically improves reliability. It's the same principle behind ensemble methods in ML, and recent research confirms it works for code generation too, though it warns that naive consensus can amplify shared mistakes. Selection method matters as much as ensemble size.

We built thinktank to make this practical. Because thinktank currently uses a single model (Claude), test execution is the primary quality signal, not consensus.

One command:

```shell
thinktank run "fix the authentication bypass" -n 5 -t "npm test"
```

Under the hood:

- N isolated git clones — each agent gets a fully independent copy of your repo
- N parallel Claude Code