Baseline missions
Run the baseline
01 Which runner do you use most often?
Claude Code
Codex
Gemini CLI
Browser only
Other
02 What do you use your agent for most?
Code edits
Refactors
Research
Automation
Mixed
03 Where does it fail most often?
□ Loses context
□ Weak plans
□ Weak tool use
□ Weak review
□ Unreliable output
□ Not sure yet
Your answers shape the first mission.
Generate result