Weight: 10% of overall score · How the overall score is calculated
Definition
Task Breadth measures how wide a range of task types you delegate to Claude. An engineer who uses Claude exclusively for one type of work — say, debugging — is not a low-maturity user, but they are leaving significant value on the table. Full-stack AI collaboration requires trusting Claude across multiple domains, including ones outside your primary expertise.
This dimension rewards coverage, not volume.
How it's measured
Each session is classified into one of nine task types:
debugging, feature_implementation, refactoring, code_understanding, design_planning, data_science, front_end, papercut_fix, other
The other category is excluded from the breadth count. The score is based on the number of distinct task types present across all your sessions (excluding other), scaled to 1–10:
score = min(10, (distinct_types / 8) × 10)
| Distinct types seen | Score |
|---|---|
| 8 | 10 |
| 6–7 | 7.5–8.75 |
| 4–5 | 5–6.25 |
| 2–3 | 2.5–3.75 |
| 1 | 1.25 |
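The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not the actual scoring implementation: the function name `task_breadth_score` and the representation of sessions as a list of task-type labels are assumptions, as is the choice to ignore unrecognised labels. Only the task-type names and the formula come from the description above.

```python
# Hypothetical sketch of the Task Breadth formula described above.
# The eight counted types; "other" is deliberately left out.
COUNTED_TYPES = frozenset({
    "debugging", "feature_implementation", "refactoring", "code_understanding",
    "design_planning", "data_science", "front_end", "papercut_fix",
})


def task_breadth_score(session_types):
    """Return the breadth score for an iterable of per-session task-type labels."""
    # Coverage, not volume: a repeated type counts once, and "other"
    # (or any unrecognised label, by assumption) is ignored.
    distinct = set(session_types) & COUNTED_TYPES
    return min(10.0, len(distinct) / len(COUNTED_TYPES) * 10)


sessions = ["debugging", "debugging", "front_end", "other", "data_science"]
print(task_breadth_score(sessions))  # 3 distinct counted types -> 3.75
```

Five sessions produce a score of 3.75 here because only three distinct counted types appear. Running more debugging sessions would not change the result.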
What high vs low looks like
High (score 7–10)
- Using Claude for debugging one day, front-end implementation the next, data analysis the next
- Reaching outside your primary domain — a backend engineer using Claude to handle React work; a researcher using it for data science pipelines
- Sessions span at least 6 of the 8 counted task types (5 types scores 6.25, just below this band)
Low (score 1–3)
- All sessions are the same type — only debugging, or only code understanding
- Never delegating tasks outside your area of primary expertise
- Using Claude as a narrow productivity tool rather than a domain expander
Behavioural patterns in real sessions
Anthropic's work study found a consistent pattern across teams: engineers who used Claude most broadly reported the largest qualitative changes in how they worked.
The study noted that engineers were becoming "more full-stack" — a front-end engineer taking on data science tasks, a researcher handling deployment scripts — specifically because Claude reduced the cost of working in unfamiliar domains.
Team-specific data illustrates how task type distribution varies by role:
| Team | Notable task type | Share |
|---|---|---|
| Pre-training | Feature building | 54.6% |
| Security | Code understanding | 48.9% |
| Alignment & Safety | Front-end development | 7.5% |
| Non-technical staff | Debugging | 51.5% |
High breadth does not mean a uniform distribution; it means deliberately extending beyond your default. The non-technical staff cohort's 51.5% share of debugging sessions is a legitimate usage pattern. A low breadth score would instead indicate that no other task types were attempted.
The most commonly used task types across the full cohort were debugging (55% daily use), code understanding (42%), and new features (37%). Engineers who added data science, front-end, and design planning to this base showed the highest breadth scores.
How it affects your overall score
Task Breadth carries 10% of your total score.
A one-point improvement in this dimension adds 0.10 points to your overall score.
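As a worked example of this arithmetic, the sketch below computes how much this dimension contributes to the overall score. It assumes the overall score is a simple weighted sum in which the Task Breadth score is multiplied by its 10% weight. The names `WEIGHT` and `overall_contribution` are illustrative and not part of any real API.

```python
WEIGHT = 0.10  # Task Breadth's share of the overall score (from the text above)


def overall_contribution(dimension_score):
    """Points this dimension adds to the overall score, assuming a weighted sum."""
    return dimension_score * WEIGHT


# Adding one new task type raises the dimension score by 10 / 8 = 1.25 points,
# e.g. going from 4 to 5 distinct types moves it from 5.0 to 6.25.
gain = overall_contribution(6.25) - overall_contribution(5.0)
print(round(gain, 3))  # 0.125
```

Under that assumption, each newly covered task type is worth 0.125 points on the overall score.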
This is the most straightforward dimension to improve deliberately: trying Claude on one new task type in your next session raises this score by 1.25 points. It is also the fastest to plateau: once you reach 6–7 distinct types, only one or two more can be added before the score caps at 10.
It interacts with Complexity Progression (broader task types expose you to higher-complexity work) and New Work Generation (unfamiliar domain tasks are more likely to generate net-new work).