10%of overall score

New Work Generation

Measures what proportion of your sessions involve tasks that wouldn't have been done without AI — genuine economic surplus, not just faster delivery of existing work.

Weight: 10% of overall score · How the overall score is calculated


Definition

New Work Generation measures the proportion of your Claude Code sessions that involve tasks that would not have been done at all without AI assistance. This is the difference between AI as a productivity tool (doing existing work faster) and AI as a capability multiplier (enabling work that previously wasn't feasible).

A high score here indicates that Claude is expanding your output, not just accelerating it.


How it's measured

Each session is classified by an LLM reviewer with a single boolean flag:

is_new_worktrue if the task described in the session would not realistically have been completed without AI assistance. This includes:

  • Tasks the engineer lacked the domain knowledge to do alone
  • Tasks too small or low-priority to justify the time without AI
  • Exploratory work that would have been deprioritized
  • Throwaway tooling or one-off scripts that have value but aren't worth manual effort

The proportion of sessions flagged as new work is mapped to a 1–10 score:

New work proportionScore
≥ 40%10
≥ 27%7–10 (linear interpolation)
> 0%1–7 (linear interpolation)
0%1

What high vs low looks like

High (score 7–10)

  • Building a quick internal tool that would have been deprioritized indefinitely without Claude
  • Exploring an unfamiliar codebase or library that would have taken days to understand alone
  • Writing one-off data transformation scripts that previously went undone
  • Taking on front-end or infrastructure work outside primary expertise

Low (score 1–3)

  • Every session involves work that was already planned and would have been done manually
  • Claude is used to do existing tasks faster, but not to do tasks that otherwise wouldn't happen
  • No exploratory, throwaway, or unfamiliar-domain sessions

Behavioural patterns in real sessions

Anthropic's work study measured this directly. Across its internal engineering cohort, 27% of Claude-assisted work consisted of tasks that would not otherwise have been completed — work that existed only because Claude made it feasible.

The study also found that 44% of Claude-assisted work involved tasks employees wouldn't enjoy doing manually — a closely related signal. Unenjoyable, low-priority, and high-friction tasks are precisely the category where AI makes the difference between done and undone.

The research explicitly frames this as economic surplus: AI is not just redistributing existing work across fewer hours, but creating net-new output that was previously inaccessible.

Individual qualitative interviews reinforced this pattern. Engineers described using Claude to cross into unfamiliar domains — a backend engineer shipping a React component, a researcher writing a deployment script — work they would not have attempted without the safety net of AI assistance.

The 27% figure is the benchmark this dimension is calibrated against. Reaching or exceeding it puts you at the upper range of the Anthropic cohort.

A specific sub-category worth tracking: papercut fixes accounted for 8.6% of all sessions in the cohort. Papercuts — small improvements that would never make a sprint backlog without AI — are a reliable, measurable form of new work generation. High papercut rates correlate strongly with high scores on this dimension.


How it affects your overall score

New Work Generation carries 10% of your total score.

A one-point improvement in this dimension adds 0.10 points to your overall score.

This dimension carries a lower weight not because net-new work is unimportant, but because it varies significantly by role, team, and project phase. An engineer in a high-velocity feature sprint will naturally generate less new work than one in an exploratory phase. The weight reflects this variability.

It interacts with Task Breadth (unfamiliar domain sessions are the most common source of new work) and Delegation Intelligence (new work tasks are often appropriately scoped for Claude — self-contained, verifiable, exploratory).

Analyze your sessions →

All 6 dimensions — Claude Code Maturity Score

Score your own sessions

See your New Work Generation score alongside all 6 dimensions.

Score Your Sessions