Your whole team has Claude. Who's actually good with it?
Promptster records real work sessions and scores every engineer's AI fluency — who's strong, who needs training, and on exactly what.
Works where your engineers already work.
- task briefs: the spec they hand the agent
- transcripts: course-correction when it drifts
- file diffs: what ships vs. gets reworked
- test runs: proof it works, or a guess
scored: discovery · implementation · verification
- plans & prompts: do they scope before coding
- agent transcripts: steering vs. rubber-stamping
- shell commands: how they recover when stuck
- file diffs: accepted as-is vs. reworked
scored: discovery · implementation · verification
- agent chats: the context they feed it
- inline edits: surgical fixes vs. full-file pastes
- terminal: how they debug when it breaks
- test runs: verified or vibes
scored: discovery · implementation · verification
Claude Code, Codex, and Cursor are all fully supported.
Ask engineers how good they are with AI.
Then measure it.
Experienced devs believed AI made them faster. It measurably made them slower.
METR, 2025, a randomized controlled trial with early-2025 tools. Felt: +20%. Measured: −19%.
Employees who say the AI training they got was sufficient. The rest are improvising on a tool they use daily.
36% say training was sufficient; 64% improvising daily. BCG, 2025.
The real spread inside one team: engineers measurably slowed to 0.8× sit next to the 5× the case studies celebrate. Nobody can say who's where.
METR, 2025 + vendor case studies: 0.8× to 5×.
Every chart above is self-report or an aggregate — none can name who on your team uses AI well.
The cost isn't the subscription.
It's the salary next to it.
A $200-a-month AI seat sits next to a $200,000 engineer. The subscription is around 1% of the cost of the person using it. Optimizing the seat is a procurement exercise; what the engineer does with it is a payroll-scale variable.
That 19% drag from the METR RCT? On one $200k engineer, that's roughly $38,000 a year in lost output, from someone who believes the tool is making them faster. Multiply by however many engineers on your team match that profile. You don't know the number. Neither do they.
And the drag is only half the math. Vendors' own case studies put a 5× engineer at the top of the curve, while METR measured a 0.8× one — both are probably on your payroll right now. Uneven fluency means you're paying for the ceiling and collecting the floor.
The return on fixing it is denominated in engineer-hours recovered and defects that never ship, not in subscription line items.
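The arithmetic behind those figures fits in a few lines. This is a back-of-the-envelope sketch using only the numbers quoted above; the function names are hypothetical, and the team size you plug in is yours to supply.

```python
# Figures quoted in the text above; everything else here is illustrative.
SEAT_COST_PER_MONTH = 200    # dollars, one AI subscription
ENGINEER_SALARY = 200_000    # dollars per year
MEASURED_SLOWDOWN = 0.19     # the 19% drag reported by the METR RCT


def seat_share_of_salary(seat_per_month: float, salary: float) -> float:
    """Annual seat cost as a fraction of annual salary."""
    return (seat_per_month * 12) / salary


def annual_drag_cost(salary: float, slowdown: float) -> float:
    """Salary-weighted value of the output lost to a given slowdown."""
    return salary * slowdown


def team_drag_cost(salary: float, slowdown: float, affected: int) -> float:
    """Same drag, multiplied by however many engineers match the profile."""
    return annual_drag_cost(salary, slowdown) * affected


seat_share = seat_share_of_salary(SEAT_COST_PER_MONTH, ENGINEER_SALARY)
drag = annual_drag_cost(ENGINEER_SALARY, MEASURED_SLOWDOWN)

print(f"seat share of salary: {seat_share:.1%}")
print(f"drag per engineer:    ${drag:,.0f} a year")
```

The `affected` argument to `team_drag_cost` is the number nobody currently knows, which is the point of measuring it.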
Everyone sells AI training. Nobody verifies it worked.
We're the verification layer.
Install.
One CLI instruments the AI tools your engineers already use — Claude Code, Codex, Cursor — scoped to the repos you choose. Work outside those repos is never touched. Engineers see the scope before anything runs.
Baseline.
Two weeks of real work sessions — prompts, file diffs, terminal commands, test runs. No homework, no simulation day. Each session is scored across discovery, implementation, and verification, rolled up per engineer.
Fix.
Per-engineer training prescriptions, not a slide deck for the whole org. One engineer rubber-stamps AI diffs; another never feeds the agent context. Different problems, different fixes. Then re-assess and prove the delta.
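To make "rolled up per engineer" and "prove the delta" concrete, here is a minimal sketch. It assumes each session carries one numeric score per dimension and that the roll-up is a plain average; the real scoring scale and aggregation are not described here, so the field names, the averaging, and the sample scores below are all invented for illustration.

```python
from collections import defaultdict
from statistics import mean

DIMENSIONS = ("discovery", "implementation", "verification")


def roll_up(sessions):
    """Average each dimension's session scores per engineer (assumed aggregation)."""
    buckets = defaultdict(lambda: defaultdict(list))
    for session in sessions:
        for dim in DIMENSIONS:
            buckets[session["engineer"]][dim].append(session["scores"][dim])
    return {
        engineer: {dim: mean(scores[dim]) for dim in DIMENSIONS}
        for engineer, scores in buckets.items()
    }


def delta(baseline, reassessed):
    """Per-dimension change for engineers who appear in both roll-ups."""
    return {
        engineer: {
            dim: reassessed[engineer][dim] - baseline[engineer][dim]
            for dim in DIMENSIONS
        }
        for engineer in baseline
        if engineer in reassessed
    }


# Invented sample data: two baseline sessions, one session after training.
baseline = roll_up([
    {"engineer": "eng-a",
     "scores": {"discovery": 60, "implementation": 70, "verification": 30}},
    {"engineer": "eng-a",
     "scores": {"discovery": 80, "implementation": 70, "verification": 50}},
])
reassessed = roll_up([
    {"engineer": "eng-a",
     "scores": {"discovery": 70, "implementation": 75, "verification": 65}},
])

# The lowest-scoring dimension is where a prescription would aim.
weakest = min(baseline["eng-a"], key=baseline["eng-a"].get)
change = delta(baseline, reassessed)
print(weakest, change["eng-a"])
```

In this made-up case the weakest dimension is verification, so that is where the training would go and where the re-assessment would look for movement.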
Every score links to a moment you can replay.
Sessions open in an IDE-style replay: file tree, syntax-highlighted diffs, the full prompt timeline. When the report says an engineer skipped verification, you can scrub to the minute it happened — and so can they.
Nothing in your stack watches the actual work.
Surveys, AI upskilling platforms, and DevEx dashboards all orbit the question and miss it completely. None of them can tell you which engineer needs what training — because none of them see the work itself.
| Dimension | Surveys / self-report (what you have) | AI upskilling platforms (Section et al.; what you have) | DevEx dashboards (what you have) | Promptster (observed sessions) |
|---|---|---|---|---|
| What it measures | Perception — see the METR chart above. | General AI literacy — quizzes and prompt exercises for knowledge workers, not engineers in real codebases. | Aggregate output — DORA, throughput, cycle time. | Observed behavior in real work sessions. |
| Resolution | Team-level vibes, anonymized by design. | Per-person, but on generic exercises — not your stack, not your repos. | Team or repo aggregates. Can't name names. | Per-engineer, per-dimension: discovery, implementation, verification. |
| Tells you WHO needs help, and why | No. | Who scored low on a quiz — not who rubber-stamps AI output at 4pm. | No. A slow team average has no name attached. | Yes — with the session replay as evidence. |
| What you do next | Run another survey next quarter. | A course and a certificate, verified by another quiz. | Argue about the dashboard in a planning meeting. | A per-engineer training prescription. Re-assess on real work to prove the delta. |
This is not surveillance.
Tools that grade people in secret deserve the side-eye they get. This one shows its work — to the people being scored, first.
- Scoped. Capture is limited to the repos and workspaces your company chooses. Nothing outside them, ever.
- Transparent. Every engineer sees the exact capture manifest before anything runs. No accounts, no dashboard to babysit — they just keep working.
- Growth-oriented. The output is “here's the training that makes you faster” — a report built to be shared with each engineer, not held over them.
What we capture:
- prompts
- file diffs
- terminal commands
- test runs
What we never capture:
- keystrokes
- screen recording
- clipboard
- webcam
- browser activity
- anything outside scoped repos
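The capture rule above amounts to an allow-list: an event is kept only if its type is on the list and it comes from a scoped repo. The sketch below shows that logic under stated assumptions; the event shape, the type names, and `should_capture` are hypothetical and are not the product's actual manifest format.

```python
from pathlib import PurePosixPath

# Allow-list drawn from the capture list above. Anything not named here
# (keystrokes, screen recording, clipboard, webcam, browser activity)
# is dropped by default rather than needing its own deny rule.
CAPTURED_EVENT_TYPES = frozenset(
    {"prompt", "file_diff", "terminal_command", "test_run"}
)


def in_scope(path, scoped_repos):
    """True if path is a scoped repo root or sits somewhere inside one."""
    target = PurePosixPath(path)
    for repo in scoped_repos:
        root = PurePosixPath(repo)
        if target == root or root in target.parents:
            return True
    return False


def should_capture(event, scoped_repos):
    """Keep an event only if its type is allow-listed and its repo is scoped."""
    if event["type"] not in CAPTURED_EVENT_TYPES:
        return False
    return in_scope(event["path"], scoped_repos)


scoped = ["/work/payments"]  # hypothetical repo chosen by the company
print(should_capture({"type": "file_diff", "path": "/work/payments/src/api.py"}, scoped))
print(should_capture({"type": "clipboard", "path": "/work/payments/src/api.py"}, scoped))
print(should_capture({"type": "file_diff", "path": "/home/me/notes/todo.md"}, scoped))
```

Matching on whole path components rather than string prefixes is deliberate: a sibling directory such as `/work/payments-old` does not count as inside `/work/payments`.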
Book a 15-min walkthrough: we'll replay a scored session live.
We're running pilot cohorts with a handful of teams. Bring a VP Eng or platform lead; we'll bring a real session and its score. You decide in 15 minutes whether the signal is worth a two-week baseline.
Short answers, no marketing answers.
Which AI tools does this work with?
Claude Code, Codex, and Cursor: all three fully supported, capturing full work sessions (prompts, file diffs, terminal commands, test runs). Engineers keep whichever tool they already use; the scoring is the same across all of them. Mixed-tool teams are the norm, not a problem.
How long does setup take?
About 30 minutes for your platform team: install one CLI, choose which repos are in scope, and engineers get notified with exactly what will be captured. Engineers never need an account or touch a dashboard. The platform is for you; they just keep working. The two-week baseline starts as soon as people work.
What does the report actually look like?
A team-level scoreboard with per-engineer scores across discovery, implementation, and verification: who's strong, who needs training, on exactly what. Each score links to the sessions behind it, viewable in an IDE-style replay (file tree, syntax-highlighted diffs, prompt timeline). Per-engineer training prescriptions come with it, and a re-assessment after training shows the delta.
How is the data handled?
Capture is limited to the repos you scoped, nothing outside them. We capture prompts, file diffs, terminal commands, and test runs; we never capture keystrokes, screen, clipboard, webcam, or browser activity. Every engineer sees that capture manifest before anything runs, and per-engineer reports are built to be shared with them. Dashboard access stays with the managers you invite. Data is retained for the engagement and deleted on request.
What are the pilot terms?
We're running pilot cohorts with a limited number of teams. A pilot is a two-week baseline on repos you choose, the full team report, and a working session to walk through it. Terms and pricing are discussed on the walkthrough call — we'd rather show you a scored session first.