Agentic coding assessment — definition, examples, and why it matters in 2026

An agentic coding assessment evaluates a developer by watching them work with an AI agent like Claude Code on a real task. Definition, examples, scoring rubric, and how it differs from traditional coding interviews.

Also known as: agentic coding interview, AI agent coding interview, Claude Code assessment, agentic developer evaluation, AI-collaboration interview

Definition

An agentic coding assessment is a technical evaluation where the candidate works on a real coding task using their own AI coding agent — Claude Code today — and the platform captures the full session as process telemetry so reviewers can grade how the candidate orchestrated the agent, not just whether the final code passed tests. The signal is the candidate's workflow with the AI: prompts, diffs, commands, decisions, pushback. The final pull request is corroborating evidence, not the primary score.

Why the term exists / why now

"Coding assessment" used to mean one thing: a problem statement, a code editor, a clock, and a test runner. The candidate wrote a solution alone and the platform graded the final diff against a test suite. That model worked for two decades because writing code by yourself was the bottleneck the job actually had.

The bottleneck moved. 85% of developers now use AI coding tools daily. Meta piloted AI-enabled coding interviews in October 2025. Shopify tells candidates to use whatever tools they want. Claude Code finishes a typical take-home in fifteen minutes for forty cents of API credit. The thing the old assessment measured stopped predicting the thing teams cared about. An assessment that asks a senior engineer to solve the problem without their AI in 2026 is filtering for a skill that no longer correlates with on-the-job performance.

Agentic coding assessment is the term for what replaced it. Same shape — a problem, a candidate, a fixed time window — but the candidate is expected to use their agent, and the platform's job is to capture how they used it. The signal that matters is no longer "did the tests pass" (Claude Code can usually make them pass). The signal is everything around the code: how the candidate scoped the problem, where they pushed back on the agent, how they sequenced tool calls, whether they noticed when the model was wrong.

What an agentic coding assessment is NOT

  • Not a banned-AI coding test. Banning the AI in 2026 measures how well the candidate works without the tool their future coworkers use every day. It is the wrong signal.
  • Not an AI-conducted interview. Tools like Alex (formerly Apriora) have an AI conduct a conversation with the candidate. An agentic coding assessment is the opposite: the candidate works with their own AI on a coding task, and the assessment captures the work.
  • Not a screen recording with a chat pane bolted on. Some incumbents ship "AI assessment" products that consist of a hosted browser IDE, a built-in chat box, and video playback. That setup never reaches the candidate's real Claude Code session, and it produces a movie, not a queryable event log.
  • Not a "build a small app and submit a zip" take-home with a permission slip to use AI. Without process-telemetry capture, the final code is still all the reviewer sees, and the final code still tells you almost nothing in 2026.
  • Not a typing-speed or paste-frequency test. Keystroke heuristics designed to detect cheating in a pre-AI world flag good agentic coding as suspicious. An agentic assessment does not run on those signals.

How agentic coding assessments work in practice

A typical Promptster session looks like this:

  1. The recruiter sends the candidate an assessment link with a session ID (PST-XXXX).
  2. The candidate runs promptster start PST-XXXX on their own laptop. The CLI points Claude Code's ANTHROPIC_BASE_URL at a Promptster proxy, sets ANTHROPIC_API_KEY to a per-session token, and installs Claude Code hooks for the PreToolUse, PostToolUse, UserPromptSubmit, Stop, and Notification events.
  3. The candidate accepts a consent screen that lists every event type captured and every type explicitly not captured.
  4. The candidate works on the problem inside their own Claude Code session, with their own dotfiles, MCP servers, and model preferences. They prompt, the model responds, files change, commands run, decisions get made.
  5. Every event flows into one source-tagged timeline. Conversation events come from the proxy. Tool, file, and command events come from the hooks. Each event is signed locally with a per-session Ed25519 key and linked to the event before it, so any later edit breaks the chain.
  6. When the session ends, the reviewer opens the dashboard. They see the timeline, the orchestration score with its six-factor breakdown, the line-level attribution between human and AI authorship, and a copy-paste-ready hiring brief drawn from the session evidence.
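Step 5 can be illustrated with a minimal tamper-evident log. This is a sketch under stated assumptions, not Promptster's implementation. Each event stores the SHA-256 digest of the previous event, so editing, dropping, or reordering any event invalidates the log from that point on. To keep the example on the Python standard library, HMAC-SHA256 stands in for the Ed25519 signature described above. With a real Ed25519 key pair, a reviewer could verify the log with the public key alone; HMAC requires the shared secret. The names SignedEvent, append, and verify are hypothetical.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass

# Sketch only: HMAC-SHA256 stands in for the per-session Ed25519 signature
# so this runs on the standard library. Field names are hypothetical.
GENESIS = "0" * 64  # prev_hash of the first event in a session


@dataclass
class SignedEvent:
    source: str     # "proxy" (conversation) or "hook" (tool, file, command)
    kind: str       # e.g. "prompt", "file_edit"
    payload: dict
    prev_hash: str  # digest of the previous event; this is the chain link
    digest: str     # SHA-256 over the canonical event body, including prev_hash
    signature: str  # MAC over digest with the session key


def _body(source: str, kind: str, payload: dict, prev_hash: str) -> bytes:
    # Canonical JSON so identical events always hash identically.
    return json.dumps(
        {"source": source, "kind": kind, "payload": payload, "prev": prev_hash},
        sort_keys=True,
        separators=(",", ":"),
    ).encode()


def _sign(key: bytes, digest: str) -> str:
    return hmac.new(key, digest.encode(), hashlib.sha256).hexdigest()


def append(log: list[SignedEvent], key: bytes, source: str, kind: str, payload: dict) -> SignedEvent:
    prev = log[-1].digest if log else GENESIS
    digest = hashlib.sha256(_body(source, kind, payload, prev)).hexdigest()
    event = SignedEvent(source, kind, payload, prev, digest, _sign(key, digest))
    log.append(event)
    return event


def verify(log: list[SignedEvent], key: bytes) -> bool:
    prev = GENESIS
    for event in log:
        if event.prev_hash != prev:
            return False  # an event was inserted, dropped, or reordered
        body = _body(event.source, event.kind, event.payload, prev)
        if hashlib.sha256(body).hexdigest() != event.digest:
            return False  # the event body was edited after signing
        if not hmac.compare_digest(event.signature, _sign(key, event.digest)):
            return False  # digest was recomputed without the session key
        prev = event.digest
    return True


if __name__ == "__main__":
    key = b"per-session-secret"
    log: list[SignedEvent] = []
    append(log, key, "proxy", "prompt", {"text": "Add input validation"})
    append(log, key, "hook", "file_edit", {"path": "api.py"})
    print(verify(log, key))  # True
    log[0].payload["text"] = "Write everything yourself"
    print(verify(log, key))  # False
```

Because every digest covers the previous one, rewriting a single prompt after the fact forces the tamperer to recompute every later digest. Without the session key, the recomputed digests no longer carry valid signatures.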

The scoring rubric is published and auditable. Six factors, each weighted, each linked to the replay timestamps that moved the score:

Factor | Weight | What it measures
Scoping before writing | 22% | Did the candidate think the problem through before they prompted?
Tradeoff articulation | 18% | Did they explain their tradeoffs out loud?
Adversarial prompting | 18% | Did they pressure-test what the AI suggested?
Self-correction rate | 16% | Did they catch their own mistakes?
Edge-case ownership | 14% | Did they think through edge cases?
Tool-call sequencing | 12% | Did they use the AI's tools in the right order?

Reviewers can disagree with the weights. The model is not a black box.
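Here is a sketch of how the published weights could combine into one number. It assumes each factor is first scored on a 0 to 100 scale; how Promptster derives per-factor scores is not described here. The sketch takes the weighted average and ranks it against a cohort of prior sessions to get a percentile. The factor keys, the 0 to 100 scale, and the "strictly below" percentile definition are all illustrative assumptions.

```python
# Published rubric weights, in percent (they sum to 100).
# The dictionary keys are hypothetical short names for the six factors.
DEFAULT_WEIGHTS: dict[str, float] = {
    "scoping": 22,          # Scoping before writing
    "tradeoffs": 18,        # Tradeoff articulation
    "adversarial": 18,      # Adversarial prompting
    "self_correction": 16,  # Self-correction rate
    "edge_cases": 14,       # Edge-case ownership
    "sequencing": 12,       # Tool-call sequencing
}


def weighted_score(factors: dict[str, float], weights: dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Weighted average of per-factor scores (assumed to be on a 0-100 scale)."""
    total = sum(weights.values())
    return sum(factors[name] * weight for name, weight in weights.items()) / total


def percentile(score: float, cohort: list[float]) -> float:
    """Share of cohort scores strictly below `score`, as a percentage."""
    if not cohort:
        raise ValueError("cohort must not be empty")
    below = sum(1 for other in cohort if other < score)
    return 100.0 * below / len(cohort)


if __name__ == "__main__":
    candidate = {"scoping": 90, "tradeoffs": 70, "adversarial": 80,
                 "self_correction": 60, "edge_cases": 50, "sequencing": 100}
    score = weighted_score(candidate)
    print(score)                                    # 75.4
    print(percentile(score, [60, 70, 75, 80, 90]))  # 60.0

    # A reviewer who values sequencing more can re-weight the same session.
    reweighted = dict(DEFAULT_WEIGHTS, sequencing=24)
    print(round(weighted_score(candidate, reweighted), 2))  # 78.04
```

Dividing by the sum of the weights, rather than assuming it is 100, lets a reviewer who disagrees with the defaults pass any weights and still get a score on the same 0 to 100 scale.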

How agentic coding assessments differ from traditional coding assessments

Traditional coding assessment. Browser-based IDE. Candidate writes code, often alone, sometimes with a built-in chat assistant the platform controls. Score is pass/fail on a test suite plus optional cheating heuristics (typing linearity, paste frequency, focus tracking). Reviewer reads the final diff.

Agentic coding assessment. Candidate's own environment. Candidate works with their AI agent. Capture surface is a typed event log of prompts, diffs, commands, decisions, tool calls — not keystrokes, not video. Score is an orchestration percentile with a published factor breakdown. Reviewer reads the timeline and the brief.

The shorter version: a traditional assessment grades the artifact. An agentic assessment grades the process. The two answer different questions, and in 2026 only the second one predicts on-the-job performance for senior engineers.
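The following is a minimal sketch of what a "typed event log" could mean. It turns a Claude Code hook payload into one of a few event types. Claude Code passes hook input as JSON on stdin. The fields read here (hook_event_name, tool_name, tool_input, prompt) follow our reading of its hook documentation and should be checked against the current docs. The event classes and from_hook_payload are hypothetical, not Promptster's schema. Each record names what happened (a prompt, a command, a file edit) instead of storing keystrokes or video frames.

```python
import json
from dataclasses import dataclass
from typing import Optional, Union


# Hypothetical event types; a real schema would carry timestamps and more detail.
@dataclass(frozen=True)
class PromptEvent:
    text: str


@dataclass(frozen=True)
class CommandEvent:
    command: str


@dataclass(frozen=True)
class FileEditEvent:
    path: str


@dataclass(frozen=True)
class ToolCallEvent:
    tool: str
    phase: str  # "pre" (PreToolUse) or "post" (PostToolUse)


Event = Union[PromptEvent, CommandEvent, FileEditEvent, ToolCallEvent]


def from_hook_payload(raw: str) -> Optional[Event]:
    """Map one hook's stdin JSON to a typed event, or None if it is not modelled."""
    data = json.loads(raw)
    name = data.get("hook_event_name")
    if name == "UserPromptSubmit":
        return PromptEvent(data.get("prompt", ""))
    if name == "PreToolUse":
        return ToolCallEvent(data.get("tool_name", ""), "pre")
    if name == "PostToolUse":
        tool = data.get("tool_name", "")
        args = data.get("tool_input") or {}
        if tool == "Bash":
            return CommandEvent(args.get("command", ""))
        if tool in ("Edit", "MultiEdit", "Write"):
            return FileEditEvent(args.get("file_path", ""))
        return ToolCallEvent(tool, "post")
    return None  # Stop, Notification, etc. are left out of this sketch


if __name__ == "__main__":
    payload = '{"hook_event_name": "PostToolUse", "tool_name": "Bash", "tool_input": {"command": "pytest -q"}}'
    print(from_hook_payload(payload))  # CommandEvent(command='pytest -q')
```

Records like these can be filtered and joined ("every command run after the third prompt") in a way a video recording cannot.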

Common questions

Are candidates allowed to use AI during an agentic coding assessment? Yes — that's the entire point. The assessment is built to measure how they use it.

Which AI agents work today? Claude Code today. Codex and Cursor adapters arrive Q3 2026; Windsurf Q4. The capture model generalizes across agents.

Is an agentic coding assessment harder for the candidate than a traditional take-home? For a strong senior, it's easier and more natural — they use their normal tools in their normal environment. For someone who has never actually orchestrated an AI agent on real work, it's revealing in a way that a traditional take-home isn't. Both outcomes are correct.

How long does an agentic coding assessment take? Typical sessions are 45–90 minutes. Reviewer time afterwards is 8 minutes on average for Promptster sessions, vs. 40+ minutes for a traditional take-home that has to be cloned and re-run locally to understand what the candidate was thinking.

Does the candidate have to install something? Yes — a single CLI that sets up the Claude Code hooks and points the agent at a per-session proxy. The install is scoped to the session. Candidates remove it with one command at the end.

Can candidates fake their way through an agentic coding assessment? The contradictions show up. A prompt history that doesn't match the final code, code that appears without a corresponding prompt or edit, a "decision" with no preceding reasoning — all of these surface in the timeline. The cheating signal in 2026 is contradiction in the record, not typing speed.
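One of the contradictions above, code that appears without a corresponding edit, can be checked mechanically. The sketch below assumes the timeline keeps the full text of every recorded write, agent or human, and flags lines of the final file that never appeared in any of them. The function name and its inputs are hypothetical.

```python
def unexplained_lines(final_source: str, recorded_writes: list[str]) -> list[int]:
    """Return 1-based line numbers in final_source whose text appears in no recorded write.

    recorded_writes is assumed to hold the text of every edit in the timeline
    (for example, the new content of each file write), agent or human.
    """
    seen = {line.strip() for text in recorded_writes for line in text.splitlines()}
    flagged = []
    for number, line in enumerate(final_source.splitlines(), start=1):
        stripped = line.strip()
        if stripped and stripped not in seen:  # ignore blank lines
            flagged.append(number)
    return flagged


if __name__ == "__main__":
    writes = ["def total(xs):\n    return sum(xs)\n"]
    final = "def total(xs):\n    xs = [x for x in xs if x]\n    return sum(xs)\n"
    print(unexplained_lines(final, writes))  # [2]
```

Exact line matching is deliberately crude. Common lines such as a lone closing brace will always match something, so a production attribution engine would more likely track each line's position through the diff history.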

How is this different from an AI-conducted interview like Alex (formerly Apriora)? Different category. Alex's AI talks to the candidate; an agentic coding assessment watches the candidate work with their own AI. Alex's signal is what the candidate says about their work. An agentic assessment's signal is what the candidate actually did. Both can sit in the same hiring loop.

