
Why Promptster doesn't try to detect AI usage

Detection is a category mistake. The arms race is unwinnable, the question is wrong, and locking down the environment costs you the candidates you actually want. Here's what we measure instead — and why we treat AI as the environment, not the enemy.

Paarth Jamdagneya
positioning · ai-collaboration · hiring

The pitch I keep getting asked to give and refuse to give: "Promptster catches candidates cheating with AI."

We don't. We never will. The whole framing is wrong, and the companies selling it are quietly losing a war they can't admit they entered.

This post is about why detection is a category mistake — and what we measure instead.

The arms race is over and the defenders lost

Every cheating-detection technology in the last three years has failed within months of the next model release. GPTZero couldn't separate ChatGPT-3.5 from ChatGPT-4. Turnitin's AI detection crumbled against Claude. CodeSignal's "suspicion scores" — typing linearity, paste frequency, pause patterns — were trained on a pre-agentic world. Fast linear typing with frequent pastes is what good agentic coding looks like now. So the detector flags the cheater and the senior engineer indistinguishably.
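To make the indistinguishability concrete, here is a toy sketch. It is not any vendor's actual model: the `Session` fields, the weights, and the `suspicion_score` function are all hypothetical, chosen only to show that a score built from surface telemetry cannot separate two sessions that differ only in process.

```python
from dataclasses import dataclass


@dataclass
class Session:
    # Surface telemetry a keystroke-style detector might see (hypothetical fields).
    typing_linearity: float      # 0..1, share of input typed without backtracking
    pastes_per_minute: float
    mean_pause_seconds: float
    # A process fact the surface score never looks at.
    read_failing_test_first: bool


def suspicion_score(s: Session) -> float:
    """Toy score with made-up weights; higher means 'more suspicious'."""
    paste_term = min(s.pastes_per_minute / 5.0, 1.0)
    pause_term = 1.0 / (1.0 + s.mean_pause_seconds)
    return round(0.4 * s.typing_linearity + 0.4 * paste_term + 0.2 * pause_term, 3)


# Same surface behavior, different process.
senior = Session(0.9, 4.0, 1.0, read_failing_test_first=True)
paste_bomber = Session(0.9, 4.0, 1.0, read_failing_test_first=False)

print(suspicion_score(senior), suspicion_score(paste_bomber))
```

Both sessions get the same score because the only field that differs is one the score never reads. Tuning the weights does not change that.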

This isn't a temporary problem you fix in v2. It's the structure of the race. Detection sits on the wrong side of an asymmetric matchup: the generator gets a year of compute scaling, a new model family, and unlimited fine-tuning data. The detector gets a quarterly point release.

Anyone selling you "AI cheating detection" today is selling you a 90-day SLA on a problem that gets harder every quarter. The contract terms are honest about this. The marketing isn't.

Even perfect detection wouldn't help you

Set aside the arms race. Pretend you could perfectly detect AI usage tomorrow — a magic flag that lights up when a candidate runs Claude, Copilot, Cursor, anything.

You would still have no signal worth hiring on.

Every engineer at your company already uses AI tools. CodeSignal's March 2026 survey found 91% of engineers use agentic AI at work and 75% shipped AI-generated production code last quarter. "Did the candidate use AI" has the same answer as "is the candidate breathing." Yes. So what.

The question your VP Eng is actually hiring against, whether they've formalized it or not, is: how well do they use it? That question is invisible to any detector. A flag that says "AI usage: YES" tells you exactly nothing about whether the candidate would survive their first sprint at your company.

The real skill is orchestration, and you can only see it by watching

What separates a senior engineer from a junior in 2026 is not whether they touch AI. It's how.

Do they prompt with context, or with verbs? "Fix the bug" versus "the test on line 47 fails because the timezone offset is calculated against the wrong reference date; investigate getOffsetFromTimezone." Same bug, opposite outcomes.

Do they read the failing test before they ask Claude to fix it? Or hand the model the error message and accept whatever comes back?

When Claude proposes a fix, do they review it critically — catch the off-by-one, the missing edge case, the silent dependency change? Or accept-all and ship?

Do they plan-mode before letting the agent write to disk, or YOLO every diff?

When something breaks at 11pm in production six months from now, do they have any model of what they actually built — or did the model build it and they signed off?

These are observable behaviors. No detector flag captures any of them. All of them are visible if you watch.

"Detect-and-block" actively makes hiring worse

Most of the assessment industry has spent two years building bigger walls. Locked browsers. Webcam proctoring. Paste detection. Process monitoring overlays.

The unspoken theory: if we make cheating harder, candidates will stop cheating, and the test will measure what it used to measure.

This doesn't work. It also actively makes hiring worse. Three specific costs:

Great candidates self-select out. Real engineers don't apply to companies that treat them as suspects before they've shaken hands. The locked browser doesn't just block cheaters; it broadcasts a culture signal. Senior engineers read it loud and clear.

False positives reject real signal. A candidate types fast, pastes often, finishes early — they look exactly like the orchestration-fluent senior you want, and exactly like the candidate paste-bombing answers from Claude. Your detector can't tell them apart. So you reject both, or accept both. Either way you're hiring by coin flip with a privacy invasion attached.

The candidate-positive companies are eating your lunch. Shopify tells candidates to use whatever tools they want. Meta piloted AI-enabled coding interviews in October 2025. The companies setting the norms at the top of the market aren't trying to catch cheaters — they're trying to evaluate orchestration. While the incumbents are auditing keystrokes, the people you're competing with for talent are watching how candidates think.

What Promptster measures instead

We sit inside the candidate's real coding session. Their own laptop, their own Claude Code, a real OSS bug brief — date-fns, prisma, vitest, the kind of problem they'd actually be solving at your company. The candidate consents to capture before anything runs. We see prompts and tool calls. We don't see their webcam, their other tabs, or their keystrokes outside the editor.

What we score:

  • Plan-mode discipline. Did they let the agent loose on disk, or did they think first?
  • Read-before-write. Did they open the failing test before they asked for a fix? Did they read the surrounding code before changing it?
  • Prompt specificity. Are their prompts grounded in context, or generic verbs?
  • Rejection rate. How often did they push back on the model's first answer? When?
  • Decision rationale. At the moments where they chose one approach over another, can they articulate why?
  • Recovery from error. When something broke, did they form a hypothesis, probe, fix — or spin in circles re-asking the same question?
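As an illustration of how two of these signals could be computed from a session record, here is a minimal sketch. It assumes a hypothetical event log. The `Event` type, its `kind` values, and both functions are invented for this example and are not Promptster's actual schema or scoring.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    # Hypothetical event kinds: "read", "write", "prompt", "accept", "reject".
    kind: str
    target: str = ""  # file path for "read" / "write" events


def read_before_write(events: list[Event]) -> Optional[float]:
    """Share of written files that were read earlier in the session."""
    seen: set[str] = set()
    first_write_was_informed: dict[str, bool] = {}
    for e in events:
        if e.kind == "read":
            seen.add(e.target)
        elif e.kind == "write" and e.target not in first_write_was_informed:
            # Only the first write to each file counts.
            first_write_was_informed[e.target] = e.target in seen
    if not first_write_was_informed:
        return None  # nothing was written, so the signal is undefined
    return sum(first_write_was_informed.values()) / len(first_write_was_informed)


def rejection_rate(events: list[Event]) -> Optional[float]:
    """Share of model proposals the candidate pushed back on."""
    decisions = [e for e in events if e.kind in ("accept", "reject")]
    if not decisions:
        return None
    return sum(e.kind == "reject" for e in decisions) / len(decisions)


session = [
    Event("read", "test/offset.test.ts"),
    Event("read", "src/offset.ts"),
    Event("prompt"),
    Event("reject"),                 # pushed back on the first proposal
    Event("prompt"),
    Event("accept"),
    Event("write", "src/offset.ts"),  # read earlier in the session
    Event("write", "src/index.ts"),   # never read
]

print(read_before_write(session), rejection_rate(session))  # 0.5 0.5
```

Neither number is a verdict on its own. A rejection rate only means something next to the prompts and proposals it summarizes, which is why the evidence stays attached.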

None of this is detection. There's no flag that lights up. There's just a record of the work, structured into signal a hiring manager can scrub through in twelve minutes. The verdict — strong hire, no hire, lean — has every supporting signal attached. You're not trusting a score we generated. You're reading the evidence.

What this looks like on the other side

On the day of the interview, you walk in with three questions auto-generated from the candidate's own decision rationale. Not generic competency probes. The candidate's own words quoted back to them: "You wrote that you didn't trust the model's first fix because the offset was calculated against the wrong reference. Walk me through what convinced you of that."

They can't deflect to ChatGPT in the call, because the question is about a decision they already made. The interview gets shorter and sharper. You stop running the same LeetCode round you've run for ten years and start having a real conversation about engineering judgment.

That's the unlock. Not catching cheaters. Hiring on signal you can defend.

The honest part

I built Promptster after interviewing at thirty-plus companies and watching every take-home become a test of whether I'd be the chump who didn't use AI. The market split into two groups: companies that detected and lost, and companies that embraced and won. I bet on the second group.

If your current screen can't tell paste from craft — and most can't, anymore — the answer is not a bigger detector. The detectors don't work, and they hurt the candidates you actually want to hire. The answer is to watch the work, score the thinking, and treat AI as the environment instead of the enemy.

We're not running an arms race. We're running a different game.

On the record · signed · replayable

Read the process,
not just the commit.

Twelve founding teams will ship this with us. If you hire 5+ engineers a year and your current technical screen can't tell paste from craft, we should talk.

Founding rate: $499 → $299/mo · locked through 2028 · 1 of 12 claimed