Detection vs. Proof: Why a Probability Score Isn't Enough When a Candidate Disputes It
Imagine the moment that actually matters. You've flagged a candidate for using AI assistance during an interview. They push back — they say they didn't, they're upset, maybe they imply legal action. Your recruiting lead asks you: how do we know?
The answer you can give in that moment depends entirely on what kind of tool you used. And it exposes a distinction most hiring teams don't think about until they're in exactly that situation: the difference between detection that infers and detection that proves.
Two fundamentally different kinds of answer
Most interview-integrity tools on the market produce a probability. They analyze behavior — response timing, gaze patterns, language style, typing rhythm — and output a likelihood that the candidate was assisted. Some are quite good at it. But what they give you is a statistical judgment: this performance looks assisted.
That's genuinely useful as an early-warning signal. It's far weaker as the thing you stand behind when challenged. A probability invites the obvious response: your software guessed, and it guessed wrong. And because these systems work by inference, they have a nonzero false-positive rate by definition: even a strong detector will flag some honest candidates. When an honest candidate lands in that group, an inferred score is hard to defend and a poor basis for a hiring decision.
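To make the false-positive point concrete, here is a small back-of-the-envelope sketch. Every number in it is a hypothetical assumption chosen for illustration (the share of candidates who use assistance, the detector's hit rate, its false-positive rate); none is a measurement of any real tool. It shows how a detector that looks accurate on paper can still produce a large share of flags that land on honest candidates when cheating itself is uncommon.

```python
def flag_breakdown(n_candidates, cheat_rate, sensitivity, false_positive_rate):
    """Split a detector's flags into correct and mistaken ones.

    All inputs are illustrative assumptions, not measured values.
    """
    cheaters = n_candidates * cheat_rate
    honest = n_candidates - cheaters
    true_flags = cheaters * sensitivity            # assisted candidates flagged
    false_flags = honest * false_positive_rate     # honest candidates flagged
    total_flags = true_flags + false_flags
    # Share of flags that point at someone who actually used assistance.
    precision = true_flags / total_flags if total_flags else None
    return true_flags, false_flags, precision


# Hypothetical scenario: 1,000 interviews, 5% assisted,
# detector catches 90% of them and wrongly flags 3% of honest candidates.
true_flags, false_flags, precision = flag_breakdown(1000, 0.05, 0.90, 0.03)
print(f"correct flags: {true_flags:.1f}")
print(f"mistaken flags: {false_flags:.1f}")
print(f"share of flags that are correct: {precision:.0%}")
```

Under these assumed inputs, well over a third of all flags fall on honest candidates, even though the detector is wrong about any individual honest candidate only 3% of the time. Different assumptions give different splits, but the effect holds whenever honest candidates greatly outnumber assisted ones.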
The other kind of answer is proof: deterministic evidence that a specific tool or condition was actually present on the machine during the interview, captured with a timestamp and recorded so it can be independently verified later. Not "this looks assisted" — "this tool was running at this moment, and here is the record."
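One common way to make a timestamped record checkable after the fact is a hash chain, where each entry commits to the one before it. The sketch below is a generic illustration of that idea, not a description of how any particular product works; the function names, field names, and the "example-assistant" process name are all hypothetical. A hash chain on its own only shows that a log is internally consistent, so a real system would also need something like a digital signature or an external timestamp to rule out a wholesale rewrite.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def _digest(body):
    # Canonical JSON so the same content always hashes to the same value.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def append_record(log, observation, observed_at=None):
    """Append an observation, linking it to the hash of the previous record."""
    body = {
        "observed_at": observed_at or datetime.now(timezone.utc).isoformat(),
        "observation": observation,
        "prev_hash": log[-1]["hash"] if log else GENESIS,
    }
    record = dict(body, hash=_digest(body))
    log.append(record)
    return record


def verify_log(log):
    """Recompute every hash; any edited or reordered record breaks the chain."""
    prev = GENESIS
    for record in log:
        body = {k: v for k, v in record.items() if k != "hash"}
        if record["prev_hash"] != prev or _digest(body) != record["hash"]:
            return False
        prev = record["hash"]
    return True


# Hypothetical usage: two observations, then an attempted edit.
log = []
append_record(log, {"event": "interview_started"}, "2024-01-01T10:00:00+00:00")
append_record(log, {"event": "process_seen", "name": "example-assistant"},
              "2024-01-01T10:07:12+00:00")
intact_before = verify_log(log)
log[1]["observation"]["name"] = "something-else"  # tamper with the record
intact_after = verify_log(log)
```

Here `intact_before` is true and `intact_after` is false: anyone holding the log can rerun the check without trusting whoever produced it, which is the practical meaning of "independently verified" in this sketch.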
Why this distinction gets sharper over time, not softer
As AI-assisted interviewing becomes more common, disputes will too. So will scrutiny — from candidates, from HR, from legal, and increasingly from regulators paying attention to how automated tools are used in hiring. In that environment, the standard for acting on a finding rises. A probability score that was a fine internal signal becomes a liability when it's the basis for rejecting a candidate who then challenges it.
There's also a fairness dimension that runs in the same direction. A tool that produces probabilistic judgments will sometimes be wrong about honest people, and the people most likely to be wrongly flagged by behavioral inference are often those whose behavior is atypical for reasons that have nothing to do with cheating. Proof-based detection sidesteps this: it isn't judging whether someone seems like they're cheating, it's recording whether a tool was present.
What to ask before you trust a finding
If you're choosing how to protect your interviews, the question that cuts to the core is simple: when a candidate disputes a flag, what can this tool actually show me?
If the answer is a confidence percentage, you have a tripwire — valuable, but not something to build a defensible decision on alone. If the answer is timestamped, verifiable evidence of what was present, you have something that holds up.
Detection tells you to look closer. Proof tells you what was there. For a decision as consequential as a hire — and as exposed to challenge as one made on integrity grounds — the difference is not academic.
Capifiq produces deterministic, timestamped, independently verifiable evidence — not a probability score. See the difference on five free interviews.