The Consistency Math Kernel™ is evaluated on the public DeepMind Mathematics dataset spanning arithmetic, algebra, calculus, probability, comparison, and structured reasoning tasks.
Unlike probabilistic models that must produce an answer for every prompt, the Math Kernel returns a result only when correctness can be formally certified. When certification cannot be established under bounded deterministic constraints, the result is NOT CERTIFIED by design.
- 560,000 problems evaluated
- 322,694 scored
- 322,694 certified correct
- Incorrect: 0
- Coverage: 57.6239%
- Runtime: 148.24 seconds
- Average time per problem: 0.2647 ms
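The coverage and per-problem timing figures above follow directly from the reported counts and runtime. A minimal sketch of that arithmetic, where the function names are illustrative and not part of the product:

```python
def coverage_percent(certified: int, evaluated: int) -> float:
    """Share of evaluated problems that received a certified answer, in percent."""
    return 100.0 * certified / evaluated


def avg_ms_per_problem(runtime_seconds: float, evaluated: int) -> float:
    """Mean wall-clock time per evaluated problem, in milliseconds."""
    return 1000.0 * runtime_seconds / evaluated


# Figures taken from the list above.
coverage = coverage_percent(322_694, 560_000)
avg_ms = avg_ms_per_problem(148.24, 560_000)

print(f"Coverage: {coverage:.4f}%")      # Coverage: 57.6239%
print(f"Average time: {avg_ms:.4f} ms")  # Average time: 0.2647 ms
```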
The Consistency Math Kernel™ is evaluated on the geometry subset of the Hendrycks MATH benchmark to measure deterministic, fail-closed solving under formal verification constraints.
The kernel returns an answer only when correctness can be certified under bounded deterministic rules; otherwise the result is NOT CERTIFIED by design.
- 479 problems evaluated
- 257 answered
- 257 certified correct
- Incorrect: 0
- Not certified (abstain): 222
- Coverage: 53.65%
- Runtime: 0.684 seconds
- Average time per problem: 1.428 ms
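The fail-closed behavior described above can be pictured as a solver paired with an independent check: an answer is released only if the check passes, and every other outcome collapses to NOT CERTIFIED. The sketch below is a toy illustration under that assumption, not the kernel's actual implementation; `Result`, `solve_fail_closed`, `solve_linear`, and `verify_linear` are hypothetical names, and a linear equation stands in for a real benchmark problem.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Callable, Optional, Tuple

NOT_CERTIFIED = "NOT CERTIFIED"
CERTIFIED = "CERTIFIED"

Problem = Tuple[int, int, int]  # toy problem: coefficients (a, b, c) of a*x + b = c


@dataclass(frozen=True)
class Result:
    status: str
    answer: Optional[Fraction] = None


def solve_fail_closed(
    solve: Callable[[Problem], Optional[Fraction]],
    verify: Callable[[Problem, Fraction], bool],
    problem: Problem,
) -> Result:
    """Release an answer only if an independent check confirms it."""
    try:
        candidate = solve(problem)
    except Exception:
        return Result(NOT_CERTIFIED)  # any solver failure means abstain
    if candidate is not None and verify(problem, candidate):
        return Result(CERTIFIED, candidate)
    return Result(NOT_CERTIFIED)


def solve_linear(problem: Problem) -> Optional[Fraction]:
    a, b, c = problem
    if a == 0:
        return None  # no unique solution, so no candidate is proposed
    return Fraction(c - b, a)


def verify_linear(problem: Problem, x: Fraction) -> bool:
    a, b, c = problem
    return a * x + b == c  # exact rational check by substitution


print(solve_fail_closed(solve_linear, verify_linear, (2, 3, 11)))
print(solve_fail_closed(solve_linear, verify_linear, (0, 1, 5)))
```

The first call yields a certified answer of 4; the second abstains because the toy solver cannot propose a unique solution. Exact rational arithmetic is used so the check never depends on floating-point tolerance.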
The Consistency Math Kernel™ is evaluated on AsyMOB, a structurally adversarial symbolic benchmark designed to stress-test algebraic manipulation, trigonometric identities, limits, series expansion, and expression equivalence under obfuscation.
These problems are intentionally constructed to challenge pattern recognition systems by altering structure without changing semantic meaning. The kernel releases an answer only when correctness can be certified under deterministic constraints.
- 17,000+ problems evaluated
- Coverage expansion and validation in progress
- 0 incorrect results among answered outputs
- Remaining cases NOT CERTIFIED by design
The Consistency Logic Kernel™ certifies whether a proposed conclusion logically follows from a set of premises under propositional reasoning.
The system does not score likelihood or assign confidence. It returns a decision only when correctness can be established under strict deterministic rules. When certification cannot be established, the result is NOT CERTIFIED.
- 1,696 problems evaluated (1,696 / 1,696)
- 0 incorrect results
- Deterministic decisioning suitable for audit-oriented reasoning pipelines
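To make "logically follows under propositional reasoning" concrete: a conclusion follows from a set of premises when no truth assignment makes every premise true and the conclusion false. A minimal truth-table sketch of that definition is shown here. It illustrates the textbook check only; the Logic Kernel's own method is not described in this document, and `entails` is a hypothetical helper.

```python
from itertools import product
from typing import Callable, Dict, Sequence

Formula = Callable[[Dict[str, bool]], bool]


def entails(premises: Sequence[Formula], conclusion: Formula,
            variables: Sequence[str]) -> bool:
    """Return True if every assignment satisfying all premises also satisfies the conclusion."""
    for values in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(p(env) for p in premises) and not conclusion(env):
            return False  # counterexample: premises hold, conclusion fails
    return True


def implies_pq(env: Dict[str, bool]) -> bool:
    return (not env["p"]) or env["q"]  # p -> q


# Modus ponens: from (p -> q) and p, conclude q.
valid = entails([implies_pq, lambda e: e["p"]], lambda e: e["q"], ["p", "q"])

# Affirming the consequent: from (p -> q) and q, conclude p.
invalid = entails([implies_pq, lambda e: e["q"]], lambda e: e["p"], ["p", "q"])

print(valid)    # True
print(invalid)  # False
```

The check is exhaustive, so its verdict is deterministic and reproducible, but its cost grows as 2^n in the number of variables.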
The Logic Kernel is evaluated on SATLIB / DIMACS CNF instances to validate binary SAT/UNSAT correctness under formal propositional constraints.
For each instance, the system determines whether a satisfying assignment exists (SAT) or whether no assignment exists (UNSAT). Correctness is enforced under a strict invariant: the number of wrong answers must equal zero.
- 2,000 instances evaluated (1,000 SAT + 1,000 UNSAT)
- 0 incorrect results (SAT: 1,000/1,000; UNSAT: 1,000/1,000)
- Deterministic, reproducible SAT/UNSAT certification
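For readers unfamiliar with the format, a DIMACS CNF instance lists clauses as integers terminated by 0, where a negative integer is a negated variable. The sketch below parses that format and decides SAT or UNSAT by exhaustive search, re-checking any satisfying assignment against every clause. It is an illustration for tiny instances only: `parse_dimacs`, `satisfies`, and `decide` are hypothetical helpers, and brute-force enumeration is an assumption made for clarity, not the Logic Kernel's method.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

Clause = List[int]


def parse_dimacs(text: str) -> Tuple[int, List[Clause]]:
    """Parse DIMACS CNF text into (number of variables, list of clauses)."""
    num_vars, clauses, current = 0, [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("c"):
            continue  # blank line or comment
        if line.startswith("%"):
            break  # some benchmark files end with a '%' trailer line
        if line.startswith("p"):
            num_vars = int(line.split()[2])  # header: "p cnf <vars> <clauses>"
            continue
        for token in line.split():
            literal = int(token)
            if literal == 0:
                clauses.append(current)  # 0 terminates the current clause
                current = []
            else:
                current.append(literal)
    return num_vars, clauses


def satisfies(clauses: List[Clause], assignment: Dict[int, bool]) -> bool:
    """Check that every clause contains at least one true literal."""
    return all(any(assignment[abs(l)] == (l > 0) for l in c) for c in clauses)


def decide(num_vars: int, clauses: List[Clause]) -> Tuple[str, Optional[Dict[int, bool]]]:
    """Exhaustively search all 2^n assignments; feasible only for small n."""
    for values in product([False, True], repeat=num_vars):
        assignment = dict(zip(range(1, num_vars + 1), values))
        if satisfies(clauses, assignment):
            return "SAT", assignment
    return "UNSAT", None


sat_text = "c toy instance\np cnf 2 2\n1 2 0\n-1 2 0\n"
unsat_text = "p cnf 1 2\n1 0\n-1 0\n"

sat_status, sat_model = decide(*parse_dimacs(sat_text))
unsat_status, _ = decide(*parse_dimacs(unsat_text))
print(sat_status, sat_model)  # SAT {1: False, 2: True}
print(unsat_status)           # UNSAT
```

A SAT verdict is easy to audit because the returned assignment can be re-checked clause by clause. An UNSAT verdict in this sketch rests on having enumerated every assignment, which is why it does not scale beyond toy sizes.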
Benchmarks show performance under controlled conditions. Developer Access lets you test real prompts, outputs, and fail-closed behavior in your own workflow. For production deployment options, review Licensing.


