
APL/AI-Eval

Reference implementation of the APL/AI-Eval v1.0 vertical profile.

The AI-Eval profile is the first normative vertical profile on top of APL Core. It addresses the canonical narrative use case (AI evaluation observations) by tightening APL Core in ways that are specific to benchmark-style claims.

What it provides

  • AI-Eval–specific frame family (harness, grader, dataset version, decoding parameters)
  • Subject pinning by artifact_digest for model-build identity
  • Bridge semantics for cross-harness and cross-grader comparisons
  • Profile-specific diagnostic codes in the apl-ai-eval-... namespace extending the APL Core set
  • A zero-argument register() helper that adds the AI-Eval diagnostic codes to the APL Core whitelist at application startup. It does not wire the profile into a verifier; the profile is always passed explicitly via verify_receipt / evaluate_relation.
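To make the register() behavior concrete, here is a minimal self-contained sketch of the pattern the last bullet describes. The whitelist structure and every diagnostic code name below are assumptions for illustration, not the actual apl-core / apl-ai-eval implementation:

```python
# Illustrative sketch only: the whitelist and the code names are
# hypothetical, not the real apl-core / apl-ai-eval API.
CORE_WHITELIST: set[str] = {
    "apl-core-frame-mismatch",   # hypothetical Core code
    "apl-core-scope-mismatch",   # hypothetical Core code
}

AI_EVAL_CODES: set[str] = {
    "apl-ai-eval-harness-mismatch",  # hypothetical profile code
    "apl-ai-eval-grader-mismatch",   # hypothetical profile code
}

def register() -> None:
    """Zero-argument helper: merge the profile's diagnostic codes into
    the Core whitelist. It does not attach the profile to a verifier;
    callers still pass the profile explicitly at verification time."""
    CORE_WHITELIST.update(AI_EVAL_CODES)
```

Calling register() once at startup makes profile diagnostics recognizable, while leaving the choice of profile to each verify_receipt / evaluate_relation call.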

What it does NOT provide

  • Running the benchmarks (no harness, no execution engine)
  • Ranking, leaderboards, or cross-model synthesis
  • A registry: APL/AI-Eval frames are content-addressed, like every other frame family

Relationship to APL Core

apl-ai-eval ships as a workspace member of apl-core. It is strictly additive:

  • It MUST NOT relax APL Core requirements on directionality, frame match, or scope match.
  • It MAY tighten bridge applicability (e.g. require matching scope under stricter rules).
  • It MAY add profile-specific diagnostics in the apl-ai-eval-... prefix.

Source code
