# APL/AI-Eval
Reference implementation of the APL/AI-Eval v1.0 vertical profile.
The AI-Eval profile is the first normative vertical profile on top of APL Core. It addresses the canonical narrative use case (AI evaluation observations) by tightening APL Core in ways that are specific to benchmark-style claims.
## What it provides
- AI-Eval–specific frame family (harness, grader, dataset version, decoding parameters)
- Subject pinning by `artifact_digest` for model-build identity
- Bridge semantics for cross-harness and cross-grader comparisons
- Profile-specific diagnostic codes in the `apl-ai-eval-...` namespace, extending the APL Core set
- A zero-argument `register()` helper that registers the AI-Eval diagnostic codes into the APL Core whitelist at application startup (it does not wire the profile into a verifier; the profile is passed explicitly via `verify_receipt`/`evaluate_relation`)
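The `register()` pattern above can be sketched as a process-wide whitelist that the profile extends at startup. This is a minimal illustration, not the real `apl-core` implementation: the whitelist shape, the storage in a `OnceLock`, and the specific diagnostic code names are all assumptions; only the `register()` name and the `apl-ai-eval-` prefix come from this page.

```rust
use std::collections::HashSet;
use std::sync::{Mutex, OnceLock};

/// Process-wide whitelist of accepted diagnostic codes (assumed shape).
fn whitelist() -> &'static Mutex<HashSet<&'static str>> {
    static WL: OnceLock<Mutex<HashSet<&'static str>>> = OnceLock::new();
    // Seeded with a made-up stand-in for the APL Core code set.
    WL.get_or_init(|| Mutex::new(HashSet::from(["apl-core-frame-mismatch"])))
}

/// Zero-argument registration: adds the AI-Eval codes to the core whitelist.
/// Note it does NOT select the profile for verification; per the docs above,
/// the profile is still passed explicitly to verify_receipt / evaluate_relation.
pub fn register() {
    let mut wl = whitelist().lock().unwrap();
    // Illustrative (invented) code names under the apl-ai-eval- prefix.
    wl.insert("apl-ai-eval-harness-mismatch");
    wl.insert("apl-ai-eval-grader-mismatch");
}

/// A verifier would consult the whitelist when emitting diagnostics.
pub fn is_known(code: &str) -> bool {
    whitelist().lock().unwrap().contains(code)
}

fn main() {
    // Before registration, only the core codes are known.
    assert!(!is_known("apl-ai-eval-harness-mismatch"));
    register();
    assert!(is_known("apl-ai-eval-harness-mismatch"));
    assert!(is_known("apl-core-frame-mismatch"));
    println!("ok");
}
```

Registration is idempotent here (a `HashSet` insert), which is the property an application-startup hook of this kind generally wants.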
## What it does NOT provide
- Running the benchmarks (no harness, no execution engine)
- Ranking, leaderboards, or cross-model synthesis
- A registry: `APL/AI-Eval` is content-addressed like every other frame family
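"Content-addressed" means a frame family's identity is derived from its content rather than assigned by a central registry. A toy sketch of the idea, under loud assumptions: the `FrameFamily` fields are invented, and a real implementation would use a cryptographic digest rather than Rust's non-cryptographic `DefaultHasher`.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical frame-family record; the fields are illustrative only.
#[derive(Hash)]
struct FrameFamily {
    name: &'static str,
    version: &'static str,
}

/// Identity is a digest of the canonical content: no registry lookup.
/// (A real system would use e.g. SHA-256 over a canonical serialization.)
fn family_id(f: &FrameFamily) -> u64 {
    let mut h = DefaultHasher::new();
    f.hash(&mut h);
    h.finish()
}

fn main() {
    let a = FrameFamily { name: "APL/AI-Eval", version: "1.0" };
    let b = FrameFamily { name: "APL/AI-Eval", version: "1.0" };
    let c = FrameFamily { name: "APL/AI-Eval", version: "1.1" };
    // Same content, same identity; different content, different identity.
    assert_eq!(family_id(&a), family_id(&b));
    assert_ne!(family_id(&a), family_id(&c));
    println!("ok");
}
```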
## Relationship to APL Core
`apl-ai-eval` ships as a workspace member of `apl-core`. It is strictly additive:
- It MUST NOT relax APL Core requirements on directionality, frame match, or scope match.
- It MAY tighten bridge applicability (e.g. require matching scope under stricter rules).
- It MAY add profile-specific diagnostics under the `apl-ai-eval-...` prefix.
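The "strictly additive" rule can be made concrete: the profile's applicability check is a conjunction of the core check and extra profile constraints, so it can only tighten, never relax, a core decision. The types and field names below are invented stand-ins, not the real `apl-core` API.

```rust
/// Minimal stand-in for a bridge comparison context (hypothetical fields).
struct BridgeCtx {
    frames_match: bool,
    scopes_match: bool,
    harness_matches: bool, // an AI-Eval-specific tightening, as an example
}

/// Illustrative APL Core applicability: directionality/frame/scope rules.
fn core_applicable(c: &BridgeCtx) -> bool {
    c.frames_match && c.scopes_match
}

/// AI-Eval applicability: core rules AND the stricter profile rule.
/// Because it is a conjunction with the core check, a bridge the core
/// rejects can never become applicable under the profile.
fn ai_eval_applicable(c: &BridgeCtx) -> bool {
    core_applicable(c) && c.harness_matches
}

fn main() {
    let ok = BridgeCtx { frames_match: true, scopes_match: true, harness_matches: true };
    let tightened_out = BridgeCtx { frames_match: true, scopes_match: true, harness_matches: false };
    let core_reject = BridgeCtx { frames_match: false, scopes_match: true, harness_matches: true };

    assert!(ai_eval_applicable(&ok));
    assert!(!ai_eval_applicable(&tightened_out)); // profile may tighten
    assert!(!ai_eval_applicable(&core_reject));   // profile cannot relax core
    println!("ok");
}
```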
## Source code
- Repository: github.com/evidentum-io/apl-core/tree/main/apl-ai-eval
- License: Apache-2.0