.agents/skills/claw-score/SKILL.md
Use this skill when working on the OpenClaw maturity scorecard in this repo.
This is the openclaw-local version of the maintainer claw-score workflow:
it keeps the taxonomy and scorecard concepts, but excludes discrawl and the old
committed inventory/ report tree.
This skill owns the operational workflow for:

- taxonomy.yaml
- qa/maturity-scores.yaml
- docs/concepts/qa-e2e-automation.md
- qa/scenarios/index.yaml

Keep person-specific, maintainer-private, Discord archive, and discrawl facts
out of this repo. If a score needs private evidence, use the redacted
qa-evidence.json artifact shape generated by OpenClaw QA workflows.

- taxonomy.yaml is the hand-edited source of truth for surfaces, levels,
  QA profiles, categories, feature coverage IDs, docs refs, LTS overrides, and
  completeness-instruction paths.
- coverageIds are ANDed proof targets, not aliases. A feature may list
  multiple IDs when each ID proves part of one capability.
- Coverage IDs use the namespace.behavior form, with lowercase
  alphanumeric/dash segments. Profile, surface, and category IDs may remain
  dashed or dotted.
- qa/maturity-scores.yaml is the committed aggregate source for Quality,
  Completeness, and LTS review state.
- extensions/qa-lab/src/scorecard-taxonomy.ts exports
  qaMaturityScoresSchema and readValidatedQaMaturityScoreSources; use those
  QA Lab utilities to validate score output.
- The generated docs are docs/maturity/scorecard.md and
  docs/maturity/taxonomy.md; both come from pnpm maturity:render. Do not
  hand-edit generated Markdown to change score results.
- qa-evidence.json artifacts provide per-run QA scorecard evidence. Release
  profile artifacts are the source of truth for Coverage. They can enrich
  generated artifact docs, but they are not committed as inventory.

Run from the openclaw repo root.
Validate taxonomy YAML structure and the maturity score schema after source edits:

```bash
node --import tsx --input-type=module <<'NODE'
import fs from "node:fs";
import YAML from "yaml";
import { readValidatedQaMaturityScoreSources } from "./extensions/qa-lab/src/scorecard-taxonomy.ts";
for (const file of ["taxonomy.yaml", "qa/scenarios/index.yaml"]) {
  YAML.parse(fs.readFileSync(file, "utf8"));
}
readValidatedQaMaturityScoreSources();
NODE
```

Check docs when touching docs prose:

```bash
pnpm check:docs
```

Run focused QA/profile checks when changing coverage IDs or profile membership:

```bash
pnpm openclaw qa coverage --json
```

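
The coverage ID rules from the source-of-truth list (ANDed proof targets, namespace.behavior form) can be illustrated with a minimal sketch. The helper names and the example IDs below are hypothetical and do not come from the repo. The regular expression is one assumed reading of "lowercase alphanumeric/dash segments", allowing two or more dot-separated segments.

```typescript
// Hypothetical helpers; only the rules themselves come from this document.

// Assumed reading of the format rule: two or more dot-separated segments,
// each made of lowercase letters, digits, or dashes.
const COVERAGE_ID_PATTERN = /^[a-z0-9-]+(\.[a-z0-9-]+)+$/;

function isValidCoverageId(id: string): boolean {
  return COVERAGE_ID_PATTERN.test(id);
}

// coverageIds are ANDed: a feature counts as proven only when every listed ID
// has been proven. Treating an empty list as unproven is an assumption.
function isFeatureProven(coverageIds: string[], provenIds: Set<string>): boolean {
  return coverageIds.length > 0 && coverageIds.every((id) => provenIds.has(id));
}

// Made-up IDs for illustration only.
const proven = new Set<string>(["channels.send-message"]);
const fullyProven = isFeatureProven(
  ["channels.send-message", "channels.receive-message"],
  proven,
);
console.log(fullyProven); // false: one of the two ANDed IDs is unproven
```

Because the IDs are ANDed rather than aliased, adding a second ID to a feature raises the bar for that feature instead of offering an alternative way to satisfy it.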
When asked to score or refresh a surface:

1. Read taxonomy.yaml.
2. Read the matching completeness reference under
   .agents/skills/claw-score/references/completeness/.
3. Check qa-evidence.json artifacts for executed proof.
4. Edit qa/maturity-scores.yaml only for Quality, Completeness, and LTS
   review state backed by public or redacted artifact evidence.
5. Validate the sources, then run pnpm check:docs if docs prose changed, and
   focused QA coverage checks if coverage IDs or profile membership changed.

For subjective score changes, make the smallest defensible edit and leave the
evidence path in the PR or task summary. Keep manual prose in current docs and
keep score data in qa/maturity-scores.yaml.
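
The conditional checks in this workflow can be summarized as a small decision helper. This is only a sketch: the `ChangeSummary` shape and the `checksToRun` name are hypothetical, and the first entry is a plain-language placeholder for the validation script rather than a real command.

```typescript
// Hypothetical change summary; the field names are illustrative, not a repo API.
interface ChangeSummary {
  sourcesEdited: boolean; // e.g. taxonomy.yaml or qa/maturity-scores.yaml
  docsProseChanged: boolean;
  coverageOrProfilesChanged: boolean; // coverage IDs or profile membership
}

// Map each kind of change to the check this document asks for.
function checksToRun(change: ChangeSummary): string[] {
  const checks: string[] = [];
  if (change.sourcesEdited) {
    // Placeholder label for the node validation script, not a command.
    checks.push("validate taxonomy YAML and the maturity score schema");
  }
  if (change.docsProseChanged) {
    checks.push("pnpm check:docs");
  }
  if (change.coverageOrProfilesChanged) {
    checks.push("pnpm openclaw qa coverage --json");
  }
  return checks;
}

console.log(
  checksToRun({
    sourcesEdited: true,
    docsProseChanged: false,
    coverageOrProfilesChanged: true,
  }),
);
```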
Completeness is scored against the intended operator-visible workflow for each
category, not against test breadth or implementation quality. The completeness
reference files under references/completeness/ define the category scope and
any surface-specific variation from this default process.
By default, Completeness measures how fully OpenClaw exposes the intended surface capability set to the user, operator, author, or maintainer persona for that surface. Score whether each category delivers the full expected workflow, including setup, normal use, status or inspection, recovery, and important platform, provider, channel, security, or lifecycle variants where they apply.
Treat Surface-Specific Scoring Questions and Surface-Specific Guidance as
higher-priority instructions for that surface. The surface instructions may
flesh out, narrow, or intentionally conflict with the default ideas here; when
they do, follow the surface instructions and make the score rationale reflect
that surface-specific instruction. If a reference file does not include
surface-specific questions or guidance, apply this default process to the
surface's Category Scope.
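
The precedence rule above can be sketched as follows. The `CompletenessReference` shape is a hypothetical parsed form of a reference file, not a format defined by this repo; the sketch only shows that surface-specific questions or guidance win when present, and that the default process applies to the Category Scope otherwise.

```typescript
// Hypothetical parsed shape of a completeness reference file.
interface CompletenessReference {
  categoryScope: string;
  surfaceQuestions?: string[];
  surfaceGuidance?: string[];
}

type ScoringBasis =
  | { source: "surface-specific"; scope: string; questions: string[]; guidance: string[] }
  | { source: "default"; scope: string };

// Surface-specific instructions take priority; the default process is the fallback.
function scoringBasis(ref: CompletenessReference): ScoringBasis {
  const questions = ref.surfaceQuestions ?? [];
  const guidance = ref.surfaceGuidance ?? [];
  if (questions.length > 0 || guidance.length > 0) {
    return { source: "surface-specific", scope: ref.categoryScope, questions, guidance };
  }
  return { source: "default", scope: ref.categoryScope };
}
```

Recording which basis was used makes it easier for the score rationale to reflect a surface-specific instruction when one applied.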
For each category, ask:
Default guidance:
Default Completeness bands:

- Clawesome (95-100): complete across expected workflows, variants, and
  recovery branches, with only minor polish gaps.
- Stable (80-95): the expected workflow set is broadly present, with only
  bounded missing branches.
- Beta (70-80): the main workflow exists, but meaningful branches or recovery
  paths are still absent.
- Alpha (50-70): only a partial capability set is present; users can complete
  some core tasks but not the full expected workflow.
- Experimental (0-50): the category exposes only fragments of the intended
  capability.

Coverage comes from qa-evidence.json.scorecard feature fulfillment data.
LTS status is set through human_lts_override; do not hand-edit generated
Markdown to change LTS status.

Bands:

- Clawesome: 95-100
- Stable: 80-95
- Beta: 70-80
- Alpha: 50-70
- Experimental: 0-50

Do not add the maintainer repo's docs/kevinslin/maturity-scorecard/inventory/
tree to openclaw. Evidence-enriched scorecard outputs belong in short-lived
artifacts, not committed generated docs, unless this repo adds an explicit
renderer/check workflow first.
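
As an illustration of the band ranges listed earlier, a score-to-band lookup might look like the sketch below. The `bandFor` name is hypothetical. The listed ranges share their endpoints (95 appears in both Stable and Clawesome, for example), so this sketch assumes a shared endpoint belongs to the higher band; the document does not state which way that should go.

```typescript
type Band = "Clawesome" | "Stable" | "Beta" | "Alpha" | "Experimental";

// Lower bounds taken from the band list, highest first.
// Assumption: a score on a shared boundary falls into the higher band.
const BAND_FLOORS: Array<[number, Band]> = [
  [95, "Clawesome"],
  [80, "Stable"],
  [70, "Beta"],
  [50, "Alpha"],
  [0, "Experimental"],
];

function bandFor(score: number): Band {
  if (!Number.isFinite(score) || score < 0 || score > 100) {
    throw new RangeError(`score must be between 0 and 100, got ${score}`);
  }
  for (const [floor, band] of BAND_FLOORS) {
    if (score >= floor) {
      return band;
    }
  }
  return "Experimental"; // not reached for valid scores; keeps the return type total
}

console.log(bandFor(82)); // "Stable"
```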