Handbook
Benchmark harness (Sprint 0)
forge-lcdl ships a small, offline-first benchmark layer under forge_lcdl.benchmarks. It measures whether catalog tasks complete end-to-end using either injected fake chat (the default; no network) or an optional live transport (see Live runs below).
Run the cheap-model baseline (offline)
From the repo root (with src on PYTHONPATH, or after pip install -e ".[dev]"):
PYTHONPATH=src python3 -m forge_lcdl.benchmarks.runner --suite cheap_model_baseline --out reports/baseline.json
The built-in suite cheap_model_baseline runs five catalog v1 tasks with deterministic fake completions:
- llm_boolean_gate
- llm_enum_route
- extract_schema_from_text
- decompose_problem
- plan_decision_pack
No API keys or gateway URLs are required for this default path.
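Once a run finishes, the report can be checked with the standard library alone. A minimal sketch, assuming only the field names documented under Report shape below:

import json

# Load the report written by the runner (the path given to --out above).
with open("reports/baseline.json", encoding="utf-8") as f:
    report = json.load(f)

# Suite-level counters, per the Report shape section.
print(f"{report['suite_id']}: {report['passed']} passed, {report['failed']} failed")

# Fail loudly (e.g. in CI) if any case did not complete.
failing = [case["case_id"] for case in report["case_results"] if not case["ok"]]
assert not failing, f"failing cases: {failing}"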
lcdl_dogfood_alpha (offline library probes)
The suite lcdl_dogfood_alpha runs six cases with case_mode="dogfood": each calls forge-lcdl APIs on this checkout (contract JSON, ContractSpec load, context pack, patch-unit planning, failure classification and repair reduction, and proof report serialization). These cases register no fake chat and use no run_task / LLM transport.
PYTHONPATH=src python3 -m forge_lcdl.benchmarks.runner --suite lcdl_dogfood_alpha --out reports/dogfood.json
Maintainers: see DOGFOODING.md, and set the optional FORGE_LCDL_DOGFOOD_ROOT environment variable for nonstandard layouts, for example as shown below.
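A probe run against a checkout other than the current one might look like this (the path is a placeholder, and the assumption that the variable takes the checkout root follows from its name; DOGFOODING.md is authoritative):

FORGE_LCDL_DOGFOOD_ROOT=/path/to/forge-lcdl PYTHONPATH=src python3 -m forge_lcdl.benchmarks.runner --suite lcdl_dogfood_alpha --out reports/dogfood.json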
Report shape
The JSON file includes suite-level metadata (suite_id, suite_name, started_at_utc, finished_at_utc, config_snapshot, passed, failed, total_elapsed_seconds) and a case_results array. Each case records case_id, task_id, contract_version, ok, error, attempts, elapsed_seconds, model_id, verification_status, output_summary, and optional bounded raw_output.
Keys are sorted and timestamps use UTC ISO-8601 with a Z suffix where applicable.
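For illustration only (every value below is a placeholder, not real output; raw_output is omitted since it is optional), a report might look like:

{
  "case_results": [
    {
      "attempts": 1,
      "case_id": "llm_boolean_gate",
      "contract_version": "v1",
      "elapsed_seconds": 0.12,
      "error": null,
      "model_id": "fake-chat",
      "ok": true,
      "output_summary": "...",
      "task_id": "llm_boolean_gate",
      "verification_status": "passed"
    }
  ],
  "config_snapshot": {"live": false},
  "failed": 0,
  "finished_at_utc": "2024-01-01T00:00:01Z",
  "passed": 5,
  "started_at_utc": "2024-01-01T00:00:00Z",
  "suite_id": "cheap_model_baseline",
  "suite_name": "cheap model baseline",
  "total_elapsed_seconds": 1.23
}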
Live runs (--live)
To exercise the real transport stack (network + credentials):
PYTHONPATH=src python3 -m forge_lcdl.benchmarks.runner --suite cheap_model_baseline --out reports/baseline-live.json --live
Profile resolution uses forge_lcdl.env.read_certificator_profile (LLM_BASE_URL, LLM_API_KEY or OPENAI_API_KEY, etc.). The forge_lcdl.env.read_taxonomy_profile helper is an alias with kind="taxonomy" and reads the same variables.
Integration tests use the same credential style; they additionally load an optional env file when FORGE_LCDL_GRANITE_ENV_FILE is set or a workbench path exists (see tests/integration/conftest.py). For benchmarks, set the same variables in your environment (or export them from a local env file yourself) before running with --live. Do not commit .env files.
If the base URL or API key is missing, the CLI exits with a non-zero status and points you back to this document.
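For example, with a placeholder endpoint and key (the authoritative variable set is whatever read_certificator_profile resolves):

export LLM_BASE_URL=https://gateway.example.com/v1
export LLM_API_KEY=your-key-here   # or OPENAI_API_KEY
PYTHONPATH=src python3 -m forge_lcdl.benchmarks.runner --suite cheap_model_baseline --out reports/baseline-live.json --live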
Programmatic use
from forge_lcdl.benchmarks import get_suite, run_benchmark_suite, BenchmarkRunConfig, write_suite_result_json
suite = get_suite("cheap_model_baseline")  # look up a registered suite by id
result = run_benchmark_suite(suite, BenchmarkRunConfig(live=False))  # offline fake-chat run
write_suite_result_json("reports/out.json", result)  # sorted-key JSON report
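Passing BenchmarkRunConfig(live=True) should correspond to the CLI's --live path and carries the same credential requirements described above.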
Listing suites
from forge_lcdl.benchmarks import list_suite_ids
print(list_suite_ids())
Built-in suites include cheap_model_baseline and lcdl_dogfood_alpha; registration lives in forge_lcdl.benchmarks.suites.
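Putting the pieces together, a minimal sketch that runs every registered suite offline, using only the names documented in this section (live=False should be a no-op for the dogfood cases, which use no transport):

from forge_lcdl.benchmarks import (
    BenchmarkRunConfig,
    get_suite,
    list_suite_ids,
    run_benchmark_suite,
    write_suite_result_json,
)

# Run each registered suite with the offline configuration and
# write one sorted-key JSON report per suite under reports/.
for suite_id in list_suite_ids():
    suite = get_suite(suite_id)
    result = run_benchmark_suite(suite, BenchmarkRunConfig(live=False))
    write_suite_result_json(f"reports/{suite_id}.json", result)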