Handbook
Benchmark harness (Sprint 0)
forge-lcdl ships a small, offline-first benchmark layer under forge_lcdl.benchmarks. It measures whether catalog tasks complete end-to-end using either injected fake chat (the default; no network) or an optional live transport (see Live runs below).
Run the cheap-model baseline (offline)
From the repo root (with src on PYTHONPATH, or after pip install -e ".[dev]"):
PYTHONPATH=src python3 -m forge_lcdl.benchmarks.runner --suite cheap_model_baseline --out reports/baseline.json
The built-in suite cheap_model_baseline runs five catalog v1 tasks with deterministic fake completions:
- llm_boolean_gate
- llm_enum_route
- extract_schema_from_text
- decompose_problem
- plan_decision_pack
No API keys or gateway URLs are required for this default path.
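Once a run finishes, the report can be checked with the standard library alone. A minimal sketch, assuming only the field names documented under Report shape below:

import json

# Load the report written by the runner (the path given to --out above).
with open("reports/baseline.json", encoding="utf-8") as f:
    report = json.load(f)

# Suite-level counters, per the Report shape section.
print(f"{report['suite_id']}: {report['passed']} passed, {report['failed']} failed")

# Fail loudly (e.g. in CI) if any case did not complete.
failing = [case["case_id"] for case in report["case_results"] if not case["ok"]]
assert not failing, f"failing cases: {failing}"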
lcdl_dogfood_alpha (offline library probes)
The suite lcdl_dogfood_alpha runs six cases with case_mode="dogfood": each calls forge-lcdl APIs on this checkout (contract JSON, ContractSpec load, context pack, patch-unit planning, failure classification and repair reduction, and proof report serialization). These cases register no fake chat and use no run_task / LLM transport.
PYTHONPATH=src python3 -m forge_lcdl.benchmarks.runner --suite lcdl_dogfood_alpha --out reports/dogfood.json
Maintainers: see DOGFOODING.md, and set the optional FORGE_LCDL_DOGFOOD_ROOT environment variable for nonstandard layouts, for example as shown below.
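A probe run against a checkout other than the current one might look like this (the path is a placeholder, and the assumption that the variable takes the checkout root follows from its name; DOGFOODING.md is authoritative):

FORGE_LCDL_DOGFOOD_ROOT=/path/to/forge-lcdl PYTHONPATH=src python3 -m forge_lcdl.benchmarks.runner --suite lcdl_dogfood_alpha --out reports/dogfood.json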
Report shape
The JSON file includes suite-level metadata (suite_id, suite_name, started_at_utc, finished_at_utc, config_snapshot, passed, failed, total_elapsed_seconds) and a case_results array. Each case records case_id, task_id, contract_version, ok, error, attempts, elapsed_seconds, model_id, verification_status, output_summary, and optional bounded raw_output.
Keys are sorted and timestamps use UTC ISO-8601 with a Z suffix where applicable.
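For illustration only (every value below is a placeholder, not real output; raw_output is omitted since it is optional), a report might look like:

{
  "case_results": [
    {
      "attempts": 1,
      "case_id": "llm_boolean_gate",
      "contract_version": "v1",
      "elapsed_seconds": 0.12,
      "error": null,
      "model_id": "fake-chat",
      "ok": true,
      "output_summary": "...",
      "task_id": "llm_boolean_gate",
      "verification_status": "passed"
    }
  ],
  "config_snapshot": {"live": false},
  "failed": 0,
  "finished_at_utc": "2024-01-01T00:00:01Z",
  "passed": 5,
  "started_at_utc": "2024-01-01T00:00:00Z",
  "suite_id": "cheap_model_baseline",
  "suite_name": "cheap model baseline",
  "total_elapsed_seconds": 1.23
}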
Live runs (--live)
To exercise the real transport stack (network + credentials):
PYTHONPATH=src python3 -m forge_lcdl.benchmarks.runner --suite cheap_model_baseline --out reports/baseline-live.json --live
Profile resolution uses forge_lcdl.env.read_certificator_profile (LLM_BASE_URL, LLM_API_KEY or OPENAI_API_KEY, etc.). The forge_lcdl.env.read_taxonomy_profile helper is an alias with kind="taxonomy" and reads the same variables.
Integration tests use the same credential style; they additionally load an optional env file when FORGE_LCDL_GRANITE_ENV_FILE is set or a workbench path exists (see tests/integration/conftest.py). For benchmarks, set the same variables in your environment (or export them from a local env file yourself) before running with --live. Do not commit .env files.
If the base URL or API key is missing, the CLI exits with a non-zero status and points you back to this document.
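For example, with a placeholder endpoint and key (the authoritative variable set is whatever read_certificator_profile resolves):

export LLM_BASE_URL=https://gateway.example.com/v1
export LLM_API_KEY=your-key-here   # or OPENAI_API_KEY
PYTHONPATH=src python3 -m forge_lcdl.benchmarks.runner --suite cheap_model_baseline --out reports/baseline-live.json --live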
Programmatic use
from forge_lcdl.benchmarks import get_suite, run_benchmark_suite, BenchmarkRunConfig, write_suite_result_json
suite = get_suite("cheap_model_baseline")  # look up a registered suite by id
result = run_benchmark_suite(suite, BenchmarkRunConfig(live=False))  # offline fake-chat run
write_suite_result_json("reports/out.json", result)  # sorted-key JSON report
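Passing BenchmarkRunConfig(live=True) should correspond to the CLI's --live path and carries the same credential requirements described above.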
Listing suites
from forge_lcdl.benchmarks import list_suite_ids
print(list_suite_ids())
Built-in suites include cheap_model_baseline and lcdl_dogfood_alpha; registration lives in forge_lcdl.benchmarks.suites.
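Putting the pieces together, a minimal sketch that runs every registered suite offline, using only the names documented in this section (live=False should be a no-op for the dogfood cases, which use no transport):

from forge_lcdl.benchmarks import (
    BenchmarkRunConfig,
    get_suite,
    list_suite_ids,
    run_benchmark_suite,
    write_suite_result_json,
)

# Run each registered suite with the offline configuration and
# write one sorted-key JSON report per suite under reports/.
for suite_id in list_suite_ids():
    suite = get_suite(suite_id)
    result = run_benchmark_suite(suite, BenchmarkRunConfig(live=False))
    write_suite_result_json(f"reports/{suite_id}.json", result)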