Contributing to forge-lcdl
This repository is private. Install and layout are summarized in README.md. This document states maintainer policy for question-source and related ingestion work: allowed scope, what not to add, how LLM tasks relate to…
This is operational guidance for contributors, not legal advice.
Authorized scope
Use forge-lcdl and dependent pipelines only for:
- Authorized practice exams and owned training content.
- Internal QA and question-bank ingestion where you have explicit rights to the material and automation.
Design features and documentation around those cases.
Out of scope — do not implement or document as goals
Do not add code, prompts, contracts, or examples whose purpose is to:
- Bypass proctored exams, paywalls, login protections, CAPTCHA, rate limits, or anti-bot measures.
- Steal or replay credentials, sessions, cookies, or tokens.
- Exfiltrate secrets or log sensitive material such as API keys, cookies, tokens, raw authorization headers, or full .env contents.
Logging and error helpers should remain safe-by-default; redact or summarize where network or auth details might appear.
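A minimal sketch of what a safe-by-default logging helper could look like. The names redact_mapping, redact_text, and SENSITIVE_KEYS are hypothetical and do not refer to existing forge-lcdl helpers; the key list is an assumption and would need to match whatever the project actually handles.

```python
import re

# Hypothetical substrings that mark a key as sensitive; not taken from forge-lcdl.
SENSITIVE_KEYS = ("authorization", "cookie", "api_key", "api-key", "token", "password")

# Matches "Bearer <credential>" or "Basic <credential>" inside free text.
_AUTH_SCHEME = re.compile(r"(?i)\b(bearer|basic)\s+[A-Za-z0-9._~+/=-]+")


def redact_mapping(data: dict[str, str]) -> dict[str, str]:
    """Return a copy of `data` with values of sensitive-looking keys masked."""
    redacted = {}
    for key, value in data.items():
        lowered = key.lower()
        if any(marker in lowered for marker in SENSITIVE_KEYS):
            redacted[key] = "[REDACTED]"
        else:
            redacted[key] = value
    return redacted


def redact_text(message: str) -> str:
    """Mask credentials that follow an auth scheme name in a log message."""
    return _AUTH_SCHEME.sub(lambda m: f"{m.group(1)} [REDACTED]", message)
```

A helper like this would typically be applied to headers and error messages before they reach a logger, so that call sites do not need to remember to redact.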
Design defaults: deterministic first, LLM bounded
- Prefer deterministic extraction, parsing, validation, and checks before spending tokens on ambiguous spans.
- Use governed LLM tasks for discovery, routing, mechanics inference, repair (e.g. incremental diagnose), or benchmark modes — always with explicit JSON contracts and capped fields.
- Keep one run_task call per contract; merge and validate outputs in code after each step.
The staged playbook for DOM/PDF convergence is described in docs/EXTRACTION-CONVERGENCE.md.
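The defaults above can be sketched as follows, under stated assumptions: the "Answer: X" line format, the JSON shape {"answer": ...}, the span cap, and the ask_llm callable are all hypothetical illustrations, not the real contracts or the real run_task signature.

```python
import json
import re
from typing import Callable, Optional

MAX_SPAN_CHARS = 500  # hypothetical cap on how much text is sent to the LLM
ALLOWED_ANSWERS = {"A", "B", "C", "D"}

_ANSWER_LINE = re.compile(r"^Answer:\s*([A-D])\s*$", re.MULTILINE)


def extract_answer_deterministic(text: str) -> Optional[str]:
    """Cheap first pass: look for an explicit 'Answer: X' line."""
    match = _ANSWER_LINE.search(text)
    return match.group(1) if match else None


def validate_llm_payload(raw: str) -> Optional[str]:
    """Accept only the exact JSON shape {"answer": "A".."D"}; reject everything else."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict) or set(payload) != {"answer"}:
        return None
    answer = payload["answer"]
    if not isinstance(answer, str) or answer not in ALLOWED_ANSWERS:
        return None
    return answer


def extract_answer(text: str, ask_llm: Callable[[str], str]) -> Optional[str]:
    """Deterministic first; fall back to one bounded LLM call, then validate in code."""
    found = extract_answer_deterministic(text)
    if found is not None:
        return found
    return validate_llm_payload(ask_llm(text[:MAX_SPAN_CHARS]))
```

The LLM is only consulted when the deterministic pass finds nothing, and its output is never trusted without validation against the expected shape.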
Layering: LCDL vs runtime / consumers
- forge-lcdl defines tasks, parsing helpers, operators, and injectable chat transport — not long-lived browser sessions.
- Playwright lifecycle (launching browsers, navigation loops, executing generated extractors against live pages) belongs in the source-ingest or runtime layer (for example consumers or sibling packages such as forge-lcdl-runtime), unless an existing repo convention explicitly documents an exception.
LCDL should infer or repair mechanics (e.g. synthesize extractor shapes, classify chunks); the outer pipeline executes them and compares results.
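One way to picture this split, as a sketch only: the inference side returns plain data, and the runtime side owns anything that touches a page. ExtractorSpec, PageDriver, infer_spec, and execute_spec are hypothetical names invented for this illustration; they are not part of forge-lcdl or forge-lcdl-runtime.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class ExtractorSpec:
    """Hypothetical inferred mechanics: plain data, no browser handles."""
    question_selector: str
    choice_selector: str


class PageDriver(Protocol):
    """Hypothetical runtime-side interface; a Playwright-backed class would live outside LCDL."""

    def query_texts(self, selector: str) -> list[str]: ...


def infer_spec(chunk_labels: dict[str, str]) -> ExtractorSpec:
    """LCDL-side step: turn classified chunk labels into an extractor shape."""
    return ExtractorSpec(
        question_selector=chunk_labels["question"],
        choice_selector=chunk_labels["choice"],
    )


def execute_spec(spec: ExtractorSpec, driver: PageDriver) -> dict[str, list[str]]:
    """Runtime-side step: run the spec against a page and return raw results."""
    return {
        "questions": driver.query_texts(spec.question_selector),
        "choices": driver.query_texts(spec.choice_selector),
    }
```

Because the spec is plain data, the inference step can be unit-tested without a browser, and the runtime can be tested with a fake driver.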
Implementation norms
- Changes should be incremental, typed, and covered by tests.
- Unit tests must use injectable / fake transports (for example TaskRunner(chat=fake_chat) as in README.md), not live LLM calls or live websites. Reserve optional integration tests for gated environments (see README Live Granite tests).
- Prefer small pure functions, dataclasses, protocols, and the existing Result / Ok / Err style (src/forge_lcdl/result.py) where it already fits the surrounding module.
- Inspect the real tree and tests before editing; extend existing modules instead of duplicating them when a suitable home already exists.
- Preserve existing public APIs and behavior unless a change request explicitly describes a deliberate migration (then document the migration path).
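As an illustration of these norms, here is a self-contained sketch of a small pure function that takes an injected chat callable and returns an Ok / Err style value. The Ok, Err, and Result types below are local stand-ins defined only for this example; the real definitions in src/forge_lcdl/result.py and the real TaskRunner interface may differ. classify_chunk, fake_chat, and the label set are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar, Union

T = TypeVar("T")
E = TypeVar("E")


@dataclass(frozen=True)
class Ok(Generic[T]):
    value: T


@dataclass(frozen=True)
class Err(Generic[E]):
    error: E


# Local stand-in for a Result type; not the project's actual definition.
Result = Union[Ok[T], Err[E]]

ChatFn = Callable[[str], str]  # injected transport: prompt in, raw text out
ALLOWED_LABELS = {"question", "choice", "other"}


def classify_chunk(chunk: str, chat: ChatFn) -> Result[str, str]:
    """Ask the injected chat for a label and validate the reply in code."""
    raw = chat(chunk)
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return Err("invalid_json")
    label = payload.get("label") if isinstance(payload, dict) else None
    if not isinstance(label, str) or label not in ALLOWED_LABELS:
        return Err("unknown_label")
    return Ok(label)


def fake_chat(prompt: str) -> str:
    """Deterministic fake transport for unit tests; no network access."""
    return '{"label": "question"}'
```

A unit test would then call classify_chunk("What is 2 + 2?", fake_chat) and compare the result with Ok("question"), with no live LLM or website involved.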
Contracts and tasks
New or revised tasks need a Markdown contract under src/forge_lcdl/contracts/<task_id>/<version>/contract.md, registration in runner.py or tasks/catalog_v1.py as appropriate, and unit tests with fake chat.
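The contract location can be computed mechanically, which makes it easy to assert in a test that a newly registered task ships its contract. This is a sketch: contract_path is a hypothetical helper, and the task id and version strings used with it are illustrative, since the real naming scheme is not stated here.

```python
from pathlib import Path

# Root taken from the layout described above; relative to the repository root.
CONTRACTS_ROOT = Path("src/forge_lcdl/contracts")


def contract_path(task_id: str, version: str, root: Path = CONTRACTS_ROOT) -> Path:
    """Build <root>/<task_id>/<version>/contract.md, rejecting path-like components."""
    for name, part in (("task_id", task_id), ("version", version)):
        if not part or "/" in part or "\\" in part or part in {".", ".."}:
            raise ValueError(f"invalid {name}: {part!r}")
    return root / task_id / version / "contract.md"
```

A repository-level test could iterate over registered task ids and check that contract_path(task_id, version).is_file() holds for each.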
Related docs
- README.md — quick use, injectable transport example, layout table.
- docs/EXTRACTION-CONVERGENCE.md — staged LLM plus deterministic convergence.