Contributing to forge-lcdl

This repository is private. Install and layout are summarized in README.md. This document states maintainer policy for question-source and related ingestion work: allowed scope, what not to add, how LLM tasks relate to…

This is operational guidance for contributors, not legal advice.

Authorized scope

Use forge-lcdl and dependent pipelines only for:

Authorized practice exams and owned training content.
Internal QA and question-bank ingestion where you have explicit rights both to the material and to process it with automation.

Design features and documentation around those cases.

Out of scope: do not implement or document as goals

Do not add code, prompts, contracts, or examples whose purpose is to:

Bypass proctored exams, paywalls, login protections, CAPTCHA, rate limits, or anti-bot measures.
Steal or replay credentials, sessions, cookies, or tokens.
Exfiltrate secrets or log sensitive material such as API keys, cookies, tokens, raw authorization headers, or full .env contents.

Logging and error helpers should remain safe-by-default; redact or summarize where network or auth details might appear.
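As an illustration of the safe-by-default rule, a redaction helper might look like the sketch below. The names `SENSITIVE_KEYS`, `redact_headers`, and `redact_text` are hypothetical and do not refer to existing forge-lcdl helpers; the key list is an assumption and should be adapted to the headers your transport actually uses.

```python
import re

# Hypothetical list of header names treated as sensitive (compared in lowercase).
SENSITIVE_KEYS = {"authorization", "cookie", "set-cookie", "x-api-key", "api-key"}

# Matches "Bearer <token>" or "Basic <token>" embedded in free-form text.
_CREDENTIAL = re.compile(r"(?i)\b(bearer|basic)\s+[A-Za-z0-9._~+/=-]+")


def redact_headers(headers: dict[str, str]) -> dict[str, str]:
    """Return a copy of the headers with sensitive values replaced."""
    return {
        name: "[REDACTED]" if name.lower() in SENSITIVE_KEYS else value
        for name, value in headers.items()
    }


def redact_text(message: str) -> str:
    """Mask credentials that appear inside an error or log message."""
    return _CREDENTIAL.sub(lambda m: f"{m.group(1)} [REDACTED]", message)
```

Calling helpers like these at the point where a log record or error message is built, rather than at each call site, keeps redaction from depending on every contributor remembering it.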

Design defaults: deterministic first, LLM bounded

Prefer deterministic extraction, parsing, validation, and checks before spending tokens on ambiguous spans.
Use governed LLM tasks for discovery, routing, mechanics inference, repair (e.g. incremental diagnose), or benchmark modes — always with explicit JSON contracts and capped fields.
Keep one run_task call per contract; merge and validate outputs in code after each step.
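The three defaults above can be sketched together as follows. This is a minimal illustration, not the library's API: the `RunTask` signature, the `classify_chunk` task id, the `label` field, and the 80-character cap are all assumptions made for the example.

```python
import json
from typing import Callable

# Assumed signature for illustration: (task_id, payload) -> raw JSON text.
RunTask = Callable[[str, dict], str]

MAX_LABEL_CHARS = 80  # illustrative cap on a model-supplied field


def validate_label(raw: str) -> dict:
    """Parse one contract's output and keep only the capped, expected field."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("contract output must be a JSON object")
    label = data.get("label")
    if not isinstance(label, str) or not label:
        raise ValueError("label must be a non-empty string")
    return {"label": label[:MAX_LABEL_CHARS]}


def classify_chunk(run_task: RunTask, chunk: str) -> dict:
    # Deterministic check first: trivially empty input never reaches the model.
    if not chunk.strip():
        return {"label": "empty"}
    # Exactly one run_task call for this contract; validation happens in code.
    raw = run_task("classify_chunk", {"text": chunk})
    return validate_label(raw)
```

Unknown fields in the model output are dropped rather than passed through, so later steps only see what the contract declares.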

The staged playbook for DOM/PDF convergence is described in docs/EXTRACTION-CONVERGENCE.md.

Layering: LCDL vs runtime / consumers

forge-lcdl defines tasks, parsing helpers, operators, and injectable chat transport — not long-lived browser sessions.
Playwright lifecycle (launching browsers, navigation loops, executing generated extractors against live pages) belongs in the source-ingest or runtime layer (for example consumers or sibling packages such as forge-lcdl-runtime), unless an existing repo convention explicitly documents an exception.

LCDL should infer or repair mechanics (e.g. synthesize extractor shapes, classify chunks); the outer pipeline executes them and compares results.
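The split between inferring mechanics and executing them can be sketched as below. `ExtractorShape`, `execute`, and `matches_expected` are hypothetical names used only for this example; a real extractor shape would likely carry more than one regular expression.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractorShape:
    """Hypothetical extractor description, as an inference task might emit it."""

    question_pattern: str  # regular expression with one capture group


def execute(shape: ExtractorShape, text: str) -> list[str]:
    """Outer-pipeline step: run the inferred shape against real content."""
    return [
        match.group(1).strip()
        for match in re.finditer(shape.question_pattern, text, re.MULTILINE)
    ]


def matches_expected(extracted: list[str], expected_count: int) -> bool:
    """Compare the result against an independent, deterministic expectation."""
    return len(extracted) == expected_count
```

In this arrangement the inference side only produces data (the shape), and the side that touches real content stays outside LCDL, which is what keeps browser and page lifecycles out of this package.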

Implementation norms

Changes should be incremental, typed, and covered by tests.
Unit tests must use injectable or fake transports (for example TaskRunner(chat=fake_chat), as shown in README.md), never live LLM calls or live websites. Reserve optional integration tests for gated environments (see the Live Granite tests section of the README).
Prefer small pure functions, dataclasses, protocols, and the existing Result / Ok / Err style (src/forge_lcdl/result.py) where it already fits the surrounding module.
Inspect the real tree and tests before editing; extend existing modules instead of duplicating them when a suitable home already exists.
Preserve existing public APIs and behavior unless a change request explicitly describes a deliberate migration (then document the migration path).
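A unit test in the spirit of these norms might look like the sketch below. It deliberately does not import forge-lcdl: `Ok`, `Err`, `Chat`, and `summarize` are stand-ins written for this example, and the real `Result` types in src/forge_lcdl/result.py and the real `TaskRunner` may differ in shape.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar, Union

T = TypeVar("T")


@dataclass(frozen=True)
class Ok(Generic[T]):
    value: T


@dataclass(frozen=True)
class Err:
    error: str


# Assumed transport type for illustration: prompt text in, reply text out.
Chat = Callable[[str], str]


def summarize(chat: Chat, text: str) -> Union[Ok[str], Err]:
    """Stand-in task: call the injected transport and wrap the outcome."""
    reply = chat(f"Summarize: {text}").strip()
    return Ok(reply) if reply else Err("empty reply")


def fake_chat(prompt: str) -> str:
    """Canned transport for unit tests; performs no network access."""
    return "a short summary"


def test_summarize_with_fake_chat() -> None:
    assert summarize(fake_chat, "long text") == Ok("a short summary")
    assert summarize(lambda prompt: "   ", "long text") == Err("empty reply")
```

Because the transport is a plain callable passed in by the caller, the failure path can be exercised with a one-line lambda instead of a mocked network layer.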

Contracts and tasks

New or revised tasks need a Markdown contract under src/forge_lcdl/contracts/<task_id>/<version>/contract.md, registration in runner.py or tasks/catalog_v1.py as appropriate, and unit tests with fake chat.
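The contract location described above can be expressed as a small helper, which a test could use to assert that a newly registered task ships its contract file. `contract_path` is a hypothetical name, and the `classify_chunk` task id and `v1` version string in the usage line are assumptions about naming, not values taken from the repository.

```python
from pathlib import PurePosixPath

CONTRACTS_ROOT = PurePosixPath("src/forge_lcdl/contracts")


def contract_path(task_id: str, version: str) -> PurePosixPath:
    """Build the repository-relative path of a task's Markdown contract."""
    for part in (task_id, version):
        # Reject empty values and anything that could escape the contracts tree.
        if not part or "/" in part or part in {".", ".."}:
            raise ValueError(f"invalid path component: {part!r}")
    return CONTRACTS_ROOT / task_id / version / "contract.md"


# Usage with assumed identifiers:
print(contract_path("classify_chunk", "v1"))
```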

README.md — quick use, injectable transport example, layout table.
docs/EXTRACTION-CONVERGENCE.md — staged LLM plus deterministic convergence.