forge-lcdl

Contributing to forge-lcdl

This repository is private. Install and layout are summarized in README.md. This document states maintainer policy for question-source and related ingestion work: allowed scope, what not to add, how LLM tasks relate to…

This is operational guidance for contributors, not legal advice.

Authorized scope

Use forge-lcdl and dependent pipelines only for:

  • Authorized practice exams and owned training content.
  • Internal QA and question-bank ingestion where you have explicit rights both to the material and to automating against it.

Design features and documentation around those cases.

Out of scope — do not implement or document as goals

Do not add code, prompts, contracts, or examples whose purpose is to:

  • Bypass proctored exams, paywalls, login protections, CAPTCHA, rate limits, or anti-bot measures.
  • Steal or replay credentials, sessions, cookies, or tokens.
  • Exfiltrate secrets or log sensitive material such as API keys, cookies, tokens, raw authorization headers, or full .env contents.

Logging and error helpers should remain safe by default: redact or summarize wherever network or auth details might appear.
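As an illustration of the safe-by-default rule, here is a minimal sketch of a header-redaction helper. The names `redact_headers` and `SENSITIVE_KEYS` and the regular expression are hypothetical, not existing forge-lcdl APIs; treat the key list as a starting point rather than a complete inventory.

```python
from __future__ import annotations

import re

# Hypothetical helper for illustration; not part of forge-lcdl.
# Header names whose values should never reach a log line.
SENSITIVE_KEYS = {"authorization", "cookie", "set-cookie", "x-api-key", "api_key", "token"}

# Catches credential-looking fragments embedded in otherwise harmless values.
_CREDENTIAL_RE = re.compile(r"(?i)\b(bearer|basic)\s+[A-Za-z0-9._~+/=-]+")


def redact_headers(headers: dict[str, str]) -> dict[str, str]:
    """Return a copy of `headers` that is safe to log."""
    safe: dict[str, str] = {}
    for name, value in headers.items():
        if name.lower() in SENSITIVE_KEYS:
            # Drop the whole value for known-sensitive headers.
            safe[name] = "<redacted>"
        else:
            # Keep the value but mask any embedded "Bearer <token>" fragment.
            safe[name] = _CREDENTIAL_RE.sub(r"\1 <redacted>", value)
    return safe
```

Calling the helper at the logging boundary, rather than at each call site, keeps the default safe even when a new caller forgets about it.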

Design defaults: deterministic first, LLM bounded

  • Prefer deterministic extraction, parsing, validation, and checks before spending tokens on ambiguous spans.
  • Use governed LLM tasks for discovery, routing, mechanics inference, repair (e.g. incremental diagnose), or benchmark modes — always with explicit JSON contracts and capped fields.
  • Keep one run_task call per contract; merge and validate outputs in code after each step.
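The defaults above can be sketched roughly as follows. This is an assumption-laden illustration: the `RunTask` callable signature, the `classify_span` task id, the allowed `kind` values, and the field cap are all hypothetical and do not describe the real `run_task` API.

```python
from __future__ import annotations

import json
import re
from typing import Callable

# Hypothetical stand-in for one governed task call: takes a task id and a
# payload, returns raw JSON text. The real forge-lcdl signature may differ.
RunTask = Callable[[str, dict], str]

MAX_LABEL_CHARS = 40  # illustrative cap on a free-text field
ALLOWED_KINDS = {"question", "option", "noise"}  # illustrative enum
_QUESTION_RE = re.compile(r"^\s*(\d+)[.)]\s+(.+)$")


def classify_line(line: str, run_task: RunTask) -> dict:
    # 1. Deterministic first: a numbered line is a question stem, no tokens spent.
    match = _QUESTION_RE.match(line)
    if match:
        return {"kind": "question", "number": int(match.group(1)), "source": "regex"}

    # 2. Ambiguous span: exactly one task call for one (hypothetical) contract.
    raw = run_task("classify_span", {"text": line[:500]})

    # 3. Validate in code; never trust the model output as-is.
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {"kind": "unknown", "source": "invalid_json"}
    kind = data.get("kind") if isinstance(data, dict) else None
    if kind not in ALLOWED_KINDS:
        return {"kind": "unknown", "source": "invalid_kind"}
    label = str(data.get("label", ""))[:MAX_LABEL_CHARS]  # enforce the cap
    return {"kind": kind, "label": label, "source": "llm"}
```

Because the task call is passed in as a parameter, a unit test can supply a fake and assert that the deterministic path never invokes it.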

The staged playbook for DOM/PDF convergence is described in docs/EXTRACTION-CONVERGENCE.md.

Layering: LCDL vs runtime / consumers

  • forge-lcdl defines tasks, parsing helpers, operators, and injectable chat transport — not long-lived browser sessions.
  • Playwright lifecycle (launching browsers, navigation loops, executing generated extractors against live pages) belongs in the source-ingest or runtime layer (for example consumers or sibling packages such as forge-lcdl-runtime), unless an existing repo convention explicitly documents an exception.

LCDL should infer or repair mechanics (e.g. synthesize extractor shapes, classify chunks); the outer pipeline executes them and compares results.
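One way to picture this split is sketched below. All names (`ExtractorSpec`, `PageLike`, `execute_spec`, `matches_expected`) are hypothetical and exist only for illustration: the LCDL side produces a plain-data description of the mechanics, and the runtime side, which owns the page object, executes it and checks the result.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


# LCDL side (illustrative): mechanics expressed as plain data, no browser.
@dataclass(frozen=True)
class ExtractorSpec:
    question_selector: str
    option_selector: str


# Runtime side (illustrative): whatever wraps the live page implements this.
class PageLike(Protocol):
    def query_all_text(self, selector: str) -> list[str]: ...


def execute_spec(spec: ExtractorSpec, page: PageLike) -> dict[str, list[str]]:
    """Run an inferred spec against a page object owned by the runtime layer."""
    return {
        "questions": page.query_all_text(spec.question_selector),
        "options": page.query_all_text(spec.option_selector),
    }


def matches_expected(result: dict[str, list[str]], expected_questions: int) -> bool:
    """Outer-pipeline check: did the extractor find the expected question count?"""
    return len(result["questions"]) == expected_questions
```

Since `ExtractorSpec` carries no browser handle, it can be produced, serialized, and tested without Playwright; only `execute_spec` needs a real or fake page.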

Implementation norms

  • Changes should be incremental, typed, and covered by tests.
  • Unit tests must use injectable or fake transports (for example TaskRunner(chat=fake_chat), as shown in README.md), never live LLM calls or live websites. Reserve optional integration tests for gated environments (see "Live Granite tests" in README.md).
  • Prefer small pure functions, dataclasses, protocols, and the existing Result / Ok / Err style (src/forge_lcdl/result.py) where it already fits the surrounding module.
  • Inspect the real tree and tests before editing; extend existing modules instead of duplicating them when a suitable home already exists.
  • Preserve existing public APIs and behavior unless a change request explicitly describes a deliberate migration (then document the migration path).
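A minimal sketch of these norms, assuming nothing about the real classes: `MiniRunner` is a hypothetical stand-in for a runner with an injectable chat callable (it is not `TaskRunner`), and the `Ok` / `Err` dataclasses below only imitate the style; the actual definitions in src/forge_lcdl/result.py may differ.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar, Union

T = TypeVar("T")
E = TypeVar("E")


# Stand-ins for the Result style; the real types may look different.
@dataclass(frozen=True)
class Ok(Generic[T]):
    value: T


@dataclass(frozen=True)
class Err(Generic[E]):
    error: E


Result = Union[Ok[T], Err[E]]

# Injectable transport: prompt text in, raw response text out (assumed shape).
Chat = Callable[[str], str]


@dataclass
class MiniRunner:
    chat: Chat

    def run(self, prompt: str) -> Result[dict, str]:
        raw = self.chat(prompt)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            return Err(f"invalid JSON: {exc.msg}")
        if not isinstance(data, dict):
            return Err("expected a JSON object")
        return Ok(data)


def fake_chat(prompt: str) -> str:
    """Deterministic fake transport for unit tests; no network involved."""
    return '{"answer": "B"}'
```

A unit test then constructs `MiniRunner(chat=fake_chat)` and asserts on the returned `Ok` or `Err` value, so the failure paths are exercised without a live model.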

Contracts and tasks

New or revised tasks need a Markdown contract under src/forge_lcdl/contracts/<task_id>/<version>/contract.md, registration in runner.py or tasks/catalog_v1.py as appropriate, and unit tests with fake chat.
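The contract location can be derived mechanically from the task id and version. The helper below is a hypothetical sketch, not an existing function in the repository; the example values `classify_span` and `v1` are assumptions, since this document does not specify the id or version format.

```python
from pathlib import PurePosixPath

# Root taken from the path pattern stated above.
CONTRACTS_ROOT = PurePosixPath("src/forge_lcdl/contracts")


def contract_path(task_id: str, version: str) -> PurePosixPath:
    """Build src/forge_lcdl/contracts/<task_id>/<version>/contract.md."""
    for part in (task_id, version):
        # Reject empty segments and anything that could escape the root.
        if not part or "/" in part or "\\" in part or part in {".", ".."}:
            raise ValueError(f"invalid path segment: {part!r}")
    return CONTRACTS_ROOT / task_id / version / "contract.md"
```

A test could use such a helper to assert that every registered task has a contract file at the expected location.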