Context packs (`forge_lcdl.context`)
Purpose
Cheap models need smaller, relevant context, not whole-repository dumps. build_context_pack turns a task description and a repository path into a bounded, ordered ContextPack: scanned text files, keyword-ranked with fixed boosts, trimmed to a UTF-8 byte budget, with reasons and provenance. No embeddings, no external search, and no network in this layer.
Flow
- Scan (context/repo_scan.py): walk the tree safely; skip denylisted directories; do not read obvious secret filenames; read a capped UTF-8 prefix per file; skip binary or invalid UTF-8.
- Rank (context/rank.py): deterministic keyword overlap on path and first line of preview; boosts for src/forge_lcdl, for tests/ when the task mentions tests or verification, and for contracts/ when the task mentions contracts or task_id.
- Pack / trim (context/pack.py, context/trim.py): emit ContextItem rows (file vs excerpt), then cap total content UTF-8 bytes using truncate_utf8_bytes.
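The ranking step can be illustrated with a minimal sketch. This is not the code in context/rank.py: the boost values, the trigger-word sets, and the names BOOSTS, tokens, score, and rank are all assumptions made for illustration. It only shows how keyword overlap on the path and first preview line can be combined with fixed, task-conditional boosts while staying deterministic.

```python
import re

# Hypothetical boost table: (path prefix, trigger words in the task, bonus).
# The real boost values and trigger words are not documented here.
BOOSTS = [
    ("src/forge_lcdl", None, 2.0),  # assumed unconditional boost
    ("tests/", {"tests", "test", "verification"}, 3.0),
    ("contracts/", {"contracts", "contract", "task_id"}, 3.0),
]


def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric/underscore tokens of a string."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def score(task: str, path: str, preview: str) -> float:
    """Keyword overlap on path and first preview line, plus fixed boosts."""
    task_words = tokens(task)
    first_line = preview.splitlines()[0] if preview else ""
    overlap = len(task_words & (tokens(path) | tokens(first_line)))
    bonus = 0.0
    for prefix, triggers, value in BOOSTS:
        if path.startswith(prefix) and (triggers is None or task_words & triggers):
            bonus += value
    return overlap + bonus


def rank(task: str, files: dict[str, str]) -> list[str]:
    """Sort paths by descending score; ties break on path for determinism."""
    return sorted(files, key=lambda p: (-score(task, p, files[p]), p))
```

Breaking ties on the path keeps the order stable across runs, which is what makes a keyword-only ranker reproducible in tests.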
Budget
Despite its name, the budget_chars argument to build_context_pack is measured in UTF-8 bytes, not characters: it is the maximum total UTF-8 byte length of all item content strings (the same unit as truncate_utf8_bytes). The requested value is stored on the pack as token_or_char_budget; actual_content_utf8_bytes records the final total after trimming.
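To make the byte-based budget concrete, here is a minimal sketch of trimming a list of content strings to a UTF-8 byte budget. The helpers truncate_utf8 and trim_to_budget are hypothetical stand-ins, not the library's truncate_utf8_bytes or its trimming logic. The point is that a cut must not split a multi-byte character, and that the running total is counted in encoded bytes.

```python
def truncate_utf8(text: str, max_bytes: int) -> str:
    """Cut text to at most max_bytes of UTF-8 without splitting a character."""
    data = text.encode("utf-8")
    if len(data) <= max_bytes:
        return text
    # errors="ignore" drops the incomplete trailing sequence left by the cut.
    return data[: max(max_bytes, 0)].decode("utf-8", errors="ignore")


def trim_to_budget(contents: list[str], budget: int) -> tuple[list[str], int]:
    """Keep contents in order until the byte budget is used up.

    Returns the kept (possibly truncated) strings and the bytes actually used.
    """
    kept: list[str] = []
    used = 0
    for content in contents:
        remaining = budget - used
        if remaining <= 0:
            break
        piece = truncate_utf8(content, remaining)
        if not piece:
            continue  # nothing from this item fits; try the next one
        kept.append(piece)
        used += len(piece.encode("utf-8"))
    return kept, used
```

With a budget of 8, the inputs "héllo" (6 bytes) and "world" yield "héllo" and "wo": the accented character counts as two bytes, so a character count would overstate the remaining room.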
Skips and safety
Directories not descended: among others, .git, __pycache__, .venv, venv, dist, build, node_modules, .tox, reports, .pytest_cache, .mypy_cache, and *.egg-info directories.
Paths not read (content never loaded): examples include .env, *.env, names matching *secret*, id_rsa, *.pem, credentials.json, forge-certificator-secrets.env. These appear in excluded_files as (path, reason) pairs.
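The two rules above (directories that are never descended, and files whose content is never loaded) can be sketched with the standard library. This is an illustration under assumptions, not the code in context/repo_scan.py: the names SKIP_DIRS, SECRET_PATTERNS, secret_reason, and scan, and the reason string format, are hypothetical, and the pattern lists contain only the examples named in this section.

```python
import os
from fnmatch import fnmatchcase

# Only the examples listed in this document; the real denylists may be longer.
SKIP_DIRS = {
    ".git", "__pycache__", ".venv", "venv", "dist", "build",
    "node_modules", ".tox", "reports", ".pytest_cache", ".mypy_cache",
}
SKIP_DIR_GLOBS = ("*.egg-info",)
SECRET_PATTERNS = (".env", "*.env", "*secret*", "id_rsa", "*.pem", "credentials.json")


def is_skipped_dir(name: str) -> bool:
    return name in SKIP_DIRS or any(fnmatchcase(name, g) for g in SKIP_DIR_GLOBS)


def secret_reason(name: str):
    """Return a (hypothetical) reason string if the filename looks secret."""
    lowered = name.lower()
    for pattern in SECRET_PATTERNS:
        if fnmatchcase(lowered, pattern):
            return f"secret_filename:{pattern}"
    return None


def scan(root):
    """Return (candidate paths, excluded (path, reason) pairs), both sorted by walk order."""
    candidates, excluded = [], []
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from descending into them.
        dirnames[:] = sorted(d for d in dirnames if not is_skipped_dir(d))
        for name in sorted(filenames):
            rel = os.path.relpath(os.path.join(dirpath, name), root).replace(os.sep, "/")
            reason = secret_reason(name)
            if reason is None:
                candidates.append(rel)
            else:
                excluded.append((rel, reason))  # content is never opened
    return candidates, excluded
```

The decision is made on the filename alone, before any file is opened, so an excluded file's content never enters memory.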
Per-file preview cap: MAX_PREVIEW_UTF8_BYTES (128 KiB). Files larger than the cap contribute only an excerpt, and the reason string may note preview_capped_bytes.
Non-text files are listed in `warnings` (capped list length) and are not ranked into the pack.
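A capped preview read with non-text detection might look like the sketch below. It is an assumption-laden illustration, not the library's implementation: read_preview is a hypothetical helper, the NUL-byte check is one common heuristic for binary content, and the constant simply mirrors the 128 KiB figure stated above. An incremental decoder is used so that a multi-byte character split by the cap is not mistaken for invalid UTF-8.

```python
import codecs
from pathlib import Path

MAX_PREVIEW_UTF8_BYTES = 128 * 1024  # mirrors the documented 128 KiB cap


def read_preview(path: Path, cap: int = MAX_PREVIEW_UTF8_BYTES):
    """Return (text, capped), or (None, False) for binary or invalid UTF-8."""
    with path.open("rb") as fh:
        raw = fh.read(cap + 1)  # one extra byte reveals whether the file exceeds the cap
    capped = len(raw) > cap
    raw = raw[:cap]
    if b"\x00" in raw:  # heuristic: NUL bytes suggest binary content
        return None, False
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False tolerates an incomplete trailing sequence caused by the cap;
        # invalid bytes anywhere else still raise UnicodeDecodeError.
        text = decoder.decode(raw, final=not capped)
    except UnicodeDecodeError:
        return None, False
    return text, capped
```

A caller could record the capped flag in the item's reason string and append a (None, False) result to a warnings list instead of ranking the file.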
Usage
From the repo root, with src on PYTHONPATH (or after pip install -e ".[dev]"):
    from pathlib import Path

    from forge_lcdl.context import build_context_pack, context_pack_to_dict

    pack = build_context_pack(
        "Update tests for task_id pw_page_kind_route",
        Path("/path/to/forge-lcdl"),
        budget_chars=40_000,
    )
    d = context_pack_to_dict(pack)  # JSON-serializable dict
The optional now_iso argument (for example now_iso=lambda: "…") pins provenance["built_at"] to a fixed value, which keeps tests deterministic.
Serialization
Use context_pack_to_dict (and context_item_to_dict) for JSON-friendly structures: enums as strings, tuples as lists, excluded_files as {"path", "reason"} objects.
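The conversion rules above can be illustrated with a generic sketch. The dataclasses Kind, Item, and Pack below are hypothetical stand-ins for the real ContextItem and ContextPack, whose exact fields are not listed here, and to_jsonable and pack_to_dict are illustrative helpers, not the library's context_pack_to_dict. Mapping enums to their string values (rather than their names) is also an assumption.

```python
import json
from dataclasses import dataclass, fields, is_dataclass
from enum import Enum


class Kind(Enum):  # hypothetical enum for "file vs excerpt"
    FILE = "file"
    EXCERPT = "excerpt"


@dataclass(frozen=True)
class Item:  # hypothetical, simplified item shape
    path: str
    kind: Kind
    reasons: tuple[str, ...]


@dataclass(frozen=True)
class Pack:  # hypothetical, simplified pack shape
    items: tuple[Item, ...]
    excluded_files: tuple[tuple[str, str], ...]


def to_jsonable(value):
    """Recursively map enums to strings, tuples to lists, dataclasses to dicts."""
    if isinstance(value, Enum):
        return value.value
    if is_dataclass(value) and not isinstance(value, type):
        return {f.name: to_jsonable(getattr(value, f.name)) for f in fields(value)}
    if isinstance(value, (tuple, list)):
        return [to_jsonable(v) for v in value]
    return value


def pack_to_dict(pack: Pack) -> dict:
    d = to_jsonable(pack)
    # (path, reason) pairs become explicit objects for readability in JSON.
    d["excluded_files"] = [{"path": p, "reason": r} for p, r in pack.excluded_files]
    return d
```

The result passes through json.dumps without a custom encoder, which is the practical meaning of "JSON-friendly" here.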
Composition with messages
messages.py defines LlmMessage / FileRef for transports. A future bridge can turn a ContextPack into a system or user string; Sprint 4 stops at the structured pack.
For model routing and contracts, see MODEL-ROUTING.md and CONTRACT-SPEC.md.
Limits
Ranking is keyword-only (deterministic, no semantic search). For semantically richer retrieval, a later sprint might add embeddings while keeping the same ContextPack shape.