Task `pw_static_mcq_mechanics_discover` v1
Summary
Given page_probe, chunks, classified_chunks, optional operator hints, and source_constraints, infer a structured mechanics specification for pages where questions, options, and answers are already visible in HTML/text (static blog / handbook style). This path avoids emitting arbitrary Python; generated extractor tasks (pw_extractor_synthesize_*) remain available as fallback.
Output uses schema_version page_mechanics.v1 (PAGE_MECHANICS_DOT_SCHEMA_VERSION in forge_lcdl.schemas.page_mechanics_v1). See also the checklist artifact page_mechanics/v1/contract.md.
Inputs
| Field | Type | Required | Notes |
|---|---|---|---|
| `url` | string | yes | Non-empty after strip. |
| `page_probe` | object | yes | Bounded probe; may be {}. |
| `chunks` | array | yes | Chunk list from deterministic segmentation (may be []). |
| `classified_chunks` | array | yes | Classifier output merged onto chunks (may be []). |
| `source_constraints` | object | yes | e.g. expected_option_count, expected_choice_mode; may be {}. |
| `operator_hints` | string | no | Short hints; omit or "". |
| `temperature` | number | no | Default 0.05. |
| `timeout_sec` | int | no | Default profile.timeout_sec. |
User JSON capped at 100000 UTF-8 bytes before the LLM call.
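
The handbook states the 100000-byte cap but not how it is enforced. The following is a minimal sketch of one way to apply such a cap, assuming simple truncation at a UTF-8 character boundary; the helper name `cap_user_json` and the constant name are hypothetical and do not come from the task's implementation.

```python
import json

MAX_USER_JSON_BYTES = 100_000  # cap stated in this handbook


def cap_user_json(payload: dict, limit: int = MAX_USER_JSON_BYTES) -> str:
    """Serialize payload and keep at most `limit` UTF-8 bytes (hypothetical helper)."""
    text = json.dumps(payload, ensure_ascii=False)
    raw = text.encode("utf-8")
    if len(raw) <= limit:
        return text
    # Assumption: plain truncation. errors="ignore" drops a multi-byte
    # character that the cut would otherwise split in half.
    return raw[:limit].decode("utf-8", errors="ignore")
```

Note that a truncated string is generally no longer valid JSON, so a real implementation might instead trim `chunks` or `classified_chunks` before serializing.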
Output
JSON object with exactly these top-level keys:
| Key | Rule |
|---|---|
| `schema_version` | `page_mechanics.v1` |
| `page_kind` | `static_mcq_page` |
| `confidence` | Number; coerced to [0.0, 1.0] |
| `question` | Object describing block/stem/options/correct-answer strategies and selectors (no Python). |
| `safety` | Object (e.g. forbid_freeform_js: true). |
| `notes` | String (may be ""). |
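
The output rules above can be sketched as a small check. This is an illustration only: the function name `check_mechanics_output` is hypothetical, and it assumes "coerced to [0.0, 1.0]" means clamping out-of-range numbers, which the handbook does not spell out.

```python
EXPECTED_KEYS = {
    "schema_version", "page_kind", "confidence", "question", "safety", "notes",
}


def check_mechanics_output(obj: dict) -> dict:
    """Validate the top-level shape and return a copy with clamped confidence."""
    if set(obj) != EXPECTED_KEYS:
        raise ValueError(f"unexpected top-level keys: {sorted(set(obj) ^ EXPECTED_KEYS)}")
    if obj["schema_version"] != "page_mechanics.v1":
        raise ValueError("schema_version must be page_mechanics.v1")
    if obj["page_kind"] != "static_mcq_page":
        raise ValueError("page_kind must be static_mcq_page")
    if not isinstance(obj["question"], dict) or not isinstance(obj["safety"], dict):
        raise ValueError("question and safety must be objects")
    if not isinstance(obj["notes"], str):
        raise ValueError("notes must be a string")
    conf = obj["confidence"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(conf, bool) or not isinstance(conf, (int, float)):
        raise ValueError("confidence must be a number")
    result = dict(obj)
    result["confidence"] = min(1.0, max(0.0, float(conf)))  # assumed clamp
    return result
```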
Forbidden fields
At any nesting level, object keys matching executable/code carriers are rejected (normalized):
extractor_python, script, code, eval, execute_js, javascript, python_source, generated_python, source_code, compiled_code, inline_script, and similar tokens in the validator.
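
A recursive scan for these keys might look like the sketch below. The key set holds only the tokens listed above (the real validator covers "similar tokens" as well), and the normalization rule shown here (lowercase, non-alphanumeric runs collapsed to underscores) is an assumption, since the handbook only says keys are "normalized". The names `normalize_key` and `find_forbidden_keys` are hypothetical.

```python
import re

FORBIDDEN_KEYS = frozenset({
    "extractor_python", "script", "code", "eval", "execute_js", "javascript",
    "python_source", "generated_python", "source_code", "compiled_code",
    "inline_script",
})


def normalize_key(key: str) -> str:
    # Assumed normalization: lowercase, collapse punctuation/space runs to "_".
    return re.sub(r"[^a-z0-9]+", "_", key.strip().lower()).strip("_")


def find_forbidden_keys(value, path="$"):
    """Return the paths of all forbidden object keys at any nesting level."""
    hits = []
    if isinstance(value, dict):
        for key, child in value.items():
            child_path = f"{path}.{key}"
            if normalize_key(str(key)) in FORBIDDEN_KEYS:
                hits.append(child_path)
            hits.extend(find_forbidden_keys(child, child_path))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            hits.extend(find_forbidden_keys(child, f"{path}[{index}]"))
    return hits
```

Returning paths rather than a boolean makes it easier to report exactly which key caused a rejection.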
Policy
- Output one JSON object only (no Python source, no markdown fences).
- Do not recommend bypassing login, CAPTCHA, paywalls, anti-bot controls, or proctored exams.
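
The "one JSON object only" rule can be checked on the raw model text before any schema validation. The sketch below is one possible way to do that, not the task's actual parser; the function name `parse_single_json_object` is hypothetical.

```python
import json


def parse_single_json_object(raw: str) -> dict:
    """Parse raw model text, rejecting fences, trailing content, and non-objects."""
    text = raw.strip()
    if text.startswith("```"):
        raise ValueError("markdown fences are not allowed")
    # json.loads raises JSONDecodeError (a ValueError) on trailing content.
    obj = json.loads(text)
    if not isinstance(obj, dict):
        raise ValueError("top-level JSON value must be an object")
    return obj
```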
Implementation
run_json_contract_task plus post-validation (schema version, page kind, types, confidence, forbidden keys).
Changelog
- v1 — Initial static MCQ mechanics discovery.