forge-lcdl

Task `pw_static_mcq_mechanics_discover` v1

Summary

Given page_probe, chunks, classified_chunks, optional operator hints, and source_constraints, infer a structured mechanics specification for pages where questions, options, and answers are already visible in the HTML/text (static blog or handbook style). This path avoids emitting arbitrary Python; the generated extractor tasks (pw_extractor_synthesize_*) remain available as a fallback.

Output uses schema_version page_mechanics.v1 (PAGE_MECHANICS_DOT_SCHEMA_VERSION in forge_lcdl.schemas.page_mechanics_v1). See also the checklist artifact page_mechanics/v1/contract.md.

Inputs

Field               Type    Required  Notes
url                 string  yes       Non-empty after strip.
page_probe          object  yes       Bounded probe; may be {}.
chunks              array   yes       Chunk list from deterministic segmentation (may be []).
classified_chunks   array   yes       Classifier output merged onto chunks (may be []).
source_constraints  object  yes       e.g. expected_option_count, expected_choice_mode; may be {}.
operator_hints      string  no        Short hints; omit or "".
temperature         number  no        Default 0.05.
timeout_sec         int     no        Default profile.timeout_sec.

The user JSON payload is capped at 100000 UTF-8 bytes before the LLM call.
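
As an illustration of the input rules above, the following minimal sketch assembles the input fields into a user JSON string and measures it against the byte cap. The helper names (build_user_json, fits_cap) are hypothetical and not part of forge-lcdl, and the sketch only measures the payload; how the real task enforces the cap (truncation or rejection) is not stated here.

```python
import json

MAX_USER_JSON_BYTES = 100_000  # cap stated in this document


def build_user_json(url, page_probe, chunks, classified_chunks,
                    source_constraints, operator_hints=""):
    """Assemble the task inputs and serialize them (hypothetical helper)."""
    url = url.strip()
    if not url:
        raise ValueError("url must be non-empty after strip")
    payload = {
        "url": url,
        "page_probe": page_probe,                  # may be {}
        "chunks": chunks,                          # may be []
        "classified_chunks": classified_chunks,    # may be []
        "source_constraints": source_constraints,  # may be {}
        "operator_hints": operator_hints,          # optional; "" when absent
    }
    # ensure_ascii=False keeps non-ASCII text as-is, so the size is
    # measured on the real UTF-8 bytes rather than on \u escapes.
    return json.dumps(payload, ensure_ascii=False)


def fits_cap(user_json, limit=MAX_USER_JSON_BYTES):
    """Return True when the UTF-8 encoding of user_json is within the cap."""
    return len(user_json.encode("utf-8")) <= limit
```

The cap is in bytes, not characters, so pages with a lot of non-ASCII text reach it sooner than their character count suggests.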

Output

JSON object with exactly these top-level keys:

Key             Rule
schema_version  page_mechanics.v1
page_kind       static_mcq_page
confidence      Number; coerced to [0.0, 1.0]
question        Object describing block/stem/options/correct-answer strategies and selectors (no Python).
safety          Object (e.g. forbid_freeform_js: true).
notes           String (may be "").
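
The output rules above can be sketched as a small shape check. This is an illustration only, not the forge-lcdl validator: the function names are hypothetical, the key inside the example question object is a placeholder (the real field names are defined by the page_mechanics.v1 schema), and reading "coerced" as clamping is an assumption.

```python
EXPECTED_KEYS = {
    "schema_version", "page_kind", "confidence",
    "question", "safety", "notes",
}


def check_output_shape(obj):
    """Return a list of problems; an empty list means the shape looks right."""
    if not isinstance(obj, dict):
        return ["output is not a JSON object"]
    problems = []
    if set(obj) != EXPECTED_KEYS:
        problems.append("top-level keys differ from the expected six")
    if obj.get("schema_version") != "page_mechanics.v1":
        problems.append("schema_version must be page_mechanics.v1")
    if obj.get("page_kind") != "static_mcq_page":
        problems.append("page_kind must be static_mcq_page")
    if not isinstance(obj.get("question"), dict):
        problems.append("question must be an object")
    if not isinstance(obj.get("safety"), dict):
        problems.append("safety must be an object")
    if not isinstance(obj.get("notes"), str):
        problems.append("notes must be a string")
    return problems


def coerce_confidence(value):
    """Clamp a numeric confidence into [0.0, 1.0] (assumed meaning of 'coerced')."""
    return min(1.0, max(0.0, float(value)))


example_output = {
    "schema_version": "page_mechanics.v1",
    "page_kind": "static_mcq_page",
    "confidence": 0.8,
    # Placeholder key for illustration; not taken from the real schema.
    "question": {"block_selector": "div.quiz-item"},
    "safety": {"forbid_freeform_js": True},
    "notes": "",
}
```

Collecting problems in a list instead of raising on the first one makes it easier to report every shape error from a single LLM response.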

Forbidden fields

At any nesting level, object keys that match executable/code carriers after normalization are rejected:
extractor_python, script, code, eval, execute_js, javascript, python_source, generated_python, source_code, compiled_code, inline_script, and similar tokens defined in the validator.
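
A recursive scan of this kind might look like the sketch below. It is not the actual validator: the set contains only the tokens named above (the validator matches further similar tokens), the normalization shown (trim, lowercase, "-" and spaces to "_") is an assumption, and find_forbidden_keys is a hypothetical name.

```python
FORBIDDEN_KEYS = {
    "extractor_python", "script", "code", "eval", "execute_js", "javascript",
    "python_source", "generated_python", "source_code", "compiled_code",
    "inline_script",
}


def normalize_key(key):
    """Assumed normalization: trim, lowercase, map '-' and ' ' to '_'."""
    return key.strip().lower().replace("-", "_").replace(" ", "_")


def find_forbidden_keys(value, path="$"):
    """Return the paths of object keys whose normalized form is forbidden."""
    hits = []
    if isinstance(value, dict):
        for key, child in value.items():
            child_path = f"{path}.{key}"
            if normalize_key(str(key)) in FORBIDDEN_KEYS:
                hits.append(child_path)
            # Keep descending so nested carriers are reported as well.
            hits.extend(find_forbidden_keys(child, child_path))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            hits.extend(find_forbidden_keys(child, f"{path}[{index}]"))
    return hits
```

Only keys are checked, never string values, so a notes string that happens to contain the word "script" does not trigger a rejection in this sketch.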

Policy

  • Output one JSON object only (no Python source, no markdown fences).
  • Do not recommend bypassing login, CAPTCHA, paywalls, anti-bot controls, or proctored exams.

Implementation

  • run_json_contract_task + post-validation (schema version, page kind, types, confidence, forbidden keys).

Changelog

  • v1 — Initial static MCQ mechanics discovery.