# Semantic Map Generator & Validator Design

## Objective

Create a Python script (`generate_semantic_map.py`) to:

1. Scan the codebase (`backend/`, `frontend/`) for Semantic Protocol markers.
2. Generate a **Full Semantic Map** (JSON) for detailed machine processing.
3. Generate a **Compressed Project Map** (Markdown) sized for an LLM context window (~4000 tokens).
4. Generate a **Compliance Report** (Markdown) with history tracking.

## Scope

* **Languages**: Python (`.py`), Svelte (`.svelte`), JavaScript/TypeScript (`.js`, `.ts`).
* **Protocol**: Based on `semantic_protocol.md`.

## 1. Parsing Logic

The script will use regular expressions to parse files textually.

### Python Patterns

* **Anchor Start**: `r"#\s*\[DEF:(?P<name>[\w\.]+):(?P<type>\w+)\]"`
* **Anchor End**: `r"#\s*\[/DEF:(?P<name>[\w\.]+):(?P<type>\w+)\]"`
* **Tags**: `r"#\s*@(?P<tag>[A-Z_]+):\s*(?P<value>.*)"`

### Svelte/JS Patterns

* **HTML Anchor Start**: `r"<!--\s*\[DEF:(?P<name>[\w\.]+):(?P<type>\w+)\]\s*-->"`
* **HTML Anchor End**: `r"<!--\s*\[/DEF:(?P<name>[\w\.]+):(?P<type>\w+)\]\s*-->"`
* **JS Anchor Start**: `r"//\s*\[DEF:(?P<name>[\w\.]+):(?P<type>\w+)\]"`
* **JS Anchor End**: `r"//\s*\[/DEF:(?P<name>[\w\.]+):(?P<type>\w+)\]"`
* **HTML Tags**: `r"@(?P<tag>[A-Z_]+):\s*(?P<value>.*)"` (inside comments)
* **JSDoc Tags**: `r"\*\s*@(?P<tag>[a-z]+)\s+(?P<value>.*)"`
* **Relation Tag**: `r"//\s*@RELATION:\s*(?P<type>\w+)\s*->\s*(?P<target>.*)"`

## 2. Output Files & Structure

### A. Full Map (`semantics/semantic_map.json`)

Detailed hierarchical JSON containing all metadata, line numbers, and relations.

```json
{
  "project_root": "/home/user/ss-tools",
  "generated_at": "ISO-DATE",
  "modules": [ ... ]
}
```

### B. Compressed Map (`specs/project_map.md`)

Optimized for "Full Attention" (max ~4000 tokens).

* **Format**: Markdown Tree.
* **Content**:
    * List of high-level Modules/Components.
    * Only critical tags: `@PURPOSE`, `@LAYER`.
    * List of public Functions/Actions (names only, or brief summary).
    * Key Relations (`DEPENDS_ON`).
    * *Omit*: Internal implementation details, line numbers, minor tags.

### C. Compliance Report (`semantics/reports/semantic_report_YYYYMMDD_HHMMSS.md`)

* **Metrics**: Global Coverage %, Module-level scores.
* **Errors**: List of broken anchors or missing mandatory tags.
* **History**: Each run writes a new timestamped file, so earlier reports are preserved as history.

## 3. Validation Rules (Scoring)

### Critical Errors (0% score for entity)

1. **Unmatched Anchors**: Start tag without End tag.

### Metadata Warnings (Reduces score)

1. **Missing Mandatory Tags**: `Module`/`Component` needs `@PURPOSE`, `@LAYER`.

## 4. Implementation Plan

1. **Setup**: Create `generate_semantic_map.py` and directories (`semantics/`, `semantics/reports/`).
2. **Parser**: Implement Regex patterns.
3. **Walker**: Recursive file walk that skips standard ignore patterns.
4. **Generators**:
    * `MapGenerator`: JSON dump.
    * `ContextGenerator`: Markdown tree builder with token awareness (heuristic).
    * `ReportGenerator`: Scoring and markdown formatting.
5. **Execution**: Run script.
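
### Example: Parser and Scoring Sketch

The Python anchor and tag patterns from section 1 and the validation rules from section 3 could be combined roughly as follows. This is a minimal sketch, not the planned implementation: the regex group names (`name`, `type`, `tag`, `value`), the helpers `parse_python_source` and `score_entity`, and the scoring formula (each missing mandatory tag removes an equal share of the score) are all assumptions made for illustration. The design only states that an unmatched anchor scores 0% and that missing tags reduce the score.

```python
import re

# Patterns for Python sources. Group names are assumptions for this sketch.
ANCHOR_START = re.compile(r"#\s*\[DEF:(?P<name>[\w\.]+):(?P<type>\w+)\]")
ANCHOR_END = re.compile(r"#\s*\[/DEF:(?P<name>[\w\.]+):(?P<type>\w+)\]")
TAG = re.compile(r"#\s*@(?P<tag>[A-Z_]+):\s*(?P<value>.*)")

# Mandatory tags per entity type, taken from the validation rules.
MANDATORY_TAGS = {
    "Module": {"PURPOSE", "LAYER"},
    "Component": {"PURPOSE", "LAYER"},
}


def parse_python_source(text):
    """Return (entities, errors) found in the text of one Python file."""
    entities, errors, stack = [], [], []
    for lineno, line in enumerate(text.splitlines(), start=1):
        start = ANCHOR_START.search(line)
        end = ANCHOR_END.search(line)
        tag = TAG.search(line)
        if start:
            # Open a new entity; tags that follow attach to the innermost one.
            stack.append({
                "name": start["name"],
                "type": start["type"],
                "start_line": lineno,
                "end_line": None,
                "tags": {},
            })
        elif end:
            top = stack[-1] if stack else None
            if top and top["name"] == end["name"] and top["type"] == end["type"]:
                top["end_line"] = lineno
                entities.append(stack.pop())
            else:
                errors.append(
                    f"line {lineno}: end anchor '{end['name']}' has no matching start"
                )
        elif tag and stack:
            stack[-1]["tags"][tag["tag"]] = tag["value"].strip()
    # Anything still open at end of file is an unmatched start anchor.
    for entity in stack:
        errors.append(
            f"line {entity['start_line']}: start anchor '{entity['name']}' is never closed"
        )
        entities.append(entity)
    return entities, errors


def score_entity(entity):
    """Score one entity in [0.0, 1.0]; the formula is an assumed heuristic."""
    if entity["end_line"] is None:
        return 0.0  # Critical error: unmatched anchor.
    mandatory = MANDATORY_TAGS.get(entity["type"], set())
    if not mandatory:
        return 1.0
    missing = mandatory - entity["tags"].keys()
    return 1.0 - len(missing) / len(mandatory)


if __name__ == "__main__":
    sample = (
        "# [DEF:auth:Module]\n"
        "# @PURPOSE: Handles login\n"
        "# [/DEF:auth:Module]\n"
    )
    found, problems = parse_python_source(sample)
    for item in found:
        print(item["name"], item["type"], score_entity(item))
    print(problems)
```

A stack is used so that nested anchors (for example a function inside a module) close in the right order and tags attach to the innermost open entity. The same structure would apply to the Svelte/JS patterns by swapping in the corresponding regexes.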