project map script | semantic parcer

2026-01-01 16:58:21 +03:00
parent a747a163c8
commit 4c6fc8256d
84 changed files with 10178 additions and 537 deletions
--- a/specs/semantic_map_design.md
+++ b/specs/semantic_map_design.md
@@ -0,0 +1,76 @@
+# Semantic Map Generator & Validator Design
+
+## Objective
+Create a Python script (`generate_semantic_map.py`) to:
+1.  Scan the codebase (`backend/`, `frontend/`) for Semantic Protocol markers.
+2.  Generate a **Full Semantic Map** (JSON) for detailed machine processing.
+3.  Generate a **Compressed Project Map** (Markdown) for LLM context window (~4000 tokens).
+4.  Generate a **Compliance Report** (Markdown) with history tracking.
+
+## Scope
+*   **Languages**: Python (`.py`), Svelte (`.svelte`), JavaScript/TypeScript (`.js`, `.ts`).
+*   **Protocol**: Based on `semantic_protocol.md`.
+
+## 1. Parsing Logic
+
+The script will use Regex to parse files textually.
+
+### Python Patterns
+*   **Anchor Start**: `r"#\s*\[DEF:(?P<name>[\w\.]+):(?P<type>\w+)\]"`
+*   **Anchor End**: `r"#\s*\[/DEF:(?P<name>[\w\.]+):(?P<type>\w+)\]"`
+*   **Tags**: `r"#\s*@(?P<tag>[A-Z_]+):\s*(?P<value>.*)"`
+
+### Svelte/JS Patterns
+*   **HTML Anchor Start**: `r"<!--\s*\[DEF:(?P<name>[\w\.]+):(?P<type>\w+)\]\s*-->"`
+*   **HTML Anchor End**: `r"<!--\s*\[/DEF:(?P<name>[\w\.]+):(?P<type>\w+)\]\s*-->"`
+*   **JS Anchor Start**: `r"//\s*\[DEF:(?P<name>[\w\.]+):(?P<type>\w+)\]"`
+*   **JS Anchor End**: `r"//\s*\[/DEF:(?P<name>[\w\.]+):(?P<type>\w+)\]"`
+*   **HTML Tags**: `r"@(?P<tag>[A-Z_]+):\s*(?P<value>.*)"` (inside comments)
+*   **JSDoc Tags**: `r"\*\s*@(?P<tag>[a-z]+)\s+(?P<value>.*)"`
+*   **Relation Tag**: `r"//\s*@RELATION:\s*(?P<type>\w+)\s*->\s*(?P<target>.*)"`
+
+## 2. Output Files & Structure
+
+### A. Full Map (`semantics/semantic_map.json`)
+Detailed hierarchical JSON containing all metadata, line numbers, and relations.
+```json
+{
+  "project_root": "/home/user/ss-tools",
+  "generated_at": "ISO-DATE",
+  "modules": [ ... ]
+}
+```
+
+### B. Compressed Map (`specs/project_map.md`)
+Optimized for "Full Attention" (max ~4000 tokens).
+*   **Format**: Markdown Tree.
+*   **Content**:
+    *   List of high-level Modules/Components.
+    *   Only critical tags: `@PURPOSE`, `@LAYER`.
+    *   List of public Functions/Actions (names only, or brief summary).
+    *   Key Relations (`DEPENDS_ON`).
+    *   *Omit*: Internal implementation details, line numbers, minor tags.
+
+### C. Compliance Report (`semantics/reports/semantic_report_YYYYMMDD_HHMMSS.md`)
+*   **Metrics**: Global Coverage %, Module-level scores.
+*   **Errors**: List of broken anchors or missing mandatory tags.
+*   **History**: Timestamped filename ensures history tracking.
+
+## 3. Validation Rules (Scoring)
+
+### Critical Errors (0% score for entity)
+1.  **Unmatched Anchors**: Start tag without End tag.
+
+### Metadata Warnings (Reduces score)
+1.  **Missing Mandatory Tags**: `Module`/`Component` needs `@PURPOSE`, `@LAYER`.
+
+## 4. Implementation Plan
+
+1.  **Setup**: Create `generate_semantic_map.py` and directories (`semantics/`, `semantics/reports/`).
+2.  **Parser**: Implement Regex patterns.
+3.  **Walker**: Recursive file walk ignoring standard ignore patterns.
+4.  **Generators**:
+    *   `MapGenerator`: JSON dump.
+    *   `ContextGenerator`: Markdown tree builder with token awareness (heuristic).
+    *   `ReportGenerator`: Scoring and markdown formatting.
+5.  **Execution**: Run script.