Research: Refactor CLI Scripts to Web Application

1. Search Tool Architecture

Problem: search_script.py fetches metadata for every dataset and performs regex matching in memory, which can be slow and memory-intensive on large Superset instances.

Options:

  1. Synchronous API Endpoint: The frontend calls an API, waits, and displays results.
    • Pros: Simple, immediate feedback.
    • Cons: Risk of HTTP timeout (e.g., Nginx/Browser limits) if the dataset fetch takes too long.
  2. Asynchronous Task (TaskManager): The frontend triggers a task, polls for status, and displays results when done.
    • Pros: Robust, no timeouts, consistent with "Mapping" and "Migration" tools.
    • Cons: Slower user experience for quick searches.

Decision: Synchronous API with Optimization.

  • Rationale: Search is typically an interactive "read-only" operation. Users expect immediate results. The superset_tool client's get_datasets is reasonably efficient.
  • Mitigation: We will implement the API to return a standard JSON response. If performance becomes an issue in testing, we can easily wrap the service logic in a TaskManager plugin. A sketch of the synchronous endpoint follows this list.
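
A minimal sketch of what the synchronous endpoint could look like, assuming FastAPI and a thin wrapper around the superset_tool client. The function names, the matched field (table_name), and the stubbed auth dependency are illustrative, not the final API:

```python
import re

from fastapi import APIRouter, Depends, Query


def get_current_user():  # stand-in for the app's existing auth dependency
    ...


def fetch_all_datasets() -> list[dict]:
    """Hypothetical wrapper over the superset_tool client's get_datasets."""
    return []  # real implementation returns dataset metadata dicts


router = APIRouter(prefix="/api/tools/search", tags=["search"])


@router.get("")
def search_datasets(
    pattern: str = Query(..., description="Regex applied to dataset metadata"),
    user=Depends(get_current_user),
):
    # Same approach as search_script.py: fetch everything, match in memory.
    # If this proves slow in testing, this body can move into a TaskManager
    # plugin without changing the matching logic.
    rx = re.compile(pattern)
    hits = [d for d in fetch_all_datasets() if rx.search(str(d.get("table_name", "")))]
    return {"results": hits}
```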

2. Dataset Mapper & Connection Management

Problem: run_mapper.py relies on command-line arguments and keyring for database credentials. The Web UI needs a way to store and reuse these credentials securely.

Options:

  1. Input Every Time: User enters DB credentials for every mapping operation.
    • Pros: Secure (no storage).
    • Cons: Poor UX, tedious.
  2. Saved Connections: Store connection details (Host, Port, DB, User, Password) in the application database.
    • Pros: Good UX.
    • Cons: Security risk if not encrypted.

Decision: Saved Connections (Encrypted).

  • Rationale: The spec explicitly requires: "Connection configurations must be saved for reuse".
  • Implementation:
    • Create a new SQLAlchemy model ConnectionConfig in backend/src/models/connection.py.
    • Store passwords encrypted at rest (at minimum obfuscated if the full encryption infrastructure isn't ready, but ideally encrypted). Given the scope, connection records will live in the existing SQLite database; a model sketch follows this list.
    • The Mapper logic will be refactored into a MapperPlugin (or the existing plugin updated) so that it accepts a connection_id or an explicit config.
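
A sketch of the ConnectionConfig model under these assumptions, using SQLAlchemy's declarative mapping and symmetric encryption via the cryptography package's Fernet. The key handling and column set here are placeholders, not the final design:

```python
# backend/src/models/connection.py (sketch)
from cryptography.fernet import Fernet
from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()  # in the real app, reuse the project's existing Base

# Placeholder only: a key generated at import time means saved passwords
# become undecryptable after a restart. The real key must come from config.
_fernet = Fernet(Fernet.generate_key())


class ConnectionConfig(Base):
    __tablename__ = "connection_configs"

    id = Column(Integer, primary_key=True)
    name = Column(String, unique=True, nullable=False)
    host = Column(String, nullable=False)
    port = Column(Integer, nullable=False)
    database = Column(String, nullable=False)
    username = Column(String, nullable=False)
    password_encrypted = Column(String, nullable=False)

    def set_password(self, plaintext: str) -> None:
        """Encrypt before storing; the SQLite file never sees the plaintext."""
        self.password_encrypted = _fernet.encrypt(plaintext.encode()).decode()

    def get_password(self) -> str:
        """Decrypt on demand, e.g. when the MapperPlugin opens a DB connection."""
        return _fernet.decrypt(self.password_encrypted.encode()).decode()
```

In practice the Fernet key would be loaded from configuration or a secrets store rather than generated at import time, so saved passwords remain decryptable across restarts.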

3. Debug Tools Integration

Problem: debug_db_api.py and get_dataset_structure.py are standalone scripts that print to stdout or write files.

Decision: Direct API Services.

  • Debug API: Create an endpoint POST /api/tools/debug/test-db-connection that runs the logic from debug_db_api.py and returns the log/result JSON.
  • Dataset Structure: Create an endpoint GET /api/tools/debug/dataset/{id}/structure that runs logic from get_dataset_structure.py and returns the JSON directly. Both endpoints are sketched below.
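
A hedged sketch of both endpoints, with the script logic assumed to be lifted into service functions; run_db_connection_test and fetch_dataset_structure are hypothetical names, and the auth dependency is stubbed:

```python
from fastapi import APIRouter, Depends
from pydantic import BaseModel


def get_current_user():  # stand-in for the app's existing auth dependency
    ...


def run_db_connection_test(connection_id: int) -> tuple[bool, list[str]]:
    """Hypothetical service holding the logic from debug_db_api.py."""
    return True, ["placeholder log line"]  # real version returns captured output


def fetch_dataset_structure(dataset_id: int) -> dict:
    """Hypothetical service holding the logic from get_dataset_structure.py."""
    return {}  # real version returns the dataset structure JSON


router = APIRouter(prefix="/api/tools/debug", tags=["debug"])


class DBConnectionRequest(BaseModel):
    connection_id: int  # references a saved ConnectionConfig


@router.post("/test-db-connection")
def test_db_connection(req: DBConnectionRequest, user=Depends(get_current_user)):
    # Returns the result and log as JSON instead of printing to stdout.
    ok, log = run_db_connection_test(req.connection_id)
    return {"success": ok, "log": log}


@router.get("/dataset/{dataset_id}/structure")
def dataset_structure(dataset_id: int, user=Depends(get_current_user)):
    # Returns the structure JSON directly instead of writing a file.
    return fetch_dataset_structure(dataset_id)
```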

4. Legacy Code Cleanup

Plan:

  1. Implement the new Web tools.
  2. Verify feature parity.
  3. Delete:
    • search_script.py
    • run_mapper.py
    • debug_db_api.py
    • get_dataset_structure.py
    • backup_script.py (Spec confirms it's superseded by 009-backup-scheduler)

5. Security & Access

Decision: All authenticated users can access these tools.

  • Rationale: Spec says "All authenticated users".
  • Implementation: Use the existing Depends(get_current_user) dependency for all new routes, as in the sketch below.
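
Illustratively, the dependency can be applied once at the router level so every tool route is covered; tools_router is an assumed name and get_current_user is stubbed here:

```python
from fastapi import APIRouter, Depends


def get_current_user():  # the app's existing dependency; stubbed for illustration
    ...


# One router-level dependency protects every route mounted under /api/tools.
tools_router = APIRouter(
    prefix="/api/tools",
    dependencies=[Depends(get_current_user)],
)
```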