# Research: Refactor CLI Scripts to Web Application

## 1. Search Tool Architecture

**Problem**: The `search_script.py` script fetches metadata for *all* datasets and performs regex matching in memory. This can be resource-intensive and slow on large Superset instances.

**Options**:

1. **Synchronous API Endpoint**: The frontend calls an API, waits, and displays results.
   * *Pros*: Simple, immediate feedback.
   * *Cons*: Risk of HTTP timeouts (e.g., Nginx or browser limits) if the dataset fetch takes too long.
2. **Asynchronous Task (TaskManager)**: The frontend triggers a task, polls for status, and displays results when done.
   * *Pros*: Robust, no timeouts, consistent with the "Mapping" and "Migration" tools.
   * *Cons*: Slower user experience for quick searches.

**Decision**: **Synchronous API with Optimization**.

* **Rationale**: Search is typically an interactive, read-only operation, and users expect immediate results. The `superset_tool` client's `get_datasets` call is reasonably efficient.
* **Mitigation**: The API will return a standard JSON response. If performance becomes an issue in testing, the service logic can easily be wrapped in a TaskManager plugin. A sketch of the synchronous endpoint follows.
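
As a concrete illustration, here is a minimal sketch of the synchronous endpoint, assuming a FastAPI backend. The route path, the `get_client` dependency, and the module paths are assumptions made for illustration; only `get_datasets` comes from the existing `superset_tool` client.

```python
# backend/src/api/tools_search.py -- a sketch; route path and helper imports are assumptions.
import re

from fastapi import APIRouter, Depends, HTTPException

from backend.src.auth import get_current_user          # existing auth dependency (path assumed)
from backend.src.services.superset import get_client   # hypothetical factory for the superset_tool client

router = APIRouter(prefix="/api/tools/search")


@router.get("/datasets")
def search_datasets(
    pattern: str,
    user=Depends(get_current_user),
    client=Depends(get_client),
):
    """Synchronous search: one bulk metadata fetch, then regex matching in memory."""
    try:
        matcher = re.compile(pattern, re.IGNORECASE)
    except re.error as exc:
        raise HTTPException(status_code=422, detail=f"Invalid regex: {exc}") from exc

    datasets = client.get_datasets()  # the same bulk call search_script.py relies on
    hits = [d for d in datasets if matcher.search(d.get("table_name", ""))]
    return {"count": len(hits), "results": hits}
```

If the bulk fetch proves too slow in testing, this handler body is exactly the unit that would move into a TaskManager plugin.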

## 2. Dataset Mapper & Connection Management

**Problem**: `run_mapper.py` relies on command-line arguments and `keyring` for database credentials. The Web UI needs a way to store and reuse these credentials securely.

**Options**:

1. **Input Every Time**: User enters DB credentials for every mapping operation.
   * *Pros*: Secure (no storage).
   * *Cons*: Poor UX, tedious.
2. **Saved Connections**: Store connection details (Host, Port, DB, User, Password) in the application database.
   * *Pros*: Good UX.
   * *Cons*: Security risk if not encrypted.

**Decision**: **Saved Connections (Encrypted)**.

* **Rationale**: The spec explicitly requires: "Connection configurations must be saved for reuse".
* **Implementation**:
  * Create a new SQLAlchemy model `ConnectionConfig` in `backend/src/models/connection.py`.
  * Store passwords encrypted; obfuscation is an acceptable stopgap if the encryption infrastructure isn't ready, but encryption is the target. Given the scope, connections will live in the existing SQLite database.
  * Refactor the Mapper logic into a `MapperPlugin` (or update the existing one) that accepts a `connection_id` or an explicit config. A sketch of the model follows.
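
A minimal sketch of the model under stated assumptions: SQLAlchemy declarative mapping and symmetric encryption via `Fernet` from the `cryptography` package. The column set, the environment-variable key source, and the standalone `Base` are illustrative; the real model would reuse the project's existing declarative base and key management.

```python
# backend/src/models/connection.py -- a sketch; columns and key handling are assumptions.
import os

from cryptography.fernet import Fernet  # assumed dependency for symmetric encryption
from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()  # the project would reuse its existing Base instead


def _fernet() -> Fernet:
    # Key taken from the environment for illustration; real key management is TBD.
    return Fernet(os.environ["CONNECTION_SECRET_KEY"])


class ConnectionConfig(Base):
    __tablename__ = "connection_configs"

    id = Column(Integer, primary_key=True)
    name = Column(String, unique=True, nullable=False)
    host = Column(String, nullable=False)
    port = Column(Integer, nullable=False)
    database = Column(String, nullable=False)
    username = Column(String, nullable=False)
    password_encrypted = Column(String, nullable=False)  # never store plaintext

    def set_password(self, plaintext: str) -> None:
        self.password_encrypted = _fernet().encrypt(plaintext.encode()).decode()

    def get_password(self) -> str:
        return _fernet().decrypt(self.password_encrypted.encode()).decode()
```

The `MapperPlugin` would then resolve a `connection_id` to a `ConnectionConfig` row and call `get_password()` only at connection time, so plaintext credentials never touch the database or the task payload.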

## 3. Debug Tools Integration

**Problem**: `debug_db_api.py` and `get_dataset_structure.py` are standalone scripts that print to stdout or write files.

**Decision**: **Direct API Services**.

* **Debug API**: Create an endpoint `POST /api/tools/debug/test-db-connection` that runs the logic from `debug_db_api.py` and returns the log/result as JSON.
* **Dataset Structure**: Create an endpoint `GET /api/tools/debug/dataset/{id}/structure` that runs the logic from `get_dataset_structure.py` and returns the JSON directly. Both routes are sketched below.
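
A sketch of both routes under the same FastAPI assumptions as above; the request model and the `run_db_connection_test` / `fetch_dataset_structure` helpers are placeholders for the logic currently living in the two scripts.

```python
# backend/src/api/tools_debug.py -- a sketch; helper names and paths are placeholders.
from fastapi import APIRouter, Depends
from pydantic import BaseModel

from backend.src.auth import get_current_user  # existing auth dependency (path assumed)
from backend.src.services.debug import (       # placeholders for the scripts' logic
    fetch_dataset_structure,
    run_db_connection_test,
)

router = APIRouter(prefix="/api/tools/debug")


class DbConnectionTestRequest(BaseModel):
    connection_id: int | None = None  # a saved ConnectionConfig, or...
    dsn: str | None = None            # ...an explicit DSN for ad-hoc tests


@router.post("/test-db-connection")
def test_db_connection(req: DbConnectionTestRequest, user=Depends(get_current_user)):
    # Runs the checks from debug_db_api.py and returns the collected log as JSON.
    log_lines, ok = run_db_connection_test(req)
    return {"ok": ok, "log": log_lines}


@router.get("/dataset/{dataset_id}/structure")
def dataset_structure(dataset_id: int, user=Depends(get_current_user)):
    # Returns the structure get_dataset_structure.py used to write to a file.
    return fetch_dataset_structure(dataset_id)
```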

## 4. Legacy Code Cleanup

**Plan**:

1. Implement the new Web tools.
2. Verify feature parity.
3. Delete:
   * `search_script.py`
   * `run_mapper.py`
   * `debug_db_api.py`
   * `get_dataset_structure.py`
   * `backup_script.py` (the spec confirms it is superseded by `009-backup-scheduler`)

## 5. Security & Access

**Decision**: All authenticated users can access these tools.

* **Rationale**: The spec specifies access for "All authenticated users"; no role-based restriction is required.
* **Implementation**: Use the existing `Depends(get_current_user)` dependency on all new routes, as in the sketch below.
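
The dependency can be attached once at the router level so every tool route inherits it; a minimal sketch, assuming the auth module path used above.

```python
# Attaching auth at the router level -- a sketch; the module path is an assumption.
from fastapi import APIRouter, Depends

from backend.src.auth import get_current_user  # existing dependency (path assumed)

# Every route registered on this router requires an authenticated user.
router = APIRouter(
    prefix="/api/tools",
    dependencies=[Depends(get_current_user)],
)
```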