backend/STYLE_GUIDE.md
Target: Python 3.12+ | Formatter/Linter: Ruff | Config: backend/pyproject.toml
This guide codifies the conventions used across the backend, and prescribes the target style for code written during the refactor (Phases 3-6). Existing code should be migrated incrementally -- don't reformat entire files in unrelated PRs.
Enforced by ruff format (Black-compatible).
") for strings. Single quotes are acceptable in f-string expressions and dict keys inside f-strings where avoiding escapes improves readability.Run: ruff format backend/
Enforced by ruff's isort rules (rule set I).
Grouping -- three blocks separated by a blank line:
import asyncio # 1. stdlib
from pathlib import Path
import numpy as np # 2. third-party
from fastapi import APIRouter, HTTPException
from sqlalchemy.orm import Session
from backend.config import get_data_dir # 3. local (absolute)
from .database import get_db # or relative
Rules:
backend package, use relative imports for sibling/child modules: from .database import get_db, from ..utils.audio import load_audio.main.py, server.py).from module import *).from X import Y when there are 4+ names; below that, comma-separated is fine.# lazy: heavy import.Python 3.12 means we use built-in generics and union syntax natively. No from __future__ import annotations, no typing.List/typing.Dict.
# Yes
def process(items: list[str], config: dict[str, int] | None = None) -> tuple[int, str]: ...
# No
from typing import List, Dict, Optional, Tuple
def process(items: List[str], config: Optional[Dict[str, int]] = None) -> Tuple[int, str]: ...
What to annotate:
-> SomeResponse return types when the route doesn't use response_model.Imports from typing that are still needed (no built-in equivalent):
Literal, TypeAlias, Protocol, runtime_checkable, Callable, Any, ClassVar, TypeVar, overload, TYPE_CHECKING.
Use collections.abc for abstract types: Sequence, Mapping, Iterable, Iterator, Generator.
| Thing | Convention | Example |
|---|---|---|
| Module | snake_case | task_queue.py |
| Class | PascalCase | ProgressManager |
| Function / method | snake_case | create_profile |
| Variable | snake_case | sample_rate |
| Constant | UPPER_SNAKE_CASE | DEFAULT_SAMPLE_RATE |
| Private | _leading_underscore | _generation_queue |
| Type alias | PascalCase | EffectChain = list[dict[str, Any]] |
Specific conventions:
DB prefix alias: from .database import VoiceProfile as DBVoiceProfile.VoiceProfileCreate, VoiceProfileResponse, GenerationRequest.MLXTTSBackend, PyTorchSTTBackend.Google style. Required on all public functions, classes, and modules.
def combine_voice_prompts(
profile_dir: Path,
*,
target_sr: int = 24000,
) -> tuple[np.ndarray, int]:
"""Load and concatenate all voice prompt files for a profile.
Reads .wav/.mp3/.flac files from the profile directory, resamples to
the target sample rate, normalizes, and concatenates into a single array.
Args:
profile_dir: Path to the voice profile directory containing audio files.
target_sr: Target sample rate for the output. Defaults to 24000.
Returns:
Tuple of (concatenated audio array, sample rate).
Raises:
FileNotFoundError: If profile_dir does not exist.
ValueError: If no valid audio files are found.
"""
Short form is fine for simple functions:
def get_db_path() -> Path:
"""Get the path to the SQLite database file."""
When to skip: Private helpers under ~5 lines where the name and signature make intent obvious.
Module docstrings: A single sentence at the top of every file describing its purpose.
"""Voice profile CRUD operations."""
Comments explain why, not what. If the code needs a comment to explain what it does, the code should be rewritten to be clearer. The exceptions are non-obvious performance choices, external constraints, and concurrency/race-condition reasoning -- those always deserve a comment.
Do not use ASCII dividers to create visual sections in files:
# No -- any of these:
# ============================================
# GENERATION ENDPOINTS
# ============================================
# ---------------------------------------------------------------------------
# Device detection
# ---------------------------------------------------------------------------
# --- Load model --------------------------------------------------
If a file needs section dividers to be navigable, the file is too long. Split it into modules. Within a function, if you need labeled sections to follow the logic, extract those sections into named functions.
Inline comments (end-of-line) are fine when they add information the code can't express:
# Yes -- explains a non-obvious constraint or gives context:
audio, sr = load_audio(path, sr=24000) # Qwen expects 24kHz mono
_generation_queue: asyncio.Queue = None # type: ignore # initialized at startup
"tauri://localhost", # Tauri webview (macOS)
# No -- restates the code:
# Check if profile name already exists
existing = db.query(DBVoiceProfile).filter_by(name=data.name).first()
# Delete from database
db.delete(sample)
# Update fields
profile.name = data.name
Delete comments that narrate what the next line of code obviously does. If the function name, variable name, or method call already communicates intent, the comment is noise.
Use block comments for why explanations -- constraints, workarounds, non-obvious decisions:
# PyInstaller + multiprocessing: child processes re-execute the frozen binary
# with internal arguments. freeze_support() handles this and exits early.
multiprocessing.freeze_support()
# Mark any stale "generating" records as failed -- these are leftovers
# from a previous process that was killed mid-generation.
db.query(Generation).filter_by(status="generating").update({"status": "failed"})
Keep block comments tight. Two to three lines is normal. If you need a paragraph, it probably belongs in the docstring or a design doc.
Always add a reason after noqa and type: ignore:
import intel_extension_for_pytorch # noqa: F401 -- side-effect import enables XPU
_queue: asyncio.Queue = None # type: ignore[assignment] # initialized at startup
Bare # noqa or # type: ignore with no explanation are not allowed.
Use sparingly. Every TODO must include a brief description of what needs doing. Don't use them as a substitute for tracking work properly:
# TODO: replace with async SQLAlchemy once CRUD modules are migrated (Phase 5)
result = await asyncio.to_thread(profiles.get_profile, profile_id, db)
Never commit HACK, XXX, or FIXME -- fix the problem or file an issue.
Delete it. That's what git is for. If you need to document that something was intentionally removed, a short tombstone comment is acceptable:
# Removed config.json-only check -- too lenient, doesn't confirm weights exist.
The refactor is standardizing on a two-layer pattern:
CRUD modules and services raise ValueError, FileNotFoundError, or (post-refactor) custom exceptions defined in backend/errors.py:
# backend/errors.py (to be created in Phase 4)
class NotFoundError(Exception):
"""Raised when a requested resource does not exist."""
class ConflictError(Exception):
"""Raised on uniqueness constraint violations."""
# In a service or CRUD module:
raise NotFoundError(f"Profile {profile_id} not found")
Route handlers catch domain exceptions and convert:
@router.post("/profiles")
async def create_profile(data: VoiceProfileCreate, db: Session = Depends(get_db)):
try:
return await profiles.create_profile(data, db)
except ConflictError as e:
raise HTTPException(status_code=409, detail=str(e))
Background tasks catch Exception broadly, log with logger.exception(), and update the task status to "failed".
Never: silently swallow exceptions, use bare except:, or catch BaseException.
async def unless the function awaits something. Several service modules still declare async def without awaiting -- these should be migrated to sync functions with asyncio.to_thread() at the route layer, or to real async SQLAlchemy.asyncio.to_thread():
audio, sr = await asyncio.to_thread(load_audio, source_path)
services/task_queue.py). Never call a backend's generate() directly from a route handler.asyncio.create_task() and track the task reference to prevent garbage collection:
task = asyncio.create_task(some_coro())
_background_tasks.add(task)
task.add_done_callback(_background_tasks.discard)
Use the logging module. Not print().
import logging
logger = logging.getLogger(__name__)
logger.info("Loading model %s on %s", model_name, device)
logger.warning("Cache miss for %s, downloading", repo_id)
logger.exception("Generation %s failed") # logs traceback automatically
Rules:
%s-style placeholders in log calls (not f-strings). This avoids formatting the string if the log level is filtered out.logger.exception() inside except blocks -- it captures the traceback.__name__ (yields backend.utils.audio, etc.).print() calls should be migrated to logging as files are touched during the refactor.UPPER_SNAKE_CASE.backend/config.py after Phase 6 consolidation.# No
if len(audio) > 24000 * 60 * 10:
# Yes
MAX_AUDIO_DURATION_SAMPLES = SAMPLE_RATE * 60 * 10
if len(audio) > MAX_AUDIO_DURATION_SAMPLES:
*) for functions with 3+ parameters, especially when several share the same type:
def is_model_cached(
hf_repo: str,
*,
weight_extensions: tuple[str, ...] = (".safetensors", ".bin"),
required_files: list[str] | None = None,
) -> bool:
%s-style for logging calls (lazy evaluation)..format(): avoid; f-strings are preferred.Framework: pytest with pytest-asyncio.
test_<module>.py in backend/tests/.conftest.py for shared fixtures (db sessions, test client, mock backends).class TestProfileCRUD:.@pytest.mark.asyncio for async tests.@pytest.mark.parametrize to reduce repetition.tests/ but are clearly marked (filename prefix manual_ or documented in tests/README.md).backend/
app.py # FastAPI app factory, CORS, lifecycle events
main.py # Entry point (imports app, runs uvicorn)
config.py # Data directory paths
models.py # Pydantic request/response schemas
server.py # Tauri sidecar launcher, parent-pid watchdog
routes/ # Thin HTTP handlers (validation, delegation, response formatting)
services/ # Business logic, CRUD, orchestration
backends/ # TTS/STT engine implementations
database/ # ORM models, session management, migrations, seeds
utils/ # Shared utilities (audio, effects, caching, progress)
tests/ # pytest suite
pyproject.toml configures ruff for linting and formatting. Run:
# Lint (check)
ruff check backend/
# Lint (auto-fix)
ruff check backend/ --fix
# Format
ruff format backend/
Introduce ruff fixes file-by-file as you touch them. Don't run --fix across the entire codebase in one shot -- that creates unreviewable diffs.