Audacity: Project-Specific Analysis & SOP

audacity/agent-harness/AUDACITY.md

Architecture Summary

Audacity is a multi-platform audio editor built on PortAudio for I/O and libsndfile for file format support. Its native .aup3 format is a SQLite database containing audio data and project metadata.

+-------------------------------------------------+
|                  Audacity GUI                   |
|  +----------+  +----------+  +---------------+  |
|  | Timeline |  |  Mixer   |  |    Effects    |  |
|  | (wxGTK)  |  | (wxGTK)  |  |    (wxGTK)    |  |
|  +----+-----+  +----+-----+  +-------+-------+  |
|       |             |                |          |
|  +----+-------------+----------------+-------+  |
|  |           Internal Audio Engine           |  |
|  |  Block-based audio storage, real-time     |  |
|  |  processing, effect chain, undo history   |  |
|  +---------------------+---------------------+  |
+------------------------+------------------------+
                         |
       +-----------------+------------+
       | PortAudio (I/O) | libsndfile |
       | SoX resampler   | LAME (MP3) |
       +-----------------+------------+

CLI Strategy: Python stdlib + JSON Project

Unlike applications whose project files are plain XML, Audacity stores projects in .aup3, a SQLite database, which makes direct manipulation complex. The strategy has three parts:

  1. A JSON project file tracks all state (tracks, clips, effects, labels)
  2. The Python stdlib (wave, struct, math) handles WAV I/O and audio processing
  3. pydub (optional) adds support for further formats (MP3, FLAC, OGG)
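
As a rough illustration of item 2, the sketch below reads and writes 16-bit PCM WAV files using only `wave` and `struct`. The function names `read_wav_16` and `write_wav_16` are hypothetical, and the float scaling (divide by 32768 on read, multiply by 32767 on write) is an assumption, not necessarily what the harness does.

```python
import struct
import wave


def read_wav_16(path):
    """Read a 16-bit PCM WAV file; return (frames, sample_rate, channels).

    frames is a list of per-frame tuples of floats in [-1.0, 1.0).
    """
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = wf.getnchannels()
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    # WAV PCM data is little-endian; "h" is a signed 16-bit integer.
    ints = struct.unpack("<%dh" % (len(raw) // 2), raw)
    floats = [s / 32768.0 for s in ints]
    frames = [tuple(floats[i:i + channels])
              for i in range(0, len(floats), channels)]
    return frames, rate, channels


def write_wav_16(path, frames, rate, channels):
    """Write frames (tuples of floats) as 16-bit PCM, clamping each sample."""
    ints = []
    for frame in frames:
        for s in frame:
            s = max(-1.0, min(1.0, s))
            ints.append(int(round(s * 32767)))
    with wave.open(path, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(2)  # bytes per sample
        wf.setframerate(rate)
        wf.writeframes(struct.pack("<%dh" % len(ints), *ints))
```

Interleaved samples are grouped into per-frame tuples so that stereo data keeps left and right together; a flat list per channel would work equally well.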

Why Not .aup3 Directly?

The .aup3 format is a SQLite database with:

  • Binary audio block storage (custom compression)
  • Complex relational schema for tracks, clips, envelopes
  • Undo history embedded in the database
  • Project metadata interleaved with audio data

Parsing and writing this format requires deep knowledge of Audacity internals. Instead, we use a JSON manifest and render to standard audio formats.
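
Because an .aup3 file is an ordinary SQLite database, it can at least be inspected with the stdlib `sqlite3` module. The sketch below lists table names read-only; `list_tables` is a hypothetical helper, and interpreting the tables it finds would still require the Audacity-specific knowledge described above.

```python
import sqlite3
from pathlib import Path


def list_tables(path):
    """Return the table names in a SQLite file, opened read-only."""
    # A file: URI with mode=ro avoids creating or modifying the database.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```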

The Project Format (.audacity-cli.json)

json
{
  "version": "1.0",
  "name": "my_podcast",
  "settings": {
    "sample_rate": 44100,
    "bit_depth": 16,
    "channels": 2
  },
  "tracks": [...],
  "labels": [...],
  "selection": {"start": 0.0, "end": 0.0},
  "metadata": {"title": "", "artist": "", "album": "", ...}
}
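
A minimal sketch of creating, saving, and loading such a manifest with the stdlib `json` module follows. The helper names (`new_project`, `save_project`, `load_project`) and the required-key check are assumptions for illustration; only the metadata keys visible above are filled in, since the source elides the rest.

```python
import json

# Top-level keys taken from the example above.
REQUIRED_KEYS = {"version", "name", "settings", "tracks",
                 "labels", "selection", "metadata"}


def new_project(name, sample_rate=44100, bit_depth=16, channels=2):
    """Build an empty project dict with the top-level keys shown above."""
    return {
        "version": "1.0",
        "name": name,
        "settings": {
            "sample_rate": sample_rate,
            "bit_depth": bit_depth,
            "channels": channels,
        },
        "tracks": [],
        "labels": [],
        "selection": {"start": 0.0, "end": 0.0},
        # Only the keys visible in the example; others are not guessed here.
        "metadata": {"title": "", "artist": "", "album": ""},
    }


def save_project(project, path):
    """Write the project as indented JSON."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(project, fh, indent=2)


def load_project(path):
    """Load a project and check that the expected top-level keys exist."""
    with open(path, "r", encoding="utf-8") as fh:
        project = json.load(fh)
    missing = REQUIRED_KEYS - set(project)
    if missing:
        raise ValueError(f"project file is missing keys: {sorted(missing)}")
    return project
```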

Command Map: GUI Action -> CLI Command

| GUI Action | CLI Command |
|---|---|
| File -> New | project new --name "My Project" |
| File -> Open | project open <path> |
| File -> Save | project save [path] |
| File -> Export Audio | export render <output> [--preset wav] |
| Tracks -> Add New -> Audio | track add --name "Track" |
| Track -> Remove | track remove <index> |
| Track -> Mute/Solo | track set <index> mute true |
| Track -> Volume | track set <index> volume 0.8 |
| Track -> Pan | track set <index> pan -0.5 |
| File -> Import -> Audio | clip add <track> <file> |
| Edit -> Remove | clip remove <track> <clip> |
| Edit -> Clip Boundaries -> Split | clip split <track> <clip> <time> |
| Edit -> Move Clip | clip move <track> <clip> <time> |
| Effect -> Amplify | effect add amplify --track 0 -p gain_db=6.0 |
| Effect -> Normalize | effect add normalize --track 0 -p target_db=-1.0 |
| Effect -> Fade In | effect add fade_in --track 0 -p duration=2.0 |
| Effect -> Reverse | effect add reverse --track 0 |
| Effect -> Echo | effect add echo --track 0 -p delay_ms=500 -p decay=0.5 |
| Edit -> Select All | selection all |
| Edit -> Labels -> Add Label | label add 5.0 --text "Marker" |
| Edit -> Undo | session undo |
| Edit -> Redo | session redo |
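
To show how the repeated `-p key=value` options in the effect commands could be parsed, here is a sketch using `argparse`. The source does not say which CLI framework the harness uses, so the parser layout, the `audacity-cli` program name, and the `parse_param` helper are all assumptions; only the `effect add` subcommand is modelled.

```python
import argparse


def parse_param(text):
    """Split 'key=value'; numeric values become floats, others stay strings."""
    key, sep, value = text.partition("=")
    if not sep or not key:
        raise argparse.ArgumentTypeError(f"expected key=value, got {text!r}")
    try:
        return key, float(value)
    except ValueError:
        return key, value


def build_parser():
    parser = argparse.ArgumentParser(prog="audacity-cli")  # hypothetical name
    groups = parser.add_subparsers(dest="group", required=True)
    effect = groups.add_parser("effect")
    actions = effect.add_subparsers(dest="action", required=True)
    add = actions.add_parser("add")
    add.add_argument("name")
    add.add_argument("--track", type=int, required=True)
    # Each -p occurrence appends one (key, value) tuple to args.params.
    add.add_argument("-p", "--param", dest="params", action="append",
                     type=parse_param, default=[])
    return parser


args = build_parser().parse_args(
    ["effect", "add", "echo", "--track", "0",
     "-p", "delay_ms=500", "-p", "decay=0.5"]
)
params = dict(args.params)
```

After parsing, `params` holds the effect parameters as a plain dict keyed by parameter name, ready to be checked against the effect registry.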

Effect Registry

| CLI Name | Category | Key Parameters |
|---|---|---|
| amplify | volume | gain_db (-60 to 60) |
| normalize | volume | target_db (-60 to 0) |
| fade_in | fade | duration (0.01-300 s) |
| fade_out | fade | duration (0.01-300 s) |
| reverse | transform | (none) |
| silence | generate | duration (0.01-3600 s) |
| tone | generate | frequency, duration, amplitude |
| change_speed | transform | factor (0.1-10.0) |
| change_pitch | transform | semitones (-24 to 24) |
| change_tempo | transform | factor (0.1-10.0) |
| echo | delay | delay_ms, decay |
| low_pass | eq | cutoff (20-20000 Hz) |
| high_pass | eq | cutoff (20-20000 Hz) |
| compress | dynamics | threshold, ratio, attack, release |
| limit | dynamics | threshold_db (-60 to 0) |
| noise_reduction | restoration | reduction_db (0-48) |
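
One way to hold such a registry in code is a dict mapping each effect name to its category and parameter ranges, with a validator that rejects unknown effects, unknown parameters, and out-of-range values. The sketch below covers only a few rows of the table; the `EFFECTS` structure and `validate_effect` function are hypothetical, not the harness's actual API.

```python
# A subset of the registry; ranges are copied from the table above.
EFFECTS = {
    "amplify": {"category": "volume", "params": {"gain_db": (-60.0, 60.0)}},
    "normalize": {"category": "volume", "params": {"target_db": (-60.0, 0.0)}},
    "fade_in": {"category": "fade", "params": {"duration": (0.01, 300.0)}},
    "reverse": {"category": "transform", "params": {}},
    "low_pass": {"category": "eq", "params": {"cutoff": (20.0, 20000.0)}},
}


def validate_effect(name, params):
    """Return a copy of params if every value is within its declared range."""
    if name not in EFFECTS:
        raise ValueError(f"unknown effect: {name}")
    spec = EFFECTS[name]["params"]
    for key, value in params.items():
        if key not in spec:
            raise ValueError(f"{name} has no parameter {key!r}")
        low, high = spec[key]
        if not low <= value <= high:
            raise ValueError(
                f"{name}.{key}={value} is outside [{low}, {high}]"
            )
    return dict(params)
```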

Export Formats

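| Preset | Format | Bit Depth | Notes |
|---|---|---|---|
| wav | WAV | 16-bit | Standard, native support |
| wav-24 | WAV | 24-bit | High quality |
| wav-32 | WAV | 32-bit | Studio quality |
| wav-8 | WAV | 8-bit | Low quality |
| mp3 | MP3 | | Requires pydub/ffmpeg |
| flac | FLAC | | Requires pydub/ffmpeg |
| ogg | OGG | | Requires pydub/ffmpeg |
| aiff | AIFF | | Requires pydub/ffmpeg |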
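
The WAV presets differ mainly in how float samples are quantized. The sketch below maps each preset to a sample width in bytes and encodes floats as little-endian integer PCM. It assumes the 32-bit preset means integer PCM (the stdlib `wave` module writes integer PCM only) and symmetric scaling; `PRESET_SAMPLE_WIDTH` and `encode_pcm` are hypothetical names.

```python
# Bytes per sample for each WAV preset in the table above.
PRESET_SAMPLE_WIDTH = {"wav-8": 1, "wav": 2, "wav-24": 3, "wav-32": 4}


def encode_pcm(samples, sample_width):
    """Encode floats in [-1.0, 1.0] as little-endian integer PCM bytes."""
    out = bytearray()
    for s in samples:
        s = max(-1.0, min(1.0, s))
        if sample_width == 1:
            # 8-bit WAV is unsigned, with silence at 128.
            out.append(int(round(s * 127)) + 128)
        else:
            # Wider WAV samples are signed two's complement.
            full_scale = (1 << (8 * sample_width - 1)) - 1
            value = int(round(s * full_scale))
            out += value.to_bytes(sample_width, "little", signed=True)
    return bytes(out)
```

The resulting bytes can be passed to `writeframes` on a `wave` writer whose `setsampwidth` was given the same width.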

Rendering Pipeline

  1. For each non-muted track (respecting solo):

    a. For each clip: read source WAV, apply trim, place at timeline position
    b. Mix overlapping clips on the same track
    c. Apply track effects chain in order
    d. Apply track volume
  2. Mix all tracks together (with pan and volume)
  3. Clamp to [-1.0, 1.0]
  4. Write to output format
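
Steps 2 and 3 can be sketched as follows for mono track buffers mixed to stereo. Several details are assumptions: the track dict keys (`samples`, `volume`, `pan`, `mute`, `solo`), the solo rule (if any track is soloed, only soloed tracks play, and muted tracks are always skipped), and the constant-power pan law. Volume is applied once, during the mix.

```python
import math


def select_tracks(tracks):
    """Pick audible tracks: soloed ones if any exist, minus muted ones."""
    soloed = [t for t in tracks if t.get("solo")]
    pool = soloed if soloed else tracks
    return [t for t in pool if not t.get("mute")]


def pan_gains(pan):
    """Constant-power pan: -1 is hard left, 0 is centre, +1 is hard right."""
    angle = (pan + 1.0) * math.pi / 4.0
    return math.cos(angle), math.sin(angle)


def clamp(x):
    return max(-1.0, min(1.0, x))


def mix_tracks(tracks):
    """Sum mono track buffers into a list of clamped (left, right) frames."""
    active = select_tracks(tracks)
    length = max((len(t["samples"]) for t in active), default=0)
    left = [0.0] * length
    right = [0.0] * length
    for t in active:
        gain_l, gain_r = pan_gains(t.get("pan", 0.0))
        volume = t.get("volume", 1.0)
        for i, s in enumerate(t["samples"]):
            left[i] += s * volume * gain_l
            right[i] += s * volume * gain_r
    # Hard clamp to [-1.0, 1.0], as in step 3.
    return [(clamp(l), clamp(r)) for l, r in zip(left, right)]
```

Hard clamping is simple but distorts on overload; normalizing or limiting before the mix are alternatives if clipping becomes audible.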

Rendering Gap Assessment: Medium

  • WAV I/O works natively via Python's wave module
  • Basic effects (gain, fade, reverse, echo, filters) implemented in pure Python
  • Advanced effects (pitch shift, time stretch) use simplified algorithms
  • MP3/FLAC/OGG export requires external tools (pydub + ffmpeg)
  • No real-time preview capability
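
For a sense of what "pure Python" means for the basic effects, the sketch below implements gain, fade-in, echo, and reverse on lists of floats. These are simplified assumptions, not the harness's actual code: the fade is linear, and the echo is a single delayed copy with no feedback.

```python
def amplify(samples, gain_db):
    """Scale samples by a gain in decibels (20 * log10 amplitude ratio)."""
    factor = 10.0 ** (gain_db / 20.0)
    return [s * factor for s in samples]


def fade_in(samples, duration, sample_rate):
    """Apply a linear ramp from 0 to 1 over the first `duration` seconds."""
    n = min(len(samples), int(duration * sample_rate))
    out = list(samples)
    for i in range(n):
        out[i] *= i / n
    return out


def echo(samples, delay_ms, decay, sample_rate):
    """Add one delayed, attenuated copy; output is longer by the delay."""
    delay = int(sample_rate * delay_ms / 1000.0)
    out = list(samples) + [0.0] * delay
    for i, s in enumerate(samples):
        out[i + delay] += s * decay
    return out


def reverse(samples):
    """Return the samples in reverse order."""
    return samples[::-1]
```

Per-sample Python loops like these are slow on long recordings, which is one reason the rendering gap is rated medium rather than low.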

Test Coverage

  1. Unit tests (test_core.py): 60+ tests, synthetic data

    • Project CRUD and settings
    • Track add/remove/properties
    • Clip add/remove/split/move/trim
    • Effect registry, validation, add/remove/set
    • Label add/remove/list
    • Selection set/all/none
    • Session undo/redo
    • Audio utility functions
  2. E2E tests (test_full_e2e.py): 40+ tests, real WAV files

    • WAV read/write round-trips (16-bit, 24-bit, stereo)
    • Audio effect verification (gain, fade, reverse, echo, filters)
    • Full render pipeline (single track, multi-track, mute, solo)
    • Project save/load with effects preserved
    • Multi-step workflows (podcast creation)
    • CLI subprocess invocation
    • Media probing
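
A test in the spirit of the categories above might look like the sketch below: it generates a synthetic sine tone, round-trips it through a 16-bit WAV file in a temporary directory, and checks a simple effect invariant. The class and helper names are hypothetical and are not taken from test_core.py or test_full_e2e.py.

```python
import math
import os
import struct
import tempfile
import unittest
import wave


def sine(freq, duration, rate=8000, amplitude=0.5):
    """Generate a mono sine tone as a list of floats."""
    n = int(duration * rate)
    return [amplitude * math.sin(2 * math.pi * freq * i / rate)
            for i in range(n)]


class SyntheticAudioTest(unittest.TestCase):
    def test_16_bit_round_trip(self):
        ints = [int(round(s * 32767)) for s in sine(440.0, 0.1)]
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "tone.wav")
            with wave.open(path, "wb") as wf:
                wf.setnchannels(1)
                wf.setsampwidth(2)
                wf.setframerate(8000)
                wf.writeframes(struct.pack("<%dh" % len(ints), *ints))
            with wave.open(path, "rb") as wf:
                self.assertEqual(wf.getframerate(), 8000)
                self.assertEqual(wf.getnframes(), len(ints))
                raw = wf.readframes(wf.getnframes())
        # Integer samples should survive the round trip bit-for-bit.
        self.assertEqual(list(struct.unpack("<%dh" % len(ints), raw)), ints)

    def test_reverse_twice_is_identity(self):
        samples = sine(220.0, 0.05)
        self.assertEqual(samples[::-1][::-1], samples)
```

Saved to a file, a module like this can be run with `python -m unittest`.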