optional-skills/creative/hyperframes/references/features.md
Load this file when a composition needs captions, TTS narration, audio-reactive visuals, marker-style text highlighting, or scene transitions. All patterns here are deterministic (no Math.random(), no Date.now(), no runtime audio analysis) and live on the same GSAP timeline as the rest of the composition.
Never use .en whisper models unless the audio is confirmed English. .en models TRANSLATE non-English audio into English instead of transcribing it.
npx hyperframes transcribe audio.mp3 --model small --language <code> (no .en)--model small.en--model small (auto-detects)If the user doesn't specify a caption style, detect it from the transcript tone:
| Tone | Font mood | Animation | Color | Size |
|---|---|---|---|---|
| Hype / launch | Heavy condensed, 800-900 | Scale-pop, back.out(1.7), 0.1-0.2s | Bright on dark | 72-96px |
| Corporate | Clean sans, 600-700 | Fade+slide, power3.out, 0.3s | White / neutral + muted accent | 56-72px |
| Tutorial | Mono / clean sans, 500-600 | Typewriter or fade, 0.4-0.5s | High contrast, minimal | 48-64px |
| Storytelling | Serif / elegant, 400-500 | Slow fade, power2.out, 0.5-0.6s | Warm muted tones | 44-56px |
| Social | Rounded sans, 700-800 | Bounce, elastic.out, word-by-word | Playful, colored pills | 56-80px |
Break on sentence boundaries, 150ms+ pauses, or a max word count.
position: absolute (never relative). One caption group visible at a time.Use the runtime helper so captions never overflow:
const result = window.__hyperframes.fitTextFontSize(group.text.toUpperCase(), {
fontFamily: "Outfit",
fontWeight: 900,
maxWidth: 1600, // 1600 landscape, 900 portrait
});
el.style.fontSize = result.fontSize + "px";
When per-word styling uses scale > 1.0, compute maxWidth = safeWidth / maxScale to leave headroom. Container needs overflow: visible (not hidden — hidden clips scaled emphasis words and glow).
Every group MUST have a hard kill after its exit tween — otherwise groups leak into later ones:
tl.to(groupEl, { opacity: 0, scale: 0.95, duration: 0.12, ease: "power2.in" }, group.end - 0.12);
tl.set(groupEl, { opacity: 0, visibility: "hidden" }, group.end); // deterministic kill
Scan the transcript for words that deserve distinct treatment:
Local, no API key. Runs on CPU. Model downloads on first use (~311 MB + ~27 MB voices, cached in ~/.cache/hyperframes/tts/).
| Content type | Voice | Why |
|---|---|---|
| Product demo | af_heart / af_nova | Warm, professional |
| Tutorial | am_adam / bf_emma | Neutral, easy to follow |
| Marketing | af_sky / am_michael | Energetic or authoritative |
| Documentation | bf_emma / bm_george | Clear British English |
| Casual | af_heart / af_sky | Approachable, natural |
Run npx hyperframes tts --list for all 54 voices across 8 languages.
Voice ID first letter encodes language: a=American English, b=British English, e=Spanish, f=French, h=Hindi, i=Italian, j=Japanese, p=Brazilian Portuguese, z=Mandarin. The CLI auto-infers the phonemizer locale from that prefix — you don't need --lang when voice and text match.
npx hyperframes tts "La reunión empieza a las nueve" --voice ef_dora --output es.wav
npx hyperframes tts "今日はいい天気ですね" --voice jf_alpha --output ja.wav
Pass --lang only to override auto-detection (e.g. stylized accents):
npx hyperframes tts "Hello there" --voice af_heart --lang fr-fr --output accented.wav
Valid --lang codes: en-us, en-gb, es, fr-fr, hi, it, pt-br, ja, zh. Non-English phonemization requires espeak-ng installed system-wide (apt-get install espeak-ng / brew install espeak-ng).
0.7-0.8 — tutorial, complex content1.0 — natural (default)1.1-1.2 — intros, upbeat content1.5+ — rarely appropriatenpx hyperframes tts script.txt --voice af_heart --output narration.wav
npx hyperframes transcribe narration.wav # → transcript.json (word-level)
Drive visuals from music, voice, or sound. Any GSAP-tweenable property can respond to pre-extracted audio data.
const AUDIO_DATA = {
fps: 30,
totalFrames: 900,
frames: [{ bands: [0.82, 0.45, 0.31, /* ... */] }, /* ... */],
};
frames[i].bands[] are frequency band amplitudes, 0-1. Index 0 = bass, higher indices = treble. Each band is normalized independently across the full track.
| Audio signal | Visual property | Effect |
|---|---|---|
Bass (bands[0]) | scale | Pulse on beat |
Treble (bands[12-14]) | textShadow, boxShadow | Glow intensity |
| Overall amplitude | opacity, y, backgroundColor | Breathe, lift, color shift |
Mid-range (bands[4-8]) | borderRadius, width | Shape morphing |
Any GSAP-tweenable property works — clipPath, filter, SVG attributes, CSS custom properties. Let content guide the visual and let audio drive its behavior. Never add equalizer bars, spectrum analyzers, waveform displays, rainbow cycling, or generic particle systems — they look cheap.
Audio reactivity needs per-frame sampling via a for loop of tl.call(), NOT a single tween. A single long tween does NOT react to audio:
for (let f = 0; f < AUDIO_DATA.totalFrames; f++) {
tl.call(
((frame) => () => draw(frame))(AUDIO_DATA.frames[f]),
[],
f / AUDIO_DATA.fps,
);
}
rgba(255,255,255,0.3)) renders a visible glow rectangle behind every child. Apply the glow to active words individually, not to the container.Deterministic CSS + GSAP implementations of the classic "highlight / circle / burst / scribble / sketchout" drawing modes for emphasizing text. Fully seekable — no animated SVG filters, no JS timers.
<span class="mh-highlight-wrap">
<span class="mh-highlight-bar" id="hl-1"></span>
<span class="mh-highlight-text">highlighted text</span>
</span>
.mh-highlight-wrap { position: relative; display: inline; }
.mh-highlight-bar {
position: absolute; inset: 0 -6px;
background: #fdd835; opacity: 0.35;
transform: scaleX(0); transform-origin: left center;
border-radius: 3px; z-index: 0;
}
.mh-highlight-text { position: relative; z-index: 1; }
tl.to("#hl-1", { scaleX: 1, duration: 0.5, ease: "power2.out" }, 0.6);
Multi-line: apply to .mh-highlight-bar with stagger: 0.3.
Hand-drawn ellipse around a word. Use a positioned ::before with border-radius: 50%, slight rotation, and clip-path to avoid covering the letters. Animate clip-path or stroke-dashoffset on an inline SVG circle.
Short radiating lines around a word. Render 6-12 small <span> elements positioned in a radial pattern; animate scaleY from 0.
A chaotic overlay created by animating stroke-dashoffset on an inline SVG <path> with a d attribute describing a zig-zag. Seed values, never Math.random().
A rough rectangle outline. Two <rect>s with slight transform offsets, animated via stroke-dashoffset.
All five modes tween CSS transforms or stroke-dashoffset only — both tween cleanly, are deterministic, and seek correctly.
Every multi-scene composition MUST use transitions. No jump cuts.
| Energy | CSS primary | Shader primary | Accent | Duration | Easing |
|---|---|---|---|---|---|
| Calm (wellness, brand, luxury) | Blur crossfade, focus pull | Cross-warp morph, thermal distortion | Light leak, circle iris | 0.5-0.8s | sine.inOut, power1 |
| Medium (corporate, SaaS) | Push slide, staggered blocks | Whip pan, cinematic zoom | Squeeze, vertical push | 0.3-0.5s | power2, power3 |
| High (promos, sports, launch) | Zoom through, overexposure | Ridged burn, glitch, chromatic split | Staggered blocks, gravity drop | 0.15-0.3s | power4, expo |
Pick ONE primary (60-70% of scene changes) plus 1-2 accents. Never use a different transition for every scene.
| Mood | Transitions |
|---|---|
| Warm / inviting | Light leak, blur crossfade, focus pull, film burn · Shader: thermal distortion, cross-warp morph |
| Cold / clinical | Squeeze, zoom out, blinds, shutter, grid dissolve · Shader: gravitational lens |
| Editorial / magazine | Push slide, vertical push, diagonal split, shutter · Shader: whip pan |
| Tech / futuristic | Grid dissolve, staggered blocks, blinds · Shader: glitch, chromatic split |
| Tense / edgy | Glitch, VHS, chromatic aberration, ripple · Shader: ridged burn, domain warp |
| Playful / fun | Elastic push, 3D flip, circle iris, morph circle · Shader: swirl vortex, ripple waves |
| Dramatic / cinematic | Zoom through, gravity drop, overexposure · Shader: cinematic zoom, gravitational lens |
| Premium / luxury | Focus pull, blur crossfade, color dip to black · Shader: cross-warp morph |
| Retro / analog | Film burn, light leak, VHS, clock wipe · Shader: light leak |
| Preset | Duration | Easing |
|---|---|---|
snappy | 0.2s | power4.inOut |
smooth | 0.4s | power2.inOut |
gentle | 0.6s | sine.inOut |
dramatic | 0.5s | power3.in → out |
instant | 0.15s | expo.inOut |
luxe | 0.7s | power1.inOut |
npx hyperframes add flash-through-white
npx hyperframes add --list
clip-path, and filters. Simpler to set up.@hyperframes/shader-transitions instead of writing raw GLSL.Don't mix CSS and shader transitions in the same composition — once a composition uses shader transitions, the WebGL canvas replaces DOM-based scene switching for every transition.
Shader transitions capture DOM scenes to WebGL textures via html2canvas. The canvas 2D pipeline doesn't match CSS exactly:
transparent keyword in gradients — use the target color at zero alpha: rgba(200,117,51,0) not transparent. (Canvas interpolates transparent as rgba(0,0,0,0) creating dark fringes.)background-color on thin accent lines.var()) on elements visible during capture — html2canvas doesn't reliably resolve custom properties. Use literal color values.data-no-capture — they stay on the live DOM but are absent from the shader texture..scene div must have explicit background-color, AND pass the same color as bgColor in the init() config. Without either, the texture renders as black.These rules only apply to shader transition compositions. CSS-only compositions have no restrictions.