# Flash Attention Integration

`v3/@claude-flow/performance/docs/ATTENTION.md`

Integration of @ruvector/attention Flash Attention capabilities into the V3 performance module.

This module provides high-performance attention mechanisms optimized for V3's 2.49x-7.47x speedup targets. Flash Attention reduces memory usage by ~50% while delivering significant performance gains through block-wise computation.
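For intuition, here is a minimal baseline scaled dot-product attention for a single query, written as a plain TypeScript sketch (not the package's implementation). The baseline materializes one score per key, which is exactly the memory cost that Flash Attention's block-wise computation with a running softmax avoids:

```typescript
// Illustrative baseline: standard scaled dot-product attention for one query.
// It stores a full score vector (one f32 per key); Flash Attention instead
// processes keys in fixed-size blocks so memory stays bounded by block size.
function baselineAttention(
  query: Float32Array,
  keys: Float32Array[],
  values: Float32Array[],
): Float32Array {
  const dim = query.length;
  const scale = 1 / Math.sqrt(dim);

  // 1. Scaled dot-product scores, one per key.
  const scores = keys.map((k) => {
    let dot = 0;
    for (let i = 0; i < dim; i++) dot += query[i] * k[i];
    return dot * scale;
  });

  // 2. Numerically stable softmax over the scores.
  const max = Math.max(...scores);
  const exps = scores.map((s) => Math.exp(s - max));
  const sum = exps.reduce((a, b) => a + b, 0);

  // 3. Weighted sum of the value vectors.
  const out = new Float32Array(dim);
  for (let j = 0; j < values.length; j++) {
    const w = exps[j] / sum;
    for (let i = 0; i < dim; i++) out[i] += w * values[j][i];
  }
  return out;
}
```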
## Installation

The @ruvector/attention package is already installed as a dependency:

```bash
npm install @ruvector/attention@latest
```
## Quick Start

```typescript
import { createFlashAttentionOptimizer } from '@claude-flow/performance';

// Create optimizer (dim = 512, blockSize = 64)
const optimizer = createFlashAttentionOptimizer(512, 64);

// Prepare input: one query vector plus parallel arrays of keys and values
const input = {
  query: new Float32Array(512).fill(1.0),
  keys: Array.from({ length: 100 }, () => new Float32Array(512).fill(1.0)),
  values: Array.from({ length: 100 }, () => new Float32Array(512).fill(1.0)),
};

// Run optimized attention
const output = await optimizer.optimize(input);
console.log(`Execution time: ${output.executionTimeMs}ms`);
console.log(`Runtime: ${output.runtime}`); // 'napi', 'wasm', or 'js'
```
## Benchmarking

```typescript
import { quickBenchmark } from '@claude-flow/performance';

// Quick benchmark at dim = 512
const result = await quickBenchmark(512);
console.log(`Speedup: ${result.speedup.toFixed(2)}x`);
console.log(`Meets target: ${result.meetsTarget ? 'YES' : 'NO'}`);
```
```typescript
import { quickValidation } from '@claude-flow/performance';

// Validate V3 performance targets (2.49x-7.47x)
const isValid = await quickValidation();
// Prints a detailed validation report
```
```typescript
import { runAndDisplaySuite } from '@claude-flow/performance';

// Run the full benchmark suite across multiple dimensions
const suite = await runAndDisplaySuite();
// Prints a detailed report with all benchmarks
```
## API Reference

### FlashAttentionOptimizer

Main class for optimizing attention computations.

```typescript
new FlashAttentionOptimizer(dim?: number, blockSize?: number)
```

- `dim`: Vector dimension (default: 512)
- `blockSize`: Flash Attention block size (default: 64)

#### optimize(input)

Optimize an attention computation using Flash Attention.
```typescript
const output = await optimizer.optimize({
  query: Float32Array,
  keys: Float32Array[],
  values: Float32Array[],
});
```
#### benchmark()

Run a comprehensive benchmark comparing Flash Attention against the baseline.

```typescript
const result = await optimizer.benchmark();
console.log(result.speedup); // e.g., 4.23
```
#### getSpeedup()

Get the current average speedup from accumulated metrics.

```typescript
const speedup = optimizer.getSpeedup();
```
#### getMetrics()

Get detailed performance metrics.

```typescript
const metrics = optimizer.getMetrics();
console.log(metrics.averageSpeedup);
console.log(metrics.peakSpeedup);
console.log(metrics.successRate);
```
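As an illustration of how aggregate metrics like these can be derived, the hypothetical sketch below summarizes per-run samples. The field names mirror `getMetrics()`, but the sample shape (`RunSample`) and the `summarize` helper are assumptions for illustration, not the FlashAttentionOptimizer internals:

```typescript
// Hypothetical accumulation of per-run benchmark samples into summary metrics.
interface RunSample {
  speedup: number;   // baselineMs / flashMs for one run
  succeeded: boolean;
}

function summarize(samples: RunSample[]) {
  const ok = samples.filter((s) => s.succeeded);
  const speedups = ok.map((s) => s.speedup);
  return {
    // Mean speedup over successful runs (0 if none).
    averageSpeedup: speedups.reduce((a, b) => a + b, 0) / (speedups.length || 1),
    // Best single-run speedup observed.
    peakSpeedup: speedups.length ? Math.max(...speedups) : 0,
    // Fraction of runs that completed successfully.
    successRate: samples.length ? ok.length / samples.length : 0,
  };
}
```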
### AttentionBenchmarkRunner

Comprehensive benchmark suite runner.

#### runComprehensiveSuite()

Run benchmarks across multiple dimensions (128, 256, 512, 768, 1024).

```typescript
const runner = new AttentionBenchmarkRunner();
const suite = await runner.runComprehensiveSuite();
```
#### runComparison()

Run a single benchmark comparing Flash Attention against the baseline.

```typescript
const result = await runner.runComparison(512, 100, 1000);
```
#### runMemoryProfile()

Profile memory usage across different dimensions.

```typescript
const profiles = await runner.runMemoryProfile([256, 512, 1024]);
```
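As a back-of-the-envelope model of where the savings come from, the sketch below compares the working-set cost of a full score vector against one block of scores. This is an assumption for illustration only; it is not the package's actual memory accounting behind `totalMemorySavedBytes`:

```typescript
// Rough memory model (illustrative assumption, not the package's accounting):
// the baseline holds one f32 score and one f32 softmax probability per key,
// while block-wise Flash Attention only holds one block's worth of each.
function estimateSavedBytes(numKeys: number, blockSize: number): number {
  const bytesPerF32 = 4;
  const baseline = 2 * numKeys * bytesPerF32;  // scores + softmax probs
  const flash = 2 * blockSize * bytesPerF32;   // one block at a time
  return Math.max(0, baseline - flash);
}
```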
#### validateV3Targets()

Validate against the V3 performance targets (2.49x-7.47x).

```typescript
const validation = await runner.validateV3Targets();
console.log(validation.meetsMinimum); // true if ≥2.49x
```
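The `meetsMinimum` flag boils down to comparing a measured speedup against the targets quoted throughout this document. A minimal sketch of that check (the real report contains more detail; `checkV3Target` and `meetsStretch` are illustrative names):

```typescript
// V3 targets quoted in this document: 2.49x minimum, 7.47x upper target.
const V3_MIN_SPEEDUP = 2.49;
const V3_STRETCH_SPEEDUP = 7.47;

// Returns whether a measured speedup clears each threshold.
function checkV3Target(speedup: number) {
  return {
    meetsMinimum: speedup >= V3_MIN_SPEEDUP,
    meetsStretch: speedup >= V3_STRETCH_SPEEDUP,
  };
}
```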
## Performance Targets

The V3 module targets Flash Attention speedups in the 2.49x-7.47x range, with roughly 50% lower memory usage than baseline attention.
## Examples

See `/src/examples/flash-attention-demo.ts` for comprehensive examples:

```bash
# Run all examples
npx tsx v3/@claude-flow/performance/src/examples/flash-attention-demo.ts
```
## Runtime Selection

The optimizer automatically selects the best available runtime: native bindings (`napi`) when present, then WebAssembly (`wasm`), then pure JavaScript (`js`).
## Memory Efficiency

Flash Attention achieves its ~50% memory reduction through block-wise computation: keys and values are processed in fixed-size blocks with a running softmax, so the full attention score vector is never materialized at once.
## Benchmarks

Benchmarks measure execution time, speedup over the baseline implementation, and memory usage across vector dimensions.
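The core speedup arithmetic such a comparison relies on can be sketched as follows: time both implementations and divide. This uses only the standard `performance.now()` timer; the package's own benchmark harness presumably adds warmup runs and repetition:

```typescript
// Average wall-clock milliseconds per call over a number of iterations.
function timeMs(fn: () => void, iterations: number): number {
  const start = performance.now();
  for (let i = 0; i < iterations; i++) fn();
  return (performance.now() - start) / iterations;
}

// Speedup convention used in this document: baseline time / optimized time,
// so 4.0 means the optimized path is four times faster.
function computeSpeedup(baselineMs: number, optimizedMs: number): number {
  return baselineMs / optimizedMs;
}
```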
## Metrics Dashboard

Performance metrics are automatically exported for the V3 metrics dashboard:

```typescript
import { FlashAttentionOptimizer } from '@claude-flow/performance';

const optimizer = new FlashAttentionOptimizer();
// ... run operations ...

// Export metrics for the dashboard
const metrics = optimizer.getMetrics();
// Can be integrated with the hooks metrics system
```
## Tuning Tips

- Increase the `dim` parameter (larger dimensions benefit more).
- Use more keys: a higher `numKeys` means more benefit.
- Lower the `blockSize` for a smaller memory footprint.
- Track savings via `getMetrics().totalMemorySavedBytes`.

## Native Bindings

The package includes native bindings and falls back to WebAssembly or JavaScript if they are unavailable.
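A hedged sketch of such a napi → wasm → js fallback chain, matching the runtime names reported by `output.runtime` above. The probe functions are hypothetical stand-ins for however the package actually tests each backend:

```typescript
// Runtime names as reported by output.runtime in the Quick Start example.
type Runtime = 'napi' | 'wasm' | 'js';

// Try each backend probe in preference order; a probe that throws or returns
// false (e.g. a native binding that fails to load) falls through to the next.
function selectRuntime(
  probes: Partial<Record<Runtime, () => boolean>>,
): Runtime {
  const order: Runtime[] = ['napi', 'wasm', 'js'];
  for (const rt of order) {
    const probe = probes[rt];
    if (probe) {
      try {
        if (probe()) return rt;
      } catch {
        // Probe threw: try the next backend.
      }
    }
  }
  return 'js'; // Pure JavaScript always works.
}
```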
## Contributing

When adding new attention mechanisms or optimizations, update:

- `attention-integration.ts`
- `attention-benchmarks.ts`
- `index.ts`
- `examples/flash-attention-demo.ts`

## License

MIT OR Apache-2.0 (follows the @ruvector/attention license)