documentation/cookbook/sql/finance/vpin.md
VPIN measures the probability that informed traders are active in the market by looking at order flow imbalance across fixed-volume buckets. Unlike time-based metrics, VPIN synchronizes to volume — each bucket contains the same total traded quantity, so high-activity and low-activity periods are weighted equally.
You want to detect when informed traders are likely active. Time-based imbalance metrics can be noisy — a 1-minute window during a quiet period captures very different market dynamics than a 1-minute window during a news event. VPIN normalizes by volume instead of time, giving a more consistent signal.
Split the trade stream into fixed-volume buckets, compute the buy/sell imbalance within each bucket, then take a rolling average over the last N buckets:
WITH bucketed AS (
SELECT
t.timestamp,
t.symbol,
t.side,
t.price,
t.quantity,
floor(
sum(t.quantity) OVER (PARTITION BY symbol ORDER BY timestamp)
/ 1000000
) AS vol_bucket
FROM fx_trades t
WHERE t.symbol = 'EURUSD'
AND t.timestamp IN '$yesterday'
),
bucket_stats AS (
SELECT
symbol,
vol_bucket,
min(timestamp) AS bucket_start,
max(timestamp) AS bucket_end,
count() AS trade_count,
sum(quantity) AS total_vol,
sum(CASE WHEN side = 'buy' THEN quantity ELSE 0.0 END) AS buy_vol,
sum(CASE WHEN side = 'sell' THEN quantity ELSE 0.0 END) AS sell_vol,
abs(
sum(CASE WHEN side = 'buy' THEN quantity ELSE 0.0 END)
- sum(CASE WHEN side = 'sell' THEN quantity ELSE 0.0 END)
) / sum(quantity) AS bucket_imbalance
FROM bucketed
GROUP BY symbol, vol_bucket
)
SELECT
symbol,
vol_bucket,
bucket_start,
bucket_end,
total_vol,
buy_vol,
sell_vol,
bucket_imbalance,
avg(bucket_imbalance) OVER (
PARTITION BY symbol
ORDER BY vol_bucket
ROWS BETWEEN 49 PRECEDING AND CURRENT ROW
) AS vpin
FROM bucket_stats
ORDER BY vol_bucket;
The first CTE assigns a vol_bucket ID to each trade using a cumulative volume sum divided by the bucket size (1,000,000 units). All trades within the same bucket share the same ID. This is the key difference from time-based analysis — each bucket represents the same amount of market activity regardless of how long it took.
For each bucket, compute the absolute imbalance between buy and sell volume as a fraction of total volume. A bucket where 90% of the volume was buy-initiated has an imbalance of 0.8 (|0.9 − 0.1|). A perfectly balanced bucket has imbalance 0.0.
Average the bucket imbalance over a rolling window of 50 buckets. This is the VPIN estimate. The window size controls the trade-off between responsiveness and noise — fewer buckets react faster but are noisier.
VPIN ranges from 0 to 1:
Watch for VPIN spikes — sudden jumps from a stable baseline indicate a regime change, often preceding large price moves. The 2010 Flash Crash, for example, was preceded by elevated VPIN readings.
1000000): Adjust per symbol to get a reasonable number of buckets per day. For major FX pairs with billions in daily volume, 1M per bucket is fine. For less liquid instruments, reduce the bucket size.50 buckets): The original VPIN paper uses 50 buckets. Shorter windows (20–30) are more responsive but noisier. Longer windows (100+) give a smoother signal but lag.WHERE t.symbol = 'EURUSD' filter ensures volume bucketing doesn't mix symbols. To compute VPIN for multiple symbols, remove the filter — the PARTITION BY symbol in the window function handles separation.Partition by ECN to see which venues carry more informed flow. An ECN with consistently higher VPIN is attracting (or routing) more informed traders:
WITH bucketed AS (
SELECT
t.timestamp,
t.symbol,
t.ecn,
t.side,
t.quantity,
floor(
sum(t.quantity) OVER (PARTITION BY symbol, ecn ORDER BY timestamp)
/ 1000000
) AS vol_bucket
FROM fx_trades t
WHERE t.symbol = 'EURUSD'
AND t.timestamp IN '$yesterday'
),
bucket_stats AS (
SELECT
symbol,
ecn,
vol_bucket,
min(timestamp) AS bucket_start,
max(timestamp) AS bucket_end,
sum(quantity) AS total_vol,
abs(
sum(CASE WHEN side = 'buy' THEN quantity ELSE 0.0 END)
- sum(CASE WHEN side = 'sell' THEN quantity ELSE 0.0 END)
) / sum(quantity) AS bucket_imbalance
FROM bucketed
GROUP BY symbol, ecn, vol_bucket
)
SELECT
symbol,
ecn,
vol_bucket,
bucket_start,
bucket_end,
total_vol,
bucket_imbalance,
avg(bucket_imbalance) OVER (
PARTITION BY symbol, ecn
ORDER BY vol_bucket
ROWS BETWEEN 49 PRECEDING AND CURRENT ROW
) AS vpin
FROM bucket_stats
ORDER BY ecn, vol_bucket;
Compare VPIN time series across ECNs. An ECN that shows elevated VPIN while others stay flat is where informed flow is concentrated. Combine with the ECN scorecard to cross-reference against markout-based toxicity — the two signals should align. When they diverge (high VPIN but flat markouts), the imbalance may be from correlated retail flow rather than informed trading.
:::info Related documentation