documentation/concepts/symbol.md
SYMBOL is a data type designed for columns with repetitive string values.
Internally, symbols use dictionary encoding—each unique string is stored once
in a lookup table, and rows store integer references to that table. This is
the same approach used by columnar formats like Parquet and Arrow. The result
is much faster filtering and grouping compared to regular strings.
Use SYMBOL for categorical data with a limited set of repeated values:
AAPL, GOOGL, MSFT)US, EU, APAC)pending, completed, failed)WHERE or GROUP BYCREATE TABLE trades (
timestamp TIMESTAMP,
symbol SYMBOL, -- Good: limited set of tickers
side SYMBOL, -- Good: just BUY/SELL
price DOUBLE,
quantity DOUBLE
) TIMESTAMP(timestamp) PARTITION BY DAY;
Use VARCHAR when values are unique or very high cardinality:
UUID type instead)| Operation | VARCHAR | SYMBOL |
|---|---|---|
| Storage | Full string per row | Integer + shared dictionary |
Filtering (WHERE symbol = 'X') | String comparison | Integer comparison |
Grouping (GROUP BY) | String hashing | Integer grouping |
| Disk usage | Higher | Lower |
Symbols provide:
CREATE TABLE orders (
timestamp TIMESTAMP,
symbol SYMBOL,
side SYMBOL,
order_type SYMBOL,
price DOUBLE
) TIMESTAMP(timestamp) PARTITION BY DAY;
Symbol capacity scales automatically as new values are added. No manual configuration is needed.
<details> <summary>Note for users upgrading from versions before 9.0.0</summary>Prior to QuestDB 9.0.0, symbol capacity required manual configuration. You had to estimate the number of distinct values upfront and set the capacity explicitly. Undersizing caused performance issues; oversizing wasted memory.
From 9.0.0 onwards, symbol capacity is fully automatic. The CAPACITY setting
is now obsolete and can be removed from your table definitions.
By default, QuestDB caches the symbol dictionary in memory for fast lookups. For columns with very high cardinality (10 million+ distinct values), this cache can consume significant memory.
Use NOCACHE to disable dictionary caching:
CREATE TABLE trades (
timestamp TIMESTAMP,
client_id SYMBOL NOCACHE,
symbol SYMBOL
) TIMESTAMP(timestamp) PARTITION BY DAY;
Trade-off: NOCACHE reduces memory usage but makes dictionary lookups
slower. Only use it for symbols with millions of distinct values where memory
is a concern.
To toggle caching on an existing column:
-- Disable cache
ALTER TABLE trades ALTER COLUMN client_id NOCACHE;
-- Re-enable cache
ALTER TABLE trades ALTER COLUMN client_id CACHE;
For columns frequently used in WHERE clauses, add an index:
CREATE TABLE trades (
timestamp TIMESTAMP,
symbol SYMBOL INDEX,
price DOUBLE
) TIMESTAMP(timestamp) PARTITION BY DAY;
Or add an index later:
ALTER TABLE trades ALTER COLUMN symbol ADD INDEX;
See Indexes for more information.