Back to Clickhouse

StripeLog table engine

docs/en/engines/table-engines/log-family/stripelog.md

26.4.1.1-new4.4 KB
Original Source

import CloudNotSupportedBadge from '@theme/badges/CloudNotSupportedBadge';

StripeLog table engine

<CloudNotSupportedBadge/>

This engine belongs to the family of log engines. See the common properties of log engines and their differences in the Log Engine Family article.

Use this engine in scenarios when you need to write many tables with a small amount of data (less than 1 million rows). For example, this table can be used to store incoming data batches for transformation where atomic processing of them is required. 100k instances of this table type are viable for a ClickHouse server. This table engine should be preferred over Log when a high number of tables are required. This is at the expense of read efficiency.

Creating a table {#table_engines-stripelog-creating-a-table}

sql
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
    column1_name [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],
    column2_name [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],
    ...
) ENGINE = StripeLog

See the detailed description of the CREATE TABLE query.

Writing the data {#table_engines-stripelog-writing-the-data}

The StripeLog engine stores all the columns in one file. For each INSERT query, ClickHouse appends the data block to the end of a table file, writing columns one by one.

For each table ClickHouse writes the files:

  • data.bin — Data file.
  • index.mrk — File with marks. Marks contain offsets for each column of each data block inserted.

The StripeLog engine does not support the ALTER UPDATE and ALTER DELETE operations.

Reading the data {#table_engines-stripelog-reading-the-data}

The file with marks allows ClickHouse to parallelize the reading of data. This means that a SELECT query returns rows in an unpredictable order. Use the ORDER BY clause to sort rows.

Example of use {#table_engines-stripelog-example-of-use}

Creating a table:

sql
CREATE TABLE stripe_log_table
(
    timestamp DateTime,
    message_type String,
    message String
)
ENGINE = StripeLog

Inserting data:

sql
INSERT INTO stripe_log_table VALUES (now(),'REGULAR','The first regular message')
INSERT INTO stripe_log_table VALUES (now(),'REGULAR','The second regular message'),(now(),'WARNING','The first warning message')

We used two INSERT queries to create two data blocks inside the data.bin file.

ClickHouse uses multiple threads when selecting data. Each thread reads a separate data block and returns resulting rows independently as it finishes. As a result, the order of blocks of rows in the output does not match the order of the same blocks in the input in most cases. For example:

sql
SELECT * FROM stripe_log_table
text
┌───────────timestamp─┬─message_type─┬─message────────────────────┐
│ 2019-01-18 14:27:32 │ REGULAR      │ The second regular message │
│ 2019-01-18 14:34:53 │ WARNING      │ The first warning message  │
└─────────────────────┴──────────────┴────────────────────────────┘
┌───────────timestamp─┬─message_type─┬─message───────────────────┐
│ 2019-01-18 14:23:43 │ REGULAR      │ The first regular message │
└─────────────────────┴──────────────┴───────────────────────────┘

Sorting the results (ascending order by default):

sql
SELECT * FROM stripe_log_table ORDER BY timestamp
text
┌───────────timestamp─┬─message_type─┬─message────────────────────┐
│ 2019-01-18 14:23:43 │ REGULAR      │ The first regular message  │
│ 2019-01-18 14:27:32 │ REGULAR      │ The second regular message │
│ 2019-01-18 14:34:53 │ WARNING      │ The first warning message  │
└─────────────────────┴──────────────┴────────────────────────────┘