docs/en/interfaces/formats/ORC.md
| Input | Output | Alias |
|---|---|---|
| ✔ | ✔ | |
Apache ORC is a columnar storage format widely used in the Hadoop ecosystem.
The table below compares supported ORC data types and their corresponding ClickHouse data types in INSERT and SELECT queries.
| ORC data type (INSERT) | ClickHouse data type | ORC data type (SELECT) |
|---|---|---|
| Boolean | UInt8 | Boolean |
| Tinyint | Int8/UInt8/Enum8 | Tinyint |
| Smallint | Int16/UInt16/Enum16 | Smallint |
| Int | Int32/UInt32 | Int |
| Bigint | Int64/UInt64 | Bigint |
| Float | Float32 | Float |
| Double | Float64 | Double |
| Decimal | Decimal | Decimal |
| Date | Date32 | Date |
| Timestamp | DateTime64 | Timestamp |
| String, Char, Varchar, Binary | String | Binary |
| List | Array | List |
| Struct | Tuple | Struct |
| Map | Map | Map |
| Int | IPv4 | Int |
| Binary | IPv6 | Binary |
| Binary | Int128/UInt128/Int256/UInt256 | Binary |
| Binary | Decimal256 | Binary |

Arrays can be nested and can have a value of the Nullable type as an argument. Tuple and Map types can also be nested.

Using an ORC file named football.orc with the following data:
┌───────date─┬─season─┬─home_team─────────────┬─away_team───────────┬─home_team_goals─┬─away_team_goals─┐
1. │ 2022-04-30 │ 2021 │ Sutton United │ Bradford City │ 1 │ 4 │
2. │ 2022-04-30 │ 2021 │ Swindon Town │ Barrow │ 2 │ 1 │
3. │ 2022-04-30 │ 2021 │ Tranmere Rovers │ Oldham Athletic │ 2 │ 0 │
4. │ 2022-05-02 │ 2021 │ Port Vale │ Newport County │ 1 │ 2 │
5. │ 2022-05-02 │ 2021 │ Salford City │ Mansfield Town │ 2 │ 2 │
6. │ 2022-05-07 │ 2021 │ Barrow │ Northampton Town │ 1 │ 3 │
7. │ 2022-05-07 │ 2021 │ Bradford City │ Carlisle United │ 2 │ 0 │
8. │ 2022-05-07 │ 2021 │ Bristol Rovers │ Scunthorpe United │ 7 │ 0 │
9. │ 2022-05-07 │ 2021 │ Exeter City │ Port Vale │ 0 │ 1 │
10. │ 2022-05-07 │ 2021 │ Harrogate Town A.F.C. │ Sutton United │ 0 │ 2 │
11. │ 2022-05-07 │ 2021 │ Hartlepool United │ Colchester United │ 0 │ 2 │
12. │ 2022-05-07 │ 2021 │ Leyton Orient │ Tranmere Rovers │ 0 │ 1 │
13. │ 2022-05-07 │ 2021 │ Mansfield Town │ Forest Green Rovers │ 2 │ 2 │
14. │ 2022-05-07 │ 2021 │ Newport County │ Rochdale │ 0 │ 2 │
15. │ 2022-05-07 │ 2021 │ Oldham Athletic │ Crawley Town │ 3 │ 3 │
16. │ 2022-05-07 │ 2021 │ Stevenage Borough │ Salford City │ 4 │ 2 │
17. │ 2022-05-07 │ 2021 │ Walsall │ Swindon Town │ 0 │ 3 │
└────────────┴────────┴───────────────────────┴─────────────────────┴─────────────────┴─────────────────┘
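
The `INSERT` statement that follows assumes a `football` table already exists. A minimal sketch of such a table is shown here; the column types, the engine, and the sorting key are assumptions chosen to fit the sample data and are not dictated by the ORC file:

```sql
-- Hypothetical target table for football.orc.
-- Column types are assumptions; ClickHouse casts the ORC values
-- to whatever types the target columns declare.
CREATE TABLE football
(
    date Date32,
    season Int16,
    home_team String,
    away_team String,
    home_team_goals Int8,
    away_team_goals Int8
)
ENGINE = MergeTree
ORDER BY (date, home_team);
```

`Date32` is used because the type table above maps the ORC `Date` type to it; a plain `Date` column should also work for this date range.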
Insert the data:
INSERT INTO football FROM INFILE 'football.orc' FORMAT ORC;
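
Before inserting, it can help to check how ClickHouse interprets the column types in the file. The following is a sketch using the `file` table function, assuming `football.orc` is readable from where the query runs (for example in `clickhouse-local`, or under the server's `user_files` directory):

```sql
-- Show the column names and ClickHouse types inferred from the ORC schema.
DESCRIBE file('football.orc', 'ORC');

-- Preview a few rows without loading them into a table.
SELECT *
FROM file('football.orc', 'ORC')
LIMIT 5;
```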
Read data using the ORC format:
SELECT *
FROM football
INTO OUTFILE 'football.orc'
FORMAT ORC
:::tip
ORC is a binary format, so its output is not human-readable on the terminal. Use the `INTO OUTFILE` clause to write ORC files.
:::
| Setting | Description | Default |
|---|---|---|
| output_format_arrow_string_as_string | Use Arrow String type instead of Binary for String columns. | false |
| output_format_orc_compression_method | Compression method used in output ORC format. | none |
| input_format_arrow_case_insensitive_column_matching | Ignore case when matching Arrow columns with ClickHouse columns. | false |
| input_format_arrow_allow_missing_columns | Allow missing columns while reading Arrow data. | false |
| input_format_arrow_skip_columns_with_unsupported_types_in_schema_inference | Allow skipping columns with unsupported types during schema inference for the Arrow format. | false |
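
Format settings can be changed for the current session before running a query. The following is a minimal sketch that writes a compressed file; it assumes `zstd` is an accepted value for the compression setting in your ClickHouse version, and the output file name is hypothetical:

```sql
-- Assumption: 'zstd' is supported by this ClickHouse build.
SET output_format_orc_compression_method = 'zstd';

-- Write the table to a new, compressed ORC file (hypothetical name).
SELECT *
FROM football
INTO OUTFILE 'football_zstd.orc'
FORMAT ORC;
```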

To exchange data with Hadoop, you can use the HDFS table engine.
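
As an illustration, a table backed by an ORC file in HDFS might be declared as follows. This is a sketch only: the URI `hdfs://hdfs1:9000/football.orc` and the table name `football_hdfs` are placeholders, and the column types are assumptions chosen to fit the sample data:

```sql
-- Hypothetical HDFS-backed table; replace the URI with a real NameNode address.
CREATE TABLE football_hdfs
(
    date Date32,
    season Int16,
    home_team String,
    away_team String,
    home_team_goals Int8,
    away_team_goals Int8
)
ENGINE = HDFS('hdfs://hdfs1:9000/football.orc', 'ORC');

-- Export the local table to HDFS as ORC.
INSERT INTO football_hdfs
SELECT * FROM football;
```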