.. Licensed to the Apache Software Foundation (ASF) under one
.. or more contributor license agreements. See the NOTICE file
.. distributed with this work for additional information
.. regarding copyright ownership. The ASF licenses this file
.. to you under the Apache License, Version 2.0 (the
.. "License"); you may not use this file except in compliance
.. with the License. You may obtain a copy of the License at

..   http://www.apache.org/licenses/LICENSE-2.0

.. Unless required by applicable law or agreed to in writing,
.. software distributed under the License is distributed on an
.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
.. KIND, either express or implied. See the License for the
.. specific language governing permissions and limitations
.. under the License.

.. default-domain:: cpp
.. highlight:: cpp
.. cpp:namespace:: arrow::engine::substrait

.. _acero-substrait:

==========================
Using Acero with Substrait
==========================

In order to use Acero you will need to create an execution plan. This is the
model that describes the computation you want to apply to your data. Acero has
its own internal representation for execution plans, but most users should not
interact with it directly because doing so couples their code to Acero.

`Substrait <https://substrait.io>`_ is an open standard for execution plans.
Acero implements the Substrait "consumer" interface. This means that Acero can
accept a Substrait plan and fulfill it, loading the requested data and
applying the desired computation. By using Substrait plans, users can more
easily switch to a different execution engine later.

Substrait defines a broad set of operators and functions for many different
situations, and it is unlikely that Acero will ever support every defined
Substrait operator and function. The following sections list which features
Acero currently implements and any caveats that apply.

Plans
^^^^^


Extensions
^^^^^^^^^^

* Advanced extensions can be provided by supplying a custom implementation of
  :class:`arrow::engine::ExtensionProvider`.

Relations (in general)
^^^^^^^^^^^^^^^^^^^^^^


Read Relations
^^^^^^^^^^^^^^

* The ``projection`` property is not supported and plans containing this
  property will be rejected.
* The ``VirtualTable`` and ``ExtensionTable`` read types are not supported.
  Plans containing these types will be rejected.
* All URIs must use the ``file`` scheme.
* ``partition_index``, ``start``, and ``length`` are not supported. Plans
  containing non-default values for these properties will be rejected.
* The Substrait spec requires that a ``filter`` be completely satisfied by a
  read relation. However, Acero only uses a read filter for pushdown filtering,
  so the filter may not be fully satisfied. Users should generally attach an
  additional filter relation with the same filter expression after the read
  relation.

Filter Relations
^^^^^^^^^^^^^^^^

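A read filter that is only used for pushdown can let non-matching rows through,
which is why a filter relation with the same expression is recommended after
the read relation. The C++ sketch below models this without using Arrow:
``RowGroup``, ``MakeRowGroup``, ``ReadWithPushdown``, and ``FilterRelation`` are
hypothetical names, and the assumption that pushdown skips whole row groups
based on a maximum-value statistic is a simplification for illustration, not a
description of Acero's scanner.

.. code-block:: cpp

   #include <algorithm>
   #include <cassert>
   #include <cstdint>
   #include <iterator>
   #include <utility>
   #include <vector>

   // Hypothetical unit of data that a scan can skip as a whole, together with
   // a maximum-value statistic. Not an Arrow type.
   struct RowGroup {
     std::vector<int64_t> values;
     int64_t max_value;
   };

   // Builds a row group and computes its statistic (assumes a non-empty group).
   RowGroup MakeRowGroup(std::vector<int64_t> values) {
     assert(!values.empty());
     const int64_t max_value = *std::max_element(values.begin(), values.end());
     return RowGroup{std::move(values), max_value};
   }

   // Models a read relation whose filter "value > threshold" is used only for
   // pushdown: groups that cannot contain a match are skipped, but every row of
   // a surviving group is emitted, whether or not it matches.
   std::vector<int64_t> ReadWithPushdown(const std::vector<RowGroup>& groups,
                                         int64_t threshold) {
     std::vector<int64_t> rows;
     for (const RowGroup& group : groups) {
       if (group.max_value <= threshold) continue;  // no row here can match
       rows.insert(rows.end(), group.values.begin(), group.values.end());
     }
     return rows;
   }

   // Models a filter relation: evaluates the same predicate on every row.
   std::vector<int64_t> FilterRelation(const std::vector<int64_t>& rows,
                                       int64_t threshold) {
     std::vector<int64_t> kept;
     std::copy_if(rows.begin(), rows.end(), std::back_inserter(kept),
                  [threshold](int64_t v) { return v > threshold; });
     return kept;
   }

With the groups ``{1, 2, 3}``, ``{4, 9, 5}``, and ``{10, 11}`` and a threshold
of 5, the first group is skipped and the read emits ``4, 9, 5, 10, 11``; only
the filter relation reduces these to ``9, 10, 11``.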

Project Relations
^^^^^^^^^^^^^^^^^

Join Relations
^^^^^^^^^^^^^^

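The restrictions on the join expression listed in this section can be checked
mechanically. The following C++ sketch uses hypothetical, heavily simplified
stand-ins (``FieldRef``, ``Literal``, ``Call``, and
``IsSupportedJoinExpression``) rather than the Substrait protobuf messages or
Arrow C++ types. It only illustrates the shape of the rule: a call to ``equal``
or ``is_not_distinct_from`` with exactly two direct field references, that is,
a single join key. In this model any other top-level call, such as a
conjunction of several key comparisons, is rejected.

.. code-block:: cpp

   #include <cassert>
   #include <string>
   #include <variant>
   #include <vector>

   // Hypothetical, simplified expression model (not Substrait or Arrow types).
   struct FieldRef {
     int field_index;  // direct reference to an input field
   };
   struct Literal {
     long long value;
   };
   using Argument = std::variant<FieldRef, Literal>;

   struct Call {
     std::string function_name;
     std::vector<Argument> arguments;
   };

   // Returns true if `expr` is a call to `equal` or `is_not_distinct_from`
   // whose two arguments are both direct field references, i.e. one join key.
   bool IsSupportedJoinExpression(const Call& expr) {
     if (expr.function_name != "equal" &&
         expr.function_name != "is_not_distinct_from") {
       return false;
     }
     if (expr.arguments.size() != 2) {
       return false;
     }
     for (const Argument& arg : expr.arguments) {
       if (!std::holds_alternative<FieldRef>(arg)) {
         return false;  // e.g. a literal instead of a field reference
       }
     }
     return true;
   }

The practical difference between the two accepted functions is null handling:
``is_not_distinct_from`` treats two null keys as equal, while ``equal`` does not.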

* The join type ``JOIN_TYPE_SINGLE`` is not supported and plans containing it
  will be rejected.
* The join expression must be a call to either the ``equal`` or
  ``is_not_distinct_from`` function. Both arguments to the call must be direct
  references. Only a single join key is supported.
* The ``post_join_filter`` property is not supported and will be ignored.

Aggregate Relations
^^^^^^^^^^^^^^^^^^^


Expressions (general)
^^^^^^^^^^^^^^^^^^^^^

Literals
^^^^^^^^

Types
^^^^^

.. list-table:: Substrait / Arrow Type Mapping
   :widths: 25 25 50
   :header-rows: 1


Functions
^^^^^^^^^

The following functions have caveats or are not supported at all. Note that
this is not a comprehensive list. Functions are being added to Substrait at a
rapid pace and new functions may be missing.

* ``and``, ``or``, ``xor``

Substrait has not yet clearly identified the form that URIs should take for
standard functions. Acero expects URIs that point to the ``main`` branch of the
Substrait GitHub repository. For example, for the file
``functions_arithmetic.yaml`` Acero expects
``https://github.com/substrait-io/substrait/blob/main/extensions/functions_arithmetic.yaml``.

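A plan producer that targets Acero therefore needs to reference standard
function files with URIs of this form. The sketch below builds such a URI;
``StandardFunctionUri`` and ``kSubstraitExtensionsPrefix`` are hypothetical
names, not part of Arrow, and the code simply encodes the ``main``-branch
convention described above.

.. code-block:: cpp

   #include <cassert>
   #include <string>

   // Prefix of the URIs Acero expects for standard Substrait function files:
   // the extensions/ directory on the main branch of the Substrait repository.
   const std::string kSubstraitExtensionsPrefix =
       "https://github.com/substrait-io/substrait/blob/main/extensions/";

   // Hypothetical helper: returns the URI to declare for a standard function
   // file such as "functions_arithmetic.yaml".
   std::string StandardFunctionUri(const std::string& yaml_file) {
     assert(!yaml_file.empty());
     return kSubstraitExtensionsPrefix + yaml_file;
   }
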

Acero has functions that are not yet part of Substrait (or may never be added
as official functions). To invoke these functions, use the special URI
``urn:arrow:substrait_simple_extension_function``. When Acero encounters this
URI, it matches only on the function name and ignores any function options.

Alternatively, the URI can be left completely empty and Acero will match based
only on the function name. This fallback mechanism is non-standard and should
be considered deprecated in favor of the special URI above.
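
The URI rules in this section can be summarized as a small decision function.
This is a hypothetical C++ sketch: ``MatchMode`` and ``ClassifyFunctionUri``
are illustrative names and do not reflect how Acero's function registry is
implemented. Treating every other URI as one that must be matched together
with the function name is an assumption of the sketch.

.. code-block:: cpp

   #include <cassert>
   #include <string>

   // The special URI for Arrow functions that are not (yet) part of Substrait.
   const std::string kArrowSimpleExtensionUri =
       "urn:arrow:substrait_simple_extension_function";

   // How a function reference would be resolved in this sketch.
   enum class MatchMode {
     kNameOnly,            // special URI: match on name, ignore options
     kNameOnlyDeprecated,  // empty URI: non-standard name-only fallback
     kUriAndName,          // any other URI (assumption): match URI and name
   };

   MatchMode ClassifyFunctionUri(const std::string& uri) {
     if (uri == kArrowSimpleExtensionUri) {
       return MatchMode::kNameOnly;
     }
     if (uri.empty()) {
       return MatchMode::kNameOnlyDeprecated;
     }
     return MatchMode::kUriAndName;
   }
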