docs/rfcs/2024-01-17-dataflow-framework.md
This RFC proposes a Lightweight Module for executing continuous aggregation queries on a stream of data.
Being able to do continuous aggregation is a very powerful tool. It allows you to do things like:
Hence, we choose the third option, and use a simple logical plan that's anagonistic to the underlying dataflow framework, as it only describe how the dataflow graph should be doing, not how it do that. And we built operator in hydroflow to execute the plan. And the result hydroflow graph is wrapped in a engine that only support data in/out and tick event to flush and compute the result. This provide a thin middle layer that's easy to maintain and allow switching to other dataflow framework if necessary.
CREATE TASK avg_over_5m WINDOW_SIZE = "5m" AS SELECT avg(value) FROM table WHERE time > now() - 5m GROUP BY time(1m). Flow job then got stored in Metasrv.The workflow is shown in the following diagram
graph TB
subgraph Flownode["Flownode"]
subgraph Dataflows
df1("Dataflow_1")
df2("Dataflow_2")
end
end
subgraph Frontend["Frontend"]
newLines["Mirror Insert
Create Task From Query
Write result from flow node"]
end
subgraph Datanode["Datanode"]
end
User --> Frontend
Frontend -->|Register Task| Metasrv
Metasrv -->|Read Task Metadata| Frontend
Frontend -->|Create Task| Flownode
Frontend -->|Mirror Insert| Flownode
Flownode -->|Write back| Frontend
Frontend --> Datanode
Datanode --> Frontend