doc/user/content/sql/create-source/load-generator.md
{{% create-source/intro %}} Load generator sources produce synthetic data for use in demos and performance tests. {{% /create-source/intro %}}
{{% include-syntax file="examples/create_source_load_generator" example="syntax" %}}
Materialize has several built-in load generators, which provide a quick way to get up and running with no external dependencies before plugging in your own data sources. If you would like to see an additional load generator, please submit a feature request.
The auction load generator simulates an auction house, where users are bidding
on an ongoing series of auctions. The auction source will be automatically demuxed
into multiple subsources when the CREATE SOURCE command is executed. This will
create the following subsources:
organizations describes the organizations known to the auction
house.
| Field | Type | Description |
|---|---|---|
| id | bigint | A unique identifier for the organization. |
| name | text | The organization's name. |
users describes the users that belong to each organization.
| Field | Type | Description |
|---|---|---|
id | bigint | A unique identifier for the user. |
org_id | bigint | The identifier of the organization to which the user belongs. References organizations.id. |
name | text | The user's name. |
accounts describes the account associated with each organization.
| Field | Type | Description |
|---|---|---|
id | bigint | A unique identifier for the account. |
org_id | bigint | The identifier of the organization to which the account belongs. References organizations.id. |
balance | bigint | The balance of the account in dollars. |
auctions describes all past and ongoing auctions.
| Field | Type | Description |
|---|---|---|
id | bigint | A unique identifier for the auction. |
seller | bigint | The identifier of the user selling the item. References users.id. |
item | text | The name of the item being sold. |
end_time | timestamp with time zone | The time at which the auction closes. |
bids describes the bids placed in each auction.
| Field | Type | Description |
|---|---|---|
id | bigint | A unique identifier for the bid. |
buyer | bigint | The identifier vof the user placing the bid. References users.id. |
auction_id | bigint | The identifier of the auction in which the bid is placed. References auctions.id. |
amount | bigint | The bid amount in dollars. |
bid_time | timestamp with time zone | The time at which the bid was placed. |
The organizations, users, and accounts are fixed at the time the source is created. Each tick interval, either a new auction is started, or a new bid is placed in the currently ongoing auction.
The marketing load generator simulates a marketing organization that is using a machine learning model to send coupons to potential leads. The marketing source will be automatically demuxed
into multiple subsources when the CREATE SOURCE command is executed. This will
create the following subsources:
customers describes the customers that the marketing team may target.
| Field | Type | Description |
|---|---|---|
id | bigint | A unique identifier for the customer. |
email | text | The customer's email. |
income | bigint | The customer's income in pennies. |
impressions describes online ads that have been seen by a customer.
| Field | Type | Description |
|---|---|---|
id | bigint | A unique identifier for the impression. |
customer_id | bigint | The identifier of the customer that saw the ad. References customers.id. |
impression_time | timestamp with time zone | The time at which the ad was seen. |
clicks describes clicks of ads.
| Field | Type | Description |
|---|---|---|
impression_id | bigint | The identifier of the impression that was clicked. References impressions.id. |
click_time | timestamp with time zone | The time at which the impression was clicked. |
leads describes a potential lead for a purchase.
| Field | Type | Description |
|---|---|---|
id | bigint | A unique identifier for the lead. |
customer_id | bigint | The identifier of the customer we'd like to convert. References customers.id. |
created_at | timestamp with time zone | The time at which the lead was created. |
converted_at | timestamp with time zone | The time at which the lead was converted. |
conversion_amount | bigint | The amount the lead converted for in pennies. |
coupons describes coupons given to leads.
| Field | Type | Description |
|---|---|---|
id | bigint | A unique identifier for the coupon. |
lead_id | bigint | The identifier of the lead we're attempting to convert. References leads.id. |
created_at | timestamp with time zone | The time at which the coupon was created. |
amount | bigint | The amount the coupon is for in pennies. |
conversion_predictions describes the predictions made by a highly sophisticated machine learning model.
| Field | Type | Description |
|---|---|---|
lead_id | bigint | The identifier of the lead we're attempting to convert. References leads.id. |
experiment_bucket | text | Whether the lead is a control or experiment. |
created_at | timestamp with time zone | The time at which the prediction was made. |
score | numeric | The predicted likelihood the lead will convert. |
The TPCH load generator implements the TPC-H benchmark specification.
The TPCH source must be used with FOR ALL TABLES, which will create the standard TPCH relations.
If TICK INTERVAL is specified, after the initial data load, an order and its lineitems will be changed at this interval.
If not specified, the dataset will not change over time.
By default, load generator sources expose progress metadata as a subsource that
you can use to monitor source ingestion progress. The name of the progress
subsource can be specified when creating a source using the EXPOSE PROGRESS AS clause; otherwise, it will be named <src_name>_progress.
The following metadata is available for each source as a progress subsource:
| Field | Type | Meaning |
|---|---|---|
offset | uint8 | The minimum offset for which updates to this sources are still undetermined. |
And can be queried using:
SELECT "offset"
FROM <src_name>_progress;
As long as the offset continues increasing, Materialize is generating data. For more details on monitoring source ingestion progress and debugging related issues, see Troubleshooting.
To create a load generator source that simulates an auction house and emits new data every second:
CREATE SOURCE auction_house
FROM LOAD GENERATOR AUCTION
(TICK INTERVAL '1s')
FOR ALL TABLES;
To display the created subsources:
SHOW SOURCES;
name | type
------------------------+----------------
accounts | subsource
auction_house | load-generator
auction_house_progress | progress
auctions | subsource
bids | subsource
organizations | subsource
users | subsource
To examine the simulated bids:
SELECT * from bids;
id | buyer | auction_id | amount | bid_time
----+-------+------------+--------+----------------------------
10 | 3844 | 1 | 59 | 2022-09-16 23:24:07.332+00
11 | 1861 | 1 | 40 | 2022-09-16 23:24:08.332+00
12 | 3338 | 1 | 97 | 2022-09-16 23:24:09.332+00
To create a load generator source that simulates an online marketing campaign:
CREATE SOURCE marketing
FROM LOAD GENERATOR MARKETING
FOR ALL TABLES;
To display the created subsources:
SHOW SOURCES;
name | type
------------------------+---------------
clicks | subsource
conversion_predictions | subsource
coupons | subsource
customers | subsource
impressions | subsource
leads | subsource
marketing | load-generator
marketing_progress | progress
To find all impressions and clicks associated with a campaign over the last 30 days:
WITH
click_rollup AS
(
SELECT impression_id AS id, count(*) AS clicks
FROM clicks
WHERE click_time - INTERVAL '30' DAY <= mz_now()
GROUP BY impression_id
),
impression_rollup AS
(
SELECT id, campaign_id, count(*) AS impressions
FROM impressions
WHERE impression_time - INTERVAL '30' DAY <= mz_now()
GROUP BY id, campaign_id
)
SELECT campaign_id, sum(impressions) AS impressions, sum(clicks) AS clicks
FROM impression_rollup LEFT JOIN click_rollup USING(id)
GROUP BY campaign_id;
campaign_id | impressions | clicks
-------------+-------------+--------
0 | 350 | 33
1 | 325 | 28
2 | 319 | 24
3 | 315 | 38
4 | 305 | 28
5 | 354 | 31
6 | 346 | 25
7 | 337 | 36
8 | 329 | 38
9 | 305 | 24
10 | 345 | 27
11 | 323 | 30
12 | 320 | 29
13 | 331 | 27
14 | 310 | 22
15 | 324 | 28
16 | 315 | 32
17 | 329 | 36
18 | 329 | 28
To create the load generator source and its associated subsources:
CREATE SOURCE tpch
FROM LOAD GENERATOR TPCH (SCALE FACTOR 1)
FOR ALL TABLES;
To display the created subsources:
SHOW SOURCES;
name | type
---------------+---------------
tpch | load-generator
tpch_progress | progress
supplier | subsource
region | subsource
partsupp | subsource
part | subsource
orders | subsource
nation | subsource
lineitem | subsource
customer | subsource
To run the Pricing Summary Report Query (Q1), which reports the amount of billed, shipped, and returned items:
SELECT
l_returnflag,
l_linestatus,
sum(l_quantity) AS sum_qty,
sum(l_extendedprice) AS sum_base_price,
sum(l_extendedprice * (1 - l_discount)) AS sum_disc_price,
sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge,
avg(l_quantity) AS avg_qty,
avg(l_extendedprice) AS avg_price,
avg(l_discount) AS avg_disc,
count(*) AS count_order
FROM
lineitem
WHERE
l_shipdate <= date '1998-12-01' - interval '90' day
GROUP BY
l_returnflag,
l_linestatus
ORDER BY
l_returnflag,
l_linestatus;
l_returnflag | l_linestatus | sum_qty | sum_base_price | sum_disc_price | sum_charge | avg_qty | avg_price | avg_disc | count_order
--------------+--------------+----------+----------------+-----------------+-------------------+--------------------+--------------------+---------------------+-------------
A | F | 37772997 | 56604341792 | 54338346989.17 | 57053313118.2657 | 25.490380624798817 | 38198.351517998075 | 0.04003729114831228 | 1481853
N | F | 986796 | 1477585066 | 1418531782.89 | 1489171757.0798 | 25.463731840115603 | 38128.27564317601 | 0.04007431682708436 | 38753
N | O | 74281600 | 111337230039 | 106883023012.04 | 112227399730.9018 | 25.49430183051871 | 38212.221432873834 | 0.03999775539657235 | 2913655
R | F | 37770949 | 56610551077 | 54347734573.7 | 57066196254.4557 | 25.496431466814634 | 38213.68205054471 | 0.03997848687172654 | 1481421