Back to Starrocks

StarRocks with Helm

docs/en/quick_start/helm.md

4.1.025.8 KB
Original Source

import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; import OperatorPrereqs from '../_assets/deployment/_OperatorPrereqs.mdx' import DDL from '../_assets/quick-start/_DDL.mdx' import Clients from '../_assets/quick-start/_clientsAllin1.mdx' import SQL from '../_assets/quick-start/_SQL.mdx' import Curl from '../_assets/quick-start/_curl.mdx'

StarRocks with Helm

Goals

The goals of this quickstart are:

  • Deploy the StarRocks Kubernetes Operator and a StarRocks cluster with Helm
  • Configure a password for the StarRocks database user root
  • Provide for high-availability with three FEs and three BEs
  • Store metadata in persistent storage
  • Store data in persistent storage
  • Allow MySQL clients to connect from outside the Kubernetes cluster
  • Allow loading data from outside the Kubernetes cluster using Stream Load
  • Load some public datasets
  • Query the data

:::tip The datasets and queries are the same as the ones used in the Basic Quick Start. The main difference here is deploying with Helm and the StarRocks Operator. :::

The data used is provided by NYC OpenData and the National Centers for Environmental Information.

Both of these datasets are large, and because this tutorial is intended to help you get exposed to working with StarRocks we are not going to load data for the past 120 years. You can run this with a GKE Kubernetes cluster built on three e2-standard-4 machines (or similar) with 80GB disk. For larger deployments, we have other documentation and will provide that later.

There is a lot of information in this document, and it is presented with step-by-step content at the beginning, and the technical details at the end. This is done to serve these purposes in this order:

  1. Get the system deployed with Helm.
  2. Allow the reader to load data in StarRocks and analyze that data.
  3. Explain the basics of data transformation during loading.

Prerequisites

<OperatorPrereqs />

SQL client

You can use the SQL client provided in the Kubernetes environment, or use one on your system. This guide uses the mysql CLI Many MySQL-compatible clients will work.

curl

curl is used to issue the data load job to StarRocks, and to download the datasets. Check to see if you have it installed by running curl or curl.exe at your OS prompt. If curl is not installed, get curl here.


Terminology

FE

Frontend nodes are responsible for metadata management, client connection management, query planning, and query scheduling. Each FE stores and maintains a complete copy of metadata in its memory, which guarantees indiscriminate services among the FEs.

BE

Backend nodes are responsible for both data storage and executing query plans.


Add the StarRocks Helm chart repo

The Helm Chart contains the definitions of the StarRocks Operator and the custom resource StarRocksCluster.

  1. Add the Helm Chart Repo.

    Bash
    helm repo add starrocks https://starrocks.github.io/starrocks-kubernetes-operator
    
  2. Update the Helm Chart Repo to the latest version.

    Bash
    helm repo update
    
  3. View the Helm Chart Repo that you added.

    Bash
    helm search repo starrocks
    
    NAME                              	CHART VERSION	APP VERSION	DESCRIPTION
    starrocks/kube-starrocks	1.9.7        	3.2-latest 	kube-starrocks includes two subcharts, operator...
    starrocks/operator      	1.9.7        	1.9.7      	A Helm chart for StarRocks operator
    starrocks/starrocks     	1.9.7        	3.2-latest 	A Helm chart for StarRocks cluster
    starrocks/warehouse     	1.9.7        	3.2-latest 	Warehouse is currently a feature of the StarRoc...
    

Download the data

Download these two datasets to your machine.

New York City crash data

bash
curl -O https://raw.githubusercontent.com/StarRocks/demo/master/documentation-samples/quickstart/datasets/NYPD_Crash_Data.csv

Weather data

bash
curl -O https://raw.githubusercontent.com/StarRocks/demo/master/documentation-samples/quickstart/datasets/72505394728.csv

Create a Helm values file

The goals for this quick start are:

  1. Configure a password for the StarRocks database user root
  2. Provide for high-availability with three FEs and three BEs
  3. Store metadata in persistent storage
  4. Store data in persistent storage
  5. Allow MySQL clients to connect from outside the Kubernetes cluster
  6. Allow loading data from outside the Kubernetes cluster using Stream Load

The Helm chart provides options to satisfy all of these goals, but they are not configured by default. The rest of this section covers the configuration needed to meet all of these goals. A complete values spec will be provided, but first read the details for each of the six sections and then copy the full spec.

1. Password for the database user

This bit of YAML instructs the StarRocks operator to set the password for the database user root to the value of the password key of the Kubernetes secret `starrocks-root-pass.

yaml
starrocks:
    initPassword:
        enabled: true
        # Set a password secret, for example:
        # kubectl create secret generic starrocks-root-pass --from-literal=password='g()()dpa$$word'
        passwordSecret: starrocks-root-pass
  • Task: Create the Kubernetes secret

    bash
    kubectl create secret generic starrocks-root-pass --from-literal=password='g()()dpa$$word'
    

2. High Availability with 3 FEs and 3 BEs

By setting starrocks.starrockFESpec.replicas to 3, and starrocks.starrockBeSpec.replicas to 3 you will have enough FEs and BEs for high availability. Setting the CPU and memory requests low allows the pods to be created in a small Kubernetes environment.

yaml
starrocks:
    starrocksFESpec:
        replicas: 3
        resources:
            requests:
                cpu: 1
                memory: 1Gi

    starrocksBeSpec:
        replicas: 3
        resources:
            requests:
                cpu: 1
                memory: 2Gi

3. Store metadata in persistent storage

Setting a value for starrocks.starrocksFESpec.storageSpec.name to anything other than "" causes:

  • Persistent storage to be used
  • the value of starrocks.starrocksFESpec.storageSpec.name to be used as the prefix for all storage volumes for the service.

By setting the value to fe these PVs will be created for FE 0:

  • fe-meta-kube-starrocks-fe-0
  • fe-log-kube-starrocks-fe-0
yaml
starrocks:
    starrocksFESpec:
        storageSpec:
            name: fe

4. Store data in persistent storage

Setting a value for starrocks.starrocksBeSpec.storageSpec.name to anything other than "" causes:

  • Persistent storage to be used
  • the value of starrocks.starrocksBeSpec.storageSpec.name to be used as the prefix for all storage volumes for the service.

By setting the value to be these PVs will be created for BE 0:

  • be-data-kube-starrocks-be-0
  • be-log-kube-starrocks-be-0

Setting the storageSize to 15Gi reduces the storage from the default of 1Ti to fit smaller quotas for storage.

yaml
starrocks:
    starrocksBeSpec:
        storageSpec:
            name: be
            storageSize: 15Gi

5. LoadBalancer for MySQL clients

By default, access to the FE service is through cluster IPs. To allow external access, service.type is set to LoadBalancer

yaml
starrocks:
    starrocksFESpec:
        service:
            type: LoadBalancer

6. LoadBalancer for external data loading

Stream Load requires external access to both FEs and BEs. The requests are sent to the FE and then the FE assigns a BE to process the upload. To allow the curl command to be redirected to the BE the starroclFeProxySpec needs to be enabled and set to type LoadBalancer.

yaml
starrocks:
    starrocksFeProxySpec:
        enabled: true
        service:
            type: LoadBalancer

The complete values file

The above snippets combined provide a full values file. Save this to my-values.yaml:

yaml
starrocks:
    initPassword:
        enabled: true
        # Set a password secret, for example:
        # kubectl create secret generic starrocks-root-pass --from-literal=password='g()()dpa$$word'
        passwordSecret: starrocks-root-pass

    starrocksFESpec:
        replicas: 3
        service:
            type: LoadBalancer
        resources:
            requests:
                cpu: 1
                memory: 1Gi
        storageSpec:
            name: fe

    starrocksBeSpec:
        replicas: 3
        resources:
            requests:
                cpu: 1
                memory: 2Gi
        storageSpec:
            name: be
            storageSize: 15Gi

    starrocksFeProxySpec:
        enabled: true
        service:
            type: LoadBalancer

Set the StarRocks root database user password

To load data from outside of the Kubernetes cluster the StarRocks database will be exposed externally. You should set a password for the StarRocks database user root. The operator will apply the password to the FE and BE nodes.

bash
kubectl create secret generic starrocks-root-pass --from-literal=password='g()()dpa$$word'
secret/starrocks-root-pass created

Deploy the operator and StarRocks cluster

bash
helm install -f my-values.yaml starrocks starrocks/kube-starrocks
NAME: starrocks
LAST DEPLOYED: Wed Jun 26 20:25:09 2024
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Thank you for installing kube-starrocks-1.9.7 kube-starrocks chart.
It will install both operator and starrocks cluster, please wait for a few minutes for the cluster to be ready.

Please see the values.yaml for more operation information: https://github.com/StarRocks/starrocks-kubernetes-operator/blob/main/helm-charts/charts/kube-starrocks/values.yaml

Check the status of the StarRocks cluster

You can check the progress with these commands:

bash
kubectl --namespace default get starrockscluster -l "cluster=kube-starrocks"
NAME             PHASE         FESTATUS      BESTATUS      CNSTATUS   FEPROXYSTATUS
kube-starrocks   reconciling   reconciling   reconciling              reconciling
bash
kubectl get pods

:::note The kube-starrocks-initpwd pod will go through error and CrashLoopBackOff states as it attempts to connect to the FE and BE pods to set the StarRocks root password. You should ignore these errors and wait for a status of Completed for this pod. :::

NAME                                       READY   STATUS             RESTARTS      AGE
kube-starrocks-be-0                        0/1     Running            0             20s
kube-starrocks-be-1                        0/1     Running            0             20s
kube-starrocks-be-2                        0/1     Running            0             20s
kube-starrocks-fe-0                        1/1     Running            0             66s
kube-starrocks-fe-1                        0/1     Running            0             65s
kube-starrocks-fe-2                        0/1     Running            0             66s
kube-starrocks-fe-proxy-56f8998799-d4qmt   1/1     Running            0             20s
kube-starrocks-initpwd-m84br               0/1     CrashLoopBackOff   3 (50s ago)   92s
kube-starrocks-operator-54ffcf8c5c-xsjc8   1/1     Running            0             92s
bash
kubectl get pvc
NAME                          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   VOLUMEATTRIBUTESCLASS   AGE
be-data-kube-starrocks-be-0   Bound    pvc-4ae0c9d8-7f9a-4147-ad74-b22569165448   15Gi       RWO            standard-rwo   <unset>                 82s
be-data-kube-starrocks-be-1   Bound    pvc-28b4dbd1-0c8f-4b06-87e8-edec616cabbc   15Gi       RWO            standard-rwo   <unset>                 82s
be-data-kube-starrocks-be-2   Bound    pvc-c7232ea6-d3d9-42f1-bfc1-024205a17656   15Gi       RWO            standard-rwo   <unset>                 82s
be-log-kube-starrocks-be-0    Bound    pvc-6193c43d-c74f-4d12-afcc-c41ace3d5408   1Gi        RWO            standard-rwo   <unset>                 82s
be-log-kube-starrocks-be-1    Bound    pvc-c01f124a-014a-439a-99a6-6afe95215bf0   1Gi        RWO            standard-rwo   <unset>                 82s
be-log-kube-starrocks-be-2    Bound    pvc-136df15f-4d2e-43bc-a1c0-17227ce3fe6b   1Gi        RWO            standard-rwo   <unset>                 82s
fe-log-kube-starrocks-fe-0    Bound    pvc-7eac524e-d286-4760-b21c-d9b6261d976f   5Gi        RWO            standard-rwo   <unset>                 2m23s
fe-log-kube-starrocks-fe-1    Bound    pvc-38076b78-71e8-4659-b8e7-6751bec663f6   5Gi        RWO            standard-rwo   <unset>                 2m23s
fe-log-kube-starrocks-fe-2    Bound    pvc-4ccfee60-02b7-40ba-a22e-861ea29dac74   5Gi        RWO            standard-rwo   <unset>                 2m23s
fe-meta-kube-starrocks-fe-0   Bound    pvc-5130c9ff-b797-4f79-a1d2-4214af860d70   10Gi       RWO            standard-rwo   <unset>                 2m23s
fe-meta-kube-starrocks-fe-1   Bound    pvc-13545330-63be-42cf-b1ca-3ed6f96a8c98   10Gi       RWO            standard-rwo   <unset>                 2m23s
fe-meta-kube-starrocks-fe-2   Bound    pvc-609cadd4-c7b7-4cf9-84b0-a75678bb3c4d   10Gi       RWO            standard-rwo   <unset>                 2m23s

Verify that the cluster is healthy

:::tip These are the same commands as above, but show the desired state. :::

bash
kubectl --namespace default get starrockscluster -l "cluster=kube-starrocks"
NAME             PHASE     FESTATUS   BESTATUS   CNSTATUS   FEPROXYSTATUS
kube-starrocks   running   running    running               running
bash
kubectl get pods

:::tip The system is ready when all of the pods except for kube-starrocks-initpwd show 1/1 in the READY column. The kube-starrocks-initpwd pod should show 0/1 and a STATUS of Completed. :::

NAME                                       READY   STATUS      RESTARTS   AGE
kube-starrocks-be-0                        1/1     Running     0          57s
kube-starrocks-be-1                        1/1     Running     0          57s
kube-starrocks-be-2                        1/1     Running     0          57s
kube-starrocks-fe-0                        1/1     Running     0          103s
kube-starrocks-fe-1                        1/1     Running     0          102s
kube-starrocks-fe-2                        1/1     Running     0          103s
kube-starrocks-fe-proxy-56f8998799-d4qmt   1/1     Running     0          57s
kube-starrocks-initpwd-m84br               0/1     Completed   4          2m9s
kube-starrocks-operator-54ffcf8c5c-xsjc8   1/1     Running     0          2m9s

The EXTERNAL-IP addresses in the highlighted lines will be used to provide SQL client and Stream Load access from outside the Kubernetes cluster.

bash
kubectl get services
bash
NAME                              TYPE           CLUSTER-IP       EXTERNAL-IP     PORT(S)                                                       AGE
kube-starrocks-be-search          ClusterIP      None             <none>          9050/TCP                                                      78s
kube-starrocks-be-service         ClusterIP      34.118.228.231   <none>          9060/TCP,8040/TCP,9050/TCP,8060/TCP                           78s
# highlight-next-line
kube-starrocks-fe-proxy-service   LoadBalancer   34.118.230.176   34.176.12.205   8080:30241/TCP                                                78s
kube-starrocks-fe-search          ClusterIP      None             <none>          9030/TCP                                                      2m4s
# highlight-next-line
kube-starrocks-fe-service         LoadBalancer   34.118.226.82    34.176.215.97   8030:30620/TCP,9020:32461/TCP,9030:32749/TCP,9010:30911/TCP   2m4s
kubernetes                        ClusterIP      34.118.224.1     <none>          443/TCP                                                       8h

:::tip Store the EXTERNAL-IP addresses from the highlighted lines in environment variables so that you have them handy:

export MYSQL_IP=`kubectl get services kube-starrocks-fe-service --output jsonpath='{.status.loadBalancer.ingress[0].ip}'`
export FE_PROXY=`kubectl get services kube-starrocks-fe-proxy-service --output jsonpath='{.status.loadBalancer.ingress[0].ip}'`:8080

:::


Connect to StarRocks with a SQL client

:::tip

If you are using a client other than the mysql CLI, open that now. :::

This command will run the mysql command in a Kubernetes pod:

sql
kubectl exec --stdin --tty kube-starrocks-fe-0 -- \
  mysql -P9030 -h127.0.0.1 -u root --prompt="StarRocks > "

If you have the mysql CLI installed locally, you can use it instead of the one in the Kubernetes cluster:

sql
mysql -P9030 -h $MYSQL_IP -u root --prompt="StarRocks > " -p

Create some tables

bash
mysql -P9030 -h $MYSQL_IP -u root --prompt="StarRocks > " -p
<DDL />

Exit from the MySQL client, or open a new shell to run commands at the command line to upload data.

sql
exit

Upload data

There are many ways to load data into StarRocks. For this tutorial, the simplest way is to use curl and StarRocks Stream Load.

Upload the two datasets that you downloaded earlier.

:::tip Open a new shell as these curl commands are run at the operating system prompt, not in the mysql client. The commands refer to the datasets that you downloaded, so run them from the directory where you downloaded the files.

Since this is a new shell, run the export commands again:

bash

export MYSQL_IP=`kubectl get services kube-starrocks-fe-service --output jsonpath='{.status.loadBalancer.ingress[0].ip}'`

export FE_PROXY=`kubectl get services kube-starrocks-fe-proxy-service --output jsonpath='{.status.loadBalancer.ingress[0].ip}'`:8080

You will be prompted for a password. Use the password that you added to the Kubernetes secret starrocks-root-pass. If you used the command provided, the password is g()()dpa$$word. :::

The curl commands look complex, but they are explained in detail at the end of the tutorial. For now, we recommend running the commands and running some SQL to analyze the data, and then reading about the data loading details at the end.

bash
curl --location-trusted -u root             \
    -T ./NYPD_Crash_Data.csv                \
    -H "label:crashdata-0"                  \
    -H "column_separator:,"                 \
    -H "skip_header:1"                      \
    -H "enclose:\""                         \
    -H "max_filter_ratio:1"                 \
    -H "columns:tmp_CRASH_DATE, tmp_CRASH_TIME, CRASH_DATE=str_to_date(concat_ws(' ', tmp_CRASH_DATE, tmp_CRASH_TIME), '%m/%d/%Y %H:%i'),BOROUGH,ZIP_CODE,LATITUDE,LONGITUDE,LOCATION,ON_STREET_NAME,CROSS_STREET_NAME,OFF_STREET_NAME,NUMBER_OF_PERSONS_INJURED,NUMBER_OF_PERSONS_KILLED,NUMBER_OF_PEDESTRIANS_INJURED,NUMBER_OF_PEDESTRIANS_KILLED,NUMBER_OF_CYCLIST_INJURED,NUMBER_OF_CYCLIST_KILLED,NUMBER_OF_MOTORIST_INJURED,NUMBER_OF_MOTORIST_KILLED,CONTRIBUTING_FACTOR_VEHICLE_1,CONTRIBUTING_FACTOR_VEHICLE_2,CONTRIBUTING_FACTOR_VEHICLE_3,CONTRIBUTING_FACTOR_VEHICLE_4,CONTRIBUTING_FACTOR_VEHICLE_5,COLLISION_ID,VEHICLE_TYPE_CODE_1,VEHICLE_TYPE_CODE_2,VEHICLE_TYPE_CODE_3,VEHICLE_TYPE_CODE_4,VEHICLE_TYPE_CODE_5" \
    # highlight-next-line
    -XPUT http://$FE_PROXY/api/quickstart/crashdata/_stream_load
Enter host password for user 'root':
{
    "TxnId": 2,
    "Label": "crashdata-0",
    "Status": "Success",
    "Message": "OK",
    "NumberTotalRows": 423726,
    "NumberLoadedRows": 423725,
    "NumberFilteredRows": 1,
    "NumberUnselectedRows": 0,
    "LoadBytes": 96227746,
    "LoadTimeMs": 2483,
    "BeginTxnTimeMs": 42,
    "StreamLoadPlanTimeMs": 122,
    "ReadDataTimeMs": 1610,
    "WriteDataTimeMs": 2253,
    "CommitAndPublishTimeMs": 65,
    "ErrorURL": "http://kube-starrocks-be-2.kube-starrocks-be-search.default.svc.cluster.local:8040/api/_load_error_log?file=error_log_5149e6f80de42bcb_eab2ea77276de4ba"
}
bash
curl --location-trusted -u root             \
    -T ./72505394728.csv                    \
    -H "label:weather-0"                    \
    -H "column_separator:,"                 \
    -H "skip_header:1"                      \
    -H "enclose:\""                         \
    -H "max_filter_ratio:1"                 \
    -H "columns: STATION, DATE, LATITUDE, LONGITUDE, ELEVATION, NAME, REPORT_TYPE, SOURCE, HourlyAltimeterSetting, HourlyDewPointTemperature, HourlyDryBulbTemperature, HourlyPrecipitation, HourlyPresentWeatherType, HourlyPressureChange, HourlyPressureTendency, HourlyRelativeHumidity, HourlySkyConditions, HourlySeaLevelPressure, HourlyStationPressure, HourlyVisibility, HourlyWetBulbTemperature, HourlyWindDirection, HourlyWindGustSpeed, HourlyWindSpeed, Sunrise, Sunset, DailyAverageDewPointTemperature, DailyAverageDryBulbTemperature, DailyAverageRelativeHumidity, DailyAverageSeaLevelPressure, DailyAverageStationPressure, DailyAverageWetBulbTemperature, DailyAverageWindSpeed, DailyCoolingDegreeDays, DailyDepartureFromNormalAverageTemperature, DailyHeatingDegreeDays, DailyMaximumDryBulbTemperature, DailyMinimumDryBulbTemperature, DailyPeakWindDirection, DailyPeakWindSpeed, DailyPrecipitation, DailySnowDepth, DailySnowfall, DailySustainedWindDirection, DailySustainedWindSpeed, DailyWeather, MonthlyAverageRH, MonthlyDaysWithGT001Precip, MonthlyDaysWithGT010Precip, MonthlyDaysWithGT32Temp, MonthlyDaysWithGT90Temp, MonthlyDaysWithLT0Temp, MonthlyDaysWithLT32Temp, MonthlyDepartureFromNormalAverageTemperature, MonthlyDepartureFromNormalCoolingDegreeDays, MonthlyDepartureFromNormalHeatingDegreeDays, MonthlyDepartureFromNormalMaximumTemperature, MonthlyDepartureFromNormalMinimumTemperature, MonthlyDepartureFromNormalPrecipitation, MonthlyDewpointTemperature, MonthlyGreatestPrecip, MonthlyGreatestPrecipDate, MonthlyGreatestSnowDepth, MonthlyGreatestSnowDepthDate, MonthlyGreatestSnowfall, MonthlyGreatestSnowfallDate, MonthlyMaxSeaLevelPressureValue, MonthlyMaxSeaLevelPressureValueDate, MonthlyMaxSeaLevelPressureValueTime, MonthlyMaximumTemperature, MonthlyMeanTemperature, MonthlyMinSeaLevelPressureValue, MonthlyMinSeaLevelPressureValueDate, MonthlyMinSeaLevelPressureValueTime, MonthlyMinimumTemperature, MonthlySeaLevelPressure, MonthlyStationPressure, MonthlyTotalLiquidPrecipitation, MonthlyTotalSnowfall, MonthlyWetBulb, AWND, CDSD, CLDD, DSNW, HDSD, HTDD, NormalsCoolingDegreeDay, NormalsHeatingDegreeDay, ShortDurationEndDate005, ShortDurationEndDate010, ShortDurationEndDate015, ShortDurationEndDate020, ShortDurationEndDate030, ShortDurationEndDate045, ShortDurationEndDate060, ShortDurationEndDate080, ShortDurationEndDate100, ShortDurationEndDate120, ShortDurationEndDate150, ShortDurationEndDate180, ShortDurationPrecipitationValue005, ShortDurationPrecipitationValue010, ShortDurationPrecipitationValue015, ShortDurationPrecipitationValue020, ShortDurationPrecipitationValue030, ShortDurationPrecipitationValue045, ShortDurationPrecipitationValue060, ShortDurationPrecipitationValue080, ShortDurationPrecipitationValue100, ShortDurationPrecipitationValue120, ShortDurationPrecipitationValue150, ShortDurationPrecipitationValue180, REM, BackupDirection, BackupDistance, BackupDistanceUnit, BackupElements, BackupElevation, BackupEquipment, BackupLatitude, BackupLongitude, BackupName, WindEquipmentChangeDate" \
    # highlight-next-line
    -XPUT http://$FE_PROXY/api/quickstart/weatherdata/_stream_load
Enter host password for user 'root':
{
    "TxnId": 4,
    "Label": "weather-0",
    "Status": "Success",
    "Message": "OK",
    "NumberTotalRows": 22931,
    "NumberLoadedRows": 22931,
    "NumberFilteredRows": 0,
    "NumberUnselectedRows": 0,
    "LoadBytes": 15558550,
    "LoadTimeMs": 404,
    "BeginTxnTimeMs": 1,
    "StreamLoadPlanTimeMs": 7,
    "ReadDataTimeMs": 157,
    "WriteDataTimeMs": 372,
    "CommitAndPublishTimeMs": 23
}

Connect with a MySQL client

Connect with a MySQL client if you are not connected. Remember to use the external IP address of the kube-starrocks-fe-service service and the password that you configured in the Kubernetes secret starrocks-root-pass.

bash
mysql -P9030 -h $MYSQL_IP -u root --prompt="StarRocks > " -p

Answer some questions

<SQL />
sql
exit

Cleanup

Run this command if you are finished and would like to remove the StarRocks cluster and the StarRocks operator.

bash
helm delete starrocks

Summary

In this tutorial you:

  • Deployed StarRocks with Helm and the StarRocks Operator
  • Loaded crash data provided by New York City and weather data provided by NOAA
  • Analyzed the data using SQL JOINs to find out that driving in low visibility or icy streets is a bad idea

There is more to learn; we intentionally glossed over the data transformation done during the Stream Load. The details on that are in the notes on the curl commands below.


Notes on the curl commands

<Curl />

More information

Default values.yaml

Stream Load

The Motor Vehicle Collisions - Crashes dataset is provided by New York City subject to these terms of use and privacy policy.

The Local Climatological Data(LCD) is provided by NOAA with this disclaimer and this privacy policy.

Helm is a package manager for Kubernetes. A Helm Chart is a Helm package and contains all of the resource definitions necessary to run an application on a Kubernetes cluster.

starrocks-kubernetes-operator and kube-starrocks Helm Chart.