docs/en/quick_start/helm.md
import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; import OperatorPrereqs from '../_assets/deployment/_OperatorPrereqs.mdx' import DDL from '../_assets/quick-start/_DDL.mdx' import Clients from '../_assets/quick-start/_clientsAllin1.mdx' import SQL from '../_assets/quick-start/_SQL.mdx' import Curl from '../_assets/quick-start/_curl.mdx'
The goals of this quickstart are:
root:::tip The datasets and queries are the same as the ones used in the Basic Quick Start. The main difference here is deploying with Helm and the StarRocks Operator. :::
The data used is provided by NYC OpenData and the National Centers for Environmental Information.
Both of these datasets are large, and because this tutorial is intended to help you get exposed to working with StarRocks we are not going to load data for the past 120 years. You can run this with a GKE Kubernetes cluster built on three e2-standard-4 machines (or similar) with 80GB disk. For larger deployments, we have other documentation and will provide that later.
There is a lot of information in this document, and it is presented with step-by-step content at the beginning, and the technical details at the end. This is done to serve these purposes in this order:
You can use the SQL client provided in the Kubernetes environment, or use one on your system. This guide uses the mysql CLI Many MySQL-compatible clients will work.
curl is used to issue the data load job to StarRocks, and to download the datasets. Check to see if you have it installed by running curl or curl.exe at your OS prompt. If curl is not installed, get curl here.
Frontend nodes are responsible for metadata management, client connection management, query planning, and query scheduling. Each FE stores and maintains a complete copy of metadata in its memory, which guarantees indiscriminate services among the FEs.
Backend nodes are responsible for both data storage and executing query plans.
The Helm Chart contains the definitions of the StarRocks Operator and the custom resource StarRocksCluster.
Add the Helm Chart Repo.
helm repo add starrocks https://starrocks.github.io/starrocks-kubernetes-operator
Update the Helm Chart Repo to the latest version.
helm repo update
View the Helm Chart Repo that you added.
helm search repo starrocks
NAME CHART VERSION APP VERSION DESCRIPTION
starrocks/kube-starrocks 1.9.7 3.2-latest kube-starrocks includes two subcharts, operator...
starrocks/operator 1.9.7 1.9.7 A Helm chart for StarRocks operator
starrocks/starrocks 1.9.7 3.2-latest A Helm chart for StarRocks cluster
starrocks/warehouse 1.9.7 3.2-latest Warehouse is currently a feature of the StarRoc...
Download these two datasets to your machine.
curl -O https://raw.githubusercontent.com/StarRocks/demo/master/documentation-samples/quickstart/datasets/NYPD_Crash_Data.csv
curl -O https://raw.githubusercontent.com/StarRocks/demo/master/documentation-samples/quickstart/datasets/72505394728.csv
The goals for this quick start are:
rootThe Helm chart provides options to satisfy all of these goals, but they are not configured by default. The rest of this section covers the configuration needed to meet all of these goals. A complete values spec will be provided, but first read the details for each of the six sections and then copy the full spec.
This bit of YAML instructs the StarRocks operator to set the password for the database user root to the value of the password key of the Kubernetes secret `starrocks-root-pass.
starrocks:
initPassword:
enabled: true
# Set a password secret, for example:
# kubectl create secret generic starrocks-root-pass --from-literal=password='g()()dpa$$word'
passwordSecret: starrocks-root-pass
Task: Create the Kubernetes secret
kubectl create secret generic starrocks-root-pass --from-literal=password='g()()dpa$$word'
By setting starrocks.starrockFESpec.replicas to 3, and starrocks.starrockBeSpec.replicas to 3 you will have enough FEs and BEs for high availability. Setting the CPU and memory requests low allows the pods to be created in a small Kubernetes environment.
starrocks:
starrocksFESpec:
replicas: 3
resources:
requests:
cpu: 1
memory: 1Gi
starrocksBeSpec:
replicas: 3
resources:
requests:
cpu: 1
memory: 2Gi
Setting a value for starrocks.starrocksFESpec.storageSpec.name to anything other than "" causes:
starrocks.starrocksFESpec.storageSpec.name to be used as the prefix for all storage volumes for the service.By setting the value to fe these PVs will be created for FE 0:
fe-meta-kube-starrocks-fe-0fe-log-kube-starrocks-fe-0starrocks:
starrocksFESpec:
storageSpec:
name: fe
Setting a value for starrocks.starrocksBeSpec.storageSpec.name to anything other than "" causes:
starrocks.starrocksBeSpec.storageSpec.name to be used as the prefix for all storage volumes for the service.By setting the value to be these PVs will be created for BE 0:
be-data-kube-starrocks-be-0be-log-kube-starrocks-be-0Setting the storageSize to 15Gi reduces the storage from the default of 1Ti to fit smaller quotas for storage.
starrocks:
starrocksBeSpec:
storageSpec:
name: be
storageSize: 15Gi
By default, access to the FE service is through cluster IPs. To allow external access, service.type is set to LoadBalancer
starrocks:
starrocksFESpec:
service:
type: LoadBalancer
Stream Load requires external access to both FEs and BEs. The requests are sent to the FE and then the FE assigns a BE to process the upload. To allow the curl command to be redirected to the BE the starroclFeProxySpec needs to be enabled and set to type LoadBalancer.
starrocks:
starrocksFeProxySpec:
enabled: true
service:
type: LoadBalancer
The above snippets combined provide a full values file. Save this to my-values.yaml:
starrocks:
initPassword:
enabled: true
# Set a password secret, for example:
# kubectl create secret generic starrocks-root-pass --from-literal=password='g()()dpa$$word'
passwordSecret: starrocks-root-pass
starrocksFESpec:
replicas: 3
service:
type: LoadBalancer
resources:
requests:
cpu: 1
memory: 1Gi
storageSpec:
name: fe
starrocksBeSpec:
replicas: 3
resources:
requests:
cpu: 1
memory: 2Gi
storageSpec:
name: be
storageSize: 15Gi
starrocksFeProxySpec:
enabled: true
service:
type: LoadBalancer
To load data from outside of the Kubernetes cluster the StarRocks database will be exposed externally. You should set
a password for the StarRocks database user root. The operator will apply the password to the FE and BE nodes.
kubectl create secret generic starrocks-root-pass --from-literal=password='g()()dpa$$word'
secret/starrocks-root-pass created
helm install -f my-values.yaml starrocks starrocks/kube-starrocks
NAME: starrocks
LAST DEPLOYED: Wed Jun 26 20:25:09 2024
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Thank you for installing kube-starrocks-1.9.7 kube-starrocks chart.
It will install both operator and starrocks cluster, please wait for a few minutes for the cluster to be ready.
Please see the values.yaml for more operation information: https://github.com/StarRocks/starrocks-kubernetes-operator/blob/main/helm-charts/charts/kube-starrocks/values.yaml
You can check the progress with these commands:
kubectl --namespace default get starrockscluster -l "cluster=kube-starrocks"
NAME PHASE FESTATUS BESTATUS CNSTATUS FEPROXYSTATUS
kube-starrocks reconciling reconciling reconciling reconciling
kubectl get pods
:::note
The kube-starrocks-initpwd pod will go through error and CrashLoopBackOff states as it attempts to connect to the FE and BE pods to set the StarRocks root password. You should ignore these errors and wait for a status of Completed for this pod.
:::
NAME READY STATUS RESTARTS AGE
kube-starrocks-be-0 0/1 Running 0 20s
kube-starrocks-be-1 0/1 Running 0 20s
kube-starrocks-be-2 0/1 Running 0 20s
kube-starrocks-fe-0 1/1 Running 0 66s
kube-starrocks-fe-1 0/1 Running 0 65s
kube-starrocks-fe-2 0/1 Running 0 66s
kube-starrocks-fe-proxy-56f8998799-d4qmt 1/1 Running 0 20s
kube-starrocks-initpwd-m84br 0/1 CrashLoopBackOff 3 (50s ago) 92s
kube-starrocks-operator-54ffcf8c5c-xsjc8 1/1 Running 0 92s
kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE
be-data-kube-starrocks-be-0 Bound pvc-4ae0c9d8-7f9a-4147-ad74-b22569165448 15Gi RWO standard-rwo <unset> 82s
be-data-kube-starrocks-be-1 Bound pvc-28b4dbd1-0c8f-4b06-87e8-edec616cabbc 15Gi RWO standard-rwo <unset> 82s
be-data-kube-starrocks-be-2 Bound pvc-c7232ea6-d3d9-42f1-bfc1-024205a17656 15Gi RWO standard-rwo <unset> 82s
be-log-kube-starrocks-be-0 Bound pvc-6193c43d-c74f-4d12-afcc-c41ace3d5408 1Gi RWO standard-rwo <unset> 82s
be-log-kube-starrocks-be-1 Bound pvc-c01f124a-014a-439a-99a6-6afe95215bf0 1Gi RWO standard-rwo <unset> 82s
be-log-kube-starrocks-be-2 Bound pvc-136df15f-4d2e-43bc-a1c0-17227ce3fe6b 1Gi RWO standard-rwo <unset> 82s
fe-log-kube-starrocks-fe-0 Bound pvc-7eac524e-d286-4760-b21c-d9b6261d976f 5Gi RWO standard-rwo <unset> 2m23s
fe-log-kube-starrocks-fe-1 Bound pvc-38076b78-71e8-4659-b8e7-6751bec663f6 5Gi RWO standard-rwo <unset> 2m23s
fe-log-kube-starrocks-fe-2 Bound pvc-4ccfee60-02b7-40ba-a22e-861ea29dac74 5Gi RWO standard-rwo <unset> 2m23s
fe-meta-kube-starrocks-fe-0 Bound pvc-5130c9ff-b797-4f79-a1d2-4214af860d70 10Gi RWO standard-rwo <unset> 2m23s
fe-meta-kube-starrocks-fe-1 Bound pvc-13545330-63be-42cf-b1ca-3ed6f96a8c98 10Gi RWO standard-rwo <unset> 2m23s
fe-meta-kube-starrocks-fe-2 Bound pvc-609cadd4-c7b7-4cf9-84b0-a75678bb3c4d 10Gi RWO standard-rwo <unset> 2m23s
:::tip These are the same commands as above, but show the desired state. :::
kubectl --namespace default get starrockscluster -l "cluster=kube-starrocks"
NAME PHASE FESTATUS BESTATUS CNSTATUS FEPROXYSTATUS
kube-starrocks running running running running
kubectl get pods
:::tip
The system is ready when all of the pods except for kube-starrocks-initpwd show 1/1 in the READY column. The kube-starrocks-initpwd pod should show 0/1 and a STATUS of Completed.
:::
NAME READY STATUS RESTARTS AGE
kube-starrocks-be-0 1/1 Running 0 57s
kube-starrocks-be-1 1/1 Running 0 57s
kube-starrocks-be-2 1/1 Running 0 57s
kube-starrocks-fe-0 1/1 Running 0 103s
kube-starrocks-fe-1 1/1 Running 0 102s
kube-starrocks-fe-2 1/1 Running 0 103s
kube-starrocks-fe-proxy-56f8998799-d4qmt 1/1 Running 0 57s
kube-starrocks-initpwd-m84br 0/1 Completed 4 2m9s
kube-starrocks-operator-54ffcf8c5c-xsjc8 1/1 Running 0 2m9s
The EXTERNAL-IP addresses in the highlighted lines will be used to provide SQL client and Stream Load access from outside the Kubernetes cluster.
kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kube-starrocks-be-search ClusterIP None <none> 9050/TCP 78s
kube-starrocks-be-service ClusterIP 34.118.228.231 <none> 9060/TCP,8040/TCP,9050/TCP,8060/TCP 78s
# highlight-next-line
kube-starrocks-fe-proxy-service LoadBalancer 34.118.230.176 34.176.12.205 8080:30241/TCP 78s
kube-starrocks-fe-search ClusterIP None <none> 9030/TCP 2m4s
# highlight-next-line
kube-starrocks-fe-service LoadBalancer 34.118.226.82 34.176.215.97 8030:30620/TCP,9020:32461/TCP,9030:32749/TCP,9010:30911/TCP 2m4s
kubernetes ClusterIP 34.118.224.1 <none> 443/TCP 8h
:::tip
Store the EXTERNAL-IP addresses from the highlighted lines in environment variables so that you have them handy:
export MYSQL_IP=`kubectl get services kube-starrocks-fe-service --output jsonpath='{.status.loadBalancer.ingress[0].ip}'`
export FE_PROXY=`kubectl get services kube-starrocks-fe-proxy-service --output jsonpath='{.status.loadBalancer.ingress[0].ip}'`:8080
:::
:::tip
If you are using a client other than the mysql CLI, open that now. :::
This command will run the mysql command in a Kubernetes pod:
kubectl exec --stdin --tty kube-starrocks-fe-0 -- \
mysql -P9030 -h127.0.0.1 -u root --prompt="StarRocks > "
If you have the mysql CLI installed locally, you can use it instead of the one in the Kubernetes cluster:
mysql -P9030 -h $MYSQL_IP -u root --prompt="StarRocks > " -p
mysql -P9030 -h $MYSQL_IP -u root --prompt="StarRocks > " -p
Exit from the MySQL client, or open a new shell to run commands at the command line to upload data.
exit
There are many ways to load data into StarRocks. For this tutorial, the simplest way is to use curl and StarRocks Stream Load.
Upload the two datasets that you downloaded earlier.
:::tip
Open a new shell as these curl commands are run at the operating system prompt, not in the mysql client. The commands refer to the datasets that you downloaded, so run them from the directory where you downloaded the files.
Since this is a new shell, run the export commands again:
export MYSQL_IP=`kubectl get services kube-starrocks-fe-service --output jsonpath='{.status.loadBalancer.ingress[0].ip}'`
export FE_PROXY=`kubectl get services kube-starrocks-fe-proxy-service --output jsonpath='{.status.loadBalancer.ingress[0].ip}'`:8080
You will be prompted for a password. Use the password that you added to the Kubernetes secret starrocks-root-pass. If you used the command provided, the password is g()()dpa$$word.
:::
The curl commands look complex, but they are explained in detail at the end of the tutorial. For now, we recommend running the commands and running some SQL to analyze the data, and then reading about the data loading details at the end.
curl --location-trusted -u root \
-T ./NYPD_Crash_Data.csv \
-H "label:crashdata-0" \
-H "column_separator:," \
-H "skip_header:1" \
-H "enclose:\"" \
-H "max_filter_ratio:1" \
-H "columns:tmp_CRASH_DATE, tmp_CRASH_TIME, CRASH_DATE=str_to_date(concat_ws(' ', tmp_CRASH_DATE, tmp_CRASH_TIME), '%m/%d/%Y %H:%i'),BOROUGH,ZIP_CODE,LATITUDE,LONGITUDE,LOCATION,ON_STREET_NAME,CROSS_STREET_NAME,OFF_STREET_NAME,NUMBER_OF_PERSONS_INJURED,NUMBER_OF_PERSONS_KILLED,NUMBER_OF_PEDESTRIANS_INJURED,NUMBER_OF_PEDESTRIANS_KILLED,NUMBER_OF_CYCLIST_INJURED,NUMBER_OF_CYCLIST_KILLED,NUMBER_OF_MOTORIST_INJURED,NUMBER_OF_MOTORIST_KILLED,CONTRIBUTING_FACTOR_VEHICLE_1,CONTRIBUTING_FACTOR_VEHICLE_2,CONTRIBUTING_FACTOR_VEHICLE_3,CONTRIBUTING_FACTOR_VEHICLE_4,CONTRIBUTING_FACTOR_VEHICLE_5,COLLISION_ID,VEHICLE_TYPE_CODE_1,VEHICLE_TYPE_CODE_2,VEHICLE_TYPE_CODE_3,VEHICLE_TYPE_CODE_4,VEHICLE_TYPE_CODE_5" \
# highlight-next-line
-XPUT http://$FE_PROXY/api/quickstart/crashdata/_stream_load
Enter host password for user 'root':
{
"TxnId": 2,
"Label": "crashdata-0",
"Status": "Success",
"Message": "OK",
"NumberTotalRows": 423726,
"NumberLoadedRows": 423725,
"NumberFilteredRows": 1,
"NumberUnselectedRows": 0,
"LoadBytes": 96227746,
"LoadTimeMs": 2483,
"BeginTxnTimeMs": 42,
"StreamLoadPlanTimeMs": 122,
"ReadDataTimeMs": 1610,
"WriteDataTimeMs": 2253,
"CommitAndPublishTimeMs": 65,
"ErrorURL": "http://kube-starrocks-be-2.kube-starrocks-be-search.default.svc.cluster.local:8040/api/_load_error_log?file=error_log_5149e6f80de42bcb_eab2ea77276de4ba"
}
curl --location-trusted -u root \
-T ./72505394728.csv \
-H "label:weather-0" \
-H "column_separator:," \
-H "skip_header:1" \
-H "enclose:\"" \
-H "max_filter_ratio:1" \
-H "columns: STATION, DATE, LATITUDE, LONGITUDE, ELEVATION, NAME, REPORT_TYPE, SOURCE, HourlyAltimeterSetting, HourlyDewPointTemperature, HourlyDryBulbTemperature, HourlyPrecipitation, HourlyPresentWeatherType, HourlyPressureChange, HourlyPressureTendency, HourlyRelativeHumidity, HourlySkyConditions, HourlySeaLevelPressure, HourlyStationPressure, HourlyVisibility, HourlyWetBulbTemperature, HourlyWindDirection, HourlyWindGustSpeed, HourlyWindSpeed, Sunrise, Sunset, DailyAverageDewPointTemperature, DailyAverageDryBulbTemperature, DailyAverageRelativeHumidity, DailyAverageSeaLevelPressure, DailyAverageStationPressure, DailyAverageWetBulbTemperature, DailyAverageWindSpeed, DailyCoolingDegreeDays, DailyDepartureFromNormalAverageTemperature, DailyHeatingDegreeDays, DailyMaximumDryBulbTemperature, DailyMinimumDryBulbTemperature, DailyPeakWindDirection, DailyPeakWindSpeed, DailyPrecipitation, DailySnowDepth, DailySnowfall, DailySustainedWindDirection, DailySustainedWindSpeed, DailyWeather, MonthlyAverageRH, MonthlyDaysWithGT001Precip, MonthlyDaysWithGT010Precip, MonthlyDaysWithGT32Temp, MonthlyDaysWithGT90Temp, MonthlyDaysWithLT0Temp, MonthlyDaysWithLT32Temp, MonthlyDepartureFromNormalAverageTemperature, MonthlyDepartureFromNormalCoolingDegreeDays, MonthlyDepartureFromNormalHeatingDegreeDays, MonthlyDepartureFromNormalMaximumTemperature, MonthlyDepartureFromNormalMinimumTemperature, MonthlyDepartureFromNormalPrecipitation, MonthlyDewpointTemperature, MonthlyGreatestPrecip, MonthlyGreatestPrecipDate, MonthlyGreatestSnowDepth, MonthlyGreatestSnowDepthDate, MonthlyGreatestSnowfall, MonthlyGreatestSnowfallDate, MonthlyMaxSeaLevelPressureValue, MonthlyMaxSeaLevelPressureValueDate, MonthlyMaxSeaLevelPressureValueTime, MonthlyMaximumTemperature, MonthlyMeanTemperature, MonthlyMinSeaLevelPressureValue, MonthlyMinSeaLevelPressureValueDate, MonthlyMinSeaLevelPressureValueTime, MonthlyMinimumTemperature, MonthlySeaLevelPressure, MonthlyStationPressure, MonthlyTotalLiquidPrecipitation, MonthlyTotalSnowfall, MonthlyWetBulb, AWND, CDSD, CLDD, DSNW, HDSD, HTDD, NormalsCoolingDegreeDay, NormalsHeatingDegreeDay, ShortDurationEndDate005, ShortDurationEndDate010, ShortDurationEndDate015, ShortDurationEndDate020, ShortDurationEndDate030, ShortDurationEndDate045, ShortDurationEndDate060, ShortDurationEndDate080, ShortDurationEndDate100, ShortDurationEndDate120, ShortDurationEndDate150, ShortDurationEndDate180, ShortDurationPrecipitationValue005, ShortDurationPrecipitationValue010, ShortDurationPrecipitationValue015, ShortDurationPrecipitationValue020, ShortDurationPrecipitationValue030, ShortDurationPrecipitationValue045, ShortDurationPrecipitationValue060, ShortDurationPrecipitationValue080, ShortDurationPrecipitationValue100, ShortDurationPrecipitationValue120, ShortDurationPrecipitationValue150, ShortDurationPrecipitationValue180, REM, BackupDirection, BackupDistance, BackupDistanceUnit, BackupElements, BackupElevation, BackupEquipment, BackupLatitude, BackupLongitude, BackupName, WindEquipmentChangeDate" \
# highlight-next-line
-XPUT http://$FE_PROXY/api/quickstart/weatherdata/_stream_load
Enter host password for user 'root':
{
"TxnId": 4,
"Label": "weather-0",
"Status": "Success",
"Message": "OK",
"NumberTotalRows": 22931,
"NumberLoadedRows": 22931,
"NumberFilteredRows": 0,
"NumberUnselectedRows": 0,
"LoadBytes": 15558550,
"LoadTimeMs": 404,
"BeginTxnTimeMs": 1,
"StreamLoadPlanTimeMs": 7,
"ReadDataTimeMs": 157,
"WriteDataTimeMs": 372,
"CommitAndPublishTimeMs": 23
}
Connect with a MySQL client if you are not connected. Remember to use the external IP address of the kube-starrocks-fe-service service and the password that you configured in the Kubernetes secret starrocks-root-pass.
mysql -P9030 -h $MYSQL_IP -u root --prompt="StarRocks > " -p
exit
Run this command if you are finished and would like to remove the StarRocks cluster and the StarRocks operator.
helm delete starrocks
In this tutorial you:
There is more to learn; we intentionally glossed over the data transformation done during the Stream Load. The details on that are in the notes on the curl commands below.
Default values.yaml
The Motor Vehicle Collisions - Crashes dataset is provided by New York City subject to these terms of use and privacy policy.
The Local Climatological Data(LCD) is provided by NOAA with this disclaimer and this privacy policy.
Helm is a package manager for Kubernetes. A Helm Chart is a Helm package and contains all of the resource definitions necessary to run an application on a Kubernetes cluster.
starrocks-kubernetes-operator and kube-starrocks Helm Chart.