<!-- Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to You under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. -->

Apache Spark Helm Chart

Apache Spark is a fast and general-purpose cluster computing system.

This chart is based on stable/spark in Helm Charts.

Chart Details

This chart will do the following:

  • 1 x Spark Master with port 8080 exposed on an external LoadBalancer
  • 3 x Spark Workers with a HorizontalPodAutoscaler that scales to a maximum of 10 pods when CPU utilization reaches 50% of the 100m request (autoscaling is disabled by default; see `Worker.Autoscaling` below)
  • All using Kubernetes Deployments
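Once the chart is installed, the resources above can be inspected with `kubectl`. This is a sketch; the `component` label key is an assumption based on the chart's `Master.Component` and `Worker.Component` selector values, and actual resource names may be prefixed with the release name:

```shell
# List the Deployments, Services, and HPA created by the chart
kubectl get deployments,svc,hpa

# Filter pods by the chart's selector values (label key assumed to be "component")
kubectl get pods -l component=spark-master
kubectl get pods -l component=spark-worker
```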

Prerequisites

Installing the Chart

To install the chart with the release name `my-release`:

```bash
$ helm install --name my-release stable/spark
```

Configuration

The following table lists the configurable parameters of the Spark chart and their default values.

Spark Master

| Parameter | Description | Default |
| --------- | ----------- | ------- |
| `Master.Name` | Spark master name | `spark-master` |
| `Master.Image` | Container image name | `bde2020/spark-master` |
| `Master.ImageTag` | Container image tag | `2.2.2-hadoop2.7` |
| `Master.Replicas` | k8s deployment replicas | `1` |
| `Master.Component` | k8s selector key | `spark-master` |
| `Master.Cpu` | Container requested cpu | `100m` |
| `Master.Memory` | Container requested memory | `512Mi` |
| `Master.ServicePort` | k8s service port | `7077` |
| `Master.ContainerPort` | Container listening port | `7077` |
| `Master.DaemonMemory` | Master JVM Xms and Xmx option | `1g` |
| `Master.ServiceType` | Kubernetes service type | `LoadBalancer` |
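For example, to give the master a larger JVM heap and expose it as a `ClusterIP` service instead of a `LoadBalancer`, a values override might look like this (a sketch using the parameter names from the table above):

```yaml
# values.yaml (fragment) -- overrides for the Spark master
Master:
  DaemonMemory: 2g
  ServiceType: ClusterIP
```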

Spark WebUi

| Parameter | Description | Default |
| --------- | ----------- | ------- |
| `WebUi.Name` | Spark webui name | `spark-webui` |
| `WebUi.ServicePort` | k8s service port | `8080` |
| `WebUi.ContainerPort` | Container listening port | `8080` |
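If the LoadBalancer is not reachable from your workstation, the web UI can be accessed through a port-forward. A sketch assuming the default service name `spark-webui` from the table above; the actual service name may be prefixed with the release name, so check `kubectl get svc` first:

```shell
# Forward local port 8080 to the spark-webui service
kubectl port-forward svc/spark-webui 8080:8080
# Then browse to http://localhost:8080
```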

Spark Worker

| Parameter | Description | Default |
| --------- | ----------- | ------- |
| `Worker.Name` | Spark worker name | `spark-worker` |
| `Worker.Image` | Container image name | `bde2020/spark-worker` |
| `Worker.ImageTag` | Container image tag | `2.2.2-hadoop2.7` |
| `Worker.Replicas` | k8s hpa and deployment replicas | `3` |
| `Worker.ReplicasMax` | k8s hpa max replicas | `10` |
| `Worker.Component` | k8s selector key | `spark-worker` |
| `Worker.Cpu` | Container requested cpu | `100m` |
| `Worker.Memory` | Container requested memory | `512Mi` |
| `Worker.ContainerPort` | Container listening port | `7077` |
| `Worker.CpuTargetPercentage` | k8s hpa cpu targetPercentage | `50` |
| `Worker.DaemonMemory` | Worker JVM Xms and Xmx setting | `1g` |
| `Worker.ExecutorMemory` | Worker memory available for executor | `1g` |
| `Worker.Autoscaling` | Enable horizontal pod autoscaling | `false` |
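Since autoscaling is off by default, enabling it means setting `Worker.Autoscaling` along with the HPA bounds. A sketch of a values override using the parameter names from the table above:

```yaml
# values.yaml (fragment) -- enable worker autoscaling
Worker:
  Autoscaling: true
  Replicas: 3              # hpa minimum replicas
  ReplicasMax: 10          # hpa maximum replicas
  CpuTargetPercentage: 50  # scale up when CPU crosses 50% of the request
```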

Specify each parameter using the `--set key=value[,key=value]` argument to `helm install`.
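For example, the worker count and executor memory could be overridden at install time like this (a sketch; the parameter names come from the tables above):

```shell
$ helm install --name my-release \
    --set Worker.Replicas=5,Worker.ExecutorMemory=2g \
    stable/spark
```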

Alternatively, a YAML file that specifies the values for the parameters can be provided while installing the chart. For example,

```bash
$ helm install --name my-release -f values.yaml stable/spark
```

Tip: You can use the default `values.yaml`
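The chart's bundled defaults can be dumped to a local file, edited, and passed back in with `-f` (using Helm 2's `helm inspect values` command):

```shell
# Write the chart's default values to a local file, edit as needed,
# then install with the edited file
helm inspect values stable/spark > values.yaml
helm install --name my-release -f values.yaml stable/spark
```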