Spring Cloud Data Flow Chart

stable/spring-cloud-data-flow/README.md


Spring Cloud Data Flow is a toolkit for building microservices-based streaming and batch data processing pipelines in Cloud Foundry and Kubernetes.

Data processing pipelines consist of Spring Boot apps, built using the Spring Cloud Stream or Spring Cloud Task microservice frameworks. This makes Spring Cloud Data Flow suitable for a range of data processing use cases, from import/export to event streaming and predictive analytics.

This Helm chart is deprecated

Given the stable deprecation timeline, the Bitnami-maintained Spring Cloud Data Flow Helm chart is now located at bitnami/charts.

The Bitnami repository is already included in the Hubs, and we will continue providing the same cadence of updates, support, etc. that we've been keeping here these years. Installation instructions are very similar: just add the bitnami repo and use it during installation (bitnami/<chart> instead of stable/<chart>):

```bash
$ helm repo add bitnami https://charts.bitnami.com/bitnami
$ helm install my-release bitnami/<chart>           # Helm 3
$ helm install --name my-release bitnami/<chart>    # Helm 2
```

To update an existing stable deployment with a chart hosted in the bitnami repository, you can execute:

```bash
$ helm repo add bitnami https://charts.bitnami.com/bitnami
$ helm upgrade my-release bitnami/<chart>
```

Issues and PRs related to the chart itself will be redirected to the bitnami/charts GitHub repository. Likewise, we'll be happy to answer questions related to this migration process in the issue created as a common place for discussion.

Chart Details

This chart will provision a fully functional and fully featured Spring Cloud Data Flow installation that can deploy and manage data processing pipelines in the cluster that it is deployed to.

Either the default MySQL deployment or an external database can be used as the data store for Spring Cloud Data Flow state and either RabbitMQ or Kafka can be used as the messaging layer for streaming apps to communicate with one another.

For more information on Spring Cloud Data Flow and its capabilities, see its documentation.

Prerequisites

This chart assumes that serviceAccount credentials are available so the deployed Data Flow server can access the API server (this works on GKE and Minikube by default). See Configure Service Accounts for Pods.
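As a quick sanity check (this assumes the chart will run under the `default` service account in the `default` namespace; adjust the names for your setup), you can ask the API server whether that service account is allowed to manage pods:

```shell
# Ask the API server whether the "default" service account in the "default"
# namespace may list pods; kubectl prints "yes" or "no".
kubectl auth can-i list pods --as=system:serviceaccount:default:default
```

If the answer is "no", you will need to grant the service account the required permissions (or create a dedicated one) before the Data Flow server can deploy apps.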

Installing the Chart

To install the chart with the release name my-release:

```bash
$ helm install --name my-release stable/spring-cloud-data-flow
```

If you are using a cluster that does not provide a load balancer (such as Minikube), you can install using a NodePort:

```bash
$ helm install --name my-release --set server.service.type=NodePort stable/spring-cloud-data-flow
```

To restrict the load balancer to an IP address range:

```bash
$ helm install --name my-release --set server.service.loadBalancerSourceRanges='{10.0.0.0/8}' stable/spring-cloud-data-flow
```

Data Store

By default, MySQL is deployed with this chart. However, if you wish to use an external database, pass the following set flag to the helm command to disable the MySQL deployment:

--set mysql.enabled=false

In addition, you are required to set all fields listed in External Database Configuration.
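Putting those pieces together, a complete install against an external database might look like the following. The host, driver, scheme, and password values below are purely illustrative; substitute your own values for each field listed in the External Database Configuration table.

```shell
# Install SCDF against a hypothetical external MariaDB instance.
# All database.* values below are examples, not real defaults.
helm install --name my-release \
  --set mysql.enabled=false \
  --set database.driver=org.mariadb.jdbc.Driver \
  --set database.scheme=mariadb \
  --set database.host=database.example.com \
  --set database.port=3306 \
  --set database.user=scdf \
  --set database.password=changeme \
  stable/spring-cloud-data-flow
```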

Messaging Layer

There are three messaging layers available in this chart:

  • RabbitMQ (default)
  • RabbitMQ HA
  • Kafka

To change the messaging layer to a highly available (HA) version of RabbitMQ, pass the following set flags to the helm command:

--set rabbitmq-ha.enabled=true,rabbitmq.enabled=false

Alternatively, to change the messaging layer to Kafka, pass the following set flags to the helm command:

--set kafka.enabled=true,rabbitmq.enabled=false

Only one messaging layer can be used at a given time. If both RabbitMQ and Kafka are enabled, both charts will be installed, but RabbitMQ will be used in the deployment.
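Combining the flags above with the basic install command from earlier, a complete Helm 2 install that uses Kafka as the messaging layer would look like:

```shell
# Install SCDF with Kafka instead of the default RabbitMQ messaging layer.
# Only one messaging layer may be enabled, so RabbitMQ is disabled explicitly.
helm install --name my-release \
  --set kafka.enabled=true,rabbitmq.enabled=false \
  stable/spring-cloud-data-flow
```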

Note that this chart pulls in many different Docker images, so it can take a while to fully install.

Feature Toggles

If you only need to deploy tasks and schedules, streams can be disabled:

--set features.streaming.enabled=false --set rabbitmq.enabled=false

If you only need to deploy streams, tasks and schedules can be disabled:

--set features.batch.enabled=false

NOTE: Both features.streaming.enabled and features.batch.enabled should not be set to false at the same time.
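For example, a complete command for a tasks-and-schedules-only deployment (streaming disabled, so no message broker is needed) would be:

```shell
# Deploy only the batch feature (tasks and schedules); streaming and the
# RabbitMQ messaging layer it depends on are both disabled.
helm install --name my-release \
  --set features.streaming.enabled=false \
  --set rabbitmq.enabled=false \
  stable/spring-cloud-data-flow
```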

Streaming and batch applications can be monitored through Prometheus and Grafana. To deploy these components and enable monitoring, set the following:

--set features.monitoring.enabled=true

When using Minikube, the Grafana URL can be obtained, for example, via:

minikube service my-release-grafana --url

On a platform that provides a LoadBalancer, such as GKE, the following command can be polled until the EXTERNAL-IP field is populated with the assigned load balancer IP address:

kubectl get svc my-release-grafana

See the Grafana table below for default credentials and override parameters.

Using an Ingress

If you would like to use an Ingress instead of having the services use the LoadBalancer type, there are a few things to consider.

First, you need to have an Ingress Controller installed in your cluster. If you don't already have one installed, you can use the following helm command to install an NGINX Ingress Controller:

```bash
kubectl create namespace nginx-ingress
helm install --name nginx-ingress --namespace nginx-ingress stable/nginx-ingress
```

You can look up the IP address used by the NGINX Ingress Controller with:

```bash
ingress=$(kubectl get svc nginx-ingress-controller -n nginx-ingress -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
```

This is useful if you would like to use xip.io instead of your own DNS resolution. The following options assume that you will use xip.io, but you can replace the host values below with your own DNS hosts if you prefer.

To enable the creation of an Ingress resource and configure the services to use the ClusterIP type, use the following set options in your helm install command:

```bash
  --set server.service.type=ClusterIP \
  --set ingress.enabled=true \
  --set ingress.protocol=http \
  --set ingress.server.host=scdf.${ingress}.xip.io \
```
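The options above are fragments meant to be appended to a full install command. For example (assuming the `ingress` variable holds the controller IP as captured earlier):

```shell
# Complete install using an Ingress with xip.io host resolution; the services
# are switched to ClusterIP since the Ingress handles external traffic.
helm install --name my-release \
  --set server.service.type=ClusterIP \
  --set ingress.enabled=true \
  --set ingress.protocol=http \
  --set ingress.server.host=scdf.${ingress}.xip.io \
  stable/spring-cloud-data-flow
```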

If you want to use an Ingress with the monitoring feature enabled, then use these options instead:

```bash
  --set features.monitoring.enabled=true \
  --set server.service.type=ClusterIP \
  --set grafana.service.type=ClusterIP \
  --set prometheus.proxy.service.type=ClusterIP \
  --set ingress.enabled=true \
  --set ingress.protocol=http \
  --set ingress.server.host=scdf.${ingress}.xip.io \
  --set ingress.grafana.host=grafana.${ingress}.xip.io \
```

Configuration

The following tables list the configurable parameters and their default values.

RBAC Configuration

| Parameter | Description | Default |
|-----------|-------------|---------|
| `rbac.create` | Create RBAC configurations | `true` |

ServiceAccount Configuration

| Parameter | Description | Default |
|-----------|-------------|---------|
| `serviceAccount.create` | Create ServiceAccount | `true` |
| `serviceAccount.name` | ServiceAccount name | (generated if not specified) |

Data Flow Server Configuration

| Parameter | Description | Default |
|-----------|-------------|---------|
| `server.version` | The version/tag of the Data Flow server | `2.6.0` |
| `server.imagePullPolicy` | The imagePullPolicy of the Data Flow server | `IfNotPresent` |
| `server.service.type` | The service type for the Data Flow server | `LoadBalancer` |
| `server.service.annotations` | Extra annotations for service resource | `{}` |
| `server.service.externalPort` | The external port for the Data Flow server | `80` |
| `server.service.labels` | Extra labels for the service resource | `{}` |
| `server.service.loadBalancerSourceRanges` | A list of IP address ranges to allow through the load balancer | no restriction |
| `server.platformName` | The name of the configured platform account | `default` |
| `server.configMap` | Custom ConfigMap name for Data Flow server configuration | |
| `server.trustCerts` | Trust self signed certs | `false` |
| `server.extraEnv` | Extra environment variables to add to the server container | `{}` |
| `server.containerConfiguration.container.registry-configurations.<NAME>.registry-host` | The registry host to use for the profile represented by `<NAME>` | |
| `server.containerConfiguration.container.registry-configurations.<NAME>.authorization-type` | The registry authorization type to use for the profile represented by `<NAME>` | |

Skipper Server Configuration

| Parameter | Description | Default |
|-----------|-------------|---------|
| `skipper.version` | The version/tag of the Skipper server | `2.5.0` |
| `skipper.imagePullPolicy` | The imagePullPolicy of the Skipper server | `IfNotPresent` |
| `skipper.platformName` | The name of the configured platform account | `default` |
| `skipper.service.type` | The service type for the Skipper server | `ClusterIP` |
| `skipper.service.annotations` | Extra annotations for service resources | `{}` |
| `skipper.service.labels` | Extra labels for the service resource | `{}` |
| `skipper.configMap` | Custom ConfigMap name for Skipper server configuration | |
| `skipper.trustCerts` | Trust self signed certs | `false` |
| `skipper.extraEnv` | Extra environment variables to add to the skipper container | `{}` |

Spring Cloud Deployer for Kubernetes Configuration

| Parameter | Description | Default |
|-----------|-------------|---------|
| `deployer.resourceLimits.cpu` | Deployer resource limit for cpu | `500m` |
| `deployer.resourceLimits.memory` | Deployer resource limit for memory | `1024Mi` |
| `deployer.readinessProbe.initialDelaySeconds` | Deployer readiness probe initial delay | `120` |
| `deployer.livenessProbe.initialDelaySeconds` | Deployer liveness probe initial delay | `90` |

RabbitMQ Configuration

| Parameter | Description | Default |
|-----------|-------------|---------|
| `rabbitmq.enabled` | Enable RabbitMQ as the middleware to use | `true` |
| `rabbitmq.rabbitmq.username` | RabbitMQ user name | `user` |
| `rabbitmq.rabbitmq.password` | RabbitMQ password to encode into the secret | `changeme` |

RabbitMQ HA Configuration

| Parameter | Description | Default |
|-----------|-------------|---------|
| `rabbitmq-ha.enabled` | Enable RabbitMQ HA as the middleware to use | `false` |
| `rabbitmq-ha.rabbitmqUsername` | RabbitMQ user name | `user` |

Kafka Configuration

| Parameter | Description | Default |
|-----------|-------------|---------|
| `kafka.enabled` | Enable Kafka as the middleware to use | `false` |
| `kafka.replicas` | The number of Kafka replicas to use | `1` |
| `kafka.configurationOverrides` | Kafka deployment configuration overrides | `replication.factor=1, metrics.enabled=false` |
| `kafka.zookeeper.replicaCount` | The number of ZooKeeper replicas to use | `1` |

MySQL Configuration

| Parameter | Description | Default |
|-----------|-------------|---------|
| `mysql.enabled` | Enable deployment of MySQL | `true` |
| `mysql.mysqlDatabase` | MySQL database name | `dataflow` |

External Database Configuration

| Parameter | Description | Default |
|-----------|-------------|---------|
| `database.driver` | Database driver | `nil` |
| `database.scheme` | Database scheme | `nil` |
| `database.host` | Database host | `nil` |
| `database.port` | Database port | `nil` |
| `database.user` | Database user | `scdf` |
| `database.password` | Database password | `nil` |
| `database.dataflow` | Database name for SCDF server | `dataflow` |
| `database.skipper` | Database name for SCDF skipper | `skipper` |

Feature Toggles

| Parameter | Description | Default |
|-----------|-------------|---------|
| `features.streaming.enabled` | Enables or disables streams | `true` |
| `features.batch.enabled` | Enables or disables tasks and schedules | `true` |
| `features.monitoring.enabled` | Enables or disables monitoring | `false` |

Ingress

| Parameter | Description | Default |
|-----------|-------------|---------|
| `ingress.enabled` | Enables or disables ingress support | `true` |
| `ingress.protocol` | Sets the protocol used by ingress server | `https` |
| `ingress.server.host` | Sets the host used for server | `data-flow.local` |
| `ingress.grafana.host` | Sets the host used for grafana | `grafana.local` |

Grafana

| Parameter | Description | Default |
|-----------|-------------|---------|
| `grafana.service.type` | Service type to use | `LoadBalancer` |
| `grafana.admin.existingSecret` | Existing Secret to use for login credentials | `scdf-grafana-secret` |
| `grafana.admin.userKey` | Secret userKey field | `admin-user` |
| `grafana.admin.passwordKey` | Secret passwordKey field | `admin-password` |
| `grafana.admin.defaultUsername` | The default base64 encoded login username used in the secret | `admin` |
| `grafana.admin.defaultPassword` | The default base64 encoded login password used in the secret | `password` |
| `grafana.extraConfigmapMounts` | ConfigMap mount for datasources | `scdf-grafana-ds-cm` |
| `grafana.dashboardProviders` | Dashboard provider for imported dashboards | `default` |
| `grafana.dashboards` | Dashboards to auto import | SCDF Apps, Streams & Tasks |

Prometheus

| Parameter | Description | Default |
|-----------|-------------|---------|
| `prometheus.server.global.scrape_interval` | Scrape interval | `10s` |
| `prometheus.server.global.scrape_timeout` | Scrape timeout | `9s` |
| `prometheus.server.global.evaluation_interval` | Evaluation interval | `10s` |
| `prometheus.extraScrapeConfigs` | Additional scrape configs for proxied applications | proxied-applications & proxies jobs |
| `prometheus.podSecurityPolicy` | Enable or disable PodSecurityPolicy | `true` |
| `prometheus.alertmanager` | Enable or disable alert manager | `false` |
| `prometheus.kubeStateMetrics` | Enable or disable kube state metrics | `false` |
| `prometheus.nodeExporter` | Enable or disable node exporter | `false` |
| `prometheus.pushgateway` | Enable or disable push gateway | `false` |
| `prometheus.proxy.service.type` | Service type to use | `LoadBalancer` |