⚠️ Repo Archive Notice

As of Nov 13, 2020, charts in this repo will no longer be updated. For more information, see the Helm Charts Deprecation and Archive Notice, and Update.

Pachyderm Helm Chart

Pachyderm is a language-agnostic and cloud infrastructure-agnostic large-scale data processing framework based on software containers. This chart can be used to deploy Pachyderm backed by object stores of different Cloud providers.

DEPRECATION NOTICE

This chart is deprecated and no longer supported.

Prerequisites Details

  • Dynamic provisioning of PVs (for non-local deployments)

General chart settings

The following table lists the configurable parameters of pachd and their default values:

| Parameter | Description | Default |
|---|---|---|
| `rbac.create` | Enable RBAC | `true` |
| `pachd.exposeObjApi` | Expose S3 API | `false` |
| `pachd.image.repository` | Container image name | `pachyderm/pachd` |
| `pachd.pfsCache` | File system cache size | `0G` |
| `*.image.tag` | Container image tag | `<latest version>` |
| `*.image.pullPolicy` | Image pull policy | `Always` |
| `*.worker.repository` | Worker image name | `pachyderm/worker` |
| `*.worker.tag` | Worker image tag | `<latest version>` |
| `*.replicaCount` | Number of pachds | `1` |
| `*.resources.requests` | Memory and CPU request | `{512M,250m}` |
| `*.resources.limits` | Memory and CPU limit | `nil` |
| `*.service.grpc.annotations` | gRPC service additional annotations | `{}` |
| `*.service.grpc.prod` | gRPC service port | `30650` |
| `*.service.grpc.type` | gRPC service type | `NodePort` |

The following table lists the configurable parameters of etcd and their default values:

| Parameter | Description | Default |
|---|---|---|
| `etcd.image.repository` | Container image name | `quay.io/coreos/etcd` |
| `*.image.tag` | Container image tag | `<latest version>` |
| `*.image.pullPolicy` | Image pull policy | `IfNotPresent` |
| `*.resources.requests` | Memory and CPU request | `{250M,250m}` |
| `*.resources.limits` | Memory and CPU limit | `nil` |
| `*.persistence.enabled` | Enable persistence | `false` |
| `*.persistence.size` | Storage request | `20G` |
| `*.persistence.accessMode` | Access mode for PV | `ReadWriteOnce` |
| `*.persistence.storageClass` | PVC storage class | `nil` |

To select which object store credentials to use, set the `credentials` parameter to one of the following values: `local` | `s3` | `google` | `amazon` | `microsoft`.

| Parameter | Description | Default |
|---|---|---|
| `credentials` | Backend credentials | `""` |

Based on the storage credentials used, fill in the corresponding parameters for your object store. Note that the local installation deploys Pachyderm on your local Kubernetes cluster (e.g., minikube), backed by your local storage.

On-premises deployment

  • In an on-premises environment such as OpenStack, an S3 endpoint can be used as the storage backend. The following credentials (such as Minio credentials) are configurable:

| Parameter | Description | Default |
|---|---|---|
| `s3.accessKey` | S3 access key | `""` |
| `s3.secretKey` | S3 secret key | `""` |
| `s3.bucketName` | S3 bucket name | `""` |
| `s3.endpoint` | S3 endpoint | `""` |
| `s3.secure` | S3 secure | `"0"` |
| `s3.signature` | S3 signature | `"1"` |

Google Cloud

  • With Google Cloud credentials, set the following parameters, including your GCS bucket name:

| Parameter | Description | Default |
|---|---|---|
| `google.bucketName` | GCS bucket name | `""` |
| `google.credentials` | GCP credentials | `""` |

Amazon Web Services

  • On Amazon Web Services, set the following values:

| Parameter | Description | Default |
|---|---|---|
| `amazon.bucketName` | Amazon bucket name | `""` |
| `amazon.distribution` | Amazon distribution | `""` |
| `amazon.id` | Amazon ID | `""` |
| `amazon.region` | Amazon region | `""` |
| `amazon.roleArn` | Amazon role ARN | `""` |
| `amazon.secret` | Amazon secret | `""` |
| `amazon.token` | Amazon token | `""` |

Microsoft Azure

  • For Microsoft Azure, you must specify the following parameters:

| Parameter | Description | Default |
|---|---|---|
| `microsoft.container` | Container | `""` |
| `microsoft.id` | Account name | `""` |
| `microsoft.secret` | Account key | `""` |
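
As a quick reference, the mapping between each `credentials` value and the parameters documented for it can be captured in a small shell helper. This is only a sketch: `backend_params` is a hypothetical function name, and this README does not state which of these keys are strictly required.

```shell
# Print the chart parameters documented for a given `credentials` value.
# The mapping mirrors the tables above; it does not check the chart itself.
backend_params() {
  case "$1" in
    local)     echo "" ;;  # no backend-specific parameters are listed for local
    s3)        echo "s3.accessKey s3.secretKey s3.bucketName s3.endpoint s3.secure s3.signature" ;;
    google)    echo "google.bucketName google.credentials" ;;
    amazon)    echo "amazon.bucketName amazon.distribution amazon.id amazon.region amazon.roleArn amazon.secret amazon.token" ;;
    microsoft) echo "microsoft.container microsoft.id microsoft.secret" ;;
    *)         echo "unknown credentials value: $1" >&2; return 1 ;;
  esac
}

# Example: list the parameters documented for the Google Cloud backend.
backend_params google
```

Any value outside the five documented ones makes the helper return a non-zero status, which can be handy in a wrapper script before calling `helm install`.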

How to install the chart

We strongly suggest installing Pachyderm in its own namespace. RBAC must be enabled in your cluster for the installation to work with the default settings. The default installation deploys Pachyderm on your local Kubernetes cluster:

```console
$ helm install --namespace pachyderm --name my-release stable/pachyderm
```

Specify each parameter using the `--set key=value[,key=value]` argument to `helm install`. Consult the values.yaml file for more information about the parameters. For example:

```console
$ helm install --namespace pachyderm --name my-release \
--set credentials=s3,s3.accessKey=myaccesskey,s3.secretKey=mysecretkey,s3.bucketName=default_bucket,s3.endpoint=domain.subdomain:8080,etcd.persistence.enabled=true,etcd.persistence.accessMode=ReadWriteMany \
stable/pachyderm
```

Alternatively, a YAML file that specifies the values for the parameters can be provided while installing the chart:

```console
$ helm install --namespace pachyderm --name my-release -f values.yaml stable/pachyderm
```
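
For illustration, the `--set` example above could be expressed as a values file along these lines. This is a sketch under assumptions: the file name `my-values.yaml` is hypothetical, the values are the same placeholders used in the `--set` example, and the nesting assumes the chart follows Helm's usual mapping of dotted keys to nested YAML.

```shell
# Write a values file equivalent to the --set example above.
# All values are placeholders; replace them with your own.
cat > my-values.yaml <<'EOF'
credentials: s3
s3:
  accessKey: myaccesskey
  secretKey: mysecretkey
  bucketName: default_bucket
  endpoint: "domain.subdomain:8080"
etcd:
  persistence:
    enabled: true
    accessMode: ReadWriteMany
EOF

# Show the file so it can be reviewed before installing.
cat my-values.yaml
```

The resulting file can then be passed to `helm install` with `-f my-values.yaml` in place of `values.yaml`.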

Specifying a Pachyderm version

To specify a Pachyderm version, run the following command:

```console
$ helm install --namespace pachyderm --name my-release \
--set pachd.image.tag=1.8.6,pachd.worker.tag=1.8.6 \
stable/pachyderm
```

Accessing the pachd service

To use Pachyderm, log in to the master node via SSH and install the Pachyderm client:

```console
$ curl -o /tmp/pachctl.deb -L https://github.com/pachyderm/pachyderm/releases/download/v1.8.6/pachctl_1.8.6_amd64.deb && sudo dpkg -i /tmp/pachctl.deb
```

Note that the client version should match the pachd service version. For more information, consult the official documentation. Also, if your Kubernetes client is properly configured to talk to your remote cluster, you can simply install pachctl on your local machine and run: `pachctl --namespace <namespace> port-forward &`.
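
Because the client and server versions need to match, one option is to derive both the chart image tags and the pachctl download URL from a single variable. A minimal sketch: the variable names are hypothetical, and the URL pattern is simply the one from the `curl` command above with the version substituted (other releases are assumed, not verified, to follow the same naming).

```shell
# Hypothetical variable names; 1.8.6 is the version used in this README.
PACHYDERM_VERSION="1.8.6"

# Same URL pattern as the curl command above, with the version substituted.
PACHCTL_URL="https://github.com/pachyderm/pachyderm/releases/download/v${PACHYDERM_VERSION}/pachctl_${PACHYDERM_VERSION}_amd64.deb"

# Value for --set so that pachd and the worker image use the same tag.
IMAGE_TAGS="pachd.image.tag=${PACHYDERM_VERSION},pachd.worker.tag=${PACHYDERM_VERSION}"

echo "$PACHCTL_URL"
echo "$IMAGE_TAGS"
```

The first line printed can be passed to `curl -L`, and the second to `helm install --set`, so a version bump only touches one place.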

Clean-up

To remove the Pachyderm release, run the following commands:

```console
$ helm list
$ helm delete --purge <release-name>
```