
(kubernetes-docs)=

# {fas}`dharmachakra` Kubernetes Support

```{toctree}
:hidden:

kubernetes
```

Jina-serve is a cloud-native framework and therefore runs natively and easily on Kubernetes. Deploying a Jina-serve Deployment or Flow on Kubernetes is the recommended way to use Jina-serve in production.

A {class}`~jina.Deployment` and a {class}`~jina.Flow` are services composed of one or more microservices, called {class}`~jina.Executor`s and a {class}`~jina.Gateway`, which natively run in containers. This means that Kubernetes can natively take over the lifetime management of Executors.

Deploying a {class}`~jina.Deployment` or a {class}`~jina.Flow` on Kubernetes means wrapping these service containers in the appropriate K8s abstractions (Deployment, StatefulSet, and so on), exposing them internally via K8s Services, and connecting them together by passing the right set of parameters.

```{hint}
This documentation is designed for users who want to **manually** deploy a Jina-serve project on Kubernetes.

Check out {ref}`jcloud` if you want a **one-click** solution to deploy and host Jina-serve, leveraging a cloud-native stack of Kubernetes, Prometheus and Grafana, **without worrying about provisioning**.
```

## Automatically translate a Deployment or Flow to Kubernetes concepts

```{hint}
Manually building these Kubernetes YAML objects is long and cumbersome. Therefore we provide a helper function {meth}`~jina.Flow.to_kubernetes_yaml` that does most of this translation work automatically.
```

This helper function can be called from:

- Jina-serve's Python interface, to translate a Flow defined in Python to K8s YAML files
- Jina-serve's CLI interface, to export a YAML Flow to K8s YAML files
```{seealso}
More detail in the {ref}`Deployment export documentation <deployment-kubernetes-export>` and the {ref}`Flow export documentation <flow-kubernetes-export>`.
```
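For the CLI route, the input is a regular Flow YAML file. A minimal, hypothetical `flow.yml` (the executor name and image are placeholders) could look like the following, and would then be exported with `jina export kubernetes flow.yml ./k8s_output`:

```yaml
# flow.yml — a minimal, hypothetical Flow definition; executor name and image are placeholders
jtype: Flow
executors:
  - name: encoder
    uses: docker://my-registry/encoder:latest
```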

## Extra Kubernetes options

In general, Jina-serve follows a single principle when it comes to deploying in Kubernetes: You, the user, know your use case and requirements the best. This means that, while Jina-serve generates configurations for you that run out of the box, as a professional user you should always see them as just a starting point to get you off the ground.

```{hint}
The export functions {meth}`~jina.Deployment.to_kubernetes_yaml` and {meth}`~jina.Flow.to_kubernetes_yaml` are helpers to get you off the ground. **They are meant to be updated and adapted to every use case.**
```
```{admonition} Caution
:class: caution
If you change the Docker images for {class}`~jina.Executor` and {class}`~jina.Gateway` in your Kubernetes-generated file, ensure that all of them are built with the same Jina-serve version to guarantee compatibility.
```

You can't add basic Kubernetes features like Secrets, ConfigMaps or Labels via the Pythonic or YAML interface. This is intentional and doesn't mean that we don't support these features. On the contrary, we let you fully express your Kubernetes configuration by using the Kubernetes API to add your own Kubernetes standards to Jina-serve.

```{admonition} Hint
:class: hint
We recommend dumping the Kubernetes configuration files and then editing them to suit your needs.
```

Here are possible configuration options you may need to add or change:

- Add label selectors to the Deployments to suit your case
- Add requests and limits for the resources of the different Pods
- Set up persistent volume storage to save your data on disk
- Pass custom configuration to your Executor with ConfigMaps
- Manage credentials of your Executor with Kubernetes Secrets; you can use `f.add(..., env_from_secret={'SECRET_PASSWORD': {'name': 'mysecret', 'key': 'password'}})` to map them to Pod environment variables
- Edit the default rolling update configuration
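As an illustration of the first two points and the Secrets mapping, the edited container section of a generated Executor Deployment might look like the following sketch (all names, images and values here are placeholders, not output of the export helper):

```yaml
# Fragment of a Deployment's pod spec — names and values are placeholders
containers:
  - name: executor
    image: my-registry/my-executor:latest
    resources:
      requests:
        cpu: "500m"
        memory: 512Mi
      limits:
        cpu: "1"
        memory: 1Gi
    env:
      - name: SECRET_PASSWORD
        valueFrom:
          secretKeyRef:
            name: mysecret
            key: password
```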

(service-mesh-k8s)=

## Required service mesh

```{caution}
A service mesh must be installed and correctly configured in the K8s cluster in which you deploy your Flow.
```

Service meshes work by attaching a tiny proxy to each of your Kubernetes Pods, allowing for smart rerouting, load balancing, request retrying, and a host of other features.

Jina relies on a service mesh to load balance requests between replicas of the same Executor. You can use your favourite Kubernetes service mesh in combination with your Jina services, but the configuration files generated by `to_kubernetes_yaml()` already include all necessary annotations for the Linkerd service mesh.
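For reference, Linkerd's proxy injection is driven by an annotation on the Pod template, so the generated Deployment files include something along these lines:

```yaml
# Pod template metadata carrying the Linkerd injection annotation
spec:
  template:
    metadata:
      annotations:
        linkerd.io/inject: enabled
```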

```{admonition} Hint
:class: hint
You can use any service mesh with Jina-serve, but Jina-serve Kubernetes configurations come with Linkerd annotations out of the box.
```

To use Linkerd, you can follow the Linkerd CLI installation guide.

```{admonition} Caution
:class: caution
Many service meshes can perform retries themselves.
Be careful about setting up service mesh level retries in combination with Jina, as it may lead to unwanted behaviour in combination with Jina's own {ref}`retry policy <flow-error-handling>`.

Instead, you can disable Jina level retries by setting `Flow(retries=0)` in Python, or `retries: 0` in the Flow YAML's `with` block.
```
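A minimal Flow YAML with Jina-level retries disabled could look like this (the executor name is a placeholder):

```yaml
jtype: Flow
with:
  retries: 0
executors:
  - name: encoder
```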

(kubernetes-replicas)=

## Scaling Executors: Replicas and shards

Jina supports two types of scaling:

- Replicas can be used with any Executor type and are typically used for performance and availability.
- Shards are used for partitioning data and should only be used with indexers, since they store state.

Check {ref}`here <scale-out>` for more information about these scaling mechanisms.

For shards, Jina creates one separate Deployment in Kubernetes per shard. Setting `Deployment(..., shards=num_shards)` is sufficient to create a corresponding Kubernetes configuration.
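As a sketch, exporting an Executor with `shards=2` yields two separate Kubernetes Deployments along these lines (the exact generated names are assumptions; inspect your exported files):

```yaml
# Deployment for shard 0 — metadata only; name is an assumption
apiVersion: apps/v1
kind: Deployment
metadata:
  name: executor0-0
---
# Deployment for shard 1
apiVersion: apps/v1
kind: Deployment
metadata:
  name: executor0-1
```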

For replicas, Jina-serve uses Kubernetes native replica scaling and relies on a service mesh to load-balance requests between replicas of the same Executor. Without a service mesh installed in your Kubernetes cluster, all traffic will be routed to the same replica.

```{admonition} See also
:class: seealso
The impossibility of load balancing between different replicas is a limitation of Kubernetes in combination with gRPC.
If you want to learn more about this limitation, see [this Kubernetes blog post](https://kubernetes.io/blog/2018/11/07/grpc-load-balancing-on-kubernetes-without-tears/).
```

## Scaling the Gateway

The {ref}`Gateway <gateway>` is responsible for providing the API of the {ref}`Flow <flow>`. If you have a large Flow with many Clients and many replicated Executors, the Gateway can become the bottleneck. In this case you can scale up the Gateway deployment to be backed by multiple Kubernetes Pods. To do this, add the `replicas` parameter to your Gateway before converting the Flow to Kubernetes, either in Python or in YAML:

````{tab} Python
You can use {meth}`~jina.Flow.config_gateway` to add the `replicas` parameter:
```python
from jina import Flow

f = Flow().config_gateway(replicas=3).add()

f.to_kubernetes_yaml('./k8s_yaml_path')
```
````
````{tab} YAML
You can add `replicas` in the `gateway` section of your Flow YAML:
```yaml
jtype: Flow
gateway:
  replicas: 3
executors:
  - name: encoder
```
````

Alternatively, this can be done by the regular means of Kubernetes: either increase the number of replicas in the {ref}`generated YAML configuration files <kubernetes-export>` or add replicas while running. To expose your Gateway replicas outside Kubernetes, you can add a load balancer as described {ref}`here <kubernetes-expose>`.
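As a sketch, a standard Kubernetes `LoadBalancer` Service in front of the Gateway could look like the following (the namespace, selector labels and port are assumptions; match them to your generated Gateway Deployment):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: gateway-lb
  namespace: my-namespace     # placeholder namespace
spec:
  type: LoadBalancer
  selector:
    app: gateway              # placeholder: must match your Gateway Pods' labels
  ports:
    - protocol: TCP
      port: 8080
      targetPort: 8080        # placeholder: must match the Gateway's exposed port
```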

```{admonition} Hint
:class: hint
You can use a custom Docker image for the Gateway deployment by setting the environment variable `JINA_GATEWAY_IMAGE` to the desired image before generating the configuration.
```
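A minimal sketch of that workflow: the image name is a placeholder, and the export call is shown as a comment because it requires your own Flow definition:

```python
import os

# Set the custom Gateway image *before* generating the Kubernetes configuration.
# The image name below is a placeholder.
os.environ['JINA_GATEWAY_IMAGE'] = 'my-registry/custom-gateway:latest'

# Then export as usual, e.g.:
# f.to_kubernetes_yaml('./k8s_yaml_path')
print(os.environ['JINA_GATEWAY_IMAGE'])
```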

## See also

- {ref}`Step by step deployment of a Jina-serve Flow on Kubernetes <kubernetes>`
- {ref}`Export a Flow to Kubernetes <kubernetes-export>`
- {meth}`~jina.Flow.to_kubernetes_yaml`
- {ref}`Deploy a standalone Executor on Kubernetes <kubernetes-executor>`
- [Kubernetes Documentation](https://kubernetes.io/docs/home/)