(jcloud-configuration)=
# Configuration

JCloud extends Jina-serve's {ref}`Flow YAML specification <flow-yaml-spec>` by introducing the special field `jcloud`. This lets you define resources and scaling policies for each Executor and Gateway.
Here's a Flow with two Executors that have specific resource needs: `indexer` requires a 10 GB `ebs` disk, whereas `encoder` requires a `C4` instance, which provides two cores and 4 GB RAM. See the sections below for further information about instance types.
```{code-block} yaml
---
emphasize-lines: 5-7,10-16
---
jtype: Flow
executors:
  - name: encoder
    uses: jinaai+docker://<username>/Encoder
    jcloud:
      resources:
        instance: C4
  - name: indexer
    uses: jinaai+docker://<username>/Indexer
    jcloud:
      resources:
        storage:
          kind: ebs
          size: 10G
```
Since each Executor has its own business logic, it may require different cloud resources. One Executor might need more RAM, whereas another might need a bigger disk.
In JCloud, you can pass highly customizable, fine-grained resource requests for each Executor using the `jcloud.resources` argument in your Flow YAML.
JCloud uses the concept of an "instance" to represent a specific set of hardware specifications. In the above example, the `C4` instance type represents two cores and 4 GB RAM, as listed in the CPU tiers table below.
Note: if you are still using the legacy resource specification interface, JCloud translates the raw numbers from your input to the instance tier that fits most closely. For example:
```{code-block} yaml
jcloud:
  resources:
    cpu: 8
    memory: 8G
```
Sometimes no instance tier matches the requested CPU cores and memory exactly, as in the above example.
In such cases, JCloud "ceils" the request to the lowest tier that satisfies all the specifications.
Here, `C6` is selected, because `C5` has fewer `Cores` than requested (4 vs 8).
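
The "ceil" rule can be illustrated with a short Python sketch. This is not JCloud's actual implementation: the function name `pick_cpu_tier` is hypothetical, and the tier values are copied from the CPU tiers table in this document.

```python
from typing import Optional

# CPU tiers copied from the table in this document: (name, cores, memory in GB).
CPU_TIERS = [
    ("C1", 0.1, 0.2),
    ("C2", 0.5, 1),
    ("C3", 1, 2),
    ("C4", 2, 4),
    ("C5", 4, 8),
    ("C6", 8, 16),
    ("C7", 16, 32),
    ("C8", 32, 64),
]


def pick_cpu_tier(cpu: float, memory_gb: float) -> Optional[str]:
    """Return the lowest tier whose cores and memory both cover the request."""
    for name, cores, memory in CPU_TIERS:  # ordered from smallest to largest
        if cores >= cpu and memory >= memory_gb:
            return name
    return None  # the request exceeds the largest tier


print(pick_cpu_tier(cpu=8, memory_gb=8))  # C6
```

A request for 8 cores and 8 GB skips `C5` because it only has 4 cores, even though its memory would be sufficient.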
There are also two types of instance tiers, one for CPU instances, one for GPU.
(jcloud-pricing)=
Each instance has a fixed `Credits Per Hour` number, which indicates how many credits JCloud charges for each hour the instance is used. For example, if an Executor uses `C3`, 10 credits per hour are spent from the operating user's account. Other important facts to note:
- `Credits Per Hour` is charged per Executor/Gateway. The total `Credits Per Hour` of a Flow is the sum of the credits each component costs.
- If an Executor/Gateway uses shards or replicas, its `Credits Per Hour` is multiplied accordingly. For example, if an Executor uses `C3` and has two replicas, the `Credits Per Hour` for the Executor doubles to 20. The only exception is when sharding is used: `C1` is used for the shards head, regardless of the instance type entered for the sharded Executor.

Please visit [Jina AI Cloud Pricing](https://cloud.jina.ai/pricing/) for more information about billing and credits.
| Instance | Cores | Memory | Credits per hour |
|---|---|---|---|
| C1 | 0.1 | 0.2 GB | 1 |
| C2 | 0.5 | 1 GB | 5 |
| C3 | 1 | 2 GB | 10 |
| C4 | 2 | 4 GB | 20 |
| C5 | 4 | 8 GB | 40 |
| C6 | 8 | 16 GB | 80 |
| C7 | 16 | 32 GB | 160 |
| C8 | 32 | 64 GB | 320 |
By default, C1 is allocated to each Executor and Gateway.
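
To make the pricing rules concrete, here is a minimal Python sketch that estimates the `Credits Per Hour` of a Flow from the tier tables in this document. The helper names are hypothetical. It also assumes that shards and replicas multiply together and that the shards head is billed as one additional `C1`, which the text above does not spell out.

```python
# Credits per hour, copied from the CPU and GPU tier tables in this document.
CREDITS_PER_HOUR = {
    "C1": 1, "C2": 5, "C3": 10, "C4": 20,
    "C5": 40, "C6": 80, "C7": 160, "C8": 320,
    "G1": 100, "G2": 125, "G3": 250, "G4": 500,
}


def component_credits_per_hour(instance="C1", replicas=1, shards=1):
    """Estimate the hourly credits of one Executor or Gateway."""
    # Assumption: every shard runs `replicas` copies of the same instance type.
    workers = CREDITS_PER_HOUR[instance] * replicas * shards
    # Assumption: a sharded Executor adds one shards head billed as C1.
    head = CREDITS_PER_HOUR["C1"] if shards > 1 else 0
    return workers + head


def flow_credits_per_hour(components):
    """Sum the hourly credits of all components in a Flow."""
    return sum(component_credits_per_hour(**c) for c in components)


flow = [
    {"instance": "C1"},                 # Gateway on the default tier
    {"instance": "C3", "replicas": 2},  # Executor with two replicas
]
print(flow_credits_per_hour(flow))  # 21
```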
JCloud offers the general Intel Xeon processor (Skylake 8175M or Cascade Lake 8259CL) for the CPU instances.
JCloud supports GPU workloads with two different usages: shared or dedicated.
If GPU is enabled, JCloud will provide NVIDIA A10G Tensor Core GPUs with 24 GB memory for workloads in both usage types.
When using GPU resources, it may take a few extra minutes before all Executors are ready to serve traffic.
| Instance | GPU | Memory | Credits per hour |
|---|---|---|---|
| G1 | shared | 14 GB | 100 |
| G2 | 1 | 14 GB | 125 |
| G3 | 2 | 24 GB | 250 |
| G4 | 4 | 56 GB | 500 |
An Executor using a shared GPU shares this GPU with up to four other Executors.
This enables time-slicing, which allows workloads that land on oversubscribed GPUs to interleave with one another.
To use shared GPU, G1 needs to be specified as the instance type.
The tradeoffs with a shared GPU are increased latency, jitter, and potential out-of-memory (OOM) conditions when many different applications are time-slicing on the GPU. If your application is consuming a lot of memory, we suggest using a dedicated GPU.
Using a dedicated GPU is the default way to provision a GPU for an Executor. This automatically creates nodes or assigns the Executor to a GPU node. In this case, the Executor owns the whole GPU.
To use a dedicated GPU, `G2`, `G3`, or `G4` must be specified as the instance type.
JCloud supports three kinds of storage: ephemeral (default), efs (network file storage) and ebs (block device).
ephemeral storage will assign space to an Executor when it is created. Data in ephemeral storage is deleted permanently if Executors are restarted or rescheduled.
By default, we assign `ephemeral` storage to all Executors in a Flow. This lets the storage resize dynamically, so you don't need to shrink/grow volumes manually.
If your Executor needs to share data with other Executors and persist that data, consider using `efs`. Note that:
- IO performance is slower compared to `ebs` or `ephemeral`
- The disk can be shared with other Executors or Flows.
- Default storage size is 5 GB.
If your Executor needs high IO, you can use `ebs` instead. Note that:
- The disk cannot be shared with other Executors or Flows.
- Default storage size is 5 GB.
JCloud also supports retaining the data that a Flow was using while it was active. You can set the retain argument to true to enable this feature.
```{code-block} yaml
---
emphasize-lines: 5-10,12,15
---
jtype: Flow
executors:
  - name: executor1
    uses: jinaai+docker://<username>/Executor1
    jcloud:
      resources:
        storage:
          kind: ebs
          size: 10G
          retain: true
  - name: executor2
    uses: jinaai+docker://<username>/Executor2
    jcloud:
      resources:
        storage:
          kind: efs
```
Here are the numbers in terms of credits per GB per month for the three kinds of storage described above.
| Storage | Credits per GB per month |
|---|---|
| Ephemeral | 0 |
| EBS | 30 |
| EFS | 75 |
For example, using 10 GB of EBS storage for a month costs 300 credits (10 GB × 30 credits per GB).
If shards or replicas are used, the credits are further multiplied by the number of storage volumes created.
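
The storage arithmetic can be sketched in a few lines of Python. The function name and the `volumes` parameter are hypothetical; the rates are copied from the storage table in this document.

```python
# Credits per GB per month, copied from the storage table in this document.
STORAGE_CREDITS_PER_GB_MONTH = {"ephemeral": 0, "ebs": 30, "efs": 75}


def storage_credits_per_month(kind, size_gb, volumes=1):
    """Estimate monthly storage credits for `volumes` volumes of `size_gb` each."""
    return STORAGE_CREDITS_PER_GB_MONTH[kind] * size_gb * volumes


print(storage_credits_per_month("ebs", 10))             # 300
print(storage_credits_per_month("ebs", 10, volumes=2))  # 600
```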
On JCloud, demand-based autoscaling functionality is naturally offered thanks to the underlying Kubernetes architecture. This means that you can maintain serverless deployments in a cost-effective way with no headache of setting the right number of replicas anymore!
The easiest way to scale out your Executor is to use a Serverless Executor. You can enable this by using `jinaai+serverless://` instead of `jinaai+docker://` in the Executor's `uses`, such as:
```{code-block} yaml
---
emphasize-lines: 4
---
jtype: Flow
executors:
  - name: executor1
    uses: jinaai+serverless://<username>/Executor1
```
JCloud autoscaling leverages Knative behind the scenes, and `jinaai+serverless` uses a set of Knative configurations as defaults.
For more information about the Knative autoscaling configurations, please visit [Knative autoscaling](https://knative.dev/docs/serving/autoscaling/).
If jinaai+serverless:// doesn't meet your requirements, you can further customize autoscaling configurations by using the autoscale argument on a per-Executor basis in the Flow YAML, such as:
```{code-block} yaml
---
emphasize-lines: 5-10
---
jtype: Flow
executors:
  - name: executor1
    uses: jinaai+docker://<username>/Executor1
    jcloud:
      autoscale:
        min: 1
        max: 2
        metric: rps
        target: 50
```
Below are the defaults and requirements for the configurations:
| Name | Default | Allowed | Description |
|---|---|---|---|
| min | 1 | int | Minimum number of replicas (0 means serverless) |
| max | 2 | int, up to 5 | Maximum number of replicas |
| metric | concurrency | concurrency / rps / cpu / memory | Metric for scaling |
| scale_down_delay | 30s | str, 0s <= value <= 1h | Time window that must pass at reduced concurrency before scaling down |
| target | 100 | int | Target number the replicas try to maintain. |
The unit of `target` depends on the metric specified. Refer to the table below:
| Metric | Target |
|---|---|
| concurrency | Number of concurrent requests processed at any given time. |
| rps | Number of requests processed per second per replica. |
| cpu | Average % CPU utilization of each pod (e.g. 60 means replicas will be scaled up when pods on average reach 60% CPU utilization). |
| memory | Average mebibytes of memory used by each pod (e.g. 200 means replicas will be scaled up when the average pods' memory consumption exceeds 200 MiB). |
After you make a JCloud deployment with an autoscaling configuration, serving the Flow works the same as before. The only difference you may notice is that the initial requests take a few extra seconds to handle, since JCloud needs to scale the deployments behind the scenes. Let JCloud handle the scaling from now on, and you can deal with the code!
Note that if `metric` is `cpu` or `memory`, `min` is reset to 1 if you set it to 0.
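
The defaults and constraints above can be summarized in a small Python sketch. It is only an illustration of the documented rules, not JCloud's validation code: the function name `normalize_autoscale` is hypothetical, and the accepted delay format (an integer followed by `s`, `m`, or `h`) is an assumption.

```python
import re

ALLOWED_METRICS = {"concurrency", "rps", "cpu", "memory"}
DEFAULTS = {
    "min": 1,
    "max": 2,
    "metric": "concurrency",
    "scale_down_delay": "30s",
    "target": 100,
}
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def _delay_seconds(value):
    """Parse a delay such as '30s' or '1h' (assumed format) into seconds."""
    match = re.fullmatch(r"(\d+)([smh])", value)
    if match is None:
        raise ValueError(f"unsupported delay format: {value!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def normalize_autoscale(user_config):
    """Merge user settings with the defaults and apply the documented rules."""
    config = {**DEFAULTS, **user_config}
    if config["metric"] not in ALLOWED_METRICS:
        raise ValueError(f"unknown metric: {config['metric']!r}")
    if config["max"] > 5:
        raise ValueError("max must be 5 or lower")
    if not 0 <= _delay_seconds(config["scale_down_delay"]) <= 3600:
        raise ValueError("scale_down_delay must be between 0s and 1h")
    # With the cpu or memory metric, a min of 0 is reset to 1.
    if config["metric"] in {"cpu", "memory"} and config["min"] == 0:
        config["min"] = 1
    return config


print(normalize_autoscale({"min": 0, "metric": "cpu"})["min"])  # 1
```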
At present, pricing for autoscaled Executors/Gateways largely follows the same {ref}`JCloud pricing rules <jcloud-pricing>` as other Jina AI services.
We track the minimum number of replicas in autoscale configurations and use it as the replica multiplier when calculating the `Credits Per Hour`.
- Autoscale does not currently support `ebs` as a storage type. Please use `efs` or `ephemeral` instead.
- Autoscale is not supported for multi-protocol Gateways.
If service issues cause disruption of Executors, JCloud lets you specify a tolerance level for number of replicas that stay up or go down.
The JCloud parameters minAvailable and maxUnavailable ensure that Executors will stay up even if a certain number of replicas go down.
| Name | Default | Allowed | Description |
|---|---|---|---|
| minAvailable | N/A | Lower than number of replicas | Minimum number of replicas available during disruption |
| maxUnavailable | N/A | Lower than number of replicas | Maximum number of replicas unavailable during disruption |
```{code-block} yaml
---
emphasize-lines: 5-6
---
jtype: Flow
executors:
  - uses: jinaai+docker://<username>/Executor1
    replicas: 5
    jcloud:
      minAvailable: 2
```
In case of disruption, ensure at least two replicas will still be available, while three may be down.
```{code-block} yaml
---
emphasize-lines: 5-6
---
jtype: Flow
executors:
  - uses: jinaai+docker://<username>/Executor1
    replicas: 5
    jcloud:
      maxUnavailable: 2
```
In case of disruption, ensure that if a maximum of two replicas are down, at least three replicas will still be available.
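
The relationship between the two parameters is simple arithmetic, sketched below in Python. The function name `disruption_bounds` is hypothetical, and the check that exactly one of the two parameters is set is an assumption rather than a documented JCloud rule.

```python
def disruption_bounds(replicas, min_available=None, max_unavailable=None):
    """Return (replicas guaranteed up, replicas allowed down) during disruption."""
    # Assumption: exactly one of the two parameters is set per Executor.
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available / max_unavailable")
    if min_available is not None:
        if not 0 <= min_available < replicas:
            raise ValueError("min_available must be lower than replicas")
        return min_available, replicas - min_available
    if not 0 <= max_unavailable < replicas:
        raise ValueError("max_unavailable must be lower than replicas")
    return replicas - max_unavailable, max_unavailable


print(disruption_bounds(5, min_available=2))    # (2, 3)
print(disruption_bounds(5, max_unavailable=2))  # (3, 2)
```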
The Gateway can be customized just like an Executor.
By default, the Gateway will close connections that have been idle for over 600 seconds. If you want a longer connection timeout threshold, change the timeout parameter under gateway.jcloud.
```{code-block} yaml
---
emphasize-lines: 2-4
---
jtype: Flow
gateway:
  jcloud:
    timeout: 800
executors:
  - name: executor1
    uses: jinaai+docker://<username>/Executor1
```
To customize the Gateway's CPU or memory, specify the instance type under gateway.jcloud.resources:
```{code-block} yaml
---
emphasize-lines: 2-6
---
jtype: Flow
gateway:
  jcloud:
    resources:
      instance: C3
executors:
  - name: encoder
    uses: jinaai+docker://<username>/Encoder
```
A Flow deployment without a Gateway is often used for {ref}`external-executors`, which can be shared between different Flows. You can expose an Executor by setting `expose: true` (and un-expose the Gateway by setting `expose: false`):
```{code-block} yaml
---
emphasize-lines: 2-4, 8-9
---
jtype: Flow
gateway:
  jcloud:
    expose: false       # don't expose the Gateway
executors:
  - name: custom
    uses: jinaai+docker://<username>/CustomExecutor
    jcloud:
      expose: true    # expose the Executor
```
You can expose the Gateway along with Executors:
```{code-block} yaml
---
emphasize-lines: 2-4,8-9
---
jtype: Flow
gateway:
  jcloud:
    expose: true
executors:
  - name: custom1
    uses: jinaai+docker://<username>/CustomExecutor1
    jcloud:
      expose: true    # expose the Executor
```
You can use the name argument to specify the Flow name in the Flow YAML:
```{code-block} yaml
---
emphasize-lines: 2-3
---
jtype: Flow
jcloud:
  name: my-name
executors:
  - name: executor1
    uses: jinaai+docker://<username>/Executor1
```
To control Jina's version when deploying a Flow to JCloud, you can pass the `version` argument in the Flow YAML:
```{code-block} yaml
---
emphasize-lines: 2-3
---
jtype: Flow
jcloud:
  version: 3.10.0
executors:
  - name: executor1
    uses: jinaai+docker://<username>/Executor1
```
You can use labels (as key-value pairs) to attach metadata to your Flows and Executors:
Flow level labels:
```{code-block} yaml
---
emphasize-lines: 2-5
---
jtype: Flow
jcloud:
  labels:
    username: johndoe
    app: fashion-search
executors:
  - name: executor1
    uses: jinaai+docker://<username>/Executor1
```
Executor level labels:
```{code-block} yaml
---
emphasize-lines: 5-8
---
jtype: Flow
executors:
  - name: executor1
    uses: jinaai+docker://<username>/Executor1
    jcloud:
      labels:
        index: partial
        group: backend
```
Keys in `labels` have the following restrictions:
- Must be 63 characters or fewer.
- Must begin and end with an alphanumeric character ([a-z0-9A-Z]) with dashes (-), underscores (_), dots (.), and alphanumerics between.
- The following keys are skipped if passed in the Flow YAML:
  - `user`
  - `jina-version`
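
The key restrictions above can be checked before deployment with a short Python sketch. The function name `usable_labels` is hypothetical and the regular expression is one possible encoding of the rules listed above, not JCloud's own validator.

```python
import re

MAX_KEY_LENGTH = 63
SKIPPED_KEYS = {"user", "jina-version"}
# Starts and ends with an alphanumeric character; dashes, underscores,
# dots, and alphanumerics are allowed in between.
_KEY_PATTERN = re.compile(r"[A-Za-z0-9]([A-Za-z0-9._-]*[A-Za-z0-9])?")


def usable_labels(labels):
    """Drop the skipped keys and reject keys that break the restrictions."""
    result = {}
    for key, value in labels.items():
        if key in SKIPPED_KEYS:
            continue  # these keys are skipped if passed in the Flow YAML
        if len(key) > MAX_KEY_LENGTH or _KEY_PATTERN.fullmatch(key) is None:
            raise ValueError(f"invalid label key: {key!r}")
        result[key] = value
    return result


print(usable_labels({"app": "fashion-search", "user": "johndoe"}))
# {'app': 'fashion-search'}
```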
To enable tracing support in Flows, pass the `enable: true` argument under `jcloud.monitor.traces` in the Flow YAML. (Tracing support is not enabled by default in JCloud.)
```{code-block} yaml
---
emphasize-lines: 2-5
---
jtype: Flow
jcloud:
  monitor:
    traces:
      enable: true
executors:
  - name: executor1
    uses: jinaai+docker://<username>/Executor1
```
You can pass the enable: true argument to gateway to only enable tracing support in the Gateway:
```{code-block} yaml
---
emphasize-lines: 2-6
---
jtype: Flow
gateway:
  jcloud:
    monitor:
      traces:
        enable: true
executors:
  - name: executor1
    uses: jinaai+docker://<username>/Executor1
```
You can also only enable tracing support in executor1.
```{code-block} yaml
---
emphasize-lines: 5-8
---
jtype: Flow
executors:
  - name: executor1
    uses: jinaai+docker://<username>/Executor1
    jcloud:
      monitor:
        traces:
          enable: true
```