legacy_rfcs/text/0013_saved_object_migrations.md
Improve the Saved Object migration algorithm to ensure a smooth Kibana upgrade procedure.
Kibana version upgrades should have a minimal operational impact. To achieve this, users should be able to rely on:
The biggest hurdle to achieving the above is Kibana’s Saved Object migrations. Migrations aren’t resilient and require manual intervention anytime an error occurs (see 3. Saved Object Migration Errors).
It is impossible to discover these failures before initiating downtime. Errors
often force users to roll back to a previous version of Kibana or cause hours
of downtime. To retry the migration, users are asked to manually delete a
`.kibana_x` index. If done incorrectly this can lead to data loss, making it a
terrifying experience (restoring from a pre-upgrade snapshot is a safer
alternative but not mentioned in the docs or logs).
Cloud users don’t have access to Kibana logs to be able to identify and remedy the cause of the migration failure. Apart from blindly retrying migrations by restoring a previous snapshot, cloud users are unable to remedy a failed migration and have to escalate to support which can further delay resolution.
Taken together, version upgrades are a major operational risk and discourage users from adopting the latest features.
Any of the following classes of errors could result in a Saved Object migration failure which requires manual intervention to resolve:
- `circuit_breaking_exception` (insufficient heap memory)
- `process_cluster_event_timeout_exception` for index-aliases, create-index, put-mappings
- `TooManyRequests` while doing a count of documents requiring a migration

The proposed design makes several important assumptions and tradeoffs.
Background:
The 7.x upgrade documentation lists taking an Elasticsearch snapshot as a
required step, but we instruct users to retry migrations and perform rollbacks
by deleting the failed .kibana_n index and pointing the .kibana alias to
.kibana_n-1:
Assumptions and tradeoffs:
If `pluginA` registers a saved object type `plugin_a_type`,
then `pluginB` must never register that same type, even if `pluginA` is
disabled. Although we cannot enforce it on third-party plugins, breaking
this assumption may lead to data loss.

Achieves goals: (2.3) Mitigates errors: (3.1), (3.2)
Achieves goals: (2.2), (2.6) Mitigates errors (3.3) and (3.4)
External conditions such as failures from an unhealthy Elasticsearch cluster (3.3) can cause the migration to fail. The Kibana cluster should be able to recover automatically once these external conditions are resolved. There are two broad approaches to solving this problem based on whether or not migrations are idempotent:
| Idempotent migrations | Description |
|---|---|
| Yes | Idempotent migrations performed without coordination |
| No | Single node migrations coordinated through a lease / lock |
Idempotent migrations don't require coordination, making the algorithm significantly less complex, and never require manual intervention to retry. We therefore prefer this solution, even though it introduces restrictions on migrations (4.2.1.1). For other alternatives that were considered, see section (5).
The migration system can be said to be idempotent if the same results are produced whether the migration was run once or multiple times. This property should hold even if new (up to date) writes occur in between migration runs which introduces the following restrictions:
Although these restrictions require significant changes, they do not prevent known upcoming migrations such as sharing saved objects in multiple spaces or splitting a saved object into multiple child documents. To ensure that these migrations are idempotent, they will have to generate new saved object IDs deterministically, e.g. with UUIDv5 (see the sketch below).
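As an illustration, a deterministic ID can be derived with UUIDv5 from inputs that are stable across nodes and retries. This is only a sketch: the `uuid` package usage is real, but the namespace constant and the choice of inputs are assumptions, not the final implementation.

```ts
import { v5 as uuidv5 } from 'uuid';

// Hypothetical namespace: any fixed UUID works, as long as every node uses the same one.
const SAVED_OBJECT_NAMESPACE = '7c2e31a4-c0a6-4b6b-9f2c-0f1a2b3c4d5e';

// Deriving the new ID purely from stable inputs (space + type + old ID) means every
// node computes the same ID, so re-running the migration produces the same documents.
export function deterministicId(space: string, type: string, oldId: string): string {
  return uuidv5(`${space}:${type}:${oldId}`, SAVED_OBJECT_NAMESPACE);
}

// Running this twice (or on two nodes) always yields the same UUID:
deterministicId('default', 'dashboard', '123');
```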
Note:
Locate the source index by fetching kibana indices:
```
GET '/_indices/.kibana,.kibana_7.10.0'
```
The source index is:
- the index the `.kibana` alias points to, or if it doesn't exist,
- the `.kibana` index

If none of the aliases exists, this is a new Elasticsearch cluster and no
migrations are necessary. Create the .kibana_7.10.0_001 index with the
following aliases: .kibana and .kibana_7.10.0.
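A rough sketch of how an instance might resolve the source index from that response is shown below. The client calls and error handling are illustrative assumptions; the real implementation may differ.

```ts
import { Client } from '@elastic/elasticsearch';

// Sketch: resolve the source index for the 7.10.0 migration from the fetched indices.
async function resolveSourceIndex(client: Client): Promise<string | undefined> {
  const { body: indices } = await client.indices.get(
    { index: '.kibana,.kibana_7.10.0', ignore_unavailable: true },
    { ignore: [404] }
  );

  for (const [indexName, info] of Object.entries<any>(indices ?? {})) {
    const aliases = Object.keys(info?.aliases ?? {});
    // Prefer the index the `.kibana` alias points to...
    if (aliases.includes('.kibana')) return indexName;
    // ...or, if the alias doesn't exist, a concrete `.kibana` index.
    if (indexName === '.kibana') return indexName;
  }

  // Neither exists: this is a new cluster and no migration is necessary.
  return undefined;
}
```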
If the source is a < v6.5 .kibana index or < 7.4 .kibana_task_manager
index prepare the legacy index for a migration:
- Mark the legacy index as read-only. Assuming this behaves like the existing `/<index>/_close` API, we expect to receive `"acknowledged" : true` and `"shards_acknowledged" : true`. If all shards don't acknowledge within the timeout, retry the operation until it succeeds.
- Clone the legacy index into `.kibana_pre6.5.0_001` or `.kibana_task_manager_pre7.4.0_001`. Ignore index already exists errors.
- Apply the `convertToAlias` script if specified. Use `wait_for_completion: false` to run this as a task. Ignore errors if the legacy source doesn't exist.
- Delete the legacy index and replace it with an alias of the same name:
```
POST /_aliases
{
  "actions" : [
    { "add": { "index": ".kibana_pre6.5.0_001", "alias": ".kibana" } },
    { "remove_index": { "index": ".kibana" } }
  ]
}
```
Unlike the delete index API, the `remove_index` action will fail if
provided with an _alias_. Therefore, if another instance completed this
step, the `.kibana` alias won't be added to `.kibana_pre6.5.0_001` a
second time. This avoids a situation where `.kibana` could point to both
`.kibana_pre6.5.0_001` and `.kibana_7.10.0_001`. These actions are
applied atomically so that other Kibana instances will always see either
a `.kibana` index or an alias, but never neither.
Ignore "The provided expression [.kibana] matches an alias, specify the
corresponding concrete indices instead." or "index_not_found_exception"
errors as this means another instance has already completed this step.
Use the cloned `.kibana_pre6.5.0_001` index as the source for the rest of the migration algorithm.

If `.kibana` and `.kibana_7.10.0` both exist and point to the same index, this version's migration has already been completed.
Fail the migration if:
- `.kibana` is pointing to an index that belongs to a later version of Kibana, e.g. `.kibana_7.12.0_001`

Search the source index for documents with types not registered within Kibana. Fail the migration if any document is found.
Set a write block on the source index. This prevents any further writes from outdated nodes.
Create a new temporary index .kibana_7.10.0_reindex_temp with dynamic: false on the top-level mappings so that any kind of document can be written to the index. This allows us to write untransformed documents to the index which might have fields which have been removed from the latest mappings defined by the plugin. Define minimal mappings for the migrationVersion and type fields so that we're still able to search for outdated documents that need to be transformed.
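For illustration, the mappings for the temporary index might look roughly like the sketch below. The exact field definitions are assumptions; the important properties are the top-level `dynamic: false` and the searchable `type` and `migrationVersion` fields.

```ts
// Sketch of the temporary index mappings: `dynamic: false` accepts documents containing
// unknown fields, while `type` and `migrationVersion` stay queryable so outdated
// documents can still be found.
const tempIndexMappings = {
  dynamic: false,
  properties: {
    type: { type: 'keyword' },
    migrationVersion: { dynamic: true, properties: {} },
  },
};

// e.g. when creating `.kibana_7.10.0_reindex_temp`:
// await client.indices.create({
//   index: '.kibana_7.10.0_reindex_temp',
//   body: { mappings: tempIndexMappings },
// });
```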
Reindex the source index into the new temporary index using a 'client-side' reindex, by reading batches of documents from the source, migrating them, and indexing them into the temp index.
- Use `op_type=index` so that multiple instances can perform the reindex in parallel (the last node running will override the documents, with no effect as the input data is the same).
- Ignore `version_conflict_engine_exception` exceptions as they just mean that another node was indexing the same documents.
- If a `target_index_had_write_block` exception is encountered for all documents of a batch, assume that another node already completed the temporary index reindex, and jump to the next step.

Clone the temporary index into the target index `.kibana_7.10.0_001`. Since any further writes will only happen against the cloned target index, this prevents a lost delete from occurring where one instance finishes the migration and deletes a document and another instance's reindex operation re-creates the deleted document.
(The `001` postfix in the target index name isn't used by Kibana, but allows for re-indexing an index should this be required by an Elasticsearch upgrade. E.g. re-index `.kibana_7.10.0_001` into `.kibana_7.10.0_002` and point the `.kibana_7.10.0` alias to `.kibana_7.10.0_002`.)

Transform documents by reading batches of outdated documents from the target index, then transforming and updating them with optimistic concurrency control (a sketch follows below).
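A rough sketch of this transform loop, assuming an `outdatedQuery` that matches only documents still needing a transform and a `transformDoc` function standing in for the registered migrations:

```ts
import { Client } from '@elastic/elasticsearch';

// Sketch only: repeatedly search for outdated documents and rewrite them in place
// using optimistic concurrency control.
async function transformOutdatedDocuments(
  client: Client,
  index: string,
  outdatedQuery: Record<string, unknown>,
  transformDoc: (doc: unknown) => unknown
): Promise<void> {
  while (true) {
    const { body } = await client.search({
      index,
      body: { query: outdatedQuery, seq_no_primary_term: true, size: 1000 },
    });
    const hits = body.hits.hits;
    if (hits.length === 0) break; // nothing left to transform

    for (const hit of hits) {
      await client.index(
        {
          index,
          id: hit._id,
          body: transformDoc(hit._source),
          // Fail with a 409 if the document changed since we read it.
          if_seq_no: hit._seq_no,
          if_primary_term: hit._primary_term,
        },
        // A conflict means another node already transformed this document.
        { ignore: [409] }
      );
    }
  }
}
```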
Update the mappings of the target index:
- Update the `migrationMappingPropertyHashes` metadata.
- Update the mappings with `PUT /.kibana_7.10.0_001/_mapping`. The API deeply merges any updates, so this won't remove the mappings of any plugins that are disabled on this instance but have been enabled on another instance that also migrated this index.
- Pick up the mapping changes on existing documents with `POST /.kibana_7.10.0_001/_update_by_query?conflicts=proceed`. In the future we could optimize this query by only targeting the documents that require it.

Mark the migration as complete. This is done as a single atomic operation (requires https://github.com/elastic/elasticsearch/pull/58100) to guarantee that when multiple versions of Kibana are performing the migration in parallel, only one version will win. E.g. if 7.11 and 7.12 are started in parallel and migrate from a 7.9 index, either 7.11 or 7.12 should succeed and accept writes, but not both.
This atomic operation:
- checks that the `.kibana` alias is still pointing to the source index,
- points the `.kibana_7.10.0` and `.kibana` aliases to the target index, and
- removes the temporary index `.kibana_7.10.0_reindex_temp`.

A sketch of this atomic alias switch follows below. If the operation fails, fetch `.kibana` again:
- If `.kibana` is not pointing to our target index, fail the migration.
- If `.kibana` is pointing to our target index, the migration has succeeded and we can proceed to step (12).
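A sketch of this atomic alias switch, assuming illustrative index names and that the alias `remove` action supports a `must_exist`-style check (the RFC notes the full semantics depend on https://github.com/elastic/elasticsearch/pull/58100):

```ts
import { Client } from '@elastic/elasticsearch';

// Sketch: mark the migration as complete with one atomic _aliases call.
async function markMigrationComplete(client: Client): Promise<void> {
  await client.indices.updateAliases({
    body: {
      actions: [
        // Fails if `.kibana` no longer points at the source index, i.e. another
        // (possibly newer) Kibana version already completed its own migration.
        { remove: { index: '.kibana_pre6.5.0_001', alias: '.kibana', must_exist: true } },
        { add: { index: '.kibana_7.10.0_001', alias: '.kibana' } },
        { add: { index: '.kibana_7.10.0_001', alias: '.kibana_7.10.0' } },
        { remove_index: { index: '.kibana_7.10.0_reindex_temp' } },
      ],
    },
  });
}
```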
Start serving traffic. All saved object reads/writes happen through the version-specific alias `.kibana_7.10.0`.
Together with the limitations, this algorithm ensures that migrations are idempotent. If two nodes are started simultaneously, both of them will start transforming documents in that version's target index, but because migrations are idempotent, it doesn’t matter which node’s writes win.
(Also present in our existing migration algorithm since v7.4) When the task manager index gets reindexed, a reindex script is applied. Because we delete the original task manager index, there is no way to roll back a failed task manager migration without a snapshot, although losing the task manager data has a fairly low impact.
(Also present in our existing migration algorithm since v6.5) If the outdated instance isn't shutdown before starting the migration, the following data-loss scenario is possible:
- the migration completes and points the `.kibana`
alias to `.kibana_7.11.0_001`
- the outdated instance continues writing to `.kibana`.

Note:
It is possible to work around this weakness by introducing a new alias such as
.kibana_current so that after a migration the .kibana alias will continue
to point to the outdated index. However, we decided to keep using the
.kibana alias despite this weakness for the following reasons:
- Users might have configured the `.kibana` alias for snapshots, so if this alias no
longer points to the latest index their snapshots would no longer backup
Kibana's latest data.

Although the migration algorithm guarantees there's no data loss while providing read-only access to outdated nodes, this could cause plugins to behave in unexpected ways. If we wish to pursue it in the future, enabling read-only functionality during the downtime window will be its own project and must include an audit of all plugins' behaviours.
</details>

When a newer Kibana starts an upgrade, it blocks all writes to the outdated index to prevent data loss. Since Kibana is not designed to gracefully handle a read-only index, this could have unintended consequences such as a task executing multiple times but never being able to record that it completed successfully. To prevent unintended consequences, the following procedure should be followed when upgrading Kibana:
- Gracefully shut down all outdated instances by sending a SIGTERM signal.
- While shutting down, an instance responds with a 503 from its healthcheck endpoint to signal to
the load balancer that it's no longer accepting new traffic (requires https://github.com/elastic/kibana/issues/46984) and closes keep-alive connections with a `connection: close` header.

To roll back to a previous version of Kibana with a snapshot
To roll back to a previous version of Kibana without a snapshot (assumes the migration to 7.11.0 failed):
- Delete the index created by the failed migration and its version-specific alias: `DELETE /.kibana_7.11.0`
- Make sure the `.kibana` alias points to the previous index.
- Remove the write block from the previous index: `PUT /.kibana/_settings {"index.blocks.write": false}`

It is possible for a plugin to create documents in one version of Kibana, but then, when upgrading Kibana to a newer version, that plugin is disabled. Because the plugin is disabled it cannot register its Saved Objects type, including the mappings or any migration transformation functions. These "orphan" documents could cause future problems:
As a concrete example of the above, consider a user taking the following steps:
There are several approaches we could take to dealing with these orphan documents:
Start up but refuse to query on types with outdated documents until a user manually triggers a re-migration
Advantages:
Disadvantages:
- … the `/status` endpoint.

To perform a re-migration:
- Delete the `.kibana_7.10.0` alias.
- Set `migrations.target_index_postfix: '002'` to create a new target index `.kibana_7.10.0_002` and keep the `.kibana_7.10.0_001` index to be able to perform a rollback.

Refuse to start Kibana until the plugin is enabled or its data deleted
Advantages:
Disadvantages:
Refuse to start a migration until the plugin is enabled or its data deleted
Advantages:
Disadvantages:
Use a hash of enabled plugins as part of the target index name
Using a migration target index name like
.kibana_7.10.0_${hash(enabled_plugins)}_001 we can migrate all documents
every time a plugin is enabled / disabled.
Advantages:
Disadvantages:
Transform outdated documents (step 8) on every startup

Advantages:
Disadvantages:
We prefer option (3) since it provides flexibility for disabling plugins in the same version while also protecting users' data in all cases during an upgrade migration. However, because this is a breaking change we will implement (5) during 7.x and only implement (3) during 8.x.
We considered implementing rolling upgrades to provide zero downtime migrations. However, this would introduce significant complexity for plugins: they will need to maintain up and down migration transformations and ensure that queries match both current and outdated documents across all versions. Although we can afford the once-off complexity of implementing rolling upgrades, the complexity burden of maintaining plugins that support rolling-upgrades will slow down all development in Kibana. Since a predictable downtime window is sufficient for our users, we decided against trying to achieve zero downtime with rolling upgrades. See "Rolling upgrades" in https://github.com/elastic/kibana/issues/52202 for more information.
This alternative is a proposed algorithm for coordinating migrations so that these only happen on a single node and therefore don't have the restrictions found in (4.2.1.1). We decided against this algorithm primarily because it is a lot more complex, but also because it could still require manual intervention to retry from certain unlikely edge cases.
<details>
<summary>It's impossible to guarantee that a single node performs the migration and automatically retry failed migrations.</summary>

Coordination should ensure that only one Kibana node performs the migration at a given time, which can be achieved with a distributed lock built on top of Elasticsearch. For the Kibana cluster to be able to retry a failed migration, a specialized lock is required which expires after a given amount of inactivity. We will refer to such expiring locks as a "lease".
If a Kibana process stalls, it is possible that the process' lease has expired but the process doesn't yet recognize this and continues the migration. To prevent this from causing data loss, each lease should be accompanied by a "guard" that prevents all writes after the lease has expired. See "How to do distributed locking" for an in-depth discussion.
Elasticsearch doesn't provide any building blocks for constructing such a guard.
</details>

However, we can implement a lock (that never expires) with strong data-consistency guarantees. Because there's no expiration, a failure between obtaining the lock and releasing it will require manual intervention. Instead of trying to accomplish the entire migration after obtaining a lock, we can perform only the last step of the migration process, moving the aliases, under a lock. A permanent failure in only this last step is not impossible, but very unlikely.
- … `.kibana_3ef25ff1-090a-4335-83a0-307a47712b4e`).
- Point `.kibana` → `.kibana_3ef25ff1-090a-4335-83a0-307a47712b4e`. This automatically releases the document lock (and any leases) because the new index will contain an empty `kibana_cluster_state`.

If a process crashes or is stopped after (3) but before (4), the lock will have to be manually removed by deleting the `kibana_cluster_state` document from `.kibana`, or by restoring from a snapshot.
To improve on the existing Saved Objects migrations lock, a locking algorithm needs to satisfy the following requirements:
- … `n` for removing the correct `.kibana_n` index).

Algorithm:
- Fetch the `kibana_cluster_state` lease document from `.kibana`.
- Each node renews its lease every `heartbeat_interval` seconds by sending an update operation that adds its UUID to the `nodes` array and sets the `lastSeen` value to the current local node time. If the update fails due to a version conflict, the update operation is retried after a random delay by fetching the document again and attempting the update operation once more (a sketch of this heartbeat update follows below).
- To acquire the lock, a node fetches the `kibana_cluster_state` document; if every node's `hasLock === false` it sets its own `hasLock` to true and attempts to write the document. If the update fails (presumably because of another node's heartbeat update) it restarts the process to obtain a lease from step (3).
- If another node's `hasLock === true` the node failed to acquire a lock and waits until the active lock has expired before attempting to obtain a lock again.
- To release the lock, a node sets its `hasLock = false`. The fetch + update operations are retried until this node's `hasLock === false`.

Note: Each machine writes a UUID to a file, so a single machine may have multiple processes with the same Kibana UUID, so we should rather generate a new UUID just for the lifetime of this process.
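A sketch of the lease heartbeat with the version-conflict retry described above. The document id, retry delay, and error handling are assumptions; field names follow the `KibanaClusterState` format shown below.

```ts
import { Client } from '@elastic/elasticsearch';

// Sketch: renew this node's lease once; callers would run this every heartbeat_interval.
async function renewLease(client: Client, nodeUuid: string): Promise<void> {
  while (true) {
    const { body: doc } = await client.get({ index: '.kibana', id: 'kibana_cluster_state' });

    const clusterState = doc._source;
    clusterState.nodes[nodeUuid] = {
      ...clusterState.nodes[nodeUuid],
      version: '7.10.0',
      lastSeen: process.hrtime(), // [seconds, nanoseconds] from the monotonic clock
    };

    try {
      await client.index({
        index: '.kibana',
        id: 'kibana_cluster_state',
        body: clusterState,
        // Optimistic concurrency: fails with a version conflict if another node
        // updated the document after we read it.
        if_seq_no: doc._seq_no,
        if_primary_term: doc._primary_term,
      });
      return; // heartbeat written
    } catch (e: any) {
      if (e?.statusCode !== 409) throw e;
      // Version conflict: wait a random delay, then fetch and retry.
      await new Promise((resolve) => setTimeout(resolve, Math.random() * 1000));
    }
  }
}
```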
KibanaClusterState document format:
```
{
  nodes: {
    "852bd94e-5121-47f3-a321-e09d9db8d16e": {
      version: "7.6.0",
      lastSeen: [ 1114793, 555149266 ], // hrtime() big int timestamp
      hasLease: true,
      hasLock: false,
    },
    "8d975c5b-cbf6-4418-9afb-7aa3ea34ac90": {
      version: "7.6.0",
      lastSeen: [ 1114862, 841295591 ],
      hasLease: false,
      hasLock: false,
    },
    "3ef25ff1-090a-4335-83a0-307a47712b4e": {
      version: "7.6.0",
      lastSeen: [ 1114877, 611368546 ],
      hasLease: false,
      hasLock: false,
    },
  },
  oplog: [
    {op: 'ACQUIRE_LOCK', node: '852bd94e...', timestamp: '2020-04-20T11:58:56.176Z'}
  ]
}
```
The simplest way to check for lease expiry is to inspect the `lastSeen` value.
If `lastSeen + expiry_timeout < now` the lease is considered expired. If there
is clock drift or a daylight savings time adjustment, there's a risk that a
node loses its lease before `expiry_timeout` has elapsed. Since losing a
lease prematurely will not lead to data loss, it's not critical that the
expiry time is observed under all conditions.
A slightly safer approach is to use a monotonically increasing clock
(process.hrtime()) and relative time to determine expiry. Using a
monotonically increasing clock guarantees that the clock will always increase
even if the system time changes due to daylight savings time, NTP clock syncs,
or manually setting the time. To check for expiry, other nodes poll the
cluster state document. Once they see that the lastSeen value has increased,
they capture the current hr time `current_hr_time` and start waiting until
`process.hrtime() - current_hr_time > expiry_timeout`. If at that point
`lastSeen` hasn't been updated, the lease is considered to have expired. This
means other nodes can take up to 2*expiry_timeout to recognize an expired
lease, but a lease will never expire prematurely.
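A sketch of this monotonic expiry check using `process.hrtime.bigint()`; the timeout value and the shape of the `lastSeen` value handled here are assumptions:

```ts
// Other nodes never compare wall-clock times; they only measure how long `lastSeen`
// has remained unchanged on their own monotonic clock.
const EXPIRY_TIMEOUT_NS = 60n * 1_000_000_000n; // illustrative 60 second timeout

function makeLeaseExpiryChecker() {
  let lastSeenSnapshot: string | undefined;
  let observedAt = process.hrtime.bigint();

  // Call this with the lease holder's `lastSeen` value each time the cluster
  // state document is polled.
  return function isLeaseExpired(lastSeen: [number, number]): boolean {
    const serialized = lastSeen.join(',');
    if (serialized !== lastSeenSnapshot) {
      // The holder renewed its lease: restart our local countdown.
      lastSeenSnapshot = serialized;
      observedAt = process.hrtime.bigint();
      return false;
    }
    // `lastSeen` hasn't changed for longer than the timeout on our clock.
    return process.hrtime.bigint() - observedAt > EXPIRY_TIMEOUT_NS;
  };
}
```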
Any node that detects an expired lease can release that lease by setting the
expired node’s hasLease = false. It can then attempt to acquire its lease.
When multiple versions of Kibana are running at the same time, writes from the outdated node can end up either in the outdated Kibana index, the newly migrated index, or both. New documents added (and some updates) into the old index while a migration is in-progress will be lost. Writes that end up in the new index will be in an outdated format. This could cause queries on the data to only return a subset of the results which leads to incorrect results or silent data loss.
Minimizing data loss from mixed 7.x versions introduces two additional steps to roll back to a previous version without a snapshot:
- Point the `.kibana` alias to the previous Kibana index `.kibana_n-1`
- … `.kibana_n` … `.kibana_n-1` … `.kibana_n-1`

Since our documentation and server logs have implicitly encouraged users to roll back without using snapshots, many users might have to rely on these additional migration steps to perform a rollback. Since even the existing steps are error prone, introducing more steps will likely introduce more problems than it solves.
- Use the `.kibana_saved_objects` alias to locate the current index. If `.kibana_saved_objects` doesn't exist, newer versions will fall back to reading `.kibana`.
- Find the index `.kibana_saved_objects` points to and then read and write directly from the index instead of the alias.
- Write into the `.kibana` index with a `migrationVersion` set to the current version of Kibana. If an outdated node is started up after a migration was started, it will detect that newer documents are present in the index and refuse to start up.
- Because `.kibana` is never advanced, it will be pointing to a read-only index, which prevents writes from 6.8+ releases which are already online.

We considered an algorithm that re-uses the same index for migrations and an approach to minimize data loss if our upgrade procedures aren't followed. This is no longer our preferred approach because of several downsides:
- It requires taking snapshots to prevent data loss so we can only release this in 8.x
- Minimizing data loss with unsupported upgrade configurations adds significant complexity and still doesn't guarantee that data isn't lost.
- … `migrationVersion` numbers.

Advantages:
- … `.kibana_n`.

Drawbacks:
- … `.kibana_n` indices as backups. (Apart from the need to educate users, snapshot restores provide many benefits.)

This alternative can reduce some data loss when our upgrade procedure isn't followed with the algorithm in (5.4.1).
Even if (4.5.2) is the only supported upgrade procedure, we should try to prevent data loss when these instructions aren't followed.
To prevent data loss we need to prevent any writes from older nodes. We use a version-specific alias for this purpose. Each time a migration is started, all other aliases are removed. However, aliases are stored inside Elasticsearch's ClusterState and this state could remain inconsistent between nodes for an unbounded amount of time. In addition, bulk operations that were accepted before the alias was removed will continue to run even after removing the alias.
As a result, Kibana cannot guarantee that there will be no data loss but instead aims to minimize it as much as possible by adding the bold sections to the migration algorithm from (5.4.1).
- … `action.auto_create_index` for the Kibana system indices.
- … `.kibana` index with a newer version.
- … `migrationVersion` numbers.
- Create a version-specific alias on the `.kibana` index, e.g. `.kibana_8.0.1`. During and
after the migration, all saved object reads and writes use this alias
instead of reading or writing directly to the index. By using the atomic
`POST /_aliases` API we minimize the chance that an outdated node creating
new outdated documents can cause data loss.
- … `.kibana_n`.

Steps (2) and (3) from the migration algorithm above minimize the chances of the following scenarios occurring but cannot guarantee it. It is therefore useful to enumerate some scenarios and their worst case impact:
This alternative prevents a failed migration when there's a migration transform function bug or a document with invalid data. Although it seems preferable to not fail the entire migration because of a single saved object type's migration transform bug or a single invalid document this has several pitfalls:
- When an object fails to migrate, the data for that saved object type becomes inconsistent. This could lead to a critical feature being unavailable to a user, leaving them with no choice but to downgrade.
- Because Kibana starts accepting traffic after encountering invalid objects, a rollback will lead to data loss, leaving users with no clean way to recover. As a result we prefer to let an upgrade fail and make it easy for users to roll back until they can resolve the root cause.
Achieves goals: (2.2) Mitigates Errors (3.1), (3.2)