src/doc/rgw/noblock-reshard.md
Non-blocking resharding of a bucket requires upgrading to a supported release
Backward compatibility
Bucket resharding is split into two phases: logrecord and progress. In the first phase, a duplicate copy of each index entry is written to the src shards together with the index operation, and client writes are not blocked. In the second phase, client writes are blocked while we go through the recorded log and copy the changed index entries to the dest shards. This greatly reduces the time that client writes are blocked while resharding a bucket.
A record log key looks like 0x802001_idx, where idx is the same key as the original index entry, placed under the new 2001_ namespace. Versioned entries are placed under 2001_1000_ or 2001_1001_.
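As a rough illustration of the key layout described above, the following Python sketch derives a record log key from an index key. The helper name `reshard_log_key` is hypothetical, and the rule of keeping the remainder of an already-namespaced key after the 0x80 byte is an assumption made for this example; it is not the actual cls_rgw code.

```python
# Illustrative sketch only; the helper name and exact layout are assumptions.
BI_PREFIX = b"\x80"          # special-key prefix byte mentioned in the text (0x80)
RESHARD_LOG_NS = b"2001_"    # record log namespace mentioned in the text


def reshard_log_key(index_key: bytes) -> bytes:
    """Return the record log key for an index entry key.

    Plain keys are placed directly under 2001_. Keys that already carry
    the 0x80 prefix (for example versioned entries under 1000_ or 1001_)
    keep their namespace, giving 2001_1000_... or 2001_1001_...
    """
    if index_key.startswith(BI_PREFIX):
        body = index_key[len(BI_PREFIX):]
    else:
        body = index_key
    return BI_PREFIX + RESHARD_LOG_NS + body


if __name__ == "__main__":
    print(reshard_log_key(b"photo.jpg"))
    print(reshard_log_key(b"\x801000_photo.jpg"))
```

Because every record log key shares the 0x80 2001_ prefix, the copies can be listed as one contiguous key range on each src shard, separately from the original index entries.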
In the logrecord state, the inventoried (already existing) index entries are copied to the dest shards, and a duplicate copy is recorded for each newly written entry.
In the progress state, writes are blocked, the copies written in the logrecord state are listed, and the corresponding entries are copied to the dest shards. If an index key exists in a dest shard but not in the src shard, it is deleted from the dest shard as well.
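The two states can be sketched as a small in-memory simulation. This is a toy model under stated assumptions, not the rgw implementation: shards are plain dictionaries, `shard_of` uses CRC32 as a stand-in for the real index hashing, the record log is a set of changed keys per src shard, and the class and method names are hypothetical.

```python
import zlib


def shard_of(key: str, num_shards: int) -> int:
    # Stand-in hash for the example; the real index uses its own hashing.
    return zlib.crc32(key.encode()) % num_shards


class ReshardSim:
    def __init__(self, entries: dict, num_src: int, num_dest: int):
        self.src = [{} for _ in range(num_src)]
        self.dest = [{} for _ in range(num_dest)]
        self.log = [set() for _ in range(num_src)]  # keys changed during logrecord
        self.state = "none"
        for key, value in entries.items():
            self.src[shard_of(key, num_src)][key] = value

    def client_write(self, key, value):
        """Write (or delete, if value is None) an index entry on a src shard."""
        if self.state == "progress":
            raise RuntimeError("writes are blocked during the progress phase")
        i = shard_of(key, len(self.src))
        if value is None:
            self.src[i].pop(key, None)
        else:
            self.src[i][key] = value
        if self.state == "logrecord":
            self.log[i].add(key)  # duplicate record written with the index op

    def logrecord_phase(self, concurrent_writes=()):
        self.state = "logrecord"
        inventory = [dict(shard) for shard in self.src]  # entries present at start
        for key, value in concurrent_writes:             # clients are not blocked
            self.client_write(key, value)
        for shard in inventory:
            for key, value in shard.items():
                self.dest[shard_of(key, len(self.dest))][key] = value

    def progress_phase(self):
        self.state = "progress"                          # block client writes
        for i, keys in enumerate(self.log):
            for key in keys:
                target = self.dest[shard_of(key, len(self.dest))]
                if key in self.src[i]:
                    target[key] = self.src[i][key]       # copy the changed entry
                else:
                    target.pop(key, None)                # gone from src: delete
        self.state = "done"


def merged(shards):
    """Flatten a list of shard dictionaries into one dictionary."""
    out = {}
    for shard in shards:
        out.update(shard)
    return out
```

In this model the blocking window covers only `progress_phase`, whose cost grows with the number of keys changed during `logrecord_phase` rather than with the size of the whole index.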
When a bucket reshard fails in the logrecord phase, writing of the duplicate copies should stop within a short time. To achieve this, while recording the log we periodically check whether the reshard is still executing properly; the check interval is rgw_reshard_progress_judge_interval. If the reshard has already failed, we clear the resharding status and stop recording copies.
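The periodic check can be pictured with the following sketch. It is only an illustration: the class `LogRecordGuard`, the `reshard_alive` callback, and the injected clock are hypothetical, and treating the interval as a number of seconds is an assumption.

```python
import time


class LogRecordGuard:
    """Decide whether index operations should still record duplicate copies."""

    def __init__(self, judge_interval, reshard_alive, clock=time.monotonic):
        self.judge_interval = judge_interval  # models rgw_reshard_progress_judge_interval
        self.reshard_alive = reshard_alive    # hypothetical: is the reshard still running?
        self.clock = clock
        self.recording = True
        self.last_check = clock()

    def should_record(self) -> bool:
        if not self.recording:
            return False
        now = self.clock()
        # Only re-check the reshard once per interval to keep writes cheap.
        if now - self.last_check >= self.judge_interval:
            self.last_check = now
            if not self.reshard_alive():
                # Reshard has failed: clear the status and stop recording copies.
                self.recording = False
        return self.recording
```

With this shape, a failed reshard keeps producing copies for at most one interval before recording stops.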
The previous release has only one reshard phase: the progress phase, which blocks client writes. Because our release also contains this phase and the process is the same, it is a superset of the previous release. So when a previous-release rgw initiates a reshard, it executes as before.
When an upgraded rgw initiates a reshard, it first enters the logrecord phase, which previous releases do not implement. Nodes that have not been upgraded would then handle client write operations without recording copies, which could cause some of these index entries to be missed. We forbid this scenario by adding trim_reshard_log_entries() and cls_rgw_bucket_init_index2() to control the source and target versions; old OSDs fail these requests with -EOPNOTSUPP. radosgw can therefore start by trying them on all shards. If there are no errors, it can safely proceed with the new scheme. If any of the OSDs returns -EOPNOTSUPP, rgw falls back to the current resharding scheme, in which writes are blocked the whole time.
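The probe-and-fall-back decision described above could look roughly like the sketch below. The function `choose_reshard_scheme` and the `probe` callback are hypothetical stand-ins for issuing the new calls against each shard; raising on any other negative return code is an assumption of this example, not documented behavior.

```python
import errno


def choose_reshard_scheme(shards, probe):
    """Return "non-blocking" if every shard accepts the new call, else "blocking".

    `probe(shard)` is a hypothetical callback that returns 0 on success or a
    negative errno value, in the style of the return codes named in the text.
    """
    for shard in shards:
        ret = probe(shard)
        if ret == -errno.EOPNOTSUPP:
            # At least one OSD is too old: use the scheme that blocks writes
            # for the whole reshard.
            return "blocking"
        if ret < 0:
            # Assumption: any other error aborts the attempt.
            raise OSError(-ret, f"probe failed on shard {shard}")
    return "non-blocking"


if __name__ == "__main__":
    old_shards = {2}  # pretend shard 2 lives on an OSD that was not upgraded
    print(choose_reshard_scheme(range(4), lambda s: 0))
    print(choose_reshard_scheme(
        range(4), lambda s: -errno.EOPNOTSUPP if s in old_shards else 0))
```

Probing every shard before starting means a mixed-version cluster never enters the logrecord phase with an OSD that cannot record copies.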