Emperor is the 5th stable release of Ceph. It is named after the emperor squid.
Monitor 'auth' read-only commands now expect the user to have 'rx' caps. This is the same behavior that was present in dumpling, but in emperor and more recent development releases the 'r' cap was sufficient. Note that this backported security fix will break mon keys that are using the following commands but do not have the 'x' bit in the mon capability::

    ceph auth export
    ceph auth get
    ceph auth get-key
    ceph auth print-key
    ceph auth list

This is the second bugfix release for the v0.72.x Emperor series. We have fixed a hang in radosgw, and fixed (again) a problem with monitor CLI compatibility with mixed version monitors. (In the future this will no longer be a problem.)
The JSON schema for the 'osd pool set ...' command changed slightly. Please avoid issuing this particular command via the CLI while there is a mix of v0.72.1 and v0.72.2 monitor daemons running.
As part of the fix for #6796, 'ceph osd pool set <pool> <var> <arg>' now receives <arg> as an integer instead of a string. This affects how the 'hashpspool' flag is set or unset: instead of 'true' or 'false', it now must be '0' or '1'.
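Scripts that previously passed 'true' or 'false' could translate the value before calling the CLI. The following is a minimal sketch, not part of Ceph: the helper names ``hashpspool_arg`` and ``pool_set_hashpspool_argv`` are hypothetical, and it assumes that 'true' corresponds to '1' and 'false' to '0'. It only builds the argument list and does not run anything.

```python
def hashpspool_arg(value):
    """Translate a legacy 'true'/'false' setting into the '0'/'1' form.

    Assumption: 'true' corresponds to '1' and 'false' to '0'.
    """
    legacy = {"true": "1", "false": "0"}
    text = str(value).strip().lower()
    if text in ("0", "1"):
        return text
    if text in legacy:
        return legacy[text]
    raise ValueError("unsupported hashpspool value: %r" % (value,))


def pool_set_hashpspool_argv(pool, value):
    """Build the argument list for 'ceph osd pool set <pool> hashpspool <arg>'."""
    return ["ceph", "osd", "pool", "set", pool, "hashpspool",
            hashpspool_arg(value)]
```

For a pool with the placeholder name ``mypool``, ``pool_set_hashpspool_argv("mypool", "true")`` yields an argument list ending in ``"hashpspool", "1"``.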
For more detailed information, see :download:`the complete changelog <../changelog/v0.72.2.txt>`.
When you are upgrading from Dumpling to Emperor, do not run any of the "ceph osd pool set" commands while your monitors are running separate versions. Doing so could result in inadvertently changing cluster configuration settings that exhaust compute resources in your OSDs.
This release addresses issue #6761. Upgrading to Emperor can cause reads to begin returning ENFILE (too many open files). v0.72.1 fixes that upgrade issue and adds a tool ceph_filestore_tool to repair osd stores affected by this bug.
To repair a cluster affected by this bug:
#. Upgrade all osd machines to v0.72.1
#. Install the ceph-test package on each osd machine to get ceph_filestore_tool
#. Stop all osd processes
#. To see all lost objects, run the following on each osd with the osd stopped and the osd data directory mounted::

     ceph_filestore_tool --list-lost-objects=true --filestore-path=<path-to-osd-filestore> --journal-path=<path-to-osd-journal>

#. To fix all lost objects, run the following on each osd with the osd stopped and the osd data directory mounted::

     ceph_filestore_tool --fix-lost-objects=true --list-lost-objects=true --filestore-path=<path-to-osd-filestore> --journal-path=<path-to-osd-journal>

#. Once lost objects have been repaired on each osd, you can restart the cluster.

Note, the ceph_filestore_tool performs a scan of all objects on the osd and may take some time.
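When the repair has to be run on many osds, the two invocations can be generated rather than typed by hand. The sketch below is illustrative only: the function name ``filestore_tool_argv`` is hypothetical, it uses only the flags shown in the steps above, and the caller must supply the real filestore and journal paths. It builds the argument list and leaves execution to the operator.

```python
def filestore_tool_argv(filestore_path, journal_path, fix=False):
    """Build a ceph_filestore_tool argument list.

    With fix=False the command only lists lost objects; with fix=True it
    also passes --fix-lost-objects=true, as in the repair steps.
    """
    argv = ["ceph_filestore_tool"]
    if fix:
        argv.append("--fix-lost-objects=true")
    argv.append("--list-lost-objects=true")
    argv.append("--filestore-path=" + filestore_path)
    argv.append("--journal-path=" + journal_path)
    return argv


if __name__ == "__main__":
    # Placeholder paths; substitute the real locations for each osd.
    print(" ".join(filestore_tool_argv("<path-to-osd-filestore>",
                                       "<path-to-osd-journal>")))
```

Listing first and fixing in a second pass mirrors the order of the steps, so the lost objects can be reviewed before anything is modified.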
This is the fifth major release of Ceph, the fourth since adopting a 3-month development cycle. This release brings several new features, including multi-datacenter replication for the radosgw and improved usability, and lands a lot of incremental performance and internal refactoring work to support upcoming features in Firefly.
When you are upgrading from Dumpling to Emperor, do not run any of the "ceph osd pool set" commands while your monitors are running separate versions. Doing so could result in inadvertently changing cluster configuration settings that exhaust compute resources in your OSDs.
Coincident with core Ceph, the Emperor release also brings:
Packages for both are available on ceph.com.
There are no specific upgrade restrictions on the order or sequence of upgrading from 0.67.x Dumpling. However, you cannot run any of the "ceph osd pool set" commands while your monitors are running separate versions. Doing so could result in inadvertently changing cluster configuration settings and exhausting compute resources in your OSDs.
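Automation that issues "ceph osd pool set" commands could guard against the mixed-version case described above. The following is a minimal sketch under stated assumptions: the function name ``safe_to_run_pool_set`` is hypothetical, and the list of per-monitor version strings is assumed to be collected by some other means that is not shown here.

```python
def safe_to_run_pool_set(monitor_versions):
    """Return True only when every monitor reports the same version.

    monitor_versions is a list of version strings, one per monitor.
    How the list is gathered is outside the scope of this sketch.
    """
    if not monitor_versions:
        raise ValueError("no monitor versions supplied")
    return len(set(monitor_versions)) == 1
```

A caller would skip or postpone the command whenever the function returns False, for example while some monitors still run 0.67.x and others already run 0.72.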
It is also possible to do a rolling upgrade from 0.61.x Cuttlefish, but there are ordering restrictions. (This is the same set of restrictions for Cuttlefish to Dumpling.)
#. Upgrade ceph-common on all nodes that will use the command line 'ceph' utility.
#. Upgrade all monitors (upgrade ceph package, restart ceph-mon daemons). This can happen one daemon or host at a time. Note that because cuttlefish and dumpling monitors can't talk to each other, all monitors should be upgraded in relatively short succession to minimize the risk that an untimely failure will reduce availability.
#. Upgrade all osds (upgrade ceph package, restart ceph-osd daemons). This can happen one daemon or host at a time.
#. Upgrade radosgw (upgrade radosgw package, restart radosgw daemons).

ceph-fuse and radosgw now use the same default values for the admin socket and log file paths that the other daemons (ceph-osd, ceph-mon, etc.) do. If you run these daemons as non-root, you may need to adjust your ceph.conf to disable these options or to adjust the permissions on /var/run/ceph and /var/log/ceph.
The MDS now disallows snapshots by default as they are not considered stable. The command 'ceph mds set allow_snaps' will enable them.
For clusters that were created before v0.44 (pre-argonaut, Spring 2012) and store radosgw data, the auto-upgrade from TMAP to OMAP objects has been disabled. Before upgrading, make sure that any buckets created on pre-argonaut releases have been modified (e.g., by PUTing and then DELETEing an object from each bucket). Any cluster created with argonaut (v0.48) or a later release or not using radosgw never relied on the automatic conversion and is not affected by this change.
Any direct users of the 'tmap' portion of the librados API should be aware that the automatic tmap -> omap conversion functionality has been removed.
Most output that used K or KB (e.g., for kilobyte) now uses a lower-case k to match the official SI convention. Any scripts that parse output and check for an upper-case K will need to be modified.
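A parsing script can be made tolerant of both spellings so that it works before and after the upgrade. The sketch below is an assumption-laden example: the exact output format is not specified here, so the sample strings ('12 kB', '12 KB', '3.5k') and the function name ``parse_kilo`` are purely illustrative.

```python
import re

# Accept a number followed by 'k' or 'K', optionally followed by 'B'.
_KILO_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*[kK]B?\s*$")


def parse_kilo(text):
    """Return the numeric part of a value such as '12 kB', '12 KB' or '3.5k'."""
    match = _KILO_RE.match(text)
    if match is None:
        raise ValueError("not a kilo-prefixed value: %r" % (text,))
    return float(match.group(1))
```

Matching ``[kK]`` instead of a literal upper-case K keeps one script usable against both older and newer output.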
librados::Rados::pool_create_async() and librados::Rados::pool_delete_async() don't drop a reference to the completion object on error; the caller needs to take care of that. This has never really worked correctly and we were leaking an object.
'ceph osd crush set <id> <weight> <loc..>' no longer adds the osd to the specified location, as that's a job for 'ceph osd crush add'. It will however continue to work just the same as long as the osd already exists in the crush map.
The OSD now enforces that class write methods cannot both mutate an object and return data. The rbd.assign_bid method, the lone offender, has been removed. This breaks compatibility with pre-bobtail librbd clients by preventing them from creating new images.
librados now returns on commit instead of ack for synchronous calls. This is a bit safer in the case where both OSDs and the client crash, and is probably how it should have been acting from the beginning. Users are unlikely to notice but it could result in lower performance in some circumstances. Those who care should switch to using the async interfaces, which let you specify safety semantics precisely.
The C++ librados AioCompletion::get_version() method was incorrectly returning an int (usually 32 bits). To avoid breaking library compatibility, a get_version64() method has been added that returns the full-width value. The old method is deprecated and will be removed in a future release. Users of the C++ librados API that make use of the get_version() method should modify their code to avoid getting a value that is truncated from 64 to 32 bits.
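To see why the truncation matters, the effect of squeezing a 64-bit value into a 32-bit signed int can be modelled with plain integer arithmetic. This is only an illustration in Python, not librados code: the function name ``truncate_to_int32`` is hypothetical, and it assumes the usual two's-complement wraparound.

```python
def truncate_to_int32(version):
    """Model storing a 64-bit version number in a 32-bit signed int.

    Assumption: the conversion keeps the low 32 bits and interprets
    them as a two's-complement signed value.
    """
    low = version & 0xFFFFFFFF
    if low & 0x80000000:
        return low - (1 << 32)
    return low


if __name__ == "__main__":
    print(truncate_to_int32(42))            # small values survive: 42
    print(truncate_to_int32(2**32 + 5))     # high bits are lost: 5
    print(truncate_to_int32(2**31))         # sign flips: -2147483648
```

Small version numbers are unaffected, which is why the problem can go unnoticed until a value exceeds the 32-bit range.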
This development release includes a significant amount of new code and refactoring, as well as a lot of preliminary functionality that will be needed for erasure coding and tiering support. There are also several significant patch sets with improvements to the MDS.
The MDS now disallows snapshots by default as they are not considered stable. The command 'ceph mds set allow_snaps' will enable them.
For clusters that were created before v0.44 (pre-argonaut, Spring 2012) and store radosgw data, the auto-upgrade from TMAP to OMAP objects has been disabled. Before upgrading, make sure that any buckets created on pre-argonaut releases have been modified (e.g., by PUTing and then DELETEing an object from each bucket). Any cluster created with argonaut (v0.48) or a later release or not using radosgw never relied on the automatic conversion and is not affected by this change.
Any direct users of the 'tmap' portion of the librados API should be aware that the automatic tmap -> omap conversion functionality has been removed.
Most output that used K or KB (e.g., for kilobyte) now uses a lower-case k to match the official SI convention. Any scripts that parse output and check for an upper-case K will need to be modified.
librados::Rados::pool_create_async() and librados::Rados::pool_delete_async() don't drop a reference to the completion object on error; the caller needs to take care of that. This has never really worked correctly and we were leaking an object.
'ceph osd crush set <id> <weight> <loc..>' no longer adds the osd to the specified location, as that's a job for 'ceph osd crush add'. It will however continue to work just the same as long as the osd already exists in the crush map.
The sysvinit /etc/init.d/ceph script will, by default, update the CRUSH location of an OSD when it starts. Previously, if the monitors were not available, this step would hang indefinitely. Now, that step will time out after 10 seconds and the ceph-osd daemon will not be started.
Users of the librados C++ API should replace uses of get_version() with get_version64(), as the old method only returns a 32-bit value for a 64-bit field. The existing 32-bit get_version() method is now deprecated.
The OSDs are now more picky that the request payload matches its declared size. A write operation across N bytes that includes M bytes of data (where M differs from N) will now be rejected. No known clients do this, but because the server-side behavior has changed, it is possible that an application misusing the interface may now get errors.
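An application that wants to catch such mismatches before they reach the OSD could apply the same rule on the client side. The sketch below is a hypothetical client-side check, not the OSD's actual implementation; the function name ``check_write`` and its arguments are assumptions made for illustration.

```python
def check_write(declared_length, payload):
    """Raise ValueError when the payload size differs from the declared length.

    declared_length is the N bytes the write claims to cover;
    payload holds the M bytes of data actually supplied.
    """
    actual = len(payload)
    if actual != declared_length:
        raise ValueError(
            "write declares %d bytes but carries %d" % (declared_length, actual)
        )
    return True
```

Running the check before submitting a write turns a server-side rejection into a local error that is easier to trace back to the offending call.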
The OSD now enforces that class write methods cannot both mutate an object and return data. The rbd.assign_bid method, the lone offender, has been removed. This breaks compatibility with pre-bobtail librbd clients by preventing them from creating new images.
librados now returns on commit instead of ack for synchronous calls. This is a bit safer in the case where both OSDs and the client crash, and is probably how it should have been acting from the beginning. Users are unlikely to notice but it could result in lower performance in some circumstances. Those who care should switch to using the async interfaces, which let you specify safety semantics precisely.
The C++ librados AioCompletion::get_version() method was incorrectly returning an int (usually 32 bits). To avoid breaking library compatibility, a get_version64() method has been added that returns the full-width value. The old method is deprecated and will be removed in a future release. Users of the C++ librados API that make use of the get_version() method should modify their code to avoid getting a value that is truncated from 64 to 32 bits.
'ceph osd crush set <id> <weight> <loc..>' no longer adds the osd to the specified location, as that's a job for 'ceph osd crush add'. It will however continue to work just the same as long as the osd already exists in the crush map.
The OSD now enforces that class write methods cannot both mutate an object and return data. The rbd.assign_bid method, the lone offender, has been removed. This breaks compatibility with pre-bobtail librbd clients by preventing them from creating new images.
librados now returns on commit instead of ack for synchronous calls. This is a bit safer in the case where both OSDs and the client crash, and is probably how it should have been acting from the beginning. Users are unlikely to notice but it could result in lower performance in some circumstances. Those who care should switch to using the async interfaces, which let you specify safety semantics precisely.
The C++ librados AioCompletion::get_version() method was incorrectly returning an int (usually 32 bits). To avoid breaking library compatibility, a get_version64() method has been added that returns the full-width value. The old method is deprecated and will be removed in a future release. Users of the C++ librados API that make use of the get_version() method should modify their code to avoid getting a value that is truncated from 64 to 32 bits.