Metric System of 3FS


Types of Metrics

3FS calculates metrics in each service and stores them in ClickHouse through the monitor service. A metric may or may not be reset after each report.
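The difference between a metric that resets after each report and one that does not can be illustrated with a minimal Python sketch. `CountSketch` and its `report()` method are hypothetical names for illustration, not the 3FS `CountRecorder` API. The sketch assumes only that a reporter reads each metric once per interval and zeroes it if the metric resets.

```python
class CountSketch:
    """Hypothetical stand-in for a count metric; not the 3FS CountRecorder API."""

    def __init__(self, reset_after_report: bool) -> None:
        self.value = 0
        self.reset_after_report = reset_after_report

    def add(self, delta: int = 1) -> None:
        """Increase (or, with a negative delta, decrease) the counter."""
        self.value += delta

    def report(self) -> int:
        """Snapshot the value for the reporter, zeroing it if the metric resets."""
        snapshot = self.value
        if self.reset_after_report:
            self.value = 0
        return snapshot


# Reset after each report: with a one-second interval, each report equals read IOPS.
reads = CountSketch(reset_after_report=True)
# Not reset: incremented when an operation starts, decremented when it finishes.
in_flight = CountSketch(reset_after_report=False)

reports = []
for started, finished in ((3, 1), (5, 1)):  # one tuple per one-second report interval
    for _ in range(started):
        reads.add()
        in_flight.add()
    for _ in range(finished):
        in_flight.add(-1)
    reports.append((reads.report(), in_flight.report()))

print(reports)  # [(3, 2), (5, 6)]
```

The read counter reports only the operations from its own interval. The in-flight counter carries over the operations that are still running.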

3FS defines the following types of metrics:

| Type | Class | Storage Table | Description |
| --- | --- | --- | --- |
| value | ValueRecorder | 3fs.counters | Set by code and stored to ClickHouse directly, e.g. capacity. |
| count | CountRecorder | 3fs.counters | Incremented by code and usually reset to zero after each report; e.g., a read counter reported every second gives the read IOPS. It can also track operations currently in flight: the counter is incremented when an operation starts and decremented when it finishes, instead of being reset after each report. |
| distribution | DistributionRecorder | 3fs.distributions | Used to calculate P90, P99, etc. over the period between two reports. |
| latency | LatencyRecorder | 3fs.distributions | A special distribution implementation that records latency in nanoseconds (ns). |
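For `distribution` and `latency` metrics, the percentiles describe only the samples recorded between two reports. The Python sketch below illustrates that windowing with the nearest-rank percentile method. `LatencyWindowSketch` and `percentile` are hypothetical names. The actual `DistributionRecorder` may compute percentiles differently, for example with an approximate summary rather than by sorting every sample.

```python
import math


def percentile(samples: list[int], p: float) -> int:
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("no samples recorded in this report period")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


class LatencyWindowSketch:
    """Hypothetical latency metric: samples in nanoseconds, cleared after each report."""

    def __init__(self) -> None:
        self.samples_ns: list[int] = []

    def record(self, latency_ns: int) -> None:
        self.samples_ns.append(latency_ns)

    def report(self) -> dict[str, int]:
        summary = {
            "p50": percentile(self.samples_ns, 50),
            "p90": percentile(self.samples_ns, 90),
            "p99": percentile(self.samples_ns, 99),
        }
        self.samples_ns.clear()  # the next report covers only the next period
        return summary


window = LatencyWindowSketch()
for i in range(1, 101):
    window.record(i * 1_000)  # 1 us .. 100 us, expressed in ns
summary = window.report()
print(summary)  # {'p50': 50000, 'p90': 90000, 'p99': 99000}
```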

List of Metrics

The following is a partial list of metrics and their meanings:

| Metric Name | Type | Reset After Report | Description |
| --- | --- | --- | --- |
| fuse.dirty_inodes | value | Y | Current number of dirty inodes |
| fuse.op | count | Y | Number of client operations; the `instance` field is the operation name |
| fuse.piov.bw | count | Y | Number of bytes written by the FUSE client to the storage service |
| fuse.write.latency | latency | Y | Write operation latency |
| fuse.write.size | distribution | Y | Write operation size |
| meta_server.batch_op_size | count | Y | Count of metadata batch operations (sync + close + setattr) |
| meta_server.dist_set_map | count | Y | Number of set disk server map operations |
| meta_server.op_code | count | Y | Number of times a specific error code appears; the tag is the error code |
| meta_server.op_failed | count | Y | Number of failed operations |
| meta_server.op_idempotent | count | Y | Number of idempotent operations executed |
| meta_server.op_duplicated | count | Y | Number of duplicated operations |
| meta_server.op_running | count | N | Number of running operations |
| meta_server.op_total | count | Y | Total number of operations |
| meta_server.op_latency | latency | Y | Metadata operation latency (nanoseconds) |
| meta_server.open_write | count | Y | Number of open/create file operations |
| meta_server.stat_dir | count | Y | Number of directory stat operations executed |
| meta_server.stat_file | count | Y | Number of file stat operations executed |
| meta_server.stat_symlink | count | Y | Number of symbolic link stat operations executed |
| meta_server.auth_failed | count | Y | Number of authentication failures |
| storage.check_disk.current | count | N | Number of currently running disk check operations |
| storage.check_disk.total | count | Y | Total number of disk check operations |
| storage.chunk_engine.allocate_times | count | Y | Number of chunks created |
| storage.chunk_engine.checksum_combine | count | Y | Number of checksum combine operations (includes checksum merges and append operations) |
| storage.chunk_engine.checksum_recalculate | count | Y | Number of checksum recalculations |
| storage.chunk_engine.copy_on_write_read_bytes | count | Y | Number of bytes read by copy-on-write (COW) |
| storage.chunk_engine.copy_on_write_read_times | count | Y | Number of times COW read data |
| storage.chunk_engine.copy_on_write_times | count | Y | Number of COW executions |
| storage.chunk_engine.new | value | N | Current number of chunk engines (to be confirmed against the code) |
| storage.chunk_engine.pwrite_times | count | Y | Number of pwrite calls |
| storage.chunk_engine.safe_write_direct_append | count | Y | Number of direct append writes during safe_write |
| storage.chunk_engine.safe_write_indirect_append | count | Y | Number of indirect append writes during safe_write |
| storage.chunk_remove.times | count | Y | Number of chunk delete operations |
| storage.chunk_write.times | count | Y | Number of chunk write operations |
| storage.disk_info.available | value | N | Available disk capacity in bytes (target unused space + reserved space + file system available space) |
| storage.disk_info.capacity | value | N | Total file system capacity (bytes) |
| storage.disk_info.read_only | value | N | Whether the file system is read-only (1: read-only; checked by writing a `.hf3fs_check` file in the file system root directory) |
| storage.disk_info.free | value | N | File system available space (bytes) |
| storage.do_commit.current | count | N | Number of IOs currently being committed |
| storage.do_commit.total | count | Y | Total number of committed IOs |
| storage.do_commit.fails | count | Y | Number of failed commit IOs |
| storage.do_commit.succ_latency | latency | Y | Successful commit latency |
| storage.do_commit.fail_latency | latency | Y | Failed commit latency |
| storage.do_query.num_chunks | count | Y | Number of chunks in query results |
| storage.do_query.total | count | Y | Total number of chunk query operations |
| storage.do_query.current | count | N | Number of current chunk query operations |
| storage.do_query.succ_latency | latency | Y | Successful query latency |
| storage.do_query.fail_latency | latency | Y | Failed query latency |
| storage.do_remove.current | count | N | Number of current chunk remove operations |
| storage.do_remove.fails | count | Y | Number of failed chunk remove operations |
| storage.do_remove.num_chunks | count | Y | Number of chunks removed |
| storage.do_remove.total | count | Y | Total number of chunk remove operations |
| storage.do_remove.succ_latency | latency | Y | Successful remove latency |
| storage.do_remove.fail_latency | latency | Y | Failed remove latency |
| storage.do_update.current | count | N | Number of current chunk update operations |
| storage.do_update.total | count | Y | Total number of chunk update operations |
| storage.do_update.fails | count | Y | Number of failed chunk update operations |
| storage.do_update.succ_latency | latency | Y | Successful update latency |
| storage.do_update.fail_latency | latency | Y | Failed update latency |
| storage.remove_range.current | count | N | Number of current range chunk remove operations |
| storage.remove_range.fails | count | Y | Number of failed range chunk remove operations |
| storage.remove_range.total | count | Y | Total number of range chunk remove operations |
| storage.remove_range.succ_latency | latency | Y | Successful range remove latency |
| storage.remove_range.fail_latency | latency | Y | Failed range remove latency |
| storage.req_remove_chunks.current | count | N | Number of current remove chunk requests |
| storage.req_remove_chunks.total | count | Y | Total number of remove chunk requests |
| storage.req_update.bytes | count | Y | Number of chunk update bytes |
| storage.req_update.current | count | N | Number of current chunk update requests |
| storage.req_update.fails | count | Y | Number of failed chunk update requests |
| storage.req_update.total | count | Y | Total number of chunk update requests |
| storage.req_write.bytes | count | Y | Number of bytes written by chunk write requests |
| storage.req_write.current | count | N | Number of current chunk write requests |
| storage.req_write.total | count | Y | Total number of chunk write requests |
| storage.engine_commit.current | count | N | Number of current commit operations |
| storage.engine_commit.total | count | Y | Total number of commit operations |
| storage.engine_commit.fails | count | Y | Number of failed commit operations |
| storage.engine_update.current | count | N | Number of current chunk update operations |
| storage.engine_update.total | count | Y | Total number of chunk update operations |
| storage.engine_update.fails | count | Y | Number of failed chunk update operations |
| storage.forward.write_bytes | count | Y | Number of forwarded write bytes |
| storage.forward.syncing_bytes | count | Y | Number of forwarded syncing bytes |
| storage.reliable_forward.current | count | N | Number of current forward operations |
| storage.reliable_forward.total | count | Y | Total number of forward operations |
| storage.reliable_forward.fails | count | Y | Number of failed forward operations |
| storage.target.used_size | value | N | Target used space (bytes) |
| storage.target.reserved_size | value | N | Target reserved space (bytes) |
| storage.target.unrecycled_size | value | N | Target unrecycled space (bytes) |
| storage.target_state | value | N | Target state (0: invalid, 1: uptodate, 2: online, 4: offline) |
| storage.write.bytes | count | Y | Total bytes written (includes write and update operations) |
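Some values in the table need light interpretation before they are useful. The Python sketch below shows two such conversions:

* turning a byte counter that resets after each report, such as `storage.write.bytes`, into a bandwidth;
* mapping `storage.target_state` codes to the names listed above.

`rate_per_second` and `decode_target_state` are hypothetical helpers. The 10-second report interval in the example is an assumption, not a 3FS setting.

```python
# Hypothetical helpers; names are illustrative and not part of 3FS.
TARGET_STATES = {0: "invalid", 1: "uptodate", 2: "online", 4: "offline"}


def decode_target_state(code: int) -> str:
    """Map a storage.target_state value to the state name listed in the table."""
    return TARGET_STATES.get(code, f"unknown({code})")


def rate_per_second(count_in_period: int, report_interval_s: float) -> float:
    """Convert a count that resets after each report into a per-second rate."""
    if report_interval_s <= 0:
        raise ValueError("report interval must be positive")
    return count_in_period / report_interval_s


# Assumed: storage.write.bytes reported 10 GiB for one 10-second report period.
GIB = 1024**3
write_bw = rate_per_second(10 * GIB, 10.0)
print(f"{write_bw / GIB:.1f} GiB/s")  # 1.0 GiB/s
print(decode_target_state(4))         # offline
```

Unlisted codes are reported as `unknown(...)` rather than guessed.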