compute/cpu-memory/README.md
This is a tiny chapter, since usually there are very few nuances one needs to know about CPU memory - which is a good thing!
Most of the ML workload compute happens on GPUs, but typically there should be at least as much CPU memory on each node as there is on the GPUs. So, for example, if you're on a H100 node with 8x 80GB GPUs, you have 640GB of GPU memory. Thus you want at least as much of CPU memory. Most recent high end cloud packages usually come with 1-2TBs of CPU memory.
forward pass, and which need to be available for the backward path can also be offloaded to CPU, rather than discarded and then recomputed during the backward pass to save the unnecessary overheadDataLoader is usually one of the main users of CPU memory and at times it may consume very large amounts of memory. Typically there are at least 2x 8 DL workers running on each node, so you need enough memory to support at least 16 processes each holding some data. For example, in the case of streaming data from the cloud, if the data shards are large, these processes could easily eat up hundreds of GBs of CPU memory.DataLoader uses HF datasets in mmap mode the Resident memory usage may appear to be using a huge amount of CPU memory as it'll try to map out the whole datasets to the memory. Except this is misleading, since if the memory is needed elsewhere the OS will page out any unneeded mmap'ed pages back to the system. You can read more about it here. This awareness, of course, applies to any dataset using mmap, I was using HF datasets as an example since it's very widely used.