CUDA is traditionally used through CUDA C/C++ source files, which have a `.cu` extension. NVCC (the NVIDIA CUDA Compiler) compiles these files into an executable.
CUDA files consist of device and host functions. Device functions run on the GPU; the ones that the host launches as entry points are called kernels. Host functions run on the CPU and usually contain the logic for allocating GPU memory and launching kernels.
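As a rough illustration of that split, the sketch below puts one kernel and one host function in the same `.cu` file. The names `add_one` and `add_one_on_gpu` are hypothetical and chosen only for this example; the runtime API calls (`cudaMalloc`, `cudaMemcpy`, `cudaFree`) are the standard ones.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Kernel (hypothetical name): runs on the GPU and is launched from the host.
// Each thread increments at most one element.
__global__ void add_one(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] += 1.0f;
    }
}

// Host function (hypothetical name): allocates GPU memory, copies the input
// over, launches the kernel, and copies the result back.
// Returns true if every CUDA call succeeded.
bool add_one_on_gpu(std::vector<float>& values) {
    if (values.empty()) {
        return true;  // nothing to do; a launch with zero blocks would be an error
    }
    int n = static_cast<int>(values.size());
    size_t bytes = values.size() * sizeof(float);

    float* d_data = nullptr;
    if (cudaMalloc(&d_data, bytes) != cudaSuccess) {
        return false;
    }

    bool ok = cudaMemcpy(d_data, values.data(), bytes,
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        int threads = 256;
        int blocks = (n + threads - 1) / threads;  // round up to cover all n
        add_one<<<blocks, threads>>>(d_data, n);
        // The device-to-host copy also waits for the kernel to finish.
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(values.data(), d_data, bytes,
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    cudaFree(d_data);
    return ok;
}
```

Saved in a `.cu` file together with a `main` that calls `add_one_on_gpu`, this should build with `nvcc`. The `<<<blocks, threads>>>` launch syntax is what ties the host side to the device side.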
Behind the scenes, NVCC has several stages of compilation.
First, NVCC separates device and host functions and compiles them separately. Device functions are compiled to NVVM IR, a subset of LLVM IR with additional restrictions including the following.

- Integer types with irregular bit widths, such as `i4` or `i111`, are unsupported and will segfault (although in theory they should be supported).

...

libNVVM is a closed-source library that takes NVVM IR, optimizes it further, and then converts it to PTX. PTX is a low-level, assembly-like format with an open specification, so any language can target it. For an assembly format, PTX is fairly user-friendly.
PTX can be run on NVIDIA GPUs through the CUDA driver API or runtime API. These APIs compile the PTX into a final, GPU-specific machine code format called SASS, which is register-allocated and executed on the GPU.
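The last step can be sketched with the driver API. This is a minimal example under several assumptions: a PTX file (for instance one produced with `nvcc -ptx`) exists at the path passed in, and it contains a kernel declared as `extern "C" __global__ void add_one(float*, int)` so that its name is not mangled. The helper names `read_file` and `run_ptx_add_one` are hypothetical; the `cu*` calls are driver API functions.

```cuda
#include <cuda.h>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Reads a whole text file; returns an empty string if it cannot be opened.
static std::string read_file(const std::string& path) {
    std::ifstream in(path);
    std::stringstream buf;
    buf << in.rdbuf();
    return buf.str();
}

// Hypothetical helper: loads PTX text from `ptx_path`, looks up a kernel that
// is assumed to be named `add_one`, and runs it over `values`.
bool run_ptx_add_one(const std::string& ptx_path, std::vector<float>& values) {
    std::string ptx = read_file(ptx_path);
    if (ptx.empty() || values.empty()) return false;

    CUdevice device;
    CUcontext context;
    if (cuInit(0) != CUDA_SUCCESS) return false;
    if (cuDeviceGet(&device, 0) != CUDA_SUCCESS) return false;
    if (cuDevicePrimaryCtxRetain(&context, device) != CUDA_SUCCESS) return false;
    cuCtxSetCurrent(context);

    CUmodule module = nullptr;
    CUfunction kernel = nullptr;
    CUdeviceptr d_data = 0;
    int n = static_cast<int>(values.size());
    size_t bytes = values.size() * sizeof(float);

    // Loading the module hands the PTX text to the driver, which compiles it
    // to SASS for the GPU in this machine.
    bool ok = cuModuleLoadData(&module, ptx.c_str()) == CUDA_SUCCESS &&
              cuModuleGetFunction(&kernel, module, "add_one") == CUDA_SUCCESS &&
              cuMemAlloc(&d_data, bytes) == CUDA_SUCCESS &&
              cuMemcpyHtoD(d_data, values.data(), bytes) == CUDA_SUCCESS;
    if (ok) {
        void* args[] = { &d_data, &n };  // one pointer per kernel parameter
        unsigned threads = 256;
        unsigned blocks = (static_cast<unsigned>(n) + threads - 1) / threads;
        ok = cuLaunchKernel(kernel, blocks, 1, 1, threads, 1, 1,
                            0, nullptr, args, nullptr) == CUDA_SUCCESS &&
             cuCtxSynchronize() == CUDA_SUCCESS &&
             cuMemcpyDtoH(values.data(), d_data, bytes) == CUDA_SUCCESS;
    }

    if (d_data) cuMemFree(d_data);
    if (module) cuModuleUnload(module);
    cuDevicePrimaryCtxRelease(device);
    return ok;
}
```

This is plain host code, so it should build with either `nvcc` or a regular C++ compiler, as long as the CUDA headers are available and the program is linked against the driver library (`-lcuda`). Nothing here depends on how the PTX was generated, which is what lets other languages target the same path.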