Common utilities for emitting CUTLASS kernels
Utilities for generating source for building a PyTorch CUDA extension that uses a CUTLASS kernel. If specified, the extension can be JIT compiled via PyTorch's cpp_extension.load method.
Example usage with JIT compilation:
plan = cutlass.op.Gemm(element=torch.float32, layout=cutlass.LayoutType.RowMajor)
op = plan.construct()
mod = cutlass.emit.pytorch(op, 'cutlass_gemm', 80, jit=True)

# Generate inputs for the GEMM
A, B, C = [torch.ones((512, 512)).to('cuda') for _ in range(3)]

# Run the module
D = mod.run(A, B, C)
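For reference, the emitted GEMM computes D = alpha * (A @ B) + beta * C. The following NumPy sketch shows the expected result for the all-ones inputs above, assuming the default alpha = beta = 1 (check the generated source for the actual defaults); NumPy stands in here for the CUDA path only to illustrate the math:

```python
import numpy as np

# All-ones 512x512 inputs, mirroring the A, B, C tensors above
A, B, C = [np.ones((512, 512), dtype=np.float32) for _ in range(3)]

# Reference GEMM: D = alpha * (A @ B) + beta * C with alpha = beta = 1
D_ref = A @ B + C

# Each entry of A @ B is a dot product of two all-ones length-512 vectors,
# so every element of D_ref equals 512 + 1 = 513
assert np.all(D_ref == 513.0)
```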
Example usage without JIT compilation:
plan = cutlass.op.Gemm(element=torch.float32, layout=cutlass.LayoutType.RowMajor)
op = plan.construct()
cutlass.emit.pytorch(op, 'cutlass_gemm', 80, jit=False, sourcedir='output')
After this call, the directory output contains setup.py, cutlass_gemm.cpp, and cutlass_gemm_kernel.cu. The module can be built from within output by running: TORCH_CUDA_ARCH_LIST="8.0" python setup.py develop --user.
The module can later be used in Python via:
import torch
import cutlass_gemm

# Generate inputs for the GEMM
A, B, C = [torch.ones((512, 512)).to('cuda') for _ in range(3)]

# Run the module
D = cutlass_gemm.run(A, B, C)
cutlass.emit.pytorch.pytorch(op, name, cc, jit=False, sourcedir='')
Generates source for building a PyTorch CUDA module that leverages the CUTLASS kernel specified by op. If the jit parameter is set to True, the module is just-in-time compiled, loaded, and returned. Otherwise, the method writes source files to sourcedir that can be used to build the PyTorch module.
Parameters:
op – operation to emit in the module
name (str) – name of the module to generate
cc (int) – compute capability of the device the module should target
jit (bool) – whether the module should be just-in-time compiled
sourcedir (str) – directory to which generated source files should be written
Returns:
loaded PyTorch module (if jit=True) or None
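The cc parameter corresponds to the "major.minor" string passed via TORCH_CUDA_ARCH_LIST in the build command above (e.g. cc=80 maps to "8.0"). A small helper that derives that string is sketched below; it is hypothetical and not part of cutlass.emit:

```python
def arch_list_from_cc(cc: int) -> str:
    """Format an integer compute capability (e.g. 80) as the
    'major.minor' string expected by TORCH_CUDA_ARCH_LIST (e.g. '8.0').

    Hypothetical helper, not part of cutlass.emit.
    """
    major, minor = divmod(cc, 10)
    return f"{major}.{minor}"

# cc=80 -> "8.0", matching TORCH_CUDA_ARCH_LIST="8.0" in the build command above
assert arch_list_from_cc(80) == "8.0"
```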