Common utilities for emitting CUTLASS kernels
Utilities for generating source for building a PyTorch CUDA extension that uses a CUTLASS kernel. If specified, the extension can be JIT compiled via PyTorch's cpp_extension.load method.
Example usage with JIT compilation:
plan = cutlass.op.Gemm(element=torch.float32, layout=cutlass.LayoutType.RowMajor)
op = plan.construct()
mod = cutlass.emit.pytorch(op, 'cutlass_gemm', 80, jit=True)

# Generate inputs for the GEMM
A, B, C = [torch.ones((512, 512)).to('cuda') for _ in range(3)]

# Run the module
D = mod.run(A, B, C)
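For reference, the emitted GEMM computes D = alpha * (A @ B) + beta * C. The following NumPy sketch shows the expected result for the all-ones inputs above, assuming the default alpha = beta = 1 (check the generated source for the actual defaults); NumPy stands in here for the CUDA path only to illustrate the math:

```python
import numpy as np

# All-ones 512x512 inputs, mirroring the A, B, C tensors above
A, B, C = [np.ones((512, 512), dtype=np.float32) for _ in range(3)]

# Reference GEMM: D = alpha * (A @ B) + beta * C with alpha = beta = 1
D_ref = A @ B + C

# Each entry of A @ B is a dot product of two all-ones length-512 vectors,
# so every element of D_ref equals 512 + 1 = 513
assert np.all(D_ref == 513.0)
```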
Example usage without JIT compilation:
plan = cutlass.op.Gemm(element=torch.float32, layout=cutlass.LayoutType.RowMajor)
op = plan.construct()
cutlass.emit.pytorch(op, 'cutlass_gemm', 80, jit=False, sourcedir='output')
After this call, the directory output contains setup.py, cutlass_gemm.cpp, and cutlass_gemm_kernel.cu. The module can be built from within output by running: TORCH_CUDA_ARCH_LIST="8.0" python setup.py develop --user.
The module can later be used in Python via:
import torch
import cutlass_gemm

# Generate inputs for the GEMM
A, B, C = [torch.ones((512, 512)).to('cuda') for _ in range(3)]

# Run the module
D = cutlass_gemm.run(A, B, C)
cutlass.emit.pytorch.pytorch(op, name, cc, jit=False, sourcedir='')
Generates source for building a PyTorch CUDA module that leverages the CUTLASS kernel specified by op. If the jit parameter is set to True, the module is just-in-time compiled, loaded, and returned. Otherwise, the method writes source files to sourcedir that can be used to build the PyTorch module.
Parameters:
op – operation to emit in the module
name (str) – name of the module to generate
cc (int) – compute capability of the device the module should target
jit (bool) – whether the module should be just-in-time compiled
sourcedir (str) – directory to which generated source files should be written
Returns:
loaded PyTorch module (if jit=True) or None
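The cc parameter corresponds to the "major.minor" string passed via TORCH_CUDA_ARCH_LIST in the build command above (e.g. cc=80 maps to "8.0"). A small helper that derives that string is sketched below; it is hypothetical and not part of cutlass.emit:

```python
def arch_list_from_cc(cc: int) -> str:
    """Format an integer compute capability (e.g. 80) as the
    'major.minor' string expected by TORCH_CUDA_ARCH_LIST (e.g. '8.0').

    Hypothetical helper, not part of cutlass.emit.
    """
    major, minor = divmod(cc, 10)
    return f"{major}.{minor}"

# cc=80 -> "8.0", matching TORCH_CUDA_ARCH_LIST="8.0" in the build command above
assert arch_list_from_cc(80) == "8.0"
```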