.. _dsl_introduction:

.. |DC| replace:: dynamic compilation
.. |IR| replace:: IR
.. |DSL| replace:: CuTe DSL

|DSL| is a Python-based domain-specific language (DSL) designed for |DC| of high-performance GPU kernels. It evolved from the C++ CUTLASS library and is now available as a decorator-based DSL.

Its primary goals are:

- `DLPack <https://github.com/dmlc/dlpack>`_ integration, enabling seamless
  interop with frameworks (e.g., PyTorch, JAX).

|DSL| provides two main Python decorators for generating optimized code via |DC|:

- ``@jit``: host-side JIT-compiled functions
- ``@kernel``: GPU kernel functions

Both decorators can optionally use a preprocessor that automatically expands Python control flow (loops, conditionals) into operations consumable by the underlying |IR|.


``@jit``
========

Declares JIT-compiled functions that can be invoked from Python or from other |DSL| functions.

**Decorator Parameters**:

- ``preprocessor``:

  - ``True`` (default): automatically translates Python control flow (e.g., loops, if-statements) into |IR| operations.
  - ``False``: no automatic expansion; Python control flow must be handled manually or avoided.

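
To make the preprocessor's job concrete, the following toy sketch uses Python's standard ``ast`` module to locate the loops and conditionals in a function body, which are the constructs a preprocessor would need to rewrite. It is not |DSL|'s implementation: the helper ``find_control_flow`` and the labels ``loop`` and ``conditional`` are hypothetical names chosen for illustration.

```python
import ast

# Hypothetical labels for the Python constructs a preprocessor would rewrite.
_CONTROL_FLOW = {
    ast.For: "loop",
    ast.While: "loop",
    ast.If: "conditional",
}


def find_control_flow(source: str) -> list:
    """Return sorted (line number, label) pairs for each loop or conditional."""
    found = []
    for node in ast.walk(ast.parse(source)):
        label = _CONTROL_FLOW.get(type(node))
        if label is not None:
            found.append((node.lineno, label))
    return sorted(found)


# Source text of a small function containing one loop and one conditional.
EXAMPLE = '''
def body(n, x):
    for i in range(n):
        if x > i:
            x = x - 1
    return x
'''

print(find_control_flow(EXAMPLE))
```

With ``preprocessor=False``, no such rewriting step would run, so constructs like the ones reported here would have to be handled manually or avoided.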

**Call-site Parameters**:

- ``no_cache``:

  - ``True``: disables JIT caching, forcing a fresh compilation on each call.
  - ``False`` (default): enables caching for faster subsequent calls.

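
The effect of ``no_cache`` can be modeled with a small pure-Python stand-in. This is a sketch under stated assumptions, not |DSL|'s caching logic: ``toy_jit`` is a hypothetical decorator, "compilation" is reduced to incrementing a counter, and keying the cache on argument types is an assumption made for the example.

```python
import functools


def toy_jit(fn):
    """Toy stand-in for a JIT decorator (hypothetical, not the CuTe DSL API)."""
    cache = {}

    @functools.wraps(fn)
    def wrapper(*args, no_cache=False):
        # Assumption: one cache entry per tuple of argument types.
        key = tuple(type(a) for a in args)
        if no_cache or key not in cache:
            # Stand-in for real compilation: only count that it happened.
            wrapper.compile_count += 1
            compiled = fn
            if not no_cache:
                cache[key] = compiled
        else:
            compiled = cache[key]
        return compiled(*args)

    wrapper.compile_count = 0
    return wrapper


@toy_jit
def add(a, b):
    return a + b


add(1, 2)                 # first call: "compiles"
add(3, 4)                 # same argument types: served from the cache
add(5, 6, no_cache=True)  # bypasses the cache and "compiles" again
print(add.compile_count)
```

The counter ends at two compilations for three calls: only the call made with ``no_cache=True`` pays the compilation cost a second time.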

``@kernel``
===========

Defines GPU kernel functions, compiled as specialized GPU symbols through |DC|.

**Decorator Parameters**:

- ``preprocessor``:

  - ``True`` (default): automatically expands Python loops and if-statements into GPU-compatible |IR| operations.
  - ``False``: expects manual or simplified kernel implementations.

**Kernel Launch Parameters**:

``grid``
    Specifies the grid size as a list of integers.

``block``
    Specifies the block size as a list of integers.

``cluster``
    Specifies the cluster size as a list of integers.

``smem``
    Specifies the size of shared memory in bytes (integer).

.. A four-column list-table (one header row, widths 20 20 15 25) covering
   combinations of ``@jit`` and ``@kernel`` followed here; its cell text is
   not recoverable.

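
As an illustration of how the four kernel launch parameters relate to one another, here is a minimal pure-Python sketch that groups them and checks them before a launch. The ``LaunchConfig`` class is hypothetical and not part of |DSL|. The 1024-thread block limit and the requirement that each grid dimension be a multiple of the matching cluster dimension come from CUDA's launch rules and are assumed to apply here.

```python
from dataclasses import dataclass
from math import prod
from typing import List, Optional

MAX_THREADS_PER_BLOCK = 1024  # CUDA's per-block thread limit


@dataclass
class LaunchConfig:
    """Hypothetical container for the grid, block, cluster, and smem values."""
    grid: List[int]
    block: List[int]
    cluster: Optional[List[int]] = None
    smem: int = 0  # shared memory size in bytes

    def validate(self) -> None:
        for name in ("grid", "block", "cluster"):
            dims = getattr(self, name)
            if dims is None:
                continue
            if not all(isinstance(d, int) and d > 0 for d in dims):
                raise ValueError(f"{name} must be a list of positive integers: {dims!r}")
        if prod(self.block) > MAX_THREADS_PER_BLOCK:
            raise ValueError("block exceeds the per-block thread limit")
        if not isinstance(self.smem, int) or self.smem < 0:
            raise ValueError("smem must be a non-negative integer byte count")
        if self.cluster is not None:
            # Assumed from CUDA: grid dimensions are multiples of cluster dimensions.
            for g, c in zip(self.grid, self.cluster):
                if g % c != 0:
                    raise ValueError("grid must be divisible by cluster in every dimension")

    @property
    def total_threads(self) -> int:
        return prod(self.grid) * prod(self.block)


cfg = LaunchConfig(grid=[4, 2, 1], block=[128, 1, 1], cluster=[2, 1, 1], smem=16 * 1024)
cfg.validate()
print(cfg.total_threads)
```

Checking the configuration on the host before launching turns a malformed ``grid`` or ``cluster`` value into an early Python exception rather than a launch-time failure.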