CUTLASS: default_mma_core.h Source File

CUTLASS: CUDA Templates for Linear Algebra Subroutines and Solvers

default_mma_core.h

/***************************************************************************************************
 * Copyright (c) 2017-2019, NVIDIA CORPORATION. All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without modification, are permitted
 * provided that the following conditions are met:
 * * Redistributions of source code must retain the above copyright notice, this list of
 * conditions and the following disclaimer.
 * * Redistributions in binary form must reproduce the above copyright notice, this list of
 * conditions and the following disclaimer in the documentation and/or other materials
 * provided with the distribution.
 * * Neither the name of the NVIDIA CORPORATION nor the names of its contributors may be used
 * to endorse or promote products derived from this software without specific prior written
 * permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR
 * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
 * FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL NVIDIA CORPORATION BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
 * BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
 * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
 * STRICT LIABILITY, OR TOR (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 *
 **************************************************************************************************/

#pragma once

#include "cutlass/cutlass.h"
#include "cutlass/array.h"

#include "cutlass/numeric_types.h"
#include "cutlass/matrix_shape.h"

#include "cutlass/gemm/warp/mma.h"
#include "cutlass/gemm/threadblock/mma_pipelined.h"
#include "cutlass/gemm/threadblock/mma_singlestage.h"

namespace cutlass {
namespace gemm {
namespace threadblock {

// Primary template. It is declared here but not defined in this header.
template <
    // Shape of the threadblock-scoped matrix multiply
    typename Shape,
    // Shape of the warp-level matrix multiply
    typename WarpShape,
    // Shape of a single math instruction
    typename InstructionShape,
    // Element type of operand A
    typename ElementA,
    // Layout of operand A
    typename LayoutA,
    // Element type of operand B
    typename ElementB,
    // Layout of operand B
    typename LayoutB,
    // Element type of the accumulator
    typename ElementC,
    // Layout of the accumulator
    typename LayoutC,
    // Operator class tag, for example cutlass::arch::OpClassTensorOp
    typename OperatorClass,
    // Number of pipeline stages
    int Stages = 2,
    // Math operation. Defaults to saturating multiply-add for Tensor Op
    // with 8-bit or 4-bit integer A operands, plain multiply-add otherwise.
    typename Operator = typename platform::conditional<
        (platform::is_same<OperatorClass,
                           cutlass::arch::OpClassTensorOp>::value) &&
            (platform::is_same<ElementA, int8_t>::value ||
             platform::is_same<ElementA, int4b_t>::value ||
             platform::is_same<ElementA, uint8_t>::value ||
             platform::is_same<ElementA, uint4b_t>::value),
        cutlass::arch::OpMultiplyAddSaturate,
        cutlass::arch::OpMultiplyAdd>::type,
    // Whether accumulators are stored in row-major order
    bool AccumulatorsInRowMajor = false
>
struct DefaultMmaCore;

} // namespace threadblock
} // namespace gemm
} // namespace cutlass
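The default for `Operator` and the declaration-only primary template can be easier to follow in isolation. The following is a minimal sketch, not CUTLASS code: it uses `std::conditional` and `std::is_same` in place of the `cutlass::platform` equivalents, and every type in it (`OpClassSimt`, `OpClassTensorOp`, `OpMultiplyAdd`, `OpMultiplyAddSaturate`, `int4b_t`, `uint4b_t`, `MmaCoreSketch`) is a hypothetical stand-in defined locally for illustration. It assumes that concrete behavior is supplied through partial specializations keyed on the operator class, which is a common way to use a template that is only declared.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical stand-ins for the CUTLASS tag and numeric types.
// They carry no behavior; only their identity matters here.
struct OpClassSimt {};
struct OpClassTensorOp {};
struct OpMultiplyAdd {};
struct OpMultiplyAddSaturate {};
struct int4b_t {};
struct uint4b_t {};

// Mirrors the default argument of Operator in the declaration above:
// saturating multiply-add only when the operator class is Tensor Op AND
// the A operand is an 8-bit or 4-bit integer; plain multiply-add otherwise.
template <typename OperatorClass, typename ElementA>
using DefaultOperator = typename std::conditional<
    std::is_same<OperatorClass, OpClassTensorOp>::value &&
        (std::is_same<ElementA, std::int8_t>::value ||
         std::is_same<ElementA, int4b_t>::value ||
         std::is_same<ElementA, std::uint8_t>::value ||
         std::is_same<ElementA, uint4b_t>::value),
    OpMultiplyAddSaturate,
    OpMultiplyAdd>::type;

// Primary template: declared, never defined (as in the header).
// Instantiating it without a matching specialization is a compile error.
template <typename OperatorClass,
          typename ElementA,
          typename Operator = DefaultOperator<OperatorClass, ElementA>>
struct MmaCoreSketch;

// Hypothetical partial specialization for the SIMT operator class.
template <typename ElementA, typename Operator>
struct MmaCoreSketch<OpClassSimt, ElementA, Operator> {
  using MathOperator = Operator;
  static constexpr bool kUsesTensorOp = false;
};

// Hypothetical partial specialization for the Tensor Op operator class.
template <typename ElementA, typename Operator>
struct MmaCoreSketch<OpClassTensorOp, ElementA, Operator> {
  using MathOperator = Operator;
  static constexpr bool kUsesTensorOp = true;
};

// Compile-time checks of the selection logic.
static_assert(
    std::is_same<MmaCoreSketch<OpClassTensorOp, std::int8_t>::MathOperator,
                 OpMultiplyAddSaturate>::value,
    "Tensor Op with int8 A operand defaults to saturating multiply-add");
static_assert(
    std::is_same<MmaCoreSketch<OpClassSimt, std::int8_t>::MathOperator,
                 OpMultiplyAdd>::value,
    "SIMT keeps plain multiply-add even for int8 A operands");
static_assert(
    std::is_same<MmaCoreSketch<OpClassTensorOp, float>::MathOperator,
                 OpMultiplyAdd>::value,
    "Tensor Op with a non-integer A operand keeps plain multiply-add");
```

Note that the condition inspects only `ElementA`: in this sketch, as in the declaration, the type of the B operand plays no part in choosing the default, and a caller can always override the default by passing `Operator` explicitly.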

- cutlass::platform::integral_constant::value: static const value_t value. Definition: platform.h:261
- cutlass: Definition: aligned_buffer.h:35
- mma_pipelined.h: Template for a double-buffered threadblock-scoped GEMM kernel.
- cutlass::gemm::threadblock::DefaultMmaCore: Definition: default_mma_core.h:90
- array.h: Statically sized array of elements that accommodates all CUTLASS-supported numeric types and is safe ...
- mma.h: Templates exposing architecture support for warp-level multiply-add operations.
- matrix_shape.h: Defines a Shape template for matrix tiles.
- mma_singlestage.h: Template for a single-stage threadblock-scoped GEMM kernel.
- numeric_types.h: Top-level include for all CUTLASS numeric types.
- cutlass.h: Basic include for CUTLASS.
