CUTLASS
CUDA Templates for Linear Algebra Subroutines and Solvers
default_gemv.h
/***************************************************************************************************
 * Copyright (c) 2017-2019, NVIDIA CORPORATION. All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without modification, are permitted
 * provided that the following conditions are met:
 *     * Redistributions of source code must retain the above copyright notice, this list of
 *       conditions and the following disclaimer.
 *     * Redistributions in binary form must reproduce the above copyright notice, this list of
 *       conditions and the following disclaimer in the documentation and/or other materials
 *       provided with the distribution.
 *     * Neither the name of the NVIDIA CORPORATION nor the names of its contributors may be used
 *       to endorse or promote products derived from this software without specific prior written
 *       permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR
 * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
 * FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL NVIDIA CORPORATION BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
 * BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
 * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
 * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 *
 **************************************************************************************************/

#pragma once

#include "cutlass/gemm/threadblock/gemv.h"
#include "cutlass/gemm/threadblock/default_gemv_core.h"
#include "cutlass/gemm/threadblock/threadblock_swizzle.h"

namespace cutlass {
namespace gemm {
namespace kernel {

template <
  typename ThreadBlockShape_,
  typename ThreadShape_,
  typename ElementA_,
  typename LayoutA_,
  typename ElementB_,
  typename LayoutB_,
  typename ElementCD_,
  typename LayoutCD_,
  typename ElementAccumulator_ = ElementCD_>
struct DefaultGemv {

  /// Shape of threadblock-level matrix operation (concept: GemmShape)
  using ThreadBlockShape = ThreadBlockShape_;

  /// Shape of thread-level matrix operation (concept: GemmShape)
  using ThreadShape = ThreadShape_;

  /// Data type of multiplicand A
  using ElementA = ElementA_;

  /// Layout of multiplicand A
  using LayoutA = LayoutA_;

  /// Data type of multiplicand B
  using ElementB = ElementB_;

  /// Layout of multiplicand B
  using LayoutB = LayoutB_;

  /// Data type of accumulators
  using ElementAccumulator = ElementAccumulator_;

  /// Layout of accumulators (same as C/D)
  using LayoutAccumulator = LayoutCD_;

  /// Data type of input/output matrix C/D
  using ElementCD = ElementCD_;

  /// Layout of input/output matrix C/D
  using LayoutCD = LayoutCD_;

  // Define the core components
  using Core = typename cutlass::gemm::threadblock::DefaultGemvCore<
      ThreadBlockShape, ThreadShape, ElementA, LayoutA, ElementB, LayoutB,
      ElementAccumulator, LayoutAccumulator>;

  // Define the threadblock-scoped gemv
  using ThreadBlockGemv = cutlass::gemm::threadblock::Gemv<Core>;

  // Iterator for multiplicand A
  using IteratorA = typename ThreadBlockGemv::IteratorA;

  // Iterator for multiplicand B
  using IteratorB = typename ThreadBlockGemv::IteratorB;

  /// Policy for the iterator that reads/writes C/D
  using IteratorPolicyCD = typename platform::conditional<
      platform::is_same<LayoutCD, layout::RowMajor>::value,
      cutlass::transform::PitchLinearTilePolicyStripminedThreadContiguous<
          layout::PitchLinearShape<ThreadBlockShape::kN, ThreadBlockShape::kM>, Core::kThreadsPerN, ThreadShape::kN>,
      cutlass::transform::PitchLinearTilePolicyStripminedThreadStrided<
          layout::PitchLinearShape<ThreadBlockShape::kM, ThreadBlockShape::kN>, Core::kThreadsPerN, ThreadShape::kM>>::type;

  /// Iterator that reads/writes C/D
  using IteratorCD = cutlass::transform::threadblock::PredicatedTileIterator<
      cutlass::MatrixShape<ThreadBlockShape::kM, ThreadBlockShape::kN>, ElementCD, LayoutCD, 0, IteratorPolicyCD>;

  /// Fragment storage for C/D
  using FragmentCD = typename IteratorCD::Fragment;

  // Define the threadblock swizzle
  using ThreadBlockSwizzle = cutlass::gemm::threadblock::GemvBatchedStridedThreadblockDefaultSwizzle;
};

} // namespace kernel
} // namespace gemm
} // namespace cutlass
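
The `IteratorPolicyCD` alias in the listing chooses between two tile policies at compile time, depending on whether `LayoutCD` is row-major. The sketch below reproduces only that selection pattern, using `std::conditional` and `std::is_same` in place of the `cutlass::platform` equivalents. Every type in it (`TileShape`, `PitchLinearShape`, `ThreadContiguousPolicy`, `ThreadStridedPolicy`, `SelectPolicyCD`) is a hypothetical stand-in written for illustration, not the CUTLASS class of similar name, and the tile sizes are arbitrary.

```cpp
#include <type_traits>

// Hypothetical stand-ins for the CUTLASS layout tags (not the real classes).
struct RowMajor {};
struct ColumnMajor {};

// Hypothetical stand-in for a GemmShape-like tile description.
template <int M, int N>
struct TileShape {
  static constexpr int kM = M;
  static constexpr int kN = N;
};

// Hypothetical stand-in for a pitch-linear shape: (contiguous, strided).
template <int Contiguous, int Strided>
struct PitchLinearShape {
  static constexpr int kContiguous = Contiguous;
  static constexpr int kStrided = Strided;
};

// Hypothetical stand-ins for the two tile policies; they only record
// their template arguments so the selection can be inspected.
template <typename Shape_, int Threads, int ElementsPerAccess>
struct ThreadContiguousPolicy {
  using Shape = Shape_;
  static constexpr int kElementsPerAccess = ElementsPerAccess;
};

template <typename Shape_, int Threads, int ElementsPerAccess>
struct ThreadStridedPolicy {
  using Shape = Shape_;
  static constexpr int kElementsPerAccess = ElementsPerAccess;
};

// Mirrors the structure of IteratorPolicyCD: row-major output uses an
// (N, M) pitch-linear shape with the thread-contiguous policy; any other
// layout uses (M, N) with the thread-strided policy.
template <typename ThreadBlockShape, typename ThreadShape,
          typename LayoutCD, int ThreadsPerN>
struct SelectPolicyCD {
  using type = typename std::conditional<
      std::is_same<LayoutCD, RowMajor>::value,
      ThreadContiguousPolicy<
          PitchLinearShape<ThreadBlockShape::kN, ThreadBlockShape::kM>,
          ThreadsPerN, ThreadShape::kN>,
      ThreadStridedPolicy<
          PitchLinearShape<ThreadBlockShape::kM, ThreadBlockShape::kN>,
          ThreadsPerN, ThreadShape::kM>>::type;
};

// Example instantiation with arbitrary sizes.
using BlockShape = TileShape<1, 64>;
using PerThread = TileShape<1, 4>;

using RowPolicy = SelectPolicyCD<BlockShape, PerThread, RowMajor, 16>::type;
using ColPolicy = SelectPolicyCD<BlockShape, PerThread, ColumnMajor, 16>::type;

static_assert(
    std::is_same<RowPolicy,
                 ThreadContiguousPolicy<PitchLinearShape<64, 1>, 16, 4>>::value,
    "row-major output selects the thread-contiguous policy");
static_assert(
    std::is_same<ColPolicy,
                 ThreadStridedPolicy<PitchLinearShape<1, 64>, 16, 1>>::value,
    "other layouts select the thread-strided policy");
```

Because the choice is made through a type alias, the branch not taken has no runtime cost. Both policy types are still named in the `conditional` expression, so both template argument lists must be well-formed for either layout.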
cutlass::MatrixShape
Describes the size of a matrix tile.
Definition: matrix_shape.h:42
cutlass
Definition: aligned_buffer.h:35
cutlass::platform::is_same
std::is_same (false specialization)
Definition: platform.h:394
cutlass::gemm::kernel::DefaultGemv::ElementAccumulator
ElementAccumulator_ ElementAccumulator
Data type of accumulators.
Definition: default_gemv.h:78
cutlass::gemm::kernel::DefaultGemv::IteratorB
typename ThreadBlockGemv::IteratorB IteratorB
Definition: default_gemv.h:101
cutlass::gemm::kernel::DefaultGemv::ThreadShape
ThreadShape_ ThreadShape
Shape of thread-level matrix operation (concept: GemmShape)
Definition: default_gemv.h:63
cutlass::gemm::kernel::DefaultGemv::LayoutCD
LayoutCD_ LayoutCD
Layout of input/output matrix C/D.
Definition: default_gemv.h:87
cutlass::gemm::kernel::DefaultGemv::LayoutAccumulator
LayoutCD_ LayoutAccumulator
Layout of accumulators (same as C/D)
Definition: default_gemv.h:81
cutlass::layout::PitchLinearShape
Template defining a shape used by pitch-linear operators.
Definition: pitch_linear.h:43
cutlass::gemm::kernel::DefaultGemv::ElementA
ElementA_ ElementA
Data type of multiplicand A.
Definition: default_gemv.h:66
cutlass::gemm::kernel::DefaultGemv::FragmentCD
typename IteratorCD::Fragment FragmentCD
Fragment storage for C/D.
Definition: default_gemv.h:116
cutlass::gemm::threadblock::Gemv
Structure to compute the matrix-vector product using SIMT math instructions.
Definition: gemv.h:50
cutlass::gemm::kernel::DefaultGemv::LayoutA
LayoutA_ LayoutA
Layout of multiplicand A.
Definition: default_gemv.h:69
cutlass::gemm::threadblock::DefaultGemvCore
Definition: default_gemv_core.h:68
cutlass::platform::conditional
std::conditional (true specialization)
Definition: platform.h:325
cutlass::gemm::kernel::DefaultGemv
Definition: default_gemv.h:57
cutlass::gemm::kernel::DefaultGemv::IteratorPolicyCD
typename platform::conditional< platform::is_same< LayoutCD, layout::RowMajor >::value, cutlass::transform::PitchLinearTilePolicyStripminedThreadContiguous< layout::PitchLinearShape< ThreadBlockShape::kN, ThreadBlockShape::kM >, Core::kThreadsPerN, ThreadShape::kN >, cutlass::transform::PitchLinearTilePolicyStripminedThreadStrided< layout::PitchLinearShape< ThreadBlockShape::kM, ThreadBlockShape::kN >, Core::kThreadsPerN, ThreadShape::kM >>::type IteratorPolicyCD
Policy for the iterator that reads/writes C/D.
Definition: default_gemv.h:109
cutlass::gemm::threadblock::Gemv::IteratorA
typename Core_::IteratorA IteratorA
Iterates over A in global memory.
Definition: gemv.h:58
cutlass::transform::threadblock::PredicatedTileIterator
Definition: transform/threadblock/predicated_tile_iterator.h:133
default_gemv_core.h
Defines basic properties needed by CTA-level batched GEMV assuming expectations about data layout of ...
cutlass::gemm::threadblock::Gemv::IteratorB
typename Core_::IteratorB IteratorB
Iterates over B in global memory.
Definition: gemv.h:61
cutlass::gemm::threadblock::GemvBatchedStridedThreadblockDefaultSwizzle
Threadblock swizzling function for batched GEMVs.
Definition: gemm/threadblock/threadblock_swizzle.h:296
gemv.h
Template for a threadblock-scoped GEMV kernel.
cutlass::gemm::kernel::DefaultGemv::IteratorA
typename ThreadBlockGemv::IteratorA IteratorA
Definition: default_gemv.h:98
threadblock_swizzle.h
Implements several possible threadblock-swizzling functions mapping blockIdx to GEMM problems...
cutlass::gemm::kernel::DefaultGemv::Core
typename cutlass::gemm::threadblock::DefaultGemvCore< ThreadBlockShape, ThreadShape, ElementA, LayoutA, ElementB, LayoutB, ElementAccumulator, LayoutAccumulator > Core
Definition: default_gemv.h:92
cutlass::gemm::kernel::DefaultGemv::ElementB
ElementB_ ElementB
Data type of multiplicand B.
Definition: default_gemv.h:72
cutlass::transform::PitchLinearTilePolicyStripminedThreadStrided
Definition: pitch_linear_thread_map.h:168
cutlass::gemm::kernel::DefaultGemv::LayoutB
LayoutB_ LayoutB
Layout of multiplicand B.
Definition: default_gemv.h:75
cutlass::transform::PitchLinearTilePolicyStripminedThreadContiguous
Definition: pitch_linear_thread_map.h:140
cutlass::gemm::kernel::DefaultGemv::ThreadBlockShape
ThreadBlockShape_ ThreadBlockShape
Shape of Threadblock-level matrix operation (concept: GemmShape)
Definition: default_gemv.h:60
cutlass::gemm::kernel::DefaultGemv::ElementCD
ElementCD_ ElementCD
Data type of input/output matrix C/D.
Definition: default_gemv.h:84
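
The two `PitchLinearShape` instantiations listed above swap `kM` and `kN` depending on the layout of C/D. This reflects the usual pitch-linear convention: the contiguous dimension is the column index for a row-major matrix and the row index for a column-major one. The sketch below illustrates that mapping for a single coordinate. The names `PitchLinearCoord`, `to_pitch_linear`, and `linear_offset` are hypothetical helpers invented for this example and are not part of CUTLASS.

```cpp
// Hypothetical layout tags for illustration (not the CUTLASS classes).
struct RowMajor {};
struct ColumnMajor {};

// Hypothetical coordinate in pitch-linear space.
struct PitchLinearCoord {
  int contiguous;  // index along the dimension stored contiguously
  int strided;     // index along the dimension separated by the pitch
};

// Row-major: elements of one row are adjacent, so the column is contiguous.
constexpr PitchLinearCoord to_pitch_linear(RowMajor, int row, int col) {
  return {col, row};
}

// Column-major: elements of one column are adjacent, so the row is contiguous.
constexpr PitchLinearCoord to_pitch_linear(ColumnMajor, int row, int col) {
  return {row, col};
}

// Linear element offset given the pitch (leading dimension, in elements).
constexpr int linear_offset(PitchLinearCoord c, int pitch) {
  return c.contiguous + c.strided * pitch;
}

// Element (row = 2, col = 3) of a 4 x 8 matrix.
static_assert(linear_offset(to_pitch_linear(RowMajor{}, 2, 3), 8) == 19,
              "row-major: 2 * 8 + 3");
static_assert(linear_offset(to_pitch_linear(ColumnMajor{}, 2, 3), 4) == 14,
              "column-major: 3 * 4 + 2");
```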
Generated by Doxygen 1.8.11