CUTLASS: host_reorder.h Source File

CUTLASS: CUDA Templates for Linear Algebra Subroutines and Solvers

host_reorder.h


/***************************************************************************************************
 *
 * Redistribution and use in source and binary forms, with or without modification, are permitted
 * provided that the following conditions are met:
 *     * Redistributions of source code must retain the above copyright notice, this list of
 *       conditions and the following disclaimer.
 *     * Redistributions in binary form must reproduce the above copyright notice, this list of
 *       conditions and the following disclaimer in the documentation and/or other materials
 *       provided with the distribution.
 *     * Neither the name of the NVIDIA CORPORATION nor the names of its contributors may be used
 *       to endorse or promote products derived from this software without specific prior written
 *       permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR
 * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
 * FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL NVIDIA CORPORATION BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
 * BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
 * OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
 * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 *
 **************************************************************************************************/

#pragma once

#include "cutlass/coord.h"
#include "cutlass/util/host_tensor.h"
#include "cutlass/tensor_view.h"
#include "cutlass/util/tensor_view_io.h"
#include "cutlass/util/reference/host/gemm.h"

namespace cutlass {

// Copies every (k, n) element of src into dest at a permuted column index.
// The permutation acts within each block of Interleaved columns.
template <int Interleaved, typename Element, typename Layout>
void reorder_column(TensorRef<Element, Layout> dest,
                    TensorRef<Element, Layout> src,
                    cutlass::gemm::GemmCoord problem_size) {
  const int InstructionShapeCol = 8;
  // 4 threads per Quad
  const int ElementsPerThread = InstructionShapeCol / 4;
  // 4 threads per Quad
  const int ReorderedElementsPerThread =
      Interleaved / 4;

  for (int n = 0; n < problem_size.n(); n++) {
    for (int k = 0; k < problem_size.k(); k++) {
      dest.at({k, (n / Interleaved) * Interleaved +
                      ((n % ReorderedElementsPerThread) / ElementsPerThread) *
                          InstructionShapeCol +
                      ((n % Interleaved) / ReorderedElementsPerThread) *
                          ElementsPerThread +
                      (n % ElementsPerThread)}) = src.at({k, n});
    }
  }
}

} // namespace cutlass
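The column index arithmetic in reorder_column is easier to follow in isolation. The sketch below is not part of CUTLASS: reordered_column and is_permutation are hypothetical helper names introduced only for illustration, and the code assumes the interleave factor is a multiple of 8 (for example 32 or 64) and that the column count is a multiple of the interleave factor. It reproduces the same expression on plain integers so the mapping can be inspected without any tensor types.

```cpp
#include <cassert>
#include <vector>

// Hypothetical standalone helper (not a CUTLASS function): returns the
// destination column that the reorder_column expression yields for source
// column n. Assumes interleaved is a multiple of 8.
int reordered_column(int n, int interleaved) {
  const int instruction_shape_col = 8;
  const int elements_per_thread = instruction_shape_col / 4;  // 2
  const int reordered_elements_per_thread = interleaved / 4;  // 8 when interleaved == 32

  return (n / interleaved) * interleaved +                     // start of the interleaved block
         ((n % reordered_elements_per_thread) / elements_per_thread) *
             instruction_shape_col +                           // which group of 8 columns
         ((n % interleaved) / reordered_elements_per_thread) *
             elements_per_thread +                             // which pair inside that group
         (n % elements_per_thread);                            // position inside the pair
}

// Hypothetical check: true if the mapping sends [0, columns) onto itself
// without collisions. Assumes columns is a multiple of interleaved.
bool is_permutation(int columns, int interleaved) {
  std::vector<bool> seen(columns, false);
  for (int n = 0; n < columns; ++n) {
    const int d = reordered_column(n, interleaved);
    if (d < 0 || d >= columns || seen[d]) {
      return false;
    }
    seen[d] = true;
  }
  return true;
}
```

With an interleave factor of 32, source columns 0, 1, 2, 3 land in destination columns 0, 1, 8, 9, and source column 8 lands in destination column 2. Pairs of adjacent columns stay together while their positions within the 32-column block are shuffled. Since no two source columns share a destination, the copy loop never overwrites an element it has already written, provided dest and src refer to distinct storage.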

