Add kernel fusing using RAJA #167

Open: wants to merge 38 commits into base: develop
Showing changes from 7 commits.
Commits (38):
c9ac694
Add TransactionFuseable class
davidbeckingsale Dec 11, 2020
5ff40e2
Fixup handling fuseable/regular transactions
davidbeckingsale Jan 5, 2021
5cc28c3
Adding new API for fusing
davidbeckingsale Mar 4, 2021
beccc9f
Add guard for non-umpire builds
nselliott Apr 14, 2021
cdf1111
Add #define to signal existence of KernelFuser
nselliott Sep 30, 2021
049bfd0
Merge branch 'master' into feature/fuse-comm
davidbeckingsale Oct 15, 2021
6d600c8
Make CoarsenCopyTransaction fuseable
davidbeckingsale Oct 15, 2021
c0a037d
Merge branch 'master' into feature/nselliott/kernel-fuser
nselliott Nov 11, 2021
91f0636
Start to turn on workgroups for KernelFuser, and add placeholder methods
nselliott Nov 11, 2021
31b3798
Add to the implementation of KernelFuser
nselliott Nov 18, 2021
c249b86
Add kernel fuser allocator to AllocatorDatabase
nselliott Nov 18, 2021
e91952a
Add checks to do fuser launch and synchronize only when needed.
nselliott Nov 19, 2021
3ab15d8
Add kernel fuser cleanup
nselliott Nov 19, 2021
eaf6fd2
Change KernelFuser to a singleton and begin adding it to some ArrayData
nselliott Nov 24, 2021
d43092e
Add enqueue calls to some of the for_alls, add fuser in more places
nselliott Nov 30, 2021
f426913
Change name of virtual methods for fuseable operations in PatchData,
nselliott Dec 14, 2021
f8a5518
Add missing initialization in ArrayData
nselliott Dec 22, 2021
4fe3d2c
Make KernelFuser a true no-op in non-RAJA builds.
nselliott Jan 7, 2022
c57914d
Rearrange split message receives in AsyncCommPeer to avoid CUDA
nselliott Jan 8, 2022
b6fee9f
Add cmake option for setting number of threads for RAJA WorkGroup policy
nselliott Feb 3, 2022
d9c4ce3
Merge branch 'feature/fuse-comm' of github.com:LLNL/SAMRAI into featu…
nselliott Feb 3, 2022
899a9fa
Add logic to avoid synchronize calls when it is known that no kernels
nselliott Feb 8, 2022
9b08ece
Merge branch 'master' into feature/fuse-comm-merge
nselliott Mar 15, 2022
7c24184
Add methods for applications to indicate need for synchronization
nselliott Apr 4, 2022
78c618d
Clarify some documentation comments
nselliott Apr 4, 2022
2b1a9e1
Add option to set a synchronize between refine and postprocessRefine
nselliott Apr 13, 2022
1eb8cf5
Add KernelFuserStages as a singleton to hold and use KernelFuser
nselliott Apr 14, 2022
5a62605
Add ScheduleKernelFuser and change KernelFuserStages to
nselliott Apr 15, 2022
879e75e
Add StagedKernelFusers launch/cleanup in RefineSchedule
nselliott Apr 25, 2022
67ef461
Stop some synchronize calls between the communicate and refine steps
nselliott Apr 26, 2022
8a8a98e
Change tbox::Schedule to use StagedKernelFusers
nselliott May 3, 2022
f2230e9
Add optional synchronization around boundary conditions.
nselliott May 3, 2022
8df7260
Add StagedKernelFusers calls to PatchLevel allocate
nselliott May 13, 2022
6f6c6bd
Add cuda dependency in some tests
nselliott May 18, 2022
6397677
Add kernel fusion calls to PatchLevel deallocate and CoarsenSchedule
nselliott May 24, 2022
41a65a6
Revise StagedKernelFusers, remove isActive, add check on whether it
nselliott Jul 7, 2022
cb6fa43
Remove stray printf
nselliott Jul 21, 2022
4837d89
Small fixes to work with RAJA 2022.03
nselliott Nov 10, 2022
4 changes: 3 additions & 1 deletion config/SAMRAI_config.h.cmake.in
@@ -358,7 +358,9 @@
/* Configure for compiling on BGL family of machines */
#undef __BGL_FAMILY__


#ifdef HAVE_RAJA
#define HAVE_KERNEL_FUSER
Review comment (Member): SAMRAI_HAVE_KERNEL_FUSER?

#endif

namespace SAMRAI {
static const unsigned short MAX_DIM_VAL = SAMRAI_MAXIMUM_DIMENSION;
27 changes: 27 additions & 0 deletions source/SAMRAI/hier/PatchData.C
@@ -31,6 +31,33 @@ PatchData::~PatchData()
{
}

void
PatchData::copy(
const PatchData& src,
const BoxOverlap& overlap,
tbox::KernelFuser& fuser)
{
copy(src, overlap);
}

void
PatchData::packStream(
tbox::MessageStream& stream,
const BoxOverlap& overlap,
tbox::KernelFuser& fuser)
{
packStream(stream, overlap);
}

void
PatchData::unpackStream(
tbox::MessageStream& stream,
const BoxOverlap& overlap,
tbox::KernelFuser& fuser)
{
unpackStream(stream, overlap);
}

/*
*************************************************************************
*
41 changes: 41 additions & 0 deletions source/SAMRAI/hier/PatchData.h
@@ -20,6 +20,15 @@
#include "SAMRAI/tbox/Utilities.h"

namespace SAMRAI {

/*
* Forward declaration of KernelFuser class - required here because it sucks in
* RAJA and requires CUDA.
*/
namespace tbox {
class KernelFuser;
}

namespace hier {

/**
@@ -160,6 +169,12 @@ class PatchData
const PatchData& src,
const BoxOverlap& overlap) = 0;

virtual void
copy(
const PatchData& src,
const BoxOverlap& overlap,
tbox::KernelFuser& fuser);
Review comment (Member): As it stands, I'll have to implement both if I want to use fusion. I guess I can have one implementation for both that takes the fuser pointer and either uses it or not under an abstraction layer, to keep things single-source. I'll need some macros to maintain support for older versions of SAMRAI, but that was pretty much inevitable.
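
The single-source approach described in this comment could look roughly like the sketch below. Everything in it is hypothetical: `MockFuser` and `MyData` are stand-ins invented for illustration (they are not SAMRAI classes), and the real overloads take a `BoxOverlap` as well. The idea is only that both public `copy` overloads forward to one private routine that takes a fuser pointer and either enqueues the loop body or runs it immediately.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for tbox::KernelFuser: it only records loops and
// runs them later. It is NOT the SAMRAI class.
struct MockFuser {
    std::vector<std::function<void()>> pending;

    template <typename Kernel>
    void enqueue(int begin, int end, Kernel kernel) {
        // Defer the whole loop; nothing runs until launch().
        pending.emplace_back([=]() {
            for (int i = begin; i < end; ++i) kernel(i);
        });
    }

    void launch() {
        for (auto& loop : pending) loop();
        pending.clear();
    }
};

// Hypothetical application patch data holding a flat array.
// Assumes source and destination have the same length.
class MyData {
public:
    explicit MyData(int n) : d_values(n, 0.0) {}

    // Non-fused entry point (works with older library versions).
    void copy(const MyData& src) { copyImpl(src, nullptr); }

    // Fused entry point.
    void copy(const MyData& src, MockFuser& fuser) { copyImpl(src, &fuser); }

    double at(int i) const { return d_values[i]; }
    void set(int i, double v) { d_values[i] = v; }

private:
    // The one implementation: the loop body is written once, and the
    // fuser pointer decides whether it is deferred or run right away.
    void copyImpl(const MyData& src, MockFuser* fuser) {
        double* dst = d_values.data();
        const double* from = src.d_values.data();
        const int n = static_cast<int>(d_values.size());
        auto body = [dst, from](int i) { dst[i] = from[i]; };
        if (fuser) {
            fuser->enqueue(0, n, body);
        } else {
            for (int i = 0; i < n; ++i) body(i);
        }
    }

    std::vector<double> d_values;
};
```

In application code, the fused overload would presumably be wrapped in a preprocessor guard on the macro this PR defines in SAMRAI_config.h.cmake.in, so that the same source still builds against SAMRAI versions that lack KernelFuser.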


/**
* Copy data from the source into the destination using the designated
* overlap descriptor. The overlap description will have been computed
@@ -206,6 +221,19 @@ class PatchData
tbox::MessageStream& stream,
const BoxOverlap& overlap) const = 0;

/**
* Pack data lying on the specified index set into the output stream using
* the given KernelFuser. The default implementation of this method will
* call packStream without the fuser argument. See the abstract stream
* virtual base class for more information about the packing operators
* defined for streams.
*/
virtual void
packStream(
tbox::MessageStream& stream,
const BoxOverlap& overlap,
tbox::KernelFuser& fuser);
Review comment (Member): not const?

Review comment (Collaborator): Yes, looks like this should be const.
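
A const-qualified version of the forwarding overload might be sketched as follows. The types here (`MockStream`, `MockOverlap`, `MockFuser`, `BaseData`, `LegacyData`) are hypothetical stand-ins, not SAMRAI classes. The sketch also shows one C++ detail worth keeping in mind with this pattern: a derived class that overrides only the two-argument `packStream` hides the three-argument overload unless it adds a using-declaration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for MessageStream, BoxOverlap and KernelFuser.
struct MockStream { std::vector<int> bytes; };
struct MockOverlap { int count; };
struct MockFuser {};

class BaseData {
public:
    virtual ~BaseData() = default;

    // Existing pure virtual: const, because packing does not modify the data.
    virtual void packStream(MockStream& stream,
                            const MockOverlap& overlap) const = 0;

    // Fuser overload, also const so it matches the method it forwards to.
    // The default ignores the fuser and calls the two-argument version.
    virtual void packStream(MockStream& stream,
                            const MockOverlap& overlap,
                            MockFuser& /*fuser*/) const
    {
        packStream(stream, overlap);
    }
};

// A data type that has not been updated for fusion.
class LegacyData : public BaseData {
public:
    // Without this, the override below would hide the three-argument
    // overload for calls made through a LegacyData object.
    using BaseData::packStream;

    void packStream(MockStream& stream,
                    const MockOverlap& overlap) const override
    {
        for (int i = 0; i < overlap.count; ++i) stream.bytes.push_back(i);
    }
};
```

With both overloads const, the fused call can be made through a `const BaseData&`, which is how the non-fused `packStream` is already callable.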


/**
* Unpack data from the message stream into the specified index set.
* See the abstract stream virtual base class for more information about
@@ -216,6 +244,19 @@ class PatchData
tbox::MessageStream& stream,
const BoxOverlap& overlap) = 0;

/**
* Unpack data from the message stream into the specified index set using
* the given KernelFuser. The default implementation of this method will
* call unpackStream without the fuser argument. See the abstract stream
* virtual base class for more information about the packing operators
* defined for streams.
*/
virtual void
unpackStream(
tbox::MessageStream& stream,
const BoxOverlap& overlap,
tbox::KernelFuser& fuser);

/**
* Checks that class version and restart file version are equal. If so,
* reads in the data members common to all patch data types from restart
4 changes: 4 additions & 0 deletions source/SAMRAI/pdat/CMakeLists.txt
@@ -339,6 +339,10 @@ target_include_directories(
$<BUILD_INTERFACE:${PROJECT_SOURCE_DIR}/source>
$<INSTALL_INTERFACE:include>)

blt_print_target_properties(TARGET SAMRAI_pdat)
blt_print_target_properties(TARGET raja)
blt_print_target_properties(TARGET RAJA)

install(TARGETS SAMRAI_pdat
EXPORT SAMRAITargets
ARCHIVE DESTINATION ${CMAKE_INSTALL_LIBDIR}
9 changes: 8 additions & 1 deletion source/SAMRAI/tbox/CMakeLists.txt
@@ -58,6 +58,7 @@ set ( tbox_headers
TimerManager.h
Tracer.h
Transaction.h
TransactionFuseable.h
Utilities.h)

set_source_files_properties(
@@ -115,6 +116,7 @@ set (tbox_sources
TimerManager.C
Tracer.C
Transaction.C
TransactionFuseable.C
Utilities.C)

if (ENABLE_HDF5)
@@ -146,9 +148,11 @@ if (ENABLE_RAJA)
endif ()

if (ENABLE_CUDA)
set(cuda_sources Schedule.C)
set(cuda_sources TransactionFuseable.C Schedule.C)
set_source_files_properties(${cuda_sources} PROPERTIES LANGUAGE CUDA)

set (tbox_depends ${tbox_depends} cuda)

if (ENABLE_NVTX_REGIONS)
find_package(CUDA REQUIRED)

@@ -176,6 +180,9 @@ target_include_directories( SAMRAI_tbox
$<BUILD_INTERFACE:${PROJECT_SOURCE_DIR}/source>
$<INSTALL_INTERFACE:include>)

blt_print_target_properties(
TARGET SAMRAI_tbox)


install(TARGETS SAMRAI_tbox
EXPORT SAMRAITargets
5 changes: 5 additions & 0 deletions source/SAMRAI/tbox/ExecutionPolicy.h
@@ -112,6 +112,11 @@ struct policy_traits<policy::parallel> {
>;

using ReductionPolicy = RAJA::cuda_reduce;

using WorkGroupPolicy = RAJA::WorkGroupPolicy<
RAJA::cuda_work_async<1024>,
Review comment (Member): Can you provide a way to set the workgroup block size in case people run into cuda linking issues?

RAJA::unordered_cuda_loop_y_block_iter_x_threadblock_average,
RAJA::constant_stride_array_of_objects>;
};

#else
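
One way to address the block-size request in the review comment above is to take the value from a preprocessor definition that the build system can override. The sketch below uses mock templates in place of `RAJA::cuda_work_async` and `RAJA::WorkGroupPolicy` so it compiles without RAJA or CUDA, and the macro name `MY_WORKGROUP_THREADS` is invented for illustration. It is not the option name a later commit in this PR adds.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical macro; a build would pass -DMY_WORKGROUP_THREADS=<n>.
// Falls back to the hard-coded value from the diff when not set.
#ifndef MY_WORKGROUP_THREADS
#define MY_WORKGROUP_THREADS 1024
#endif

// Mock of an execution policy templated on block size (stand-in for
// RAJA::cuda_work_async<N>).
template <std::size_t BlockSize>
struct mock_cuda_work_async {
    static constexpr std::size_t block_size = BlockSize;
};

// Mock of the enclosing work-group policy; the real one takes further
// order and storage policy arguments that are omitted here.
template <typename ExecPolicy>
struct MockWorkGroupPolicy {
    using exec_policy = ExecPolicy;
};

// Catch bad values at compile time. The upper bound assumes the usual
// CUDA limit of 1024 threads per block.
static_assert(MY_WORKGROUP_THREADS > 0 && MY_WORKGROUP_THREADS <= 1024,
              "work-group block size must be in (0, 1024]");

using WorkGroupPolicy =
    MockWorkGroupPolicy<mock_cuda_work_async<MY_WORKGROUP_THREADS>>;

// Helper to inspect the configured block size.
constexpr std::size_t workgroupBlockSize()
{
    return WorkGroupPolicy::exec_policy::block_size;
}
```

Keeping the default at 1024 leaves behavior unchanged for anyone who does not set the option.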
50 changes: 50 additions & 0 deletions source/SAMRAI/tbox/KernelFuser.h
@@ -0,0 +1,50 @@
#ifndef included_tbox_KernelFuser
#define included_tbox_KernelFuser

#include "SAMRAI/tbox/ExecutionPolicy.h"
#include "SAMRAI/tbox/AllocatorDatabase.h"

// #include "RAJA/RAJA.hpp"

namespace SAMRAI {
namespace tbox {

class KernelFuser
{
public:
// KernelFuser() :
// d_workpool(AllocatorDatabase::getDatabase()->getStreamAllocator())
// {}

template<typename Kernel>
void enqueue(int begin, int end, Kernel&& kernel) {
//d_workpool.enqueue(RAJA::RangeSegment(begin, end), std::forward<Kernel>(kernel));
}

void launch()
{
// d_workgroup = d_workpool.instantiate();
// d_worksite = d_workgroup.run();
}

private:
#ifdef HAVE_UMPIRE
using Allocator = umpire::TypedAllocator<char>;
#else
using Allocator = ResourceAllocator;
#endif

// using Policy = typename tbox::detail::policy_traits< tbox::policy::parallel >::WorkGroupPolicy;
// using WorkPool = RAJA::WorkPool <Policy, int, RAJA::xargs<>, Allocator>;
// using WorkGroup = RAJA::WorkGroup<Policy, int, RAJA::xargs<>, Allocator>;
// using WorkSite = RAJA::WorkSite <Policy, int, RAJA::xargs<>, Allocator>;

// WorkPool d_workpool;
// WorkGroup d_workgroup;
// WorkSite d_worksite;
};

}
}

#endif
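
For readers unfamiliar with the enqueue/launch pattern that the commented-out lines in this header point toward, here is a minimal CPU-only sketch of the semantics. `SketchFuser` is a hypothetical class that stores loop bodies in a `std::vector` and runs them sequentially. It does not use RAJA's WorkPool, WorkGroup or WorkSite and does not fuse anything into a single GPU kernel. It only illustrates that `enqueue` defers work and `launch` runs everything queued so far, doing nothing when the queue is empty.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical CPU-only illustration of deferred loop execution.
class SketchFuser {
public:
    // Record a loop over [begin, end) without running it.
    template <typename Kernel>
    void enqueue(int begin, int end, Kernel&& kernel)
    {
        d_pending.push_back(Item{
            begin, end,
            std::function<void(int)>(std::forward<Kernel>(kernel))});
    }

    // Run every recorded loop, then empty the queue.
    // Returns how many loops ran; 0 means there was nothing to launch.
    std::size_t launch()
    {
        if (d_pending.empty()) return 0;
        for (const Item& item : d_pending) {
            for (int i = item.begin; i < item.end; ++i) item.body(i);
        }
        const std::size_t count = d_pending.size();
        d_pending.clear();
        return count;
    }

private:
    struct Item {
        int begin;
        int end;
        std::function<void(int)> body;
    };

    std::vector<Item> d_pending;
};
```

A caller would enqueue several small loops (for example, one per patch overlap) and call `launch()` once, which is the point where a GPU implementation would issue a single fused kernel and later synchronize.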