Refactor usage of accelerator API for constituents and add documentation #1875

Merged: 4 commits, Dec 28, 2024
7 changes: 5 additions & 2 deletions arcane/doc/doc_user/chap_acceleratorapi/0_acceleratorapi.md
@@ -11,10 +11,13 @@ Sommaire de ce chapitre :
Presents the mechanisms available in %Arcane for using
accelerators (GPUs).

2. \subpage arcanedoc_acceleratorapi_reduction <br>
2. \subpage arcanedoc_acceleratorapi_materials <br>
Presents how to use the accelerator API for materials.

3. \subpage arcanedoc_acceleratorapi_reduction <br>
Presents the mechanisms available for performing reductions.

3. \subpage arcanedoc_acceleratorapi_memorypool <br>
4. \subpage arcanedoc_acceleratorapi_memorypool <br>
Presents the memory pool mechanism.
____

@@ -667,4 +667,7 @@ ____
<span class="back_section_button">
\ref arcanedoc_acceleratorapi
</span>
<span class="next_section_button">
\ref arcanedoc_acceleratorapi_materials
</span>
</div>
87 changes: 87 additions & 0 deletions arcane/doc/doc_user/chap_acceleratorapi/3_materials.md
@@ -0,0 +1,87 @@
# Usage for materials {#arcanedoc_acceleratorapi_materials}

[TOC]

Using the accelerator API for materials is similar to using it on mesh
entities. The RUNCOMMAND_MAT_ENUMERATE() macro iterates over an
environment (\arcanemat{IMeshEnvironment}) or a material
(\arcanemat{IMeshMaterial}).

The possible values for this macro are:

<table>
<tr>
<th>Iteration type</th><th>Iterator value</th>
<th>Container type</th><th>Description</th>
</tr>

<tr>
<td>EnvAndGlobalCell</td>
<td>\arcanemat{EnvAndGlobalCellIteratorValue}</td>
<td>\arcanemat{IMeshEnvironment} <br></br>
\arcanemat{EnvCellVectorView}</td>
<td>Iteration over an environment; for each iteration it provides
the local id of the environment cell, the iteration index and the
local id of the associated global cell</td>
</tr>

<tr>
<td>MatAndGlobalCell</td>
<td>\arcanemat{MatAndGlobalCellIteratorValue}</td>
<td>\arcanemat{IMeshMaterial} <br></br> \arcanemat{MatCellVectorView}</td>
<td>Iteration over a material; for each iteration it provides
the local id of the material cell, the iteration index and the
local id of the associated global cell</td>
</tr>

<tr>
<td>AllEnvCell</td>
<td>\arcanemat{AllEnvCell}</td>
<td>\arcanemat{AllEnvCellVectorView}</td>
<td>Iteration over the AllEnvCell items</td>
</tr>

<tr>
<td>EnvCell</td>
<td>\arcanemat{EnvCellLocalId}</td>
<td>\arcanemat{IMeshEnvironment} <br></br> \arcanemat{EnvCellVectorView}</td>
<td>Iteration over an environment; for each iteration it provides
only the local id of the environment cell</td>
</tr>

<tr>
<td>MatCell</td>
<td>\arcanemat{MatCellLocalId}</td>
<td>\arcanemat{IMeshMaterial} <br></br> \arcanemat{MatCellVectorView}</td>
<td>Iteration over a material; for each iteration it provides
only the local id of the material cell</td>
</tr>

</table>

If you only need the local ids of the material or environment cells,
it is preferable for performance reasons to use the version with
`EnvCell` or `MatCell` as the iterator type.
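
A minimal sketch of such a `MatCell` loop is shown below. It assumes a
`RunQueue` named `queue`, an \arcanemat{IMeshMaterial} pointer `mat` and a
material variable `m_mat_density`; these names are illustrative and not part
of the documented API.

```cpp
// Sketch only: iterate over the cells of one material and update the
// partial (material) value. MatCellLocalId is sufficient here because the
// associated global cell is never needed.
auto command = makeCommand(queue);
auto out_density = viewOut(command, m_mat_density);
RUNCOMMAND_MAT_ENUMERATE(MatCell, imc, mat)
{
  out_density[imc] = 2.0; // 'imc' is a MatCellLocalId
};
```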

Here is a code example that iterates over an environment and retrieves,
for each iteration, the iteration index and the associated global cell:
\snippet MeshMaterialAcceleratorUnitTest.cc SampleEnvAndGlobalCell
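
The referenced snippet is not reproduced here. As a rough, hypothetical
approximation of what an `EnvAndGlobalCell` loop can look like (the accessor
used to unpack the iterator value and all variable names are assumptions):

```cpp
// Sketch only: for each cell of the environment 'env', combine the
// partial (environment) value with the value of the associated global cell.
auto command = makeCommand(queue);
auto inout_density = viewInOut(command, m_mat_density);
auto in_temperature = viewIn(command, m_temperature);
RUNCOMMAND_MAT_ENUMERATE(EnvAndGlobalCell, evi, env)
{
  auto [evcid, cid] = evi(); // hypothetical: (environment cell id, global cell id)
  inout_density[evcid] += in_temperature[cid];
};
```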

Here is another example that iterates over the \arcanemat{AllEnvCell}
items and retrieves information about the environments and materials
present in each \arcanemat{AllEnvCell}:

\snippet MeshMaterialAcceleratorUnitTest.cc SampleAllEnvCell

____

<div class="section_buttons">
<span class="back_section_button">
\ref arcanedoc_parallel_accelerator
</span>
<span class="next_section_button">
\ref arcanedoc_acceleratorapi_reduction
</span>
</div>
2 changes: 1 addition & 1 deletion arcane/doc/doc_user/chap_acceleratorapi/5_reduce.md
@@ -72,7 +72,7 @@ ____

<div class="section_buttons">
<span class="back_section_button">
\ref arcanedoc_parallel_accelerator
\ref arcanedoc_acceleratorapi_materials
</span>
<span class="next_section_button">
\ref arcanedoc_acceleratorapi_memorypool
56 changes: 28 additions & 28 deletions arcane/src/arcane/accelerator/KernelLauncher.h
@@ -69,27 +69,27 @@ ARCCORE_HOST_DEVICE auto privatize(const T& item) -> Privatizer<T>
/*---------------------------------------------------------------------------*/
/*---------------------------------------------------------------------------*/
/*!
* \brief Class for applying the finalization of reductions.
* \brief Class for applying the finalization of the additional arguments.
*/
class KernelReducerHelper
class KernelRemainingArgsHelper
{
public:

//! Applies the functors of the additional arguments.
template <typename... ReducerArgs> static inline ARCCORE_DEVICE void
applyReducerArgs(Int32 index, ReducerArgs&... reducer_args)
template <typename... RemainingArgs> static inline ARCCORE_DEVICE void
applyRemainingArgs(Int32 index, RemainingArgs&... remaining_args)
{
// Apply the reductions
(reducer_args._internalExecWorkItem(index), ...);
(remaining_args._internalExecWorkItem(index), ...);
}

#if defined(ARCANE_COMPILING_SYCL)
//! Applies the functors of the additional arguments.
template <typename... ReducerArgs> static inline ARCCORE_HOST_DEVICE void
applyReducerArgs(sycl::nd_item<1> x, ReducerArgs&... reducer_args)
template <typename... RemainingArgs> static inline ARCCORE_HOST_DEVICE void
applyRemainingArgs(sycl::nd_item<1> x, RemainingArgs&... remaining_args)
{
// Apply the reductions
(reducer_args._internalExecWorkItem(x), ...);
(remaining_args._internalExecWorkItem(x), ...);
}
#endif
};
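
The renamed helper applies each extra argument through a C++17 fold expression
over the comma operator. The following standalone sketch shows the same pattern
outside of %Arcane; the `PrintSquare`/`PrintDouble` types are illustrative only
and simply expose the same per-work-item hook:

```cpp
#include <iostream>

// Toy "remaining argument" types: each one exposes the hook that the
// kernel calls once per work item.
struct PrintSquare {
  void _internalExecWorkItem(int index) { std::cout << index * index << '\n'; }
};
struct PrintDouble {
  void _internalExecWorkItem(int index) { std::cout << 2 * index << '\n'; }
};

// Same shape as KernelRemainingArgsHelper::applyRemainingArgs(): the fold
// expression expands to one call per argument, in declaration order.
template <typename... RemainingArgs>
void applyRemainingArgs(int index, RemainingArgs&... remaining_args)
{
  (remaining_args._internalExecWorkItem(index), ...);
}

int main()
{
  PrintSquare a;
  PrintDouble b;
  applyRemainingArgs(3, a, b); // prints 9 then 6
}
```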
@@ -142,8 +142,8 @@ doDirectGPULambdaArrayBounds(LoopBoundType bounds, Lambda func)
}
}

template <typename TraitsType, typename Lambda, typename... ReducerArgs> __global__ void
doIndirectGPULambda2(SmallSpan<const Int32> ids, Lambda func, ReducerArgs... reducer_args)
template <typename TraitsType, typename Lambda, typename... RemainingArgs> __global__ void
doIndirectGPULambda2(SmallSpan<const Int32> ids, Lambda func, RemainingArgs... remaining_args)
{
using BuilderType = TraitsType::BuilderType;
using LocalIdType = BuilderType::ValueType;
@@ -155,37 +155,37 @@ doIndirectGPULambda2(SmallSpan<const Int32> ids, Lambda func, ReducerArgs... red
Int32 i = blockDim.x * blockIdx.x + threadIdx.x;
if (i < ids.size()) {
LocalIdType lid(ids[i]);
body(BuilderType::create(i, lid), reducer_args...);
body(BuilderType::create(i, lid), remaining_args...);
}
KernelReducerHelper::applyReducerArgs(i, reducer_args...);
KernelRemainingArgsHelper::applyRemainingArgs(i, remaining_args...);
}

template <typename ItemType, typename Lambda, typename... ReducerArgs> __global__ void
doDirectGPULambda2(Int32 vsize, Lambda func, ReducerArgs... reducer_args)
template <typename ItemType, typename Lambda, typename... RemainingArgs> __global__ void
doDirectGPULambda2(Int32 vsize, Lambda func, RemainingArgs... remaining_args)
{
// TODO: remove when the old reductions are no longer used
auto privatizer = privatize(func);
auto& body = privatizer.privateCopy();

Int32 i = blockDim.x * blockIdx.x + threadIdx.x;
if (i < vsize) {
body(i, reducer_args...);
body(i, remaining_args...);
}
KernelReducerHelper::applyReducerArgs(i, reducer_args...);
KernelRemainingArgsHelper::applyRemainingArgs(i, remaining_args...);
}

template <typename LoopBoundType, typename Lambda, typename... ReducerArgs> __global__ void
doDirectGPULambdaArrayBounds2(LoopBoundType bounds, Lambda func, ReducerArgs... reducer_args)
template <typename LoopBoundType, typename Lambda, typename... RemainingArgs> __global__ void
doDirectGPULambdaArrayBounds2(LoopBoundType bounds, Lambda func, RemainingArgs... remaining_args)
{
// TODO: remove when the old reductions are no longer used
auto privatizer = privatize(func);
auto& body = privatizer.privateCopy();

Int32 i = blockDim.x * blockIdx.x + threadIdx.x;
if (i < bounds.nbElement()) {
body(bounds.getIndices(i), reducer_args...);
body(bounds.getIndices(i), remaining_args...);
}
KernelReducerHelper::applyReducerArgs(i, reducer_args...);
KernelRemainingArgsHelper::applyRemainingArgs(i, remaining_args...);
}

/*---------------------------------------------------------------------------*/
@@ -204,16 +204,16 @@ class DoDirectSYCLLambdaArrayBounds
{
public:

void operator()(sycl::nd_item<1> x, LoopBoundType bounds, Lambda func, RemainingArgs... reducer_args) const
void operator()(sycl::nd_item<1> x, LoopBoundType bounds, Lambda func, RemainingArgs... remaining_args) const
{
auto privatizer = privatize(func);
auto& body = privatizer.privateCopy();

Int32 i = static_cast<Int32>(x.get_global_id(0));
if (i < bounds.nbElement()) {
body(bounds.getIndices(i), reducer_args...);
body(bounds.getIndices(i), remaining_args...);
}
KernelReducerHelper::applyReducerArgs(x, reducer_args...);
KernelRemainingArgsHelper::applyRemainingArgs(x, remaining_args...);
}
void operator()(sycl::id<1> x, LoopBoundType bounds, Lambda func) const
{
@@ -228,12 +228,12 @@ class DoDirectSYCLLambdaArrayBounds
};

//! 1D loop with indirection
template <typename TraitsType, typename Lambda, typename... ReducerArgs>
template <typename TraitsType, typename Lambda, typename... RemainingArgs>
class DoIndirectSYCLLambda
{
public:

void operator()(sycl::nd_item<1> x, SmallSpan<const Int32> ids, Lambda func, ReducerArgs... reducer_args) const
void operator()(sycl::nd_item<1> x, SmallSpan<const Int32> ids, Lambda func, RemainingArgs... reducer_args) const
{
using BuilderType = TraitsType::BuilderType;
using LocalIdType = BuilderType::ValueType;
@@ -364,14 +364,14 @@ _applyKernelHIP(impl::RunCommandLaunchInfo& launch_info, const HipKernel& kernel
* \param func function to be executed by the kernel
* \param args arguments of the lambda function
*/
template <typename SyclKernel, typename Lambda, typename LambdaArgs, typename... ReducerArgs>
template <typename SyclKernel, typename Lambda, typename LambdaArgs, typename... RemainingArgs>
void _applyKernelSYCL(impl::RunCommandLaunchInfo& launch_info, SyclKernel kernel, Lambda& func,
const LambdaArgs& args, [[maybe_unused]] const ReducerArgs&... reducer_args)
const LambdaArgs& args, [[maybe_unused]] const RemainingArgs&... remaining_args)
{
#if defined(ARCANE_COMPILING_SYCL)
sycl::queue s = SyclUtils::toNativeStream(launch_info._internalNativeStream());
sycl::event event;
if constexpr (sizeof...(ReducerArgs) > 0) {
if constexpr (sizeof...(RemainingArgs) > 0) {
auto tbi = launch_info.kernelLaunchArgs();
Int32 b = tbi.nbBlockPerGrid();
Int32 t = tbi.nbThreadPerBlock();
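
The hunk is cut off here, but the visible part shows the compile-time dispatch
used for SYCL: `if constexpr (sizeof...(RemainingArgs) > 0)` picks the
`nd_range` launch path only when extra arguments are present. A self-contained
illustration of that dispatch pattern, with purely illustrative names:

```cpp
#include <cstdio>

// Illustrative only: selects a different "launch" strategy at compile time
// depending on whether extra arguments were passed along with the lambda.
template <typename Lambda, typename... RemainingArgs>
void launchKernel(int nb_item, Lambda func, RemainingArgs&... remaining_args)
{
  if constexpr (sizeof...(RemainingArgs) > 0) {
    // Path taken when reducers or other extra arguments are present.
    std::printf("launch with %zu extra argument(s)\n", sizeof...(RemainingArgs));
    for (int i = 0; i < nb_item; ++i)
      func(i, remaining_args...);
  }
  else {
    // Simple path: nothing extra to finalize.
    std::printf("simple launch\n");
    for (int i = 0; i < nb_item; ++i)
      func(i);
  }
}

int main()
{
  launchKernel(4, [](int i) { std::printf("item %d\n", i); });
  int sum = 0;
  launchKernel(4, [](int i, int& s) { s += i; }, sum);
  std::printf("sum=%d\n", sum); // prints sum=6
}
```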
2 changes: 1 addition & 1 deletion arcane/src/arcane/accelerator/Reduce.h
@@ -510,7 +510,7 @@ template <typename DataType, typename ReduceFunctor>
class HostDeviceReducer2
: public HostDeviceReducerBase<DataType, ReduceFunctor>
{
friend impl::KernelReducerHelper;
friend impl::KernelRemainingArgsHelper;
friend ::Arcane::impl::HostReducerHelper;

public: