prov/sharp: Add mocks for SHARP #6

Draft
wants to merge 10 commits into
base: main

@ldorau ldorau commented Dec 6, 2022

This patch requires the original SHARP header in /usr/include/mellanox/sharp.h

Signed-off-by: Lukasz Dorau [email protected]

Requires:



grom72 and others added 10 commits December 5, 2022 14:01
coll_cq implementation can be reused by other collective providers.

Signed-off-by: Tomasz Gromadzki <[email protected]>
…initialization

It is the rxm provider's responsibility to initialize the collective offload provider's fabric.
Otherwise, collective offload functionality will not be available.

Signed-off-by: Tomasz Gromadzki <[email protected]>
…IDER

The FI_OFFLOAD_PROVIDER environment variable shall be set to the offload provider name
to instruct libfabric to set up and use that particular provider.
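
As an illustration of the selection described above, here is a minimal sketch of matching a provider name against the value of FI_OFFLOAD_PROVIDER. The helper names and the "sharp" provider name used in the usage note are assumptions made for this example; the patch itself may read the variable through a different mechanism.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the patch): returns 1 when env_value names
 * prov_name, and 0 when the variable is unset, empty, or names another
 * provider. */
static int offload_provider_selected(const char *env_value,
				     const char *prov_name)
{
	if (env_value == NULL || env_value[0] == '\0')
		return 0;

	return strcmp(env_value, prov_name) == 0;
}

/* Hypothetical wrapper: checks the process environment. */
static int offload_provider_requested(const char *prov_name)
{
	return offload_provider_selected(getenv("FI_OFFLOAD_PROVIDER"),
					 prov_name);
}
```

With `FI_OFFLOAD_PROVIDER=sharp` exported before the application starts, `offload_provider_requested("sharp")` would return 1 in this sketch; with the variable unset it returns 0 and no offload provider would be set up.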

Signed-off-by: Tomasz Gromadzki <[email protected]>
The peer provider must create a peer_eq for the offload provider so that the offload
provider can report events to the peer provider.

Signed-off-by: Tomasz Gromadzki <[email protected]>
The offload provider may execute collective operations via the util_coll provider.
It must call fi_join() to get the struct mc required for collective operations.
It can only call fi_join() on its peer provider (e.g. rxm). The FI_PEER flag is used
to inform the peer provider to call fi_join() for util_coll_ep.

Signed-off-by: Tomasz Gromadzki <[email protected]>
offload_coll_mask value is calculated based on the actual offload capabilities
confirmed by fi_query_collective().
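
A minimal sketch of the idea, assuming the mask holds one bit per collective operation and that a bit is set only when the query for that operation succeeds. The enum, the callback type, and the mock query below are stand-ins invented for this example; they are not the libfabric definitions, and the real code would call fi_query_collective() on the offload provider's domain.

```c
#include <stdint.h>

/* Stand-in for libfabric's enum fi_collective_op (illustrative values). */
enum coll_op { COLL_BARRIER, COLL_BROADCAST, COLL_ALLREDUCE, COLL_OP_MAX };

/* Stand-in for fi_query_collective(): returns 0 when op is supported. */
typedef int (*query_fn)(enum coll_op op);

/* Set bit N in the mask only if the query confirms support for op N. */
static uint64_t build_offload_coll_mask(query_fn query)
{
	uint64_t mask = 0;
	int op;

	for (op = 0; op < COLL_OP_MAX; op++) {
		if (query((enum coll_op) op) == 0)
			mask |= UINT64_C(1) << op;
	}
	return mask;
}

/* Mock offload provider that supports only barrier and allreduce. */
static int mock_query(enum coll_op op)
{
	return (op == COLL_BARRIER || op == COLL_ALLREDUCE) ? 0 : -1;
}
```

Deriving the mask from query results, rather than hard-coding it, means an operation the offload provider declines is never routed to it.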

Signed-off-by: Tomasz Gromadzki <[email protected]>
This patch requires the original SHARP header in
/usr/include/mellanox/sharp.h

Signed-off-by: Lukasz Dorau <[email protected]>
@ldorau ldorau marked this pull request as draft December 6, 2022 20:53
grom72 pushed a commit that referenced this pull request Mar 24, 2023
If a posted receive matches with a saved receive, we may need to
increment the rx counter.  Set the rx counter increment callback
to match that of the posted receive.  This fixes an assert in
xnet_cntr_inc() accessing a NULL cntr_inc function pointer.

Program received signal SIGABRT, Aborted.
0x0000155552d4d37f in raise () from /lib64/libc.so.6
#0  0x0000155552d4d37f in raise () from /lib64/libc.so.6
#1  0x0000155552d37db5 in abort () from /lib64/libc.so.6
#2  0x0000155552d37c89 in __assert_fail_base.cold.0 () from /lib64/libc.so.6
#3  0x0000155552d45a76 in __assert_fail () from /lib64/libc.so.6
#4  0x00001555522967f9 in xnet_cntr_inc (ep=0x6e4c70, xfer_entry=0x6f7a30) at prov/tcp/src/xnet_cq.c:347
#5  0x0000155552296836 in xnet_report_cntr_success (ep=0x6e4c70, cq=0x6ca930, xfer_entry=0x6f7a30) at prov/tcp/src/xnet_cq.c:354
#6  0x000015555229970d in xnet_complete_saved (saved_entry=0x6f7a30) at prov/tcp/src/xnet_progress.c:153
#7  0x0000155552299961 in xnet_recv_saved (saved_entry=0x6f7a30, rx_entry=0x6f7840) at prov/tcp/src/xnet_progress.c:188
#8  0x00001555522946f8 in xnet_srx_tag (srx=0x6dd1c0, recv_entry=0x6f7840) at prov/tcp/src/xnet_srx.c:445
#9  0x0000155552294bb1 in xnet_srx_trecv (ep_fid=0x6dd1c0, buf=0x6990c4, len=4, desc=0x0, src_addr=0, tag=21474836494, ignore=3458764513820540928, context=0x7ffffffeb180) at prov/tcp/src/xnet_srx.c:558
#10 0x000015555228f60e in fi_trecv (ep=0x6dd1c0, buf=0x6990c4, len=4, desc=0x0, src_addr=0, tag=21474836494, ignore=3458764513820540928, context=0x7ffffffeb180) at ./include/rdma/fi_tagged.h:91
#11 0x00001555522900a7 in xnet_rdm_trecv (ep_fid=0x6d9fe0, buf=0x6990c4, len=4, desc=0x0, src_addr=0, tag=21474836494, ignore=3458764513820540928, context=0x7ffffffeb180) at prov/tcp/src/xnet_rdm.c:212
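
The failure mode and the fix can be sketched in isolation. The structure and function names below are hypothetical simplifications written for this example, not the actual xnet definitions; the sketch only assumes that a saved receive starts without a counter callback and takes it over from the posted receive it matches.

```c
#include <assert.h>
#include <stddef.h>

struct xfer_entry;
typedef void (*cntr_inc_fn)(struct xfer_entry *entry);

/* Hypothetical, reduced transfer entry. */
struct xfer_entry {
	cntr_inc_fn cntr_inc;	/* NULL on a saved receive until matched */
	unsigned long *cntr;	/* counter to bump on completion */
};

static void rx_cntr_inc(struct xfer_entry *entry)
{
	if (entry->cntr)
		(*entry->cntr)++;
}

static void report_cntr_success(struct xfer_entry *entry)
{
	/* The same kind of check that aborted in the backtrace above. */
	assert(entry->cntr_inc);
	entry->cntr_inc(entry);
}

/* Copy the callback from the posted receive before completing the saved one. */
static void recv_saved(struct xfer_entry *saved,
		       const struct xfer_entry *posted)
{
	saved->cntr_inc = posted->cntr_inc;
	saved->cntr = posted->cntr;
	report_cntr_success(saved);
}
```

Without the two assignments in `recv_saved()`, the assert in `report_cntr_success()` would fire on the NULL callback, which mirrors the SIGABRT shown in the trace.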

Signed-off-by: Sean Hefty <[email protected]>