-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
prov/peer: Introducing peer "provider" with peer_cq to begin with #4
base: main
Are you sure you want to change the base?
Conversation
bc6897d
to
c459fc3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 7 of 7 files at r1, all commit messages.
Reviewable status: all files reviewed, 13 unresolved discussions (waiting on @grom72)
-- commits
line 2 at r1:
introducing (small letter)
Code quote:
Introducing
-- commits
line 5 at r1:
peer-related
Code quote:
peer related
-- commits
line 9 at r1:
See man/fi_peer.3.md.
- the commit message is in the plain-text format, not MD.
Code quote:
See [man/fi_peer.3.md](man/fi_peer.3.md)
Makefile.am
line 89 at r1 (raw file):
prov/util/src/cuda_ipc_monitor.c \ prov/peer/src/peer_cq.c \ prov/peer/src/peer.h \
too many spaces or tabs (use only tabs, not spaces)
Code quote:
peer.h \
include/ofi_peer.h
line 2 at r1 (raw file):
/* * Copyright (c) 2019-2022 Intel Corporation, Inc. All rights reserved.
Why not just 2022?
Code quote:
2019-2022
include/ofi_peer.h
line 40 at r1 (raw file):
#include <ofi_list.h> #include <ofi_atom.h> #include <ofi_bitmask.h>
Are they really required here?
Code quote:
#include <ofi_atom.h>
#include <ofi_bitmask.h>
include/ofi_peer.h
line 45 at r1 (raw file):
int ofi_peer_cq_open(struct fid_domain *domain, struct fi_cq_attr *attr,
Too many empty lines in between.
Code quote:
#include <ofi_util.h>
int ofi_peer_cq_open(struct fid_domain *domain, struct fi_cq_attr *attr,
include/ofi_peer.h
line 52 at r1 (raw file):
fi_addr_t src); #endif // _OFI_PEER_H_
/* _OFI_PEER_H_ */
- //
is a C++ comment
Code quote:
// _OFI_PEER_H_
prov/coll/src/coll_coll.c
line 726 at r1 (raw file):
ep = container_of(coll_op->ep, struct coll_ep, util_ep.ep_fid); ret = ofi_peer_cq_write(&ep->util_ep.tx_cq->cq_fid, coll_op->context,
redundant space at the end
Code quote:
,·
prov/peer/src/peer.h
line 39 at r1 (raw file):
#if HAVE_CONFIG_H #include <config.h> #endif
What is it for?
Code quote:
#if HAVE_CONFIG_H
#include <config.h>
#endif
prov/peer/src/peer.h
line 43 at r1 (raw file):
#include <ofi.h> #include <ofi_util.h>
redundant empty line
prov/peer/src/peer_cq.c
line 99 at r1 (raw file):
ret = ofi_cq_init(provider, domain, attr, &cq->util_cq, &ofi_cq_progress, NULL);
Doesn't it fit into one line?
Code quote:
ret = ofi_cq_init(provider, domain, attr, &cq->util_cq,
&ofi_cq_progress, NULL);
prov/peer/src/peer_cq.c
line 122 at r1 (raw file):
ssize_t ofi_peer_cq_write(struct fid_cq *cq_fid, void *context, uint64_t flags, size_t len, void *buf, uint64_t data, uint64_t tag, fi_addr_t src)
Doesn't it fit into one line?
Code quote:
size_t len, void *buf, uint64_t data, uint64_t tag,
fi_addr_t src)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 3 of 7 files reviewed, 13 unresolved discussions (waiting on @ldorau)
Previously, ldorau (Lukasz Dorau) wrote…
introducing (small letter)
No - see Sean's info: ofiwg#8278 (comment)
Previously, ldorau (Lukasz Dorau) wrote…
peer-related
Done.
Previously, ldorau (Lukasz Dorau) wrote…
See man/fi_peer.3.md.
- the commit message is in the plain-text format, not MD.
Done.
Makefile.am
line 89 at r1 (raw file):
Previously, ldorau (Lukasz Dorau) wrote…
too many spaces or tabs (use only tabs, not spaces)
Done.
include/ofi_peer.h
line 2 at r1 (raw file):
Previously, ldorau (Lukasz Dorau) wrote…
Why not just 2022?
Done.
include/ofi_peer.h
line 40 at r1 (raw file):
Previously, ldorau (Lukasz Dorau) wrote…
Are they really required here?
Done.
include/ofi_peer.h
line 45 at r1 (raw file):
Previously, ldorau (Lukasz Dorau) wrote…
Too many empty lines in between.
Done.
include/ofi_peer.h
line 52 at r1 (raw file):
Previously, ldorau (Lukasz Dorau) wrote…
/* _OFI_PEER_H_ */
-//
is a C++ comment
Done.
prov/coll/src/coll_coll.c
line 726 at r1 (raw file):
Previously, ldorau (Lukasz Dorau) wrote…
redundant space at the end
Done.
prov/peer/src/peer.h
line 39 at r1 (raw file):
Previously, ldorau (Lukasz Dorau) wrote…
What is it for?
I do not know yet.
Code quote (from prov/coll/src/coll_coll.c):
ofi_peer_cq_write
prov/peer/src/peer.h
line 43 at r1 (raw file):
Previously, ldorau (Lukasz Dorau) wrote…
redundant empty line
Done.
prov/peer/src/peer_cq.c
line 99 at r1 (raw file):
Previously, ldorau (Lukasz Dorau) wrote…
Doesn't it fit into one line?
No
prov/peer/src/peer_cq.c
line 122 at r1 (raw file):
Previously, ldorau (Lukasz Dorau) wrote…
Doesn't it fit into one line?
No - we have only 80 characters
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 5 of 5 files at r2, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @grom72)
Previously, grom72 (Tomasz Gromadzki) wrote…
No - see Sean's info: ofiwg#8278 (comment)
OK
prov/peer/src/peer_cq.c
line 92 at r2 (raw file):
cq->peer_cq = peer_context->cq; ret = ofi_cq_init(provider, domain, attr, &cq->util_cq,
Redundant space at the end
Code quote:
,·
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @grom72)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 9 files at r3, all commit messages.
Reviewable status: 3 of 11 files reviewed, 1 unresolved discussion (waiting on @grom72 and @haichangsi)
Prior to this patch, the provider would ignore the domain name in hints->domain_attr->name even if this was set. This patch fixed this issue. Signed-off-by: Vishwas Dsouza <[email protected]>
This test tests the -d <domain_name> argument of the fabtests. All the EFA domain names are tested with this test with the corresponding fabtest. Signed-off-by: Vishwas Dsouza <[email protected]>
This patch added a call to FI_WARN to print endpoint's address and libfabric version after endpoint is created. Signed-off-by: Wei Zhang <[email protected]>
include/ofi_peer.h file along with prov/peer directory defines space where peer-related common code is placed. peer_cq is the first structure that has been defined. It is based on coll_cq structure. coll provider uses peer_cq instead of removed coll_cq. See man/fi_peer.3.md Signed-off-by: Tomasz Gromadzki <[email protected]>
Adding struct fi_provider* as the first parameter to peer_cq_init() makes it similar to ofi_cq_init(). Signed-off-by: Tomasz Gromadzki <[email protected]>
struct peer_cq is not used at all. All operation are based on fid_peer_cq. There are no real CQ on util_coll. Only fid_peer_cq is binded to util_coll_ep. Signed-off-by: Tomasz Gromadzki <[email protected]>
util_coll_cq is no longer needed as we use only fid_peer_cq to link util_coll_ep and rxm_ep.util_cq directly with help of bind() operation. Signed-off-by: Tomasz Gromadzki <[email protected]>
struct peer_cq is no longer needed. Signed-off-by: Tomasz Gromadzki <[email protected]>
If a posted receive matches with a saved receive, we may need to increment the rx counter. Set the rx counter increment callback to match that of the posted receive. This fixes an assert in xnet_cntr_inc() accessing a NULL cntr_inc function pointer. Program received signal SIGABRT, Aborted. 0x0000155552d4d37f in raise () from /lib64/libc.so.6 #0 0x0000155552d4d37f in raise () from /lib64/libc.so.6 #1 0x0000155552d37db5 in abort () from /lib64/libc.so.6 #2 0x0000155552d37c89 in __assert_fail_base.cold.0 () from /lib64/libc.so.6 #3 0x0000155552d45a76 in __assert_fail () from /lib64/libc.so.6 #4 0x00001555522967f9 in xnet_cntr_inc (ep=0x6e4c70, xfer_entry=0x6f7a30) at prov/tcp/src/xnet_cq.c:347 #5 0x0000155552296836 in xnet_report_cntr_success (ep=0x6e4c70, cq=0x6ca930, xfer_entry=0x6f7a30) at prov/tcp/src/xnet_cq.c:354 #6 0x000015555229970d in xnet_complete_saved (saved_entry=0x6f7a30) at prov/tcp/src/xnet_progress.c:153 #7 0x0000155552299961 in xnet_recv_saved (saved_entry=0x6f7a30, rx_entry=0x6f7840) at prov/tcp/src/xnet_progress.c:188 #8 0x00001555522946f8 in xnet_srx_tag (srx=0x6dd1c0, recv_entry=0x6f7840) at prov/tcp/src/xnet_srx.c:445 #9 0x0000155552294bb1 in xnet_srx_trecv (ep_fid=0x6dd1c0, buf=0x6990c4, len=4, desc=0x0, src_addr=0, tag=21474836494, ignore=3458764513820540928, context=0x7ffffffeb180) at prov/tcp/src/xnet_srx.c:558 ofiwg#10 0x000015555228f60e in fi_trecv (ep=0x6dd1c0, buf=0x6990c4, len=4, desc=0x0, src_addr=0, tag=21474836494, ignore=3458764513820540928, context=0x7ffffffeb180) at ./include/rdma/fi_tagged.h:91 ofiwg#11 0x00001555522900a7 in xnet_rdm_trecv (ep_fid=0x6d9fe0, buf=0x6990c4, len=4, desc=0x0, src_addr=0, tag=21474836494, ignore=3458764513820540928, context=0x7ffffffeb180) at prov/tcp/src/xnet_rdm.c:212 Signed-off-by: Sean Hefty <[email protected]>
include/ofi_peer.h file along with prov/peer directory defines space where peer-related common code is placed.
peer_cq is the first structure that has been defined. It is based on the coll_cq structure. coll provider uses peer_cq instead of removed coll_cq.
See man/fi_peer.3.md
Signed-off-by: Tomasz Gromadzki [email protected]
This change is