Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tcp mt #7

Closed
wants to merge 4 commits into from
Closed

Tcp mt #7

wants to merge 4 commits into from

Conversation

ooststep
Copy link
Owner

No description provided.

@ooststep ooststep force-pushed the tcp-mt branch 2 times, most recently from 6499fa6 to 17fd7cc Compare June 25, 2024 15:47
@ooststep ooststep force-pushed the tcp-mt branch 3 times, most recently from 1146426 to 838a840 Compare July 3, 2024 18:48
@ooststep ooststep force-pushed the tcp-mt branch 2 times, most recently from 68e1651 to bdd6d5f Compare July 9, 2024 21:45
Copy link

@aingerson aingerson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry it took so long. Wanted to be thorough and also kept getting distracted... I think it looks good! Can't see any glaring issues

@@ -313,6 +341,7 @@ int ofi_av_insert_addr(struct util_av *av, const void *addr, fi_addr_t *fi_addr)
FI_INFO(av->prov, FI_LOG_AV, "fi_addr: %" PRIu64 "\n",
ofi_buf_index(entry));
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

drop line

Comment on lines 525 to 506
int xnet_multiplex_av_open(struct fid_domain *domain_fid, struct fi_av_attr *attr,
struct fid_av **fid_av, void *context);

int xnet_domain_open(struct fid_fabric *fabric, struct fi_info *info,
struct fid_domain **domain, void *context);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicate definitions of these two calls. I also think the only reason xnet_domain_open was moved up here was to call in create_subdomain which I think can be removed so likely need to undo this change

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

xnet_domain_open is the original default, we add the check to see if the domain is multiplexed there. it already existed, just moved a few lines (see line 547 on the old version of this file)

Copy link

@aingerson aingerson Jul 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I'm talking about lines 499-503 in the new file - it looks like you added it twice up here. But I don't think they need to move because I think they had to move to call in the new static inline for create_subdomains which is never called. So think this whole block can be dropped except the xnet_domain_multiplexed call. At least that's how I'm reading it

struct fid_domain **domain, void *context);

static inline
int xnet_create_subdomain(struct xnet_domain *domain, struct xnet_domain **subdomain)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we ever call create_subdomain anymore - I think we change it to just call fi_domain directly

@@ -0,0 +1,234 @@
/*
* Copyright (c) 2017-2022 Intel Corporation. All rights reserved.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drop years

*/

#include <stdlib.h>
#include <string.h>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be wrong but double check that this header is needed. I don't see an obvious reason for it

struct fid_list_entry *item;

domain = container_of(fid, struct xnet_domain, util_domain.domain_fid.fid);
while(!dlist_empty(&domain->subdomain_list)) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space after while

domain = calloc(1, sizeof(*domain));
if (!domain)
return -FI_ENOMEM;

ret = ofi_domain_init(fabric_fid, info, &domain->util_domain, context,
OFI_LOCK_NONE);
info->domain_attr->threading == FI_THREAD_SAFE ?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't remember why we added this - was it a separate optimization or was it needed for these changes?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should actually be removed. think it's left over from some earlier changes and likely has no real affect (since our new changes only apply to FI_THREAD_COMPLETION and this code is in the non-FI_THREAD_COMPLETION path, it should remain unchanged)

for (i = 0; i < resp->hdr.base_hdr.rma_iov_cnt; i++) {
ret = ofi_mr_verify(&ep->util_ep.domain->mr_map, rma_iov[i].len,
(uintptr_t *) &rma_iov[i].addr,
ret = ofi_mr_verify(&domain->util_domain.mr_map,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this change was added before we moved everything to use the xnet_domain so I think we can take this out

.flags = 0,
};
size_t addr_size;
char addr[sizeof(struct sockaddr_in6)];

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a line after this

return NULL;
}

void set_subdomain(struct xnet_rdm *rdm, struct xnet_domain *domain,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static

@ooststep ooststep force-pushed the tcp-mt branch 2 times, most recently from b3acd52 to 225862c Compare July 11, 2024 18:33
Comment on lines 499 to 501
int xnet_domain_open(struct fid_fabric *fabric, struct fi_info *info,
struct fid_domain **domain, void *context);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would move this back down since it's not needed anymore (unless you want to have it up here for organization purposes 🤷‍♀️ )

@ooststep ooststep force-pushed the tcp-mt branch 2 times, most recently from eb38aac to 1d50324 Compare July 11, 2024 19:15
@@ -52,7 +52,6 @@ dup_mr_attr(const struct fi_mr_attr *attr, uint64_t flags)
dup_attr->mr_iov = (struct iovec *) (dup_attr + 1);

/*
* dup_mr_attr is only used insided ofi_mr_map_insert.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry one last comment - this was changed from static and declared in the header but I don't see it called in this PR?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably leftover crud from a prior attempt. will revert

@ooststep ooststep force-pushed the tcp-mt branch 2 times, most recently from 0587a3f to 61a00b5 Compare July 11, 2024 19:58
ooststep and others added 4 commits July 30, 2024 11:00
replace the debug-only allocated check with a check usable in release
builds by checking presence in the free list

Co-authored-by: Alexia Ingerson <alexia.ingerson@intel.com>
Signed-off-by: Stephen Oost <stephen.oost@intel.com>
there situations where being able to insert addresses into
specific indexes of the av would be convienient

example: enable the tcp provider to duplicate established
avs into other domains

Co-authored-by: Alexia Ingerson <alexia.ingerson@intel.com>
Signed-off-by: Stephen Oost <stephen.oost@intel.com>
Co-authored-by: Alexia Ingerson <alexia.ingerson@intel.com>
Signed-off-by: Stephen Oost <stephen.oost@intel.com>
By default, the tcp provider is optimized for single threaded
applications that make use of FI_THREAD_DOMAIN but is inefficient for
multithreaded applications that intend to follow FI_THREAD_COMPLETION
semantics.  The primary limiting factoe  is that the progress engine is
tied to the domain, and all endpoint under a single domain are
synchronized by this lone progress engine

In an effort to increase efficiency and performance of these
multithreaded applications, multiplex the application's domain reference
into a subdomain per ep, each having their own progress engine

Co-authored-by: Alexia Ingerson <alexia.ingerson@intel.com>
Signed-off-by: Stephen Oost <stephen.oost@intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants