-
Notifications
You must be signed in to change notification settings - Fork 85
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
main: use hash list for fds instead of fixed array #428
Conversation
b0954f8
to
9e9c8ef
Compare
7409fa0
to
535659d
Compare
For some reason I was not notified of this PR. I haven't looked closely yet, but from the first glance the |
I have refactored the poll backend with a independent index value now. |
4a7152c
to
dff8575
Compare
If this PR does not meet your multithreading needs I think its better to close this one too (keeps libre simple) and you can maintain your own patch easier. |
Just to be clear, I think with some work it could be made an acceptable solution. In particular, I don't mind against a hash table in principle, I just noted that it should be used with care. |
99e93bf
to
880869c
Compare
7c235d6
to
970aeca
Compare
91959bc
to
7ff2086
Compare
cb1bc55
to
87bf668
Compare
in general I am worried about such a large change in main I think we can extend the windows mode to use binary search, and make it generic in any case we have to test that it works with a very large set of file descriptors, |
I tried to keep the changes as minimal and fast as possible, but yes we should test this very well. This PR tries to fix two primary issues:
@Lastique do you see any remaining ABA problems? If you are worried about I did some performance tests, but will do some more (with static uint32_t fhs_hash(struct re *re, re_sock_t fd)
{
if (fd < re->maxfds)
return (uint32_t)fd;
... To avoid |
a good way to pack struct members is to put the largest elements first, example
the compiler will try to get small size and good performance the "bool active" flag can be moved to "int flags" is it an option to use a large sorted array and |
52e9a22
to
c168893
Compare
I see no real benefit here, static uint32_t fhs_hash(struct re *re, re_sock_t fd)
{
if (fd < re->maxfds)
return (uint32_t)fd;
... This should be the common case, since also on windows Here are some performance measurements for a TCP Echo Server (maxfds=1024, active=50, idle=64):
And some simple HTTP (no keepalive):
libre main epoll (maxfds=1024)
libre main select (maxfds=1024)
libre fhs_hash epoll (maxfds=1024)
libre fhs_hash select (maxfds=1024)
libre fhs_hash epoll (maxfds=64)
libre fhs_hash select (maxfds=64)
https://gist.github.com/sreimers/337c9ee783cbe6298d7352b338ed8c3c Linux Kernel 6.1.7 - AMD EPYC 3251 8-Core Processor
Right, but $ pahole -Cfhs build/libre.a
struct fhs {
struct le le; /* 0 32 */
re_sock_t fd; /* 32 4 */
int flags; /* 36 4 */
_Bool active; /* 40 1 */
/* XXX 7 bytes hole, try to pack */
fd_h * fh; /* 48 8 */
void * arg; /* 56 8 */
/* size: 64, cachelines: 1, members: 6 */
/* sum members: 57, holes: 1, sum holes: 7 */
}; |
8473afd
to
4d34ec1
Compare
I think, the suggestion was to use one bit in But in practice that wouldn't save any memory. On x86-64 dynamic memory allocations are 16-byte aligned, which means the real amount of memory reserved for one allocation must be divisible by 16. You're using PS: Again, due to alignment only to 16 bytes, |
4d34ec1
to
1804b0f
Compare
The polling method currently can be changed in runtime. Is this an important usecase ? If we change the polling method from runtime to compile-time config, it would simplify in my simplistic view, the typical method would be:
|
Or maybe enough to allow it once before |
re_sock_t fd; /**< File Descriptor */ | ||
int flags; /**< Polling flags (Read, Write, etc.) */ | ||
fd_h* fh; /**< Event handler */ | ||
void* arg; /**< Handler argument */ | ||
bool active; /**< In-use */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it might be possible to move storage of this struct to the application.
in this case there is no need for array/hash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'm not fully understand your suggestion, wouldn't this break all applications badly? Handling all the event details is not easy to implement. So we have to implement it twice (retest and baresip)?
I can think of some memory size optimizations for fhs
:
- instead of
mem_zalloc
we can usemalloc/memset/free
directly this saves 16 byte per fhs entry. struct le
can be replaced by avoid *next
for a simple linked list and custom hash lookup solution. This saves 24 byte per fhs entry and 8 byte per bucket. the fhs pointer could be re-calculated withoffsetof
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you look at struct tmr
, the object is stored by the user
the idea was to do something similar, example:
int fd_listen(struct fhs *fhs, re_sock_t fd, int flags, fd_h *fh, void *arg);
void fd_close(struct fhs *fhs);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see, but I think this is problematic for multithreading. If fhs
is destroyed by the application while fd_poll
is waiting a dangling fhs
is used. But maybe we can simply mem_ref fhs
to avoid this. I will think about this approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just gave this approach a try: #805 (not ready yet)
in this case, how many entries do we need in the epoll/kqueue struct ? #ifdef HAVE_EPOLL
struct epoll_event *events; /**< Event set for epoll() */
int epfd; /**< epoll control file descriptor */
#endif
#ifdef HAVE_KQUEUE
struct kevent *evlist;
int kqfd;
#endif |
We can control this size with |
Just a thought, could a library like wepoll (BSD-2-clause) be integrated into re for Windows to emulate HAVE_EPOLL? I have been using libre privately with this method for some time now. |
I think this PR is taking way too long as it is. IMHO, the working refactored version should be merged first, and further optimizations and extensions can be worked on later in separate PRs. |
hi Lastique, if you are interested in this work, perhaps you can help to test it ? I am not sure what is next regarding this work. Personally I dont need it for any of my projects. If we are not sure what to do, we could close it down and revisit it later... Alfred |
I am interested in a less limited and more efficient implementation of the main loop. Once this work is finished and released, I would try and use it instead of our own patch we are currently using. Regarding what to do with the PR, I think it should be merged when it is finished. IMHO, the scope of this work should not be conflated with further optimizations - those can come later in separate PRs. Just get this thing to a finished working state and then merge. |
closed in favor of #805 |
If an application uses multiple (maybe thousands) of libre event loop threads, the unused pre-allocated memory for
fhs
can be quite high. This PR uses a dynamic hash list for keepingfhs
structs. This wayEPOLL
andKQUEUE
have no hard fd limit anymore (re->maxfds
limits only returned active event handlers perfd_poll
). The fd limit forSELECT
stays the same.I did some benchmarks (between this PR and main) and performance differences for
fd_listen
andfd_close
are negligible if the hash bucket size is well choosen (depends on re->maxfds, which can be set with fd_setsize() like before).