iocp: fix crash, GetQueuedCompletionStatus() write freed WSAOVERLAPPED memory #4136
Great! I assume you've tested this and it does fix the issue :) I think we also need to run it under some kind of stress test, e.g. using the ioqueue test in pjlib-test, to make sure all memory pools (of pending ops) are properly released. Note that after an ioqueue key is unregistered, the key is put into the closing-key-list and soon afterwards into the free-key-list, to be reused by another socket. We need to make sure that all pending ops have been freed before the key is freed and reused. Next, perhaps we can apply a bit of optimization, e.g. instead of a mem-pool for each pending-op, use a mem-pool per ioqueue-key to avoid multiple alloc+free for multiple pending-ops, using the same mechanism as the ioqueue key (employing an additional list for keeping unused pending-op instances to be reused later).
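A minimal single-threaded sketch of the invariant described above: a key may only be freed and reused once every pending op submitted on it has completed, including ops completed as cancelled. The names demo_key, key_op_started(), key_op_completed() and key_unregister() are hypothetical and do not reflect the actual pjlib structures. Real IOCP code would also need locking, and the CancelIoEx() call is only indicated in a comment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified key; NOT the pjlib pj_ioqueue_key_t layout. */
typedef enum { KEY_ACTIVE, KEY_CLOSING, KEY_FREE } key_state;

typedef struct demo_key {
    key_state state;
    int pending_ops;   /* ops submitted to IOCP but not yet dequeued */
} demo_key;

void key_init(demo_key *key)
{
    key->state = KEY_ACTIVE;
    key->pending_ops = 0;
}

/* Called when an overlapped op (e.g. WSARecv/WSASend) is submitted. */
bool key_op_started(demo_key *key)
{
    if (key->state != KEY_ACTIVE)
        return false;          /* refuse new ops on a closing key */
    key->pending_ops++;
    return true;
}

/* Called for every completion dequeued by GetQueuedCompletionStatus(),
 * including ops that completed because they were cancelled. */
void key_op_completed(demo_key *key)
{
    assert(key->pending_ops > 0);
    key->pending_ops--;
    if (key->state == KEY_CLOSING && key->pending_ops == 0)
        key->state = KEY_FREE; /* op memory can be freed, key reused */
}

/* Unregister: the real code would call CancelIoEx() here. Since
 * cancellation is asynchronous, the key only becomes FREE after all
 * of its pending completions have been drained. */
void key_unregister(demo_key *key)
{
    if (key->state != KEY_ACTIVE)
        return;
    key->state = (key->pending_ops == 0) ? KEY_FREE : KEY_CLOSING;
}
```

With two ops in flight, key_unregister() leaves the key in KEY_CLOSING, and only the second key_op_completed() call moves it to KEY_FREE, which is the point at which moving it to the free-key-list would be safe.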
Note:
When
Tried to run
Not sure if this is the same issue, but this assertion does not happen when using ioqueue select.
@nanangizz Without this patch, does this assertion also occur?
Yes, the same assertion occurs without this patch.
I found the cause: the key is unregistered twice (double unregister).
Thanks @jimying. Honestly, I haven't had a chance to reproduce the original issue and test the proposed solution. I believe you are using this ioqueue in the real world, experienced the issue, and found that this solution works; is that correct? Next, here are a few notes about the proposed solution:
Also, this ioqueue has been disabled for quite some time, and some improvements in the ioqueue area may not have been integrated into it, e.g. group lock for key. So please understand that there may still be some steps required to enable this ioqueue again :)
@nanangizz I wrote a simple demo to reproduce the crash in MSYS2, #4172. I have tested it: with the old code, it reproduces the crash 100% of the time. To test the new code, we can git cherry-pick the demo patch onto this branch.
Thanks @jimying.
…D memory try to fix issue pjsip#985
New commits:
- The pool is owned by the key/socket instead of by the ioqueue, to avoid possible unbounded memory growth in the ioqueue.
- Update ioq_stress_test not to use the global group lock for key registration; otherwise the keys won't be released until the global group lock is destroyed (i.e. after the ioqueue is destroyed).
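The reuse idea from the review (a per-key list of unused pending-op instances) can be sketched as follows, using malloc()/free() instead of pjlib pools. op_cache, pending_op and the functions below are hypothetical. In the real code the op would embed the WSAOVERLAPPED and must not be returned to the cache until its completion has been dequeued. With this layout, memory is bounded by the key's peak number of concurrent ops and is released together with the key, rather than accumulating in the ioqueue.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical pending-op record; the real one would embed WSAOVERLAPPED. */
typedef struct pending_op {
    struct pending_op *next;  /* link used while on the free list */
    int in_use;
} pending_op;

/* Per-key cache of op records (hypothetical). */
typedef struct op_cache {
    pending_op *free_list;    /* released ops ready for reuse */
    int allocated;            /* total ops ever allocated for this key */
} op_cache;

void op_cache_init(op_cache *c)
{
    c->free_list = NULL;
    c->allocated = 0;
}

/* Reuse a released op if one is available, otherwise allocate a new one. */
pending_op *op_cache_get(op_cache *c)
{
    pending_op *op = c->free_list;

    if (op) {
        c->free_list = op->next;
    } else {
        op = malloc(sizeof(*op));
        if (!op)
            return NULL;
        c->allocated++;
    }
    op->next = NULL;
    op->in_use = 1;
    return op;
}

/* Call only after the op's completion has been dequeued from IOCP. */
void op_cache_put(op_cache *c, pending_op *op)
{
    assert(op->in_use);
    op->in_use = 0;
    op->next = c->free_list;
    c->free_list = op;
}

/* Called when the key is destroyed; every op must have been returned
 * with op_cache_put() first, otherwise it would leak. */
void op_cache_destroy(op_cache *c)
{
    while (c->free_list) {
        pending_op *next = c->free_list->next;
        free(c->free_list);
        c->free_list = next;
    }
}
```

After one op is returned with op_cache_put(), the next op_cache_get() hands back the same record instead of allocating, so a socket doing steady send/recv cycles stops hitting the allocator once it reaches its working set.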
I think this is ready for review, @jimying, @sauwming, @trengginas.
- Added comments to clarify the code.
- Added a copyright header to the test code.
- Minor fixes.
The last commit should cover all review comments above. @jimying, re: copyright text, feel free to change the name :)
@@ -0,0 +1,208 @@
/*
 * Copyright (C) 2024 jimying at github dot com.
 * Copyright (C) 2024 Teluu Inc. (http://www.teluu.com)
If I'm not mistaken, Teluu is usually put above, for two reasons: 1. to signify that the original author has agreed to contribute it to Teluu (as per CLA), 2. to make it easier to update copyright year (i.e. only the latest/first copyright info will get updated, the rest will remain the same).
 * operations must be cancelled. As cancelling ops is asynchronous,
 * IOCP destroy may need to wait for the maximum time specified here.
 */
#define TIMEOUT_CANCEL_OP 5000
The TIMEOUT_CANCEL_OP macro is unused. Should WAIT_KEY_MS (in pj_ioqueue_destroy()) use the same value?
Actually WAIT_KEY_MS should have been replaced by TIMEOUT_CANCEL_OP, so WAIT_KEY_MS is unused (and undefined).
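The sketch below shows how a destroy path might use TIMEOUT_CANCEL_OP to wait, with a deadline, for cancelled ops to complete, in the spirit of the pj_gettickcount()/PJ_TIME_VAL_GTE() loop in the patch. The poll and clock hooks are hypothetical stand-ins for GetQueuedCompletionStatus() polling and pj_gettickcount(). The simulated backend (sim_ctx) exists only so the sketch runs without Windows; it advances a fake clock instead of sleeping.

```c
#include <assert.h>
#include <stdio.h>

#define TIMEOUT_CANCEL_OP 5000   /* ms, value from the patch */

/* Hypothetical hooks: poll() drains completions for up to poll_ms and
 * returns how many keys still wait for cancelled ops; now() returns a
 * monotonic tick count in milliseconds. */
typedef unsigned (*poll_fn)(void *ctx, unsigned poll_ms);
typedef unsigned long (*now_fn)(void *ctx);

/* Returns the number of keys still pending when the wait ended
 * (0 means all cancelled ops completed before the deadline). */
unsigned wait_cancelled_ops(poll_fn poll, now_fn now, void *ctx)
{
    unsigned long stop;
    unsigned pending;

    assert(poll != NULL && now != NULL);
    stop = now(ctx) + TIMEOUT_CANCEL_OP;
    pending = poll(ctx, 0);

    while (pending > 0) {
        if (now(ctx) >= stop) {
            printf("Warning, IOCP destroy timeout in waiting for "
                   "cancelling ops, after %dms, pending keys=%u\n",
                   TIMEOUT_CANCEL_OP, pending);
            break;
        }
        pending = poll(ctx, 10);
    }
    return pending;
}

/* Simulated backend: each poll advances the fake clock by poll_ms and
 * completes up to drain_per_poll keys. */
typedef struct sim_ctx {
    unsigned pending;
    unsigned drain_per_poll;
    unsigned long now;
} sim_ctx;

static unsigned sim_poll(void *p, unsigned poll_ms)
{
    sim_ctx *s = p;
    s->now += poll_ms;
    s->pending = (s->pending > s->drain_per_poll)
                     ? s->pending - s->drain_per_poll : 0;
    return s->pending;
}

static unsigned long sim_now(void *p)
{
    return ((sim_ctx *)p)->now;
}
```

With a backend that drains one key per poll, the wait returns 0 quickly; with one that never drains, it gives up once the fake clock reaches TIMEOUT_CANCEL_OP and reports the remaining keys, so destroy is bounded even if a cancellation never completes.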
pjlib/src/pj/ioqueue_winnt.c
pj_list_push_back(&ioqueue->free_list, key);
}
#endif
ioqueue->max_fd = pj_list_size(&ioqueue->free_list); // max_fd;
Why recompute it using pj_list_size()? Better to revert to ioqueue->max_fd = max_fd.
Oops, right, I forgot to revert it.
pj_gettickcount(&timeout);
if (PJ_TIME_VAL_GTE(timeout, stop)) {
    PJ_LOG(3, (THIS_FILE, "Warning, IOCP destroy timeout in waiting "
               "for cancelling ops, after %dms, pending keys=%d",
minor build warning: format '%d' expects argument of type 'int', but argument 4 has type 'pj_size_t'
fixed
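For reference, one common way to silence this kind of -Wformat warning is to cast the size argument so it matches the conversion. The sketch assumes pj_size_t is a size_t-like unsigned type (stood in for by the hypothetical demo_size_t) and casts it to int for "%d". This is not necessarily the exact change made in the commit; "%zu" would also work on compilers that support it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for pjlib's pj_size_t (assumed to be size_t-like). */
typedef size_t demo_size_t;

/* Formats the timeout warning. The size value is cast to int so the
 * argument type matches the "%d" conversion, avoiding the warning. */
int format_timeout_warning(char *buf, size_t len, int timeout_ms,
                           demo_size_t pending_keys)
{
    assert(buf != NULL);
    return snprintf(buf, len,
                    "Warning, IOCP destroy timeout in waiting "
                    "for cancelling ops, after %dms, pending keys=%d",
                    timeout_ms, (int)pending_keys);
}
```

The cast is safe here because the number of pending keys is bounded by the ioqueue's max_fd, which comfortably fits in an int.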
commit 1e2f121 Author: sauwming <ming@teluu.com> Date: Thu Feb 13 10:30:58 2025 +0800 Fixed CI Mac failure (#4304)
commit 10b4d30 Author: sauwming <ming@teluu.com> Date: Thu Feb 13 09:06:54 2025 +0800 Audio and video stream refactoring (#4300)
commit 6cbf0e6 Author: Riza Sulistyo <trengginas@users.noreply.github.com> Date: Wed Feb 12 15:23:37 2025 +0700 Check if ice strans is valid before using it to send (#4301)
commit fdd4041 Author: Maciej Lisowski <39798354+MaciejDromin@users.noreply.github.com> Date: Tue Feb 11 12:48:17 2025 +0100 Add missing OnTimerParam import in Android Example (#4299)
commit 4ded10f Author: sauwming <ming@teluu.com> Date: Fri Feb 7 13:27:00 2025 +0800 Fixed msg_data assertion in pjsua_acc_send_request() API (#4298)
commit 65c4bc9 Author: sauwming <ming@teluu.com> Date: Fri Feb 7 07:18:22 2025 +0800 Fix pjsua sample app user agent (#4296)
commit c53ace9 Author: sauwming <ming@teluu.com> Date: Fri Feb 7 07:18:09 2025 +0800 Fixed Java make clean error (#4297)
commit e6196ad Author: LeonidGoltsblat <138720759+LeonidGoltsblat@users.noreply.github.com> Date: Fri Feb 7 02:15:10 2025 +0300 Aligned memory allocation (#4277)
  * aligned memory allocaion
  * Fix alt API implementations (PJ_HAS_POOL_ALT_API)
  * pool test: add testing for bug in pj_pool_allocate_find with big alignment, and refactor to use unit test API
  * misc fixes on code review
  * pool_dbg alignment support + incompatible tests disabled for PJ_HAS_POOL_ALT_API
  ---------
  Co-authored-by: bennylp <bennylp@pjsip.org>
commit 2fff775 Author: sauwming <ming@teluu.com> Date: Thu Feb 6 13:11:01 2025 +0800 Add API to register custom SDP comparison callback (#4286)
commit 99b4d1e Author: sauwming <ming@teluu.com> Date: Thu Feb 6 13:10:38 2025 +0800 Fixed issue with SDP version when reoffer is rejected (#4289)
commit 0252152 Author: Benny Prijono <bennylp@pjsip.org> Date: Thu Feb 6 11:09:20 2025 +0700 Use cirunner to capture and analyze GitHub action CI crash (#4288)
  * Windows runner implementation
  * Set timeout
  * Remove initial implementation of ci-runner here (it is on separate repo now)
  * Remove crash handling (-n) in main.c of unit tests
  * Install cirunner to CI workflows
  * Adding crash to timestamp test
  * Fix missing cirunner in one of the job
  * Reinstall core_pattern on Linux
  * Removed intentional crash in timestamp_test()
  * Upload program and core dump on crash
  * Add crash code in uri_test.c
  * Removed injected crash in uri_test. Disable stdout/stderr buffering for unit tests
  * Minor: remove space left out by previous clean up
commit 3fcce51 Author: sauwming <ming@teluu.com> Date: Wed Feb 5 11:43:27 2025 +0800 Fixed OpenSSL log error reading cert (#4291)
commit cbfbbc4 Author: jimying <yingqw.js@gmail.com> Date: Wed Feb 5 11:03:53 2025 +0800 iocp: fix crash, GetQueuedCompletionStatus() write freed WSAOVERLAPPED memory (#4136)
commit 205baf0 Author: sauwming <ming@teluu.com> Date: Tue Feb 4 17:04:25 2025 +0800 Fixed warnings in sip auth client (#4287)
commit abffe0d Author: sauwming <ming@teluu.com> Date: Tue Feb 4 08:38:55 2025 +0800 Fixed CI test failure (#4284)
commit 986fc78 Author: Johannes <johannes.westhuis@gmail.com> Date: Mon Feb 3 08:31:28 2025 +0100 Share an auth session between multiple dialogs/regc (#4262)
commit 46111c4 Author: Nanang Izzuddin <nanang@teluu.com> Date: Mon Feb 3 11:36:54 2025 +0700 Best effort avoid crash when media transport adapter not using group lock (#4281)
commit f986ad8 Author: Benny Prijono <bennylp@pjsip.org> Date: Fri Jan 31 09:45:19 2025 +0700 Add link to coding style documentation (#4280)
commit 727ee32 Author: Nanang Izzuddin <nanang@teluu.com> Date: Fri Jan 31 09:08:02 2025 +0700 Fix build error when PJ_LOG_MAX_LEVEL is zero (#4279)
  The `pj_log_get_log_func()` is not defined when PJ_LOG_MAX_LEVEL is set to zero. Thanks to Giorgio Alfarano for the report.
commit dae52f6 Author: Perry Ismangil <perry@teluu.com> Date: Thu Jan 30 08:43:54 2025 +0000 Fixing typo (#4274)
  Acoustic
commit 1a4cd67 Author: sauwming <ming@teluu.com> Date: Thu Jan 30 15:21:49 2025 +0800 Modify iOS sample apps dev team ID (#4278)
commit dfcfa13 Author: Tarteszeus <37761609+Tarteszeus@users.noreply.github.com> Date: Thu Jan 30 02:42:36 2025 +0100 Add queried names to server address record, and add the address record in parameter for on_verify_cb callback (#4256)
commit f9e56d8 Author: Jan Tojnar <jtojnar@gmail.com> Date: Wed Jan 29 07:42:18 2025 +0100 Fix duplicate function name in 100rel docs (#4275)
commit 960597e Author: Nanang Izzuddin <nanang@teluu.com> Date: Wed Jan 29 13:36:34 2025 +0700 Various works on SWIG Java (#4273)
  * Various works on SWIG Java
    1. Fix type mapping (SWIGTYPE_*):
       a. Map C "void*" & "void**" to Java long (was SWIGTYPE_p_void & SWIGTYPE_p_p_void which are not really usable), this should fix #4242.
       b. Map pjmedia_aud_dev_index to int.
       c. Map unsigned char[20] for SslCertInfo.serialNo to Java "short array"
       This also updates pjsua.i, e.g: tab->space, reorder things.
    2. Update swig_java_pjsua2.vcxproj:
       a. Rename config "Debug" & "Release" to "Debug-Dynamic" & "Release-Dynamic" in , as the project actually builds dynamic libs. Also fix the property sheet dependencies from *-static to *-dynamic.
       b. Update other settings, e.g: built tool version from 140 to 143.
    3. Update symbols.lst: added missing new types, tab->space, reorder alphabetically.
  * Update ci-win.yml
  * Add sample code for passing user data using utilTimerSchedule()
  * Add sample for cancelling timer
commit c36ed2c Author: Benny Prijono <bennylp@pjsip.org> Date: Tue Jan 28 16:58:55 2025 +0700 Minor modifications to Android build and samples to match new documentation (#4271)
  * To streamline the command, also clean swig and pjsua jni output directories when make distclean and realclean is called
  * Kotlin sample: add account, modify video size and bandwidth, and audio codec priorities to use AMR-WB
  * Android CLI app: fix armeabi hardcoded arch and also copy stdc++.so
commit bab33d6 Author: Noel Morgan <noel@vwci.com> Date: Tue Jan 28 01:51:50 2025 -0600 Added support for updated RFC7866 content-type sub type with XML extension (#4270)
commit a89917e Author: sauwming <ming@teluu.com> Date: Fri Jan 24 14:49:59 2025 +0800 OpenSSL: Set ciphersuites only if not using BoringSSL (#4269)
commit 377a80c Author: Nanang Izzuddin <nanang@teluu.com> Date: Fri Jan 24 13:48:31 2025 +0700 Fix various compile errors & warnings in MSVC2005 (#4268)
commit de3f2e1 Author: sauwming <ming@teluu.com> Date: Fri Jan 24 10:20:28 2025 +0800 Set CI vars in GH workflow file (#4263)
commit cdb1294 Author: sauwming <ming@teluu.com> Date: Thu Jan 23 11:01:27 2025 +0800 Various fixes for Apple SSL backend (#4257)
Try to fix issue #985. The idea is to call CancelIoEx() for the unregistering socket/key to cancel all pending operations of the key. However, as CancelIoEx() is basically asynchronous, this also makes the key unregistration asynchronous, so here are some consequences: