Fix memory leak issue with AWS API usage #6160
@@ -1,81 +1,15 @@
#include "s3_storage.h"
#include "s3_storage_config.h"

#include <contrib/libs/aws-sdk-cpp/aws-cpp-sdk-core/include/aws/core/internal/AWSHttpResourceClient.h>
These includes were not used, I believe.
struct TApiInitializer {
    TApiInitializer() {
        Options.httpOptions.initAndCleanupCurl = false;
I suppose there is no need for a separate TCurlInitializer (which calls curl_global_init and curl_global_cleanup), because if you enable the httpOptions.initAndCleanupCurl option, the AWS SDK calls curl_global_init and curl_global_cleanup itself.
        Options.httpOptions.initAndCleanupCurl = false;
        InitAPI(Options);

        Internal::CleanupEC2MetadataClient(); // speeds up config construction
This function is from the Internal namespace, so I don't think we should call it. Moreover, TApiInitializer is initialized only once per ydbd application start, so its performance should not be an issue.
Runtime.Reset();
S3Mock.Reset();
Reset in reverse order of initialization (although here it does not really matter).
ydb/services/ydb/backup_ut/ya.make
I decided to move the BackupRestore tests out of ydb_import_ut.cpp, because its ya.make builds a huge binary with a lot of tests. None of those tests need Y_TEST_HOOK_BEFORE_RUN except mine, so I decided to separate them.
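Because individual test cases run sequentially in one process, the SDK only needs to be initialized once around the whole loop rather than inside every test case. The sketch below models that with hypothetical stand-ins: InitApiStub and ShutdownApiStub replace the AWS SDK calls, and RunTestBinary plays the role of the test framework firing its before-run and after-run hooks. It is an illustration, not the actual YDB test runner, and apart from Y_TEST_HOOK_BEFORE_RUN no hook macro names are assumed.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-ins for Aws::InitAPI / Aws::ShutdownAPI: they only
// count calls and track whether the "SDK" is currently usable.
int InitCalls = 0;
int ShutdownCalls = 0;
bool ApiReady = false;

void InitApiStub() {
    ++InitCalls;
    ApiReady = true;
}

void ShutdownApiStub() {
    ++ShutdownCalls;
    ApiReady = false;
}

// A test case returns true on success.
using TTestCase = std::function<bool()>;

// Hypothetical model of one test binary run: the hooks fire once per
// process, while the test cases run one after another in a plain loop.
int RunTestBinary(const std::vector<TTestCase>& cases) {
    InitApiStub();      // plays the role of a "before run" hook
    assert(ApiReady);   // every test case below sees an initialized SDK
    int failed = 0;
    for (const TTestCase& testCase : cases) {
        if (!testCase()) {
            ++failed;
        }
    }
    ShutdownApiStub();  // plays the role of an "after run" hook
    return failed;
}
```

With three test cases that each check ApiReady, RunTestBinary reports no failures while InitApiStub and ShutdownApiStub each run exactly once.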
Implement AWS API guard as a YDB GlobalObject instead of calling InitAPI / ShutdownAPI multiple times during the program run.
Force-pushed from 35b0937 to 3ca7872.
@CyberROFL has pointed out the following potential problems:
I believe that the fix proposed in this PR should avoid both problems.
Reasons why data races in Aws::S3::S3Client (KIKIMR-11055) are most probably avoided
Six data races were caught in KIKIMR-11055:
As of this PR,
Reasons why potential data races in setenv and getenv calls (KIKIMR-12129) are avoided
This PR does not introduce any new calls to them.
Thread sanitizer tests
I have run functional tests in
The caught data races are the following:
Data races in the AWS SDK stack were not observed.
Comparison with the base branch code
I have tested the code of the base commit of this PR from the
Caught data races:
Summary
All data races caught in 62 runs of the code in this PR were also caught in 3 runs of the base branch code. This makes me certain that I have not introduced any new data races.
@@ -1653,6 +1653,8 @@ TIntrusivePtr<TServiceInitializersList> TKikimrRunner::CreateServiceInitializers
        sil->AddServiceInitializer(new TGraphServiceInitializer(runConfig));
    }

    sil->AddServiceInitializer(new TAwsApiInitializer(*this));
There is no dedicated service mask; this initializer is always enabled. This helps us avoid misconfiguration by users.
This reverts commit ff0fd33.
…m#6160)" (ydb-platform#8698)" This reverts commit 2091af7.
#5737
A memory leak can be observed in production and reproduced with a ydbd binary compiled with the --sanitize=leak flag and a ydb import s3 call. Detailed reproduction steps are described in the issue. The exact reason for the leak is unknown to me. However, it seems to be caused by the nondeterministic order of static variable destruction. Here is a relevant line from the AWS C++ SDK documentation:
At first glance, TApiInitializer objects do not seem to have static storage duration in the current code. However, if you make the TApiInitializer object a Singleton itself and run the same script as described in the issue, you get exactly the same leak sanitizer report. This gave me the idea that the underlying issue might be fixed in the same way as it would be for a static TApiInitializer. I have done some research on memory issues in the AWS SDK, and this is what I have found:
Proposed fix
So the fix that I propose in this PR is simple: call InitAPI near the beginning of the main function and call ShutdownAPI close to the end of the main function. Specifically, make TApiInitializer a data member of TKikimrRunner.
Unit tests have to call InitAPI and ShutdownAPI explicitly, once per test binary run. It must be done once per process, so a unit test hook is a good choice: by default, individual test cases are run sequentially in a for loop in the same process.
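The lifetime argument behind this fix can be sketched as an RAII guard held as a data member. In the sketch below, InitApiStub and ShutdownApiStub are hypothetical stand-ins for Aws::InitAPI and Aws::ShutdownAPI, and TApiGuard, TS3Service and TRunner are illustrative names rather than the actual YDB classes. It also assumes the guard is declared before every member that uses the SDK, which is what makes C++ member ordering do the work.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical event log used to observe the order of side effects.
std::vector<std::string> Events;
bool ApiInitialized = false;

// Stand-ins for Aws::InitAPI / Aws::ShutdownAPI (the real functions take an
// Aws::SDKOptions); these only record that they were called.
void InitApiStub() {
    ApiInitialized = true;
    Events.push_back("InitAPI");
}

void ShutdownApiStub() {
    ApiInitialized = false;
    Events.push_back("ShutdownAPI");
}

// RAII guard: the SDK stays initialized exactly as long as the guard lives.
struct TApiGuard {
    TApiGuard() { InitApiStub(); }
    ~TApiGuard() { ShutdownApiStub(); }
    TApiGuard(const TApiGuard&) = delete;
    TApiGuard& operator=(const TApiGuard&) = delete;
};

// Hypothetical component that uses the SDK for its whole lifetime.
struct TS3Service {
    TS3Service() {
        assert(ApiInitialized && "SDK used before InitAPI");
        Events.push_back("service start");
    }
    ~TS3Service() {
        assert(ApiInitialized && "SDK used after ShutdownAPI");
        Events.push_back("service stop");
    }
};

// Members are constructed in declaration order and destroyed in reverse
// order, so the guard declared first outlives every SDK user after it.
struct TRunner {
    TApiGuard ApiGuard;
    TS3Service Service;
};

// Models main(): the runner is a local object, so it is destroyed when
// main() returns, before any objects with static storage duration.
std::vector<std::string> RunMain() {
    Events.clear();
    {
        TRunner runner;
    }
    return Events;
}
```

RunMain records InitAPI, service start, service stop, ShutdownAPI in that order. Because the guard lives inside an object owned by main, ShutdownAPI runs at a well-defined point instead of depending on the order in which static destructors happen to run.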