Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Edited mutex lock of dlt_housekeeper_running_mutex #668

Merged
merged 1 commit into from
Oct 16, 2024

Conversation

ranoongs
Copy link
Contributor

The original place of mutex lock was wrong because it didn't protect dlt_housekeeper_running
and there was no mutex lock when sending signal.

    The original place of mutex lock was wrong because it didn't protect dlt_housekeeper_running
    and there was no mutex lock when sending signal.
@ranoongs
Copy link
Contributor Author

ranoongs commented Aug 5, 2024

@michael-methner
Hello, it's my first time to pull request here.
Is there any rule about PR?

minminlittleshrimp

This comment was marked as duplicate.

@@ -3827,13 +3827,18 @@ void dlt_user_housekeeperthread_function(void *ptr)

pthread_cleanup_push(dlt_user_cleanup_handler, NULL);


pthread_mutex_lock(&dlt_housekeeper_running_mutex);

// signal dlt thread to be running
*dlt_housekeeper_running = true;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @ranoongs
The variable you trying to lock is an atomic type and it is totally free from race data.
Hence I believe we do not need to lock the variable.
Kindly check this point.

Copy link
Contributor Author

@ranoongs ranoongs Aug 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@minminlittleshrimp

Yes, as you said, dlt_housekeeper_running is atomic.

The objective for this commit is to avoid unnecessary pthread_cond_timedwait's timeout.

In rare cases, pthread_cond_timedwait become true right going into while roop.
If this happens, the application who uses DLT library has to wait pthread_cond_timewait's timeout (0.5 sec)
I know 0.5 sec might not be long but, in instruments like cluster this might be a problem.

In the latest codes, pthread_mutex_lock cannot prevent this situation.

Please kindly consider this.

1

@minminlittleshrimp minminlittleshrimp self-assigned this Aug 14, 2024
@lti9hc
Copy link
Collaborator

lti9hc commented Oct 9, 2024

Hello @ranoongs,

You can refer to #491 to know the purpose of using dlt_housekeeper_running_mutex
Moreover, 0.5 sec for timeout is maximum value to wait, it will exit immediately when receiving signal by pthread_cond_signal
For me, This PR should be rejected

@ranoongs
Copy link
Contributor Author

ranoongs commented Oct 9, 2024

@lti9hc
Hello,
My point is pthread_mutex_lock(&dlt_housekeeper_running_mutex) is dead code because it misses the corner case I explained above. Yes, I know that 0.5 seconds might not be a big deal but I just want to prevent unnecessary waiting. If the pthread_cond_signal is done 'after' getting into the while loop and 'before' pthread_cond_timedwait, it just waits unnecessary 0.5 seconds. Then what's the point of mutex lock?

If you look into manual page of pthread_cond_timedwait(), mutex guards are out side the while loop. All the examplary source codes use mutex lock to cover all the critical sections...

https://man7.org/linux/man-pages/man3/pthread_cond_timedwait.3p.html
image

If you say 0.5 seconds is not critical, Well... then I have nothing to say. You can just reject it.

@lti9hc
Copy link
Collaborator

lti9hc commented Oct 9, 2024

Hello @ranoongs,

Thanks for your response quickly,

I agree with your point " If the pthread_cond_signal is done 'after' getting into the while loop and 'before' pthread_cond_timedwait, it just waits unnecessary 0.5 seconds"

but with your change, what happen if pthread_mutex_lock(&dlt_housekeeper_running_mutex) in dlt_start_threads function will own mutex first? the dlt_user_housekeeperthread_function thread will block until getting mutex
-> it will wait 0.5s by pthread_cond_timedwait

@ranoongs
Copy link
Contributor Author

ranoongs commented Oct 9, 2024

@lti9hc
I thank you too for the quick reply.
what happen if pthread_mutex_lock(&dlt_housekeeper_running_mutex) in dlt_start_threads function will own mutex first?
-> It locks the mutex , changes the variable dlt_housekeeper_running to be true, and unlocks the mutex. The while loop has not started so far because the mutex was not free. Now, the while loop checks the condition !dlt_housekeeper_running is false so the while loop does not initiate. This case shows how the mutex guard successfully protects the critical section.

@lti9hc
Copy link
Collaborator

lti9hc commented Oct 9, 2024

@ranoongs ,
The case you mention is housekeeperthread thread will owns the mutex first
My case is opposite your case

@ranoongs
Copy link
Contributor Author

ranoongs commented Oct 9, 2024

@lti9hc
Oh sorry.
In that case, it locks the mutex, pthread_cond_timedwait() starts, and it unlocks that mutex. Then after, pthread_cond_signal() is activated, and the pthread_cond_timedwait() gets released. This case also shows that mutex guard protected the critical section because the corner case I mentioned is prevented. The pthread_cond_signal() cannot activate between the start of while loop and pthread_cond_timedwait().

@lti9hc
Copy link
Collaborator

lti9hc commented Oct 10, 2024

@ranoongs

the pthread_cond_timedwait() will get timeout

The work flow is :
pthread_mutex_lock(&dlt_housekeeper_running_mutex);
while (!dlt_housekeeper_running && now.tv_sec <= time_to_wait.tv_sec)
pthread_cond_timedwait()
pthread_mutex_unlock(&dlt_housekeeper_running_mutex);

....
//waiting the mutex release
dlt_housekeeper_running = true;
pthread_cond_signal(&dlt_housekeeper_running_cond);
pthread_mutex_unlock(&dlt_housekeeper_running_mutex);

it may get the log "dlt_log(LOG_CRIT, "Failed to wait for house keeper thread!\n");"
and then stop all thread

@ranoongs
Copy link
Contributor Author

ranoongs commented Oct 10, 2024

@lti9hc
No, the work flow is as below...

pthread_mutex_lock(&dlt_housekeeper_running_mutex);
while (!dlt_housekeeper_running && now.tv_sec <= time_to_wait.tv_sec)
pthread_cond_timedwait() -> and it unlocks the mutex
pthread_mutex_unlock(&dlt_housekeeper_running_mutex);

....
//waiting the mutex release (X) -> pthread_mutex_lock(&dlt_housekeeper_running_mutex);
dlt_housekeeper_running = true;
pthread_cond_signal(&dlt_housekeeper_running_cond);
pthread_mutex_unlock(&dlt_housekeeper_running_mutex);

I share the test result.

image

image

image

@lti9hc
Copy link
Collaborator

lti9hc commented Oct 10, 2024

@ranoongs
Thanks for your explaining
I am done for review

@alexander-mohr could you please take time to have a look at this PR?

@lti9hc
Copy link
Collaborator

lti9hc commented Oct 16, 2024

It looks good and merge now

Thanks you @ranoongs

@lti9hc lti9hc merged commit 4e7ecb4 into COVESA:master Oct 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants