Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

log.h improvements #3219

Draft
wants to merge 16 commits into
base: master
Choose a base branch
from
Draft

log.h improvements #3219

wants to merge 16 commits into from

Conversation

staviq
Copy link
Contributor

@staviq staviq commented Sep 16, 2023

WIP improvements for log.h:

  • Reduced performance overhead of log handler.
  • State data moved from static variables inside log_handler1_impl function to LogStateWrapper.
  • Thread safety.
  • Cmd arguments now properly override logs behavior.
  • --log-disable is now consistent with pre-log.h prints and compilation with LOG_DISABLE_LOGS

Code still needs cleanup, and probably lots of things can be simplified further.

Ideas/questions for further improvements:

  • Log file name could be extracted from binary name via argv[0], removing the need for specifying log filename at all.
  • I don't see multiple log targets (files) being used, does anybody have use for that ? Could support for multiple log files possibly be removed for simplicity ?

I'm open for any suggestions and feedback.

Copy link
Collaborator

@cebtenzzre cebtenzzre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is still a draft, but I made a few style comments.

common/log.h Outdated Show resolved Hide resolved
common/log.h Outdated Show resolved Hide resolved
common/log.h Outdated Show resolved Hide resolved
common/log.h Outdated Show resolved Hide resolved
@staviq
Copy link
Contributor Author

staviq commented Sep 16, 2023

@cebtenzzre

I know this is still a draft, but I made a few style comments.

Thanks for checking it out, is there some sort of "official" .clang-format config, or equivalent ? It would make things a bit simpler, instead of bothering you unnecessarily :)

Edit: As a sidenote, cuda CI takes forever, and I think it downloads the whole cuda package unnecessarily:

sub-packages: '["nvcc", "cudart", "cublas", "cublas_dev", "thrust", "visual_studio_integration"]'

But the docs say, if sub-packages are used, it can download only those, when 'network' method is used: https://github.com/marketplace/actions/cuda-toolkit#method

Which seems to me like it was the original goal, but the 'method' isn't defined in the CI ?

@cebtenzzre
Copy link
Collaborator

is there some sort of "official" .clang-format config, or equivalent ?

I don't believe there is. I tried out clang-format a little bit but I don't think it's flexible enough to match our current code style. Btw, some of the lines added in this PR are still too long.

{
static LogStateWrapper inst;
return inst;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Singletons are generally considered an anti-pattern. Maybe a namespace, or possibly a class with a deleted constructor and static fields and methods, would be more appropriate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I realize that most people seem to hate singletons with passion, in this particular case it's doing exactly what I wanted it to do, by being global and having runtime initialization, and without introducing additional "log_initialize" functionality, singleton or not, something will end up being a function, returning it's local static variable (struct/object), at least the way I'm thinking about it now.

If you have a particular problem in mind, that this can cause, I'll do it your way in an instant, no questions asked.

But otherwise, I'm leaning towards this staying this way, unless I figure a way to do it without making it more complicated.

I'll sleep on it, and see if I can figure out something better tomorrow.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also just have a single, global instance of the class, and make any methods that call instance() non-static.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, I'm trying to visialize if this doesn't create a potential situation where a LOG might be called before that instance is initialized. I think I would have to ensure its initialized in main thread, before any other threads are created

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C++ has global constructors - you'll be fine as long as you initialize it on the same line as the declaration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'll try tomorrow. I'm having a bit of flu/cold thing, I hope tomorrow my head will be a bit less mushy than right now :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing brightens your day more than reading what isocpp has to say about something, only to find "TODO: WRITE THIS UP" :)

(https://isocpp.org/wiki/faq/ctors#nifty-counter-idiom)

Anyway, either I convert this class to fully static and have to rely on atexit for cleanup, or use global instance and constructor can't be private so single instance can't be guaranteed, or I have to use this shenaniganery: https://en.wikibooks.org/wiki/More_C%2B%2B_Idioms/Nifty_Counter

What do you think ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you really want to make sure there is only a single instance, you can use a global or 'static' variable to track whether it was initialized, and assert that it is false when you enter the constructor - then set it to true. I wouldn't worry about public vs private too much in a code base that mixes C and C++.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I think about it, if somebody can confirm or deny whether support for multiple targets ( log files ) has any use here, if I drop it, I could probably completely get rid of that entire singleton problem, altogether.

common/log.h Show resolved Hide resolved
@cebtenzzre
Copy link
Collaborator

cebtenzzre commented Sep 17, 2023

Apparently std::this_thread::yield may be a bad idea, as the thread isn't indicating what it is waiting for. Maybe we should try to acquire the atomic flag (possibly retrying N times without yielding) and fall back to waiting on a condition_variable otherwise? From https://github.com/max0x7ba/atomic_queue:

People often propose limiting busy-waiting with a subsequent call to sched_yield/pthread_yield. However, sched_yield is a wrong tool for locking because it doesn't communicate to the OS kernel what the thread is waiting for, so that the OS thread scheduler can never reschedule the calling thread to resume when the shared state has changed (unless there are no other threads that can run on this CPU core, so that the caller resumes immediately). More details about sched_yield and spinlocks from Linus Torvalds.

@staviq
Copy link
Contributor Author

staviq commented Sep 17, 2023

Apparently std::this_thread::yield may be a bad idea, as the thread isn't indicating what it is waiting for.

Interesting, I'm definitely bookmarking this link :)

As a side note, this could also be greatly simplified if multiple log targets weren't needed.

I had an idea previously, which I dropped because it seemed silly, given yield "worked" so well, but maybe it will make more sense now:

Since fprintf etc, does a memcpy one way or another, what if, I use a simple function, a fprintf proxy, which copies all passed arguments, and then stores them in a queue, which at that point can be processed completely independently in a separate thread ? This would have pretty much the same cost as fprintf and even half of that for TEE calls, from the calling thread point of view, and the additional cost of another memory operations of fprintf itself would happen asynchronously.

This would also simplify thread safety, by making log actually single threaded itself, and only the queue would have to be thread safe

@cebtenzzre
Copy link
Collaborator

Btw, that repo implements a spinlock using _mm_pause on x86.
See also the Intel docs for _mm_pause.

@mofosyne mofosyne added the obsolete? Marker for potentially obsolete PR label May 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
obsolete? Marker for potentially obsolete PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants