-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Switch CARBON_CHECK
to a format string API
#4285
Conversation
Also, to note, I generated almost all of the edits here with regular expressions -- I'm happy to re-do them, so if there are changes that folks would like to suggest, that shouldn't really be a problem. I used the And here are some of the relevant regexs: # Tries to find `CARBON_D?CHECK` statements with just a string
# message. This proved useful to do prior to any with arguments
# below, even though it is in some ways a generalization of that
# regular expression. See that comment for the limitations.
> sd -f ms 'CARBON_(D?CHECK)\(((?:[^()]|\((?:[^()]|\([^()]*\))*\))+)\)\s*<<\s+"((?:[^"]+|"\s*(?:<< )?")+)";' 'CARBON_$1($2, "$3");' (rg -l 'CARBON_D?CHECK')
# Tries to find `CARBON_D?CHECK` statement with one non-string
# parameter and optional prefix and suffix string parameters and
# rewrite them into format string form accepting one argument.
# Only handles a limited degree of parenthesis and complexity in
# the condition and a limited degree of template argument nesting
# in parameters. The tail can be copy/pasted to create 2-argument,
# 3-argument, and more forms. I found never found a 5-argument.
> sd -f ms 'CARBON_(D?CHECK)\(((?:[^()]|\((?:[^()]|\([^()]*\))*\))+)\)(?:\s*<< "((?:[^"]+|"\s*(?:<< )?")+)")?\s*<< ((?:[^"<;]*(?:[^"<;&&[:^space:]]|<[^<>]*>))+)(?:\s*<< "((?:[^"]+|"\s*(?:<< )?")+)")?;' 'CARBON_$1($2, "$3{0}$5", $4);' (rg -l 'CARBON_D?CHECK')
# Finds string literals continued across lines, both with streams
# in between and through string literal concatenation, and merges
# one such line break away. Run repeatedly to replace long runs.
> sd -f ms '(CARBON_D?CHECK\([^;"]+"[^"]*)"\n\s*(?:<< )?"' '$1' (rg -l 'CARBON_D?CHECK') I also used the relevant variations of these for There were several manual fixes required, but quite manageable. 10 maybe across the whole codebase. |
2451fbd
to
0415cb7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've not reviewed the migration commit, but the macro changes commit LGTM.
0415cb7
to
3747530
Compare
I think this is ready for review -- or to discuss how to break up the review of the migration if its too much? |
Build failure:
|
Yeah, some new code is all. Updated and fixed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving. zygoloid said he'd looked at the macros, I glanced but I've focused on the migration. I saw one issue, but generally looks clean, so I think it makes sense to approve and let you merge if you're ready (particularly to avoid more skew).
I'm realizing though, a couple risks with formatv:
- There's a spot where we pass in the full sem_ir. Probably not a big memory risk, but generating the intermediate string is at least a little risky.
- The stringent formatv checking has me concerned on a different end... Because it enforces arguments, but doesn't check at compile time, and we can't really unit test CHECK-failures, we risk having typos in these that are uncaught.
- This mainly makes me wonder if we should build our own format function, particularly to validate parameters at compile time. It doesn't feel like it should be blocking, given the other concerns being addressed here.
// Render the final check string here. | ||
std::string message = llvm::formatv( | ||
"{0} failure at {1}:{2}{3}{4}{5}{6}\n", kind, file, line, | ||
llvm::StringRef(condition_str).empty() ? "" : ": ", condition_str, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noting the StringRef cast, why doesn't this just use StringRef parameters? I assume you have a reason, but looking at template_string.h and check_internal.h, I don't see a comment explaining the preference for const char*
over llvm::StringRef
for parameters. Can you add something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a comment -- the goal is to reduce codesize of the caller by using C strings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where should I be looking? I don't see anything in 04dc806
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is now in check_internal.h
on line 18?
toolchain/check/handle_struct.cpp
Outdated
CARBON_CHECK(refs_id != SemIR::InstBlockId::Empty) | ||
<< "{} is handled by StructLiteral."; | ||
CARBON_CHECK(refs_id != SemIR::InstBlockId::Empty, | ||
"{} is handled by StructLiteral."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be escaped for formatv?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Weirdly, this works as is... because we avoid formatv
when there are no arguments.
But your comment is worrisome, because that seems like an unfortunate implementation detail to leak in this way. I'm not really sure what the best approach is here, and interested in ideas.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can it just pass to formatv irrespective?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note, if you're concerned about codesize of that, the version with no arguments could do just a thin non-templated wrapper of formatv. Just to force that the argument string gets the same treatment.
But it's probably still going to be missed, unless we get something with compile-time validation.
(either way, I don't view this as critical for this PR, but maybe worth a comment/todo)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, it actually is passed to formatv
always, I misremembered. So I think we're fine here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is some initial feedback before I look at all the callsites.
// flow. | ||
// This is similar to CHECK, but is unconditional. Writing | ||
// `CARBON_FATAL("message")` is clearer than `CARBON_CHECK(false, "message") | ||
// because it avoids confusion about control flow. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be mixing "clarity" (human understandability and nice error message) with compiler optimization ("confusion about control flow")?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure -- this comment text was already present, not something I was adding. Happy to have suggestions for better wording though.
common/check_internal.h
Outdated
// Pass the format string as a template parameter in optimized builds. Note that | ||
// the template arguments include commas and so require extra parentheses to use | ||
// within a macro. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I expect lots of extra parentheses when writing macros, and would just add them in all cases, even the debug case, without any specific comment.
This switches `DCHECK` and `FATAL` as well. The goal is to reduce the code size impact of these assertions so that we can keep more of them enabled. Currently, the largest cost I see from `CHECK` is not the actual check or the cold code itself, but actually the failure to inline trivial functions due to the presence of the cold code. This means that our goal isn't to reduce apparent code size in the final binary but the LLVM IR cost assessed for these routines in the inliner, which closely correlates with code size but is a bit different. As discussed in carbon-language#4283, experimentation shows that a single function call with a minimal number of arguments is the lowest cost model for these. This is easily achieved with a format-string API that internally uses `llvm::formatv`. This PR is essentially the `CHECK` version of carbon-language#4283. However, the check macros are *substantially* harder to make work with both format strings and streaming because they also take a condition. Also, unexpectedly, I was very successful at devising a regular expression based automated rewrite from the streaming to the format string form with only low 10s of manual fixes. This includes compacting strings broken up across lines, etc. Given how well that went, I've prepared this PR which just directly switches to the format string API and migrate everything to use it. One nice side-effect is that the format string approach ends up greatly simplifying the implementation here as well. This is ... *shockingly* effective. Parsing speeds up by more than 3% with just this change. And checking speeds up by **8%** with this change alone: ``` BM_CompileAPIFileDenseDecls<Phase::Parse>/256 86.3µs ± 1% 82.9µs ± 1% -3.94% (p=0.000 n=17+19) BM_CompileAPIFileDenseDecls<Phase::Parse>/1024 431µs ± 1% 415µs ± 1% -3.76% (p=0.000 n=18+19) BM_CompileAPIFileDenseDecls<Phase::Parse>/4096 1.77ms ± 1% 1.71ms ± 1% -3.18% (p=0.000 n=18+19) BM_CompileAPIFileDenseDecls<Phase::Parse>/16384 7.44ms ± 1% 7.17ms ± 2% -3.56% (p=0.000 n=18+20) BM_CompileAPIFileDenseDecls<Phase::Parse>/65536 30.7ms ± 1% 29.7ms ± 1% -3.15% (p=0.000 n=18+20) BM_CompileAPIFileDenseDecls<Phase::Parse>/262144 131ms ± 1% 127ms ± 1% -2.81% (p=0.000 n=18+18) BM_CompileAPIFileDenseDecls<Phase::Check>/256 878µs ± 2% 800µs ± 1% -8.91% (p=0.000 n=19+20) BM_CompileAPIFileDenseDecls<Phase::Check>/1024 1.88ms ± 2% 1.72ms ± 1% -8.56% (p=0.000 n=19+20) BM_CompileAPIFileDenseDecls<Phase::Check>/4096 5.78ms ± 2% 5.28ms ± 1% -8.70% (p=0.000 n=20+18) BM_CompileAPIFileDenseDecls<Phase::Check>/16384 21.9ms ± 1% 20.1ms ± 1% -8.02% (p=0.000 n=18+20) BM_CompileAPIFileDenseDecls<Phase::Check>/65536 90.4ms ± 2% 83.1ms ± 1% -8.04% (p=0.000 n=19+20) BM_CompileAPIFileDenseDecls<Phase::Check>/262144 381ms ± 2% 352ms ± 1% -7.79% (p=0.000 n=19+19) ```
Co-authored-by: Richard Smith <richard@metafoo.co.uk>
1abf3aa
to
efc3773
Compare
Note, I had a bad git push earlier and so a weird temporary version I was experimenting with got put back in place. Sorry for that. I think the version now is the one I intended and working on the last batches of comments. |
Co-authored-by: josh11b <15258583+josh11b@users.noreply.github.com>
Co-authored-by: josh11b <15258583+josh11b@users.noreply.github.com>
This switches
DCHECK
andFATAL
as well.The goal is to reduce the code size impact of these assertions so that
we can keep more of them enabled. Currently, the largest cost I see from
CHECK
is not the actual check or the cold code itself, but actuallythe failure to inline trivial functions due to the presence of the cold
code. This means that our goal isn't to reduce apparent code size in the
final binary but the LLVM IR cost assessed for these routines in the
inliner, which closely correlates with code size but is a bit different.
As discussed in #4283, experimentation shows that a single function call
with a minimal number of arguments is the lowest cost model for these.
This is easily achieved with a format-string API that internally uses
llvm::formatv
. This PR is essentially theCHECK
version of #4283.However, the check macros are substantially harder to make work with
both format strings and streaming because they also take a condition.
Also, unexpectedly, I was very successful at devising a regular
expression based automated rewrite from the streaming to the format
string form with only low 10s of manual fixes. This includes compacting
strings broken up across lines, etc. Given how well that went, I've
prepared this PR which just directly switches to the format string API
and migrate everything to use it.
One nice side-effect is that the format string approach ends up greatly
simplifying the implementation here as well.
This is ... shockingly effective. Parsing speeds up by more than 3%
with just this change. And checking speeds up by 8% with this change
alone: