Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch CARBON_CHECK to a format string API #4285

Merged
merged 11 commits into from
Sep 12, 2024

Conversation

chandlerc
Copy link
Contributor

@chandlerc chandlerc commented Sep 9, 2024

This switches DCHECK and FATAL as well.

The goal is to reduce the code size impact of these assertions so that
we can keep more of them enabled. Currently, the largest cost I see from
CHECK is not the actual check or the cold code itself, but actually
the failure to inline trivial functions due to the presence of the cold
code. This means that our goal isn't to reduce apparent code size in the
final binary but the LLVM IR cost assessed for these routines in the
inliner, which closely correlates with code size but is a bit different.

As discussed in #4283, experimentation shows that a single function call
with a minimal number of arguments is the lowest cost model for these.
This is easily achieved with a format-string API that internally uses
llvm::formatv. This PR is essentially the CHECK version of #4283.

However, the check macros are substantially harder to make work with
both format strings and streaming because they also take a condition.
Also, unexpectedly, I was very successful at devising a regular
expression based automated rewrite from the streaming to the format
string form with only low 10s of manual fixes. This includes compacting
strings broken up across lines, etc. Given how well that went, I've
prepared this PR which just directly switches to the format string API
and migrate everything to use it.

One nice side-effect is that the format string approach ends up greatly
simplifying the implementation here as well.

This is ... shockingly effective. Parsing speeds up by more than 3%
with just this change. And checking speeds up by 8% with this change
alone:

BM_CompileAPIFileDenseDecls<Phase::Parse>/256      86.3µs ± 1%  82.9µs ± 1%  -3.94%  (p=0.000 n=17+19)
BM_CompileAPIFileDenseDecls<Phase::Parse>/1024      431µs ± 1%   415µs ± 1%  -3.76%  (p=0.000 n=18+19)
BM_CompileAPIFileDenseDecls<Phase::Parse>/4096     1.77ms ± 1%  1.71ms ± 1%  -3.18%  (p=0.000 n=18+19)
BM_CompileAPIFileDenseDecls<Phase::Parse>/16384    7.44ms ± 1%  7.17ms ± 2%  -3.56%  (p=0.000 n=18+20)
BM_CompileAPIFileDenseDecls<Phase::Parse>/65536    30.7ms ± 1%  29.7ms ± 1%  -3.15%  (p=0.000 n=18+20)
BM_CompileAPIFileDenseDecls<Phase::Parse>/262144    131ms ± 1%   127ms ± 1%  -2.81%  (p=0.000 n=18+18)
BM_CompileAPIFileDenseDecls<Phase::Check>/256       878µs ± 2%   800µs ± 1%  -8.91%  (p=0.000 n=19+20)
BM_CompileAPIFileDenseDecls<Phase::Check>/1024     1.88ms ± 2%  1.72ms ± 1%  -8.56%  (p=0.000 n=19+20)
BM_CompileAPIFileDenseDecls<Phase::Check>/4096     5.78ms ± 2%  5.28ms ± 1%  -8.70%  (p=0.000 n=20+18)
BM_CompileAPIFileDenseDecls<Phase::Check>/16384    21.9ms ± 1%  20.1ms ± 1%  -8.02%  (p=0.000 n=18+20)
BM_CompileAPIFileDenseDecls<Phase::Check>/65536    90.4ms ± 2%  83.1ms ± 1%  -8.04%  (p=0.000 n=19+20)
BM_CompileAPIFileDenseDecls<Phase::Check>/262144    381ms ± 2%   352ms ± 1%  -7.79%  (p=0.000 n=19+19)

@github-actions github-actions bot added explorer Action items related to Carbon explorer code toolchain labels Sep 9, 2024
@github-actions github-actions bot requested review from jonmeow and josh11b September 9, 2024 00:57
@chandlerc
Copy link
Contributor Author

Also, to note, I generated almost all of the edits here with regular expressions -- I'm happy to re-do them, so if there are changes that folks would like to suggest, that shouldn't really be a problem.

I used the sd tool: https://github.com/chmln/sd

And here are some of the relevant regexs:

# Tries to find `CARBON_D?CHECK` statements with just a string
# message. This proved useful to do prior to any with arguments
# below, even though it is in some ways a generalization of that
# regular expression. See that comment for the limitations.
> sd -f ms 'CARBON_(D?CHECK)\(((?:[^()]|\((?:[^()]|\([^()]*\))*\))+)\)\s*<<\s+"((?:[^"]+|"\s*(?:<< )?")+)";' 'CARBON_$1($2, "$3");' (rg -l 'CARBON_D?CHECK')

# Tries to find `CARBON_D?CHECK` statement with one non-string
# parameter and optional prefix and suffix string parameters and
# rewrite them into format string form accepting one argument.
# Only handles a limited degree of parenthesis and complexity in
# the condition and a limited degree of template argument nesting
# in parameters. The tail can be copy/pasted to create 2-argument,
# 3-argument, and more forms. I found never found a 5-argument.
> sd -f ms 'CARBON_(D?CHECK)\(((?:[^()]|\((?:[^()]|\([^()]*\))*\))+)\)(?:\s*<< "((?:[^"]+|"\s*(?:<< )?")+)")?\s*<< ((?:[^"<;]*(?:[^"<;&&[:^space:]]|<[^<>]*>))+)(?:\s*<< "((?:[^"]+|"\s*(?:<< )?")+)")?;' 'CARBON_$1($2, "$3{0}$5", $4);' (rg -l 'CARBON_D?CHECK')

# Finds string literals continued across lines, both with streams
# in between and through string literal concatenation, and merges
# one such line break away. Run repeatedly to replace long runs.
> sd -f ms '(CARBON_D?CHECK\([^;"]+"[^"]*)"\n\s*(?:<< )?"' '$1' (rg -l 'CARBON_D?CHECK')

I also used the relevant variations of these for CARBON_FATAL, etc.

There were several manual fixes required, but quite manageable. 10 maybe across the whole codebase.

Copy link
Contributor

@zygoloid zygoloid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've not reviewed the migration commit, but the macro changes commit LGTM.

@chandlerc
Copy link
Contributor Author

I think this is ready for review -- or to discuss how to break up the review of the migration if its too much?

@jonmeow
Copy link
Contributor

jonmeow commented Sep 11, 2024

Build failure:

toolchain/parse/handle_requirement.cpp:68:22: error: invalid operands to binary expression ('void' and 'const char[49]')
      CARBON_FATAL() << "Unexpected token kind for requirement operator: "
      ~~~~~~~~~~~~~~ ^  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@chandlerc
Copy link
Contributor Author

Build failure:

toolchain/parse/handle_requirement.cpp:68:22: error: invalid operands to binary expression ('void' and 'const char[49]')
      CARBON_FATAL() << "Unexpected token kind for requirement operator: "
      ~~~~~~~~~~~~~~ ^  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Yeah, some new code is all. Updated and fixed.

Copy link
Contributor

@jonmeow jonmeow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving. zygoloid said he'd looked at the macros, I glanced but I've focused on the migration. I saw one issue, but generally looks clean, so I think it makes sense to approve and let you merge if you're ready (particularly to avoid more skew).

I'm realizing though, a couple risks with formatv:

  • There's a spot where we pass in the full sem_ir. Probably not a big memory risk, but generating the intermediate string is at least a little risky.
  • The stringent formatv checking has me concerned on a different end... Because it enforces arguments, but doesn't check at compile time, and we can't really unit test CHECK-failures, we risk having typos in these that are uncaught.
    • This mainly makes me wonder if we should build our own format function, particularly to validate parameters at compile time. It doesn't feel like it should be blocking, given the other concerns being addressed here.

// Render the final check string here.
std::string message = llvm::formatv(
"{0} failure at {1}:{2}{3}{4}{5}{6}\n", kind, file, line,
llvm::StringRef(condition_str).empty() ? "" : ": ", condition_str,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noting the StringRef cast, why doesn't this just use StringRef parameters? I assume you have a reason, but looking at template_string.h and check_internal.h, I don't see a comment explaining the preference for const char* over llvm::StringRef for parameters. Can you add something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment -- the goal is to reduce codesize of the caller by using C strings.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where should I be looking? I don't see anything in 04dc806

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is now in check_internal.h on line 18?

CARBON_CHECK(refs_id != SemIR::InstBlockId::Empty)
<< "{} is handled by StructLiteral.";
CARBON_CHECK(refs_id != SemIR::InstBlockId::Empty,
"{} is handled by StructLiteral.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be escaped for formatv?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weirdly, this works as is... because we avoid formatv when there are no arguments.

But your comment is worrisome, because that seems like an unfortunate implementation detail to leak in this way. I'm not really sure what the best approach is here, and interested in ideas.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can it just pass to formatv irrespective?

Copy link
Contributor

@jonmeow jonmeow Sep 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note, if you're concerned about codesize of that, the version with no arguments could do just a thin non-templated wrapper of formatv. Just to force that the argument string gets the same treatment.

But it's probably still going to be missed, unless we get something with compile-time validation.

(either way, I don't view this as critical for this PR, but maybe worth a comment/todo)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, it actually is passed to formatv always, I misremembered. So I think we're fine here.

Copy link
Contributor

@josh11b josh11b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is some initial feedback before I look at all the callsites.

// flow.
// This is similar to CHECK, but is unconditional. Writing
// `CARBON_FATAL("message")` is clearer than `CARBON_CHECK(false, "message")
// because it avoids confusion about control flow.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be mixing "clarity" (human understandability and nice error message) with compiler optimization ("confusion about control flow")?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure -- this comment text was already present, not something I was adding. Happy to have suggestions for better wording though.

Comment on lines 78 to 80
// Pass the format string as a template parameter in optimized builds. Note that
// the template arguments include commas and so require extra parentheses to use
// within a macro.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I expect lots of extra parentheses when writing macros, and would just add them in all cases, even the debug case, without any specific comment.

chandlerc and others added 4 commits September 12, 2024 00:01
This switches `DCHECK` and `FATAL` as well.

The goal is to reduce the code size impact of these assertions so that
we can keep more of them enabled. Currently, the largest cost I see from
`CHECK` is not the actual check or the cold code itself, but actually
the failure to inline trivial functions due to the presence of the cold
code. This means that our goal isn't to reduce apparent code size in the
final binary but the LLVM IR cost assessed for these routines in the
inliner, which closely correlates with code size but is a bit different.

As discussed in carbon-language#4283, experimentation shows that a single function call
with a minimal number of arguments is the lowest cost model for these.
This is easily achieved with a format-string API that internally uses
`llvm::formatv`. This PR is essentially the `CHECK` version of carbon-language#4283.

However, the check macros are *substantially* harder to make work with
both format strings and streaming because they also take a condition.
Also, unexpectedly, I was very successful at devising a regular
expression based automated rewrite from the streaming to the format
string form with only low 10s of manual fixes. This includes compacting
strings broken up across lines, etc. Given how well that went, I've
prepared this PR which just directly switches to the format string API
and migrate everything to use it.

One nice side-effect is that the format string approach ends up greatly
simplifying the implementation here as well.

This is ... *shockingly* effective. Parsing speeds up by more than 3%
with just this change. And checking speeds up by **8%** with this change
alone:
```
BM_CompileAPIFileDenseDecls<Phase::Parse>/256      86.3µs ± 1%  82.9µs ± 1%  -3.94%  (p=0.000 n=17+19)
BM_CompileAPIFileDenseDecls<Phase::Parse>/1024      431µs ± 1%   415µs ± 1%  -3.76%  (p=0.000 n=18+19)
BM_CompileAPIFileDenseDecls<Phase::Parse>/4096     1.77ms ± 1%  1.71ms ± 1%  -3.18%  (p=0.000 n=18+19)
BM_CompileAPIFileDenseDecls<Phase::Parse>/16384    7.44ms ± 1%  7.17ms ± 2%  -3.56%  (p=0.000 n=18+20)
BM_CompileAPIFileDenseDecls<Phase::Parse>/65536    30.7ms ± 1%  29.7ms ± 1%  -3.15%  (p=0.000 n=18+20)
BM_CompileAPIFileDenseDecls<Phase::Parse>/262144    131ms ± 1%   127ms ± 1%  -2.81%  (p=0.000 n=18+18)
BM_CompileAPIFileDenseDecls<Phase::Check>/256       878µs ± 2%   800µs ± 1%  -8.91%  (p=0.000 n=19+20)
BM_CompileAPIFileDenseDecls<Phase::Check>/1024     1.88ms ± 2%  1.72ms ± 1%  -8.56%  (p=0.000 n=19+20)
BM_CompileAPIFileDenseDecls<Phase::Check>/4096     5.78ms ± 2%  5.28ms ± 1%  -8.70%  (p=0.000 n=20+18)
BM_CompileAPIFileDenseDecls<Phase::Check>/16384    21.9ms ± 1%  20.1ms ± 1%  -8.02%  (p=0.000 n=18+20)
BM_CompileAPIFileDenseDecls<Phase::Check>/65536    90.4ms ± 2%  83.1ms ± 1%  -8.04%  (p=0.000 n=19+20)
BM_CompileAPIFileDenseDecls<Phase::Check>/262144    381ms ± 2%   352ms ± 1%  -7.79%  (p=0.000 n=19+19)
```
Co-authored-by: Richard Smith <richard@metafoo.co.uk>
@chandlerc
Copy link
Contributor Author

Note, I had a bad git push earlier and so a weird temporary version I was experimenting with got put back in place. Sorry for that. I think the version now is the one I intended and working on the last batches of comments.

chandlerc and others added 2 commits September 11, 2024 15:30
Co-authored-by: josh11b <15258583+josh11b@users.noreply.github.com>
chandlerc and others added 2 commits September 11, 2024 15:54
Co-authored-by: josh11b <15258583+josh11b@users.noreply.github.com>
@chandlerc chandlerc added this pull request to the merge queue Sep 12, 2024
Merged via the queue into carbon-language:trunk with commit 4845f40 Sep 12, 2024
8 checks passed
@chandlerc chandlerc deleted the check-shrink branch September 12, 2024 17:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
explorer Action items related to Carbon explorer code toolchain
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants