Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for -XX:+HeapDumpOnOutOfMemoryError #7012

Merged
merged 4 commits into from
Aug 5, 2023

Conversation

galderz
Copy link
Contributor

@galderz galderz commented Jul 18, 2023

Closes #4641.

Implements support for -XX:+HeapDumpOnOutOfMemoryError flag.

I've made some small changes to the HeapDumpOperation to be able to use it for this use case. The key thing is that all code in the operation has to be allocation free, which meant tweaking the error reporting method to not allocate. After discussing with @christianhaeubl, a new set of open and create methods have been added that take a CCharPointer. On startup, if -XX:+HeapDumpOnOutOfMemoryError is passed, the filename is pre-calculated and stored temporarily in case it's needed. Then, when OOME hits, the CCharPointer is used to create the file descriptor and pass it to the native VM operation.

To use -XX:+HeapDumpOnOutOfMemoryError, native image has to be built with --enable-monitoring=heapdump.

I will push the documentation updates shortly.

@oracle-contributor-agreement oracle-contributor-agreement bot added the OCA Verified All contributors have signed the Oracle Contributor Agreement. label Jul 18, 2023
@galderz galderz marked this pull request as draft July 18, 2023 13:14
@galderz
Copy link
Contributor Author

galderz commented Jul 18, 2023

Pushed a couple of commits:

  • Added log message similar to the HotSpot on when heap is dumped on OOME.
  • Added documentation for feature.

This PR is ready for review now. @christianhaeubl can you have a look? @zakkak can you make this PR not a draft? I'm not able to do it, nor can I add reviewers...etc.

@galderz
Copy link
Contributor Author

galderz commented Jul 18, 2023

By the way, the style errors reported are unrelated to the code changes in this PR.

@zakkak
Copy link
Collaborator

zakkak commented Jul 20, 2023

@zakkak can you make this PR not a draft? I'm not able to do it, nor can I add reviewers...etc.

I can't do it either, It says:

Only those with [write access](https://docs.github.com/articles/what-are-the-different-access-permissions) to this repository can mark a draft pull request as ready for review.

@fniephaus fniephaus marked this pull request as ready for review July 20, 2023 13:27
@fniephaus
Copy link
Member

unblocked the PR :)

@galderz
Copy link
Contributor Author

galderz commented Jul 20, 2023

Thanks @fniephaus.

Having had a day to think about this, there's one doubt I have lingering, which is what to do if the heap dump file already exists. As it stands, any existing heap dumps would be overwritten, but I'm unsure if more than one OOME can be fired. In HotSpot you can have code that catches and ignores and so you'll get more OOMEs fired, but in SVM, an OOME does not bubble up all the way to the application. The main thread simply halts and the program finishes.

In any case, assuming multiple OOMEs can be fired, what HotSpot does is append a number to the file, but in SVM, the code that fires the OOME can't create new file names. So, you would have to pre-allocate a number of files in advance in case they're needed, which is a bit clunky. E.g. how many would you pre-allocate?

Side note: I've not been able to trigger HotSpot to generate multiple heap dump files on OOME, even after catching the OOME and continue trying to allocate objects. Multiple OOMEs are fired but only one dump file.

@galderz
Copy link
Contributor Author

galderz commented Jul 20, 2023

Can someone add the redhat-interest label? @zakkak @roberttoyonaga can you do that?

@fniephaus
Copy link
Member

In any case, assuming multiple OOMEs can be fired, what HotSpot does is append a number to the file, but in SVM, the code that fires the OOME can't create new file names. So, you would have to pre-allocate a number of files in advance in case they're needed, which is a bit clunky. E.g. how many would you pre-allocate?

Why can't the code that fires the OOME create new file names? If it's because the code is not allowed to allocate managed memory, I'm sure @christianhaeubl can show you how to do this with native/unmanaged memory. Or am I missing something?

Side note: I've not been able to trigger HotSpot to generate multiple heap dump files on OOME, even after catching the OOME and continue trying to allocate objects. Multiple OOMEs are fired but only one dump file.

Me neither, but maybe possible if you're running multiple threads and they all see OOMEs? Regardless, I'd argue we should stay as close to the HotSpot impl as possible.

@galderz
Copy link
Contributor Author

galderz commented Jul 26, 2023

@fniephaus Actually, I misread the HotSpot code. Appending a number to the heap dump file is only done when multiple heap dumps are produced in the case of VM errors other than OOME, e.g.

// Called by error reporting by a single Java thread outside of a JVM safepoint,
// or by heap dumping by the VM thread during a (GC) safepoint. Thus, these various
// callers are strictly serialized and guaranteed not to interfere below. For more
// general use, however, this method will need modification to prevent
// inteference when updating the static variables base_path and dump_file_seq below.
void HeapDumper::dump_heap() {
  HeapDumper::dump_heap(false);
}

The false there means the heap dump is not an OOME. This is something independent of this PR.

What is relevant to this PR, which I have only realised when I tried OOMEs with multiple threads, is that if multiple threads reach the same point, only one heap dump is produced:

void report_java_out_of_memory(const char* message) {
  static int out_of_memory_reported = 0;

  // A number of threads may attempt to report OutOfMemoryError at around the
  // same time. To avoid dumping the heap or executing the data collection
  // commands multiple times we just do it once when the first threads reports
  // the error.
  if (Atomic::cmpxchg(&out_of_memory_reported, 0, 1) == 0) {
    // create heap dump before OnOutOfMemoryError commands are executed
    if (HeapDumpOnOutOfMemoryError) {
      tty->print_cr("java.lang.OutOfMemoryError: %s", message);
      HeapDumper::dump_heap_from_oome();
    }

    if (OnOutOfMemoryError && OnOutOfMemoryError[0]) {
      VMError::report_java_out_of_memory(message);
    }

    if (CrashOnOutOfMemoryError) {
      tty->print_cr("Aborting due to java.lang.OutOfMemoryError: %s", message);
      report_fatal(OOM_JAVA_HEAP_FATAL, __FILE__, __LINE__, "OutOfMemory encountered: %s", message);
    }

    if (ExitOnOutOfMemoryError) {
      tty->print_cr("Terminating due to java.lang.OutOfMemoryError: %s", message);
      os::_exit(3); // quick exit with no cleanup hooks run
    }
  }
}

I need to add the necessary protection so that only one heap dump is produced. I'll add that next.

@galderz galderz force-pushed the topic.0714.heapdump-oome branch from e61f461 to 9a8c441 Compare July 26, 2023 15:33
@galderz
Copy link
Contributor Author

galderz commented Jul 26, 2023

Added support for only generating one heap dump on OOME. Also fixed the conflict.

@christianhaeubl
Copy link
Member

Thanks for the PR, I did a bunch of changes and fixes on top (see #7088). Feel free to review and comment the changes that I pushed. It might take a few more days until this PR gets merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
native-image OCA Verified All contributors have signed the Oracle Contributor Agreement. redhat-interest
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[GR-38994] Add support for -XX:+HeapDumpOnOutOfMemoryError
6 participants