fs: fix memory corruption issues around fs.Dir #33274

Closed
wants to merge 2 commits into from

addaleax
Member

@addaleax addaleax commented May 7, 2020

fs: clean up Dir.read() uv_fs_t data before calling into JS

A call into JS can schedule another operation on the same uv_dir_t.
In particular, when the handle is closed from the callback for a
directory read operation, there previously was a race condition window:

  1. A dir.read() operation is submitted to libuv
  2. The read operation is finished by libuv, calling AfterDirRead()
  3. We call into JS
  4. JS calls dir.close()
  5. libuv posts the close request to a thread in the pool
  6. The close request runs, destroying the directory handle
  7. AfterDirRead() is being exited.

Exiting the FSReqAfterScope in step 7 attempts to destroy the original
uv_fs_t from step 1, which now points to an uv_dir_t that has
already been destroyed in step 6.

By forcing the FSReqAfterScope to clean up before we call into JS,
we can be sure that no other operations on the same uv_dir_t are
submitted concurrently.

This addresses issues observed when running with ASAN/valgrind.
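
The ordering problem described above can be illustrated with a small model. This is only a sketch, not the code from this patch: `FakeDirHandle`, `FakeReadRequest`, `afterDirRead` and `simulate` are hypothetical names, and the close is collapsed into a synchronous `destroy()` call so that the ordering is easy to see, whereas the real close request runs on the thread pool.

```javascript
// Hypothetical model of the race; none of these names exist in Node.js or libuv.
class FakeDirHandle {
  constructor() { this.destroyed = false; }
  destroy() { this.destroyed = true; }
}

class FakeReadRequest {
  constructor(handle) {
    this.handle = handle;
    this.cleaned = false;
  }
  // Stands in for cleaning up the request data. It is only safe while the
  // handle it refers to is still alive.
  cleanup() {
    if (this.cleaned) return 'already-clean';
    this.cleaned = true;
    return this.handle.destroyed ? 'use-after-free' : 'ok';
  }
}

// Stands in for the completion callback of a read operation.
// cleanupFirst = false models the old order (cleanup on scope exit),
// cleanupFirst = true models cleaning up before calling into JS.
function afterDirRead(req, callback, { cleanupFirst }) {
  let result;
  if (cleanupFirst) result = req.cleanup();
  callback(); // user code; it may close the handle
  if (!cleanupFirst) result = req.cleanup();
  return result;
}

function simulate(cleanupFirst) {
  const handle = new FakeDirHandle();
  const req = new FakeReadRequest(handle);
  // The user callback closes the directory, as in step 4.
  return afterDirRead(req, () => handle.destroy(), { cleanupFirst });
}

console.log(simulate(false)); // use-after-free
console.log(simulate(true));  // ok
```

In the model, the only difference between the two runs is whether the request is cleaned up before or after user code gets a chance to close the handle.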

fs: forbid concurrent operations on Dir handle

libuv does not expect concurrent operations on uv_dir_t instances,
and will gladly create memory leaks, corrupt data, or crash the
process.

This patch forbids that, and:

  • Makes sure that concurrent async operations are run sequentially
  • Throws an exception if sync operations are attempted during an
    async operation

The assumption here is that a thrown exception is preferable to
a potential hard crash.

This fully fixes flakiness from parallel/test-fs-opendir when
run under ASAN.
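
A minimal sketch of the two rules above, assuming a simple in-memory directory. `GuardedDir`, its `queue` field and the error message are hypothetical and only illustrate the behaviour; they are not the implementation or the error code added by this patch. `setImmediate` stands in for the round trip through the thread pool.

```javascript
// Hypothetical wrapper; not the actual fs.Dir implementation.
class GuardedDir {
  constructor(entries) {
    this.entries = [...entries];
    // null while idle; an array of deferred operations while an async
    // operation is in flight.
    this.queue = null;
  }

  // Async read: starts immediately when idle, otherwise waits its turn.
  read(callback) {
    if (this.queue !== null) {
      this.queue.push(() => this.read(callback));
      return;
    }
    this.queue = [];
    setImmediate(() => {
      const entry = this.entries.length ? this.entries.shift() : null;
      const waiting = this.queue;
      this.queue = null;
      // Re-submit deferred operations; the first one starts, the rest re-queue.
      for (const op of waiting) op();
      callback(null, entry);
    });
  }

  // Sync read: refuses to run while an async operation is in flight.
  readSync() {
    if (this.queue !== null) {
      throw new Error('Cannot do synchronous work while an async operation is pending');
    }
    return this.entries.length ? this.entries.shift() : null;
  }
}

const dir = new GuardedDir(['a', 'b', 'c']);
const seen = [];
dir.read((err, entry) => seen.push(entry));
dir.read((err, entry) => seen.push(entry)); // queued behind the first read

let threw = false;
try {
  dir.readSync(); // an async read is pending, so this throws
} catch {
  threw = true;
}
console.log(threw); // true
```

Keeping the queue on the handle means at most one operation is ever submitted for it at a time, which matches the assumption that a thrown exception is preferable to a crash.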

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines

addaleax added 2 commits May 7, 2020 03:05
@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. labels May 7, 2020
@addaleax addaleax added fs Issues and PRs related to the fs subsystem / file system. and removed lib / src Issues and PRs related to general changes in the lib or src directory. labels May 7, 2020
@nodejs-github-bot
Collaborator

nodejs-github-bot commented May 7, 2020

CI: https://ci.nodejs.org/job/node-test-pull-request/31197/ (:yellow_heart:)

@jasnell
Member

jasnell commented May 7, 2020

Overall, it looks fine. I'm a bit conflicted on the new error code being specific to ERR_DIR_... although I cannot think right now of another place in core where this specific kind of error is currently relevant. Definitely not worth blocking on that but overall a more generalized error code would be better I think.

@addaleax
Member Author

addaleax commented May 7, 2020

@jasnell I would agree, but our error codes are too specific in general, so at least this is consistent 😬

@addaleax addaleax added the review wanted PRs that need reviews. label May 7, 2020
@addaleax
Member Author

addaleax commented May 9, 2020

@nodejs/fs

@addaleax addaleax added author ready PRs that have at least one approval, no pending requests for changes, and a CI started. and removed review wanted PRs that need reviews. labels May 15, 2020
@addaleax
Member Author

Landed in 441e703...d3a8a23

@addaleax addaleax closed this May 15, 2020
addaleax added a commit that referenced this pull request May 15, 2020
PR-URL: #33274
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
addaleax added a commit that referenced this pull request May 15, 2020
@addaleax addaleax deleted the dir-clear-before-js branch May 15, 2020 17:37
codebytere pushed a commit that referenced this pull request May 16, 2020
codebytere pushed a commit that referenced this pull request May 16, 2020
@codebytere codebytere mentioned this pull request May 18, 2020
codebytere pushed a commit that referenced this pull request Jun 7, 2020
codebytere pushed a commit that referenced this pull request Jun 7, 2020
codebytere pushed a commit to codebytere/node that referenced this pull request Jun 9, 2020
codebytere pushed a commit to codebytere/node that referenced this pull request Jun 9, 2020
@codebytere codebytere mentioned this pull request Jun 9, 2020
Labels
author ready PRs that have at least one approval, no pending requests for changes, and a CI started. c++ Issues and PRs that require attention from people who are familiar with C++. fs Issues and PRs related to the fs subsystem / file system.