Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Undici.Request and AbortController doesn't work well #3353

Closed
fawazahmed0 opened this issue Jun 21, 2024 · 42 comments
Closed

Undici.Request and AbortController doesn't work well #3353

fawazahmed0 opened this issue Jun 21, 2024 · 42 comments
Labels
bug Something isn't working

Comments

@fawazahmed0
Copy link
Contributor

fawazahmed0 commented Jun 21, 2024

Bug Description

Sometimes Aborting Undici.Request causes whole program to stop, even though everything is wrapped around try catch.

Reproducible By

I don't have minimal reproducible example, as my code is larger in size. But Undici.Fetch doesn't have this issue.

Here is minimal reproducible example:

const undici = require('undici')

async function testUndici() {
    const controller = new AbortController()
    const {statusCode,headers,trailers,body} = await undici.request("https://github.com/fawazahmed0/tiger/releases/download/v2/pipetocsv.7z", { signal: controller.signal }).catch(console.error)
    setTimeout(()=>controller.abort(), 500)
}
testUndici()

Expected Behavior

Undici.Request and AbortController should work

Logs & Screenshots

node:events:498
      throw er; // Unhandled 'error' event
      ^
DOMException [AbortError]: This operation was aborted
    at new DOMException (node:internal/per_context/domexception:53:5)
    at AbortController.abort (node:internal/abort_controller:395:18)
    at Timeout._onTimeout (T:\xx\other scripts\undici-debug.js:7:31)
    at listOnTimeout (node:internal/timers:573:17)
    at process.processTimers (node:internal/timers:514:7)
Emitted 'error' event on BodyReadable instance at:
    at emitErrorNT (node:internal/streams/destroy:170:8)
    at emitErrorCloseNT (node:internal/streams/destroy:129:3)
    at process.processTicksAndRejections (node:internal/process/task_queues:82:21)

Node.js v22.3.0

Environment

Ubuntu 24.04 & Windows 11
Node v22.3.0
Undici 6.19.1

@fawazahmed0 fawazahmed0 added the bug Something isn't working label Jun 21, 2024
@Gigioliva
Copy link
Contributor

I am having the same problem. The bug was introduced in 6.16.0. I think the problem is in this #3209. In case of onComplete we don't remove the listener from the signal

@Gigioliva
Copy link
Contributor

Adding this in api-request.js fixes the issue

Screenshot 2024-06-21 alle 21 38 51

@mcollina
Copy link
Member

@Gigioliva Would you like to send a Pull Request to address this issue? Remember to add unit tests.

@Gigioliva
Copy link
Contributor

👍🏼 I will do

@Gigioliva
Copy link
Contributor

@mcollina #3354

@ronag
Copy link
Member

ronag commented Jun 22, 2024

The problem is that you are not cleaning up / destroying / consuming the body of returned response object which keep the signal handler referenced.

    while (!response?.ok) {
        await response?.body.dump()
        response = await undici.request(link,{maxRedirections: 30, signal: controller?.signal})
    }

@ronag ronag closed this as completed Jun 22, 2024
@Gigioliva
Copy link
Contributor

@ronag In my case I was consuming the response 🤔 await result.body.text()

@ronag
Copy link
Member

ronag commented Jun 22, 2024

Not between iterations of the while loop

@Gigioliva
Copy link
Contributor

Yes. within the loop. My code is something like:

class MyClass {
  readonly #pool: Agent;

  constructor() {
    this.#pool = new Agent({ connections: MAX_CONCURRENCY });
  }

  async runFetch(urls: string[]): Promise<string[]> {
    return await Promise.all(urls.map((url) => this.doGet(url)));
  }

  private async doGet(url: string) {
    const result = await this.#pool.request({
      origin: url,
      path: "/foo",
      method: "GET",
      signal: AbortSignal.timeout(1_000),
    });
    const text = await result.body.text();
    console.log(text);
    return text;
  }
}

@ronag
Copy link
Member

ronag commented Jun 22, 2024

Do you have a repro then? Because your other repro was incorrect. Without a repro we cannot help.

@ronag ronag reopened this Jun 22, 2024
@Gigioliva
Copy link
Contributor

Yup. I can reproduce it in my tool tests. I will try to extract the test and share with you

@fawazahmed0
Copy link
Contributor Author

fawazahmed0 commented Jun 22, 2024

Here is a repro:

const undici = require('undici')

async function testUndici(link, controller) {
    let undiciResponse;
    while (!(undiciResponse?.statusCode >= 200 && undiciResponse?.statusCode < 300))
        undiciResponse = await undici.request(link, { maxRedirections: 30, signal: controller?.signal })

    sleep(500).then(() => controller.abort())

}
async function begin() {

    const controller = new AbortController()
    while (true) 
        await testUndici("https://github.com/fawazahmed0/tiger/releases/download/v2/pipetocsv.7z", controller).catch(console.error)
    
}
begin()

function sleep(ms) {
    return new Promise(resolve => setTimeout(resolve, ms));
}

Update: Removed unnecessary fetch from repro

@ronag
Copy link
Member

ronag commented Jun 22, 2024

You are not consuming the body of the undici.request response object between loop iterations. Why are you including fetch? Is the problem with request or fetch?

@Gigioliva
Copy link
Contributor

Gigioliva commented Jun 22, 2024

@ronag looking better at the code, for a specific status code (204) I don't consume the body. This however in my opinion is a bug. Why should I necessarily consume the body?

Note: adding a "useless" await res.body.text() I cannot reproduce the bug

I get the error even if I don't consume the body with a 40X... why I need to consume the body after an error?

@fawazahmed0
Copy link
Contributor Author

I agree with @Gigioliva

@ronag
Copy link
Member

ronag commented Jun 22, 2024

That's how it was designed and is clearly documented.

We can discuss that in a separate issue if you wish.

@ronag ronag closed this as completed Jun 22, 2024
@fawazahmed0
Copy link
Contributor Author

fawazahmed0 commented Jun 23, 2024

That's how it was designed and is clearly documented.

@ronag Could you point me to the documentation which says that using abort() will kill the whole node process.

I can find one example which says that using abort() will throw RequestAbortedError. In my case the error never gets caught by try catch, thus terminating whole node process.

Another repro (example copied from doc):

import { createServer } from 'http'
import { Client } from 'undici'
import { once } from 'events'

const server = createServer((request, response) => {
  response.end('Hello, World!')
}).listen()

await once(server, 'listening')

const client = new Client(`http://localhost:${server.address().port}`)
const abortController = new AbortController()

client.request({
  path: '/',
  method: 'GET',
  signal: abortController.signal
}).catch(error=>{
  console.log('Hi! Error Caught')
  console.error(error) // should print an RequestAbortedError
  client.close()
  server.close()
})
setTimeout(()=>abortController.abort(),500)

@ronag
Copy link
Member

ronag commented Jun 23, 2024

request returns a body: Stream which gets destroyed with an abort error and emits an error that you are not handling and therefore the process crashes with an unhandled exception. This is normal nodejs behavior. Feel free to open a PR if you think that needs to be explicitly documented by undici.

client.request({
  path: '/',
  method: 'GET',
  signal: abortController.signal
}).then(res=>{
  console.log('Hi! Response received')
  res.body.on('error', error=>{
    console.log('Hi! Error Caught')
    console.error(error) // prints a RequestAbortedError
    client.close()
    server.close()
  })
}).catch(error=>{
  console.log('Hi! Error Caught')
  console.error(error) // should print an RequestAbortedError
  client.close()
  server.close()
})
setTimeout(()=>abortController.abort(),500)

@Gigioliva
Copy link
Contributor

Is this considered "correct"?

const pool = new Agent()

try {
    const result = await this.#pool.request({
          origin: url.origin,
          path: url.pathname,
          method: "GET",
          signal: AbortSignal.timeout(1000),
        });

   if(result.statusCode === 204){
       // log and DON'T consume the body
   } else {
      // call await result.body.text()
   }

   // Manually close the body to avoid this bug
   result.body.emit("close");

} catch(err) {
    // Handle connection errors
}

@ronag
Copy link
Member

ronag commented Jun 23, 2024

No that is not correct. You should not be emitting events yourself.

const pool = new Agent()

try {
    const result = await this.#pool.request({
          origin: url.origin,
          path: url.pathname,
          method: "GET",
          signal: AbortSignal.timeout(1000),
        });

   if(result.statusCode === 204){
      await result.body.dump()
   } else {
      await result.body.text()
   }
} catch(err) {
    // Handle connection errors
}

@titanism
Copy link
Contributor

Any update yet on a proper fix for this?

@ronag
Copy link
Member

ronag commented Jun 24, 2024

There is nothing to fix. It works as designed.

@titanism
Copy link
Contributor

titanism commented Aug 4, 2024

This is quite odd, we are listening for the error but it is still throwing an uncaught exception.

forwardemail/nodejs-dns-over-https-tangerine@4035c48#diff-e727e4bdf3657fd1d798edcd6b099d6e092f8573cba266154583a746bba0f346R1132-R1137

@ronag any idea why?

@titanism
Copy link
Contributor

titanism commented Aug 4, 2024

Oh, we see the issue, we aren't calling dump() in one case in one of our applications.

@titanism
Copy link
Contributor

titanism commented Aug 5, 2024

No, that was not the case. This is definitely a breaking bug and should have been more carefully done with a major release.

@titanism
Copy link
Contributor

titanism commented Aug 5, 2024

@ronag we're stumped and cannot figure out why we are getting uncaught DOMException errors thrown (causing our entire process to restart)

here's the code <https://github.com/forwardemail/nodejs-dns-over-https-tangerine/blob/main/index.js

@titanism
Copy link
Contributor

titanism commented Aug 5, 2024

Could it be that throwOnError: true is not doing some sort of cleanup?

@titanism
Copy link
Contributor

titanism commented Aug 5, 2024

@ronag There is a call here which does nothing:

signal?.throwIfAborted()

function throwIfAborted (state) {
if (state.aborted) {
throw new DOMException('The operation was aborted.', 'AbortError')
}
}

That function throwIfAborted accepts an argument state and no state is being passed?

@titanism
Copy link
Contributor

titanism commented Aug 5, 2024

Nevermind, throwIfAborted was added in in Node v17.3.0 per #2507 (comment)

titanism added a commit to forwardemail/nodejs-dns-over-https-tangerine that referenced this issue Aug 5, 2024
@titanism
Copy link
Contributor

titanism commented Aug 5, 2024

I believe we've fixed the issue per #3353.

titanism added a commit to forwardemail/forwardemail.net that referenced this issue Aug 5, 2024
@uriva
Copy link

uriva commented Nov 3, 2024

@titanism I'm still seeing this issue on nodejs 23.1.0

the code in question is https://github.com/webtorrent/bittorrent-tracker/blob/master/lib/client/http-tracker.js#L147

I'm getting the DOMException from the line where the AbortController is created, that is outside the try/catch

Any idea?

Thanks

(original issue: webtorrent/webtorrent#2874)

@ggoodman
Copy link
Contributor

Hi team, we got bitten by this issue in production like others in this thread.

I think that while Readable and undici are each working as intended, the combination of the two leads to an API that is unnecessarily hostile for a user.

In our case, the issue manifested because we're holding onto a reference to clientRes to pipe into a subsequent bit of work. However, after obtaining a reference to clientRes and before calling pipeline(clientRes.body, nextConsumer, { signal }), we call signal.throwIfAborted().

This leads to a whole-program crash despite the whole sequence above being orchestrated in an async function, attempting idiomatic use of AbortSignal.

The fix, in our case is the following, which I believe is indicative of the dangerous nature of this interaction:

async function criticalPath() {
  clientRes = await performUndiciRequest(signal);
  
  // This is the fix.
  clientRes.body.on('error', () => {
    // Ignore the error so that we don't get uncaught exceptions and crash.
  });
  
  // ... do some async work
  
  signal.throwIfAborted();
  
  // ... consume the body
  
  await pipeline([clientRes.body, targetStream], { signal });
}

@titanism
Copy link
Contributor

@ggoodman how do you safely do cleanup and destroy the stream after? otherwise you will encounter 100% cpu usage, e.g. https://www.reddit.com/r/node/comments/vcxx4n/comment/ideo7s4/

@titanism
Copy link
Contributor

@kyrylkov can you elaborate on how you solved this via code snippet?

So we use undici fetch that returns WebStream readable. We then convert that readable into Node.js readable. Then we async iterate over it. Turns out that after we consume the whole readable, the stream wouldn't close for some reason. We've started explicitly destroying stream after consuming them and the problem went away. Still not sure why streams were not destroyed implicitly, because our understanding is that they should have.

@ggoodman
Copy link
Contributor

how do you safely do cleanup and destroy the stream after? otherwise you will encounter 100% cpu usage

@titanism my understanding is that the problematic behaviour with CPU usage is limited to fetch and Web streams. In our case, we're using undici as a lower-overhead http/1.1 client for some high-volume, low-latency reverse proxying.

Is there evidence that import('node:stream').Readable also suffers from this footgun?

@ronag
Copy link
Member

ronag commented Nov 12, 2024

I don't see a problem here. You are mostly mis-using the api... The signal.throwIfAborted() is not necessary.

@titanism
Copy link
Contributor

I suppose we should use node-fetch or an alternative instead, e.g. XHR due to the CPU issues.

@ronag How do you properly destroy a fetch? See that Reddit link above; we have the exact same issue that @kyrylkov had, which is using PM2 and after ~20-30 mins time, undici fetch calls seem to cause event loop to build-up and queue, due to Readable. We used v8-profiler-next and cpupro, but it's very limited in insight:

Screen Shot 2024-11-12 at 2 45 38 PM

@titanism
Copy link
Contributor

Anonymous function #994 is this:

      queueMicrotask(() => {
        stream[kConsume] = {
          type,
          stream,
          resolve,
          reject,
          length: 0,
          body: []
        }

        stream
          .on('error', function (err) {
            consumeFinish(this[kConsume], err)
          })
          .on('close', function () {
            if (this[kConsume].body !== null) {
              consumeFinish(this[kConsume], new RequestAbortedError())
            }
          })

        consumeStart(stream[kConsume])
      })

@ronag
Copy link
Member

ronag commented Nov 12, 2024

@titanism That does not seem relateed to this issue, please create a new one

@titanism
Copy link
Contributor

titanism commented Nov 12, 2024

@ggoodman
Copy link
Contributor

@ronag I think I see where you're coming from now. This appears to be the precedent for all Node.js Streams produced with a supplied signal (https://github.com/nodejs/node/blob/f17b9a4c7838f37022255f4dcee09a083686018e/lib/internal/streams/add-abort-signal.js#L41-L60). At least, that's the precedent for synchronous APIs returning streams.

I think the surprising thing here is that this is a hybrid promise / stream API. Both the initial promise for getting the Response object AND the stream are governed by the signal. I suppose that I was only expecting the initial Promise for the headers to be controlled by the signal. I was already controlling the consumption of the body stream using the signal via pipeline(). I expected any possible failure modes to result in promise rejections instead of this subtle interaction with node's emitter-based legacy.

@ronag
Copy link
Member

ronag commented Nov 13, 2024

I think the surprising thing here is that this is a hybrid promise / stream API. Both the initial promise for getting the Response object AND the stream are governed by the signal. I suppose that I was only expecting the initial Promise for the headers to be controlled by the signal. I was already controlling the consumption of the body stream using the signal via pipeline(). I expected any possible failure modes to result in promise rejections instead of this subtle interaction with node's emitter-based legacy.

@mcollina wdyt? I think it's a reasonable opinion but not sure what to do with it. Would break everyone to change it now.

SilentBot1 pushed a commit to webtorrent/bittorrent-tracker that referenced this issue Nov 29, 2024
…be caught (#539)

* use-native-fetch

* Bugfix: `res.body` should handle errors according to nodejs/undici#3353 (comment)

* Revert "use-native-fetch"

This reverts commit d654603.

* fix-quotes

* some-build-targets-return-different-output
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

7 participants