Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assertion `parser->current_buffer_.IsEmpty()' failed #39671

Closed
lamflam opened this issue Aug 5, 2021 · 32 comments
Closed

Assertion `parser->current_buffer_.IsEmpty()' failed #39671

lamflam opened this issue Aug 5, 2021 · 32 comments
Labels
http Issues or PRs related to the http subsystem.

Comments

@lamflam
Copy link

lamflam commented Aug 5, 2021

Version

v16.6.1

Platform

(AWS EC2) - Linux 33ea5513d673 5.10.25-linuxkit #1 SMP Tue Mar 23 09:27:39 UTC 2021 x86_64 GNU/Linux

Subsystem

http_parser

What steps will reproduce the bug?

Difficult to reproduce, but it's very similar to #34016 in that I'm uploading a lot of content to S3. This error started happening when I switched from another upstream to S3 so it's interesting that both issues seems to be specific to S3. However, even if there's an issue in the AWS S3 SDK, this shouldn't be crashing the node process, but instead bubble up so it can be handled in JS.

Exception message:

node[1]: ../src/node_http_parser.cc:517:static void node::{anonymous}::Parser::Execute(const v8::FunctionCallbackInfo<v8::Value>&): Assertion `parser->current_buffer_.IsEmpty()' failed.
1: 0xafedf0 node::Abort() [node]
2: 0xafee6e  [node]
3: 0xb1a392  [node]
4: 0xd432cb  [node]
5: 0xd4454a  [node]
6: 0xd44a26 v8::internal::Builtin_HandleApiCall(int, unsigned long*, v8::internal::Isolate*) [node]
7: 0x15c9fd9  [node]

How often does it reproduce? Is there a required condition?

Under production load, this consistently happens within a few hours of the service starting.

What is the expected behavior?

I don't know whether this is an application level bug or not, but here's what I would expect:

If an application level bug, the error is raised on the JS side so that it can be handled or debugged at that level.

If it is an node level bug then I'd expect this not to crash or raise an error at all.

What do you see instead?

The above stack trace is printed and the process immediately exits.

Additional information

No response

@gireeshpunathil
Copy link
Member

@nodejs/http

@mcollina
Copy link
Member

mcollina commented Aug 8, 2021

I would love to help but we need a way to reliably reproduce this bug.

@gireeshpunathil
Copy link
Member

I was thinking initially that if we can somehow print the current client request header this can be easily reproduced, but from the assertion error and the failing context, it looks like the parser's byte buffer is not empty in the beginning is the problem, which might lead to a problem in the previous parsing, not in the current?

@mcollina
Copy link
Member

mcollina commented Aug 8, 2021

I was thinking initially that if we can somehow print the current client request header this can be easily reproduced, but from the assertion error and the failing context, it looks like the parser's byte buffer is not empty in the beginning is the problem, which might lead to a problem in the previous parsing, not in the current?

Yes exactly. That's why we need a way to reproduce.
This is likely tied to some specific usage/configuration and possibly even timing :/.

@tniessen tniessen added the http Issues or PRs related to the http subsystem. label Aug 8, 2021
@lamflam
Copy link
Author

lamflam commented Aug 9, 2021

So far I've seen it happen in our production environment very consistently. Within a few hours of a deployment, this crash will occur and then ramp up as the requests get retried over and over. I still can't quite tell if it is the specific request, or if it's timing and the additional load for the retries just increases the chances. I've tried running some load tests locally on my machine as well as in our dev environment but so far have not been able to reproduce.

I will try to do some more testing today to see if I can come up with a reliable reproduction.

@lamflam
Copy link
Author

lamflam commented Aug 10, 2021

I spent a bit of time yesterday load testing in multiple environments and even wrote up a case to just do the upload to S3 part in a tight loop 500K+ times but have been unable to reproduce.

One interesting piece is that I was not seeing this at all in a previous version where I was uploading to a different service. The main difference between the two was that the old one used Transfer-Encoding: chunked but S3 doesn't support that so I no longer stream it through and instead buffer the contents and then upload all at once at the end.

I'v been able to workaround the issue for now by streaming the data to a proxy which then handles the upload to S3. So there's at least one data point indicating it's possible it's only an issue with sending buffered data, not streaming (of course it's still possible it's just a timing thing and purely less likely to happen in streaming mode so I haven't hit it yet).

I can't exactly purposefully reintroduce this in production to get better info, but I eventually hope to remove the workaround and just handle the crash more robustly. If/when I get to that, I will add some more logging and update.

Hopefully someone will come across this with an example that can be reliably reproduced before that.

@mcollina
Copy link
Member

Thanks! Let us know how you proceed.

@gireeshpunathil
Copy link
Member

const h = require('http')
const n = require('net')

function getRequest() {
  const l = Math.round(Math.random() * 1024 * 13)
  let chunk = ''
  const data = 'x'.repeat(l)
  const head = 'POST /chunked HTTP/1.1\r\n' +
               'Content-Type: text/plain\r\n' +
               'Transfer-Encoding: chunked\r\n\r\n'
  for(var i = 1; i < Math.round(Math.random() * 10); i++) {
    chunk += l/i + '\r\n'
    chunk += data.substring(l/i) + '\r\n'
  }
  chunk += '0\r\n'
  return Buffer.from(head + chunk)
}

const s = h.createServer((q, r) => {
  r.end('hello')
})

s.listen(12000, () => {
  setInterval(() => {
    for(var i = 0; i < 100; i++) {
      const d = n.connect({port:12000})
      d.on('error', (e) => {
        console.log(e)
      })
      d.write(getRequest())
    }
 }, 1000)
})

wrote a program from the available information on the context and tried, it does not fail in several attempts. Pasting it here for awareness, as well as incremental improvements that can be realized to make it recreate - as I have several open things in it.

  • a trivial http server
  • a tcp client, so that I can manipulate the HTTP header in my way
  • transfer encoding set to chunked
  • many clients make request with chunked payload
  • the chunk is created at random, in each request

any insights / thoughts?

@ronag
Copy link
Member

ronag commented Aug 11, 2021

I’m a little confused. Isn’t this a http client issue?

@lamflam
Copy link
Author

lamflam commented Aug 11, 2021 via email

@ronag
Copy link
Member

ronag commented Aug 11, 2021

Do you have the same problem if you use https://github.com/nodejs/undici?

@gireeshpunathil
Copy link
Member

const h = require('http')
const c = require('crypto')

const s = h.createServer((q, r) => {
  r.end('hello')
})

s.listen(12000, () => {
  setInterval(() => {
    for(var i = 0; i < 100; i++) {
      const d = h.request({port:12000, method: 'POST'}, (m) => {
        m.on('data', (n) => {})
        m.on('error', (e) => {
          console.log(e)
        })
      })
      d.on('error', (e) => {
        console.log(e)
      })
      d.end(c.randomBytes(Math.round(Math.random() *1024 * 13)))
    }
 }, 1000)
})

modified based on @lamflam 's comments; does not get any error so far.

@radford-for-smpte
Copy link

I've been able to (pretty) consistently reproduce this error using the following simple script. You'll need a bucket and a file to test with. For me it doesn't crash every single time, but almost every time (using node v14.17.5):

#!/usr/bin/env ts-node

import { Upload } from "@aws-sdk/lib-storage";
import { createReadStream } from 'fs'
import { S3Client } from "@aws-sdk/client-s3";
import crypto from 'crypto'

void (async () => {
  const config = getConfig()

  const s3Client: S3Client = new S3Client({region: "us-west-2"})

  for(let x=0; x<200; x++) {

    const parallelUpload = new Upload({
      client: s3Client,
      queueSize: 6,
      partSize: 5 * 1024 * 1024, // optional size of each part
      leavePartsOnError: false, // optional manually handle dropped parts
      params: {
        Bucket: "my-bucket",
        Key: "234343",
        Body: createReadStream("/tmp/myfile-7MB-file")
          .pipe(crypto.createHash('sha1').setEncoding('hex'))
          .on('finish', function () {
            // @ts-ignore
            console.log("Sha is ", this.read())
          })
      },
    });

    await parallelUpload.done()
  }

})().catch(e => {
  console.error(e.message || e)
  process.exit(1)
})

crashes with:

node[6088]: ../src/node_http_parser.cc:510:static void node::(anonymous namespace)::Parser::Execute(const FunctionCallbackInfo<v8::Value> &): Assertion `parser->current_buffer_.IsEmpty()' failed.

@mcollina
Copy link
Member

I am not able to reproduce at all using an adapted version of the above.
Maybe @trivikr could help to reduce the scope / remove the AWS bits.

@mingchuno
Copy link

mingchuno commented Sep 21, 2021

@radford-for-smpte
I have encounter similar issue when running code to upload data stream into AWS S3 using @aws-sdk/lib-storage Upload. It run fine almost every time in my local machine (Macbook Pro 16" Intel running node v14.17.5) with only one time I have seen the exception. But in the production env running EKS over EC2, it fail consistancly at the same point no matter the data size.

AWS ENV:

  • machine type: m5.xlarge
  • container base image tag: node:14.17.5-alpine3.14 (same node version as my local but use alpine instead. I am not sure does it matter)
  • resources (which should be more then enough for my case)
resources:
  limits:
    cpu: 1000m
    memory: 1024Mi
  requests:
    cpu: 1000m
    memory: 1024Mi

Exception:

node[1]: ../src/node_http_parser.cc:510:static void node::{anonymous}::Parser::Execute(const v8::FunctionCallbackInfo<v8::Value>&): Assertion `parser->current_buffer_.IsEmpty()' failed.

Not sure if it related to: #34016 (comment)

@mingchuno
Copy link

Update: upgraded to node:16.9.1-alpine3.14 for the container running in k8s. Same task success / fail randomly which is better then node14 (fail almost all the time in my k8s env) but still doesn't fix the issue.

@mingchuno
Copy link

Update 2: In my application, I need to fetch some JSON data from HTTP and stream upload it to S3 due to possible large chunk to use s3.putObject in one batch. I have a concurrency params in my application to set how many upload happening concurrently (default 4). When I set the concurrency to 1, it seems fix the issue in my k8s env too! I suspect there is some concurrency issue in node/http module?

@mcollina
Copy link
Member

At this point this requires somebody to invest time and money to reproduce the problem and debug why that is happening. I spent two hours trying to reproduce and I could not, so it is likely dependent on 1) cloud environment 2) the individual application.

I recommend to open a bug in that AWS SDK library and/or contact AWS support they are likely the best ones that can help with this problem.

@mingchuno
Copy link

@mcollina Issue have been created on aws-sdk
https://github.com/aws/aws-sdk-js/issues/3896

@naxxster
Copy link

Can this typescript code reproduce the error?
When I use file size of 5MB + 1, there is no assertion failure.
But if I use file size of exact 5MB, there is assertion error.

import { v4 as uuidv4 } from 'uuid';
import { Readable } from 'stream';
import { randomBytes } from 'crypto';
import { S3Client } from '@aws-sdk/client-s3';
import { Upload } from "@aws-sdk/lib-storage";
import moment from 'moment';
import { range } from 'ramda';
import { each } from 'async';

const BucketName = process.env['BUCKET'] || 'bucket-name-placeholder';

const fileSizeCandidate = [ 5, 10, 20, 40 ];

const s3Client = new S3Client({});

const getFileSize = () => fileSizeCandidate[Math.floor(Math.random() * fileSizeCandidate.length)] * 1024 * 1024 + 1;
// Following line of code raise assertion error
//const getFileSize = () => fileSizeCandidate[Math.floor(Math.random() * fileSizeCandidate.length)] * 1024 * 1024;

const randomBytesReadableStream = ({ size = Number.POSITIVE_INFINITY } = {}) => {
  let producedSize = 0;

  return new Readable({
    encoding: 'hex',
    read(readSize) {
      let shouldEnd = false;

      if ((producedSize + readSize) >= size) {
        readSize = size - producedSize;
        shouldEnd = true;
      }

      randomBytes(readSize, (error, buffer) => {
        if (error) {
          this.emit('error', error);
          return;
        }

        producedSize += readSize;
        this.push(buffer);

        if (shouldEnd) {
          this.push(null);
        }
      });
    },
  });
};

export async function handler(): Promise<void> {
  const now = moment();
  const prefix = `${now.format('YYYY-MM-DD')}/${now.format('YYYY-MM-DDTHHmmss.SSS')}-${uuidv4()}/`;

  await each(range(0, 20), async (n) => {
    const key = `${prefix}${n.toString().padStart(2, '0')}.bin`;
    const rs = randomBytesReadableStream({ size: getFileSize() });

    const upload = new Upload({
      client: s3Client,
      params: { Bucket: BucketName, Key: key, Body: rs }
    });

    upload.on('httpUploadProgress', console.log);

    await upload.done();
  });

  const indexKey = `${prefix}index.xml`;

  const items = range(0, 20).map(n => {
    return `<Item>${prefix}${n.toString().padStart(2, '0')}.bin</Item>`;
  }).join('\n');

  const upload = new Upload({
    client: s3Client,
    params: { Bucket: BucketName, Key: indexKey, Body: `<Items>${items}</Items>` }
  });
  await upload.done();
}

handler();

@gugu
Copy link

gugu commented Jan 20, 2022

node 16.13 still has this problem. I also use AWS S3 to upload data

thebengeu added a commit to supabase/storage that referenced this issue Mar 16, 2022
As a temporary mitigation for nodejs/node#39671.
@libre-man
Copy link

So is there any known workaround for this?

@mcollina
Copy link
Member

mcollina commented Apr 9, 2022

Not that I know of. I estimate reproducing and fixing this bug is a 2 to 4 weeks activity. At this time no companies have been willing to invest this time to fix this problem.

@vsosevic
Copy link

I could still reproduce this bug using '@aws-sdk/lib-storage' library, but this was not being reproduced using AWS-SDK of 2 version.
Steps to reproduce - create ~1500 of 0-length files.
In that case it would crash randomly - on 50th file, 250th file, 853d file, etc.
So this bug is not stable.

If you put some data to each file - 8KB-5MB - this bug isn't reproducible.

Surprisingly enough it's working pretty stable without any bugs with AWS-SDK 2 ver.

@ShaunB-SQ
Copy link

Getting this as well. Ended up rolling back all changes for AWS-SDK code to vs 2 code as my work around.

@ShogunPanda
Copy link
Contributor

I was finally able to reproduce this using the following snippet:

import { S3Client } from '@aws-sdk/client-s3'
import { Upload } from '@aws-sdk/lib-storage'
import { randomFillSync } from 'crypto'
import { Readable } from 'stream'
import { v4 as uuidv4 } from 'uuid'

const concurrency = parseInt(process.env['CONCURRENCY'] || 20, 10)
const BucketName = process.env['BUCKET'] || 'bucket-name-placeholder'
const fileSizeCandidate = [5]
const s3Client = new S3Client({})

// If the file size is exactly X MB, then it fail - If it is X MB + 1B, then it will fail
function getFileSize() {
  const failByte = process.env.DONT_FAIL === 'true' ? 1 : 0
  return fileSizeCandidate[Math.floor(Math.random() * fileSizeCandidate.length)] * 1024 * 1024 + failByte
}

function randomReadableStream(size = Number.POSITIVE_INFINITY) {
  const buffer = Buffer.alloc(size)
  randomFillSync(buffer)
  return Readable.from(buffer)
}

function mapRange(size, cb) {
  return Array.from(Array(size)).map((_, i) => cb(i))
}

async function handler() {
  const now = new Date().toISOString()
  const prefix = `${now.slice(0, 10)}/${now}-${uuidv4()}/`

  await Promise.all(
    mapRange(concurrency, async n => {
      const key = `${prefix}${n.toString().padStart(2, '0')}.bin`
      const rs = randomReadableStream(getFileSize())

      const upload = new Upload({
        client: s3Client,
        params: { Bucket: BucketName, Key: key, Body: rs }
      })

      upload.on('httpUploadProgress', console.log)

      await upload.done()
    })
  )

  const indexKey = `${prefix}index.xml`

  const items = mapRange(concurrency, n => {
    return `<Item>${prefix}${n.toString().padStart(2, '0')}.bin</Item>`
  }).join('\n')

  const upload = new Upload({
    client: s3Client,
    params: { Bucket: BucketName, Key: indexKey, Body: `<Items>${items}</Items>` }
  })
  await upload.done()
}

handler()

Note the concurrency matter a little. With higher concurrency it's faster to reproduce the bug but it happens almost always.
I scoped the problem to be a concurrency issue in https://github.com/nodejs/node/blob/master/lib/_http_client.js#L502.
Basically the AWS server sends first a HTTP/1.1 100 Continue as requested and then the HTTP 200 OK. If the two messages are received too closely (less than 5ms away) then the execute is somehow called on the same parser before the previous call even returns.

Being Node.js single thread I don't even know how is that possible and I also was unable to reproduce using Node on the server.

@mcollina @ronag @indutny @nodejs/http Do you have any idea on what I'm missing to consistently reproduce using Node on the server and/or how the all is possible? I'm out of idea at the moment.

@mcollina
Copy link
Member

mcollina commented Jun 5, 2022

The scenario you are describing is not possible being Node.js single threaded. However it's likely the two responses are bundled in the same data chunk and parsed within execute at the same time, synchronously. Looks like a race.

@ShogunPanda
Copy link
Contributor

I also thought it was impossible.
So in your opinion the socket is receiving a single chunk with both response and calling execute twice before the buffer can be cleaned up?

@mcollina
Copy link
Member

mcollina commented Jun 5, 2022

So in your opinion the socket is receiving a single chunk with both response and calling execute twice before the buffer can be cleaned up?

That's my theory.

@ShogunPanda
Copy link
Contributor

I see. I'll investigate tomorrow then.

F3n67u pushed a commit to F3n67u/node that referenced this issue Jun 24, 2022
PR-URL: nodejs#43369
Fixes: nodejs#39671
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
targos pushed a commit that referenced this issue Jul 12, 2022
PR-URL: #43369
Fixes: #39671
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
targos pushed a commit that referenced this issue Jul 31, 2022
PR-URL: #43369
Fixes: #39671
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
guangwong pushed a commit to noslate-project/node that referenced this issue Oct 10, 2022
PR-URL: nodejs/node#43369
Fixes: nodejs/node#39671
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
@bes
Copy link

bes commented Jan 12, 2023

I am consistently reproducing this issue (or a similar one) with node 19.4.0.
Here is a proof-of-concept: https://github.com/bes/aws-sdk-js-v3-s3-exit-0
And another issue i filed on aws js sdk: aws/aws-sdk-js-v3#4332 which finally led me here.

In my case, node js exits with code 0 and no stack trace, if I try to upload a 0-byte file to AWS S3.

@ShogunPanda @mcollina Should I file a new issue or do you want to reopen this one?

@ShogunPanda
Copy link
Contributor

Thanks @bes. Let's file a new issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
http Issues or PRs related to the http subsystem.
Projects
None yet
Development

Successfully merging a pull request may close this issue.