
Duplicated values in Sentry Baggage header cause eventual 431 "Request header fields too large" error on HTTP fetch #3709

Closed
jmo1012 opened this issue Oct 28, 2024 · 8 comments · Fixed by #3728

jmo1012 commented Oct 28, 2024

Issue:
While polling an external API endpoint, the baggage header grows with each iteration of the poll, with sentry-trace_id, sentry-environment, and sentry-release repeated in each loop. After a while (if the poll does not resolve), the header eventually grows so big that the request fails with a 431 "Request header fields too large" error on HTTP fetch.
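For illustration (the values below are made up), the outgoing baggage header after a few iterations looks something like this, with the same Sentry entries appended over and over:

baggage: sentry-trace_id=771a43a4192642f0b136d5159a501700,sentry-environment=production,sentry-release=1.0.0,sentry-trace_id=771a43a4192642f0b136d5159a501700,sentry-environment=production,sentry-release=1.0.0,...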

Similar Issues

SDK Version:
sentry-sdk==2.13.0

Example:

import asyncio

from httpx import AsyncClient


async def poll():  # MY_URL, MY_BODY, is_successful_response defined elsewhere
    is_still_processing = True
    while is_still_processing:
        async with AsyncClient() as http_client:
            http_request = http_client.build_request(
                method="GET",
                url=MY_URL,
                content=MY_BODY,
                headers={"content-type": "application/json"},
            )

            http_response = await http_client.send(request=http_request)
            http_response.raise_for_status()

            is_still_processing = is_successful_response(http_response)
            if not is_still_processing:
                return http_response.content
            await asyncio.sleep(0.5)

In the above example, the baggage header in the request grows with each iteration.

Note
The trace_propagation_targets option can be used to bypass the issue, but it doesn't fix the root cause.
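For example, a minimal sketch of that workaround (the hostname below is a placeholder): Sentry only attaches sentry-trace and baggage headers to requests whose URL matches trace_propagation_targets, so excluding the polled API keeps its headers untouched.

import sentry_sdk

# Restrict trace propagation to our own backend; requests to the polled
# external API then never receive (or accumulate) Sentry baggage.
# "my-own-backend.example.com" is a placeholder hostname.
sentry_sdk.init(
    dsn="...",  # your DSN
    traces_sample_rate=1.0,
    trace_propagation_targets=["my-own-backend.example.com"],
)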

The javascript lib issue was fixed by using .set() instead of .append(), so the Python lib might be resolved in a similar manner?

szokeasaurusrex (Member) commented Oct 29, 2024

@jmo1012 thanks for the report.

> The javascript lib issue was fixed by using .set() instead of .append(), so the Python lib might be resolved in a similar manner?

Is this the issue you are referring to? If not, can you please provide the link to the issue you are referring to?

In the future, please link to any specific issues you are mentioning in your original post so that we can help you more quickly.

Never mind, I see you linked it already earlier in the issue, sorry!

szokeasaurusrex (Member) commented:

@jmo1012 Would you be able to provide a full minimal reproduction? I tried with the following code (based on the snippet you provided), but the baggage did not grow over time:

from httpx import AsyncClient
import asyncio
import sentry_sdk

from fastapi import FastAPI
import multiprocessing
import uvicorn

MY_URL = "http://localhost:8000/status"
MY_BODY = '{"job_id": "123"}'


def is_successful_response(response) -> bool:
    # Always return True to indicate it's still processing
    return True


def run_server():
    app = FastAPI()

    @app.get("/status")
    async def status():
        return {"status": "processing"}

    uvicorn.run(app, host="127.0.0.1", port=8000)


async def poll_until_complete():
    sentry_sdk.init(dsn=..., traces_sample_rate=1.0)  # replace with your DSN
    await asyncio.sleep(1.0)
    is_still_processing = True

    with sentry_sdk.start_transaction(name="poll_until_complete"):
        while is_still_processing:
            async with AsyncClient() as http_client:
                http_request = http_client.build_request(
                    method="GET",
                    url=MY_URL,
                    content=MY_BODY,
                    headers={"content-type": "application/json"},
                )

                http_response = await http_client.send(request=http_request)
                http_response.raise_for_status()

                is_still_processing = is_successful_response(http_response)
                if not is_still_processing:
                    return http_response.content
                await asyncio.sleep(0.1)


if __name__ == "__main__":
    # Start mock server in a child process
    server_process = multiprocessing.Process(target=run_server)
    server_process.start()

    asyncio.run(poll_until_complete())

jmo1012 (Author) commented Oct 30, 2024

@szokeasaurusrex - I am working on this for you. I will get back to you as soon as I build one.

jmo1012 (Author) commented Oct 31, 2024

@szokeasaurusrex - Thank you for your patience. I was able to create a working reproduction script. I didn't realize that the key to this issue was the placement of async with AsyncClient()...

from httpx import AsyncClient
import asyncio
import sentry_sdk

from fastapi import FastAPI
import multiprocessing
import uvicorn

# Note: MY_URL points at a public API; the local mock server started below
# is left over from the earlier snippet and is not actually used.
MY_URL = "https://catfact.ninja/fact?max_length=140"

def is_successful_response(response) -> bool:
    # Always return True to indicate it's still processing
    return True


def run_server():
    app = FastAPI()

    @app.get("/status")
    async def status():
        return {"status": "processing"}

    uvicorn.run(app, host="127.0.0.1", port=8000)


async def poll_until_complete():
    sentry_sdk.init(dsn=..., traces_sample_rate=1.0)  # replace with your DSN
    await asyncio.sleep(1.0)
    is_still_processing = True

    with sentry_sdk.start_transaction(name="poll_until_complete"):
        async with AsyncClient() as http_client:
            http_request = http_client.build_request(
                method="GET",
                url=MY_URL,
                headers={"content-type": "application/json"},
            )
            while is_still_processing:
                http_response = await http_client.send(request=http_request)
                http_response.raise_for_status()

                is_still_processing = is_successful_response(http_response)
                if not is_still_processing:
                    return http_response.content
                await asyncio.sleep(0.1)


if __name__ == "__main__":
    # Start mock server in a child process
    server_process = multiprocessing.Process(target=run_server)
    server_process.start()

    asyncio.run(poll_until_complete())

For reference (if it is helpful), these are the requirements I used:

sentry-sdk==2.13.0
fastapi==0.112.1
uvicorn[standard]==0.30.6
httpx==0.27.0

szokeasaurusrex (Member) commented Oct 31, 2024

Thanks @jmo1012! I will look at this next week (tomorrow is a holiday in Austria)

jmo1012 (Author) commented Oct 31, 2024

> Thanks @jmo1012! I will look at this next week (tomorrow is a holiday in Austria)

Happy All Saints' Day!

szokeasaurusrex (Member) commented:

@jmo1012 Thank you for the reproduction; I also observed the baggage growing with each request when using your code.

The problem is not actually with the placement of async with AsyncClient; rather, you are seeing the baggage grow because you are re-sending the same request object, like so:

http_request = http_client.build_request(
    method="GET",
    url=MY_URL,
    headers={"content-type": "application/json"},
)
while is_still_processing:
    http_response = await http_client.send(request=http_request)  # same request each iteration
    ...

If you put the build_request inside the loop, instead, the problem goes away:

while is_still_processing:
    http_request = http_client.build_request(
        method="GET",
        url=MY_URL,
        headers={"content-type": "application/json"},
    )
    http_response = await http_client.send(request=http_request)  # new request each iteration

You are observing this behavior because we always append the Sentry baggage to any existing baggage on the request, so that we don't overwrite any existing baggage set by the user. Likely, we should instead parse the baggage and overwrite only the Sentry entries while keeping any other baggage on the request.
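For illustration, a minimal sketch of what "strip the Sentry entries, keep the rest" could look like (the helper below is hypothetical, not the SDK's actual implementation):

def strip_sentry_baggage(header: str) -> str:
    # Drop all sentry-* members from an existing baggage header value,
    # keeping any third-party entries intact.
    return ",".join(
        member
        for member in header.split(",")
        if not member.strip().startswith("sentry-")
    )


# Usage: strip stale Sentry entries before appending fresh ones.
existing = "other-vendor=foo,sentry-trace_id=abc,sentry-release=1.0.0"
print(strip_sentry_baggage(existing))  # -> "other-vendor=foo"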

szokeasaurusrex self-assigned this Nov 4, 2024
jmo1012 (Author) commented Nov 4, 2024

Understood. Thank you!

szokeasaurusrex added a commit that referenced this issue Nov 4, 2024
Sentry baggage will get added to an HTTPX request multiple times if the same request is repeated. To prevent this from occurring, we can strip any existing Sentry baggage before adding Sentry baggage to the request.

Fixes #3709
antonpirker added a commit that referenced this issue Nov 21, 2024
Sentry baggage will get added to an HTTPX request multiple times if the same request is repeated. To prevent this from occurring, we can strip any existing Sentry baggage before adding Sentry baggage to the request.

Fixes #3709

---------

Co-authored-by: Ivana Kellyer <ivana.kellyer@sentry.io>
Co-authored-by: Anton Pirker <anton.pirker@sentry.io>