
Duplicated values in Sentry Baggage header cause eventual 431 "Request header fields too large" error on HTTP fetch #3709

Closed
jmo1012 opened this issue Oct 28, 2024 · 8 comments · Fixed by #3728

jmo1012 commented Oct 28, 2024

Issue:
While polling an external API endpoint, the baggage header grows with each iteration of the poll, with sentry-trace_id, sentry-environment, and sentry-release repeated in each loop. After a while (if the poll does not resolve), the header eventually grows so big that the request fails with a 431 "Request header fields too large" error on HTTP fetch.
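For illustration (the values below are made up), the outgoing baggage header after a few iterations looks something like this, with the same Sentry entries appended over and over:

baggage: sentry-trace_id=771a43a4192642f0b136d5159a501700,sentry-environment=production,sentry-release=1.0.0,sentry-trace_id=771a43a4192642f0b136d5159a501700,sentry-environment=production,sentry-release=1.0.0,...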

Similar Issues

SDK Version:
sentry-sdk==2.13.0

Example:

import asyncio

from httpx import AsyncClient


async def poll():  # MY_URL, MY_BODY, is_successful_response defined elsewhere
    is_still_processing = True
    while is_still_processing:
        async with AsyncClient() as http_client:
            http_request = http_client.build_request(
                method="GET",
                url=MY_URL,
                content=MY_BODY,
                headers={"content-type": "application/json"},
            )

            http_response = await http_client.send(request=http_request)
            http_response.raise_for_status()

            is_still_processing = is_successful_response(http_response)
            if not is_still_processing:
                return http_response.content
            await asyncio.sleep(0.5)

In the above example, the baggage header in the request grows with each iteration.

Note
The trace_propagation_targets option can be used to bypass the issue, but it doesn't fix the root cause.
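For example, a minimal sketch of that workaround (the hostname below is a placeholder): Sentry only attaches sentry-trace and baggage headers to requests whose URL matches trace_propagation_targets, so excluding the polled API keeps its headers untouched.

import sentry_sdk

# Restrict trace propagation to our own backend; requests to the polled
# external API then never receive (or accumulate) Sentry baggage.
# "my-own-backend.example.com" is a placeholder hostname.
sentry_sdk.init(
    dsn="...",  # your DSN
    traces_sample_rate=1.0,
    trace_propagation_targets=["my-own-backend.example.com"],
)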

The javascript lib issue was fixed by using .set() instead of .append(), so the Python lib might be resolved in a similar manner?

szokeasaurusrex (Member) commented Oct 29, 2024

@jmo1012 thanks for the report.

> The javascript lib issue was fixed by using .set() instead of .append(), so the Python lib might be resolved in a similar manner?

Is this the issue you are referring to? If not, can you please provide the link to the issue you are referring to?

In the future, please link to any specific issues you are mentioning in your original post so that we can help you more quickly.

Never mind, I see you linked it already earlier in the issue, sorry!

szokeasaurusrex (Member) commented:

@jmo1012 Would you be able to provide a full minimal reproduction? I tried with the following code (based on the snippet you provided), but the baggage did not grow over time:

from httpx import AsyncClient
import asyncio
import sentry_sdk

from fastapi import FastAPI
import multiprocessing
import uvicorn

MY_URL = "http://localhost:8000/status"
MY_BODY = '{"job_id": "123"}'


def is_successful_response(response) -> bool:
    # Always return True to indicate it's still processing
    return True


def run_server():
    app = FastAPI()

    @app.get("/status")
    async def status():
        return {"status": "processing"}

    uvicorn.run(app, host="127.0.0.1", port=8000)


async def poll_until_complete():
    sentry_sdk.init(dsn=..., traces_sample_rate=1.0)  # replace with your DSN
    await asyncio.sleep(1.0)
    is_still_processing = True

    with sentry_sdk.start_transaction(name="poll_until_complete"):
        while is_still_processing:
            async with AsyncClient() as http_client:
                http_request = http_client.build_request(
                    method="GET",
                    url=MY_URL,
                    content=MY_BODY,
                    headers={"content-type": "application/json"},
                )

                http_response = await http_client.send(request=http_request)
                http_response.raise_for_status()

                is_still_processing = is_successful_response(http_response)
                if not is_still_processing:
                    return http_response.content
                await asyncio.sleep(0.1)


if __name__ == "__main__":
    # Start mock server in a child process
    server_process = multiprocessing.Process(target=run_server)
    server_process.start()

    asyncio.run(poll_until_complete())

jmo1012 (Author) commented Oct 30, 2024

@szokeasaurusrex - I am working on this for you. I will get back to you as soon as I build one.

jmo1012 (Author) commented Oct 31, 2024

@szokeasaurusrex - Thank you for your patience. I was able to create a working reproduction script. I didn't realize that the key to this issue was the placement of async with AsyncClient()...

from httpx import AsyncClient
import asyncio
import sentry_sdk

from fastapi import FastAPI
import multiprocessing
import uvicorn

# Note: MY_URL points at a public API; the local mock server started below
# is left over from the earlier snippet and is not actually used.
MY_URL = "https://catfact.ninja/fact?max_length=140"

def is_successful_response(response) -> bool:
    # Always return True to indicate it's still processing
    return True


def run_server():
    app = FastAPI()

    @app.get("/status")
    async def status():
        return {"status": "processing"}

    uvicorn.run(app, host="127.0.0.1", port=8000)


async def poll_until_complete():
    sentry_sdk.init(dsn=..., traces_sample_rate=1.0)  # replace with your DSN
    await asyncio.sleep(1.0)
    is_still_processing = True

    with sentry_sdk.start_transaction(name="poll_until_complete"):
        async with AsyncClient() as http_client:
            http_request = http_client.build_request(
                method="GET",
                url=MY_URL,
                headers={"content-type": "application/json"},
            )
            while is_still_processing:
                http_response = await http_client.send(request=http_request)
                http_response.raise_for_status()

                is_still_processing = is_successful_response(http_response)
                if not is_still_processing:
                    return http_response.content
                await asyncio.sleep(0.1)


if __name__ == "__main__":
    # Start mock server in a child process
    server_process = multiprocessing.Process(target=run_server)
    server_process.start()

    asyncio.run(poll_until_complete())

For reference (if it is helpful), these are the requirements I used:

sentry-sdk==2.13.0
fastapi==0.112.1
uvicorn[standard]==0.30.6
httpx==0.27.0

szokeasaurusrex (Member) commented Oct 31, 2024

Thanks @jmo1012! I will look at this next week (tomorrow is a holiday in Austria)

jmo1012 (Author) commented Oct 31, 2024

> Thanks @jmo1012! I will look at this next week (tomorrow is a holiday in Austria)

Happy All Saints' Day!

szokeasaurusrex (Member) commented:

@jmo1012 Thank you for the reproduction; I also observed the baggage growing with each request when using your code.

The problem is not actually with the placement of async with AsyncClient; rather, you are seeing the baggage grow because you are re-sending the same request object, like so:

http_request = http_client.build_request(
    method="GET",
    url=MY_URL,
    headers={"content-type": "application/json"},
)
while is_still_processing:
    http_response = await http_client.send(request=http_request)  # same request each iteration
    ...

If you put the build_request inside the loop, instead, the problem goes away:

while is_still_processing:
    http_request = http_client.build_request(
        method="GET",
        url=MY_URL,
        headers={"content-type": "application/json"},
    )
    http_response = await http_client.send(request=http_request)  # new request each iteration

You are observing this behavior because we always append the Sentry baggage to any existing baggage on the request, so that we don't overwrite any existing baggage set by the user. Likely, we should instead parse the baggage and overwrite only the Sentry entries while keeping any other baggage on the request.
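For illustration, a minimal sketch of what "strip the Sentry entries, keep the rest" could look like (the helper below is hypothetical, not the SDK's actual implementation):

def strip_sentry_baggage(header: str) -> str:
    # Drop all sentry-* members from an existing baggage header value,
    # keeping any third-party entries intact.
    return ",".join(
        member
        for member in header.split(",")
        if not member.strip().startswith("sentry-")
    )


# Usage: strip stale Sentry entries before appending fresh ones.
existing = "other-vendor=foo,sentry-trace_id=abc,sentry-release=1.0.0"
print(strip_sentry_baggage(existing))  # -> "other-vendor=foo"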

szokeasaurusrex self-assigned this Nov 4, 2024
jmo1012 (Author) commented Nov 4, 2024

Understood. Thank you!

szokeasaurusrex added a commit that referenced this issue Nov 4, 2024
Sentry baggage will get added to an HTTPX request multiple times if the same request is repeated. To prevent this from occurring, we can strip any existing Sentry baggage before adding Sentry baggage to the request.

Fixes #3709
antonpirker added a commit that referenced this issue Nov 21, 2024
Sentry baggage will get added to an HTTPX request multiple times if the same request is repeated. To prevent this from occurring, we can strip any existing Sentry baggage before adding Sentry baggage to the request.

Fixes #3709

---------

Co-authored-by: Ivana Kellyer <ivana.kellyer@sentry.io>
Co-authored-by: Anton Pirker <anton.pirker@sentry.io>