Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default fetch request headers differ between Cloudflare Workers and Miniflare #139

Closed
dfcowell opened this issue Jan 2, 2022 · 6 comments
Labels
behaviour mismatch Different behaviour to Workers runtime fixed-in-next Fixed in next release

Comments

@dfcowell
Copy link

dfcowell commented Jan 2, 2022

When making a fetch request to an external service from a Worker, certain request headers are populated by default. The Workers runtime and miniflare send different default headers, meaning that when calling some external APIs (e.g. Github's which requires a User-Agent header for all requests!) code that works in miniflare will fail when running on the Workers runtime.

To validate this, I set up a simple echo service running on https://http-ua-proof-of-concept.dancowell.workers.dev/ which will return a structured JSON object containing request data.

Executing the following worker in each context results in different output:

export async function handleRequest(request: Request, env: Bindings) {
  const response = await fetch(
    "https://http-ua-proof-of-concept.dancowell.workers.dev/"
  );

  const body = await response.text();

  return new Response(body);
}

const worker: ExportedHandler<Bindings> = { fetch: handleRequest };

export default worker;

Request from miniflare

{
  "method": "GET",
  "url": "https://http-ua-proof-of-concept.dancowell.workers.dev/",
  "body": "",
  "headers": {
    "accept": "*/*",
    "accept-encoding": "gzip",
    "accept-language": "*",
    "cf-connecting-ip": "126.124.253.198",
    "cf-ipcountry": "JP",
    "cf-ray": "6c6fcc948c918d01",
    "cf-visitor": "{\"scheme\":\"https\"}",
    "connection": "Keep-Alive",
    "host": "http-ua-proof-of-concept.dancowell.workers.dev",
    "sec-fetch-mode": "cors",
    "user-agent": "undici",
    "x-forwarded-proto": "https",
    "x-real-ip": "126.124.253.198"
  }
}

Request from Cloudflare

{
  "method": "GET",
  "url": "https://http-ua-proof-of-concept.dancowell.workers.dev/",
  "body": "",
  "headers": {
    "accept-encoding": "gzip",
    "cf-connecting-ip": "35.224.27.5",
    "cf-ipcountry": "US",
    "cf-ray": "6c6fdbd42d216318",
    "cf-visitor": "{\"scheme\":\"https\"}",
    "connection": "Keep-Alive",
    "host": "http-ua-proof-of-concept.dancowell.workers.dev",
    "x-forwarded-proto": "https",
    "x-real-ip": "35.224.27.5"
  }
}

Based on this output, miniflare sends 4 additional headers by default: accept, accept-language, sec-fetch-mode, user-agent

I would expect that miniflare's default request headers should match that of the workers runtime.

@mrbbot
Copy link
Contributor

mrbbot commented Jan 2, 2022

Hey! 👋 I think this is going to be difficult to fix. As you can see from the User-Agent, Miniflare uses undici for its fetch implementation, which aims to be as spec-compliant as possible. These are the 3 sections that add those headers:

https://github.com/nodejs/undici/blob/2158dd84c4c6a468eed317fc39e00d797e8d51f5/lib/fetch/index.js#L418-L444

https://github.com/nodejs/undici/blob/2158dd84c4c6a468eed317fc39e00d797e8d51f5/lib/fetch/index.js#L1125-L1127

https://github.com/nodejs/undici/blob/1f22c1711e58f2f5389357949b90df7ee06568ba/lib/fetch/util.js#L197

For accept, accept-language, and user-agent, we could add empty strings for the corresponding headers by default, but they'd still be sent, which still isn't the Cloudflare Behaviour. It doesn't look like we can do anything about sec-fetch-mode at all though.

Not sure what to do here, we could fork undici, but ideally I don't want to do that.

@mrbbot mrbbot added the dependency Issue in dependency label Jan 2, 2022
@dfcowell
Copy link
Author

dfcowell commented Jan 2, 2022

I'm going to speculate for a minute here. I'm guessing that the desire to use undici comes from that very same spec-compliance - it doesn't make sense for the Miniflare project to maintain its own fetch implementation with all the classes that come along with that.

Forking undici brings on that very same maintenance burden, with extra steps (since you also have to incorporate upstream changes into your fork.) Therefore it's not really a great option.

This might be a terrible idea, but what if Miniflare ran its own lightweight proxy endpoint that stripped the offending headers from outbound requests before forwarding to the intended destination, and in this shim fetch function rewrote the URI passed into fetch to send it via that proxy?

You would keep the benefits of having a spec-compatible internal API, while also being compliant with the CF implementation. There would be a small maintenance cost for the proxy, but I imagine it would probably be substantially lower compared to maintaining a fork, particularly considering the very specific transform required.

There would be some hoops to jump through when it comes to telling the proxy what to strip, but I imagine something like this would do the trick:

fetch('http://example.com', {
  headers: {
    'accept-language': 'en'
  }
});

Proxy request: GET 127.0.0.1:1500?keep=accept-language&url=http%3A%2F%2Fexample.com
                         ^          ^                   ^
                         │          │                   └ Original URL
                         │          └ The proxy will strip all headers except these
                         │            (We know what to keep since we control the shim)
                         └ Miniflare's proxy

@mrbbot
Copy link
Contributor

mrbbot commented Jan 2, 2022

I'm going to speculate for a minute here. I'm guessing that the desire to use undici comes from that very same spec-compliance - it doesn't make sense for the Miniflare project to maintain its own fetch implementation with all the classes that come along with that.

Exactly 👍

That's a sneaky idea with the proxy.

I'm actually wondering if we could do this with a custom undici Dispatcher? undici.request lets you specify a custom one, but it looks like fetch doesn't (probably for spec-compliance 🙃). It looks like Dispatcher has a request instance method but is missing a fetch so maybe this is coming? We could maybe use the setGlobalDispatcher method, but I'm not sure about modifying globals like this.

@dfcowell
Copy link
Author

dfcowell commented Jan 2, 2022

Modifying the global dispatcher smells really bad to me. If anything else in Miniflare uses undici in the future (even indirectly) it would be quite difficult without context of this issue to understand why those requests are getting mangled.

mrbbot added a commit that referenced this issue Jan 4, 2022
Specifically, remove `Accept`, `Accept-Language`, `Sec-Fetch-Mode`
and `User-Agent`, unless they are explicitly added by the user.
`User-Agent` is particularly important to remove as many APIs (e.g.
GitHub), require it to be set, meaning code was working locally but
not when deployed.

The implementation of this relies on some internal `undici` exports.
We were already using some internal symbols for accessing hidden
`Request`/`Response` state, so this isn't too bad. The version of
`undici` is now pinned to an exact version though, so if internal
changes are made, they shouldn't affect end-users. We'll have to
explicitly upgrade and run our tests again.
@mrbbot mrbbot added behaviour mismatch Different behaviour to Workers runtime fixed-in-next Fixed in next release and removed dependency Issue in dependency labels Jan 4, 2022
@mrbbot
Copy link
Contributor

mrbbot commented Jan 4, 2022

Hey! 👋 I noticed undici's fetch implementation does actually allow you to pass a custom Dispatcher, but the exported function always uses the default. I've switched to importing the implementation directly, and use a custom Dispatcher to remove the headers. We were already using internal imports like this, but just to be extra safe, I've pinned undici to an exact version too. When we need to upgrade, we can run all the tests again and update code as needed. 👍

Will be fixed in the next release. 🙂

mrbbot added a commit that referenced this issue Jan 4, 2022
Specifically, remove `Accept`, `Accept-Language`, `Sec-Fetch-Mode`
and `User-Agent`, unless they are explicitly added by the user.
`User-Agent` is particularly important to remove as many APIs (e.g.
GitHub), require it to be set, meaning code was working locally but
not when deployed.

The implementation of this relies on some internal `undici` exports.
We were already using some internal symbols for accessing hidden
`Request`/`Response` state, so this isn't too bad. The version of
`undici` is now pinned to an exact version though, so if internal
changes are made, they shouldn't affect end-users. We'll have to
explicitly upgrade and run our tests again.
@mrbbot mrbbot closed this as completed in d99015a Jan 7, 2022
@mrbbot
Copy link
Contributor

mrbbot commented Jan 7, 2022

Hey! 👋 Miniflare 2.0.0 has just been released, which includes this fix. You can find the changes since 2.0.0-rc.5 at the end of the changelog and install it with npm i miniflare -D. Please let me know if you have any other issues, and feel free to ask questions in the #miniflare channel of the Cloudflare Workers Discord server.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
behaviour mismatch Different behaviour to Workers runtime fixed-in-next Fixed in next release
Projects
None yet
Development

No branches or pull requests

2 participants