Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

http: interpret strings as utf8 #2629

Closed
wants to merge 1 commit into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 4 additions & 1 deletion src/node_http_parser.cc
Original file line number Diff line number Diff line change
Expand Up @@ -136,7 +136,10 @@ struct StringPtr {

Local<String> ToString(Environment* env) const {
if (str_)
return OneByteString(env->isolate(), str_, size_);
return String::NewFromUtf8(env->isolate(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides the question "is this more correct?", I am curious about the performance impact this may have.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The benchmarks would be bias in out current benchmarks. We need to set some up that have various header lengths.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious too though, aren't HTTP headers supposed to be ASCII and not unicode?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bnoordhuis gave an explanation here: #1693 (comment). Basically that most headers use US-ASCII, but traditionally ISO-8859-1 (i.e. latin-1) is allowed. latin-1 is the encoding returned by OneByteString(). So is what we currently use. Note that ASCII is a subset of latin-1.

While our current implementation is definitely faster, there's a problem with developers/companies moving from v0.10 to v4 LTS being hit by the change in functionality. Any solution outside of what we currently use will be a performance hit. How much will depend on the chosen solution.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So going with what Ben said, why the change to String::NewFromUtf8? Does v0.10 read UTF-8 strings here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ronkorving Yes. v0.10 reads in headers as utf-8. Which is incorrect per the specification.

Thing is, if everyone followed US-ASCII then functionally we wouldn't notice a difference. It's not until someone uses the full latin-1 encoding (which is spec compliant) that they'd experience issues.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, thanks.

str_,
String::kNormalString,
size_);
else
return String::Empty(env->isolate());
}
Expand Down
28 changes: 28 additions & 0 deletions test/parallel/test-http-utf-incoming.js
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about

'use strict';

const common = require('../common');
const assert = require('assert');
const http = require('http');

http.createServer(common.mustCall(function(req, res) {
  res.end();
  this.close();
  assert.equal(req.url, '/?тест');
  assert.equal(req.headers['x-test'], 'тест');
})).listen(common.PORT, function() {
  http.request({
    port: common.PORT,
    path: new Buffer('/?тест').toString('binary'),
    headers: {
      'x-test': new Buffer('тест').toString('binary')
    }
  }).end();
});

'use strict';
var common = require('../common');
var assert = require('assert');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use const for these. :)


var http = require('http');

var req;

process.on('exit', function() {
assert.equal(req.url, '/?тест');
assert.equal(req.headers['x-test'], 'тест');
});


http.createServer(function(req_, res) {
req = req_;
res.end();
this.close();
}).listen(common.PORT, function() {
http.request({
port: common.PORT,
path: new Buffer('/?тест').toString('binary'),
headers: {
'x-test': new Buffer('тест').toString('binary')
}
}).end();
});