benchmark: http/simple is unreliable #8139
Comments
/cc @AndreasMadsen
You should upload the raw ( I got 1.655% (and 1.669% unbiased) on
here is the raw data (collected with note that I used
I must admit I'm not very familiar with the coefficient of variation and had to read up on it. However, remember that the coefficient of variation is invariant to the number of observations, whereas the number of observations greatly affects the standard deviation of the sample mean. I'm curious, do you have any source on the "> 1-2% is so unreliable as to be worthless", or is it completely your own opinion? Personally I have always used confidence intervals instead (though they are not completely compatible). In this case I think the confidence interval is reasonable, though it is easier to comment on once one has made some actual changes.
I don't have it, I scraped the numbers from the screen output. Here they are if you want them:
(6129.01, 6273.31, 6081.69, 6606.24, 4287.8, 6481.24, 6126.81, 6326.17, 5757.36, 6343.68) # http/simple.js c=50 chunks=0 length=4 type="bytes"
(4825.36, 4892.82, 4745.18, 5210.71, 4141.06, 4740.05, 4723.65, 4705.13, 4632.6, 4450.33) # http/simple.js c=500 chunks=0 length=4 type="bytes"
(5387.29, 5432.56, 5581.44, 5951.68, 5284.43, 6043.86, 5618.86, 6103.43, 5969.04, 5457.41) # http/simple.js c=50 chunks=1 length=4 type="bytes"
(4345.28, 4619.93, 4577.87, 4433.45, 4219.12, 4456.08, 4700.55, 4528.67, 4214.6, 4256.32) # http/simple.js c=500 chunks=1 length=4 type="bytes"
(1131.76, 1162.73, 1170.54, 1155.69, 1163.2, 1170.81, 1157.91, 1159.79, 1169.75, 1135.27) # http/simple.js c=50 chunks=4 length=4 type="bytes"
(2541.06, 2796.41, 2702.26, 2919.21, 2553.31, 2926.16, 2821.81, 2876.22, 2850.04, 2985.13) # http/simple.js c=500 chunks=4 length=4 type="bytes"
(6082.11, 6247.37, 5885.72, 5948.79, 5321.72, 6403.99, 6078.45, 5628.47, 5651.41, 6261.07) # http/simple.js c=50 chunks=0 length=1024 type="bytes"
(4743.77, 4143.62, 5279.55, 4741.32, 4481.43, 4167.41, 4573.96, 4629.41, 5155.01, 4568.6) # http/simple.js c=500 chunks=0 length=1024 type="bytes"
(5106.41, 5566.92, 5262.0, 6164.03, 5726.01, 5374.95, 5310.74, 5311.06, 5355.57, 5767.94) # http/simple.js c=50 chunks=1 length=1024 type="bytes"
(4414.44, 3608.63, 4828.11, 4692.2, 4393.0, 4002.1, 4219.51, 4692.33, 4057.87, 4766.84) # http/simple.js c=500 chunks=1 length=1024 type="bytes"
(1164.83, 1154.38, 1148.38, 1155.99, 1167.51, 1166.54, 1150.04, 1167.37, 1161.98, 1161.54) # http/simple.js c=50 chunks=4 length=1024 type="bytes"
(3008.63, 2792.37, 2733.67, 2750.96, 2694.03, 2738.08, 2781.87, 2670.81, 2834.29, 2718.33) # http/simple.js c=500 chunks=4 length=1024 type="bytes"
(1603.17, 1540.93, 1594.18, 1641.75, 1573.88, 1569.05, 1575.24, 1590.35, 1625.87, 1559.17) # http/simple.js c=50 chunks=0 length=102400 type="bytes"
(1447.06, 1443.11, 1415.4, 1474.7, 1493.6, 1405.53, 1437.9, 1538.42, 1473.68, 1452.95) # http/simple.js c=500 chunks=0 length=102400 type="bytes"
(942.54, 880.88, 955.97, 964.18, 980.56, 974.08, 972.91, 953.37, 937.53, 961.64) # http/simple.js c=50 chunks=1 length=102400 type="bytes"
(862.65, 836.06, 895.56, 889.77, 872.32, 758.42, 839.46, 924.86, 881.89, 885.89) # http/simple.js c=500 chunks=1 length=102400 type="bytes"
(1464.56, 1260.95, 1648.63, 1652.4, 1640.25, 1567.12, 1474.97, 1634.81, 1589.97, 1668.75) # http/simple.js c=50 chunks=4 length=102400 type="bytes"
(1394.29, 1380.53, 1522.57, 1635.81, 1392.91, 1287.14, 1358.17, 1494.43, 1511.81, 1567.85) # http/simple.js c=500 chunks=4 length=102400 type="bytes"
(5731.38, 5465.33, 5866.3, 5828.25, 6202.44, 5775.79, 6062.04, 5762.29, 5827.93, 5655.72) # http/simple.js c=50 chunks=0 length=4 type="buffer"
(4430.37, 4247.96, 4499.54, 4767.27, 4689.36, 4642.78, 4764.07, 4820.84, 4822.76, 4869.27) # http/simple.js c=500 chunks=0 length=4 type="buffer"
(5145.11, 5924.89, 5986.92, 5912.23, 5506.47, 5318.37, 5949.75, 5999.83, 5642.12, 5977.34) # http/simple.js c=50 chunks=1 length=4 type="buffer"
(3883.17, 4350.6, 5428.7, 4719.65, 4687.47, 4163.63, 4394.01, 4407.87, 4223.54, 4523.87) # http/simple.js c=500 chunks=1 length=4 type="buffer"
(5025.94, 5146.72, 5525.48, 5664.2, 5617.77, 4786.47, 5290.52, 5482.93, 5229.53, 5712.98) # http/simple.js c=50 chunks=4 length=4 type="buffer"
(3958.14, 3845.48, 4587.39, 4394.0, 4013.58, 4294.32, 4373.46, 4594.54, 4041.38, 4054.52) # http/simple.js c=500 chunks=4 length=4 type="buffer"
(5694.81, 5034.5, 5892.08, 5537.23, 5548.67, 5930.73, 5928.66, 5375.86, 5801.77, 6505.35) # http/simple.js c=50 chunks=0 length=1024 type="buffer"
(4653.95, 4331.3, 4529.86, 3842.34, 5162.96, 4262.19, 4129.83, 4178.14, 4781.28, 5165.36) # http/simple.js c=500 chunks=0 length=1024 type="buffer"
(5377.8, 5404.16, 5598.9, 5266.68, 5889.52, 5673.61, 5794.47, 5470.61, 5643.74, 5724.31) # http/simple.js c=50 chunks=1 length=1024 type="buffer"
(4273.99, 3851.52, 4485.49, 3888.94, 5008.76, 3951.45, 4611.87, 3798.13, 4100.02, 4239.95) # http/simple.js c=500 chunks=1 length=1024 type="buffer"
(4445.57, 4769.13, 5392.51, 4952.31, 5239.15, 4765.72, 5367.59, 4270.06, 5311.94, 4462.06) # http/simple.js c=50 chunks=4 length=1024 type="buffer"
(3940.47, 3633.86, 4603.27, 3572.15, 4300.59, 3813.66, 3837.7, 3929.49, 3944.15, 4580.91) # http/simple.js c=500 chunks=4 length=1024 type="buffer"
(4422.92, 4584.11, 4689.41, 4339.86, 4774.42, 4323.61, 4298.2, 4359.04, 4381.4, 4345.08) # http/simple.js c=50 chunks=0 length=102400 type="buffer"
(3414.73, 3235.97, 3658.03, 3576.69, 3354.0, 2927.27, 3288.56, 3034.4, 3358.44, 3391.09) # http/simple.js c=500 chunks=0 length=102400 type="buffer"
(3743.97, 4326.18, 4421.93, 4421.81, 3840.19, 4099.92, 4555.21, 3934.28, 4118.16, 4219.32) # http/simple.js c=50 chunks=1 length=102400 type="buffer"
(2866.77, 3301.74, 3674.63, 3477.26, 3251.45, 3319.95, 3789.7, 3075.3, 3069.91, 3437.71) # http/simple.js c=500 chunks=1 length=102400 type="buffer"
(3986.23, 3926.79, 3964.04, 3871.27, 3534.66, 4061.72, 4307.7, 3728.58, 3813.63, 4106.0) # http/simple.js c=50 chunks=4 length=102400 type="buffer"
(2869.64, 3088.59, 3151.95, 2962.71, 3098.05, 3423.59, 3134.49, 3172.54, 3209.03, 3171.31) # http/simple.js c=500 chunks=4 length=102400 type="buffer"
I didn't copy the std(x)/mean(x) column because it wouldn't fit in the GH comment. Apologies if that was confusing. (I use std(x)/median(x) as a quick gauge for means that are distorted by big outliers. Like you observed, not an issue here.)
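For anyone who wants to recompute the columns from the raw numbers, here is a minimal sketch. It uses the ten runs quoted above for c=50 chunks=4 length=4 type="bytes". The helper names are illustrative, and the sample standard deviation (n - 1 in the denominator) is an assumption, since the thread does not say which estimator produced the original figures.

```javascript
// Ten runs quoted in the thread for c=50 chunks=4 length=4 type="bytes".
const sampleRuns = [1131.76, 1162.73, 1170.54, 1155.69, 1163.2,
                    1170.81, 1157.91, 1159.79, 1169.75, 1135.27];

function mean(xs) {
  return xs.reduce((sum, x) => sum + x, 0) / xs.length;
}

function median(xs) {
  const sorted = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Sample standard deviation; the n - 1 denominator is an assumption.
function std(xs) {
  const m = mean(xs);
  const sumSquares = xs.reduce((sum, x) => sum + (x - m) ** 2, 0);
  return Math.sqrt(sumSquares / (xs.length - 1));
}

// The two spread measures discussed above, both as percentages.
function summarize(xs) {
  const m = mean(xs);
  const med = median(xs);
  const s = std(xs);
  return {
    mean: m,
    median: med,
    std: s,
    stdOverMeanPct: 100 * s / m,
    stdOverMedianPct: 100 * s / med,
  };
}

console.log(summarize(sampleRuns));
```

The two percentages stay close to each other when the mean and median agree, and drift apart when a big outlier pulls the mean away from the median.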
Let me rephrase that as "so unreliable as to be worthless for me." :-) Benchmarks with too much variance don't let me measure the impact of small performance improvements, which is what I'm trying to test here. Removing the min and the max from each column helps even things out a little but there are still tests where the variance over 8 runs is >7% (with
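The trimming described above (drop the minimum and the maximum, then look at the remaining 8 runs) can be sketched like this. The data is the c=50 chunks=0 length=4 type="bytes" row from the list above. The function names are illustrative and the sample standard deviation is again an assumption.

```javascript
// Ten runs quoted in the thread for c=50 chunks=0 length=4 type="bytes".
const rawRuns = [6129.01, 6273.31, 6081.69, 6606.24, 4287.8,
                 6481.24, 6126.81, 6326.17, 5757.36, 6343.68];

// Remove the single smallest and single largest observation.
function dropMinMax(xs) {
  const sorted = [...xs].sort((a, b) => a - b);
  return sorted.slice(1, -1);
}

// Coefficient of variation in percent, using the sample standard deviation.
function cvPercent(xs) {
  const m = xs.reduce((sum, x) => sum + x, 0) / xs.length;
  const sumSquares = xs.reduce((sum, x) => sum + (x - m) ** 2, 0);
  return 100 * Math.sqrt(sumSquares / (xs.length - 1)) / m;
}

const trimmedRuns = dropMinMax(rawRuns);
console.log('all 10 runs:', cvPercent(rawRuns).toFixed(2) + '%');
console.log('middle 8 runs:', cvPercent(trimmedRuns).toFixed(2) + '%');
```

Trimming shrinks the spread for rows with one bad run, but it also throws away a fifth of the samples, so it is only a rough gauge.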
Great I will take a look later.
That actually depends on how many observations you have. The standard deviation of the mean performance, which is what you are really interested in when comparing changes, is given by
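To illustrate the point about the number of observations, here is a small sketch using the textbook standard error of the mean, s / sqrt(n). That is the standard definition, not necessarily the exact expression this comment went on to give. The function name and the 5% figure are illustrative assumptions.

```javascript
// Textbook standard error of the mean: sample standard deviation over sqrt(n).
function standardErrorOfMean(sd, n) {
  return sd / Math.sqrt(n);
}

// Hypothetical benchmark whose individual runs vary by 5% of the mean.
const perRunCvPct = 5;
for (const n of [10, 30, 100]) {
  const semPct = standardErrorOfMean(perRunCvPct, n);
  console.log(`n=${n}: mean is known to about ${semPct.toFixed(2)}% (one standard error)`);
}
```

The per-run spread does not change with n, but the uncertainty of the mean falls with the square root of the number of runs.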
What are you proposing? I'm willing to accept that if you collect 1,000 samples, drop the top and bottom 25%, then do statistics to the remainder, that you're able to make statistically significant observations - but the problem is that I don't have days to let the benchmark suite collect those 1,000 samples. As it stands, the variance between individual runs is so big it's useless (to me!) If there is a way to fix that, great. If not, I'll just have to PR my http parser changes without benchmarks to back them up.
Ran the full benchmark with 30 samples (raw data: https://gist.github.com/AndreasMadsen/c0dff8145a910984bb96fa422749743c), here are the results. They are not as bad as yours, though they are definitely not all within < 1-2%.
I'm not proposing anything. I'm saying there is more to it than just the standard deviation. In fact I agree that the benchmarks take too long. However I don't agree that they should be neglected completely. If you made a particular change and think that improves some aspect of the I did the math and it turns out it is not too hard to calculate the approximate appropriate coefficient of variation, given the number of observations and the expected relative improvement. Here are some outputs of that (e.g. if you have 30 observations and expect an improvement of 1%, then you need
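As a rough sketch of this kind of calculation (not necessarily the derivation used in the comment above), assume two equally sized samples with the same coefficient of variation, and a normal approximation in place of the t distribution. A relative improvement d then reaches the two-sided 5% threshold when d > z * cv * sqrt(2 / n), with z = 1.96. The function name, the choice of z, and the equal-variance assumption are all illustrative.

```javascript
// Largest per-run coefficient of variation (as a fraction) at which a relative
// improvement `improvement` just reaches significance with `n` runs per side.
// Assumes equal sample sizes, equal spread, and a normal approximation.
function maxCvForImprovement(improvement, n, z = 1.96) {
  return improvement / (z * Math.sqrt(2 / n));
}

for (const n of [10, 30, 100]) {
  const cvPct = 100 * maxCvForImprovement(0.01, n);
  console.log(`n=${n}, expected improvement 1%: cv must be below ~${cvPct.toFixed(2)}%`);
}
```

This is the break-even point only. Detecting the change reliably, rather than about half the time, needs a smaller coefficient of variation or more runs.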
I don't disagree there, the numbers I posted were just to show that individual runs are too imprecise to be useful.
I'm trying to make across-the-board performance improvements so that doesn't work for me, unfortunately. Thanks for running the benchmarks, it helps to know it's not just local to my system.
I've seen this variation also. Not entirely sure what to do about it tho as I haven't dug in enough.
Since you are just benchmarking using a single benchmark and you are interested in the general performance difference, you could use a linear regression instead of a standard t-test. This way you can test for significance using the observations for all the different parameters combined. This should be (hard to derive this late) much more sensitive to performance improvements since you go from I just did this in #8140 (comment), but instead of testing the effect of the http benchmarker you will test the node version.
I'm not sure I follow. It's a little ambiguous what "different parameters" refers to.
var bench = common.createBenchmark(main, {
// unicode confuses ab on os x.
type: ['bytes', 'buffer'],
length: [4, 1024, 102400],
chunks: [0, 1, 4], // chunks=0 means 'no chunked encoding'.
c: [50, 500]
});
If you only need the combined effect, then you can do the statistics on all of the parameters simultaneously. This increases the number of observations greatly without extra cost, and as you know the number of observations is important as well.
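One way to read the suggestion above is a regression with one intercept per parameter combination plus a single "node version" term, so that every run from every configuration contributes to one estimate. The sketch below is an interpretation, not the analysis from #8140. The field names (`config`, `isNew`, `rate`), the log transform, and the numbers are all made up for illustration.

```javascript
// Fits log(rate) = intercept[config] + beta * isNew + noise by least squares.
// Centering within each config absorbs the per-config intercepts, which leaves
// a one-variable regression on the centered values.
function pooledVersionEffect(observations) {
  const groups = new Map();
  for (const obs of observations) {
    if (!groups.has(obs.config)) groups.set(obs.config, []);
    groups.get(obs.config).push(obs);
  }

  const centered = [];
  let sxx = 0;
  let sxy = 0;
  for (const group of groups.values()) {
    const meanX = group.reduce((s, o) => s + o.isNew, 0) / group.length;
    const meanY = group.reduce((s, o) => s + Math.log(o.rate), 0) / group.length;
    for (const o of group) {
      const x = o.isNew - meanX;
      const y = Math.log(o.rate) - meanY;
      centered.push([x, y]);
      sxx += x * x;
      sxy += x * y;
    }
  }

  const beta = sxy / sxx; // estimated log-ratio new/old
  const rss = centered.reduce((s, [x, y]) => s + (y - beta * x) ** 2, 0);
  const df = observations.length - groups.size - 1;
  const se = Math.sqrt(rss / df / sxx);
  return { beta, se, t: beta / se, df };
}

// Invented data: two configurations, three runs per node version each.
const made_up = [];
const add = (config, isNew, rates) =>
  rates.forEach((rate) => made_up.push({ config, isNew, rate }));
add('c=50', 0, [100, 102, 98]);
add('c=50', 1, [105, 107, 103]);
add('c=500', 0, [1000, 1010, 990]);
add('c=500', 1, [1050, 1060, 1040]);

console.log(pooledVersionEffect(made_up));
```

Here `beta` is the estimated log of the new/old throughput ratio and `t` can be compared against a t distribution with `df` degrees of freedom. The log transform is a design choice that puts configurations with very different request rates on a common relative scale.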
I'll close, I think this ran its course.
I was working on some HTTP parser improvements and I noticed the numbers from the http/simple benchmark fluctuate wildly, even when the system is otherwise quiescent. This is with master, without my changes applied.
Results from 10 runs. Columns are mean(x) median(x) std(x) std(x)/median(x)*100.
Everything with a standard deviation > 1-2% is so unreliable as to be worthless, IMO.
Can someone confirm whether they are seeing similar fluctuations?
cc @nodejs/benchmarking