Hello, I read this blog post from the maintainers of BenchmarkDotNet.
It describes an alternative outlier detection method that is better suited to data that is not normally distributed. Since benchmark results have high variance, I found it interesting, even though I don't fully understand the math behind it. Is this something Tango.rs might consider?
FilipAndersson245 changed the title from "Alternative outlier detection algorithm." to "DoubleMAD outlier detector based on the Harrell-Davis quantile estimator" on Apr 4, 2024.
I looked at your IQR implementation and tried to adapt it to use a double MAD instead. I am not confident enough to implement the Harrell-Davis quantile estimator myself.
Double MAD performs better on asymmetric distributions, for example when there is a long tail in one direction: https://eurekastatistics.com/using-the-median-absolute-deviation-to-find-outliers/
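To make the asymmetry point concrete, here is a minimal sketch on a made-up, right-skewed sample (a long tail to the right, as benchmark timings often have). It compares a classic MAD rule, which uses one spread estimate for both sides, with the double MAD rule from the linked article. The function names (`sorted_median`, `scaled_mad`, `mad_bounds`, `double_mad_bounds`) are hypothetical and not part of Tango.rs; the 1.4826 scale factor and the cutoff of 3 scaled MADs are taken from the `double_mad_thresholds` code in this comment.

```rust
/// Consistency constant: scales the MAD so it estimates the standard deviation for normal data.
const MAD_SCALE: f64 = 1.4826;

/// Median of an already ascending-sorted slice; `None` for an empty slice.
fn sorted_median(sorted: &[f64]) -> Option<f64> {
    let n = sorted.len();
    if n == 0 {
        return None;
    }
    let mid = n / 2;
    Some(if n % 2 == 0 { (sorted[mid - 1] + sorted[mid]) / 2.0 } else { sorted[mid] })
}

/// Scaled median absolute deviation from `center`, over the values selected by `keep`.
fn scaled_mad(data: &[f64], center: f64, keep: impl Fn(f64) -> bool) -> Option<f64> {
    let mut dev: Vec<f64> = data
        .iter()
        .copied()
        .filter(|&x| keep(x))
        .map(|x| (x - center).abs())
        .collect();
    dev.sort_unstable_by(f64::total_cmp);
    sorted_median(&dev).map(|mad| MAD_SCALE * mad)
}

/// Ascending-sorted copy; `total_cmp` gives a total order, so NaN cannot cause a panic.
fn sorted_copy(data: &[f64]) -> Vec<f64> {
    let mut v = data.to_vec();
    v.sort_unstable_by(f64::total_cmp);
    v
}

/// Classic MAD rule: one spread estimate, so the cutoffs are symmetric around the median.
fn mad_bounds(data: &[f64], k: f64) -> Option<(f64, f64)> {
    let sorted = sorted_copy(data);
    let m = sorted_median(&sorted)?;
    let mad = scaled_mad(&sorted, m, |_| true)?;
    Some((m - k * mad, m + k * mad))
}

/// Double MAD rule: separate spread estimates for the values below and above the median.
fn double_mad_bounds(data: &[f64], k: f64) -> Option<(f64, f64)> {
    let sorted = sorted_copy(data);
    let m = sorted_median(&sorted)?;
    let mad_lo = scaled_mad(&sorted, m, |x| x <= m)?;
    let mad_hi = scaled_mad(&sorted, m, |x| x >= m)?;
    Some((m - k * mad_lo, m + k * mad_hi))
}
```

For the sample `[9, 10, 10, 11, 13, 16, 21]` with `k = 3.0`, the classic rule puts the upper cutoff at about 19.90 and flags 21.0, while the double MAD rule widens the upper cutoff to about 26.57 (keeping 21.0) and tightens the lower cutoff from about 2.10 to about 6.55.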
```rust
use std::ops::RangeInclusive;

fn double_mad_thresholds(mut input: Vec<f64>) -> Option<RangeInclusive<f64>> {
    // Scale factor so the MAD estimates the standard deviation for normally distributed data.
    const C: f64 = 1.4826;
    // Number of scaled MADs from the median beyond which a value counts as an outlier.
    const K: f64 = 3.0;

    if input.is_empty() {
        return None;
    }
    // `total_cmp` gives a total (NaN-safe) order, consistent with the searches below.
    input.sort_unstable_by(f64::total_cmp);
    let m = median(&input);

    // Absolute deviations of the lower and upper halves. Because `input` is sorted,
    // `x_l` comes out in descending order and `x_u` in ascending order; `median` only
    // reads the middle element(s), so either order gives the correct median.
    let x_l: Vec<f64> = input.iter().filter(|v| **v <= m).map(|v| (*v - m).abs()).collect();
    let x_u: Vec<f64> = input.iter().filter(|v| **v >= m).map(|v| (*v - m).abs()).collect();
    let mad_l = C * median(&x_l);
    let mad_u = C * median(&x_u);
    let lower = m - K * mad_l;
    let upper = m + K * mad_u;

    // Index of the first value >= lower, and one past the last value <= upper.
    // (A value exactly equal to `upper` is kept inside the range.)
    let low_threshold_idx = input.partition_point(|v| *v < lower);
    let high_threshold_idx = input.partition_point(|v| *v <= upper);
    if low_threshold_idx >= high_threshold_idx {
        return None;
    }
    Some(input[low_threshold_idx]..=input[high_threshold_idx - 1])
}

/// Calculate the median of a slice of data; expects the data to be sorted already and non-empty.
fn median(data: &[f64]) -> f64 {
    let len = data.len();
    let mid = len / 2;
    if len % 2 == 0 {
        // Even number of elements
        (data[mid - 1] + data[mid]) / 2.0
    } else {
        // Odd number of elements
        data[mid]
    }
}
```
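The code above uses the plain sample median, while the new issue title names the Harrell-Davis quantile estimator. In its standard form, the Harrell-Davis estimate of the p-quantile is a weighted sum of all order statistics, Q(p) = Σ W_i x_(i), where W_i is the probability that a Beta(p(n+1), (1-p)(n+1)) random variable falls in [(i-1)/n, i/n]. Below is a minimal sketch under that definition, not code from BenchmarkDotNet or Tango.rs. The helper names (`ln_gamma`, `beta_cf`, `reg_inc_beta`, `harrell_davis_quantile`) are hypothetical; the regularized incomplete beta function uses the textbook continued-fraction expansion, and log-gamma uses the Lanczos approximation.

```rust
use std::f64::consts::PI;

/// ln Γ(x) for x > 0 via the Lanczos approximation (g = 7, 9 coefficients).
fn ln_gamma(x: f64) -> f64 {
    const G: f64 = 7.0;
    const COEF: [f64; 9] = [
        0.99999999999980993,
        676.5203681218851,
        -1259.1392167224028,
        771.32342877765313,
        -176.61502916214059,
        12.507343278686905,
        -0.13857109526572012,
        9.9843695780195716e-6,
        1.5056327351493116e-7,
    ];
    if x < 0.5 {
        // Reflection formula: Γ(x) Γ(1 - x) = π / sin(πx).
        return PI.ln() - (PI * x).sin().ln() - ln_gamma(1.0 - x);
    }
    let z = x - 1.0;
    let t = z + G + 0.5;
    let mut sum = COEF[0];
    for (i, &c) in COEF.iter().enumerate().skip(1) {
        sum += c / (z + i as f64);
    }
    0.5 * (2.0 * PI).ln() + (z + 0.5) * t.ln() - t + sum.ln()
}

/// Continued fraction for I_x(a, b), evaluated with the modified Lentz method.
fn beta_cf(a: f64, b: f64, x: f64) -> f64 {
    const FPMIN: f64 = 1e-300;
    let tiny = |v: f64| if v.abs() < FPMIN { FPMIN } else { v };
    let (qab, qap, qam) = (a + b, a + 1.0, a - 1.0);
    let mut c = 1.0;
    let mut d = 1.0 / tiny(1.0 - qab * x / qap);
    let mut h = d;
    for m in 1..=300 {
        let (m, m2) = (m as f64, 2.0 * m as f64);
        // Even step of the recurrence.
        let aa = m * (b - m) * x / ((qam + m2) * (a + m2));
        d = 1.0 / tiny(1.0 + aa * d);
        c = tiny(1.0 + aa / c);
        h *= d * c;
        // Odd step of the recurrence.
        let aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2));
        d = 1.0 / tiny(1.0 + aa * d);
        c = tiny(1.0 + aa / c);
        let del = d * c;
        h *= del;
        if (del - 1.0).abs() < 1e-14 {
            break;
        }
    }
    h
}

/// Regularized incomplete beta function I_x(a, b).
fn reg_inc_beta(a: f64, b: f64, x: f64) -> f64 {
    if x <= 0.0 {
        return 0.0;
    }
    if x >= 1.0 {
        return 1.0;
    }
    let front = (ln_gamma(a + b) - ln_gamma(a) - ln_gamma(b) + a * x.ln() + b * (1.0 - x).ln()).exp();
    // Use the continued fraction where it converges fast; otherwise I_x(a, b) = 1 - I_{1-x}(b, a).
    if x < (a + 1.0) / (a + b + 2.0) {
        front * beta_cf(a, b, x) / a
    } else {
        1.0 - front * beta_cf(b, a, 1.0 - x) / b
    }
}

/// Harrell-Davis estimate of the `p`-quantile of an ascending-sorted slice;
/// `None` for an empty slice or `p` outside the open interval (0, 1).
fn harrell_davis_quantile(sorted: &[f64], p: f64) -> Option<f64> {
    if sorted.is_empty() || !(p > 0.0 && p < 1.0) {
        return None;
    }
    let n = sorted.len() as f64;
    let (a, b) = (p * (n + 1.0), (1.0 - p) * (n + 1.0));
    let (mut prev, mut estimate) = (0.0, 0.0);
    for (i, &x) in sorted.iter().enumerate() {
        // Weight of the i-th (0-based) order statistic: Beta(a, b) mass on [i/n, (i+1)/n].
        let cur = reg_inc_beta(a, b, (i + 1) as f64 / n);
        estimate += (cur - prev) * x;
        prev = cur;
    }
    Some(estimate)
}
```

To plug this into `double_mad_thresholds`, the `median(...)` calls could in principle become `harrell_davis_quantile(..., 0.5)`, with one caveat: the Harrell-Davis weights depend on rank, so `x_l` (which comes out in descending order) would have to be sorted ascending first.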