best boosting AUC? #15
Comments
One thing that might be interesting to try is to use integer encoding for the dates (I seemed to get a far better result with simply depth = 6).
If no overfitting is happening, eta = 0.01 is good enough. Another interesting thing to try, which I always do to optimize AUC, is to re-balance the weights. In particular, setting the ratio of negative to positive case weights will re-balance the two classes, which usually works better for AUC. The effect is more significant on unbalanced datasets, though, so I am not sure what will happen here.
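A minimal sketch of that kind of re-weighting in the xgboost R interface, under the assumption that the elided setting is the scale_pos_weight parameter; synthetic data stands in for the benchmark set:

```r
library(xgboost)

# synthetic stand-in for the benchmark data (X: numeric matrix, y: 0/1 labels)
set.seed(1)
X <- matrix(rnorm(2000 * 5), ncol = 5)
y <- as.numeric(X[, 1] + rnorm(2000) > 0)
dtrain <- xgb.DMatrix(X, label = y)

# assumed re-balancing: weight the positive class by the negative/positive ratio
spw <- sum(y == 0) / sum(y == 1)

params <- list(objective = "binary:logistic", eval_metric = "auc",
               eta = 0.01, max_depth = 6, scale_pos_weight = spw)

bst <- xgb.train(params, dtrain, nrounds = 300)
```

On a well-balanced set the ratio is close to 1, so the effect should be small, which matches the reservation above.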
Well, I deliberately considered that "forbidden" :) See some discussion here: #1 The reason is that I want the benchmark on a dataset with a mix of categoricals and numerics (with more categoricals), similar to an industry/business dataset. So I'm making the day of the week, month etc. somewhat artificially categorical.
If I made them ordinal/numeric, those variables would dominate the prediction importance-wise and the dataset would have too many "numeric" features. So the game between RF and boosting is on, in the sense that those variables need to stay categorical and there is no other feature engineering ;)
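A small illustration of the two encodings being contrasted here, with hypothetical column names standing in for the benchmark's date-derived fields (the actual preprocessing is in the linked repo):

```r
# toy stand-in for the benchmark's date-derived fields
d <- data.frame(Month = c(1, 6, 12), DayOfWeek = c(2, 5, 7))

# categorical encoding (the benchmark's choice): treat each value as a level,
# which later expands into one dummy column per level
d_cat <- transform(d, Month = as.factor(Month), DayOfWeek = as.factor(DayOfWeek))
str(d_cat)

# integer/ordinal encoding (the suggestion above): keep the natural ordering,
# one numeric column per field
d_int <- transform(d, Month = as.integer(Month), DayOfWeek = as.integer(DayOfWeek))
str(d_int)
```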
Re @tqchen's 2nd comment: this dataset is pretty well balanced. What's very handy in xgboost (and missing from the other tools) is the early stopping :) I'm using it in this run and I'll let you know when it finishes: https://github.com/szilard/benchm-ml/blob/master/3-boosting/6a-xgboost-grid.R
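A minimal sketch of early stopping in the xgboost R interface, assuming a held-out validation set and current parameter names (the linked script may use older spellings); synthetic data stands in for the benchmark:

```r
library(xgboost)

# synthetic stand-in for a train/validation split
set.seed(1)
X <- matrix(rnorm(2000 * 5), ncol = 5)
y <- as.numeric(X[, 1] + rnorm(2000) > 0)
tr <- 1:1500
dtrain <- xgb.DMatrix(X[tr, ], label = y[tr])
dvalid <- xgb.DMatrix(X[-tr, ], label = y[-tr])

params <- list(objective = "binary:logistic", eval_metric = "auc",
               eta = 0.01, max_depth = 10, subsample = 0.5)

# stop once validation AUC has not improved for 50 rounds
bst <- xgb.train(params, dtrain, nrounds = 5000,
                 watchlist = list(valid = dvalid),
                 early_stopping_rounds = 50)
```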
I might also try decreasing the learning rate over the iterations.
I do not know if there is any theory on decreasing the learning rate. I see you set the subsample parameter; there is another one, colsample_bytree, which sub-samples columns. It usually makes the result less prone to overfitting and the training faster (set it to 0.5 or 0.3).
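For concreteness, a hedged sketch of combining the two sampling knobs mentioned here (values are the suggested ones, not tuned for this benchmark):

```r
# row sub-sampling plus per-tree column sub-sampling, as suggested above
params <- list(objective = "binary:logistic",
               eval_metric = "auc",
               eta = 0.01,
               max_depth = 10,
               subsample = 0.5,        # use half the rows for each tree
               colsample_bytree = 0.5) # use half the columns for each tree
```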
Although GBM usually wins on complicated cases with more features, where many features play an important role, maybe this dataset is not such a case. Since the date fields are made explicitly categorical, there are currently too few integer features.
Hm... a quick google does not bring up anything. Besides VW, there is simulated annealing in various contexts (e.g. neural nets) etc. This might be useful for boosting...
Yeah, I wish I had chosen a dataset with more columns... Anyway, my main focus here is to see which tools can run on 10M rows in decent time and with decent AUC, and the AUC for boosting is pretty close to RF.
Just for reference, I sometimes try this adaptive learning rate method: http://www.ark.cs.cmu.edu/cdyer/adagrad.pdf . It is not implemented in xgboost.
Oh, now I remember reading this http://www.matthewzeiler.com/pubs/googleTR2012/googleTR2012.pdf which is implemented in H2O deep learning. Btw, do you think this has been adapted for GBMs? @hetong007: when you say "sometimes I try this...", what tool are you using? (Is it boosting/GBM or something else like linear etc.?)
The nature of boosting is very different from a linear solver, so AdaGrad may not be directly applicable here. Actually, even for deep learning, AdaGrad is not the best choice for a common convnet (maybe a safe choice). However, it is actually straightforward to tweak the R/Python code of xgboost to implement a decaying learning rate without touching the C++ part, so maybe it is interesting to try.
Sometimes in my research I will write small prototypes to compare, but basically matrix factorization. @tqchen I think currently we cannot get the gradient of each update from R/Python, so at least the method I posted is not applicable.
Here are some options for decay (after a quick web search): http://static.googleusercontent.com/media/research.google.com/en/us/pubs/archive/40808.pdf I was doing some simulated-annealing-related stuff in physics 20 yrs ago ;) Anyway, maybe it's easy to change at the C++ level in xgboost?
There is a thing called a customized loss function in xgboost, which should do the job in R. Adding a learning rate is equivalent to scaling the h statistics :)
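A rough sketch of that idea in R, assuming a customized logistic objective whose hessian is scaled up over the rounds (scaling h shrinks the leaf values, which acts like a decaying step size; the decay schedule and factor here are made up for illustration):

```r
library(xgboost)

# synthetic stand-in data
set.seed(1)
X <- matrix(rnorm(2000 * 5), ncol = 5)
y <- as.numeric(X[, 1] + rnorm(2000) > 0)
dtrain <- xgb.DMatrix(X, label = y)

# closure keeping an iteration counter; each call scales the hessian up,
# so leaf values (~ -G / (scale * H), ignoring lambda) get smaller over time
make_decaying_logloss <- function(decay = 1.05) {
  iter <- 0
  function(preds, dtrain) {
    iter <<- iter + 1
    labels <- getinfo(dtrain, "label")
    p <- 1 / (1 + exp(-preds))
    scale <- decay ^ iter            # grows each round => smaller effective step
    list(grad = p - labels, hess = scale * p * (1 - p))
  }
}

params <- list(eta = 0.1, max_depth = 6, eval_metric = "auc")
bst <- xgb.train(params, dtrain, nrounds = 100,
                 obj = make_decaying_logloss(1.05),
                 watchlist = list(train = dtrain))
```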
BTW, @szilard maybe it is interesting to do some feature importance analysis on the trees learnt (see this example). I guess the result will have a few very important features.
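A minimal sketch of such an importance analysis with the xgboost R helpers; a toy model stands in here for the benchmark's fitted booster:

```r
library(xgboost)

# toy model in place of the benchmark's fitted booster
set.seed(1)
X <- matrix(rnorm(1000 * 8), ncol = 8)
colnames(X) <- paste0("f", 1:8)
y <- as.numeric(X[, 1] + 0.5 * X[, 2] + rnorm(1000) > 0)
dtrain <- xgb.DMatrix(X, label = y)
bst <- xgb.train(list(objective = "binary:logistic", max_depth = 6, eta = 0.1),
                 dtrain, nrounds = 50)

# per-feature importance (gain, cover, frequency) extracted from the learnt trees
imp <- xgb.importance(feature_names = colnames(X), model = bst)
print(imp)
```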
Yeah, I think I took a quick look at the variable importance in RF in H2O. There are only 8 variables though...
I'll have to look at custom loss functions, probably tomorrow...
@tqchen @hetong007 See some boosting results here (GBM beats RF now): https://github.com/szilard/benchm-ml#boosting-gradient-boosted-treesgradient-boosting-machines
Great, thanks @szilard
Here is Time/AUC for a few settings:
@tqchen @hetong007 I'm trying to get a good AUC with boosting for the largest dataset (n = 10M). Would be nice to beat random forests :)

So far I did some basic grid search https://github.com/szilard/benchm-ml/blob/master/3-boosting/0-xgboost-init-grid.R for n = 1M (not the largest dataset) and it seems like deeper trees, min_child_weight = 1 and subsample = 0.5 work well.

I'm running now https://github.com/szilard/benchm-ml/blob/master/3-boosting/6a-xgboost-grid.R with n = 10M by just looping over max_depth = c(2,5,10,20,50), but it's been running for a while. Any suggestions?

The smallest learning rate I'm using is eta = 0.01; any experience with smaller values?

PS: See results so far here: https://github.com/szilard/benchm-ml#boosting-gradient-boosted-treesgradient-boosting-machines
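A minimal sketch of the kind of loop described above; the real script is the linked 6a-xgboost-grid.R, and the toy data, nrounds, early-stopping setting and the best_score field used for reporting are assumptions for illustration:

```r
library(xgboost)

# toy stand-in for the benchmark's train/validation matrices
set.seed(1)
X <- matrix(rnorm(5000 * 8), ncol = 8)
y <- as.numeric(X[, 1] + rnorm(5000) > 0)
tr <- 1:4000
dtrain <- xgb.DMatrix(X[tr, ], label = y[tr])
dvalid <- xgb.DMatrix(X[-tr, ], label = y[-tr])

for (md in c(2, 5, 10, 20, 50)) {
  params <- list(objective = "binary:logistic", eval_metric = "auc",
                 eta = 0.01, max_depth = md,
                 min_child_weight = 1, subsample = 0.5)
  t0 <- proc.time()
  bst <- xgb.train(params, dtrain, nrounds = 1000,
                   watchlist = list(valid = dvalid),
                   early_stopping_rounds = 50, verbose = 0)
  elapsed <- (proc.time() - t0)[3]
  cat("max_depth =", md,
      " best valid AUC =", bst$best_score,
      " time =", round(elapsed), "s\n")
}
```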