[jvm-packages] support distributed synchronization of customized evaluation metrics #4280
Conversation
Hello @CodingCat, could you write a couple of things about the purpose of this PR, so I can potentially help out with reviewing and testing?
@thvasilo thanks for offering to help test the feature. I will write a more comprehensive description of the PR after I finalize the code.
@trivialfis @RAMitchell @hcho3 please help review this, as I moved several classes from the original .cu files into headers so they can be shared by other files.
Had a brief look. This is gonna take some time.
@CodingCat Let's not merge this for now.
- Is the feature JVM-only, or is it universal to other bindings (I have never done this before)? If it's JVM-only, we can discuss how to make it more generic before moving so much code.
- Is it necessary to move the whole implementation into headers? We could create a suitable interface instead.
- Other items are in the inline comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My concern with this PR is that it fuses the internal C++ implementation of metrics to the external Java API. If we merge this, we effectively lose the ability to refactor or change the way metrics are implemented.
While this provides some generality for users, it still doesn't give us the ability to implement metrics like AUC via Java/Scala.
Given how complicated it seems to expose this feature, and the fact that it is not a complete solution, I wonder if users should instead be implementing these functions in C++?
@RAMitchell, thanks for the review. Actually, we have already regulated the signature of the methods in metric implementations, e.g. xgboost/src/metric/elementwise_metric.cu Line 106 in 2e052e7
As long as we want to expose distributed metric syncing to users without requiring them to call rabit directly (which is not an elegant solution, as it exposes an XGBoost internal dependency to the user), they need to follow that signature no matter which language they use. Regarding implementing something like AUC, users can already do that with the existing interface, e.g. https://github.com/dmlc/xgboost/blob/master/jvm-packages/xgboost4j/src/main/scala/ml/dmlc/xgboost4j/scala/EvalTrait.scala#L38 (they just cannot use it in a distributed setting for purposes like early stopping). The question is only whether we want to
For now, only the JVM has this functionality, as the strategy is to make cross-language calls via language-specific APIs. IIRC, other languages don't provide any base class for users to extend and implement custom metrics; instead, they require the user to implement the whole function, like xgboost/python-package/xgboost/core.py Line 1147 in 82dca3c
If we want to extend this to other languages, I think the pattern here is a good reference, though it cannot be applied directly in implementation (because every language has its own cross-language function-call APIs).
Can we give this PR further consideration?
@trivialfis @hcho3 @RAMitchell I got a failed compilation in Jenkins; it says
do you have any insight into what is happening?
@CodingCat Let me try fixing it in another PR.
nvm, I think I messed up the switch for conditional compilation
@CodingCat The thrust header paths are managed by |
got it! thanks!
ping for further review @trivialfis @RAMitchell @hcho3
Can we explore the alternative a little more? Could we expose a very basic AllReduce method via the xgboost API that connects on the back end to the rabit library linked to the xgboost shared library?
We have actually already exposed something like https://github.com/dmlc/xgboost/blob/master/jvm-packages/xgboost4j/src/native/xgboost4j.h#L327-L328. Do you mean we should document how to implement a customized metric for distributed training with this API (instead of masking the call to the rabit API with the implementation here)?
I guess I would like to know what it would look like and how complicated it would be for users to call rabit themselves in their JVM implementation of the metric. If it's not overly difficult and distributed metrics can be achieved this way, it may be preferable to the current implementation for me.
@RAMitchell I was trying to write an example to concretely show how complicated it is for a user to write a distributed metric on their own, but I discarded it precisely because of that complexity, given the current interface https://github.com/dmlc/xgboost/blob/master/jvm-packages/xgboost4j/src/main/java/ml/dmlc/xgboost4j/java/IEvaluation.java#L40
I would still prefer to mask all these complexities from the users. Since we have implemented everything in C++, it's not worth repeating it in Java... and I am actually unclear about the downside of the approach in this PR, except that it's a big diff, most of which just comes from moving files.
IMO the solution would be to make it easy for users to get predictions, labels, and weights on the JVM side, plus a function simplifying the rabit reduction. The reason I am taking this angle is that it requires no changes to the C API and leaves us much more flexibility in how we internally implement metrics in C++ code. I don't think this solution is more complicated than what you have. I realise you have done a lot of work already, but I want to make sure this is correct, as I think it will have long-term implications for my work and others'. Let's look for other opinions; in particular, I'm curious what @tqchen thinks.
One thing I would add is that we should think about how we treat the metrics system in XGBoost: is it a helper function that is fine under any change, or are we going to treat it as a submodule shared by other modules? If we want to make it a formal module, we need to sign a contract regulating changes. The contract here is: no matter how you change it, you should allow others to tell
re: more opinions
I will get back to this after the 0.9 release. So far, I have no idea about:
(1) why we require every user to write the same chunk of code, like https://github.com/dmlc/xgboost/blob/master/src/metric/elementwise_metric.cu#L36-L57, if they want to use custom evaluation metrics in a distributed setting;
(2) why we require an XGBoost user to know what Rabit is (do we also require our users to know what NCCL is and how to use it?);
(3) why we are so afraid of stabilizing any API even while we are seriously considering a 1.0 release;
(4) what the concrete plan for the metrics-system refactoring is (as concrete as the solution that has been sitting here for two months) that could be broken by exposing any API.
I don't think we are on the right track to move things forward... such disasters have happened in other communities as well, and they essentially hurt users and create many private forks of the project. One of the most significant examples might be the nested column pruning feature in Spark SQL: a feature proposed two years ago, but delayed time and time again for reasons like "hmmmm, we need to refactor/re-design/whatever A, B, C, there might be some conflict" (which turned out not to be the case most of the time), even though it was a feature awaited by many users. Eventually only a (very, very) partial version of that feature was merged after two years, and frustrated users had created private forks of Spark containing that feature and many other innovations, for example at FB/Uber/Apple.
@CodingCat One thing I like about this pull request is |
@CodingCat One little useful detail: AUC / AUCPR are actually list-wise ranking metrics. So
See my comments on the metric interface design. I plan to review the literature on survival analysis and learning-to-rank and then design the metric interface for these applications.
----------------
With XGBoost4J (including XGBoost4J-Spark), users are able to implement their own custom evaluation metrics and synchronize the values in the distributed training setting. To implement a custom evaluation metric, users should implement the interface ``ml.dmlc.xgboost4j.java.IEvalElementWiseDistributed`` (for binary classification and regression), ``ml.dmlc.xgboost4j.java.IEvalMultiClassesDistributed`` (for multi-class classification), or ``ml.dmlc.xgboost4j.java.IEvalRankListDistributed`` (for ranking).
* ``ml.dmlc.xgboost4j.java.IEvalElementWiseDistributed``: users are supposed to implement ``float evalRow(float label, float pred);``, which calculates the metric for a single sample given the prediction and label, as well as ``float getFinal(float errorSum, float weightSum);``, which performs the final transformation over the sums of per-sample errors and weights.
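As a sketch of how the two callbacks might divide the work, consider distributed RMSE. The interface is redeclared locally here purely for illustration; in a real project you would implement ``ml.dmlc.xgboost4j.java.IEvalElementWiseDistributed`` from the package, and the framework (not this code) would sum ``evalRow`` over each worker's rows and allreduce the sums before calling ``getFinal``:

```java
// Local redeclaration of the interface described above, for illustration only.
interface IEvalElementWiseDistributed {
    float evalRow(float label, float pred);
    float getFinal(float errorSum, float weightSum);
}

// Sketch: RMSE as an element-wise distributed metric. Each worker sums
// evalRow over its local rows; (errorSum, weightSum) are assumed to be the
// globally allreduced sums by the time getFinal is called.
class DistributedRMSE implements IEvalElementWiseDistributed {
    @Override
    public float evalRow(float label, float pred) {
        float diff = label - pred;
        return diff * diff;                              // per-row squared error
    }

    @Override
    public float getFinal(float errorSum, float weightSum) {
        return (float) Math.sqrt(errorSum / weightSum);  // root of the global mean
    }
}
```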
For completeness, we need to have weights included in the element-wise metric:
``float evalRow(float label, float pred, float weight)``
We may also want to define an interface that lets the user define a custom reduce op.
* ``ml.dmlc.xgboost4j.java.IEvalMultiClassesDistributed``: the methods to be implemented by the users are similar to those of ``ml.dmlc.xgboost4j.java.IEvalElementWiseDistributed``, except that the single-row metric method is ``float evalRow(int label, float pred, int numClasses);``
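A possible sketch for the multi-class case is classification error ("merror"). Again the interface is redeclared locally for illustration only, and it is an assumption here that ``pred`` carries the predicted class index for the row; the exact semantics may differ in the actual interface:

```java
// Local redeclaration for illustration only.
interface IEvalMultiClassesDistributed {
    float evalRow(int label, float pred, int numClasses);
    float getFinal(float errorSum, float weightSum);
}

// Sketch: multi-class classification error. Assumes pred is the predicted
// class index for the row. The framework is assumed to allreduce the sums
// across workers before getFinal turns them into an error rate.
class DistributedMError implements IEvalMultiClassesDistributed {
    @Override
    public float evalRow(int label, float pred, int numClasses) {
        return ((int) pred == label) ? 0f : 1f;  // 1 for each misclassified row
    }

    @Override
    public float getFinal(float errorSum, float weightSum) {
        return errorSum / weightSum;             // global fraction misclassified
    }
}
```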
Should be ``float evalRow(int label, float pred, float weight, int numClasses);``
* ``ml.dmlc.xgboost4j.java.IEvalRankListDistributed``: users are to implement ``float evalMetric(float[] preds, int[] labels);``, which receives the predictions and labels for the instances in the same group.
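For the ranking case, a minimal sketch is precision@1 evaluated per query group. The interface is redeclared locally for illustration; it is assumed (not confirmed by the docs above) that ``evalMetric`` is invoked once per group and that a positive label marks a relevant document:

```java
// Local redeclaration for illustration only.
interface IEvalRankListDistributed {
    float evalMetric(float[] preds, int[] labels);
}

// Sketch: precision@1 per query group. Returns 1 if the highest-scored
// document in the group is relevant (label > 0), else 0.
class DistributedPrecisionAt1 implements IEvalRankListDistributed {
    @Override
    public float evalMetric(float[] preds, int[] labels) {
        int top = 0;
        for (int i = 1; i < preds.length; i++) {
            if (preds[i] > preds[top]) top = i;  // argmax of prediction scores
        }
        return labels[top] > 0 ? 1f : 0f;
    }
}
```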
Should be ``float evalMetric(float[] preds, int[] labels, float group_weight);``
@CodingCat I will see if it's possible to "unmask" the sum op.
This PR implements the functionality to synchronize custom metrics in the distributed training setting. Before this PR, only the built-in metrics were synchronized, with the result that users could not leverage functionality like early stopping with custom metrics (since different workers might reach different decisions about whether to stop).
Closes #3595 #3813