
[jvm-packages] support distributed synchronization of customized evaluation metrics #4280

Closed

Conversation

CodingCat
Member

@CodingCat CodingCat commented Mar 20, 2019

  • basic JNI framework (C++ <-> Java)
  • support binary classification
  • support regression
  • support multi-class classification
  • support ranking
  • unit tests checking numerical correctness of synchronized metrics
  • documentation

this PR implements the functionality to synchronize custom metrics in the distributed training setting. Before this PR, only built-in metrics were synchronized, which meant users could not leverage functionality like early stopping with custom metrics (since different workers might reach different decisions about whether to stop).

This PR is to

  • implement distributed synchronization of custom metrics
  • spare users who implement custom metrics from having to explicitly call Rabit operations

closes #3595 #3813

@thvasilo
Contributor

Hello @CodingCat, could you write a couple of things on the purpose of this PR so I can potentially help out with reviewing and testing?

@CodingCat
Member Author

@thvasilo thanks for offering to help test the feature; I will post a more comprehensive description of the PR once I finalize the code.


@CodingCat CodingCat changed the title [jvm-packages][WIP] support distributed synchronization of customized evaluation metrics [WIP] support distributed synchronization of customized evaluation metrics Mar 26, 2019
@CodingCat CodingCat changed the title [WIP] support distributed synchronization of customized evaluation metrics [jvm-packages] support distributed synchronization of customized evaluation metrics Mar 28, 2019
@CodingCat CodingCat requested review from trivialfis, hcho3 and RAMitchell and removed request for hcho3 March 28, 2019 02:48
@CodingCat
Member Author

@trivialfis @RAMitchell @hcho3

please help review this; I moved several classes from the original .cu files into headers so they can be shared by other files

@trivialfis
Member

Had a brief look. This is gonna take some time.

@CodingCat CodingCat force-pushed the distributed_custom_eval_metrics branch from 1e005fa to 067f96c on March 30, 2019 00:05
Member

@trivialfis trivialfis left a comment


@CodingCat Let's not merge this for now.

  • Is the feature JVM-only, or is it universal across other bindings (I have never done this before)? If it's JVM-only, then we can discuss how to make it more generic before moving so much code.
  • Is it necessary to move the whole implementation into headers? We could create a suitable interface instead.
  • Other points are in the inline comments.

Review comments on include/xgboost/c_api.h, include/xgboost/learner.h, and include/xgboost/metric/metric.h (resolved).
Member

@RAMitchell RAMitchell left a comment


My concern with this PR is that it fuses the internal C++ implementation of metrics to the external Java API. If we merge this we effectively lose the ability to refactor or change the way metrics are implemented.

While this provides some generality for users it still doesn't give us the ability to implement metrics like AUC via Java/Scala.

Given how complicated it seems to expose this feature and the fact that it is not a complete solution, I wonder if users should be implementing these functions in C++?

@CodingCat
Member Author

My concern with this PR is that it fuses the internal C++ implementation of metrics to the external Java API. If we merge this we effectively lose the ability to refactor or change the way metrics are implemented.

While this provides some generality for users it still doesn't give us the ability to implement metrics like AUC via Java/Scala.

Given how complicated it seems to expose this feature and the fact that it is not a complete solution, I wonder if users should be implementing these functions in C++?

@RAMitchell , thanks for the review

actually we have already imposed a convention on the method signatures in metric implementations, e.g.

bst_float residue = d_policy.EvalRow(s_label[idx], s_preds[idx]);

as long as we want to expose distributed metric syncing to users without requiring them to call Rabit directly (which is not an elegant solution, as we would be exposing an internal XGBoost dependency to the user), they need to follow that signature no matter which language they use

regarding implementing something like AUC, the users can do that with existing interface like https://github.com/dmlc/xgboost/blob/master/jvm-packages/xgboost4j/src/main/scala/ml/dmlc/xgboost4j/scala/EvalTrait.scala#L38 (they just cannot use it in a distributed setting for purposes like early stopping)

the question is only whether we want to expose distributed metric syncing without requiring users to call Rabit directly.

@CodingCat
Member Author

@CodingCat Let's not merge this for now.

  • Is the feature JVM-only, or is it universal across other bindings (I have never done this before)? If it's JVM-only, then we can discuss how to make it more generic before moving so much code.
  • Is it necessary to move the whole implementation into headers? We could create a suitable interface instead.
  • Other points are in the inline comments.

for now, only the JVM packages have this functionality, as the strategy is to make cross-language calls via language-specific APIs....

IIRC, other language bindings didn't provide any base class for users to extend when implementing custom metrics; instead, they require the user to implement the whole function, like

def eval_set(self, evals, iteration=0, feval=None):

Then the user needs to take care of everything from the metric logic to the distributed sync.

if we want to extend this to other languages, I think the pattern here is a good reference, though it cannot be applied directly in implementation (because every language has its own cross-language function call APIs)

@CodingCat
Member Author

can we give this PR further consideration?

@CodingCat CodingCat force-pushed the distributed_custom_eval_metrics branch from 28feba0 to 3802324 on April 18, 2019 20:21
@CodingCat
Member Author

@trivialfis @hcho3 @RAMitchell I got a compilation failure in Jenkins.

It says:


[2019-04-19T03:53:37.599Z] In file included from /workspace/src/metric/multiclass_metric.cc:8:0:

[2019-04-19T03:53:37.599Z] /workspace/include/xgboost/metric/multiclass_metric.h:17:59: fatal error: thrust/execution_policy.h: No such file or directory

[2019-04-19T03:53:37.599Z] compilation terminated.

do you have any insight into what is happening?

@trivialfis
Member

@CodingCat Let me try fixing it in another PR.

@CodingCat
Member Author

nvm, I think I messed up the switch for conditional compilation

@trivialfis
Member

@CodingCat The thrust header paths are managed by nvcc, which will pass the right arguments and device code to gcc. For a cu file this all happens automatically. But for a cc file, you need guards like __CUDACC__ or XGBOOST_USE_CUDA to avoid letting gcc process CUDA related code directly.

@CodingCat
Member Author

@CodingCat The thrust header paths are managed by nvcc, which will pass the right arguments and device code to gcc. For a cu file this all happens automatically. But for a cc file, you need guards like __CUDACC__ or XGBOOST_USE_CUDA to avoid letting gcc process CUDA related code directly.

got it! thanks!

@CodingCat
Member Author

ping for the further review @trivialfis @RAMitchell @hcho3

@RAMitchell
Member

Can we explore the alternative a little more? Could we expose a very basic AllReduce method via the xgboost API that connects on the back end to the rabit library linked to the xgboost shared library?

@CodingCat
Member Author

CodingCat commented Apr 22, 2019

we actually have exposed something like https://github.com/dmlc/xgboost/blob/master/jvm-packages/xgboost4j/src/native/xgboost4j.h#L327-L328

do you mean we are going to document how to implement a customized metric for distributed training with this API (instead of masking the call to the Rabit API with the implementation here)?

@RAMitchell
Member

do you mean we are going to document how to implement a customized metric for distributed training with this API (instead of masking the call to the Rabit API with the implementation here)?

I guess I would like to know what it would look like and how complicated it would be for users to call Rabit themselves in their JVM implementation of the metric. If it's not overly difficult, and distributed metrics can be achieved this way, it may be preferable to the current implementation for me.

@CodingCat
Member Author

CodingCat commented Apr 25, 2019

@RAMitchell I was trying to write an example to concretely show how complicated it is for a user to write a distributed metric on their own, but I discarded it precisely because of that complexity.

given the current interface https://github.com/dmlc/xgboost/blob/master/jvm-packages/xgboost4j/src/main/java/ml/dmlc/xgboost4j/java/IEvaluation.java#L40

  1. in the first step, users need to get predictions, labels and weights

  2. then they need to implement something like https://github.com/dmlc/xgboost/blob/master/src/metric/elementwise_metric.cu#L36-L57

  3. after that, they need to call the Rabit API to synchronize residuals and weights, like https://github.com/dmlc/xgboost/blob/master/src/metric/elementwise_metric.cu#L36-L57

  4. another headache is that we need to either provide a specific API tailored for syncing a float array, or the user has to use our current Rabit API and transform their local evaluation results from steps 1–2 into a java.nio direct buffer to pass to our API...

I would still prefer to hide all these complexities from users. Since we have implemented everything in C++, it's not worth repeating it in Java... and I am actually unclear about the downside of the approach in this PR, except that it's a big diff, most of which comes from moving files.
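To make the comparison concrete, here is a rough, single-worker sketch (not code from this PR) of the steps 1–3 described above, using squared error as the per-row metric. The allreduce is stubbed out with an identity function precisely because the real Rabit binding and the direct-buffer marshalling in step 4 are the painful parts a user would have to handle:

```java
// Hypothetical sketch of what a user would hand-roll without this PR.
class ManualDistributedMetric {
    // Step 2: per-row error, here squared error.
    static float evalRow(float label, float pred) {
        float d = label - pred;
        return d * d;
    }

    // Step 3 stand-in: in real code this would be a Rabit allreduce over
    // {errorSum, weightSum}; here it is an identity stub for one worker.
    static float[] allReduceSum(float[] local) {
        return local;
    }

    static float eval(float[] labels, float[] preds) {
        float errorSum = 0f, weightSum = 0f;
        for (int i = 0; i < labels.length; i++) {  // steps 1 and 2
            errorSum += evalRow(labels[i], preds[i]);
            weightSum += 1f;                       // unweighted samples
        }
        float[] global = allReduceSum(new float[]{errorSum, weightSum});
        return (float) Math.sqrt(global[0] / global[1]);  // RMSE
    }
}
```

Even in this toy form the user must carry the accumulation, reduction, and final transform themselves; the PR's approach keeps only the first and last pieces in user code.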

@RAMitchell
Member

IMO the solution would be to make it easy for users to get predictions, labels and weights on the JVM side, plus a function simplifying the Rabit reduction. The reason I am taking this angle is that it requires no changes to the C API and leaves us much more flexible in how we internally implement metrics in C++ code. I don't think this solution is more complicated than what you have.

I realise you have done a lot of work already, but I want to make sure this is correct, as I think it will have long-term implications for my work and others'. Let's look for other opinions; in particular, I'm curious what @tqchen thinks.

@CodingCat
Member Author

One thing I would add is that we should think about how we treat the metrics system in XGBoost: is it a helper that is free to change at any time, or are we going to treat it as a submodule shared by other modules?

If we want to make it a formal module, we need to sign a contract regulating changes.

The contract here is: no matter how you change it, you should still allow others to tell

  1. What's the output given the prediction and label of a single row

  2. What's the output given the sum of errors and weights

Re: more opinions

@hcho3 hcho3 mentioned this pull request May 6, 2019
@CodingCat
Member Author

CodingCat commented May 12, 2019

I will get back on this after 0.9 release

and so far, I still have no idea about

(1) why we require every user to write the same chunk of code, like https://github.com/dmlc/xgboost/blob/master/src/metric/elementwise_metric.cu#L36-L57, if they want to use custom evaluation metrics in a distributed setting

(2) why we require XGBoost users to know what Rabit is (do we also require our users to know what NCCL is and how to use it?)

(3) why we are so afraid of stabilizing any API even as we are seriously considering a 1.0 release

(4) what the concrete plan for refactoring the metrics system is (as concrete as the solution that has been sitting here for two months) that could be broken by exposing any API

I don't think we are on the right track to move things forward... such disasters have happened in other communities as well, essentially hurting users and creating many private forks of the project. One of the most significant examples might be the nested column pruning feature in SparkSQL: a feature proposed 2 years ago, but delayed time and time again for reasons like "hmmmm, we need to refactor/re-design/whatever A, B, C, there might be some conflict" (which turned out to be unfounded in most cases), even though it was a feature awaited by many users....

and eventually they only merged a (very, very) partial version of that feature after 2 years, and frustrated users had created private forks of Spark containing that feature and many other innovations, for example at FB/Uber/Apple

@hcho3
Collaborator

hcho3 commented May 12, 2019

@CodingCat One thing I like about this pull request is IEvalMultiClassesDistributed. I've seen quite a few users trip over writing customized objectives for multi-class classification, e.g. #4288. It turns out that writing a customized objective for multi-class classification is hard to do correctly, even in a non-distributed setting. I'd love to see first-class support for this use case.

@hcho3
Collaborator

hcho3 commented May 12, 2019

@CodingCat One little useful detail: AUC / AUCPR are actually list-wise ranking metrics, so IEvalRankListDistributed should be able to accommodate AUC / AUCPR. (In the case of binary classification, set the number of query groups to 1.)
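As a sketch of that observation, binary AUC can be computed through the rank-list signature by treating the whole dataset as one query group. The interface name comes from this PR, but it is redeclared here so the snippet stands alone; this is an illustrative O(n^2) pairwise formulation, not the production implementation:

```java
// Sketch: binary AUC expressed through the rank-list style interface.
interface IEvalRankListDistributed {
    float evalMetric(float[] preds, int[] labels);
}

class AucAsRankMetric implements IEvalRankListDistributed {
    // AUC = probability that a random positive is scored above a random
    // negative; count concordant pairs, crediting ties half a point.
    @Override
    public float evalMetric(float[] preds, int[] labels) {
        long concordant = 0, ties = 0, pairs = 0;
        for (int i = 0; i < preds.length; i++) {
            for (int j = 0; j < preds.length; j++) {
                if (labels[i] == 1 && labels[j] == 0) {
                    pairs++;
                    if (preds[i] > preds[j]) concordant++;
                    else if (preds[i] == preds[j]) ties++;
                }
            }
        }
        return pairs == 0 ? 0.5f : (concordant + 0.5f * ties) / (float) pairs;
    }
}
```

With one query group covering all rows, this reduces exactly to the usual binary-classification AUC.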

Collaborator

@hcho3 hcho3 left a comment


See my comments on the metric interface design. I plan to review the literature on survival analysis and learning-to-rank and design the metric interface for these applications.

----------------
With XGBoost4J (including XGBoost4J-Spark), users can implement their own custom evaluation metrics and have the values synchronized in the distributed training setting. To implement a custom evaluation metric, users should implement one of the interfaces ``ml.dmlc.xgboost4j.java.IEvalElementWiseDistributed`` (for binary classification and regression), ``ml.dmlc.xgboost4j.java.IEvalMultiClassesDistributed`` (for multi-class classification), or ``ml.dmlc.xgboost4j.java.IEvalRankListDistributed`` (for ranking).

* ``ml.dmlc.xgboost4j.java.IEvalElementWiseDistributed``: users implement ``float evalRow(float label, float pred);``, which calculates the metric for a single sample given the prediction and label, as well as ``float getFinal(float errorSum, float weightSum);``, which performs the final transformation over the sums of errors and weights across samples.
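A minimal sketch of this element-wise contract, with the interface redeclared locally for self-containment (the real one lives in xgboost4j), using RMSE as the metric:

```java
// Sketch of the element-wise contract: per-row squared error plus a
// final transform over the accumulated sums.
interface IEvalElementWiseDistributed {
    float evalRow(float label, float pred);
    float getFinal(float errorSum, float weightSum);
}

class DistributedRmse implements IEvalElementWiseDistributed {
    @Override
    public float evalRow(float label, float pred) {
        float diff = label - pred;  // squared error for one sample
        return diff * diff;
    }

    @Override
    public float getFinal(float errorSum, float weightSum) {
        // By the time getFinal runs, errorSum and weightSum have been
        // allreduce'd across workers, so every worker computes the same value.
        return (float) Math.sqrt(errorSum / weightSum);
    }
}
```

The user writes only these two pure functions; the cross-worker summation happens between them, outside user code.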

For completeness, we need to have weights included in element-wise metric:

float evalRow(float label, float pred, float weight)

We may also want to define an interface to let the user define a custom reduce op.
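A user-definable reduce op along those lines might look like this hypothetical sketch (all names are illustrative, not part of the actual API; the distributed reduction is simulated locally by folding over per-worker partial values):

```java
// Hypothetical sketch of a user-definable reduce op for metric syncing.
interface IReduceOp {
    float reduce(float a, float b);
}

class ReduceOps {
    // Instead of hard-coding summation, the binding could accept any
    // associative operator and apply it during the cross-worker reduction.
    static final IReduceOp SUM = (a, b) -> a + b;
    static final IReduceOp MAX = Math::max;

    // Local stand-in for the distributed reduction: fold the op over
    // per-worker partials (in real training each worker contributes one).
    static float allReduce(float[] partials, IReduceOp op) {
        float acc = partials[0];
        for (int i = 1; i < partials.length; i++) {
            acc = op.reduce(acc, partials[i]);
        }
        return acc;
    }
}
```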



* ``ml.dmlc.xgboost4j.java.IEvalMultiClassesDistributed``: the methods to be implemented are similar to ``ml.dmlc.xgboost4j.java.IEvalElementWiseDistributed``, except that the single-row metric method is ``float evalRow(int label, float pred, int numClasses);``

Should be ``float evalRow(int label, float pred, float weight, int numClasses);``



* ``ml.dmlc.xgboost4j.java.IEvalRankListDistributed``: users implement ``float evalMetric(float[] preds, int[] labels);``, which receives the predictions and labels for instances in the same group.

Should be ``float evalMetric(float[] preds, int[] labels, float group_weight);``

@trivialfis
Member

@CodingCat I will see if it's possible to "unmask" the sum op.

@lock lock bot locked as resolved and limited conversation to collaborators Aug 28, 2019
Successfully merging this pull request may close these issues.

[jvm-packages] error using a custom evaluation function (setCustomEval) on spark