PT/TF examples vs XGBoost examples #1178
-
Hi community, I am trying to understand more about the xgboost examples.
-
Thanks for asking! Replies inline.
There are two flavors in the examples, "histogram-based" and "tree-based". Only the histogram-based approach relies on dmlc/xgboost#7778.
This is the tree-based approach. It has some limitations on training speed and model accuracy. The histogram-based approach is "lossless": trained models should be identical to those from xgboost distributed training, and speed should be comparable given fast networks.
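To make the "lossless" part concrete, here is a rough numpy sketch (not the actual xgboost or NVFlare code): each site accumulates per-bin gradient/hessian sums for its own data, and the server simply adds the histograms elementwise. The summed histogram is exactly the one a centralized run would compute over the pooled data, so split finding, and therefore the trained model, matches distributed xgboost.

```python
import numpy as np

def local_histogram(x, grad, hess, bin_edges):
    """Per-site gradient/hessian histogram for one feature."""
    bins = np.digitize(x, bin_edges)            # assign each row to a bin
    n_bins = len(bin_edges) + 1
    g = np.bincount(bins, weights=grad, minlength=n_bins)
    h = np.bincount(bins, weights=hess, minlength=n_bins)
    return g, h

rng = np.random.default_rng(0)
bin_edges = np.linspace(0.0, 1.0, 8)

# Two "sites" with private data; grad/hess would come from the booster.
sites = [(rng.random(1000), rng.normal(size=1000), rng.random(1000))
         for _ in range(2)]

# Federated: server sums the per-site histograms (one allreduce per histogram).
g_fed = sum(local_histogram(x, g, h, bin_edges)[0] for x, g, h in sites)
h_fed = sum(local_histogram(x, g, h, bin_edges)[1] for x, g, h in sites)

# Centralized: histogram over the pooled data.
x_all = np.concatenate([x for x, _, _ in sites])
g_all = np.concatenate([g for _, g, _ in sites])
h_all = np.concatenate([h for _, _, h in sites])
g_cen, h_cen = local_histogram(x_all, g_all, h_all, bin_edges)

assert np.allclose(g_fed, g_cen) and np.allclose(h_fed, h_cen)  # identical splits
```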
Yes, if you want federated TF/PyTorch to work exactly like distributed training. But so far parameter aggregation seems to work pretty well for deep models, so it may not be necessary.
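For reference, "parameter aggregation" here means FedAvg-style weighted averaging of model weights on the server. A minimal sketch (the helper name is illustrative, not NVFlare's API):

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Weighted average of per-client parameter lists (FedAvg-style).

    client_weights: one list of numpy arrays per client.
    client_sizes:   number of local training samples per client.
    """
    total = float(sum(client_sizes))
    return [
        sum(w[layer] * (n / total) for w, n in zip(client_weights, client_sizes))
        for layer in range(len(client_weights[0]))
    ]

# Two clients, each with a tiny two-layer model.
c1 = [np.ones((3, 3)), np.zeros(3)]
c2 = [np.full((3, 3), 3.0), np.ones(3)]
global_weights = fedavg([c1, c2], client_sizes=[100, 300])
# 0.25 * c1 + 0.75 * c2 -> first layer is all 2.5
```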
-
A separate question: it seems NVFlare plays the role of a Parameter Server, in which case workers communicate with the Parameter Server to exchange data and there is no worker-to-worker communication. If so, I think the NVFlare-native design would be to implement the NVFlare server as a parameter server, and the user would need some framework changes to glue NVFlare and the framework's distributed training together? That sounds like an ideal solution but not strictly necessary, since we leverage the scatter-and-gather mode now.
-
@Jeffwan The two approaches were worked on in parallel, mostly me on the histogram-based approach and the NVFlare team on the tree-based one. As I mentioned, the histogram-based approach is "lossless": any dataset you can train with distributed xgboost can be translated directly into the federated environment. However, there is currently no strong privacy guarantee, since gradients are shared freely between participants; this still needs to be worked on. Also, since each gradient sum requires a gRPC call from all the workers, it's sensitive to network latency, especially when …

The tree-based approach is more research oriented. It may work well in some scenarios, but may also suffer in model accuracy in others (e.g. data skew). Since only the trees are shared, it's probably more privacy preserving at the moment. There is also less communication, so if you are running over a very slow network it may be faster.

Deep models seem to be more forgiving with stochastic parameter updates (e.g. https://arxiv.org/abs/1106.5730), so having a "lossless" federated training approach may not be necessary.
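As a back-of-envelope illustration of the latency point (the counts below are rough assumptions, not measurements): if the histogram-based approach does roughly one gradient-histogram allreduce per tree level, while the tree-based approach exchanges one model update per boosting round, the number of server round trips differs by roughly a factor of the tree depth.

```python
def comm_round_trips(n_boost_rounds, max_depth, histogram_based):
    """Very rough round-trip count under the stated assumptions."""
    if histogram_based:
        # assume one histogram allreduce per tree level
        return n_boost_rounds * max_depth
    # assume one tree exchange per boosting round
    return n_boost_rounds

latency_s = 0.05  # assumed 50 ms round-trip latency over a WAN
for name, hist in [("histogram-based", True), ("tree-based", False)]:
    trips = comm_round_trips(n_boost_rounds=100, max_depth=8, histogram_based=hist)
    print(f"{name}: ~{trips} round trips, ~{trips * latency_s:.0f}s latency overhead")
# histogram-based: ~800 round trips, ~40s latency overhead
# tree-based:      ~100 round trips, ~5s latency overhead
```

On a fast local network this overhead is negligible, which is why the histogram-based approach is still competitive there; over a slow network the tree-based approach's lighter communication starts to pay off.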
-
Hi @Jeffwan, thanks for the discussion. @rongou already provided some good points. Let me tag @ZiyueXu77, as he is our scientist working on the algorithm side of the tree-based method and has analyzed some of the "model accuracy" aspects; maybe he can share some insights.
^As Rong replied: "Deep models seem to be more forgiving with stochastic parameter updates (e.g. https://arxiv.org/abs/1106.5730)." To add an additional point, as stated in: https://github.com/NVIDIA/NVFlare/tree/dev/examples/xgboost/tree-based#tree-based-federated-learning-for-xgboost