[RFC] [AIR] Batch size for preprocessor transforms #29229
Comments
(meta: big fan of this format btw.) I wonder if it's possible to configure the preprocessor directly rather than making this `batch_predictor` a god-config?
I think my main concern here is having two separate configs. Could we make it by default use the existing batch size config for both, and have the new flag act as an override? Btw, I don't think `None` is a great default. It may be better to make it a mandatory argument.
Also, +1 to @ericl's suggestion of sharing `batch_size`:

```python
batch_size: Optional[int] = None,
preprocess_batch_size: Optional[int] = None,
```

where the docstring for `BatchPredictor` explains the preprocessing configurability.
@richardliaw good point, added it as an option. @ericl @clarkzinzow removed the section about the defaults; we can make that decision separately in a future RFC. Also renamed the RFC.
Updated the issue. Can you guys take another look? Thanks!
This may be too radical, but how about using a context manager to set the batch size for all preprocessor operations inside the context? For 2, can we not just have another argument for the training batch size? If not set, it will use the transform batch size. We can determine whether we are in the middle of training by e.g. setting a global flag.
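For concreteness, a purely hypothetical sketch of that context-manager idea could look like the following; none of these names exist in Ray, and the module-level global is just one possible mechanism.

```python
# Hypothetical sketch only: a module-level default that preprocessor transforms
# would consult, temporarily overridden inside the `with` block.
import contextlib

_TRANSFORM_BATCH_SIZE = 4096  # assumed global default


@contextlib.contextmanager
def transform_batch_size(batch_size: int):
    """Set the batch size for all preprocessor transforms inside this context."""
    global _TRANSFORM_BATCH_SIZE
    previous = _TRANSFORM_BATCH_SIZE
    _TRANSFORM_BATCH_SIZE = batch_size
    try:
        yield
    finally:
        _TRANSFORM_BATCH_SIZE = previous


# Usage (illustrative): every transform issued inside the block uses 256.
# with transform_batch_size(256):
#     trainer.fit()
```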
@amogkam We need to be able to set
Vote for option 2), with the following high-level assumptions:

Concrete example: Preprocessor -> A, B

E2E Workflow:
@clarkzinzow yeah that's a good point, updated the RFC.
@jiaodong yeah that's a good point. So basically it seems like Option 2 is the best approach. To change the batch size for prediction, the user can modify the preprocessor stored in the checkpoint with a new batch size. The missing piece here is that with Option 2 we also need a setter for the batch size: `preprocessor.set_batch_size()`. So the user can do something like this:

```python
preprocessor = Preprocessor(transform_batch_size=1024)
trainer = Trainer(preprocessor)
trainer.fit()
checkpoint.to_uri(s3_path)
```

In a separate prediction cluster, change the batch size of the preprocessor for prediction:

```python
checkpoint = Checkpoint.from_uri(uri_path)
preprocessor = checkpoint.get_preprocessor()
preprocessor.set_transform_batch_size(256)
predictor = BatchPredictor.from_checkpoint(checkpoint)
```
I actually don't think it's a good idea to support per-preprocessor batch size. The implementation complexity of this is very high, especially if it breaks fusion across preprocessing stages. Why not just have users set the batch size to be small enough for all preprocessors so it won't OOM?
The per-preprocessor API is perhaps too much boilerplate/configuration complexity. The main problem with too small a batch size is the host-to-device (h2d) transfer overhead, right? Though it's not clear that will necessarily be the bottleneck.
@ericl This wouldn't break fusion across preprocessing stages; we can (and already do) support fusing multiple `map_batches` calls with different batch sizes.
cc @c21 - it seems that stage fusion is supported for multiple `map_batches` calls?
We don't do stage fusion unless the Dataset is in lazy mode, though.
Right, but that's true regardless of what the batch size is.
AIR force-enables lazy mode for `BatchPredictor`, so we can assume it's on.
I don't think this is currently the case... where is this happening in the code?
Ah, I'm mistaken. It's not force-enabling lazy mode, but you do effectively get stage fusion when using most preprocessors after a
We should probably just explicitly call `.lazy()` in `BatchPredictor` and avoid relying on this.
Strong +1 to this, and the same goes for when triggering preprocessing before training, especially since at most one preprocess transform (i.e.
+1 as well. This will be much cleaner.
So just to summarize the discussion: it makes sense to set Datasets to lazy by default for `BatchPredictor`, but that seems orthogonal to the API discussion about batch size. Fusion should work regardless of the batch sizes of the individual `map_batches` calls. It seems the primary pushback to Option 2 is that there is too much configuration complexity in having to set it for each preprocessor?
We can always have a simple default option (like batch size = block size, or batch size = 4096) for all preprocessors without enforcing it, right? I think in 95%+ of cases the end user will never need to know about or tune it, but for internal developers, being able to use this flag can benefit efficiency and benchmarking.
I think we need tiers of overridable defaults here, both at a single preprocessor level and at a global level:
One question I have is whether (3) should be a single batch size for all preprocessors, or if we should support overriding individual preprocessors in a chain at call time.
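As a rough illustration of what tiered, overridable defaults could mean, here is a sketch of the resolution logic; the precedence order and names below are assumptions, not part of the RFC.

```python
from typing import Optional

GLOBAL_DEFAULT_BATCH_SIZE = 4096  # assumed global tier


def resolve_batch_size(
    call_override: Optional[int] = None,            # e.g. passed at predict()/transform() time
    preprocessor_batch_size: Optional[int] = None,  # e.g. set on the Preprocessor instance
) -> int:
    """Resolve a batch size: per-call override > per-preprocessor value > global default."""
    if call_override is not None:
        return call_override
    if preprocessor_batch_size is not None:
        return preprocessor_batch_size
    return GLOBAL_DEFAULT_BATCH_SIZE
```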
@clarkzinzow @ericl and I chatted just now. The decision is to go with Option 1, and introduce Option 2 if needed down the line. We will change the default batch size at the AIR level to be less than 4096 (probably 256 by default), and we will also log a warning if the batch size is not specified for training or prediction.
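A minimal sketch of that decision, assuming illustrative names (this is not the eventual Ray implementation): a smaller AIR-level default plus a warning when the caller leaves the batch size unspecified.

```python
import warnings
from typing import Optional

AIR_DEFAULT_BATCH_SIZE = 256  # proposed smaller default, down from 4096


def resolve_air_batch_size(batch_size: Optional[int]) -> int:
    """Warn and fall back to the AIR-level default when batch_size is not specified."""
    if batch_size is None:
        warnings.warn(
            "batch_size was not specified; falling back to the AIR default of "
            f"{AIR_DEFAULT_BATCH_SIZE}. Set it explicitly to tune memory usage."
        )
        return AIR_DEFAULT_BATCH_SIZE
    return batch_size
```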
User workflow
Workflow 1: I am doing GPU prediction. I also have a preprocessor with high heap memory pressure, leading to OOMs with too much parallelism or large batch sizes. I want to either decrease my batch size for preprocessing, or reduce my preprocessing parallelism by requesting more CPUs for each `map_batches` task.

Workflow 1, variant 1: Same as above, except my GPU memory (GRAM) is a lot lower than my RAM, so I need to use a smaller batch size for prediction than for preprocessing.
Workflow 2: I am doing training and am passing a Preprocessor to my Trainer. Same as for prediction: my UDF has high heap memory pressure, leading to OOMs with the default batch size. I would like to decrease my batch size to prevent OOMs.
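Both workflows essentially ask for control over the preprocessing `map_batches` call. A rough sketch of the knobs involved, assuming a made-up input path and a placeholder memory-hungry UDF:

```python
import ray

ds = ray.data.read_parquet("s3://my-bucket/training-data")  # hypothetical path


def heavy_preprocess(batch):
    # Stand-in for a UDF with high heap memory usage per batch.
    return batch


# Smaller batches bound per-task heap usage; requesting more CPUs per task
# reduces how many preprocessing tasks run concurrently on a node.
ds = ds.map_batches(heavy_preprocess, batch_size=256, num_cpus=2)
```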
Current State
Prediction:
When doing GPU prediction with CPU preprocessing, these two stages cannot be fused since each operation has different resource requirements. Currently, users can configure prediction (things like batch size and the number of resources per task/actor), but for preprocessing, the default args for `map_batches` are used and cannot be configured. This forces a batch size of 4096, which may be too large for certain preprocessors.
https://github.com/ray-project/ray/blob/master/python/ray/train/batch_predictor.py#L226
Training:
We call `preprocessor.transform`, which calls `dataset.map_batches` with the default batch size of 4096. There is no way to configure this.
https://github.com/ray-project/ray/blob/master/python/ray/data/preprocessor.py#L253
https://github.com/ray-project/ray/blob/master/python/ray/data/preprocessor.py#L255
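In simplified form, the transform path referenced above behaves roughly like the sketch below (the real code also dispatches on batch format and preprocessor type):

```python
# Simplified sketch of the current behavior: transform issues a map_batches
# call without a batch_size argument, so the Datasets default (4096) applies.
class Preprocessor:
    def transform(self, dataset):
        return dataset.map_batches(self._transform_pandas, batch_format="pandas")
```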
Proposal
Make preprocessing batch size configurable.
Option 1: Configure the batch size at the higher-level Trainer and BatchPredictor
Add a `batch_size` arg to `Preprocessor.transform`. Add configuration options at the `Trainer` and `BatchPredictor` level which will forward to `Preprocessor.transform`.

More concretely:

- `override_preprocess_batch_size` arg to `BatchPredictor.predict`, to use for the preprocessing `map_batches` call. If not doing GPU prediction, then this arg is ignored as preprocessing and prediction are fused into a single stage. If not specified, the same batch size will be used for prediction and preprocessing.
- `transform_batch_size` arg to `DatasetConfig`, which will pipe through to `Preprocessor.transform`.
Pros:
Cons:
New API after the changes:
Addition of a `batch_size` arg to `Preprocessor.transform`:
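The original code snippet for this item was not preserved above; the following is an illustrative sketch of the proposed signature, with the forwarding behavior assumed.

```python
from typing import Optional


class Preprocessor:
    def transform(self, dataset, batch_size: Optional[int] = None):
        """Proposed: forward batch_size to the underlying map_batches call."""
        kwargs = {}
        if batch_size is not None:
            kwargs["batch_size"] = batch_size
        return dataset.map_batches(
            self._transform_pandas, batch_format="pandas", **kwargs
        )
```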
Addition of a `preprocess_batch_size` arg to `BatchPredictor`:
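Again an illustrative sketch only; whether the arg lives on the constructor or on `predict`, and its exact default, are not settled by the text above.

```python
from typing import Optional


class BatchPredictor:
    def predict(
        self,
        data,
        *,
        batch_size: int = 4096,                       # existing prediction batch size
        preprocess_batch_size: Optional[int] = None,  # proposed: batch size for the
                                                      # preprocessing stage; falls back
                                                      # to batch_size when unset
        **kwargs,
    ):
        ...
```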
Addition of `transform_batch_size` to `DatasetConfig`:
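An illustrative sketch of the proposed field; `DatasetConfig` is the existing AIR class, and only the new field is shown here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetConfig:
    # Proposed new field; the existing DatasetConfig fields are omitted.
    transform_batch_size: Optional[int] = None
```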
Option 2: Tie batch size to the Preprocessor instance
Have `batch_size` tied to the preprocessor via an arg passed to `Preprocessor.__init__`. This is already being done for `BatchMapper`: #29193, but we would need to extend this to all Preprocessors.

Pros:
Cons:

- `transform_batch_size` doesn't make sense for the online serving case; there's batching at the Ray Serve level that's independent of this.

New API after the changes:
No longer make `Preprocessor` an abstract class. Add an `__init__` method to `Preprocessor` that takes in a `transform_batch_size`.
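An illustrative sketch of Option 2, assuming the instance-level value is consulted on every transform call:

```python
from typing import Optional


class Preprocessor:
    def __init__(self, transform_batch_size: Optional[int] = None):
        self.transform_batch_size = transform_batch_size

    def transform(self, dataset):
        kwargs = {}
        if self.transform_batch_size is not None:
            kwargs["batch_size"] = self.transform_batch_size
        return dataset.map_batches(
            self._transform_pandas, batch_format="pandas", **kwargs
        )
```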