Avoid ValueError in parallel computing of large arrays
This PR introduces an optional *max_nbytes* parameter on the *OneVsRestClassifier*, *OneVsOneClassifier* and *OutputCodeClassifier* multiclass learning algorithms in *multiclass.py*. The parameter complements the existing *n_jobs* one and is useful when a large training set is processed by concurrently running jobs, i.e. when *n_jobs* > 1 or *n_jobs* = -1 (the number of jobs is set to the number of CPU cores). In this case, [Parallel](https://joblib.readthedocs.io/en/latest/parallel.html#parallel-reference-documentation) is called with the default "loky" backend, which [implements process-based parallelism](https://joblib.readthedocs.io/en/latest/parallel.html#thread-based-parallelism-vs-process-based-parallelism); *Parallel* also applies a default 1-megabyte [threshold](https://joblib.readthedocs.io/en/latest/parallel.html#automated-array-to-memmap-conversion) to the size of arrays passed to the workers. That default may be too small for large arrays and can break the job with **ValueError: UPDATEIFCOPY base is read-only**. *Parallel* exposes *max_nbytes* to control this threshold, so with this fix the multiclass classifiers optionally let users customize the maximum array size.

Fixes scikit-learn#6614. Expected to also fix scikit-learn#4597.
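A minimal sketch of how the new keyword might be used, assuming it is forwarded unchanged to joblib's *Parallel* and therefore accepts the same values (an integer number of bytes, a string such as `'50M'`, or `None` to disable memmapping); the exact signature in this PR may differ:

```python
# Hypothetical usage of the max_nbytes keyword proposed in this PR.
from sklearn.datasets import make_multilabel_classification
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

# A reasonably large multilabel training set.
X, Y = make_multilabel_classification(n_samples=5000, n_features=100,
                                      n_classes=10, random_state=0)

# n_jobs=-1 runs one job per CPU core via the default "loky" backend.
# Raising max_nbytes above the default 1M is what this PR enables, to
# avoid the memmapping-related ValueError on large arrays.
clf = OneVsRestClassifier(LinearSVC(), n_jobs=-1, max_nbytes='50M')
clf.fit(X, Y)
```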
Showing 1 changed file with 38 additions and 9 deletions.