rasterize_features uses too much memory in xcube 0.10.2 #666
Comments
Test script (zipped because GitHub doesn’t support uploading Python scripts). Also at https://gist.github.com/pont-us/533556e95363c25daf0c4b716b0660f0 |
The logic in |
I fear that you're right. The results above were obtained with
I'll experiment with older versions, and with substituting |
Results of experiments so far, now using the current (e7e9709) head of the master branch:
So apparently the bug, wherever it is, is not a new one. |
Ok, let me try avoiding the use of xarray in the process. I have the feeling that the problem may be caused by xarray accidentally keeping references to the temporary dask arrays. This may happen because xarray thinks they form a graph, or something like that. |
The current dask-based implementation generates one node of a dask graph for each polygon. The input of each node is the node of the previous polygon, so the resulting graph "paints" all polygons sequentially into the feature variable. That's logically correct, but the implementation is rather unfavorable, because the graph cannot release the intermediate arrays. This is not a memory leak, but simply a big, heavy graph that needs to remember all intermediate results. We may reimplement the algorithm using |
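To make the comment above concrete, here is a minimal pure-Python model of the chained painting it describes. It uses no dask or xarray, and axis-aligned boxes stand in for polygons; all function names are hypothetical. `rasterize_chained` keeps one full-size intermediate per feature, as the chained graph is said to do. `rasterize_per_tile` is only one conceivable alternative, included for contrast; it is an assumption and not necessarily the reimplementation the comment goes on to propose.

```python
# Pure-Python model; boxes are (x0, y0, x1, y1) with exclusive upper bounds.

def paint(raster, box, value):
    """Return a NEW full-size raster with `box` filled with `value`."""
    x0, y0, x1, y1 = box
    return [
        [value if (x0 <= x < x1 and y0 <= y < y1) else cell
         for x, cell in enumerate(row)]
        for y, row in enumerate(raster)
    ]


def rasterize_chained(width, height, boxes):
    """Paint features one after another, each step built on the previous one.

    Every intermediate raster is kept, mimicking a graph that has to
    remember the output of each node. Returns (final raster, count of
    full-size rasters held).
    """
    intermediates = [[[0] * width for _ in range(height)]]
    for value, box in enumerate(boxes, start=1):
        intermediates.append(paint(intermediates[-1], box, value))
    return intermediates[-1], len(intermediates)


def rasterize_per_tile(width, height, boxes, tile_height):
    """Paint ALL features into one band of rows at a time (assumed alternative).

    Only a single band is being worked on at any moment, regardless of
    the number of features.
    """
    result = []
    for top in range(0, height, tile_height):
        h = min(tile_height, height - top)
        band = [[0] * width for _ in range(h)]
        for value, (x0, y0, x1, y1) in enumerate(boxes, start=1):
            for y in range(max(y0, top), min(y1, top + h)):
                for x in range(max(x0, 0), min(x1, width)):
                    band[y - top][x] = value
        result.extend(band)
    return result


if __name__ == "__main__":
    boxes = [(0, 0, 3, 3), (2, 2, 6, 5), (1, 4, 4, 8)]
    chained, held = rasterize_chained(8, 8, boxes)
    tiled = rasterize_per_tile(8, 8, boxes, tile_height=3)
    print("full-size rasters held by the chained version:", held)
    print("same result:", chained == tiled)
```

In the chained version the number of full-size rasters grows with the number of features, whereas the per-tile version's working set depends only on the band size.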
Describe the bug
In xcube 0.10.2, the memory usage of `rasterize_features` increases rapidly with the number of polygons. In the attached test case, 10 polygons require around 2.6 GB, 200 polygons require over 10 GB, and with 300 or more polygons, my local machine (13.6 GiB RAM + 16.8 GiB swap) runs out of memory. By comparison, with version 0.9.2 of xcube, RAM usage is constant at around 6.8 GB for up to 500 polygons.
To Reproduce
Steps to reproduce the behavior:
Download the attached test script and run it with xcube 0.10.2. It creates an xarray Dataset and a geopandas GeoDataFrame and calls `rasterize_features` on them. The script takes the number of polygons for the GeoDataFrame as a command-line parameter. Run it with e.g. 10, 200, and 500, measure the maximum memory usage during execution, and compare with the results for xcube 0.9.2.
Expected behavior
Memory usage should not increase without limit when more polygons are rasterized; with xcube 0.9.2 it stayed constant.
Screenshots
![Figure_1](https://user-images.githubusercontent.com/9010180/163425459-897469c7-30fb-4782-af13-99b1ed0c7883.png)
Memory usage of the test script with different xcube versions and numbers of polygons; y-axis in tens of GB. Testing with 300 or more polygons under xcube 0.10.2 was not possible due to memory exhaustion.
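The issue does not say how the memory figures were collected. As one possible approach, the sketch below uses the standard library's `tracemalloc` to record the peak memory allocated while a function runs. The helper `peak_python_memory` and the stand-in `allocate` are hypothetical names; in practice `allocate` would be replaced by the call under test. Note that `tracemalloc` only sees allocations made through Python's memory allocators, so memory used by native libraries outside of them is not counted and a process-level measurement may report larger values.

```python
import tracemalloc


def peak_python_memory(func, *args, **kwargs):
    """Run `func` and return (result, peak traced allocation in bytes)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


def allocate(n_bytes):
    # Hypothetical stand-in for the call whose memory usage is of interest.
    return bytearray(n_bytes)


if __name__ == "__main__":
    _, peak = peak_python_memory(allocate, 10_000_000)
    print(f"peak traced memory: {peak / 1e6:.1f} MB")
```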
Additional context
The problem appeared during testing of an AVL use case when rasterizing LPIS features into a cube of Sentinel-2 data. The test script was originally derived from this use case (and retains the original dimensions and resolution of the data cube) but has been simplified to a bare minimum.
The relevant changes in the xcube code were introduced in PR #594.