-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
44 translate cropno crop inference udf to gfmap #48
44 translate cropno crop inference udf to gfmap #48
Conversation
…rely on remote mvp_wc_presto for fast debugging
…/worldcereal-classification into 44-translate-cropno-crop-inference-udf-to-gfmap
…eal-classification into kvt_mvp_inferenceUDF Conflicts: minimal_wc_presto/backend_inference_example_openeo.ipynb minimal_wc_presto/mvp_wc_presto/world_cereal_inference.py minimal_wc_presto/udf_long_worldcereal_inference.py
Fixed long UDF
…rence-udf-to-gfmap
…ps://github.com/WorldCereal/worldcereal-classification into 44-translate-cropno-crop-inference-udf-to-gfmap Conflicts: minimal_wc_presto/test_cropland_gfmap.py scripts/inference/cropland_mapping.py src/worldcereal/openeo/feature_extractor.py src/worldcereal/openeo/inference.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@GriffinBabe provided quite a few comments for you to consider. Let's discuss wherever needed.
@GriffinBabe I pushed a new Presto wheel to Artifactory and added what is supposed to become the new way of using dependency wheel in the UDF in my commit. Once this works, we may even not need the dependency zip in this UDF anymore (to be verified). Good luck testing the new functionality. Let me know if you need anything from my side. |
I am also rehashing over the UDF trying to load the required libraries from within the class initiation. I already managed to pull the catboost predictor definition out of the UDF_long. I will try to do the same for the Presto feature extractor. If it works we might not to upload mvp_world_cereal inference anywhere |
This may be parallel work with what I’m doing Hans, as I discussed with Darius and migrated all that functionality to presto-WorldCereal library where it belongs. We want to test the new functionality defining an external whl on the top of the UDF. |
Did PrestoFeatureExtractor also end up in the whl? It is indeed not required! |
Presto-related stuff now went here: https://github.com/WorldCereal/presto-worldcereal/blob/openeo_inference/presto/inference.py |
…ps://github.com/WorldCereal/worldcereal-classification into 44-translate-cropno-crop-inference-udf-to-gfmap
@kvantricht @HansVRP can you review again? |
def extract(self, image): | ||
pass | ||
|
||
ONNX_DEPS_URL = "https://artifactory.vgt.vito.be/artifactory/auxdata-public/openeo/onnx_dependencies_1.16.3.zip" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this still needed? onnxruntime
is included in the worldcereal_deps.zip
. And if it's still needed, shouldn't we also put it on s3?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because it's much lighter than all the presto dependency so it's a waste of resources to install everything just for onnxruntime
. Also it's a requirement in GFMAP for the ModelInference subclasses, until the pip dependency feature is more stable.
Indeed, could you upload it to the S3 @HansVRP ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean that for each single UDF in the chain those dependencies get downloaded/unpacked again? That's also lots of overhead I guess?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case it's just onnxruntime that get's reinstalled (quite light), compared to the full UDF I don't think there is a large difference in execution time, even less with large spatial extents probably
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If cached, they should not be reloaded each time.
out_format="NetCDF", | ||
job_options={ | ||
"driver-memory": "4g", | ||
"executor-memoryOverhead": "12g", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's very high. Need to flag this for further profiling. But for now we can leave it like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I have seen it is a potential overkill, 200 km square ran on half the amount of memory for me. @GriffinBabe do you have a job id for a job which needed more memory?
BASE_URL = "https://s3.waw3-1.cloudferro.com/swift/v1/project_dependencies" # NOQA | ||
DEPENDENCY_NAME = "worldcereal_deps.zip" | ||
|
||
GFMAP_BAND_MAPPING = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is in fact a bit an overhead, if later on in presto-worldcereal
we again remap: https://github.com/WorldCereal/presto-worldcereal/blob/main/presto/inference.py#L41
So we should consider to immediately use the presto naming here and remove the remapping in presto-worldcereal
FYI: The branch
hv_mvp_inferenceUDF
is already merged hereAbout the folder
minimal_wc_presto
: I deleted all the duplicates UDFs. I suppose that want to keep the dependency code somewhere, but should we leave this here?I added the Feature Extractor and Model Inference classes to the module's source, as well as updated a script to perform cropland mapping from coordinates provided in the command lines.
The UDFs are now using all the exporter dependencies packed by @HansVRP and are using GFMAP for fetching the data from the collections, resulting in less (and hopefully for you, more readable) code.
I also took the liberty in deleting an old Feature Extractor UDF (I suppose we don't need it anymore)