We have been doing further testing of the S3-backed asset store, and we recently tested with an internal asset that was 60MB zipped. Unzipped, the asset had the following composition:
It should be sufficient to create a mock asset with roughly the same characteristics and get a job to start up with this asset. The execution controller should then OOM when run in k8s using the default memory limit of 512MB. We tested increasing the memory limit (to 6GB) and the execution controller did not OOM.
Here is some of the log output:
[2024-04-17T22:41:06.009Z] DEBUG: teraslice/18 on ts-exc-datagen-100m-noop-tmp1-3d18bacc-9e6d-7fsdc: getting record with id: 46f47558baf4e3f0e8f736ad5c91827a53cc4b4b from s3 minio_test1 connection, ts-assets-teraslice-tmp1 bucket. (assignment=execution_controller, module=assets_storage, worker_id=97W8Ruer, ex_id=77c23ff5-409e-4b45-a264-c089fd90b3e1, job_id=3d18bacc-9e6d-4651-96bf-5fffec667073)
[2024-04-17T22:41:06.533Z] INFO: teraslice/18 on ts-exc-datagen-100m-noop-tmp1-3d18bacc-9e6d-7fsdc: loading assets: a5b3d9e48bce3b5f997ba7c21cb3d47945e231a2 (assignment=execution_controller, module=asset_loader, worker_id=97W8Ruer, ex_id=77c23ff5-409e-4b45-a264-c089fd90b3e1, job_id=3d18bacc-9e6d-4651-96bf-5fffec667073)
[2024-04-17T22:41:06.808Z] INFO: teraslice/18 on ts-exc-datagen-100m-noop-tmp1-3d18bacc-9e6d-7fsdc: decompressing and saving asset a5b3d9e48bce3b5f997ba7c21cb3d47945e231a2 to /app/assets/a5b3d9e48bce3b5f997ba7c21cb3d47945e231a2 (assignment=execution_controller, module=asset_loader, worker_id=97W8Ruer, ex_id=77c23ff5-409e-4b45-a264-c089fd90b3e1, job_id=3d18bacc-9e6d-4651-96bf-5fffec667073)
[2024-04-17T22:41:10.938Z] ERROR: teraslice/7 on ts-exc-datagen-100m-noop-tmp1-3d18bacc-9e6d-7fsdc: Teraslice Worker shutting down due to failure! (assignment=execution_controller)
Error: Failure to get assets, caused by exit code null
at ChildProcess.<anonymous> (file:///app/source/packages/teraslice/dist/src/lib/workers/assets/spawn.js:45:31)
at ChildProcess.emit (node:events:517:28)
at maybeClose (node:internal/child_process:1098:16)
at ChildProcess._handle.onexit (node:internal/child_process:303:5)
If necessary, I can supply the internal asset separately.
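A note on the `exit code null` in the error above: in Node.js, a child process reports a `null` exit code when it was terminated by a signal rather than exiting on its own, which is consistent with the asset loader being killed for exceeding the memory limit. The sketch below shows one way such an exit could be turned into a more helpful message. The function name `describeAssetLoaderExit` and the message wording are hypothetical and not taken from the Teraslice source.

```typescript
// Hypothetical helper, not Teraslice's actual code. It interprets the
// (code, signal) pair that Node passes to a ChildProcess 'exit' or 'close' listener.
function describeAssetLoaderExit(code: number | null, signal: string | null): string {
    if (code === 0) {
        return "asset loader exited cleanly";
    }
    if (code === null) {
        // A null code means the process did not exit on its own; it was
        // terminated by a signal. SIGKILL is what an OOM kill typically looks like.
        const sig = signal ?? "an unknown signal";
        const hint = signal === "SIGKILL"
            ? " This may indicate an out-of-memory kill; consider raising the memory limit."
            : "";
        return `asset loader was terminated by ${sig}.${hint}`;
    }
    return `asset loader exited with code ${code}`;
}
```

It would be called from the child process listener, for example `child.on("close", (code, signal) => logger.error(describeAssetLoaderExit(code, signal)))`, where `child` and `logger` are assumed to exist in the calling code.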
This PR makes the following changes:
- Improves the S3 backend get() requests to grab assets in a more memory-efficient way
- This resolves an issue where pulling and decompressing large assets from S3 would cause an OOM on the execution controller on job startup
- Adds an error message, shown when the asset loader closes with an error, that advises what to do in the case of an OOM issue
Ref to issue #3595
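As a rough illustration of what a more memory-efficient get() can look like, here is a minimal sketch that streams an object body to disk instead of buffering the whole asset in memory. It is not the code from this PR. It assumes the S3 client exposes the object body as a Node.js `Readable`, and the names `saveStreamToFile` and `demo` are hypothetical. The demo uses an in-memory stream as a stand-in for the S3 response.

```typescript
import { createWriteStream, mkdtempSync, readFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";
import { Readable } from "node:stream";
import { pipeline } from "node:stream/promises";

// Hypothetical helper: copies the body to destPath chunk by chunk, so memory
// use stays bounded by the stream buffers rather than by the asset size.
async function saveStreamToFile(body: Readable, destPath: string): Promise<void> {
    // pipeline() handles backpressure and resolves once the file is fully written.
    await pipeline(body, createWriteStream(destPath));
}

// Demo with a fake body standing in for an S3 object stream (an assumption).
async function demo(): Promise<string> {
    const dir = mkdtempSync(join(tmpdir(), "asset-"));
    const dest = join(dir, "asset.zip");
    const fakeBody = Readable.from([Buffer.from("abc"), Buffer.from("def")]);
    await saveStreamToFile(fakeBody, dest);
    return readFileSync(dest, "utf8");
}
```

Decompression can then read from the file on disk, so the compressed asset never has to be held in memory as a single buffer.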
cc @busma13