Is this a new feature, an improvement, or a change to existing functionality?
Change
How would you describe the priority of this feature request
Medium
Please provide a clear description of problem this feature solves
I want Morpheus to continually run and monitor a directory for new files, perform inference on those files, save the output, and then repeat the process when new files are detected. The functionality already exists in morpheus/stages/input/autoencoder_source_stage.py but not in examples/digital_fingerprinting/production/morpheus/dfp/stages/multi_file_source.py.
Describe your ideal solution
A new command line argument in examples/digital_fingerprinting/production/morpheus/dfp_*_pipeline.py to indicate that the input_file or input_glob should be continually monitored, e.g.:
@click.option('--watch_directory',
              type=bool,
              default=False,
              help=("The watch directory option instructs this stage to not close down once all files have been read. "
                    "Instead it will read all files that match the 'input_glob' pattern, and then continue to watch "
                    "the directory for additional files. Any new files that are added that match the glob will then "
                    "be processed."))
Describe any alternatives you have considered
It's possible to write wrapper shell scripts to launch new instances of the pipeline when new files are detected, but this is not efficient.
Additional context
The code exists both in Morpheus input stages (appshield_source_stage.py, azure_source_stage.py, cloud_trail_source_stage.py, autoencoder_source_stage.py, and duo_source_stage.py) as well as the ransomware_detection example, but not in the DFP Production example.
Code of Conduct
I agree to follow this project's Code of Conduct
I have searched the open feature requests and have found no duplicates for this feature request
Thanks for submitting this issue - our team has been notified and we'll get back to you as soon as we can!
In the meantime, feel free to add any relevant information to this issue.
* Adds two new constructor args to `MultiFileSource`: `watch` and `watch_interval`. When `watch` is True, the source polls the input file globs every `watch_interval` seconds for new files.
* These are exposed as `--watch_inputs` and `--watch_interval` on the command line
* Misc updates to fix linting warnings
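To illustrate the polling behaviour described above, here is a minimal standard-library sketch. It is not the `MultiFileSource` implementation from the PR. The function names `find_new_files` and `watch_files` and the `max_polls` argument are hypothetical and exist only for this example; `watch` and `watch_interval` mirror the constructor args named above.

```python
import glob
import time
from typing import Iterable, Iterator, List, Optional, Set


def find_new_files(patterns: Iterable[str], seen: Set[str]) -> List[str]:
    """Return files matching any glob pattern that are not yet in `seen`.

    Newly found files are added to `seen` so they are reported only once.
    (Hypothetical helper, not part of Morpheus.)
    """
    matches: Set[str] = set()
    for pattern in patterns:
        matches.update(glob.glob(pattern))
    new_files = sorted(matches - seen)
    seen.update(new_files)
    return new_files


def watch_files(patterns: Iterable[str],
                watch: bool = False,
                watch_interval: float = 1.0,
                max_polls: Optional[int] = None) -> Iterator[List[str]]:
    """Yield batches of new files; keep polling only when `watch` is True.

    `max_polls` is an assumption added so the sketch can terminate in tests.
    """
    patterns = list(patterns)
    seen: Set[str] = set()
    polls = 0
    while True:
        new_files = find_new_files(patterns, seen)
        if new_files:
            yield new_files
        polls += 1
        if not watch or (max_polls is not None and polls >= max_polls):
            return
        # Sleep between polls so the directory is not scanned continuously.
        time.sleep(watch_interval)
```

With `watch=False` the generator reads the matching files once and stops, which corresponds to the pre-existing one-shot behaviour; with `watch=True` it keeps yielding only files it has not reported before.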
I spent some time looking into what the impacts of this change would be on the rest of the pipeline:
* Files shouldn't appear in the source directory until they're fully populated, otherwise the pipeline will ingest a partially populated file.
* `DFPFileBatcherStage`: Assuming that `watch_interval` is smaller than the `period` argument to `DFPFileBatcherStage`, and assuming that new files are actually new (not historical files recently fetched), most new files will likely be batched together unless they straddle the period boundary. This should be OK and is likely the desired outcome.
* `DFPRollingWindowStage`: This should be OK, since the stage appends incoming data to the existing history for the user. However, there is a potential issue if new files are older than files already ingested. This could happen if an outside source populates the directory with files out of creation order.
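On the first point above (files should only appear once fully populated), one common way for a producer to guarantee this is to write to a temporary name and then rename it into place. The sketch below is an assumption about how an upstream writer could behave, not something the PR provides; `publish_atomically` is a hypothetical name. It relies on `os.replace`, which is atomic on POSIX when source and destination are on the same filesystem.

```python
import os
import tempfile


def publish_atomically(dest_path: str, data: bytes) -> None:
    """Write `data` so that `dest_path` only ever appears fully populated.

    The data goes to a hidden temporary file in the same directory, which a
    glob such as '*.json' does not match, and is then renamed into place.
    (Hypothetical helper for the file producer, not part of Morpheus.)
    """
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, prefix=".", suffix=".partial")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
            fh.flush()
            os.fsync(fh.fileno())
        # The rename makes the complete file visible in a single step.
        os.replace(tmp_path, dest_path)
    except BaseException:
        # Clean up the partial file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A watcher polling the directory with a pattern like `*.json` would then never observe a half-written file, because the temporary name does not match the pattern.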
fixes #975
Authors:
- David Gardner (https://github.com/dagardner-nv)
Approvers:
- Michael Demoret (https://github.com/mdemoret-nv)
URL: #978