-
Notifications
You must be signed in to change notification settings - Fork 36
add AD task cache #337
add AD task cache #337
Conversation
Codecov Report
@@ Coverage Diff @@
## master #337 +/- ##
============================================
+ Coverage 75.50% 75.77% +0.27%
- Complexity 2160 2207 +47
============================================
Files 207 209 +2
Lines 10030 10144 +114
Branches 898 902 +4
============================================
+ Hits 7573 7687 +114
Misses 2035 2035
Partials 422 422
Flags with carried forward coverage won't be shown. Click here to find out more.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure how this task will be used.
src/main/java/com/amazon/opendistroforelasticsearch/ad/task/ADTaskCacheManager.java
Show resolved
Hide resolved
src/main/java/com/amazon/opendistroforelasticsearch/ad/task/ADTaskCacheManager.java
Show resolved
Hide resolved
src/main/java/com/amazon/opendistroforelasticsearch/ad/task/ADTaskCacheManager.java
Outdated
Show resolved
Hide resolved
src/main/java/com/amazon/opendistroforelasticsearch/ad/task/ADBatchTaskCache.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
src/main/java/com/amazon/opendistroforelasticsearch/ad/task/ADTaskCacheManager.java
Show resolved
Hide resolved
src/main/java/com/amazon/opendistroforelasticsearch/ad/task/ADBatchTaskCache.java
Show resolved
Hide resolved
src/main/java/com/amazon/opendistroforelasticsearch/ad/task/ADBatchTaskCache.java
Show resolved
Hide resolved
src/main/java/com/amazon/opendistroforelasticsearch/ad/task/ADBatchTaskCache.java
Show resolved
Hide resolved
src/main/java/com/amazon/opendistroforelasticsearch/ad/task/ADTaskCacheManager.java
Show resolved
Hide resolved
src/main/java/com/amazon/opendistroforelasticsearch/ad/task/ADTaskCacheManager.java
Show resolved
Hide resolved
src/main/java/com/amazon/opendistroforelasticsearch/ad/task/ADTaskCacheManager.java
Outdated
Show resolved
Hide resolved
src/main/java/com/amazon/opendistroforelasticsearch/ad/task/ADTaskCacheManager.java
Outdated
Show resolved
Hide resolved
src/main/java/com/amazon/opendistroforelasticsearch/ad/task/ADTaskCacheManager.java
Outdated
Show resolved
Hide resolved
src/main/java/com/amazon/opendistroforelasticsearch/ad/task/ADBatchTaskCache.java
Show resolved
Hide resolved
src/main/java/com/amazon/opendistroforelasticsearch/ad/task/ADTaskCacheManager.java
Show resolved
Hide resolved
src/main/java/com/amazon/opendistroforelasticsearch/ad/task/ADTaskCacheManager.java
Outdated
Show resolved
Hide resolved
src/main/java/com/amazon/opendistroforelasticsearch/ad/task/ADTaskCacheManager.java
Show resolved
Hide resolved
* @param taskId task id | ||
* @param trained threshold model trained or not | ||
*/ | ||
protected void setThresholdModelTrained(String taskId, boolean trained) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
when will you call this method? Threshold model can emit results even if it has only seen 1 rcf score. We use rcf's total updates to measure whether the models are ready or not. Simply put, threshold model is always trained. I wonder why we need a flag to set it trained or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method will be called when historical detector's threshold model finishes training/cold start.
For historical detector, we have all of the data, so we should use these data as much as possible to train and predict better. In realtime detector, we have read some(512) historical sampled data to train model. In historical detector, we use more(1000) data points to train model. For both realtime and historical detectors, the RCF&Threshold model will be continuously trained with following data after cold start. Threshold model will be trained first, then start to output reliable results. So we need to know whether the threshold model is trained or not.
From ML team's suggestion that we should use enough data to train RCF model and they suggest 1000 should be good enough. We may tune this value after performance test and more data quality checking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood. 1000 is a trade-off between accuracy and usability. In real time, 128 is the threshold to emit results.
src/main/java/com/amazon/opendistroforelasticsearch/ad/task/ADTaskCacheManager.java
Outdated
Show resolved
Hide resolved
src/main/java/com/amazon/opendistroforelasticsearch/ad/task/ADTaskCacheManager.java
Outdated
Show resolved
Hide resolved
} | ||
checkRunningTaskLimit(); | ||
long neededCacheSize = calculateADTaskCacheSize(adTask); | ||
if (!memoryTracker.canAllocate(neededCacheSize)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you missed my comment
if (!memoryTracker.canAllocate(neededCacheSize)) { | ||
throw new LimitExceededException("No enough memory to run detector"); | ||
} | ||
memoryTracker.consumeMemory(neededCacheSize, false, HISTORICAL_SINGLE_ENTITY_DETECTOR); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you missed my comment
* {@link java.lang.IllegalArgumentException} | ||
* We throw exception rather than return {@code Optional.empty} or null | ||
* here, so don't need to check task existence by writing duplicate null | ||
* checking code. All AD task exceptions will be handled in AD task manager. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still suggest we remove the exceptions and use Optional or null. In Java, the convention is to always check conditions whenever possible and not to use exceptions for flow control. Conditional check is a jump in the byte code while the exception handling is much more complex. When an exception occurs inside a Java method, the method creates an Exception object and passes the Exception object to the JVM (in Java term, the method "throw" an Exception). The Exception object contains the type of the exception, and the state of the program when the exception occurs. The JVM is responsible for finding an exception handler to process the Exception object. It searches backward through the call stack until it finds a matching exception handler for that particular class of Exception object (in Java term, it is called "catch" the Exception). If the JVM cannot find a matching exception handler in all the methods in the call stack, it terminates the program.
If you don't want to do it, I am fine as well. Just a note this is not a good practice.
Ref: https://stackoverflow.com/questions/8161042/why-use-an-exception-instead-of-if-else
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally agree to not use exception for flow control. Here the exception is not to control the flow, but rather terminate the task run flow, just like any AD result action to terminate detector job run by throwing exceptions. In next PR I may change the method name to reduce the confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it. Please change the method name or add comment to make it clearer.
* @param taskId task id | ||
* @param trained threshold model trained or not | ||
*/ | ||
protected void setThresholdModelTrained(String taskId, boolean trained) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood. 1000 is a trade-off between accuracy and usability. In real time, 128 is the threshold to emit results.
Issue #, if available:
Description of changes:
Add AD task cache. We will put RCF&threshold model, shingle data, threshold model training data in cache.
./gradlew build
./gradlew integTest -PnumNodes=3
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.