[PRE REVIEW]: Unsupervised learning approach towards anomaly detection in compat logs with ADE #2972
Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post. For a list of things I can do to help you, just type:
For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:
|
PDF failed to compile for issue #2972 with the following error: Can't find any papers to compile :-( |
|
Hi @ayush-1506 — is there a paper associated with your submission? |
@kthyng Yes, the code and paper live inside a different branch of the repository. Can we get whedon to use this branch instead of master? Else I'll discuss with my collaborators to merge this into master as soon as possible. |
@whedon generate pdf from branch logs |
|
@ayush-1506 Yes it is fine to have the paper in another branch. Please look through the paper requirements to be sure you've covered them all. For one thing, we require a section entitled "Statement of Need". |
@ayush-1506 This looks like interesting work, but can you make a compelling argument for why it is research software in particular? You can read more about that requirement here. I'm going to label this with a scope query to get the editorial board's input on this, which should take 1-2 weeks. |
@whedon scope query |
I'm sorry human, I don't understand that. You can see what commands I support by typing:
|
@whedon query scope |
Submission flagged for editorial review. |
@kthyng Thanks for the input. I'll add a Statement of Need section (which will support the fact that this software and approach solves a problem). Do I need to add the argument for this being research software to the paper, or will a comment here suffice? |
Here is the specific section on what your paper should contain: https://joss.readthedocs.io/en/latest/submitting.html#what-should-my-paper-contain Your statement of need should describe the research purpose of the software, but summarizing that or expanding on it here would also be helpful as the editors look through your submission to learn about it. |
@kthyng Just realised that the Motivation section should probably be renamed to Statement of Need. |
Sure, I'll make required edits to the paper and expand the same here. |
Made changes to the paper; summarizing them here:
Objective: The aim of the project is to efficiently detect anomalous log slices in large sets of logs. (This can include sparse logs, such as Linux syslogs in RFC 3164/RFC 5424 format, or very dense logs, such as those generated by Spark jobs.) This is a common need in large systems or development clusters, where a system crash or unexpected behavior can have adverse effects. We introduce a novel data science/statistical approach to this problem, expanded on later in this comment.
Need: Why do we need to find anomalous log slices? Debugging system failures is a cumbersome task. Abnormal behavior in a system can be identified by studying the logs generated while the system was running. If the system fails or behaves unexpectedly, this is logged somewhere. However, going through hours of dense logs is a challenge: sysadmins typically need to race against time to study large amounts of log messages and decipher the root cause of the issue. Such system failures are very common and at times unavoidable. Over the years they have led to huge losses of time and resources.
Relevant work and our approach: While there has been work on anomaly detection in large logs, such as TadGAN (https://arxiv.org/abs/2009.07769) and semi-supervised adversarial learning with GANs (https://doi.org/10.1109/ciss.2019.8693024), most of these approaches have focused on large deep learning models, and some treat this as a supervised problem. These models are costly to train and also comparatively slower. On the other hand, we treat the problem as a statistical one and use unsupervised learning techniques for fast and robust detection of anomalous slices. Being an unsupervised approach, we don't need labelled features.
Avoiding computationally heavy deep learning makes our system fast, and it is written in Java, which makes it well suited to enterprise IT use cases (and it can be adapted to others too). To this end, we divide the problem into 3 main sub-categories:
Along with this, we try to classify each message into one of four categories based on the frequency of its family of messages. These classes include:
Using all this calculated information, we assign an anomaly score to every interval slice. The higher the anomaly score, the greater the chance that the slice is the source of anomalous logs. Output format: Finally, we write out the analysis output in XML format. An example of the analysis output for a day can be seen here. We also provide specialized output for each interval, which can be accessed by clicking on the XML links associated with each slice. Examples of analysis for a period can be viewed here. Our approach has shown comparatively accurate results when tested on real data, along with fast inference and training. We also provide sample data and instructions to build the binary and run it on the data. Looking at the What we mean by research software section, I believe this falls under the category: Kindly let me know if there are any questions or if I missed something. |
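As a rough illustration of the per-slice scoring idea in the comment above: the thread does not spell out ADE's actual model, so the sketch below is only an assumption-laden stand-in. It learns, from a set of training slices, how often each message family appears, then scores a new slice by summing the surprisal (negative log frequency) of the families it contains. The function names, the `unseen_frequency` fallback, and the sample message families are all hypothetical.

```python
import math
from collections import Counter


def train_family_frequencies(training_slices):
    """Return, for each message family, the fraction of training slices containing it."""
    counts = Counter()
    for messages in training_slices:
        # Count each family at most once per slice.
        counts.update(set(messages))
    total = len(training_slices)
    return {family: n / total for family, n in counts.items()}


def anomaly_score(messages, frequencies, unseen_frequency=1e-3):
    """Sum of -log(frequency) over the distinct families in one slice.

    Families never seen in training fall back to `unseen_frequency`
    (an assumed smoothing value), so they contribute a large score.
    """
    return sum(
        -math.log(frequencies.get(family, unseen_frequency))
        for family in set(messages)
    )


# Hypothetical training slices: each is the list of message families seen in one interval.
training = [
    ["sshd_login", "cron_run"],
    ["sshd_login", "cron_run"],
    ["sshd_login", "cron_run", "disk_warn"],
    ["sshd_login"],
]
freqs = train_family_frequencies(training)

normal_score = anomaly_score(["sshd_login", "cron_run"], freqs)
odd_score = anomaly_score(["sshd_login", "kernel_panic"], freqs)
```

A slice containing a family that never appeared in training (`kernel_panic` here) scores higher than one made up of routine families, which matches the "higher score, more likely anomalous" reading in the comment. No labels are needed at any point, only observed frequencies.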
@ayush-1506 Thank you! someone from the editorial board will get back to you after a week or two. |
@ayush-1506 - can you explain what code you are submitting to JOSS in this branch vs the overall repo? The paper seems to describe ADE, which is what the repo contains, but you also have suggested that https://github.com/openmainframeproject/ade/tree/logs is the contribution being submitted here, and I can't tell if the paper describes that specific contribution. |
@danielskatz There were some issues (here : CLA wasn't registering the contributors) with pushing new commits to master branch in the ADE repository, hence all development was being done and reviewed in the logs branch temporarily. However, the issues with CLA have been resolved now and all changes have been merged to master. We can take the master branch as the main branch with paper and code from now on. |
So is all of the content in the main branch the JOSS submission? |
@whedon check repository |
|
@ jonathanschilling are you willing to contribute a review for this JOSS submission? |
@gkthiruvathukal I think jonathanschilling wasn't notified since there's a space between |
@ayush-1506 So sorry! @jonathanschilling are you willing to contribute a review for this JOSS submission? |
@gkthiruvathukal Hi, I'm not sure if jonathanschilling is seeing this, should we proceed into the reviewing stage (I believe arcuri82 has agreed to be the reviewer here) while we wait for him? (Or request another reviewer?) Thanks. |
@ayush-1506 Yes, and please do suggest 2-3 names if you can. People are very busy. We need 2 reviewers to proceed to review. So having your input from our list of reviewers will be extremely helpful. |
@ayush-1506 @gkthiruvathukal Sorry, I am indeed quite busy at the moment. Maybe someone else more involved in this topic can perform the review in this case? |
@gkthiruvathukal Adding some names to {kuangmeng, marcoapintoo} (mentioned above): {mdpiper, markbasham, johnsamuelwrites}. Kindly let me know if you'd like more suggestions. |
@ayush-1506 I will get moving on this shortly. Thanks for your suggestions and patience! |
@kuangmeng Are you willing to contribute a review for this JOSS submission? |
@whedon generate pdf |
Hi, @gkthiruvathukal: I made a first review at openmainframeproject/ade#85. However, as I do not use Linux, I did not run the software (I just read the documentation, compiled the software, and ran its test cases). We would need the second reviewer to make sure to run the software. In the worst case, I can try to install a virtual machine to run Linux on it (but that would be quite a bit of work, so it might take me a while...). However, the authors have quite extensive documentation, with a few examples of outputs.
@arcuri82 Are you on Mac or Windows? (I'm assuming one of those two since you're not on Linux.) Still waiting on a second reviewer. I will be sending out another invite shortly if I don't hear back from @kuangmeng. |
And thank you for your early input, @arcuri82! |
@gkthiruvathukal Yes, I can review this submission. |
@mdpiper Thanks for your response! We are always grateful to our reviewers during these challenging times! I will add you and get the review started. |
OK, @mdpiper is now a reviewer |
@whedon start review |
OK, I've started the review over in #3052. |
Submitting author: @ayush-1506 (Ayush Shridhar)
Repository: https://github.com/openmainframeproject/ade.git
Version: v1.0.5
Editor: @gkthiruvathukal
Reviewers: @arcuri82, @mdpiper
Managing EiC: Kristen Thyng
Author instructions
Thanks for submitting your paper to JOSS @ayush-1506. Currently, there isn't a JOSS editor assigned to your paper.
The author's suggestion for the handling editor is @bmcfee.
@ayush-1506 if you have any suggestions for potential reviewers then please mention them here in this thread (without tagging them with an @). In addition, this list of people have already agreed to review for JOSS and may be suitable for this submission (please start at the bottom of the list).
Editor instructions
The JOSS submission bot @whedon is here to help you find and assign reviewers and start the main review. To find out what @whedon can do for you type: