-
Notifications
You must be signed in to change notification settings - Fork 76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Issue #31] (Team4) Keyword Operator #85
Conversation
Merge branch 'team4-indexsourceoperator' of https://github.com/TextDB/textdb into team4-indexsourceoperator
…/textdb into team4-indexsourceoperator
@sandeepreddy602 please take a look. |
import edu.uci.ics.textdb.api.common.Attribute; | ||
import org.apache.lucene.analysis.Analyzer; | ||
import org.apache.lucene.analysis.TokenStream; | ||
import org.apache.lucene.analysis.standard.StandardAnalyzer; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove unused packages. Hopefully you can see them easily in your IDE.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks Professor @chenlica . Will keep in mind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still see unused packages on the "import" list.
@prakul : I had a review. Here are high-level comments:
|
@chenlica I am writing more test cases for the same. Thanks for your review. I'll be taking care of other comments as well. |
@prakul and @akshaybetala : is this PR ready for another round of review? |
@prakul and @akshaybetala : your PR is blocking team1. Please finish it ASAP. If needed, we can ask team1 to finish the phrase operator, which is also blocking... |
2. Modified DictionaryMatcher and KeyWordMatcher to use the methods in Utils class.
//Each element of Array of keywords is matched in tokenized TextField Value | ||
for(int iter = 0; iter < queryTokens.size(); iter++) { | ||
positionIndex = 0; | ||
String query = queryTokens.get(iter); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"query" -> "queryToken"?
@sandeepreddy602 and @prakul : here's a general comment. I saw quite some similarity between this keyword operator (scan based) and the dictionary match operator (scan based). After we finish this PR, we need to think about the similarity and see whether we can refactor the code. Eventually we want an index-based dictionary matcher operator to be implemented on top of an index-based keyword/phrase operator. @sandeepreddy602 : please remind me to discuss this issue after merging this PR. |
@prakul : To "unblock" you, I suggest you modify the code to implement the "AND" logic. Can you? Thank you. |
private List<Span> spanList; | ||
private String query; | ||
private List<Attribute> attributeList; | ||
private ArrayList<String> queryTokens; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ArrayList --> List
ArrayList --> List
@prakul .. Added a couple of small comments. Please merge with the master after taking care of them. |
fieldList = sourceTuple.getFields(); | ||
spanList.clear(); | ||
if(!spanSchemaDefined){ | ||
spanSchemaDefined = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a final minor comment, technically we should move "spanSchemaDefined = true" AFTER "spanSchema = Utils.createSpanSchema(schema)".
I have added AND logic. But there are some cases regarding how query tokens should be searched for across the fields, which need to be discussed. I can revert back to OR logic as per your last email and we can refactor the code in future. I'll merge after the doubt regarding "queryTokens.get(iter)" to "queryToken" is resolved. |
|
||
List<IField> fieldList; | ||
boolean foundFlag = false; | ||
int positionIndex = 0; // Next position in the field to be checked. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we can move "int positionIndex" and "spanStartPosition" into the for(int attributeIndex = 0;...
loop.
@prakul I gave you a few more comments, and left a major one about the "AND" logic across different fields. Since you already tried to implement the "AND" logic, we are getting very close to finishing it (instead of going back to the OR logic). Can we do one more round of revision to finish this PR? Thank you. |
Sure, I'll make suggested changes soon. |
@chenlica I have modified AND logic as suggested and all added corresponding Test Case as well. |
@prakul : the AND logic implementation looks correct to me. I left a minor comment about a confusing comment. After taking care of it, please go ahead to do the merge. Thank you! |
fixed the comment , and merged |
@prakul please remove your unused branch from the git repo. |
I have implemented the keyword operator , which returns tuples found for given query. This includes info about spans. Main changes are in KeywordPredicate.java and KeywordMatcher.java