Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Issue #31] (Team4) Keyword Operator #85

Merged
merged 34 commits into from
May 6, 2016
Merged

Conversation

prakul
Copy link
Contributor

@prakul prakul commented May 2, 2016

I have implemented the keyword operator , which returns tuples found for given query. This includes info about spans. Main changes are in KeywordPredicate.java and KeywordMatcher.java

@prakul
Copy link
Contributor Author

prakul commented May 2, 2016

@sandeepreddy602 please take a look.

import edu.uci.ics.textdb.api.common.Attribute;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove unused packages. Hopefully you can see them easily in your IDE.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks Professor @chenlica . Will keep in mind.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still see unused packages on the "import" list.

@chenlica
Copy link
Collaborator

chenlica commented May 2, 2016

@prakul : I had a review. Here are high-level comments:

  • Make the PR title more informative;
  • Describe the main changes in the PR;
  • Remove unused packages;
  • Use TestUtils functions in the test cases. Check DictionaryMatcher test cases as examples.
  • Write more test cases.

@prakul
Copy link
Contributor Author

prakul commented May 2, 2016

@chenlica I am writing more test cases for the same. Thanks for your review. I'll be taking care of other comments as well.

@prakul prakul changed the title Team4 indexsourceoperator Team4 KeywordOperator May 2, 2016
@chenlica
Copy link
Collaborator

chenlica commented May 2, 2016

@prakul and @akshaybetala : is this PR ready for another round of review?

@chenlica
Copy link
Collaborator

chenlica commented May 2, 2016

@prakul and @akshaybetala : your PR is blocking team1. Please finish it ASAP. If needed, we can ask team1 to finish the phrase operator, which is also blocking...

@chenlica chenlica changed the title Team4 KeywordOperator [Issue #31] (Team4) Keyword Operator May 2, 2016
2. Modified DictionaryMatcher and KeyWordMatcher to use the methods in
Utils class.
//Each element of Array of keywords is matched in tokenized TextField Value
for(int iter = 0; iter < queryTokens.size(); iter++) {
positionIndex = 0;
String query = queryTokens.get(iter);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"query" -> "queryToken"?

@chenlica
Copy link
Collaborator

chenlica commented May 5, 2016

@sandeepreddy602 and @prakul : here's a general comment. I saw quite some similarity between this keyword operator (scan based) and the dictionary match operator (scan based). After we finish this PR, we need to think about the similarity and see whether we can refactor the code.

Eventually we want an index-based dictionary matcher operator to be implemented on top of an index-based keyword/phrase operator.

@sandeepreddy602 : please remind me to discuss this issue after merging this PR.

@chenlica
Copy link
Collaborator

chenlica commented May 5, 2016

@prakul : To "unblock" you, I suggest you modify the code to implement the "AND" logic. Can you? Thank you.

private List<Span> spanList;
private String query;
private List<Attribute> attributeList;
private ArrayList<String> queryTokens;
Copy link
Contributor

@sandeepreddy602 sandeepreddy602 May 5, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ArrayList --> List
ArrayList --> List

@sandeepreddy602
Copy link
Contributor

@prakul .. Added a couple of small comments. Please merge with the master after taking care of them.

fieldList = sourceTuple.getFields();
spanList.clear();
if(!spanSchemaDefined){
spanSchemaDefined = true;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a final minor comment, technically we should move "spanSchemaDefined = true" AFTER "spanSchema = Utils.createSpanSchema(schema)".

@prakul
Copy link
Contributor Author

prakul commented May 5, 2016

I have added AND logic. But there are some cases regarding how query tokens should be searched for across the fields, which need to be discussed. I can revert back to OR logic as per your last email and we can refactor the code in future.

I'll merge after the doubt regarding "queryTokens.get(iter)" to "queryToken" is resolved.


List<IField> fieldList;
boolean foundFlag = false;
int positionIndex = 0; // Next position in the field to be checked.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we can move "int positionIndex" and "spanStartPosition" into the for(int attributeIndex = 0;... loop.

@chenlica
Copy link
Collaborator

chenlica commented May 5, 2016

@prakul I gave you a few more comments, and left a major one about the "AND" logic across different fields.

Since you already tried to implement the "AND" logic, we are getting very close to finishing it (instead of going back to the OR logic). Can we do one more round of revision to finish this PR? Thank you.

@prakul
Copy link
Contributor Author

prakul commented May 5, 2016

Sure, I'll make suggested changes soon.

@prakul
Copy link
Contributor Author

prakul commented May 6, 2016

@chenlica I have modified AND logic as suggested and all added corresponding Test Case as well.

@chenlica
Copy link
Collaborator

chenlica commented May 6, 2016

@prakul : the AND logic implementation looks correct to me.

I left a minor comment about a confusing comment. After taking care of it, please go ahead to do the merge.

Thank you!

@prakul prakul merged commit 141dcf0 into master May 6, 2016
@prakul
Copy link
Contributor Author

prakul commented May 6, 2016

fixed the comment , and merged

@chenlica
Copy link
Collaborator

chenlica commented May 6, 2016

Thanks @prakul

@laisycs Please merge the master into your branch to finish your PR #84 .

@chenlica
Copy link
Collaborator

chenlica commented May 6, 2016

@prakul please remove your unused branch from the git repo.

@prakul prakul deleted the team4-indexsourceoperator branch May 6, 2016 05:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants