[Issue #31] (Team4) Keyword Operator #85

prakul · 2016-05-02T04:22:22Z

I have implemented the keyword operator, which returns the tuples that match a given query, including span information for each match. The main changes are in KeywordPredicate.java and KeywordMatcher.java.

Merge branch 'team4-indexsourceoperator' of https://github.com/TextDB/textdb into team4-indexsourceoperator

prakul · 2016-05-02T04:28:00Z

@sandeepreddy602 please take a look.

chenlica · 2016-05-02T05:05:04Z

textdb/textdb-dataflow/src/main/java/edu/uci/ics/textdb/dataflow/common/KeywordPredicate.java

+import edu.uci.ics.textdb.api.common.Attribute;
+import org.apache.lucene.analysis.Analyzer;
+import org.apache.lucene.analysis.TokenStream;
+import org.apache.lucene.analysis.standard.StandardAnalyzer;


Remove unused packages. Hopefully you can see them easily in your IDE.

Thanks, Professor @chenlica. Will keep that in mind.

I still see unused packages on the "import" list.

chenlica · 2016-05-02T05:27:08Z

@prakul : I had a review. Here are high-level comments:

Make the PR title more informative.
Describe the main changes in the PR.
Remove unused packages.
Use TestUtils functions in the test cases. Check DictionaryMatcher test cases as examples.
Write more test cases.

prakul · 2016-05-02T05:38:39Z

@chenlica I am writing more test cases for this. Thanks for your review. I'll take care of the other comments as well.

chenlica · 2016-05-02T15:46:39Z

@prakul and @akshaybetala : is this PR ready for another round of review?

chenlica · 2016-05-02T17:57:12Z

@prakul and @akshaybetala : your PR is blocking team1. Please finish it ASAP. If needed, we can ask team1 to finish the phrase operator, which is also blocking...

chenlica · 2016-05-05T05:19:33Z

...b/textdb-dataflow/src/main/java/edu/uci/ics/textdb/dataflow/keywordmatch/KeywordMatcher.java

+                    //Each element of Array of keywords is matched in tokenized TextField Value
+                    for(int iter = 0; iter < queryTokens.size(); iter++) {
+                        positionIndex = 0;
+                        String query = queryTokens.get(iter);


"query" -> "queryToken"?

chenlica · 2016-05-05T05:35:03Z

@sandeepreddy602 and @prakul : here's a general comment. I see a lot of similarity between this keyword operator (scan based) and the dictionary match operator (scan based). After we finish this PR, we need to look at that overlap and see whether we can refactor the code.

Eventually we want an index-based dictionary matcher operator to be implemented on top of an index-based keyword/phrase operator.

@sandeepreddy602 : please remind me to discuss this issue after merging this PR.

chenlica · 2016-05-05T06:28:52Z

@prakul : To "unblock" you, I suggest you modify the code to implement the "AND" logic. Can you? Thank you.

sandeepreddy602 · 2016-05-05T06:29:02Z

...b/textdb-dataflow/src/main/java/edu/uci/ics/textdb/dataflow/keywordmatch/KeywordMatcher.java

+    private List<Span> spanList;
+    private String query;
+    private List<Attribute> attributeList;
+    private ArrayList<String> queryTokens;


ArrayList --> List
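The suggestion is to declare the field by its interface type rather than the concrete class. A minimal sketch of the idea follows; `KeywordQuery` is a hypothetical stand-in for the real `KeywordMatcher`, and the whitespace split is an assumption, since the tokenization code is not shown in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the matcher; only the field declaration matters here.
class KeywordQuery {
    // Declared as List, so the rest of the class does not depend on ArrayList.
    private final List<String> queryTokens;

    KeywordQuery(String query) {
        // Assumption: a plain whitespace split. The real operator may tokenize differently.
        this.queryTokens = new ArrayList<>(Arrays.asList(query.trim().split("\\s+")));
    }

    List<String> getQueryTokens() {
        return queryTokens;
    }
}
```

With the interface type, the backing collection can later change (for example to an immutable list) without touching any code that reads `queryTokens`.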

sandeepreddy602 · 2016-05-05T06:34:15Z

@prakul .. Added a couple of small comments. Please merge with the master after taking care of them.

chenlica · 2016-05-05T07:15:16Z

...b/textdb-dataflow/src/main/java/edu/uci/ics/textdb/dataflow/keywordmatch/KeywordMatcher.java

+            fieldList = sourceTuple.getFields();
+            spanList.clear();
+            if(!spanSchemaDefined){
+                spanSchemaDefined = true;


As a final minor comment, technically we should move "spanSchemaDefined = true" AFTER "spanSchema = Utils.createSpanSchema(schema)".
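A sketch of why the order matters: if the flag is set before the schema is built and the build throws, the flag claims a schema exists when it does not. The class below is hypothetical, and its private `createSpanSchema` only mimics the role of `Utils.createSpanSchema(schema)`; the attribute name `spanList` is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder illustrating "build first, then set the flag".
class SpanSchemaHolder {
    private boolean spanSchemaDefined = false;
    private List<String> spanSchema;

    List<String> getSpanSchema(List<String> schema) {
        if (!spanSchemaDefined) {
            // Build first: if this call throws, the flag is still false.
            spanSchema = createSpanSchema(schema);
            spanSchemaDefined = true;
        }
        return spanSchema;
    }

    boolean isSpanSchemaDefined() {
        return spanSchemaDefined;
    }

    // Stand-in for the project's helper: copies the schema and appends a span attribute.
    private static List<String> createSpanSchema(List<String> schema) {
        if (schema == null) {
            throw new IllegalArgumentException("schema must not be null");
        }
        List<String> result = new ArrayList<>(schema);
        result.add("spanList"); // assumed name of the extra attribute
        return result;
    }
}
```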

prakul · 2016-05-05T07:46:51Z

I have added the AND logic. However, there are some open cases regarding how query tokens should be searched for across fields, and these need to be discussed. I can revert to the OR logic as per your last email, and we can refactor the code in the future.

I'll merge after the doubt regarding "queryTokens.get(iter)" to "queryToken" is resolved.

chenlica · 2016-05-05T15:19:19Z

...b/textdb-dataflow/src/main/java/edu/uci/ics/textdb/dataflow/keywordmatch/KeywordMatcher.java

+
+        List<IField> fieldList;
+        boolean foundFlag = false;
+        int positionIndex = 0; // Next position in the field to be checked.


I believe we can move "int positionIndex" and "spanStartPosition" into the for(int attributeIndex = 0;... loop.
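Narrowing the scope this way also removes the need to reset the variables by hand for each field. A rough sketch under assumptions: `ScopedPositionIndex` and `findStarts` are hypothetical names, and a plain substring search stands in for the operator's real matching.

```java
import java.util.ArrayList;
import java.util.List;

class ScopedPositionIndex {
    // Returns, for each field value, the start offsets at which token occurs.
    static List<List<Integer>> findStarts(List<String> fieldValues, String token) {
        if (token.isEmpty()) {
            throw new IllegalArgumentException("token must not be empty");
        }
        List<List<Integer>> startsPerField = new ArrayList<>();
        for (int attributeIndex = 0; attributeIndex < fieldValues.size(); attributeIndex++) {
            // Declared inside the loop: every field starts scanning at 0,
            // so no manual reset is needed between fields.
            int positionIndex = 0;
            int spanStartPosition;
            String fieldValue = fieldValues.get(attributeIndex);
            List<Integer> starts = new ArrayList<>();
            while ((spanStartPosition = fieldValue.indexOf(token, positionIndex)) != -1) {
                starts.add(spanStartPosition);
                positionIndex = spanStartPosition + token.length();
            }
            startsPerField.add(starts);
        }
        return startsPerField;
    }
}
```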

chenlica · 2016-05-05T15:30:11Z

@prakul I gave you a few more comments, and left a major one about the "AND" logic across different fields.

Since you already tried to implement the "AND" logic, we are getting very close to finishing it (instead of going back to the OR logic). Can we do one more round of revision to finish this PR? Thank you.

prakul · 2016-05-05T19:47:22Z

Sure, I'll make suggested changes soon.

prakul · 2016-05-06T03:48:34Z

@chenlica I have modified the AND logic as suggested and also added a corresponding test case.
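The final implementation is not shown in this thread. As one possible reading of "search for all query tokens in a field", the sketch below accepts a field only if every query token occurs in it, and returns one span per occurrence. All names here (`KeywordAndSketch`, `Span`, `matchField`) are hypothetical, and the `\w+` tokenization with case-insensitive comparison is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeywordAndSketch {
    // Hypothetical span: field name plus character offsets of one match.
    static final class Span {
        final String fieldName;
        final int start; // inclusive
        final int end;   // exclusive

        Span(String fieldName, int start, int end) {
            this.fieldName = fieldName;
            this.start = start;
            this.end = end;
        }
    }

    private static final Pattern WORD = Pattern.compile("\\w+");

    // Returns spans for all token occurrences, or an empty list if any token is missing.
    static List<Span> matchField(String fieldName, String fieldValue, List<String> queryTokens) {
        List<Span> spans = new ArrayList<>();
        for (String queryToken : queryTokens) {
            boolean found = false;
            Matcher m = WORD.matcher(fieldValue);
            while (m.find()) {
                if (m.group().equalsIgnoreCase(queryToken)) {
                    spans.add(new Span(fieldName, m.start(), m.end()));
                    found = true;
                }
            }
            if (!found) {
                // AND semantics: a single missing token rejects the whole field.
                return new ArrayList<>();
            }
        }
        return spans;
    }

    public static void main(String[] args) {
        List<Span> spans = matchField("text", "Tom Hanks met Tom", Arrays.asList("tom", "hanks"));
        for (Span s : spans) {
            System.out.println(s.fieldName + " [" + s.start + ", " + s.end + ")");
        }
    }
}
```

Under OR semantics the early return would be dropped, and a field would match as soon as any token produced a span.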

chenlica · 2016-05-06T04:17:59Z

@prakul : the AND logic implementation looks correct to me.

I left a minor comment about a confusing comment. After taking care of it, please go ahead to do the merge.

Thank you!

prakul · 2016-05-06T04:38:49Z

Fixed the comment and merged.

chenlica · 2016-05-06T04:56:42Z

Thanks @prakul

@laisycs Please merge the master into your branch to finish your PR #84 .

chenlica · 2016-05-06T04:58:50Z

@prakul please remove your unused branch from the git repo.

prakul added 9 commits April 23, 2016 22:29

merge from master changes

29bef31

merging with refactored code for dataReader dataWriter

b6affd2

merge

5e64dce

Merge branch 'team4-indexsourceoperator' of https://github.com/TextDB/textdb into team4-indexsourceoperator

ce33cc3

KeywordMatcher operator

00f1b4d

minor code cleanup

4487b20

refactored code to eliminate recalculation of schema

31f12ed

removed comments

c103be7

merged with master

f5f58f6

chenlica reviewed May 2, 2016

prakul changed the title ~~Team4 indexsourceoperator~~ Team4 KeywordOperator May 2, 2016

chenlica changed the title ~~Team4 KeywordOperator~~ [Issue #31] (Team4) Keyword Operator May 2, 2016

sandeepreddy602 added 2 commits May 2, 2016 12:26

Merge branch 'master' into team4-indexsourceoperator

712bcdd

Issue #31: Moved span related methods to Utils class

2d8afe6

2. Modified DictionaryMatcher and KeyWordMatcher to use the methods in Utils class.

chenlica reviewed May 5, 2016

Changed some variable names for clarity

43d9604

sandeepreddy602 reviewed May 5, 2016

chenlica reviewed May 5, 2016

prakul added 2 commits May 5, 2016 00:19

Added AND logic, added a test case corresponding to it. Refactored some code

a782b1f

shifted position of spanSchemaDefined

ea5d6c0

chenlica reviewed May 5, 2016

prakul added 2 commits May 5, 2016 18:59

Moved variables around

11bff36

Modified AND logic to search for all query tokens in a field

c0e4ba1

Fixed confusing comment in Utils

365285e

prakul merged commit 141dcf0 into master May 6, 2016

prakul deleted the team4-indexsourceoperator branch May 6, 2016 05:10
