[Issue #31] (Team4) Keyword Operator #85

Merged
merged 34 commits into from
May 6, 2016
Changes from 29 commits
Commits
29bef31
merge from master changes
prakul Apr 24, 2016
b6affd2
merging with refactored code for dataReader dataWriter
prakul Apr 24, 2016
5e64dce
merge
prakul Apr 24, 2016
ce33cc3
Merge branch 'team4-indexsourceoperator' of https://github.com/TextDB…
prakul Apr 24, 2016
00f1b4d
KeywordMatcher operator
prakul May 1, 2016
4487b20
minor code cleanup
prakul May 1, 2016
31f12ed
refactored code to eliminate recalculation of schema
prakul May 2, 2016
c103be7
removed comments
prakul May 2, 2016
f5f58f6
merged with master
prakul May 2, 2016
712bcdd
Merge branch 'master' into team4-indexsourceoperator
sandeepreddy602 May 2, 2016
2d8afe6
Issue #31: Moved span related methods to Utils class
sandeepreddy602 May 2, 2016
18b19b6
query building logic for indexsurceoperator
prakul May 4, 2016
7e3619f
Merge branch 'team4-indexsourceoperator' of https://github.com/TextDB…
prakul May 4, 2016
17b42d6
Fixed Bugs, Added Comments
prakul May 4, 2016
5f1f044
Removed unused imports and variables. Added more comments
prakul May 4, 2016
ed297d7
Merging from master
prakul May 4, 2016
eb74d2f
refactored the code for clarity
prakul May 5, 2016
8fb8d92
removed redundant getQueryResult() method
prakul May 5, 2016
db4a530
Moved tokenizeQuery() to Utils
prakul May 5, 2016
774f5da
Refactored Code. Added Comments
prakul May 5, 2016
b4ec071
renamed getQueryResults to getPeopleQueryResults
prakul May 5, 2016
7671fbf
fixed spacing issues
prakul May 5, 2016
05315cb
changed variable names
prakul May 5, 2016
3b2927c
moved variables to more local scope
prakul May 5, 2016
8ddf519
refactored code for ease of read. Removed some redundant code
prakul May 5, 2016
5ba14f6
refactored some variables
prakul May 5, 2016
1357ac2
refactored variables
prakul May 5, 2016
9bdf5d0
Added comment to getNextTuple. Moved some variables
prakul May 5, 2016
43d9604
Changed some variable names for clarity
prakul May 5, 2016
a782b1f
Added AND logic, added a test case corresponding to it. Refactored so…
prakul May 5, 2016
ea5d6c0
shifted position of spanSchemaDefined
prakul May 5, 2016
11bff36
Moved variables around
prakul May 6, 2016
c0e4ba1
Modified AND logic to search for all query tokens in a field
prakul May 6, 2016
365285e
Fixed confusing comment in Utils
prakul May 6, 2016
@@ -1,19 +1,32 @@
package edu.uci.ics.textdb.common.utils;

import java.io.StringReader;
import java.text.ParseException;
import java.util.ArrayList;
import java.util.Date;
import java.util.HashSet;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.document.DateTools;
import org.apache.lucene.document.DateTools.Resolution;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.IntField;
import org.apache.lucene.index.IndexableField;

import edu.uci.ics.textdb.api.common.Attribute;
import edu.uci.ics.textdb.api.common.FieldType;
import edu.uci.ics.textdb.api.common.IField;
import edu.uci.ics.textdb.api.common.ITuple;
import edu.uci.ics.textdb.api.common.Schema;
import edu.uci.ics.textdb.common.constants.SchemaConstants;
import edu.uci.ics.textdb.common.field.DataTuple;
import edu.uci.ics.textdb.common.field.DateField;
import edu.uci.ics.textdb.common.field.DoubleField;
import edu.uci.ics.textdb.common.field.IntegerField;
import edu.uci.ics.textdb.common.field.ListField;
import edu.uci.ics.textdb.common.field.Span;
import edu.uci.ics.textdb.common.field.StringField;
import edu.uci.ics.textdb.common.field.TextField;

@@ -72,4 +85,60 @@ public static IndexableField getLuceneField(FieldType fieldType,
}
return luceneField;
}
/**
* @about Modifies schema, fields and creates a new span tuple
Collaborator

Use an example to explain this function.

Does it really modify the schema?

Contributor Author

@sandeepreddy602 Please take a look

Contributor

It doesn't modify the schema. Will change the comments.

Collaborator

@sandeepreddy602 and @prakul: was the comment modified accordingly?

*/
public static ITuple getSpanTuple( List<IField> fieldList, List<Span> spanList, Schema spanSchema) {
IField spanListField = new ListField<Span>(new ArrayList<>(spanList));
List<IField> fieldListDuplicate = new ArrayList<>(fieldList);
fieldListDuplicate.add(spanListField);

IField[] fieldsDuplicate = fieldListDuplicate.toArray(new IField[fieldListDuplicate.size()]);
return new DataTuple(spanSchema, fieldsDuplicate);
}
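
To illustrate what the reviewers ask about here, a minimal sketch of what getSpanTuple() appears to do: it copies the incoming field list, appends the span list as one extra trailing field, and leaves its inputs (and the schema it is handed) untouched. Plain `Object` and `String` values stand in for TextDB's `IField`, `Span` and `DataTuple`; the class name, method name and sample values are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Stand-in sketch: plain Objects replace TextDB's IField, Span and DataTuple.
final class SpanTupleSketch {

    /** Returns a new list: a copy of fieldList plus one trailing element holding a copy of spanList. */
    static List<Object> appendSpanList(List<Object> fieldList, List<String> spanList) {
        List<Object> fieldsCopy = new ArrayList<>(fieldList); // defensive copy, the input list is not modified
        fieldsCopy.add(new ArrayList<>(spanList));            // the span list becomes the last field
        return fieldsCopy;
    }

    public static void main(String[] args) {
        List<Object> fields = new ArrayList<>(Arrays.asList("bruce", "john Lee"));
        List<String> spans = new ArrayList<>(Collections.singletonList("lastName[0,4]=john"));
        List<Object> spanTuple = appendSpanList(fields, spans);
        System.out.println(fields.size());    // 2: the original fields are unchanged
        System.out.println(spanTuple.size()); // 3: two fields plus the span list
    }
}
```

Because the span list is copied as well, clearing `spanList` for the next document (as DictionaryMatcher does) should not affect a tuple that was already returned.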

/**
*
* @param schema
* @about Creating a new schema object, and adding SPAN_LIST_ATTRIBUTE to
* the schema. SPAN_LIST_ATTRIBUTE is of type List
*/
public static Schema createSpanSchema(Schema schema) {
List<Attribute> dataTupleAttributes = schema.getAttributes();
//spanAttributes contains all attributes of dataTupleAttributes and an additional SPAN_LIST_ATTRIBUTE
Attribute[] spanAttributes = new Attribute[dataTupleAttributes.size() + 1];
Collaborator

Why do we "+1"? Explain in comments?

Contributor

Will do.

for (int count = 0; count < dataTupleAttributes.size(); count++) {
spanAttributes[count] = dataTupleAttributes.get(count);
}
spanAttributes[spanAttributes.length - 1] = SchemaConstants.SPAN_LIST_ATTRIBUTE;
Schema spanSchema = new Schema(spanAttributes);
return spanSchema;
}
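
On the "+1" question, a small sketch of the same idea with attribute names as plain strings: the output array needs one slot more than the input schema because the span-list attribute is appended at the end. `SPAN_LIST` and the attribute names are hypothetical stand-ins; the real value of `SchemaConstants.SPAN_LIST_ATTRIBUTE` is not shown in this diff.

```java
import java.util.Arrays;
import java.util.List;

final class SpanSchemaSketch {
    // Hypothetical stand-in for SchemaConstants.SPAN_LIST_ATTRIBUTE.
    static final String SPAN_LIST = "spanList";

    /** Copies the input attributes and appends the span-list attribute as the last element. */
    static String[] withSpanList(List<String> attributes) {
        // size() + 1: every original attribute keeps its index, the extra slot holds the span list
        String[] result = Arrays.copyOf(attributes.toArray(new String[0]), attributes.size() + 1);
        result[result.length - 1] = SPAN_LIST;
        return result;
    }

    public static void main(String[] args) {
        String[] spanSchema = withSpanList(Arrays.asList("firstName", "lastName"));
        System.out.println(Arrays.toString(spanSchema)); // [firstName, lastName, spanList]
    }
}
```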

/**
* Tokenizes the query string using the given analyser
* @param analyzer
* @param query
* @return ArrayList<String> list of results
*/
public static ArrayList<String> tokenizeQuery(Analyzer analyzer, String query) {
HashSet<String> resultSet = new HashSet<>();
ArrayList<String> result = new ArrayList<String>();
TokenStream tokenStream = analyzer.tokenStream(null, new StringReader(query));
CharTermAttribute charTermAttribute = tokenStream.addAttribute(CharTermAttribute.class);

try{
tokenStream.reset();
while (tokenStream.incrementToken()) {
String term = charTermAttribute.toString();
resultSet.add(term);
}
tokenStream.close();
} catch (Exception e) {
e.printStackTrace();
}
result.addAll(resultSet);

return result;
}
}
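
One observation on tokenizeQuery(): collecting terms in a `HashSet` removes duplicates, but the order of the returned tokens is then unspecified. If callers ever depend on query order, a `LinkedHashSet` would keep first-occurrence order. The sketch below shows only that de-duplication step; lowercasing and whitespace splitting are a crude stand-in for the Lucene `Analyzer`, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class TokenizeSketch {

    /** Splits on whitespace, lowercases, and drops repeated tokens while keeping first-seen order. */
    static List<String> tokenize(String query) {
        Set<String> unique = new LinkedHashSet<>(); // insertion-ordered, unlike HashSet
        for (String token : query.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty()) {                 // an empty query yields one empty string from split()
                unique.add(token);
            }
        }
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Tom Hanks tom")); // [tom, hanks]
    }
}
```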
@@ -0,0 +1,97 @@
package edu.uci.ics.textdb.dataflow.common;

import java.util.ArrayList;
import java.util.List;
import edu.uci.ics.textdb.api.common.Attribute;
import edu.uci.ics.textdb.common.utils.Utils;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.queryparser.classic.MultiFieldQueryParser;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import edu.uci.ics.textdb.api.common.IPredicate;
import edu.uci.ics.textdb.api.common.ITuple;
import edu.uci.ics.textdb.common.exception.DataFlowException;

/**
* @author prakul
*
*/

/**
* This class handles creation of predicate for querying using Keyword Matcher
*/
Collaborator

Add high-level comments to explain the purpose of this class.

public class KeywordPredicate implements IPredicate{

private final List<Attribute> attributeList;
private final String[] fields;
private final String query;
private final Query queryObject;
private ArrayList<String> tokens;
private Analyzer analyzer;

public KeywordPredicate(String query, List<Attribute> attributeList, Analyzer analyzer ) throws DataFlowException{
try {
this.query = query;
this.attributeList = attributeList;
String[] temp = new String[attributeList.size()];

for(int i=0; i < attributeList.size(); i++){
temp[i] = attributeList.get(i).getFieldName();
}
this.fields = temp;
this.tokens = Utils.tokenizeQuery(analyzer, this.query);
this.analyzer = analyzer;
this.queryObject = createQueryObject();
} catch (Exception e) {
e.printStackTrace();
throw new DataFlowException(e.getMessage(), e);
}
}

@Override
public boolean satisfy(ITuple tuple) {
Collaborator

Add a "TODO" here?


//This method is necessary for the interface implementation, and it's really not used.
return true;
}

/**
* Creates a Query object as a boolean Query on all attributes.
* Example: For creating a query like
* (TestConstants.DESCRIPTION + ":lin" + " AND " + TestConstants.LAST_NAME + ":lin")
* we provide a list of AttributeFields (Description, Last_name) to search on and a query string (lin)
*
* TODO #88:BooleanQuery() is deprecated. In future a better solution could be worked out in Query builder layer

* @return QueryObject
* @throws ParseException
*/
private Query createQueryObject() throws ParseException {
Collaborator

Use an example to explain the purpose of this function.

BooleanQuery booleanQuery = new BooleanQuery();
Collaborator

My IntelliJ IDE shows that "BooleanQuery" is deprecated.

MultiFieldQueryParser parser = new MultiFieldQueryParser(this.fields, this.analyzer);
for(String searchToken: this.tokens){
Query termQuery = parser.parse(searchToken);
booleanQuery.add(termQuery, BooleanClause.Occur.MUST);
}
return booleanQuery;
}

public String getQuery(){
return query;
}

public List<Attribute> getAttributeList() {
return attributeList;
}
public Query getQueryObject(){return this.queryObject;}

public ArrayList<String> getTokens(){return this.tokens;}

public Analyzer getAnalyzer(){
return analyzer;
}


}
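
A rough sketch of the matching semantics createQueryObject() seems to build, written without Lucene so it can run on its own. It assumes the default `MultiFieldQueryParser` behavior, where a single term is OR-ed across the listed fields, and that each per-token query is then added with `Occur.MUST`. Under that assumption a record matches when every token occurs in at least one of the fields. The record layout, class name and method name are hypothetical, and tokens are assumed to be lowercased already.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class KeywordAndSketch {

    /** True when every token occurs as a whole word in at least one of the given fields. */
    static boolean matches(Map<String, String> record, List<String> fields, List<String> tokens) {
        for (String token : tokens) {                 // MUST: each token has to be found somewhere
            boolean found = false;
            for (String field : fields) {             // OR across fields for this one token
                String text = record.getOrDefault(field, "").toLowerCase(Locale.ROOT);
                if (Arrays.asList(text.split("\\W+")).contains(token)) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> record = new HashMap<>();
        record.put("firstName", "Tom");
        record.put("lastName", "Hanks");
        record.put("description", "Tall actor");
        List<String> nameFields = Arrays.asList("firstName", "lastName");
        System.out.println(matches(record, nameFields, Arrays.asList("tom", "hanks"))); // true
        System.out.println(matches(record, nameFields, Arrays.asList("tom", "actor"))); // false
    }
}
```

If the intent is instead that all tokens must occur in the same field, as the last AND-logic commit message suggests, the two loops would need to be swapped; the diff shown here does not settle which reading is intended.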
@@ -12,13 +12,11 @@
import edu.uci.ics.textdb.api.common.ITuple;
import edu.uci.ics.textdb.api.common.Schema;
import edu.uci.ics.textdb.api.dataflow.IOperator;
import edu.uci.ics.textdb.common.constants.SchemaConstants;
import edu.uci.ics.textdb.common.exception.DataFlowException;
import edu.uci.ics.textdb.common.field.DataTuple;
import edu.uci.ics.textdb.common.field.ListField;
import edu.uci.ics.textdb.common.field.Span;
import edu.uci.ics.textdb.common.field.StringField;
import edu.uci.ics.textdb.common.field.TextField;
import edu.uci.ics.textdb.common.utils.Utils;

/**
* @author Sudeep [inkudo]
@@ -36,7 +34,6 @@ public class DictionaryMatcher implements IOperator {
private String spanFieldName;
private ITuple dataTuple;
private List<IField> fields;
private Schema schema;
private Schema spanSchema;

private String regex;
@@ -79,8 +76,9 @@ public void open() throws Exception {

dataTuple = operator.getNextTuple();
fields = dataTuple.getFields();
schema = dataTuple.getSchema();
spanSchema = createSpanSchema();
if(spanSchema == null){
spanSchema = Utils.createSpanSchema(dataTuple.getSchema());
}

spanList = new ArrayList<>();
isPresent = false;
@@ -91,22 +89,6 @@ public void open() throws Exception {
}
}
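
The null check added in open() builds the span schema once instead of once per document. A minimal sketch of that pattern follows, under the assumption (not stated in the diff) that every tuple coming from the same input operator shares one schema. The class, the `buildCount` counter and the attribute names are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SchemaCacheSketch {
    private List<String> spanSchema; // null until the first tuple has been seen
    int buildCount = 0;              // illustration only: counts how often the schema is built

    /** Builds the span schema on first use and returns the cached instance afterwards. */
    List<String> spanSchemaFor(List<String> inputSchema) {
        if (spanSchema == null) {
            List<String> built = new ArrayList<>(inputSchema);
            built.add("spanList");   // hypothetical name of the appended attribute
            spanSchema = built;
            buildCount++;
        }
        return spanSchema;
    }

    public static void main(String[] args) {
        SchemaCacheSketch cache = new SchemaCacheSketch();
        List<String> schema = Arrays.asList("firstName", "lastName");
        cache.spanSchemaFor(schema);
        cache.spanSchemaFor(schema);
        System.out.println(cache.buildCount); // 1
    }
}
```

If an operator could ever emit tuples with differing schemas, the cache would have to be keyed on the input schema rather than held in a single field.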

/**
*
* @about Creating a new schema object, and adding SPAN_LIST_ATTRIBUTE to
* the schema. SPAN_LIST_ATTRIBUTE is of type List
*/
private Schema createSpanSchema() {
List<Attribute> dataTupleAttributes = schema.getAttributes();
Attribute[] spanAttributes = new Attribute[dataTupleAttributes.size() + 1];
for (int count = 0; count < spanAttributes.length - 1; count++) {
spanAttributes[count] = dataTupleAttributes.get(count);
}
spanAttributes[spanAttributes.length - 1] = SchemaConstants.SPAN_LIST_ATTRIBUTE;
Schema spanSchema = new Schema(spanAttributes);
return spanSchema;
}

/**
* @about Gets next matched tuple. Returns a new span tuple including the
* span results. Performs a scan based search, gets the dictionary
@@ -168,7 +150,7 @@ public ITuple getNextTuple() throws Exception {
} else if (attributeIndex == searchInAttributes.size() && isPresent) {
isPresent = false;
positionIndex = 0;
return getSpanTuple();
return Utils.getSpanTuple(fields, spanList, spanSchema);

} else if ((dataTuple = operator.getNextTuple()) != null) {
// Get the next document
@@ -177,8 +159,6 @@ public ITuple getNextTuple() throws Exception {
spanList.clear();

fields = dataTuple.getFields();
schema = dataTuple.getSchema();
spanSchema = createSpanSchema();
return getNextTuple();

} else if ((dictionaryValue = dictionary.getNextValue()) != null) {
@@ -197,7 +177,6 @@ public ITuple getNextTuple() throws Exception {

dataTuple = operator.getNextTuple();
fields = dataTuple.getFields();
schema = dataTuple.getSchema();
return getNextTuple();
}

@@ -209,18 +188,6 @@ private void addSpanToSpanList(String fieldName, int start, int end, String key,
spanList.add(span);
}

/**
* @about Modifies schema, fields and creates a new span tuple
*/
private ITuple getSpanTuple() {
IField spanListField = new ListField<Span>(new ArrayList<>(spanList));
List<IField> fieldListDuplicate = new ArrayList<>(fields);
fieldListDuplicate.add(spanListField);

IField[] fieldsDuplicate = fieldListDuplicate.toArray(new IField[fieldListDuplicate.size()]);
return new DataTuple(spanSchema, fieldsDuplicate);
}

/**
* @about Closes the operator
*/