Description
If you want to expand your queries or documents with synonyms in Apache Lucene, you need a predefined file listing the terms that share the same meaning.
It is not always easy to find a list of basic synonyms for a language and, even if you find one, it does not necessarily match your domain.
For example, in articles about operating systems the term "daemon" is not a synonym of "devil"; it is closer to the term "process".
Word2Vec is a two-layer neural network that takes a text corpus as input and outputs a vector representation for each word in its vocabulary.
Words with similar meanings are mapped to vectors that are close to each other, typically measured by cosine similarity.
This contribution integrates this technique into Lucene's text analysis pipeline: it generates synonyms on the fly from a Word2Vec model trained with the DL4J library.
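To make "close to each other" concrete, here is a minimal sketch, assuming the model has already been loaded into an in-memory map from word to vector. The class `NearestTerms`, its methods `cosine` and `nearest`, the parameters `k` and `minSimilarity`, and the toy 3-dimensional vectors are all hypothetical and are not the contribution's code or DL4J's API. It ranks the other vocabulary words by cosine similarity to a query word and keeps the most similar ones above a threshold, using a brute-force scan.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: nearest neighbours of a word by cosine similarity (brute force). */
class NearestTerms {

    /** Cosine similarity dot(a, b) / (|a| * |b|); returns 0 if either vector is all zeros. */
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vector dimensions differ");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Up to k other words with similarity >= minSimilarity to `term`, most similar first. */
    static List<String> nearest(Map<String, float[]> vectors, String term, int k, double minSimilarity) {
        List<String> result = new ArrayList<>();
        float[] query = vectors.get(term);
        if (query == null) {
            return result; // out-of-vocabulary word: no neighbours
        }
        List<Map.Entry<String, Double>> scored = new ArrayList<>();
        for (Map.Entry<String, float[]> e : vectors.entrySet()) {
            if (e.getKey().equals(term)) {
                continue; // a word is not its own synonym
            }
            double sim = cosine(query, e.getValue());
            if (sim >= minSimilarity) {
                scored.add(Map.entry(e.getKey(), sim));
            }
        }
        scored.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        for (int i = 0; i < Math.min(k, scored.size()); i++) {
            result.add(scored.get(i).getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy vectors invented for illustration; a trained model has far more dimensions.
        Map<String, float[]> vectors = Map.of(
                "daemon", new float[] {0.9f, 0.1f, 0.0f},
                "process", new float[] {0.8f, 0.2f, 0.1f},
                "devil", new float[] {0.0f, 0.1f, 0.9f});
        System.out.println(nearest(vectors, "daemon", 2, 0.5)); // prints [process]
    }
}
```

With these toy vectors, "daemon" and "process" have a cosine similarity of about 0.98, while "daemon" and "devil" score about 0.01, so only "process" passes the 0.5 threshold. A real vocabulary is large enough that a brute-force scan per token may be too slow, which is one reason a production implementation could choose a different search strategy.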
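The sketch below illustrates what "on the fly" could mean inside an analysis chain, without using Lucene classes. It is a plain-Java simulation under stated assumptions: the names `OnTheFlySynonyms`, `Token`, and `expand` are hypothetical, and the neighbour lookup is passed in as a function (in practice it would query the Word2Vec model, for example by cosine similarity). Each original token advances the position by 1, and each generated synonym gets a position increment of 0. In Lucene, a position increment of 0 places a token at the same position as the previous one, which is how stacked synonyms are represented. A real implementation would be a Lucene `TokenFilter` (requires Java 16+ for the record used here).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical simulation of synonym expansion in a token stream; not Lucene's API. */
class OnTheFlySynonyms {

    /** An emitted token: its term and its position increment relative to the previous token. */
    record Token(String term, int positionIncrement) {}

    /**
     * Emits each input term with position increment 1, then the neighbours returned by
     * `neighbours` with position increment 0 so they share the original term's position.
     */
    static List<Token> expand(List<String> input, Function<String, List<String>> neighbours) {
        List<Token> output = new ArrayList<>();
        for (String term : input) {
            output.add(new Token(term, 1));
            for (String synonym : neighbours.apply(term)) {
                output.add(new Token(synonym, 0));
            }
        }
        return output;
    }

    public static void main(String[] args) {
        // Hypothetical neighbour table standing in for lookups against a trained Word2Vec model.
        Map<String, List<String>> model = Map.of("daemon", List.of("process"));
        List<Token> tokens = expand(List.of("start", "the", "daemon"),
                term -> model.getOrDefault(term, List.of()));
        for (Token t : tokens) {
            System.out.println(t.term() + " +" + t.positionIncrement());
        }
    }
}
```

Running `main` prints `start +1`, `the +1`, `daemon +1`, and `process +0`, one per line. Because "process" shares the position of "daemon", a phrase query for "start the process" could match a document containing "start the daemon" if the same expansion is applied at index time.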
Please see our presentation at the Berlin Buzzwords conference: https://pretalx.com/bbuzz22/talk/UYZAUX/
We also created a tool to generate a Word2Vec model from a Lucene index: https://github.com/SeaseLtd/LuceneWord2VecModelTrainer