An Elasticsearch-comparable, full-text search engine using JavaScript that leverages advanced Natural Language Processing. The BM25 ranking function at the core of this project is tunable to different types of texts (e.g. tweets, scientific journals, legal writing). Key features are:
- The JavaScript source code can be natively deployed on the server side to Node.js as well as on the client side in browser extensions, single-page apps, serverless, React Native, edge computing, and many other applications.
- The accuracy and versatility of BM25 come from the ability to tune its parameters to specific types of documents.
- Separates offline indexing from the time-sensitive online search.
- Each individual NLP component, like the stemmer or the stopword list, is pluggable and carefully researched to stay at the cutting edge. (For example, the stopword list combines the best words from three authoritative stopword lists: Stanford CoreNLP, the Journal of Machine Learning Research, and NLTK.)
- A Dockerfile and a Docker image are available, so you can conveniently try out the module.
- Reasonable unit test coverage, continuous integration, and separation of concerns for each functionality.
The demo is an Express app (see MEAN stack) enhanced with full-text search capability. The easiest way to try it is to run its Docker image as below, then point your browser to localhost:3000.
docker run --rm -d -p 3000:8080 jj232/retrieval
Or you can run the command below after installing:
npm run demo2
Then point your browser to localhost:8080.
Suggestions on deploying: To integrate the module into a simple JavaScript app, the demo shows that only a few lines of code are needed; see the source code at "./demo/demo2/server.js". For a more complex software solution, or one that relies on other languages or runtime environments, the recommended way is to Dockerize this module and expose it as a microservice.
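As a rough illustration of the microservice route, the sketch below wraps a search function in a small JSON-over-HTTP service using only Node's built-in http module. The names handleSearch and createSearchService, the /search route, and the q and n query parameters are assumptions made for this example and are not part of the module; searchFn stands in for a call such as rt.search(query, topN) on an already indexed Retrieval instance.

```javascript
const http = require("http");

// Pure request logic, kept separate from the server so it is easy to test.
// `searchFn(query, topN)` is a hypothetical stand-in for rt.search(query, topN).
function handleSearch(requestUrl, searchFn) {
  // req.url is only a path plus query string, so a base URL is needed to parse it.
  const url = new URL(requestUrl, "http://localhost");
  const query = url.searchParams.get("q");
  if (url.pathname !== "/search" || !query) {
    return { status: 400, body: { error: "expected GET /search?q=<query>" } };
  }
  // Fall back to 10 results when n is missing or not a number.
  const topN = Number(url.searchParams.get("n")) || 10;
  return { status: 200, body: { query, results: searchFn(query, topN) } };
}

// Wraps the logic above in an HTTP server; the caller decides when to listen.
function createSearchService(searchFn) {
  return http.createServer((req, res) => {
    const { status, body } = handleSearch(req.url, searchFn);
    res.writeHead(status, { "Content-Type": "application/json" });
    res.end(JSON.stringify(body));
  });
}

// Example wiring (assumes `rt` was created and indexed beforehand):
// createSearchService((q, n) => rt.search(q, n)).listen(8080);
```

Indexing happens once at startup and only the search call runs per request, so a service like this keeps the offline and online steps separate. Clients written in other languages then only need to speak HTTP and JSON.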
For the latest release:
npm install retrieval
To build from the latest source:
git clone https://github.com/zjohn77/retrieval.git
cd retrieval
npm install
const path = require("path");
const Retrieval = require(path.join(__dirname, "..", "..", "src", "Retrieval.js"));
const texts = require("./data/music-collection"); // Load some sample texts to search.
// 1st step: instantiate Retrieval with the tuning parameters for BM25 that attenuate term frequency.
let rt = new Retrieval(1.6, 0.75); // Positional arguments: K = 1.6, B = 0.75.
// 2nd step: index the array of texts (strings); store the resulting document-term matrix.
rt.index(texts);
// 3rd step: search. In other words, multiply the document-term matrix and the indicator vector representing the query.
rt.search("theme and variations", 5) // Top 5 search results for the query 'theme and variations'
  .forEach(item => console.log(item));
// 04 - Theme & Variations In G Minor.flac
// 17 - Rhapsody On A Theme of Paganini - Variation 18.flac
// 01 - Diabelli Variations - Theme Vivace & Variation 1 Alla Marcia Maestoso.flac
// 07 - Rhapsody On A Theme of Paganini (Introduction and 24 Variations).flac
// 10 - Diabelli Variations - Variation 10 Presto.flac
The example right above is from "./demo/demo1/scenarios.js". To run the full example, do:
npm run demo1
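To make the three steps more concrete, here is a minimal, self-contained sketch of the general technique: build a BM25-weighted document-term matrix offline, then score a query by multiplying that matrix with the query's indicator vector. It is not the module's source code. The tokenizer, the function names buildIndex and search, the IDF variant, and the three sample titles are all assumptions made for illustration, and the sketch skips stemming and stopword removal.

```javascript
// Naive tokenizer (an assumption): lowercase, split on non-alphanumerics.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Offline step: build a BM25-weighted document-term matrix.
// K controls how quickly repeated terms stop adding weight;
// B controls how strongly long documents are penalized (0 = not at all).
function buildIndex(docs, K = 1.6, B = 0.75) {
  const tokenized = docs.map(tokenize);
  const vocab = [...new Set(tokenized.flat())];
  const column = new Map(vocab.map((term, j) => [term, j]));
  const avgLen = tokenized.reduce((sum, d) => sum + d.length, 0) / docs.length;

  // Raw term counts: one row per document, one column per term.
  const counts = tokenized.map(tokens => {
    const row = new Array(vocab.length).fill(0);
    for (const t of tokens) row[column.get(t)] += 1;
    return row;
  });

  // Inverse document frequency per term (one common smoothed variant).
  const idf = vocab.map((_, j) => {
    const df = counts.filter(row => row[j] > 0).length;
    return Math.log(1 + (docs.length - df + 0.5) / (df + 0.5));
  });

  // Apply BM25 term-frequency saturation and length normalization.
  const matrix = counts.map((row, i) => {
    const norm = K * (1 - B + B * tokenized[i].length / avgLen);
    return row.map((tf, j) => idf[j] * (tf * (K + 1)) / (tf + norm));
  });
  return { matrix, column };
}

// Online step: multiply the matrix by the query's indicator vector.
function search(index, query, topN = 10) {
  const indicator = new Array(index.column.size).fill(0);
  for (const t of tokenize(query)) {
    if (index.column.has(t)) indicator[index.column.get(t)] = 1;
  }
  return index.matrix
    .map((row, doc) => ({
      doc,
      score: row.reduce((sum, w, j) => sum + w * indicator[j], 0),
    }))
    .filter(result => result.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, topN);
}

// Made-up sample titles, for illustration only.
const docs = ["Theme and Variations in G Minor", "Diabelli Variations", "Moonlight Sonata"];
const index = buildIndex(docs);
const hits = search(index, "theme variations");
console.log(hits.map(hit => docs[hit.doc]));
```

Because all BM25 weighting is done inside buildIndex, the online search step reduces to one matrix-vector product followed by a sort. Raising K lets repeated terms keep adding weight for longer, and lowering B toward 0 reduces the penalty on long documents, which is the kind of tuning that suits short texts such as tweets differently from long legal or scientific writing.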
To run unit tests, do:
npm test