Skip to content

Fast + local semantic search over select body of Wikipedia pages.

Notifications You must be signed in to change notification settings

jackhhao/wikidex

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

wikidex

Fast + local semantic search over select body of Wikipedia pages.

Installation

Using pip:

pip install -r requirements.txt

Usage

To construct the search index:

python construct_index.py --save_dir=<directory name> --title=<title of main Wikipedia list>

To query over the index:

python search.py --save_dir=<directory name>

Architecture

I chose to use prebuilt LLM orchestration tools such as LangChain and LlamaIndex for their efficiency and simplistic usage. Saves having to manually implement text chunking, embedding, and vector lookup via an algorithm like FAISS.

I also chose to use the open-source FastEmbed embeddings model (defaults to BAAI/bge-base-en on HuggingFace). This prioritizes latency and end-user experience while maintaining accuracy of the lookup.

A command line interface was chosen for simplicity and ease of implementation.

About

Fast + local semantic search over select body of Wikipedia pages.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages