
🔎 A simple crawler-based search engine that demonstrates the main features of a search engine (web crawling, indexing, and ranking) and the interaction between them, using Java and a web interface.


EslamAsHhraf/Flash



⚡ Flash





📙 Overview

  • The aim of this project is to develop a simple crawler-based search engine that demonstrates the main features of a search engine and the interaction between them.
  • The main features of a search engine are:
    • Web Crawling
    • Indexing
    • Ranking

  • Built using the Java language.
  • Web interface for the search engine built with HTML, CSS, and JavaScript.
  • Uses MongoDB as the database.

🚀 Get Started

  1. Clone the repository.
    git clone https://github.com/AbdelrahmanHamdyy/Flash
    
  2. Download and install the JDK.
  3. Download and install Tomcat.
  4. Read the Search Engine Project document to understand how the search engine operates.

🔎 Search Engine Modules

Module Description
🔷 Web Crawler: The web crawler is a software agent that collects documents from the web. The crawler starts with a list of URL addresses (the seed set). It downloads the documents identified by these URLs and extracts hyperlinks from them. The extracted URLs are added to the list of URLs to be downloaded. Thus, web crawling is a recursive process.
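The crawl loop described above can be sketched as a breadth-first traversal over a frontier queue. This is a minimal sketch, not the project's actual crawler: the class name `CrawlerSketch`, the `fetch` function, the `maxPages` limit, and the regex-based link extraction are all assumptions made for illustration, and the example uses a tiny in-memory "web" so that it runs without network access.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CrawlerSketch {
    // Naive href matcher (assumption); a real crawler would use an HTML parser.
    private static final Pattern HREF = Pattern.compile("href=\"([^\"]+)\"");

    /** Extracts the raw href values from an HTML string. */
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    /**
     * Breadth-first crawl starting from the seed set. `fetch` is a hypothetical
     * downloader that returns the HTML for a URL, or null if it is unavailable.
     * Returns the URLs that were attempted, in visiting order.
     */
    static Set<String> crawl(List<String> seeds, Function<String, String> fetch, int maxPages) {
        Deque<String> frontier = new ArrayDeque<>(seeds);
        Set<String> visited = new LinkedHashSet<>();
        while (!frontier.isEmpty() && visited.size() < maxPages) {
            String url = frontier.poll();
            if (!visited.add(url)) {
                continue; // already attempted
            }
            String html = fetch.apply(url);
            if (html == null) {
                continue; // nothing to extract links from
            }
            for (String link : extractLinks(html)) {
                if (!visited.contains(link)) {
                    frontier.add(link); // newly discovered URL joins the frontier
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // Tiny in-memory "web" standing in for real HTTP downloads.
        Map<String, String> web = Map.of(
            "http://a.test", "<a href=\"http://b.test\">b</a> <a href=\"http://c.test\">c</a>",
            "http://b.test", "<a href=\"http://a.test\">a</a>",
            "http://c.test", "no links here");
        // prints [http://a.test, http://b.test, http://c.test]
        System.out.println(crawl(List.of("http://a.test"), web::get, 10));
    }
}
```

The visited set is what keeps the recursion from looping when pages link back to each other, as `http://b.test` does here.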
🔶 Indexer: The output of the web crawling process is a set of downloaded HTML documents. To respond to user queries fast enough, the contents of these documents have to be indexed in a data structure that stores the words contained in each document and their importance (e.g., whether they are in the title, in a header, or in plain text).
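One data structure that fits this description is an inverted index that maps each word to the documents containing it, with a weight reflecting where the word appeared. The sketch below is illustrative only: the class `IndexerSketch`, the in-memory maps, and the weight values for title, header, and body are assumptions, and the project itself is stated to use MongoDB rather than in-memory maps.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class IndexerSketch {
    // Hypothetical importance weights; the project's real values are not given.
    static final double TITLE = 3.0;
    static final double HEADER = 2.0;
    static final double BODY = 1.0;

    // word -> (document id -> accumulated weight)
    private final Map<String, Map<String, Double>> index = new TreeMap<>();

    /** Adds every word of `text` to the index with the weight of its field. */
    void addField(String docId, String text, double weight) {
        for (String token : text.toLowerCase().split("[^a-z0-9]+")) {
            if (token.isEmpty()) {
                continue;
            }
            index.computeIfAbsent(token, k -> new HashMap<>())
                 .merge(docId, weight, Double::sum);
        }
    }

    /** Indexes one document, field by field. */
    void addDocument(String docId, String title, String headers, String body) {
        addField(docId, title, TITLE);
        addField(docId, headers, HEADER);
        addField(docId, body, BODY);
    }

    /** Returns the postings for a word: document id -> weight (empty if unseen). */
    Map<String, Double> lookup(String word) {
        return index.getOrDefault(word.toLowerCase(), Map.of());
    }

    public static void main(String[] args) {
        IndexerSketch idx = new IndexerSketch();
        idx.addDocument("doc1", "Java Crawler", "Crawler design", "The crawler downloads pages");
        idx.addDocument("doc2", "Ranking", "", "A crawler feeds the ranker");
        // "crawler" scores 3 + 2 + 1 = 6.0 in doc1 and 1.0 in doc2
        System.out.println(idx.lookup("crawler"));
    }
}
```

A query for a word then becomes a single map lookup instead of a scan over every downloaded document.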
🔷 Query Processor: This module receives search queries, performs the necessary preprocessing, and searches the index for relevant documents. It also retrieves documents containing words that share the same stem with those in the search query. For example, the search query “travel” should match (with a lower score) the words “traveler”, “traveling”, etc.
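The stem-matching idea can be illustrated with a deliberately naive suffix stripper. This is a sketch under stated assumptions: the `stem` method below is not the Porter algorithm or whatever stemmer the project actually uses, and the scores 1.0 (exact match) and 0.5 (same stem) are made-up values that only show the "lower degree" ordering.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class QueryStemSketch {
    /** Naive suffix stripping; just enough for the example, not a real stemmer. */
    static String stem(String word) {
        String w = word.toLowerCase();
        for (String suffix : new String[] {"ing", "ers", "er", "ed", "s"}) {
            // Keep at least three characters so short words are left alone.
            if (w.endsWith(suffix) && w.length() - suffix.length() >= 3) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w;
    }

    /** Scores candidates: 1.0 for an exact match, 0.5 (assumed) for a shared stem. */
    static Map<String, Double> match(String query, List<String> words) {
        Map<String, Double> scores = new LinkedHashMap<>();
        String q = query.toLowerCase();
        for (String word : words) {
            String w = word.toLowerCase();
            if (w.equals(q)) {
                scores.put(word, 1.0);
            } else if (stem(w).equals(stem(q))) {
                scores.put(word, 0.5);
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        // prints {travel=1.0, traveler=0.5, traveling=0.5}
        System.out.println(match("travel", List.of("travel", "traveler", "traveling", "trip")));
    }
}
```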
🔶 Phrase Searching: Search engines will generally search for words as phrases when quotation marks are placed around the phrase.
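As a rough illustration of the quoted-phrase rule, the sketch below treats a query wrapped in double quotes as a phrase whose words must appear consecutively and in order, and any other query as a plain any-word match. The class `PhraseSearchSketch` and the simple tokenizer are assumptions; a real engine would typically check word positions stored in the index rather than re-tokenizing documents.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

class PhraseSearchSketch {
    /** Lowercases the text and splits it into alphanumeric tokens. */
    static List<String> tokenize(String text) {
        return Arrays.stream(text.toLowerCase().split("[^a-z0-9]+"))
                     .filter(t -> !t.isEmpty())
                     .collect(Collectors.toList());
    }

    /** True if the phrase tokens occur consecutively, in order, in the document. */
    static boolean containsPhrase(String document, String phrase) {
        List<String> doc = tokenize(document);
        List<String> words = tokenize(phrase);
        if (words.isEmpty()) {
            return false;
        }
        return Collections.indexOfSubList(doc, words) >= 0;
    }

    /** Quoted queries are matched as a phrase; unquoted ones match if any word occurs. */
    static boolean matches(String document, String query) {
        String q = query.trim();
        if (q.length() >= 2 && q.startsWith("\"") && q.endsWith("\"")) {
            return containsPhrase(document, q.substring(1, q.length() - 1));
        }
        List<String> doc = tokenize(document);
        return tokenize(q).stream().anyMatch(doc::contains);
    }

    public static void main(String[] args) {
        String doc = "The web crawler downloads web pages";
        System.out.println(matches(doc, "\"web crawler\"")); // true: words are adjacent
        System.out.println(matches(doc, "\"crawler web\"")); // false: wrong order
        System.out.println(matches(doc, "crawler web"));     // true: unquoted, any word
    }
}
```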
🔷 Ranker

The ranker module sorts documents based on their popularity and relevance to the search query.

  1. Relevance: a relation between the query words and the result page. It can be calculated in several ways, such as the tf-idf of the query word in the result page, or simply whether the query word appears in the title, a heading, or the body. The scores from all query words are then aggregated to produce the final page relevance score.

  2. Popularity: a measure of the importance of a web page regardless of the requested query. It can be calculated with the PageRank algorithm (as explained in the lecture) or with other ranking algorithms.
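Both scores can be sketched in a few lines. The code below is one possible reading, not the project's ranker: it assumes a common tf-idf variant (term frequency divided by document length, natural-log inverse document frequency) summed over the query words, and a power-iteration PageRank in which pages without out-links spread their rank evenly. The class `RankerSketch`, the damping factor 0.85, and the iteration count are illustrative choices.

```java
import java.util.Arrays;
import java.util.List;

class RankerSketch {
    /** tf-idf of `term` in `doc`; every document is a list of tokens. */
    static double tfIdf(String term, List<String> doc, List<List<String>> corpus) {
        long inDoc = doc.stream().filter(term::equals).count();
        long docsWithTerm = corpus.stream().filter(d -> d.contains(term)).count();
        if (inDoc == 0 || docsWithTerm == 0) {
            return 0.0;
        }
        double tf = (double) inDoc / doc.size();
        double idf = Math.log((double) corpus.size() / docsWithTerm);
        return tf * idf;
    }

    /** Relevance of a page: the sum of the tf-idf scores of all query words. */
    static double relevance(List<String> query, List<String> doc, List<List<String>> corpus) {
        return query.stream().mapToDouble(t -> tfIdf(t, doc, corpus)).sum();
    }

    /**
     * Power-iteration PageRank. links[i] lists the pages that page i links to.
     * Pages without out-links spread their rank evenly over all pages.
     */
    static double[] pageRank(int[][] links, double damping, int iterations) {
        int n = links.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            Arrays.fill(next, (1.0 - damping) / n);
            for (int i = 0; i < n; i++) {
                if (links[i].length == 0) {
                    for (int j = 0; j < n; j++) {
                        next[j] += damping * rank[i] / n;
                    }
                } else {
                    for (int j : links[i]) {
                        next[j] += damping * rank[i] / links[i].length;
                    }
                }
            }
            rank = next;
        }
        return rank;
    }

    public static void main(String[] args) {
        List<List<String>> corpus = List.of(
            List.of("java", "crawler", "crawler"),
            List.of("java", "ranker"));
        System.out.println(relevance(List.of("crawler"), corpus.get(0), corpus));
        // Page 0 links to 1 and 2, page 1 links to 2, page 2 links to 0.
        System.out.println(Arrays.toString(pageRank(new int[][] {{1, 2}, {2}, {0}}, 0.85, 50)));
    }
}
```

How the two scores are combined into one ordering (for example a weighted sum) is a separate design decision that the text above leaves open.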

🔶 Voice Recognition Search: Using a voice query instead of a typed one.
🔷 Web Interface

We implement a web interface for the search engine.

  • This interface receives user queries and displays the resulting pages returned by the engine.

  • The results appear with snippets of the text containing the query words. The output looks like Google's results page.
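The snippet behaviour can be sketched as a small server-side helper that cuts a window of text around the first query word found. This is a hypothetical illustration: the class `SnippetSketch`, the `radius` parameter, and the substring-based (not word-boundary) matching are assumptions, and the sketch assumes plain ASCII text so that lowercasing does not change string length.

```java
import java.util.List;

class SnippetSketch {
    /**
     * Returns up to `radius` characters of context on each side of the earliest
     * occurrence of any query word, or the start of the text if none occurs.
     */
    static String snippet(String text, List<String> queryWords, int radius) {
        String lower = text.toLowerCase();
        int hit = -1;
        int hitLen = 0;
        for (String w : queryWords) {
            if (w.isEmpty()) {
                continue;
            }
            int pos = lower.indexOf(w.toLowerCase());
            if (pos >= 0 && (hit < 0 || pos < hit)) {
                hit = pos;
                hitLen = w.length();
            }
        }
        if (hit < 0) {
            // No query word found: fall back to the beginning of the text.
            return text.substring(0, Math.min(text.length(), 2 * radius));
        }
        int start = Math.max(0, hit - radius);
        int end = Math.min(text.length(), hit + hitLen + radius);
        // Ellipses mark where the surrounding text was cut off.
        return (start > 0 ? "..." : "") + text.substring(start, end)
            + (end < text.length() ? "..." : "");
    }

    public static void main(String[] args) {
        // prints ...ick brown fox...
        System.out.println(snippet("The quick brown fox jumps", List.of("brown"), 4));
    }
}
```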

🎥 Demo


video_0PS1M9TF_Trim.mp4

👑 Contributors


Abdelrahman Hamdy


Abdelrahman Noaman


Adham Ali


Eslam Ashraf

🔒 License

Note: This software is licensed under the MIT License. See License for more information. ©AbdelrahmanHamdyy.


Languages

  • Java 63.6%
  • HTML 15.5%
  • CSS 15.5%
  • JavaScript 5.4%