In this project, we will design and implement a mini search engine for searching through a collection of documents. The data structures used are files for storing, hash tables for indexing, and trees for searching the documents. The documents will be stored as files; given a set of texts and a query, the search engine will locate all the documents that contain the keywords in that query. The purpose of this project is to provide an overview of how a search engine works and to gain hands-on experience in using hash tables, files and trees.
The documents stored as files will be indexed by their words (tokens) using hash functions, so that the required documents can be retrieved without scanning every file.
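The indexing step can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes documents are plain-text `.txt` files in one directory, uses Python's built-in `dict` as the hash table, and the names `load_documents`, `tokenize` and `build_index` are hypothetical.

```python
import re
from collections import defaultdict
from pathlib import Path


def load_documents(directory):
    """Read every .txt file in `directory` into a {file name: text} mapping.

    Assumption: documents are UTF-8 plain-text files with a .txt extension.
    """
    return {
        path.name: path.read_text(encoding="utf-8")
        for path in sorted(Path(directory).glob("*.txt"))
    }


def tokenize(text):
    """Split text into lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each token to the set of document names that contain it.

    The dict is a hash table: each token is hashed to locate its entry.
    """
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return dict(index)


# Usage with in-memory documents instead of files on disk.
sample_documents = {
    "a.txt": "Hash tables index the words.",
    "b.txt": "Trees search the index.",
}
sample_index = build_index(sample_documents)
print(sorted(sample_index["index"]))  # ['a.txt', 'b.txt']
```

Storing a set per token means a document is recorded once per word even if the word occurs in it many times.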
Searching will be done using trees; depending on the efficiency and complexity of the algorithm, we will use AVL trees or other balanced binary search trees. To allow efficient searching, for every word a list of the documents in which it occurs will be stored. The queries may contain the simple Boolean operators AND and OR, which behave like the familiar logical operators. For each such query, the documents that satisfy it will be displayed.
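One way the tree-based lookup described above could look is sketched below. It assumes an AVL tree keyed by word, with each node holding the set of documents in which that word occurs; the class and function names (`Node`, `insert`, `search`) are illustrative and not taken from the project.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    word: str
    docs: set[str] = field(default_factory=set)
    left: Optional[Node] = None
    right: Optional[Node] = None
    height: int = 1


def height(node):
    return node.height if node else 0


def update(node):
    node.height = 1 + max(height(node.left), height(node.right))


def rotate_right(y):
    x = y.left
    y.left = x.right
    x.right = y
    update(y)
    update(x)
    return x


def rotate_left(x):
    y = x.right
    x.right = y.left
    y.left = x
    update(x)
    update(y)
    return y


def insert(node, word, doc_id):
    """Record that `word` occurs in `doc_id`; return the new subtree root."""
    if node is None:
        return Node(word, {doc_id})
    if word == node.word:
        node.docs.add(doc_id)
        return node
    if word < node.word:
        node.left = insert(node.left, word, doc_id)
    else:
        node.right = insert(node.right, word, doc_id)
    update(node)
    balance = height(node.left) - height(node.right)
    if balance > 1:  # left-heavy
        if height(node.left.left) < height(node.left.right):
            node.left = rotate_left(node.left)  # left-right case
        return rotate_right(node)
    if balance < -1:  # right-heavy
        if height(node.right.right) < height(node.right.left):
            node.right = rotate_right(node.right)  # right-left case
        return rotate_left(node)
    return node


def search(node, word):
    """Return the documents containing `word`, or an empty set if absent."""
    while node is not None:
        if word == node.word:
            return node.docs
        node = node.left if word < node.word else node.right
    return set()
```

Because the tree is rebalanced on every insertion, its height stays logarithmic in the number of distinct words, so a lookup visits only O(log n) nodes even when words arrive in sorted order.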
For instance, Boolean queries behave as follows:
Keyword1 AND Keyword2 -- should retrieve all documents that contain both of these keywords.
Keyword1 OR Keyword2 -- should instead retrieve all documents that contain at least one of the two keywords.
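The two query forms above map directly onto set intersection and set union over the per-word document lists. The sketch below assumes an index that maps lowercase keywords to sets of document names, and it evaluates operators strictly left to right with no precedence or parentheses; the function name `evaluate` and the sample index are hypothetical.

```python
def evaluate(query, index):
    """Evaluate a query such as 'Keyword1 AND Keyword2' against `index`.

    Assumptions: `index` maps lowercase keywords to sets of document
    names, and operators are applied left to right without precedence.
    """
    tokens = query.split()
    if not tokens:
        return set()
    if len(tokens) % 2 == 0:
        raise ValueError("query must alternate keywords and operators")
    # Copy the first posting set so the index itself is never modified.
    result = set(index.get(tokens[0].lower(), set()))
    for i in range(1, len(tokens), 2):
        operator = tokens[i].upper()
        postings = index.get(tokens[i + 1].lower(), set())
        if operator == "AND":
            result &= postings  # documents containing both keywords
        elif operator == "OR":
            result |= postings  # documents containing at least one keyword
        else:
            raise ValueError(f"unknown operator: {tokens[i]}")
    return result


# Hypothetical index used only for illustration.
sample_index = {
    "keyword1": {"doc1", "doc2"},
    "keyword2": {"doc2", "doc3"},
}
print(sorted(evaluate("Keyword1 AND Keyword2", sample_index)))  # ['doc2']
print(sorted(evaluate("Keyword1 OR Keyword2", sample_index)))  # ['doc1', 'doc2', 'doc3']
```

A keyword that appears in no document contributes an empty set, so it empties an AND result and leaves an OR result unchanged.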