Building an Inverted Index at the DBMS Layer for Fast Full Text Search

Ciprian-Octavian Truica, Alexandru Boicea, Florin Radulescu

Abstract


To make full text searches fast and accurate, it is recommended to index the words in the documents. One way to do this is to use an Inverted Index, which maintains, in a structured form, the occurrences of words in a set of documents. To minimize the number of words stored in the index, a stemmer such as the Porter Stemmer can be used, so that only the stem (root form) of each word is kept. This paper constructs an Inverted Index for documents stored in MongoDB and Oracle databases. Four methods for constructing the index are presented and compared to determine which has the best performance. Two of them are implemented in Python: one uses a single thread and the other uses the MapReduce algorithm. The other two use the frameworks and tools provided by the databases: the MapReduce framework for MongoDB and Pipelined Table Functions for Oracle.
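To illustrate the idea of a stemmed Inverted Index, the following is a minimal single-threaded Python sketch, organized into a map step and a reduce step. It is not the implementation used in the paper. The function names (`toy_stem`, `map_document`, `reduce_pairs`, `build_inverted_index`) and the sample documents are hypothetical, and `toy_stem` is a crude suffix stripper standing in for a real Porter Stemmer.

```python
import re
from collections import defaultdict


def toy_stem(word):
    """Hypothetical stand-in for a stemmer; this is NOT the Porter algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        # Strip the first matching suffix, keeping at least 3 characters.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def map_document(doc_id, text):
    """Map step: emit one (stem, doc_id) pair per token in the document."""
    return [(toy_stem(token), doc_id) for token in tokenize(text)]


def reduce_pairs(pairs):
    """Reduce step: group pairs by stem and count occurrences per document."""
    index = defaultdict(lambda: defaultdict(int))
    for stem, doc_id in pairs:
        index[stem][doc_id] += 1
    return {stem: dict(postings) for stem, postings in index.items()}


def build_inverted_index(documents):
    """Build {stem: {doc_id: count}} from a {doc_id: text} mapping."""
    pairs = []
    for doc_id, text in documents.items():
        pairs.extend(map_document(doc_id, text))
    return reduce_pairs(pairs)


# Assumed sample data, for illustration only.
docs = {
    1: "Indexing documents speeds searches",
    2: "The index stores indexed words",
}
index = build_inverted_index(docs)
```

In this sketch, "Indexing", "index" and "indexed" all collapse to the single entry `index`, which is how stemming reduces the number of stored words. Because the map step handles each document independently, it could in principle be distributed across workers, with the reduce step merging their output.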

Keywords


MapReduce; Inverted Index; Porter Stemmer; Oracle; MongoDB; Pipelined Table Functions
