AI-powered Document Search

Abstract

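AI-powered Document Search is a Python project that searches and ranks a collection of documents against a text query. It represents documents as TF-IDF vectors, scores them by cosine similarity to the query, and exposes the search through an interactive command-line interface with basic error handling. Along the way it demonstrates core information retrieval and text processing techniques.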

Prerequisites

Python 3.8 or above
A code editor or IDE
Basic understanding of NLP and information retrieval
Required libraries: scikit-learn, numpy, pandas

Before you Start

Install Python and the required libraries:

Install dependencies

pip install scikit-learn numpy pandas

Getting Started

Create a Project

Create a folder named ai-powered-document-search.
Open the folder in your code editor or IDE.
Create a file named ai_powered_document_search.py.
Copy the code below into your file.

Write the Code

⚙️ AI-powered Document Search

"""
AI-powered Document Search
 
Features:
- Semantic document search
- NLP-based ranking
- Modular design
- CLI interface
- Error handling
"""
import sys

# Import scikit-learn defensively so a missing install produces a clear
# message instead of a raw traceback.
try:
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity
except ImportError:
    TfidfVectorizer = None
    cosine_similarity = None


class DocumentSearch:
    """Ranks stored documents against a query using TF-IDF and cosine similarity."""

    def __init__(self):
        if TfidfVectorizer is None:
            raise RuntimeError("scikit-learn is required: pip install scikit-learn")
        self.vectorizer = TfidfVectorizer()
        self.documents = []
        self.vectors = None

    def add_documents(self, docs):
        # Drop blank entries, then refit on the whole corpus so the IDF
        # weights reflect every document added so far.
        docs = [doc.strip() for doc in docs if doc.strip()]
        if not docs:
            return 0
        self.documents.extend(docs)
        self.vectors = self.vectorizer.fit_transform(self.documents)
        return len(docs)

    def search(self, query, top_k=5):
        # Nothing to rank until at least one document has been added.
        if self.vectors is None:
            return []
        query_vec = self.vectorizer.transform([query])
        scores = cosine_similarity(query_vec, self.vectors).flatten()
        ranked = sorted(zip(self.documents, scores), key=lambda x: x[1], reverse=True)
        return ranked[:top_k]


class CLI:
    @staticmethod
    def run():
        print("AI-powered Document Search")
        searcher = DocumentSearch()
        while True:
            # Split into a command word and its argument so that input such
            # as "address" is not mistaken for the "add" command.
            parts = input('> ').split(maxsplit=1)
            name = parts[0] if parts else ''
            arg = parts[1] if len(parts) > 1 else ''
            if name == 'add':
                if not arg:
                    print("Usage: add <doc1|doc2|...>")
                    continue
                added = searcher.add_documents(arg.split('|'))
                print(f"Added {added} documents.")
            elif name == 'search':
                if not arg:
                    print("Usage: search <query>")
                    continue
                results = searcher.search(arg)
                if not results:
                    print("No documents indexed. Use 'add' first.")
                for doc, score in results:
                    print(f"Score: {score:.2f} | Doc: {doc}")
            elif name == 'exit' and not arg:
                break
            else:
                print("Unknown command")


if __name__ == "__main__":
    try:
        CLI.run()
    except Exception as e:
        # Report unexpected failures and exit with a non-zero status.
        print(f"Error: {e}")
        sys.exit(1)

Example Usage

Run document search

python ai_powered_document_search.py

Explanation

Key Features

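Semantic Search: Ranks documents by the cosine similarity between the TF-IDF vector of the query and the TF-IDF vector of each document.
Information Retrieval: Returns the five highest-scoring documents for each query.
Error Handling: Validates command arguments and reports unexpected exceptions before exiting.
CLI Interface: Interactive command-line usage with add, search, and exit commands.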
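
To make the ranking step concrete, here is a minimal pure-Python sketch of TF-IDF weighting followed by cosine similarity. It is an illustration rather than the project's code: the helper names (tfidf_vectors, query_vector, cosine) and the sample documents are made up for this example, tokenization is a plain lowercase split, and the smoothed IDF formula is assumed to match scikit-learn's documented default.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one L2-normalized {term: weight} dict per document, plus the IDF table."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1 (assumed scikit-learn default).
    idf = {term: math.log((1 + n_docs) / (1 + count)) + 1 for term, count in df.items()}
    vectors = []
    for tokens in tokenized:
        weights = {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({term: w / norm for term, w in weights.items()})
    return vectors, idf


def query_vector(query, idf):
    """Weight query terms with the corpus IDF; terms unseen in the corpus are dropped."""
    counts = Counter(t for t in query.lower().split() if t in idf)
    weights = {term: tf * idf[term] for term, tf in counts.items()}
    norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
    return {term: w / norm for term, w in weights.items()}


def cosine(a, b):
    """For unit-length sparse vectors, the dot product is the cosine similarity."""
    return sum(w * b.get(term, 0.0) for term, w in a.items())


# Hypothetical sample corpus, for illustration only.
docs = [
    "python is a programming language",
    "cats and dogs are popular pets",
    "machine learning with python",
]
doc_vecs, idf = tfidf_vectors(docs)
q = query_vector("python machine learning", idf)
scores = [cosine(q, vec) for vec in doc_vecs]
best = max(range(len(docs)), key=scores.__getitem__)
print(docs[best], round(scores[best], 2))
```

A document that shares no terms with the query scores exactly zero, which shows that this kind of ranking is driven by word overlap weighted by how rare each word is.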

Code Breakdown

Import Libraries and Setup Search

ai_powered_document_search.py

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

Document Search Function

ai_powered_document_search.py

def search_documents(docs, query):
    vectorizer = TfidfVectorizer()
    doc_vectors = vectorizer.fit_transform(docs)
    query_vec = vectorizer.transform([query])
    scores = cosine_similarity(query_vec, doc_vectors).flatten()
    ranked = np.argsort(scores)[::-1]
    return [(docs[i], scores[i]) for i in ranked[:5]]
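
As a usage sketch, the function above can be called on a small in-memory list. The version below is a lightly adapted copy, not the original: the top_k parameter and the filter that drops zero-score documents are additions for this example, and the sample documents are invented.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def search_documents(docs, query, top_k=5):
    """Rank docs against query; top_k and the zero-score filter are illustrative additions."""
    vectorizer = TfidfVectorizer()
    doc_vectors = vectorizer.fit_transform(docs)
    query_vec = vectorizer.transform([query])
    scores = cosine_similarity(query_vec, doc_vectors).flatten()
    ranked = np.argsort(scores)[::-1][:top_k]
    # Skip documents that share no terms with the query (score 0).
    return [(docs[i], float(scores[i])) for i in ranked if scores[i] > 0]


# Hypothetical sample documents, for illustration only.
sample_docs = [
    "Python tutorial for beginners",
    "Cooking pasta at home",
    "Advanced Python programming",
]
results = search_documents(sample_docs, "python tutorial")
for doc, score in results:
    print(f"Score: {score:.2f} | Doc: {doc}")
```

Note that this function refits the vectorizer on every call, which is fine for a handful of documents. For a larger corpus, fitting once and reusing the vectors (as the DocumentSearch class does) avoids repeated work.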

CLI Interface and Error Handling

ai_powered_document_search.py

def main():
    print("AI-powered Document Search")
    # docs = [...]  # Load documents (not shown for brevity)
    while True:
        cmd = input('> ')
        if cmd == 'search':
            query = input("Query: ")
            # results = search_documents(docs, query)
            print("[Demo] Search logic here.")
        elif cmd == 'exit':
            break
        else:
            print("Unknown command. Type 'search' or 'exit'.")
 
if __name__ == "__main__":
    main()
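
The demo above leaves document loading out. One possible way to fill that gap, assuming the documents live as .txt files in a single folder, is sketched below. The load_documents helper and the folder layout are hypothetical and not part of the original project.

```python
import tempfile
from pathlib import Path


def load_documents(folder):
    """Read every .txt file in folder and return (file names, file contents)."""
    names, texts = [], []
    for path in sorted(Path(folder).glob("*.txt")):
        text = path.read_text(encoding="utf-8").strip()
        if text:  # skip empty files so the vectorizer never sees a blank document
            names.append(path.name)
            texts.append(text)
    return names, texts


# Small demonstration using a temporary folder with made-up files.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_text("first document", encoding="utf-8")
    Path(tmp, "b.txt").write_text("", encoding="utf-8")
    names, docs = load_documents(tmp)

print(names, docs)
```

The returned list of texts could then be passed to search_documents in place of the commented-out docs placeholder, while the file names make it possible to report which file each result came from.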

Features

AI-Based Document Search: Relevance ranking with TF-IDF and cosine similarity
Modular Design: Separate components for search and for the command-line interface
Error Handling: Manages invalid inputs and exceptions
Extensible: A small, readable codebase that is easy to build on

Next Steps

Enhance the project by:

Integrating with real-world document datasets
Supporting batch search
Creating a GUI with Tkinter or a web app with Flask
Adding evaluation metrics (precision, recall)
Unit testing for reliability
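
For the evaluation idea in the list above, a minimal sketch of precision and recall at k could look like this. The function name, the document identifiers, and the relevance labels are hypothetical, and real labels would have to come from a judged dataset.

```python
def precision_recall_at_k(ranked_docs, relevant_docs, k=5):
    """Compute precision@k and recall@k for a single query."""
    top_k = ranked_docs[:k]
    relevant = set(relevant_docs)
    hits = sum(1 for doc in top_k if doc in relevant)
    # Precision: share of returned documents that are relevant.
    precision = hits / len(top_k) if top_k else 0.0
    # Recall: share of all relevant documents that were returned.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up ranking and relevance judgments, for illustration only.
ranked = ["doc_a", "doc_b", "doc_c", "doc_d"]
relevant = {"doc_a", "doc_d", "doc_e"}
p, r = precision_recall_at_k(ranked, relevant, k=2)
print(f"precision@2={p:.2f} recall@2={r:.2f}")
```

Averaging these two numbers over many queries gives a simple way to compare changes to the ranking, such as different vectorizer settings.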

Educational Value

This project teaches:

Information Retrieval: Semantic search and ranking
Software Design: Modular, maintainable code
Error Handling: Writing robust Python code

Real-World Applications

Enterprise Search Tools
Content Management
Educational Tools

Conclusion

AI-powered Document Search demonstrates how to build a compact document search tool in Python using TF-IDF vectors and cosine similarity. Its modular design makes it straightforward to extend and adapt for real-world applications in enterprise, education, and more. For more advanced projects, visit Python Central Hub.
