Document Savvy AI Agent

SECTOR

Technology

PROJECT TYPE

Full Stack + OpenAI

Technologies

ReactJS
NodeJS
ExpressJS
OpenAI API
Pinecone
PostgreSQL
AWS Lambda

Goal of the project

The project aimed to:

- Build an AI-powered system that understands and retrieves precise information from document repositories.
- Enable efficient document indexing using vector databases for semantic search capabilities.
- Provide a responsive, user-friendly interface for querying and managing documents.
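
To illustrate what semantic search over a vector index means in practice, here is a minimal sketch in plain Node.js. It ranks stored vectors by cosine similarity to a query vector, which is one common similarity metric; the project's actual metric and index settings are not stated. The record ids and the three-dimensional vectors are hypothetical toy values; real embeddings have hundreds or thousands of dimensions and would be searched by the vector database rather than in memory.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error("vector lengths differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction, so treat it as dissimilar to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the topK records most similar to the query vector, best first.
function topKMatches(queryVector, records, topK) {
  return records
    .map((r) => ({ id: r.id, score: cosineSimilarity(queryVector, r.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}

// Hypothetical toy "embeddings", for illustration only.
const toyRecords = [
  { id: "invoice-policy", vector: [0.9, 0.1, 0.0] },
  { id: "holiday-calendar", vector: [0.0, 0.2, 0.9] },
  { id: "refund-policy", vector: [0.8, 0.3, 0.1] },
];

const matches = topKMatches([1, 0, 0], toyRecords, 2);
console.log(matches.map((m) => m.id));
```

Because documents and queries are compared by vector direction instead of exact keywords, a query can match text that is phrased differently from it.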

The application was built using a modular architecture to ensure scalability and flexibility:

Frontend:
- Developed using React.js for a responsive and interactive user interface.
- Features included document upload, query input, and response visualization.
Backend:
- Built with Node.js, handling API requests and orchestrating AI-powered document query processing.
- Integrated with the OpenAI API to generate natural-language responses to user queries.
Vector Database:
- Used Pinecone for vector storage and similarity search.
- Converted documents into embeddings with OpenAI’s Embeddings API and indexed them in Pinecone.
Document Processing:
- Preprocessed and segmented documents for efficient embedding and indexing.
- Stored metadata and file references for retrieval.
Cloud Integration:
- AWS S3 for storing uploaded documents securely.
- AWS Lambda for asynchronous tasks like embedding generation and indexing.
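
The document-processing and query steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: character-based chunking with overlap, the `documentId#index` id scheme, the `fileKey` metadata field, and the prompt wording are all hypothetical choices. The embedding and Pinecone calls are deliberately left out so the sketch runs without network access.

```javascript
// Split text into overlapping chunks of at most `size` characters.
// Overlap keeps sentences that straddle a boundary visible in both chunks.
function chunkText(text, size, overlap) {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const chunks = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last chunk reached the end
  }
  return chunks;
}

// Pair each chunk with metadata pointing back to the stored file.
// In a real pipeline each record would also carry its embedding vector.
function toRecords(documentId, fileKey, text, size, overlap) {
  return chunkText(text, size, overlap).map((chunk, i) => ({
    id: `${documentId}#${i}`, // hypothetical id scheme
    metadata: { documentId, fileKey, chunkIndex: i, text: chunk },
  }));
}

// Assemble a prompt from the retrieved chunks and the user's question.
function buildPrompt(question, retrieved) {
  const context = retrieved
    .map((r, i) => `[${i + 1}] ${r.metadata.text}`)
    .join("\n");
  return (
    "Answer using only the context below.\n\n" +
    `Context:\n${context}\n\nQuestion: ${question}`
  );
}

// Tiny example: 10 characters, chunks of 4 with an overlap of 1.
const chunkRecords = toRecords("doc-1", "uploads/doc-1.txt", "abcdefghij", 4, 1);
console.log(chunkRecords.map((r) => r.metadata.text));

const prompt = buildPrompt("What comes after c?", chunkRecords.slice(0, 1));
console.log(prompt);
```

In a deployment like the one described, a function along the lines of `toRecords` would plausibly run in the asynchronous Lambda task after upload, while something like `buildPrompt` would run in the backend once the similarity search has returned the closest chunks.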

Frontend: React.js, Material-UI
Backend: Node.js, Express.js
AI Integration: OpenAI API (chat completion and embedding models)
Database: Pinecone (Vector Database)
Cloud Services: AWS S3, AWS Lambda
Tools Used:
- Postman for API testing.
- Docker for containerization.
- GitHub for version control.

The project was executed over 10 weeks:

Weeks 1-2: Requirements gathering, architecture design, and tech stack finalization.
Weeks 3-5: Backend development, including integration with OpenAI and Pinecone.
Weeks 6-7: Frontend development and user interface design.
Week 8: Cloud integration for document storage and embedding processing.
Weeks 9-10: Testing, performance optimization, and deployment.