This article guides developers through building their initial Retrieval-Augmented Generation (RAG) system using Amazon Bedrock Agents and FAISS. It emphasizes leveraging Amazon Bedrock Agents for intelligent orchestration and Large Language Model (LLM) interaction, while utilizing Facebook AI Similarity Search (FAISS) as a local, high-performance vector database. This combination provides an accessible entry point into Generative AI applications.

Why FAISS is Ideal for Your First RAG Project

Starting with RAG requires understanding core concepts without getting bogged down in complex database setups. FAISS offers a perfect solution for this initial learning phase because it operates locally, provides excellent speed, and gives developers full control over vector operations. Moreover, it’s open-source and eliminates the need for any AWS infrastructure for vector storage at the prototyping stage.

For practical application, FAISS serves as an excellent local sandbox for experimentation, learning, and rapid prototyping before transitioning to enterprise-grade solutions like Amazon OpenSearch Serverless or Amazon MemoryDB for production environments.

Architectural Overview of Our System

The proposed architecture intelligently combines the strengths of various components:

  • Amazon Bedrock Agents: Manages the overall workflow, decision-making, and interactions with Large Language Models.
  • FAISS: Handles the efficient storage of vector embeddings and performs high-speed similarity searches locally.
  • Custom Action Group: Acts as a crucial connector, bridging the Bedrock agent with our FAISS-based retrieval operations.

The operational flow for a user query is as follows:

User Query → Bedrock Agent → Action Group → FAISS Search → Retrieved Context → LLM Response

Establishing Your Local FAISS Environment

To begin, set up your local Python environment with the necessary libraries (for example, pip install boto3 faiss-cpu sentence-transformers). Then initialize the core components:

import boto3
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer
import json
import uuid
from typing import List, Dict, Any

# Initialize the embedding model (all-MiniLM-L6-v2 produces 384-dimensional vectors)
embedding_model = SentenceTransformer('all-MiniLM-L6-v2')
embedding_dimension = embedding_model.get_sentence_embedding_dimension()  # 384

# Create a FAISS inner-product index (equivalent to cosine similarity
# once embeddings are normalized)
index = faiss.IndexFlatIP(embedding_dimension)
document_store = {}  # A simple dictionary to store original documents and metadata

Crafting Your Document Ingestion Mechanism

Reliable document ingestion is key. A class-based approach keeps the embedding model, the FAISS index, and the document metadata together behind simple add and search methods:

class FAISSDocumentStore:
    def __init__(self, embedding_model_name='all-MiniLM-L6-v2'):
        self.embedding_model = SentenceTransformer(embedding_model_name)
        self.dimension = self.embedding_model.get_sentence_embedding_dimension()
        self.index = faiss.IndexFlatIP(self.dimension)
        self.documents = {} # Stores original text and metadata

    def add_documents(self, texts: List[str], metadata: List[Dict] = None):
        """Adds documents to the FAISS index."""
        if metadata is None:
            metadata = [{}] * len(texts)

        # Generate embeddings, normalizing for cosine similarity
        embeddings = self.embedding_model.encode(texts, normalize_embeddings=True)

        # Add vectors to the FAISS index; FAISS assigns sequential ids in
        # insertion order, so the dictionary keys below stay aligned with
        # positions in the index
        start_id = len(self.documents)
        self.index.add(embeddings.astype('float32'))

        # Store documents with their metadata
        for i, (text, meta) in enumerate(zip(texts, metadata)):
            doc_id = start_id + i
            self.documents[doc_id] = {
                'text': text,
                'metadata': meta,
                'id': doc_id
            }

    def search(self, query: str, k: int = 5) -> List[Dict]:
        """Searches for documents similar to the given query."""
        query_embedding = self.embedding_model.encode([query], normalize_embeddings=True)
        scores, indices = self.index.search(query_embedding.astype('float32'), k)

        results = []
        for score, idx in zip(scores[0], indices[0]):
            if idx != -1:  # FAISS returns -1 when fewer than k vectors exist
                doc = self.documents[int(idx)].copy()
                doc['similarity_score'] = float(score)
                results.append(doc)
        return results

# Instantiate your document store
doc_store = FAISSDocumentStore()

Developing the Bedrock Agent Action Group

The action group is backed by an AWS Lambda function that translates calls from your Bedrock Agent into operations on your FAISS document store. Keep in mind that Lambda is stateless: the FAISS index and document metadata must be loaded when the function initializes, for example from files bundled with the deployment package or stored on Amazon EFS. The handler receives a query, performs the FAISS search, and returns structured results to the agent.

def build_agent_response(event, status_code: int, body: Dict[str, Any]) -> Dict[str, Any]:
    """Wraps a payload in the Bedrock Agent action group response envelope."""
    return {
        'messageVersion': '1.0',
        'response': {
            'actionGroup': event.get('actionGroup', ''),
            'apiPath': event.get('apiPath', ''),
            'httpMethod': event.get('httpMethod', ''),
            'httpStatusCode': status_code,
            'responseBody': {
                'application/json': {
                    'body': json.dumps(body)
                }
            }
        }
    }


def lambda_handler(event, context):
    """
    AWS Lambda handler for Bedrock Agent action group calls,
    performing the retrieval step against the local FAISS store.
    """
    # Extract the 'query' parameter passed by the agent
    query = None
    for param in event.get('parameters', []):
        if param.get('name') == 'query':
            query = param.get('value')
            break

    if not query:
        return build_agent_response(event, 400, {'error': 'Query parameter is required'})

    try:
        # Perform the FAISS search using the module-level doc_store,
        # which must be initialized once at cold start
        results = doc_store.search(query, k=3)

        # Format results for the Bedrock Agent
        context_documents = [
            {
                'content': result['text'],
                'score': result['similarity_score'],
                'metadata': result.get('metadata', {})
            }
            for result in results
        ]

        return build_agent_response(event, 200, {
            'query': query,
            'documents': context_documents,
            'total_results': len(context_documents)
        })

    except Exception as e:
        return build_agent_response(event, 500, {'error': str(e)})
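
Before deploying, you can exercise the handler locally with a simulated invocation event. This is a minimal sketch: the event shape mirrors what the handler above parses, the actionGroup and apiPath values are illustrative, and meaningful results require documents in the store (see the ingestion section below):

# Simulate a Bedrock Agent action group invocation locally
test_event = {
    'actionGroup': 'document-search',
    'apiPath': '/search',
    'httpMethod': 'POST',
    'parameters': [{'name': 'query', 'value': 'What is Amazon S3?'}]
}

result = lambda_handler(test_event, None)
print(json.dumps(result, indent=2))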

Configuring Your Bedrock Agent

The AWS CLI offers a streamlined way to set up your Bedrock Agent and integrate the action group.

# Create the Bedrock Agent
aws bedrock-agent create-agent \
    --agent-name "faiss-rag-agent" \
    --description "RAG agent using FAISS for vector search" \
    --foundation-model "anthropic.claude-3-sonnet-20240229-v1:0" \
    --instruction "You are a helpful assistant that can search through documents to answer questions. When a user asks a question, use the search_documents function to find relevant information, then provide a comprehensive answer based on the retrieved context." \
    --region us-east-1

# Create the action group, linking it to your Lambda function
aws bedrock-agent create-agent-action-group \
    --agent-id "YOUR_AGENT_ID" \
    --agent-version "DRAFT" \
    --action-group-name "document-search" \
    --description "Search documents using FAISS vector database" \
    --action-group-executor lambda="arn:aws:lambda:us-east-1:YOUR_ACCOUNT:function:faiss-rag-function" \
    --api-schema '{
        "openapi": "3.0.0",
        "info": {
            "title": "Document Search API",
            "version": "1.0.0"
        },
        "paths": {
            "/search": {
                "post": {
                    "description": "Search for relevant documents",
                    "parameters": [
                        {
                            "name": "query",
                            "in": "query",
                            "required": true,
                            "schema": {
                                "type": "string"
                            },
                            "description": "The search query"
                        }
                    ]
                }
            }
        }
    }' \
    --region us-east-1
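
After creating the action group, run aws bedrock-agent prepare-agent so the agent picks up your changes, then create an alias with aws bedrock-agent create-agent-alias. The resulting alias ID is what the runtime SDK calls in the later section expect.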

Ingesting Your Initial Documents

To make your RAG system functional, populate the FAISS index with some data. Here’s an example using AWS service descriptions:

# Sample documents about various AWS services
sample_docs = [
    "Amazon S3 is a highly scalable object storage service that offers industry-leading durability, availability, and performance.",
    "AWS Lambda lets you run code without provisioning or managing servers. You pay only for the compute time you consume.",
    "Amazon EC2 provides secure, resizable compute capacity in the cloud. It's designed to make web-scale cloud computing easier.",
    "Amazon RDS makes it easy to set up, operate, and scale a relational database in the cloud.",
    "Amazon DynamoDB is a key-value and document database that delivers single-digit millisecond performance at any scale."
]

metadata = [
    {"service": "S3", "category": "Storage"},
    {"service": "Lambda", "category": "Compute"},
    {"service": "EC2", "category": "Compute"},
    {"service": "RDS", "category": "Database"},
    {"service": "DynamoDB", "category": "Database"}
]

# Add these sample documents to your FAISS store
doc_store.add_documents(sample_docs, metadata)
print(f"Successfully added {len(sample_docs)} documents to the FAISS index.")

Verifying Your RAG System’s Functionality

Before fully integrating, it’s prudent to test the search component directly:

# Test the document search capability
test_query = "What database services does AWS offer?"
results = doc_store.search(test_query, k=2)

print(f"Executing Query: \"{test_query}\"")
print("Top Search Results:")
for i, result in enumerate(results, 1):
    print(f"{i}. Score: {result['similarity_score']:.3f}")
    print(f"   Text: {result['text']}")
    print(f"   Service: {result['metadata'].get('service', 'N/A')}")
    print()

Programmatic Interaction with Bedrock Agents SDK

You can interact with your configured Bedrock Agent programmatically using the boto3 SDK:

import boto3
import uuid

def chat_with_rag_agent(query: str, agent_id: str, agent_alias_id: str):
    """
    Initiates a chat with the Bedrock agent, which uses FAISS for RAG.
    """
    bedrock_agent_runtime = boto3.client('bedrock-agent-runtime', region_name='us-east-1')

    try:
        response = bedrock_agent_runtime.invoke_agent(
            agentId=agent_id,
            agentAliasId=agent_alias_id,
            sessionId=str(uuid.uuid4()), # Generate a unique session ID
            inputText=query
        )

        # Aggregate the streaming response chunks
        full_response = ""
        for event in response['completion']:
            if 'chunk' in event:
                chunk = event['chunk']
                if 'bytes' in chunk:
                    full_response += chunk['bytes'].decode('utf-8')
        return full_response

    except Exception as e:
        print(f"An error occurred while invoking the agent: {str(e)}")
        return None

# Example of how to use the function
agent_response = chat_with_rag_agent(
    query="What are the benefits of using AWS Lambda?",
    agent_id="YOUR_AGENT_ID",      # Replace with your agent's ID
    agent_alias_id="YOUR_ALIAS_ID"  # Replace with your agent's alias ID
)

print("Agent's Response:", agent_response)

Enhancing FAISS Performance

As your datasets grow, consider these optimization strategies for FAISS:

# For handling larger datasets more efficiently, consider IndexIVFFlat
def create_optimized_index(dimension: int, nlist: int = 100):
    """
    Creates an IVF index suitable for larger datasets.
    `nlist` sets the number of Voronoi cells; the index must be trained
    on representative vectors before any can be added.
    """
    quantizer = faiss.IndexFlatIP(dimension)
    # Pass the metric explicitly: IndexIVFFlat defaults to L2 otherwise
    index = faiss.IndexIVFFlat(quantizer, dimension, nlist, faiss.METRIC_INNER_PRODUCT)
    return index
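
# Usage sketch with stand-in random vectors (replace with real embeddings):
# unlike IndexFlatIP, an IVF index must be trained before vectors are added
sample_vectors = np.random.rand(1000, 384).astype('float32')
ivf_index = create_optimized_index(dimension=384, nlist=100)
ivf_index.train(sample_vectors)   # learn the Voronoi cell centroids
ivf_index.add(sample_vectors)     # adding is only allowed after training
ivf_index.nprobe = 10             # cells probed per query (speed/recall trade-off)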

# Integrate GPU support for accelerated processing if available
def create_gpu_index(dimension: int):
    """
    Constructs a GPU-accelerated FAISS index (requires the faiss-gpu build).
    Falls back to CPU if no GPUs are detected.
    """
    if faiss.get_num_gpus() > 0:
        res = faiss.StandardGpuResources()
        index = faiss.IndexFlatIP(dimension)
        gpu_index = faiss.index_cpu_to_gpu(res, 0, index)
        return gpu_index
    else:
        print("No GPU found, creating CPU index.")
        return faiss.IndexFlatIP(dimension)

Avoiding Common Pitfalls

Be mindful of these potential issues to ensure a robust RAG system:

  1. Embedding Model Uniformity: Always use the identical embedding model for both indexing your documents and generating embeddings for your search queries. Inconsistencies will severely degrade search accuracy.

  2. Normalization for Cosine Similarity:
    Ensure embeddings are always normalized when using cosine similarity (or inner product, which is equivalent for normalized vectors).

    # Ensure embeddings are normalized for accurate cosine similarity
    embeddings = model.encode(texts, normalize_embeddings=True)
    
  3. Batch Processing for Scale:
    For extensive document collections, process data in batches to manage memory consumption effectively.

    def add_documents_in_batches(doc_store, texts, metadata=None, batch_size=100):
        if metadata is None:
            metadata = [{}] * len(texts)
        for i in range(0, len(texts), batch_size):
            # Slice texts and metadata together so they stay aligned
            doc_store.add_documents(texts[i:i + batch_size], metadata[i:i + batch_size])
            print(f"Processed batch {i // batch_size + 1}")
    

Moving Towards Production Readiness

While this setup is excellent for learning, consider these steps for a production-ready system:

  1. Index Persistence: Save your FAISS index to disk to avoid re-indexing on every application restart, and persist the document metadata alongside it (see the sketch after this list).
    faiss.write_index(index, "my_index.faiss")
    
  2. Scalable Vector Stores: For production workloads, migrate to managed, scalable vector databases like Amazon OpenSearch Serverless.

  3. Monitoring and Logging: Implement robust logging and monitoring for your Lambda function to track performance and troubleshoot issues.

  4. Security Best Practices: Adhere to AWS IAM roles and VPC configurations for secure access and network isolation.
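
A quick sketch of step 1 above, persisting both the FAISS index and the document dictionary from the FAISSDocumentStore; the file names are illustrative, and pickle is used purely for simplicity:

import pickle

def save_store(store: FAISSDocumentStore, index_path="my_index.faiss", docs_path="documents.pkl"):
    """Persists the FAISS index and the document/metadata dictionary."""
    faiss.write_index(store.index, index_path)
    with open(docs_path, "wb") as f:
        pickle.dump(store.documents, f)

def load_store(store: FAISSDocumentStore, index_path="my_index.faiss", docs_path="documents.pkl"):
    """Restores a previously saved index and document dictionary."""
    store.index = faiss.read_index(index_path)
    with open(docs_path, "rb") as f:
        store.documents = pickle.load(f)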

Key Takeaway for Developers

The primary benefit of starting with FAISS locally is its ability to demystify RAG fundamentals without adding the overhead of complex distributed systems. Once you gain a solid understanding and confidence, you can smoothly transition to fully managed services like Amazon Bedrock Knowledge Bases or Amazon OpenSearch Serverless for more advanced, scalable vector search capabilities.

This iterative approach fosters rapid learning, component-level comprehension, and a gradual progression towards sophisticated GenAI deployments as your project demands evolve.

Your Action Plan

  • [ ] Set up your local Python environment with boto3, faiss-cpu, and sentence-transformers.
  • [ ] Instantiate the FAISSDocumentStore and add some sample documents.
  • [ ] Deploy the Lambda function that hosts your action group logic.
  • [ ] Configure your Amazon Bedrock Agent, linking it to your deployed action group.
  • [ ] Execute the end-to-end RAG pipeline to confirm functionality.
  • [ ] Experiment with different embedding models to observe impact on retrieval.
  • [ ] Investigate FAISS optimization techniques for your specific data characteristics.

My Final Recommendation

Begin your RAG journey with this local FAISS approach to grasp the foundational concepts. Once proficient, explore Amazon Bedrock Knowledge Bases for a fully managed solution that simplifies RAG deployments, or consider Amazon OpenSearch Serverless for highly scalable and customizable vector search.

The flexibility of this method enables quick iterations, deep understanding of each component, and a clear path to scaling your GenAI applications as your requirements mature.

Thank you for engaging with this guide. Now, it’s your turn to implement and innovate! Feel free to connect and share your experiences in building with Generative AI:

🔗 LinkedIn: https://www.linkedin.com/in/carloscortezcloud
🐦 X (formerly Twitter): https://x.com/ccortezb
💻 GitHub: https://github.com/ccortezb
📝 Dev.to: https://dev.to/ccortezb
🏆 AWS Heroes: https://builder.aws.com/community/@breakinthecloud
📖 Medium: https://ccortezb.medium.com

Stay curious, continue experimenting, and I look forward to our next interaction!
