Audience: Anyone who has a basic knowledge of GenAI and wants to learn more about AI agents and how they are implemented (primarily software architects and developers).
Acronyms and Terms: GenAI (generative AI), GPT (Generative Pretrained Transformer), LLM (Large Language Model), RAG (Retrieval-Augmented Generation). For more details on these terms, see our AI Glossary.
Introduction
This article maps and explains the AWS, Azure, and Google components used in a RAG (Retrieval-Augmented Generation) architecture. Keep in mind that a RAG component can be part of your agents, reducing hallucinations and constraining the context to your company's data. This is a short AWS vs. Azure vs. GCP comparison of the services used for RAG components.
Please refer to my Retrieval-Augmented Generation (RAG) article for details on RAG.
AWS vs. Azure vs. GCP Architectures
I am using the numbered block names from the three figures above in the text below. The AWS, Azure, and Google services mentioned are used to implement those blocks’ functionalities.
1. Document Upload
To create the knowledge base of documents that grounds the AI agent's context, the documents first need to be uploaded to storage. Note that all three services below are object stores rather than databases. A minimal upload sketch follows this list.
In AWS, the storage service is Amazon S3.
In Azure, it is Blob Storage.
In GCP, it is Google Cloud Storage (GCS).
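As a concrete illustration, here is a minimal Python sketch of the AWS variant using boto3. The bucket name, object key, and local path are hypothetical placeholders; Azure's azure-storage-blob and GCP's google-cloud-storage SDKs follow the same upload pattern.

```python
# Minimal sketch: uploading a document to S3 with boto3.
# Bucket, key, and file path below are hypothetical placeholders.
import boto3

s3 = boto3.client("s3")

# Upload a local PDF into the knowledge-base bucket.
s3.upload_file(
    Filename="policies/leave_policy.pdf",   # local path (example)
    Bucket="my-rag-knowledge-base",         # hypothetical bucket name
    Key="raw-documents/leave_policy.pdf",   # object key in the bucket
)
```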
2. Extract Text
Depending on the file type, there are open-source Python libraries for text extraction, such as pdfplumber, PyPDF2, and python-docx. Cloud-based services are also available, but for simple cases they are not needed; a minimal extraction sketch follows the list below.
In AWS, Amazon Textract extracts text from documents, and Amazon Comprehend detects sentiment & entities.
In Azure, Azure Form Recognizer handles OCR, while Azure Text Analytics performs Named Entity Recognition and sentiment analysis.
In GCP, Google Cloud Vision extracts text from images and PDFs using OCR, while Google Cloud Natural Language API performs entity recognition, sentiment analysis, and text classification.
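For the simple open-source path, here is a minimal sketch using pdfplumber (one of the libraries named above). The file path is an example continuing the upload sketch.

```python
# Minimal sketch: extracting text from a PDF with the open-source
# pdfplumber library (pip install pdfplumber).
import pdfplumber

def extract_pdf_text(path: str) -> str:
    """Concatenate the text of every page in the PDF."""
    with pdfplumber.open(path) as pdf:
        # extract_text() may return None for pages without a text layer.
        return "\n".join(page.extract_text() or "" for page in pdf.pages)

text = extract_pdf_text("raw-documents/leave_policy.pdf")  # example path
```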
3. Embeddings
In AWS, the embedding models available in Bedrock are Titan Embeddings and Cohere Embed; of these, the latest version of Titan Embeddings is the most widely used.
In Azure, the OpenAI text-embedding-ada-002 is the most used embedding model.
In GCP, Vertex AI offers embedding models such as textembedding-gecko, which provides high-quality embeddings for enterprise applications.
Open-source embedding models can be used as well, but this article focuses on enterprise-scale solutions built on these cloud services; a sketch of the AWS option follows.
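Here is a minimal sketch of generating an embedding with Titan through the Bedrock runtime API. The model ID shown is the Titan Text Embeddings v2 ID current at the time of writing; verify it for your account and region.

```python
# Minimal sketch: generating an embedding with Amazon Titan via the
# Bedrock runtime API. Verify the model ID for your region.
import json
import boto3

bedrock = boto3.client("bedrock-runtime")

def embed(text: str) -> list[float]:
    response = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",  # example model ID
        body=json.dumps({"inputText": text}),
    )
    # The response body is a stream containing the JSON result.
    return json.loads(response["body"].read())["embedding"]

vector = embed("What is our parental leave policy?")  # example query
```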
4. Vector Store
A vector store stores and manages vector embeddings, organizing and retrieving similar vectors efficiently using advanced algorithms.
In AWS, the vector engine for Amazon OpenSearch Serverless stores the embeddings alongside the source text (an index-creation sketch follows this list).
In Azure, Azure Cognitive Search is used for vector storage and retrieval.
In GCP, Vertex AI Matching Engine provides a high-performance, scalable vector database that supports ANN (Approximate Nearest Neighbor) search for fast and efficient retrieval of embeddings.
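As an illustration of the AWS option, here is a minimal sketch that creates a k-NN-enabled index with the opensearch-py client. The endpoint, index name, and the 1024 dimension (matching Titan Text Embeddings v2) are assumptions, and authentication setup is omitted for brevity.

```python
# Minimal sketch: creating a k-NN-enabled index in OpenSearch with
# opensearch-py (pip install opensearch-py). Endpoint, index name, and
# dimension are assumptions; authentication is omitted for brevity.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "my-collection-endpoint", "port": 443}])

client.indices.create(
    index="rag-documents",
    body={
        "settings": {"index": {"knn": True}},  # enable k-NN search
        "mappings": {
            "properties": {
                "text": {"type": "text"},      # raw chunk text
                "embedding": {                 # vector field
                    "type": "knn_vector",
                    "dimension": 1024,         # must match the embedding model
                },
            }
        },
    },
)
```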
5. Retrieval
In AWS, the same OpenSearch Serverless vector index is queried with k-NN search to efficiently retrieve the embeddings closest to the query (see the query sketch after this list).
In Azure, Azure Cognitive Search also supports k-NN for vector search.
In GCP, Vertex AI Matching Engine provides high-performance vector retrieval with ANN (Approximate Nearest Neighbor) search, while Vertex AI Search combines LLM reasoning with Google-quality semantic search for more advanced retrieval capabilities.
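Continuing the previous sketches (reusing the `embed` function and the `client` and index from the vector-store step), here is a minimal k-NN query that retrieves the top three most similar chunks:

```python
# Minimal sketch: retrieving the top-k most similar chunks with a k-NN
# query, continuing the client, index, and embed() from earlier sketches.
query_vector = embed("What is our parental leave policy?")  # example query

results = client.search(
    index="rag-documents",
    body={
        "size": 3,  # return the three closest chunks
        "query": {"knn": {"embedding": {"vector": query_vector, "k": 3}}},
    },
)

# Collect the source text of each hit for prompt augmentation.
retrieved_results = [hit["_source"]["text"] for hit in results["hits"]["hits"]]
```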
6. Prompt Augmentation
When a user asks a question {question}, and the chat history is available as {chat_history}, along with the retrieved sources as {retrieved_results}, the prompt can be enhanced. Augmented prompt:
“You are {role}. Based on our previous discussions about [{chat_history}], and considering insights from [{retrieved_results}], how do you think [{question}] can be addressed? Could you please provide your answer?”
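Assembled in code, the template above becomes a simple string-formatting step. The role, history, and question values in this sketch are illustrative:

```python
# Minimal sketch: assembling the augmented prompt from the template above.
# The role, chat history, and question values are illustrative.
def augment_prompt(role, chat_history, retrieved_results, question):
    context = "\n".join(retrieved_results)
    return (
        f"You are {role}. Based on our previous discussions about "
        f"[{chat_history}], and considering insights from [{context}], "
        f"how do you think [{question}] can be addressed? "
        "Could you please provide your answer?"
    )

prompt = augment_prompt(
    role="an HR policy assistant",
    chat_history="employee benefits",
    retrieved_results=retrieved_results,  # from the retrieval step
    question="What is our parental leave policy?",
)
```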
7. LLM Integration
In AWS, Bedrock gives access to Claude, Titan, and Llama 2, with SageMaker JumpStart for custom models.
In Azure, Azure OpenAI provides GPT-4, GPT-3.5, and other models that can be fine-tuned for domain-specific applications.
In GCP, Vertex AI Model Garden provides access to Gemini models (Gemini 1.5, Gemini 1.0), PaLM 2, and open-source models like Llama 2, enabling enterprise-grade AI solutions.
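To close the loop, here is a minimal sketch that sends the augmented prompt to Claude through the Bedrock Converse API, reusing the `bedrock` client and `prompt` from the earlier sketches. The model ID is an example to verify for your region; Azure OpenAI and Vertex AI expose equivalent chat-completion calls.

```python
# Minimal sketch: sending the augmented prompt to an LLM via the Bedrock
# Converse API. The model ID is an example; verify it for your region.
response = bedrock.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # example model ID
    messages=[{"role": "user", "content": [{"text": prompt}]}],
)

# Extract the generated answer from the response payload.
answer = response["output"]["message"]["content"][0]["text"]
print(answer)
```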
Conclusion
I hope this article is straightforward and easy to understand. However, there is room for further clarity on the selection of retrieval algorithms; an upcoming article can explore the available retrieval algorithms and how to select the most suitable one for a given use case.
I am also exploring MemGPT (Letta), another important component, in which LLMs act as operating systems for memory management. Hopefully that will be the next article; until then, see you all!
Disclaimer: This content is for informational purposes only and does not and should not be considered professional advice. Information is current at the time of publication but may become outdated. Please verify details before relying on it.
All content, downloads, and services provided through 6 'P's in AI Pods (AI6P) publication are subject to the Publisher Terms available here. By using this content you agree to the Publisher Terms.