Source URL: https://cloud.google.com/blog/products/ai-machine-learning/introducing-vertex-ai-rag-engine/
Source: Cloud Blog
Title: Introducing Vertex AI RAG Engine: Scale your Vertex AI RAG pipeline with confidence
Feedly Summary: Closing the gap between impressive model demos and real-world performance is crucial for successfully deploying generative AI for enterprise. Despite the incredible capabilities of generative AI for enterprise, this perceived gap may be a barrier for many developers and enterprises to “productionize” AI. This is where retrieval-augmented generation (RAG) becomes non-negotiable – it strengthens your enterprise applications by building trust in their AI outputs.
Today, we’re sharing the general availability of Vertex AI RAG Engine, a fully managed service that helps you build and deploy RAG implementations with your data and methods. With Vertex AI RAG Engine, you can:
Adapt to any architecture: Choose the models, vector databases, and data sources that work best for your use case. This flexibility ensures RAG Engine fits into your existing infrastructure rather than forcing you to adapt to it.
Evolve with your use case: Adding new data sources, updating models, or adjusting retrieval parameters happens through simple configuration changes. The system grows with you, maintaining consistency while accommodating new requirements.
Evaluate in simple steps: Set up multiple RAG engines with different configurations to find what works best for your use case.
Introducing Vertex AI RAG Engine
Vertex AI RAG Engine is a managed service that lets you build and deploy RAG implementations with your data and methods. Think of it as having a team of experts who have already solved complex infrastructure challenges such as efficient vector storage, intelligent chunking, optimal retrieval strategies, and precise augmentation — all while giving you the controls to customize for your specific use case.
Vertex AI’s RAG Engine offers a vibrant ecosystem with a range of options catering to diverse needs.
DIY capabilities: DIY RAG empowers users to tailor their solutions by mixing and matching components. It works well for low- to medium-complexity use cases, with an easy-to-get-started API that enables fast experimentation, proofs of concept, and RAG-based applications with a few clicks.
Search functionality: Vertex AI Search stands out as a robust, fully managed solution. It supports a wide variety of use cases, from simple to complex, with high out-of-the-box quality, ease of setup, and minimal maintenance.
Connectors: A rapidly growing list of connectors helps you quickly connect to various data sources, including Cloud Storage, Google Drive, Jira, Slack, or local files. RAG Engine handles the ingestion process (even for multiple sources) through an intuitive interface.
Customization
One of the defining strengths of Vertex AI’s RAG Engine is its capacity for customization. This flexibility allows you to fine-tune various components to perfectly align with your data and use case.
Parsing: When documents are ingested into an index, they are split into chunks. RAG Engine lets you tune chunk size and chunk overlap, and offers different chunking strategies to support different types of documents.
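To make these two parameters concrete, here is a minimal, framework-free sketch of fixed-size chunking with overlap. It is purely illustrative — RAG Engine performs chunking for you during ingestion, and the function name and defaults below are ours, not part of the API:

```python
def chunk_text(text: str, chunk_size: int = 512, chunk_overlap: int = 100) -> list[str]:
    """Split text into fixed-size chunks; consecutive chunks share
    `chunk_overlap` characters, so content cut at a chunk boundary
    still appears intact in at least one chunk."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - chunk_overlap, 1), step)]

doc = "a" * 1000
chunks = chunk_text(doc, chunk_size=512, chunk_overlap=100)
# Each chunk is at most 512 characters long, and consecutive
# chunks share their last/first 100 characters.
```

A larger overlap reduces the risk of splitting a relevant passage across two chunks, at the cost of indexing some content twice.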
Retrieval: You might already be using Pinecone, or perhaps you prefer the open-source capabilities of Weaviate. Maybe you want to leverage Vertex AI Vector Search. RAG Engine works with your choice of vector database, or, if you prefer, can manage the vector storage entirely for you. This flexibility ensures you’re never locked into a single approach as your needs evolve.
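Whichever backend you choose, the retrieval step itself boils down to nearest-neighbor search over embeddings. The toy sketch below shows the idea with 2-d vectors and exact cosine similarity; it is an illustrative stand-in for what Pinecone, Weaviate, or Vertex AI Vector Search do at scale with approximate search, and none of these names come from the RAG Engine API:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec: list[float], index: list[tuple[str, list[float]]],
             top_k: int = 2) -> list[str]:
    """Return the top_k chunk texts ranked by similarity to the query."""
    scored = sorted(index, key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [text for text, _ in scored[:top_k]]

# Toy 2-d "embeddings" for three chunks
index = [
    ("chunk about pricing", [1.0, 0.0]),
    ("chunk about quotas",  [0.0, 1.0]),
    ("chunk about billing", [0.9, 0.1]),
]
print(retrieve([1.0, 0.0], index, top_k=2))
# → ['chunk about pricing', 'chunk about billing']
```

In production the query vector comes from an embedding model and the index holds millions of chunks, but the `top_k` knob exposed by RAG Engine’s retrieval config corresponds directly to the cutoff shown here.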
Generation: You can choose from hundreds of LLMs in Vertex AI Model Garden, including Google’s Gemini, Meta’s Llama, and Anthropic’s Claude.
Use Vertex AI RAG as a tool in Gemini
Vertex AI RAG Engine is natively integrated with the Gemini API as a tool. You can create grounded conversations that use RAG to provide contextually relevant answers. Simply initialize a RAG retrieval tool, configured with settings such as the number of documents to retrieve and an optional LLM-based ranker, then pass the tool to a Gemini model.
from vertexai.preview import rag
from vertexai.preview.generative_models import GenerativeModel, Tool
import vertexai

PROJECT_ID = "PROJECT_ID"
CORPUS_NAME = f"projects/{PROJECT_ID}/locations/LOCATION/ragCorpora/RAG_CORPUS_RESOURCE"
MODEL_NAME = "MODEL_NAME"

# Initialize Vertex AI API once per session
vertexai.init(project=PROJECT_ID, location="LOCATION")

config = rag.RagRetrievalConfig(
    top_k=10,
    ranking=rag.Ranking(
        llm_ranker=rag.LlmRanker(
            model_name=MODEL_NAME
        )
    )
)

rag_retrieval_tool = Tool.from_retrieval(
    retrieval=rag.Retrieval(
        source=rag.VertexRagStore(
            rag_resources=[
                rag.RagResource(
                    rag_corpus=CORPUS_NAME,
                )
            ],
            rag_retrieval_config=config,
        ),
    )
)

rag_model = GenerativeModel(
    model_name=MODEL_NAME, tools=[rag_retrieval_tool]
)
response = rag_model.generate_content("Why is the sky blue?")
print(response.text)
# Example response:
# The sky appears blue due to a phenomenon called Rayleigh scattering.
# Sunlight, which contains all colors of the rainbow, is scattered
# by the tiny particles in the Earth's atmosphere…
# …
Use Vertex AI Search as a retriever
Vertex AI Search provides a solution for retrieving and managing data within your Vertex AI RAG applications. By using Vertex AI Search as your retrieval backend, you can improve performance, scalability, and ease of integration.
Enhanced performance and scalability: Vertex AI Search is designed to handle large volumes of data with exceptionally low latency. This translates to faster response times and improved performance for your RAG applications, especially when dealing with complex or extensive knowledge bases.
Simplified data management: Import your data from various sources, such as websites, BigQuery datasets, and Cloud Storage buckets, to streamline your data ingestion process.
Seamless integration: Vertex AI provides built-in integration with Vertex AI Search, which lets you select Vertex AI Search as the corpus backend for your RAG application. This simplifies the integration process and helps to ensure optimal compatibility between components.
Improved LLM output quality: By using the retrieval capabilities of Vertex AI Search, you help ensure that your RAG application retrieves the most relevant information from your corpus, leading to more accurate and informative LLM-generated outputs.
from vertexai.preview import rag
import vertexai

PROJECT_ID = "PROJECT_ID"
DISPLAY_NAME = "DISPLAY_NAME"
ENGINE_NAME = "ENGINE_NAME"

# Initialize Vertex AI API once per session
vertexai.init(project=PROJECT_ID, location="us-central1")

# Create a corpus backed by Vertex AI Search
vertex_ai_search_config = rag.VertexAiSearchConfig(
    serving_config=f"{ENGINE_NAME}/servingConfigs/default_search",
)

rag_corpus = rag.create_corpus(
    display_name=DISPLAY_NAME,
    vertex_ai_search_config=vertex_ai_search_config,
)

# Check the corpus just created
new_corpus = rag.get_corpus(name=rag_corpus.name)
print(new_corpus)
Get started today
You can access Vertex AI RAG Engine through Vertex AI Studio. Visit the Google Cloud Console to get started, or reach out to us for a guided proof of concept. For more detail, see our RAG quickstart documentation or take a look at our Vertex AI RAG Engine GitHub repository.
AI Summary and Description: Yes
**Summary:** The text discusses the release of Vertex AI’s Retrieval-Augmented Generation (RAG) Engine, a managed service designed to help enterprises effectively implement generative AI models. It highlights the flexibility, customization capabilities, and performance enhancements the RAG Engine offers, aiming to build trust in AI outputs and address gaps between AI demonstrations and real-world applications.
**Detailed Description:**
The text provides detailed insights on Vertex AI’s RAG Engine, emphasizing its importance for enterprises that wish to leverage generative AI effectively. This is particularly relevant for professionals in AI security and cloud computing, as it lays out the potential for improved deployment and application of AI models in enterprise settings.
Key Points:
– **General Availability of Vertex AI RAG Engine:**
– It serves as a fully managed service for deploying RAG implementations, enhancing enterprise applications by integrating retrieval capabilities.
– **Adaptability and Integration:**
– **Custom Architecture:** Users can choose models, databases, and data sources that best fit their existing infrastructure.
– **Evolvability:** Easily update models and data sources through configuration changes to fit evolving requirements.
– **Performance Evaluation:**
– Offers the ability to set up multiple RAG engines with different configurations, simplifying the process of finding the optimal setup for specific use cases.
– **DIY and Managed Features:**
– Allows for do-it-yourself (DIY) solutions that are adaptable for low to medium complexity use cases.
– Provides a robust search functionality supporting a diverse range of applications with minimal maintenance.
– **Data Management and Connectors:**
– Streamlined ingestion of data from various sources, such as Cloud Storage, Google Drive, and Jira, facilitated through an intuitive interface.
– Enhanced performance and scalability capabilities designed to handle large datasets with low latency, crucial for real-time applications.
– **Customization Abilities:**
– Users can fine-tune chunk sizes during document ingestion and have the flexibility to use existing vector databases, enhancing output quality by retrieving relevant information effectively.
– **Integration with LLMs:**
– Vertex AI’s RAG Engine supports integration with various large language models (LLMs), allowing users to select models from the Vertex AI Model Garden.
– **Use Cases and Advantages:**
– Facilitates the creation of contextually relevant answers in applications by utilizing RAG alongside models like Google’s Gemini.
– The text underlines the importance of RAG in bridging the gap between impressive AI demos and practical enterprise applications, positioning it as essential for successful integration of generative AI in business contexts.
Overall, the content is significant for professionals focused on AI deployment in enterprise environments, emphasizing enhancements to infrastructure security through trusted AI outputs and adaptive solutions within cloud ecosystems.