Source URL: https://cloud.google.com/blog/topics/hybrid-cloud/on-prem-generative-ai-search-with-google-distributed-cloud-rag/
Source: Cloud Blog
Title: Find sensitive data faster (but safely) with Google Distributed Cloud’s gen AI search solution
Feedly Summary: Today, generative AI is giving organizations new ways to process and analyze data, discover hidden insights, increase productivity and build new applications. However, data sovereignty, regulatory compliance, and low-latency requirements can be a challenge. The need to keep sensitive data in certain locations, adhere to strict regulations, and respond swiftly can make it difficult to capitalize on the cloud’s innovation, scalability, and cost-efficiency advantages.
Google Distributed Cloud (GDC) brings Google’s AI services anywhere you need them, whether in your own data center or at the edge. Designed with AI and data-intensive workloads in mind, GDC is a fully managed hardware and software solution featuring a rich set of services. It comes in a range of extensible hardware form factors, integrates solutions from leading independent software vendors (ISVs) via GDC Marketplace, and can run either connected to Google Cloud’s systems or air-gapped from the public internet.
In this blog post, we look at how GDC’s new AI-optimized servers with NVIDIA H100 Tensor Core GPUs and our gen AI search packaged solution, now available in preview, bring retrieval-augmented generation (RAG) to your on-premises environment and unlock multimodal and multilingual natural-language search across your text, image, voice, and video data.
Gen AI-optimized infrastructure
GDC air-gapped now incorporates new servers with NVIDIA H100 GPUs, built on the NVIDIA Hopper architecture, and 5th Gen Intel Xeon Scalable processors. These servers bring the GPU-optimized A3 VM family, which is optimized for the NVIDIA NVLink interconnect, to GDC, enabling faster shared compute and memory for AI workloads that use large language models (LLMs) with up to 100 billion parameters. The release also extends the set of NVIDIA Multi-Instance GPU (MIG) profiles, supporting a variety of new GPU slicing schemes (both uniform and mixed-mode) and dynamic allocation of GPU resources, so AI services can be served at a lower cost of ownership.
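As a rough, back-of-the-envelope illustration of why a model near that 100-billion-parameter ceiling needs several GPUs sharing memory over a fast interconnect, the sketch below estimates the memory required for model weights alone. The 80 GB per-GPU capacity and the 16-bit (2-byte) weight format are assumptions made for illustration, not GDC specifications, and the estimate ignores KV cache and activation memory, which add to the total in practice.

```python
import math


def weights_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    """Memory for model weights only, in decimal gigabytes."""
    return params_billions * 1e9 * bytes_per_param / 1e9


def min_gpus(params_billions: float, bytes_per_param: int = 2,
             gpu_memory_gb: float = 80.0) -> int:
    """Smallest GPU count whose combined memory holds the weights.

    gpu_memory_gb=80 is an assumed per-GPU capacity, not a GDC figure.
    """
    return math.ceil(weights_gb(params_billions, bytes_per_param) / gpu_memory_gb)


# A 100B-parameter model at 16-bit precision: 200 GB of weights.
print(weights_gb(100))   # 200.0
print(min_gpus(100))     # 3

# A 9B-parameter model at the same precision fits on one such GPU.
print(weights_gb(9))     # 18.0
print(min_gpus(9))       # 1
```

Under these assumptions, the larger model cannot fit on a single GPU, which is where a high-bandwidth GPU-to-GPU interconnect matters, while a 9-billion-parameter model leaves most of a GPU free, which is the kind of case GPU slicing is meant to address.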
Ready-to-deploy on-prem conversational search
With GDC’s new gen AI search solution, you get ready-to-deploy, on-prem conversational search based on the Gemma 2 LLM with 9 billion parameters. You can ingest your sensitive on-prem data into the solution and quickly find the most relevant information and content via natural-language search, boosting employee productivity and knowledge sharing while helping ensure that search queries and data remain on-prem.
Responses also include citation links to your original documents, so you can verify each answer against its source and catch hallucinations.
For more accurate responses, the GDC gen AI search solution relies on a RAG architecture, which combines the benefits of traditional search and generative AI: user queries are augmented with relevant on-prem data before they’re sent to the LLM to generate responses. Other core integrations available out of the box include Vertex AI pre-trained APIs (translation for 105 languages, speech-to-text for 13 languages, and optical character recognition for 46 supported and 24 experimental languages) for multimodal and multilingual data ingestion across text, images, and audio. The solution also includes the AlloyDB Omni database service for embeddings storage and semantic search across ingested data.
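To make the RAG flow concrete, here is a minimal, self-contained sketch of the retrieve-then-augment step. It is not the GDC implementation: the word-count "embedding" is a toy stand-in for a real embedding model, the in-memory list stands in for a vector database, and every name here (`Document`, `embed`, `retrieve`, `build_prompt`, the document ids) is hypothetical. The final call to an LLM is left out.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str  # hypothetical id, reused as the citation marker
    text: str


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, corpus: list[Document], k: int = 2) -> list[Document]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(corpus, key=lambda d: cosine(q, embed(d.text)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, passages: list[Document]) -> str:
    """Augment the user query with retrieved passages and their ids."""
    context = "\n".join(f"[{d.doc_id}] {d.text}" for d in passages)
    return (
        "Answer using only the context below and cite sources by their bracketed id.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )


corpus = [
    Document("hr-001", "Employees accrue vacation days monthly."),
    Document("it-042", "Reset your VPN password through the internal portal."),
    Document("fin-007", "Expense reports are due by the fifth of each month."),
]

question = "How do I reset my VPN password?"
top = retrieve(question, corpus, k=1)
prompt = build_prompt(question, top)
print(prompt)  # this augmented prompt is what would be sent to the LLM
```

Because each passage carries its id into the prompt, the model can be asked to cite the bracketed ids, which is one simple way to produce the kind of citation links described above.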
GDC’s open cloud approach also allows you to customize this solution according to your needs and swap components as you see fit, for example by using other database services such as Elasticsearch, other open-source models and LLMs, or your own proprietary models.
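One common way to keep components swappable is to have the search pipeline depend on a small interface rather than on a specific backend. The sketch below illustrates that general pattern only; it says nothing about how GDC structures its solution. The `VectorStore` protocol, `InMemoryStore`, and `nearest_ids` are hypothetical names, and the in-memory backend is a stand-in for a real database.

```python
from typing import Protocol


class VectorStore(Protocol):
    """Hypothetical interface any embeddings backend would satisfy."""

    def add(self, doc_id: str, vector: list[float]) -> None: ...
    def search(self, vector: list[float], k: int) -> list[str]: ...


class InMemoryStore:
    """Stand-in backend; a database-backed class with the same
    two methods could replace it without changing the caller."""

    def __init__(self) -> None:
        self._rows: dict[str, list[float]] = {}

    def add(self, doc_id: str, vector: list[float]) -> None:
        self._rows[doc_id] = vector

    def search(self, vector: list[float], k: int) -> list[str]:
        # Rank stored vectors by dot product with the query vector.
        def score(doc_id: str) -> float:
            return sum(x * y for x, y in zip(vector, self._rows[doc_id]))

        return sorted(self._rows, key=score, reverse=True)[:k]


def nearest_ids(store: VectorStore, query_vector: list[float], k: int = 1) -> list[str]:
    """Pipeline code sees only the VectorStore interface."""
    return store.search(query_vector, k)


store = InMemoryStore()
store.add("doc-a", [1.0, 0.0])
store.add("doc-b", [0.0, 1.0])
result = nearest_ids(store, [0.9, 0.1])
print(result)  # ['doc-a']
```

The same idea applies to the model: if the pipeline only needs a function that maps a prompt to a response, one LLM can be exchanged for another behind that boundary.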
Get started on your GDC development journey
To join GDC’s gen AI search solution preview and experience how on-prem gen AI search can transform how your organization retrieves information, contact your Google account representative. Note that you will need a GDC deployment where you can deploy and run the preview.
AI Summary and Description: Yes
Summary: The text discusses Google Distributed Cloud (GDC) and its capabilities in supporting organizations’ needs for AI and data-intensive workloads while addressing challenges related to data sovereignty and compliance. GDC features AI-optimized infrastructure that enables on-premises deployment of generative AI solutions, enhancing data processing and natural language search capabilities.
Detailed Description:
The text presents an overview of Google Distributed Cloud (GDC) and its innovations in the realm of generative AI and data handling. Several key points highlight the advancements and relevance of GDC for organizations looking to leverage AI in a compliant and efficient manner:
– **Generative AI Trends**: Organizations are increasingly using generative AI to analyze data, boost productivity, and develop new applications. However, data sovereignty, regulatory compliance, and low-latency requirements can make it hard to take full advantage of the cloud.
– **Google Distributed Cloud (GDC)**:
  – GDC delivers Google’s AI services on-premises or at the edge, either connected to Google Cloud or air-gapped from the public internet.
  – It is tailored for AI and data-intensive workloads and is delivered as a fully managed hardware and software solution.
– **Innovative Infrastructure**:
  – GDC features new AI-optimized servers powered by NVIDIA H100 Tensor Core GPUs and 5th Gen Intel Xeon Scalable processors.
  – The servers support large language models (LLMs) with up to 100 billion parameters and allocate GPU resources efficiently through extended NVIDIA Multi-Instance GPU (MIG) profiles.
– **Data Management and Compliance**:
  – The GDC solution allows businesses to keep sensitive data on-premises and fulfill data sovereignty requirements.
  – The conversational search solution, based on the Gemma 2 LLM with 9 billion parameters, helps organizations retrieve information efficiently while keeping search queries and data on-prem.
– **Enhanced Search Capabilities**:
  – The RAG architecture augments user queries with relevant on-prem data before they are sent to the LLM.
  – Out-of-the-box integrations include APIs for translation, speech-to-text, and optical character recognition to support multimodal and multilingual data ingestion.
– **Customization Options**:
  – GDC’s open cloud approach allows organizations to customize their deployments and swap in other databases or models as needed.
– **Invitation for Engagement**:
  – Organizations can join the GDC gen AI search solution preview by contacting their Google account representative; a GDC deployment is required to run it.
The content is relevant to professionals in AI, cloud computing, and infrastructure security, particularly those implementing AI solutions that must respect data sovereignty. It highlights the balance between adopting advanced AI technologies and adhering to regulatory requirements.