Cloud Blog: Boost your Search and RAG agents with Vertex AI’s new state-of-the-art Ranking API

May 30, 2025

—

Source URL: https://cloud.google.com/blog/products/ai-machine-learning/launching-our-new-state-of-the-art-vertex-ai-ranking-api/
Source: Cloud Blog
Title: Boost your Search and RAG agents with Vertex AI’s new state-of-the-art Ranking API

Feedly Summary: The AI era has supercharged expectations: users now issue more complex queries and demand pinpoint results, and there’s an 82% chance of losing a customer if they can’t quickly find what they need. Similarly, AI agents require highly relevant context to execute tasks reliably. However, when traditional search methods deliver noise (in general, up to 70% of retrieved passages lack a true answer), both agentic workflows and user experiences suffer from untrustworthy, unreliable results.
To help businesses meet these rising expectations, we’re launching our new state-of-the-art Vertex AI Ranking API. It makes it easy to boost the precision of the information surfaced by search, agentic workflows, and retrieval-augmented generation (RAG) systems, so you can elevate your legacy search systems and AI applications in minutes, not months.


Go beyond simple retrieval
This is where precise ranking becomes essential. Think of the Vertex AI Ranking API as the precision filter at the crucial final stage of your retrieval pipeline. It intelligently sifts through the initial candidate set, identifying and elevating only the most pertinent information. This refinement step is key to unlocking higher quality, more trustworthy, and more efficient AI applications.
The Vertex AI Ranking API acts as this powerful yet easy-to-integrate refinement layer. It takes the candidate list from your existing search or retrieval system and re-orders it based on deep semantic understanding, so the best results rise to the top. Here’s how it helps you upgrade your systems:

Upgrade legacy search systems: Easily add state-of-the-art relevance scoring to existing search outputs, improving user satisfaction and business outcomes on commercial searches without overhauling your current stack.

Strengthen RAG systems: Send fewer, more relevant documents to your generative models. This improves answer trustworthiness while reducing latency and operating costs by optimizing context window usage.

Support intelligent agents: Guide AI agents with highly relevant information, streamlining their context and traces, and significantly improving the success rate of task completion.

Figure 1: Ranking API usage in a typical search and retrieval flow
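To make the two-stage flow in Figure 1 concrete, here is a minimal Python sketch: a cheap, recall-oriented first stage followed by a precision-oriented reranking stage that keeps only the top few results for a RAG prompt. It does not call the Ranking API. `Record`, `first_stage_retrieve`, `rerank`, `build_rag_context`, and `toy_scorer` are hypothetical names, and the `scorer` argument only marks where a call to a reranking model such as semantic-ranker-default-004 would go.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Record:
    """A candidate passage returned by the first-stage retriever."""
    id: str
    content: str


# Placeholder for the reranking model: maps (query, passage) to a relevance
# score, higher is better. In a real system this would be a Ranking API call.
Scorer = Callable[[str, str], float]


def first_stage_retrieve(query: str, corpus: List[Record], k: int) -> List[Record]:
    """Recall-oriented stage: keep the k records sharing the most query terms."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(r.content.lower().split())), r) for r in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable for ties
    return [r for _, r in scored[:k]]


def rerank(query: str, candidates: List[Record], scorer: Scorer, top_n: int) -> List[Record]:
    """Precision stage: re-order candidates by scorer and keep only the top_n."""
    ranked = sorted(candidates, key=lambda r: scorer(query, r.content), reverse=True)
    return ranked[:top_n]


def build_rag_context(records: List[Record]) -> str:
    """Join the surviving passages into a context block for a generative model."""
    return "\n\n".join(f"[{r.id}] {r.content}" for r in records)


def toy_scorer(query: str, text: str) -> float:
    """Hypothetical stand-in for a semantic reranker, for illustration only:
    rewards an exact query match plus a small bonus per word."""
    return (1.0 if query in text else 0.0) + 0.01 * len(text.split())


if __name__ == "__main__":
    corpus = [
        Record("a", "return policy for shoes"),
        Record("b", "shoes on sale this week"),
        Record("c", "our return policy allows refunds within 30 days"),
        Record("d", "store opening hours"),
    ]
    candidates = first_stage_retrieve("return policy", corpus, k=3)
    top = rerank("return policy", candidates, toy_scorer, top_n=2)
    print([r.id for r in top])  # ['c', 'a']
```

Keeping the reranker behind a plain scoring function means the first-stage retriever (BM25, embeddings, or a legacy search engine) can stay unchanged, and only the `top_n` survivors reach the generative model.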

What’s new in Ranking API
Today, we’re launching our new semantic reranker models:

semantic-ranker-default-004 – our most accurate model for any use case
semantic-ranker-fast-004 – our fastest model for latency-critical use cases

Our models establish a new benchmark for ranking performance:

State-of-the-art ranking: Based on evaluations using the industry-standard BEIR datasets, our model leads in accuracy among competitive standalone reranking API services. Accuracy is measured with normalized discounted cumulative gain (nDCG), a metric that evaluates a ranking system by how well the order of ranked items matches their actual relevance, giving more weight to relevant results placed at the top. We’ve published our evaluation scripts to ensure reproducibility of results.

Figure 2: semantic-ranker-default-004 leads in nDCG@5 on BEIR datasets compared to other rankers.

Industry-leading low latency: Our default model (semantic-ranker-default-004) is at least 2x faster than competitive reranking API services at any scale. Our fast model (semantic-ranker-fast-004) is tuned for latency-critical applications and typically exhibits 3x lower latency than our default model.
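For readers who want to see what an nDCG@5 number like those in Figure 2 measures, the following is a minimal nDCG@k implementation for a single query. It uses the common linear-gain form DCG@k = Σ rel_i / log2(i + 1), with ranks starting at 1, normalized by the DCG of an ideal ordering of the judged documents. This is an illustrative sketch, not Google’s published evaluation script; some toolkits use an exponential gain (2^rel - 1) instead, which changes the absolute values. `dcg_at_k` and `ndcg_at_k` are hypothetical names.

```python
import math
from typing import Dict, List


def dcg_at_k(gains: List[float], k: int) -> float:
    """Discounted cumulative gain: each gain is divided by log2(rank + 1), rank from 1."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains[:k], start=1))


def ndcg_at_k(ranked_ids: List[str], qrels: Dict[str, float], k: int = 5) -> float:
    """nDCG@k for one query.

    ranked_ids: document ids in the order the system returned them.
    qrels: graded relevance judgments (doc id -> relevance); unjudged ids count as 0.
    """
    gains = [qrels.get(doc_id, 0.0) for doc_id in ranked_ids]
    ideal = sorted(qrels.values(), reverse=True)  # best possible ordering
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0


if __name__ == "__main__":
    judgments = {"d1": 2.0, "d2": 1.0}
    print(ndcg_at_k(["d1", "d2", "d3"], judgments))  # 1.0
    print(round(ndcg_at_k(["d2", "d1"], judgments), 3))  # 0.86
```

A benchmark score such as nDCG@5 on a BEIR dataset is typically the mean of this per-query value over all test queries, so swapping a relevant document down by one position lowers the score even if it is still retrieved.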

We’re also launching long context ranking, with a limit of 200k total tokens per API request. Providing longer documents to the Ranking API allows it to better understand nuanced relationships between queries and information such as customer reviews or product specifications in retail.

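When a candidate set might exceed the 200k-token limit, one option is to split it across several requests. The sketch below assumes the limit covers the query plus all records in a request, and that each request repeats the query. `pack_into_requests` and `rough_token_count` are hypothetical helpers, and the 4-characters-per-token estimate is only a rough heuristic, not the ranker’s actual tokenizer, so it is worth leaving headroom in practice.

```python
from typing import Callable, List

MAX_TOKENS_PER_REQUEST = 200_000  # per-request limit stated for long context ranking


def rough_token_count(text: str) -> int:
    """Hypothetical estimator: about 4 characters per token. Not the model's tokenizer."""
    return max(1, len(text) // 4)


def pack_into_requests(
    query: str,
    records: List[str],
    budget: int = MAX_TOKENS_PER_REQUEST,
    count_tokens: Callable[[str], int] = rough_token_count,
) -> List[List[str]]:
    """Greedily group records so each request's query plus records fit the budget."""
    query_tokens = count_tokens(query)  # assumed to count against every request
    batches: List[List[str]] = []
    current: List[str] = []
    used = query_tokens
    for rec in records:
        n = count_tokens(rec)
        if query_tokens + n > budget:
            raise ValueError("a single record exceeds the per-request token budget")
        if used + n > budget:
            batches.append(current)  # close the full batch, start a new one
            current, used = [], query_tokens
        current.append(rec)
        used += n
    if current:
        batches.append(current)
    return batches
```

Greedy packing keeps records in their original order and minimizes the number of calls for that order. If you do split one candidate set across requests, check whether scores from separate requests can be merged into a single ranking before relying on that.
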
Real-world impact across domains
The benefits aren’t just theoretical. Benchmarks on industry-specific datasets demonstrate that integrating the Ranking API can significantly boost the quality of search results across diverse high-value domains such as retail, news, finance, and healthcare.

Figure 3: nDCG@5 performance improvement with semantic-ranker-default-004 in various high-value domains, based on internal datasets. The lexical and semantic search baseline uses the best result of Vertex AI text-embedding-004 and BM25-based retrieval.

Elevate your search results in minutes
We designed the Vertex AI Ranking API for seamless integration. Adding this powerful relevance layer is straightforward, with several options:

Try it live: Experience the difference on real-world data by enabling the Ranking API in the interactive Vertex AI Vector Search demo.

Build with Vertex AI: Integrate the API directly into any existing system for maximum flexibility.

Enable it in RAG Engine: Select the Ranking API in your RAG Engine to get more robust and accurate answers from your generative AI applications.

Use it in AlloyDB: For a truly streamlined experience, call the built-in ai.rank() SQL function directly within AlloyDB, a novel integration that simplifies search use cases.

AI frameworks: Use our native integrations with popular AI frameworks such as Genkit and LangChain.

Use it in Elasticsearch: Quickly boost accuracy with the built-in Ranking API integration in Elasticsearch.
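For the “Build with Vertex AI” path, the sketch below only assembles a rank request; it does not send it. It assumes the Ranking API is exposed as a REST method on `discoveryengine.googleapis.com` under a `default_ranking_config` ranking config, with `model`, `query`, `records`, and `topN` body fields; treat the endpoint and field names as assumptions to check against the current API reference. `build_rank_request` is a hypothetical helper.

```python
import json
from typing import Dict, Optional, Tuple

# Assumption: endpoint and field names follow the Discovery Engine v1
# rankingConfigs rank method; verify against the current API reference.
RANK_URL = (
    "https://discoveryengine.googleapis.com/v1/projects/{project}"
    "/locations/global/rankingConfigs/default_ranking_config:rank"
)


def build_rank_request(
    project: str,
    query: str,
    passages: Dict[str, str],
    model: str = "semantic-ranker-default-004",
    top_n: Optional[int] = None,
) -> Tuple[str, str]:
    """Return (url, JSON body) for a rank call. Sending it needs an OAuth token."""
    body: Dict[str, object] = {
        "model": model,
        "query": query,
        # One record per candidate passage from the existing retrieval system.
        "records": [{"id": pid, "content": text} for pid, text in passages.items()],
    }
    if top_n is not None:
        body["topN"] = top_n  # optionally return only the best top_n records
    return RANK_URL.format(project=project), json.dumps(body)
```

To send the request, POST the body with an `Authorization: Bearer` header carrying an access token (for example, from `gcloud auth print-access-token`). Swapping `model` for semantic-ranker-fast-004 trades some accuracy for lower latency, as described for the two models.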

AI Summary and Description

**Summary:** The text introduces the Vertex AI Ranking API, which improves the precision of search and retrieval in AI applications. It addresses rising user expectations for accurate results and gives businesses a way to upgrade their existing systems, potentially increasing customer satisfaction and operational efficiency.

**Detailed Description:**

The Vertex AI Ranking API is a new offering aimed at improving the precision of information retrieval within AI systems. By addressing the irrelevant results common in traditional search methods, the API improves both user experiences and the reliability of AI agents. The major points are:

– **Market Need:**
  – Users have higher expectations for accurate search results, and businesses are likely to lose customers who cannot find relevant information quickly.
  – Traditional search results often contain noise, which undermines trust in AI outputs.

– **API Functionality:**
  – **Precision Enhancer:** The API serves as a “precision filter,” re-ranking the candidate set so that only the most pertinent information is surfaced.
  – **Integration Ease:** It can be added to legacy systems to improve relevance scoring without extensive changes to existing infrastructure.

– **Key Benefits:**
  – **Upgrade Legacy Systems:** Businesses can improve outcomes on commercial searches by raising the relevance of existing search outputs.
  – **Support for RAG Systems:** Fewer, more relevant documents are sent to generative models, improving answer quality and optimizing resource use.
  – **Intelligent Agents:** The API gives AI agents more focused context, leading to higher task completion success rates.

– **Performance Metrics:**
  – Two new semantic reranker models were introduced: semantic-ranker-default-004 for accuracy and semantic-ranker-fast-004 for speed.
  – In evaluations on the BEIR datasets, the default model leads competing standalone reranking API services in accuracy.
  – The default model is reported to be at least 2x faster than competing services, and the fast model typically has about 3x lower latency than the default model, which suits latency-critical applications.

– **Real-World Application:**
  – Benchmarks on industry-specific datasets show significant improvements in high-value domains such as retail, news, finance, and healthcare, demonstrating its versatility across industries.

– **Integration Options:**
  – Users can try the API live or integrate it directly into existing systems, with options including RAG Engine, AlloyDB, Elasticsearch, and popular AI frameworks.

By providing this advanced ranking capability, the Vertex AI Ranking API represents a significant step forward for organizations seeking to enhance their AI functionalities and address the challenges posed by user expectations in the digital landscape. This offering is particularly relevant for professionals involved in AI, infrastructure, and cloud computing security, as it contributes to better data handling and AI application reliability.
