Cloud Blog: How to enable real time semantic search and RAG applications with Dataflow ML

Source URL: https://cloud.google.com/blog/topics/developers-practitioners/create-and-retrieve-embeddings-with-a-few-lines-of-dataflow-ml-code/
Source: Cloud Blog
Title: How to enable real time semantic search and RAG applications with Dataflow ML

Feedly Summary: Embeddings are a cornerstone of modern semantic search and Retrieval Augmented Generation (RAG) applications. In short, they enable applications to understand and interact with information on a deeper, conceptual level. In this post, we’ll show you how to create and retrieve embeddings with a few lines of Dataflow ML code to enable both of these use cases. We will cover streaming and batch approaches for generating embeddings and storing them in vector databases such as AlloyDB to power semantic search and RAG applications with their vector search capabilities.


Semantic search and RAG
In semantic search, we are able to leverage underlying relationships between words to find relevant results beyond a simple keyword match. Embeddings are vector representations of data, ranging from text to videos, that capture these relationships. In a semantic search, we find embeddings that are mathematically close to our search query, allowing us to match words or search terms that are close in meaning but may not have shown up in a keyword search. Databases such as AlloyDB allow us to combine this unstructured search with a structured search to provide high-quality, relevant results. For example, the prompt “Show me all pictures of sunsets I took in the past month” includes a structured part (the date is within the past month) and an unstructured part (the picture contains a sunset).
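To make “mathematically close” concrete, here is a minimal sketch (illustrative only, not code from this post) that embeds a query and two documents with an assumed open-source sentence-transformers model and ranks the documents by cosine similarity:

```python
# A minimal sketch of semantic similarity; the model name and texts are
# illustrative assumptions, not part of the original post.
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed example model

query = "pictures of sunsets"
documents = [
    "photo of the sun going down over the ocean",
    "invoice for office supplies",
]

q_vec = model.encode(query)
d_vecs = model.encode(documents)

def cosine(a, b):
    # Cosine similarity: values closer to 1.0 mean closer in meaning.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

for doc, vec in zip(documents, d_vecs):
    print(f"{cosine(q_vec, vec):.3f}  {doc}")
```

Even though “sunset” never appears in the first document, its embedding scores far closer to the query than the second document, which is exactly what a pure keyword match would miss.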
In many RAG applications, embeddings play an important role in retrieving relevant context from a knowledge base (such as a database) to ground the responses of large language models (LLMs). RAG systems can perform a semantic search on a vector database, such as AlloyDB, or directly pull data from the database, providing the retrieved results as context to the LLM so that it has the information it needs to generate informative answers.
Knowledge ingestion pipelines
A knowledge ingestion pipeline takes unstructured content, such as product catalogs with free-form descriptions, support ticket dialogs, and legal documents, processes it into embeddings, and then pushes these embeddings into a vector database. The source of this knowledge can vary widely, from files stored in cloud storage buckets (like Google Cloud Storage) and information stored in databases like AlloyDB, to streaming sources such as Google Cloud Pub/Sub or Google Cloud Managed Service for Kafka. For streaming sources, the data itself might be raw content (e.g., plain text) or URIs pointing to documents. A key consideration when designing knowledge ingestion pipelines is whether to ingest and process knowledge in a batch or streaming fashion.

Streaming vs Batch: To provide the most up-to-date and relevant search results, and thus a superior user experience, embeddings should be generated in real time for streaming data. This applies to new documents being uploaded or new product images, where current knowledge holds significant business value. For less time-sensitive applications and operational tasks like backfilling, a batch pipeline is suitable. Crucially, the chosen framework must support both streaming and batch processing without requiring business logic re-implementation.
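To illustrate that last point, here is a hedged sketch (not from the post) of how a single Apache Beam transform chain can serve both modes, with only the source swapped; the Pub/Sub topic and bucket path are hypothetical placeholders:

```python
# A sketch showing one pipeline definition reused for batch and streaming;
# the Pub/Sub topic and GCS path below are hypothetical placeholders.
import apache_beam as beam
from apache_beam.io.gcp.pubsub import ReadFromPubSub
from apache_beam.io.textio import ReadFromText

def read_documents(p, streaming: bool):
    # Only the source differs; the downstream chunking/embedding logic
    # is shared unchanged between the two modes.
    if streaming:
        # Requires the pipeline to run with the streaming option enabled.
        return p | "ReadStream" >> ReadFromPubSub(
            topic="projects/my-project/topics/new-docs")  # placeholder
    return p | "ReadBatch" >> ReadFromText("gs://my-bucket/docs/*.txt")  # placeholder

with beam.Pipeline() as p:
    docs = read_documents(p, streaming=False)
    # ... same chunk -> embed -> write transforms in both modes ...
```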

Chunking: Regardless of the data source, after reading the data there is normally a preprocessing step. For simple raw text, this might mean basic cleaning. However, for larger documents or more complex content, chunking is a crucial step: it breaks the source material down into smaller, manageable units. The best chunking strategy varies depending on the specific data and application.
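As a concrete (and deliberately naive) illustration of the idea, a fixed-size strategy with overlap can be written in a few lines; the sizes below are arbitrary, and real applications often split on sentence or section boundaries instead:

```python
# A minimal fixed-size chunking sketch; chunk_size and overlap are
# illustrative tuning knobs, not recommendations from the post.
def chunk_text(text: str, chunk_size: int = 100, overlap: int = 20) -> list[str]:
    """Split text into chunks of chunk_size characters, overlapping by overlap."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_text("A long product description ... " * 20)
```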

Introducing Dataflow’s MLTransform for embeddings
Dataflow ML provides many out-of-the-box capabilities that simplify the entire process of building and running a streaming or batch knowledge ingestion pipeline, allowing you to implement these pipelines in a few lines of code. An ingestion pipeline typically has four phases: reading from data sources, preprocessing the data, generating embeddings, and finally writing the correctly shaped schema to the vector database. The new capabilities in MLTransform add support for chunking, generating embeddings using Vertex AI or bring-your-own (BYO) models, and specialized writers for persisting embeddings to databases such as AlloyDB.

Knowledge ingestion using Dataflow

With Dataflow’s new MLTransform capabilities, the flow of chunking data, generating embeddings, and landing them in AlloyDB can be achieved within a few lines of code:

```python
...
def to_chunk(product: Dict[str, Any]) -> Chunk:
    return Chunk(
        content=Content(
            text=f"{product['name']}: {product['description']}"
        ),
        id=product['id'],    # Use product ID as chunk ID
        metadata=product,    # Store all product info in metadata
    )

...
with beam.Pipeline() as p:
    _ = (
        p
        | 'Create Products' >> beam.Create(products)
        | 'Convert to Chunks' >> beam.Map(to_chunk)
        | 'Generate Embeddings' >> MLTransform(
              write_artifact_location=tempfile.mkdtemp())
          .with_transform(huggingface_embedder)
        | 'Write to AlloyDB' >> VectorDatabaseWriteTransform(alloydb_config)
    )
```

Importantly, for the embeddings step shown above, Dataflow ML supports Vertex AI embeddings as well as models from other model gardens, and gives you the ability to “bring your own model” (BYOM) hosted on Dataflow workers.
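For completeness, here is one way the `huggingface_embedder` and `alloydb_config` referenced above might be defined. This is a hedged sketch: the class names follow Apache Beam’s `apache_beam.ml.rag` package, but exact parameters vary by Beam version, and the model name, JDBC URL, credentials, and table name are placeholders:

```python
# A sketch of the objects referenced in the pipeline above. Class names
# follow Beam's apache_beam.ml.rag package; the parameter names and all
# values below (model, JDBC URL, credentials, table) are illustrative
# placeholders and may differ across Beam versions.
from apache_beam.ml.rag.embeddings.huggingface import HuggingfaceTextEmbeddings
from apache_beam.ml.rag.ingestion.alloydb import (
    AlloyDBConnectionConfig, AlloyDBVectorWriterConfig)

huggingface_embedder = HuggingfaceTextEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2")  # assumed model

alloydb_config = AlloyDBVectorWriterConfig(
    connection_config=AlloyDBConnectionConfig(
        jdbc_url="jdbc:postgresql://<alloydb-host>:5432/postgres",  # placeholder
        username="postgres",        # placeholder
        password="<password>"),     # placeholder
    table_name="product_embeddings")  # placeholder
```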
The example above creates a single chunk per element, but you can also use LangChain for chunking instead:

```python
...

# Pick a LangChain text splitter to divide your documents into smaller Chunks
splitter = langchain.text_splitter.CharacterTextSplitter(
    chunk_size=100, chunk_overlap=20)

# Configure Apache Beam to use your desired text splitter
langchain_chunking_provider = beam.ml.rag.chunking.langchain.LangChainChunker(
    document_field='content', metadata_fields=[], text_splitter=splitter)

with beam.Pipeline() as p:
    _ = (
        p
        | 'Create Products' >> beam.io.textio.ReadFromText(products)
        | 'Generate Embeddings' >> MLTransform(
              write_artifact_location=tempfile.mkdtemp())
          .with_transform(langchain_chunking_provider)
          .with_transform(huggingface_embedder)
    )
...
```

You can find a full working example with AlloyDB here: Build a real-time vector embedding pipeline for AlloyDB with Dataflow.
Introducing Dataflow enrichment transform for RAG
To support RAG use cases, we have also enhanced Dataflow’s enrichment transform with the ability to look up results from databases. This enables you to create asynchronous applications, in batch or stream mode, that consume the information stored in the vector database from a low-code pipeline. You can enrich your applications with operational data to embed or to use as filters, so you don’t need to store your vectors in separate storage solutions.

Enabling RAG applications with Dataflow

The following snippet shows the enrichment transform performing a vector search against BigQuery:

```python
params = BigQueryVectorSearchParameters(
    project=self.project,
    table_name=self.table_name,
    embedding_column='embedding',
    columns=['content'],
    neighbor_count=1)

handler = BigQueryVectorSearchEnrichmentHandler(
    vector_search_parameters=params)

with beam.Pipeline() as p:
    result = (p | beam.Create(test_chunks) | Enrichment(handler))
```
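Continuing inside the same pipeline, you could then format each enriched chunk into an LLM prompt. The sketch below is an assumption about the output shape: the exact metadata layout the handler attaches is not specified in this post, and `enrichment_data` is a hypothetical key, not a confirmed field name:

```python
# Hypothetical follow-up, placed inside the same `with` block as above;
# 'enrichment_data' is an assumed metadata key, not a confirmed field
# of the enriched output.
def to_prompt(chunk):
    retrieved = chunk.metadata.get("enrichment_data", [])
    context = "\n".join(str(row) for row in retrieved)
    return f"Context:\n{context}\n\nQuestion: {chunk.content.text}"

_ = (result
     | "Format prompts" >> beam.Map(to_prompt)
     | "Log" >> beam.Map(print))
```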

Get started today
With these simple code snippets, we’ve shown how you can not only ingest and prepare the source data needed to populate your vector databases, but also consume and use that information in streaming and batch applications that process large volumes of incoming data. For a full working example, please see the real-time vector embedding pipeline for AlloyDB linked above. To explore more, head to the Dataflow ML documentation for detailed examples and Colabs. Get started with vector search on AlloyDB today. You can also sign up for a 30-day AlloyDB free trial.

Special thanks to Claude van der Merwe, Dataflow ML Engineering, for his contributions to this blog post.

AI Summary and Description: Yes

Summary: The text discusses the use of embeddings in semantic search and Retrieval-Augmented Generation (RAG) applications, explaining how to create and utilize them within the Dataflow ML environment, specifically with AlloyDB. This information is critically relevant for infrastructure, AI, and cloud security professionals engaged in enhancing data retrieval processes and integrating advanced search functionalities into their applications.

Detailed Description:
The text presents a comprehensive overview of embeddings as a core component in semantic search and RAG applications. It elaborates on the importance of embeddings for understanding relationships within data and retrieving contextual information efficiently. Here are the major insights and their significance for professionals in the fields mentioned:

– **Definition and Functionality of Embeddings**:
– Embeddings are vector representations that enable applications to understand and interact with data on a conceptual level, moving beyond simple keyword searches.
– They facilitate semantic searches by mathematically identifying relationships between terms, thus enabling better retrieval of relevant content.

– **Use in Semantic Search**:
– The text explains how embeddings can find terms that are semantically close to a query, thus enhancing the search capability by providing more relevant results than keyword matching alone.
– Example usage includes searching for images of sunsets filtered by date, which combines structured and unstructured data.

– **Retrieval-Augmented Generation (RAG)**:
– Embeddings are employed in RAG systems to retrieve relevant contexts from databases, providing large language models (LLMs) with necessary information to generate accurate responses.
– This enhances the user experience by ensuring that the responses are both informative and contextually relevant.

– **Knowledge Ingestion Pipelines**:
– A process is described for taking unstructured content (such as product catalogs or legal documents) and converting them into embeddings before storing them in a vector database.
– The design of knowledge ingestion pipelines can vary, supporting both batch processing and streaming ingestion to respond to real-time business needs.

– **Streaming vs. Batch Processing**:
– Two approaches to embedding generation are highlighted:
– **Streaming**: For real-time and time-sensitive applications where up-to-date information is crucial.
– **Batch**: For less urgent tasks, suitable for historical data backfilling.

– **Chunking Technique**:
– The text emphasizes the importance of breaking down larger data into manageable chunks to optimize processing for embeddings and overall search efficiency.

– **Dataflow MLTransform and Enrichment Transform**:
– The text introduces Dataflow MLTransform capabilities for simplifying the building and managing of streaming/batch knowledge ingestion pipelines.
– The enrichment transform supports retrieving results from databases, enabling asynchronous operations that can leverage stored vectors effectively.

– **Getting Started**:
– The text provides a call to action with links to sign up for a free trial of AlloyDB, inviting readers to implement and engage with the described practices, showcasing how to enhance applications effectively.

This analysis reveals the underlying framework through which organizations can leverage embeddings and advanced search capabilities in their applications, thereby enhancing data retrieval and supporting precise information management practices.