Cloud Blog: Gemini and OSS text embeddings are now in BigQuery ML

Source URL: https://cloud.google.com/blog/products/data-analytics/use-gemini-and-open-source-text-embedding-models-in-bigquery/
Source: Cloud Blog
Title: Gemini and OSS text embeddings are now in BigQuery ML

Feedly Summary: High-quality text embeddings are the engine for modern AI applications like semantic search, classification, and retrieval-augmented generation (RAG). But when it comes to picking a model to generate these embeddings, we know one size doesn’t fit all. Some use cases demand state-of-the-art quality, while others prioritize cost, speed, or compatibility with the open-source ecosystem.
To give you the power of choice, we’re excited to announce a major expansion of text embedding capabilities in BigQuery ML. In addition to the existing text-embedding-004/005 and text-multilingual-embedding-002 models, you can now use Google’s powerful Gemini embedding model and over 13,000 open-source (OSS) models directly within BigQuery. Generate embeddings with simple SQL commands, using the model that’s right for your job, right where your data lives.

Choosing an embedding model to fit your needs
With the addition of Gemini and a wide array of OSS models, BigQuery now offers a comprehensive set of text-embedding options. You can choose the one that best fits your needs in terms of model quality, cost, and scalability. Here’s a breakdown to help you decide.

| Model Category | Model Quality | Cost | Scalability | Billing |
|---|---|---|---|---|
| text-embedding-005 & text-multilingual-embedding-002 | Very high | Moderate | Very high: O(100M) rows per 6 hours | $0.00002 per 1,000 characters |
| [New] Gemini Text Embedding | State-of-the-art | Higher | High: O(10M) rows per 6 hours | $0.00012 per 1,000 tokens |
| [New] OSS Text Embedding | From SOTA third-party models to smaller models that prioritize efficiency over quality | Tied to model size and hardware; smaller models are highly cost-effective | Highest, achieved by reserving more machines: O(B) rows per 6 hours | Per machine-hour |

Gemini embedding model in BigQuery ML
The Gemini embedding model is a state-of-the-art text embedding model that has consistently held the top position on the Massive Text Embedding Benchmark (MTEB) leaderboard, a leading industry text-embedding benchmark, demonstrating the quality of the embeddings it produces.
Here’s how to use the new Gemini embedding model in BigQuery:
1. Create a Gemini embedding model in BigQuery
Create the Gemini embedding model in BigQuery with the following SQL statement, specifying the endpoint as `gemini-embedding-001`:

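A minimal sketch of that statement, mirroring the remote-model pattern used for the OSS model later in this post and assuming the default BigQuery connection and the model name bqml_tutorial.gemini_embedding_model used in the next step:

-- Remote model backed by the Gemini embedding endpoint (connection and model
-- name are assumptions; adjust to your project).
CREATE OR REPLACE MODEL bqml_tutorial.gemini_embedding_model
REMOTE WITH CONNECTION DEFAULT
OPTIONS (ENDPOINT = 'gemini-embedding-001');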

2. Batch embedding generation
Now you can generate embeddings using the Gemini embedding model directly from BigQuery. The following example embeds the text in the bigquery-public-data.hacker_news.full dataset.

SELECT
  *
FROM
  ML.GENERATE_EMBEDDING(
    MODEL bqml_tutorial.gemini_embedding_model,
    (
      SELECT
        text AS content
      FROM
        `bigquery-public-data.hacker_news.full`
      WHERE
        text IS NOT NULL
      LIMIT 10000
    )
  );
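
In practice you will usually want to store the generated vectors and then query them. The sketch below is illustrative rather than part of the original walkthrough: the destination table bqml_tutorial.hn_embeddings is hypothetical, and it assumes ML.GENERATE_EMBEDDING's default ml_generate_embedding_result output column and BigQuery's VECTOR_SEARCH table function with cosine distance.

-- Materialize the embeddings into a (hypothetical) table.
CREATE OR REPLACE TABLE bqml_tutorial.hn_embeddings AS
SELECT
  content,
  ml_generate_embedding_result AS embedding
FROM
  ML.GENERATE_EMBEDDING(
    MODEL bqml_tutorial.gemini_embedding_model,
    (
      SELECT text AS content
      FROM `bigquery-public-data.hacker_news.full`
      WHERE text IS NOT NULL
      LIMIT 10000
    )
  );

-- Semantic search: embed an ad-hoc query with the same model and return the
-- five nearest stored vectors by cosine distance.
SELECT
  base.content,
  distance
FROM
  VECTOR_SEARCH(
    TABLE bqml_tutorial.hn_embeddings,
    'embedding',
    (
      SELECT ml_generate_embedding_result AS embedding
      FROM ML.GENERATE_EMBEDDING(
        MODEL bqml_tutorial.gemini_embedding_model,
        (SELECT 'best database performance tips' AS content)
      )
    ),
    top_k => 5,
    distance_type => 'COSINE'
  );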

The Gemini embedding model uses a new quota control method known as Tokens Per Minute (TPM). For projects with a high reputation score, the default TPM value is 5 million; the maximum a customer can set without manual approval is 20 million.
The total number of rows a single job can process depends on the number of tokens per row. For example, a job processing a table with 300 tokens per row can handle up to 12 million rows.
OSS embedding models in BigQuery ML
The OSS community is rapidly evolving the text-embedding model landscape, offering a wide spectrum of choices to fit any need. Offerings range from top-ranking models like the recent Qwen3-Embedding and EmbeddingGemma to small, efficient, and cost-effective models such as multilingual-e5-small.
You can now use any of the 13,000+ Hugging Face text embedding models that can be deployed to Vertex AI Model Garden directly in BigQuery ML. To showcase this capability, our example uses multilingual-e5-small, which delivers respectable quality while being massively scalable and cost-effective, making it a strong fit for large-scale analytical tasks.
To use an open-source text-embedding model, follow these steps.  
1. Host the model on a Vertex endpoint
First, choose a text-embedding model from Hugging Face; in this case, the above-mentioned multilingual-e5-small. Then, navigate to Vertex AI Model Garden > Deploy from Hugging Face, enter the model URL, and set the Endpoint access to “Public (Shared endpoint)”. You can also customize the endpoint name, region, and machine specs to fit your requirements. The default settings provision a single replica of the specified machine type.

Alternatively, you can do this step programmatically using BigQuery Notebook (see the tutorial notebook example) or other methods mentioned in the public documentation.
2. Create a remote model in BigQuery
Model deployment takes several minutes. Once it’s complete, create the corresponding remote model in BigQuery with the following SQL statement:

CREATE OR REPLACE MODEL bqml_tutorial.multilingual_e5_small
REMOTE WITH CONNECTION DEFAULT
OPTIONS (
  endpoint = 'https://<region>-aiplatform.googleapis.com/v1/projects/<project_name>/locations/<region>/endpoints/<endpoint_id>'
);

Replace the placeholder endpoint in the code sample above with your endpoint URL. You can find the endpoint_id in the console under Vertex AI > Online Prediction > Endpoints > Sample Request.
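For illustration only (the region, project ID, and endpoint ID below are hypothetical), a filled-in endpoint option looks like this:

endpoint = 'https://us-central1-aiplatform.googleapis.com/v1/projects/my-project/locations/us-central1/endpoints/1234567890123456789'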
3. Batch embedding generation
Now you can generate embeddings using the OSS multilingual-e5-small model directly from BigQuery. The following example embeds the text in the bigquery-public-data.hacker_news.full dataset.

SELECT
  *
FROM
  ML.GENERATE_EMBEDDING(
    MODEL bqml_tutorial.multilingual_e5_small,
    (
      SELECT
        text AS content
      FROM
        `bigquery-public-data.hacker_news.full`
      WHERE
        text IS NOT NULL
      LIMIT 10000
    )
  );

With the default resources for your model deployment, the query can process all 38 million rows in the hacker_news dataset in approximately 2 hours and 10 minutes, a baseline throughput achieved with just a single model replica.
You can scale the workload by provisioning more computational resources or enabling endpoint autoscaling. For example, with 10 replicas the query runs roughly 10 times faster and can handle billions of rows per six-hour query job.
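As a rough back-of-the-envelope check on those figures: 38 million rows in about 2 hours and 10 minutes works out to roughly 17 to 18 million rows per replica-hour, so ten replicas running for six hours land on the order of one billion rows, consistent with the O(B) scalability listed in the table above.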
4. Undeploy the model
Vertex AI endpoints accrue costs for as long as they are active, even when the machines are idle. To complete the batch job and prevent unexpected costs, undeploy the model after batch inference. You can do this through the Google Cloud console (Vertex AI > Endpoints > your endpoint > Undeploy model from endpoint), or by using our example Colab script.
For a batch workload, the most cost-effective pattern is to treat deployment, inference, and undeployment as one continuous workflow, minimizing idle expenses. One way to achieve this is to run the sample Colab sequentially. The results are remarkable: processing all 38 million rows costs only about $2 to $3.
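As an optional housekeeping step not covered above, you can also drop the remote model once you are finished. Note that this removes only the BigQuery model object; it does not undeploy the Vertex AI endpoint, so do both if you are done with the workload.

-- Remove the BigQuery-side reference to the endpoint (optional cleanup).
DROP MODEL IF EXISTS bqml_tutorial.multilingual_e5_small;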
Get started today
Ready to build your next AI application? Start generating embeddings with Gemini or your favorite open-source models directly in BigQuery today. Dive into our documentation to learn the details, or follow our step-by-step Colab tutorial to see it in action.

AI Summary and Description: Yes

Summary: The text discusses the expansion of text embedding capabilities in BigQuery ML, highlighting the addition of Google’s Gemini embedding model and over 13,000 open-source (OSS) models. It emphasizes the importance of selecting the right embedding model based on quality, cost, and scalability, providing insights into how professionals can implement these models effectively.

Detailed Description: The text provides a comprehensive overview of the new capabilities in BigQuery ML that allow users to leverage advanced text embedding models for various applications. This is particularly significant in AI, where high-quality embeddings are crucial for tasks like semantic search, classification, and generation. Here are the major points discussed:

– **Introduction of Gemini and OSS Models**:
– Gemini model: State-of-the-art text embedding that ranks first on the Massive Text Embedding Benchmark (MTEB), representing the highest quality available.
– Over 13,000 open-source models now available to cater to various use cases, from high-quality models to efficient smaller ones.

– **Choosing the Right Embedding Model**:
– Users can choose embedding models based on several criteria such as quality, cost, and scalability.
– A comparison table highlights the characteristics of different model categories:
– **text-embedding-005 & text-multilingual-embedding-002**: Very high quality, moderate cost, and scalable.
– **Gemini**: State-of-the-art quality, higher cost, high scalability.
– **OSS Models**: Ranges in quality and cost depending on the model size, allowing for flexibility and cost-effectiveness.

– **How to Use These Models in BigQuery**:
1. **Creating a Gemini Model**: Steps for creating a model using SQL commands are outlined.
2. **Batch Embedding Generation**: Instructions for generating embeddings using Gemini or any OSS model.
3. **Quotas and Scaling**: Introduction of a new quota control method (Tokens Per Minute) and how this affects processing rows in BigQuery.
4. **Cost Management**: Guidance on how to effectively manage resource provisioning and minimize costs during model deployment and inference.

– **Deploying OSS Models**:
– Detailed steps for hosting OSS models, creating remote models in BigQuery, and generating embeddings.
– Emphasizes the ease of integration and cost efficiencies when using OSS models.

– **Practical Implications**:
– The described functionalities enable organizations to effectively utilize AI technologies while managing budgets and computing resources efficiently.
– Professionals in AI, data science, and analytics can greatly benefit from the expanded capabilities and reduced costs for high-volume text embedding tasks.

This expansion in BigQuery ML provides significant advancements for practitioners in AI and cloud computing, emphasizing a move toward high-quality, scalable, and cost-effective solutions.