Cloud Blog: What’s new with BigQuery AI and ML?

Source URL: https://cloud.google.com/blog/products/data-analytics/bigquery-adds-new-ai-capabilities/
Source: Cloud Blog
Title: What’s new with BigQuery AI and ML?

Feedly Summary: At Next ’25, we introduced several new innovations within BigQuery, the autonomous data to AI platform. BigQuery ML provides a full range of AI and ML capabilities, enabling you to easily build generative AI and predictive ML applications with BigQuery. The new AI and ML capabilities from BigQuery ML include: 

a new state-of-the-art pre-trained forecasting model (TimesFM) which drastically simplifies forecasting problems

support for generating or extracting structured data with large language models (LLMs)

a set of new row-wise inference functions enabling you to mix gen AI processing with standard SQL

expanded model choice with Gemini and OSS models

the general availability of the Contribution Analysis feature, useful for explaining changes in your business metrics

Let us explore these new capabilities.

aside_block
), (‘btn_text’, ‘Start building for free’), (‘href’, ‘http://console.cloud.google.com/freetrial?redirectPath=/bigquery/’), (‘image’, None)])]>

1. TimesFM forecasting model in BigQuery
Accurate time series forecasting is essential for many business scenarios such as planning, supply chain management, and resource allocation. BigQuery now embeds TimesFM, a state-of-the-art (SOTA) pre-trained model from Google Research, enabling powerful forecasting via the simple AI.FORECAST function. Trained on over 100 billion real-world time-points, TimesFM provides impressive zero-shot forecasting accuracy across various real world domains and at different granularities without requiring you to train or tune on your data. 
Key benefits of TimesFM in BigQuery include:

Managed and scalable: A fully managed, highly scalable forecasting engine within BigQuery.

Easy forecasting: Generate forecasts for one or millions of time series in a single query – no model training required.

Here’s a basic example of creating a forecast using the new AI.FORECAST function with TimesFM: 
SQL
SELECT * FROM AI.FORECAST(
TABLE dataset.table,
data_col => “data",
timestamp_col => "timestamp",
model => "TimesFM 2.0",
horizon => 30
)
This query forecasts the "data" column for the next 30 time units, using "timestamp" as the time identifier. Please see the documentation for more details.
2. Structured data extraction and generation with LLMs
Extracting structured information consistently from unstructured data such as customer reviews, emails, logs etc. can be complex. BigQuery’s new AI.GENERATE_TABLE function simplifies structured data extraction/generation using the constrained decoding capabilities of LLMs. This function takes a model, a table of input data and an output_schema as inputs and outputs a table whose schema is determined by the output_schema parameter.
Here’s how you can use AI.GENERATE_TABLE: 
SQL
SELECT * FROM AI.GENERATE_TABLE(
MODEL project_id.dataset.model,
(SELECT medical_transcripts as prompt from table),
STRUCT("age INT64, medications ARRAY<STRING>" AS output_schema)
)
In this example, the output table has ‘age’ and ‘medications’ columns — no complex parsing required. The output is written as a BigQuery temporary table. To materialize the results to a permanent table, the above query can be used in a DDL statement:
CREATE TABLE project_id.dataset.my_structured_table
AS <AI.GENERATE_TABLE subquery>
Please see the documentation for more details.
3. Row-wise (Scalar) LLM functions
The first wave of BigQuery’s LLM functions focused on table-valued functions (TVFs) that output entire tables. We are now introducing row-wise AI functions for LLM inference for more flexible and expressive data manipulation and analysis. These scalar functions enhance the usability of LLMs within BigQuery, as they can be used anywhere a value is needed, such as in SELECT, WHERE, JOIN, and GROUP BY clauses. Let’s go though some of the capabilities we are adding: a) Basic text generation with AI.GENERATE
First, let’s see how the new AI.GENERATE() can be used for convenient row-wise LLM inference:
SELECT
city,
AI.GENERATE(
(‘Give a short, one sentence description of ‘, city),
connection_id => ‘us.test_connection’,
endpoint => ‘gemini-2.0-flash’).result
FROM mydataset.cities;
b) Structured output with AI.GENERATE
In addition, the structured output generation capabilities introduced above also extend to row-wise AI functions. In this example, the query generates state capitals for a list of states, using the output_schema argument to set two custom fields in the output struct — state and capital:
SQL
SELECT
state,
AI.GENERATE(
(‘What is the capital of ‘, state, ‘?’),
connection_id => ‘us.example_connection’,
endpoint => ‘gemini-2.0-flash’,
output_schema => ‘state STRING, capital STRING’).capital
FROM mydataset.states;
c) Type-specific functions (e.g., AI.GENERATE_BOOL)
For common tasks requiring specific data types like boolean, integer, or float, BigQuery now offers simple, type-specific functions. For instance, you can use AI.GENERATE_BOOL for classification or validation tasks:
SQL
SELECT city.name, AI.GENERATE_BOOL(
("Is", city.name, "in the state of WA?"),
connection_id => "us.example_connection",
endpoint => ‘gemini-2.0-flash’).result
FROM city
Additional type-specific functions, namely AI.GENERATE_INT and AI.GENERATE_DOUBLE, are also available for generating integer and floating-point results. Please see the documentation for more details.
4. Expanded model choice: Gemini, OSS and third-party
BigQuery ML allows you to use LLMs to perform tasks such as entity extraction, sentiment analysis, translation, text generation, and more on your data using familiar SQL syntax. In addition to first-party Gemini models, BigQuery supports inference with open-source and third-party models, which comes in two flavors:

Customer-managed endpoints for open source models (previously announced): You can host any open source model of your choice on a Vertex AI Model Garden endpoint and then use it from BigQuery.

Model as a service integrations: Access fully managed model endpoints directly through BigQuery. This already included models like Anthropic’s Claude, and we are excited to announce newly added support for Llama and Mistral models, further expanding model choice available to developers.

5. Contribution analysis now generally available
Businesses constantly need to answer questions like "Why did our sales drop last month?" or " For what user, device, demographics combination was our marketing campaign most effective?” Answering these "why" questions accurately is vital, but often involves complex manual analysis. The BigQuery contribution analysis feature automates this analysis and helps you pinpoint the key factors (or combinations of factors) responsible for the most significant changes in a metric between the control and test groups you define. 
Now generally available, the BigQuery ML contribution analysis release includes enhancements focused on improved interpretability and performance, including: 

A new summable by category metric to analyze the sum of a numerical measure of interest normalized by a categorical variable

Top-K Insights by Apriori Support option to automatically fetch k insights with the largest segment size

A redundant insight pruning option, which improves result readability by returning only unique insights 

Let’s say you want to understand what drove changes in the average sales per user across various vendors and payment types between the control and test data. To answer this with a contribution analysis model, you tell BigQuery which factors (dimensions) to investigate (dimension_id_cols), what metric you care about (contribution_metric), and which column identifies your test/control groups (is_test_col).
SQL
— Define the contribution analysis task

CREATE MODEL bqml_tutorial.contribution_analysis_model
OPTIONS (
model_type = ‘CONTRIBUTION_ANALYSIS’,
dimension_id_cols = [‘vendor’, ‘month’, ‘payment_type’],
contribution_metric = ‘sum(sales)/count(distinct user_id)’,
is_test_col = ‘is_test_col’,
top_k_insights_by_apriori_support = 25,
pruning_method = ‘PRUNE_REDUNDANT_INSIGHTS’
) AS
SELECT * FROM dataset.input_data;
Once the model is created, you can use a SQL query like the following to generate insights:
SELECT * FROM ML.GET_INSIGHTS (MODEL bqml_tutorial.contribution_analysis_model);
BigQuery returns a prioritized list showing which combinations of factors (e.g., "Users paying via Amex Credit Card from Vendor") had the most significant impact on the average sales per user between your control and test groups.
Bring AI into your data
The latest BigQuery ML updates bring powerful AI/ML capabilities directly into your data workflows. Between forecasting with TimesFM, automated root-cause analysis with contribution analysis, flexible row-wise LLM functions, streamlined structured data generation, and expanded model choice, you can move faster from data to insights and impactful outcomes. 
You can also view the Next ‘25 breakout session, including a demo showcasing these capabilities.

AI Summary and Description: Yes

Summary: The text introduces significant updates to Google Cloud’s BigQuery, focusing on enhanced AI and ML capabilities, particularly for generative AI, predictive analytics, and structured data handling. These innovations streamline processes for professionals in data analysis and machine learning by integrating advanced models within familiar SQL environments.

Detailed Description:
The provided text outlines a number of key advancements in Google Cloud’s BigQuery platform aimed at leveraging AI and ML to improve data analytics workflows. Here are the major points of interest, particularly for professionals in AI, cloud computing, and infrastructure security:

– **BigQuery ML Enhancements**:
– New functionalities that allow users to create generative AI and predictive ML applications directly within BigQuery.

– **TimesFM Forecasting Model**:
– A state-of-the-art pre-trained model that simplifies time-series forecasting.
– Features:
– Requires no training or tuning on user data.
– Fully managed and highly scalable, enabling forecasts for numerous time series in one query.

– **Structured Data Extraction with LLMs**:
– Introduction of the AI.GENERATE_TABLE function which streamlines structured data extraction from unstructured sources.
– This function utilizes large language models (LLMs) to create structured output without complex parsing.

– **Row-wise (Scalar) LLM Functions**:
– New row-wise AI functions improve the flexibility of data manipulation in SQL.
– Type-specific functions such as AI.GENERATE_BOOL allow for more tailored data outputs in classification tasks.

– **Expanded Model Choice**:
– Support for first-party (Gemini) and third-party models (e.g., Claude, Llama, Mistral).
– Enhanced accessibility to open-source models through managed endpoints.

– **Contribution Analysis Functionality**:
– A new feature that automates root-cause analysis for business metrics, providing insights into performance changes.
– Includes methods for enhanced interpretability and performance, helping businesses understand key factors affecting their metrics.

Overall, the updates highlight the integration of sophisticated AI/ML capabilities directly into the BigQuery platform, enabling security and compliance professionals to leverage AI-driven insights more effectively. This evolution not only enhances data analysis but also solidifies the role of AI in cloud computing security, as insights gained can directly impact compliance and governance strategies. These capabilities allow organizations to transition smoothly from raw data to actionable insights, thus improving their operational intelligence.