Cloud Blog: Accelerate analytics with AI-assisted data preparation in BigQuery, now GA

Source URL: https://cloud.google.com/blog/products/data-analytics/ai-assisted-bigquery-data-preparation-now-ga/
Source: Cloud Blog
Title: Accelerate analytics with AI-assisted data preparation in BigQuery, now GA

Feedly Summary: According to Gartner®, “Gartner clients now report that 90% or more of their time is spent preparing data (as high as 94% in complex industries) for advanced analytics, data science and data engineering.”1. Last year, we introduced BigQuery data preparation, which helps data analyst teams wrangle data with help from Gemini in BigQuery. With it, the tedious task of data preparation becomes a breeze as Gemini analyzes your data and schema, and offers context-aware suggestions for cleaning, transforming, and enriching your data.
BigQuery’s approach to data preparation can also help you automate building data pipelines, allowing users with varying technical backgrounds to efficiently prepare data for analysis, regardless of their proficiency with SQL. Once data has been prepared, you can then run your data integration workloads on BigQuery’s serverless, cloud-native, AI-ready data analytics platform.
Today, we’re taking things one step further and announcing that BigQuery data preparation is generally available. It now also integrates with BigQuery pipelines, letting you connect data ingestion and transformation tasks so you can create end-to-end data pipelines with incremental processing, all in a unified environment. You can view all the transformations that BigQuery data preparation generates as SQL code and use BigQuery repositories and Git to collaborate on and manage your code.

A BigQuery data preparation refresher 
BigQuery data preparation leverages Gemini to provide you with intelligent guidance throughout the data preparation process. This includes:

Comprehensive transformation capabilities: Because data preparation runs on BigQuery, it supports a wide array of data transformation functions, including typecasting, string manipulation, datetime math, and JSON extraction (see the SQL sketch after this list).

Data standardization: Gemini in BigQuery analyzes your data and schema to provide intelligent suggestions for cleaning and transforming data. For example, it can identify valid date formats and standardize your data accordingly.

Automated schema mapping: Built-in schema handling helps you manage schema drift and helps prevent production pipelines from failing.

AI-suggested join keys for data enrichment: BigQuery data preparation analyzes your data and suggests relevant join keys for data enrichment.
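To make these concrete, here is a minimal GoogleSQL sketch of the transformation categories above. The dataset, table, and column names are hypothetical, and this is an illustration rather than SQL generated by data preparation itself:

```sql
-- Hypothetical names; illustrates the transformation categories listed
-- above, not code emitted by BigQuery data preparation.
SELECT
  SAFE_CAST(order_id AS INT64) AS order_id,               -- typecasting
  TRIM(UPPER(customer_name)) AS customer_name,            -- string manipulation
  PARSE_DATE('%d/%m/%Y', order_date_text) AS order_date,  -- date standardization
  DATE_DIFF(CURRENT_DATE(),
            PARSE_DATE('%d/%m/%Y', order_date_text),
            DAY) AS order_age_days,                       -- datetime math
  JSON_VALUE(payload, '$.shipping.country') AS ship_country  -- JSON extraction
FROM mydataset.raw_orders;
```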

In addition, users benefit from visual, low-code data pipeline features:

Visual data pipelines: Design, execute, and monitor complex data pipelines with a user-friendly, low-code visual interface. Cost-efficient processing on BigQuery’s fully managed and completely serverless platform scales to any use case. For more efficient changed data propagation, you can also configure your preparations to process data incrementally.

Data quality enforcement with error tables: Define validation rules and automatically route invalid rows to a designated error table, helping to ensure data quality and integrity (see the sketch after this list).

Streamlined deployment with GitHub integration: You can view data preparations in pipe query syntax and export them to a Git repository for version control.
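As an illustration of the error-table pattern, invalid rows can be split off from valid ones as shown below. The names are hypothetical and this is not the exact SQL that data preparation generates; in the product you define the validation rules visually and designate the error table.

```sql
-- Hypothetical names; a generic validate-and-route pattern analogous to
-- data preparation's error tables.
INSERT INTO mydataset.orders_errors
SELECT *
FROM mydataset.raw_orders
WHERE order_id IS NULL
   OR SAFE_CAST(amount AS NUMERIC) IS NULL;

INSERT INTO mydataset.orders_clean
SELECT
  order_id,
  TRIM(customer_name) AS customer_name,
  SAFE_CAST(amount AS NUMERIC) AS amount
FROM mydataset.raw_orders
WHERE order_id IS NOT NULL
  AND SAFE_CAST(amount AS NUMERIC) IS NOT NULL;
```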

Tasks, assemble! with BigQuery pipelines
You can now visually connect a series of data processing tasks, including data preparation tasks, in a defined sequence with BigQuery pipelines. This integration makes it easy to add data preparation to an automation and orchestration flow, enabling end-to-end data pipelines that encompass data ingestion, preparation, transformation, and loading.

Wrangle your CLs with BigQuery repositories
Data preparation now generates SQL code in pipe query syntax, which simplifies complex queries and improves readability. This lets data engineers review data preparation code, include it in larger pipelines, and integrate data preparations into CI/CD processes for better collaboration, version control, and automated deployment. This transparency bridges the gap between visual transformations and code, and between teams with different tooling preferences.
BigQuery data preparation integrates with BigQuery repositories and Git, providing robust version control and collaboration features for your data preparation assets. You can treat your data preparations as code artifacts and check them into repositories, enabling you to track changes, collaborate with team members, and revert to previous versions if needed. This integration streamlines the development process, promotes code reusability, and ensures that your data preparation logic is well-managed and auditable.
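For reference, here is what a short preparation could look like when expressed in pipe query syntax; the dataset, table, and column names are hypothetical.

```sql
-- Hypothetical names; each |> step applies one transformation, which is
-- what keeps exported data preparation code easy to review.
FROM mydataset.raw_orders
|> WHERE order_id IS NOT NULL
|> EXTEND SAFE_CAST(amount AS NUMERIC) AS amount_numeric
|> EXTEND PARSE_DATE('%d/%m/%Y', order_date_text) AS order_date
|> DROP amount, order_date_text
|> SELECT order_id, customer_name, amount_numeric, order_date;
```

Because each step reads top to bottom, a reviewer can follow the transformation chain in a pull request much as they would in the visual editor.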

What customers are saying
GAF, a major manufacturer of roofing materials in North America, is adopting data preparation to create data transformation pipelines on BigQuery.
“GAF is looking to modernize the ETL infrastructure and adopt a BigQuery native, low-code solution. BigQuery data preparation will help our skilled business users and the analytics team in the data preparation processes for the enablement of self-service analytics.” – Puja Panchagnula, Management Director – Enterprise Data Management & Analytics, GAF
mCloud Technologies helps businesses in sectors like energy, buildings, and manufacturing optimize the performance, reliability, and sustainability of their assets.
“We receive file data feeds from our partners. BigQuery data preparation allows our product managers to prepare and operate the data with little to no help from our data engineering team.” – Jim Christian, Chief Product and Technology Officer, mCloud Technologies
Public Value Technologies is a joint venture between two German public broadcasting organizations that are members of ARD.
“Public Value Technologies receives data feeds from our media partners for our data mesh solution and AI applications. BigQuery data preparation allows our data analysts and scientists to rapidly integrate the data feeds that standardize and preprocess the data in a low code way.” – Korbinian Schwinger, Team Lead Data Engineer, Public Value Technologies
Get started
With its powerful AI capabilities, intuitive interface, and tight integration with BigQuery data pipelines, BigQuery data preparation is set to revolutionize the way organizations manage and prepare their data. By automating tedious tasks, improving data quality, and empowering users, this innovative solution reduces the time you spend preparing data and improves your productivity. 
Explore the following resources to get started with BigQuery data preparation:

See the public documentation

Watch the 5-minute demo video

Follow a tutorial

Try it out

1. Gartner, State of Metadata Management: Aggressively Pursue Metadata to Enable AI and Generative AI, by Mark Beyer and Guido De Simoni, September 4, 2024. GARTNER is a registered trademark and service mark of Gartner, Inc. and/or its affiliates in the U.S. and internationally and is used herein with permission. All rights reserved.

AI Summary and Description: Yes

Summary: The text discusses the capabilities and benefits of BigQuery data preparation, which utilizes AI via Gemini to streamline data preparation processes. This new functionality enhances data analysts’ productivity by automating tedious tasks involved in data integration and transformation without requiring advanced SQL skills.

Detailed Description:

The content centers on the launch and functionalities of BigQuery data preparation, which integrates advanced AI technologies to assist in data handling for analytics and data engineering. Key aspects of the announcement include the automation of data preparation tasks, a user-friendly interface, and the support for version control through integration with Git.

Major points include:

– **Efficiency in Data Preparation**:
– Users can now significantly reduce the time spent on data preparation, as Gemini in BigQuery suggests and applies cleaning, transformation, and enrichment steps.
– It assists users with varying technical expertise, providing context-aware suggestions and enabling the creation of end-to-end data pipelines.

– **Integration with BigQuery Pipelines**:
– The data preparation functionalities integrate seamlessly with BigQuery pipelines, allowing for a unified environment for data ingestion, transformation, and loading.
– Incremental processing capabilities allow for efficient management of changing datasets.

– **Comprehensive Transformation Functions**:
– The tool offers a wide range of transformation capabilities, including typecasting, JSON extraction, and automated schema mapping.
– AI-generated suggestions for join keys enhance the efficiency of data enrichment tasks.

– **User-Friendly Interface**:
– Visual, low-code features allow users to design, execute, and monitor data pipelines with ease, lowering the barrier for less technically proficient users.
– Error handling mechanisms ensure data quality by defining validation rules and segregating invalid data.

– **Collaboration and Version Control**:
– Integration with Git and BigQuery repositories facilitates robust collaboration among teams, treating data preparations as code artifacts.
– This enables version control, tracking changes, and managing data preparation logic comprehensively.

– **Customer Testimonials**:
– Companies like GAF and mCloud Technologies have begun adopting the solution to modernize their data processing workflows, indicating its practical application in real-world scenarios.

Ultimately, BigQuery data preparation aims to reshape organizational data management practices by reducing manual workloads, improving data quality, and empowering end users with AI-assisted suggestions, while its Git integration keeps preparation logic versioned and auditable, which matters for teams with data governance and compliance needs.

This outcome is a significant leap toward enabling self-service analytics capabilities in various industries while encouraging innovation and efficiency in data analytics workflows.