Cloud Blog: BigQuery’s AI-assisted data preparation is now in preview

Source URL: https://cloud.google.com/blog/products/data-analytics/introducing-ai-driven-bigquery-data-preparation/
Source: Cloud Blog
Title: BigQuery’s AI-assisted data preparation is now in preview

Feedly Summary: In today’s data-driven world, the ability to efficiently transform raw data into actionable insights is paramount. However, data preparation and cleaning are often a significant, time-consuming challenge.
Reducing the time spent on preparation and efficiently transforming raw data into insights is crucial for staying competitive. Earlier this month, we introduced BigQuery data preparation, an AI-first solution that streamlines and simplifies the data preparation process as part of Gemini in BigQuery.
Now in preview, BigQuery data preparation provides a number of capabilities:

AI-powered suggestions: BigQuery data preparation uses Gemini in BigQuery to analyze your data and schema and provide intelligent suggestions for cleaning, transforming, and enriching the data. This significantly reduces the time and effort required for manual data preparation tasks.
Data cleansing and standardization: Easily identify and rectify inconsistencies, missing values, and formatting errors in your data.
Visual data pipelines: The intuitive, low-code visual interface helps both technical and non-technical users easily design complex data pipelines, and leverage BigQuery’s rich and extensible SQL capabilities.
Data pipeline orchestration: Automate the execution and monitoring of your data pipelines. The SQL generated by BigQuery data preparation can become part of a Dataform data engineering pipeline that you can deploy and orchestrate with CI/CD, for a shared development experience.

BigQuery data preparation helps you ensure the accuracy and reliability of your data, leading to more informed business decisions. BigQuery data preparation automates data quality checks and integrates with other Google Cloud services such as Dataform and Cloud Storage, providing a unified and scalable environment for your data needs.


How does it work?
Getting started is easy. When you sample a BigQuery table in BigQuery data preparation, Gemini in BigQuery uses state-of-the-art foundation models to evaluate the data and schema and generate data preparation recommendations, such as filter and transformation suggestions. For example, it knows how to identify valid date formats by country and which columns can act as join keys, accelerating the data engineering process.
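The post does not say how join-key candidates are detected. As a rough illustration only, one common heuristic checks that a column is unique and non-null in one table and that most values of a column in another table appear in it. The function names and the sample columns below are hypothetical, and this is not a description of what Gemini in BigQuery does internally.

```python
from typing import Hashable, Optional, Sequence


def is_unique_non_null(values: Sequence[Optional[Hashable]]) -> bool:
    """True when every value is present and no value repeats."""
    return all(v is not None for v in values) and len(set(values)) == len(values)


def overlap_ratio(
    child: Sequence[Optional[Hashable]], parent: Sequence[Optional[Hashable]]
) -> float:
    """Fraction of distinct non-null child values that also appear in parent."""
    child_values = {v for v in child if v is not None}
    if not child_values:
        return 0.0
    return len(child_values & set(parent)) / len(child_values)


# Synthetic columns, invented for this illustration.
customers_id = [101, 102, 103, 104]
orders_customer_id = [101, 101, 103, 104, None]
orders_amount = [101, 250, 75, 40, 12]

key_like = is_unique_non_null(customers_id)
ratio_good = overlap_ratio(orders_customer_id, customers_id)
ratio_poor = overlap_ratio(orders_amount, customers_id)
```

In this sketch, a unique parent column with a high overlap ratio is a plausible join key, while a low ratio suggests the values match only by coincidence.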

In the example shown in the original post (using synthetic data), the Birthdate column is of type STRING and contains two different date formats. BigQuery data preparation suggests: “Convert column Birthdate from type string to date with the following format(s): ‘%Y-%m-%d’, ‘%m/%d/%Y’”. After you apply the suggestion card, you can confirm in the preview that the transformed column is of type DATE.
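To make the suggested conversion concrete, here is a minimal Python sketch of trying the two formats from the suggestion card in order. It only illustrates the logic and is not the SQL that BigQuery data preparation generates; the helper name `parse_birthdate` and the sample values are assumptions made for this example.

```python
from datetime import date, datetime
from typing import Optional, Sequence

# The two formats named in the suggestion card.
FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def parse_birthdate(value: str, formats: Sequence[str] = FORMATS) -> Optional[date]:
    """Return the first successful parse, or None if no format matches."""
    for fmt in formats:
        try:
            return datetime.strptime(value.strip(), fmt).date()
        except ValueError:
            continue  # Try the next candidate format.
    return None


# Synthetic sample values, invented for this illustration.
samples = ["1990-04-23", "07/15/1985", "not a date"]
parsed = [parse_birthdate(s) for s in samples]
```

Values that match neither format come back as `None` rather than raising, so unparseable rows can be reviewed separately instead of failing the whole conversion.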

With BigQuery’s AI-assisted data preparation, you can:

Significantly reduce time spent discovering data quality issues and cleaning data by leveraging Gemini-assisted suggestion cards
Customize your own suggestion cards by providing an example in the data grid
Increase operational efficiency by deploying data preparation with incremental data processing
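The post does not describe how incremental processing is implemented. As a general sketch of the idea, each run can prepare only the rows that changed since the previous run by tracking a high-water mark. The `updated_at` field, the `run_incremental` function, and the sample rows are hypothetical and exist only for this illustration.

```python
from datetime import datetime
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]


def run_incremental(
    rows: List[Row],
    last_watermark: datetime,
    transform: Callable[[Row], Row],
) -> Tuple[List[Row], datetime]:
    """Transform only rows newer than the watermark and return the new watermark."""
    new_rows = [r for r in rows if r["updated_at"] > last_watermark]
    prepared = [transform(r) for r in new_rows]
    # Keep the old watermark when nothing new arrived.
    next_watermark = max((r["updated_at"] for r in new_rows), default=last_watermark)
    return prepared, next_watermark


def clean_name(row: Row) -> Row:
    """Example preparation step: trim whitespace and normalize capitalization."""
    return {**row, "name": row["name"].strip().title()}


# Synthetic rows, invented for this illustration.
rows = [
    {"name": " alice ", "updated_at": datetime(2024, 1, 1)},
    {"name": "BOB", "updated_at": datetime(2024, 1, 2)},
    {"name": "carol  ", "updated_at": datetime(2024, 1, 3)},
]

first_batch, watermark = run_incremental(rows, datetime(2024, 1, 1), clean_name)
second_batch, watermark_after = run_incremental(rows, watermark, clean_name)
```

The second run finds no rows newer than the stored watermark, so it does no work. That skipped work is where the efficiency gain of incremental processing comes from.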

What BigQuery customers are saying
Customers are already solving numerous challenges with BigQuery data preparation.

GAF is a major manufacturer of roofing materials in North America, and is adopting data preparation for creating data transformation pipelines on BigQuery.
“GAF is looking to modernize the ETL infrastructure and adopt a BigQuery native, low-code solution. BigQuery data preparation will help our skilled business users and the analytics team in the data preparation processes for the enablement of self-service analytics.” – Puja Panchagnula, Management Director – Enterprise Data Management & Analytics, GAF

mCloud Technologies helps businesses in sectors like energy, buildings, and manufacturing to optimize the performance, reliability, and sustainability of their assets.
“We receive data feeds from our partners. BigQuery data preparation allows our product managers to prepare and operate the file data feeds with little to no help from our data engineering team.” – Jim Christian, Chief Product and Technology Officer, mCloud Technologies

Public Value Technologies is a joint venture between two German public broadcasting organizations (ARD).
“Public Value Technologies receives data feeds from our media partners for our data mesh solution and AI applications. BigQuery data preparation allows our data analysts and scientists to rapidly integrate the data feeds that standardize and preprocess the data in a low code way.” – Korbinian Schwinger, Team Lead Data Engineer, Public Value Technologies

Getting started
With its powerful AI capabilities, intuitive interface, and tight integration with the Google Cloud ecosystem, BigQuery data preparation is set to revolutionize the way organizations manage and prepare their data. By automating tedious tasks, improving data quality, and empowering users, this innovative solution reduces the time you spend preparing data and improves your productivity. 
To get started with BigQuery data preparation, explore the following resources:

See the user guides
Watch the 2-minute demo video
Learn about Gemini in BigQuery

AI Summary and Description: Yes

Summary: The text discusses the introduction of BigQuery data preparation, an AI-driven tool that simplifies data preparation processes. It highlights the efficiencies gained through AI-powered suggestions, automated data cleansing, and orchestration of data pipelines, all of which can greatly enhance productivity for data professionals, particularly in a cloud environment.

Detailed Description:
The introduction of BigQuery data preparation offers significant advancements for data professionals working with cloud computing and data analytics. The key points include:

– **AI-Powered Suggestions**: The tool utilizes Gemini in BigQuery to provide intelligent recommendations for data cleaning, transformation, and enrichment, drastically reducing the manual efforts involved.

– **Data Cleansing and Standardization**: Automated features help identify and correct inconsistencies within datasets, such as missing values and formatting errors, ensuring data quality.

– **Visual Data Pipelines**: A low-code interface enables both technical and non-technical users to design complex data pipelines easily, leveraging BigQuery’s extensive SQL capabilities.

– **Data Pipeline Orchestration**: It automates the execution and monitoring of data pipelines. The SQL generated can be integrated into a Dataform engineering pipeline, allowing for a modern CI/CD development experience.

– **Integration with Google Cloud**: BigQuery data preparation integrates with other Google Cloud services such as Dataform and Cloud Storage, offering a unified and scalable environment for data management.

– **Customer Feedback**: Organizations such as GAF, mCloud Technologies, and Public Value Technologies describe adopting the tool, highlighting its low-code approach, reduced reliance on data engineering teams, and support for self-service analytics.

– **Efficiency Gains**: Through automation and AI assistance, the solution is designed to improve operational efficiency, reduce time spent on data quality checks, and enable users to handle data with more autonomy.

Overall, BigQuery data preparation is positioned to transform data management and analytics in cloud environments, addressing complex challenges while increasing productivity and user empowerment.