Cloud Blog: Power up your data analysis: The Data Science Agent now supports BigQuery ML, DataFrames, and Spark

Source URL: https://cloud.google.com/blog/products/data-analytics/data-science-agent-now-supports-bigquery-ml-dataframes-and-spark/
Source: Cloud Blog
Title: Power up your data analysis: The Data Science Agent now supports BigQuery ML, DataFrames, and Spark

Feedly Summary: We recently announced an AI-first Colab Enterprise notebook experience in BigQuery and Vertex AI to help you simplify and transform your data science and analytics workflows. Colab Enterprise notebooks come with a built-in Data Science Agent to accelerate your data science development with agentic capabilities that facilitate data exploration, transformation, and machine learning modeling. With nothing but a simple prompt, the agent generates a detailed plan for your workflows – from data loading and cleaning to model training and evaluation.
Today, we’re introducing powerful new features in the Data Science Agent to further simplify and scale your analytical journeys, especially with large and open-format datasets.
Generate BigQuery ML, BigQuery DataFrames, & Spark
You can now harness the power of BigQuery Machine Learning (ML), BigQuery DataFrames (BigFrames), and Spark for large-scale data processing directly within the Data Science Agent. BigQuery ML and BigQuery DataFrames allow you to scale up data transformation, model training, and inference by running them directly on BigQuery. And with Serverless for Apache Spark, you can perform distributed data processing on large datasets, allowing you to work with data that is too large to fit into memory on a single machine.
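
To make "running them directly on BigQuery" more concrete, here is a minimal sketch of the kind of BigQuery DataFrames code that could be involved. It is not output from the agent: the function name, the column arguments, and the choice of an XGBoost regressor are assumptions for illustration. The bigframes calls describe a preprocessing-plus-model pipeline that BigQuery executes, so the table is not loaded into notebook memory.

```python
def train_price_model(table: str, label_col: str,
                      categorical_cols: list[str], numeric_cols: list[str]):
    """Train and score a gradient boosted tree regressor inside BigQuery.

    All column names are supplied by the caller; nothing here is specific
    to a real dataset.
    """
    features = categorical_cols + numeric_cols
    if label_col in features:
        raise ValueError("label_col must not also be used as a feature")

    # Imported here so the sketch can be loaded without bigframes installed.
    import bigframes.pandas as bpd
    from bigframes.ml.compose import ColumnTransformer
    from bigframes.ml.ensemble import XGBRegressor
    from bigframes.ml.model_selection import train_test_split
    from bigframes.ml.pipeline import Pipeline
    from bigframes.ml.preprocessing import OneHotEncoder, StandardScaler

    # read_gbq returns a lazy DataFrame backed by the BigQuery table.
    df = bpd.read_gbq(table)[features + [label_col]].dropna()
    X_train, X_test, y_train, y_test = train_test_split(
        df[features], df[[label_col]], test_size=0.2, random_state=42
    )

    # One-hot encode categorical columns and scale numeric ones.
    preprocess = ColumnTransformer([
        ("onehot", OneHotEncoder(), categorical_cols),
        ("scale", StandardScaler(), numeric_cols),
    ])
    pipeline = Pipeline([("preprocess", preprocess), ("model", XGBRegressor())])
    pipeline.fit(X_train, y_train)

    # Evaluation metrics come back as a DataFrame.
    return pipeline.score(X_test, y_test)
```

A call such as `train_price_model("project_id.dataset_id.table_id", "price", ["neighborhood"], ["sqft"])` would need a configured Google Cloud project; the column names in that call are hypothetical.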
To invoke these tools, simply include the following keywords in your prompt:

For BigQuery ML: use "BigQuery ML", "BQML", or "SQL"
For BigQuery DataFrames: specify "BigQuery DataFrames" or "BigFrames"
For PySpark: include "Spark" or "PySpark"
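
The keyword list above can be pictured as a simple lookup. The sketch below is only an illustration of that mapping, assuming case-insensitive whole-word matching; how the agent actually interprets prompts is not described in the post, and the function and dictionary names are hypothetical.

```python
import re

# Keywords copied from the list above. The matching rules are an assumption.
FRAMEWORK_KEYWORDS = {
    "BigQuery ML": ("bigquery ml", "bqml", "sql"),
    "BigQuery DataFrames": ("bigquery dataframes", "bigframes"),
    "PySpark": ("pyspark", "spark"),
}


def detect_frameworks(prompt: str) -> list[str]:
    """Return the frameworks whose keywords appear in the prompt as whole words."""
    text = prompt.lower()
    matches = []
    for framework, keywords in FRAMEWORK_KEYWORDS.items():
        for keyword in keywords:
            # \b keeps "sql" from matching inside a longer word.
            if re.search(rf"\b{re.escape(keyword)}\b", text):
                matches.append(framework)
                break
    return matches
```

A prompt without any of the keywords yields an empty list, which corresponds to the agent not being steered toward one of these three frameworks.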

In the future, the Data Science Agent will be able to pick the relevant framework for your use case — e.g., based on the size of your selected datasets or the contents of your Notebook.
In the meantime, here are some sample prompts to get you started:

“Build a high-quality forecasting model using BigQuery SQL on project_id.dataset_id.table_id to predict stock needs. Present the model’s evaluation metrics and visualize the forecast with a 95% confidence interval.”

“Using BigQuery DataFrames, train and evaluate a gradient boosted tree model to predict housing prices from the table project_id.dataset_id.table_id. Before training, one-hot encode the neighborhood column.”

“I want to group similar customers together for targeted marketing campaigns, but first I need to do dimensionality reduction using a PCA model. Use Spark to do this on table project_id.dataset_id.table_id.”
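
For the first prompt, the generated BigQuery ML SQL might resemble the statements built below. This is a sketch, not the agent's actual output: the ARIMA_PLUS model type, the model name, the column names, and the 30-step horizon are assumptions. Only the table placeholder and the 95% confidence level come from the prompt. The helper simply builds SQL strings that could be run in a BigQuery SQL cell.

```python
def forecasting_sql(table: str, model: str, timestamp_col: str, value_col: str,
                    horizon: int = 30, confidence_level: float = 0.95) -> dict[str, str]:
    """Build BigQuery ML statements to train, evaluate, and forecast a time series."""
    create = f"""
CREATE OR REPLACE MODEL `{model}`
OPTIONS (
  model_type = 'ARIMA_PLUS',
  time_series_timestamp_col = '{timestamp_col}',
  time_series_data_col = '{value_col}'
) AS
SELECT {timestamp_col}, {value_col}
FROM `{table}`
""".strip()

    # ML.ARIMA_EVALUATE reports fit statistics for the trained model.
    evaluate = f"SELECT * FROM ML.ARIMA_EVALUATE(MODEL `{model}`)"

    # The prediction interval bounds correspond to the requested confidence level.
    forecast = f"""
SELECT forecast_timestamp, forecast_value,
       prediction_interval_lower_bound, prediction_interval_upper_bound
FROM ML.FORECAST(MODEL `{model}`,
                 STRUCT({horizon} AS horizon, {confidence_level} AS confidence_level))
""".strip()

    return {"create": create, "evaluate": evaluate, "forecast": forecast}


statements = forecasting_sql(
    table="project_id.dataset_id.table_id",
    model="project_id.dataset_id.stock_forecast",  # hypothetical model name
    timestamp_col="order_date",                    # hypothetical column
    value_col="units_sold",                        # hypothetical column
)
```

The lower and upper bounds returned by the forecast query are what a chart would shade as the 95% interval around `forecast_value`.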

Limitation: the Data Science Agent currently generates Spark 4.0 code. The agent can help you upgrade your code to Spark 4.0. However, if you need to use an earlier version of Spark, we recommend not using the Data Science Agent for PySpark for now.
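
As an illustration of the PCA prompt above, a PySpark version might look like the following sketch. It is not the agent's output. It assumes the spark-bigquery connector is available in the session, that a plain `SparkSession.builder.getOrCreate()` is enough to obtain a session (a Serverless for Apache Spark environment may need different setup), and that the feature columns are numeric; the function name is hypothetical. It uses only long-standing `pyspark.ml` classes, which are also part of Spark 4.0.

```python
def reduce_customers_with_pca(table: str, feature_cols: list[str], k: int = 2):
    """Standardize numeric features and project them onto k principal components."""
    if not 1 <= k <= len(feature_cols):
        raise ValueError("k must be between 1 and the number of feature columns")

    # Imported here so the sketch can be loaded without PySpark installed.
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import PCA, StandardScaler, VectorAssembler
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Assumes the spark-bigquery connector is on the classpath.
    df = spark.read.format("bigquery").load(table).dropna(subset=feature_cols)

    pipeline = Pipeline(stages=[
        # Pack the feature columns into a single vector column.
        VectorAssembler(inputCols=feature_cols, outputCol="features"),
        # PCA is scale-sensitive, so center and scale first.
        StandardScaler(inputCol="features", outputCol="scaled",
                       withMean=True, withStd=True),
        PCA(k=k, inputCol="scaled", outputCol="pca_features"),
    ])
    model = pipeline.fit(df)

    # Share of variance captured by each retained component.
    explained_variance = model.stages[-1].explainedVariance
    return model.transform(df), explained_variance
```

The `pca_features` column of the returned DataFrame could then feed a clustering step such as k-means to group similar customers.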
Add data using context and @ mentions
We are also making it easier to bring your data into the conversation. The Data Science Agent can now automatically retrieve your BigQuery tables and their metadata. This means you can describe a table directly in your prompt and let the Data Science Agent search for the most relevant table on your behalf.

Further, you can now search for BigQuery tables within your current project using an @ mention. This familiar, industry-standard mechanism allows you to build your prompt with the relevant context — without your hands ever leaving the keyboard.

Limitation: The @ mention currently only searches for BigQuery tables in your current project. For broader searches across projects or to add files from session storage and local uploads, please continue to use the "+" button.
Try the Data Science Agent today
Under the hood, we’ve also optimized the Data Science Agent so it will start up faster after your first message. Less waiting, faster insights. Similar improvements for Colab Enterprise in Vertex AI are coming soon.
We’re committed to evolving the AI-powered data science experience and can’t wait to show you what we’re building next. To get started, check out the resources below:

Access:

BigQuery: Navigate to Google Cloud Console > BigQuery > Notebook 

Vertex AI: Navigate to Google Cloud Console > Vertex AI > Colab Enterprise (Note: BigQuery ML, BigQuery DataFrames, and Spark improvements mentioned here are not yet available in Vertex AI – but are coming soon.)

Documentation:

Use the Data Science Agent with BigQuery

Use the Data Science Agent with Vertex AI

Feedback and Support: We’d love to know what you think! Drop us a line if you have any questions or run into any issues.

AI Summary and Description: Yes

Summary: The text covers the recently announced AI-first Colab Enterprise notebook experience in BigQuery and Vertex AI and its built-in Data Science Agent, which turns a simple prompt into a plan for data loading, cleaning, model training, and evaluation. The new release lets the agent generate BigQuery ML, BigQuery DataFrames, and Apache Spark code for large datasets, and adds table lookup through context and @ mentions.

Detailed Description:
The text outlines the launch of an AI-first notebook experience within BigQuery and Vertex AI, emphasizing its potential to enhance productivity in data science and analytics workflows. Here are the major points covered:

– **AI-First Experience**: The Colab Enterprise notebooks are tailored to streamline the data science workflow, providing tools that enhance the user’s ability to perform data exploration and modeling with minimal manual intervention.

– **Data Science Agent**: A built-in feature that assists users in generating detailed plans for their analytics workflows. This agent can:
  – Plan the steps of a workflow, from data loading and cleaning to model training and evaluation.
  – Work from nothing more than a simple natural-language prompt.

– **Integration with BigQuery Tools**: Notable enhancements include:
  – **BigQuery ML and BigQuery DataFrames**: The agent can generate code for both, so data transformation, model training, and inference run directly on BigQuery.
  – **Apache Spark**: With Serverless for Apache Spark, the agent can generate PySpark code for distributed processing of datasets too large to fit in memory on a single machine.

– **User-Friendly Prompts**: The text provides examples of how users can interact with the Data Science Agent, showcasing how to build specific models or perform analysis using natural language prompts.

– **Limitations**:
  – The agent currently generates Spark 4.0 code only; for earlier Spark versions, the post recommends not using the agent for PySpark for now.
  – The @ mention searches only BigQuery tables in the current project; cross-project searches and files from session storage or local uploads still require the "+" button.

– **Performance Optimizations**: The Data Science Agent now starts up faster after the first message; similar improvements for Colab Enterprise in Vertex AI are planned.

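– **Future Developments**: The post commits to continuing to evolve the AI-powered data science experience. The BigQuery ML, BigQuery DataFrames, and Spark features are not yet available in Vertex AI but are described as coming soon, and the agent is expected to eventually pick the relevant framework automatically.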

This text is particularly relevant for professionals in AI and cloud computing, as it emphasizes innovative tools and methodologies that can enhance efficiency and simplify workflows in data science projects, addressing both immediate and future needs in analytics.