Source URL: https://cloud.google.com/blog/products/data-analytics/whats-new-with-google-data-cloud/
Source: Cloud Blog
Title: What’s new with Google Data Cloud
Feedly Summary: June 9 – June 13
Introducing Pub/Sub Single Message Transforms (SMTs), which make it easy to perform simple data transformations, such as validating, filtering, enriching, and altering individual messages, as they move in real time within Pub/Sub. The first SMT is available now: JavaScript User-Defined Functions (UDFs), which let you make simple, lightweight modifications to message attributes and/or data directly within Pub/Sub via snippets of JavaScript code. Learn more in the launch blog.
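To make the idea concrete, here is a minimal Python sketch of what a single message transform does conceptually. It is not Pub/Sub code: the SMTs announced here are written as JavaScript UDFs, and the `Message` type, the `transform` function, the `region` and `ssn` fields, and the convention of returning `None` to drop a message are all assumptions chosen for illustration.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Message:
    """Hypothetical stand-in for a message: a text payload plus string attributes."""
    data: str
    attributes: dict = field(default_factory=dict)


def transform(message: Message) -> Optional[Message]:
    """Validate, filter, alter, and enrich one message; None means 'drop it'."""
    # Validate: the payload must be a JSON object.
    try:
        payload = json.loads(message.data)
    except json.JSONDecodeError:
        return None
    if not isinstance(payload, dict):
        return None

    # Filter: keep only messages for one (hypothetical) region.
    if payload.get("region") != "eu":
        return None

    # Alter: strip a sensitive field from the data.
    payload.pop("ssn", None)

    # Enrich: add an attribute without mutating the input message.
    attributes = {**message.attributes, "validated": "true"}
    return Message(data=json.dumps(payload, sort_keys=True), attributes=attributes)
```

Each call sees exactly one message and has no shared state, which is what keeps this kind of transform lightweight enough to run inline as messages flow through.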
Serverless Spark is now generally available directly within BigQuery. Formerly Dataproc Serverless, the fully managed Google Cloud Serverless for Apache Spark helps reduce total cost of ownership (TCO), provides strong performance with the new Lightning Engine, integrates with and leverages AI, and is enterprise-ready. Because Apache Spark now runs directly in BigQuery, you can develop, run, and deploy Spark code interactively in BigQuery Studio. Read all about it here.
Next-Gen data pipelines: Airflow 3 arrives on Google Cloud Composer: Google is the first hyperscaler to provide selected customers with access to Apache Airflow 3, integrated into our fully managed Cloud Composer 3 service. This is a significant step forward, allowing data teams to explore the next generation of workflow orchestration within a robust Google Cloud environment. Airflow 3 introduces powerful capabilities, including DAG versioning for enhanced auditability, scheduler-managed backfills for simpler historical data reprocessing, a modern React-based UI for more efficient operations, and many more features.
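As a rough illustration of what a backfill involves, the Python sketch below lists the daily runs that would need to be created for a historical date range. It is not Airflow's implementation or API; the function name `missing_daily_runs` and the idea of tracking completed runs as a set of dates are assumptions made for this example.

```python
from datetime import date, timedelta


def missing_daily_runs(start: date, end: date, completed: set[date]) -> list[date]:
    """Return each day in [start, end] (inclusive) that has no completed run."""
    span = (end - start).days
    candidates = (start + timedelta(days=offset) for offset in range(span + 1))
    # Skip days that already ran so that reprocessing only fills the gaps.
    return [day for day in candidates if day not in completed]
```

For example, with a completed run on June 2, calling `missing_daily_runs(date(2025, 6, 1), date(2025, 6, 3), {date(2025, 6, 2)})` returns June 1 and June 3. Having the scheduler own this bookkeeping, rather than a separate command-line process, is the simplification the announcement refers to.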
June 2 – June 6
Enhancing BigQuery workload management: BigQuery workload management provides comprehensive control mechanisms to optimize workloads and resource allocation, preventing performance issues and resource contention, especially in high-volume environments. To make it even more useful, we announced several updates to BigQuery workload management around reservation fairness, predictability, flexibility and “securability,” new reservation labels, as well as autoscaler improvements. Get all the details here.
Bigtable Spark connector is now GA: The latest version of the Bigtable Spark connector opens up a world of possibilities for Bigtable and Apache Spark applications, not least of which is additional support for Bigtable and Apache Iceberg, the open table format for large analytical datasets. Learn how to use the Bigtable Spark connector to interact with data stored in Bigtable from Apache Spark, and delve into powerful use cases that leverage Apache Iceberg in this post.
BigQuery gets transactional: Over the years, we’ve added several capabilities to BigQuery to bring near-real-time, transactional-style operations directly into your data warehouse, so you can handle common data management tasks more efficiently from within the BigQuery ecosystem. In this blog post, you can learn about three of them: efficient fine-grained DML mutations; change history support for updates and deletes; and real-time updates with DML over streaming data.
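The Python sketch below illustrates two of these ideas, row-level DML and a change history for updates and deletes, using the standard-library `sqlite3` module as a local stand-in. It is not BigQuery code and says nothing about how BigQuery implements either feature: the `orders` table, the trigger-based `orders_history` log, and the `run_demo` function are all hypothetical.

```python
import sqlite3


def run_demo() -> tuple[list, list]:
    """Apply row-level DML and return (current rows, recorded change history)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT);
        CREATE TABLE orders_history (id INTEGER, change_type TEXT, status TEXT);

        -- Hand-rolled change log: one history row per update or delete.
        CREATE TRIGGER orders_upd AFTER UPDATE ON orders BEGIN
            INSERT INTO orders_history VALUES (NEW.id, 'UPDATE', NEW.status);
        END;
        CREATE TRIGGER orders_del AFTER DELETE ON orders BEGIN
            INSERT INTO orders_history VALUES (OLD.id, 'DELETE', OLD.status);
        END;
        """
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)", [(1, "new"), (2, "new"), (3, "new")]
    )

    # Fine-grained mutations: each statement touches a single row.
    conn.execute("UPDATE orders SET status = 'shipped' WHERE id = 2")
    conn.execute("DELETE FROM orders WHERE id = 3")

    rows = conn.execute("SELECT id, status FROM orders ORDER BY id").fetchall()
    history = conn.execute(
        "SELECT id, change_type, status FROM orders_history ORDER BY rowid"
    ).fetchall()
    conn.close()
    return rows, history
```

The point of the sketch is the shape of the workflow: individual rows are changed in place, and a separate record of what changed stays queryable afterwards.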
Google Cloud databases integrate with MCP: We announced capabilities in MCP Toolbox for Databases (Toolbox) to make it easier to connect databases to AI assistants in your IDE. MCP Toolbox supports BigQuery, AlloyDB (including AlloyDB Omni), Cloud SQL for MySQL, Cloud SQL for PostgreSQL, Cloud SQL for SQL Server, Spanner, self-managed open-source databases including PostgreSQL, MySQL, and SQLite, as well as databases from a growing list of other vendors, including Neo4j, Dgraph, and more. Get all the details here.
AI Summary and Description: Yes
**Summary:** The text outlines new features and updates from Google Cloud, focusing on advancements in data transformation, serverless computing with Apache Spark, and enhanced workload management in BigQuery. These innovations are particularly relevant for professionals in cloud computing and data analytics as they underscore improvements in efficiency, integration with AI, and user-friendly operations in BigQuery and Cloud Composer.
**Detailed Description:**
The content highlights several key updates and features within Google Cloud, making it pertinent to the categories of Cloud Computing, Cloud Computing Security, and MLOps:
– **Pub/Sub Single Message Transforms (SMTs):**
    – Single Message Transforms allow real-time transformation of individual messages within Pub/Sub.
    – The first SMT is JavaScript User-Defined Functions (UDFs), which enable lightweight modifications to message attributes and/or data.
– **Serverless Spark in BigQuery:**
    – Google Cloud Serverless for Apache Spark, formerly Dataproc Serverless, is now generally available directly within BigQuery.
    – It emphasizes reduced total cost of ownership (TCO) and strong performance via the new Lightning Engine.
    – Users can develop, run, and deploy Spark code interactively in BigQuery Studio, with AI integration.
– **Apache Airflow 3 on Google Cloud Composer:**
    – Google is the first hyperscaler to give selected customers access to Airflow 3, integrated into the fully managed Cloud Composer 3 service.
    – Notable features include:
        – DAG versioning for improved auditability.
        – Scheduler-managed backfills for simpler reprocessing of historical data.
        – A modern React-based UI for more efficient operations.
– **BigQuery Workload Management Enhancements:**
    – Workload management controls workloads and resource allocation to prevent performance issues and resource contention, especially in high-volume environments.
    – Updates cover reservation fairness, predictability, flexibility, and "securability," along with new reservation labels and autoscaler improvements.
– **Bigtable Spark Connector:**
    – The Bigtable Spark connector is now generally available and lets Apache Spark applications interact with data stored in Bigtable.
    – The latest version adds support for using Bigtable with Apache Iceberg, the open table format for large analytical datasets.
– **Transactional Capabilities in BigQuery:**
    – Over the years, BigQuery has added near-real-time, transactional-style operations for common data management tasks.
    – The three capabilities highlighted are efficient fine-grained DML mutations, change history support for updates and deletes, and real-time updates with DML over streaming data.
– **Database Integration with MCP Toolbox:**
    – New capabilities in MCP Toolbox for Databases make it easier to connect databases to AI assistants in an IDE.
    – Supported databases include Google Cloud databases (BigQuery, AlloyDB, Cloud SQL, Spanner), self-managed open-source databases (PostgreSQL, MySQL, SQLite), and databases from other vendors such as Neo4j and Dgraph.
Overall, these updates focus on improving user experience and integrating AI functionality across Google Cloud data services, which makes them relevant to professionals working in cloud computing, data management, and related fields. The workload management and Airflow 3 changes in particular bear on efficiency, auditability, and governance in cloud environments.