Source URL: https://cloud.google.com/blog/products/data-analytics/pubsub-highlights-of-2024/
Source: Cloud Blog
Title: Cloud Pub/Sub 2024 highlights: Native integrations, sharing and more
Feedly Summary: In today’s rapidly evolving digital landscape, organizations need to leverage real-time data for actionable insights and improved decision-making. Availability of real-time data is emerging as a key element to evolve and grow the business. Pub/Sub is Google Cloud’s simple, reliable, and scalable messaging service that serves as a versatile entry point to ingest streaming data into Google Cloud’s ecosystem, and is integrated with products like BigQuery, Cloud Storage, Dataflow, and more. You can then use this data for downstream analytics, visualization, and AI applications. This year we launched several new features and enhancements to help meet the demands of modern streaming workloads, across three key data analytics patterns:
Streaming ingestion – Stream data directly into BigQuery and Cloud Storage for downstream use cases such as analytics and ML with BigQuery or for backup in Cloud Storage.
Streaming analytics – Process and analyze real-time event streams and make business decisions based on high-value, real-time insights with Dataflow, BigQuery Engine for Apache Flink, or BigQuery continuous queries.
Stream sharing and export – Curate, share and monetize your valuable streaming data through data exchanges with your internal teams and/or external customers.
Let’s take a closer look at the Pub/Sub highlights of 2024 across these three areas.
Streaming ingestion
Many customers have some workloads on one public cloud and other workloads (e.g. analytical) on another. Pub/Sub has traditionally supported streaming ingestion into BigQuery and Cloud Storage through export subscriptions. This year, we simplified import into Pub/Sub from various sources, starting with AWS Kinesis Data Streams. Pub/Sub import topics is a new no-code, one-click way to ingest streaming data from AWS Kinesis Data Streams into Pub/Sub. This helps simplify streaming data ingestion pipelines without the overhead of maintaining and running a custom connector.
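As a rough illustration of what a Kinesis import topic carries in its configuration, the sketch below builds a JSON body for a topic-creation request. The field names (`ingestionDataSourceSettings`, `awsKinesis`, `streamArn`, `consumerArn`, `awsRoleArn`, `gcpServiceAccount`) reflect our reading of the Pub/Sub REST `Topic` resource and should be checked against the current API reference. The helper name and every ARN and account value are hypothetical placeholders.

```python
import json


def kinesis_import_topic_body(stream_arn, consumer_arn, aws_role_arn, gcp_service_account):
    """Build a request body for a topic that ingests from a Kinesis data stream.

    Field names are an assumption based on the Pub/Sub REST Topic resource;
    verify them against the current reference before relying on this.
    """
    return {
        "ingestionDataSourceSettings": {
            "awsKinesis": {
                "streamArn": stream_arn,
                "consumerArn": consumer_arn,
                "awsRoleArn": aws_role_arn,
                "gcpServiceAccount": gcp_service_account,
            }
        }
    }


# Placeholder values only; none of these resources exist.
body = kinesis_import_topic_body(
    stream_arn="arn:aws:kinesis:us-east-1:111111111111:stream/example-stream",
    consumer_arn="arn:aws:kinesis:us-east-1:111111111111:stream/example-stream/consumer/example-consumer:1",
    aws_role_arn="arn:aws:iam::111111111111:role/example-pubsub-ingestion",
    gcp_service_account="ingest@example-project.iam.gserviceaccount.com",
)
print(json.dumps(body, indent=2))
```

The sketch only assembles the payload. Sending it, and setting up the AWS role and service account so that Pub/Sub can read the stream, are separate steps that it does not cover.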
Another typical streaming ingestion use case is to ingest batch data into Pub/Sub. To ingest data from Cloud Storage into Pub/Sub, you used to have to either configure, deploy, run, manage and scale a custom connector, or use a Dataflow template. Now you can enable the ingestion property to create a Cloud Storage import topic to ingest batch data from Cloud Storage into a Pub/Sub topic. Once the data is flowing into an import topic, you can create a subscription (Pull, Push, BigQuery or Cloud Storage) to get the data to your choice of sink for downstream processing.
There are two key use cases for Cloud Storage import topics:
Batch to streaming – To leverage batch data for streaming analytics use cases like predictions and activations, you must first transform it into a streaming format. With Cloud Storage import topics, you can perform this ingestion in a fully managed way.
Streaming archive data – Many customers need to store historical data; using Pub/Sub with Cloud Storage subscriptions makes it easier to build their archive. From there, Cloud Storage import topics make it easy to ingest historical data into a Pub/Sub topic for streaming analytics use cases.
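To make the batch-to-streaming idea concrete, here is a small local simulation that turns one delimited text object into a sequence of individual messages. It is only a sketch of the concept, not how the managed service is implemented. The `Message` class, the `object_to_messages` helper, and the attribute names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator


@dataclass
class Message:
    """A minimal stand-in for a Pub/Sub message: payload plus attributes."""
    data: bytes
    attributes: Dict[str, str] = field(default_factory=dict)


def object_to_messages(object_name: str, content: bytes,
                       delimiter: bytes = b"\n") -> Iterator[Message]:
    """Yield one message per non-empty delimited record in a batch object."""
    records = (record for record in content.split(delimiter) if record)
    for index, record in enumerate(records):
        yield Message(
            data=record,
            # Hypothetical attributes that record where each message came from.
            attributes={"source_object": object_name, "record_index": str(index)},
        )


if __name__ == "__main__":
    batch = b'{"id": 1}\n{"id": 2}\n\n{"id": 3}\n'
    for message in object_to_messages("archive/events.jsonl", batch):
        print(message.attributes["record_index"], message.data.decode())
```

Once records exist as individual messages, any subscription type can deliver them to a downstream consumer, which is the part that a Cloud Storage import topic handles in a managed way.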
This year we launched BigQuery tables for Apache Iceberg in preview, a fully managed, Apache Iceberg-compatible storage engine from BigQuery with features such as autonomous storage optimizations, clustering and high-throughput streaming ingestion. Pub/Sub BigQuery subscriptions integrate with BigQuery tables for Apache Iceberg for high-throughput streaming ingestion that durably stores the ingested tuples in a row-oriented format and periodically converts them to Parquet stored in a customer-owned Cloud Storage bucket. BigQuery tables for Apache Iceberg can also be used with Pub/Sub to store streaming data in Cloud Storage in Parquet format.
Streaming analytics
Customers use Pub/Sub in conjunction with stream processing engines to power streaming-analytics use cases such as anomaly detection, personalization, etc. With Pub/Sub already natively integrated with Dataflow, in 2024, we focused on supporting Apache Flink, an open-source stream-processing framework that is seeing growing adoption across enterprises. You can now use Apache Flink with Pub/Sub in two ways:
1. BigQuery Engine for Apache Flink
We recently launched BigQuery Engine for Apache Flink in preview, which lets you use the familiar Apache Flink API and ecosystem for stateful stream processing with Java, Python and SQL. It’s also a serverless offering with fully managed deployments, autoscaling, transparent upgrades and pay-as-you-go billing, and is natively integrated into our unified data and AI platform. Pub/Sub is also integrated with BigQuery Engine for Apache Flink.
2. Pub/Sub Apache Flink connector
To support streaming analytics with existing Apache Flink deployments, we launched a new version of the Pub/Sub Flink connector. Now generally available, the connector lets you connect your existing Apache Flink deployment to Pub/Sub in just a few steps. The connector also allows you to publish an Apache Flink output into Pub/Sub topics or use Pub/Sub subscriptions as a source in Apache Flink applications.
Stream sharing & export
BigQuery Analytics Hub lets businesses share batch data assets across organizations efficiently and securely. However, many organizations also need to share real-time streaming data with partners and customers, as well as with internal teams. To help, Pub/Sub Topics sharing in Analytics Hub, now in preview, provides:
Real-time data sharing, allowing data providers to share data updates instantly, facilitating timely access to the freshest data.
Enhanced data discovery: By listing Pub/Sub topics as data products, producers can help increase the visibility and discoverability of their data streams.
Simplified data access, with an integrated experience for centrally managing accessibility to your organization’s streaming data.
To simplify streaming real-time data from BigQuery to external systems and vendors, you can use BigQuery continuous queries with Pub/Sub. Continuous queries extend BigQuery with new streaming SQL capabilities in the form of SQL jobs that can run indefinitely and process real-time data the moment it arrives. BigQuery continuous queries let you analyze streaming data in real time and act on those insights immediately.
You can even leverage Pub/Sub as both an input and output for real-time data processing: Use BigQuery subscriptions to ingest streaming data into BigQuery, with a BigQuery continuous query to process, analyze, and develop event-driven data pipelines for communicating insights to downstream applications by exporting the query results to a separate Pub/Sub topic. Multiple Google Cloud ISV partners already support Pub/Sub messages generated from a continuous query, including (but not limited to) Aiven, Census, Confluent, Estuary, Hightouch, Keboola, Lytics, Nexla, Qlik, and Redpanda.
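The export half of that pipeline can be sketched as a SQL statement that writes query results to a Pub/Sub topic. The helper below only assembles the statement text. The `EXPORT DATA ... format = 'CLOUD_PUBSUB'` form and the single `message` output column follow our understanding of the BigQuery documentation and should be verified there. The function name, table, topic, and column names are hypothetical placeholders.

```python
from typing import Optional, Sequence


def build_continuous_export_query(source_table: str, topic_path: str,
                                  columns: Sequence[str],
                                  predicate: Optional[str] = None) -> str:
    """Assemble an EXPORT DATA statement that publishes rows to a Pub/Sub topic.

    Sketch only: values are interpolated without escaping, so pass trusted
    identifiers. The statement shape is an assumption to check against the
    BigQuery reference.
    """
    struct_fields = ", ".join(columns)
    where_clause = f"\n  WHERE {predicate}" if predicate else ""
    return (
        "EXPORT DATA\n"
        "  OPTIONS (\n"
        "    format = 'CLOUD_PUBSUB',\n"
        f"    uri = 'https://pubsub.googleapis.com/{topic_path}')\n"
        "AS (\n"
        f"  SELECT TO_JSON_STRING(STRUCT({struct_fields})) AS message\n"
        f"  FROM `{source_table}`{where_clause}\n"
        ");"
    )


if __name__ == "__main__":
    print(build_continuous_export_query(
        source_table="example-project.rides.taxi_rides",
        topic_path="projects/example-project/topics/enroute-rides",
        columns=["ride_id", "latitude", "longitude"],
        predicate="ride_status = 'enroute'",
    ))
```

The statement still has to be submitted as a continuous query, which is configured on the job rather than in the SQL text, and the source table here is assumed to be the one fed by a BigQuery subscription.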
Observability
New support for OpenTelemetry in Pub/Sub lets you see a detailed trace of your message lifecycle, including the ability to see a distributed trace from the moment a message is published to when it’s received and processed. Analyzing these traces can decrease troubleshooting time by allowing you to quickly identify bottlenecks, misconfigurations, and other failures in your Pub/Sub applications.
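As a simple illustration of how a message trace can point to a bottleneck, the sketch below takes the spans of one message's lifecycle and reports which stage took the largest share of the end-to-end time. The span names and timings are invented for the example and do not correspond to the span names that Pub/Sub client libraries actually emit.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Span:
    """One timed stage of a message's lifecycle, in milliseconds."""
    name: str
    start_ms: float
    end_ms: float

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms


def slowest_stage(spans: List[Span]) -> Span:
    """Return the span with the longest duration."""
    return max(spans, key=lambda span: span.duration_ms)


def stage_shares(spans: List[Span]) -> Dict[str, float]:
    """Return each stage's fraction of the summed span time."""
    total = sum(span.duration_ms for span in spans)
    if total == 0:
        return {span.name: 0.0 for span in spans}
    return {span.name: span.duration_ms / total for span in spans}


# Hypothetical trace for a single message.
trace = [
    Span("publish", 0.0, 12.0),
    Span("delivery", 12.0, 40.0),
    Span("process", 40.0, 340.0),
]

if __name__ == "__main__":
    worst = slowest_stage(trace)
    print(f"slowest stage: {worst.name} ({worst.duration_ms:.0f} ms)")
    for name, share in stage_shares(trace).items():
        print(f"{name}: {share:.0%}")
```

In this made-up trace the subscriber's processing step dominates, which would suggest looking at the consumer code rather than at publishing or delivery.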
Looking ahead
As we look ahead to 2025, we have planned innovation across the following key areas:
Simplified Kafka ingestion – Oftentimes customers migrate from Kafka to Pub/Sub to simplify their messaging infrastructure and enjoy Pub/Sub’s key benefits of simplicity, reliability and auto-scalability. To make this migration journey simpler, we will be launching cross-cloud Kafka sources with Import Topics in early 2025.
Single message transforms – Almost all streaming data pipelines need some form of transformation. Some customers prefer to transform the data after it has landed in a data lake or data warehouse (ELT pattern), while others prefer to transform the data before landing it in the sink (data lake, data warehouse). In 2025, we plan to further simplify streaming analytics architectures by providing native, lightweight, single-message transformations. Pub/Sub Single Message Transforms (SMT) will help you perform simple, lightweight modifications to the message attributes and/or data with JavaScript User-Defined Functions (UDFs).
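To give a feel for what a single-message transformation means, here is a conceptual sketch in Python. The planned feature is described as using JavaScript UDFs and is not yet available, so nothing below reflects its actual interface. The message shape, the function names, and the convention that returning `None` drops a message are all assumptions made for illustration.

```python
import json
from typing import Callable, Dict, Iterable, Iterator, Optional

# Assumed message shape: {"data": <JSON string>, "attributes": {str: str}}
Message = Dict[str, object]
Transform = Callable[[Message], Optional[Message]]


def redact_email(message: Message) -> Optional[Message]:
    """Remove the 'email' field from the payload and tag the message."""
    payload = json.loads(message["data"])
    payload.pop("email", None)
    return {
        "data": json.dumps(payload, sort_keys=True),
        "attributes": dict(message["attributes"], redacted="true"),
    }


def drop_debug(message: Message) -> Optional[Message]:
    """Filter out messages whose 'level' attribute is 'debug'."""
    if message["attributes"].get("level") == "debug":
        return None
    return message


def apply_transforms(messages: Iterable[Message],
                     transforms: Iterable[Transform]) -> Iterator[Message]:
    """Run each message through the transforms in order, one message at a time."""
    steps = list(transforms)
    for message in messages:
        current: Optional[Message] = message
        for step in steps:
            current = step(current)
            if current is None:  # assumed convention: None means "drop"
                break
        if current is not None:
            yield current


if __name__ == "__main__":
    incoming = [
        {"data": '{"user": "a", "email": "a@example.com"}', "attributes": {"level": "info"}},
        {"data": '{"user": "b"}', "attributes": {"level": "debug"}},
    ]
    for out in apply_transforms(incoming, [drop_debug, redact_email]):
        print(out)
```

Each function sees exactly one message and no shared state, which is what keeps this kind of transformation lightweight compared with a full stream-processing job.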
Thanks for reading this far. We are excited to get these capabilities to you. Get started with Pub/Sub today and start exploring these new features to solve your hardest business challenges.
AI Summary and Description: Yes
Summary: The text discusses Google Cloud’s Pub/Sub service enhancements and features for 2024, including streaming data ingestion, analytics capabilities, real-time data sharing, and observability improvements. These developments are particularly relevant for businesses leveraging cloud services for data analytics and decision-making.
Detailed Description:
The text provides a detailed overview of updates and features associated with Google Cloud’s Pub/Sub service, emphasizing its role in handling streaming data for various applications. The updates are crucial for businesses focusing on real-time data analysis, which is becoming increasingly necessary in today’s data-driven landscape. Here are the major points elaborated within the text:
– **Streaming Ingestion**:
  – Simplified ingestion process from AWS Kinesis Data Streams into Pub/Sub using import topics.
  – New capabilities allow users to ingest batch data into Pub/Sub seamlessly from Cloud Storage.
  – Key use cases highlighted include:
    – Converting batch data into a streaming format for real-time analytics.
    – Utilizing historical data for streaming applications.
– **Streaming Analytics**:
  – Enhanced support for Apache Flink, enabling compatibility with existing streaming analytics frameworks.
  – Introduction of BigQuery Engine for Apache Flink allows for serverless, fully managed stateful stream processing with a familiar interface.
  – New Pub/Sub Flink connector simplifies integration for existing Flink users.
– **Stream Sharing and Export**:
  – Pub/Sub topics sharing in BigQuery Analytics Hub (in preview) provides a mechanism for real-time data sharing across organizations, enhancing collaboration and accessibility.
  – BigQuery continuous queries, combined with Pub/Sub, allow real-time processing of data streams and export of the results to downstream systems.
– **Observability Improvements**:
  – New OpenTelemetry support in Pub/Sub facilitates detailed tracing of message lifecycles, improving troubleshooting and system performance monitoring.
– **Future Innovations**:
  – Planned features for 2025, including:
    – Extension of Kafka ingestion capabilities to ease migrations from Kafka to Pub/Sub.
    – Introduction of Single Message Transforms to streamline preprocessing of streaming data.
These advancements highlight the significant role Pub/Sub plays in modern data architecture, enabling organizations to maximize the value derived from both real-time and batch data. The practical implications include:
– Enhanced decision-making through immediate access to real-time data.
– Simplified architecture and reduced operational overhead in managing data pipelines.
– Improved overall data governance and efficiency for enterprises utilizing cloud services.
These developments significantly contribute to standing up robust data analytics frameworks that integrate seamlessly with existing technologies, making Google Cloud a competitive option for organizations focusing on data-driven strategies.