AWS News Blog: New: Improve Apache Iceberg query performance in Amazon S3 with sort and z-order compaction

Jun 24, 2025

—

Source URL: https://aws.amazon.com/blogs/aws/new-improve-apache-iceberg-query-performance-in-amazon-s3-with-sort-and-z-order-compaction/

Feedly Summary: Amazon S3 now enables improved Apache Iceberg query performance through two new compaction strategies—sort and z-order—available for both S3 Tables and general purpose S3 buckets, helping organize data more efficiently by clustering similar values together and reducing file scanning during queries.


Summary: The post introduces two new compaction strategies, sort and z-order, for improving Apache Iceberg query performance in Amazon S3. Both reorganize the data layout so that queries scan fewer files, which lowers query latency and engine cost for teams managing large datasets in cloud environments.

Detailed Description:
The content describes enhancements related to Apache Iceberg and Amazon S3, specifically focused on query performance improvements through new data compaction strategies. Here are the major points explored:

– **Apache Iceberg Table Management**:
– Used for managing large-scale analytical datasets in Amazon S3 with AWS Glue Data Catalog.
– Supports features like concurrent streaming and batch ingestion, schema evolution, and time travel.

– **Challenges of Data Lakes**:
– High ingestion rates lead to many small files, affecting both cost and performance during querying.
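
To make the small-file problem concrete, here is a minimal sketch of how a binpack-style compaction might group small files into larger ones. It illustrates the general idea only and is not the S3 Tables implementation; the function name `plan_binpack` and the 512 MB target are assumptions chosen for the example.

```python
from typing import List


def plan_binpack(file_sizes_mb: List[int], target_mb: int = 512) -> List[List[int]]:
    """Greedily group consecutive files so each group stays at or under target_mb.

    Hypothetical helper: each returned group stands for one compacted output file.
    """
    groups: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for size in file_sizes_mb:
        # Close the current group when adding this file would exceed the target.
        if current and current_size + size > target_mb:
            groups.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        groups.append(current)
    return groups


# Twelve 100 MB files produced by frequent ingestion.
small_files = [100] * 12
plan = plan_binpack(small_files, target_mb=512)
print(f"{len(small_files)} input files -> {len(plan)} compacted files")
```

Binpack only merges files; it does not reorder rows, which is the gap that sort and z-order compaction address.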

– **New Compaction Strategies**:
– Introduced sort and z-order compaction alongside the default binpack strategy.
– **Sort Compaction**:
– Organizes files based on user-defined column order, clustering similar values and improving query efficiency by reducing file scans.
– Example: Sorting by state and zip_code lets queries that filter on those columns skip unrelated files, reducing latency.

– **Z-Order Compaction**:
– Interleaves values from multiple columns, optimizing file pruning across dimensions.
– Ideal for complex or spatial queries, like filtering by locations or fare amounts, leading to a significant reduction in scanned files compared to traditional sorting methods.
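
The difference between the two layouts can be sketched with a small simulation. The code below is an illustration under stated assumptions, not Iceberg's or S3's implementation: it uses a synthetic 64 x 64 grid of `(x, y)` values, a hypothetical file size of 64 rows, and a simple bit-interleaving (Morton) code as the z-order key. It then counts how many files a min/max-based pruner would have to open for a filter on each column.

```python
from typing import List, Tuple

Row = Tuple[int, int]  # (x, y), e.g. a location bucket and a fare bucket


def interleave_bits(x: int, y: int, bits: int = 6) -> int:
    """Return the Morton (z-order) code built by interleaving the bits of x and y."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)       # x bits go to even positions
        code |= ((y >> i) & 1) << (2 * i + 1)   # y bits go to odd positions
    return code


def split_into_files(rows: List[Row], rows_per_file: int) -> List[List[Row]]:
    """Cut an ordered list of rows into fixed-size 'files'."""
    return [rows[i:i + rows_per_file] for i in range(0, len(rows), rows_per_file)]


def files_scanned(files: List[List[Row]], column: int, value: int) -> int:
    """Count files whose min/max range for the column could contain value."""
    count = 0
    for f in files:
        vals = [row[column] for row in f]
        if min(vals) <= value <= max(vals):
            count += 1
    return count


rows = [(x, y) for x in range(64) for y in range(64)]

# Linear sort: order by x, then y.
sorted_files = split_into_files(sorted(rows), 64)
# Z-order: order by the interleaved code of both columns.
zorder_files = split_into_files(sorted(rows, key=lambda r: interleave_bits(*r)), 64)

for name, files in [("sort", sorted_files), ("z-order", zorder_files)]:
    print(name,
          "x == 5:", files_scanned(files, 0, 5),
          "y == 5:", files_scanned(files, 1, 5))
```

On this synthetic grid, the linear sort opens 1 file for the filter on `x` but all 64 for the filter on `y`, while the z-order layout opens 8 files for either filter. Real tables will behave differently depending on data distribution and file sizes.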

– **Operational Details**:
– Sort compaction is automatically applied if a defined sort order exists; no additional configuration needed.
– Z-order requires table maintenance configuration updates using the S3 Tables API.
– Changes to compaction settings, such as the target file size, affect only new data; existing files remain unchanged unless explicitly rewritten.
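
As a rough sketch of the configuration update mentioned above, the following builds a request for the S3 Tables `PutTableMaintenanceConfiguration` operation. The field names and the allowed `strategy` values reflect my understanding of that API and should be checked against the current API reference; the bucket ARN, namespace, and table name are hypothetical placeholders, and the helper functions are made up for this example.

```python
from typing import Any, Dict

# Assumed set of strategy values; verify against the S3 Tables API reference.
ALLOWED_STRATEGIES = {"auto", "binpack", "sort", "z-order"}


def build_compaction_request(table_bucket_arn: str, namespace: str,
                             table_name: str, strategy: str = "z-order") -> Dict[str, Any]:
    """Build keyword arguments for put_table_maintenance_configuration (hypothetical helper)."""
    if strategy not in ALLOWED_STRATEGIES:
        raise ValueError(f"unsupported strategy: {strategy!r}")
    return {
        "tableBucketARN": table_bucket_arn,
        "namespace": namespace,
        "name": table_name,
        "type": "icebergCompaction",
        "value": {
            "status": "enabled",
            "settings": {"icebergCompaction": {"strategy": strategy}},
        },
    }


def apply_compaction_request(request: Dict[str, Any]) -> None:
    """Send the request with boto3; needs AWS credentials, so it is not called here."""
    import boto3  # imported lazily so the builder works without boto3 installed

    client = boto3.client("s3tables")
    client.put_table_maintenance_configuration(**request)


request = build_compaction_request(
    "arn:aws:s3tables:us-east-1:111122223333:bucket/example-bucket",  # placeholder ARN
    "example_namespace",
    "example_table",
)
print(request["value"])
```

Keeping the payload builder separate from the network call makes it possible to review or log the configuration before anything is sent.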

– **Practical Implementation**:
– Example implementation using Apache Spark to demonstrate the effects of compaction strategies on an S3 table.
– After compaction, the table holds fewer but larger files, and the data within them is better clustered.
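
One way to reproduce this kind of check from Spark is sketched below. The helpers only build SQL strings: `ALTER TABLE ... WRITE ORDERED BY` comes from the Iceberg Spark SQL extensions and `<table>.files` is an Iceberg metadata table. The table identifier and function names are hypothetical, and this is not the exact code used in the original post.

```python
from typing import Sequence


def write_ordered_by_sql(table: str, columns: Sequence[str]) -> str:
    """Build the Iceberg DDL that records a sort order in the table metadata."""
    if not columns:
        raise ValueError("at least one sort column is required")
    return f"ALTER TABLE {table} WRITE ORDERED BY {', '.join(columns)}"


def file_stats_sql(table: str) -> str:
    """Build a query on the Iceberg 'files' metadata table to compare layouts."""
    return (
        "SELECT COUNT(*) AS file_count, "
        "AVG(file_size_in_bytes) AS avg_file_bytes "
        f"FROM {table}.files"
    )


table = "my_catalog.my_namespace.my_table"  # hypothetical identifier
ddl = write_ordered_by_sql(table, ["state", "zip_code"])
stats = file_stats_sql(table)
print(ddl)
print(stats)

# With a SparkSession configured for Iceberg, these could be run as:
#   spark.sql(ddl)
#   spark.sql(stats).show()
```

Running the statistics query before and after compaction should show the file count falling and the average file size rising if compaction has taken effect.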

– **Availability and Costs**:
– Sort and z-order compaction are available in all AWS Regions where S3 Tables are supported.
– There is no additional charge for S3 Tables beyond existing usage and maintenance fees; for Data Catalog optimizations, compute charges apply during compaction.

– **Performance Gains**:
– The author reports performance improvements of threefold or more when switching from binpack to sort or z-order, depending on data layout and query patterns.

Overall, the post shows how sort and z-order compaction organize Iceberg data in Amazon S3 so that queries run faster and cost less, which is relevant to anyone working in cloud computing and data management.
