Apache Iceberg Tables Integration: Expanding AWS Glue Crawler’s Capabilities
AWS Glue Crawler now supports Apache Iceberg tables. This makes it simpler to use the AWS Glue Data Catalog as the catalog for your Iceberg tables and to migrate tables from other Iceberg catalogs. Apache Iceberg is an open-source table format used mainly in data lakes. It helps data engineers manage datasets that change continuously while maintaining query performance.
With this launch, you can automate the registration of Iceberg tables in the Glue Data Catalog by running a Glue Crawler. You can then query those tables from multiple analytics engines and apply fine-grained Lake Formation permissions when querying from Amazon Athena.
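As a minimal sketch of querying a crawled Iceberg table from Athena, the Python below builds the arguments for the Athena StartQueryExecution API and submits them through boto3. The database, table, and S3 output location are hypothetical placeholders, and the helper names are illustrative. The sketch does not configure Lake Formation; any fine-grained permissions are assumed to be set up separately for the calling principal.

```python
from typing import Any, Dict


def build_athena_query_request(
    database: str, table: str, output_s3: str, limit: int = 10
) -> Dict[str, Any]:
    """Build keyword arguments for Athena's StartQueryExecution call."""
    # Quote identifiers so names with unusual characters still parse in Athena SQL.
    query = f'SELECT * FROM "{database}"."{table}" LIMIT {int(limit)}'
    return {
        "QueryString": query,
        "QueryExecutionContext": {"Database": database},
        # Athena writes query results to this S3 location (hypothetical placeholder).
        "ResultConfiguration": {"OutputLocation": output_s3},
    }


def run_athena_query(request: Dict[str, Any]) -> str:
    """Submit the query and return its execution ID (requires AWS credentials)."""
    import boto3  # imported lazily so the request builder works without boto3

    athena = boto3.client("athena")
    return athena.start_query_execution(**request)["QueryExecutionId"]
```

For example, build_athena_query_request("analytics_db", "orders", "s3://example-results-bucket/athena/") produces a request you can pass to run_athena_query once the crawler has registered the table.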
To migrate from another Iceberg catalog, create and schedule a Glue Crawler and provide one or more Amazon S3 paths where your Iceberg tables are stored. You can also set the maximum depth of S3 paths that the crawler traverses.
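As a minimal sketch, the Python below builds the request for the Glue CreateCrawler API with an Iceberg target (the IcebergTargets entry with Paths and MaximumTraversalDepth) and optionally submits it through boto3. The crawler name, IAM role ARN, database name, S3 paths, and cron schedule are hypothetical placeholders, and the helper names are illustrative. The 1 to 20 bound on MaximumTraversalDepth reflects the Glue API reference as understood at the time of writing; check it against the current documentation.

```python
from typing import Any, Dict, List, Optional


def build_iceberg_crawler_request(
    name: str,
    role_arn: str,
    database: str,
    s3_paths: List[str],
    max_depth: int = 10,
    schedule: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for Glue's CreateCrawler call with an Iceberg target."""
    if not s3_paths:
        raise ValueError("at least one S3 path is required")
    # Assumed valid range for MaximumTraversalDepth; verify against the API reference.
    if not 1 <= max_depth <= 20:
        raise ValueError("max_depth must be between 1 and 20")
    request: Dict[str, Any] = {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {
            "IcebergTargets": [
                # Limits how deep the crawler searches under each path for Iceberg metadata.
                {"Paths": list(s3_paths), "MaximumTraversalDepth": max_depth}
            ]
        },
    }
    if schedule is not None:
        # Glue cron syntax, e.g. "cron(0 2 * * ? *)" runs daily at 02:00 UTC.
        request["Schedule"] = schedule
    return request


def create_iceberg_crawler(request: Dict[str, Any], start_now: bool = False) -> None:
    """Create the crawler and optionally start a first run (requires AWS credentials)."""
    import boto3  # imported lazily so the request builder works without boto3

    glue = boto3.client("glue")
    glue.create_crawler(**request)
    if start_now:
        glue.start_crawler(Name=request["Name"])
```

Keeping the request builder separate from the boto3 call lets you review or unit-test the crawler definition before anything is created in your account.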
On each run, the Glue Crawler extracts schema information and updates the Glue Data Catalog with any schema changes. It supports merging schemas across snapshots and keeps the Glue Data Catalog pointing to the latest metadata file location, so AWS analytics engines read the current state of each table.
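To see which metadata file a catalog table currently points to, you can inspect the table parameters returned by the Glue GetTable API. The sketch below assumes crawler-registered tables follow the convention used by Iceberg's own Glue catalog integration, where the parameters include table_type set to ICEBERG and a metadata_location key; treat that as an assumption to confirm against your tables. The helper names are illustrative.

```python
from typing import Any, Dict, Optional


def iceberg_metadata_location(get_table_response: Dict[str, Any]) -> Optional[str]:
    """Return the Iceberg metadata file location recorded on a Glue table, if any."""
    params = get_table_response.get("Table", {}).get("Parameters", {})
    # Assumed convention: Iceberg tables carry table_type=ICEBERG (case may vary).
    if params.get("table_type", "").upper() != "ICEBERG":
        return None
    return params.get("metadata_location")


def fetch_metadata_location(database: str, table: str) -> Optional[str]:
    """Look up a table in the Glue Data Catalog (requires AWS credentials)."""
    import boto3  # imported lazily so the parser works without boto3

    glue = boto3.client("glue")
    return iceberg_metadata_location(glue.get_table(DatabaseName=database, Name=table))
```

Comparing the returned location before and after a crawler run is one way to confirm that the catalog now references the newest metadata file.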
Glue Crawler support for Iceberg tables is available in all commercial AWS Regions where AWS Glue is available. For details, see the AWS Region Table. To learn more, see the AWS Glue Crawler documentation.