New features bringing unmatched query performance to open data lakehouses
Today, the Delta Lake project announced the Delta Lake 2.0 release candidate, which includes a collection of new features with vast performance and usability improvements. The final release of Delta Lake 2.0 will be made available later this year.
Delta Lake has been a Linux Foundation project since October 2019. It is the open storage layer that brings reliability and performance to data lakes through the “lakehouse architecture”, which combines the best of data warehouses and data lakes under one roof. In the past three years, lakehouses have become an appealing solution for data engineers, analysts, and data scientists who want the flexibility to run different workloads, from data analysis to the development of machine learning models, on the same data with minimal complexity and no duplication. Delta Lake is the most widely used lakehouse format in the world and currently sees over 7M downloads per month, a figure that continues to grow.
Delta Lake 2.0 will bring major improvements to query performance and usability for Delta Lake users, including support for change data feed, Z-order clustering, idempotent writes to Delta tables, column dropping, and many more (get more details in the Delta Lake 2.0 RC release notes). These features enable any organization to build highly performant lakehouses for a wide range of data and AI use cases.
The announcement of Delta Lake 2.0 came on stage during the Data + AI Summit 2022 keynote, where Michael Armbrust, distinguished engineer at Databricks and a co-founder of the Delta Lake project, showed how the new features will dramatically improve performance and manageability compared to previous versions and other storage formats. Databricks initially open sourced Delta Lake and, together with the Delta Lake community, has continuously contributed new features to the project. The latest set of features included in v2.0 was first made available to Databricks customers, ensuring the features were “battle-tested” for production workloads before being contributed to the project.
Databricks is not the only organization actively contributing to Delta Lake – developers from over 70 different organizations have been collaborating and contributing new features and capabilities.
“The Delta Lake project is seeing phenomenal activity and growth trends indicating the developer community wants to be a part of the project. Contributor strength has increased by 60% during the last year, the growth in total commits is up 95%, and the average lines of code per commit is up 900%. We are seeing this upward velocity from contributing organizations like Uber Technologies, Walmart, and CloudBees, Inc., among others,”
— Executive Director of the Linux Foundation, Jim Zemlin.
The Delta Lake community is inviting you to explore Delta Lake and join the community. Here are a few useful links to get you started: