Highlights:

  • Databricks stated that Delta Lake boasts over 500 code contributors and has been adopted by over 10,000 companies globally.
  • Delta Lake and Iceberg have been closely competing for dominance in the data lake market; data lakes serve as centralized repositories for structured and unstructured data.

Recently, Databricks Inc. has acquired Tabular Technologies Inc., the creator of a universal storage platform built upon the Apache Iceberg standard.

The move signals an intensified focus by Databricks on closing the compatibility gap between its Delta Lake storage format and Iceberg. While specific terms were not disclosed, Databricks CEO Ali Ghodsi told a prominent media outlet that the acquisition price exceeded $1 billion. Snowflake Inc. and Confluent Inc. were reportedly also contenders in the bidding process.

Tabular was established by three ex-employees of Netflix Inc. who collaborated on the development of Iceberg during their tenure there. In 2020, the project was donated to the open-source community.

Databricks’ Delta Lake storage framework, launched in the same timeframe, shares similarities with Iceberg: both leverage Apache Parquet and uphold the principles of atomicity, consistency, isolation, and durability (ACID) in transactions. Both offer scalable metadata management and integrate streaming and batch data processing. Databricks reported that Delta Lake boasts over 500 code contributors and is adopted by over 10,000 companies globally.

Competition for Dominance

Delta Lake and Iceberg have been engaged in a tight competition for dominance in the data lake market; data lakes serve as centralized storage for structured and unstructured data. According to Dremio Corp.’s 2024 State of the Data Lakehouse report, 31% of respondents currently utilize Apache Iceberg, while 39% favor Delta Lake. Looking ahead, however, 29% anticipate adopting Iceberg over the next three years, compared to 23% for Delta Lake. SNS Insider Pvt Ltd. projects that the data lake market will grow more than 21% annually, reaching $57 billion by 2030.

The competition between the two standards has posed challenges for both sides. In a blog post introducing the agreement, Tabular stated, “The problem isn’t about determining which standard is better. The problem is that the risk of investing in the wrong format prevents people from choosing at all.”

For the past two years, Databricks has been working to narrow the divide with Apache Iceberg. Last year’s Delta Lake 3.0 release integrated Iceberg compatibility, and Iceberg support was also incorporated into the company’s UniForm universal lakehouse format.

Progressing the Development of the Lakehouse

Consolidating the storage format is seen as a pivotal move in promoting the acceptance of data lakehouses, a term coined by Databricks to describe a blend of conventional data warehouses and data lakes. A lakehouse facilitates ACID transactions on data housed in object storage, ensuring robust reliability, performance, and compatibility with open-source engines like Apache Spark, Trino, and Presto.
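Both Delta Lake and Iceberg deliver those ACID guarantees on plain object storage in the same basic way: data files are immutable, and a small metadata log records which files belong to each committed version. The following is a minimal, illustrative Python sketch of that transaction-log idea; the class name `TinyTableLog` and its layout are invented for illustration, and real Delta and Iceberg metadata are far richer (schemas, snapshots, statistics, concurrency control).

```python
import json
import os
import tempfile


class TinyTableLog:
    """Toy sketch of how table formats achieve atomic commits on file
    storage: readers only see data files listed in committed log entries.
    Illustrative only -- not the actual Delta Lake or Iceberg protocol."""

    def __init__(self, root):
        self.root = root
        os.makedirs(os.path.join(root, "_log"), exist_ok=True)

    def _log_path(self, version):
        # Zero-padded names keep log entries in version order.
        return os.path.join(self.root, "_log", f"{version:020d}.json")

    def latest_version(self):
        entries = os.listdir(os.path.join(self.root, "_log"))
        versions = (int(e.split(".")[0]) for e in entries if e.endswith(".json"))
        return max(versions, default=-1)

    def commit(self, added_files):
        version = self.latest_version() + 1
        entry = {"version": version, "add": added_files}
        # Write the entry to a temp file, then atomically rename it into
        # place: a reader never observes a half-written commit.
        fd, tmp = tempfile.mkstemp(dir=os.path.join(self.root, "_log"))
        with os.fdopen(fd, "w") as f:
            json.dump(entry, f)
        os.rename(tmp, self._log_path(version))
        return version

    def visible_files(self):
        # A snapshot of the table is the union of all committed adds.
        files = []
        for v in range(self.latest_version() + 1):
            with open(self._log_path(v)) as f:
                files.extend(json.load(f)["add"])
        return files


log = TinyTableLog(tempfile.mkdtemp())
log.commit(["part-0.parquet"])
log.commit(["part-1.parquet"])
print(log.visible_files())
```

The rename-based commit is where this sketch oversimplifies: with concurrent writers, the production formats instead rely on a put-if-absent primitive or a catalog's compare-and-swap so that two writers cannot both claim the same version number.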

The lakehouse concept has gained significant traction due to its versatility and scalability. According to research released by Databricks last year, almost three-quarters of the 600 technology leaders surveyed have already embraced a lakehouse architecture, while the remaining respondents anticipate doing so within the next three years.

Ghodsi said in a statement, “Databricks and Tabular will work with the open-source community to bring the two formats closer to each other over time, increasing openness and reducing silos and friction for customers.”

In a media interview, Matei Zaharia, co-founder and chief technology officer of Databricks, said that connecting the two formats will take several years. However, the collaboration between the creators of both standards represents a significant advancement.

The timing of the announcement was strategic, coinciding with competitor Snowflake’s annual Data Cloud Summit user conference. At the recent conference in San Francisco, Snowflake unveiled Polaris Catalog, an open catalog implementation enabling cross-engine access to Iceberg data that competes directly with Databricks’ Unity Catalog.