Highlights:
- The most significant new features announced by Dremio are said to center on an open-source technology known as Apache Iceberg.
- When Dremio’s data lakehouse adds, deletes, or updates records in an Iceberg table, it creates secondary files to record the changes.
Dremio Corp. recently launched a new version of its data lakehouse to make several common analytics tasks easier for customers.
Dremio is a startup based in Santa Clara, California, that has received more than USD 410 million in funding from Insight Partners, Cisco Systems Inc., and other significant investors. Its namesake data lakehouse enables businesses to inspect large amounts of business data for useful insights. It also simplifies related tasks like creating data visualizations.
Co-founder and Chief Product Officer Tomer Shiran stated, “It’s been great to see the incredible growth of our product driving value for our customers. Every company we speak to is struggling to help their businesses move faster and self-serve while maintaining security and governance. Data meshes solve for these competing priorities, and we’ve been innovating to make it easier and easier for companies to create and operate data mesh architectures.”
The most significant new features announced by Dremio are said to center on an open-source technology known as Apache Iceberg. The startup has integrated the technology into its platform to automate certain common data management tasks.
Iceberg enables businesses to store up to tens of petabytes of data in a single database table. Keeping information in a single table simplifies tasks that are typically difficult to perform with large datasets, such as modifying specific records and adding new ones. Iceberg also has features that assist administrators in removing incorrect items.
Dremio’s data lakehouse allows businesses to analyze data stored in Iceberg tables. The startup claims that the new release of its platform, which debuted recently, performs this analysis more efficiently.
When Dremio’s data lakehouse adds, deletes, or updates records in an Iceberg table, it creates secondary files to record the changes. Secondary files can quickly accumulate. They not only consume storage space but can also potentially slow queries if not managed properly.
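The mechanism described above resembles Iceberg's "merge-on-read" approach, in which deletions are recorded in separate files rather than by rewriting the underlying data. The toy Python sketch below (illustrative only, not Dremio's or Iceberg's actual file layout) shows why these secondary files accumulate and why queries slow down: every read must merge them back in.

```python
# Toy sketch of merge-on-read deletes: rather than rewriting the data
# file, each delete appends a new secondary (delete) file, and reads
# must merge every secondary file with the data file.

data_file = {"rows": {1: "alice", 2: "bob", 3: "carol"}}
delete_files = []  # secondary files accumulate here, one per operation

def delete_row(row_id):
    # Record the deletion in a new secondary file; data file is untouched.
    delete_files.append({row_id})

def read_table():
    # A query merges all secondary files before returning rows.
    deleted = set().union(*delete_files) if delete_files else set()
    return {k: v for k, v in data_file["rows"].items() if k not in deleted}

delete_row(2)
delete_row(3)
print(read_table())       # only row 1 remains visible
print(len(delete_files))  # two secondary files now exist
```

Note that the data file itself never changes; the growing list of secondary files is what consumes storage and adds work to every query.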
The company is now releasing a feature that automatically optimizes the secondary files generated during data operations to reduce their storage footprint. It says the resulting gain in hardware efficiency means queries can complete faster and at lower cost.
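Dremio has not published the internals of this optimization, but a common technique for the problem is compaction: merging many small secondary files into one so that queries touch fewer files. A minimal sketch of that idea, continuing the toy model above:

```python
# Toy sketch of compaction (illustrative; not Dremio's implementation):
# many small secondary (delete) files are merged into a single file,
# shrinking the storage footprint and the per-query merge work.

delete_files = [{2}, {5}, {7}, {2}]  # hypothetical accumulated delete files

def compact(files):
    # Union all secondary files into one consolidated file; duplicate
    # entries (row 2 above) collapse, so the result is also smaller.
    merged = set().union(*files) if files else set()
    return [merged]

delete_files = compact(delete_files)
print(delete_files)  # one file instead of four
```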
In some circumstances, inaccurate data might enter an Iceberg-powered database table. The new version of Dremio’s platform has a table rollback feature for such circumstances. The feature allows users to roll back to a previous version if a dataset contains incorrect data.
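Rollback of this kind is possible because Iceberg tracks each state of a table as an immutable snapshot; reverting means repointing the table at an earlier snapshot rather than undoing individual writes. A toy Python model of the idea (names and structure are made up for illustration):

```python
# Toy model of snapshot-based rollback: every write creates a new
# immutable snapshot, so reverting bad data only moves the "current"
# pointer back to a known-good version.

snapshots = [{"id": 1, "rows": ["a", "b"]}]
current = 0  # index of the snapshot the table currently serves

def write(rows):
    global current
    snapshots.append({"id": snapshots[-1]["id"] + 1, "rows": rows})
    current = len(snapshots) - 1

def rollback_to(snapshot_id):
    # Old snapshots are never overwritten, so rollback is just a repoint.
    global current
    for i, snap in enumerate(snapshots):
        if snap["id"] == snapshot_id:
            current = i
            return
    raise ValueError("unknown snapshot")

write(["a", "b", "WRONG"])  # incorrect data lands in snapshot 2
rollback_to(1)              # table serves the clean version again
print(snapshots[current]["rows"])
```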
Additionally, the company is making it easier to analyze data from outside sources.
Dremio simplifies the analysis of such external data by converting it into an Iceberg table. The company has included a tool that streamlines the creation of Iceberg tables from CSV and JSON files stored in HDFS, Amazon S3, and Azure Data Lake Storage. It has also added connectors that let users analyze data held in external systems such as Snowflake and IBM Corp.'s Db2.
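The first step of such a conversion is parsing the raw CSV or JSON into structured records that can be loaded into a table. The sketch below uses only the Python standard library to show that step in miniature; the column names and types are invented for illustration, and Dremio's tooling of course does this at lakehouse scale rather than in memory.

```python
import csv
import io

# Hypothetical CSV content, as might sit in S3 or HDFS.
raw = "id,name\n1,alice\n2,bob\n"

# Parse each line into a dict keyed by the header row.
rows = list(csv.DictReader(io.StringIO(raw)))

# Coerce fields to types so the records are ready to load into a table.
table = [{"id": int(r["id"]), "name": r["name"]} for r in rows]
print(table)
```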
It will also be easier to analyze data stored in other deployments of the startup's platform from a Dremio data lakehouse. The company claims the features in the recent update significantly reduce the manual labor required, and that they work even when a company runs multiple data lakehouse environments across both on-premises and cloud infrastructure.