Highlights:
- The new release is based on a cloud-native architecture that improves workload and resource isolation and allows the creation of separate warehouses for different use cases.
- Users can query historical and streaming data in real time without waiting for the streaming data to be batched for analysis.
CelerData Inc., developer of a real-time analytics platform based on the StarRocks open-source massively parallel database, has released version 3 of its commercial offering with expanded support for data lakehouses, which are hybrid data warehouse/data lake repositories.
CelerData, formerly known as StarRocks Inc., is the primary creator of StarRocks, an Apache Doris derivative recently donated to the Linux Foundation.
The company said that most query engines are not well suited to real-time analysis: they struggle with ad-hoc queries and break down under many concurrent users. “They may accept streaming data sources, but they don’t support real-time,” said Li Kang, vice president of strategy at CelerData. “Enterprises will often build two pipelines: one for batch processing in the data lake or data warehouse and a separate real-time pipeline.”
The new release is based on a cloud-native architecture that improves workload and resource isolation and allows the creation of separate warehouses for different use cases. Lakehouse users can now run high-performance analytics without moving their data into a central data warehouse. According to the company, its query engine is three times faster than competing query engines and can handle 10,000 queries per second from thousands of concurrent users.
Users can query historical and streaming data in real time without waiting for the streaming data to be batched for analysis. The company’s method, which divides data into distinct segments called tablets, differs from the quasi-real-time processing technique known as micro-batching. “Each time we get a new record, we read it from our reader. It’s not micro-batching, but you can think of it that way and combine that data with other tables,” said Kang.
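To illustrate the distinction in general terms, here is a minimal Python sketch contrasting micro-batching with record-at-a-time ingestion. It is not CelerData's implementation; the function names `micro_batches` and `per_record` and the sample stream are hypothetical and exist only for illustration.

```python
from typing import Iterable, Iterator, List


def micro_batches(records: Iterable[dict], batch_size: int) -> Iterator[List[dict]]:
    """Group records into fixed-size batches.

    A record only becomes available downstream once its batch is emitted,
    which is why micro-batching is described as quasi-real-time.
    """
    batch: List[dict] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final, partially filled batch


def per_record(records: Iterable[dict]) -> Iterator[dict]:
    """Hand each record downstream as soon as it is read."""
    for record in records:
        yield record


# Hypothetical stream of five events.
stream = [{"id": i} for i in range(5)]
batches = list(micro_batches(stream, batch_size=2))
singles = list(per_record(stream))
```

With a batch size of two, the first record waits for the second before either can be analyzed, while the record-at-a-time path makes each one available immediately.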
This version includes integration with popular storage formats such as Apache Iceberg and Apache Hudi. Previously, the software supported only one type of storage: direct-attached or local storage on a server or virtual machine. “Data can now be stored in S3 or our local storage,” Kang said, referring to Amazon Web Services Inc.’s object storage service.
Multi-table materialized views built from multiple joined base tables, along with a local caching layer for remote input/output operations, can further improve performance.
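The idea behind a local cache for remote I/O can be sketched in a few lines of Python. This is a generic read-through cache under stated assumptions, not the product's actual caching layer; the class `ReadThroughCache`, the function `fake_object_store` and the key names are hypothetical.

```python
from typing import Callable, Dict


class ReadThroughCache:
    """Serve repeated reads from local memory instead of remote storage."""

    def __init__(self, fetch_remote: Callable[[str], bytes]) -> None:
        self._fetch_remote = fetch_remote
        self._local: Dict[str, bytes] = {}
        self.remote_reads = 0  # counts how often remote storage was hit

    def read(self, key: str) -> bytes:
        if key not in self._local:
            # Cache miss: pay the remote round trip once, then keep a local copy.
            self.remote_reads += 1
            self._local[key] = self._fetch_remote(key)
        return self._local[key]


def fake_object_store(key: str) -> bytes:
    """Stand-in for a slow remote object store (assumption for this sketch)."""
    return f"contents of {key}".encode()


cache = ReadThroughCache(fake_object_store)
first = cache.read("table/part-0")   # goes to the remote store
second = cache.read("table/part-0")  # served from the local cache
```

Only the first read reaches the remote store; a production cache would also need eviction and invalidation, which this sketch omits.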
CelerData Version 3 will be generally available beginning in early April 2023. The company also offers a fully managed cloud service.