In one corner is Databricks, which innovated what is called a data lake, a place where you can dump all of your data – no matter the format. It was built and is still run by researchers and academics who dream of “changing the world,” says the company’s CEO Ali Ghodsi, who was an academic for seven years before founding Databricks. In the other corner is Snowflake, which innovated what is called the data warehouse, a place that, simply put, starts with more structure, to allow easier analytics on the data. And it’s run not by researchers and academics, but by CEO Frank Slootman, who has had more than a decade of experience as a business executive running large companies as CEO or president.

And now, while they come from different ends of the spectrum, they are branching out into each other’s territory, with the goal of building the one-stop shop for all things enterprise data, or what many refer to as a ‘lakehouse’.

Their recent moves continue to show how different they are. “Snowflake’s innovation is its investment in its ecosystem and partnerships – and its PR and sales machines,” said Andrew Brust, founder of strategy and advisory firm Blue Badge Insights. “They are great sellers and are building a data marketplace that adds real value. On the other hand, Databricks is very focused on technological excellence, performance, features and high-end machine learning capabilities.”

The move to the lakehouse

Cloud data lakes and warehouses have become a critical element in answering enterprise data management needs. Organizations typically take enterprise data from various sources and operational processes, and first store it in a raw data lake. Then they can perform a round of ETL (extract, transform, load) procedures to shift critical parts of this data into a form that can be stored in a data warehouse. This is where business and other users can more easily generate useful business insights from the data. While the process has been useful, companies often find it difficult and costly to maintain consistency between their data lake and their data warehouse infrastructures. Their teams need to employ continuous data engineering tactics to ETL/ELT data between the two systems, which can affect the overall quality of the data. Plus, because the data is constantly changing (depending on the pipeline), the information stored in a warehouse may not be as current as that in a data lake. That is why Databricks has been working hard to make its data lake more compatible with the features of a warehouse, and Snowflake has been adapting its warehouse to allow more features of a data lake. While the data industry has seen and continues to see many data platforms, including offerings from Amazon and Google and a bunch of startups, Databricks and Snowflake have left a particular mark.

To understand, we have to go back to Hadoop. Nearly two decades ago, the open source Java-based framework took the initial steps to solve the storage and processing layer for big data, but it failed to gain widespread adoption due to technical complexities.

Snowflake, founded in 2012 by former Oracle data architects Benoit Dageville and Thierry Cruanes, came to the scene as a better, faster alternative to Hadoop. In no time, the company became the go-to choice for a cloud database that would give customers a single platform to store, access, analyze and share large amounts of structured data from anywhere (AWS, Azure, or any other source). It transformed the warehousing space by offering highly scalable and distributed computation capability. Today, Snowflake customers can easily connect business intelligence tools such as Tableau and conduct historical data analyses using SQL on their datasets. The ease of use and scale of the platform have driven massive adoption of Snowflake over the years. It went public in 2020, and rocketed to a market value of $100 billion, as the pandemic pushed enterprises to invest more into their data infrastructure to allow for things like hybrid work. Its value has since come down to around $73 billion (as of March 29). The revenue of the company has grown 106% from $592 million in FY21 to $1,219 million in FY22, while the customer base has surged to over 5,900 – including about two-fifths of Fortune 500 companies.

The concurrent rise of Databricks

Databricks, meanwhile, was founded in 2013, although the groundwork for it was laid well before, in 2009, with the open source Apache Spark project – a multi-language engine for data engineering, data science, and machine learning. Spark drew widespread attention with its in-memory processing, which allowed faster and more efficient handling of workloads.