A cloud data warehouse is a database delivered in a public cloud as a managed service and optimized for analytics, scale and ease of use. We believe that cloud data warehouses are a game changer and the next wave in data warehousing. Used thoughtfully, cloud data warehouses can dramatically lower your operating costs while giving you the agility to keep up with the demands of the business.
In the late 1990s and early 2000s, I was working with Oracle 7.3/8i and Microsoft SQL Server 2000, “relational” databases in which data is organized into tables. The concept of a data service using SQL with dimension and fact modeling was a game changer. Around 2005, when relational databases began to struggle with the size and complexity of analytical workloads, we saw the emergence of Massively Parallel Processing (MPP) data warehouses like Teradata, Netezza and, later, Vertica and Greenplum. Around 2010 came a sea change in data management with an open source project called Hadoop. The concept of a “data lake” where I could query raw unstructured data was a huge leap forward in my ability to capture, store and process more data with more agility at a substantially lower cost.
We’re now witnessing a third wave of innovation in data warehousing technology with the advent of cloud data warehouses from AWS, Microsoft, Google, Oracle, IBM and others, together with serverless and managed services: for example, AWS EMR and Redshift with Lambda functions; Azure SQL Data Warehouse, Azure Data Factory, Azure Data Lake Storage, Azure Databricks, Azure Event Hubs, Azure IoT Hub and Azure Analysis Services; and Google BigQuery, Google Dataflow, Google Data Catalog, Google Dataproc, Google Data Fusion, workflow orchestration using Airflow and cloud data transfer services. As enterprises move to the cloud, they are abandoning their legacy on-premises data warehousing technologies, including Hadoop, for these new cloud data platforms. This transformation is a tectonic shift in data management and has profound implications for enterprises.
Each of the major public cloud vendors offers its own flavor of a cloud data warehouse service: Microsoft has Azure SQL Data Warehouse, Google offers BigQuery and Amazon has Redshift. There are also cloud offerings from the likes of Snowflake that provide the same capabilities via a service that runs on the public cloud but is managed independently. For each of these services, the cloud vendor or data warehouse provider delivers the following capabilities “out of the box”:
Cloud-based data warehouses free up companies to focus on running their business rather than running a room full of servers, and they allow business intelligence teams to deliver faster and better insights thanks to more reliable access, better scalability and higher performance.
These cloud data warehouse vendors differ in how they deliver these capabilities and in how they charge for them. Let’s dive deeper into the different deployment architectures and pricing models.
There are two main camps of cloud data warehouse architectures. The first, older deployment architecture is cluster-based: Amazon Redshift and Azure SQL Data Warehouse fall into this category. Typically, clustered cloud data warehouses are really just clustered Postgres derivatives, ported to run as a service in the cloud. The other flavor, serverless, is more modern and counts Google BigQuery and Snowflake as examples. Essentially, serverless cloud data warehouses make the database cluster “invisible” or shared across many clients. Each architecture has its pros and cons (see below).
Besides deployment architecture, another major difference between the cloud data warehouse options is pricing. In all cases, you pay some nominal fee for the amount of data stored. But the pricing differs for compute.
For example, Google BigQuery and Snowflake offer on-demand pricing options based on the amount of data scanned or compute time used. Amazon Redshift and Azure SQL Data Warehouse offer resource pricing based on the number or types of nodes in the cluster. There are pros and cons to both types of pricing models. The on-demand models charge you only for what you use, which can make budgeting difficult because it is hard to predict the number of users and the number and size of the queries they will run. If a user mistakenly runs a very large query, for example one that costs $1,000 or more, you pay in full for the mistake.
Pricing is a major consideration and requires a great deal of use case and workload modeling to find the right fit for your organization.
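The trade-off between the two compute pricing models can be sketched with some simple arithmetic. The Python below is only an illustration: the function names, the 730-hour month and every rate passed in are assumptions made up for this example, not actual vendor prices, and real bills include storage and other charges that are ignored here.

```python
# Rough comparison of on-demand vs. resource (cluster) compute pricing.
# All names and rates are hypothetical placeholders, not vendor prices.

TB = 10**12  # bytes in one terabyte (decimal), an assumption for this sketch


def on_demand_cost(bytes_scanned_per_query, queries_per_month, price_per_tb_scanned):
    """Monthly compute cost when billed by the amount of data scanned."""
    total_tb_scanned = bytes_scanned_per_query * queries_per_month / TB
    return total_tb_scanned * price_per_tb_scanned


def cluster_cost(node_count, price_per_node_hour, hours_per_month=730):
    """Monthly compute cost when billed per node, whether or not queries run."""
    return node_count * price_per_node_hour * hours_per_month


def break_even_queries(bytes_scanned_per_query, price_per_tb_scanned, monthly_cluster_cost):
    """Number of queries per month at which on-demand cost equals cluster cost."""
    cost_per_query = bytes_scanned_per_query / TB * price_per_tb_scanned
    return monthly_cluster_cost / cost_per_query


if __name__ == "__main__":
    # Hypothetical workload: 5,000 queries a month, each scanning 0.5 TB.
    demand = on_demand_cost(TB // 2, 5_000, price_per_tb_scanned=5.0)
    fixed = cluster_cost(node_count=4, price_per_node_hour=1.0)
    print(f"on-demand: ${demand:,.2f}  cluster: ${fixed:,.2f}")

    # A single mistaken query scanning 200 TB, at the same hypothetical rate.
    print(f"runaway query: ${on_demand_cost(200 * TB, 1, 5.0):,.2f}")

    # Query volume above which the fixed cluster becomes the cheaper option.
    print(f"break-even: {break_even_queries(TB // 2, 5.0, fixed):,.0f} queries/month")
```

Running the same calculation across several plausible workloads (light, typical and peak months) gives a first-order view of where each model wins before you commit to one.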
We’ve seen lots of enterprises attempt a migration from their on-premises data lakes and/or relational data warehouses to the cloud. For many, their migrations “stall” after the first pilot project for the following reasons: