Introduction to Snowflake Architecture
Snowflake schema is a type of multidimensional database in a data warehouse with different logical tables in it, here the entity-relationship tabular diagram is managed into the dimensional of a snowflake diagram. It is represented as a part of a centralized fact table that is associated to multiple and n number of dimensions. Further, these dimensional tables are sub-divided into a number of dimensional tables, that are further normalized to eliminate the redundancy from the table. For example, we can consider a product table that is sub-divided into n number of tables of both products and its sub-products.
A snowflake schema is similar to a star schema architecture. But, the dimensional table of a star schema are generally normalizing table, for an individual table to relate to each dimension. Snowflake schemas are built to ease the queries over more complex relationships and its tables. It is best suitable for one-to-many or many-to-many relationships amongst the dimensional level and is commonly linked to data mart and dimensional tables in data warehouses, where the data’s retrieval velocity is more important than the data manipulation capability. Snowflake schema mainly uses the advanced query tools to build on any abstraction layer between the client and a raw table for examples that have a number of queries with detailed specifications.
Characteristics of Snowflake Schema
In data warehousing architecture, Snowflaking is a mode of dimension modeling where the dimensions are gathered in n number of linked to dimension tables.
The dimension architectural schema of snowflake architecture is built under the following conditions:·
- The snowflake schema in data warehouse occupies less memory in the space.
- It is quite easier to build dimension when it is stopped on to the schema level.
- There are number of tables in the schema to reduce the performance in the warehouse.
- The dimensional table in the database has two or more set of attribute that determine information at different level.
Advantages & Disadvantages of Snowflake Architecture
Below are the advantages and disadvantages of Snowflake Architecture:
There are two important advantages of snowflake architecture listed below:
- This schema provides a structured manner of data that eliminates the issue of data integrity.
- Snowflake schema uses less disk space in the data ware house as data are very much structured.
Few disadvantages of snowflake architecture are listed below with brief description:
- Snowflaking in snowflake schema reduces the space that is used by the dimensional tables, however when it is compared to the entire set of schemas in the data warehouse the space saving is quite insignificantly handled.
- It is better to reduce snowflaking or normalizing of a dimensional table, except whenever required and important.
- It is advised not to separate the hierarchy of any one dimensional table.
- Number of hierarchy for the similar dimensional table are designed at least possible level.
Components of Snowflake Architecture
A snowflake schema is a cloud-based data warehouse that is built on top of cloud infrastructure that is Amazon Web Services (AWS) and is an efficient SaaS offering schema. There is no such virtual or physical hardware to choose options like either select or install or configure or to manage. All present activities like maintenance, managing or tuning are handled by Snowflake architecture. There are 3 important components that summarizes the Snowflake data warehouse. Below layers will give you an architectural view and its component view of snowflake schema.
The 3 main components that constitute a snowflake schema are listed below with brief description:
1. Database Storage
The correct elemental file system in this schema is hired by the S3 in Snowflake’s database account where all the data is compressed, organized, and evenly distributed among the tables so as to optimize the efficiency level. In AWS S3 instance the data is much redundant and it also gives efficient data durability and the availability of data in the ware house. Compute nodes are used to interact with the storage layer in the architecture to get the data to process the query. However, the storage layer is not dependent, users only use the average data storage used in a month. As snowflake schema is supported on the Cloud Service, storage is flexible and it is debited as per the monthly usage.
2. Query Processing
Snowflake schema will give the user an option to develop a “Virtual Warehouses” in the system that are generally used to determine the cluster in AWS EC2. The Virtually Setup Warehouses are owned to process the data or fire a query and is able of running the tasks parallel. The Virtual Warehouses that is setup in the data warehouse can be geared up or it will be interrupted when that is unused to minimize the computation speed.
Query that run in this component layer are used from the data coming from the storage layer. Usually, a virtual warehouse server owns its own independent computer cluster and it will not interact with the other data warehouses.
3. Cloud Services
Snowflake schema will coordinate and handle the other services that includes session classes, authenticated servers, SQL compile query, encrypted data, etc. There are number of virtual warehouses in the data ware house that can be owned and are geared up when other resources are needed for quick query loading and to normalized performance level. The design of snowflake architecture is designed in such a way that each component of the 3 layers can be scaled up independently and has more redundancy.
Snowflakes are more advantageous than standard SQL language. This is an advantage for the management that use with their clients are not needed to be “re-skilled”. Snowflake architecture supports data file formats like Avro, ORC, Parquet, XML and JSON. The capability to ease the stored structure, unstructured data, and few semi structure data may use the address to solve the problem to handle the conflicting data types that might be in a particular datawarehouse. This is a huge track towards the data that provided more value to the data that gives an advanced way of analysis.
This is a guide to Snowflake Architecture. Here we discuss the Introduction to Snowflake Architecture and its advantages and disadvantages along with its Characteristics. You can also go through our suggested articles to learn more –