
Big Data Architecture

Introduction to Big Data Architecture

Managing massive volumes of data and running complex operations on that data calls for big data tools and techniques, by which we mean the various software components and procedures that make up the big data ecosystem. There is no generic solution that fits every use case, so a solution has to be crafted to match the business requirements of a particular company. A big data architecture is therefore needed: it combines different technologies so that the required use case can be achieved, and establishing a fixed architecture ensures that a viable solution will be delivered for that use case.

What is Big Data Architecture?

  • This architecture is designed to handle the ingestion, processing, and analysis of data that is too large or too complex for traditional database management systems.
  • Different organizations have different thresholds: for some, a few hundred gigabytes already count as big data, while for others even several terabytes do not.
  • At the same time, the cost of commodity systems and commodity storage has dropped significantly, and the huge variety of data demands different ways of being handled.
  • Some data arrives in batches at particular times, so jobs have to be scheduled accordingly, while other data belongs to the streaming class and requires a real-time streaming pipeline to meet its requirements. All of these challenges are addressed by big data architecture.

Explanation of Big Data Architecture

(Figure: Big Data Architecture Systems)


Big data systems involve more than one workload type, and they are broadly classified as follows:

  1. Batch processing of big data sources at rest.
  2. Real-time processing of big data in motion.
  3. Interactive exploration of big data with interactive tools and technologies.
  4. Machine learning and predictive analytics.

1. Data Sources

The data sources are all the golden sources from which the data extraction pipeline is built, so they can be considered the starting point of the big data pipeline.

The examples include:
(i) Application datastores such as relational databases.

(ii) Static files produced by applications, such as log files generated by web servers.

(iii) IoT devices and other real-time data sources.
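As a minimal sketch of how such sources might feed the start of the pipeline, the PySpark snippet below reads a relational table over JDBC and a set of web server log files. All hostnames, credentials, table names, and paths are hypothetical placeholders, and the JDBC driver is assumed to be on the Spark classpath.

```python
# Sketch: reading two typical source types with PySpark.
# Connection details, table names, and paths are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("source-ingestion").getOrCreate()

# (i) Relational application datastore, read over JDBC
#     (assumes the PostgreSQL JDBC driver is available to Spark).
orders = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://db.example.com:5432/shop")
    .option("dbtable", "public.orders")
    .option("user", "reader")
    .option("password", "secret")
    .load()
)

# (ii) Static files produced by applications, e.g. web server access logs.
logs = spark.read.text("hdfs:///raw/webserver/access_log_*")

print(orders.count(), logs.count())
```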

2. Data Storage

This layer holds the data managed for batch operations, stored in distributed file stores that can hold large volumes of big files in different formats. It is called the data lake. This is generally where Hadoop storage such as HDFS, or cloud storage from Microsoft Azure, AWS, or GCP (including blob containers), comes in.
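As a hedged sketch of landing raw data in such a store, the snippet below writes incoming JSON records to a data lake path as partitioned Parquet files. The HDFS paths and the event_date column are hypothetical; an Azure Blob/ADLS, S3, or GCS URI would be used the same way.

```python
# Sketch: landing raw records in a distributed file store ("data lake")
# as partitioned Parquet. Paths and column names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("data-lake-landing").getOrCreate()

raw = spark.read.json("hdfs:///raw/events/2020-01-01/*.json")  # placeholder input

(
    raw.write.mode("append")
    .partitionBy("event_date")        # assumes the records carry an event_date column
    .parquet("hdfs:///lake/events/")  # placeholder data lake location
)
```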


3. Batch Processing

All the data is segregated into different categories or chunks, and long-running jobs are used to filter, aggregate, and otherwise prepare the data in a processed state for analysis. These jobs usually read from the source files, process them, and write the output to new files. Batch processing can be done in various ways: with Hive jobs, U-SQL-based jobs, Sqoop or Pig, or custom MapReduce jobs generally written in Java, Scala, or another language such as Python.
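A minimal PySpark sketch of such a long-running batch job is shown below: it filters and aggregates the events landed in the data lake and writes the prepared output to new files. Paths and column names (status, country, amount) are hypothetical.

```python
# Sketch: a batch job that filters and aggregates raw events from the data
# lake and writes the prepared output to new Parquet files.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily-batch-aggregation").getOrCreate()

events = spark.read.parquet("hdfs:///lake/events/")  # placeholder input path

daily_summary = (
    events.filter(F.col("status") == "completed")     # keep only finished events
    .groupBy("event_date", "country")
    .agg(F.count("*").alias("events"), F.sum("amount").alias("revenue"))
)

# Write the processed output to a new location for downstream analysis.
daily_summary.write.mode("overwrite").parquet("hdfs:///lake/summaries/daily/")
```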

4. Real Time-Based Message Ingestion

In contrast to batch processing, this layer covers all the real-time streaming systems that handle data generated sequentially and in a fixed pattern. It is sometimes a simple data mart or store where incoming messages are dropped into a folder for processing. Most solutions, however, need a message-based ingestion store that acts as a message buffer, supports scale-out processing, provides comparatively reliable delivery, and offers other message-queuing semantics. Options include Apache Kafka, Apache Flume, Event Hubs from Azure, etc.
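As a small sketch of pushing events into such a buffer, the snippet below uses the kafka-python client to publish IoT readings to an Apache Kafka topic. The broker address, topic name, and message fields are hypothetical placeholders.

```python
# Sketch: publishing events into a message-based ingestion buffer (Kafka)
# using the kafka-python client. Broker and topic names are hypothetical.
import json

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="broker.example.com:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Each sensor reading becomes one message on the "iot-readings" topic.
producer.send("iot-readings", {"device_id": "sensor-42", "temperature": 21.7})
producer.flush()
```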

5. Stream Processing

There is a slight difference between real-time message ingestion and stream processing. The former collects the ingested data and then acts as a publish-subscribe kind of tool. Stream processing, on the other hand, handles the streaming data in windows or streams and then writes the results to the output sink. Tools include Apache Spark, Apache Flink, Storm, etc.
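The following sketch uses Spark Structured Streaming to read the Kafka topic filled by the ingestion layer, aggregate over one-minute windows, and write the results to a sink. The broker and topic names are hypothetical, and the spark-sql-kafka connector package is assumed to be available; a real deployment would write to a durable sink rather than the console.

```python
# Sketch: windowed stream processing with Spark Structured Streaming over the
# Kafka topic used in the ingestion layer. Names are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("stream-aggregation").getOrCreate()

readings = (
    spark.readStream.format("kafka")  # requires the spark-sql-kafka package
    .option("kafka.bootstrap.servers", "broker.example.com:9092")
    .option("subscribe", "iot-readings")
    .load()
    .selectExpr("CAST(value AS STRING) AS value", "timestamp")
)

# Count messages per one-minute window.
per_minute = readings.groupBy(F.window("timestamp", "1 minute")).count()

query = (
    per_minute.writeStream.outputMode("complete")
    .format("console")  # stand-in sink for the sketch
    .start()
)
query.awaitTermination()
```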

6. Analytics-Based Datastore

This is the data store used for analytical purposes: the already processed data is queried and analyzed with analytics tools that feed into BI solutions. The data can also be served through a NoSQL data warehouse technology like HBase, or through interactive use of a Hive database, which provides a metadata abstraction over the data store. Tools include Hive, Spark SQL, HBase, etc.
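As a minimal sketch of querying the analytical store with Spark SQL, the snippet below assumes a Hive metastore is configured and that the batch output has been registered there as a table named daily_summary (a hypothetical name from the earlier sketch).

```python
# Sketch: querying processed data in the analytical store with Spark SQL.
# Assumes a Hive metastore and a hypothetical table named daily_summary.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("analytics-query")
    .enableHiveSupport()
    .getOrCreate()
)

top_countries = spark.sql("""
    SELECT country, SUM(revenue) AS total_revenue
    FROM   daily_summary
    GROUP  BY country
    ORDER  BY total_revenue DESC
    LIMIT  10
""")
top_countries.show()
```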

7. Reporting and Analysis

Insights have to be generated from the processed data, and that is done by reporting and analysis tools, which use their embedded technology and solutions to produce useful graphs, analyses, and insights for the business. Tools include Cognos, Hyperion, etc.

8. Orchestration

Big data solutions consist of repetitive data-related operations, encapsulated in workflows, that transform source data, move data between sources and sinks, load it into stores, and push it into analytical units. Examples include Sqoop, Oozie, Azure Data Factory, etc.
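The tools named above (Sqoop, Oozie, Azure Data Factory) are configured in XML or JSON rather than Python. Purely as a hedged illustration of the same orchestration idea, the sketch below uses an Apache Airflow DAG (a tool not covered in this article) to chain the landing and batch-aggregation steps on a daily schedule; the script paths are hypothetical.

```python
# Illustrative sketch of workflow orchestration using Apache Airflow.
# Script paths and job names are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="daily_big_data_pipeline",
    start_date=datetime(2020, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    land_raw_data = BashOperator(
        task_id="land_raw_data",
        bash_command="spark-submit /jobs/data_lake_landing.py",
    )
    batch_aggregate = BashOperator(
        task_id="batch_aggregate",
        bash_command="spark-submit /jobs/daily_batch_aggregation.py",
    )

    # Run the aggregation only after the landing step succeeds.
    land_raw_data >> batch_aggregate
```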

Conclusion

In this post, we read about big data architecture, which is necessary for implementing these technologies in a company or organization. We hope you liked our article.

Recommended Articles

This has been a guide to Big Data Architecture. Here we discussed what big data is and walked through the architecture of big data along with its block diagram. You can also go through our other suggested articles to learn more:

  1. Big Data Technologies
  2. Big Data Analytics
  3. Careers in Big Data
  4. Big Data interview questions
  5. Big Data Programming Languages | Top 5
