Introduction to Data Visualization Tools
Data visualization is key to understanding the outcome of any solution, regardless of domain. It supports statistical inference by helping us understand the data, explore the patterns within it, and interpret model results. Data visualization tools are one of the main ways anyone can grasp the core relationships in a dataset. They are now widely used in business meetings and stakeholder presentations, where they help the audience quickly identify the core value of a product.
What are Data Visualization Tools?
There are numerous data visualization tools, such as Tableau, QlikView, FusionCharts, Highcharts, Datawrapper, Plotly, and D3.js. Although a large number of such tools are in day-to-day use, one of the most popular plotting tools is matplotlib.pyplot.
Reasons why Matplotlib is among the most widely used data visualization tools:
- Matplotlib is one of the most important plotting libraries in Python.
- The whole plotting module is inspired by the plotting tools available in MATLAB.
- Many people come from Mathematics, Physics, Astronomy, and Statistics, and many engineers and researchers are already used to MATLAB.
- MATLAB is a popular toolbox for scientific computing. So when people started building a Python-specific plotting library, now widely used in Machine Learning, Data Science, and Artificial Intelligence, they were inspired by MATLAB and built a library called matplotlib.
matplotlib.pyplot is widely used to create figures with a plotting area, draw lines in that area, and style the plots attractively.
Let’s dive straight into some very simple examples:
import matplotlib.pyplot as plt
plt.plot([2, 4, 6, 4])
The argument above is a list. plt.plot treats its elements as Y-axis values and uses their indices 0, 1, 2, 3 as the corresponding X-axis values.
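As a minimal sketch of the default indexing described above, the x-values that matplotlib assigns can be read back from the line object returned by plt.plot. The variable name line is only illustrative, and the Agg backend is selected here as an assumption so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line to view the plot with plt.show()
import matplotlib.pyplot as plt

# plt.plot returns a list of line objects; unpack the single line it created.
(line,) = plt.plot([2, 4, 6, 4])

# Only y-values were passed, so the x-values default to the list indices.
print(list(line.get_xdata()))
print(list(line.get_ydata()))
```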
The Y-axis and X-axis can be named with plt.ylabel() and plt.xlabel() respectively.
Similarly, plt.title() gives the plot a title. The title tells us what the plot is all about.
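The labelling and title calls can be sketched as follows. The label and title strings are hypothetical placeholders, not the ones used for the original screenshot, and the Agg backend is an assumption so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line to view the plot with plt.show()
import matplotlib.pyplot as plt

plt.plot([2, 4, 6, 4])

# Name both axes (placeholder text, chosen only for illustration).
plt.ylabel("value")
plt.xlabel("index")

# Give the plot a title (also a placeholder).
plt.title("A simple line plot")

# The current axes object lets us read the settings back.
ax = plt.gca()
print(ax.get_xlabel(), "|", ax.get_ylabel(), "|", ax.get_title())
```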
There is one problem with the above plot (screenshot 1): as you may have noticed, there is no grid. A grid makes it much easier to read values off the plot. Now let’s see how to add one.
plt.plot([1, 2, 3, 4], [1, 4, 9, 16])
In the above line of code, instead of giving one list, we pass two lists, which become the X-axis and Y-axis values. Notice that when the x-axis value is 2, the corresponding y-axis value is 4, i.e., the y-axis values are the squares of the x-axis values.
plt.grid() # grid on
As soon as you call this, the plot is drawn with a grid embedded in it, as shown in screenshot 2.
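Put together, the grid example might look like the sketch below. The dotted line style passed to plt.grid is an optional illustration and not part of the original example; the Agg backend is again an assumption for running without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line to view the plot with plt.show()
import matplotlib.pyplot as plt

# x-values and their squares as y-values.
plt.plot([1, 2, 3, 4], [1, 4, 9, 16])

# Turn the grid on explicitly; the keyword arguments style the grid lines.
plt.grid(True, linestyle=":", linewidth=0.5)

ax = plt.gca()
```

Passing True makes the intent explicit, whereas a bare plt.grid() toggles the current grid state.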
Now, instead of a line plot, let’s draw a different kind of plot.
plt.plot([1, 2, 3, 4], [1, 4, 9, 16], 'ro')
Every X, Y pair can be followed by an optional format string that sets the color and the marker or line style of the plot.
In this case, 'ro' means r for red and o for circle-shaped markers (as shown in screenshot 3).
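To illustrate how the format string is interpreted, the sketch below reads the resulting properties back from the line object and then draws the same points with explicit keyword arguments. The variable names are illustrative and the Agg backend is an assumption for running without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line to view the plot with plt.show()
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

# Format string: 'r' selects red, 'o' selects circle markers, no connecting line.
(points,) = plt.plot(x, y, "ro")
print(points.get_color(), points.get_marker(), points.get_linestyle())

# The same styling spelled out with keyword arguments.
(points_kw,) = plt.plot(x, y, color="red", marker="o", linestyle="None")
print(points_kw.get_color(), points_kw.get_marker())
```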
If matplotlib worked only with lists, it would not be very useful for numerical processing. Fortunately, we can use the NumPy package with it; in fact, all sequences are converted internally to NumPy arrays.
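A small sketch of that conversion, assuming the Agg backend so it runs without a display: the data stored on the line object comes back as a NumPy array even though plain lists were passed in, and NumPy arrays can be plotted directly.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line to view the plot with plt.show()
import matplotlib.pyplot as plt
import numpy as np

# Plain Python lists go in...
(line_from_lists,) = plt.plot([1, 2, 3, 4], [1, 4, 9, 16])

# ...and the stored y-data comes back as a NumPy array.
y_stored = line_from_lists.get_ydata()
print(type(y_stored))

# NumPy arrays can be passed directly, including computed expressions.
x = np.array([1, 2, 3, 4])
(line_from_arrays,) = plt.plot(x, x**2)
```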
Let’s look at a slightly different plot:
import numpy as np
t = np.arange(0., 5., 0.2)
The above line creates values from 0 up to, but not including, 5 in steps of 0.2.
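The range can be checked with a short sketch. The comparison with np.linspace is an addition for illustration, not part of the original example; it is one option when the end value should be included.

```python
import numpy as np

# Start at 0, stop before 5, step 0.2.
t = np.arange(0., 5., 0.2)
print(t.size)        # 25 values
print(t[0], t[-1])   # first and last value; the stop value 5 is excluded

# If the end point should be included, np.linspace takes a count instead of a step.
t_inclusive = np.linspace(0., 5., 26)
print(t_inclusive[-1])
```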
plt.plot(t, t**2, 'b--', label='^2')
plt.plot(t, t**2.2, 'rs', label='^2.2')
plt.plot(t, t**2.5, 'g^', label='^2.5')
In the above lines of code, 'b--' means blue dashes, 'rs' means red squares, and 'g^' means green triangles (refer to screenshot 4).
Calling plt.legend() adds a legend based on each line’s label. Legends make the plot much more readable.
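As a self-contained sketch, the three curves and their legend could be produced as follows. The Agg backend is an assumption so the snippet runs without a display, and the labels list is only there to show what the legend picked up.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line to view the plot with plt.show()
import matplotlib.pyplot as plt
import numpy as np

t = np.arange(0., 5., 0.2)

# Each call adds one styled curve with its own label.
plt.plot(t, t**2, "b--", label="^2")      # blue dashes
plt.plot(t, t**2.2, "rs", label="^2.2")   # red squares
plt.plot(t, t**2.5, "g^", label="^2.5")   # green triangles

# Build the legend from the labels given above.
legend = plt.legend()
labels = [text.get_text() for text in legend.get_texts()]
print(labels)
```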
Let’s look at some more properties. To make the line thicker, a simple parameter called linewidth does the job.
x = [1, 2, 3, 4]
y = [1, 4, 9, 16]
plt.plot(x, y, linewidth=5.0)
Many other parameters are available; they are listed in the documentation of the plot function in matplotlib.pyplot (https://matplotlib.org/api/pyplot_api.html).
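As one illustration of combining such parameters, the sketch below sets linewidth together with linestyle and color, two other standard line properties chosen here only as examples. The Agg backend is an assumption for running without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line to view the plot with plt.show()
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

# Several line properties can be passed as keyword arguments in one call.
(line,) = plt.plot(x, y, linewidth=5.0, linestyle="--", color="green")

print(line.get_linewidth(), line.get_linestyle(), line.get_color())
```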
Another interesting feature is setting properties with setp.
x1 = [1, 2, 3, 4]
y1 = [1, 4, 9, 16]
The y1 values are the squares of the x1 values.
x2 = [1, 2, 3, 4]
y2 = [2, 4, 6, 8]
The y2 values are simply twice the x2 values.
lines = plt.plot(x1, y1, x2, y2)
With the above line we can plot both sets of values in a single call. It plots x1 vs y1 and x2 vs y2, and stores the resulting line objects in a variable called lines. We can also change the properties of those lines using keyword arguments.
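A short sketch of what the single call returns, assuming the Agg backend so it runs without a display: lines is a list with one line object per X, Y pair, in the order the pairs were given.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line to view the plot with plt.show()
import matplotlib.pyplot as plt

x1 = [1, 2, 3, 4]
y1 = [1, 4, 9, 16]   # squares of x1
x2 = [1, 2, 3, 4]
y2 = [2, 4, 6, 8]    # twice x2

# One call, two X, Y pairs, two line objects.
lines = plt.plot(x1, y1, x2, y2)

print(len(lines))
for line in lines:
    print(list(line.get_ydata()))
```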
plt.setp(lines, color='r', linewidth=2.0)
Here setp stands for set properties, lines holds the line objects for both (x1, y1) and (x2, y2), and color and linewidth are the properties being set. The above line of code uses keyword arguments (refer to screenshot 6).
plt.setp(lines, 'color', 'g', 'linewidth', 2.0)
The above line of code uses MATLAB-style syntax.
Here too, lines refers to the line objects for both (x1, y1) and (x2, y2). The properties are passed as two string/value pairs: 'color', 'g' and 'linewidth', 2.0 (refer to screenshot 6).
Either way can be used to style the lines.
- The first way is the native Python way.
- The second way is preferred by people with a MATLAB background.
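Both calling styles can be compared side by side in the sketch below. The helper lists that record the colors, and the final call that styles only the second line, are additions for illustration; the Agg backend is an assumption for running without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line to view the plot with plt.show()
import matplotlib.pyplot as plt

lines = plt.plot([1, 2, 3, 4], [1, 4, 9, 16], [1, 2, 3, 4], [2, 4, 6, 8])

# Python style: keyword arguments, applied to every line in the list.
plt.setp(lines, color="r", linewidth=2.0)
colors_after_keywords = [line.get_color() for line in lines]

# MATLAB style: alternating property-name strings and values.
plt.setp(lines, "color", "g", "linewidth", 2.0)
colors_after_pairs = [line.get_color() for line in lines]

# setp also accepts a single line, so one curve can be styled on its own.
plt.setp(lines[1], linestyle="--")

print(colors_after_keywords, colors_after_pairs)
```

Because setp is applied to the whole list, both curves end up with the same color; passing a single line object is one way to style them differently.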
Conclusion – Data Visualization Tools
In this data visualization tools post, we have introduced the basics of visualizing data in Python. To be more specific, we have seen:
- How to chart data with line plots
- How to summarise the relationship between variables with scatter plots
This has been a guide to data visualization tools. Here we have studied the basic concepts and tools of data visualization with their examples.