Introduction to Data Visualization Tools
Data visualization helps us handle and analyze complex information. Tools such as Matplotlib, Tableau, FusionCharts, QlikView, Highcharts, Plotly, and D3.js present data graphically as charts, graphs, and maps. With them, data visualization designers can easily create visual representations of large datasets, which in turn supports effective decision-making by surfacing insights from the data.
What are Data Visualization Tools?
There are numerous data visualization tools, such as Tableau, QlikView, FusionCharts, Highcharts, Datawrapper, Plotly, and D3.js. Although a great many tools are in day-to-day use, one of the most popular plotting tools is matplotlib.pyplot.
Reasons why Matplotlib is so widely used among data visualization tools:
- Matplotlib is one of the most important plotting libraries in Python.
- The whole plotting module is inspired by the plotting tools available in MATLAB.
- The main reason is that many people come from mathematics, physics, astronomy, and statistics, and many engineers and researchers are used to MATLAB.
- MATLAB is a popular scientific computing environment. So when people set out to build a Python plotting library for fields such as machine learning, data science, and artificial intelligence, they took inspiration from MATLAB and built a library called Matplotlib.
matplotlib.pyplot: matplotlib.pyplot is widely used to create figures with a plotting area, draw lines in them, and style the resulting plots attractively.
Examples of Data Visualization Tools
Below are the examples mentioned:
import matplotlib.pyplot as plt
plt.plot([2,4, 6, 4])
The argument above is a single list. plt.plot treats its elements as Y-axis values and uses their indices 0, 1, 2, 3 as the corresponding X-axis values.
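The default X values can be checked directly on the line object that plt.plot returns. This is only a minimal sketch: the variable names (line, x_values, y_values) are illustrative and not part of the original example.

```python
import matplotlib.pyplot as plt

# Passing a single list: Matplotlib treats it as the Y values
# and uses the indices 0, 1, 2, ... as the X values.
plt.figure()
(line,) = plt.plot([2, 4, 6, 4])  # plt.plot returns a list of line objects

x_values = line.get_xdata().tolist()  # the automatically generated indices
y_values = line.get_ydata().tolist()  # the values we passed in

print("x:", x_values)
print("y:", y_values)
```

In an interactive session, calling plt.show() afterwards displays the figure.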
The functions plt.ylabel() and plt.xlabel() label the Y-axis and X-axis respectively (that is, they name both axes).
The function plt.title() gives the plot a title, which tells the reader what the plot is about.
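A short sketch of labelling the axes and adding a title follows. The label and title strings are hypothetical placeholders chosen for illustration, since the original text does not show them.

```python
import matplotlib.pyplot as plt

plt.figure()
plt.plot([2, 4, 6, 4])

# The strings below are placeholders, not taken from the original article.
plt.ylabel("Numbers")        # name the Y-axis
plt.xlabel("Indices")        # name the X-axis
plt.title("My first plot")   # describe what the plot shows

ax = plt.gca()  # the current axes, used here only to read the labels back
print(ax.get_xlabel(), "|", ax.get_ylabel(), "|", ax.get_title())
```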
There is one problem with the above plot (screenshot 1): as you may have noticed, there is no grid. A grid makes it much easier to read values off the plot. Now let's see how to get one.
plt.plot([1, 2, 3, 4], [1, 4, 9, 16])
In the line above, instead of one list we pass two lists, which become the X-axis and Y-axis values. Notice that when the x value is 2, the corresponding y value is 4; that is, the y values are the squares of the x values.
plt.grid() # grid on
As soon as you call it, the plot is drawn with a grid on it.
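Putting the two calls together, a minimal sketch of a gridded plot might look like this. Building y from x with a list comprehension is an illustrative choice, not something the original code does.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [value ** 2 for value in x]  # squares of x: [1, 4, 9, 16]

plt.figure()
plt.plot(x, y)
plt.grid(True)  # equivalent in effect to plt.grid() on a fresh plot: grid on

ax = plt.gca()  # the current axes, kept so the grid state can be inspected
```

Passing True explicitly avoids accidentally toggling the grid off if it was already on.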
Now, instead of a line plot, let's draw a different kind of plot with a different example.
plt.plot([1, 2, 3, 4], [1, 4, 9, 16], 'ro')
Every X, Y pair of arguments can be followed by an optional format string that sets the color and the marker or line style of the plot.
In this case, 'ro' means r for red and o for circle-shaped markers.
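To see what the format string actually sets, the resulting line object can be inspected. This is a small sketch; the variable names are illustrative.

```python
import matplotlib.pyplot as plt

plt.figure()
# 'ro' = red ('r') circle markers ('o'); no line style is given,
# so the points are not connected.
(points,) = plt.plot([1, 2, 3, 4], [1, 4, 9, 16], "ro")

color = points.get_color()
marker = points.get_marker()
linestyle = points.get_linestyle()

print(color, marker, linestyle)
```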
If Matplotlib worked only with lists, it would be of limited use for numeric processing. In practice we usually use the NumPy package, and Matplotlib converts every sequence to a NumPy array internally.
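The internal conversion can be observed with a quick check. The sketch below assumes nothing beyond plt.plot and NumPy; the variable names are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

plt.figure()

# Plain Python lists go in...
(from_lists,) = plt.plot([1, 2, 3, 4], [1, 4, 9, 16])
# ...but the data stored on the line object is a NumPy array.
stored_type = type(from_lists.get_ydata())

# NumPy arrays can of course be passed directly as well.
x = np.array([1, 2, 3, 4])
(from_arrays,) = plt.plot(x, x ** 2)

print(stored_type)
```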
Let's look at a slightly different plot:
import numpy as np
t = np.arange(0., 5., 0.2)
The line above creates values from 0 up to (but not including) 5 in steps of 0.2.
plt.plot(t, t**2, 'b--', label='^2')
plt.plot(t, t**2.2, 'rs', label='^2.2')
plt.plot(t, t**2.5, 'g^', label='^2.5')
In the lines above, 'b--' means blue dashes, 'rs' means red squares, and 'g^' means green triangles.
Calling plt.legend() adds a legend based on each line's label. Legends make the plot far more readable.
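Assembled into one runnable piece, the three styled curves and their legend could be sketched as follows. The legend_labels variable is an illustrative addition used only to read the legend entries back.

```python
import numpy as np
import matplotlib.pyplot as plt

# Evenly spaced values starting at 0 with step 0.2; the stop value 5 is excluded.
t = np.arange(0.0, 5.0, 0.2)

plt.figure()
plt.plot(t, t ** 2, "b--", label="^2")     # blue dashes
plt.plot(t, t ** 2.2, "rs", label="^2.2")  # red squares
plt.plot(t, t ** 2.5, "g^", label="^2.5")  # green triangles

# The legend is built from the label given to each plot call.
legend = plt.legend()
legend_labels = [text.get_text() for text in legend.get_texts()]

print(legend_labels)
```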
To make a line thicker, use the linewidth parameter.
x = [1, 2, 3, 4]
y = [1, 4, 9, 16]
plt.plot(x, y, linewidth=5.0)
Many other parameters are available; see the documentation of the plot function in matplotlib.pyplot (https://matplotlib.org/api/pyplot_api.html).
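For comparison, the sketch below draws the same data once with the default width and once with linewidth=5.0, then reads both widths back. The names thin_line and thick_line are illustrative, and the default width depends on the local Matplotlib configuration.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

plt.figure()
(thin_line,) = plt.plot(x, y)                  # default width from the configuration
(thick_line,) = plt.plot(x, y, linewidth=5.0)  # explicit width in points

print(thin_line.get_linewidth(), thick_line.get_linewidth())
```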
Another interesting function is setp, which sets properties:
- x1 = [1, 2, 3, 4]; y1 = [1, 4, 9, 16]
The y1 values are the squares of the x1 values.
- x2 = [1, 2, 3, 4]; y2 = [2, 4, 6, 8]
The y2 values are twice the x2 values.
- lines = plt.plot(x1, y1, x2, y2)
With the line above we plot both sets of values in a single call: it plots x1 against y1 and x2 against y2, and we store the resulting line objects in a variable called lines. We can then change the properties of those lines using keyword arguments.
- plt.setp(lines, color='r', linewidth=2.0)
Here setp stands for "set properties". lines holds the line objects for both (x1, y1) and (x2, y2), and color and linewidth are the properties being set on all of them. This line is written using keyword arguments (refer to screenshot 6).
- plt.setp(lines, 'color', 'g', 'linewidth', 2.0)
The line above uses the MATLAB-style syntax.
Here too, lines covers both plotted lines. The properties are passed as two name/value pairs: 'color', 'g' and 'linewidth', 2.0.
Either way works for setting the line properties:
- The first is the native Python way.
- The second is preferred by people with a MATLAB background.
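Both calling styles can be tried side by side. The sketch below applies each one to the same pair of lines and records the colors afterwards; the colors_after_* names are illustrative.

```python
import matplotlib.pyplot as plt

x1 = [1, 2, 3, 4]
y1 = [1, 4, 9, 16]  # squares of x1
x2 = [1, 2, 3, 4]
y2 = [2, 4, 6, 8]   # twice x2

plt.figure()
lines = plt.plot(x1, y1, x2, y2)  # one call, two line objects

# Python style: keyword arguments, applied to every line in the list.
plt.setp(lines, color="r", linewidth=2.0)
colors_after_keywords = [line.get_color() for line in lines]

# MATLAB style: alternating property-name / value pairs.
plt.setp(lines, "color", "g", "linewidth", 2.0)
colors_after_pairs = [line.get_color() for line in lines]

print(colors_after_keywords, colors_after_pairs)
```

Because setp receives the whole list, each call restyles both lines at once; passing a single element such as lines[0] would restyle only that line.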
In this post on data visualization tools, we have had an introduction to visualizing data in Python. More specifically, we have seen how to chart data with line plots and how to summarise the relationship between variables with scatter plots.
This has been a guide to data visualization tools. Here we have studied the basic concepts and tools of data visualization with their examples.