Updated April 1, 2023

Introduction to Sparse Matrix in Python

This article gives an overview of sparse matrices in Python. A matrix is sparse if most of its values are 0. Sparse matrices are widely used in machine learning for data encoding and in other fields such as natural language processing. Their main advantages are lower storage requirements and shorter computing time: since most of the values are zero, a data structure that stores and visits only the non-zero values needs less memory and does less work.

Syntax of Sparse Matrix

The following matrix is an example of a sparse matrix:

0 0 5 0 9
0 0 0 7 0
0 0 0 0 0
0 1 7 0 0

The sparsity of a matrix is calculated using the formula:

Sparsity = (number of zero values) / (total number of elements in the matrix)

The example above has 15 zero values out of 4 x 5 = 20 elements, so its sparsity is 15/20 = 0.75, or 75%. A sparse storage format is therefore the better choice when a matrix has only a few non-zero values.
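
The sparsity calculation can be checked with a few lines of NumPy. This is a minimal sketch; the variable names (matrix, zero_count, sparsity) are chosen here for illustration and are not part of any library.

```python
import numpy as np

# The 4 x 5 example matrix from the text.
matrix = np.array([
    [0, 0, 5, 0, 9],
    [0, 0, 0, 7, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 7, 0, 0],
])

# Sparsity = number of zero values / total number of elements.
zero_count = matrix.size - np.count_nonzero(matrix)
sparsity = zero_count / matrix.size

print(zero_count)  # 15
print(sparsity)    # 0.75
```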

The scipy.sparse module provides seven different types of sparse matrix.

Block Sparse Row Matrix (bsr)
Coordinate Format Matrix (coo)
Compressed Sparse Column Matrix (csc)
Compressed Sparse Row Matrix (csr)
Sparse Matrix with Diagonal Storage (dia)
Dictionary of Keys Based Sparse Matrix (dok)
List of Lists Sparse Matrix (lil)
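
As a rough illustration, the same matrix can be converted to each of the seven formats with SciPy's asformat() method. The sketch below assumes SciPy is installed; the names dense, formats and converted are illustrative only.

```python
import numpy as np
from scipy import sparse

dense = np.array([
    [0, 0, 5, 0, 9],
    [0, 0, 0, 7, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 7, 0, 0],
])

# Start from one format and convert to each of the seven with asformat().
coo = sparse.coo_matrix(dense)
formats = ["bsr", "coo", "csc", "csr", "dia", "dok", "lil"]
converted = {name: coo.asformat(name) for name in formats}

for name, m in converted.items():
    # Every format holds the same matrix; only the internal layout differs.
    print(name, type(m).__name__, np.array_equal(m.toarray(), dense))
```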

Choosing the Right Sparse Matrix Type

It is important to know which type of sparse matrix to use when, because the right choice makes operations more efficient.
When a new sparse matrix must be built from scratch, it is advisable to use either a list of lists (lil) matrix or a dictionary of keys (dok) matrix.
These two types are, however, not efficient for arithmetic.
When there is a need for multiplication or traversal, compressed sparse column (csc) or compressed sparse row (csr) is the best option; the former is efficient at slicing columns, and the latter at slicing rows.
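
One possible workflow that follows this advice is sketched below: the matrix is filled in using the lil type, then converted to csr and csc for slicing and multiplication. The values reuse the 4 x 5 example matrix, and the names builder, row_0, col_2 and product are illustrative.

```python
import numpy as np
from scipy.sparse import lil_matrix

# Build incrementally in LIL format, which is cheap to modify.
builder = lil_matrix((4, 5), dtype=np.int64)
builder[0, 2] = 5
builder[0, 4] = 9
builder[1, 3] = 7
builder[3, 1] = 1
builder[3, 2] = 7

# Convert once the structure is final: CSR for row work, CSC for column work.
csr = builder.tocsr()
csc = builder.tocsc()

row_0 = csr[0, :].toarray()   # row slicing suits CSR
col_2 = csc[:, 2].toarray()   # column slicing suits CSC
product = csr @ np.ones(5, dtype=np.int64)  # matrix-vector product (row sums here)

print(row_0)
print(col_2.ravel())
print(product)
```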

Drawbacks of Sparse Matrix

Storing a sparse matrix in an ordinary dense layout has two major drawbacks: space complexity and time complexity.

1. Space Complexity

In practice, many large matrices are sparse. A large matrix requires a lot of memory, for example a link matrix that records the links from one website to another. A smaller example is a matrix of word occurrences in a book against all the words in the language. In both cases most of the entries are zero, yet a dense layout allocates memory for every one of them.
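
The memory difference can be made concrete with a small comparison. The sketch below uses a hypothetical 1000 x 1000 matrix with one non-zero value per row (not taken from the article) and counts the bytes of the arrays that a csr matrix keeps internally.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical 1000 x 1000 matrix with one non-zero value per row.
n = 1000
dense = np.zeros((n, n), dtype=np.float64)
dense[np.arange(n), np.arange(n)] = 1.0

sparse_version = csr_matrix(dense)

# The dense array allocates 8 bytes for every element, zero or not.
dense_bytes = dense.nbytes

# A CSR matrix keeps three arrays: the values, their column indices,
# and one row pointer per row (plus one).
sparse_bytes = (
    sparse_version.data.nbytes
    + sparse_version.indices.nbytes
    + sparse_version.indptr.nbytes
)

print(dense_bytes)   # 8000000
print(sparse_bytes)  # far smaller; the exact figure depends on the index dtype
```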

2. Time Complexity

Operations on a sparse matrix that is stored densely, such as adding or multiplying two matrices, may take a long time even though most of the results are going to be zero. The problem grows with the size of the matrix. It is compounded in machine learning, where many methods operate on every row and column, which results in higher execution time.
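
To illustrate where the time goes, the following pure-Python sketch multiplies the 4 x 5 example matrix by a vector in two ways and counts the multiplications. The vector x and all variable names are made up for this illustration.

```python
# The 4 x 5 example matrix, stored densely and as (row, column, value) triples.
dense = [
    [0, 0, 5, 0, 9],
    [0, 0, 0, 7, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 7, 0, 0],
]
triples = [
    (i, j, v)
    for i, row in enumerate(dense)
    for j, v in enumerate(row)
    if v != 0
]
x = [1, 2, 3, 4, 5]

# Dense product: one multiplication per element, zero or not.
dense_result = [sum(v * x[j] for j, v in enumerate(row)) for row in dense]
dense_multiplications = len(dense) * len(dense[0])

# Sparse product: one multiplication per non-zero value.
sparse_result = [0] * len(dense)
for i, j, v in triples:
    sparse_result[i] += v * x[j]
sparse_multiplications = len(triples)

print(dense_result, dense_multiplications)    # [60, 28, 0, 23] 20
print(sparse_result, sparse_multiplications)  # [60, 28, 0, 23] 5
```

Both approaches give the same result, but the second does a quarter of the work on this matrix, and the gap widens as the share of zeros grows.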

Real-life examples of a sparse matrix:

Whether a user has watched a movie in a movie catalog.
Whether a user has purchased a product listed in a product catalog.
The number of times each song in a song catalog has been listened to.
Natural language processing, when manipulating text documents.
Recommendations for products in a product catalog.
Working with a large number of images that contain many black pixels.
Normalized word-frequency scores over a dictionary.

Examples of Sparse Matrix in Python

Given below are the examples of Sparse Matrix in Python:

Example #1

Code:

# Imports
import numpy as num
from scipy.sparse import csr_matrix, csc_matrix

print("demo of sparse matrix in python")
print("creating and printing csr matrix")
# Empty 3 x 2 CSR matrix of int8, converted to a dense array for printing.
csrmatrixeg = csr_matrix((3, 2), dtype=num.int8).toarray()
print(csrmatrixeg)
print("next sparse matrix")
# Row indices, column indices and values of the entries.
r = num.array([0, 1, 0, 2, 2, 0])
c = num.array([0, 0, 2, 0, 0, 2])
d = num.array([1, 2, 5, 7, 9, 3])
# Values that share the same (row, column) position are summed.
op = csr_matrix((d, (r, c)), shape=(3, 4)).toarray()
print(op)
print("demo of creating csc matrix in python")
# Empty 3 x 5 CSC matrix of int8.
egmat = csc_matrix((3, 5), dtype=num.int8).toarray()
print(egmat)
r1 = num.array([0, 0, 0, 2, 2, 0])
c1 = num.array([0, 0, 1, 0, 0, 0])
d1 = num.array([1, 2, 0, 0, 9, 3])
# Build the CSC matrix from the second set of arrays.
op1 = csc_matrix((d1, (r1, c1)), shape=(4, 4)).toarray()
print(op1)

Example #2

Code:

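# Imports
import numpy as num
from scipy.sparse import coo_matrix

print("demo of sparse matrix in python")
print("creating and printing coo matrix")
# Empty 3 x 2 COO matrix of int8, converted to a dense array for printing.
coomateg = coo_matrix((3, 2), dtype=num.int8).toarray()
print(coomateg)
print("coo sparse matrix")
# Row indices, column indices and values of the entries.
r = num.array([0, 1, 0, 2, 2, 0])
c = num.array([0, 0, 2, 0, 0, 2])
d = num.array([1, 2, 5, 7, 9, 3])
# Values that share the same (row, column) position are summed by toarray().
op = coo_matrix((d, (r, c)), shape=(3, 4)).toarray()
print(op)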

An alternative data structure is needed when working with a sparse matrix: one that stores only the non-zero values and ignores the zeros.

Several such data structures exist:

Dictionary: Here, a (row, column) index pair is mapped to the value at that position.
List of Lists: Here, each row of the matrix is stored as a list, and each entry of that sub list holds a column index and the value.
Coordinate List: The matrix is stored as a list of tuples, each holding a row index, a column index and the corresponding value.
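
These three layouts can be sketched in plain Python for the 4 x 5 example matrix. The names as_dict, as_lists and as_coords are illustrative; SciPy's dok, lil and coo types follow the same ideas with their own internal details.

```python
dense = [
    [0, 0, 5, 0, 9],
    [0, 0, 0, 7, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 7, 0, 0],
]

# Dictionary: (row, column) -> value, non-zero entries only.
as_dict = {
    (i, j): v
    for i, row in enumerate(dense)
    for j, v in enumerate(row)
    if v != 0
}

# List of lists: one sub list per row, holding (column, value) pairs.
as_lists = [[(j, v) for j, v in enumerate(row) if v != 0] for row in dense]

# Coordinate list: (row, column, value) tuples.
as_coords = [(i, j, v) for (i, j), v in as_dict.items()]

print(as_dict)
print(as_lists)
print(as_coords)
```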

Compressed sparse row and compressed sparse column are the other commonly used data structures. Compressed sparse row is used more often in machine learning because it supports efficient row access and matrix multiplication.
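
For reference, the sketch below shows the three arrays that SciPy's csr_matrix keeps for the 4 x 5 example matrix, and how one row is read back from them. The variable names are illustrative.

```python
import numpy as np
from scipy.sparse import csr_matrix

dense = np.array([
    [0, 0, 5, 0, 9],
    [0, 0, 0, 7, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 7, 0, 0],
])
csr = csr_matrix(dense)

print(csr.data)     # non-zero values, row by row
print(csr.indices)  # column index of each value
print(csr.indptr)   # where each row starts in data and indices

# Row 3 occupies positions indptr[3] up to (not including) indptr[4].
start, end = csr.indptr[3], csr.indptr[4]
row_3 = list(zip(csr.indices[start:end].tolist(), csr.data[start:end].tolist()))
print(row_3)        # (column, value) pairs of row 3
```

Because each row's values sit next to each other in memory, row slicing and row-by-row products only touch the entries they need.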

Conclusion

This article described sparse matrices in Python: the various types of sparse matrix, their uses and their efficiency, with examples. It also explained how to calculate the sparsity of a matrix and when to use which type of sparse matrix.