Introduction to PySpark Round
round is a function in PySpark that rounds the values of a column in a PySpark data frame. It rounds each value to the given scale (number of decimal places) using the HALF_UP rounding mode. PySpark also provides related functions for other kinds of rounding: ceil always rounds a value up, and floor always rounds it down.
The round function rounds each value to the nearest value at the requested number of decimal places. It returns a new column; for a floating-point input column, the result is also a floating-point column. Which function to use (round, ceil, or floor) depends on the direction in which the values need to be rounded.
Syntax:
The syntax for the function is:
from pyspark.sql.functions import round, col
b.select("*",round("ID",2)).show()
- b: The data frame on which the round function is used.
- select(): The select operation. Here "*" selects all existing columns of the data frame.
- round(): The round function. It takes two parameters: the column name and the scale, that is, the number of decimal places to round to.
How does the ROUND operation work in PySpark?
The round operation works on a data frame column: it takes the column as a parameter and applies the rounding to every value in it. Besides the column, it accepts one optional parameter, the scale, which decides the decimal position to round to. If no scale is given, it defaults to 0 and each value is rounded to the nearest whole number. Ties are rounded away from zero (HALF_UP), so 2.5 becomes 3.0.
The rounded values are returned as a new column, which can be collected in a new data frame or selected alongside the existing columns. Depending on the need, we can use round to round to the nearest value, ceil to round up, or floor to round down the data elements in a data frame.
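The HALF_UP behavior can be illustrated without starting a Spark session. The following is a minimal sketch in plain Python, not PySpark code: `round_half_up` is a hypothetical helper built on the standard `decimal` module, written here only to mimic the rounding mode. It also shows that Python's built-in `round` treats ties differently.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_half_up(value, scale=0):
    """Hypothetical helper: round value to `scale` decimal places, ties away from zero."""
    quantum = Decimal(1).scaleb(-scale)  # scale=2 gives Decimal('0.01')
    return float(Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP))

print(round_half_up(2.5))        # 3.0  (tie goes away from zero)
print(round(2.5))                # 2    (Python's built-in rounds ties to even)
print(round_half_up(21.528, 2))  # 21.53
```

Keep in mind that `from pyspark.sql.functions import round` replaces the built-in `round` name in the current namespace, so the two should not be confused in the same script.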
Let’s check the creation and usage with some coding examples.
Examples
Let’s see a few examples. Let’s start by creating simple data.
data1 = [{'Name':'Jhon','ID':21.528,'Add':'USA'},{'Name':'Joe','ID':3.69,'Add':'USA'},{'Name':'Tina','ID':2.48,'Add':'IND'},{'Name':'Jhon','ID':22.22, 'Add':'USA'},{'Name':'Joe','ID':5.33,'Add':'INA'}]
Sample data is created with Name, ID, and Add as the fields.
a = sc.parallelize(data1)
An RDD is created using sc.parallelize.
b = spark.createDataFrame(a)
b.show()
The data frame is created using spark.createDataFrame.
Let us use the round function to round the values of the ID column.
b.select("*",round("ID")).show()
This selects all columns of the data frame and rounds every value of the ID column to the nearest whole number. The result is returned as a new column, which can be used further for analysis.
The ceil function is PySpark's round-up function: it takes the column value, rounds it up to the next whole number, and returns the result as a new column in the PySpark data frame.
from pyspark.sql.functions import ceil, col
b.select("*",ceil("ID")).show()
This is an example of a Round-Up Function.
The floor function is the round-down function: it takes the column value, rounds it down to the previous whole number, and returns the result as a new column in the data frame.
from pyspark.sql.functions import floor, col
b.select("*",floor("ID")).show()
This is an example of the Round Down Function.
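To preview what rounding up and rounding down do to the sample ID values, here is a small sketch in plain Python. It uses `math.ceil` and `math.floor` from the standard library as stand-ins for the PySpark functions of the same name, on the assumption that both behave the same way on these positive values.

```python
import math

# ID values from the sample data above
ids = [21.528, 3.69, 2.48, 22.22, 5.33]

rounded_up = [math.ceil(x) for x in ids]     # always toward the next whole number
rounded_down = [math.floor(x) for x in ids]  # always toward the previous whole number

print(rounded_up)    # [22, 4, 3, 23, 6]
print(rounded_down)  # [21, 3, 2, 22, 5]
```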
With no scale given, the round function rounds the column value to the nearest whole number and returns it as a new column in the PySpark data frame.
b.select("*",round("ID")).show()
With a scale given, the round function rounds the column value to that number of decimal places, here 2, and returns it as a new column in the data frame.
b.select("*",round("ID",2)).show()
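The two round calls above differ only in the scale. As a rough check of the values to expect for the sample IDs, the sketch below reproduces both cases in plain Python; `half_up` is a hypothetical helper based on the `decimal` module that imitates HALF_UP rounding, not part of PySpark.

```python
from decimal import Decimal, ROUND_HALF_UP

# ID values from the sample data above
ids = [21.528, 3.69, 2.48, 22.22, 5.33]

def half_up(value, places):
    """Hypothetical helper: HALF_UP rounding to `places` decimal places."""
    quantum = Decimal(1).scaleb(-places)
    return float(Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP))

nearest_whole = [half_up(x, 0) for x in ids]  # mirrors round("ID")
two_places = [half_up(x, 2) for x in ids]     # mirrors round("ID", 2)

print(nearest_whole)  # [22.0, 4.0, 2.0, 22.0, 5.0]
print(two_places)     # [21.53, 3.69, 2.48, 22.22, 5.33]
```

Only 21.528 changes at scale 2, because the other values already have at most two decimal places.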
Note:
- round is a rounding function in PySpark.
- PySpark round rounds the data to a given number of decimal places in the data frame.
- Depending on the value, round may move it up or down; ceil always rounds up and floor always rounds down.
- PySpark round function results can be used to create new columns in the data frame.
- round uses the HALF_UP rounding mode; ceil and floor are separate functions for one-directional rounding.
Conclusion
In this article, we saw the use of the round operation in PySpark. Through several examples, we tried to understand how the round function works in PySpark and how it is used at the programming level.
We also saw how it works on a column and how it compares with ceil and floor in a PySpark data frame. The syntax and examples helped us understand the function more precisely.