Spark Functions

Article by Arpit Anand | Reviewed by Ravi Rathore

Updated March 23, 2023


Introduction to Spark Functions

Spark functions are operations on a dataset, mostly related to analytics computation. The Spark framework is known for processing huge datasets in little time because of its in-memory processing capabilities. Several kinds of functions are associated with Spark for data processing, such as custom transformations, Spark SQL functions, column functions, and user-defined functions (UDFs). Spark represents a dataset as a DataFrame, and these functions help to add, write, modify, and remove its columns. Spark supports this built-in syntax in multiple languages such as R, Python, Java, and Scala, and the set of functions keeps evolving with new features.


List of Spark Functions

Let us now look at some of the functions commonly used in Spark.
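The snippets in this article are fragments rather than a complete program. As a rough sketch of the setup they assume (the application name below is just a placeholder, and in spark-shell the spark session and implicits already exist), something like the following would let you run them:

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._   // lit, col, udf, avg, max, ...

val spark = SparkSession.builder()
  .appName("SparkFunctionsExamples")      // placeholder application name
  .master("local[*]")                     // run locally for experimentation
  .getOrCreate()

import spark.implicits._                  // enables List(...).toDF("Name")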

1. Custom Transformation

Whenever we add or remove columns or rows of a DataFrame, we can wrap that logic in a custom transformation function.

The generic method signature for a custom transformation is:

def customTransformName(arg1: String)(df: DataFrame): DataFrame = {
  // code that returns a DataFrame
}

Suppose we are working with a DataFrame and adding columns to it using withColumn.

The same thing can be achieved with custom transformations in Spark: we can refactor the code and eliminate the intermediate variables, which keeps the code well structured.

Let us see an example of a custom transformation.

Suppose we have a DataFrame:

val df = List(("Arpit"), ("Anand")).toDF("Name")

Now we want to add two columns, Address and State, to this DataFrame:

val df2 = df.withColumn("Address", lit("Bangalore"))
val df3 = df2.withColumn("State", lit("karnataka"))
df3.show()

This shows the resulting DataFrame with the new Address and State columns added:

+-----+---------+---------+
| Name|  Address|    State|
+-----+---------+---------+
|Arpit|Bangalore|karnataka|
|Anand|Bangalore|karnataka|
+-----+---------+---------+

The same result can be achieved with custom transformations: we call the transform method on the DataFrame and pass it our transformation functions.

Let us see how it works:

def Address()(df: DataFrame): DataFrame = {
  df.withColumn("Address", lit("Bangalore"))
}

def State()(df: DataFrame): DataFrame = {
  df.withColumn("State", lit("karnataka"))
}

Applying the .transform function to the DataFrame we created above looks like this:

df.transform(Address())
  .transform(State())
  .show()

This produces the same DataFrame as before, but the transform method lets us chain the steps without intermediate variables.
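The first parameter list in the generic signature shown earlier is there for arguments. As a hedged sketch, withConstantColumn below is a hypothetical helper that generalizes Address() and State() into a single reusable transformation:

// hypothetical helper: adds a column holding a constant value
def withConstantColumn(colName: String, value: String)(df: DataFrame): DataFrame =
  df.withColumn(colName, lit(value))

// same result as chaining Address() and State()
df.transform(withConstantColumn("Address", "Bangalore"))
  .transform(withConstantColumn("State", "karnataka"))
  .show()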

2. Spark SQL Functions

Spark comes with Spark SQL, which provides many built-in functions that help with SQL-style operations.

Some of the Spark SQL functions are: count, avg, collect_list, first, mean, max, variance, and sum.

Suppose we want to count the number of rows in the DataFrame we made.

A simple .count() call returns the number of rows in the DataFrame:

df3.count()

2 // the output is 2, the number of rows in the DataFrame

There are many more SQL functions like this that we can use throughout the code.

These functions operate on the columns of a DataFrame and produce the result type they are designed for.
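As a rough sketch of a few of the aggregate functions named above (the people DataFrame and its values are made up purely for illustration):

// made-up sample data for illustration
val people = List(("Arpit", 25), ("Anand", 30)).toDF("Name", "Age")

people.agg(
  avg("Age"),           // average of the Age column
  max("Age"),           // largest Age value
  collect_list("Name")  // all names collected into one array
).show()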

3. Column Functions

Column functions return Column objects. They can be used like the Spark SQL functions, applied to the columns on which they need to operate.

Here we pass a column value as an input parameter, a set of operations is performed on that column, and the result is returned back into the DataFrame.
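For example, built-in column functions such as upper, length, and concat each take one or more Column objects and return a new Column, which can then be attached to the DataFrame with withColumn. A small sketch using the df defined earlier:

df.withColumn("NameUpper", upper(col("Name")))             // upper-case the name
  .withColumn("NameLength", length(col("Name")))           // number of characters
  .withColumn("Greeting", concat(lit("Hi "), col("Name"))) // combine a literal with a column
  .show()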

4. User-Defined Functions (UDFs)

Spark also supports user-defined functions, with which a user can create their own functions with whatever rules they need and use them throughout the code whenever required.

They are similar to column functions, but they are written in plain Scala on top of the Spark API.

Now let us create a UDF and see how it can help in a Spark application:

def toUpperFunction(str: String): Option[String] = {
  val s = Option(str).getOrElse(return None)
  Some(s.toUpperCase())
}
val toUpper = udf[Option[String], String](toUpperFunction)

Here we made a UDF that converts a word to upper case, so whatever column value we pass to it will be returned in uppercase.

This is a function we define ourselves and then use on the DataFrames in a Spark application.

Using the DataFrame we had earlier:

val df = List(("Arpit"), ("Anand")).toDF("Name")

Passing our function over the Name column gives us a new DataFrame with the changed values:

df.withColumn("toUpper", toUpper(col("Name"))).show()

+-----+-------+
| Name|toUpper|
+-----+-------+
|Arpit|  ARPIT|
|Anand|  ANAND|
+-----+-------+

So we can see that these functions can be explicitly defined by the user and applied to the columns to produce the desired results.
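If the same logic is also needed inside Spark SQL queries, the function can be registered by name. A minimal sketch (the view name people and the SQL function name toUpperUdf are placeholders chosen here):

// register the Scala function so SQL queries can call it by name
spark.udf.register("toUpperUdf", toUpperFunction _)

df.createOrReplaceTempView("people")   // placeholder view name
spark.sql("SELECT Name, toUpperUdf(Name) AS toUpper FROM people").show()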

Conclusion

From the article above, we can see that Spark offers a large number of functions that we can reuse again and again in a Spark application.

Recommended Articles

This is a guide to Spark Functions. Here we discuss some of the functions used in Spark along with examples and how they work. You may also have a look at the following articles to learn more –

  1. Spark Dataset
  2. Join in Spark SQL
  3. Spark Stages
  4. Top 5 Tools of Spark with Explanation
  5. Spark Broadcast | How to Work?
