Spark Functions

By Arpit Anand


Introduction to Spark Functions

Spark functions are operations on a dataset, mostly related to analytics computation. The Spark framework is known for processing huge datasets quickly because of its in-memory processing capabilities. Several kinds of functions are available in Spark for data processing, such as custom transformations, Spark SQL functions, column functions, and user-defined functions (UDFs). Spark represents datasets as data frames, and these functions help to add, modify, and remove the columns of a data frame. They are supported through multiple languages such as R, Python, Java, and Scala, and they keep evolving with new features.

List of Spark Functions

Now let us look at some of the functions used in Spark.
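All of the snippets in this article assume a running SparkSession and the standard SQL function imports. Below is a minimal setup sketch; the application name and local master are just illustrative, and in spark-shell the spark value already exists:

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._  // lit, col, udf, count, avg, ...

// In spark-shell a SparkSession named `spark` is already available.
val spark = SparkSession.builder()
  .appName("SparkFunctionsExamples")
  .master("local[*]")
  .getOrCreate()

import spark.implicits._  // enables List(...).toDF("Name")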


1. Custom Transformation

Whenever we add or remove columns or rows of a data frame, we can use custom transformations to keep that logic in reusable functions.

The generic method signature for a custom transformation is:

def customTransformName(arg1: String)(df: DataFrame): DataFrame = {
  // code that returns a DataFrame
}
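As a minimal sketch of that signature (the withGreeting name and Greeting column are hypothetical, purely for illustration, and the imports from the setup above are assumed), the first parameter list carries configuration while the second takes the data frame so the function can be chained with .transform:

// Adds a constant Greeting column whose value comes from the first argument.
def withGreeting(greeting: String)(df: DataFrame): DataFrame = {
  df.withColumn("Greeting", lit(greeting))
}

// Usage: df.transform(withGreeting("Hello"))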

Suppose we are working with a data frame and adding columns to it using the withColumn function.

The same thing can be achieved with custom transformations in Spark; refactoring the code this way eliminates the intermediate variables and keeps the code well structured.

Let us look at an example of a custom transformation.

Suppose we have a data frame:

val df = List(("Arpit"), ("Anand")).toDF("Name")

Now we want to add two columns, Address and State, to the data frame:

val df2 = df.withColumn("Address", lit("Bangalore"))
val df3 = df2.withColumn("State", lit("karnataka"))
df3.show()

This will show the data frame with the new Address and State columns added.

[Figure: output data frame with the Name, Address, and State columns]

The same result can be achieved with custom transformations by calling the transform method on the data frame.

Let us see how it works:

// Adds a constant Address column to the data frame.
def Address()(df: DataFrame): DataFrame = {
  df.withColumn("Address", lit("Bangalore"))
}

// Adds a constant State column to the data frame.
def State()(df: DataFrame): DataFrame = {
  df.withColumn("State", lit("karnataka"))
}

Using the same data frame as above, with the .transform method it looks like this:

df.transform(Address())
  .transform(State())
  .show()

It produces the same data frame as before, so we get the same result through the transform method while keeping each piece of logic in its own reusable function.

2. Spark SQL Functions

Spark ships with Spark SQL, which provides many built-in functions that help with SQL-style operations.

Some of the Spark SQL functions are count, avg, collect_list, first, mean, max, variance, and sum.

Suppose we want to count the number of rows in the data frame we made.

A simple .count() call returns the number of rows in the data frame.

df3.count()

2 // the output is 2, the number of rows in the data frame

In the same way, there are many other SQL functions that we can use throughout the code.

These functions operate on the columns of a data frame and produce the type of result they are designed for.
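As a small sketch of a few of the functions listed above (the Marks column and its values are made up for illustration, and the imports from the setup section are assumed):

// Hypothetical data frame with a numeric column to aggregate over.
val people = List(("Arpit", 10), ("Anand", 20)).toDF("Name", "Marks")

people.agg(
  count("Name"),        // number of rows
  avg("Marks"),         // mean of the Marks column
  max("Marks"),         // largest value in Marks
  collect_list("Name")  // all names collected into a single array
).show()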

3. Column Functions

Column functions return Column objects. They can be used like the Spark SQL functions, applied to the columns they need to operate on.

Here we pass a column as an input parameter, a set of operations is performed on it, and the result is returned back into the data frame.
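A minimal sketch of this idea, reusing the df data frame from earlier with the built-in upper and length column functions (the NameUpper and NameLength output names are just illustrative):

// col("Name") returns a Column object; upper and length build new Columns from it.
df.withColumn("NameUpper", upper(col("Name")))
  .withColumn("NameLength", length(col("Name")))
  .show()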

4. User-Defined Functions

Spark also supports user-defined functions (UDFs), with which a user can create their own functions, with whatever rules they need, and use them in the code whenever required.

They are similar to column functions, but they are written as plain Scala functions and then wrapped for the Spark API.

Now let us create a UDF and see how it can help in a Spark application:

// Plain Scala function: returns None for null input, otherwise the upper-cased value.
def toUpperFunction(str: String): Option[String] = {
  val s = Option(str).getOrElse(return None)
  Some(s.toUpperCase())
}

// Wrap the Scala function as a Spark UDF so it can be applied to a column.
val toUpper = udf[Option[String], String](toUpperFunction)

Here we made a UDF that converts a word to upper case, so whatever column value we pass in will be converted to uppercase.

This is a function we define ourselves and then use on data frames in the Spark application.

Recall the data frame we had earlier:

val df = List(("Arpit"), ("Anand")).toDF("Name")

Passing our function over the Name column gives us a new data frame with the changed values:

df.withColumn("toUpper", toUpper(col("Name"))).show()

[Figure: output data frame with the Name and toUpper columns]

So here we can see that this kind of function can be defined explicitly by the user and applied to produce the desired results.
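If the same logic is also needed from SQL, the underlying Scala function can be registered by name as well. A small sketch, assuming the temporary view name people is created purely for this example:

// Register the same Scala function so it can be called from SQL by name.
spark.udf.register("to_upper", toUpperFunction _)

df.createOrReplaceTempView("people")
spark.sql("SELECT Name, to_upper(Name) AS NameUpper FROM people").show()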

Conclusion

From this article, we can see that there is a large set of Spark functions that we can use over and over again in our Spark applications.

Recommended Articles

This is a guide to Spark Functions. Here we discuss some of the functions used in Spark along with examples and how they work. You may also have a look at the following articles to learn more –

  1. Spark Dataset
  2. Join in Spark SQL
  3. Spark Stages
  4. Top 5 Tools of Spark with Explanation
  5. Spark Broadcast | How to Work?