SparkContext

By Priya Pedamkar

Introduction to SparkContext

Creating the SparkContext is the first and essential step of any Spark application. As soon as we run a Spark application, a driver program starts, its main function runs, and the SparkContext is created and initialized there. The driver program then coordinates the operations that run inside the executors on the worker nodes. SparkContext is the gateway to Apache Spark functionality: whatever parameters the Spark driver application has are passed through it. Access to the Spark cluster is obtained through a resource manager, the two main types being Mesos and YARN. Before a SparkContext can be created, a SparkConf (Spark configuration) must be created first.

Syntax for Apache SparkContext:

from pyspark import SparkContext
sc = SparkContext("local", "First App")

How Is Apache SparkContext Created?

To create a SparkContext, a SparkConf must be created first. The SparkConf holds the configuration parameters that our Spark driver application passes to the SparkContext. Some of these parameters define the properties of the Spark driver application, while others are used by Spark to allocate resources on the cluster, such as the memory size and the number of cores on the worker nodes used by the executors.

To put it simply, the SparkContext guides how the Spark cluster is accessed. Once the SparkContext object is created, methods such as textFile (to read a text file), sequenceFile (to read a Hadoop sequence file), and parallelize (to distribute a local collection), among others, can be invoked, as the sketch below shows.
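
A minimal PySpark sketch of this flow; the application name, master URL, and file path are placeholder values chosen for illustration:

from pyspark import SparkConf, SparkContext

# Build the configuration first: app name and master URL are illustrative
conf = SparkConf().setAppName("MyApp").setMaster("local[2]")

# Create the SparkContext from the configuration
sc = SparkContext(conf=conf)

# After creation, RDDs can be made from files or from local collections
lines = sc.textFile("input.txt")        # hypothetical path, read lazily
numbers = sc.parallelize([1, 2, 3, 4])

print(numbers.sum())                    # 10
sc.stop()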


Parameters:

  • profiler_cls – A custom profiler class used for profiling; the default is pyspark.profiler.BasicProfiler.
  • jsc – An existing instance of the Java Spark context (JavaSparkContext).
  • gateway – Use an existing gateway and JVM; otherwise a new JVM is initialized.
  • serializer – The serializer for RDDs.
  • batchSize – The number of Python objects represented as a single Java object. Set it to 1 to disable batching, to 0 to choose the batch size automatically based on object sizes, or to -1 to use an unlimited batch size.
  • environment – Environment variables for the worker nodes.
  • pyFiles – .zip or .py files to send to the cluster and add to the PYTHONPATH.
  • sparkHome – The Spark installation directory.
  • appName – The name of the job/application.
  • master – The URL of the cluster to connect to.
  • conf – A SparkConf object (the Spark configuration) used to set all the Spark properties.
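
A hedged sketch of passing a few of these parameters explicitly; all the values below are illustrative, not required settings:

from pyspark import SparkContext
from pyspark.serializers import MarshalSerializer

sc = SparkContext(
    master="local[4]",                 # run locally with 4 cores
    appName="ParamsDemo",              # illustrative job name
    sparkHome=None,                    # use the installed Spark
    pyFiles=None,                      # no extra .py/.zip dependencies
    batchSize=0,                       # pick the batch size automatically
    serializer=MarshalSerializer(),    # faster, supports fewer types
)
print(sc.appName, sc.master)
sc.stop()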

The following describes the data flow of the Spark context: SparkContext uses Py4J to launch a Java virtual machine, which in turn creates a JavaSparkContext. In the PySpark shell, a SparkContext is available by default as the variable sc; because only one SparkContext may be active per JVM, creating a new one there will not work.

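When a context may already exist, as in the PySpark shell, SparkContext.getOrCreate can be used instead of the constructor; a minimal sketch, with an illustrative app name:

from pyspark import SparkConf, SparkContext

# Returns the active SparkContext if one exists, otherwise creates one
conf = SparkConf().setAppName("GetOrCreateDemo").setMaster("local[*]")
sc = SparkContext.getOrCreate(conf)

print(sc.applicationId)  # unique ID assigned by the cluster manager
sc.stop()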

Code:

class pyspark.SparkContext (
    master = None,
    appName = None,
    sparkHome = None,
    pyFiles = None,
    environment = None,
    batchSize = 0,
    serializer = PickleSerializer(),
    conf = None,
    gateway = None,
    jsc = None,
    profiler_cls = <class 'pyspark.profiler.BasicProfiler'>
)

Example:

Code:

package com.dataflair.spark

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf

object Word_Count {
  def main(args: Array[String]) {
    // Create the configuration for the Spark context
    val conf = new SparkConf().setAppName("WordCount")
    // Create the SparkContext object from the configuration
    val sc = new SparkContext(conf)
    // Check whether sufficient arguments are supplied
    if (args.length < 2) {
      println("Usage: ScalaWordCount <input> <output>")
      System.exit(1)
    }
    // Read the input file and create an RDD of lines
    val rawData = sc.textFile(args(0))
    // Split the lines into words with the flatMap operation
    val words = rawData.flatMap(line => line.split(" "))
    // Count the individual words with map and reduceByKey
    val wordCount = words.map(word => (word, 1)).reduceByKey(_ + _)
    // Save the result
    wordCount.saveAsTextFile(args(1))
    // Stop the Spark context
    sc.stop()
  }
}
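
Once packaged into an application jar, this can be launched with spark-submit, which supplies the input and output paths as args(0) and args(1); the jar name and paths below are placeholders:

spark-submit --class com.dataflair.spark.Word_Count word-count.jar input.txt output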


Conclusion

To sum up, Spark simplifies the challenging and computationally intensive task of processing high volumes of real-time or archived data, both structured and unstructured, seamlessly integrating complex capabilities such as machine learning and graph algorithms. Spark brings Big Data processing to the masses. SparkContext, in turn, provides the various functions in Spark, such as getting the current status of the Spark application, setting the configuration, cancelling a job, cancelling a stage, and much more, a few of which are sketched below. It is the entry point to Spark functionality and thus acts as the backbone of every Spark application.
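
A few of these housekeeping calls in PySpark; this is a minimal sketch, and the job-group name is illustrative:

from pyspark import SparkContext

sc = SparkContext("local[2]", "StatusDemo")

# Tag subsequent jobs so they can be cancelled as a group
sc.setJobGroup("demo-group", "illustrative job group")

# Query the current status of the application
tracker = sc.statusTracker()
print(tracker.getActiveJobsIds())   # IDs of currently running jobs

# Cancel work if needed, then shut the context down
sc.cancelJobGroup("demo-group")
sc.cancelAllJobs()
sc.stop()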

Recommended Articles

This is a guide to SparkContext. Here we discuss the introduction to SparkContext and how Apache SparkContext is created, along with an example. You may also have a look at the following articles to learn more –

  1. Spark Accumulator
  2. Spark Parallelize
  3. Spark Functions
  4. Spark Versions