Spark Thrift Server

By Priya Pedamkar

Introduction to Spark Thrift Server

Spark Thrift Server is a port of Apache Hive's HiveServer2 that lets JDBC and ODBC clients execute SQL queries on Spark over their respective protocols. It runs as a standalone application, started with the start-thriftserver.sh shell script and stopped with stop-thriftserver.sh, and it accepts the same command-line options as spark-submit. In this article, we look at how Spark can serve as a cloud-based SQL engine that exposes big data as an ODBC/JDBC data source through the Spark Thrift Server.
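As a minimal illustration, assuming a standard Spark distribution under $SPARK_HOME (the master URL and port shown here are examples, not values from the article):

```shell
# Start the Thrift server; it accepts the same options as spark-submit
$SPARK_HOME/sbin/start-thriftserver.sh --master local[*] \
  --hiveconf hive.server2.thrift.port=10000

# Stop it again with the matching stop script
$SPARK_HOME/sbin/stop-thriftserver.sh
```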

How Spark Thrift Server Works

Spark can act as a cloud-based SQL engine, exposing big data as an ODBC or JDBC data source through the Spark Thrift Server. Traditional relational SQL engines have been evolving because of their scalability limits, and several SQL-on-Hadoop frameworks have emerged, such as Cloudera Impala, Presto, and Hive. These frameworks are essentially cloud-based solutions, and each has its own limitations, which are listed later in this article. Using Spark as a distributed SQL engine, we can expose data in two forms: as an in-memory table or as a permanent table. Let us look at each.

In-memory table

In this approach, the data is stored in Spark's in-memory columnar cache (via the Hive context) and is scoped to the cluster. Keeping the data in memory makes access faster and lowers latency; Spark's memory caching is used for this.
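A minimal sketch of this, assuming a running Spark installation (the session name, `events` DataFrame, and input path are illustrative, not from the original article):

```scala
import org.apache.spark.sql.SparkSession

// Build a session with Hive support (or reuse an existing one)
val spark = SparkSession.builder()
  .appName("InMemoryTableSketch")
  .enableHiveSupport()
  .getOrCreate()

// Register a DataFrame as a cluster-scoped temporary view and pin it
// in Spark's columnar in-memory cache for low-latency access.
val events = spark.read.json("hdfs:///data/events.json") // illustrative path
events.createOrReplaceTempView("events")
spark.catalog.cacheTable("events")
```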

Permanent (physical) table

In this approach, data is stored in S3 in Parquet format. Data can be pushed into Spark from multiple sources and exposed as a table in either of the two ways described above. In both cases, the tables are made accessible as ODBC or JDBC data sources through the Spark Thrift Server. Several clients can connect: the CLI, Beeline, ODBC and JDBC drivers, and business intelligence tools such as Tableau, which can then be used to visualize the data.
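A sketch of persisting such a permanent table as Parquet on S3 (the `spark` session and `events` DataFrame are assumed to exist as in the previous section; the bucket and table names are hypothetical):

```scala
// Write the DataFrame as Parquet to an S3 path and register it as a
// permanent table in the Hive metastore, so the Thrift server can expose it.
events.write
  .mode("overwrite")
  .format("parquet")
  .option("path", "s3a://my-bucket/warehouse/events") // hypothetical bucket
  .saveAsTable("events_permanent")
```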


The chart below illustrates this architecture.

[Figure: Spark Thrift Server architecture chart]

How does it compare to HiveServer2 Thrift?

Spark Thrift Server is very similar to HiveServer2 Thrift. The key difference is query execution: Hive runs queries as MapReduce jobs, whereas Spark Thrift Server uses the Spark SQL engine, which is one of Spark's main strengths.


The different SQL-on-Hadoop frameworks are listed below, along with their drawbacks:

1. Impala: YARN is not supported, and the input data must be in Parquet format to get the full benefit.
2. Big SQL: This is an IBM product and is therefore available only to IBM customers.
3. Hive: Queries are executed as MapReduce jobs, which can be very slow.
4. Apache Drill: An open-source interactive SQL query engine that resembles Impala. Its last stable release was years ago, so it is unlikely to become an industry standard.
5. Presto: Built by Facebook, the original creators of Hive, with an approach similar to Impala's. Input must be provided in file format, and debugging and deployment need to mature before it can be adopted for production.

Example to Implement Spark Thrift Server

Below is an example:

The input is an HDFS file, which is registered as a table named records with the Spark SQL engine. The input can come from multiple sources; for example, XML, JSON, CSV, or other formats that are just as complex in practice. We then perform the required aggregations and computations and register the final data as a Spark table.

Code:

//Here, a Spark session is created
import org.apache.spark.sql.SparkSession

val sparksess = SparkSession.builder()
  .appName("SQL spark")
  .config("hive.server2.thrift.port", "10000")
  .config("spark.sql.hive.thriftServer.singleSession", "true")
  .enableHiveSupport()
  .getOrCreate()
import sparksess.implicits._

//Read the input file and register it as a temporary view
//(the format and path are placeholders; the original article's values were garbled)
val records = sparksess.read.format("csv").load("here/input.csv")
records.createOrReplaceTempView("records")

//Drop any existing table of the same name, then save the data as a permanent table
sparksess.sql("DROP TABLE IF EXISTS records")
records.write.saveAsTable("records")

The code can be executed in two ways.

The first is by using the Spark shell:

Running the spark-shell command opens an interactive shell in which you can run any Spark command. The spark-shell command is given below:

spark-shell --conf spark.sql.hive.thriftServer.singleSession=true

Copying and pasting the above code into the shell will register your data.

The second is by using spark-submit:

Bundle the above code and create a jar file from it; then run mvn clean install to build the jar.

After downloading the repository, paste the following command and run it.

//The master can be local or non-local, depending on where you run it.
spark-submit --class MainClass --master <master> <jar_file>

This is another way to register your data with Spark.

Now the data has to be made available to the outside world. Spark automatically exposes the registered tables as ODBC/JDBC data sources through the Spark Thrift Server, so clients can access the data.
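For example, one way to query the exposed table is with Beeline, which ships with Spark (the host, port, and user shown here are illustrative):

```shell
# Connect to the Thrift server over JDBC and run a query against
# the table registered in the example above
beeline -u "jdbc:hive2://localhost:10000" -n spark_user \
  -e "SELECT COUNT(*) FROM records;"
```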

Output:

[Screenshot: query output returned through the Spark Thrift Server]

Conclusion

We have seen that Spark Thrift Server was developed from Apache Hive's HiveServer2 and operates much like the HiveServer2 Thrift server. It is supported in Thrift-based cluster setups and can be used with clients such as the Beeline command-line tool.

Recommended Articles

This is a guide to Spark Thrift Server. Here we discuss an introduction to Spark Thrift Server, how it works, and a programming example. You can also go through our other related articles to learn more –

  1. Spark Tools
  2. Spark Functions
  3. Spark Components
  4. Spark Versions