Spark SQL vs Presto

Article by Priya Pedamkar

Differences Between Spark SQL and Presto

Presto is an open-source distributed SQL query engine, originally developed at Facebook for querying data stored in Apache Hadoop. It is designed for running interactive analytic queries against data sets of all sizes. Spark SQL is the SQL layer of Apache Spark, a distributed in-memory computation engine, and it works on structured and semi-structured data sets. Because Spark keeps intermediate data in memory where possible, processing in Spark SQL is fast.

Head to Head Comparison Between Spark SQL and Presto (Infographics)

Below are the top 7 comparisons between Spark SQL and Presto:

[Infographic: Spark SQL vs Presto]


Key Differences Between Spark SQL and Presto

Below is a list of the key differences between Presto and Spark SQL:

  • Apache Spark provides a programming module for processing structured data called Spark SQL. Spark SQL includes a programming abstraction called the DataFrame and can act as a distributed SQL query engine.
  • Presto was created to enable interactive analytics that approach the speed of commercial data warehouses while scaling to the size of organizations such as Facebook.
  • Spark SQL, by contrast, is a component on top of Spark Core. It introduced a data abstraction called SchemaRDD (an RDD, or Resilient Distributed Dataset, with an attached schema; renamed DataFrame in Spark 1.3) that supports structured and semi-structured data.
  • Presto was designed as an alternative to tools such as Hive or Pig that query HDFS data using MapReduce jobs, but Presto is not limited to HDFS.
  • Spark SQL uses in-memory processing, which increases processing speed. Spark is designed to handle a variety of workloads, such as batch queries, iterative algorithms, interactive queries, and streaming.
  • Presto is capable of executing federated queries. Below is an example of a Presto federated query:

Let us assume an RDBMS (MySQL in this example) with a table sample1,

and Hive with a table sample2.

‘Testdb’ is the database in both Hive and MySQL. Once the connectors are configured correctly, Presto can evaluate data from both sources in a single query, addressing each table as catalog.schema.table, as shown below:

presto> <function (SELECT / GROUP BY, etc.)> hive.Testdb.sample2

        <function (SELECT / GROUP BY, etc.)> mysql.Testdb.sample1
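The idea of one query spanning two stores can be tried locally without a Presto cluster. The sketch below is only a stand-in: it uses Python's built-in sqlite3 module and attaches two in-memory databases under the labels hive and mysql, so both "sources" are really SQLite. The column names (id, name, amount) and the sample rows are hypothetical, since the tables above are not defined in detail, and SQLite addresses tables as schema.table rather than Presto's catalog.schema.table.

```python
import sqlite3

# Stand-in for a federated engine: one connection, two attached databases.
# "hive" and "mysql" are labels only; both databases are SQLite here.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS hive")
conn.execute("ATTACH DATABASE ':memory:' AS mysql")

# Hypothetical table layouts and rows, invented for illustration.
conn.execute("CREATE TABLE mysql.sample1 (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE hive.sample2 (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO mysql.sample1 VALUES (?, ?)",
    [(1, "alice"), (2, "bob")],
)
conn.executemany(
    "INSERT INTO hive.sample2 VALUES (?, ?)",
    [(1, 10.0), (1, 5.0), (2, 7.5)],
)

# A single statement that joins and aggregates across both sources.
federated_sql = """
    SELECT m.name, SUM(h.amount) AS total
    FROM hive.sample2 AS h
    JOIN mysql.sample1 AS m ON m.id = h.id
    GROUP BY m.name
    ORDER BY m.name
"""
rows = conn.execute(federated_sql).fetchall()
print(rows)  # [('alice', 15.0), ('bob', 7.5)]
```

In Presto the same statement shape would reference hive.Testdb.sample2 and mysql.Testdb.sample1, with each catalog backed by its own connector.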

  • Spark SQL architecture consists of Spark SQL, SchemaRDD, and DataFrame.
    • A DataFrame is a distributed collection of data organized into named columns. Conceptually, it is equivalent to a table in a relational database.
    • SchemaRDD: Spark Core provides a fundamental data structure called the RDD. Spark SQL works on schemas, tables, and records, so a user can register a SchemaRDD as a temporary table and query it with SQL. In current Spark versions, the SchemaRDD is called a DataFrame.
  • DataFrame capabilities: DataFrames can process data ranging from kilobytes to petabytes, on anything from a single-node cluster to multi-node clusters.
  • DataFrames support different data formats (CSV, Elasticsearch, Cassandra, etc.) and storage systems (HDFS, Hive tables, MySQL, etc.). They integrate with other big data tools and frameworks via Spark Core and provide APIs for languages such as Python, Java, Scala, and R.
  • Presto, in contrast, is a distributed engine that runs on a cluster. Its architecture is simple to understand and extensible. A Presto client (the CLI) submits SQL statements to the coordinator, a master daemon that manages the processing.
  • Companies using Presto: Facebook, Netflix, Airbnb, Dropbox, etc.
  • Apache Spark use cases can be found in industries such as finance, retail, healthcare, and travel. Companies such as eBay, Alibaba, and Pinterest use Spark SQL to analyze hundreds of petabytes of data on their platforms.
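The coordinator and worker roles mentioned for Presto can be pictured with a toy scatter-gather sketch. This is not Presto code: threads stand in for worker machines, the "query" is a plain sum, and the function names (plan_splits, worker_partial_sum, run_query) are hypothetical and chosen only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def plan_splits(rows, n_workers):
    """Coordinator step: divide the input into one split per worker."""
    return [rows[i::n_workers] for i in range(n_workers)]


def worker_partial_sum(split):
    """Worker step: compute a partial aggregate over its own split."""
    return sum(split)


def run_query(rows, n_workers=3):
    """Coordinator: plan the splits, hand them to workers, merge the partials."""
    splits = plan_splits(rows, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(worker_partial_sum, splits))
    return sum(partials)


total = run_query(list(range(1, 101)))
print(total)  # 5050
```

The point of the sketch is the division of labor: planning and the final merge happen in one place, while the bulk of the work runs in parallel on the splits.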

Comparison Table of Spark SQL vs Presto

Below is the comparison table for Spark SQL and Presto.

Basis of comparison: Eco-systems / platforms
  Presto: Hadoop, big data processing, etc.
  Spark SQL: Spark framework, big data processing, etc.

Basis of comparison: Purpose
  Presto: Presto is designed for running SQL queries over big data (huge workloads). It was designed by Facebook to process their huge workloads.
  Spark SQL: Spark SQL is one of the components of Apache Spark, built on top of Spark Core. Spark Core is the fundamental execution engine for the Spark platform.

Basis of comparison: Set up
  Presto:
  • Presto is a distributed SQL query engine for processing petabytes of data, and it runs on a cluster of machines.
  • A full Presto cluster setup includes a coordinator (manager node) and multiple workers. The user submits queries from a client, such as the Presto CLI, to the coordinator. The coordinator parses, analyzes, and plans the query execution, and then distributes the query processing to the workers.
  Spark SQL:
  • Spark SQL is available out of the box once an Apache Spark cluster is installed and configured.
  • Apache Spark is a top-level Apache project. It can run on Hadoop (YARN and HDFS) but does not require it.
  • Apache Spark is a cluster-based big data processing technology designed for fast computation.

Basis of comparison: Capabilities / features
  Presto: Presto allows data querying over many data sources. For example, data might reside in Hive, Cassandra, an RDBMS, or some other proprietary data store.
  Spark SQL: Spark SQL gives flexibility in integrating with other data sources using DataFrames and JDBC connectors.

Basis of comparison: Support for connectors
  Presto: Presto supports pluggable connectors. These connectors provide data sets for queries. Presto ships with several pre-existing connectors and can also be extended with custom connectors. Some of the connectors it supports:
  • Hadoop/Hive
  • Cassandra
  • Teradata
  • PostgreSQL
  • Oracle, etc.
  Spark SQL: The DataFrame interface allows different data sources to work with Spark SQL. Spark SQL includes a server mode with industry-standard JDBC and ODBC connectivity.

Basis of comparison: Federated queries
  Presto: Presto supports federated queries. Presto can be configured to connect to different databases, and once configured, its CLI can be used to launch federated queries. In one Presto query, a user can combine data from multiple data sources.
  Spark SQL: Spark SQL comes with a built-in feature for connecting to other databases over JDBC (“JDBC to other Databases”), which aids federation. Spark creates DataFrames from JDBC sources through the Scala or Python API. The feature also works directly with the Spark SQL Thrift server, which lets users query external JDBC tables as easily as other Hive or Spark tables.

Basis of comparison: Who uses it?
  Presto: Data analysts, data engineers, data scientists, etc.
  Spark SQL: Data analysts, data engineers, data scientists, Spark developers, etc.
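As a rough illustration of the “JDBC to other Databases” feature on the Spark SQL side, the sketch below builds the option set that Spark's JDBC data source reads (url, dbtable, user, password) and wraps the read call in a helper. The host, port, credentials, and the helper names jdbc_options and load_jdbc_table are assumptions made for this example. The helper that touches Spark is defined but not called here, because it needs a running SparkSession and a MySQL JDBC driver on the classpath.

```python
def jdbc_options(host, port, database, table, user, password):
    """Build the option dict for Spark's JDBC data source (MySQL URL form)."""
    return {
        "url": f"jdbc:mysql://{host}:{port}/{database}",
        "dbtable": table,
        "user": user,
        "password": password,
    }


def load_jdbc_table(spark, options):
    """Read an external table as a DataFrame.

    `spark` is assumed to be an existing SparkSession; this function is
    not invoked in this sketch.
    """
    return spark.read.format("jdbc").options(**options).load()


# Hypothetical connection details, reusing the Testdb/sample1 names from above.
opts = jdbc_options("localhost", 3306, "Testdb", "sample1", "reader", "secret")
print(opts["url"])  # jdbc:mysql://localhost:3306/Testdb
```

With a live session, the DataFrame returned by load_jdbc_table(spark, opts) could be registered as a temporary view and joined with Hive or Spark tables in ordinary Spark SQL.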

Conclusions

Presto is well suited to BI-type queries, while Spark SQL leads performance-wise on large analytics queries. In terms of configuration, Presto is easier to set up than Spark SQL. Both Spark SQL and Presto hold strong positions in the market and solve different kinds of business problems.
