Before learning the difference between Hive and SQL, let me give you a short Introduction to Apache Hive.

Apache Hive is a data processing tool on Hadoop. As our data is stored in Hadoop as HDFS files the main business operations are to query and process the data according to requirements. For this we can anytime use the basic MapReduce programming to fetch or process the data but as you know a simple program in MapReduce programming to fetch the data can consist of 100 lines of code. So to reduce the complexity of coding, Apache introduces Hive because in Hive the same fetching of data can be done in a single line “SELECT” statement.

Initially developed by Facebook and later taken by Apache, Hive is a SQL like querying tool to query the data. Same like we use SQL to query the tables from Relational databases, in Hadoop ecosystem we use Hive or HiveQL to query the data present in HDFS and we call it SQL like querying tool because almost all of the commands and queries in Hive have similar syntax and standards as of Relational SQL.

Check out my full post on What is Hive or visit Hive documentation official link

Apache Hive Vs SQL

Since I have mentioned the similarity of Apache HIVE with SQL people usually gets confused thinking Hive as a pure database but there are some key differences in both.         

HIVE
SQL
HIVESQL
Hive is not a database as it does not hold the physical data. Our actual data is stored in HDFS and not in Hive. Hive is just a tool sitting on top of Hadoop/HDFS to query the data as per user demand.SQL is based on a relational database model and it uses a pure database in which the physical data actually resides in tables and it adheres to all the RDBMS and ACID rules.
Apache Hive is built on the concept of ‘write-once and ready-many’. Once the data is loaded/written in Hive tables it can be only added, we do not change it. All Hive queries are based on fetching the relevant data according to requirements.RDBMS is designed for ‘write-many and read-many’ times. We can write/update data as many times we want. SQL queries are based on both writing/updating data and fetching the data.
Hive does not allow row level Insert, Update and Delete commands in its tables. (Note: From Hive 0.14 version row level commands are allowed but they are still not very efficient and moreover these commands support limited file formats).In SQL, programmers spent most of the time writing Insert, update and delete queries. Actually it was designed for row level operations only.
It is easily scalable and that too at very low cost.Not easy to scale up and also is costly.
It was designed to support OLAP systems.It was designed primarily to support OLTP systems.
Hive is used to analyze static big data where throughput is concerned and not latency.SQL is used to analyze dynamic data where latency is more concerned than throughput.

0 Comments

Leave a Reply

Avatar placeholder

Your email address will not be published.