Updated March 22, 2023
Introduction to Hive Order By
Hive is a data warehousing facility provided by Apache. Hive is built on top of the Hadoop Distributed File System (HDFS) to write, read, querying, and manage large structured or semi-structured data in distributed storage systems such as HDFS. Each HiveQL will be converted to a MapReduce job in the backend. Hive provides Hive Query Language (HiveQL) that is like normal SQL in RDBMS. Like SQL, HiveQL also provides the ORDER BY clause which can be used with the SELECT clause. ORDER BY function is used to sort data either in ascending or descending order on columns specified by the user.
Syntax in Hive Order By
Below is the syntax in hive order by:
Basic Syntax:
SELECT <column name1>,<column name2> FROM <table name> ORDER BY <column name>;
ORDER BY clause along with all other Clauses:
SELECT DISTINCT <column name1>, <column name2>, <column name3> FROM <table name> WHERE <where condition> GROUP BY <column names…> HAVING <having conditions> ORDER BY <column names> <ASC / DESC> LIMIT <limiting number>;
You may also specify ORDER BY 1 or ORDER BY 2 where 1 and 2 represent the column number instead of using ORDER BY <column names>. ORDER BY <column number> can only be used when column names are specified in the SELECT clause and not with the SELECT * clause.
How can we Use ORDER BY Function in Hive?
Below explanation shows that how can we use order by function in a hive:
1. ORDER BY Ascending and Descending
- By default, the sorting order will be ascending, which shows the smallest value in the beginning and the largest value at the end of the result, even if you do not specify it explicitly.
- You can also specify ORDER BY <column names> ASC for ascending order and ORDER BY <column name> DESC for sorting the result in descending order or the specified column.
- The HiveQL syntax for ODER BY ascending order can be shown as below:
SELECT <column name1>,<column name2> FROM <table name> ORDER BY <column name> ASC;
- The HiveQL syntax for ODER BY descending order can be represented as below:
SELECT <column name1>,<column name2> FROM <table name> ORDER BY <column name> DESC;
- Ascending and descending clauses can be simultaneously used with multiple columns in the ORDER BY clause, as shown below:
SELECT <column name1>,<column name2> FROM <table name> ORDER BY <column name1> ASC <column name2> DESC;
2. ORDER BY and NULL Values
- The latest versions of Hive supports sorting order for NULL value also.
- By default, the sorting order of NULL values for ORDER BY <column name> ASC is NULLS FIRST. It sorts all the NULL values to the beginning of the sorted result.
- Similarly, the sorting order of NULL values for ORDER BY <column names> DESC is NULLS LAST by default. It sorts all the NULL values to the end of the sorted result.
- You can also specify NULLS FIRST, and NULLS LAST and ORDER BY <column name> ASC or with ORDER BY <column name> DESC according to your requirements convenience.
- Syntax of NULLS FIRST clause along with ORDER BY ascending:
SELECT <column name1>,<column name2> FROM <table name> ORDER BY <column name> ASC NULLS LAST;
- Syntax of NULLS LAST clause along with ORDER BY descending:
SELECT <column name1>,<column name2> FROM <table name> ORDER BY <column name> DESC NULLS FIRST;
3. ORDER BY and LIMIT Clause
- LIMIT clause is optional with the ORDER BY clause.
- LIMIT clause can be used to improve the performance. LIMIT clause can be used to avoid unnecessary data processing.
- LIMIT clause will select only the limited number of values from the result for the verification purpose.
- LIMIT clause with ORDER By clause can be shown as below:
SELECT <column name1>,<column name2> FROM <table name> ORDER BY <column name> ASC LIMIT 10;
- ORDER BY clause can also be combined with the OFFSET clause and LIMIT clause to reduce the result’s quantity. Syntax of the same can be shown as:
SELECT <column name1>,<column name2> FROM <table name> ORDER BY <column name> ASC LIMIT 10 OFFSET 10;
Examples in Hive Order By
Consider the following sample table Employee which contains Employee Id as EmpID, Employee name as EmpName, Designation, Department as Dept, Job Level as JL, and Salary.
Example #1
Code:
SELECT * FROM Employee ORDER BY JL ASC;
Output:
Example #2
Code:
SELECT * FROM Employee ORDER BY Salary DESC LIMIT 3;
Output:
Example #3
Code:
SELECT EmpId, EmpName, Designation, Dept FROM Employee where Salary < 50000 ORDER BY EmpName ASC JL ASC;
Output:
Conclusion
ORDER BY in Hive allows you to sort data in either ascending or descending order. ORDER BY can be combined with other clauses to get the sorted data. ORDER BY is different from the SORT BY as SORT BY sorts the data within a reducer but ORDER BY sorts the entire data.
Recommended Articles
This is a guide to Hive Order By. Here we discuss How we can use order by function in a hive, and different examples with codes and outputs. You can also go through our other related articles to learn more –