168

I tried df.orderBy("col1").show(10), but it sorted in ascending order. df.sort("col1").show(10) also sorts in ascending order. I looked on Stack Overflow, and the answers I found were all outdated or referred to RDDs. I'd like to use the native DataFrame API in Spark.

Vedom

6 Answers

251

You can sort by a column in either direction by importing the Spark SQL functions and wrapping the column name in asc or desc:

import org.apache.spark.sql.functions._
df.orderBy(asc("col1"))

Or

import org.apache.spark.sql.functions._
df.sort(desc("col1"))

Alternatively, import sqlContext.implicits._ and call .desc on the column:

import sqlContext.implicits._
df.orderBy($"col1".desc)

Or

import sqlContext.implicits._
df.sort($"col1".desc)
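
For anyone who wants to try the variants above end to end, here is a minimal self-contained sketch. It assumes Spark 2.x or later, where a SparkSession (and spark.implicits._) takes the place of the sqlContext used above; the object name, the sample rows, and the column names are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{asc, desc}

// Hypothetical example object; not part of the original answer.
object SortDescSketch {
  def main(args: Array[String]): Unit = {
    // Local session for experimentation only (assumption: Spark 2.x+).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("sort-desc-sketch")
      .getOrCreate()
    import spark.implicits._ // enables toDF and the $"col" syntax

    // Made-up sample data.
    val df = Seq((1, "a"), (3, "c"), (2, "b")).toDF("col1", "col2")

    // Function style: desc(...) from org.apache.spark.sql.functions.
    df.orderBy(desc("col1")).show() // col1 comes back as 3, 2, 1

    // Column style: $"col1".desc, mixed with an ascending second key.
    df.sort($"col1".desc, asc("col2")).show()

    spark.stop()
  }
}
```

orderBy and sort are interchangeable here, so the choice between the two styles is mostly a matter of taste.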
Gabber
  • also when you're ordering ascending by all columns, the `asc` keyword is not necessary: `..orderBy("col1", "col2")`. – Dan Mar 04 '20 at 20:03
109

The sort method on org.apache.spark.sql.DataFrame accepts column expressions:

df.sort($"col1", $"col2".desc)

Note the $ prefix and the .desc call on the column that should be sorted in descending order; a column without .desc (col1 here) is sorted ascending.

Sky
Vedom
  • `import org.apache.spark.sql.functions._` and `import sqlContext.implicits._` also get you a lot of nice functionality. – David Griffin May 19 '15 at 18:14
  • @Vedom: Shows a syntax error: `df.sort($"Time1", $"Time2".desc) SyntaxError: invalid syntax` at the $ symbol – kavya Sep 07 '16 at 07:28
  • @kaks, need to import functions/implicits as described above to avoid that error – Rimer Nov 01 '17 at 14:01
66

PySpark only

I came across this post when looking to do the same in PySpark. The easiest way is to just add the parameter ascending=False:

df.orderBy("col1", ascending=False).show(10)

Reference: http://spark.apache.org/docs/2.1.0/api/python/pyspark.sql.html#pyspark.sql.DataFrame.orderBy

Nic Scozzaro
19
import org.apache.spark.sql.functions.{asc, desc}

df.orderBy(desc("columnname1"),desc("columnname2"),asc("columnname3"))
Paul Reiners
Nitya Yekkirala
7
import spark.implicits._ // enables $"..."; assumes a SparkSession named spark
df.sort($"ColumnName".desc).show()
OneCricketeer
Nilesh Shinde
3

In the case of Java:

With DataFrames, we can take the distinct rows of each DataFrame, join them (an inner join here), and then sort the result in ascending order:

Dataset<Row> d1 = e_data.distinct().join(s_data.distinct(), "e_id").orderBy("salary");

Here e_id is the join column, and the result is sorted by salary in ascending order.

We can also use Spark SQL, which supports descending order directly:

SQLContext sqlCtx = spark.sqlContext();
sqlCtx.sql("select * from global_temp.salary order by salary desc").show();

where

  • spark  -> the SparkSession
  • salary -> a global temporary view (hence the global_temp. prefix)
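
The query above presumes that a global temporary view called salary has already been registered. As a rough sketch of how that might look, written in Scala to match the other answers rather than Java, with made-up sample rows and hypothetical names (employees, SqlOrderByDescSketch), and assuming Spark 2.2 or later for createOrReplaceGlobalTempView:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object; not part of the original answer.
object SqlOrderByDescSketch {
  def main(args: Array[String]): Unit = {
    // Local session for experimentation only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("sql-order-by-desc-sketch")
      .getOrCreate()
    import spark.implicits._ // enables toDF on local collections

    // Made-up sample data with the column names used in the answer.
    val employees = Seq((1, 3000), (2, 5000), (3, 4000)).toDF("e_id", "salary")

    // Register the DataFrame as a global temporary view named "salary";
    // global temporary views are resolved through the global_temp database.
    employees.createOrReplaceGlobalTempView("salary")

    // Descending sort expressed in SQL.
    spark.sql("SELECT * FROM global_temp.salary ORDER BY salary DESC").show()

    spark.stop()
  }
}
```

A session-scoped view created with createOrReplaceTempView would work just as well for a single session; in that case the query refers to the view without the global_temp. prefix.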
zx485
RPaul