Is it possible to ignore null values when using the lead window function in Spark?
I would like to get the most recent value, but ignoring nulls.

It is not. Just use last (or first) with ignoreNulls instead:
def last(columnName: String, ignoreNulls: Boolean): Column
Aggregate function: returns the last value of the column in a group.
The function by default returns the last values it sees. It will return the last non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned.
import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions._
import spark.implicits._  // required for toDF on a local Seq

val df = Seq(
  (1, Some(100), 2017), (1, None, 2016), (1, Some(20), 2015),
  (1, Some(100), 2014)
).toDF("id", "value", "date")

df.withColumn(
  "last_value",
  // The default frame (unboundedPreceding, currentRow) makes this
  // "the most recent non-null value up to and including this row".
  last("value", ignoreNulls = true).over(Window.partitionBy("id").orderBy("date"))
).show
+---+-----+----+----------+
| id|value|date|last_value|
+---+-----+----+----------+
| 1| 100|2014| 100|
| 1| 20|2015| 20|
| 1| null|2016| 20|
| 1| 100|2017| 100|
+---+-----+----+----------+
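If you specifically need lead-like behavior (the next non-null value after the current row, rather than the most recent one up to it), the same trick works with first and an explicit forward-looking frame. A minimal sketch, reusing the df defined above; the column name next_value is my own:

// Emulating lead with ignoreNulls: restrict the frame to rows strictly
// *after* the current one, then take the first non-null value in it.
val forward = Window
  .partitionBy("id")
  .orderBy("date")
  .rowsBetween(1, Window.unboundedFollowing)

df.withColumn(
  "next_value",
  first("value", ignoreNulls = true).over(forward)
).show

With the sample data, the 2016 row's next_value would be 100 (taken from the 2017 row), and the 2017 row's would be null, since no rows follow it within the partition.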