
What is wrong with this usage of `first`? I want to take the first row for each `id` in my DataFrame, but it returns an error:

Exception in thread "main" org.apache.spark.sql.AnalysisException: Could not resolve window function 'first_value'. Note that, using window functions currently requires a HiveContext;

The code is:

WindowSpec window = Window.partitionBy(df.col("id"));
df = df.select(first(df.col("*")).over(window));

I am using a HiveContext.
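For context, the result being asked for, keeping only the first row seen for each `id`, can be sketched outside Spark in plain Java. This is only an illustration of the intended semantics of `first(...).over(Window.partitionBy("id"))`; the `Row` record and sample data below are hypothetical stand-ins, not Spark types.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FirstPerGroup {
    // Hypothetical stand-in for a DataFrame row (not a Spark type).
    record Row(int id, String value) {}

    // Keep only the first row encountered for each id, preserving
    // encounter order -- the result the question's window query aims for.
    static List<Row> firstPerId(List<Row> rows) {
        Map<Integer, Row> firstById = new LinkedHashMap<>();
        for (Row r : rows) {
            firstById.putIfAbsent(r.id(), r);
        }
        return new ArrayList<>(firstById.values());
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row(1, "a"), new Row(1, "b"),
                new Row(2, "c"), new Row(2, "d"));
        System.out.println(firstPerId(rows));
    }
}
```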

lte__
  • Can you - for tests - try following code: `WindowSpec window = Window.partitionBy(df.col("id")); df= df.select(first(df.col("id")).over(window));` It's possible that window function cannot be used with * – T. Gawęda Sep 09 '16 at 10:27

1 Answer


Did you read/create your Spark DataFrame with a plain SQLContext or with a HiveContext? In Spark 1.x, window functions require a HiveContext.

More detail here: Window function is not working on Pyspark sqlcontext

phi