
I have written a Java Spark SQL UDF as below.

import org.apache.spark.sql.api.java.UDF1;

public class LowerCase_UDF implements UDF1<String, String>
{
    @Override
    public String call(String t1) throws Exception
    {
        // Guard against null input to avoid a NullPointerException.
        return t1 == null ? null : t1.toLowerCase();
    }
}

What is the process to register this function in Spark? If I run sqlContext.udf.register("LowerCaseUDF", call), it throws the exception "error: not found: value call".

I have added the generated jar file to the spark-client/lib folder, but it does not seem to work. We want the function to be in Java for certain reasons. Any help on this will be appreciated. Thank you.


1 Answer


To register a UDF in Spark SQL using Java, you can use the following code (DataTypes comes from org.apache.spark.sql.types):

sparkSession.udf().register("lowercase_udf", new LowerCase_UDF(), DataTypes.StringType);

And then you can use it like this:

dataset.withColumn("lower", functions.callUDF("lowercase_udf", functions.col("value")));

This will give you output something like this:

+--------+-------+
|value   |lower  |
+--------+-------+
|Michael |michael|
|Andy    |andy   |
|Justin  |justin |
+--------+-------+
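Putting the two lines together, here is a minimal end-to-end sketch. It assumes Spark 2.x's SparkSession running in local mode; the class name LowerCaseExample and the sample values are illustrative, not from the question:

```java
import java.util.Arrays;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.functions;
import org.apache.spark.sql.types.DataTypes;

public class LowerCaseExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("LowerCaseUDFExample")
                .master("local[*]")   // local mode, for testing only
                .getOrCreate();

        // Register the UDF instance along with its return type.
        spark.udf().register("lowercase_udf", new LowerCase_UDF(), DataTypes.StringType);

        // Small in-memory dataset standing in for real data.
        Dataset<Row> df = spark
                .createDataset(Arrays.asList("Michael", "Andy", "Justin"), Encoders.STRING())
                .toDF("value");

        // Apply the registered UDF to the "value" column.
        df.withColumn("lower", functions.callUDF("lowercase_udf", functions.col("value")))
          .show(false);

        spark.stop();
    }
}
```

Note the key point for the error in the question: register expects a UDF1 instance (new LowerCase_UDF()), not the bare method name call, which is why sqlContext.udf.register("LowerCaseUDF", call) fails with "not found: value call". On Spark 1.x you would call sqlContext.udf().register(...) with the same three arguments.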

I hope it helps!
