I'm trying to use the HyperLogLog functions from spark-alchemy, but it's not working.
I added the dependencies to my pom.xml:
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.12</artifactId>
    <version>3.2.1</version>
    <scope>provided</scope>
</dependency>
<!-- https://mvnrepository.com/artifact/org.scala-lang/scala-library -->
<dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>2.12.10</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.swoop/spark-alchemy -->
<dependency>
    <groupId>com.swoop</groupId>
    <artifactId>spark-alchemy_2.12</artifactId>
    <version>1.2.0</version>
</dependency>
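Since only spark-sql is marked provided, I believe spark-alchemy (and scala-library) still need to reach the Spark classpath at runtime, and a plain mvn clean package builds a thin jar without them. For reference, this is a sketch of a maven-shade-plugin section that could bundle the non-provided dependencies into a fat jar; I have not added this to my pom yet, and the plugin version is just an example:

<build>
    <plugins>
        <!-- Bundle compile-scope dependencies (spark-alchemy, scala-library)
             into the jar; provided-scope spark-sql is left out automatically -->
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>3.2.4</version>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>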
When I build the jar with mvn clean package and submit it to Spark, it returns the error shown in the image.
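An alternative I've seen suggested, instead of bundling a fat jar, is letting spark-submit pull the library from Maven Central with --packages. The class name and jar path below are placeholders for my own:

spark-submit \
  --class Test \
  --master local \
  --packages com.swoop:spark-alchemy_2.12:1.2.0 \
  target/my-app.jar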
This is my code example:
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import static com.swoop.alchemy.spark.expressions.hll.functions.hll_init;

public class Test {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("Test")
                .master("local")
                .getOrCreate();

        // Use the Aggregate Knowledge HLL implementation
        spark.conf().set("com.swoop.alchemy.hll.implementation", "AGGREGATE_KNOWLEDGE");
        // HLLFunctionRegistration.registerFunctions(spark);

        Dataset<Row> df = spark.read().format("csv").option("header", true).load("path");

        // Initialize an HLL sketch per row from the user_id column
        df = df.withColumn("user_id_hll", hll_init("user_id"));
        df.printSchema();
        df.show(false);
    }
}
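For what it's worth, the commented-out HLLFunctionRegistration line above is the SQL route. A minimal sketch of how I understand it would be used, assuming the jar is on the classpath; the function names hll_init_agg and hll_cardinality come from the spark-alchemy docs, and "events" is a placeholder view name:

import com.swoop.alchemy.spark.expressions.hll.HLLFunctionRegistration;

// Register the HLL functions so they can be called from Spark SQL
HLLFunctionRegistration.registerFunctions(spark);
df.createOrReplaceTempView("events");

// Aggregate user_id into one sketch, then estimate the distinct count
spark.sql("SELECT hll_cardinality(hll_init_agg(user_id)) AS approx_users FROM events").show();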
Can anyone help me fix this bug?
Thanks, everyone!