I'm trying to use the HyperLogLog functions from spark-alchemy, but it's not working.
I added the dependencies to my pom.xml:
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.12</artifactId>
    <version>3.2.1</version>
    <scope>provided</scope>
</dependency>
<!-- https://mvnrepository.com/artifact/org.scala-lang/scala-library -->
<dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>2.12.10</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.swoop/spark-alchemy -->
<dependency>
    <groupId>com.swoop</groupId>
    <artifactId>spark-alchemy_2.12</artifactId>
    <version>1.2.0</version>
</dependency>
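Since only spark-sql is marked provided, I believe spark-alchemy (and scala-library) still need to reach the Spark classpath at runtime, and a plain mvn clean package builds a thin jar without them. For reference, this is a sketch of a maven-shade-plugin section that could bundle the non-provided dependencies into a fat jar; I have not added this to my pom yet, and the plugin version is just an example:

<build>
    <plugins>
        <!-- Bundle compile-scope dependencies (spark-alchemy, scala-library)
             into the jar; provided-scope spark-sql is left out automatically -->
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>3.2.4</version>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>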
When I build the jar with mvn clean package and submit it to Spark, it returns the error shown in the image.
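An alternative I've seen suggested, instead of bundling a fat jar, is letting spark-submit pull the library from Maven Central with --packages. The class name and jar path below are placeholders for my own:

spark-submit \
  --class Test \
  --master local \
  --packages com.swoop:spark-alchemy_2.12:1.2.0 \
  target/my-app.jar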
This is my code example:
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import static com.swoop.alchemy.spark.expressions.hll.functions.hll_init;

public class Test {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("Test")
                .master("local")
                .getOrCreate();

        // Use the Aggregate Knowledge HLL implementation
        spark.conf().set("com.swoop.alchemy.hll.implementation", "AGGREGATE_KNOWLEDGE");
        // HLLFunctionRegistration.registerFunctions(spark);

        Dataset<Row> df = spark.read().format("csv").option("header", true).load("path");

        // Initialize an HLL sketch per row from the user_id column
        df = df.withColumn("user_id_hll", hll_init("user_id"));
        df.printSchema();
        df.show(false);
    }
}
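For what it's worth, the commented-out HLLFunctionRegistration line above is the SQL route. A minimal sketch of how I understand it would be used, assuming the jar is on the classpath; the function names hll_init_agg and hll_cardinality come from the spark-alchemy docs, and "events" is a placeholder view name:

import com.swoop.alchemy.spark.expressions.hll.HLLFunctionRegistration;

// Register the HLL functions so they can be called from Spark SQL
HLLFunctionRegistration.registerFunctions(spark);
df.createOrReplaceTempView("events");

// Aggregate user_id into one sketch, then estimate the distinct count
spark.sql("SELECT hll_cardinality(hll_init_agg(user_id)) AS approx_users FROM events").show();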
Can anyone help me fix this bug?
Thanks, everyone!