I'm following a PCA example in Spark 3.0.0, using Scala 2.12.10. I'm quite new to programming in Scala and I'm having trouble understanding some of its nuances.

The data is defined like this:

import org.apache.spark.ml.linalg.Vectors

val data = Array(
  Vectors.sparse(5, Seq((1, 1.0), (3, 7.0))),
  Vectors.dense(2.0, 0.0, 3.0, 4.0, 5.0),
  Vectors.dense(4.0, 0.0, 0.0, 6.0, 7.0)
)

and the DataFrame is then created like this:

val df = spark.createDataFrame(data.map(Tuple1.apply)).toDF("features")

My question is: what does data.map(Tuple1.apply) do? What bugs me is that apply is passed without any arguments.

Thank you in advance! Perhaps someone can also recommend a good beginner Scala / Spark book, so that my future questions can be better ones?

MDSvensson

1 Answer

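It wraps each element of data in a one-element tuple (Tuple1). createDataFrame expects a collection of Product values (tuples or case classes), so a bare Array of vectors will not do; an Array of Tuple1 values gives a DataFrame with 1 column of type vector, which toDF then names "features". apply has no arguments here because the method itself is passed to map as a function, so data.map(Tuple1.apply) means the same as data.map(v => Tuple1(v)). That's all, but very handy.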
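To make the wrapping step concrete, here is a minimal sketch in plain Scala with no Spark dependency. The object name Tuple1ApplyDemo and the string values are made up for illustration; the question uses Spark ML vectors in their place. It shows that passing Tuple1.apply to map gives the same result as an explicit lambda, and that each result is a Product with one field.

```scala
object Tuple1ApplyDemo {
  // Stand-in values (assumption); the question uses Spark ML vectors instead.
  val data: Array[String] = Array("a", "b", "c")

  // The method is passed without arguments: the compiler converts it to a
  // function, and map supplies each element of data as the argument.
  val viaMethodRef: Array[Tuple1[String]] = data.map(Tuple1.apply)

  // The same thing written out with an explicit lambda.
  val viaLambda: Array[Tuple1[String]] = data.map(x => Tuple1(x))

  // Every tuple is a Product; a Tuple1 has exactly one field.
  val first: Product = viaMethodRef.head

  def main(args: Array[String]): Unit = {
    println(viaMethodRef.sameElements(viaLambda)) // true
    println(first.productArity)                   // 1
  }
}
```

In the question's code the element type is a vector rather than a String, so each one-field tuple becomes a one-column row.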

Some book references: https://mungingdata.com/apache-spark/best-books/. I found the Databricks courses too simple, and they omit relevant aspects. Some good sites exist: https://sparkbyexamples.com/ and https://www.waitingforcode.com/. The latter offers a good course at little cost.

On Scala's apply, there is also an excellent answer on SO: "What is the apply function in Scala?"
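As a rough sketch of the apply convention itself: calling an object with parentheses is shorthand for calling its apply method, and naming the method without arguments yields a function value. The names ApplySugarDemo and Doubler below are hypothetical, defined only for this example.

```scala
object ApplySugarDemo {
  // Hypothetical object, defined only for this illustration.
  object Doubler {
    def apply(x: Int): Int = x * 2
  }

  // Doubler(21) is rewritten by the compiler to Doubler.apply(21).
  val sugared: Int = Doubler(21)
  val explicit: Int = Doubler.apply(21)

  // With no arguments, the method is turned into a function value.
  val asFunction: Int => Int = Doubler.apply

  // That function value is what map receives.
  val mapped: List[Int] = List(1, 2, 3).map(Doubler.apply)

  def main(args: Array[String]): Unit = {
    println(sugared == explicit) // true
    println(mapped)              // List(2, 4, 6)
  }
}
```

Tuple1.apply works the same way, since Tuple1 is a case class whose companion object provides apply.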

thebluephantom
  • Thank you for your answer. Can you elaborate though? Focusing on Tuple1.apply, can we go over that part? So the .map maps every element of data, the Vectors, to one-element tuples? I've tried looking for examples of .apply but wasn't very successful. There is a lot to learn about Scala. Any good books I could use? I'm using "Scala Programming for Beginners" by Ray Yao, but some concepts appear to be missing there, based on what I find on Stack Overflow – MDSvensson Aug 27 '20 at 11:44
  • Well, it has to do with Product. – thebluephantom Aug 27 '20 at 12:00
  • Thanks :) Scala is surely not the easiest language but it is rewarding! – MDSvensson Aug 27 '20 at 18:03
  • There is pure Scala and Scala with Spark – thebluephantom Aug 27 '20 at 19:34