Get attribute names from normalized vector of features - Spark Scala

Asked Nov 21 '21 at 00:09

Active Nov 26 '21 at 19:36

I have a dataframe called trainingData that contains a normalized vector of features like this:

+--------------------+
|        normFeatures|
+--------------------+
|(16936,[0,1,2,3,9...|
|(16936,[0,1,2,3,9...|
|(16936,[0,1,2,3,9...|
|(16936,[0,1,2,3,9...|
|(16936,[0,1,2,3,9...|
|(16936,[0,1,2,3,9...|
|(16936,[0,1,2,3,9...|
|(16936,[0,1,2,3,9...|
|(16936,[0,1,2,3,9...|
|(16936,[0,1,2,3,9...|
+--------------------+
only showing top 10 rows

I want to get the names of the features/attributes that form normFeatures (the features are in trainingData as well).

I have found a similar question for PySpark: How to map features from the output of a VectorAssembler back to the column names in Spark ML?

So I tried: println(trainingData.schema("normFeatures").metadata.getMetadata("ml_attr").getMetadata("attrs"))

But the key attrs does not exist: Exception in thread "main" java.util.NoSuchElementException: key not found: attrs

The key ml_attr only contains: {"num_attrs":16936}
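For reference, the same lookup can be sketched without touching the raw metadata keys, using Spark ML's AttributeGroup. This is only a minimal sketch: the object name FeatureNames, the helper featureNames and the "feature_i" placeholder are hypothetical names introduced here, and it assumes the column is an ML vector column. When the metadata holds only a size, as in my case, it returns None instead of throwing.

```scala
import org.apache.spark.ml.attribute.AttributeGroup
import org.apache.spark.sql.types.StructField

// Hypothetical helper, not part of Spark itself.
object FeatureNames {
  // Returns the attribute names stored in a vector column's ML metadata,
  // or None when the metadata carries no per-attribute entries
  // (for example when ml_attr is just {"num_attrs":16936}).
  def featureNames(field: StructField): Option[Array[String]] = {
    // fromStructField requires the field to be an ML vector column.
    val group = AttributeGroup.fromStructField(field)
    group.attributes.map { attrs =>
      // Unnamed attributes get a positional placeholder (an assumption of this sketch).
      attrs.zipWithIndex.map { case (attr, i) => attr.name.getOrElse(s"feature_$i") }
    }
  }
}
```

It would be called as FeatureNames.featureNames(trainingData.schema("normFeatures")). With the metadata shown above I would expect it to return None, which matches the missing attrs key.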

Any idea how to get the feature names in Scala?