
I have a DataFrame called trainingData that contains a column of normalized feature vectors, like this:

+--------------------+
|        normFeatures|
+--------------------+
|(16936,[0,1,2,3,9...|
|(16936,[0,1,2,3,9...|
|(16936,[0,1,2,3,9...|
|(16936,[0,1,2,3,9...|
|(16936,[0,1,2,3,9...|
|(16936,[0,1,2,3,9...|
|(16936,[0,1,2,3,9...|
|(16936,[0,1,2,3,9...|
|(16936,[0,1,2,3,9...|
|(16936,[0,1,2,3,9...|
+--------------------+
only showing top 10 rows

I want to get the names of the features (attributes) that make up normFeatures. The original feature columns are also present in trainingData.

I found a similar question about PySpark: "How to map features from the output of a VectorAssembler back to the column names in Spark ML?"

So I tried the following:

println(trainingData.schema("normFeatures").metadata.getMetadata("ml_attr").getMetadata("attrs"))

But the key attrs does not exist, and I get:

Exception in thread "main" java.util.NoSuchElementException: key not found: attrs

The ml_attr entry only contains {"num_attrs":16936}, with no attribute names.
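One way to inspect this metadata without hitting the NoSuchElementException is Spark ML's AttributeGroup.fromStructField, which parses the ml_attr entry and exposes the attributes as an Option that is None when only a count is stored. This is a minimal sketch, assuming Spark 2.x or later with spark-mllib on the classpath; the object FeatureNames, the function featureNames and the "f<index>" placeholder names are hypothetical and only for illustration.

```scala
import org.apache.spark.ml.attribute.{Attribute, AttributeGroup, NumericAttribute}
import org.apache.spark.sql.types.StructField

// Hypothetical helper: lists one name per slot of a vector column.
object FeatureNames {
  // `field` must be a vector column, e.g. trainingData.schema("normFeatures");
  // AttributeGroup.fromStructField rejects other data types.
  def featureNames(field: StructField): Seq[String] = {
    val group = AttributeGroup.fromStructField(field)
    group.attributes match {
      case Some(attrs) =>
        // Names are stored: use them, falling back to a placeholder
        // for any individual attribute that has no name.
        attrs.toSeq.zipWithIndex.map { case (attr, i) =>
          attr.name.getOrElse(s"f$i")
        }
      case None =>
        // Only a count such as {"num_attrs":16936} is stored, so there are
        // no names to recover; emit placeholders. size is -1 if even the
        // count is unknown, so clamp it to 0.
        (0 until math.max(group.size, 0)).map(i => s"f$i")
    }
  }
}
```

Applied to the column above, FeatureNames.featureNames(trainingData.schema("normFeatures")) would return 16936 placeholder names (f0, f1, ...), which only confirms that this column's metadata carries a count and no names; any real names would have to come from the metadata of an earlier column that still holds them, if one exists.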

Any idea how to do this?
