I have a DataFrame called trainingData that contains a column with a normalized feature vector, like this:
+--------------------+
| normFeatures|
+--------------------+
|(16936,[0,1,2,3,9...|
|(16936,[0,1,2,3,9...|
|(16936,[0,1,2,3,9...|
|(16936,[0,1,2,3,9...|
|(16936,[0,1,2,3,9...|
|(16936,[0,1,2,3,9...|
|(16936,[0,1,2,3,9...|
|(16936,[0,1,2,3,9...|
|(16936,[0,1,2,3,9...|
|(16936,[0,1,2,3,9...|
+--------------------+
only showing top 10 rows
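
For context, normFeatures is produced along these lines: the raw feature columns are assembled into a single vector and then normalized (the column names and rawData below are just placeholders, not my real ones):

    import org.apache.spark.ml.feature.{Normalizer, VectorAssembler}

    // Assemble the raw feature columns into one vector, then normalize it.
    // "feat1", "feat2", "feat3" and rawData stand in for the real columns/DataFrame.
    val assembler = new VectorAssembler()
      .setInputCols(Array("feat1", "feat2", "feat3"))
      .setOutputCol("features")

    val normalizer = new Normalizer()
      .setInputCol("features")
      .setOutputCol("normFeatures")
      .setP(2.0)

    val trainingData = normalizer.transform(assembler.transform(rawData))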
I want to get the names of the features/attributes that make up normFeatures (the original feature columns are also present in trainingData).
I found a similar question for PySpark: How to map features from the output of a VectorAssembler back to the column names in Spark ML?
So I tried:

    println(trainingData.schema("normFeatures").metadata.getMetadata("ml_attr").getMetadata("attrs"))
But the key attrs does not exist:

    Exception in thread "main" java.util.NoSuchElementException: key not found: attrs

The key ml_attr only contains {"num_attrs":16936}.
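Reading the same metadata through AttributeGroup (just another way to look at the same ml_attr entry) shows the same picture: the number of attributes is known, but the per-feature attributes are missing:

    import org.apache.spark.ml.attribute.AttributeGroup

    // Parse the ml_attr metadata attached to the vector column.
    val attrGroup = AttributeGroup.fromStructField(trainingData.schema("normFeatures"))
    println(attrGroup.size)        // 16936 -- only the count is stored
    println(attrGroup.attributes)  // None  -- no per-feature names/attributes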
Any idea how to do this?