I have a TF-IDF vector, but I want to revert it to get the original sentence. Is this possible?

The vectorization was done with sklearn's TfidfVectorizer. I have access to the original data.

Use case: I have a bunch of sentences that belong to a cluster, but I do not have the center of the cluster. If I convert the sentences to their TF-IDF representations, sum those vectors, and divide the sum by the number of sentences in the cluster, the resulting vector should represent the center of the cluster.

import numpy as np

# Rows are sentences, columns are vocabulary terms
tfidf_features = tfidf_vec.transform(list_of_sentences)
tfidf_features_array = tfidf_features.toarray()
# Mean over the sentences (sum of the rows divided by the row count)
TFIDF_Clust_Center = tfidf_features_array.mean(axis=0).reshape(1, -1)

I then want to see which sentence, or which collection of words, is at the center of the cluster by reversing the TF-IDF vector:

[0.695, 0.789, 0.183, ... ] --> "The quick brown fox..." 

How do I achieve this?

KoKo

0 Answers