Let's say I have a corpus of news articles that I want to classify into various topics (e.g. "Entertainment", "Tech", "Science"), and for each topic I want to obtain the most predictive words.
The most straightforward way to do this is to compute TF-IDF features over the words, train a Naive Bayes model for the classification, and then read off the most predictive words under each topic via their associated probability P(Word|Topic).
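For concreteness, here is a rough sketch of that baseline (assuming scikit-learn; `docs` and `labels` are placeholder lists holding the raw article texts and their topic labels):

```python
# Sketch of the TF-IDF + Naive Bayes baseline.
# `docs` and `labels` are placeholders for the article texts and topic labels.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)            # documents x vocabulary TF-IDF matrix
clf = MultinomialNB().fit(X, labels)

# Top predictive words per topic, ranked by log P(word | topic)
feature_names = np.array(vectorizer.get_feature_names_out())
for topic_idx, topic in enumerate(clf.classes_):
    top = np.argsort(clf.feature_log_prob_[topic_idx])[::-1][:10]
    print(topic, feature_names[top])
```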
My question is: can I do this using word embeddings with Naive Bayes, instead of TF-IDF with Naive Bayes? I've come across a few articles explaining how to do text classification with word embeddings, such as using the word vectors as weights for a deep learning model, or this one, which suggests taking a coordinate-wise mean/min/max of the word vectors in each document and classifying on that (a sketch of the latter is below). However, in these examples, the explainability of each word's importance under each topic is lost.
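For reference, the mean-pooling approach I mean looks roughly like this (`word_vectors` is assumed to be a dict-like mapping from word to a pre-trained embedding vector, and the choice of LogisticRegression as the downstream classifier is just for illustration):

```python
# Sketch of the mean-of-word-vectors document representation.
# `word_vectors` is a placeholder: a dict-like word -> np.ndarray mapping from
# a pre-trained embedding; `docs` and `labels` are as in the previous snippet.
import numpy as np
from sklearn.linear_model import LogisticRegression

def doc_vector(doc, word_vectors, dim=300):
    vecs = [word_vectors[w] for w in doc.lower().split() if w in word_vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

X_emb = np.vstack([doc_vector(d, word_vectors) for d in docs])
clf = LogisticRegression(max_iter=1000).fit(X_emb, labels)
# The learned coefficients now refer to embedding dimensions, not words,
# so per-word importance under each topic is no longer directly readable.
```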
I really do prefer to just use Naive Bayes, given that its feature importance comes in the form of P(Word|Topic), which is much easier to explain to a lay person than deep-learning model weights or even regression coefficients. Accuracy matters less than explainability here. However, I wonder whether Naive Bayes can be paired with word embeddings to push up the accuracy while still retaining that level of explainability.