I am working with a dataset that has 34 features (numerical and nominal) plus the target class. Several of the columns have missing values; one column in particular has approximately 50% missing values.

I had not been concerned about this, because in R Naive Bayes seemed to work fine regardless of missing values or feature types. But I have since read in the scikit-learn docs that its Naive Bayes cannot handle mixed data or missing values, so now I am concerned.

Is there a Python library that works exactly like Naive Bayes in R? If not, what can I do to run Naive Bayes on mixed features that also have missing values?

Panos
  • Welcome to CV.SE. 1. Can you please point us to the R Naive Bayes implementation you are using? 2. There is a chance that R is simply dropping certain rows. 3. A lot of algorithms use "numerical only" data, i.e. even categorical data are encoded in numerical form, commonly as indicator variables. You may want to invest time in encoding your data appropriately rather than leaving it to the inner workings of a particular method implementation.
  • – usεr11852 Apr 03 '22 at 23:16
  • Hello, maybe I should restate my issue. After revisiting the university exercises in R, I found that we did not use mixed Naive Bayes, only categorical data with NO missing values. I had confused "missing values" with a zero conditional probability of an attribute value given a class label. I am now trying to reproduce a Naive Bayes algorithm from a paper, along with its results. The authors used WEKA, and they don't describe how mixed and missing values were handled when evaluating the given dataset. So basically I need to learn what WEKA's Naive Bayes does with missing and mixed data. – Panos Apr 04 '22 at 20:02
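
For reference, here is a minimal sketch of one common way to run Naive Bayes on mixed features with missing values. It is not taken from R, WEKA, or scikit-learn, and whether it matches what WEKA does would need to be checked against WEKA's documentation. The assumptions are: numeric columns get a per-class Gaussian, nominal columns get Laplace-smoothed frequencies (which also avoids the zero conditional probability issue mentioned above), and a missing value (`None`) is simply left out of the estimate and of the likelihood product. All names (`fit_mixed_nb`, `log_joint`, `predict`) and the toy data are hypothetical.

```python
import math


def fit_mixed_nb(rows, labels, numeric_cols, categorical_cols, alpha=1.0):
    """Fit Naive Bayes on mixed data; None marks a missing value.

    Assumes every class has at least one observed value in each numeric column.
    """
    model = {"prior": {}, "gauss": {}, "cat": {},
             "numeric": list(numeric_cols), "categorical": list(categorical_cols)}
    # Levels of each nominal column, collected over all classes.
    levels = {j: sorted({r[j] for r in rows if r[j] is not None})
              for j in categorical_cols}
    for c in sorted(set(labels)):
        members = [r for r, y in zip(rows, labels) if y == c]
        model["prior"][c] = len(members) / len(labels)
        for j in numeric_cols:
            vals = [r[j] for r in members if r[j] is not None]  # skip missing
            mean = sum(vals) / len(vals)
            # Small constant keeps the variance strictly positive.
            var = sum((v - mean) ** 2 for v in vals) / len(vals) + 1e-9
            model["gauss"][c, j] = (mean, var)
        for j in categorical_cols:
            vals = [r[j] for r in members if r[j] is not None]  # skip missing
            k = len(levels[j])
            # Laplace (add-alpha) smoothing: no level gets probability zero.
            model["cat"][c, j] = {
                lv: (vals.count(lv) + alpha) / (len(vals) + alpha * k)
                for lv in levels[j]
            }
    return model


def log_joint(model, row):
    """Return log P(class) + sum of log P(observed feature | class) per class."""
    scores = {}
    for c, prior in model["prior"].items():
        s = math.log(prior)
        for j in model["numeric"]:
            if row[j] is None:
                continue  # a missing feature contributes nothing
            mean, var = model["gauss"][c, j]
            s += -0.5 * math.log(2 * math.pi * var) - (row[j] - mean) ** 2 / (2 * var)
        for j in model["categorical"]:
            probs = model["cat"][c, j]
            # Missing values and levels never seen in training are both skipped.
            if row[j] is None or row[j] not in probs:
                continue
            s += math.log(probs[row[j]])
        scores[c] = s
    return scores


def predict(model, row):
    scores = log_joint(model, row)
    return max(scores, key=scores.get)


# Toy data: column 0 is numeric, column 1 is nominal, None means missing.
rows = [(1.0, "a"), (1.2, "a"), (None, "a"), (3.0, "b"), (3.2, None), (2.9, "b")]
labels = ["x", "x", "x", "y", "y", "y"]
model = fit_mixed_nb(rows, labels, numeric_cols=[0], categorical_cols=[1])

pred_numeric_only = predict(model, (1.1, None))   # nominal value missing
pred_nominal_only = predict(model, (None, "b"))   # numeric value missing
```

Dropping a missing feature from the product amounts to marginalising over it under the Naive Bayes independence assumption, so no imputation or row deletion is needed. With a column that is about 50% missing, though, its per-class estimates rest on far fewer rows than the others.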