I'm trying to write a function in Python that replaces the various placeholder strings used for missing values with NaN.

import pandas as pd
import numpy as np

data=pd.read_csv("diabetes.csv")

def proc_all_NaN(data):
    nan_sym=["_","-","?","","na","n/a"]
    for i in nan_sym:
        data.replace(i,np.nan)

proc_all_NaN(data)

I expect my function to produce a dataframe with NaN wherever the original dataframe contained one of these placeholders: "_","-","?","","na","n/a".

Instead, after I call the function, my data is unchanged.

Could you help me? I can't find the mistake in my code.

Willem Van Onsem
Hapanas

1 Answer

You can specify which strings should be treated as missing values when you read the file with pd.read_csv(). Per the docs:

na_values : scalar, str, list-like, or dict, optional Additional strings to recognize as NA/NaN. If dict passed, specific per-column NA values. By default the following values are interpreted as NaN: ‘’, ‘#N/A’, ‘#N/A N/A’, ‘#NA’, ‘-1.#IND’, ‘-1.#QNAN’, ‘-NaN’, ‘-nan’, ‘1.#IND’, ‘1.#QNAN’, ‘N/A’, ‘NA’, ‘NULL’, ‘NaN’, ‘n/a’, ‘nan’, ‘null’.
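As a small illustration of the list and dict forms described in that excerpt, here is a sketch that reads a made-up in-memory CSV. The column names `glucose` and `insulin` and the sample values are hypothetical and are not taken from diabetes.csv.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a real file.
csv_text = "glucose,insulin\n120,?\n-,85\n99,-\n"

# List form: the markers are treated as missing in every column.
df_all = pd.read_csv(io.StringIO(csv_text), na_values=["?", "-"])

# Dict form: "-" is treated as missing only in the "glucose" column,
# so the "?" and "-" in "insulin" stay as ordinary strings.
df_per_col = pd.read_csv(io.StringIO(csv_text), na_values={"glucose": ["-"]})

print(df_all.isna().sum())
print(df_per_col.isna().sum())
```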

In your case, you can try:

data=pd.read_csv("diabetes.csv", na_values=["_","-","?","","na","n/a"])
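As for why the original function appeared to do nothing: DataFrame.replace returns a new DataFrame by default rather than modifying the existing one, and the function discards that result and returns nothing. If you would rather keep the cleanup in a function, a minimal sketch of a fixed version could look like the following. The name `proc_all_nan` and the small sample frame are only for illustration.

```python
import numpy as np
import pandas as pd


def proc_all_nan(data):
    """Return a copy of data with placeholder strings replaced by NaN."""
    nan_sym = ["_", "-", "?", "", "na", "n/a"]
    # replace() accepts a list of values, so no loop is needed.
    # It returns a new DataFrame, which has to be returned or assigned.
    return data.replace(nan_sym, np.nan)


# Hypothetical sample frame standing in for the real data.
df = pd.DataFrame({"a": ["1", "?", "3"], "b": ["-", "na", "x"]})
cleaned = proc_all_nan(df)
print(cleaned)
```

Note that the result has to be assigned, for example data = proc_all_nan(data). The replacement matches whole cell values only, so a string such as "-5" is left alone.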
calestini