My DataFrame has 2919 rows. Take, for example, the column "2ndFlrSF":
2ndFlrSF: second-floor area in square feet
These are the values I get when I run the Pandas command
conc1['2ndFlrSF'].value_counts()
where conc1 is my DataFrame.
Output:
0 1668
546 23
728 18
504 17
672 13
600 13
720 13
896 11
886 10
756 9
780 9
862 8
601 7
702 7
840 7
754 6
462 6
676 6
744 6
804 6
630 6
878 6
739 6
567 6
689 6
858 5
741 5
704 5
684 5
678 5
...
605 1
591 1
1150 1
1152 1
1158 1
1160 1
1074 1
1072 1
1066 1
1060 1
956 1
966 1
679 1
980 1
673 1
990 1
992 1
994 1
998 1
1000 1
1004 1
1008 1
661 1
1028 1
659 1
1036 1
1038 1
1042 1
1048 1
1721 1
Name: 2ndFlrSF, Length: 635, dtype: int64
As you can see, the column is mostly filled with 0's, which seems irrelevant. I have many more columns like this. What should I do with such columns, and how should I impute the NaN values in them accordingly?
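A column dominated by zeros is not necessarily irrelevant: here a 0 plausibly encodes "no second floor" rather than missing data. Below is a minimal sketch (using a toy stand-in for conc1, since the real data isn't shown) of how one might measure the zero fraction per column and, for area columns where 0 means "feature absent", fill NaN with 0:

```python
import numpy as np
import pandas as pd

# Toy stand-in for conc1; values are illustrative only.
conc1 = pd.DataFrame({
    "2ndFlrSF": [0, 0, 0, 728, 546, np.nan, 0, 896],
})

# Fraction of zeros in each column (NaN compares unequal to 0,
# so it is counted as "not zero" here).
zero_frac = (conc1 == 0).mean()
print(zero_frac)

# If 0 means "no second floor", a missing value most plausibly
# means the feature is absent too, so imputing 0 is defensible.
conc1["2ndFlrSF"] = conc1["2ndFlrSF"].fillna(0)
```

Whether 0 is the right fill value depends on why the value is missing in your data; for columns where 0 is not a meaningful "absent" code, a median or model-based imputation may be more appropriate.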
2ndFlrSF could come from buildings with only 1 floor, so that the value is necessarily 0. If so, this answer should cover your situation, also. If you still have an outstanding question about your particular situation, please edit your question to specify what is still at issue. – EdM Jul 22 '18 at 22:05
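The comment's hypothesis can be checked directly. The sketch below assumes a house-style column named "HouseStyle" (present in the Ames housing data this looks like, but an assumption for your data) and cross-tabulates it against whether 2ndFlrSF is zero; if the hypothesis holds, the zeros should concentrate in single-story homes:

```python
import pandas as pd

# Illustrative data; "HouseStyle" and its labels are assumptions.
conc1 = pd.DataFrame({
    "HouseStyle": ["1Story", "2Story", "1Story", "2Story", "1Story"],
    "2ndFlrSF":   [0,        728,      0,        546,      0],
})

# Rows: house style; columns: is the second-floor area zero?
print(pd.crosstab(conc1["HouseStyle"], conc1["2ndFlrSF"] == 0))
```

If the True column lines up almost entirely with single-story styles, the zeros are structural (no second floor exists) rather than missing measurements.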