Recently I did a school project where, instead of predicting height as a continuous value, we turned it into three categories ('tall', 'average' and 'short') and predicted the category via logistic regression using some other, mostly categorical, variables. We were told to use a first quartile of 175 cm and a third quartile of 185 cm to divide height into categories, on the grounds that height follows a normal distribution ranging from 170 cm to 200 cm. However, once I plotted the data we were given, it did not seem to fit a normal distribution: the vast majority of samples fall outside what should be the bell curve.
My question is this: when analyzing problems like this and dividing a variable into categories, should you take the cut points (mean, quartiles) from the actual dataset you are observing? Or, if the variable is something that should follow a normal distribution, should you assume the values a normal distribution would give, in order to get results that match the real world?
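To make the two options concrete, here is a minimal sketch in Python with NumPy. The data is made up (a uniform sample, chosen only because it is clearly not bell-shaped) and is not the project data, and the `categorize` helper is a name I invented for illustration.

```python
import numpy as np

def categorize(heights, low, high):
    """Label each height 'short', 'average' or 'tall' using two cut points."""
    heights = np.asarray(heights, dtype=float)
    # np.digitize gives 0 for x < low, 1 for low <= x < high, 2 for x >= high
    idx = np.digitize(heights, [low, high])
    return np.array(["short", "average", "tall"])[idx]

# Made-up sample for illustration only (assumption: NOT the project data)
rng = np.random.default_rng(0)
heights = rng.uniform(160, 200, size=1000)

# Option 1: the cut points we were told to use
fixed = categorize(heights, 175, 185)

# Option 2: quartiles computed from the observed data
q1, q3 = np.percentile(heights, [25, 75])
empirical = categorize(heights, q1, q3)

# Compare the share of samples that lands in each category
for name, labels in [("fixed", fixed), ("empirical", empirical)]:
    shares = {c: float(np.mean(labels == c)) for c in ("short", "average", "tall")}
    print(name, shares)
```

With the empirical quartiles, roughly a quarter of the samples end up 'short', half 'average' and a quarter 'tall' by construction. With the fixed cut points, the class sizes depend entirely on how the data happens to be distributed, which is the mismatch I am asking about.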
I apologize if my question seems stupid. I am pretty new to data science, and this task confused me.