Let's say I want to investigate whether the continuous response variable individual_fish_size is related to three continuous explanatory variables: Depth, substrate and Temperature. Because my fish come from several different species, and the distribution of these species along the three variables is certainly not random, I'd like to include the categorical variable Fish.species to account for the variance due to size differences among species.
Once I have fitted such a model, does it make sense to select the best formula by AIC, for example by backward selection (this can be done with the function step in R)? Or would it be better to avoid this, because the selection might drop Fish.species and we would lose the part of the variance it explains?
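
To make the setup concrete, here is a minimal sketch of what I mean. The data are simulated purely for illustration (the species labels, effect sizes and sample size are made up, not my real data), and I am assuming an ordinary linear model fitted with lm. The second call to step uses its scope argument to keep Fish.species in the model, which is one option I am aware of but am not sure is appropriate.

```r
# Simulated stand-in for the real data set (all values are hypothetical).
set.seed(1)
n <- 300
fish <- data.frame(
  Fish.species = factor(sample(c("A", "B", "C"), n, replace = TRUE)),
  Depth        = runif(n, 0, 50),
  substrate    = runif(n, 0, 1),
  Temperature  = runif(n, 5, 25)
)

# Made-up species-level mean sizes, plus a Depth effect and noise.
species_effect <- c(A = 10, B = 25, C = 40)
fish$individual_fish_size <-
  unname(species_effect[as.character(fish$Fish.species)]) +
  0.2 * fish$Depth + rnorm(n, sd = 3)

# Full model: three continuous predictors plus species as a factor.
full <- lm(individual_fish_size ~ Depth + substrate + Temperature + Fish.species,
           data = fish)

# Plain backward selection by AIC: any term, including Fish.species,
# may be dropped.
reduced_free <- step(full, direction = "backward", trace = 0)

# Backward selection with Fish.species forced to stay: terms in the
# 'lower' formula of 'scope' are never removed.
reduced_kept <- step(full, direction = "backward",
                     scope = list(lower = ~ Fish.species), trace = 0)

formula(reduced_free)
formula(reduced_kept)
```

Comparing the two formulas shows whether the unconstrained selection would have dropped Fish.species, which is exactly the case I am worried about.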