I am working on a logistic regression analysis with approximately 16 million data points, and I would like to understand how a dataset of this size influences the fit of the model.
The data are highly imbalanced: one class is heavily underrepresented relative to the other. I have been using McFadden's pseudo-$R^2$ as the goodness-of-fit measure for the model.
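For reference, McFadden's pseudo-$R^2$ compares the log-likelihood of the fitted model against an intercept-only null model,

$$R^2_{\text{McF}} = 1 - \frac{\ln \hat{L}_{\text{model}}}{\ln \hat{L}_{\text{null}}},$$

and that is the quantity I am computing. Here is a minimal sketch of how I compute it; the synthetic `X`, `y`, and `beta` below are hypothetical stand-ins for my actual data (and far smaller than 16 million rows), while `statsmodels` reports McFadden's measure directly as `prsquared`:

```python
import numpy as np
import statsmodels.api as sm

# Synthetic stand-in for my data: 3 features, heavily imbalanced outcome.
# (beta is a hypothetical "true" coefficient vector; the negative intercept
# pushes the positive-class rate down to a few percent.)
rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(100_000, 3)))
beta = np.array([-3.5, 1.0, 0.5, -0.5])
y = rng.binomial(1, 1 / (1 + np.exp(-(X @ beta))))

result = sm.Logit(y, X).fit(disp=0)

print(result.prsquared)                # McFadden's pseudo-R^2, as reported
print(1 - result.llf / result.llnull)  # same value from the definition above
```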
My question is: how does the sheer volume of data, particularly when the classes are this imbalanced, affect the fit of a logistic regression model? I had always assumed that more data would improve the model's performance, but my teacher hinted that there may be other considerations.