What is the minimum corpus size needed to substantially cover the grammar of a language?
I know that what counts as 'substantial' is open to debate. But imagine you wanted to describe the grammar of a speaker at the C2 level (from the Common European Framework of Reference for Languages) or, if you prefer, at full professional proficiency. That is, the level of complexity you would expect from an educated speaker.
This would obviously exclude texts that illustrate regional varieties, as well as structures that have fallen out of use (like 'must needs').