The test question is poorly written. There can be a distinction between “seeing” and “watching,” but either is equally reasonable as the missing word in the example sentence. Claiming either B or D is wrong is, itself, wrong.
Off the top of my head, a better sentence for testing this distinction between “watching” and “seeing” would be something like:
We sat _____ the Moon for half an hour.
Here, “watching” is better than “seeing” because “watching” describes a dedicated activity, while “seeing” is more momentary. (Of course, my sentence makes “to see” or “to watch” more reasonable answers, whereas they are simply wrong in the original sentence, so in that regard my sentence is no better. Ultimately, having one question mix a test of vocabulary in “see” vs. “watch” with a test of grammar in “seeing” vs. “to see” is probably a bad idea.)
Furthermore, the rest of the sentence is not very well written either. Describing the Moon, or the seeing/watching, as being “in the open air” is very weird: where else are you going to see or watch the Moon? Also, Mid-Autumn Day is the name of a holiday, and so should not take “the,” as it is a proper name. (Another way of translating the holiday's name, “Mid-Autumn Festival,” would probably take “the,” but “Mid-Autumn Day” would not.)
So, unfortunately, I suspect that the author of this question is not really qualified to be testing others on English language skills. Their own skill with the English language does not seem to be strong enough.