Possible Duplicate:
Where to find a large text corpus?
I know someone has asked a similar question here, but I'm wondering whether anyone knows of a large text corpus that is available for research use. The number of documents it contains isn't terribly important; rather, I'm looking for something at the terabyte or petabyte scale that I can use to test the scalability of some algorithms. I thought about using the English Wikipedia data dump, but I think it's only about 25 GB. My other thought was to use a database of Twitter messages, but, if I remember right, those aren't freely available. Does anyone have a recommendation?