My question is not strictly tied to text mining, but maybe you can help. I am looking for a keyword set that meets the following criteria:

- contains only English words/n-grams or named entities
- consists of manually assigned (human-tagged) tags from global news
- has a few main topics (5-10), e.g. tech, business, sport, ...
- is relatively big (10,000+ tags)
- is up to date
- ideally, the keywords have frequency weights

I thought that a couple of the biggest news portals would have a tag cloud that at least partly meets these criteria, but I didn't find anything on BBC News, CNN News, Reuters, ... Interestingly, I found some portals in my native language (Hungarian), so I can't believe there isn't anything at the global level. I don't need an API; I can parse the HTML if necessary. Corpora might also be useful.
Thanks.