Are there any available data sets of aggregated newspaper headlines or articles covering 2019 and 2020? I'm interested in exploring sentiment analysis over that period but have been unable to find anything contemporary enough.
1 Answer
"A Million News Headlines" is "sourced from the Australian Broadcasting Corporation" containing "data of news headlines published over a period of eighteen years from 2003-02-19 to 2020-12-31" available from:
- https://www.kaggle.com/therohk/million-headlines, or
- https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/SYBGZL
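Once the file is downloaded, restricting it to 2019 and 2020 could look roughly like the sketch below. It uses only the Python standard library. The column names `publish_date` (formatted `yyyyMMdd`) and `headline_text` are assumptions on my part rather than something stated above, so check the header of the file you actually get. The sample rows are placeholders, not real headlines.

```python
import csv
import io
from datetime import date, datetime

# Assumption: the CSV has a "publish_date" column in yyyyMMdd form and a
# "headline_text" column. Verify against the header of the downloaded file.
DATE_COL = "publish_date"
TEXT_COL = "headline_text"


def filter_headlines(lines, start, end):
    """Return (date, headline) pairs whose date lies in [start, end]."""
    selected = []
    for row in csv.DictReader(lines):
        published = datetime.strptime(row[DATE_COL], "%Y%m%d").date()
        if start <= published <= end:
            selected.append((published, row[TEXT_COL]))
    return selected


# Placeholder rows for illustration only; these are not real headlines.
sample = io.StringIO(
    "publish_date,headline_text\n"
    "20181231,placeholder headline a\n"
    "20190101,placeholder headline b\n"
    "20201231,placeholder headline c\n"
)

recent = filter_headlines(sample, date(2019, 1, 1), date(2020, 12, 31))
print(len(recent))
```

To run it on the real data, pass an open file handle instead of `sample`, for example `open(path, newline="", encoding="utf-8")`, where `path` is wherever you saved the download. The resulting list of dated headlines can then be fed to whichever sentiment tool you prefer.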
"India News Headlines Dataset" is "an archive of notable events in the Indian subcontinent" from "2001-01-01 to 2020-12-31":
Although the following do not cover 2019 and 2020 as requested, I include them for information:
- "One Week of Global News Feeds" is "a snapshot of most of the new news content published online" covering "the 7 Day-period of August 24 through August 30 for the years 2017 and 2018": https://www.kaggle.com/therohk/global-news-week
- "News Headline Collection" is "Headline Dataset collected over three years (Jan-2014 to Dec-2016)": https://sites.google.com/view/headlinedataset/home
There is also a "Dataset of major newspapers content" question on this site, posted Apr 12 2015, with some useful links.
datakrunch