I want to run a MapReduce job to analyze a Twitter dataset from the FIFA 2014 World Cup, but I couldn't find anywhere to get such a dataset for free. Can someone suggest a place to get at least a sample of it for free? I'm looking for the actual tweets posted by users that contain related hashtags such as #FIFA, #GER vs. #ARG, #BRZ, #Neymar, etc., covering the entire duration of the World Cup.
-
Why not download tweets via the API? – Thomas Nov 13 '14 at 09:55
-
@Thomas Unless you have the tweet id or you search by user, tweets aren't retrievable by API search after one week. – philshem Nov 13 '14 at 16:16
-
FYI, it's against the Terms of Service to share the tweet data, but you can share the tweet ids. If you find a list of tweet ids related to your topic, then you can use the API to retrieve these tweets. – philshem Nov 13 '14 at 16:18
-
Thanks for the idea @philshem. I have the tweet ids with me. I looked at the Twitter API's GET statuses/lookup endpoint, which returns the entire JSON object for each tweet and allows up to 180 requests per 15-minute window. I have a total of 3 million tweet ids now. I am planning to write a script or scheduled job that sends requests to the Twitter API once every 15 minutes. Is there any better approach than this? In the meantime I am also looking into option 1 suggested by Jeanne Holm. – theja_swarup Nov 16 '14 at 00:26
-
@theja_swarup I think you can comma separate 200 (?) tweet ids in one request. Still requires scheduling and patience. – philshem Nov 16 '14 at 08:08
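For anyone sizing up the job discussed in these comments, here is a minimal sketch of the batching and scheduling, not a definitive implementation. It assumes 100 ids per request, which is the cap I recall being documented for GET statuses/lookup (the 200 above was a guess), and the 180 requests per 15-minute window quoted above; both are plain parameters, so check them against the current documentation. `fetch_batch` is a hypothetical callable you would supply yourself: it takes a list of ids, sends them as one comma-separated lookup request, and returns the list of tweet objects.

```python
import math
import time
from typing import Callable, Dict, Iterable, Iterator, List


def chunk_ids(ids: Iterable[str], batch_size: int = 100) -> Iterator[List[str]]:
    """Yield successive lists of at most batch_size tweet ids."""
    batch: List[str] = []
    for tweet_id in ids:
        batch.append(tweet_id)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final, possibly shorter, batch
        yield batch


def estimate_windows(n_ids: int, batch_size: int = 100,
                     requests_per_window: int = 180) -> int:
    """Number of rate-limit windows needed to look up n_ids tweets."""
    n_requests = math.ceil(n_ids / batch_size)
    return math.ceil(n_requests / requests_per_window)


def hydrate(ids: Iterable[str],
            fetch_batch: Callable[[List[str]], List[Dict]],
            batch_size: int = 100,           # assumed per-request cap
            requests_per_window: int = 180,  # assumed quota per window
            window_seconds: int = 15 * 60,
            sleep: Callable[[float], None] = time.sleep) -> List[Dict]:
    """Call fetch_batch on every batch, pausing once a window's quota is used."""
    results: List[Dict] = []
    sent_in_window = 0
    for batch in chunk_ids(ids, batch_size):
        if sent_in_window == requests_per_window:
            sleep(window_seconds)  # wait for the next rate-limit window
            sent_in_window = 0
        results.extend(fetch_batch(batch))  # hypothetical API call
        sent_in_window += 1
    return results
```

Under these assumptions, `estimate_windows(3_000_000)` comes to 167 windows, so 3 million ids would take on the order of 42 hours of wall-clock time. For a job that long you would probably want to write each batch to disk as it arrives rather than hold everything in memory, so that a crash does not cost you the whole run.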
1 Answer
Getting access to historical Twitter data may end up coming with a price. Here are some options:
- I've had great luck using Topsy to look at a wide variety of tweets, for work ranging from tracking disease vectors in Africa to sentiment analysis. Here's the link for 86K #FIFA tweets for the last 30 days. You can expand to "all time", search by language, and look at influencers.
- Use the Twitter API to get the data you can for free. Good developer resources are available.
- The most comprehensive historical archive may be via Gnip, but unfortunately it is not free and it's unclear what the actual costs are.
Good luck!
Jeanne Holm
-
Thanks Jeanne. The first option looks promising. I will look into it. – theja_swarup Nov 15 '14 at 23:59