
I'm looking for corpora of American English SMS text messages (160 characters or fewer). What corpora are available?

Patrick Hoefler
Dan

2 Answers


The largest English corpus I've found (over 10,000 messages) is the National University of Singapore's SMS corpus (select the corpus with "all" messages). However, closer examination reveals that relatively few of the messages originate from US participants.

There is also an English-language SMS spam corpus. It contains over 1,000 legitimate messages and only a few hundred spam messages.

Dr. Caroline Tagg created a corpus of SMS messages (primarily in British English, I believe), but I cannot find the corpus itself online. Her paper, however, contains hundreds of messages from it.

I created a corpus of American English SMS messages that contains over 4,900 messages, a few hundred of which are related to illegal drug use. It is now available only on archive.org.

Dan

I recently made available a corpus of text messages from Spanish-dominant bilingual young adults in New York City (the Bilingual Youth Texts Corpus). It is at www.byts.commons.gc.cuny.edu

Dan
MichelleMc