12

Is there a labeled dataset of spoken digits available, that is, of people saying "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", or "nine"?

I would also be interested in such a dataset in another language.

I want to try some speech recognition algorithms. This means the dataset should be audio files which were created by recording humans saying those digits.

Martin Thoma
  • 1
    The traditional dataset for this is TIDIGITS, which contains utterances of 1-7 digits, but you could just discard the longer ones. However, it is not open (and costs $500). Austalk is a newer dataset that has similar data (and a bunch of other material, as it is a historical corpus of language), but again it's not open (it is, however, free to researchers). I think all its digits are of length >5. I doubt such an open dataset exists; otherwise Kaldi would include it. – Frames Catherine White Feb 19 '15 at 08:00
  • What do you mean by "digits of length >5"? – Martin Thoma Feb 19 '15 at 08:40
  • By "Kaldi", do you mean http://kaldi.sourceforge.net/about.html? – Martin Thoma Feb 19 '15 at 08:41
  • 1
    Yes, that is Kaldi. It includes a lot of example scripts/recipes for various datasets, most of which are closed, though some are open. But I do not believe any of the open ones are "Isolated Digits". By >5, I mean they are strings of digits like "Two five four nine zero", i.e. five or more digits are spoken. – Frames Catherine White Feb 19 '15 at 17:26
  • 1
    @Oxinabox I wrote to the TIDIGITS people and they said "Students can obtain data for free through our data scholarship program." No intention to open the data, though. – philshem Mar 26 '15 at 07:34

3 Answers

6

https://github.com/Jakobovski/free-spoken-digit-dataset is a free spoken digit dataset (FSDD).

As an added bonus, it comes with a few useful Python utility functions.

I created this dataset because I had the same problem. Please contribute to increase the dataset's size.
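For anyone wanting to get started, here is a minimal sketch of how labels might be read from the recordings. It assumes the files follow a `{digit}_{speaker}_{index}.wav` naming pattern (check the repository README to confirm); the helper names `parse_label` and `count_digits` and the example file names are hypothetical and are not part of the dataset's own utilities.

```python
from collections import Counter
from pathlib import Path


def parse_label(filename):
    """Split a name like '7_jackson_32.wav' into (digit, speaker, index).

    Assumes the '{digit}_{speaker}_{index}.wav' pattern; adjust if the
    dataset uses a different naming scheme.
    """
    stem = Path(filename).stem
    digit, rest = stem.split("_", 1)      # first field is the digit label
    speaker, index = rest.rsplit("_", 1)  # last field is the take number
    return int(digit), speaker, int(index)


def count_digits(filenames):
    """Count how many recordings exist for each digit label."""
    return Counter(parse_label(name)[0] for name in filenames)


# Hypothetical file names, for illustration only.
example_files = ["7_jackson_32.wav", "0_jackson_0.wav", "7_nicolas_3.wav"]
print(parse_label(example_files[0]))
print(count_digits(example_files))
```

In practice you would pass the result of something like `Path("recordings").glob("*.wav")` instead of the hard-coded list; the directory name here is also an assumption.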

Jakobovski
3

Forvo is a collection of pronunciations by human speakers. They have a huge amount of data, and I am pretty sure they have all digits for the 20 most spoken languages.

Examples:

License: BY-NC-SA 3.0

Nicolas Raoul
  • 1
    I was also going to suggest Forvo. Unfortunately, most numbers won't be replicated enough times to constitute a data set. – philshem Mar 23 '15 at 10:35
3

How about the UCI Spoken Arabic Digit dataset? They are not exactly audio files, but it gets the job done. https://archive.ics.uci.edu/ml/datasets/Spoken+Arabic+Digit
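
As far as I recall, that dataset ships precomputed feature vectors (MFCCs) rather than waveforms, stored as text with one frame per line and a blank line between utterances. That layout is an assumption here, so check the dataset description before relying on it. Under that assumption, a rough sketch of a reader could look like this (`parse_blocks` is a hypothetical helper, and the sample text is made up):

```python
def parse_blocks(text):
    """Group whitespace-separated float rows into blank-line-separated blocks.

    Each block is assumed to be one utterance; each row one feature frame.
    """
    blocks, current = [], []
    for line in text.splitlines():
        if line.strip():
            current.append([float(value) for value in line.split()])
        elif current:
            # A blank line closes the utterance collected so far.
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


# Made-up sample with two utterances (2 frames and 1 frame), 3 values each.
sample = "0.1 0.2 0.3\n0.4 0.5 0.6\n\n1.0 1.1 1.2\n"
utterances = parse_blocks(sample)
print(len(utterances), [len(u) for u in utterances])
```

The labels would still have to be derived from the block order as described in the dataset's documentation, which this sketch does not attempt.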