Is a spoken digit dataset available?

Question

Is there a labeled dataset of spoken digits available, i.e. recordings of people saying "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", or "nine"?

I would also be interested in such a dataset in another language.

I want to try some speech recognition algorithms. This means the dataset should be audio files which were created by recording humans saying those digits.

The traditional dataset for this is TIDIGITS, which has utterances of 1-7 digits, but you could just discard the longer ones. However, it is not open (and costs $500). Austalk is a newer dataset that has similar data (and a bunch of other material, as it is a historical corpus of language), but again it's not open (it is, however, free to researchers). I think all its digit strings are of length >5. I doubt such an open dataset exists; otherwise Kaldi would include it. – Frames Catherine White, Feb 19 '15 at 08:00
By "Kaldi", do you mean http://kaldi.sourceforge.net/about.html? — Martin Thoma, Feb 19 '15 at 08:41
Yes, that is Kaldi. It includes a lot of example scripts/recipes for various datasets, most of which are closed, though some are open. I do not believe any of the open ones are "Isolated Digits". By ">5", I mean they are strings of digits like "Two five four nine zero", i.e. they have five or more digits spoken. – Frames Catherine White, Feb 19 '15 at 17:26
@Oxinabox I wrote to the TIDIGITS people and they said "Students can obtain data for free through our data scholarship program." No intention to open the data, though. — philshem, Mar 26 '15 at 07:34

Jakobovski · Answer 1 · 2016-06-21T10:50:39.550

6

https://github.com/Jakobovski/free-spoken-digit-dataset is a free spoken digit dataset (FSDD).

As an added bonus, it comes with a few useful Python utility functions.

I created this dataset because I had the same problem. Please contribute to increase the dataset's size.
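For anyone who wants to start from the raw files, here is a minimal sketch of reading labels from file names. It assumes the recordings are WAV files named `{digit}_{speaker}_{index}.wav` (for example `7_jackson_32.wav`) inside a single directory; check the repository README for the actual layout. The names `Recording`, `parse_recording` and `list_recordings` are made up for this example and are not the repository's own utilities.

```python
from pathlib import Path
from typing import List, NamedTuple


class Recording(NamedTuple):
    digit: int      # spoken digit, 0-9 (the label)
    speaker: str    # speaker identifier taken from the file name
    index: int      # repetition number for this speaker and digit
    path: Path      # location of the audio file


def parse_recording(path: Path) -> Recording:
    """Split an assumed '{digit}_{speaker}_{index}.wav' name into its parts."""
    digit, rest = path.stem.split("_", 1)
    # rsplit keeps speaker names that themselves contain underscores intact
    speaker, index = rest.rsplit("_", 1)
    return Recording(int(digit), speaker, int(index), path)


def list_recordings(root: Path) -> List[Recording]:
    """Return all recordings under root, ordered by digit, speaker, index."""
    return sorted(parse_recording(p) for p in root.glob("*.wav"))
```

Calling `list_recordings(Path("recordings"))` on a local clone would then give (digit, speaker, index, path) tuples that can be fed to whatever feature extraction you want to try.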

edited Jun 21 '16 at 10:50

answered Jun 21 '16 at 09:49

Jakobovski


Nicolas Raoul · Answer 2 · 2015-03-23T10:47:41.563

3

Forvo is a collection of pronunciations by human speakers. They have a huge amount of data, and I am pretty sure they have all digits for the 20 most spoken languages.

Examples:

License: BY-NC-SA 3.0

edited Mar 23 '15 at 10:47

answered Feb 20 '15 at 05:30

Nicolas Raoul


1

I was also going to suggest Forvo. Unfortunately, most numbers won't be replicated enough times to constitute a data set. – philshem Mar 23 '15 at 10:35

Pau Vilimelis Aceituno · Answer 3 · answered Jul 05 '16 at 17:35

3

How about the UCI Spoken Arabic Digit dataset? They are not exactly audio files, but it gets the job done. https://archive.ics.uci.edu/ml/datasets/Spoken+Arabic+Digit
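Since the data comes as extracted features rather than audio, a small parser is usually the first step. The sketch below assumes the layout described on the dataset page: plain text, one frame per line as space-separated coefficients, with utterances separated by blank lines. Verify this against the files you download. `parse_blocks` is a hypothetical helper name, and labels are not handled here because they depend on the block order documented with the dataset.

```python
from typing import List

Frame = List[float]          # one line of coefficients
Utterance = List[Frame]      # consecutive frames of one spoken digit


def parse_blocks(text: str) -> List[Utterance]:
    """Group non-empty lines into utterances, splitting on blank lines."""
    utterances: List[Utterance] = []
    current: Utterance = []
    for line in text.splitlines():
        values = line.split()
        if values:
            current.append([float(v) for v in values])
        elif current:
            # a blank line closes the utterance collected so far
            utterances.append(current)
            current = []
    if current:
        # the file may not end with a blank line
        utterances.append(current)
    return utterances
```

Each returned utterance is a variable-length list of frames, so sequence models or some padding/pooling step are needed before using a fixed-size classifier.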

answered Jul 05 '16 at 17:35

Pau Vilimelis Aceituno

