20

I am experimenting with the NLTK package in Python. I tried to download the NLTK data using nltk.download() and got the following error message. How can I solve this problem? Thanks.

The system I used is Ubuntu installed under VMware. The IDE is Spyder.

[screenshot of the error message]

With nltk.download('all'), some packages download successfully, but I get an error message when it reaches oanc_masc:

[screenshot of the error while downloading oanc_masc]

Zulu
user288609

4 Answers

22

To download a particular dataset or model, use the nltk.download() function. For example, to download the punkt sentence tokenizer, use:

$ python3
>>> import nltk
>>> nltk.download('punkt')

If you're unsure of which data/model you need, you can start out with the basic list of data + models with:

>>> import nltk
>>> nltk.download('popular')

It will download a list of "popular" resources.

Make sure that you have the latest version of NLTK, because it is constantly being improved and maintained:

$ pip install --upgrade nltk

EDITED

In case anyone wants to avoid errors when downloading the larger datasets from nltk, here is a workaround from https://stackoverflow.com/a/38135306/610569

$ rm /Users/<your_username>/nltk_data/corpora/panlex_lite.zip
$ rm -r /Users/<your_username>/nltk_data/corpora/panlex_lite
$ python

>>> import nltk
>>> dler = nltk.downloader.Downloader()
>>> dler._update_index()
>>> dler._status_cache['panlex_lite'] = 'installed' # Trick the index to treat panlex_lite as it's already installed.
>>> dler.download('popular')

To find the nltk_data directory, see https://stackoverflow.com/a/36383314/610569

To configure the nltk_data path, see https://stackoverflow.com/a/22987374/610569
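As a rough illustration of configuring the data path, the sketch below creates a directory and adds it to a search path list. The helper names `register_data_dir` and `download_into` are hypothetical (they are not part of NLTK); the only NLTK features assumed are the `nltk.data.path` list and the `download_dir` argument of `nltk.download()`.

```python
import os


def register_data_dir(data_dir, search_path):
    """Create data_dir if needed and make sure it is on search_path.

    search_path is any list of directories; with NLTK it would be
    nltk.data.path (hypothetical helper, not an NLTK function).
    """
    data_dir = os.path.abspath(os.path.expanduser(data_dir))
    os.makedirs(data_dir, exist_ok=True)
    if data_dir not in search_path:
        search_path.append(data_dir)
    return data_dir


def download_into(package_id, data_dir):
    """Download one package into data_dir and let NLTK search that directory."""
    import nltk  # imported lazily so the helper above works without NLTK

    target = register_data_dir(data_dir, nltk.data.path)
    return nltk.download(package_id, download_dir=target)
```

For example, `download_into('punkt', '~/my_nltk_data')` would store punkt under that (illustrative) directory. Setting the NLTK_DATA environment variable before starting Python is another way to point NLTK at a custom location.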

alvas
  • 1
    Thanks for the reply. I tried nltk.download('all'); it downloaded some packages successfully, but it got stuck when downloading something related to oanc_masc. I included the related screenshot in the original post. – user288609 Dec 26 '14 at 18:48
  • 1
    What is your nltk version? What is the output of this on your terminal: `python -c "import nltk; print(nltk.__version__)"`? – alvas Dec 26 '14 at 18:50
  • Hi there @alvas I'm having similar issues using nltk.download('all') on Ubuntu, except I get HTTP Error 404: Not Found in both IDLE and command line. My NLTK version is 2.0b9. Do you have any idea what might be going on? – Joansy Dec 05 '15 at 23:36
  • @Joansy, Please update your NLTK. `sudo pip install nltk` or `sudo apt-get install python-nltk`. Once it's updated the problem should resolve itself. Otherwise, you would have to set the url manually. Try updating NLTK first, if it doesn't work, then come back again =) – alvas Dec 06 '15 at 00:20
8

From the command line, after importing nltk, try:

nltk.download('popular', halt_on_error=False)

After an error, it will ask whether to retry the broken package. Decline with n and it will continue with the remaining packages.
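If answering the retry prompt by hand is inconvenient, one possible alternative is to download packages one at a time and record which ones fail. This is only a sketch: `download_each` and `nltk_download_quietly` are hypothetical helper names, and the only NLTK behaviour assumed is that `nltk.download(package_id, quiet=True)` returns True on success and False on failure.

```python
def download_each(package_ids, download_fn):
    """Try every package independently and return the ids that failed.

    download_fn takes a package id and returns True on success.
    """
    failed = []
    for package_id in package_ids:
        try:
            ok = download_fn(package_id)
        except Exception as exc:  # e.g. a network or decoding error
            print(f"{package_id}: {exc}")
            ok = False
        if not ok:
            failed.append(package_id)
    return failed


def nltk_download_quietly(package_id):
    """Download a single NLTK package without progress output."""
    import nltk  # imported lazily so download_each can be used without NLTK

    return nltk.download(package_id, quiet=True)
```

A call such as `download_each(['punkt', 'stopwords', 'wordnet'], nltk_download_quietly)` would then return the list of packages that could not be downloaded, which can be retried later.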

alvas
tolgayilmaz
  • I had several `UnicodeDecodeError`, and I had to launch this command several times in order to download everything, but it worked in the end. Thanks ! – CoMartel May 31 '17 at 08:30
1

a) On macOS, either run:

sudo /Applications/Python\ 3.6/Install\ Certificates.command

or b) switch to an admin user (one that you have set up with administrator privileges) and type at the command line:

/Applications/Python\ 3.6/Install\ Certificates.command

Notes:

  • The backslashes ("\") are necessary because they escape the spaces in the file names.
  • This procedure works if you have Python 3.6 installed; otherwise, change the path to match your installed Python version. To find it, run:

ls /Applications

and look at the name of the Python directory listed there.
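The Install Certificates.command script installs a CA certificate bundle for that Python installation, so it is relevant when the download fails with an SSL certificate verification error (an assumption here, since the answer does not quote the error). The sketch below uses the standard library's `ssl.get_default_verify_paths()` to show where the interpreter looks for CA certificates; the function name `describe_ca_locations` is hypothetical.

```python
import os
import ssl


def describe_ca_locations():
    """Report where this Python looks for CA certificates by default."""
    paths = ssl.get_default_verify_paths()
    cafile = paths.openssl_cafile
    return {
        "cafile": paths.cafile,            # resolved CA file, or None if absent
        "capath": paths.capath,            # resolved CA directory, or None
        "openssl_cafile": cafile,          # location OpenSSL was built to use
        "openssl_cafile_exists": bool(cafile) and os.path.exists(cafile),
    }


if __name__ == "__main__":
    for key, value in describe_ca_locations().items():
        print(f"{key}: {value}")
```

If `cafile` and `capath` are both None, the interpreter has no default CA bundle at those locations, which is consistent with the situation the certificate script is meant to fix.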

UpAndAdam
Alexandre
-2

I had this error:

Resource punkt not found.
Please use the NLTK Downloader to obtain the resource:
import nltk
nltk.download('punkt')

When I tried to solve it by writing:

import nltk

nltk.download()

my computer shut down suddenly and Anaconda closed as well. When I tried to open it again, it always showed an error.

I solved the problem by writing:

import nltk

nltk.download('punkt')
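To avoid opening the interactive downloader at all, a script can check for the resource first and download only what is missing. The following is a minimal sketch: `ensure_resource` is a hypothetical helper, the resource path 'tokenizers/punkt' is assumed to be where punkt is installed, and it relies on `nltk.data.find()` raising LookupError for a missing resource.

```python
def ensure_resource(resource_path, package_id, find_fn, download_fn):
    """Download package_id only if resource_path cannot be found.

    find_fn should raise LookupError when the resource is missing
    (as nltk.data.find does); download_fn should return True on success.
    """
    try:
        find_fn(resource_path)
        return "present"
    except LookupError:
        return "downloaded" if download_fn(package_id) else "failed"


def ensure_punkt():
    """Make sure the punkt tokenizer data is available."""
    import nltk  # imported lazily so ensure_resource works without NLTK

    return ensure_resource("tokenizers/punkt", "punkt",
                           nltk.data.find, nltk.download)
```

Calling `ensure_punkt()` at the start of a script would then fetch punkt only on the first run.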
double-beep
  • This probably won't help. The asker's problem was being unable to run nltk.download('all'), or more likely only nltk.download('oanc_masc'). – Jzou Jan 31 '20 at 22:01