
I'm trying to load a .json file produced by an application so that I can feed it into different machine learning algorithms and classify the text. The problem is that I can't figure out why NLTK won't load my .json file; it doesn't work even with NLTK's own .json file. From what I gather from the book, I should only need to import 'nltk' and call the 'load' function from 'nltk.data'. Can somebody help me work out what I am doing wrong?

Below is the code I used to try loading the file with NLTK.

import nltk
nltk.data.load('corpora/twitter_samples/negative_tweets.json')

After running that, I got the following error:

C:\Python34\python.exe "C:/Users/JarvinLi/PycharmProjects/ThesisTrial1/Trial Loading.py"
Traceback (most recent call last):
  File "C:/Users/JarvinLi/PycharmProjects/ThesisTrial1/Trial Loading.py", line 7, in <module>
    nltk.data.load('corpora/twitter_samples/negative_tweets.json')
  File "C:\Python34\lib\site-packages\nltk\data.py", line 810, in load
    resource_val = json.load(opened_resource)
  File "C:\Python34\lib\json\__init__.py", line 268, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "C:\Python34\lib\json\__init__.py", line 312, in loads
    s.__class__.__name__))
TypeError: the JSON object must be str, not 'bytes'

Process finished with exit code 1

EDIT #1 : I'm using Python 3.4.1 and NLTK 3.

EDIT #2 : Below is another attempt, this time using json.load():

  import json
  json.load('corpora/twitter_samples/negative_tweets.json')

But that attempt also failed, this time with a different error:

C:\Python34\python.exe "C:/Users/JarvinLi/PycharmProjects/ThesisTrial1/Trial Loading.py"
Traceback (most recent call last):
  File "C:/Users/JarvinLi/PycharmProjects/ThesisTrial1/Trial Loading.py", line 5, in <module>
    json.load('corpora/twitter_samples/quotefileNeg.json')
  File "C:\Python34\lib\json\__init__.py", line 265, in load
    return loads(fp.read(),
AttributeError: 'str' object has no attribute 'read'

Process finished with exit code 1
Jarvin Li
  • Looks like a Python 3 vs Python 2 issue. Are you using an older version of NLTK? – tripleee Jul 04 '16 at 08:58
  • I'm using Python 3.4.1, and NLTK 3. @tripleee – Jarvin Li Jul 04 '16 at 10:55
  • Seems to be a weird issue. Can you use double-quotes and additionally escape the `/` and check? – Ic3fr0g Jul 04 '16 at 12:19
  • Probably a bug in NLTK then. http://stackoverflow.com/questions/6862770/python-3-let-json-object-accept-bytes-or-let-urlopen-output-strings discusses the underlying problem. – tripleee Jul 04 '16 at 12:27
  • @MayurH Even Windows accepts forward slashes as directory separators completely transparently. Escaping slashes makes no sense because they have no special meaning (unlike backslash). – tripleee Jul 04 '16 at 12:29
  • @MayurH I tried what you suggested and it did not work. I tried different combinations of it, but none worked. – Jarvin Li Jul 05 '16 at 01:09
  • @tripleee Thanks for the link, I'll try reading it and hopefully understand it to fix my problems. Thanks :) – Jarvin Li Jul 05 '16 at 01:09
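
To make the two errors above concrete, here is a minimal sketch using only the standard library. The first traceback shows json.loads being handed bytes, which Python 3.4 rejects (json.loads only began accepting bytes in Python 3.6), so the bytes need decoding to str first. The second shows json.load being given a path string, when it expects an open file object. The helper name load_json_lines and the sample data are hypothetical, and the sketch assumes the file holds one JSON object per line, which is how the twitter_samples files appear to be laid out; it is a workaround idea, not NLTK's own method.

```python
import json
import os
import tempfile


def load_json_lines(path, encoding="utf-8"):
    """Hypothetical helper: parse a file holding one JSON object per line."""
    # Read raw bytes, mimicking what the first traceback suggests
    # nltk.data.load hands to the json module.
    with open(path, "rb") as handle:
        raw = handle.read()
    # Decode to str before parsing; json.loads on Python 3.4 rejects bytes.
    text = raw.decode(encoding)
    records = []
    for line in text.split("\n"):
        if line.strip():  # skip blank lines, e.g. after a trailing newline
            records.append(json.loads(line))
    return records


# Stand-in data (an assumption) so the sketch runs without the NLTK corpus.
sample = '{"id": 1, "text": "not great :("}\n{"id": 2, "text": "so tired :("}\n'
with tempfile.NamedTemporaryFile(
    "w", suffix=".json", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(sample)
    sample_path = tmp.name

tweets = load_json_lines(sample_path)
os.remove(sample_path)
print([tweet["text"] for tweet in tweets])
```

For the bundled corpus specifically, NLTK 3 also ships a corpus reader, so something like nltk.corpus.twitter_samples.strings('negative_tweets.json') may be the simpler route, assuming the twitter_samples data has been downloaded.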
