54

I have some difficulty in importing a JSON file with pandas.

import pandas as pd
map_index_to_word = pd.read_json('people_wiki_map_index_to_word.json')

This is the error that I get:

ValueError: If using all scalar values, you must pass an index

The file structure is simplified like this:

{"biennials": 522004, "lb915": 116290, "shatzky": 127647, "woode": 174106, "damfunk": 133206, "nualart": 153444, "hatefillot": 164111, "missionborn": 261765, "yeardescribed": 161075, "theoryhe": 521685}

It is from the machine learning course of University of Washington on Coursera. You can find the file here.

smci
Marco Fumagalli
  • 2
    This is much more a pandas question than it is a JSON question -- you wouldn't have this specific error in any context that *didn't* involve pandas, but you **can** get this specific error without JSON being involved. – Charles Duffy Jul 14 '16 at 17:44
  • 1
    See for instance, http://stackoverflow.com/questions/17839973/construct-pandas-dataframe-from-values-in-variables -- a question with the same error, but no JSON involved. – Charles Duffy Jul 14 '16 at 17:46
  • 1
    look like you are taking the ML courses from Emily :) – Bryan Fok Mar 09 '17 at 05:40
  • 1
    It is expecting a list, so wrapping the dict in one will work: `pd.DataFrame([{"biennials": 522004, "lb915": 116290}])`. – Aung Jun 30 '17 at 08:06
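As the comments point out, the error can be reproduced without any JSON file. The following is a minimal sketch, assuming a two-key dict taken from the sample data in the question; the variable names (`error_message`, `one_row`, `with_index`) are illustrative and not part of the original post.

```python
import pandas as pd

# Two entries from the sample data in the question.
data = {"biennials": 522004, "lb915": 116290}

# A dict whose values are all scalars gives pandas no way to infer
# how many rows the frame should have, so the constructor raises.
try:
    pd.DataFrame(data)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Two ways to tell pandas there is exactly one row:
one_row = pd.DataFrame([data])            # wrap the dict in a list
with_index = pd.DataFrame(data, index=[0])  # or pass an explicit index
```

Both `one_row` and `with_index` end up with one row and one column per key.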

6 Answers

72

Try

ser = pd.read_json('people_wiki_map_index_to_word.json', typ='series')

That file contains only key-value pairs whose values are scalars, so it maps naturally onto a Series rather than a DataFrame. You can convert the Series to a one-column dataframe with ser.to_frame('count').

You can also do something like this:

import json
with open('people_wiki_map_index_to_word.json', 'r') as f:
    data = json.load(f)

Now data is a dictionary. You can pass it to a dataframe constructor like this:

df = pd.DataFrame({'count': data})
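Both approaches can be tried without the course file. This is a small sketch, assuming an in-memory JSON string (`raw`, a three-key excerpt of the sample in the question) wrapped in `io.StringIO` in place of the real file path; the names `df_from_series` and `df_from_dict` are illustrative.

```python
import io
import json

import pandas as pd

# Stand-in for the file contents: three entries from the question's sample.
raw = '{"biennials": 522004, "lb915": 116290, "shatzky": 127647}'

# Approach 1: let pandas parse the JSON object as a Series (keys -> index).
ser = pd.read_json(io.StringIO(raw), typ="series")
df_from_series = ser.to_frame("count")

# Approach 2: parse with the json module, then nest the dict under a
# column name so the inner keys become the index.
data = json.loads(raw)
df_from_dict = pd.DataFrame({"count": data})
```

Either way the result has one row per word and a single `count` column.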
ayhan
19

You can do as @ayhan mentions, which will give you a column-based format (one row per key):

Method 1

Or you can enclose the object in [ ], as shown below, to get a row format (a single row with one column per key). This is convenient if you are loading multiple records and planning to use a matrix for your machine learning models.

df = pd.DataFrame([data])

Method 2
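To make the difference between the two layouts concrete, here is a short sketch. It assumes a three-key excerpt of the sample data, and the second record in `records` is invented purely for illustration.

```python
import pandas as pd

data = {"biennials": 522004, "lb915": 116290, "shatzky": 127647}

# Column-based format: the keys become the index, values go in one column.
column_format = pd.DataFrame({"count": data})

# Row format: the keys become the columns, values go in one row.
row_format = pd.DataFrame([data])

# With several records (the second one is hypothetical), each dict becomes
# a row, and the frame converts directly to a 2-D array.
records = [data, {"biennials": 1, "lb915": 2, "shatzky": 3}]
matrix = pd.DataFrame(records).to_numpy()
```

Here `column_format` has shape (3, 1), `row_format` has shape (1, 3), and `matrix` has shape (2, 3).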

Adonis H.
6

I think what is happening is that the data in

map_index_to_word = pd.read_json('people_wiki_map_index_to_word.json')

is being read as a string instead of a json

{"biennials": 522004, "lb915": 116290, "shatzky": 127647, "woode": 174106, "damfunk": 133206, "nualart": 153444, "hatefillot": 164111, "missionborn": 261765, "yeardescribed": 161075, "theoryhe": 521685}

is actually

'{"biennials": 522004, "lb915": 116290, "shatzky": 127647, "woode": 174106, "damfunk": 133206, "nualart": 153444, "hatefillot": 164111, "missionborn": 261765, "yeardescribed": 161075, "theoryhe": 521685}'

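Since a string is a scalar, pandas cannot build a dataframe from it directly. You have to convert it to a dict first, which is exactly what the other response is doing.

One way is to call json.loads on the string to convert it to a dict, and then load that into pandas:

import json
import pandas as pd
with open('people_wiki_map_index_to_word.json') as f:
    myfile = f.read()
jsonData = json.loads(myfile)
df = pd.DataFrame([jsonData])  # wrap in a list; a bare dict of scalars raises the same ValueError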
Anant Gupta
2
{
"biennials": 522004,
"lb915": 116290
}

df = pd.read_json('values.json')

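By default pd.read_json builds a DataFrame, which expects a list-like value for each key, for example

{
"biennials": [522004],
"lb915": [116290]
}

When every value is a scalar, as in the first file, it raises an error saying

If using all scalar values, you must pass an index.

You can resolve this by passing typ='series' to pd.read_json, which returns a Series instead of a DataFrame ('frame' and 'series' are the documented values for typ):

map_index_to_word = pd.read_json('Datasets/people_wiki_map_index_to_word.json', typ='series')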
0

For example, suppose cat values.json prints

{
"name": "Snow",
"age": "31"
}

df = pd.read_json('values.json')

Chances are you will end up with this error: If using all scalar values, you must pass an index

Pandas expects a list or dictionary as each value. Something like this would load without the error:

{
"name": ["Snow"],
"age": ["31"]
}

So try the following, which reads the file as a Series and wraps it in a list to get a one-row dataframe. You can then convert it to HTML with to_html():

df = pd.DataFrame([pd.read_json(report_file, typ='series')])
result = df.to_html()
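A self-contained version of the snippet above might look like the sketch below. It assumes the JSON text is held in a string (`raw`) and read through `io.StringIO` rather than from the `report_file` path used in the answer.

```python
import io

import pandas as pd

# Stand-in for the contents of values.json (with properly quoted keys).
raw = '{"name": "Snow", "age": "31"}'

# Read the object as a Series, then wrap it in a list so that the
# Series becomes a single row with one column per key.
series = pd.read_json(io.StringIO(raw), typ="series")
df = pd.DataFrame([series])

# Render the one-row frame as an HTML table string.
result = df.to_html()
```

The resulting `result` string holds a `<table>` element with `name` and `age` as column headers.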
Sonal
0

I solved this by wrapping the JSON object in an array, like so:

[{"biennials": 522004, "lb915": 116290, "shatzky": 127647, "woode": 174106, "damfunk": 133206, "nualart": 153444, "hatefillot": 164111, "missionborn": 261765, "yeardescribed": 161075, "theoryhe": 521685}]
Laksh Matai