I'm trying to get a list of major cities in the world: their name, population, and location. I found what looked like a good query on Wikidata, slightly tweaking a built-in query example:

SELECT DISTINCT ?cityLabel ?population ?gps WHERE {
  ?city (wdt:P31/wdt:P279*) wd:Q515.
  ?city wdt:P1082 ?population.
  ?city wdt:P625 ?gps.
  FILTER (?population >= 500000) .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY DESC(?population)

At first glance the results look good, but they are missing a lot of important cities. For example, San Francisco (population 800,000+) is not in the list, even though I specifically asked for all cities with a population of at least 500,000.

Is there something wrong with my query? If not, there must be something wrong with the data Wikidata is using. Either way, how can I get a valid data set, with an API I can query from a Python script? (I've got the script all working for this; I'm just not getting back valid data.)

1 Answer

Thanks to @Andrew for pointing out a way to check the data.

You can get San Francisco and two other cities by changing your query to:

SELECT DISTINCT ?cityLabel ?population ?gps WHERE {
  ?city (wdt:P31/wdt:P279*) ?type.
  ?city wdt:P1082 ?population.
  ?city wdt:P625 ?gps.
  FILTER (?population >= 500000) .
  FILTER(?type=wd:Q515 || ?type=wd:Q3301053)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY DESC(?population)

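The changes are: the class is bound to a ?type variable instead of being fixed to wd:Q515, and FILTER(?type=wd:Q515 || ?type=wd:Q3301053) accepts either of the two classes.

(Your query returns 251 entries; the modified one returns 254.)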
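
Since the question asks about calling this from a Python script, here is a minimal sketch of sending the modified query to the public Wikidata SPARQL endpoint using only the standard library. The helper names (build_request, parse_rows, fetch_cities) and the User-Agent string are placeholders of mine, not part of any Wikidata client library, and the parsing assumes the standard SPARQL JSON result layout.

```python
import json
import urllib.parse
import urllib.request

# Public Wikidata SPARQL endpoint.
ENDPOINT = "https://query.wikidata.org/sparql"

# The modified query from this answer, unchanged.
QUERY = """
SELECT DISTINCT ?cityLabel ?population ?gps WHERE {
  ?city (wdt:P31/wdt:P279*) ?type.
  ?city wdt:P1082 ?population.
  ?city wdt:P625 ?gps.
  FILTER (?population >= 500000) .
  FILTER(?type=wd:Q515 || ?type=wd:Q3301053)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY DESC(?population)
"""


def build_request(query, user_agent="city-list-example/0.1 (placeholder contact)"):
    """Build a GET request asking the endpoint for SPARQL JSON results."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    return urllib.request.Request(
        ENDPOINT + "?" + params,
        headers={
            "Accept": "application/sparql-results+json",
            # Placeholder value; replace with something identifying your script.
            "User-Agent": user_agent,
        },
    )


def parse_rows(payload):
    """Turn a SPARQL JSON result document into (name, population, gps) tuples."""
    rows = []
    for binding in payload["results"]["bindings"]:
        rows.append(
            (
                binding["cityLabel"]["value"],
                # Values arrive as strings; go through float to tolerate decimals.
                int(float(binding["population"]["value"])),
                binding["gps"]["value"],  # WKT literal, e.g. "Point(lon lat)"
            )
        )
    return rows


def fetch_cities(query=QUERY, timeout=60):
    """Run the query over the network and return the parsed rows."""
    with urllib.request.urlopen(build_request(query), timeout=timeout) as response:
        return parse_rows(json.load(response))
```

Calling fetch_cities() performs the actual network request; parse_rows can be tried offline on a saved JSON response.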

  • SF does have a population value (several of them, in fact) - https://m.wikidata.org/wiki/Q62 ; it's also correctly marked P31:Q515 so the search should pick it up. – Andrew is gone May 30 '16 at 22:36
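
Because an item can carry several population values, the query may return more than one row for the same city. If that matters for your script, one possible approach is to collapse the rows afterwards. This is only a sketch: keep_largest_population is a hypothetical helper, it assumes rows shaped as (name, population, gps) tuples, and choosing the largest value is an arbitrary policy rather than anything Wikidata prescribes.

```python
def keep_largest_population(rows):
    """Collapse repeated city names to the row with the highest population."""
    best = {}
    for name, population, gps in rows:
        # Keep the first row seen for a name, then replace it only by a larger one.
        if name not in best or population > best[name][1]:
            best[name] = (name, population, gps)
    # Largest cities first, matching the ORDER BY in the query.
    return sorted(best.values(), key=lambda row: row[1], reverse=True)


# Illustrative rows only; the figures are made up, not real Wikidata values.
example_rows = [
    ("City A", 800000, "Point(0 0)"),
    ("City B", 900000, "Point(1 1)"),
    ("City A", 850000, "Point(0 0)"),
]
deduplicated = keep_largest_population(example_rows)
```

Note that keying on the label would merge two different cities that share a name; selecting ?city in the query and keying on that instead would avoid this.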