
I'm trying to get a list of all articles (just the names, not the contents) that are indirectly within the category Japanese people in Wikipedia.

I've tried using a tool called Quick Intersection. The results aren't too bad, but I'm getting some apparently spurious results. For example, I get the article John Andru, who competed in an Olympic Games held in Japan, but who I don't think is Japanese.

I don't know whether this is a bug with Quick Intersection, or whether there is incorrect data in Wikipedia that is causing John to be categorized this way.

One approach I considered was working out what categories John Andru directly or indirectly belonged to. However, I don't know how to do that. I tried searching for that information, but the only thing I've found so far is this question, which wasn't answered.

How can I fix this dirty data problem?

Is Quick Intersection the most suitable tool for what I'm trying to do?

Patrick Hoefler
Andrew Grimm

1 Answer


As far as I can see, there is no link from the category Japanese people to the article John Andru (at least based on the dump from 1 October 2013), so I'm not sure why the tool told you otherwise.

But the category structure is pretty messed up. For example, John Andru is indirectly in the category 18th-century German writers, through a chain of categories like this:

Category:18th-century German writers → Category:Immanuel Kant → Category:Kantianism → Category:A priori → Category:Analysis → … → Category:Statistics → … → Category:Anatomy → … → Category:Memory → … → Category:Design → … → Category:Justice → … → Category:Anti-corruption measures → … → Category:Students → … → Category:New Left → … → Category:Free will → … → Category:Technology → … → Category:History → … → Category:Millennia → Category:2nd millennium → … → Category:1932 births → John Andru
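A chain like the one above can be found with a breadth-first search over the category graph. The following is only a minimal sketch, not the code I actually used: `find_category_chain` and the `children` mapping are hypothetical names, and the toy data below is made up for illustration (including the placeholder intermediate category and the back link that forms a cycle). With real data, I'd assume you would build `children` from the category links in a Wikipedia dump.

```python
from collections import deque


def find_category_chain(children, start, target):
    """Return the shortest chain of pages from start to target, or None.

    children maps a category title to the titles (subcategories or
    articles) that are directly inside it.
    """
    queue = deque([start])
    # Remembers how each page was first reached; also serves as the
    # visited set, which matters because the category graph has cycles.
    came_from = {start: None}
    while queue:
        page = queue.popleft()
        if page == target:
            chain = []
            while page is not None:
                chain.append(page)
                page = came_from[page]
            return chain[::-1]
        for child in children.get(page, ()):
            if child not in came_from:
                came_from[child] = page
                queue.append(child)
    return None


# Hypothetical toy data, not taken from a real dump.
children = {
    "Category:Millennia": ["Category:2nd millennium"],
    "Category:2nd millennium": ["Category:(placeholder intermediate)"],
    "Category:(placeholder intermediate)": [
        "Category:1932 births",
        "Category:Millennia",  # made-up back link, creates a cycle
    ],
    "Category:1932 births": ["John Andru"],
    "Category:Japanese people": ["Category:Japanese writers"],
}

chain = find_category_chain(children, "Category:Millennia", "John Andru")
print(" → ".join(chain) if chain else "no chain found")
```

If the search returns no chain from Category:Japanese people to the article, the spurious result probably comes from the tool or from a different dump; if it returns a long chain through unrelated topics, the category structure itself is the likely culprit.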

svick