
Background: I'm going to try to make a list of the most used words/kanji on different message boards on 2ch.net, so that Japanese learners can quickly take part in online discussions and stay motivated to continue.

I'm looking for a way to split text into words, but it's not as simple as in English: there are no spaces between words, and a word can be a single kanji, like "人" (human), or consist of several, like "巨人" (giant).
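To illustrate why this is hard: a dictionary-based splitter has to decide whether "巨人" is one word or "巨" + "人". A minimal sketch of greedy longest-match ("maximum matching") segmentation in Python, assuming a tiny hand-made word list; `DICTIONARY` and `segment` are hypothetical names for illustration, not part of any library.

```python
# Minimal sketch of greedy longest-match ("maximum matching") segmentation.
# DICTIONARY is a hypothetical toy word list; a real tool would load a full lexicon.
DICTIONARY = {"巨人", "巨", "人", "日本", "日本語", "語", "が", "好き", "です"}


def segment(text: str, dictionary: set[str]) -> list[str]:
    """Split unspaced text into words by always taking the longest dictionary match."""
    max_len = max(len(word) for word in dictionary)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate starting at position i first, then shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                words.append(candidate)
                i += length
                break
        else:
            # No dictionary entry starts here: emit the single character as-is.
            words.append(text[i])
            i += 1
    return words


print(segment("巨人が好きです", DICTIONARY))  # ['巨人', 'が', '好き', 'です']
print(segment("人", DICTIONARY))              # ['人']
```

Greedy matching is easy to follow, but it picks the wrong split whenever the longest dictionary entry is not the intended word. Morphological analyzers such as MeCab instead compare candidate splits using costs from a large dictionary, which is why a ready-made library is the practical route.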

So I probably need a Japanese word-segmentation library. I only know Python, JavaScript and Java (I prefer Python).

Alexander
  • @lattyware This isn't as simple as splitting the characters, because a word can consist of several characters or of a single one. – Alexander Jul 27 '13 at 13:21
  • Yes, if you read the answers on the post I linked, people talk about splitting Japanese into words. – Gareth Latty Jul 27 '13 at 13:39

1 Answer


I searched for "natural language processor" for Japanese and found this:

https://jprocessing.readthedocs.org/en/latest/

and it seems to be what I was looking for.
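Whichever tokenizer ends up producing the words, building the "most used words" list itself is plain counting. A minimal sketch, assuming each post has already been turned into a list of word strings; `most_common_words`, the stopword set and the sample posts are hypothetical illustrations, not part of jprocessing.

```python
from collections import Counter
from typing import Iterable


def most_common_words(tokenized_posts: Iterable[list[str]], n: int = 10,
                      stopwords: frozenset[str] = frozenset()) -> list[tuple[str, int]]:
    """Rank words by how often they appear across already-segmented posts."""
    counts = Counter()
    for tokens in tokenized_posts:
        # Skip stopwords (e.g. particles) and whitespace-only tokens before counting.
        counts.update(t for t in tokens if t not in stopwords and t.strip())
    return counts.most_common(n)


# Hypothetical sample: each inner list is one post after word segmentation.
posts = [["巨人", "が", "強い"], ["巨人", "は", "弱い"], ["人", "が", "多い"]]
top = most_common_words(posts, n=3, stopwords=frozenset({"が", "は"}))
print(top)  # '巨人' comes first with a count of 2
```

Filtering particles such as が and は is optional; leaving them in would push them to the top of any list built from real posts.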

Alexander