
I found two public databases that describe the composition of Chinese characters.

1 - CJK Decomposition Data

2 - Chinese Characters Decomposition on Wiki Commons

I can see how the formats they use are different, but I don't think they are incompatible. Are these two data sources related or completely independent? Is it known which one is of higher quality?

I'm asking because I'm working on a tool that makes it easier to look up characters. It lets you search by a radical that appears anywhere in the character, rather than only by the primary radical used in dictionaries.
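To make the lookup idea concrete, here is a minimal sketch of searching by a component at any depth. The table below is a tiny hand-written illustration, not the format or contents of either database, and the names `DECOMPOSITION`, `all_components`, `build_index`, and `search` are hypothetical.

```python
from collections import defaultdict

# Toy data written by hand for illustration only.
# Each character maps to its immediate components.
DECOMPOSITION = {
    "好": ["女", "子"],
    "妈": ["女", "马"],
    "吗": ["口", "马"],
    "想": ["相", "心"],
    "相": ["木", "目"],
}


def all_components(char, table):
    """Return every component of `char` at any nesting depth."""
    found = set()
    stack = list(table.get(char, []))
    while stack:
        part = stack.pop()
        if part not in found:
            found.add(part)
            # Descend into the component if it decomposes further.
            stack.extend(table.get(part, []))
    return found


def build_index(table):
    """Build an inverted index: component -> characters containing it."""
    index = defaultdict(set)
    for char in table:
        for part in all_components(char, table):
            index[part].add(char)
    return index


INDEX = build_index(DECOMPOSITION)


def search(*components):
    """Return the characters that contain all of the given components."""
    sets = [INDEX.get(c, set()) for c in components]
    return set.intersection(*sets) if sets else set()
```

With this toy table, `search("木")` finds 想 as well as 相, because 木 sits inside 相, which in turn sits inside 想. A real tool would load one of the two data sets into the same kind of mapping, which is where differences in their quality and coverage would start to matter.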

Sjors Provoost
  • 1. They are related; there are some rules for character composition. 2. CJK covers Chinese, Japanese, and Korean, so it would be a larger set. 3. Quality depends on your criteria. Mainland China, Hong Kong, and Taiwan have different standards, so this is really a question for professionals, and a difficult one. Maybe you can collect data from this site, where many experts help refine the sources. I think it is currently the best online Chinese dictionary. – Stan Aug 04 '13 at 18:09
  • Sjors, what are you using the data for? When you don't say explicitly, people on this site tend to assume you are using whatever you ask for to study Chinese, which I doubt is the case here. – Stumpy Joe Pete Aug 08 '13 at 16:41
  • @StumpyJoePete I updated my question to explain. – Sjors Provoost Aug 10 '13 at 15:29
  • You should check out this question and its answers. I can't comment on the difference in quality, but there are several sources of such data, and a tool already exists that works like you want. – Stumpy Joe Pete Aug 10 '13 at 21:50
  • @StumpyJoePete thanks. I assume you're referring to Tatoeba? That is indeed similar to what I'm trying to do and they are using the Wikimedia data. Of course I stubbornly believe I can do even better :-) – Sjors Provoost Aug 11 '13 at 08:26
  • Best of luck in your quest :) – Stumpy Joe Pete Aug 11 '13 at 20:07