Improved romanisation of Katakana

Question

I'm in the planning stages of programming my own Katakana => Romaji converter. However, I've noticed that every other converter already out there just converts literally. I want to try to build a more "intelligent" converter. (Note that the result will only be a default; the user will be able to tweak the transliteration if they so choose.)

As an example, have this name of a character from one of my stories: ネックスス

Converters will give the result: Nekkususu... but nobody who isn't familiar with Romaji would read that as Nexus...

So I'm trying to figure out some rules for better transliteration. So far, I have the following:

1. The substring kkus can be replaced with x.
2. A u at the end is most likely silent and can be dropped (these two rules already "fix" Nexus).
3. Tu is better as Tsu, and Ti as Chi.
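
For concreteness, the three rules above could be sketched as a post-processing pass over a literal conversion. This is only an illustration: the tiny `KANA` table, the function names `literal_romaji` and `apply_heuristics`, and the order in which the rules are applied are all assumptions, not part of any existing converter. The table deliberately uses Kunrei-style readings (tu, ti) so that the third rule has something to act on.

```python
# Tiny illustrative table with Kunrei-style readings (hypothetical subset;
# a real converter would need the full syllabary and small-kana handling).
KANA = {
    "ネ": "ne", "ク": "ku", "ス": "su", "サ": "sa",
    "ツ": "tu", "チ": "ti", "ナ": "na", "ミ": "mi",
}

def literal_romaji(text):
    """Convert katakana to romaji one character at a time."""
    out = []
    double_next = False
    for ch in text:
        if ch == "ッ":  # small tsu doubles the following consonant
            double_next = True
            continue
        syllable = KANA[ch]  # raises KeyError for kana outside the table
        if double_next:
            syllable = syllable[0] + syllable
            double_next = False
        out.append(syllable)
    return "".join(out)

def apply_heuristics(romaji):
    """Apply the three proposed rules (order chosen for this sketch)."""
    romaji = romaji.replace("tu", "tsu").replace("ti", "chi")  # rule 3
    romaji = romaji.replace("kkus", "x")                       # rule 1
    if romaji.endswith("u"):                                   # rule 2
        romaji = romaji[:-1]
    return romaji

print(apply_heuristics(literal_romaji("ネックスス")))  # nexus
print(apply_heuristics(literal_romaji("ネクサス")))    # nekusas
```

The first call reproduces the Nexus example. The second shows the kind of exception the question asks about: for the spelling ネクサス, dropping the final u gives "nekusas", and rule 1 never fires because there is no doubled k.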

However, that's where my knowledge ends. What I'd like to know is, for starters, are the above three rules correct? Are there exceptions I should be aware of? Are there other rules that could make my converter more accurate?

As mentioned before, perfection isn't required because the user may adjust the result, but I would feel better about the feature if it at least made an effort.

Welcome to JL! Are you aiming for a Katakana-SourceLanguage converter? Because I don't think "Nexus" is romaji nor a faithful transliteration of ネックスス. — Helix Quar, May 13 '14 at 13:55
@helix It isn't? What would ネックスス be, then? Or, more importantly (to me), how would you write Nexus in Katakana? — Niet the Dark Absol, May 13 '14 at 13:56
@Earthliŋ Ah, I see. Yes... The one I gave would be more like "Nexous", wouldn't it? Sorry, still getting used to Katakana, most of my knowledge comes from reading Pokémon names! :D — Niet the Dark Absol, May 13 '14 at 14:00
@NiettheDarkAbsol Are you familiar with Hepburn romanization and other transliteration systems? — Helix Quar, May 13 '14 at 14:11
@helix Hmm, interesting. I should've figured there'd be official standards on this XD — Niet the Dark Absol, May 13 '14 at 14:17
I think we normally say/write Nexus as ネクサス... http://ja.wikipedia.org/wiki/%E3%83%8D%E3%82%AF%E3%82%B5%E3%82%B9 (Don't ask me why) — , May 13 '14 at 15:33

score 7 · Accepted Answer · answered May 13 '14 at 14:12

I think there are no consistent rules for transcribing foreign words into katakana, so the task of reversing the process is even harder.

The most obvious hurdle will be deciding whether ラリルレロ should be La Li Lu Le Lo or Ra Ri Ru Re Ro (or something completely different), e.g. レディー is either lady, or ready.

Moreover, there are many source languages, like English, Portuguese, French, German. For example, another — not so serious, but obvious — question is whether カクコ should be Ka Ku Ko, or Ca Cu Co, cf. カルテ Ger. Karte and かるた Pt. carta (although かるた is now written Karuta in English).
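
To give a rough sense of how quickly these ambiguities multiply, here is a minimal sketch that enumerates every spelling obtained by optionally swapping r for l and, before a, u or o, k for c. The function name `spelling_candidates` is made up for this illustration, and the two swaps are only the ones mentioned above; real source-language spellings (lady, Karte, carta) differ in other ways too, so this shows the size of the search space rather than a way to recover the original word.

```python
from itertools import product

def spelling_candidates(romaji):
    """Enumerate spellings where each r may be l and each k before a/u/o may be c."""
    options = []
    for i, ch in enumerate(romaji):
        nxt = romaji[i + 1] if i + 1 < len(romaji) else ""
        if ch == "r":
            options.append(("r", "l"))
        elif ch == "k" and nxt in {"a", "u", "o"}:
            options.append(("k", "c"))
        else:
            options.append((ch,))
    # Cartesian product over the per-character choices
    return ["".join(combo) for combo in product(*options)]

print(spelling_candidates("redii"))   # ['redii', 'ledii']
print(spelling_candidates("karute"))  # ['karute', 'kalute', 'carute', 'calute']
```

Each ambiguous consonant doubles the number of candidates, so a word with five r-row kana already has 32 possible spellings before any other rule is considered.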

So, I think a better way to frame your project is not as making a converter but as building a database, which you might do by building a simple converter and having users "teach" it the best guess. (Google Translate has users teach it translations, and Detexify works in a similar way.)

A 外来語 dictionary file would be a good place to start.
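
A minimal sketch of that database-first idea, assuming a plain in-memory dictionary stands in for the 外来語 file (whose actual format is not specified here). The class name `LoanwordConverter`, its methods, and the `SEED` entries are hypothetical; the two seed spellings are taken from examples in this thread.

```python
# Seed entries standing in for a loanword dictionary (illustrative only).
SEED = {
    "ネクサス": "Nexus",
    "カルテ": "Karte",
}

class LoanwordConverter:
    """Look words up in a table first; fall back to a naive converter."""

    def __init__(self, fallback, known=None):
        self.fallback = fallback        # any callable: katakana -> romaji
        self.known = dict(known or {})  # copy so the seed is not mutated

    def convert(self, katakana):
        if katakana in self.known:
            return self.known[katakana]
        return self.fallback(katakana)

    def teach(self, katakana, spelling):
        """Record a user-supplied spelling for future lookups."""
        self.known[katakana] = spelling

# Placeholder fallback; a literal kana-by-kana converter would go here.
converter = LoanwordConverter(fallback=lambda word: "?", known=SEED)
print(converter.convert("ネクサス"))  # Nexus
converter.teach("レディー", "lady")
print(converter.convert("レディー"))  # lady
```

Keeping the fallback as a plain callable means the literal converter can be swapped or improved without touching the lookup table, and user corrections accumulate independently of it.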

answered May 13 '14 at 14:12

Earthliŋ

Should've considered such ambiguity... especially with the one in Kingdom Hearts with "Reverse/Rebirth" both being リバス (I think!) Anyway, thanks for the answer. Maybe this is too ambitious... It might just be better to stick to a literal, naive transliteration and let the user adjust it... – Niet the Dark Absol May 13 '14 at 14:19
What you just went through is most likely what every single author of every katakana to romaji converter that's out there went through. "Wouldn't it be great...? Uh, never mind..." – Earthliŋ May 13 '14 at 14:23
Haha! You have no idea how often I go through that in my projects! – Niet the Dark Absol May 13 '14 at 14:25
I think we normally say/write "Reverse/Rebirth" as リバース... http://ja.wikipedia.org/wiki/%E3%83%AA%E3%83%90%E3%83%BC%E3%82%B9 (Don't ask me why) – May 13 '14 at 15:36
