
Transliterator::listIDs() will list IDs, but apparently it's not a complete list.

In the example from this page, the ID looks like:

Any-Latin; NFD; [:Nonspacing Mark:] Remove; NFC; [:Punctuation:] Remove; Lower();

which is kind of weird, because IDs are supposed to be unique. This looks more like a rule, but it doesn't work if I pass it to the createFromRules method :)

Anyway, I'm trying to remove all punctuation from the string except the dash (-) or other characters from a specific list.

Do you know if that's possible? Or is there some documentation that explains the transliterator syntax better?

nice ass

2 Answers


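The IDs that Transliterator::listIDs() returns are the "basic IDs". The example you gave is a "compound ID": several basic IDs (optionally with a UnicodeSet filter in front) joined by semicolons. See the ICU docs on transforms for details.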
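As a sketch of how a compound ID can answer the original question: each element can carry a UnicodeSet filter, and UnicodeSet syntax supports set difference, so `[[:Punctuation:]-[\-]] Remove` removes every punctuation character except the hyphen-minus. The function name `stripPunctuationExcept` and its `$keep` parameter are hypothetical, and the sketch assumes the intl extension is loaded. Kept characters are written as `\x{...}` escapes so they are read literally inside the set.

```php
<?php
// Hypothetical helper: remove punctuation except the characters listed in $keep.
// Assumes the intl extension (Transliterator, IntlChar) is available.
function stripPunctuationExcept(string $text, string $keep = '-'): string
{
    if ($keep === '') {
        $filter = '[:Punctuation:]';
    } else {
        // Turn each kept character into a \x{HEX} escape, e.g. "-" becomes \x{2D}.
        $escaped = '';
        foreach (preg_split('//u', $keep, -1, PREG_SPLIT_NO_EMPTY) as $ch) {
            $escaped .= sprintf('\\x{%X}', IntlChar::ord($ch));
        }
        // Set difference: all punctuation minus the kept characters.
        $filter = '[[:Punctuation:]-[' . $escaped . ']]';
    }

    // Same compound ID as in the question, with the punctuation filter swapped in.
    $id = "Any-Latin; NFD; [:Nonspacing Mark:] Remove; NFC; $filter Remove; Lower()";
    $transliterator = Transliterator::create($id);
    if ($transliterator === null) {
        throw new RuntimeException(intl_get_error_message());
    }
    return $transliterator->transliterate($text);
}

echo stripPunctuationExcept('Hello, World - Éa!'), "\n";
```

The same filter can be written by hand, e.g. `[[:Punctuation:]-[\-_]] Remove` to keep both the dash and the underscore.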

You can also create your own rules with Transliterator::createFromRules().
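This is also why passing the compound ID straight to `createFromRules()` fails: that method expects rule syntax, where an existing transliterator is run as one step by prefixing its ID with `::`. A minimal sketch, assuming the intl extension is loaded, with the question's ID rewritten in that form:

```php
<?php
// In rule syntax, "::" followed by an ID runs that transliterator as one step;
// this is the question's compound ID rewritten in that form.
$rules = ':: Any-Latin; :: NFD; :: [:Nonspacing Mark:] Remove; :: NFC; '
       . ':: [:Punctuation:] Remove; :: Lower();';

$transliterator = Transliterator::createFromRules($rules, Transliterator::FORWARD);
if ($transliterator === null) {
    // createFromRules() returns null when the rules fail to parse.
    throw new RuntimeException(intl_get_error_message());
}

$result = $transliterator->transliterate('Héllo, Wörld!');
echo $result, "\n";
```

Once you are in rule syntax, ordinary rules can be mixed in between the `::` steps as well.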

You can take a look at the predefined rules:

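<?php
// Open the ICU transliteration data bundle (e.g. "icudt72l-translit" for ICU 72).
$a = new ResourceBundle(NULL, sprintf('icudt%dl-translit', INTL_ICU_VERSION), true);

// Each entry maps a transliterator ID to a rule file or an internal implementation.
foreach ($a['RuleBasedTransliteratorIDs'] as $name => $v) {
    $file = @$v['file'];   // @ silences the warning when there is no 'file' entry
    if (!$file) {
        $file = $v['internal'];
        echo $name, " (direction: $file[direction]; internal)\n";
    } else {
        echo $name, " (direction: $file[direction])\n";
        echo $file['resource'];   // the rule source text
    }
    echo "\n--------------\n";
}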

After formatting, the output lists every rule-based transliterator ID with its direction, followed by the rule source for the file-based ones.

chx
Artefacto
    Friendly reminder: that's a pretty large .txt file for a machine low on memory; Chrome and Sublime Text may stop responding while handling it. – bitinn Dec 16 '13 at 07:25

Just in case someone wants a working example: the example mentioned (from the PHP manual) uses procedural style. To make it work in object-oriented style, use create() instead of createFromRules(), because the string is a compound ID, not a rule set.

<?php
function removePunctuation($string) {
    // The compound ID goes to create(); createFromRules() expects rule syntax instead.
    $transliterator = Transliterator::create("Any-Latin; NFD; [:Nonspacing Mark:] Remove; NFC; [:Punctuation:] Remove;", \Transliterator::FORWARD);

    return $transliterator->transliterate($string);
}
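A hypothetical variant of removePunctuation() that builds the Transliterator once in a static variable and reuses it, rather than parsing the compound ID on every call. It also checks the failure values: create() returns null for an invalid ID and transliterate() returns false on error. The name removePunctuationCached is made up for this sketch, and it assumes the intl extension is loaded.

```php
<?php
// Hypothetical variant: create the transliterator once and reuse it across calls.
function removePunctuationCached(string $string): string
{
    static $transliterator = null;
    if ($transliterator === null) {
        $transliterator = Transliterator::create(
            'Any-Latin; NFD; [:Nonspacing Mark:] Remove; NFC; [:Punctuation:] Remove',
            Transliterator::FORWARD
        );
        if ($transliterator === null) {
            throw new RuntimeException(intl_get_error_message());
        }
    }

    $result = $transliterator->transliterate($string);
    if ($result === false) {
        throw new RuntimeException($transliterator->getErrorMessage());
    }
    return $result;
}

echo removePunctuationCached('Héllo, Wörld!'), "\n";
```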
Simon