I have to parse text in which non-ASCII characters are encoded in the notation \X--\, where -- is the character's Unicode code point in hexadecimal. For example:
vis\XED\vel numa das imagens pr\XE9\vias \XE0\ administra\XE7\\XE3\o
This should be converted to:
visível numa das imagens prévias à administração
I could do this the Neanderthal way: look for "\X", confirm there's a "\" two characters later, replace the whole thing with the corresponding character, and rinse and repeat until no more matches are found. However, there's surely a better way to do this.
Then I tried regular expressions, which I don't understand nearly well enough. On RegExr I ended up with the regular expression '/\X\w{2}\/', which matched what I needed. But when I tried using it with preg_replace_callback(), specifically with the string "/\\X\w{2}\\/" as the regex, I got an "Illegal / unsupported escape sequence" error. I also tried a few other regexes I found online, both on this site and elsewhere, to no avail.
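A likely cause is double escaping: PHP processes the string literal first, and only the result reaches PCRE. With "/\\X\w{2}\\/", the engine receives \X, which PCRE treats as its own escape (an extended grapheme cluster) rather than a literal backslash followed by X, and \/, which escapes the closing delimiter. (\w is not a PHP string escape at all, which is plausibly what the "unsupported escape sequence" warning refers to.) To match one literal backslash, the regex needs \\, and writing that inside a PHP string literal takes \\\\. The sketch below only finds the escapes; it assumes PHP 7 or later, narrows \w{2} to exactly two hex digits and captures them. The variable names are illustrative.

```php
<?php
// Sample input from the question. A nowdoc (<<<'EOT') performs no escape
// processing, so every backslash in it is a literal backslash.
$text = <<<'EOT'
vis\XED\vel numa das imagens pr\XE9\vias \XE0\ administra\XE7\\XE3\o
EOT;

// In a single-quoted PHP string, '\\\\' becomes \\, which PCRE reads as ONE
// literal backslash. The regex the engine actually sees is: /\\X([0-9A-Fa-f]{2})\\/
$pattern = '/\\\\X([0-9A-Fa-f]{2})\\\\/';

preg_match_all($pattern, $text, $matches);

// $matches[0] holds the full escapes, $matches[1] the captured hex digits.
echo implode(' ', $matches[1]), "\n"; // ED E9 E0 E7 E3
```

Each match consumes its closing backslash, so the adjacent pair \XE7\\XE3\ is found as two separate escapes.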
Finally, I'm not sure what the best way is to replace the Unicode number with the corresponding character.
So, my question is two-fold:
• What's the ideal way to find the escaped characters?
• How can I get a UTF-8 character from its Unicode code point?
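Both parts can be combined with preg_replace_callback(). This is a minimal sketch, assuming every escape holds exactly two hex digits as in the sample (so it only reaches U+00FF; widen {2} if the real data uses longer numbers). The pattern matches a literal backslash, X, two captured hex digits and a closing backslash; hexdec() turns the digits into an integer code point, and a hand-written helper encodes that code point as UTF-8. The names codepointToUtf8 and decodeEscapes are hypothetical. On PHP 7.2+ with the mbstring extension, mb_chr($cp, 'UTF-8') should do the same job as the helper; it is spelled out here only to keep the example dependency-free and to show how the bytes are built.

```php
<?php
// Hypothetical helper: encode a Unicode code point as UTF-8 bytes.
// Does not reject surrogates or values above 0x10FFFF.
function codepointToUtf8(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp);                          // 1 byte:  0xxxxxxx
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6))             // 2 bytes: 110xxxxx 10xxxxxx
            . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12))            // 3 bytes: 1110xxxx + 2 continuation bytes
            . chr(0x80 | (($cp >> 6) & 0x3F))
            . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18))                // 4 bytes: 11110xxx + 3 continuation bytes
        . chr(0x80 | (($cp >> 12) & 0x3F))
        . chr(0x80 | (($cp >> 6) & 0x3F))
        . chr(0x80 | ($cp & 0x3F));
}

// Hypothetical helper: replace every \XHH\ escape with the UTF-8 character U+00HH.
function decodeEscapes(string $text): string
{
    return preg_replace_callback(
        '/\\\\X([0-9A-Fa-f]{2})\\\\/',            // regex seen by PCRE: /\\X([0-9A-Fa-f]{2})\\/
        fn (array $m): string => codepointToUtf8(hexdec($m[1])),
        $text
    );
}

// Nowdoc: no escape processing, so the backslashes stay literal.
$input = <<<'EOT'
vis\XED\vel numa das imagens pr\XE9\vias \XE0\ administra\XE7\\XE3\o
EOT;

echo decodeEscapes($input), "\n"; // visível numa das imagens prévias à administração
```

Because the callback receives one escape at a time, there is no need for the manual search-and-repeat loop: preg_replace_callback() walks the whole string in a single pass.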