6

I need to test whether a character is a letter or a space before processing it any further. So, I have:

    // char avoids boxing every element into a Character
    for (char c : take.toCharArray()) {
        if (!(Character.isLetter(c) || Character.isSpaceChar(c)))
            continue;

        data.append(c);
    }

Once I examined the data, I saw that it contains characters that look like Unicode characters from outside the Latin alphabet. How can I modify the above code to tighten my conditions so that it accepts only letters in the ranges [a-z] and [A-Z]?

Is a regex the way to go, or is there a better (faster) way?

James Raitsev
  • Wait, why do you consider "é" to not be a letter? Usually people are looking for ways to make their code handle international input *better*, not *worse*... – Borealid Feb 06 '12 at 02:11
  • @Borealid, in my case the control character is an oddity, which I am currently investigating further. `é` certainly is a valid character, but for the purposes of my program it should not be there. – James Raitsev Feb 06 '12 at 02:13
  • The regex to do this is to check against the Latin script property with `\p{sc=Latin}`. – tchrist Feb 06 '12 at 02:51
  • Related: [*Identify if a Unicode code point represents a character from a certain script such as the Latin script?*](https://stackoverflow.com/q/62109781/642706) – Basil Bourque May 31 '20 at 04:53
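The script-property check mentioned in the comments could look roughly like the sketch below. It assumes Java 7 or later, where `java.util.regex.Pattern` accepts `\p{sc=Latin}` and `Character.UnicodeScript` is available. The class and method names are made up for illustration. Note that this test is broader than the 52 letters asked about: accented letters such as `é` belong to the Latin script and are accepted, while spaces and digits (script `COMMON`) are not.

```java
import java.util.regex.Pattern;

public class LatinScriptCheck {

    // One or more characters whose Unicode script is Latin.
    private static final Pattern LATIN = Pattern.compile("\\p{sc=Latin}+");

    // Hypothetical helper: true if the whole string is Latin-script characters.
    static boolean isAllLatinScript(String s) {
        return LATIN.matcher(s).matches();
    }

    // The same test for a single code point, without a regex.
    static boolean isLatinScript(int codePoint) {
        return Character.UnicodeScript.of(codePoint) == Character.UnicodeScript.LATIN;
    }

    public static void main(String[] args) {
        System.out.println(isAllLatinScript("caf\u00e9"));  // accented letter, Latin script
        System.out.println(isAllLatinScript("\u0436"));     // Cyrillic letter
        System.out.println(isLatinScript(' '));             // space is script COMMON
    }
}
```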

3 Answers

18

If you specifically want to handle only those 52 characters, then just handle them:

    public static boolean isLatinLetter(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }
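Plugged into the loop from the question, which also keeps spaces, the check might be used as in the following sketch. The class name `LatinFilter` and the method `keepLettersAndSpaces` are made up for illustration, and only a plain `' '` is treated as a space here, which is narrower than `Character.isSpaceChar`.

```java
public class LatinFilter {

    // True only for the 52 unaccented letters A-Z and a-z.
    public static boolean isLatinLetter(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    // Hypothetical helper: copies letters and plain spaces, drops everything else.
    public static String keepLettersAndSpaces(String take) {
        StringBuilder data = new StringBuilder(take.length());
        for (char c : take.toCharArray()) {
            if (isLatinLetter(c) || c == ' ') {
                data.append(c);
            }
        }
        return data.toString();
    }

    public static void main(String[] args) {
        // The accented letter, the comma and the digit are all dropped.
        System.out.println(keepLettersAndSpaces("caf\u00e9 au lait, 2 cups"));
    }
}
```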
Louis Wasserman
Ernest Friedman-Hill
4

If you just want to strip out everything that is not an ASCII letter, a quick approach is to use String.replaceAll() with a regex:

    s.replaceAll("[^a-zA-Z]", "")

I can't say how its performance compares with a character-by-character scan that appends to a StringBuilder, though.
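One thing to watch: that character class removes spaces as well, whereas the loop in the question keeps them. A minimal sketch of both variants follows; the class and method names are illustrative only, and adding a literal space to the class is an assumption about what "space" should mean here.

```java
public class StripNonLetters {

    // Removes every character that is not an ASCII letter (spaces included).
    static String lettersOnly(String s) {
        return s.replaceAll("[^a-zA-Z]", "");
    }

    // Variant that also keeps the plain space character.
    static String lettersAndSpaces(String s) {
        return s.replaceAll("[^a-zA-Z ]", "");
    }

    public static void main(String[] args) {
        String s = "na\u00efve caf\u00e9 #1";
        System.out.println(lettersOnly(s));
        System.out.println(lettersAndSpaces(s));
    }
}
```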

Alistair A. Israel
1

I'd use a regular expression with the character ranges you specified for this. It's easy to read and should be quite fast, especially if you compile the pattern once and keep it in a static field.
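A minimal sketch of that idea, assuming the goal is to keep ASCII letters and plain spaces; the class name, the constant name and the exact pattern are assumptions, not something the question specified. `Pattern` instances are immutable and safe to share, so compiling once avoids recompiling the regex on every call the way `String.replaceAll()` does.

```java
import java.util.regex.Pattern;

public class PrecompiledFilter {

    // Compiled once when the class is loaded, then reused for every call.
    private static final Pattern NOT_LETTER_OR_SPACE = Pattern.compile("[^a-zA-Z ]");

    // Hypothetical helper: deletes everything except A-Z, a-z and ' '.
    static String clean(String input) {
        return NOT_LETTER_OR_SPACE.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(clean("Hello, W\u00f6rld 42!"));
    }
}
```

Whether this beats the hand-written loop would need to be measured on real input.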

Samuel Edwin Ward
  • Could you provide an example to do it the right way? I'd like to see what's faster. – James Raitsev Feb 06 '12 at 02:27
  • It's getting rather late in the day in this locality, so I'm afraid you'll have to wait for code, particularly if you want it to compile :) – Samuel Edwin Ward Feb 06 '12 at 02:50
  • But, as an aside, you might be overly concerned with speed at this time. Surely this isn't the slowest operation you're performing? It might be more efficient to optimize the time that a future developer (who might be you!) spends trying to understand this bit of code. – Samuel Edwin Ward Feb 06 '12 at 02:52