I have an app that syncs data from a remote DB that users populate. It seems people copy and paste crap from a ton of different OSes and programs, which can cause hidden non-ASCII characters to be imported into the system.

For example I end up with this:

Artist:â â Ioco

This gets sent back into the system during sync, my JSON conversion compounds the problem, and the invalid characters in various places cause my app to crash.

How do I search for and clean out any of these invalid characters?

the Tin Man
Slee
  • In a nutshell: create a new mutable string, iterate over all characters, check whether each one is an ASCII character and, if so, append it to the string. – Jun 15 '11 at 17:15
  • In 2011 there's really no excuse not to handle unicode properly (http://www.joelonsoftware.com/articles/Unicode.html). Remember that real people can and do have names like José or Müller or Jönsson or even Məmmədov or ბერიძე or 陈. – damian Jun 15 '11 at 17:27
  • This "crap" is letters from languages other than English. You should try to figure out the right encoding to preserve the letters. – vikingosegundo Jun 15 '11 at 17:33
  • turns out it wasn't those characters but some hexadecimal values I could not see – Slee Jun 17 '11 at 13:23
  • Those 'hexadecimal values' that you can't see will be components of multi-byte (Unicode) characters that your software isn't handling properly. – damian Jun 18 '11 at 14:08
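
The "in a nutshell" loop from the first comment could look roughly like the sketch below. `KeepPrintableASCII` is a hypothetical helper name, not something from the thread, and treating 32 through 126 as "ASCII" (printable characters only) is an assumption carried over from the answer's loop bounds.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the original thread):
// returns a copy of `input` containing only printable ASCII characters.
static NSString *KeepPrintableASCII(NSString *input) {
    NSMutableString *result = [NSMutableString stringWithCapacity:input.length];
    for (NSUInteger i = 0; i < input.length; i++) {
        unichar c = [input characterAtIndex:i];
        // 32 (space) through 126 (~) is the printable ASCII range.
        if (c >= 32 && c <= 126) {
            [result appendFormat:@"%C", c];  // %C formats a single unichar
        }
    }
    return result;
}
```

Calling `KeepPrintableASCII(@"Olé, señor!")` drops the accented letters just like the character-set approach does, so the same caveat about throwing away legitimate names applies.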

1 Answer

While I strongly believe that supporting Unicode is the right way to go, here's an example of how you can limit a string to only contain certain characters (in this case printable ASCII):

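NSString *test = @"Olé, señor!";

// Build a string containing every printable ASCII character (32 = space, 126 = ~).
NSMutableString *asciiCharacters = [NSMutableString string];
for (int i = 32; i < 127; i++)  {
    [asciiCharacters appendFormat:@"%c", i];  // %c expects an int, so use int rather than NSInteger
}

// Invert that set to get every character that is NOT printable ASCII.
NSCharacterSet *nonAsciiCharacterSet = [[NSCharacterSet characterSetWithCharactersInString:asciiCharacters] invertedSet];

// Split on the unwanted characters and join the remaining pieces back together.
test = [[test componentsSeparatedByCharactersInSet:nonAsciiCharacterSet] componentsJoinedByString:@""];

NSLog(@"%@", test); // Prints "Ol, seor!"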
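
If the real culprit is invisible characters rather than accented letters, as the comments on the question suggest, a gentler variant is to strip only control characters and leave the rest of the Unicode text alone. This is a minimal sketch, assuming ARC; `StripControlCharacters` is a hypothetical helper name, and keeping tab, newline and carriage return is my own assumption about what you'd want to preserve.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of the original answer):
// removes control characters but keeps all other Unicode text.
static NSString *StripControlCharacters(NSString *input) {
    // controlCharacterSet covers Unicode general categories Cc and Cf,
    // so format characters such as zero-width joiners are removed too.
    NSMutableCharacterSet *unwanted = [[NSCharacterSet controlCharacterSet] mutableCopy];
    // Assumption: ordinary whitespace controls should survive the cleanup.
    [unwanted removeCharactersInString:@"\t\n\r"];
    // Split on the unwanted characters and join the remaining pieces.
    return [[input componentsSeparatedByCharactersInSet:unwanted] componentsJoinedByString:@""];
}
```

With this, a string like "José Müller" comes through untouched, while a stray 0x01 or 0x07 byte pasted in from another program is dropped before the text reaches the JSON conversion.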
Morten Fast
  • No, because `stringByTrimmingCharactersInSet` only trims the ends of the string, and therefore won't remove all the characters. – Morten Fast Aug 15 '12 at 09:04
  • I agree that Unicode is the way to go. However in some cases this might still be valid. I have to generate QR Codes and I think that umlauts and the like are not ideal characters there. – Besi Apr 16 '14 at 13:30
  • Thanks, mate! This was brilliant. – Felipe Feb 16 '17 at 21:32