1

I need to take a substring of a char* up to some length and convert it to an NSString.

Given: char *val and a substring length (length).

I tried

NSString *tempString = [NSString stringWithCString:val encoding:NSASCIIStringEncoding];
NSRange range = NSMakeRange (0, length);
NSString *finalValue = [tempString substringWithRange: range];

This works for ASCII text, but not for languages with non-ASCII characters such as Chinese. If I switch to NSUTF8StringEncoding, the substring length no longer matches.

Is there a way to take the substring of the char* first and then convert it using UTF-8 encoding?

Cintu

2 Answers

0

You have to use the encoding that the string is actually encoded in.

In your case, you tell NSString to interpret the bytes as ASCII. ASCII has no Chinese characters, so this cannot work for Chinese text: the characters simply are not there.

Likely you have a UTF-8 encoded string. But simply switching to UTF-8 does not help by itself. NSString on OS X/iOS exposes its contents as 16-bit UTF-16 code units, while Unicode code points need up to 21 bits, so some characters, such as the rarer Chinese ideographs outside the Basic Multilingual Plane, take two code units. This has consequences: for example, -length returns the number of code units, not the number of characters. However, with -rangeOfComposedCharacterSequencesForRange: you can adjust a range so that it does not split a character.

For example (CJK unified ideograph U+20016):

NSString *str = @"\U00020016";                    // One Chinese character (U+20016)
NSLog(@"%lu", (unsigned long)[str length]);       // 2: two UTF-16 code units

NSRange range = {0, 1};                           // Range for the "first" code unit
NSLog(@"%lu %lu", (unsigned long)range.location, (unsigned long)range.length);  // 0 1
range = [str rangeOfComposedCharacterSequencesForRange:range];
NSLog(@"%lu %lu", (unsigned long)range.location, (unsigned long)range.length);  // 0 2
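
To see where the 2 comes from without Foundation, here is a minimal sketch in plain C, assuming the input is valid, NUL-terminated UTF-8. The names utf8_counts and utf8_count are made up for this illustration. It counts the code points in the bytes and the number of UTF-16 code units the same text would occupy.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: results of scanning a UTF-8 string. */
typedef struct {
    size_t code_points;  /* number of Unicode code points */
    size_t utf16_units;  /* number of UTF-16 code units needed */
} utf8_counts;

/* Assumes s is valid, NUL-terminated UTF-8; does not validate it. */
utf8_counts utf8_count(const char *s)
{
    assert(s != NULL);
    utf8_counts c = {0, 0};
    for (const unsigned char *p = (const unsigned char *)s; *p; p++) {
        if ((*p & 0xC0) == 0x80)
            continue;  /* 10xxxxxx: continuation byte, not a new character */
        c.code_points++;
        /* Lead bytes 0xF0 and above start 4-byte sequences, i.e. code
           points above U+FFFF, which need a surrogate pair in UTF-16. */
        c.utf16_units += (*p >= 0xF0) ? 2 : 1;
    }
    return c;
}
```

For the four bytes F0 A0 80 96 (U+20016 in UTF-8) this gives one code point and two UTF-16 code units. For a common ideograph such as U+4E2D (bytes E4 B8 AD) both counts are 1.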

You can get a better answer if you add information about the encoding of the incoming string and the encoding required for the output.

Strings are not "UTF-8 strings" or any other kind of encoded string. Strings are strings. Their storage, that is, their representation in computer memory, has an encoding, but the strings themselves do not.

Amin Negm-Awad
  • So I can use UTF-8 and substring with rangeOfComposedCharacterSequencesForRange? – Cintu Sep 07 '17 at 07:17
  • The input is not an NSString, it's a `char *`. The `char *` holds the Unicode characters, which I need to substring and convert to NSString – Cintu Sep 07 '17 at 07:22
  • `char *` cannot hold "each Unicode character" in one element, since `char` is 8 bits. This is not enough by far. If it is UTF-8, use UTF-8. If it is an encoding that stores every code point directly (using 3 or 4 bytes, i.e. `char`s, each), use the corresponding encoding constant. But first of all: get information about the encoding you receive. (Not the type you have in your program.) And *always* use `-rangeOfComposedCharacterSequencesForRange:` to adjust the range. – Amin Negm-Awad Sep 07 '17 at 07:36
-1

I found the solution to my question:

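char subString[length + 1];
strncpy(subString, val, length);  // copies at most `length` bytes (bytes, not characters)
subString[length] = '\0';         // place the null terminator
// May return nil if the cut falls in the middle of a multi-byte UTF-8 sequence
NSString *finalString = [NSString stringWithCString:subString encoding:NSUTF8StringEncoding];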

I take the substring of the char* first and then convert it using UTF-8 encoding.
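
One caveat: length here counts bytes, and cutting in the middle of a multi-byte UTF-8 sequence leaves invalid UTF-8 behind. A minimal sketch of one way to guard against that, assuming val is valid, NUL-terminated UTF-8. utf8_safe_prefix is a hypothetical helper, not a library function. It backs the cut up to the previous character boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the largest byte count <= max_bytes that
   does not end in the middle of a UTF-8 sequence.
   Assumes s is valid, NUL-terminated UTF-8; does not validate it. */
size_t utf8_safe_prefix(const char *s, size_t max_bytes)
{
    assert(s != NULL);
    size_t n = strlen(s);
    if (max_bytes >= n)
        return n;  /* the whole string fits */

    size_t cut = max_bytes;
    /* If the byte at the cut is a continuation byte (10xxxxxx), the cut
       would split a character, so move back to that character's lead byte. */
    while (cut > 0 && ((unsigned char)s[cut] & 0xC0) == 0x80)
        cut--;
    return cut;
}
```

The returned count could then be used in place of length in the strncpy call above, or passed together with val and NSUTF8StringEncoding to -[NSString initWithBytes:length:encoding:]. This only respects code point boundaries; it does not keep combining marks together with their base character.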

Cintu