
I have a CharSequence source, an int start and an int end.

I would like to strip all "control characters" from source between start and end and return the result as a new CharSequence.

By "control character" I mean undesirable characters like tab, carriage return, line feed and so on: basically everything below 32 (space) in ASCII. But I don't know how to do this in the "modern age".

What is a char? Is it Unicode? How can I remove these "control characters"?

ycomp
  • Have you checked this: http://stackoverflow.com/questions/4283351/how-to-replace-special-characters-in-a-string – assylias Feb 23 '12 at 16:46
  • Use [`String#replaceAll()`](http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#replaceAll%28java.lang.String,%20java.lang.String%29). – Matt Ball Feb 23 '12 at 16:48
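A minimal sketch of the whole task in plain Java, assuming (as the answers below do) that characters outside [start, end) should be kept and only that range cleaned. It walks the chars and drops anything for which Character.isISOControl returns true, so no regex and no extra library are involved. The class and method names (StripControlChars, stripControlChars) are hypothetical.

```java
public class StripControlChars {

    // Hypothetical helper: returns a copy of source in which ISO control
    // characters are removed only from the half-open range [start, end);
    // characters outside that range are copied unchanged.
    static CharSequence stripControlChars(CharSequence source, int start, int end) {
        StringBuilder sb = new StringBuilder(source.length());
        sb.append(source, 0, start);                  // prefix, untouched
        for (int i = start; i < end; i++) {
            char c = source.charAt(i);
            if (!Character.isISOControl(c)) {         // drops U+0000..U+001F and U+007F..U+009F
                sb.append(c);
            }
        }
        sb.append(source, end, source.length());      // suffix, untouched
        return sb.toString();                         // immutable String, which is a CharSequence
    }

    public static void main(String[] args) {
        CharSequence in = "a\tb\r\nc\td";
        // Clean only indices 1..5 (tab, 'b', CR, LF, 'c'); the tab at index 6 survives.
        System.out.println(stripControlChars(in, 1, 6)); // prints "abc", a tab, then "d"
    }
}
```

Returning a String keeps the result immutable; a StringBuilder is also a CharSequence, but handing it out would let callers modify it.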

4 Answers


You could use CharSequence.subSequence(int, int) and String.replaceAll(String, String) as follows:

source.subSequence(0, start).toString() + source.subSequence(start, end).toString().replaceAll("\\p{Cntrl}", "") + source.subSequence(end, source.length()).toString()
Dan Cruz
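The \p{Cntrl} pattern above and Character.isISOControl do not cover exactly the same characters, which may matter if the input can contain C1 controls. In java.util.regex, \p{Cntrl} is documented as [\x00-\x1F\x7F] (ASCII only unless the UNICODE_CHARACTER_CLASS flag is set), whereas isISOControl also accepts U+0080 through U+009F. A small comparison, with hypothetical class and method names:

```java
import java.util.regex.Pattern;

public class CntrlVsIsoControl {

    // \p{Cntrl} = [\x00-\x1F\x7F] by default (ASCII control characters only).
    private static final Pattern CNTRL = Pattern.compile("\\p{Cntrl}");

    static String stripWithRegex(String s) {
        return CNTRL.matcher(s).replaceAll("");
    }

    // isISOControl additionally covers the C1 range U+0080..U+009F,
    // e.g. U+0085 NEXT LINE.
    static String stripWithIsoControl(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (!Character.isISOControl(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String in = "x\ty\u0085z\u007F";
        System.out.println(stripWithRegex(in).length());      // 4: "xy" + U+0085 + "z" (NEL kept)
        System.out.println(stripWithIsoControl(in).length()); // 3: "xyz"
    }
}
```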

Assuming that you can get the whole source into memory, you can do this:

String tmp = source.toString();
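String prefix = tmp.substring(0, start);  // everything before start, kept as-is
String suffix = tmp.substring(end);       // everything from end onward, kept as-is
// \p{Cntrl} matches ASCII control characters [\x00-\x1F\x7F]; \s would also remove ordinary spaces
String middle = tmp.substring(start, end).replaceAll("\\p{Cntrl}", "");
CharSequence res = prefix + middle + suffix;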
Sergey Kalinichenko
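Whitespace (\s) and control characters (\p{Cntrl}) are overlapping but different sets, so the pattern chosen for the middle part changes the result. By default, Java's \s is [ \t\n\x0B\f\r]. This sketch (class name hypothetical) shows the difference on a string containing a space, a tab, BEL (U+0007) and ESC (U+001B):

```java
public class WhitespaceVsCntrl {
    public static void main(String[] args) {
        // BEL and ESC are control characters but not whitespace;
        // the plain space is whitespace but not a control character.
        String in = "a b\u0007c\u001Bd\te";

        // \s removes the space and the tab, but leaves BEL and ESC in place.
        String viaWhitespace = in.replaceAll("\\s", "");

        // \p{Cntrl} removes BEL, ESC and the tab, but keeps the ordinary space.
        String viaCntrl = in.replaceAll("\\p{Cntrl}", "");

        System.out.println(viaWhitespace.equals("ab\u0007c\u001Bde")); // true
        System.out.println(viaCntrl.equals("a bcde"));                  // true
    }
}
```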

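Use Character.isISOControl(char); it is part of the standard library (java.lang.Character), so Guava is not required.
Yes, char is Unicode: each char is a 16-bit UTF-16 code unit, so characters outside the Basic Multilingual Plane take two chars (a surrogate pair).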

speksy
Highland Mark
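To expand on "char is Unicode": a Java char is one UTF-16 code unit, so a code point above U+FFFF, such as an emoji, occupies two chars. For stripping control characters this is harmless, since surrogate chars are never ISO controls. If code-point-level processing is preferred anyway, CharSequence.codePoints() (Java 8+) can be combined with the int overload of Character.isISOControl. A sketch with a hypothetical class name:

```java
public class CodePointStrip {

    // Removes ISO control characters by code point rather than by char.
    // Surrogates and supplementary code points are never ISO controls, so this
    // gives the same result as a char loop; it just never inspects half a pair.
    static String strip(CharSequence s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints()
         .filter(cp -> !Character.isISOControl(cp))
         .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        String smile = "\uD83D\uDE00"; // U+1F600: one code point stored as two chars
        String in = "hi\n" + smile + "\r";
        System.out.println(smile.length());                          // 2
        System.out.println(smile.codePointCount(0, smile.length())); // 1
        System.out.println(strip(in).equals("hi" + smile));          // true
    }
}
```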

Using Guava's CharMatcher:

return CharMatcher.JAVA_ISO_CONTROL.removeFrom(string);
Louis Wasserman