
I am trying to read a file which contains some Japanese characters.

RandomAccessFile file = new RandomAccessFile("japanese.txt", "r");
String line;
while ((line = file.readLine()) != null) {
    System.out.println(line);
}

It's returning some garbled characters instead of Japanese. But when I convert the encoding, it prints properly.

line = new String(line.getBytes("ISO-8859-1"), "UTF-8");

What does this mean? Is the text file in ISO-8859-1 encoding?

$ file -i japanese.txt returns the following:

japanese.txt: text/plain; charset=utf-8

Please explain why it explicitly requires the file to be converted from Latin-1 to UTF-8.

Shashwat Kumar

2 Answers


No, readLine is an obsolete method, dating from before Java handled charsets/encodings properly. It turns every byte into a char with high byte 0, so every non-ASCII byte is mangled. Because each byte became exactly one char, line.getBytes("ISO-8859-1") recovers the original bytes, and decoding those bytes as UTF-8 yields the correct text; that is why your conversion works. Also, byte 0x85 is a line separator (NEL, inherited from EBCDIC), and an API that honors it as a line break would split a line in the middle of a UTF-8 multibyte sequence containing that byte. More such scenarios are feasible.
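To make the byte-per-char behavior concrete, here is a minimal sketch (the class name and sample string are mine, not from the question) showing how the mojibake arises and why the Latin-1 round-trip from the question repairs it:

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    public static void main(String[] args) {
        String original = "日本語";                     // some Japanese text
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        // What readLine() effectively does: one char per byte, high byte 0
        StringBuilder sb = new StringBuilder();
        for (byte b : utf8) {
            sb.append((char) (b & 0xFF));
        }
        String garbled = sb.toString();                 // mojibake

        // The round-trip from the question: Latin-1 maps chars 0-255
        // one-to-one back to the original bytes, which then decode
        // correctly as UTF-8.
        String fixed = new String(garbled.getBytes(StandardCharsets.ISO_8859_1),
                                  StandardCharsets.UTF_8);
        System.out.println(fixed.equals(original));     // true
    }
}
```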

Better to use Files. It has newBufferedReader(Path, Charset), and the single-argument overload newBufferedReader(Path) uses a fixed default charset of UTF-8.

Path path = Paths.get("japanese.txt");
try (BufferedReader file = Files.newBufferedReader(path)) {
    String line;
    while ((line = file.readLine()) != null) {
        System.out.println(line);
    }
}

Now you'll read correct Strings.
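If you want the whole file at once, Files.readAllLines works the same way. A small self-contained sketch (it writes its own temp file as a stand-in for japanese.txt so it runs on its own):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class ReadAllLinesDemo {
    public static void main(String[] args) throws IOException {
        // Stand-in for japanese.txt, so the snippet is runnable as-is
        Path path = Files.createTempFile("japanese", ".txt");
        Files.write(path, "こんにちは\n世界\n".getBytes(StandardCharsets.UTF_8));

        // Like newBufferedReader(Path), readAllLines defaults to UTF-8
        List<String> lines = Files.readAllLines(path);
        lines.forEach(System.out::println);
    }
}
```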

A RandomAccessFile is basically meant for binary data.
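If you really do need RandomAccessFile, read the raw bytes and decode them yourself. A minimal sketch (it creates its own temp file in place of japanese.txt so it is runnable as-is):

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class RafUtf8 {
    public static void main(String[] args) throws IOException {
        // Stand-in for the japanese.txt from the question
        Path path = Files.createTempFile("japanese", ".txt");
        Files.write(path, "日本語のテキスト".getBytes(StandardCharsets.UTF_8));

        try (RandomAccessFile raf = new RandomAccessFile(path.toFile(), "r")) {
            byte[] bytes = new byte[(int) raf.length()];
            raf.readFully(bytes);            // raw bytes, no decoding yet
            // Decode explicitly as UTF-8 instead of byte-per-char
            String content = new String(bytes, StandardCharsets.UTF_8);
            System.out.println(content);
        }
    }
}
```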

Joop Eggen

It looks like it is ISO, but I would try reading with that encoding and see what happens.

Since you don't do random access, I would just create a BufferedReader with the right encoding and use that:

String charSetName = "UTF-8"; // or "ISO-8859-1" - try both
String line;
try (FileInputStream is = new FileInputStream(fileName);
     InputStreamReader isr = new InputStreamReader(is, Charset.forName(charSetName));
     BufferedReader reader = new BufferedReader(isr)) {
    while ((line = reader.readLine()) != null) {
        System.out.println(line);
    }
}
rghome