13

If I run this code, the first line of output starts with the characters ï»¿, followed by the rest of the file's lines:

try {
    BufferedReader br = new BufferedReader(new FileReader(
            "myFile.txt"));

    String line;
    while ((line = br.readLine()) != null) {
        System.out.println(line);
    }
    br.close();

} catch (FileNotFoundException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}

How can I avoid it?

Nayuki
Milton90

2 Answers

19

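You are getting the characters ï»¿ on the first line because the file begins with the UTF-8 byte order mark (BOM), the bytes EF BB BF, and those bytes show up as ï»¿ when they are decoded with a single-byte encoding such as windows-1252. If a text file begins with a BOM, it was likely generated by a Windows program like Notepad.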

To solve your problem, read the file explicitly as UTF-8 instead of relying on whatever the system's default character encoding happens to be (windows-1252, US-ASCII, etc.):

BufferedReader in = new BufferedReader(
    new InputStreamReader(
        new FileInputStream("myFile.txt"),
        "UTF-8"));

In UTF-8, the byte sequence EF BB BF decodes to a single character, U+FEFF. This character is optional: a legal UTF-8 file may or may not begin with it. So we skip the first character only if it is U+FEFF:

in.mark(1);
if (in.read() != 0xFEFF)
  in.reset();

And now you can continue with the rest of your code.
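Putting the two steps together with the loop from the question, a minimal sketch might look like the following. The class name `BomSkippingReader` and the helper `readLines` are made up for illustration, and it uses try-with-resources and `StandardCharsets.UTF_8` (Java 7+) instead of the `"UTF-8"` string, which is a substitution on my part rather than something the snippets above require.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of any library.
public class BomSkippingReader {

    // Reads all lines of a UTF-8 file, dropping a leading U+FEFF (BOM) if present.
    static List<String> readLines(Path path) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(Files.newInputStream(path), StandardCharsets.UTF_8))) {
            // Peek at the first character; put it back unless it is the BOM.
            in.mark(1);
            if (in.read() != 0xFEFF) {
                in.reset();
            }
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        } // the reader is closed here, even if an exception was thrown
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // "myFile.txt" is the file name used in the question.
        for (String line : readLines(Paths.get("myFile.txt"))) {
            System.out.println(line);
        }
    }
}
```

This only removes a BOM at the very start of the file; any U+FEFF characters further in are left untouched.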

Nayuki
  • If I am correct, BOM character would occur only once in the entire file? – Adil Apr 19 '18 at 15:37
  • It can occur multiple times if a dumb program concatenated multiple files that each contained a header BOM. – Nayuki Apr 19 '18 at 15:53
2

The problem could be the encoding used. Try this:

BufferedReader in = new BufferedReader(new InputStreamReader(
      new FileInputStream("yourfile"), "UTF-8"));
Joey
Tala