Accessing files the right way

When working with files or streams, there are several pitfalls to watch out for.

The first decision is: is the data binary or text?

If you have binary data, the handling is straightforward:

InputStream is = ...

An InputStream is the right way to handle binary data: it delivers the raw bytes without interpreting them.
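As a minimal sketch of what handling binary data looks like, the following reads every byte from a stream into an array. The class name `BinaryReadExample` and the helper `readAllBytes` are hypothetical names chosen for illustration, and a `ByteArrayInputStream` stands in for a real file or network stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class BinaryReadExample {

    // Hypothetical helper: copies every byte from the stream, unchanged.
    public static byte[] readAllBytes(InputStream is) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        // read() returns -1 once the end of the stream is reached.
        while ((n = is.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a file or socket stream (assumption for this sketch).
        byte[] data = {0x00, (byte) 0xFF, 0x10, 0x7F};
        try (InputStream is = new ByteArrayInputStream(data)) {
            byte[] copy = readAllBytes(is);
            System.out.println(copy.length); // prints 4
        }
    }
}
```

No charset is involved anywhere here, which is why binary data avoids the encoding problems that text runs into.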

But what if you have text data (e.g. a log file you want to read)? Then this is a job for a Reader.

A first attempt at reading text usually looks like this:

Reader r = new InputStreamReader(is);

But this approach has a pitfall, as the documentation of this constructor states:

Creates an InputStreamReader that uses the default charset.

The Reader is created with the platform's default charset. Because that default can differ between operating systems, locales, and JVM settings, the same bytes may be decoded into different text on different platforms. If you access an InputStream that is connected to another computer (e.g. via a network), you are likely to run into trouble, because you cannot assume that the other side used the same encoding.
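To illustrate the effect, here is a small sketch that decodes the same two bytes with two different charsets. The class name `CharsetMismatchExample` and the helper `decode` are hypothetical. The charsets are passed explicitly so that the result does not depend on the machine running the example, and the non-ASCII characters are written as Unicode escapes so that the source file itself is encoding-independent.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatchExample {

    // Hypothetical helper: decodes the bytes through a Reader with the given charset.
    public static String decode(byte[] bytes, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // U+00E4 (a with diaeresis) is encoded in UTF-8 as the two bytes 0xC3 0xA4.
        byte[] bytes = "\u00e4".getBytes(StandardCharsets.UTF_8);

        String asUtf8 = decode(bytes, StandardCharsets.UTF_8);
        String asLatin1 = decode(bytes, StandardCharsets.ISO_8859_1);

        System.out.println(asUtf8.length());   // prints 1
        System.out.println(asLatin1.length()); // prints 2
    }
}
```

Decoded as ISO-8859-1, each byte becomes its own character, so one letter turns into two unrelated ones. This is the kind of corruption that can appear when writer and reader silently rely on different default charsets.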

To make sure the text is read correctly, you must agree on an encoding and use it explicitly on both sides.

To open a Reader with a specific encoding, use:

Reader r = new InputStreamReader(is, "UTF-8");

This way, the data is always decoded with the specified encoding (UTF-8 in this case), regardless of the platform default.
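A possible round trip that uses the same encoding explicitly on both sides could look like the sketch below. The class name `ExplicitCharsetExample` and the helpers `writeText` and `readText` are hypothetical, and in-memory streams stand in for a real file or socket. It uses the `StandardCharsets.UTF_8` constant (available since Java 7) instead of the string `"UTF-8"`. Both select the same charset, but the constructor taking a charset name declares the checked `UnsupportedEncodingException`, while the one taking a `Charset` does not.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetExample {

    // Writing side: encodes the text as UTF-8, independent of the platform default.
    public static byte[] writeText(String text) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            w.write(text);
        }
        return out.toByteArray();
    }

    // Reading side: decodes with the same explicitly chosen charset.
    public static String readText(InputStream is) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(is, StandardCharsets.UTF_8))) {
            char[] buffer = new char[1024];
            int n;
            while ((n = r.read(buffer)) != -1) {
                sb.append(buffer, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // "Gruesse" spelled with u-umlaut and sharp s, written as Unicode escapes.
        String original = "Gr\u00fc\u00dfe";
        byte[] encoded = writeText(original);
        String decoded = readText(new ByteArrayInputStream(encoded));
        System.out.println(decoded.equals(original)); // prints true
    }
}
```

Because neither side relies on the default charset, the text survives the round trip unchanged on any platform.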