How would I properly generate a "javax.ws.rs.core.Response" (to be returned) that supports Chinese character encoding within an Excel file?

To clarify, I have a CSV file (opened in Excel) that contains some Chinese content, and I need to return a javax.ws.rs.core.Response so that the Chinese characters in the document display properly on the client side.

Currently I'm doing the following:

return Response.status( 200 )
        .header( "content-disposition", 
                 "attachment;filename=SampleCSV.csv;charset=Unicode" )
        .entity( result )
        .build();

but when this response is built and returned to the client side (and a popup is displayed asking to download the file), the Chinese content of the Excel file is garbled.

Any suggestion will be highly appreciated.

Flimzy
Mohammad Najar

2 Answers

The RFC that defines the Content-Disposition header doesn't mention a charset clause.

Try also adding a proper content-type header to the response:

.header("Content-Type", "text/csv; charset=utf-8")

Be sure to use utf-8, and not unicode. If that works, then you can remove the charset clause from the content-disposition header.

Sean Reilly

You specify charset=Unicode, which is not valid because Unicode is not a single encoding. It's a character set with a family of encodings, of which UTF-8 and UTF-16 are the most commonly used.

You can control the response header, and so affect how the browser/client interprets the response, using the @Produces annotation. I've seen different opinions about whether this works.

I'm fairly certain that this only changes the encoding declared in the response headers; it doesn't change the encoding that's actually used to convert the response string into bytes to send over the network. These two must match, otherwise the browser/client will misinterpret the response, because it believes that you used a different encoding than you actually did.
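The mismatch described above can be reproduced with the plain JDK, without any JAX-RS dependency. This is only a sketch: the class name EncodingMismatchDemo, the helper roundTrip, and the sample CSV text are made up for illustration. It encodes a string with one charset (what the server actually sends) and decodes it with another (what the header claims).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingMismatchDemo {
    // Encode with the charset actually used on the wire, then decode with
    // the charset the client believes was used (the declared one).
    static String roundTrip(String text, Charset actual, Charset declared) {
        byte[] wire = text.getBytes(actual);
        return new String(wire, declared);
    }

    public static void main(String[] args) {
        // Hypothetical CSV content containing Chinese characters.
        String csv = "name,city\n\u674e,\u5317\u4eac\n";

        // Declared and actual encodings agree: the text survives.
        System.out.println(csv.equals(
                roundTrip(csv, StandardCharsets.UTF_8, StandardCharsets.UTF_8)));      // true

        // Sent as UTF-8 but read as ISO-8859-1: the text is garbled.
        System.out.println(csv.equals(
                roundTrip(csv, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1))); // false
    }
}
```

In the second call each UTF-8 byte is turned into a separate Latin-1 character, which is the kind of garbage the client ends up displaying.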

If you return a java.lang.String object, JAX-RS uses a system default encoding to convert it to a byte stream. If the JAX-RS server is running on Unix this is usually UTF-8, which works well, but on Windows it is often a legacy code page (windows-1252, for example) that cannot represent Chinese characters.
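To see which default a given server JVM would apply, something like the following could be run on that machine. It is a rough check using only the JDK; the class name DefaultCharsetCheck and the sample text are assumptions for illustration. It compares the bytes produced by the no-argument getBytes(), which uses the platform default, against explicit UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class DefaultCharsetCheck {
    // True when the no-argument getBytes() (platform default charset)
    // yields the same bytes as an explicit UTF-8 encoding of the text.
    static boolean defaultMatchesUtf8(String text) {
        return Arrays.equals(text.getBytes(), text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println("Default charset: " + Charset.defaultCharset());
        // Sample Chinese text; the result depends on the JVM and OS settings.
        System.out.println("Default matches UTF-8: " + defaultMatchesUtf8("\u4e2d\u6587"));
    }
}
```

If the second line prints false, any code path that relies on the default conversion will not produce UTF-8 on that machine.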

Therefore you should force a specific encoding yourself, rather than letting JAX-RS apply its default conversion.

To be specific, if result is a java.lang.String object in your code, note that an OutputStreamWriter cannot wrap a String directly; it wraps an OutputStream. One way to control the byte stream that JAX-RS writes to the network is to encode the string yourself, for example as UTF-8, and pass the resulting byte array as the entity. I haven't tested this code, but it might work (it needs an import of java.nio.charset.StandardCharsets):

.entity(result.getBytes(StandardCharsets.UTF_8))

I had this problem with Tika, which sends a StreamingOutput instead of a Response, and constructs it with a default OutputStreamWriter, which uses the system's default encoding instead of something predictable.

I modified Tika to specify the encoding when constructing the OutputStreamWriter, and added a charset to the @Produces annotation, and that fixed it for me.
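The kind of change described above might look roughly like this. It is a JDK-only sketch, not the actual Tika code: the class name Utf8StreamWriterSketch and the methods writeCsv and toUtf8Bytes are hypothetical. The writeCsv method has the same shape as the write(OutputStream) callback of a JAX-RS StreamingOutput, so its body could be moved into one.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class Utf8StreamWriterSketch {
    // Writes the text to the stream as UTF-8, regardless of the platform default.
    static void writeCsv(String csv, OutputStream out) throws IOException {
        // The explicit charset argument is the important part; the
        // one-argument constructor would use the system default instead.
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        writer.write(csv);
        writer.flush(); // flush only; the caller owns the stream and closes it
    }

    // Convenience wrapper that collects the encoded bytes in memory.
    static byte[] toUtf8Bytes(String csv) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            writeCsv(csv, sink);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        // Hypothetical CSV content containing Chinese characters.
        byte[] bytes = toUtf8Bytes("id,name\n1,\u4e2d\u6587\n");
        System.out.println(new String(bytes, StandardCharsets.UTF_8));
    }
}
```

For the client to decode these bytes correctly, the declared charset (via @Produces or the Content-Type header) still has to say UTF-8 as well.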

qris