19

I see that java.net.URLDecoder.decode(String) is deprecated in Java 6.

I have the following String:

String url ="http://172.20.4.60/jsfweb/cat/%D7%9C%D7%97%D7%9E%D7%99%D7%9D_%D7%A8%D7%92%D7%99%D7%9C%D7%99%D7%9"

How should I decode it in Java 6?

BalusC
danny.lesnik

5 Answers

57

You should use java.net.URI to do this. Despite its name, the URLDecoder class performs application/x-www-form-urlencoded decoding, which is meant for form data and is the wrong tool for decoding a URL.
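
To illustrate the difference, here is a minimal sketch rather than a definitive implementation. The host example.com, the sample path, and the class and method names are assumptions made up for the illustration. URI.create(...).getPath() decodes the percent-escapes in the path and leaves a literal "+" alone, while URLDecoder also turns "+" into a space.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;

class UriPathDecodeSketch {

    // Decodes the path component using URI rules ("+" is kept as-is).
    static String decodeWithUri(String url) {
        return URI.create(url).getPath();
    }

    // Decodes using form-data rules ("+" becomes a space).
    static String decodeWithUrlDecoder(String text) {
        try {
            return URLDecoder.decode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical URL: "%2B" is an escaped plus, "+" is a literal plus.
        String url = "http://example.com/cat/a%2Bb+c%20d";

        System.out.println(decodeWithUri(url));                         // /cat/a+b+c d
        System.out.println(decodeWithUrlDecoder("/cat/a%2Bb+c%20d"));   // /cat/a+b c d
    }
}
```

The two results differ only in how the literal "+" is treated, which is the case the comments below argue about.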

Draemon
  • 6
    @whoever downvoted: care to elaborate on which part of this is wrong? – Draemon Feb 17 '12 at 00:14
  • 3
    This is the correct answer! This trips people up all the time. URLEncoder/URLDecoder encode and decode form data *for* URLs, not URLs themselves. The URL class provides the encoding and decoding of the URL itself. And the URI class is an updated, better specified, more general API -- every URL string is also a URI string, so use URI for parsing duties. The URL class itself warns against confusing the use of URLEncoder/Decoder: "The URLEncoder and URLDecoder classes can also be used, but only for HTML form encoding, which is not the same as the encoding scheme defined in RFC2396." – Bob Kerns Oct 23 '12 at 17:02
  • 2
    java.net.URI.decode() is private now – Azee Feb 20 '14 at 16:03
  • 3
    The *media*-type `application/x-www-form-urlencoded` refers to the encoding used for URL's, and the detailed rules specified by `URLDecoder` make it clear that it's perfectly valid for use in decoding a URL. So it's simpler, and probably faster to use `URLDecoder`. – Lawrence Dol Dec 04 '14 at 20:30
  • 2
    URLDecoder will replace "+" with " ", which is incorrect. "+" should only be changed to " " in the query string keys and values. – Dobes Vandermeer Feb 28 '16 at 20:12
27

Now you need to specify the character encoding of your string. Based on the information on the URLDecoder page:

Note: The World Wide Web Consortium Recommendation states that UTF-8 should be used. Not doing so may introduce incompatibilities.

The following should work for you:

java.net.URLDecoder.decode(url, "UTF-8");
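
Since the two-argument method declares the checked UnsupportedEncodingException, a call site usually needs a small wrapper. The following is only a sketch: the class and method names are hypothetical, and the input is the first complete word taken from the path in the question.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class Utf8DecodeSketch {

    // Wraps the two-argument decode so callers do not have to handle the checked exception.
    static String decodeUtf8(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Five Hebrew letters, each escaped as two UTF-8 bytes.
        String decoded = decodeUtf8("%D7%9C%D7%97%D7%9E%D7%99%D7%9D");
        System.out.println(decoded.length()); // 5
        System.out.println(decoded);
    }
}
```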

Please see Draemon's answer below.

Community
Tim Cooper
  • 4
    -1 this is just plain wrong. The documentation clearly states that this method uses application/x-www-form-urlencoded which is only used for the query string. – Draemon Feb 17 '12 at 00:13
  • -1 see my comments on @Draemon's correct answer below. – Bob Kerns Oct 23 '12 at 17:04
  • 3
    This would be the correct answer, if the question were correct! If you were using the one-arg version of decode() correctly, you should use the two-argument version. – Bob Kerns Oct 23 '12 at 17:06
  • +1 For directing users to the other answer. :) – 700 Software Feb 13 '14 at 16:14
  • 1
    This answer is in fact correct, since the form encoding referenced defers to URL encoding. The *media*-type `application/x-www-form-urlencoded` refers to the encoding used for URL's, and the detailed rules specified by `URLDecoder` make it clear that it's perfectly valid for use in decoding a URL. So it's simpler, and probably faster to use `URLDecoder`. I recommend that you unstrike this answer. – Lawrence Dol Dec 04 '14 at 20:31
7

As the documentation mentions, decode(String) is deprecated because it always uses the platform default encoding, which is often wrong.

Use the two-argument version instead. You will need to specify the encoding used in the escaped parts.
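
To see why the encoding matters, consider this small sketch (the class name, helper name, and sample input are assumptions for illustration). The same escaped bytes decode to different text depending on the charset passed in, which is what goes wrong when the platform default does not match the encoding used to escape the string.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class CharsetMismatchSketch {

    // Thin wrapper that converts the checked exception into an unchecked one.
    static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // "%C3%A9" is the UTF-8 escape of U+00E9 (e with acute accent).
        String escaped = "%C3%A9";

        String asUtf8 = decode(escaped, "UTF-8");         // one character: U+00E9
        String asLatin1 = decode(escaped, "ISO-8859-1");  // two characters: U+00C3 U+00A9

        System.out.println(asUtf8.length());   // 1
        System.out.println(asLatin1.length()); // 2
    }
}
```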

Joachim Sauer
5

Only the decode(String) method is deprecated. You should use the decode(String, String) method to explicitly set a character encoding for decoding.

Mathias Schwarz
2

As noted by previous posters, you should use the java.net.URI class to do it:

System.out.println(String.format("Decoded URI: '%s'", new URI(url).getPath()));

What I want to note additionally is that if you have only the path part of a URI and want to decode it separately, the same approach with the one-argument constructor works, but the four-argument constructor does not decode it:

String fileName = "Map%20of%20All%20projects.pdf";
URI uri = new URI(null, null, fileName, null);
System.out.println(String.format("Not decoded URI *WTF?!?*: '%s'", uri.getPath()));

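This was tested in Oracle JDK 7. The result is counter-intuitive and looks like a bug, but it matches the URI Javadoc: the multi-argument constructors always quote the percent character ('%'), so the input is treated as unencoded text and "%20" becomes "%2520" in the resulting URI.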
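
A small sketch for inspecting what the four-argument constructor actually stores (the class and helper names are hypothetical). It compares getRawPath() and getPath() on that URI with the result of parsing the same string through URI.create.

```java
import java.net.URI;
import java.net.URISyntaxException;

class UriConstructorQuotingSketch {

    // Builds a URI from a bare path with the four-argument constructor
    // (scheme, host, path, fragment), as in the snippet above.
    static URI fromUnencodedPath(String path) {
        try {
            return new URI(null, null, path, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String fileName = "Map%20of%20All%20projects.pdf";

        // The constructor quotes '%' as "%25", so the escapes are doubled.
        URI quoted = fromUnencodedPath(fileName);
        System.out.println(quoted.getRawPath()); // Map%2520of%2520All%2520projects.pdf
        System.out.println(quoted.getPath());    // Map%20of%20All%20projects.pdf

        // Parsing the string keeps the escapes as escapes, so getPath() decodes them.
        URI parsed = URI.create(fileName);
        System.out.println(parsed.getPath());    // Map of All projects.pdf
    }
}
```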

It could trip up people who are trying to use an approach symmetrical to encoding. As noted, for example, in the post "how to encode URL to avoid special characters in java", to encode a URI it is a good idea to construct it by passing the different URI parts separately, since different encoding rules apply to different parts:

String fileName2 = "Map of All projects.pdf";
URI uri2 = new URI(null, null, fileName2, null);
System.out.println(String.format("Encoded URI: '%s'", uri2.toASCIIString()));
Community
Dmitriy Korobskiy