34

When I convert the XML file to ASCII, the result differs for the three characters written as utf versus UTF.

<?xml version="1.0" encoding="utf-8"?>


<?xml version="1.0" encoding="UTF-8"?>

I tried creating a new XML file with VS2005. It generates the lowercase utf-8 form by default.

Which one is the more standard form? Thanks.

Nano HE
  • 3
    Since lowercase letters are more common, `utf-8` will probably take up very slightly less space when compressed. – Zaz Nov 18 '14 at 15:06
  • @Zaz Yes, lowercase compresses better https://encode.ru/threads/1889-gzthermal-pseudo-thermal-view-of-Gzip-Deflate-compression-efficiency – Volker E. Oct 15 '17 at 02:32

5 Answers

42

The IANA character set registry says:

no distinction is made between use of upper and lower case letters.

But that page, the XML specification, and unicode.org are consistent about capitalizing UTF-8.

dan04
  • @dan04, I would like to mark your reply as the answer. Thanks for the useful links. @All, I need to convert the whole XML file to ASCII format and compare the ASCII body .... That's why I care about **upper and lower case letters**. Thank you all. – Nano HE Jul 15 '10 at 02:35
  • 2
    Additionally, Googling `charset utf-8 uppercase|lowercase bug|solved` turns up quite a number of bug reports that were solved or circumvented by using uppercase `UTF-8`, while I found no reports (within one evening of googling this subject) where a problem could be solved by changing uppercase to lowercase. Affected software included Apache Xerces (Mac OS X), JSP, Jetty (breaking AWS S3 signatures, see: https://github.com/golang/go/issues/19430) and numerous others. Based on this, one could make an argument that the uppercase UTF-8 charset enjoys better compatibility (especially with legacy tools). – GitaarLAB Jan 26 '18 at 07:31
  • I can confirm UTF-8 (uppercase). I get badly encoded results with the lowercase form when using it in MVC Core 3.1... – Miroslav Siska Nov 16 '20 at 16:36
16

From the XML specification:

"XML processors SHOULD match character encoding names in a case-insensitive way"

This indicates that you can use upper case or lower case or even mixed case if you wish. However, the specification uses "UTF-8" in all its examples so for consistency I'd go with that.
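
As a quick way to check this behaviour yourself, here is a minimal sketch in Python (not the asker's VS2005/.NET environment). It parses the same document under differently cased encoding names using the standard library's `xml.etree.ElementTree`. The sample document and the helper name `parse_with_encoding_name` are made up for illustration, and other XML processors may behave differently since the specification only says SHOULD.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample body; the bytes \xc3\xa9 are "é" encoded as UTF-8.
BODY = b"<root><item>caf\xc3\xa9</item></root>"


def parse_with_encoding_name(name: str) -> str:
    """Parse the sample document declared with the given encoding name
    and return the decoded text of its <item> element."""
    declaration = b'<?xml version="1.0" encoding="' + name.encode("ascii") + b'"?>'
    root = ET.fromstring(declaration + BODY)
    return root.find("item").text


# Try lower, upper and mixed case spellings of the same encoding name.
results = {name: parse_with_encoding_name(name) for name in ("utf-8", "UTF-8", "Utf-8")}
print(results)
```

If the parser matches names case-insensitively, all three spellings decode the item text identically. Note that the raw bytes of the declaration still differ, which matters if you compare files byte by byte.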

Artelius
11

For those interested in the gory details - including links to some of the related standards and precedents - I blogged a couple of years ago about Case-Sensitivity of UTF-8 in XML Declarations.

codingoutloud
6

In my experience (which is primarily with .NET), character set identifiers are treated as case-insensitive, so UTF-8 and utf-8, as well as Utf-8 or any other variation thereof, always mean the same thing. This would also be the case for other character sets, such as ISO-8859-1 (Latin 1), etc. The casing should not matter, as case is not a meaningful factor in such an identifier.
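
The same idea can be illustrated outside .NET. The sketch below uses Python's standard `codecs.lookup`, which resolves a charset label to a codec regardless of case. The helper name `same_charset` is hypothetical, and this only shows how one runtime treats the labels, not how every tool does.

```python
import codecs


def same_charset(a: str, b: str) -> bool:
    """Return True if both labels resolve to the same codec.

    codecs.lookup normalizes the label (including its case) before
    resolving it, so differently cased spellings compare equal.
    """
    return codecs.lookup(a).name == codecs.lookup(b).name


print(same_charset("UTF-8", "utf-8"))
print(same_charset("Utf-8", "UTF-8"))
print(same_charset("ISO-8859-1", "iso-8859-1"))
print(same_charset("utf-8", "iso-8859-1"))
```

Comparing resolved codecs rather than raw strings avoids treating `utf-8` and `UTF-8` as different encodings when checking two files.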

I do extensive work with web services across multiple platforms, and I have never really seen a "standard" form used. I've seen every variation of a variety of character sets...often different variations from a single business partner.

jrista
5

Upper-case is the de-facto standard. It should still work with any combination of case, however.

rspeed