
Sorting strings that contain numbers works differently from one language to another. For example, in English, digits come before letters in ascending order. In German, however, digits are sorted after letters in ascending order.

I tried to sort strings using a Collator as follows:

private Collator collator = Collator.getInstance(Locale.GERMANY);
collator.compare(str1, str2)

But the comparison above does not take the digits-after-letters rule into account.
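To make the observation concrete, here is a minimal sketch that sorts a few one-character strings with the stock German collator. The class name `GermanDefaultOrder` and the helper `sortGerman` are made up for this illustration; only `Collator.getInstance` and `Locale.GERMANY` come from the question.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical demo class, not part of the original question.
class GermanDefaultOrder {

    // Returns a copy of the input sorted with the default German collator.
    static List<String> sortGerman(List<String> input) {
        Collator collator = Collator.getInstance(Locale.GERMANY);
        List<String> sorted = new ArrayList<>(input);
        sorted.sort(collator); // Collator implements Comparator<Object>
        return sorted;
    }

    public static void main(String[] args) {
        // Digits come out before letters: [1, 2, a, b]
        System.out.println(sortGerman(Arrays.asList("b", "2", "a", "1")));
    }
}
```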

Does anyone have an idea why Java does not take this rule (digits after letters) into account? For the time being I am using a RuleBasedCollator as follows:

private final String sortOrder = "< a, A < b, B < c, C < d, D < e, E < f, F < g, G < h, H < i, I < j, J < k, K < l, L < m, M < n, N < o, O < p, P < q, Q < r, R < s, S < t, T < u, U < v, V < w, W < x, X < y, Y < z, Z < 0 < 1 < 2 < 3 < 4 < 5 < 6 < 7 < 8 < 9";

private Collator collator = new RuleBasedCollator(sortOrder);
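For completeness, a runnable sketch of the workaround above. The rule string is the one from the question; the class name `LettersBeforeDigits` and its helper methods are invented for this example. Note that the `RuleBasedCollator(String)` constructor declares a checked `ParseException`, so it cannot be used directly in a field initializer as written above; the sketch wraps it in a factory method.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class wrapping the rule string from the question.
class LettersBeforeDigits {

    // Letters first (lower case before upper case as a tertiary difference), digits last.
    static final String RULES =
        "< a, A < b, B < c, C < d, D < e, E < f, F < g, G < h, H < i, I "
      + "< j, J < k, K < l, L < m, M < n, N < o, O < p, P < q, Q < r, R "
      + "< s, S < t, T < u, U < v, V < w, W < x, X < y, Y < z, Z "
      + "< 0 < 1 < 2 < 3 < 4 < 5 < 6 < 7 < 8 < 9";

    // The constructor throws a checked ParseException, so wrap it once here.
    static RuleBasedCollator create() {
        try {
            return new RuleBasedCollator(RULES);
        } catch (ParseException e) {
            throw new IllegalStateException("Invalid collation rules", e);
        }
    }

    // Returns a copy of the input sorted with the custom collator.
    static List<String> sort(List<String> input) {
        List<String> sorted = new ArrayList<>(input);
        sorted.sort(create());
        return sorted;
    }

    public static void main(String[] args) {
        // Letters now come out before digits: [a, b, 1, 9]
        System.out.println(sort(Arrays.asList("1", "a", "9", "b")));
    }
}
```

As the first comment below points out, this rule set leaves out the umlauts and ß, so it is only suitable as a test case.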
bluish
Amir
  • Is it deliberate that you don't have Umlauts and the Sharp-s (äöüß) in your sort order? I'd say they are important for having a German collator. – Joachim Sauer Oct 08 '12 at 09:30
  • Yes, for the test case I have omitted umlauts and special characters. I just wanted to keep it very simple. – Amir Oct 08 '12 at 09:33
  • Also: which rules do you follow that sort digits after the other characters? There are several different collations for German and at least some of those sort numbers first. – Joachim Sauer Oct 08 '12 at 09:35
  • I have just tried the Locale.GERMANY collation. Can you point me to a collation which sorts digits after letters? – Amir Oct 08 '12 at 09:38
  • If you are using Java 7, you can set a variant on your `Locale` which can be a BCP 47 extension (cf. http://docs.oracle.com/javase/tutorial/i18n/locale/create.html, and for BCP 47 http://docs.oracle.com/javase/tutorial/i18n/locale/extensions.html). AFAIK, there's a reorder setting for collation, but I've never actually worked with this. – s.d Oct 10 '12 at 10:52
  • What is your source for "But, in German, digits are ascendant sorted after letters."? – assylias Nov 27 '12 at 11:43

1 Answer


You can check/debug the source code to see why nothing changes:

Collator.getInstance(Locale.GERMANY);

This calls the following piece of code:

public static synchronized
Collator getInstance(Locale desiredLocale)
{
    // Snipping some code here
    String colString = "";
    try {
        ResourceBundle resource = LocaleData.getCollationData(desiredLocale);

        colString = resource.getString("Rule");
    } catch (MissingResourceException e) {
        // Use default values
    }
    try
    {
        result = new RuleBasedCollator( CollationRules.DEFAULTRULES +
                                        colString,
                                        CANONICAL_DECOMPOSITION );
    }
// Snipping some more code here

Here you can see that the locale-specific rules (colString, which is empty in your case anyway) are appended after the defaults (CollationRules.DEFAULTRULES).

And, as you have discovered, the defaults place the numerics first:

  // NUMERICS

    + "<0<1<2<3<4<5<6<7<8<9"
    + "<\u00bc<\u00bd<\u00be"   // 1/4,1/2,3/4 fractions

    // NON-IGNORABLES
    + "<a,A"
    + "<b,B"
    + "<c,C"
    + "<d,D"
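Since locale rules are simply appended to the defaults, one possible alternative to writing out a full rule string is to take the rules of the stock German collator and append a reset rule that moves the digits behind `Z`. This is only a sketch under two assumptions: that `Collator.getInstance` actually returns a `RuleBasedCollator` (the API only promises a `Collator`, although the source quoted above constructs one), and that later rules override the earlier position of the same characters. The class name `GermanDigitsLast` is made up for this example.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Locale;

// Hypothetical demo class, not part of the JDK or the original answer.
class GermanDigitsLast {

    static RuleBasedCollator create() {
        // Assumption: the returned Collator is a RuleBasedCollator;
        // this cast fails with a ClassCastException otherwise.
        RuleBasedCollator base =
            (RuleBasedCollator) Collator.getInstance(Locale.GERMANY);

        // Reset to 'Z' and re-declare the digits as primary differences after it.
        String digitsLast = "& Z < 0 < 1 < 2 < 3 < 4 < 5 < 6 < 7 < 8 < 9";

        try {
            return new RuleBasedCollator(base.getRules() + digitsLast);
        } catch (ParseException e) {
            throw new IllegalStateException("Invalid collation rules", e);
        }
    }

    public static void main(String[] args) {
        RuleBasedCollator collator = create();
        // A positive result means "a1" sorts after "ab", i.e. digits follow letters.
        System.out.println(collator.compare("a1", "ab"));
    }
}
```

Compared with a hand-written rule string, this keeps whatever the defaults already define for accented characters, but it depends on the implementation details noted above, so test it on the JDK you target.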
Jasper