To compare two Strings, ignoring case, it looks like I first need to convert to a lower case version of the string:

let a_lower = a.to_lowercase();
let b_lower = b.to_lowercase();
a_lower.cmp(&b_lower)

Is there a method that compares strings, ignoring case, without creating the temporary lower case strings, i.e. that iterates over the characters, performs the to-lowercase conversion there and compares the result?

Shepmaster
Thomas S.

2 Answers

There is no built-in method, but you can write one to do exactly as you described, assuming you only care about ASCII input.

use itertools::{EitherOrBoth::*, Itertools as _}; // itertools 0.9.0
use std::cmp::Ordering;

fn cmp_ignore_case_ascii(a: &str, b: &str) -> Ordering {
    a.bytes()
        .zip_longest(b.bytes())
        .map(|ab| match ab {
            // `a` still has bytes after `b` ran out, so `a` sorts later.
            Left(_) => Ordering::Greater,
            // `b` still has bytes after `a` ran out, so `a` sorts earlier.
            Right(_) => Ordering::Less,
            Both(a, b) => a.to_ascii_lowercase().cmp(&b.to_ascii_lowercase()),
        })
        // The first pair that differs decides the ordering.
        .find(|&ordering| ordering != Ordering::Equal)
        .unwrap_or(Ordering::Equal)
}
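
For readers who would rather avoid the `itertools` dependency, here is a minimal sketch of the same ASCII-only idea using only the standard library. It relies on `Iterator::cmp`, which compares two iterators lexicographically. The function name `cmp_ignore_ascii_case_std` is made up for this illustration.

```rust
use std::cmp::Ordering;

/// Hypothetical helper: orders two strings by their ASCII-lowercased bytes
/// without allocating temporary strings.
fn cmp_ignore_ascii_case_std(a: &str, b: &str) -> Ordering {
    let a_lower = a.bytes().map(|byte| byte.to_ascii_lowercase());
    let b_lower = b.bytes().map(|byte| byte.to_ascii_lowercase());
    // `Iterator::cmp` compares element by element; a strict prefix sorts first.
    a_lower.cmp(b_lower)
}

fn main() {
    assert_eq!(cmp_ignore_ascii_case_std("Ferris", "FERRIS"), Ordering::Equal);
    assert_eq!(cmp_ignore_ascii_case_std("apple", "Banana"), Ordering::Less);
    assert_eq!(cmp_ignore_ascii_case_std("abc", "AB"), Ordering::Greater);
}
```

As with the version above, only the ASCII letters are folded; every other byte is compared as-is.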

As some comments below have pointed out, case-insensitive comparison will not work properly for arbitrary Unicode text without operating on the whole string, and even then some case conversions have multiple representations, which can give unexpected results.

With those caveats, the following handles many more cases than the ASCII version above (e.g. most accented Latin characters) and may be satisfactory, depending on your requirements:

// Requires the same `itertools` and `Ordering` imports as the ASCII version.
fn cmp_ignore_case_utf8(a: &str, b: &str) -> Ordering {
    a.chars()
        .flat_map(char::to_lowercase)
        .zip_longest(b.chars().flat_map(char::to_lowercase))
        .map(|ab| match ab {
            Left(_) => Ordering::Greater,
            Right(_) => Ordering::Less,
            Both(a, b) => a.cmp(&b),
        })
        .find(|&ordering| ordering != Ordering::Equal)
        .unwrap_or(Ordering::Equal)
}
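
To make the caveats concrete, here is a rough std-only sketch of the same char-by-char comparison, with a few checks of where it does and does not behave as one might hope. The name `cmp_ignore_case_chars` is hypothetical, and `Iterator::cmp` stands in for the `zip_longest` chain.

```rust
use std::cmp::Ordering;

/// Hypothetical std-only counterpart of `cmp_ignore_case_utf8`.
fn cmp_ignore_case_chars(a: &str, b: &str) -> Ordering {
    // `char::to_lowercase` returns an iterator because one char may
    // lowercase to several, hence `flat_map` rather than `map`.
    a.chars()
        .flat_map(char::to_lowercase)
        .cmp(b.chars().flat_map(char::to_lowercase))
}

fn main() {
    // Accented Latin letters fold as expected (U+00C9 vs U+00E9).
    assert_eq!(cmp_ignore_case_chars("\u{c9}cole", "\u{e9}COLE"), Ordering::Equal);
    // U+0130 lowercases to two chars: 'i' followed by U+0307.
    assert_eq!('\u{130}'.to_lowercase().count(), 2);
    // Precomposed U+00E9 and 'e' + U+0301 render alike but compare unequal.
    assert_ne!(cmp_ignore_case_chars("\u{e9}", "e\u{301}"), Ordering::Equal);
}
```

The last check is the canonical-equivalence problem: without normalizing both strings first, two spellings of the same visible text are treated as different.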
Peter Hall
  • Any method that uses `str::chars` will not compare Unicode strings properly. – mcarton Sep 13 '20 at 15:40
  • To complement what I believe mcarton is talking about: `str::chars` iterates over code points, but because of *precomposition* it's possible to have strings which are *canonically equivalent* but have different contents at a technical level. `chars` will not take that information into account. Another large issue is that case conversion is locale-dependent, e.g. the lowercase of `I` is `i`… unless you're in a Turkic locale, where it's ı. I'm sure there are other pitfalls there. – Masklinn Sep 13 '20 at 16:07
  • I'm not trying to compare for equality, but for ordering. BTW, does `string.to_lowercase()` consider the locale, so that one char could become multiple chars? – Thomas S. Sep 13 '20 at 17:33
  • @ThomasS. please read [Why is capitalizing the first letter of a string so convoluted in Rust?](https://stackoverflow.com/q/38406793/155423) – Shepmaster Sep 14 '20 at 11:57
  • "assuming you only care about ASCII input" is awfully bad practice these days. – Jack Aidley Sep 15 '20 at 09:33

If you are only working with ASCII, you can use `eq_ignore_ascii_case`:

assert!("Ferris".eq_ignore_ascii_case("FERRIS"));
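
Note that `eq_ignore_ascii_case` returns a `bool`, so it answers "are these equal?" rather than producing an `Ordering`. The short sketch below also shows that non-ASCII letters are left untouched; the wrapper name `same_ignoring_ascii_case` is made up purely so the behaviour is easy to exercise.

```rust
/// Hypothetical wrapper around `str::eq_ignore_ascii_case`.
fn same_ignoring_ascii_case(a: &str, b: &str) -> bool {
    a.eq_ignore_ascii_case(b)
}

fn main() {
    assert!(same_ignoring_ascii_case("Ferris", "FERRIS"));
    // Only ASCII letters are folded: U+00C9 and U+00E9 stay distinct.
    assert!(!same_ignoring_ascii_case("\u{c9}cole", "\u{e9}cole"));
}
```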
Ibraheem Ahmed
  • If this is about strings from, e.g., user input or file names, this does not help. How often are strings used only within the application and fully under the programmer's control? – Thomas S. Jan 31 '21 at 11:56
  • @ThomasS. True, but it is still a valid answer and may be useful to some people. – Ibraheem Ahmed Jan 31 '21 at 15:37