Ignoring diacritics/accents when searching

Question

Is there a way to instruct Vim that I want to ignore diacritics/accents when searching? For example, I would like to be able to search for

kočička

by entering

/kocicka

The ignorecase and smartcase options are very useful, but they do not seem to have anything to do with diacritics/accents.

Related feature request https://github.com/vim/vim/issues/8026 — Michal Čizmazia, Feb 16 '22 at 17:27

user9433424 · Accepted Answer · 2016-04-18T12:16:13.947

As @muru mentioned in a comment, you could use an equivalence class (described in :help /[[). It seems to be a character class expression that matches a set of similar characters, i.e. characters that are the same once you remove any accent/diacritic.

For example, to look for kočička and kocicka with the same pattern, you could use this:

ko[[=c=]]i[[=c=]]ka

where [[=c=]] is the equivalence class for the c character.
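
A quick way to check the pattern without running a search is to test it with the =~# operator. This is only a minimal sketch, assuming 'encoding' is set to utf-8; the variable name g:accent_pattern is made up for illustration:

```vim
" g:accent_pattern is an illustrative name, not something Vim defines.
" Each c in the pattern is replaced by its equivalence class.
let g:accent_pattern = 'ko[[=c=]]i[[=c=]]ka'

" Plain and accented spellings, tested against the same pattern
echo 'kocicka' =~# g:accent_pattern
echo 'kočička' =~# g:accent_pattern

" A word with a different consonant in place of the first c
echo 'kotička' =~# g:accent_pattern
```

If the equivalence class behaves as described, the first two lines should echo 1 and the last one 0.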

To automatically insert this character class whenever you hit c while performing a search, you could use this mapping:

cnoremap <expr> c getcmdtype() =~ '[?/]' ? '[[=c=]]' : 'c'

which can be broken down like this:

<expr>: insert the result of evaluating an expression
getcmdtype() =~ '[?/]': test whether you're writing a backward or forward search
'[[=c=]]': return the equivalence class for the c character if the previous test succeeded
'c': return the c character otherwise

The previous mapping has 2 drawbacks:

it only covers the c character
it can make the pattern difficult to read

It could be improved by remapping <CR> like this:

cnoremap <CR> <C-\>e getcmdtype() =~ '[?/]' ? substitute(getcmdline(), '\a', '[[=\0=]]', 'g'): getcmdline()<CR><CR>

When you hit <CR> after writing a search pattern, the mapping automatically replaces every ASCII letter (the characters matched by \a) with its equivalence class.

The mapping for <CR> is similar to the previous mapping for c, except that it uses the built-in command-line key <C-\>e instead of the <expr> argument.
<expr> inserts the result of evaluating an expression, while <C-\>e replaces the whole command line with the result of evaluating an expression.
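
The substitution done by the <CR> mapping can also be tried on its own. Here is a minimal sketch, assuming a helper called AccentInsensitive (a hypothetical name, not part of Vim) that wraps the same substitute() call:

```vim
" AccentInsensitive() is a hypothetical helper, not a built-in function.
" It wraps every ASCII letter of the pattern in its equivalence class,
" using the same substitute() call as the <CR> mapping.
function! AccentInsensitive(pattern) abort
  return substitute(a:pattern, '\a', '[[=\0=]]', 'g')
endfunction

echo AccentInsensitive('kocicka')
" [[=k=]][[=o=]][[=c=]][[=i=]][[=c=]][[=k=]][[=a=]]

echo AccentInsensitive('\d\+')
" \[[=d=]]\+
```

The second call shows a limitation: letters that belong to regex atoms such as \d are wrapped too, so the mapping seems best suited to plain-text searches.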

Further, if you would like to go in the reverse direction, e.g., /kočička matches kocicka, then you can use '[[:lower:][:upper:]]' instead of '\a'. The alternatives '[:alpha:]' and '\I' don't seem to work with multi-byte characters; however, '[^[:punct:]]' seems to work (though I'm less sure), and I would guess building your own equivalence class (e.g., '[А-яЁё]') as well. — kevinlawler, Jan 03 '18 at 21:04
I wish there was a setting for that. Using [[=c=]] works, but a mistype means you need to hit backspace 7 times. Readability suffers as well. — daliusd, Feb 07 '19 at 11:17

score 1 · Answer 2 · answered Sep 01 '23 at 08:41

I have found that you can use ranges with accented letters just as you do for standard letters. This works well enough for Spanish. For example, you can use /[á-ú] just like /[a-z]. You can even combine them: /[a-zá-ú] includes both standard letters and accented vowels.

If you want to know exactly what you are including in your search with the /[á-ú] range, I suggest opening a new buffer in Vim and entering the following:

" Append the characters with code points 1 to 1000 (not only ASCII) to the current line
:for i in range(1,1000) | call setline('.', getline('.') . nr2char(i)) | endfor
" Test the search expression
/[á-ú]

Inspiration for the code above: https://vi.stackexchange.com/a/12450/12510
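
The same check can be done without filling a buffer. This is a sketch, assuming 'encoding' is set to utf-8; ListRangeMatches is a hypothetical function name. It collects the characters, up to a given code point, that match a collection:

```vim
" ListRangeMatches() is a hypothetical helper, not a built-in function.
" It returns every character with a code point from 1 to a:last
" that matches a:pattern (case is never ignored because of =~#).
function! ListRangeMatches(pattern, last) abort
  let l:matched = []
  for l:code in range(1, a:last)
    let l:char = nr2char(l:code)
    if l:char =~# a:pattern
      call add(l:matched, l:char)
    endif
  endfor
  return l:matched
endfunction

" Show what the accented range covers among the first 1000 code points
echo join(ListRangeMatches('[á-ú]', 1000), ' ')
```

Since á is U+00E1 and ú is U+00FA, I would expect the list to contain more than accented vowels, for example ñ (U+00F1) and ÷ (U+00F7), so it is worth checking before relying on the range.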
