22

Is there a way to instruct Vim that I want to ignore diacritics/accents when searching? For example, I would like to be able to search for

kočička

by entering

/kocicka

The ignorecase and smartcase options are very useful, but they do not seem to have anything to do with diacritics/accents.

s3rvac

2 Answers

19

As @muru mentioned in a comment, you can use an equivalence class (described in :help /[[). It is a character class expression that matches a set of similar characters, i.e. characters that are the same once you remove any accent/diacritic.

For example, to look for kočička and kocicka with the same pattern, you could use this:

ko[[=c=]]i[[=c=]]ka

where [[=c=]] is the equivalence class for the c character.
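
A quick way to check this without opening a file is to evaluate the match on the command line. This is only a sketch: it assumes 'encoding' is set to utf-8 and that your Vim version's equivalence class for c includes č.

```vim
" Both lines should echo 1 (match) if [[=c=]] covers č in your Vim.
echo 'kočička' =~# 'ko[[=c=]]i[[=c=]]ka'
echo 'kocicka' =~# 'ko[[=c=]]i[[=c=]]ka'
```

If the first line echoes 0, the equivalence class in that Vim build probably does not cover the accented character.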


To automatically insert this character class whenever you hit c while performing a search, you could use this mapping:

cnoremap <expr> c getcmdtype() =~ '[?/]' ? '[[=c=]]' : 'c'

which can be broken down like this:

  • <expr>: type the result of evaluating an expression
  • getcmdtype() =~ '[?/]': test whether you're writing a backward or forward search
  • '[[=c=]]': return the equivalence class for the c character if the previous test succeeded
  • 'c': return the c character otherwise

The previous mapping has 2 drawbacks:

  1. it only covers the c character
  2. it can make the pattern difficult to read

It could be improved by remapping <CR> like this:

cnoremap <CR> <C-\>e getcmdtype() =~ '[?/]' ? substitute(getcmdline(), '\a', '[[=\0=]]', 'g') : getcmdline()<CR><CR>

When you hit <CR> after writing a search pattern, the mapping automatically replaces every ASCII letter (matched by \a) with its equivalence class.


The mapping for <CR> is similar to the previous mapping for c, except that it doesn't use the <expr> argument but the built-in command-line key sequence <C-\>e.
<expr> inserts the result of evaluating an expression, while <C-\>e replaces the whole command line with the result of evaluating an expression.
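
One thing to watch for with the <CR> mapping above: it also rewrites letters that belong to pattern atoms such as \v or \a. A possible refinement, given here only as a sketch, is to skip any letter directly preceded by a backslash. The function name AccentInsensitive is hypothetical, and the sketch does not handle letters inside [] collections or after an escaped backslash (\\).

```vim
" Hypothetical helper: wrap each ASCII letter that is not directly
" preceded by a backslash in its equivalence class.
function! AccentInsensitive(pattern) abort
  " \\\@<!  asserts that the letter does not follow a backslash
  return substitute(a:pattern, '\\\@<!\a', '[[=\0=]]', 'g')
endfunction

" Same structure as the <CR> mapping above, but delegating to the helper.
cnoremap <CR> <C-\>e getcmdtype() =~ '[?/]' ? AccentInsensitive(getcmdline()) : getcmdline()<CR><CR>

" Example: the \v atom is left alone.
" :echo AccentInsensitive('\vko')   gives   \v[[=k=]][[=o=]]
```

Keeping the substitution in a function also makes it easier to test the transformation with :echo before relying on the mapping.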

user9433424
  • 1
    Further, if you would like to go in the reverse direction, e.g., /kočička matches kocicka, then you can use '[[:lower:][:upper:]]' instead of '\a'. The alternatives '[:alpha:]' and '\I' don't seem to work with multi-byte characters; however, '[^[:punct:]]' seems to work (though I'm less sure), and I would guess building your own equivalence class (e.g., '[А-яЁё]') as well. – kevinlawler Jan 03 '18 at 21:04
  • 2
    I wish there was a setting for that. Using [[=c=]] works, but a mistype means you need to press backspace 7 times. Readability suffers as well. – daliusd Feb 07 '19 at 11:17
1

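I have found that you can use ranges with accented letters just as you do for standard letters. This works well enough for Spanish. For example, you can use /[á-ú] just like /[a-z]. You can even combine them: /[a-zá-ú] includes both standard letters and the accented vowels. Note that such a range covers every character whose code point lies between its endpoints, so it also matches other characters in that span, such as ñ.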

If you want to know exactly what the /[á-ú] range includes in your search, I suggest opening a new buffer in Vim and entering the following:

" Put the characters with code points 1 to 1000 on the current line
:for i in range(1,1000) | call setline('.', getline('.') . nr2char(i)) | endfor

" Test the search expression /[á-ú]

Inspiration for the code above: https://vi.stackexchange.com/a/12450/12510
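
As an alternative to searching in a scratch buffer, the characters covered by the range can be printed directly. This is a minimal sketch that assumes 'encoding' is utf-8, so that char2nr() returns Unicode code points.

```vim
" Print every character whose code point lies between á and ú, inclusive.
echo join(map(range(char2nr('á'), char2nr('ú')), 'nr2char(v:val)'), '')
```

The output shows at a glance which non-vowel characters the range also picks up.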

jabellcu