
This may be a Unicode issue, but the session below shows the problem:

$ cat leo
сказывать
ссказываю
сказав
BladeMight@Chandere ~ 23:24:58
$ cat leo | perl -pe 's/^с+каз/Рассказ/g'
Рассказывать
ссказываю
Рассказав
BladeMight@Chandere ~ 23:25:00
$ cat leo | sed -r 's/^с+каз/Рассказ/g'
Рассказывать
Рассказываю
Рассказав

I have a file, leo, with Cyrillic contents, and I want to fix the misspelled prefixes using the regex ^с+каз with perl -pe. Perl replaces only the lines that begin with a single с (the Cyrillic letter), so + appears to do nothing here, although it works fine for non-Cyrillic text. The same substitution with sed -r works as expected. Why could that be?

BladeMight

1 Answer


Perl needs to be told that your source code is UTF-8 (-Mutf8) and that it should treat stdin and stdout as UTF-8 (-CS).

$ cat leo | perl -Mutf8 -CS -pe 's/^с+каз/Рассказ/g'
Рассказывать
Рассказываю
Рассказав
hobbs
    NOTE: `use utf8` is needed only when the code itself contains UTF-8 text (here, the search pattern). The `-CS` option is needed practically any time UTF-8 input or output is involved. – Polar Bear Nov 30 '19 at 00:11