
I'm trying to have a regex describe a single-quote-delimited string. Inside the string, I can have either any printable (or whitespace) character (which is NOT a single quote), OR a series of TWO single quotes, which would be an "escaped" single quote.

The [[:print:]] character class (also written as \p{XPosixPrint}) fits the bill for the characters I want to allow, except that it would ALSO allow a lone single quote ('), which I don't want.

So, is there a simple way to do that, such as requiring a character to match two expressions at the same time (like [[:print:]] and [^']), or do I have to create a custom character class enumerating everything I'm allowing (or forbidding)?

Kzwix
  • See [Exclude characters from a character class](https://stackoverflow.com/questions/17327765/exclude-characters-from-a-character-class) for solutions in other languages. – ikegami Oct 19 '21 at 17:42

1 Answer

/(?!')\p{Print}/                     # Worst performance and kinda yuck?
/\p{Print}(?<!')/                    # Better performance but yuckier?
/[^\P{Print}']/                      # Best performance, but hard to parse.[1]
use experimental qw( regex_sets );   # No idea why still experimental.
/(?[ \p{Print} - ['] ])/             # Best performance and clearest.
/[^\p{Cn}\p{Co}\p{Cs}\p{Cc}']/       # Non-general solution.
                                     # Best performance but fragile.[2]

\p{Print} is an alias of \p{XPosixPrint}.
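
To tie this back to the question, here is a minimal sketch of how the "hard to parse" class might be used to match a whole single-quote-delimited string with doubled quotes as escapes. The names `$quoted_re` and `is_quoted_string` are hypothetical and only for illustration; the anchoring with `\A` and `\z` is an assumption about wanting to validate the entire input rather than find a literal inside a longer text.

```perl
use strict;
use warnings;

# Hypothetical pattern: opening quote, then any number of either
#   - one printable character that is not a single quote, or
#   - two single quotes (an escaped quote),
# then the closing quote.
my $quoted_re = qr/\A'(?:[^\P{Print}']|'')*'\z/;

# Hypothetical helper: returns 1 if $s is a well-formed quoted string.
sub is_quoted_string {
    my ($s) = @_;
    return $s =~ $quoted_re ? 1 : 0;
}

for my $candidate ("'it''s'", "'it's'", "''", "'abc") {
    printf "%-10s %s\n", $candidate,
        is_quoted_string($candidate) ? "ok" : "rejected";
}
```

Because the two alternatives inside the group can never match the same text (one excludes the quote, the other consists only of quotes), the pattern should not suffer from ambiguous backtracking.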


  1.    char that is (printable and not('))
     = char that is (not(not(printable and not('))))
     = char that is (not(not(printable) or not(not('))))
     = char that is (not(not(printable) or '))
     = [^\P{Print}']
    
  2. \p{Print} includes all the characters except unassigned, private use, surrogates and control characters.

    /[^\p{Cn}\p{Co}\p{Cs}\p{Cc}']/
    

    is short for

    /[^\p{General_Category=Unassigned}\p{General_Category=Private_Use}\p{General_Category=Surrogate}\p{General_Category=Control}']/
    

    or

    use experimental qw( regex_sets );   # No idea why still experimental.
    /(?[ !(
         \p{General_Category=Unassigned}
       + \p{General_Category=Private_Use}
       + \p{General_Category=Surrogate}
       + \p{General_Category=Control}
       + [']
    ) ])/
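
The equivalence derived in note 1 can be spot-checked empirically. The following is only a sketch: `@forms` and `count_disagreements` are hypothetical names, and the code point range passed in is an arbitrary choice that stays below the surrogate range. It compares the lookahead, lookbehind, and negated-class forms one character at a time.

```perl
use strict;
use warnings;

# The three pre-5.18 forms from the answer, anchored so that each must
# match exactly one character.
my @forms = (
    qr/\A(?!')\p{Print}\z/,
    qr/\A\p{Print}(?<!')\z/,
    qr/\A[^\P{Print}']\z/,
);

# Hypothetical helper: counts code points in 0..$max_cp on which the
# three forms do not all give the same answer.
sub count_disagreements {
    my ($max_cp) = @_;
    my $bad = 0;
    for my $cp (0 .. $max_cp) {
        my $ch = chr $cp;
        my @r  = map { $ch =~ $_ ? 1 : 0 } @forms;
        $bad++ if $r[0] != $r[1] || $r[1] != $r[2];
    }
    return $bad;
}

print "disagreements: ", count_disagreements(0x2FFF), "\n";
```

A check like this does not prove the equivalence, but it is a cheap way to catch a transcription mistake in the character class.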
    
ikegami
  • My Perl didn't like the version with /(?[ \p{Print} - ['] ])/. I have 5.16.3; is that a version problem? – Kzwix Oct 20 '21 at 13:50
  • If you get the message saying it can't find experimental.pm, it's because you didn't install it. (It comes with Perl since 5.18.0.) If you get the message "*Need perl 5.18.0 or later for feature regex_sets*", well, it's pretty self-explanatory. – ikegami Oct 20 '21 at 14:14
  • Note that 5.16 was released in 2012 (and 5.16.3 in 2013). It's not exactly fresh. – ikegami Oct 20 '21 at 14:16
  • I don't *think* I got the message telling me it needed Perl 5.18.0, but I might have missed it. It merely told me it didn't like the syntax in the regex. As for the obsolete side of this version, I fully agree with you, but I can only use the tools I'm allowed to use (countries and big companies are all about "validated" tools). So I went with the "hard to parse" one instead, which works just fine. Thanks again. – Kzwix Oct 21 '21 at 18:13
  • Re "*It merely told me it didn't like the syntax in the regex.*": Then you left out the `use experimental qw( regex_sets );` – ikegami Oct 24 '21 at 06:14