On https://www.emacswiki.org/emacs/MultilineRegexp one finds the hint to use
[\0-\377[:nonascii:]]*\n
instead of the standard
.*\n
to match any character up to a newline, in order to avoid a stack overflow on huge texts (37 KB). Is the overflow the only concern here, or does the former also match faster than the latter?
`[\0-\377[:nonascii:]]*` would do so less than `\\(.\\|\n\\)*`. So I think the emacswiki is wrong on this one. – Stefan Nov 21 '16 at 15:46
`|` might need more backtracking, but whether it actually does depends on how it's compiled. – npostavs Nov 21 '16 at 20:05
`\\(.\\|\n\\)*` and never even thought about `[\0-\377[:nonascii:]]*`. It's good to know about the latter, but it's even better to know that it doesn't add anything (so I'll stick to the one that is easier for me to read). – Drew Nov 21 '16 at 21:42
`(re-search-forward "\\(.\\|\n\\)*")` on a large buffer gives "Stack overflow in regexp matcher", while `(re-search-forward "[\0-\377[:nonascii:]]*")` does not. It seems emacswiki was right. – npostavs Nov 23 '16 at 18:24
`[\0-\377[:nonascii:]]*` (which is rather unusual, since you might as well use `point-max` rather than search for it via such a regexp) (for the curious: the crux of the matter is whether the set of chars that can match after the `*` is disjoint from the set of chars that can match within the `*`. If it is disjoint, then the regexp engine will skip recording intermediate steps, and hence avoid eating up stack space. So `.*\n` and `[^a]*a` don't consume the stack, whereas `.*a` does). – Stefan Nov 23 '16 at 18:30
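
The disjointness rule in the last comment can be checked by hand. The following is only a rough sketch: `my/probe-regexp` is a made-up helper name (not part of Emacs), the test text is an assumed worst case (one very long line of `x` ending in `a`), and the size of 1000000 is arbitrary. Whether a given pattern actually overflows depends on the Emacs version and its matcher limits, so no particular output is claimed here.

```elisp
;;; -*- lexical-binding: t -*-

(defun my/probe-regexp (regexp size)
  "Match REGEXP against one SIZE-character line of filler text.
Return the symbol `ok' if the search finishes without signalling,
or the error message string otherwise.
Hypothetical helper for experimentation, not an Emacs function."
  (with-temp-buffer
    ;; One long line of "x", then a single "a" and a newline, so that
    ;; the loop body has to consume SIZE characters before the tail.
    (insert (make-string size ?x) "a\n")
    (goto-char (point-min))
    (condition-case err
        (progn (re-search-forward regexp nil t) 'ok)
      ;; "Stack overflow in regexp matcher" arrives as a plain `error'.
      (error (error-message-string err)))))

;; Patterns from the thread: the first two have disjoint character
;; sets inside and after the `*', the third does not; the last two
;; are the alternatives compared on the wiki page.
(dolist (re '(".*\n" "[^a]*a" ".*a"
              "\\(.\\|\n\\)*" "[\0-\377[:nonascii:]]*"))
  (message "%S -> %S" re (my/probe-regexp re 1000000)))
```

Evaluating this in a scratch buffer prints one line per pattern to `*Messages*`. Raising or lowering the size argument shows roughly where, if at all, each pattern starts to fail on a particular Emacs build.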