3

I'm writing a project that gives advice about variable names, and I want it to tell if a name matches any of the reserved classes of identifiers. The first one ("private") is pretty straightforward, just name.startswith('_'), but dunder and class-private names are more complicated. Is there any built-in function that can tell me? If not, what are the internal rules Python uses?

For dunder, checking name.startswith('__') and name.endswith('__') doesn't work because that would match '__' for example. Maybe a regex like ^__\w+__$ would work?

For class-private, name.startswith('__') doesn't work because dunder names aren't mangled, nor are names with just underscores like '___'. So it seems like I'd have to check if the name starts with two underscores, doesn't end with two underscores, and contains at least one non-underscore character. Is that right? In code:

name.startswith('__') and not name.endswith('__') and any(c != '_' for c in name)

I'm mostly concerned about the edge cases, so I want to make sure I get the rules 100% correct. I read What is the meaning of single and double underscore before an object name? but there's not enough detail.

wjandrea
  • 23,210
  • 7
  • 49
  • 68

1 Answers1

5

Here are the internal rules. Note that I can barely read C, so take this with a grain of salt.

Dunder

Based on is_dunder_name in Objects/typeobject.c (using str.isascii from Python 3.7):

len(name) > 4 and name.isascii() and name.startswith('__') and name.endswith('__')

Alternatively, that regex ^__\w+__$ would work, but it would need re.ASCII enabled to make sure \w only matches ASCII characters.

Class-private

Based on _Py_Mangle in Python/compile.c:

name.startswith('__') and not name.endswith('__') and not '.' in name

Although strictly speaking, a name with a dot is an "attribute reference", not a name, so you could remove that check:

name.startswith('__') and not name.endswith('__')

And this matches the rules in the documentation under Identifiers (Names).

(Sidenote: I hadn't realized, but not name.endswith('__') ensures that the name contains at least one non-underscore.)

wjandrea
  • 23,210
  • 7
  • 49
  • 68
  • 1
    The mangling rules are [documented](https://docs.python.org/3/reference/expressions.html#atom-identifiers), so you don't really have to parse C code ;) – zvone Jul 12 '20 at 21:05
  • @zvone Huh, that doesn't mention the "no dots" requirement, so I guess it's using a stricter definition of "identifier"... which makes sense, really. Something of the form `identifier "." identifier` is an "attribute reference", not an identifier itself. – wjandrea Jul 12 '20 at 21:20