How to tell if a single line of python is syntactically valid?

Question

It is very similar to this:

How to tell if a string contains valid Python code

The only difference is that instead of being given the entire program at once, I am interested in checking a single line of code at a time.

Formally, we say a line of python is "syntactically valid" if there exists any syntactically valid python program that uses that particular line.

For instance, I would like to identify these as syntactically valid lines:

for i in range(10):

x = 1

Because one can use these lines in some syntactically valid python programs.

I would like to identify these lines as syntactically invalid lines:

for j in range(10 in range(10(

x =++-+ 1+-

Because no syntactically correct Python program could ever use these lines.

The check does not need to be too strict; it just needs to be good enough to filter out obviously bogus statements (like the ones shown above). The line is given as a string, of course.

FYI, `x =+ 1` is syntactically valid. It assigns `+1` to `x`. — John Kugelman, May 03 '16 at 19:39
What about implicit line concatenation (which would make `for j in range(10` also possibly syntactically valid) — mgilson, May 03 '16 at 19:39
`for j in range(10` is also valid if the next line continues with something like `):`, and `if x < 3` could be part of a multi-line expression as well. Almost anything could be part of a multi-line string, too. — user2357112, May 03 '16 at 19:40
I think the question that you need to answer is *why* you need/want to do this — OneCricketeer, May 03 '16 at 19:41
I see. I'll make the incorrect statements more outrageous. @cricket_007 I'm training a neural network to generate statements — Evan Pu, May 03 '16 at 19:43
The `for` is still syntactically valid. The assignment isn't quite valid any more unless, say, it's part of a triple-quoted string or a line-continued comment. I don't think you quite understand what you're trying to do. — user2357112, May 03 '16 at 19:47
The accepted answer in the SO question you linked should work, no? — RobertR, May 03 '16 at 19:49
@RobertR it won't quite work because something like "for x in range(10):" should be valid, but on the linked question it returns False, because that particular statement alone does not parse to an AST as it's still missing some pieces — Evan Pu, May 03 '16 at 19:50
@user2357112 Those syntactically invalid lines he provided, whether they truly are syntactically invalid or not, have nothing to do with the actual question. — RobertR, May 03 '16 at 19:56
@RobertR: On the contrary, it's important to resolve this, because appropriate solutions to the problem will vary depending on whether the questioner *wants* to exclude these lines, or whether he only wants to exclude lines that are actually syntactically invalid under his definition, or whether he wants to take an entirely different approach to training his neural network, or whatever else the resolution ends up being. — user2357112, May 03 '16 at 20:01
@user2357112 Just choose either interpretation, I don't really care. I don't care about completion (the way you put it) in particular — Evan Pu, May 03 '16 at 20:08
A formal description of the full grammar is [available](https://docs.python.org/3/reference/grammar.html) on the official Python site. — Jongware, May 03 '16 at 20:17
I think if your goal is to write an ML model, you should limit your input to basic, well-formed Python. Or use entire programs. — erip, May 03 '16 at 20:20
The question could stand some improvement in precise wording, since even the example invalid lines which "correct python programs could never use" are valid if the lines before and after, for example, were """ (triple quotes). Given the nature of the question, I don't think this distinction is mere pedantry either. — Peter Hansen, May 06 '16 at 21:00
@PeterHansen I would love to, can you word it for me? I tried to edit it but I don't quite know how to properly phrase it. In my world, triple quotes don't exist, and continued expressions such as x + y +\n z don't exist — Evan Pu, May 11 '16 at 05:13
@EvanPu I'm not sure how to improve it without changing the whole approach. Few languages are really "line-based", but your model seems to be built on generating single line statements. I tried an approach like that years ago and decided, were I to continue with it, that I'd switch to generating parts of an AST directly. Alternatively, you might just clarify that your definition of "line" isn't strict and includes certain types of multi-line statement, such as triple-quoted strings or those where all but the last end with a trailing backslash. — Peter Hansen, May 11 '16 at 12:54

Alyssa Haroldsen · Accepted Answer · 2016-05-05T20:50:55.897

This uses codeop.compile_command to attempt to compile the code. This is the same logic the code module uses to decide whether to ask for another line or to fail immediately with a syntax error.

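import codeop

def is_valid_code(line):
    try:
        # Returns a code object, or None if more input is expected.
        codeop.compile_command(line)
    except (SyntaxError, OverflowError, ValueError):
        # The codeop docs list OverflowError/ValueError for invalid literals.
        return False
    else:
        return True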

It can be used as follows:

>>> is_valid_code('for i in range(10):')
True
>>> is_valid_code('')
True
>>> is_valid_code('x = 1')
True
>>> is_valid_code('for j in range(10 in range(10(')
True
>>> is_valid_code('x = ++-+ 1+-')
False

I'm sure at this point, you're saying "what gives? for j in range(10 in range(10( was supposed to be invalid!" The problem with this line is that 10() is technically syntactically valid, at least according to the Python interpreter. In the REPL, you get this:

>>> 10()
Traceback (most recent call last):
  File "<pyshell#22>", line 1, in <module>
    10()
TypeError: 'int' object is not callable

Notice how this is a TypeError, not a SyntaxError. ast.parse says it is valid as well, and just treats it as a call whose function is an ast.Num (an ast.Constant in Python 3.8 and later).
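A minimal check of the above, assuming Python 3.8 or later (where the literal shows up as ast.Constant rather than ast.Num): parsing `10()` succeeds and produces a call node, and the failure only appears as a TypeError once the expression is evaluated. The SyntaxWarning filter is an addition of this sketch, because recent Python versions warn about calling an int literal at compile time.

```python
import ast
import warnings

with warnings.catch_warnings():
    # Recent Python versions warn that an int is not callable when compiling
    # 10(); silence that so only the parse and run-time behaviour are shown.
    warnings.simplefilter("ignore", SyntaxWarning)

    # Parsing only checks the grammar, so a call on an integer literal is accepted.
    tree = ast.parse("10()")
    call = tree.body[0].value
    print(type(call).__name__, type(call.func).__name__, call.func.value)
    # Call Constant 10

    # The mistake only surfaces when the expression is actually evaluated.
    runtime_error = None
    try:
        eval("10()")
    except TypeError as exc:
        runtime_error = exc
    print(type(runtime_error).__name__, runtime_error)
    # TypeError 'int' object is not callable
```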

These kinds of errors can't easily be caught until the code actually runs. If some kind of monster managed to tamper with the cached integer object for 10 (which would technically be possible), you might be able to do 10(). It's still allowed by the syntax.

What about the unbalanced parentheses? This falls in the same category as for i in range(10):. The line is invalid on its own, but may be the first line of a multi-line expression. For example:

>>> is_valid_code('if x ==')
False
>>> is_valid_code('if (x ==')
True

The second call returns True because the expression could continue like this:

if (x ==
    3):
    print('x is 3!')

and the expression would be complete. In fact, codeop.compile_command distinguishes between these situations: it returns a code object if the line is a valid self-contained statement, returns None if the line is expected to continue into a full statement, and raises a SyntaxError if the line is invalid.
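Those three outcomes can be turned into a small classifier. This is a sketch rather than part of the original answer: the name classify_line and the labels "complete", "incomplete" and "invalid" are made up for illustration. It also catches OverflowError and ValueError, which the codeop documentation says compile_command can raise for invalid literals.

```python
import codeop

def classify_line(source):
    """Hypothetical helper: label source the way codeop.compile_command sees it."""
    try:
        code = codeop.compile_command(source)
    except (SyntaxError, OverflowError, ValueError):
        return "invalid"      # rejected outright; no continuation can fix it
    if code is None:
        return "incomplete"   # fine so far, but more lines are expected
    return "complete"         # compiled into a code object on its own

for line in ["x = 1", "if (x ==", "if x ==", "for i in range(10):"]:
    print(repr(line), "->", classify_line(line))
```

The "incomplete" label is what lets for i in range(10): and if (x == through, while if x == is rejected.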

However, the problem can get much more complicated than initially stated. For example, consider the line `)`. If it's the first line of the module, or the previous line is `{`, then it's invalid. However, if the previous line is `(1,2,`, it's completely valid.
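The context dependence of `)` can be seen directly by feeding compile_command the line on its own and after each of the two preceding lines mentioned above. The helper name compiles_or_continues is hypothetical; it returns True for both "complete" and "needs more input".

```python
import codeop

def compiles_or_continues(source):
    """Hypothetical helper: True unless compile_command rejects source outright."""
    try:
        codeop.compile_command(source)
    except (SyntaxError, OverflowError, ValueError):
        return False
    return True

print(compiles_or_continues(")"))          # on its own: False
print(compiles_or_continues("{\n)"))       # after '{': False
print(compiles_or_continues("(1,2,\n)"))   # after '(1,2,': True
```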

The solution given here will work if you only work forward and append previous lines as context, which is what the code module does for an interactive session. Creating something that can always accurately identify whether a single line could possibly exist in a Python file, without considering surrounding lines, is going to be extremely difficult, as the Python grammar interacts with newlines in non-trivial ways. This answer only checks whether a given line could appear at the beginning of a module and continue on to the next line without failing.
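Working forward can be sketched by buffering lines the way the code module's interactive console does: keep appending lines until compile_command returns a code object (statement finished, start a new buffer) or raises (report the line, start over). The function name check_lines_forward and the sample program are assumptions for illustration. As at the interactive prompt, a compound block is only finished by a blank line.

```python
import codeop

def check_lines_forward(lines):
    """Hypothetical helper: indices of lines rejected given the lines before them."""
    rejected = []
    buffer = []
    for index, line in enumerate(lines):
        buffer.append(line)
        source = "\n".join(buffer)
        try:
            code = codeop.compile_command(source)
        except (SyntaxError, OverflowError, ValueError):
            rejected.append(index)
            buffer = []        # drop the broken statement and start afresh
            continue
        if code is not None:
            buffer = []        # statement complete; start a new one
    return rejected

program = [
    "for i in range(3):",
    "    print(i)",
    "",                        # blank line closes the block, as in the REPL
    "x = 1",
    "x = ++-+ 1+-",
    "y = (1,",
    "     2)",
]
print(check_lines_forward(program))   # [4]
```

The reported index is the line that made the buffered statement fail, which is not always the line that contains the actual mistake.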

It would be better to identify what the purpose of recognizing single lines is and solve that problem in a different way than trying to solve this for every case.

I agree with your logic. But shouldn't "for j in range(10 in range(10(" be invalid because it has unmatched parentheses (a purely syntactic error)? — Evan Pu, May 03 '16 at 20:56
No, since that line may continue onto the next line. I'll explain in the answer. — Alyssa Haroldsen, May 03 '16 at 20:57
Sounds reasonable enough! Seems safer than adding the "pass" and trying to compile to an AST (the other answer). I will use this — Evan Pu, May 03 '16 at 21:05
@EvanPu Have fun! For examples of integer evil, see [this](http://codegolf.stackexchange.com/a/28851) beautiful code golf answer. — Alyssa Haroldsen, May 03 '16 at 21:26
I guess a solution would be to prepend it with something trivial: codeop.compile_command("if x == x:\n pass\nelse:") — Evan Pu, May 03 '16 at 21:39
@EvanPu More important would be to know the reasoning for why this needs to be done, and try solving that problem instead. — Alyssa Haroldsen, May 03 '16 at 21:46
if you google "sk_p" I believe you'll find my paper on this ! — Evan Pu, Jun 29 '19 at 17:39
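Evan Pu's prefix idea from the comments above can be generalised to trying a handful of contexts and accepting the line if any of them compiles or asks for more input. This is a sketch: the helper name and the particular prefixes are illustrative assumptions, and the list is far from exhaustive (a line such as `)` would need a prefix like `(1,2,` as well).

```python
import codeop

# Hypothetical contexts to try in front of the line; extend as needed.
PREFIXES = [
    "",                       # the line on its own
    "if x == x:\n pass\n",    # lets 'elif ...:' and 'else:' lines through
]

def valid_in_some_context(line):
    """Return True if line compiles, or could continue, after any prefix."""
    for prefix in PREFIXES:
        try:
            codeop.compile_command(prefix + line)
        except (SyntaxError, OverflowError, ValueError):
            continue          # this context rejects the line; try the next one
        return True
    return False

print(valid_in_some_context("else:"))          # True
print(valid_in_some_context("x = 1"))          # True
print(valid_in_some_context("x = ++-+ 1+-"))   # False
```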

Neo · Answer 2 · 2016-05-03T20:26:06.083

-1

This is just a suggestion and I'm not sure it will work, but maybe something with exec and try/except?

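code_line = code_line.rstrip()
# If the line opens a block, give it an indented body so it can stand alone.
code_line += "\n" + ("\t" if code_line.endswith(":") else "") + "pass"
try:
    exec(code_line)  # note: this actually runs the code
except SyntaxError:
    print("Oops! Wrong syntax...")
except Exception:
    # Any other error was raised at run time, so the line itself parsed.
    print("Syntax all right")
else:
    print("Syntax all right")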

This should give an appropriate answer for simple lines.
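As Alyssa Haroldsen points out in the comments below, the built-in compile() can do the syntax check without running anything, which also sidesteps the `while True:` and `rm -rf` concerns raised there. A minimal sketch of the same "append pass" idea using compile instead of exec (the helper name looks_valid is hypothetical):

```python
def looks_valid(code_line):
    """Hypothetical helper: the 'append pass' idea, compiled but never executed."""
    code_line = code_line.rstrip()
    # A block header such as 'for ...:' needs an indented body to stand alone.
    padding = "\n\tpass" if code_line.endswith(":") else "\npass"
    try:
        compile(code_line + padding, "<line>", "exec")
    except (SyntaxError, ValueError, OverflowError):
        return False
    return True

for line in ["for i in range(10):", "while True:", "x = 1",
             "x = ++-+ 1+-", "for j in range(10 in range(10("]:
    print(repr(line), looks_valid(line))
```

Unlike codeop.compile_command, this rejects lines that only make sense after other lines (such as `else:`) and lines that open a bracket without closing it.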

answered May 03 '16 at 20:21 by Neo (edited May 03 '16 at 20:26)

I was just about to suggest the `+= "pass"` approach. You might want to `.rstrip` the line though. Also, you don't need the new line and the indentation. – Jared Goguen May 03 '16 at 20:24

Executing lines is opening Pandora's box. Let's see if `while True:` is syntactically valid. How about `import os; os.system('rm -rf /')`? – John Kugelman May 03 '16 at 20:27
@JohnKugelman Right, but this is going to need to be sand-boxed anyways. If OP is randomly generating programs, some of them may not halt, and some of them affect the environment. – Jared Goguen May 03 '16 at 20:29
@JohnKugelman You're right... But I can't think of any way to do it without essentially writing a Python interpreter... Does anyone know of a way to execute Python code without really executing it? Stupid question, I know, but it could help a lot with this – Neo May 03 '16 at 20:32
This is similar to what my labmate and I discussed just now. I will try this approach and report back in an hour. Thanks! – Evan Pu May 03 '16 at 20:36
@Neo This is what [`compile`](https://docs.python.org/3/library/functions.html#compile) is designed to do. – Alyssa Haroldsen May 03 '16 at 21:12
