How should goto into an inner (or “sibling”) scope be implemented in a compiler?

Question

(Note: This is something of a companion to my question about closures over goto labels. I believe that the answer to this question will be useful in solving the other problem.)

In C, it is perfectly legal for a goto statement to transcend scopes, as demonstrated in this question:

int cond(void);
void use(int);

void foo() 
{
    {
        int y;
        label:
        y = 2;
        use(y);
    }

    {
        int z = 3;
        use(z);

        /* jump to sibling scope: */ if(cond()) goto label;
    }

    /* jump to inner scope: */ if(cond()) goto label;
}

In contrast, code like that is not permitted in Algol 60 or Algol 68, and probably in other languages. (Algol 60 allows jumps into compound statements, but not into blocks.)

How should such jumps be implemented? In my language, entry into a block creates a new stack frame; it is convenient to treat block activation and procedure calls identically, since blocks also return a value, like closed-clauses in Algol 68 or Common Lisp's progn. Thus a transfer into a block requires creating frames for any levels between the current one and the target of the transfer. On the other hand, a transfer to an outer scope is simple, requiring only that the stack be unwound to a certain level. (And a transfer within the same scope is, of course, trivial.)

Suppose that we are generating byte code for a traditional stack-based virtual machine.

Here is an example:

begin
  integer x, p;
  p := 10;
  x := 20;
  go to label;
  begin
    integer y;
    y := begin
           integer q;
           label:
             q := x + 2;
             p + q;
         end;
  end;
end;

This should assign p + q (i.e., 32) to x.

I don't think inner scopes should count as separate stack frames. As a rule of thumb, you should only create a separate frame if the scope can be extracted as a separate function without side effects. I can't think of any approach where an unconditional jump exiting the stack frame is a good idea. — Locke, Jun 02 '21 at 00:38
@Locke I guess I just like the elegance of treating procedure calls and blocks the same. I got the idea from [Randell and Russell's book about Algol 60 implementation](http://www.softwarepreservation.org/projects/ALGOL/book/Randell_ALGOL_60_Implementation_1964.pdf). — texdr.aft, Jun 02 '21 at 00:43
It isn't a bad idea in general. The bigger issue is that this assumes labels are visible outside their given scopes. Having leaky inner scopes means we can't properly isolate them. If you remove this feature, it would resolve the issue. — Locke, Jun 02 '21 at 00:50
@Locke Would you consider the labels in the C code to be visible outside their scope? (On the last point, you are completely correct.) Also, unconditional jumps exiting stack frames (into outer scopes) are useful in Common Lisp when combined with closures, to implement the condition system. — texdr.aft, Jun 02 '21 at 01:00
C allows these jumps precisely because a C block doesn't have to be a stack frame, and few compilers treat them as such. By and large, the stack frame is big enough to hold the deepest enclosed block, so jumps around the function's code present no problems. There is one glaring exception: if a variable length array is allocated in a block, then an interior stack frame is necessary. In this case, C does not permit jumps into the scope of the VLA, precisely because jumping into a stack frame is tricky. — rici, Jun 02 '21 at 01:03
Obviously that comment was C-specific. But so is the premise to the question. — rici, Jun 02 '21 at 01:05
@rici Good points all around. I'm leaning towards forbidding them, especially since variable-length arrays are commonplace in Algol, although I still wonder if there's a decent way to support the jumps. — texdr.aft, Jun 02 '21 at 01:08
I don't think it's semantically sound. A block can be inside a loop (or can be a loop), and if your language supports closures, each iteration has its own bound variables. What can it possibly mean to jump into the scope of a bound variable? For that matter, what does it mean to jump backwards to before the variable was bound? A language has to have precise definitions of the semantics of constructs it allows. There are times when the only reasonable resolution is to disallow the construct. (For the second question, it's entirely reasonable IMHO to treat the variable initialization as... — rici, Jun 02 '21 at 01:21
... a slightly sugared `let = in `, in which case there is an implicit block starting with the variable declaration. That resolves the semantics of looping jumps. But jumping in from outside still seems problematic to me. — rici, Jun 02 '21 at 01:22
(I would also say that the C standard treats declaration of a VLA to be a sugared "let": "For such an object that does have a variable length array type, its lifetime extends from the **declaration of the object** until execution of the program leaves the scope of the declaration". (emph added). For any other objects (previous subclause), the lifetime is from block entry to block termination. 6.2.4 paragraphs 6 and 7 in case you want to look it up.) Also, FWIW, the scope of a label in C is the entire function in which it appears, not just the block. So the leak is standard, so to speak. — rici, Jun 02 '21 at 01:28
@rici Previously I considered leaving out inner goto statements to be a "needless restriction", but now you've shown me that that isn't quite true. I won't allow them; labels and goto's will be constrained as with Common Lisp's `tagbody`. — texdr.aft, Jun 02 '21 at 01:42

How should goto into an inner (or “sibling”) scope be implemented in a compiler?

0 Answers0