To Yield or Not to Yield: An Inferable Question

Posted by -

Blocks play a critical role in idiomatic Ruby code. They're a crucial part of any successful Ruby API, and the standard library shows it: from Files, and Enumerable's handy methods, to even the bytecode implementation of the def and class constructs, blocks are everywhere. Like other formal arguments, whether a method takes a block is a fundamental part of a method's API.

Unlike formal arguments, our documentation tools don't infer block argument automatically. Unlike formal arguments, when a block is required but not provided, the method will partially execute before raising an exception. Even worse, when a block is unnecessary but is provided, Ruby doesn't even notice.

The Question

The question we wish to answer is: can we statically infer how a method uses block arguments? This question has several prongs:

  1. When provided a block, will the block be executed before the method ends? 2a. If yes, how many times? (A finite union of possibly-infinite ranges is a sufficient answer) 2b. If yes, with what arities? (Again, a finite union of possibly-infinite ranges describes this well)
  2. When not provided a block, will the method attempt (and fail) to call it?
  3. Will the block be stored somewhere, so it can be called after the method ends? (formally, will the block argument escape the call)

It's important to note that these questions are relevant to any function argument passed in any language that supports it; Ruby simply brings these questions to the fore by making block use unavoidable (and awesome).

Refresher: Block Use

As a super-quick reminder, blocks are consumed by methods in one of two ways: as an explicit argument, and with yield (or both, but that's just weird. In fact, I think Laser should issue a style-level warning for that...)

If you choose yield, then you must use block_given? to check if there is a block argument:

  def foo(a, b)
    if a > b && block_given?
      yield a
    end
  end

If you use an explicit block argument, you can also just check if it is nil before calling it, as well as block_given?:

  def bar(a, b, &block)
    yield a if block
    yield b if block_given?
  end

Neither of the two methods above will raise a LocalJumpError if called without a block.

Explicit block arguments provide an important functionality, widely used in friendly Ruby APIs such as Sinatra's path handlers and Rails' filters: you can save the block, and run it later:

  class C
    def initialize
      @paths = {}
    end
    def get(path, &blk)
      @paths[path] = blk
    end
    def handle_request(path)
      @paths[path].call  # URLs are always right
    end
  end

Having the block as an argument introduces two related issues that make analysis tougher: aliasing, and escaping. It also introduces the possibility that the block is passed onto another method, but that's less of a concern: if we can analyze one call with a block, we can handle more.

For now though, we turn our attention to the easier case: yield.

Inferring yield Use

Let's see if we can answer the above 4 questions when a method does not have an explicit block argument. Since the block argument cannot be accessed as a Proc object via a standard Ruby interface (to the best of my knowledge), we know the answer to the question "Will the block be stored somewhere, so it can be called after the method ends?" is no. (That was a tiny lie. I'll address Kernel#binding later.)

We can break the "will the method call the block" questions into putting the method in one of four classes: block-required, block-optional, block-ignored, and block-foolish:

Notice the use of "may yield" in the diagram. There will always be cases where one cannot statically infer whether a yield will occur. A contrived example:

  def foo(x)
    yield x if gets.size > 10
  end

It would be tough to prove that branch will never be taken, so foo is a block-required method: if you want no exceptions, you had better provide a block.

It shouldn't be hard to come up with an idea of where we're going at this point: we want to identify all yields guarded by block_given?, and all yields not guarded by block_given?, and see which sets are empty or non-empty. If we saw this method:

  def foo_2(x)
    yield x if gets.size > 10 && block_given?
  end

we know that foo_2 will never yield if there's no block to yield to. This approach is good enough, right? Yes? Okay, consider:

  def foo_hell(x)
    a = block_given?
    x.each do |b|
      c = a
      a = yield(b) if c
      p(a)
      c = a unless c
      a = c
    end
  end

Yes, it pained me to write that code, as it's both contrived and poorly written. But it illustrates two problems we need to confront. Calls to block_given? won't always be sitting in the condition that guards a yield, as it's not unreasonable to store its result in a variable (especially if you have many places in the method where you yield). If we do observe a call to block_given? being saved to a variable, we have to track that value as it moves through other variables.

Our intuition is that this still shouldn't be too hard, for good reason: we know that block_given?, no matter how many times we call it, will give the same, constant result: either true or false. Indeed, we've reduced the question of yielding to the problem of constant propagation.

Constant Propagation

Constant Propagation is typically used by optimizing compilers, and many techniques exist to implement it, at varying time complexities. The following two methods are equivalent, yet one should not be surprised that some constant propagation algorithms do not prove both have constant return values:

  def foo_easy
    a = '0123456789101112'
    b = '1314151617181920'
    a + b
  end
  def foo_hard
    i = 0
    str = ''
    while i <= 20
      str << i.to_s
      i += 1
    end
    str
  end

In foo_easy, every variable is a simple constant, and in foo_hard, none are. What is a simple constant? Kildall defined it as such: (ACM Link, PDF Link, 1MB)

Simple constants are all values that can be proved to be constant subject to two constraints: no information is assumed about which direction branches will take, and only one value for each variable is maintained along each path in the program.

During the execution of a method, block_given?'s return value is a simple constant. And there are some delightful algorithms for dealing with simple constant propagation. I have implemented Wegman and Zadeck's Sparse Conditional Constant (SCC) algorithm (ACM Link, PDF Link, 2MB). If you're interested in the theory and history behind this approach, I recommend reading the paper. In short, it can discover all simple constants and more, because it also eliminates branches on simple constants (like if block_given?!).

The Final Algorithm

So if you've got SCC and the method body, we're nearly there. block_given? only has two possible return values, so we simulate both!

  1. Replace all calls of Kernel#block_given? with false.
  2. Run SCC, removing branches provably never taken
  3. DFS the control flow graph, inserting all yield instructions into the set YieldNoBlock 3.5 Restore everything.
  4. Replace all calls of Kernel#block_given? with true.
  5. Run SCC, removing branches provably never taken
  6. DFS the control flow graph, inserting all yield instructions into the set YieldWithBlock

Interpreting the results are simple:

  1. If YieldNoBlock is empty, and YieldWithBlock is empty: block-ignored
  2. If YieldNoBlock is empty, and YieldWithBlock is non-empty: block-optional
  3. If YieldNoBlock is non-empty, and YieldWithBlock is non-empty: block-required
  4. If YieldNoBlock is non-empty, and YieldWithBlock is empty: block-foolish

Now that you have all the calls in YieldWithBlock, you can also look at the arguments passed at each yield-site. If no splats are used, you can precisely determine the possible arities yielded; if splats are used, you may only be able to estimate the arity as an infinite range.

Nothing's Foolproof

If this seems too good to be true, it's because it is, kind of. This is a conservative algorithm: it errs on the side of caution, assuming yields may happen when they don't. A simple program that defeats it is this:

  def beats_me
    arr = (0..100).to_a
    arr << block_given?
    arr.shuffle!
    should_yield = arr.reject { |x| Integer === x }.first
    yield(42) if should_yield
  end

I threw in the #shuffle! call to add an impure function into the mix, as my implementation of SCC will soon actually be able to simulate the call to Array#reject, as the block provided is provably constant as well (it itself is a pure function of its argument, x).

Tidying Up: aliasing Kernel#block_given?

Technically, there are other ways to find out if there is a block being passed to the current method: you could re-implement it in C, and call it something else, or you could alias Kernel#block_given?. You cannot call a Ruby method that itself calls block_given?. I've been writing a lot, so I'll leave that one to you.

Laser currently handles alias correctly, tracking the new aliases, which is pretty easy as alias uses the lexical context to decide which method to alias. alias_method, on the other hand, is a normal method call to self, which is easy to track when used typically, and undecidable to track for all programs. Thus, if a program contains a call to alias_method with :block_given? as a second argument, or a non-constant second argument, then we can't be so sure what methods to look for in the above analysis.

Luckily, Laser is a static analyzer/linter, and every once in a while, it is permitted to just say "no" to dangerous lifestyle choices (The compiler writers aren't so lucky). Laser will only consider proven aliases to #block-given? when computing yield behavior.

Oh Yeah: Kernel#binding

Remember when I said the block can't escape if you don't declare it as an explicit argument? That was a little white lie because of our good friend, the binding method. Kernel#binding returns an object that encapsulates the current context and exposes one method to use that context: eval. So while you may not be able to smuggle an actual Proc object out of your method, you can do this:

  def stash
    @stash = binding
  end
  def recover_and_call
    @stash.eval('yield')
  end
  def example
    x = 1
    stash { p x; x += 1 }
    recover_and_call
    recover_and_call
    p x
  end

Which outputs:

  1
  2
  3

In case you were wondering: yes, my thesis depends entirely on all you Rubyists not doing things like this.

Binding objects aren't completely obscure, either: ERb uses them and allows you to provide your own bindings, typically to expose local variables to the template.


Enjoy this article? Then feel free to:


Comments