Blocks play a critical role in idiomatic Ruby code. They're a crucial part
of any successful Ruby API, and the standard library shows it: from
Files,
and Enumerable's
handy
methods, to
even the bytecode implementation of the def and class constructs,
blocks are everywhere. Like other formal arguments, whether a method
takes a block is a fundamental part of a method's API.
Unlike formal arguments, our documentation tools don't infer block argument automatically. Unlike formal arguments, when a block is required but not provided, the method will partially execute before raising an exception. Even worse, when a block is unnecessary but is provided, Ruby doesn't even notice.
The Question
The question we wish to answer is: can we statically infer how a method uses block arguments? This question has several prongs:
- When provided a block, will the block be executed before the method ends? 2a. If yes, how many times? (A finite union of possibly-infinite ranges is a sufficient answer) 2b. If yes, with what arities? (Again, a finite union of possibly-infinite ranges describes this well)
- When not provided a block, will the method attempt (and fail) to call it?
- Will the block be stored somewhere, so it can be called after the method ends? (formally, will the block argument escape the call)
It's important to note that these questions are relevant to any function argument passed in any language that supports it; Ruby simply brings these questions to the fore by making block use unavoidable (and awesome).
Refresher: Block Use
As a super-quick reminder, blocks are consumed by methods in one
of two ways: as an explicit argument, and with yield (or both,
but that's just weird. In fact, I think
Laser should issue a
style-level warning for that...)
If you choose yield, then you must use block_given? to check
if there is a block argument:
def foo(a, b) if a > b && block_given? yield a end end
If you use an explicit block argument, you can also just check if it
is nil before calling it, as well as block_given?:
def bar(a, b, &block) yield a if block yield b if block_given? end
Neither of the two methods above will raise a LocalJumpError if called
without a block.
Explicit block arguments provide an important functionality, widely used in friendly Ruby APIs such as Sinatra's path handlers and Rails' filters: you can save the block, and run it later:
class C def initialize @paths = {} end def get(path, &blk) @paths[path] = blk end def handle_request(path) @paths[path].call # URLs are always right end end
Having the block as an argument introduces two related issues that make analysis tougher: aliasing, and escaping. It also introduces the possibility that the block is passed onto another method, but that's less of a concern: if we can analyze one call with a block, we can handle more.
For now though, we turn our attention to the easier case: yield.
Inferring yield Use
Let's see if we can answer the above 4 questions when a method does not
have an explicit block argument. Since the block argument cannot be accessed
as a Proc object via a standard Ruby interface (to the best of my knowledge),
we know the answer to the question "Will the block be stored somewhere, so it
can be called after the method ends?" is no. (That was a tiny lie. I'll address
Kernel#binding later.)
We can break the "will the method call the block" questions into putting the method in one of four classes: block-required, block-optional, block-ignored, and block-foolish:

Notice the use of "may yield" in the diagram. There will always be cases where one cannot statically infer whether a yield will occur. A contrived example:
def foo(x) yield x if gets.size > 10 end
It would be tough to prove that branch will never be taken, so foo is a
block-required method: if you want no exceptions, you had better provide
a block.
It shouldn't be hard to come up with an idea of where we're going at this point:
we want to identify all yields guarded by block_given?, and all yields not
guarded by block_given?, and see which sets are empty or non-empty. If we
saw this method:
def foo_2(x) yield x if gets.size > 10 && block_given? end
we know that foo_2 will never yield if there's no block to yield to. This
approach is good enough, right? Yes? Okay, consider:
def foo_hell(x) a = block_given? x.each do |b| c = a a = yield(b) if c p(a) c = a unless c a = c end end
Yes, it pained me to write that code, as it's both contrived and poorly written. But
it illustrates two problems we need to confront. Calls to block_given? won't always
be sitting in the condition that guards a yield, as it's not unreasonable to store its result
in a variable (especially if you have many places in the method where you yield).
If we do observe a call to block_given? being saved to a variable, we have to track
that value as it moves through other variables.
Our intuition is that this still shouldn't be too hard, for good reason: we know that
block_given?, no matter how many times we call it, will give the same, constant result:
either true or false. Indeed, we've reduced the question of yielding to the problem
of constant propagation.
Constant Propagation
Constant Propagation is typically used by optimizing compilers, and many techniques exist to implement it, at varying time complexities. The following two methods are equivalent, yet one should not be surprised that some constant propagation algorithms do not prove both have constant return values:
def foo_easy a = '0123456789101112' b = '1314151617181920' a + b end def foo_hard i = 0 str = '' while i <= 20 str << i.to_s i += 1 end str end
In foo_easy, every variable is a simple constant, and in foo_hard, none are. What is
a simple constant? Kildall defined it as such: (ACM Link,
PDF Link, 1MB)
Simple constants are all values that can be proved to be constant subject to two constraints: no information is assumed about which direction branches will take, and only one value for each variable is maintained along each path in the program.
During the execution of a method, block_given?'s return value is a simple constant. And there are some
delightful algorithms for dealing with simple constant propagation. I have implemented
Wegman and Zadeck's Sparse Conditional Constant (SCC) algorithm (ACM Link,
PDF Link, 2MB).
If you're interested in the theory and history behind this approach, I recommend reading the
paper. In short, it can discover all simple constants and more, because it also eliminates
branches on simple constants (like if block_given?!).
The Final Algorithm
So if you've got SCC and the method body, we're nearly there. block_given? only has two possible
return values, so we simulate both!
- Replace all calls of
Kernel#block_given?withfalse. - Run SCC, removing branches provably never taken
- DFS the control flow graph, inserting all
yieldinstructions into the set YieldNoBlock 3.5 Restore everything. - Replace all calls of
Kernel#block_given?withtrue. - Run SCC, removing branches provably never taken
- DFS the control flow graph, inserting all
yieldinstructions into the set YieldWithBlock
Interpreting the results are simple:
- If
YieldNoBlockis empty, andYieldWithBlockis empty: block-ignored - If
YieldNoBlockis empty, andYieldWithBlockis non-empty: block-optional - If
YieldNoBlockis non-empty, andYieldWithBlockis non-empty: block-required - If
YieldNoBlockis non-empty, andYieldWithBlockis empty: block-foolish
Now that you have all the calls in YieldWithBlock, you can also look at
the arguments passed at each yield-site. If no splats are used, you can
precisely determine the possible arities yielded; if splats are used, you
may only be able to estimate the arity as an infinite range.
Nothing's Foolproof
If this seems too good to be true, it's because it is, kind of. This is a conservative algorithm: it errs on the side of caution, assuming yields may happen when they don't. A simple program that defeats it is this:
def beats_me arr = (0..100).to_a arr << block_given? arr.shuffle! should_yield = arr.reject { |x| Integer === x }.first yield(42) if should_yield end
I threw in the #shuffle! call to add an impure function into the mix, as
my implementation of SCC will soon actually be able to simulate the call to
Array#reject, as the block provided is provably constant as well (it itself
is a pure function of its argument, x).
Tidying Up: aliasing Kernel#block_given?
Technically, there are other ways to find out if there is a block being
passed to the current method: you could re-implement it in C, and call it
something else, or you could alias Kernel#block_given?. You cannot
call a Ruby method that itself calls block_given?. I've been writing a
lot, so I'll leave that one to you.
Laser currently handles alias correctly, tracking the new aliases,
which is pretty easy as alias uses the lexical context to decide which
method to alias. alias_method, on the other hand, is a normal method
call to self, which is easy to track when used typically, and undecidable
to track for all programs. Thus, if a program contains a call to alias_method
with :block_given? as a second argument, or a non-constant second argument,
then we can't be so sure what methods to look for in the above analysis.
Luckily, Laser is a static analyzer/linter, and every once in a while, it
is permitted to just say "no" to dangerous lifestyle choices (The compiler
writers aren't so lucky). Laser will only consider proven aliases to
#block-given? when computing yield behavior.
Oh Yeah: Kernel#binding
Remember when I said the block can't escape if you don't declare it as
an explicit argument? That was a little white lie because of our good
friend, the binding method. Kernel#binding returns an object that
encapsulates the current context and exposes one method to use that
context: eval. So while you may not be able to smuggle an actual Proc
object out of your method, you can do this:
def stash @stash = binding end def recover_and_call @stash.eval('yield') end def example x = 1 stash { p x; x += 1 } recover_and_call recover_and_call p x end
Which outputs:
1 2 3
In case you were wondering: yes, my thesis depends entirely on all you Rubyists not doing things like this.
Binding objects aren't completely obscure, either: ERb uses them and allows you to provide your own bindings, typically to expose local variables to the template.

Enjoy this article? Then feel free to: