Introduction
A major portion of the novel work done on my undergraduate thesis over the last 6-8 months went into developing Laser enough that it could infer block usage in a method: does a method call require a block, ignore a block, etc. This blog post shows how Laser successfully analyzes dozens of potential method constructions with interesting block use characteristics!
Defining Block Use
As explained in my preliminary post on the subject, there are two major sets forming the set of all methods in Ruby:
- Block-Optional methods: never raise due to the lack of a block
- Block-Required methods: may raise due to the lack of a block
Each of these has an additionally interesting subset which never invokes a provided block:
- Block-Ignored (subset of Block-Optional) methods: never call or store a block when provided
- Block-Foolish (subset of Block-Required) methods: never call or store a block when provided
Given these definitions, it is clear why an analyzer would like to characterize methods into these sets, as specifically as possible:
- It is an error to pass a block to a Block-Ignored method
- It is an error to fail to pass a block to a BLock-Required method
- It is likely an error to define a Block-Foolish method
I've shown (quite trivially) in my thesis that the set of Block-Required methods is undecidable, and the set of Block-Optional methods is Turing-unrecognizable. However, it should not surprise most Rubyists that we can categorize lots of methods into these groups.
Examples
Here's a bunch of passing specs from Laser's latest checkin! There's a lot, so I list them by block-use category, roughly in increasing "difficulty." All following code snippets are methods for which Laser correctly identifies the block-use category given. In addition, any example which makes use of ".call" will also work where "[]" or "===" are substituted for ".call". Laser does not consider dynamic aliases for this purpose yet, though it could be added quite trivially; I'm temporarily concerned about overhead.
Block-Ignored
These should be easy, right? Methods that never invoke a block are Block-Ignored. These are actually an unrecognizable set, but for our purposes, a Block-Ignored method must be quite pathological for Laser to fail to identify it.
def foo(x) y = gets() * 2 z = y c = z * z end
When "yield" appears in a method, we must do a bit more work:
def foo(x) x = 2 ** 16 if x == 65536 puts x else yield end end
The same is true for explicit block arguments:
def foo(other_arg, &blk) other_arg.call(5) end
Block-Required
From Laser's point of a view, we try to prove that a method never calls its block when provided, and never tries to call its block when not provided. When we fail to prove both of these, a method is considered Block-Required: Laser believes it is possible it may call its block, no matter how the method is invoked.
An unconditional yield is simple enough:
def tap yield self self end
Though we musn't forget unconditional yields:
def one if gets.size < 0 yield else 1 end end
Calling the explicit block argument must also be identified:
def one(&blk) blk.call(2, 3) 1 end
Or when we might call the block argument:
def one(other_arg, &blk) if gets.size > 2 other_arg = blk end other_arg.call(2) end
Or when we get the block from Proc.new:
def one blk = Proc::new blk.call(2, 3) 1 end
Block-Optional
We are intimately familiar with Block-Optional methods: As of Ruby 1.9, all the
Enumerable methods are Block-Optional,
and I can't say I've met anyone who doesn't like the block-bearing form of
File.open. Identifying these methods
means proving that when the block is nil, then the method won't raise as a result. This manifests itself in several
forms.
For brevity the following similar examples are slightly tweaked from their spec counterparts.
Of course, there's checking block_given? and branching on it:
def one if block_given? yield 1 else 1 end end
There's also the block_given? alias iterator?, defined?(yield), and so on:
def one yield 1 if iterator? end
def one yield 1 if defined?(yield) end
def one yield 1 if defined?(yield($., *$*)) end
There's checking an explicit block argument against nil:
def one(&blk) blk.call(2, 3) if blk != nil end
and when the block is obtained via Proc.new:
def one blk = Proc::new blk.call(2, 3) if blk != nil end
Then there's another issue, brought up to me on ruby-talk by Robert Klemme:
what if the exception raised by yield is
caught? That threw a monkey wrench into my construction at the time. Now, in the CFG Laser builds and analyzes, I build
a different execution path for exceptions raised by yield, providing a constant target for exception handlers. This
means Constant Propagation can succeed analyzing many exception handlers.
The following snippet works if you rescue any of LocalJumpError, StandardError, Exception, Object, Kernel,
or BasicObject.
def one yield 1 4 rescue LocalJumpError 2 end
The rescue modifier construction rescues StandardError, so it rescues a LocalJumpError:
def one yield 1 rescue 2 end
And rescue with no specified exception handler also just rescues StandardError, so it works, too:
def one yield 1 4 rescue 2 end
Lastly, let's consider delegation of blocks: you receive a block, obtain a reference to it (either via Proc.new, an
explicit argument, and so on), and then call another method, re-using the block.
Since Array#each is Block-Optional, so too is bar below.
def bar(x, &blk) [1, 2, x].each(&blk) end
We should be able to delegate to Ruby methods and still maintain inference. Both YP1#foo and YP1#bar below are successfully
inferred as Block-Optional:
class YP1 def foo(x) yield x if block_given? end def bar(y, &blk) foo(y, &blk) end end
Using known instance variable types also allows us to correctly analyze the behavior of common delegation or proxy
object patterns. Below, Laser will infer that YP3's instance variable @data can be either of class NilClass,
Array, or Hash. It will also infer that YP4's instance variable @data can be either of class NilClass, YP3 or Array.
(It thinks NilClass is possible because that is how instance variables are initialized; it doesn't yet do anything special with
unconditional ivar assignments in #initialize)
For YP3, Array#each and Hash#each are block-optional and NilClass#each always raises an error, so YP3#each is Block-Optional.
For YP4, Array#each and YP3#each are block-optional, and NilClass#each always raises an error, so YP4#each is also Block-Optional.
class YP3 def initialize(data) @data = data end def each(&blk) @data.each(&blk) end end class YP4 def initialize(data) @data = data end def foobar(&blk) @data.each(&blk) end end YP4Temp1 = YP4.new(YP3.new([1, 2, 3])) YP4Temp2 = YP4.new(YP3.new({a: :b, c: :d})) YP4Temp3 = YP4.new([5, 6])
One more Block-Required example
I wanted to have a spec for a block-required example similar to the previous one. Here it is:
class YP5 def each yield 1 yield 2 yield 3 end end class YP6 def initialize(data) @data = data end def foobar(&blk) @data.each(&blk) end end YP6Temp1 = YP6.new(YP5.new)
Block-Foolish
There's a reason I embed "Foolish" in the name of these methods. By definition, they are nigh-useless. One
would expect a user to create one accidentally, by typing unless block_given? instead of if block_given?;
Laser considers the existence of these methods an error.
def one if block_given? # or defined?(yield), or iterator?, or ... 1 else yield 1 end end
Again, we need to cover explicit blocks, Proc.new, etc:
def one(&blk) if blk == nil blk.call(2, 3) end 1 end
def one blk = Proc::new if blk == nil blk.call(2, 3) end 1 end
Closing Thoughts
There's more work to be done in this part of Laser's analysis:
- Recursion (mutual or not) is not supported for inter-procedural inference.
- No analysis is done to infer what happens when the block is passed as a normal argument to another method; it is assumed to be used.
- No analysis is done concerning storage of the block and later use.
- Like all Laser analyses, it assumes constants (especially important ones like
LocalJumpError) will not be redefined after load-time. Compiler writers reading this: I'm sorry. - Laser is written in all Ruby (except for Ripper). It's incredibly slow. For many analyses it interprets Ruby code, in all Ruby.
That said, this represents an enormous success in static analysis and understanding of Ruby methods. I hope to, later in the Summer, integrate this work with YARD; it'd be cool to see this information show up in your docs without your writing it!

Enjoy this article? Then feel free to: