Laser Infers Block Use Patterns

Posted by -

Introduction

A major portion of the novel work done on my undergraduate thesis over the last 6-8 months went into developing Laser enough that it could infer block usage in a method: does a method call require a block, ignore a block, etc. This blog post shows how Laser successfully analyzes dozens of potential method constructions with interesting block use characteristics!

Defining Block Use

As explained in my preliminary post on the subject, there are two major sets forming the set of all methods in Ruby:

Each of these has an additionally interesting subset which never invokes a provided block:

Given these definitions, it is clear why an analyzer would like to characterize methods into these sets, as specifically as possible:

I've shown (quite trivially) in my thesis that the set of Block-Required methods is undecidable, and the set of Block-Optional methods is Turing-unrecognizable. However, it should not surprise most Rubyists that we can categorize lots of methods into these groups.

Examples

Here's a bunch of passing specs from Laser's latest checkin! There's a lot, so I list them by block-use category, roughly in increasing "difficulty." All following code snippets are methods for which Laser correctly identifies the block-use category given. In addition, any example which makes use of ".call" will also work where "[]" or "===" are substituted for ".call". Laser does not consider dynamic aliases for this purpose yet, though it could be added quite trivially; I'm temporarily concerned about overhead.

Block-Ignored

These should be easy, right? Methods that never invoke a block are Block-Ignored. These are actually an unrecognizable set, but for our purposes, a Block-Ignored method must be quite pathological for Laser to fail to identify it.

def foo(x)
  y = gets() * 2
  z = y
  c = z * z
end

When "yield" appears in a method, we must do a bit more work:

def foo(x)
  x = 2 ** 16
  if x == 65536
    puts x
  else
    yield
  end
end

The same is true for explicit block arguments:

def foo(other_arg, &blk)
  other_arg.call(5)
end

Block-Required

From Laser's point of a view, we try to prove that a method never calls its block when provided, and never tries to call its block when not provided. When we fail to prove both of these, a method is considered Block-Required: Laser believes it is possible it may call its block, no matter how the method is invoked.

An unconditional yield is simple enough:

def tap
  yield self
  self
end

Though we musn't forget unconditional yields:

def one
  if gets.size < 0
    yield
  else
    1
  end
end

Calling the explicit block argument must also be identified:

def one(&blk)
  blk.call(2, 3)
  1
end

Or when we might call the block argument:

def one(other_arg, &blk)
  if gets.size > 2
    other_arg = blk
  end
  other_arg.call(2)
end

Or when we get the block from Proc.new:

def one
  blk = Proc::new
  blk.call(2, 3)
  1
end

Block-Optional

We are intimately familiar with Block-Optional methods: As of Ruby 1.9, all the Enumerable methods are Block-Optional, and I can't say I've met anyone who doesn't like the block-bearing form of File.open. Identifying these methods means proving that when the block is nil, then the method won't raise as a result. This manifests itself in several forms.

For brevity the following similar examples are slightly tweaked from their spec counterparts.

Of course, there's checking block_given? and branching on it:

def one
  if block_given?
    yield 1
  else
    1
  end
end

There's also the block_given? alias iterator?, defined?(yield), and so on:

def one
  yield 1 if iterator?
end
def one
  yield 1 if defined?(yield)
end
def one
  yield 1 if defined?(yield($., *$*))
end

There's checking an explicit block argument against nil:

def one(&blk)
  blk.call(2, 3) if blk != nil
end

and when the block is obtained via Proc.new:

def one
  blk = Proc::new
  blk.call(2, 3) if blk != nil
end

Then there's another issue, brought up to me on ruby-talk by Robert Klemme: what if the exception raised by yield is caught? That threw a monkey wrench into my construction at the time. Now, in the CFG Laser builds and analyzes, I build a different execution path for exceptions raised by yield, providing a constant target for exception handlers. This means Constant Propagation can succeed analyzing many exception handlers.

The following snippet works if you rescue any of LocalJumpError, StandardError, Exception, Object, Kernel, or BasicObject.

def one
  yield 1
  4
rescue LocalJumpError
  2
end

The rescue modifier construction rescues StandardError, so it rescues a LocalJumpError:

def one
  yield 1 rescue 2
end

And rescue with no specified exception handler also just rescues StandardError, so it works, too:

def one
  yield 1
  4
rescue
  2
end

Lastly, let's consider delegation of blocks: you receive a block, obtain a reference to it (either via Proc.new, an explicit argument, and so on), and then call another method, re-using the block.

Since Array#each is Block-Optional, so too is bar below.

def bar(x, &blk)
  [1, 2, x].each(&blk)
end

We should be able to delegate to Ruby methods and still maintain inference. Both YP1#foo and YP1#bar below are successfully inferred as Block-Optional:

class YP1
  def foo(x)
    yield x if block_given?
  end
  def bar(y, &blk)
    foo(y, &blk)
  end
end

Using known instance variable types also allows us to correctly analyze the behavior of common delegation or proxy object patterns. Below, Laser will infer that YP3's instance variable @data can be either of class NilClass, Array, or Hash. It will also infer that YP4's instance variable @data can be either of class NilClass, YP3 or Array. (It thinks NilClass is possible because that is how instance variables are initialized; it doesn't yet do anything special with unconditional ivar assignments in #initialize)

For YP3, Array#each and Hash#each are block-optional and NilClass#each always raises an error, so YP3#each is Block-Optional. For YP4, Array#each and YP3#each are block-optional, and NilClass#each always raises an error, so YP4#each is also Block-Optional.

class YP3
  def initialize(data)
    @data = data
  end
  def each(&blk)
    @data.each(&blk)
  end
end
class YP4
  def initialize(data)
    @data = data
  end
  def foobar(&blk)
    @data.each(&blk)
  end
end
YP4Temp1 = YP4.new(YP3.new([1, 2, 3]))
YP4Temp2 = YP4.new(YP3.new({a: :b, c: :d}))
YP4Temp3 = YP4.new([5, 6])

One more Block-Required example

I wanted to have a spec for a block-required example similar to the previous one. Here it is:

class YP5
  def each
    yield 1
    yield 2
    yield 3
  end
end
class YP6
  def initialize(data)
    @data = data
  end
  def foobar(&blk)
    @data.each(&blk)
  end
end
YP6Temp1 = YP6.new(YP5.new)

Block-Foolish

There's a reason I embed "Foolish" in the name of these methods. By definition, they are nigh-useless. One would expect a user to create one accidentally, by typing unless block_given? instead of if block_given?; Laser considers the existence of these methods an error.

def one
  if block_given?  # or defined?(yield), or iterator?, or ...
    1
  else
    yield 1
  end
end

Again, we need to cover explicit blocks, Proc.new, etc:

def one(&blk)
  if blk == nil
    blk.call(2, 3)
  end
  1
end
def one
  blk = Proc::new
  if blk == nil
    blk.call(2, 3)
  end
  1
end

Closing Thoughts

There's more work to be done in this part of Laser's analysis:

That said, this represents an enormous success in static analysis and understanding of Ruby methods. I hope to, later in the Summer, integrate this work with YARD; it'd be cool to see this information show up in your docs without your writing it!


Enjoy this article? Then feel free to:


Comments