Note: This article is part of a series exploring how Laser, my Ruby static analysis tool, can help improve the quality of Ruby code. I'm giving a talk about it at RubyConf 2011, and you should come!
Ruby has a plethora of built-in methods and idioms, and overriding them without breaking something is nontrivial. For example, you can easily override to_s to return something other than a String, or ! to return an integer. Some methods, like catch, method, and include, can be overridden but are difficult (or impossible) to implement in Ruby; you should most likely call super in overrides of such methods. Worse, some methods cannot be made to behave as expected once overridden, even if the override calls super.
Laser can help us find bugs in all of these situations!
to_s Returns a String. No Excuses.
Whenever Laser sees a method call, it calculates the return type based on what it knows about the receiver and arguments; it needs to do this as part of its type inference algorithm. While it's there, it consults a small table of expectations:
EXPECTATIONS = {'to_s' => Types::STRING,
'to_str' => Types::STRING,
'to_i' => Types::ClassType.new('Integer', :covariant),
'to_int' => Types::ClassType.new('Integer', :covariant),
'to_f' => Types::FLOAT,
'to_a' => Types::ARRAY,
'to_ary' => Types::ARRAY,
'!' => Types::BOOLEAN }
Whenever a return type for a method is calculated, Laser checks whether the method's name is in that table. If it is, Laser ensures the calculated type is a subtype of the expectation. The :covariant in the to_i/to_int entries means, in this context, "this class or any subclass." So as long as every possible type returned by to_i is Integer or a subclass of it, we're good.
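A minimal sketch of that table lookup, assuming a deliberately simplified type model in which an inferred return type is just a list of the Ruby classes a call might produce. EXPECTED_RETURNS and meets_expectation? are hypothetical names, not Laser's API; the covariant check is expressed with Module#<=, which is true for the class itself or any subclass.

```ruby
# Hypothetical, simplified model: an inferred return "type" is an array of the
# Ruby classes a call might return. Laser's real Types objects are richer.
EXPECTED_RETURNS = {
  'to_s'   => [String],
  'to_str' => [String],
  'to_i'   => [Integer], # covariant: Integer or any subclass
  'to_int' => [Integer],
  'to_f'   => [Float],
  'to_a'   => [Array],
  'to_ary' => [Array],
  '!'      => [TrueClass, FalseClass]
}.freeze

# True when every possible return class is one of the expected classes or a
# subclass of one (Module#<= returns nil for unrelated classes, which is falsy).
def meets_expectation?(method_name, possible_classes)
  expected = EXPECTED_RETURNS[method_name]
  return true if expected.nil? # no rule for this method name

  possible_classes.all? do |klass|
    expected.any? { |allowed| klass <= allowed }
  end
end

p meets_expectation?('to_s', [String])     # prints true
p meets_expectation?('to_s', [Integer])    # prints false
p meets_expectation?('to_i', [Integer])    # prints true
p meets_expectation?('inspect', [Integer]) # prints true (no rule)
```

A union type fails as soon as any member falls outside the expectation, which matches the "all the possible types" wording above.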
If things don't match up, well, then Laser leaves you a nasty note:
➜ laser git:(master) ✗ cat temp.rb
class Foo
  def initialize(bar)
    @bar = bar
  end
  def to_s
    @bar.to_s
  end
end
class Baz
  def initialize(bar)
    @bar = bar
  end
  def to_s
    @bar # uh oh...
  end
end
f = Foo.new(gets.to_i).to_s
g = Baz.new(gets.to_i).to_s
➜ laser git:(master) ✗ bundle exec bin/laser temp.rb
2 warnings found. 1 are fixable.
================================
temp.rb:0 Extra blank lines (1) - This file has 1 blank lines at the end of it.
temp.rb:14 Error (8) - All methods named to_s should return a subtype of #<ClassObjectType: String>
Notice that it flags only Baz#to_s, the one seen to return an Integer.
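To see why the check matters at runtime, here is the Baz class from the session above run directly in plain Ruby, with a fixed argument in place of gets. String#+ insists on a String and raises TypeError, while string interpolation in MRI silently falls back to the default object description instead of using the bad return value.

```ruby
class Baz
  def initialize(bar)
    @bar = bar
  end

  def to_s
    @bar # returns an Integer, not a String
  end
end

baz = Baz.new(42)

# Interpolation: MRI ignores the non-String result of to_s and uses the
# default object description instead, so the 42 never shows up.
interpolated = "value: #{baz}"
puts interpolated # e.g. value: #<Baz:0x...>

# Explicit concatenation fails outright.
begin
  puts('value: ' + baz.to_s)
rescue TypeError => e
  puts "TypeError: #{e.message}"
end
```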
Question-mark methods
One of my favorite tiny features of Ruby is that method names can end in question marks. I wish all identifiers could, but I digress.
I've toyed with the idea that every method whose name ends in a question mark should return true, false, or nil. Every Ruby object makes sense in a boolean context, so this isn't strictly necessary, but it might be worth enforcing: it keeps the output of these methods sane. Once I figure out a nice syntax, every warning in Laser will be configurable on/off, so I figured there was no reason not to add it.
So, just like the examples above where a name maps to a required return type, method names are checked for a trailing '?'. Any such method must return a subtype of TrueClass | FalseClass | NilClass:
➜ laser git:(master) ✗ cat temp.rb
class HashWrapper
  def initialize(h)
    @hash = h
  end
  def has_key?(k)
    @hash[k]
  end
end
hw = HashWrapper.new({gets => gets})
b = hw.has_key?('foobar')
➜ laser git:(master) ✗ bundle exec bin/laser temp.rb
2 warnings found. 1 are fixable.
================================
temp.rb:0 Extra blank lines (1) - This file has 1 blank lines at the end of it.
temp.rb:6 Error (8) - All methods whose name ends in ? should return a subtype of TrueClass | FalseClass | NilClass
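One way to satisfy the check, sketched below, is to delegate to Hash#key?, which always returns true or false. Returning @hash[k] is more than a type complaint: a key that maps to nil or false would look absent.

```ruby
class HashWrapper
  def initialize(h)
    @hash = h
  end

  # Hash#key? returns true or false, so the predicate's result is a boolean,
  # and a key mapped to nil or false still counts as present.
  def has_key?(k)
    @hash.key?(k)
  end
end

hw = HashWrapper.new({ 'foobar' => nil, 'baz' => 1 })
p hw.has_key?('foobar') # prints true (the original @hash[k] would return nil)
p hw.has_key?('quux')   # prints false
```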
Keep in mind that these warnings are only triggered if Laser observes the method being called in the first place. It doesn't bother inferring return types without an observed call, because it wouldn't have enough information to learn much that's useful anyway. And if a method is never called, you'll get a different warning.
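The name-based rule itself is tiny. A sketch, assuming a simplified model in which an inferred return type is just a list of the Ruby classes a call might return; PREDICATE_RETURN and predicate_ok? are hypothetical names, not Laser's.

```ruby
# Classes allowed as the result of a method whose name ends in '?'.
PREDICATE_RETURN = [TrueClass, FalseClass, NilClass].freeze

# Hypothetical check: names without a trailing '?' are not constrained here.
def predicate_ok?(method_name, possible_classes)
  return true unless method_name.to_s.end_with?('?')

  possible_classes.all? do |klass|
    PREDICATE_RETURN.any? { |allowed| klass <= allowed }
  end
end

# The original HashWrapper#has_key? returns whatever was stored, e.g. a String.
p predicate_ok?(:has_key?, [String, NilClass])      # prints false
p predicate_ok?(:has_key?, [TrueClass, FalseClass]) # prints true
p predicate_ok?(:fetch, [String])                   # prints true
```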
Here's to the Magic Ones
Ruby goes out of its way to use as few keywords as possible, implementing language features like include and raise as ordinary methods. This reuses the existing method-call syntax while reducing the number of reserved words in the language. However, it also leads to some trouble.
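You can check this from Ruby itself: include and raise are looked up like any other method, and reflection reports which module owns them.

```ruby
# Object#method also finds private methods such as Kernel#raise.
raise_method = method(:raise)
puts raise_method.owner # prints Kernel

# include is an instance method of Module, which every class inherits,
# so a class can override it like any other method.
include_method = Module.instance_method(:include)
puts include_method.owner # prints Module
```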
A little-known issue with overriding Module#private or the other visibility-modifier methods is that it breaks the zero-argument, lexically-scoped form in MRI:
class Foo
  def self.private(*args)
    puts "called private with args: #{args.inspect}"
    super
  end
  private
  def bar
    puts 'bar'
  end
end
Foo.new.bar
results in:
called private with args: []
bar
It's a misfeature, and not really something you can do much about. I'm not sure if it can be fixed by using method_added and other hooks, but regardless, from a static analyzer's point of view, this is something the user is probably not aware of and probably should be.
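For contrast, the explicit-argument form does survive the override, because the implicit-argument super forwards :bar to Module#private. A sketch, with ExplicitFoo as an illustrative stand-in name:

```ruby
class ExplicitFoo
  def self.private(*args)
    puts "called private with args: #{args.inspect}"
    super # forwards *args unchanged to Module#private
  end

  def bar
    puts 'bar'
  end
  private :bar # explicit form: the override receives [:bar]
end

begin
  ExplicitFoo.new.bar
rescue NoMethodError => e
  puts "NoMethodError: #{e.message}"
end
```

Only the zero-argument form, which changes the default visibility of the scope that called it, is lost.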
A few other methods are the same way: Kernel#local_variables, Kernel#binding, Kernel#block_given?, and so on. Most of the time, they're providing some "magic" behavior you couldn't have otherwise implemented in Ruby, and they often refer to the current interpreter state: the call stack, the current frame, etc.
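A small demonstration with Kernel#block_given?, using hypothetical Widget and PlainWidget classes. Once the override exists, the call inside run resolves to it, and Kernel#block_given? reached through super asks about the override's own frame, which was never given a block. This is observed MRI behavior; treat it as a sketch rather than a specification.

```ruby
class Widget
  def block_given?
    super # Kernel#block_given? now inspects this method's frame, which has no block
  end

  def run
    block_given? ? yield : :no_block
  end
end

class PlainWidget
  def run
    block_given? ? yield : :no_block
  end
end

p PlainWidget.new.run { :got_block } # prints :got_block
p Widget.new.run { :got_block }      # prints :no_block under MRI
```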
Laser always warns when it sees you override these methods, even if they're never called, because such an override is almost certainly a mistake whether or not a call is observed. Each dangerous method has a semi-custom message describing the pitfalls of overriding it:
➜ laser git:(master) ✗ cat temp.rb
class BadClass
  def self.public(*args)
    puts 'hooked into public'
    super # works for 1-n args, not zero
  end
  def block_given?
    puts 'block_given? called'
    super # useless
  end
  def __method__
    puts '__method__ called'
    super # useless
  end
end
➜ laser git:(master) ✗ bundle exec bin/laser temp.rb
6 warnings found. 1 are fixable.
================================
temp.rb:0 Extra blank lines (1) - This file has 1 blank lines at the end of it.
temp.rb:3 Error (8) - Overriding Module#public breaks its zero-argument lexically-scoped behavior.
temp.rb:7 Error (8) - Overriding Kernel#block_given? irreparably breaks the method.
temp.rb:11 Error (8) - Overriding Kernel#__method__ irreparably breaks the method.
temp.rb:6 Unused method () - The method BadClass#block_given? is never called.
temp.rb:10 Unused method () - The method BadClass#__method__ is never called.
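Laser finds these overrides statically, but the idea is easy to approximate at runtime with reflection. A sketch, where the two name lists and magic_overrides are hypothetical, partial stand-ins for Laser's own list:

```ruby
# Hypothetical, partial lists of "magic" methods; Laser's real list may differ.
MAGIC_INSTANCE_METHODS = %i[block_given? __method__ binding local_variables].freeze
MAGIC_MODULE_METHODS   = %i[public private protected module_function].freeze

# Returns descriptions of magic methods that klass defines directly.
def magic_overrides(klass)
  own = klass.public_instance_methods(false) +
        klass.protected_instance_methods(false) +
        klass.private_instance_methods(false)
  instance_hits = (own & MAGIC_INSTANCE_METHODS).map { |m| "#{klass}##{m}" }
  module_hits = (klass.singleton_methods(false) & MAGIC_MODULE_METHODS).map { |m| "#{klass}.#{m}" }
  instance_hits + module_hits
end

class BadClass
  def self.public(*args)
    super
  end

  def block_given?
    super
  end

  def __method__
    super
  end
end

p magic_overrides(BadClass).sort
# prints ["BadClass#__method__", "BadClass#block_given?", "BadClass.public"]
```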
Cool. I just found a bug, too: I excluded singleton classes from unused-method reporting (so that Class: "foo"#split doesn't show up as unused, for example), but I should only be excluding singleton classes that are not subclasses of Module. That's why public didn't show up as unused.
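The corrected rule is easy to express with Module#singleton_class? and a subclass test. A hypothetical predicate sketching it (skip_unused_check? is an illustrative name, not Laser's):

```ruby
# Hypothetical predicate for the corrected rule: skip singleton classes of
# ordinary objects, but keep checking singleton classes of classes and modules
# (those are subclasses of Module, so methods like BadClass.public get checked).
def skip_unused_check?(klass)
  klass.singleton_class? && !(klass < Module)
end

class BadClass; end

p skip_unused_check?(String.new('foo').singleton_class) # prints true
p skip_unused_check?(BadClass.singleton_class)          # prints false
p skip_unused_check?(BadClass)                          # prints false
```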
This takes us to the hardest case to warn about: methods which you can override, but when you do so, you had better call super or risk breaking lots of code.
Expecting a Guaranteed Super Call
One special property of Laser is that it performs flow analysis on Ruby programs by attempting to construct a control-flow graph (CFG) of each program. Doing so conservatively, without leaving out any potential control flow, is very hard, and Laser does not fully achieve it: some Ruby programs will confuse it. But for the many programs it can model, it uses this CFG to make all kinds of interesting inferences.
One data structure closely related to the CFG is the dominator tree. A node A dominates a node B if every path from the graph's entry to B passes through A, so the dominator tree tells you which portions of code always run before another portion. Using this, plus a good bit of constant propagation and dead-code elimination, Laser can tell that in the following program, throw always calls super before exiting successfully:
class Foo
  def throw(tag, val=nil)
    if Symbol === tag
      p tag
    else
      raise 'expected a symbol'
    end
    super
  end
end
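To make dominance concrete, here is a sketch of the classic iterative data-flow computation of dominator sets, run on a hand-built CFG for the throw override above. The node names are illustrative labels for that method's statements, not Laser's actual instructions, and the raise branch goes to a separate error exit because only the successful exit matters here.

```ruby
require 'set'

# Iterative dominator computation:
#   Dom(entry) = {entry}
#   Dom(n)     = {n} plus the intersection of Dom(p) over all predecessors p
# Repeats until nothing changes. succs must list every node as a key.
def dominators(succs, entry)
  nodes = succs.keys
  preds = Hash.new { |h, k| h[k] = [] }
  succs.each { |node, targets| targets.each { |t| preds[t] << node } }

  dom = {}
  nodes.each { |n| dom[n] = n == entry ? Set[entry] : Set.new(nodes) }

  changed = true
  while changed
    changed = false
    nodes.each do |n|
      next if n == entry

      incoming = preds[n].map { |p| dom[p] }
      updated = (incoming.empty? ? Set.new : incoming.reduce(:&)) | [n]
      next if updated == dom[n]

      dom[n] = updated
      changed = true
    end
  end
  dom
end

# Hand-built CFG for Foo#throw; the raise branch ends at a separate error exit.
THROW_CFG = {
  entry: [:check_symbol],
  check_symbol: [:print_tag, :raise_error],
  print_tag: [:call_super],
  raise_error: [:error_exit],
  call_super: [:exit],
  exit: [],
  error_exit: []
}.freeze

exit_doms = dominators(THROW_CFG, :entry)[:exit]
p exit_doms.include?(:call_super) # prints true
```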
Laser gives no warning when it sees this method. But when we run the following, it lets us know we dun goofed:
➜ laser git:(master) ✗ cat temp.rb
class Foo
  def self.include(arg1, *args)
    # do other stuff
    p args.size
  end
  def self.extend(arg1, *args)
    if arg1.name == 'Bar'
      super(*args)
    end
  end
end
➜ laser git:(master) ✗ bundle exec bin/laser temp.rb
3 warnings found. 1 are fixable.
================================
temp.rb:0 Extra blank lines (1) - This file has 1 blank lines at the end of it.
temp.rb:4 Error (8) - Always call super when overriding Module#include.
temp.rb:7 Error (8) - Always call super when overriding Module#extend.
Right now, Laser walks the dominator tree upward from the "successful exit" node of the graph, looking for a dominating super or super_vararg instruction. In practice, this means your method passes only if its code explicitly contains a single super instruction that is guaranteed to run before every successful exit.
The next step is to check whether a super instruction occurs on every possible path. Needless to say, this is a lot more expensive, but it may be worth doing, since the current heuristic flags some methods that do always call super, such as one whose if and else branches each call super!
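The walk itself is short once each node's immediate dominator is known. A sketch, assuming the immediate-dominator map has already been computed; the maps below are written by hand for the extend override above and for the earlier throw override, and guaranteed_super? and the node names are hypothetical.

```ruby
# Walk up the dominator tree from the successful exit, following each node's
# immediate dominator (nil at the entry), looking for a super instruction.
def guaranteed_super?(idom, super_nodes, exit_node)
  node = exit_node
  while node
    return true if super_nodes.include?(node)

    node = idom[node]
  end
  false
end

# Foo.extend: super only runs when arg1.name == 'Bar', so the exit is
# immediately dominated by the condition, not by the super call.
EXTEND_IDOM = { entry: nil, check_name: :entry, call_super: :check_name, exit: :check_name }.freeze

# Foo#throw: the only successful path runs p tag, then super, then exits.
THROW_IDOM = { entry: nil, check_symbol: :entry, print_tag: :check_symbol,
               call_super: :print_tag, exit: :call_super }.freeze

p guaranteed_super?(EXTEND_IDOM, [:call_super], :exit) # prints false (would warn)
p guaranteed_super?(THROW_IDOM, [:call_super], :exit)  # prints true
```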
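One way to phrase the all-paths check without listing paths one by one (a sketch, not necessarily what Laser will do): every path from the entry to the successful exit passes through a super instruction exactly when the exit becomes unreachable once the super nodes are removed from the graph. The CFGs and node names below are hypothetical.

```ruby
# Depth-first search from entry that refuses to step onto super nodes.
# If the successful exit is still reachable, some path skips super.
def every_path_calls_super?(succs, entry, exit_node, super_nodes)
  return true if super_nodes.include?(entry)

  seen = { entry => true }
  stack = [entry]
  until stack.empty?
    node = stack.pop
    return false if node == exit_node

    succs.fetch(node, []).each do |nxt|
      next if seen[nxt] || super_nodes.include?(nxt)

      seen[nxt] = true
      stack.push(nxt)
    end
  end
  true
end

# Both branches call super, but neither super node alone dominates the exit,
# so a walk up the dominator tree from the exit would not find one.
BOTH_BRANCHES = {
  entry: [:cond],
  cond: [:super_then, :super_else],
  super_then: [:exit],
  super_else: [:exit],
  exit: []
}.freeze

# Like Foo.extend: one branch reaches the exit without calling super.
ONE_BRANCH = {
  entry: [:cond],
  cond: [:call_super, :exit],
  call_super: [:exit],
  exit: []
}.freeze

p every_path_calls_super?(BOTH_BRANCHES, :entry, :exit, [:super_then, :super_else]) # prints true
p every_path_calls_super?(ONE_BRANCH, :entry, :exit, [:call_super])                 # prints false
```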
That's all for now; if you want to learn more and see pretty graphs (graph-theory graphs, that is), be sure to check out my RubyConf 2011 talk!
