Log in

No account? Create an account
The Rust object system struggles to its feet (plus, a little brainteaser) - Lindsey Kuper [entries|archive|friends|userinfo]
Lindsey Kuper

[ website | composition.al ]
[ userinfo | livejournal userinfo ]
[ archive | journal archive ]

The Rust object system struggles to its feet (plus, a little brainteaser) [Jul. 23rd, 2011|07:56 pm]
Lindsey Kuper
[Tags|, ]

When I last wrote about my work on Rust a month ago, I had just finished implementing some very basic support for extending Rust objects with new methods. In the grand tradition of computer science publishing, the example program I gave was one that very carefully avoided doing any of the things that didn't work yet. The most egregious issue was that if you extended an object with new methods, you could, um, no longer call its original methods. (Really. Well, I had to start somewhere.) Even once that was fixed, a lot of things were wrong. Outer methods couldn't refer to inner ones, since object extension didn't play nicely with self. And method overriding didn't work at all.

In the last month, I've made all of those things work to some extent. For instance, programs like these two from the Rust test suite that exercise the features I just mentioned can now compile, run, and exit normally, having behaved as expected. (And do, on three platforms, whenever someone lands a change to the compiler.) Hooray!

A lot of work remains to be done on the object system, of course, including a pretty huge issue that I've been putting off fixing. Everyone loves a brainteaser, right? Consider the following Rust program:

use std;

fn main() {

    obj cat() {
        fn ack() -> str {
            ret "ack";
        fn meow() -> str {
            ret "meow";
        fn zzz() -> str {
            ret self.meow();

    auto shortcat = cat();

    auto longcat = obj() {
        fn lol() -> str {
            ret "lol";
        fn nyan() -> str {
            ret "nyan";
        with shortcat

    assert (shortcat.zzz() == "meow");
    assert (longcat.zzz() == "meow");

Here we've got an object with three methods, shortcat, that's created by calling the cat() object constructor. When the call to shortcat.zzz() is made, we look up zzz in shortcat's vtable and run the code associated with it, which directs us to look up meow in shortcat's vtable and run the code associated with it. Finally, that code returns the string "meow", and the first assertion succeeds.

Now consider the call to longcat.zzz(). The longcat object was created by extending shortcat with two new methods, lol and nyan. Neither one of the new methods overrides any of shortcat's methods, and neither one should interfere with zzz or meow. So we would expect the call to longcat.zzz() to return "meow", too. Instead, in the current buggy implementation, it returns something else, and the assertion fails.

So, the brainteaser: can you guess what longcat.zzz() returns instead of "meow", and why? (Note that the program compiles with no errors or warnings, doesn't segfault when run, and doesn't raise any errors under Valgrind.)

Scroll down for the answer and an explanation...

This is entirely Paul Stansifer's fault.

First, note that longcat has five entries in its vtable, appearing in alphabetical order: ack, lol, meow, nyan, zzz. Of those, lol and nyan are "normal" entries, and the others are "forwarding functions" that, roughly speaking, forward us along to the appropriate vtable entries from shortcat. (Implementing this forwarding behavior has taken me much of the last month.) So, when we call longcat.zzz(), we get forwarded along to shortcat's vtable entry for zzz, which contains the compiled code for ret self.meow().

This is where the problem arises: the code we're looking at was compiled under the assumption that self is shortcat, since of course that's what it was when shortcat's vtable was created. shortcat's vtable only has three methods, ack, meow, and zzz, in that order. The call to self.meow() therefore compiled to, roughly, "do whatever the second slot in my vtable says to do". But since self is now longcat, the second slot in its vtable is not meow but lol. So longcat.zzz() returns "lol", and the assertion fails. Interestingly, as I mentioned, this code runs cleanly under Valgrind -- no memory corruption or anything like that. Furthermore, the assertion might have accidentally succeeded if I'd chosen names for the methods that alphabetize differently. So this is a truly nasty bug.

To fix the mismatch, we need the self in shortcat.zzz() to have the same type as shortcat does. By "same type", I mean that its vtable must have the same number of methods, with the same names, in the same order, and having the same types as the original. But self is supposed to be longcat at that point, so how are we going to manage to give it the same type as shortcat? The answer, unsurprisingly, is another level of indirection. We give self a vtable that has three backwarding functions, one for each of the original three methods ack, meow, and zzz, which point to the corresponding entries in longcat's vtable. From there, of course, they will forward right back to shortcat's vtable -- unless longcat has overridden one of them, in which case we'd use the overriding entry from longcat. (There's no overriding happening in this example, but we need a solution that would work in case there were.) With luck, I'll land this backwarding behavior in Rust sometime in the next week. Stay tuned for the next exciting episode of Lindsey Implements An Object System!


[User Picture]From: gwillen
2011-07-24 04:46 am (UTC)
Heh, I guessed the bug correctly, but didn't assume the compiler would alphabetize the methods, so I was guessing "the address of the string 'hello world'".

Is it clearly decided and set, within the semantics of Rust, that a self call in a base class method will call the overriding subclass method if one exists? Because if not, I would treat that decision with some care. I suggest reading e.g. this paper by Aldrich, which explains why such /open recursion/ semantics can be dangerous and undesirable:

(Reply) (Thread)
[User Picture]From: gwillen
2011-07-24 04:50 am (UTC)
Also take a look at Peter Norvig's classic HashtableWithPlurals example in Java:

(Reply) (Parent) (Thread)
[User Picture]From: graydon
2011-07-24 06:21 am (UTC)
Not Decided And Set but it's a decent enough starting position, I think, for doing overriding. You always run the risk of confusing *someone* when you override; the hashtable with plurals thing is cute, sure, but who's to say (even if the base class didn't do that) that randomly sticking an "s" on everything inserted is safe wrt. all other invariants? Overriding is a crapshoot at the best of times; you do it to attempt reuse, at the cost of coupling to someone else's logic...

In any case, I prefer not to overdesign; the Rust object system is intentionally minimal for now, but if we stub our toes on the issue enough times, we'll add workarounds (say, an inner.foo() for wrapped redispatch, similar to 'super', and a self::foo() non-virtual call form).
(Reply) (Parent) (Thread)