?

Log in

No account? Create an account
The Rust object system struggles to its feet (plus, a little brainteaser) - Lindsey Kuper [entries|archive|friends|userinfo]
Lindsey Kuper

[ website | composition.al ]
[ userinfo | livejournal userinfo ]
[ archive | journal archive ]

The Rust object system struggles to its feet (plus, a little brainteaser) [Jul. 23rd, 2011|07:56 pm]
Lindsey Kuper
[Tags|, ]

When I last wrote about my work on Rust a month ago, I had just finished implementing some very basic support for extending Rust objects with new methods. In the grand tradition of computer science publishing, the example program I gave was one that very carefully avoided doing any of the things that didn't work yet. The most egregious issue was that if you extended an object with new methods, you could, um, no longer call its original methods. (Really. Well, I had to start somewhere.) Even once that was fixed, a lot of things were wrong. Outer methods couldn't refer to inner ones, since object extension didn't play nicely with self. And method overriding didn't work at all.

In the last month, I've made all of those things work to some extent. For instance, programs like these two from the Rust test suite that exercise the features I just mentioned can now compile, run, and exit normally, having behaved as expected. (And do, on three platforms, whenever someone lands a change to the compiler.) Hooray!

A lot of work remains to be done on the object system, of course, including a pretty huge issue that I've been putting off fixing. Everyone loves a brainteaser, right? Consider the following Rust program:

use std;

fn main() {

    obj cat() {
        fn ack() -> str {
            ret "ack";
        }
        fn meow() -> str {
            ret "meow";
        }
        fn zzz() -> str {
            ret self.meow();
        }
    }

    auto shortcat = cat();

    auto longcat = obj() {
        fn lol() -> str {
            ret "lol";
        }
        fn nyan() -> str {
            ret "nyan";
        }
        with shortcat
    };

    assert (shortcat.zzz() == "meow");
    assert (longcat.zzz() == "meow");
}

Here we've got an object with three methods, shortcat, that's created by calling the cat() object constructor. When the call to shortcat.zzz() is made, we look up zzz in shortcat's vtable and run the code associated with it, which directs us to look up meow in shortcat's vtable and run the code associated with it. Finally, that code returns the string "meow", and the first assertion succeeds.

Now consider the call to longcat.zzz(). The longcat object was created by extending shortcat with two new methods, lol and nyan. Neither one of the new methods overrides any of shortcat's methods, and neither one should interfere with zzz or meow. So we would expect the call to longcat.zzz() to return "meow", too. Instead, in the current buggy implementation, it returns something else, and the assertion fails.

So, the brainteaser: can you guess what longcat.zzz() returns instead of "meow", and why? (Note that the program compiles with no errors or warnings, doesn't segfault when run, and doesn't raise any errors under Valgrind.)

Scroll down for the answer and an explanation...

This is entirely Paul Stansifer's fault.

First, note that longcat has five entries in its vtable, appearing in alphabetical order: ack, lol, meow, nyan, zzz. Of those, lol and nyan are "normal" entries, and the others are "forwarding functions" that, roughly speaking, forward us along to the appropriate vtable entries from shortcat. (Implementing this forwarding behavior has taken me much of the last month.) So, when we call longcat.zzz(), we get forwarded along to shortcat's vtable entry for zzz, which contains the compiled code for ret self.meow().

This is where the problem arises: the code we're looking at was compiled under the assumption that self is shortcat, since of course that's what it was when shortcat's vtable was created. shortcat's vtable only has three methods, ack, meow, and zzz, in that order. The call to self.meow() therefore compiled to, roughly, "do whatever the second slot in my vtable says to do". But since self is now longcat, the second slot in its vtable is not meow but lol. So longcat.zzz() returns "lol", and the assertion fails. Interestingly, as I mentioned, this code runs cleanly under Valgrind -- no memory corruption or anything like that. Furthermore, the assertion might have accidentally succeeded if I'd chosen names for the methods that alphabetize differently. So this is a truly nasty bug.

To fix the mismatch, we need the self in shortcat.zzz() to have the same type as shortcat does. By "same type", I mean that its vtable must have the same number of methods, with the same names, in the same order, and having the same types as the original. But self is supposed to be longcat at that point, so how are we going to manage to give it the same type as shortcat? The answer, unsurprisingly, is another level of indirection. We give self a vtable that has three backwarding functions, one for each of the original three methods ack, meow, and zzz, which point to the corresponding entries in longcat's vtable. From there, of course, they will forward right back to shortcat's vtable -- unless longcat has overridden one of them, in which case we'd use the overriding entry from longcat. (There's no overriding happening in this example, but we need a solution that would work in case there were.) With luck, I'll land this backwarding behavior in Rust sometime in the next week. Stay tuned for the next exciting episode of Lindsey Implements An Object System!

LinkReply

Comments:
[User Picture]From: lindseykuper
2011-07-24 10:44 pm (UTC)
Although we hadn't been calling it that, permutation subtyping is the justification for it, yeah. I'm not sure what you mean by "if you have to carefully get the ordering stuff right anyway", though. Which 'you', the language user or the language implementor?
(Reply) (Parent) (Thread)
From: aleffert
2011-07-25 04:35 am (UTC)
I meant the language implementor. I was assuming some sort of fancier structural typing system. The particular case was suppose you have a type with fields A, B, and C. Then you upcast to A, C. The field offsets are now funny unless you do some sort of active cast or you do your lookups via a hash.
(Reply) (Parent) (Thread)
[User Picture]From: lindseykuper
2011-07-25 06:51 am (UTC)
But then it's not just a question of permutation, right? The issue you're describing seems to me to have the same flavor as the one I'm describing in this post, which, as you and others noticed, would need to be dealt with whether or not methods were alphabetized, so I think the the alphabetization stuff is kind of a red herring here. I'm not sure what an "active cast" is -- does wrapping the object that has A, B, C with an object whose vtable only has entries for A and C (as Graydon described) count as one of those?
(Reply) (Parent) (Thread)
From: aleffert
2011-07-25 04:29 pm (UTC)
Well the alphabetization drags in permutation, which I did not necessarily know was part of your system. In class based systems you don't usually worry about it, because you can only pull fields off the end. This way you don't need the multilevel v-table, I think? you're talking about in your post, since field offsets don't change as you go down the inheritance hierarchy, you just get a bigger object with more fields. (Of course if you're in C++ then multiple inheritance gets involved and it all goes to hell).

I'm not sure where I got the terminology active cast so it may be incorrect, but yeah, I meant what Graydon was talking about - a cast that has dynamic effects. Maybe this is also just a dynamic cast? I know C++ has some static/dynamic cast operation that I'm not familiar with the details of.
(Reply) (Parent) (Thread)