?

Log in

No account? Create an account
Perturbation theory - Lindsey Kuper [entries|archive|friends|userinfo]
Lindsey Kuper

[ website | composition.al ]
[ userinfo | livejournal userinfo ]
[ archive | journal archive ]

Perturbation theory [Jul. 3rd, 2008|12:46 pm]
Lindsey Kuper
[Tags|]

A perturbation of a string is a permutation, but with one m removed and one r and one b added. For example, perturbation is a perturbation of permutation.

Challenge!: I'll bake cookies for the first person who can generate a list of all the words in your words file that have at least one valid perturbation. (In this case, a "valid perturbation" is one that also appears in words. Perturbation is undefined for strings that don't have an m to begin with.)

Update, eleven hours later: Thank you for playing! Jesse jes5199 wins the cookies with his Ruby implementation. Peter pmb was hot on his heels with Python. Further versions welcome.

Without looking at Jes or Peter's code, Alex oniugnip and I each tried it ourselves. Alex had it in Python in 20 minutes or so (although he says Peter's is more concise). (Alex is working on a Scheme version now; I'm looking over his shoulder as we speak, and one of the things I see is:

(define test-words
  '("yarmulke" "parmalat" "aardvark" "chutzpah" "intersperse" 
    "permute" "parmalat" "perturb"))

This is perhaps the best thing ever.)

Here's my Perl version (which took a couple of hours, because I'm a slow programmer) and output. My results match Alex's, but differ from Jesse's; it looks like we have different words. In mine, 2070 words have perturbations (wow; I was expecting maybe a hundred). There are 2735 perturbations in all. imitatively -> veritability is one of my favorites!

And I'm more and more envious of you Python kids. I like that you can print anything. And that you can .append to an array.

LinkReply

Comments:
[User Picture]From: catechism
2008-07-03 06:22 pm (UTC)
I can totally do this! It is going to be the most hacktastic python EVER. But I can totally do it. I think.
(Reply) (Thread)
[User Picture]From: catechism
2008-07-03 06:30 pm (UTC)
Though probably I cannot do it *first*. Hmmmm.
(Reply) (Parent) (Thread)
[User Picture]From: jes5199
2008-07-03 06:40 pm (UTC)
the first file only uses words that have no capital letters and no non-letter characters.
the second file allows capitals (though the case is ignored) and non-letter characters.
(Reply) (Parent) (Thread) (Expand)
[User Picture]From: pmb
2008-07-03 09:14 pm (UTC)
#!/usr/bin/env python

words = {}
import sys
for line in sys.stdin:
    word = line.strip()
    sword = "".join(sorted(word))
    if sword not in words:
        words[sword] = []
    words[sword].append(word)

for sword in words:
    if 'm' not in sword:
        continue

    i = sword.index('m')
    nword = sword[:i] + sword[i+1:] + 'rb'
    nword = "".join(sorted(nword))

    if nword in words:
        print ", ".join(words[sword]), '->', ", ".join(words[nword])


when run like
$ python words.py < /usr/share/dict/words
will print out (in 2.6 seconds) the contents of http://soy.dyndns.org/~peter/w.txt
(Reply) (Thread)
[User Picture]From: pmb
2008-07-03 09:17 pm (UTC)
But Jesse wins the cookies :-P
(Reply) (Parent) (Thread)
From: keturn
2008-07-04 12:57 am (UTC)
apparently I need to be a lot quicker to get cookies.

Is there some sort of service I can subscribe to that will send me text messages when there are cookie opportunities?
(Reply) (Thread)
[User Picture]From: lindseykuper
2008-07-04 03:27 am (UTC)
If you implement that service, I will send you cookies.
(Reply) (Parent) (Thread)
[User Picture]From: pixelherder
2008-07-04 05:43 am (UTC)
Eh, I'm late to the party, but since I have strange taste for C++, here's my 20 minute C++ version.

The best run took 0.992s real time on my OS X machine and it found 1,522 perturbations from the 234,936 entry word list. It may not be quite as concise at the Ruby and Python implementations, but I don't think it's that bad either.
(Reply) (Thread)
[User Picture]From: jes5199
2008-07-04 03:03 pm (UTC)
that's actually not bad at all. You've got a real dictionary, you've got sort(), you aren't do any memory management. The type signatures are even reasonable.

The only thing that jumps out as "what the hell?" is the parameters on
transform( sorted.begin(), sorted.end(), sorted.begin(), ::tolower );
begin end begin??? I can't intuit what's going on.

also, I think it's cute that you did
sorted[ m ] = 'r';
saves a byte! a ruby programmer would never think of that.
(Reply) (Parent) (Thread) (Expand)
[User Picture]From: jes5199
2008-07-04 03:24 pm (UTC)
I also did a python version. And I'm sort of fascinated by the little differences from ruby
okay, there's array.append(x) instead of array.push x. Fine, whatever.
then there's no string.split(//). I loaded the regexp library, and it's got re.split(string), but "//" doesn't work. Then I find an example of [letter for letter in word]. woah. Is that good? Peter doesn't even bother to split his letters into an array, he's got some function that can sort anything. sorted(iterable).
But I didn't know about that. so array.sort() sorts in-place, like ruby's array.sort!, not like array.sort. okay, I can probably remember that.
where's array.join? no? holy crap: "".join(array) that's just crazy talk.
The think I did with Hash.new{|h,k|} is pretty ugly ruby, but the python my_dictionary.setdefault(key, default_value) is basically useless (Peter's version doesn't pretend that dictionaries should have default values.)
one more quirk letters.index('m'), in ruby, that's your standard test of "is that here or not?" because it returns false (nil, actually) if it doesn't find it. In python land, the same method throws an exception if it doesn't find the thing you want, because they've got the pretty if 'm' not in word: syntax.
(Reply) (Thread)
[User Picture]From: pmb
2008-07-04 08:14 pm (UTC)
The default value stuff is because in Python, near as I can tell, a default value is a default value, rather than a copy-constructed instance of the default value. So if you put a mutable object there, you will get burned, because there is only one instance, instead of a constantly refreshing supply of instances.

Also, more things should be immutable by default. I'm starting to think that having const be a keyword in C was a mistake. They should have made a keyword mutable or something like that, with everything being constant by default. Erlang may have got that one right, which is one of the reasons it is queued up for more study (by me) this fall.
(Reply) (Parent) (Thread) (Expand)
[User Picture]From: jes5199
2008-07-04 03:33 pm (UTC)
it's interesting to me that we all came up with the same algorithm. is there another way we could have solved this?

also, I just noticed that in perl-land, you killed the "m" with a regexp and just did a string append for "rb". I didn't think of that! I did those operations on the array of letters.

Peter and Pixelherder have languages that can sort the letters of a string, and so don't even bother splitting 'em.
(Reply) (Thread)
[User Picture]From: lindseykuper
2008-07-06 11:42 pm (UTC)
it's interesting to me that we all came up with the same algorithm. is there another way we could have solved this?

I got a beautiful R5RS Scheme version from Dr. Stone that doesn't use a hash table. The whole thing is implemented with lists. It's the same algorithm, though, but a different data structure. You could think of it as slow hashing, maybe.

Both he and Alex had one optimization that shaves off a little time: don't add words that don't have an r and a b to the list of hash keys in the first place since they're not possible perturbations.

also, I just noticed that in perl-land, you killed the "m" with a regexp and just did a string append for "rb".

Why didn't I just do s/m/rb/? Sheesh.

(Reply) (Parent) (Thread)
[User Picture]From: pixelherder
2008-07-04 07:16 pm (UTC)
By the way, my favorite perturbation is probably make -> break.
(Reply) (Thread)
[User Picture]From: lindseykuper
2008-07-06 11:43 pm (UTC)
That one is awesome! I like madness -> drabness, too.
(Reply) (Parent) (Thread)
From: boojum
2008-07-06 03:57 pm (UTC)
I am way too late to the party on this, but working on it was fun. More, please!

(The difference between searching in a list and searching in the keys of a hash table is the difference between "ran for more than an hour and got bored" and "under three minutes". Hooray for actual application of big-O notation.
(Reply) (Thread)
From: (Anonymous)
2008-07-16 05:57 am (UTC)

haskell vixie

module Main where
import qualified Data.Map as DM (Map, fromList, toList, filter, map, lookup)
import Data.List (elem, delete, sort, map)

perturb str = sort ('r':'b':(delete 'm' str))

unJust (Just z) = z
filterSomethings = DM.map unJust . DM.filter (/= Nothing)

lookupIn = flip DM.lookup

main = do
  contents <- getContents
  let allWords       = lines contents
  let mWords         = filter ('m' `elem`) $ allWords
  let candidates     = DM.fromList $ zip mWords (map perturb mWords)
  let wordIndex      = DM.fromList $ zip (map sort allWords) allWords
  let perturbations  = filterSomethings $ DM.map (lookupIn wordIndex) candidates
  sequence $ map (\(a,b) -> putStrLn (a ++ " \t-> " ++ b)) $ DM.toList perturbations

-- misterbeebee
(Reply) (Thread)