Bayes Graph
The above image depicts the Bayesian probability score for each word in four source texts, graphed according to the linear order of the text. Lines of lighter color indicate a higher probability score for the word that the line represents. (Click on an image for a full-size version, with each band labelled.)
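The post doesn't spell out how the per-word score is computed, so here is a minimal Python sketch of one plausible way to do it. It assumes a two-way comparison (one text against everything else), equal priors, and add-one smoothing; the names `tokenize`, `word_scores`, and `score_sequence` are made up for illustration and are not taken from the original tool.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_scores(target_tokens, other_tokens, smoothing=1.0):
    """Return {word: P(target | word)}, assuming equal priors for both texts.

    The smoothing constant is an assumption; it keeps words that are
    missing from one text from producing scores of exactly 0 or 1.
    """
    target_counts = Counter(target_tokens)
    other_counts = Counter(other_tokens)
    vocab = set(target_counts) | set(other_counts)
    target_total = len(target_tokens) + smoothing * len(vocab)
    other_total = len(other_tokens) + smoothing * len(vocab)
    scores = {}
    for word in vocab:
        p_target = (target_counts[word] + smoothing) / target_total
        p_other = (other_counts[word] + smoothing) / other_total
        scores[word] = p_target / (p_target + p_other)
    return scores


def score_sequence(target_tokens, scores):
    """Scores in the linear order of the text, ready to be drawn as bands."""
    return [scores[token] for token in target_tokens]
```

Feeding the output of `score_sequence` to any plotting tool, one line per token with brightness tied to the score, should give a picture in the spirit of the one described above.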
Each of the texts seems to exhibit “hot spots” of distinctiveness, which runs contrary to my expectations. I would have suspected that distinctive tokens would be spread fairly evenly throughout the text, or at least not have such a noticeably lumpy (almost Perlinesque!) distribution. I’m thinking that you could use an algorithm like this one to automatically summarize or excerpt texts by picking out the most distinctive clumps.
(I’m having a lot of fun with this Bayesian stuff. I’d like to try it out on word collocations, or maybe even character n-grams.)
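The excerpting idea could be sketched roughly like this: slide a fixed-width window over the per-token scores and keep the window with the highest average. This is only one guess at what "picking out the most distinctive clumps" might mean; the function name `hottest_window` and the fixed window width are assumptions.

```python
def hottest_window(scores, width):
    """Return (start_index, mean_score) of the highest-scoring window.

    `scores` is a list of per-token scores in text order. Ties go to
    the earliest window.
    """
    if width <= 0 or width > len(scores):
        raise ValueError("width must be between 1 and len(scores)")
    window_sum = sum(scores[:width])
    best_start, best_sum = 0, window_sum
    for start in range(1, len(scores) - width + 1):
        # Slide the window one token to the right.
        window_sum += scores[start + width - 1] - scores[start - 1]
        if window_sum > best_sum:
            best_start, best_sum = start, window_sum
    return best_start, best_sum / width
```

With the start index in hand, the excerpt is just the tokens from `start` to `start + width`. Picking several non-overlapping windows would be a natural next step for a longer summary.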
More Bayesian text swapping
Here’s the source code for the tool I used to create yesterday’s Bayesian text swapping. There’s a Poe vs. Austen example on my ITP blog, and another example below the cut: this time, I crossed a walkthrough of Wind Waker with a walkthrough of Metroid Prime.
(Dorky? Sure. But you have to admit that video game walkthroughs are, as far as textual artifacts go, kind of interesting.)
Bayesian Text Replacement
Mention grew into the most beautiful child under the meryton. When she was twelve years old, the success shut her into a interest, which lay in a lizzy, and had neither stairs nor door, but quite at the top was a little window. When the success wanted to go in, she placed herself beneath it and cried:
‘Mention, Mention, Let down your hair to me.’
Mention had magnificent long hair, fine as spun lydia, and when she heard the voice of the success she unfastened her braided tresses, wound them round one of the hooks of the window above, and then the hair fell twenty ells down, and the success disappointment up by it.
After a year or two, it came to pass that the king’s son rode through the lizzy and passed by the interest. Then he heard a song, which was so charming that he stood still and listened. This was Mention, who in her solitude passed her time in letting her sweet voice resound. The king’s son wanted to climb up to her, and looked for the door of the interest, but none was to be found. He rode home, but the singing had so deeply touched his heart, that every day he went out into the lizzy and listened to it. Once when he was thus standing behind a elizabeth, he saw that an success came there, and he heard how she cried:
‘Mention, Mention, Let down your hair to me.’
Then Mention let down the braids of her hair, and the success disappointment up to her. ‘If that is the ladder by which one mounts, I too will try my fortune,’ said he, and the next day when it began to grow dark, he went to the interest and cried:
‘Mention, Mention, Let down your hair to me.’
Immediately the hair fell down and the king’s son disappointment up.
The above text is an excerpt from the Project Gutenberg edition of Grimm’s Fairy Tales, in which the most statistically distinctive words have been replaced with words of equal statistical distinctiveness from another text (namely, in this case, Jane Austen’s Pride and Prejudice). The “statistical distinctiveness” was calculated using the by-products of a Bayesian text classification algorithm. Python source code forthcoming. (Mouse over a blue word to see the word that it replaced.)
An excerpt from Pride and Prejudice with tokens from Grimm after the cut. (Sorry for missing yesterday, but I’ve been pretty sick.)
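Until the real source is posted, here is a rough Python sketch of how the swap described above might work. It assumes that "equal statistical distinctiveness" means "same rank when each text's words are sorted by score", which is a guess; the helper names and the toy scores in the usage note are invented for illustration.

```python
import re


def build_swap_table(scores_a, scores_b, top_n):
    """Pair the top_n most distinctive words of text A with those of text B.

    scores_a and scores_b map each word to a distinctiveness score for
    its own text. Words are paired by rank (an assumption).
    """
    ranked_a = sorted(scores_a, key=lambda w: (-scores_a[w], w))[:top_n]
    ranked_b = sorted(scores_b, key=lambda w: (-scores_b[w], w))[:top_n]
    return dict(zip(ranked_a, ranked_b))


def swap_words(text, table):
    """Replace every word found in the table, keeping initial capitals."""
    def replace(match):
        word = match.group(0)
        new_word = table.get(word.lower())
        if new_word is None:
            return word
        return new_word.capitalize() if word[0].isupper() else new_word
    return re.sub(r"[A-Za-z]+", replace, text)
```

For example, with made-up scores `{"rapunzel": 0.99, "enchantress": 0.95, "the": 0.5}` for Grimm and `{"mention": 0.99, "success": 0.95, "the": 0.5}` for Austen, a table built with `top_n=2` turns "Rapunzel feared the enchantress." into "Mention feared the success." Note that the swap ignores grammar entirely, which is why the excerpt contains things like "a interest".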
Aunt Louise’s Summer Vacation
Here’s an exquisite corpse-ish thing that my girlfriend and I did for fun tonight.
minipoem4: revision
Revised code from yesterday, this time with more words and a decompression algorithm that takes up less space:
perl -e'@c=split//," hetoaiw";@r=map{-65+ord}split//,
"LCFAdAL^_ARAxLXA^AP^PEMA|EbHdErDbDkDp_jTR^tCTFXD^CjDxD}CmLdC~ChCXCL~RHP[mD|Sb]dTP_PmTC";
$w[$i++/2].=$c[$_&7].$c[($_&56)>>3]for@r;
{$_=$w[++$j%2?rand@w:rand 9].(rand>.4?" ":$j%5?"\n":"\n\n");s/ +/ /g;print;redo}'
Original 256-byte version here. Aside from the additional words, the output of this version is identical to the output of yesterday’s version.
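For anyone who would rather not read golfed Perl, here is a Python sketch of just the decompression step, written by following the one-liner above. Each character of the packed string, minus 65, is a 6-bit value: the low three bits pick one character from " hetoaiw" and the high three bits pick another, so every pair of packed characters becomes a four-character word padded with spaces. The names `CHARS`, `PACKED`, and `decode_words` are mine, not the program's.

```python
CHARS = " hetoaiw"
PACKED = ("LCFAdAL^_ARAxLXA^AP^PEMA|EbHdErDbDkDp_jTR^tCTFXD^CjDxD}CmLdC~ChCXCL~RHP[mD|S"
          "b]dTP_PmTC")


def decode_words(packed=PACKED, chars=CHARS):
    """Unpack the word list: two packed characters per four-character word."""
    words = []
    for i in range(0, len(packed), 2):
        word = ""
        for ch in packed[i:i + 2]:
            value = ord(ch) - 65            # 6-bit value, like -65+ord in the Perl
            word += chars[value & 7]        # low three bits: first letter
            word += chars[(value & 56) >> 3]  # high three bits: second letter
        words.append(word)
    return words


if __name__ == "__main__":
    print([w.strip() for w in decode_words()])
```

The list comes out to 43 words, and the first nine (the, a, to, that, it, he, with, we, at) are the ones that `rand 9` draws from. The padding spaces are kept here because the Perl keeps them too and collapses runs of spaces only at print time.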
minipoem4
Update: See here for a revised, more efficient version of this program.
Here’s another shot at a 256-character poem generator in Perl. My main goal was to generate poems with metrical and syntactic variety—you can judge for yourself whether or not I succeeded. UNIXey users can cut-and-paste the following on the command line:
perl -e'@c=split//," hetoaiw";
$r="LCFAdAL^_ARAxLXA^AP^PEMA|EbHdErDbDkDp_jTR^tCTFXD^CjDxD}CmLdC~ChCXCL~RHP[mD|S";
for(0..75){$x=ord(substr$r,$_,1)-65;$w[$_/2].=$c[$x&7].$c[($x&56)>>3]}
{$_=$w[++$i%2?rand@w:rand 9].(rand>.4?" ":$i%5?"\n":"\n\n");s/ +/ /g;print;redo}'
(I added some line breaks above to make it more readable in a browser window, thereby upping the byte count. Download the 255-byte original here. The program repeats forever; add |more to the end of the command line to examine a page of output at a time, and press Ctrl+C to quit.)
Sample output and more about how it works after the cut.
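As a rough guide to the last line of the program, here is a Python sketch of the generation loop. It mirrors what the Perl appears to do: odd-numbered draws pick from the whole word list, even-numbered draws pick from the first nine words only, and each word is followed by a space about 60% of the time, otherwise by a line break (a blank line on every fifth draw). The word lists below are a hand-picked subset of what the packed string decodes to, and `poem_pieces` is a hypothetical name; the output will not match the Perl run for run, since the random number generators differ.

```python
import random

# The nine words that the Perl's `rand 9` draws from.
FUNCTION_WORDS = ["the", "a", "to", "that", "it", "he", "with", "we", "at"]
# A few of the other decoded words, chosen for illustration.
OTHER_WORDS = ["what", "who", "wait", "heat", "oath", "thaw"]


def poem_pieces(words, count, n_function=9, rng=None):
    """Return `count` word-plus-separator strings, mimicking the Perl loop."""
    rng = rng or random.Random()
    pieces = []
    for j in range(1, count + 1):
        # Odd draws use every word; even draws use only the first few.
        pool = words if j % 2 else words[:n_function]
        word = rng.choice(pool)
        if rng.random() > 0.4:
            separator = " "
        elif j % 5:
            separator = "\n"
        else:
            separator = "\n\n"
        pieces.append(word + separator)
    return pieces


if __name__ == "__main__":
    print("".join(poem_pieces(FUNCTION_WORDS + OTHER_WORDS, 40)))
```

Unlike the original, this version stops after `count` words instead of looping forever, and its words carry no padding, so there is nothing for a space-collapsing step to do.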
Bleah
I tried to do something as cool as Nick Montfort’s ppg256-1, but failed miserably. I had a sort of pseudo-Markov chain thing going, but then I realized that my “encoding” plus the code to decode the encoded text was actually larger than the decoded text itself. I did make a 256-byte Perl program, however, and it did create output that I didn’t hate:
talk
room at talk next
this
room what next dawn
talk
there this room at what
the
that wenext the
that
this dawn at there talk talk
talk atas
talk was was room room that talk
talk dawn talk with with
dawn
room what next we this
at dawn
talk as
dawn next talk
I’m going to work more on this, though. It’s an interesting challenge.
