Bayes Graph
The above image depicts the Bayesian probability score for each word in four source texts, graphed according to the linear order of the text. Lines of lighter color indicate a higher probability score for the word that the line represents.
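The post doesn't spell out the exact scoring formula, so the following is only a sketch under an assumed formulation: each word's score is the posterior probability that it came from the target text rather than from the other texts pooled together, using equal priors and plain relative frequencies. Tokenizing on lowercase letters and apostrophes is also an assumption. The function returns one score per token, in text order, which is the shape of data a graph like the one above plots.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase and split on runs of letters/apostrophes (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())

def bayes_scores(target_tokens, other_tokens):
    """Return one score per token of the target, in text order.

    Assumed formula (equal priors): score = p_t / (p_t + p_o), where
    p_t and p_o are the word's relative frequencies in the target text
    and in the pooled other texts. A score of 1.0 means the word never
    appears in the other texts.
    """
    t_counts = Counter(target_tokens)
    o_counts = Counter(other_tokens)
    t_total = len(target_tokens)
    o_total = max(len(other_tokens), 1)  # avoid dividing by zero
    scores = []
    for word in target_tokens:
        p_t = t_counts[word] / t_total  # > 0, since word occurs in target
        p_o = o_counts[word] / o_total
        scores.append(p_t / (p_t + p_o))
    return scores

target = tokenize("The cat sat")
others = tokenize("the dog sat the")
print(bayes_scores(target, others))
```

Here "cat" scores 1.0 because it only occurs in the target, while "the" scores lower than "sat" because it is relatively more frequent in the other texts. A real run would likely want smoothing so that rare words don't all saturate at 1.0.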
Each of the texts seems to exhibit “hot spots” of distinctiveness, which runs contrary to my expectations. I had originally suspected that distinctive tokens would be spread fairly evenly throughout the text, or at least would not have a noticeably lumpy (almost Perlinesque!) distribution. I’m thinking that you could use an algorithm like this one to automatically summarize or excerpt texts by picking out the most distinctive clumps.
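One simple way to pick out a "most distinctive clump" is sketched below, assuming you already have a per-word score list like the one graphed above. Defining a clump as a fixed-width window of consecutive words, and the `width` parameter itself, are assumptions made for illustration; other definitions (variable-length runs above a threshold, smoothed peaks) would work too. Because every window has the same width, the highest total score is also the highest mean score.

```python
def most_distinctive_window(scores, width):
    """Return (start, end) of the width-token window with the highest total score.

    end is exclusive. A running sum keeps each slide O(1).
    """
    if width <= 0 or width > len(scores):
        raise ValueError("width must be between 1 and len(scores)")
    window = sum(scores[:width])
    best_sum, best_start = window, 0
    for start in range(1, len(scores) - width + 1):
        # Slide right by one: add the entering score, drop the leaving one.
        window += scores[start + width - 1] - scores[start - 1]
        if window > best_sum:
            best_sum, best_start = window, start
    return best_start, best_start + width

def excerpt(tokens, scores, width):
    """Join the tokens of the most distinctive window into an excerpt string."""
    start, end = most_distinctive_window(scores, width)
    return " ".join(tokens[start:end])

tokens = ["a", "b", "c", "d", "e", "f"]
scores = [0.1, 0.2, 0.9, 0.8, 0.7, 0.1]
print(excerpt(tokens, scores, 3))
```

To get several excerpts rather than one, you could repeatedly take the best window and then exclude its span before searching again.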
(I’m having a lot of fun with this Bayesian stuff. I’d like to try it out on word collocations, or maybe even character n-grams.)
