Purr LaGuerre Considers Unigrams
Purr LaGuerre is a large striped tomcat who, at 18 pounds, is as large as a Maine Coon cat. Other than that he is mostly just cat, except that he can speak to humans when he wants to. Purr LaGuerre’s name comes from long ago, when someone observed that he had more stripes than a master sergeant, making him a perfect war cat. He is also, by his own admission, an authentic cat genius. If he speaks to you, he might remind you of this by mentioning some obscure but correct fact that perfectly illustrates his point.
Purr LaGuerre normally lives with Sean and Amanda in order to enlighten them, but today he had decided to do a little geocaching with BluEyz and Bugsy. After Purr LaGuerre startled the cats of the house by speaking to them in a perfect BluEyz voice, the three geocachers left the house. They proceeded down the street and turned onto the Oak Hill Trail just around the block to show off another of B&B’s geocaches.
After a few minutes, Bugsy said, “We wanted to create a unique code or cipher to make this a little more challenging but we settled on a simple substitution cipher for the GPS standard location.”
“You could always put the location down in Unigrams,” replied Purr LaGuerre as he paused to sharpen his claws on a birch tree.
BluEyz immediately responded, “Unigrams are just percentages of how many times a word or a letter is used in a document. If you count the various characters in some text and then divide each unique character’s count by the total number of characters, you end up with a Unigram, which is just a percentage. The same is true for words. Unigrams can be for characters or words.”
Bugsy added, “And not only that, they are not unique: a standard English text file has lots of Unigrams that are the same. They only indicate tree probability for a Bigram or a Trigram test. Sometimes they are used in speech recognition programs.”
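BluEyz’s definition can be tried out in a few lines of Python. This is only a minimal sketch: the function name `unigrams` and the sample inputs are made up for illustration and are not part of the puzzle.

```python
from collections import Counter

def unigrams(tokens):
    """Return each distinct token's share of the total token count.

    `tokens` may be a string (character Unigrams) or a list of words
    (word Unigrams). The name `unigrams` is a hypothetical helper.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    return {token: n / total for token, n in counts.items()}

# Character Unigrams for a made-up sample word.
print(unigrams("banana"))

# Word Unigrams for a made-up sample sentence.
print(unigrams("the cat saw the bird".split()))
```

Passing a string counts characters; passing a list of words counts words, which matches BluEyz’s remark that Unigrams can be for either.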
Purr LaGuerre thought for a moment and replied, “If you make the data set small enough, then the Unigrams can be easily calculated and can be made unique.” He continued, “Let’s make a list of some dates Americans might know. I’ll give you two, 9-11 and the Fourth of July. The only requirements are that all the digits from 0 to 9 are used at least once and that the 0 and 1 Unigrams be unique.”
The List of Dates Americans Might Know
Orwell’s Big Brother Takes Over
The Attack on America by Al Qaeda
The REAL Millennium Beginning
A Space Odyssey
Ides of March
I Have A Dream – MLK
IRS Tax Day
Apollo 11 Mission (Moon Landing)
Groundhog Day Beginning
Purr LaGuerre then stated, “Because I am an authentic cat genius and also a savant, I will now calculate, accurate to 4 decimal positions and truncating as necessary, the Unigrams in my head.” Presently, he announced, “The dates given are sufficient; the Unigrams for zero and one are different.”
Bugsy, who has nerdlike tendencies, announced he preferred to use his solar-powered calculator and soon asked, “Do I use only the numbers in the dates or do I consider letters, punctuation, dashes, and spaces?”
Purr LaGuerre replied, “Let’s make it easy on your calculator – use only the numbers in the dates and truncate after 4 decimal positions. If you used everything, the ones in Apollo 11 would change the 1’s count and the rest of the characters would increase the total character count, but the Unigrams created could be used in the same way as those created from the numbers only.”
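For anyone following along with something other than a solar-powered calculator, Purr LaGuerre’s rule (digits only, truncated after 4 decimal positions) might be sketched like this. The function name `digit_unigrams` and the sample date are hypothetical; the sample is deliberately not one of the dates on the list.

```python
from collections import Counter
from decimal import Decimal

def digit_unigrams(dates):
    """Digit-only Unigrams, truncated (not rounded) to 4 decimal places.

    Letters, dashes, spaces, and punctuation are ignored. Integer floor
    division does the truncation, so no floating-point rounding sneaks in.
    """
    digits = [ch for date in dates for ch in date if ch in "0123456789"]
    counts = Counter(digits)
    total = len(digits)
    return {
        d: Decimal(n * 10_000 // total) / Decimal(10_000)
        for d, n in sorted(counts.items())
    }

# Made-up sample date: the digits are 1, 1, 2.
for digit, value in digit_unigrams(["11-2"]).items():
    print(digit, value)
```

With this sample the 1 Unigram is 2/3, which truncates to 0.6666 where rounding would give 0.6667, so the difference between the two rules is visible.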
Soon Bugsy exclaimed, “Some of these Unigrams are not unique. What about collisions?”
Purr LaGuerre responded, “Not a problem if you use different multipliers for the Unigrams that collide; they will then be unique. In fact just multiply each Unigram, except for the zero Unigram, by its base number, and they will be unique numbers but they won’t be Unigrams anymore because they aren’t directly percentages.”
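Purr LaGuerre’s fix for collisions can be sketched as follows. The function name `revise` is hypothetical, and so are the input values, except that 0.0597 is back-computed from the .1791 Bugsy reports for the revised 3 Unigram (0.1791 divided by 3), which is an assumption about the intermediate value rather than something the story states.

```python
from decimal import Decimal

def revise(unigrams):
    """Multiply each digit's Unigram by the digit itself.

    The zero Unigram is left alone, since multiplying by 0 would
    erase it. `unigrams` maps digit characters to Decimal values.
    """
    return {
        digit: value if digit == "0" else value * int(digit)
        for digit, value in unigrams.items()
    }

# Hypothetical input: the 2 and 5 Unigrams collide at 0.1000.
sample = {
    "0": Decimal("0.2500"),
    "2": Decimal("0.1000"),
    "3": Decimal("0.0597"),  # assumed, back-computed from .1791
    "5": Decimal("0.1000"),
}
for digit, value in revise(sample).items():
    print(digit, value)
```

The colliding 2 and 5 values come out different after multiplication. That works for a small data set like this one, but it is not a general guarantee: a 1 Unigram of 0.2000 and a 2 Unigram of 0.1000 would both become 0.2000.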
Bugsy then asked, “I get .1791 for the revised 3 Unigram. Is that right?”
Purr LaGuerre impatiently responded, “Yes. I’m glad you can truncate and multiply so easily.”
BluEyz worried aloud, “Geocache puzzles are supposed to be easily solved so that people will be able to find the cache. Let’s not make it so hard that only a few of the very smartest geocachers can find the location.”
The three decided to post the location using revised Unigrams just to see if there are more than a few geocachers who can solve anything. Bugsy’s last thought was to drop the decimal point and present the cache location in standard code notation (groups of five characters). BluEyz’s desire was that someone be able to solve the puzzle and find the cache. Purr LaGuerre had no changes, as he felt the Unigram puzzle perfectly illustrated his cat genius.
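Bugsy’s formatting step, dropping the decimal point and writing the digits in groups of five, might look like the sketch below. The function name `code_groups` and the three input values are made up for illustration; how the revised Unigrams are chosen and ordered for the real location is left to the solver.

```python
def code_groups(numbers, size=5):
    """Drop decimal points and regroup the digits in blocks of `size`.

    `numbers` is a list of strings such as ".1791"; the last group
    may be shorter if the digits do not divide evenly.
    """
    digits = "".join(n.replace(".", "") for n in numbers)
    return " ".join(digits[i:i + size] for i in range(0, len(digits), size))

# Hypothetical values, not the cache location.
print(code_groups([".1791", ".2500", ".0833"]))  # 17912 50008 33
```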
Purr LaGuerre’s Mentally Calculated
Cat Genius Geocache Location In
DD MM.MMM and DDD MM.MMM format
with assumed decimal
17882 38831 34939 61788 14902 38808
95417 93134 31344 17908 95089 52388