What’s 1000 words worth?

Published July 14, 2010

Let’s randomly select 1,000 lines (words) from the dictionary, append the number of bytes in that sample to a file, and repeat the whole thing 500 times.

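for i in {1..500}; do
    # Tag each word with a random 5-digit key, sort on the key, keep the first 1,000 words,
    # strip the key, and count the bytes with wc -c (newlines included).
    # Pass a distinct seed per run: in many awks a bare srand() seeds from the time of day
    # in seconds, so runs starting in the same second would draw the same sample.
    # (The figures below were produced with "%05.0f %s \n", which adds a trailing space per line.)
    awk -v seed="$((RANDOM * 1000 + i))" 'BEGIN {srand(seed)} {printf "%05.0f %s\n", rand()*99999, $0}' /usr/share/dict/words |
        sort -n | head -n 1000 | sed 's/^[0-9]* //' | wc -c | awk '{print $1}' >> ~/sizes.dat
done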
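The sampling step can also be sketched more compactly, assuming GNU coreutils’ shuf is available (it is not installed by default on macOS, where Homebrew’s coreutils package provides it as gshuf). The function name sample_bytes is hypothetical, not from the original post; it draws N distinct lines from a file and prints their total size in bytes, newlines included.

```shell
# Hypothetical helper (not from the original post): print the byte count of
# N lines drawn at random, without replacement, from FILE.
# Assumes GNU coreutils' shuf and bash (for `local`).
sample_bytes() {
    local file=$1 n=$2
    # shuf -n picks n distinct lines (all of them if the file is shorter);
    # wc -c counts bytes including newlines; awk strips the padding BSD wc adds.
    shuf -n "$n" "$file" | wc -c | awk '{print $1}'
}
```

For example, calling sample_bytes /usr/share/dict/words 1000 inside the same 500-iteration loop and appending its output to sizes.dat would stand in for the awk, sort, head, and sed stages.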

Then, in R:

> sizes <- read.table("~/sizes.dat", col.names="bytes")   # the file has no header line
> mean(sizes$bytes)
[1] 11581.83
> sd(sizes$bytes)
[1] 90.32316
> qqnorm(sizes$bytes)
> plot(density(sizes$bytes))
> hist(sizes$bytes, col=rainbow(15, start=.4))
> mean(sizes$bytes) / 1024
[1] 11.31038
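For readers without R, the same summary numbers can be reproduced with awk alone. This is a minimal sketch: summarize_sizes is a hypothetical name, and it assumes sizes.dat holds one number per line with no header line. It uses the n - 1 denominator for the standard deviation, which is what R’s sd() computes, and divides the mean by 1024 as in the last R line.

```shell
# Hypothetical helper (not from the original post): summarize a file with one
# number per line. Prints the mean, the sample standard deviation (n - 1
# denominator, like R's sd()), and the mean divided by 1024.
summarize_sizes() {
    awk '{ x[NR] = $1; s += $1 }
         END {
             if (NR < 2) { print "need at least two values" > "/dev/stderr"; exit 1 }
             m = s / NR
             # Second pass over the stored values avoids cancellation error.
             for (i = 1; i <= NR; i++) d += (x[i] - m) ^ 2
             printf "%.2f %.5f %.5f\n", m, sqrt(d / (NR - 1)), m / 1024
         }' "$1"
}
```

Called as summarize_sizes ~/sizes.dat, it should agree with the R mean, standard deviation, and KiB figure up to rounding. A header line such as "bytes" would be read as a zero, so strip it first if present.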

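11.31k is not a very large picture. (Strictly, each sample also counts one newline per word and, because the awk format string used for this run, "%05.0f %s \n", leaves a space before each newline, one stray space per word as well; without those spaces every sample shrinks by exactly 1,000 bytes, putting the mean at about 10,582 bytes, or 10.33k.) Each of the exploratory plots (normal Q-Q, density, histogram) is larger! Even these pictures of me and my daughter and of the fat giraffes are 13k, 13k, and 15k, respectively: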

[Photos: 13k, 13k, 15k]

Still, here are all the many, many Google image hits for ‘entropy’ pictures that are 128 x 128 pixels. Many of these are in the roughly 11k range.
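As a rough worked check on that comparison (assuming uncompressed 24-bit RGB at 3 bytes per pixel, which the post does not specify), a 128 x 128 image holds

$$128 \times 128 \times 3 = 49{,}152\ \text{bytes} = 48\ \text{KiB}, \qquad \frac{49{,}152}{11{,}581.83} \approx 4.24,$$

so an 11k file at that size implies roughly 4:1 compression. Read the other way, the mean sample of 11,581.83 bytes, at one byte per greyscale pixel, would fill only about a 108 x 108 square, since $\sqrt{11{,}581.83} \approx 107.6$.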

Happy Bastille Day.