PDA

View Full Version : Compressing dictionaries



Piglet
02-14-2009, 05:25 PM
I have spent some time wondering about how to compress a dictionary for scrabble or such. The code doesn't need a definition of the word, it just needs to know if the word exists. I've tried several methods but the 'bit-pair' route seems the best.
Basically, start with 26 pointers to tables. Each table entry is 2 bits thus:

00 = does not exist
01 = exists but cannot end a word
10 = exists and may end a word
11 = exists and may or may not end a word.

The table for the third letter is made from the non 00 entries of the last table and is situated after the second letter table. You can easily calculate the size.
I noted that after the third letter, the tables get very small very fast so often it's just 1 or 2 entries.
I have never seen this method used, but then I lead a sheltered life. Anyone know this route already or a better one?

retro
02-15-2009, 06:55 AM
Hmm, interesting.

When I wrote a basic Hangman game for a college project, I used SOWPODS as a word list. That's the official Scrabble word list used here in the UK and many other places, except the USA and Canada who use TWL.

Of course, the problem is that SOWPODS contains 267,751 words, so it is quite a large list! For me, it wasn't a huge problem - I was basically loading the list and selecting one word at random at the beginning of the game, not checking the dictionary constantly.

It is apparently broken up thusly:

2 letter words: 124
3 letter words: 1,292
4 letter words: 5,454
5 letter words: 12,478
6 letter words: 22,157
7 letter words: 32,909
8 letter words: 40,161
9 letter words: 40,727
10 letter words: 35,529
11 letter words: 27,893
12 letter words: 20,297
13 letter words: 13,857
14 letter words: 9,116
15 letter words: 5,757


If you need SOWPODS, try the following:

http://members.ozemail.com.au/~rjackman/wordlist.html

http://www.absp.org.uk/words/words.html

http://home.teleport.com/~stevena/scrabble/wordlists.html

Sorry, that doesn't really answer your question! All I can say is that I've only ever thought about it as "check word exists in DB" rather than looking at each letter and whether it ends a word.

Why not check out an open source Scrabble for pointers? Here are examples:

http://pyscrabble.sourceforge.net/

http://www.gtoal.com/wordgames/scrabble.html