I was interested in a question about words and letter removal. If I take a word, such as 'shoot', can I remove a single letter and still make a word? In this case, yes, of course: 'hoot'. In fact, I can make three other words as well: 'soot', 'shot' and 'shoo'. So how many words contain other words? And which word contains the most words within it?
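To make the removal step concrete, here is a minimal Python sketch for 'shoot'. The helper name `one_letter_removals` and the tiny `words` set are illustrative assumptions, not the real word list or the code on GitHub. Deleting either 'o' gives 'shot', so collecting the results in a set keeps each child only once.

```python
def one_letter_removals(word: str) -> set[str]:
    """Return every distinct string formed by deleting exactly one letter from word."""
    # Deleting either 'o' in "shoot" gives "shot"; the set removes that duplicate.
    return {word[:i] + word[i + 1:] for i in range(len(word))}


# A tiny stand-in for a real word list (an assumption for illustration).
words = {"shoot", "hoot", "soot", "shot", "shoo", "hot", "sot"}

children = sorted(w for w in one_letter_removals("shoot") if w in words)
print(children)  # ['hoot', 'shoo', 'shot', 'soot']
```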
To answer these questions, we must decide what counts as a 'word'. Do we include place names, proper nouns, etc.? Fortunately, someone else has done their level best to answer that question, so I'll just use the so-called 'SOWPODS' English word list, which contains a total of 267,753 words.
- Go through every word in the dictionary:
  - Go through every letter in the word:
    - Remove that letter from the word and check if the result is in the dictionary
      - If it is, store it alongside the origin (parent) word
      - If it isn't, skip it
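The loop above might be sketched in Python roughly like this. It assumes the word list is a plain text file with one word per line (`load_words` and any path you pass it are hypothetical), and it stores each parent's children as a set so that duplicate removals, such as the two ways of getting 'shot' from 'shoot', are stored once.

```python
from pathlib import Path


def load_words(path: str) -> set[str]:
    """Read a word list with one word per line, e.g. a local copy of SOWPODS."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return {line.strip().lower() for line in lines if line.strip()}


def build_children(words: set[str]) -> dict[str, set[str]]:
    """Map each parent word to the words made by removing one of its letters."""
    children: dict[str, set[str]] = {}
    for word in words:                          # every word in the dictionary
        found = set()
        for i in range(len(word)):              # every letter in the word
            candidate = word[:i] + word[i + 1:]
            if candidate in words:              # keep it only if it is a word
                found.add(candidate)
        if found:                               # otherwise skip this word
            children[word] = found
    return children


# Small stand-in word set; a real run might use load_words("sowpods.txt") (hypothetical path).
words = {"shoot", "hoot", "soot", "shot", "shoo", "hot", "sot", "ho", "so"}
tree = build_children(words)
print(sorted(tree["shoot"]))  # ['hoot', 'shoo', 'shot', 'soot']
```

With a full list loaded, `len(tree)` would give the number of words that contain at least one other word.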
I could have used a more efficient, direct approach if I only cared about one specific question, but I wanted to keep the code relatively general so that I could look into a few different interesting properties of these word series.
The code for this is available on GitHub.
Words within words
Running this finds that 129,677 (48.43%) of the words in the list contain at least one other word that can be made by removing a single letter.
This does spit out some pretty peculiar words, such as 'eards', 'sared' and 'yerds'. No idea what they mean or why they're in this word list, but they are valid Scrabble words. It's also notable that the child 'eards' is connected to two of the parent words.
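Given a parent-to-children map, the headline share and a 'most children' ranking take only a few lines. This sketch runs over a small hand-written map (an assumption for illustration, not real SOWPODS output); the last line just checks the arithmetic behind the 48.43% figure.

```python
def share_with_children(words: set[str], children: dict[str, set[str]]) -> float:
    """Fraction of words that have at least one one-letter-removal child."""
    return sum(1 for w in words if children.get(w)) / len(words)


def most_children(children: dict[str, set[str]], top: int = 5) -> list[tuple[str, int]]:
    """Parent words ranked by number of distinct children, ties broken alphabetically."""
    ranked = sorted(children.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(word, len(kids)) for word, kids in ranked[:top]]


# Hand-written example map (hypothetical data).
children = {
    "shoot": {"hoot", "soot", "shot", "shoo"},
    "shot": {"hot", "sot"},
    "hoot": {"hot"},
    "soot": {"sot"},
    "hot": {"ho"},
    "sot": {"so"},
}
words = set(children) | {"shoo", "ho", "so"}

print(most_children(children, top=2))                  # [('shoot', 4), ('shot', 2)]
print(f"{share_with_children(words, children):.2%}")   # 66.67%
print(f"{129_677 / 267_753:.2%}")                      # 48.43%
```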
So we've looked at words with the most children. How about going another level down and looking at words with the most children and grandchildren combined?
This gave one clear winner: spains, with 12 grandchildren. Spains has five immediate children, which share custody over a few of the grandchildren. The connections to the grandchildren are a bit messy, so you can hover over the links to see more clearly which words are connected. I counted only unique grandchild names, though spains wins either way. Why stop there?
Counting children, grandchildren and great-grandchildren together, the winner is steares.
We can keep going for more generations; click the button to cycle through them. Beyond 7 generations, the results stay the same.
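Counting descendants over several generations amounts to a breadth-first walk down the same map. In this sketch (the function names and the example map are hypothetical), each generation is collected into a set, so a grandchild shared by two children is counted once. Words in different generations have different lengths, so they can never collide across generations.

```python
def count_descendants(word: str, children: dict[str, set[str]], generations: int) -> int:
    """Number of unique descendants of word within the given number of generations."""
    total = 0
    frontier = {word}
    for _ in range(generations):
        # The set union merges children shared between parents ("shared custody").
        frontier = set().union(*(children.get(p, set()) for p in frontier))
        if not frontier:
            break
        total += len(frontier)
    return total


def top_by_descendants(children: dict[str, set[str]], generations: int) -> tuple[str, int]:
    """Parent word with the most unique descendants, ties broken alphabetically."""
    word = min(children, key=lambda w: (-count_descendants(w, children, generations), w))
    return word, count_descendants(word, children, generations)


# Hand-written example map (hypothetical data).
children = {
    "shoot": {"hoot", "soot", "shot", "shoo"},
    "shot": {"hot", "sot"},
    "hoot": {"hot"},
    "soot": {"sot"},
    "hot": {"ho"},
    "sot": {"so"},
}

for n in range(1, 5):
    print(n, count_descendants("shoot", children, n))  # 4, 6, 8, 8
print(top_by_descendants(children, 3))                 # ('shoot', 8)
```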
Word chains
This made me curious about the longest chain of words I could make, where removing one letter at a time produces a new word at every step. I first found one such chain of 8 words, starting at 'choreologies' and ending at 'loges'.
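One way to search for the longest chain, sketched here as an assumption rather than the method behind the results above, is dynamic programming over word length: process words from shortest to longest, and give each word its own name plus the longest chain among its one-letter-removal children. The toy word set is illustrative only.

```python
def longest_chain(words: set[str]) -> list[str]:
    """Longest sequence in which each word is the previous word minus one letter."""
    best: dict[str, list[str]] = {}
    # Shorter words come first, so every child's best chain is already known.
    for word in sorted(words, key=lambda w: (len(w), w)):
        chain = [word]
        for i in range(len(word)):
            child = word[:i] + word[i + 1:]
            if child in best and len(best[child]) + 1 > len(chain):
                chain = [word] + best[child]
        best[word] = chain
    return max(best.values(), key=len)


words = {"chart", "chat", "cat", "hat", "at", "shoot", "shot", "hot"}
print(longest_chain(words))  # ['chart', 'chat', 'hat', 'at']
```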
Links search
I also implemented a basic search feature so that I could explore whether there were any particularly interesting chains or groups. Enter a word into the text box and hit 'Search' to find that word and any connected words. From there, you can click on a circle to load up any words connected to it.
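Behind a search like this, one plausible data structure (assumed here, not taken from the site's code) is a pair of indexes: children (one letter fewer) and parents (one letter more). Looking up a word returns its immediate neighbours, and clicking a neighbour is just another lookup. The names `build_links` and `search` are hypothetical.

```python
from collections import defaultdict


def build_links(words: set[str]) -> tuple[dict[str, set[str]], dict[str, set[str]]]:
    """Index every word's children (one letter removed) and parents (one letter added)."""
    children: dict[str, set[str]] = defaultdict(set)
    parents: dict[str, set[str]] = defaultdict(set)
    for word in words:
        for i in range(len(word)):
            child = word[:i] + word[i + 1:]
            if child in words:
                children[word].add(child)
                parents[child].add(word)
    return children, parents


def search(word: str, children: dict[str, set[str]], parents: dict[str, set[str]]) -> dict[str, list[str]]:
    """Immediate neighbours of word: its parents and its children, sorted."""
    return {
        "parents": sorted(parents.get(word, ())),
        "children": sorted(children.get(word, ())),
    }


words = {"shoot", "hoot", "soot", "shot", "shoo", "hot", "sot"}
children, parents = build_links(words)
print(search("shot", children, parents))  # {'parents': ['shoot'], 'children': ['hot', 'sot']}
print(search("hot", children, parents))   # {'parents': ['hoot', 'shot'], 'children': []}
```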