Wordle's addictive nature has caused it to spread like wildfire, captivating players around the world who find themselves entangled in the pursuit of streaks. It has become a quest for daily victories, a battle against time to maintain unbroken chains of triumphant solves. The desire to conquer the next puzzle and extend those streaks has become an obsession that consumes countless hours of human capital.

We're witnessing a peculiar phenomenon: a society on the brink of crumbling under the weight of unsolved Wordle puzzles. Productivity is plummeting, deadlines are missed, and pets and children are neglected while people fret endlessly over deciphering those elusive five-letter words.

In the face of this crisis, I took it upon myself to save our civilization from its Wordle-induced downfall. The solution was simple: I developed a Wordle solver. Yes, a program designed to expedite the solving process and allow people to return to more important tasks without sacrificing their streaks or sanity.

I am certain that a few years from now I'll be recognized as the hero of productivity, wielding the power of math and Python to restore balance in the world. Please don't make my statue too big; I'm quite modest.

Read on if you want to learn about the inner workings of my Wordle solver.

How to play Wordle

In case you're one of the lucky few who hasn't been caught in the grip of Wordle's addictive tendrils, let's go over the rules.

The objective is to guess a five-letter mystery word within six attempts. After each guess, you receive feedback in the form of colored boxes. Green boxes indicate correct letters in the right position, while yellow boxes indicate correct letters in the wrong position. Gray boxes mean the word doesn't contain that letter.

After trying SPOON we know the position of S. We also know that the word must contain an A and an N. The secret word doesn't contain any of the letters E, R, I, O, or P.

There are four more tries left. Feeling anxious already?

The approach

I started by writing Python code to produce a decision tree to tackle Wordle games. Each node within the tree suggests a word to try. The branches represent the possible color patterns that Wordle provides as feedback. By navigating this tree, players can solve the daily Wordle puzzles.

In this post I'll first focus on the code for generating the decision tree. Then I'll present a user interface built from widgets that enables interactive traversal of this tree.

Importing libraries

We start by importing the necessary libraries. Note how we don't import any tree models from `sklearn`, as we'll roll our own.

```python
import json
import numpy as np
import pandas as pd
import ipywidgets as widgets
from sklearn.preprocessing import LabelEncoder
from sklearn.feature_selection import mutual_info_classif
from warnings import simplefilter
from collections import deque
```
Getting the list of possible answers
The following list was taken directly from the Wordle JavaScript source code. The source code actually contains two lists. One is the list of possible answers. The second is a list of valid 5-letter words.
We’ll only use the first list. This will still allow the decision tree to solve every possible puzzle reasonably efficiently. Tree induction will be much faster this way.
We might need to update this list in the future if the New York Times decides to update the possible set of answers. You might also want to update this list if you want to create a decision tree for one of the many Wordle clones.
```python
answers_list = ["cigar","rebut","sissy","humph","awake","blush","focal","evade","naval","serve","heath","dwarf","model","karma","stink","grade","quiet","bench","abate","feign","major","death","fresh","crust","stool","colon","abase","marry","react","batty","pride","floss","helix","croak","staff","paper","unfed","whelp","trawl","outdo","adobe","crazy","sower","repay","digit","crate","cluck","spike","mimic","pound","maxim","linen","unmet","flesh","booby","forth","first","stand","belly","ivory","seedy","print","yearn","drain","bribe","stout","panel","crass","flume","offal","agree","error","swirl","argue","bleed","delta","flick","totem","wooer","front","shrub","parry","biome","lapel","start","greet","goner","golem","lusty","loopy","round","audit","lying","gamma","labor","islet","civic","forge","corny","moult","basic","salad","agate","spicy","spray","essay","fjord","spend","kebab","guild","aback","motor","alone","hatch","hyper","thumb","dowry","ought","belch","dutch","pilot","tweed","comet","jaunt","enema","steed","abyss","growl","fling","dozen","boozy","erode","world","gouge","click","briar","great","altar","pulpy","blurt","coast","duchy","groin","fixer","group","rogue","badly","smart","pithy","gaudy","chill","heron","vodka","finer","surer","radio","rouge","perch","retch","wrote","clock","tilde","store","prove","bring","solve","cheat","grime","exult","usher","epoch","triad","break","rhino","viral","conic","masse","sonic","vital","trace","using","peach","champ","baton","brake","pluck","craze","gripe","weary","picky","acute","ferry","aside","tapir","troll","unify","rebus","boost","truss","siege","tiger","banal","slump","crank","gorge","query","drink","favor","abbey","tangy","panic","solar","shire","proxy","point","robot","prick","wince","crimp","knoll","sugar","whack","mount","perky","could","wrung","light","those","moist","shard","pleat","aloft","skill","elder","frame","humor","pause","ulcer","ultra","robin","cynic","aroma","caulk","shake","dodge","swill","tacit","other","thorn","trove",
"bloke","vivid","spill","chant","choke","rupee","nasty","mourn","ahead","brine","cloth","hoard","sweet","month","lapse","watch","today","focus","smelt","tease","cater","movie","saute","allow","renew","their","slosh","purge","chest","depot","epoxy","nymph","found","shall","harry","stove","lowly","snout","trope","fewer","shawl","natal","comma","foray","scare","stair","black","squad","royal","chunk","mince","shame","cheek","ample","flair","foyer","cargo","oxide","plant","olive","inert","askew","heist","shown","zesty","hasty","trash","fella","larva","forgo","story","hairy","train","homer","badge","midst","canny","fetus","butch","farce","slung","tipsy","metal","yield","delve","being","scour","glass","gamer","scrap","money","hinge","album","vouch","asset","tiara","crept","bayou","atoll","manor","creak","showy","phase","froth","depth","gloom","flood","trait","girth","piety","payer","goose","float","donor","atone","primo","apron","blown","cacao","loser","input","gloat","awful","brink","smite","beady","rusty","retro","droll","gawky","hutch","pinto","gaily","egret","lilac","sever","field","fluff","hydro","flack","agape","voice","stead","stalk","berth","madam","night","bland","liver","wedge","augur","roomy","wacky","flock","angry","bobby","trite","aphid","tryst","midge","power","elope","cinch","motto","stomp","upset","bluff","cramp","quart","coyly","youth","rhyme","buggy","alien","smear","unfit","patty","cling","glean","label","hunky","khaki","poker","gruel","twice","twang","shrug","treat","unlit","waste","merit","woven","octal","needy","clown","widow","irony","ruder","gauze","chief","onset","prize","fungi","charm","gully","inter","whoop","taunt","leery","class","theme","lofty","tibia","booze","alpha","thyme","eclat","doubt","parer","chute","stick","trice","alike","sooth","recap","saint","liege","glory","grate","admit","brisk","soggy","usurp","scald","scorn","leave","twine","sting","bough","marsh","sloth","dandy","vigor","howdy","enjoy","valid","ionic","equal","unset","floor",
"catch","spade","stein","exist","quirk","denim","grove","spiel","mummy","fault","foggy","flout","carry","sneak","libel","waltz","aptly","piney","inept","aloud","photo","dream","stale","vomit","ombre","fanny","unite","snarl","baker","there","glyph","pooch","hippy","spell","folly","louse","gulch","vault","godly","threw","fleet","grave","inane","shock","crave","spite","valve","skimp","claim","rainy","musty","pique","daddy","quasi","arise","aging","valet","opium","avert","stuck","recut","mulch","genre","plume","rifle","count","incur","total","wrest","mocha","deter","study","lover","safer","rivet","funny","smoke","mound","undue","sedan","pagan","swine","guile","gusty","equip","tough","canoe","chaos","covet","human","udder","lunch","blast","stray","manga","melee","lefty","quick","paste","given","octet","risen","groan","leaky","grind","carve","loose","sadly","spilt","apple","slack","honey","final","sheen","eerie","minty","slick","derby","wharf","spelt","coach","erupt","singe","price","spawn","fairy","jiffy","filmy","stack","chose","sleep","ardor","nanny","niece","woozy","handy","grace","ditto","stank","cream","usual","diode","valor","angle","ninja","muddy","chase","reply","prone","spoil","heart","shade","diner","arson","onion","sleet","dowel","couch","palsy","bowel","smile","evoke","creek","lance","eagle","idiot","siren","built","embed","award","dross","annul","goody","frown","patio","laden","humid","elite","lymph","edify","might","reset","visit","gusto","purse","vapor","crock","write","sunny","loath","chaff","slide","queer","venom","stamp","sorry","still","acorn","aping","pushy","tamer","hater","mania","awoke","brawn","swift","exile","birch","lucky","freer","risky","ghost","plier","lunar","winch","snare","nurse","house","borax","nicer","lurch","exalt","about","savvy","toxin","tunic","pried","inlay","chump","lanky","cress","eater","elude","cycle","kitty","boule","moron","tenet","place","lobby","plush","vigil","index","blink","clung","qualm","croup","clink","juicy","stage",
"decay","nerve","flier","shaft","crook","clean","china","ridge","vowel","gnome","snuck","icing","spiny","rigor","snail","flown","rabid","prose","thank","poppy","budge","fiber","moldy","dowdy","kneel","track","caddy","quell","dumpy","paler","swore","rebar","scuba","splat","flyer","horny","mason","doing","ozone","amply","molar","ovary","beset","queue","cliff","magic","truce","sport","fritz","edict","twirl","verse","llama","eaten","range","whisk","hovel","rehab","macaw","sigma","spout","verve","sushi","dying","fetid","brain","buddy","thump","scion","candy","chord","basin","march","crowd","arbor","gayly","musky","stain","dally","bless","bravo","stung","title","ruler","kiosk","blond","ennui","layer","fluid","tatty","score","cutie","zebra","barge","matey","bluer","aider","shook","river","privy","betel","frisk","bongo","begun","azure","weave","genie","sound","glove","braid","scope","wryly","rover","assay","ocean","bloom","irate","later","woken","silky","wreck","dwelt","slate","smack","solid","amaze","hazel","wrist","jolly","globe","flint","rouse","civil","vista","relax","cover","alive","beech","jetty","bliss","vocal","often","dolly","eight","joker","since","event","ensue","shunt","diver","poser","worst","sweep","alley","creed","anime","leafy","bosom","dunce","stare","pudgy","waive","choir","stood","spoke","outgo","delay","bilge","ideal","clasp","seize","hotly","laugh","sieve","block","meant","grape","noose","hardy","shied","drawl","daisy","putty","strut","burnt","tulip","crick","idyll","vixen","furor","geeky","cough","naive","shoal","stork","bathe","aunty","check","prime","brass","outer","furry","razor","elect","evict","imply","demur","quota","haven","cavil","swear","crump","dough","gavel","wagon","salon","nudge","harem","pitch","sworn","pupil","excel","stony","cabin","unzip","queen","trout","polyp","earth","storm","until","taper","enter","child","adopt","minor","fatty","husky","brave","filet","slime","glint","tread","steal","regal","guest","every","murky","share","spore",
"hoist","buxom","inner","otter","dimly","level","sumac","donut","stilt","arena","sheet","scrub","fancy","slimy","pearl","silly","porch","dingo","sepia","amble","shady","bread","friar","reign","dairy","quill","cross","brood","tuber","shear","posit","blank","villa","shank","piggy","freak","which","among","fecal","shell","would","algae","large","rabbi","agony","amuse","bushy","copse","swoon","knife","pouch","ascot","plane","crown","urban","snide","relay","abide","viola","rajah","straw","dilly","crash","amass","third","trick","tutor","woody","blurb","grief","disco","where","sassy","beach","sauna","comic","clued","creep","caste","graze","snuff","frock","gonad","drunk","prong","lurid","steel","halve","buyer","vinyl","utile","smell","adage","worry","tasty","local","trade","finch","ashen","modal","gaunt","clove","enact","adorn","roast","speck","sheik","missy","grunt","snoop","party","touch","mafia","emcee","array","south","vapid","jelly","skulk","angst","tubal","lower","crest","sweat","cyber","adore","tardy","swami","notch","groom","roach","hitch","young","align","ready","frond","strap","puree","realm","venue","swarm","offer","seven","dryer","diary","dryly","drank","acrid","heady","theta","junto","pixie","quoth","bonus","shalt","penne","amend","datum","build","piano","shelf","lodge","suing","rearm","coral","ramen","worth","psalm","infer","overt","mayor","ovoid","glide","usage","poise","randy","chuck","prank","fishy","tooth","ether","drove","idler","swath","stint","while","begat","apply","slang","tarot","radar","credo","aware","canon","shift","timer","bylaw","serum","three","steak","iliac","shirk","blunt","puppy","penal","joist","bunny","shape","beget","wheel","adept","stunt","stole","topaz","chore","fluke","afoot","bloat","bully","dense","caper","sneer","boxer","jumbo","lunge","space","avail","short","slurp","loyal","flirt","pizza","conch","tempo","droop","plate","bible","plunk","afoul","savoy","steep","agile","stake","dwell","knave","beard","arose","motif","smash","broil",
"glare","shove","baggy","mammy","swamp","along","rugby","wager","quack","squat","snaky","debit","mange","skate","ninth","joust","tramp","spurn","medal","micro","rebel","flank","learn","nadir","maple","comfy","remit","gruff","ester","least","mogul","fetch","cause","oaken","aglow","meaty","gaffe","shyly","racer","prowl","thief","stern","poesy","rocky","tweet","waist","spire","grope","havoc","patsy","truly","forty","deity","uncle","swish","giver","preen","bevel","lemur","draft","slope","annoy","lingo","bleak","ditty","curly","cedar","dirge","grown","horde","drool","shuck","crypt","cumin","stock","gravy","locus","wider","breed","quite","chafe","cache","blimp","deign","fiend","logic","cheap","elide","rigid","false","renal","pence","rowdy","shoot","blaze","envoy","posse","brief","never","abort","mouse","mucky","sulky","fiery","media","trunk","yeast","clear","skunk","scalp","bitty","cider","koala","duvet","segue","creme","super","grill","after","owner","ember","reach","nobly","empty","speed","gipsy","recur","smock","dread","merge","burst","kappa","amity","shaky","hover","carol","snort","synod","faint","haunt","flour","chair","detox","shrew","tense","plied","quark","burly","novel","waxen","stoic","jerky","blitz","beefy","lyric","hussy","towel","quilt","below","bingo","wispy","brash","scone","toast","easel","saucy","value","spice","honor","route","sharp","bawdy","radii","skull","phony","issue","lager","swell","urine","gassy","trial","flora","upper","latch","wight","brick","retry","holly","decal","grass","shack","dogma","mover","defer","sober","optic","crier","vying","nomad","flute","hippo","shark","drier","obese","bugle","tawny","chalk","feast","ruddy","pedal","scarf","cruel","bleat","tidal","slush","semen","windy","dusty","sally","igloo","nerdy","jewel","shone","whale","hymen","abuse","fugue","elbow","crumb","pansy","welsh","syrup","terse","suave","gamut","swung","drake","freed","afire","shirt","grout","oddly","tithe","plaid","dummy","broom","blind","torch","enemy","again",
"tying","pesky","alter","gazer","noble","ethos","bride","extol","decor","hobby","beast","idiom","utter","these","sixth","alarm","erase","elegy","spunk","piper","scaly","scold","hefty","chick","sooty","canal","whiny","slash","quake","joint","swept","prude","heavy","wield","femme","lasso","maize","shale","screw","spree","smoky","whiff","scent","glade","spent","prism","stoke","riper","orbit","cocoa","guilt","humus","shush","table","smirk","wrong","noisy","alert","shiny","elate","resin","whole","hunch","pixel","polar","hotel","sword","cleat","mango","rumba","puffy","filly","billy","leash","clout","dance","ovate","facet","chili","paint","liner","curio","salty","audio","snake","fable","cloak","navel","spurt","pesto","balmy","flash","unwed","early","churn","weedy","stump","lease","witty","wimpy","spoof","saner","blend","salsa","thick","warty","manic","blare","squib","spoon","probe","crepe","knack","force","debut","order","haste","teeth","agent","widen","icily","slice","ingot","clash","juror","blood","abode","throw","unity","pivot","slept","troop","spare","sewer","parse","morph","cacti","tacky","spool","demon","moody","annex","begin","fuzzy","patch","water","lumpy","admin","omega","limit","tabby","macho","aisle","skiff","basis","plank","verge","botch","crawl","lousy","slain","cubic","raise","wrack","guide","foist","cameo","under","actor","revue","fraud","harpy","scoop","climb","refer","olden","clerk","debar","tally","ethic","cairn","tulle","ghoul","hilly","crude","apart","scale","older","plain","sperm","briny","abbot","rerun","quest","crisp","bound","befit","drawn","suite","itchy","cheer","bagel","guess","broad","axiom","chard","caput","leant","harsh","curse","proud","swing","opine","taste","lupus","gumbo","miner","green","chasm","lipid","topic","armor","brush","crane","mural","abled","habit","bossy","maker","dusky","dizzy","lithe","brook","jazzy","fifty","sense","giant","surly","legal","fatal","flunk","began","prune","small","slant","scoff","torus","ninny","covey","viper",
"taken","moral","vogue","owing","token","entry","booth","voter","chide","elfin","ebony","neigh","minim","melon","kneed","decoy","voila","ankle","arrow","mushy","tribe","cease","eager","birth","graph","odder","terra","weird","tried","clack","color","rough","weigh","uncut","ladle","strip","craft","minus","dicey","titan","lucid","vicar","dress","ditch","gypsy","pasta","taffy","flame","swoop","aloof","sight","broke","teary","chart","sixty","wordy","sheer","leper","nosey","bulge","savor","clamp","funky","foamy","toxic","brand","plumb","dingy","butte","drill","tripe","bicep","tenor","krill","worse","drama","hyena","think","ratio","cobra","basil","scrum","bused","phone","court","camel","proof","heard","angel","petal","pouty","throb","maybe","fetal","sprig","spine","shout","cadet","macro","dodgy","satyr","rarer","binge","trend","nutty","leapt","amiss","split","myrrh","width","sonar","tower","baron","fever","waver","spark","belie","sloop","expel","smote","baler","above","north","wafer","scant","frill","awash","snack","scowl","frail","drift","limbo","fence","motel","ounce","wreak","revel","talon","prior","knelt","cello","flake","debug","anode","crime","salve","scout","imbue","pinky","stave","vague","chock","fight","video","stone","teach","cleft","frost","prawn","booty","twist","apnea","stiff","plaza","ledge","tweak","board","grant","medic","bacon","cable","brawl","slunk","raspy","forum","drone","women","mucus","boast","toddy","coven","tumor","truer","wrath","stall","steam","axial","purer","daily","trail","niche","mealy","juice","nylon","plump","merry","flail","papal","wheat","berry","cower","erect","brute","leggy","snipe","sinew","skier","penny","jumpy","rally","umbra","scary","modem","gross","avian","greed","satin","tonic","parka","sniff","livid","stark","trump","giddy","reuse","taboo","avoid","quote","devil","liken","gloss","gayer","beret","noise","gland","dealt","sling","rumor","opera","thigh","tonga","flare","wound","white","bulky","etude","horse","circa","paddy","inbox",
"fizzy","grain","exert","surge","gleam","belle","salvo","crush","fruit","sappy","taker","tract","ovine","spiky","frank","reedy","filth","spasm","heave","mambo","right","clank","trust","lumen","borne","spook","sauce","amber","lathe","carat","corer","dirty","slyly","affix","alloy","taint","sheep","kinky","wooly","mauve","flung","yacht","fried","quail","brunt","grimy","curvy","cagey","rinse","deuce","state","grasp","milky","bison","graft","sandy","baste","flask","hedge","girly","swash","boney","coupe","endow","abhor","welch","blade","tight","geese","miser","mirth","cloud","cabal","leech","close","tenth","pecan","droit","grail","clone","guise","ralph","tango","biddy","smith","mower","payee","serif","drape","fifth","spank","glaze","allot","truck","kayak","virus","testy","tepee","fully","zonal","metro","curry","grand","banjo","axion","bezel","occur","chain","nasal","gooey","filer","brace","allay","pubic","raven","plead","gnash","flaky","munch","dully","eking","thing","slink","hurry","theft","shorn","pygmy","ranch","wring","lemon","shore","mamma","froze","newer","style","moose","antic","drown","vegan","chess","guppy","union","lever","lorry","image","cabby","druid","exact","truth","dopey","spear","cried","chime","crony","stunk","timid","batch","gauge","rotor","crack","curve","latte","witch","bunch","repel","anvil","soapy","meter","broth","madly","dried","scene","known","magma","roost","woman","thong","punch","pasty","downy","knead","whirl","rapid","clang","anger","drive","goofy","email","music","stuff","bleep","rider","mecca","folio","setup","verso","quash","fauna","gummy","happy","newly","fussy","relic","guava","ratty","fudge","femur","chirp","forte","alibi","whine","petty","golly","plait","fleck","felon","gourd","brown","thrum","ficus","stash","decry","wiser","junta","visor","daunt","scree","impel","await","press","whose","turbo","stoop","speak","mangy","eying","inlet","crone","pulse","mossy","staid","hence","pinch","teddy","sully","snore","ripen","snowy","attic","going",
"leach","mouth","hound","clump","tonal","bigot","peril","piece","blame","haute","spied","undid","intro","basal","shine","gecko","rodeo","guard","steer","loamy","scamp","scram","manly","hello","vaunt","organ","feral","knock","extra","condo","adapt","willy","polka","rayon","skirt","faith","torso","match","mercy","tepid","sleek","riser","twixt","peace","flush","catty","login","eject","roger","rival","untie","refit","aorta","adult","judge","rower","artsy","rural","shave"]
```
Next we'll turn this list into a `Series` object. We'll also sort the answers.

```python
answers = pd.Series(answers_list).sort_values(ignore_index=True)
```
Wordle’s clues algorithm
After each guess Wordle provides you with clues as to how close your guess was. Instead of colors we’ll use the following encoding:
- F (False), the letter does not appear in the word in any spot;
- P (Position), the letter appears in the word but is in the wrong spot;
- T (True), the letter is in the word and in the correct spot.
The `match` function generates the clues for a guess and an answer.

```python
def match(guess, answer):
    result = ['F'] * 5
    chars = list(answer)

    # Mark correct letter and position as T
    for i in range(5):
        if guess[i] == answer[i]:
            result[i] = 'T'
            chars[i] = '_'

    # Mark correct letter, wrong position as P
    for i in range(5):
        for j in range(5):
            if guess[i] == chars[j] and result[i] == 'F':
                result[i] = 'P'
                chars[j] = '_'

    # All other positions are marked as F by default
    return ''.join(result)
```
Filling the DataFrame
Using the word list and the `match` function we can create the `DataFrame` we'll use to build the decision tree.

```python
# Add column with possible answers
df = answers.to_frame(name="answer")

# Suppress PerformanceWarning
simplefilter(action="ignore", category=pd.errors.PerformanceWarning)

# Add columns for answers
for index, value in answers.items():
    df[value] = df["answer"].map(lambda v: match(value, v))
```
Let’s print the first ten rows to see how things turned out.
```python
df.head(10)
```
answer | aback | abase | abate | abbey | abbot | abhor | abide | abled | abode | ... | wryly | yacht | yearn | yeast | yield | young | youth | zebra | zesty | zonal | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | aback | TTTTT | TTTFF | TTTFF | TTFFF | TTFFF | TTFFF | TTFFF | TTFFF | TTFFF | ... | FFFFF | FPPFF | FFTFF | FFTFF | FFFFF | FFFFF | FFFFF | FFPFP | FFFFF | FFFPF |
1 | abase | TTTFF | TTTTT | TTTFT | TTFPF | TTFFF | TTFFF | TTFFT | TTFPF | TTFFT | ... | FFFFF | FPFFF | FPTFF | FPTTF | FFPFF | FFFFF | FFFFF | FPPFP | FPPFF | FFFPF |
2 | abate | TTTFF | TTTFT | TTTTT | TTFPF | TTFFP | TTFFF | TTFFT | TTFPF | TTFFT | ... | FFFFF | FPFFP | FPTFF | FPTFP | FFPFF | FFFFF | FFFTF | FPPFP | FPFTF | FFFPF |
3 | abbey | TTFFF | TTFFP | TTFFP | TTTTT | TTTFF | TTFFF | TTFFP | TTFTF | TTFFP | ... | FFFFT | PPFFF | PPPFF | PPPFF | PFPFF | PFFFF | PFFFF | FPTFP | FPFFT | FFFPF |
4 | abbot | TTFFF | TTFFF | TTFPF | TTTFF | TTTTT | TTFTF | TTFFF | TTFFF | TTPFF | ... | FFFFF | FPFFT | FFPFF | FFPFT | FFFFF | FPFFF | FPFPF | FFTFP | FFFPF | FPFPF |
5 | abhor | TTFFF | TTFFF | TTFFF | TTFFF | TTFTF | TTTTT | TTFFF | TTFFF | TTPFF | ... | FPFFF | FPFPF | FFPPF | FFPFF | FFFFF | FPFFF | FPFFP | FFPPP | FFFFF | FPFPF |
6 | abide | TTFFF | TTFFT | TTFFT | TTFPF | TTFFF | TTFFF | TTTTT | TTFPP | TTFTT | ... | FFFFF | FPFFF | FPPFF | FPPFF | FPPFP | FFFFF | FFFFF | FPPFP | FPFFF | FFFPF |
7 | abled | TTFFF | TTFFP | TTFFP | TTFTF | TTFFF | TTFFF | TTFPP | TTTTT | TTFPP | ... | FFFPF | FPFFF | FPPFF | FPPFF | FFPPT | FFFFF | FFFFF | FPPFP | FPFFF | FFFPP |
8 | abode | TTFFF | TTFFT | TTFFT | TTFPF | TTFPF | TTFPF | TTFTT | TTFPP | TTTTT | ... | FFFFF | FPFFF | FPPFF | FPPFF | FFPFP | FPFFF | FPFFF | FPPFP | FPFFF | FPFPF |
9 | abort | TTFFF | TTFFF | TTFPF | TTFFF | TTFPT | TTFPP | TTFFF | TTFFF | TTTFF | ... | FPFFF | FPFFT | FFPTF | FFPFT | FFFFF | FPFFF | FPFPF | FFPTP | FFFPF | FPFPF |
10 rows × 2310 columns
Preprocessing
We'll separate the `DataFrame` into features and the target variable. Then we'll use `sklearn`'s `LabelEncoder` to convert the features into numeric values. We'll use the same encoding for all columns, because the tree induction algorithm below uses `sklearn`'s `mutual_info_classif` function, which expects features to be numeric.

```python
X = df.drop(columns="answer")
y = df["answer"]
```
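To see what a single shared encoding across all columns looks like, here is a minimal sketch with made-up two-letter patterns (toy values, not the real five-letter ones): fitting the encoder once on the stacked values guarantees that the same pattern maps to the same number in every column.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy frame: the same categories appear in different columns.
toy = pd.DataFrame({"a": ["FF", "TT"], "b": ["TT", "PP"]})

le = LabelEncoder()
le.fit(toy.stack().unique())      # fit once on values from *all* columns
encoded = toy.apply(le.transform)

print(list(le.classes_))          # ['FF', 'PP', 'TT']
print(encoded["a"].tolist())      # [0, 2] -- 'TT' encodes to 2 in every column
print(encoded["b"].tolist())      # [2, 1]
```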
Let’s prepare the label encoder.
```python
le = LabelEncoder()
le.fit(X.stack().unique())
```
Now we can convert all columns.
```python
X = X.apply(le.transform)
```
Again, let’s print the first few rows to check if the encoding went alright.
```python
X.head()
```
aback | abase | abate | abbey | abbot | abhor | abide | abled | abode | abort | ... | wryly | yacht | yearn | yeast | yield | young | youth | zebra | zesty | zonal | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 237 | 231 | 231 | 214 | 214 | 214 | 214 | 214 | 214 | 214 | ... | 0 | 36 | 18 | 18 | 0 | 0 | 0 | 10 | 0 | 3 |
1 | 231 | 237 | 233 | 217 | 214 | 214 | 216 | 217 | 216 | 214 | ... | 0 | 27 | 45 | 51 | 9 | 0 | 0 | 37 | 36 | 3 |
2 | 231 | 233 | 237 | 217 | 215 | 214 | 216 | 217 | 216 | 215 | ... | 0 | 28 | 45 | 46 | 9 | 0 | 6 | 37 | 33 | 3 |
3 | 214 | 215 | 215 | 237 | 231 | 214 | 215 | 220 | 215 | 214 | ... | 2 | 108 | 117 | 117 | 90 | 81 | 81 | 46 | 29 | 3 |
4 | 214 | 214 | 217 | 231 | 237 | 220 | 214 | 214 | 223 | 225 | ... | 0 | 29 | 9 | 11 | 0 | 27 | 30 | 19 | 3 | 30 |
5 rows × 2309 columns
We can see that the patterns have been replaced by numbers. Every number is an index of a value in the `le.classes_` array. For demonstration purposes, let's look at the first 10 items of this array.

```python
le.classes_[:10]
```
array(['FFFFF', 'FFFFP', 'FFFFT', 'FFFPF', 'FFFPP', 'FFFPT', 'FFFTF',
'FFFTP', 'FFFTT', 'FFPFF'], dtype=object)
Tree induction
Now we’re ready to generate our tree.
Several tree induction algorithms exist. They generally work top down and use some metric to determine which feature to split by at each node. The `sklearn` library provides CART (Classification And Regression Trees), which by default uses the Gini impurity as its metric. Nodes in CART, however, only support binary splits, which would make our decision tree unnecessarily deep and large. Alternative candidates are ID3 and C4.5. These algorithms use a concept known as information gain (the reduction in entropy after splitting by a variable) to determine which feature to split by, and they support non-binary nodes. A few Python libraries implement ID3/C4.5, and we could use one of those. However, we require a bit more control over how we select the best feature, which is why we roll our own tree induction algorithm.

To determine which feature (word) to split by in each node, we use a metric called mutual information, which is the expected value of the information gain. This metric is commonly used for feature selection and is already provided by the `sklearn` library as `mutual_info_classif`. At each node multiple words might be equally good candidates; therefore, in addition to mutual information, we also consider whether a feature could actually be the correct answer, which generally leads to slightly better trees.

The approach I describe here won't guarantee an optimal tree. We're using a greedy strategy to select the most promising features. Selecting less promising features early on might yield better splits at a later stage; however, tree induction would then take much longer. We'll still end up with an efficient tree for solving the daily Wordle; there's just no guarantee it will be the optimal one. Realize that this is true for tree induction algorithms like CART, ID3, and C4.5 as well: they simply make locally optimal choices at each split based on the available data, and these choices may not lead to the overall best tree structure.
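To make the metric concrete, here's a small self-contained illustration (toy data of my own, not the real Wordle patterns): a guess whose feedback pattern distinguishes every remaining answer carries more mutual information than one whose feedback looks the same for almost all of them.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.preprocessing import LabelEncoder

# Four hypothetical remaining answers.
answers = np.array(["bleat", "cleat", "pleat", "crane"])
y = LabelEncoder().fit_transform(answers)

# Encoded feedback patterns for two hypothetical guesses (one column each).
informative = np.array([0, 1, 2, 3])    # a unique pattern per answer
uninformative = np.array([0, 0, 0, 1])  # three answers look identical

X = np.column_stack([informative, uninformative])
mi = mutual_info_classif(X, y, discrete_features=True)

# The fully discriminating guess scores higher.
print(mi[0] > mi[1])
```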
```python
class WordleDecisionTree:
    def fit(self, input, output, labels):
        data = input.copy()
        data[output.name] = output
        self.labels = labels
        self.tree = self.decision_tree(data, data, input.columns, output.name)

    def decision_tree(self, data, original_data, feature_attribute_names, target_attribute_name):
        unique_classes = np.unique(data[target_attribute_name])
        if len(unique_classes) <= 1:
            # only one answer remains; it becomes a leaf
            return unique_classes[0]
        else:
            # determine best feature using mutual information
            stats = dict(zip(feature_attribute_names, mutual_info_classif(
                data[feature_attribute_names],
                data[target_attribute_name],
                discrete_features=True
            )))
            best_feature = max(stats, key=lambda key: stats[key])

            # create tree structure, empty at first
            tree = {best_feature: {}}

            # remove best feature from available features, it will become the parent node
            feature_attribute_names = [i for i in feature_attribute_names if i != best_feature]

            # create nodes under parent node
            parent_attribute_values = np.unique(data[best_feature])
            for value in parent_attribute_values:
                sub_data = data.where(data[best_feature] == value).dropna()

                # with at most two answers left, restrict the candidate
                # guesses to those remaining answers themselves
                remaining_features = np.unique(sub_data[target_attribute_name])
                if len(remaining_features) <= 2:
                    subtree = self.decision_tree(sub_data, original_data, remaining_features, target_attribute_name)
                else:
                    subtree = self.decision_tree(sub_data, original_data, feature_attribute_names, target_attribute_name)

                # add subtree to original tree
                if self.labels[int(value)] != "TTTTT":
                    tree[best_feature][self.labels[int(value)]] = subtree

            return tree
```
We’ll instantiate this class and generate the tree. This took about 12 minutes on my machine.
```python
model = WordleDecisionTree()
model.fit(X, y, le.classes_)
```
The tree is stored as a nested Python dictionary. Let’s convert it to a JSON string and print the first fifteen lines.
```python
print('\n'.join(json.dumps(model.tree, indent=4).split('\n')[:15]))
```
{
"raise": {
"FFFFF": {
"mulch": {
"FFFFF": {
"goody": {
"FFTFF": "known",
"FTFFT": {
"bobby": {
"FTFFT": "poppy"
}
},
"FTFPT": "downy",
"FTFTT": {
"dowdy": {
The root node of this tree holds the best word to start with. Again, remember that we only consider words that could actually be the answer, so we can't claim this is the best possible starting word overall. What we can claim, however, is that some day we'll guess the correct answer on the first try.
```python
print(next(iter(model.tree)))
```
raise
We can determine the max depth of the decision tree.
def depth(d):
    queue = deque([(id(d), d, 1)])
    memo = set()
    while queue:
        id_, o, level = queue.popleft()
        if id_ in memo:
            continue
        memo.add(id_)
        if isinstance(o, dict):
            queue += ((id(v), v, level + 1) for v in o.values())
    return level
depth(model.tree)
11
Every level in this nested dictionary is either a guess or a response, and both the root and the deepest node are guesses. So a depth of 11 shows we can always guess the correct answer in 6 tries or less. I imagine you letting out a huge sigh of relief now.
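To spell out that arithmetic: since the levels alternate between guesses and feedback patterns, starting and ending with a guess, a tree of depth d encodes at most ceil(d / 2) guesses:

```python
# levels alternate guess / feedback, and both the root and the deepest
# node are guesses, so a tree of depth d encodes ceil(d / 2) guesses
max_depth = 11
max_guesses = (max_depth + 1) // 2
print(max_guesses)  # 6
```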
Saving the decision tree
We’ll save the decision tree as a JSON file so we don’t have to rerun the previous steps.
with open('wordle.json', 'w') as fp:
    json.dump(model.tree, fp)
Loading the decision tree
Whenever we want to use the decision tree, we can just load it. Here we load it into the variable tree.
with open('wordle.json', 'r') as fp:
    tree = json.load(fp)
Using the decision tree
The first word to try, as we’ve already seen, is:
print(next(iter(tree)))
raise
After we try raise we might get feedback such as ‘FPFFP’ (F for a gray letter, P for a yellow one, T for a green one). We can use this to get the next guess:
print(next(iter(tree['raise']['FPFFP'])))
cleat
Suppose that after trying this word we get back ‘FTTTT’. We can just expand the chain of keys to find out what the next guess should be.
print(next(iter(tree['raise']['FPFFP']['cleat']['FTTTT'])))
bleat
And so on until the word has been guessed.
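Traversal can also be automated. The sketch below replays the tree against a known answer. The feedback function implements standard Wordle scoring (greens first, then yellows limited by the remaining letter counts), which may differ from the encoding used when the tree was built, so treat it as an illustration rather than a drop-in verifier:

```python
from collections import Counter

def feedback(guess, answer):
    # standard Wordle scoring: T = green, P = yellow, F = gray
    result = ['F'] * 5
    remaining = Counter()
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = 'T'
        else:
            remaining[a] += 1
    for i, g in enumerate(guess):
        if result[i] != 'T' and remaining[g] > 0:
            result[i] = 'P'
            remaining[g] -= 1
    return ''.join(result)

def solve(tree, answer, max_tries=6):
    # walk the tree: a dict node's first key is the next guess,
    # a string node is the final answer itself
    node, guesses = tree, []
    for _ in range(max_tries):
        guess = node if isinstance(node, str) else next(iter(node))
        guesses.append(guess)
        pattern = feedback(guess, answer)
        if pattern == 'TTTTT':
            return guesses
        node = node[guess][pattern]
    return None
```

With the real tree loaded, solve(tree, 'bleat') should reproduce the raise → cleat → bleat chain from the example above, provided the tree’s labels use the same scoring for repeated letters.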
Creating the user interface
Traversing the decision tree like this is a bit cumbersome, so let’s build a GUI with Jupyter widgets to make navigating the tree easier. We’ll display a grid that shows the next guess. Every element in this grid is a button widget; we change the captions on these buttons to display the next word to try, and by clicking the buttons you change their colors to enter the clues Wordle provides as feedback.
We let traverse_tree be equal to the root node of the decision tree, and we start at the top row of the grid.
traverse_tree = tree
active_row = 0
We update the color of the button every time it is clicked. On every update we traverse the tree and find the next guess.
def on_btn_click(b, row, col):
    global traverse_tree
    global active_row
    for r in range(6):
        enable_row(r, r == row)
    if active_row < row:
        traverse_tree = traverse_tree[get_word(row - 1)][get_pattern(row - 1)]
        active_row = row

    # cycle the button color: white -> yellow -> gray -> green -> yellow
    if b.style.button_color == '#FFFFFF':
        b.style.button_color = '#C9B458'
    elif b.style.button_color == '#C9B458':
        b.style.button_color = '#787C7E'
    elif b.style.button_color == '#787C7E':
        b.style.button_color = '#6AAA64'
    elif b.style.button_color == '#6AAA64':
        b.style.button_color = '#C9B458'

    if row < 5:  # the bottom row has no next row to fill
        pattern = get_pattern(row)
        if pattern:
            set_word(row + 1, get_next_word(row).upper())
        else:
            set_word(row + 1, '     ')  # five spaces, one per cell
        enable_row(row + 1, get_word(row + 1).strip())
The following function finds the next guess in the tree.
def get_next_word(row):
    try:
        if isinstance(traverse_tree[get_word(row)][get_pattern(row)], str):
            return traverse_tree[get_word(row)][get_pattern(row)]
        return next(iter(traverse_tree[get_word(row)][get_pattern(row)]))
    except Exception:
        return '     '  # five spaces, one per cell
We want to call on_btn_click() for every button in the interface. A function that handles a button click only takes one argument: the button that was clicked. However, on_btn_click() takes three arguments: the button, the row and the column. To bind on_btn_click() to each button we use some glue code. The following function returns a click handler for every button; each handler captures the row and column in the grid and forwards the call to on_btn_click().
def create_on_btn_click_fun(row, col):
    def _on_btn_click(b):
        on_btn_click(b, row, col)
    return _on_btn_click
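As an aside, the same binding could be achieved without a hand-rolled closure by using functools.partial: on_click passes the clicked button as the only positional argument, so the row and column can be pre-bound as keyword arguments. A small sketch with a toy handler:

```python
import functools

def _demo_click(b, row, col):
    # toy stand-in for on_btn_click: just return the arguments it received
    return (b, row, col)

# equivalent of create_on_btn_click_fun(2, 3): the widget calls
# handler(button), and row/col arrive as the pre-bound keyword arguments
handler = functools.partial(_demo_click, row=2, col=3)
print(handler('button'))  # ('button', 2, 3)
```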
We need a function to display the next guess in a row of the grid.
def set_word(row, word):
    for c in range(5):
        vbox.children[row].children[c].description = word[c]
We want the player to only press buttons in the currently active row. For this we create a function that can enable or disable a row.
def enable_row(row, b):
    for c in range(5):
        vbox.children[row].children[c].disabled = not b
Once the user has completely set the pattern in a row we want to return this pattern. The following function takes care of this. If not all colors have been set yet, it will return None.
def get_pattern(row):
    pattern = []
    for c in range(5):
        if vbox.children[row].children[c].style.button_color == '#FFFFFF':
            return None
        elif vbox.children[row].children[c].style.button_color == '#C9B458':
            pattern.append('P')
        elif vbox.children[row].children[c].style.button_color == '#787C7E':
            pattern.append('F')
        elif vbox.children[row].children[c].style.button_color == '#6AAA64':
            pattern.append('T')
    return ''.join(pattern)
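The color-to-letter mapping could also be factored out into a lookup table; this is a refactoring sketch, not code from the post:

```python
COLOR_TO_LETTER = {
    '#C9B458': 'P',  # yellow: letter present, wrong position
    '#787C7E': 'F',  # gray: letter absent
    '#6AAA64': 'T',  # green: letter in the correct position
}

def pattern_from_colors(colors):
    # returns None while any cell is still white (pattern incomplete)
    if '#FFFFFF' in colors:
        return None
    return ''.join(COLOR_TO_LETTER[c] for c in colors)
```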
We also write a function that returns the word in a row.
def get_word(row):
    letters = []
    for c in range(5):
        letters.append(vbox.children[row].children[c].description)
    return ''.join(letters).lower()
We need to be able to clear the grid and set it up to solve another puzzle.
def on_reset_button_click(b):
    global traverse_tree
    global active_row
    for r in range(6):
        enable_row(r, r == 0)
        for c in range(5):
            b = vbox.children[r].children[c]
            b.style.button_color = '#FFFFFF'
            b.description = ' '
    traverse_tree = tree
    active_row = 0
    set_word(0, next(iter(traverse_tree)).upper())
Now, let’s display the actual grid and a reset button.
vbox_items = []
for r in range(6):
    hbox_items = []
    for c in range(5):
        button = widgets.Button(description=' ',
                                style=dict(button_color='#FFFFFF', font_weight='bold'),
                                layout=widgets.Layout(width='32px', height='32px', border='solid 1px'))
        button.on_click(create_on_btn_click_fun(r, c))
        hbox_items.append(button)
    vbox_items.append(widgets.HBox(hbox_items))
vbox = widgets.VBox(vbox_items)
reset_button = widgets.Button(description="Reset")
reset_button.on_click(on_reset_button_click)
display(vbox, reset_button)
on_reset_button_click(reset_button)
To use this GUI, you simply enter the suggested word into Wordle. By clicking the letters you change the colors and get the next guess.
Taking back control of your life
Use one of the following links to play with an interactive version:
The binder link points to a minimal Voila web app version that just loads the decision tree from file and displays the user interface. The other links include the actual tree induction algorithm.
Now you no longer need to worry about keeping your streak going. Please go back to doing something useful!