Lesson Design, Refined
Designing and refining a complete ESL lesson package
This week’s lessons
Two lessons this week, and they pair nicely — one is about making plans, the other about breaking them. Both follow the full W1–W12 architecture, so if you’re curious about the design behind them, the post below walks through it.
Making Plans (high elementary and above) — organizing time, from team meetings to weekend trips. Future-forms grammar role play and a Talk Zone where students plan a real event.
Cancelling, Postponing & Rescheduling (high elementary and above) — the natural follow-up. Vocabulary, collocations, pronunciation, and a Talk Zone of real scheduling dilemmas.
Download the lessons at the end of the post. Or read on for the methodology.
Lesson Design, Refined
A complete weekly lesson package (vocabulary, listening, reading, pronunciation, critical thinking, role play, answer keys, audio) used to take me weeks of solid work to build. Now a good package can be completed in a day, sometimes two, and occasionally a week when it needs proper consideration and revision. Not because I cut corners; quite the opposite. The format has been refined over years until every piece earns its place, and AI is doing the heavy lifting on the parts that used to drain me: images, formatting, layout, answer keys, audio scripts and the audio itself. I’m doing the parts that actually need a teacher’s judgement: topic, level, pedagogy, what to keep, what to cut.
And the work doesn’t end when the package is published. Every lesson gets refined again and again after it’s been taught — once I’ve seen what worked, what fell flat, what students struggled with, what surprised them. That post-teaching review is where the design actually matures.
The starting point: target vocabulary
Once the topic is chosen, everything hinges on the target vocabulary list. Get this wrong and the whole package is wrong — too easy and students disengage, too hard and they drown.
The list is usually 15 items — not 10, not 20. Fifteen is the sweet spot for a unit: enough to be meaningful, few enough that students can actually retain them across a week of worksheets. And it’s not 15 single words. It’s a mix of words, phrases, and collocations — because real language doesn’t come in single units, and learners who only memorise single words never sound natural.
Every item on the list has to clear four tests:
• Current. Vocabulary has to be up to date — language students will actually encounter in news, on social media, in the world they live in now. Stale textbook vocabulary is one of the biggest reasons ESL materials feel dead.
• Transferable. Each item must work both inside and outside the topic. A word that only makes sense in one narrow context isn’t worth teaching.
• Visual. Every item is visualizable — you just need creativity and imagination. A concrete word is easy: a photo will do. An abstract concept needs a symbol or scene that evokes it — chains breaking for “freedom,” a maze for “confusion,” a handshake for “trust.” The backbone of the list is visual, and visual stretches as far as your imagination does.
• Mixed parts of speech. Nouns, verbs, adjectives, and adverbs in balance — so the list naturally flows into collocation, word-formation, and grammar exercises downstream.
This is the single most important half hour of the entire process. AI assists, but it cannot lead. The list is built by a teacher, for a specific group of learners, with an eye on the week ahead.
The unit’s architecture: the W1–W12 backbone
The unit assembles around a structure I’ve tested relentlessly over years of classroom use. This isn’t a framework I sketched once and stuck with — it’s been pulled apart, reordered, expanded, cut back, and rebuilt more times than I can count. Every worksheet earns its place. Every sequence has been tested against real students in real classrooms.
The full version:
• W1 — Brainstorm (open the unit)
• W2 — Visual vocabulary
• W3 — Classroom discussion / listening as topic introduction
• W4 — Dialogue
• W5–W6 — Vocabulary and language (definitions, collocations, word formation)
• W7 — Personalized questions
• W8 — Critical thinking
• W9 — Grammar/listening in a role play context
• W10 — Pronunciation
• W11 — Data / graphs (visualizing data)
• W12 — Talk Zone role play
The sequence follows a deliberate arc — from receptive to productive, from controlled to free, from single words to full conversations. Students meet the vocabulary first as pictures, then in sentences, then in definitions, then in collocations, then in their own opinions, then in role play. By the end of the unit, they own the language.
On W8 in particular. W8 (critical thinking) has earned its place in the spine more than almost any other worksheet. I’ve deviated from it over the years — tried alternatives — and I’ve always come back. It’s adaptable and flexible. It hits a sweet spot most exercises miss: the language load is manageable for even lower-level students, but the thinking it asks them to do is real. They engage. They discuss. They talk about the ideas — sometimes in English, sometimes in their own language, but always with the concepts the unit has put in front of them. That’s why the W8 skill includes three options each time — so you can pick the version that’s most relevant to your students that week.
Just as important as the sequence is the pace. No single exercise is long. Each one is short enough to hold attention, then hands students off to a different cognitive mode — visual, auditory, written, spoken, analytical, creative. The unit flows through multiple modes of language and thinking, so students never sit too long in one register. That rhythm is what keeps a class engaged across a full week of material.
On W12 in particular. W12 closes every unit, and not by accident. The earlier worksheets build the language; W12 is where students put it to work on something that matters. Each Talk Zone presents a real-world scenario or dilemma — fake health products to evaluate, customer complaints to negotiate, ethical choices to defend, conflicting information to weigh. Students aren’t just rehearsing English; they’re rehearsing decisions. They have to pick a position, justify it with the vocabulary they’ve just learned, and respond to a partner who may push back. That’s the moment when the unit’s language stops being a list of words on a page and starts being something students can actually use to navigate the world.
The essence version. Not every class has time for the full version. In many programs — pharmacy, medical sciences, engineering, business, and other non-language majors — English is a supplementary subject, not a core one. Students are juggling heavier priorities. For those contexts, there’s a leaner version that still holds the design together:
W2 · W3 · W4 · W5 · W6a · W9 · W10 · (W8 or W12 or both)
Eight worksheets at a minimum: visual vocabulary, listening, dialogue, vocabulary exercises, collocations, grammar, pronunciation, plus critical thinking, role play, or both, whichever serves the class better that week. Strip it any further and you start losing the arc: students don’t get from receptive to productive in a meaningful way. This is the floor. Knowing where the floor is (what can be cut and what cannot) takes experience, intuition, and insight.
Flexing for mixed-level classes. The design also flexes laterally. In a mixed-level class — and most real classes are mixed-level — weaker students complete the essence sequence while stronger students push further: deeper word formation (W6b, W6c), both critical thinking and role play (W8 + W12), or extension exercises. Everyone works on the same topic, with the same vocabulary, at the same time. They just go to different depths. The architecture handles that without breaking.
Out-of-the-box additions. Occasionally I’ll add a W13 — a survey, a Dataset Detective, or another non-standard exercise — when the topic calls for variety or when students need a change of pace. These aren’t part of the regular spine, but the design accommodates them. The system is structured, not rigid.
The pedagogical architecture is the part of my expertise AI cannot replicate. The order, the progression, the why of each worksheet, the judgement of what can be cut when time runs out, and the ability to adapt the design for mixed-level classes: that’s thirty years of classroom experience, not something a model can generate.
The production cycle: a day, sometimes two
The six stages of building a weekly lesson package. The cycle never closes — every lesson refines the next.
A good lesson package takes a day of focused work, and it’s a heavy day. Let’s be honest, though: it’s usually more. Two days or longer if the topic deserves real consideration, such as a complex topic, a new format, or an audience I’m still getting to know.
The order matters. Not just for efficiency — for how the topic becomes real as you build:
• Vocabulary first. The 15-item target list. Everything else hangs off it.
• Then the images (W2). Generating the visual vocabulary worksheet brings me into the topic. Until I can see it, I’m still planning. Once the pictures exist, I’m inside the unit.
• Then the audio (W3, W4, W9). Dialogues, listening quiz, grammar role play — with voices and rhythm. This is what brings the topic alive. The unit stops being a plan and starts being something a class can actually do.
• Then the rest of the worksheets. With vocabulary, images, and audio in place, the rest follow more quickly: vocabulary exercises (W5–W6), personalised questions (W7), critical thinking (W8), pronunciation (W10), data/graphs (W11), and the Talk Zone (W12). W1 (the brainstorm) is often built last — it’s the opening of the unit, but easiest to write once everything else exists.
• Then QA and assembly. Answer keys checked, level verified, pedagogy sense-tested. Final PDF binder assembled.
Honest moment: it doesn’t always flow smoothly the first time. There are iterations. The AI gets the level wrong, or sneaks in vocabulary I didn’t ask for, or produces a category that looks plausible but isn’t pedagogically sound. I push back, regenerate, edit. The day includes friction, just much less of it than the weeks it used to take.
The QA pass — where the teacher still matters
This is the section every teacher will recognise. Even with the right skills, the AI gets things wrong:
• Category logic that looks plausible but isn’t. The AI groups “stress” with “anxiety” and “fatigue” — fine. But it also sneaks “motivation” into the same group because they’re all “feelings.” A teacher catches that.
• Vocabulary above the stated level. Level creep is real, especially at pre-intermediate. The AI quietly upgrades a word because it’s a more “interesting” synonym.
• Answer keys with subtle errors. Multiple-choice questions with more than one valid answer. Answer keys that cluster around B and C, or fall into repeating A-B-C-D patterns. Word banks that aren’t shuffled. Definition matches where the distractor is also correct. These are a little embarrassing in the classroom.
• Cultural assumptions. Examples that work in an American classroom but make no sense in Thailand, or vice versa.
This is where thirty years of classroom experience does the work no AI can do. The AI builds the worksheets. The teacher decides whether they’re ready to teach.
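Not every check needs a teacher’s judgement, though. The purely mechanical problems in the answer-key bullet above (letters clustering on B and C, keys that cycle through A-B-C-D) can be flagged automatically, which leaves the human review for the problems that do need judgement. The sketch below is a minimal illustration, not part of the workflow described in this post: the function name `audit_answer_key`, the assumption that a key is a plain string of letters such as "BCBCA", and the 15% `tolerance` threshold are all hypothetical choices.

```python
from collections import Counter


def audit_answer_key(key: str, options: str = "ABCD", tolerance: float = 0.15) -> list[str]:
    """Flag mechanical patterns in a multiple-choice answer key.

    Hypothetical helper: `key` is assumed to be a string of answer letters
    (e.g. "BCBCA"), and `tolerance` is an arbitrary threshold, not a standard.
    """
    warnings: list[str] = []
    key = key.upper()
    n = len(key)
    if n == 0:
        return warnings

    # Check 1: distribution. Warn when a letter's share of answers strays
    # more than `tolerance` from an even split across the options.
    even_share = 1 / len(options)
    counts = Counter(key)
    for letter in options:
        share = counts.get(letter, 0) / n
        if abs(share - even_share) > tolerance:
            warnings.append(f"{letter} is the answer {counts.get(letter, 0)} of {n} times")

    # Check 2: cycles. Warn when the whole key just repeats a short sequence
    # (e.g. "ABCDABCD" or "BCBCBC"), requiring at least two full repeats.
    for period in range(1, len(options) + 1):
        if n >= 2 * period and all(key[i] == key[i % period] for i in range(n)):
            warnings.append(f"key repeats the pattern {key[:period]}")
            break

    return warnings
```

Calling `audit_answer_key("BCBCBCBC")` flags all four letters as unevenly used and reports the repeating BC pattern, while a well-mixed key such as "ACDBBDAC" returns an empty list. A script like this only catches surface patterns; it cannot tell whether a distractor is also correct, which is still the teacher’s job.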
After the classroom: where the lesson actually finishes
A lesson isn’t finished when it’s published. It’s finished after it’s been taught.
Every package gets a second life once it meets real students. Some examples land beautifully; others fall flat. A gap-fill I thought was straightforward turns out to be too difficult. A discussion question I thought would spark debate gets blank stares. A vocabulary item I thought was current is something students have never heard. A role play that works for one class does not engage another class.
So I refine. Notes go straight into the next version. The SKILL.md gets updated. The worksheet gets rewritten. Sometimes a whole section gets dropped because the classroom told me it wasn’t working — and sometimes a small piece gets expanded because students couldn’t get enough of it.
This is the part of lesson design that no AI can do for you. The model can produce a worksheet. Only a teacher in front of real students can tell you whether it actually teaches.
And here’s what years of battle-testing have shown me: the W-system itself works at any level. Beginner, elementary, pre-intermediate, intermediate, upper-intermediate, advanced: the format holds. The architecture doesn’t change. What changes is the content level: the vocabulary becomes harder, the sentences longer, the discussion questions more abstract, the role plays more nuanced. But the bones of the unit (the sequence, the flow, the pedagogical logic) stay the same.
That’s the real return on a decade of refinement. Once the design is solid, you don’t need a different system for every level. You need one well-designed system that scales with the learner.
What this means for teachers
Two takeaways:
A day-long lesson package is real — but it requires several things working together: clear pedagogy, the right skills, and a trained human eye. And it requires a clear mind. Dozens of decisions get made along the way, and tired thinking can lead you astray.
Most teachers won’t build this from scratch — but watching it built is itself the lesson. The point isn’t to copy my workflow exactly. It’s to see that AI-assisted teaching, done seriously, still puts the teacher at the centre.
Next time, I’ll share the data behind why I chose Claude as my AI partner: 22 metrics, tested against ChatGPT and Gemini specifically for ESL work.


