A student has a bug. They open ChatGPT, paste in the error message, and thirty seconds later they have a fix. The code runs. But thirty minutes later — and thirty days later — they still don't understand why the original version didn't work. They don't remember the bug. They don't remember the fix. They just got past it.
This small scene plays out millions of times a day — in programming courses, in self-study, in the first hours of every introductory CS class. And it tells us something important about the difference between getting an answer and building knowledge. A difference that ChatGPT, in its current form, systematically blurs.
The hidden tradeoff: answer vs. understanding
When you learn something new — especially something as abstract as programming — your brain has two different goals that often work against each other. The first is to solve the problem in front of you. The second is to build an internal model that will let you solve similar problems on your own later.
These two goals look the same. They aren't. Solving the current problem can happen externally — through someone else's help, a copy-paste, a tool. Building an internal model can only happen through your own cognitive work.
ChatGPT is remarkably good at the first. Used the default way, it's about as useful for the second as a calculator is for learning multiplication. This isn't a critique of the tool. It's a critique of how we use it.
What ChatGPT is optimized for (and what it isn't)
Large language models weren't trained to teach. They are pretrained to predict text and then tuned to be helpful and pleasant to converse with. RLHF (reinforcement learning from human feedback) rewards responses that satisfy the user in the moment: clear, direct, complete.
For learning, "satisfying in the moment" almost always works against the outcome.
A good teacher often does the opposite of what we want. They don't give an answer; they ask a question. They don't explain everything; they say "you try first." They don't write the code; they help the student recognize their own mistake. ChatGPT, by design, does none of this — unless you push hard for it. And even when you do, it's calibrated to back off quickly: if you say "just give me the answer," it gives it. That isn't pedagogy. That's customer service.
What an LLM is optimized for — satisfying you in the moment — is exactly what makes the learning shallower.
Cognitive Load Theory: why "less effort" often means "less learning"
Cognitive science has a well-studied framework that explains why "easier" isn't the same as "better" — Cognitive Load Theory, developed by John Sweller.
It splits the mental effort of learning into three types:
- Intrinsic load — effort inherent to the material itself. For most beginners, recursion is inherently harder to grasp than a linear loop.
- Extraneous load — effort wasted on things that don't help understanding: a poor explanation, confusing notation, a misleading library.
- Germane load — the effort in which learning actually happens: building a mental model, connecting a new idea to an old one, formulating your own explanation.
The job of a good teacher — human or AI — is to minimize extraneous load, manage intrinsic load, and protect germane load.
ChatGPT does the first two easily: it explains clearly and breaks complexity down well. But it systematically removes germane load too — the very work the student needed to do to learn. When the model writes the explanation, draws the analogy, formulates the conclusion — it does the cognitive work for you. The result looks like learning. It isn't.
The generation effect and desirable difficulty
Another central finding in learning research comes from Robert Bjork and his concept of desirable difficulties. One of them is the generation effect: knowledge you generate yourself sticks better than knowledge you only read. More broadly, knowledge you reached through effort leaves a deeper trace than knowledge that was handed to you.
This is why retrieval practice beats rereading. Why spaced practice beats massed practice. Why a student who makes a mistake and corrects it themselves remembers better than one who gets the answer directly.
This is exactly where ChatGPT, used without a teaching framework, does the most damage. It removes desirable difficulty. It erases the moments where learning would have actually happened. The paradox is brutal: the more "helpful" the tool, the less you learn from it.
ICAP: from passive consumption to interactive engagement
Michelene Chi offers another useful framework — ICAP, which describes four levels of engagement in learning:
- Passive — listening to a lecture, watching a video, reading a ChatGPT answer.
- Active — taking notes, highlighting, copying the code and running it.
- Constructive — writing your own explanation, generating an example, asking a question.
- Interactive — debating, defending a position, explaining to another person.
Chi's ICAP hypothesis, and the studies she reviews in support of it, hold that each higher level tends to produce better learning outcomes. The default way students use ChatGPT when they're stuck ("here's my code, fix it") is Passive, or at best Active if they run what they paste. You receive an answer, read it, ship it. And almost nothing is built inside your head.
A good AI tutor is a tool that moves the learner up the ladder from Passive — one that makes them explain, predict, hypothesize, react to challenges. Not one that spares them that work.
Symptoms of bad AI tutoring
Watch a student who has leaned too heavily on ChatGPT, and you see a specific cluster of symptoms:
- They can produce code but can't explain it.
- They don't recognize their own mistakes in a new context.
- They don't remember solutions they "got past" a week ago.
- They have an inflated sense of competence — the fluency illusion — because everything flows smoothly while the model writes it.
- They freeze when the tool isn't available.
These aren't moral failures. They're structural consequences of the interaction's design. If you give a student a tool that strips away germane load, you get a student without germane learning.
And here's the important point: the problem isn't ChatGPT. The problem is that we use it as a tutor when it wasn't designed to be one.
What good AI tutoring looks like
What would AI tutoring that works look like — tutoring that builds understanding instead of routing around it? From the literature on intelligent tutoring systems, and from my own research, several principles stand out.
1. Diagnose before explaining
Before giving an answer, the AI tutor should figure out where exactly the block is. Does the student misunderstand the problem statement? The concept? Do they hold a wrong mental model? Different blocks call for different interventions.
2. Scaffolding that fades
The first problem can include many hints. The tenth — almost none. The system should fade its support in step with the learner's growing autonomy.
3. Socratic questions, not direct answers
Not "here's why this is wrong," but "what do you expect this line to return? Now run it — what does it actually return? Why the difference?"
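The predict, run, compare sequence in that question can be sketched in a few lines. This is only an illustration, not a tutoring system: `predict_then_run` is a hypothetical helper name, and the snippet under discussion is assumed to be wrapped in a zero-argument function.

```python
from typing import Any, Callable


def predict_then_run(snippet: Callable[[], Any], predicted: Any) -> str:
    """Run the snippet only after the learner has committed to a prediction.

    Hypothetical helper for illustration; returns the tutor's next prompt.
    """
    actual = snippet()
    if actual == predicted:
        return f"It returned {actual!r}, as you predicted. Can you explain why?"
    return (
        f"You predicted {predicted!r}, but it returned {actual!r}. "
        "Why the difference?"
    )


# The learner guesses that range(1, 5) includes 5, then the code is run.
print(predict_then_run(lambda: list(range(1, 5)), predicted=[1, 2, 3, 4, 5]))
```

The design choice that matters here is the order: the prediction is recorded before the code runs, so any mismatch is the learner's own discovery rather than something they were told.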
4. Calibrate the load
Reduce the extraneous. Manage the intrinsic, distribute it across steps. Protect the germane — don't let the student skip past it.
5. Errors as signal
An error isn't a failure — it's the richest diagnostic information you have. A good AI tutor wants to see errors, because they reveal the mental model behind them.
6. Ask for explanation
Before the student moves on, have them explain the solution in their own words. The self-explanation effect is among the better-established findings in research on learning.
7. Faded worked examples
Start with fully worked examples. Then partially completed ones (faded examples). Then a blank page. Research on worked examples suggests this progression is especially effective for novices.
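As a rough sketch of what fading a worked example could look like in practice, the function below hands the learner progressively more of a solution to complete. The name `fade_example`, the "# your turn" placeholder, and the choice to remove steps from the end first are all assumptions made for this illustration; other fading orders are possible.

```python
def fade_example(steps: list[str], blanks: int) -> list[str]:
    """Return the worked example with the last `blanks` steps left to the learner.

    Assumption: steps are faded from the end; this is one option among several.
    """
    if blanks < 0:
        raise ValueError("blanks must be non-negative")
    blanks = min(blanks, len(steps))  # never blank more steps than exist
    kept = steps[: len(steps) - blanks]
    return kept + ["# your turn"] * blanks


# A small worked example: computing an average.
worked = [
    "total = 0",
    "for n in numbers:",
    "    total += n",
    "average = total / len(numbers)",
]

# Fully worked, then partially faded, then a blank page.
for blanks in (0, 2, len(worked)):
    print(f"--- {blanks} step(s) left to the learner ---")
    print("\n".join(fade_example(worked, blanks)))
```

How quickly `blanks` should grow is the real tutoring decision; a real system would presumably tie it to evidence of the learner's progress rather than a fixed schedule.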
Same question, two answers
To make this concrete — picture a first-year student who asks:
Why does this loop run 11 times instead of 10? for (int i = 0; i <= 10; i++)
Default ChatGPT response:
Because the condition is i <= 10, which includes 10. The loop starts at 0 and goes through 10 inclusive — 11 iterations total. If you want 10 iterations, use i < 10.
Clear. Correct. Useful. And almost nothing has been learned.
A well-designed AI tutor's response:
Before we answer — let's check together. How many values can i take if it starts at 0 and the condition is i <= 10? List them. … Now, is the number of iterations equal to the number of values? Why? And if we try to formulate a rule: for i = a; i <= b; i++ — how many iterations? What about i < b?
The first response solves the problem. The second one builds the model — the model that will protect you from the same mistake six months from now, in a completely different piece of code.
A good AI tutor doesn't remove the difficulty. It calibrates it.
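For readers who want to check whatever rule they arrived at, here is a small sketch that counts iterations by brute force and compares the result with the closed forms b - a + 1 (for i <= b) and b - a (for i < b), assuming a <= b and a step of 1. It is written in Python, with a while loop standing in for the C-style for loop in the student's question; the function names are made up for this illustration.

```python
def count_inclusive(a: int, b: int) -> int:
    """Count iterations of: for (i = a; i <= b; i++)."""
    count = 0
    i = a
    while i <= b:
        count += 1
        i += 1
    return count


def count_exclusive(a: int, b: int) -> int:
    """Count iterations of: for (i = a; i < b; i++)."""
    count = 0
    i = a
    while i < b:
        count += 1
        i += 1
    return count


print(count_inclusive(0, 10))  # 11
print(count_exclusive(0, 10))  # 10

# Closed forms, valid when a <= b and the step is 1.
for a, b in [(0, 10), (1, 10), (3, 3), (-2, 5)]:
    assert count_inclusive(a, b) == b - a + 1
    assert count_exclusive(a, b) == b - a
```

Working the rule out by hand first and only then running a check like this keeps the generation step with the learner.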
The path forward
The conclusion of all this isn't "ban ChatGPT in classrooms." That would be naive — and wrong. ChatGPT can be a remarkable learning aid, if the student knows how to use it and if the tool itself is designed to teach, not just to answer.
What we need is the next generation of AI tutors — systems that model what the learner knows and doesn't know; that ask before they answer; that scale their support to competence; that protect the cognitive work where learning actually happens; and that treat error as information, not failure.
This is the field of intelligent tutoring systems, and it's exactly where I'm doing my doctoral research and building the tools at CodeGrade. The goal isn't to make programming easier. The goal is to make learning to program deeper.
Because the question, in the end, isn't whether AI will change education. It already has. The question is whether it will change it toward shallower convenience — or toward deeper understanding. And that depends on what tool we design. What teacher we let into the classroom.