What New Research Says About AI in Classrooms
AI tutoring is already making its way into classrooms, often faster than schools can assess its instructional value. Much of the public conversation centers on whether AI is effective at all. A more useful question for educators is how AI tutoring works, under what conditions it helps, and what kind of instruction it actually supports.
A recent randomized controlled trial conducted by researchers at Google DeepMind and collaborators offers one of the clearest looks so far at AI-supported math tutoring in real classrooms. The findings are careful, limited, and instructive. They also reveal why instructional design matters more than the technology itself.
You can read the full paper here: https://storage.googleapis.com/deepmind-media/LearnLM/learnLM_nov25.pdf
This is important context. The system studied was not a consumer chatbot. It was a purpose-built instructional model called LearnLM, deployed with explicit constraints and human oversight.
What Was Studied
In the study, students using a digital math platform were randomly assigned to one of three types of instructional support:
- Static, pre-written hints
- Text-based tutoring from a human tutor
- AI-generated tutoring responses that were reviewed and approved by a human tutor
The AI did not operate independently. Every AI-generated response was reviewed by a human tutor before being shown to a student. Tutors could approve responses as written, make edits, or replace them entirely.
The intervention took place over multiple weeks within classroom settings. Researchers evaluated learning using three outcome measures: whether students corrected mistakes, whether they resolved underlying misconceptions, and whether they were able to apply what they learned to subsequent problems.
That third outcome, application beyond the immediate task, is the most demanding measure. It is also the hardest to influence through short-term instructional support.
The study design is summarized below, showing how students were assigned to different types of instructional support.

Figure: Study design from “AI tutoring can safely and effectively support students.”
Students were randomly assigned to static hints, human tutoring, or AI-generated tutoring with human review.
What the Results Show
Two findings are straightforward.
First, interactive support outperformed static hints. Students learned more when guidance responded to their thinking. This aligns with long-established research on formative feedback.
Second, AI-supported tutoring performed similarly to human tutoring in terms of immediate outcomes and showed a modest improvement in later application tasks.
The authors write:
“We estimate that receiving support from LearnLM improved a student’s odds of success by a factor of 1.3 relative to human tutors, corresponding to an average treatment effect of +5.5%.”
(LearnLM RCT, p. 9)
The effect size is small, and the confidence intervals are wide. This is not evidence that AI is superior to human instruction. It is evidence that in a text-based tutoring environment, instructional structure carries more weight than the source of the explanation.
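For readers less familiar with odds ratios, the sketch below shows how an odds ratio of 1.3 translates into a change in success probability. The baseline success rate used here (60%) is purely illustrative and is not taken from the paper; the size of the percentage-point change depends on that baseline.

\[
\text{odds} = \frac{p}{1-p}, \qquad p = \frac{\text{odds}}{1+\text{odds}}
\]
\[
p_0 = 0.60 \;\Rightarrow\; \text{odds}_0 = 1.5, \qquad \text{odds}_1 = 1.3 \times 1.5 = 1.95, \qquad p_1 = \frac{1.95}{2.95} \approx 0.66
\]
\[
p_1 - p_0 \approx +6 \text{ percentage points}
\]

The paper's reported +5.5% reflects the study's own baseline; the sketch is only meant to show that an odds ratio of 1.3 is a modest shift, consistent with the small effect size noted above.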
Education coverage has summarized this as AI tutors offering reliable instruction when paired with human oversight.
Read here: https://www.the74million.org/article/ai-tutors-with-a-little-human-help-offer-reliable-instruction-study-finds/
Reliability, however, is not the same as effectiveness.
The Most Important Insight Is Where the AI Struggled
The most revealing part of the study is not the performance comparison. It is the tutors' feedback.
Human reviewers often intervened not because the AI was incorrect, but because it continued too long. Explanations persisted after students were ready to move on. Questioning extended past its instructional usefulness.
The paper notes:
“Tutors often found it necessary to step in when LearnLM’s Socratic questions, while pedagogically sound, persisted longer than a student’s patience.”
(LearnLM RCT, p. 12)
This is not a minor usability issue. It is a pedagogical one.
Good instruction requires judgment about when to probe and when to stop. Knowing when learning has stabilized is as important as eliciting reasoning.
What This Reveals About AI Tutoring
Many AI tutoring tools are built on conversational models. Conversation rewards continuation. Instruction does not.
Effective instruction depends on:
- Clear learning goals
- Alignment to curriculum and sequencing
- Sensitivity to cognitive load
- Timely instructional closure
The study surfaces a core limitation of chat-based AI tutoring. Even sound pedagogical strategies become counterproductive when pacing and stopping rules are misaligned with student readiness.
This is not a model capability problem. It is a design problem.
Where StarSpark Stands Apart From Other AI Tutoring Tools
Many AI tutoring tools begin with a simple premise: add a chat interface to math problems and let the model guide the student.
That approach prioritizes interaction. It does not guarantee instruction.
StarSpark was built around a different assumption. Learning improves when guidance is structured, aligned, and purposeful, not just responsive.
That difference shows up in how StarSpark approaches AI-powered tutoring and instruction:
- Explanations are curriculum-aligned, grounded in what students are actually learning in school, not generated in isolation
- Guidance follows structured reasoning pathways, so students see how ideas connect rather than chasing answers through conversation
- Pacing is intentional, advancing when understanding is demonstrated and stopping when it is sufficient
- Instruction is designed to move learning forward, not to prolong interaction or questioning
In practice, this means StarSpark behaves less like a chatbot and more like a consistent instructional guide.
The goal is not to keep students engaged in conversation. The goal is to help them understand, practice, and progress.
This distinction matters. The study makes clear that even sound pedagogical strategies lose effectiveness when explanations persist beyond their usefulness. StarSpark’s design avoids that trap by making instructional intent, not conversational momentum, the organizing principle.
Learn more about how StarSpark approaches AI math tutoring here: https://starspark.ai/how-it-works
What This Study Supports and What It Does Not
This research supports the idea that AI can participate in instructional support without degrading learning outcomes when carefully constrained.
- It does not support replacing teaching with chatbots.
- It does not suggest that reliability alone leads to learning.
- It does not remove the need for curriculum, structure, or instructional intent.
If anything, it reinforces a truth educators already know well. Instructional design does most of the work.
A Better Question for Schools
The question is not whether AI tutors can explain math problems. This study shows that they can.
The question is whether AI systems are designed to teach in ways that are coherent, aligned, and respectful of how students actually learn.
This study demonstrates what AI tutoring can achieve when guardrails are in place. At StarSpark, those guardrails are the foundation of our platform.
Explore StarSpark With a Free Classroom Pilot
We work with teachers and schools to pilot StarSpark at no cost. Pilots are designed to fit into real classrooms and help educators evaluate how structured AI tutoring can support instruction and student learning.
If you are curious about using StarSpark with your students, we welcome the conversation.
Frequently Asked Questions
- Do AI tutors improve student learning?
Research suggests AI tutors can support learning when carefully designed and supervised, but instructional quality and curriculum alignment matter more than automation.
- Can AI tutors replace teachers?
No. Studies show AI can perform similarly to humans in limited, text-based tutoring environments, but human instruction provides relational and contextual benefits AI cannot replicate.
- What makes AI tutoring effective in classrooms?
Effective AI tutoring is structured, curriculum-aligned, paced appropriately, and designed to support understanding rather than prolong interaction.