In this episode of MedEd Thread, we talk with Dr. Marissa Zhu, Director of Assessment at Wayne State University School of Medicine, to discuss the evolving challenges of preparing medical students for the USMLE Step 1 exam. With pass rates declining since the shift to pass/fail scoring, Dr. Zhu shares insights into how AI-powered tools like Stepwise can help bridge the gap between faculty-authored resources and third-party study materials. Tune in to learn how thoughtful integration of technology and human validation can improve educational outcomes and restore student confidence in formal medical curricula.

Subscribe:    Apple Podcasts    |    Podcast Addict    |    Spotify    |    Buzzsprout

Bridging the Curriculum Gap

Podcast Transcript

Dr. James K. Stoller: 

Welcome to MedEd Thread, a Cleveland Clinic education podcast that explores the latest innovations in medical education and amplifies the tremendous work of our educators across the enterprise.

Dr. Tony Tizzano: 

Hello. Welcome to today's episode of MedEd Thread, an education podcast exploring Stepwise, an AI-powered tool to bridge the gap between faculty-authored resources and Step 1 exam preparation content. I'm your host, Dr. Tony Tizzano, Director of Student and Learner Health here at Cleveland Clinic in Cleveland, Ohio.

Today I am very pleased to have Dr. Marissa Zhu, Director of Assessment at Wayne State University School of Medicine, here to join us. Marissa, welcome to today's podcast.

Dr. Marissa Zhu: 

Thank you. Thank you so much for having me on here, Dr. Tizzano.

Dr. Tony Tizzano: 

It's gonna be our pleasure, I'm sure. So, Marissa, to get us started, can you please tell us a little bit about yourself, your educational background, your role at WSU School of Medicine, and how you got connected with the Cleveland Clinic's MedEd Thread?

Dr. Marissa Zhu: 

Sure. So I am currently the Director of Assessment at Wayne State School of Medicine, and my formal background is a doctorate in educational psychology and technology. It's kind of a dual program where I learned about cognitive psychology, the science behind how people learn, and the role that technology can play in supporting that process.

And how I got started in my current role is that I actually began as a curriculum specialist and eventually was offered the position of Director of Assessment. So this tool kind of bridges both, where I bring in my background in designing curricula and embed the evaluation, the assessment side of it.

Dr. Tony Tizzano: 

Perfect. And you know, I listened to your pedigree here and it sounds perfect for what we're trying to accomplish, because for our listeners, the doubling time of medical knowledge is now about 73 days. Everything ever written in medicine doubles in that amount of time. So how can we focus and augment what we're trying to teach in a way that brings the most important things to the table?

So, for the purpose of framing the topic for our listeners, could you give some context around the importance of enhancing preparation for the Step 1 exam, which is required for medical licensure, beyond what is typically provided in the classroom?

Dr. Marissa Zhu: 

Yes, absolutely. And this is top of mind for our students and also medical students around the nation. Everybody is very concerned about Step 1. Specifically, since the shift to pass/fail in 2022, we've seen nationally that pass rates have dropped from about 95% for MD programs in 2021, right before the shift, to about 89% in 2024.

And so that has certainly impacted our students. It has caused an increased sense of anxiety around Step 1 preparation and some doubt as to whether or not our curriculum was adequately preparing our students. From our side, we want our students to trust the formal curriculum and to spend less time worrying about which tools or resources to use and more time focusing on developing clinical reasoning and deepening their knowledge base.

And we want to ease our students' anxiety. We don't want them to feel like, oh, I have to go and piece together my own curriculum from a hodgepodge of different third-party resources, which is what a lot of students are doing, going outside of the formal curriculum to try to prepare themselves for Step 1.

And as you mentioned, Step 1 is very, very important, and it's been getting harder. This last year, we were analyzing some new data, and we learned that historically there have been self-assessments or practice exams, the CBSE and the CBSSA, that used to be pretty reliable predictors of how well you're going to do on Step 1. But this past year, across the nation, those metrics were not as consistent as they used to be, meaning that a lot of students took a practice exam that showed a score indicating they were ready to take Step 1 and pass, and yet they ended up failing.

And one of the reasons, we think, is that the assessment, Step 1, has actually gotten harder and harder over the years. The questions have gotten longer and a little more complex, and as a field, medicine is constantly emerging and changing. Step 1 is also becoming a little more complex, and these traditional metrics are sort of playing catch-up to try to support student knowledge and student readiness.

Dr. Tony Tizzano: 

Marissa, what do those acronyms stand for, those measurements that you mentioned?

Dr. Marissa Zhu: 

Sure. The CBSE is the Comprehensive Basic Science Examination, and that is an in-person exam that's meant to mimic Step 1 and give our students a sense of their readiness to take Step 1.

And the CBSSA is the Comprehensive Basic Science Self-Assessment. It's basically an exam that students can take on their own time at home using a voucher, but it functions the same way. It gives you an indication of how well prepared you might be for the actual exam.

Dr. Tony Tizzano:

So despite maybe doing reasonably well on those, and having some measure of confidence going in, we're seeing a drop in pass rates.

Dr. Marissa Zhu: 

Yes.

Dr. Tony Tizzano: 

And you know, we devote so much time as students, and I wonder about which side of that equation it is. I know that in many board exams, for the American College of OB/GYN, for example, the questions become more and more aimed at: can you actually take the information learned, the factoids, assimilate them, and make judgments in real-life scenarios and bring them into practice?

Is that where Step 1 is headed too?

Dr. Marissa Zhu: 

Yes, we see this because they keep a bank of retired questions, and you can see that the older questions are simpler. A lot of the older retired questions are memorization-based, more rote knowledge, and the more recent ones are focused on clinically relevant scenarios, problem solving, and clinical reasoning. So that's part of why it's getting more complex.

Dr. Tony Tizzano: 

Yeah. I recall helping my kids, and even myself, with story problems in math. They can do anything with the math problem by itself, but with a story problem, all of a sudden they're choking. But that's really what you're trying to do: apply what you've learned.

And so I wonder if we really need to teach differently. You just got me sidetracked thinking about this. But the other thing is, do we clearly state what information is needed to do well on that exam? We'd always fault someone by saying, oh, you're just teaching to the exam. But if this is what's important, why don't we teach to the exam? That's a question for you.

Dr. Marissa Zhu: 

Exactly. And I'm so glad you brought that up, because we do hear that, and I can resonate with it coming from an educator background, where we do wanna push back against just teaching to the exam, because typically that leads to a more narrow, constrained learning experience.

But in regard to Step 1 and Step 2, those tests are actually designed in a way that facilitates more critical thinking and clinically relevant scenarios. So I think that, specifically in medical education, teaching to the test actually is not a bad thing, because the test is designed to be a lot more robust than typical, traditional standardized assessments.

And so it's actually forcing us to update our curriculum in a way that moves away from dry memorization. Of course, there are still foundational facts, the bugs and drugs, that you need to memorize no matter what. But it's moving away from memorizing facts, factoids, things like that, toward helping students bring it all together and apply what they just learned in a more clinically relevant scenario.

And if we view that as a reason to align our curricula to Step 1, rather than thinking, oh, I just need to teach to the test, and instead recognize that the test is actually pushing for more clinical relevance and problem solving, then I think that is a shift some of the faculty or the administration should make as we try to bridge the gaps between the formal, traditional medical school curriculum and what we're seeing on Step 1.

Dr. Tony Tizzano: 

Yeah, I think those are excellent points. And I know that when you can explain to someone why, even when you're explaining it to your patient, it resonates and is, I think, recalled much more easily than when you just tell them what to do.

So if you can take these factoids and turn them into a concept, the concept to me, even though it may have multiple parts, is easier to remember than all these bits and pieces. With that in mind, is there a disconnect, then, between faculty and these third-party resources that are trying to bridge the gap?

Dr. Marissa Zhu: 

Yes. And I wanna hedge what I'm saying here a little bit: I think the disconnect comes more from student perceptions than from what the faculty or the formal curricula are actually doing.

What I'm seeing is that students perceive a disconnect between faculty lectures and faculty-authored resources on the one hand, and third-party resources and Step 1 preparation on the other. They're seen as different, or sometimes competing, priorities. For instance, one of the disconnects would be that with lecture materials or faculty-authored resources, there's a lot of variation in quality and in level of depth, whereas third-party Step 1 preparation resources are much more standardized and much more predictable in terms of what you're getting.

And there's a perception that faculty lectures or faculty-authored resources may be more aligned with institutional goals, or oftentimes with the faculty member's personal interests, research, or area of specialization, whereas third-party resources are geared more toward Step 1-specific preparation and content.

So a lot of this comes from a lack of clarity or transparency about the formal curriculum and how much it actually does or does not align with Step 1. A lot of students end up skipping or bypassing faculty-authored materials, unless they're required to view or engage with them in some way, in favor of third-party resources like Boards and Beyond, UWorld, Anki, or Sketchy.

Dr. Tony Tizzano: 

Those are excellent points. And you know, you're a professional educator; that's what you've studied. If there's anything I've learned in the last four years being part of the Education Institute, part of education at Cleveland Clinic, it's that there is a method to the madness.

There are people who study the manner in which everything, down to how a PowerPoint is put together, can deliver the best takeaway and make it more clear. You may be fabulous at all the artwork and so forth, but sometimes it clouds the issue. You get lost looking at the picture, and you miss the fact that you're trying to hammer home.

And you can be eminent in your field, but that doesn't necessarily mean you're a great teacher, unfortunately. I run into this all the time: you give a talk that's clinically based, and someone will ask a very esoteric question that really only they and their graduate student could possibly answer. And if the material you're presenting to students looks anything like that, much of it will be lost.

So how do we go about educating educators to direct our attention at the very issue you're trying to address with Stepwise? We'll get to that in a second, but what are your thoughts around that?

Dr. Marissa Zhu: 

That was part of the motivation behind creating this tool. You know, Step 1 is just an impetus for changing and updating materials, but it's not the endpoint.

The goal is really to help educators, faculty who have excellent content knowledge and expertise, be more effective as educators and be able to explain things in a way that is clear to students and makes clear why they need to know this for their careers.

And even though this tool is called Stepwise, it really is used to evaluate the topics covered in a lecture and offer actionable feedback points for faculty: feedback on what was a little too in-depth for what's appropriate at an M2 level, on parts of the lecture that could have been explained a little better or pared down, or in some cases bolstered with a little more depth and detail, and on how to make certain concepts a little more clinically relevant.

Dr. Tony Tizzano: 

Boy, that just sounds so on point. So when you look at these various AI-generated tools for exam preparation, would Stepwise fall into that category, in part?

Dr. Marissa Zhu: 

Yes. We have made Stepwise function in two ways. The first is as a faculty development tool that provides a score report on how lectures line up with Step 1 content and where there are opportunities for improvement or revision.

The second is that it produces active learning worksheets, tailored worksheets for the students to use as they follow along with a lecture. The idea is that we want students to engage with faculty resources, we want students to benefit from the wealth of knowledge that a faculty member has, but to be able to do it in a way that also fosters active engagement.

Watching a 30-minute lecture can be a very passive learning experience. How do we make that more active? Part of the reason students use third-party resources such as UWorld, which is a self-testing resource, and Anki as well, is that those are very active learning resources. You are answering questions; you are recalling facts.

Same thing with Boards and Beyond: those are short videos that stimulate more active engagement in the way the material is presented. Some faculty can do that innately, or they develop their videos in such a way that it feels a little more active, but for a lot of lecture videos, it's helpful to have an additional resource on the side.

So as students follow along, they can answer multiple-choice questions or try out some USMLE-style application questions that are geared toward the material being covered.

Dr. Tony Tizzano: 

Well, that just sounds fabulous. So beyond the person developing the content getting a scorecard, per se, is there a methodology to the validation of what they're doing as well?

Dr. Marissa Zhu: 

Yes, and we are currently in the midst of wrapping up this validation project. Because it's a language model, it's not necessarily designed to give you a picture-perfect calculation; it's not like an automated calculator. So we're trying to refine this tool to a point where it gives reasonably reliable estimates.

And again, the numbers it produces are not meant to be a psychometric tool, and it's definitely not meant to be a you-passed-or-you-failed-based-on-this-score kind of thing. It's really meant to be formative: oh, these are the areas where you scored lower in terms of alignment, and here are some ways to improve, if that's the direction you wanna take.

So we needed to control for the variability of using an AI tool like this. And oh, by the way, we used ChatGPT as the platform for building this tool. This is not a proprietary tool: as long as someone has a license or an account with ChatGPT, they can create it for themselves with a little bit of personalization.

So what we did was create a content scorecard. I sat down with faculty and students, and we came up with the criteria it would score. We settled on three: content relevance, how relevant is it to Step 1? Detail level, is it too detailed, or perhaps not detailed enough? And the third criterion is clarity and accuracy: was it presented in a way that was easy to understand, and was it accurate?
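To make that concrete for readers, here is a minimal sketch of how a three-criterion scorecard like this might be represented in code. The class name, the 1-to-5 scale, and the equal weighting are illustrative assumptions, not the actual Stepwise rubric.

# A minimal sketch of the three-criterion content scorecard described above.
# The names, 1-5 scale, and equal weighting are assumptions for illustration.
from dataclasses import dataclass

@dataclass
class TopicScore:
    topic: str
    content_relevance: int  # 1-5: how relevant is this topic to Step 1?
    detail_level: int       # 1-5: is the depth appropriate for the level?
    clarity_accuracy: int   # 1-5: clearly presented and factually accurate?

def lecture_alignment(scores: list[TopicScore]) -> float:
    """Average the three criteria across all topics, scaled to 0-100."""
    if not scores:
        return 0.0
    per_topic = [
        (s.content_relevance + s.detail_level + s.clarity_accuracy) / 15.0
        for s in scores
    ]
    return 100.0 * sum(per_topic) / len(per_topic)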

And the validation process was that we ran the lectures through the tool multiple times, and then we brought in human raters: three faculty members and three students who are part of this research project.

Following the same rubric as the AI, we asked them to rate each lecture and each topic, and then looked at how well the human raters' scores lined up with the AI-generated scores. Over various rounds, we tweaked both our scoring guide and the tool itself to be more consistent. What we wanted to see was more of a convergence between the human raters' evaluations and the AI tool's evaluations.

Dr. Tony Tizzano: 

So this is the move then, as I recall: you had initially told me there was a first version and then a second version. Is that what you're talking about, how you transitioned to this other version? What were the changes you made?

Dr. Marissa Zhu: 

Yeah. We wanted to be transparent about the process, because we know that a lot of other schools are using AI for similar purposes, to provide feedback and sometimes scoring.

What we found in the first version of Stepwise, Stepwise 1.0 I call it, is that it did a good job of recognizing the topics and the content and giving good recommendations. However, there was a lot of variability in how it was scoring the content. We would run the same lecture three times and compare the level of variance, and sometimes it would be off by up to 18% for the same lecture.

Part of the variability, we found, was because it was struggling to identify which topics were covered. It had trouble recognizing the difference between a faculty member just mentioning something in passing, oh, by the way, and actually covering a topic as intended in the lecture. So the topic count would be off by one to six topics per lecture in the initial version, because we didn't anticipate that it would have trouble recognizing what the actual topics of a lecture were.
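As a rough illustration of the consistency check described here, the run-to-run swing can be summarized as the spread of repeated scores for the same lecture. The numbers below are invented for illustration, not actual Stepwise output.

# Hypothetical alignment scores from running one lecture through the tool
# three times; values are made up to mirror the ~18-point swing described.
import statistics

runs = [74.0, 88.0, 92.0]
spread = max(runs) - min(runs)   # 18.0 points between best and worst run
stdev = statistics.pstdev(runs)  # population standard deviation of the runs
print(f"Spread: {spread:.1f} points, SD: {stdev:.1f}")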

So for the second version, we fixed that. We tweaked the prompt to give it some parameters for how to recognize a topic: it is a topic if it is both mentioned in the lecture transcript and covered in a dedicated slide.

We also gave it parameters for when something is not a topic. For instance, we told it that if something is only briefly mentioned, or if the lecturer says that'll come up in another slide or I'll cover that another time, then it's not the focus of this particular lecture. Giving it that extra guidance, those little baby steps, helped make it a little more accurate in what it was recognizing.
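A hedged sketch of how such topic-recognition guardrails might be phrased follows; the wording is an assumption for illustration, not the actual Stepwise 2.0 prompt.

# An illustrative sketch of the topic-recognition rules as prompt text.
# The exact wording is an assumption, not the actual Stepwise prompt.
TOPIC_RULES = """
Count something as a covered topic ONLY if both conditions hold:
  1. It is discussed in the lecture transcript, AND
  2. It is covered on at least one dedicated slide.

Do NOT count it as a topic if:
  - It is mentioned only briefly or in passing ("oh, by the way..."), or
  - The lecturer defers it ("that'll come up in another slide",
    "I'll cover that another time").
"""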

We also found, and this was interesting to me from an instructional design standpoint, that for lectures with clear learning objectives, the tool was better at identifying the lecture topics and scoring them; it did better when the faculty member had actually created learning objectives for their lecture.

And we think it's because when a faculty member actually writes down their learning objectives, it helps the tool understand what it should be looking for, what the main point of the lesson or lecture was. Whereas for some lectures we didn't have learning objectives, and the tool was kind of left to figure it out.

Dr. Tony Tizzano: 

Well, I think that speaks for itself, whether you're using AI or not. Having those learning objectives, please. It takes time to put them together, but it helps us focus as students, as learners. So when all was said and done, what were your takeaways from this novel approach to validation? What came out of it that you put into practice?

Dr. Marissa Zhu: 

We found that it was very important. This was different from a traditional validation process, in which, when you're not working with AI, you just want to establish consensus among human raters. This was more of a two-way adjustment: we wanted to establish consistency among the raters, but at the same time tweak the tool to make sure it was reasoning and processing in a similar way.

And what I found is that in working with AI, it is absolutely vital to have a human validation process, no matter what you are making. Not only does it help refine and improve the accuracy of the AI's reasoning process, again, AI learns from humans, so it's a two-way street, but it also helps establish trust in the AI's output.

The faculty and students who were involved really appreciated the AI's insights, and being part of this process helped them see and trust the outputs, and also to dialogue with one another about what matters: how do we gauge what is relevant versus not, the right detail level, clarity and accuracy.

As a result of this bidirectional effort, we were able to improve the consistency, the interrater reliability, from a weighted kappa of 0.55 originally to 0.72, which is what we were aiming for in the second round. So again, it's a time-consuming process, for sure, to do all of this work, to rescore and incorporate all of these insights, but I think it's absolutely worthwhile for building a stronger tool.
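For readers curious about the agreement statistic mentioned here, a minimal sketch of a quadratic-weighted Cohen's kappa between one human rater and the AI follows. The ratings are toy data, and the actual study design, for example how the six raters were pooled, may differ.

# Toy example of quadratic-weighted Cohen's kappa between a human rater and
# the AI on the same 1-5 rubric; the ratings below are invented.
from sklearn.metrics import cohen_kappa_score

human = [4, 5, 3, 4, 2, 5, 4, 3]
ai    = [4, 4, 3, 5, 2, 5, 4, 2]

# Quadratic weights penalize large disagreements more than near-misses,
# which suits ordinal rubric scores like these.
kappa = cohen_kappa_score(human, ai, weights="quadratic")
print(f"Weighted kappa: {kappa:.2f}")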

Dr. Tony Tizzano: 

Yeah. What you say sounds very complex, but I get some takeaways right away, and I think, you know, quality in, quality out. If you give good guidelines and good learning objectives, then it's going to help the AI to begin with.

So those need to be clear from the beginning, and having those objectives spelled out is important for students as well. But the point you made about human validation: AI is new to most of us at this point. Some folks I know here at the Clinic, they breathe this, they know it inside and out. But I'm an old dog, and I look at this and I'm like, oh boy, how do I know it's right? So having that human validation part is really important, I think, at this juncture in its evolution. Would you agree?

Dr. Marissa Zhu: 

Yes, absolutely. Absolutely.

Dr. Tony Tizzano:

So when you put this all together, how might these findings be incorporated into practice when someone's trying to look at their curriculum and assess it?

Dr. Marissa Zhu:

I think a tool like Stepwise could be very helpful, first and foremost, as a faculty development and quality improvement tool. For instance, one of our raters was a new faculty member who is in the process of developing their lectures and trying to figure out, what should I cover, what should I update?

They're able to see some of the older lectures, but a tool like this gives them a starting point for figuring out, okay, what should I focus on? What were some of the areas of strength and weakness in the previous lecture, and what can I build upon? It's also a great tool for gauging your own material.

For instance, you could start with a draft or an early version of a lecture and run it through the tool to see, okay, what could I do to make this a little better? Is this clear? I think it provides really actionable feedback points, and it also helps faculty who are looking for ways to update existing materials.

Dr. Tony Tizzano: 

Boy, I love this. Faculty development is such a key issue for us. So when you look to the future, Marissa, what lies on the horizon? Our head of education, Dr. Stoller, always says, if you had a magic wand, what would you wish for? Where do you see it going?

Dr. Marissa Zhu: 

If I had a magic wand, I'd start by saying that no tool, no resource, is going to be a magic bullet.

I don't believe in magic bullets. I don't believe in quick fixes. For any resource, AI or otherwise, to be effective, it has to be integrated into a good ecosystem, a good system; I think that's what makes the difference. So for a tool like Stepwise, I don't see it operating in isolation. A scorecard or an active learning worksheet by itself is not going to make a huge difference. It's not gonna be a game changer unless it's embedded within a system that fully uses it, as part of faculty development workshops.

For instance, for this tool to be truly supportive and helpful, I see having faculty development workshops presented alongside it, so that any faculty, new or older, could have access to this tool, know how to use it, feel comfortable with its outputs, and ideally be able to meet with a specialist who can walk them through what it means and provide a little bit of human insight alongside how to use and implement it.

And I see it as being most useful when we formally integrate the outputs, for instance, when the worksheets are embedded alongside the lectures and rolled out and communicated to the students. I also think that even the scorecard, once we get to the point where we're very comfortable with the analysis, could be shared out with the students, saying, what we're seeing so far is that a lot of these lectures are actually quite aligned.

They're 80 or 90-something percent aligned to Step 1, and sharing something like that can only establish trust and transparency, and it can offer reassurance for students to know, oh, actually a lot of our curriculum is aligned to Step 1; here's what we're learning and here's how it applies.

And again, this all has to be wrapped up in a broader communication effort. It has to be formally integrated into the curriculum, supported with faculty development resources, and rolled out as a faculty development service, not just, hey, here's a report card.

Dr. Tony Tizzano: 

Yeah, I absolutely love it. It's not a standalone, but it's an important piece of the puzzle that we can take a lot from.

So there's been a lot of information. Are there some thoughts you might have, or questions I didn't pursue, that you think might be important for our audience to hear?

Dr. Marissa Zhu: 

From where I sit as a director of assessment, I hear a lot of concerns about what it means to focus too much on Step 1.

I've heard recent discussions that we don't want to prepare students purely to be exam takers; we want to provide them with a more well-rounded, robust educational experience. I completely agree with that, and I do agree that the curriculum should not be based purely on preparing students for any exam, Step 1 or otherwise.

But ultimately, the goal is to prepare students comprehensively for clinical practice. What I think that means is that it's important to build not just test takers but, ultimately, doctors. And that means it's important to think about how we balance out the foundation.

We know that Step 1 tracks with future performance on the Step 2 and Step 3 exams, but it doesn't always align with clinical competencies. So it's still important to find ways to cover the basic sciences thoroughly and not just focus on Step 1. But really, I think the way to justify having a strong basic science foundation is to think about ways to embed clinical reasoning practice.

And again, that doesn't necessarily mean making everything match up to Step 1, but asking, how are these specific details or fundamental concepts relevant to future physicians? That is the end game.

Dr. Tony Tizzano: 

So the question will always remain: does Step 1, or does any exam, really reflect what a physician should know and be able to apply? And do we teach what they should really know and apply? Which is the more accurate side of the equation?

Dr. Marissa Zhu: 

Right.

Dr. Tony Tizzano: 

Well, I have to thank you so much. This has been a fascinating and wonderfully insightful podcast. To our listeners, if you would like to suggest a medical education topic for an episode, please email us at education@ccf.org.

Thank you very much for joining, and we look forward to seeing you on our next podcast. Have a wonderful day.

Dr. James K. Stoller: 

This concludes this episode of MedEd Thread, a Cleveland Clinic Education podcast. Be sure to subscribe to hear new episodes via iTunes, Google Play, Stitcher, Spotify, or wherever you get your podcasts. Until next time, thanks for listening to MedEd Thread and please join us again soon.
