
Negative Test Results

Hello,

As I’ve shared in two prior posts ("How the Sausage Gets Made" and "This Is Only a Test"), I have been a part of writing New York State (NYS) 3-8 Assessment questions. I wanted to understand the process since the tests are so often criticized. By participating, I came to understand how carefully each and every reading passage is selected by NYS teachers and how each and every question is written by NYS teachers through a very deliberate process that truly is aligned to the standards. What’s more, due to the careful vetting process that scrutinizes each draft question, most of the questions never even make it to the tests. If I hadn’t participated in the process, I certainly wouldn’t have known this.

In writing test questions, however, question writers have no input into the design of the actual tests. In other words, there is no discussion or consideration of how many passages and questions each student will ultimately see on the test. I’m bringing this up because I’m sitting in a dissonant space right now. On one hand, I believe the passages and the questions, in isolation, really do test the standards. The amount of training and time that goes into writing and critiquing each question is nothing short of thorough. On the other hand, when the passages and questions are compiled into the assessments, I’m not sure we’re really seeing what students know and are able to do. Why? Continue reading and let me know what you think…

Theory In Action

My younger sister, Emily, is a fourth-grade special education co-teacher. On the Friday following the administration of the 3-8 NYS ELA Assessments, I called her to see how she was doing. She had the assessments on her mind.

“I wonder if the people who make the tests realize how many kids they make cry?” Emily asked.

“Wow,” I said. “How many kids did you see crying?”

“A lot,” Emily told me, which was surprising since there were fewer than 500 K-6 students in her school, according to the most recent enrollment data from the NYS Education Department (NYSED). Emily went on to tell me that one third-grade student went to a corner to cry. Emily and I then started talking about the amount of time the students spent trying to do the test and how anxious the kids were about it.

“It’s untimed. Students shouldn’t worry about how long it takes them,” I noted.

“Right, but they start to worry about what they’re missing,” she countered. “They ask, ‘Will I get lunch?’ or ‘Will I miss gym?’ They’re worried and so even though they could usually answer the questions, their stress levels are heightened so they’re really not showing what they know.”

“Yeah. I can see how that would happen. I’ve always thought there were too many passages for students to read and respond to. I mean, if you can do the first couple well, then you’ll probably be able to do all of them well. If you can’t do the first couple well, what are the odds you’ll be able to do the rest well?”

“Right. On the test, the students had six short-answer questions and one extended response. They’re nine years old. Should a nine-year-old have to take a three-hour test?”


Granted, the tests are not timed. NYSED says fourth-grade students will need an average of 65-75 minutes each day on the two days of the test. This, however, doesn’t reflect the kids who are not the average student. A nine-year-old student with disabilities who gets time and a half or double time would, on average, need roughly 90 minutes (on the low end) or up to two and a half hours (on the high end). That’s a very long time for anyone, let alone a nine-year-old, to sit and take a test. Also, that’s just one day. Since the test takes two days, double that. “It’s not that much less for a third grader, who is just eight,” Emily reminded me.
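For readers who want the arithmetic spelled out, here is a quick back-of-the-envelope sketch in Python. The 65-75 minute per-day average comes from NYSED; the accommodation multipliers and the two-day total are my own assumptions about how the common extended-time accommodations play out, not anything published with the test.

```python
# Back-of-the-envelope testing-time math, assuming NYSED's estimate of
# 65-75 minutes per day for the average fourth grader and a two-day test.
AVG_MINUTES_PER_DAY = (65, 75)  # NYSED's stated average range
TEST_DAYS = 2                   # the ELA assessment spans two days

ACCOMMODATIONS = {
    "standard time": 1.0,       # no extended-time accommodation
    "time and a half": 1.5,
    "double time": 2.0,
}

for label, multiplier in ACCOMMODATIONS.items():
    low, high = (minutes * multiplier for minutes in AVG_MINUTES_PER_DAY)
    print(f"{label:>15}: {low:.0f}-{high:.0f} minutes per day, "
          f"{low * TEST_DAYS / 60:.1f}-{high * TEST_DAYS / 60:.1f} hours across both days")
```

Even working from the averages, a student with double time is looking at something like four to five hours of testing across the two days, which is exactly the point Emily was making.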

“You’re making me think. I mean, if I ran a mile in school regularly but then one day I came to school and was told I had to run a marathon, I would get very anxious too. There is something then to be said about preparing kids for the stamina needed for the test.”

“Preparing for that type of stamina would mean having students sit and take tests for over an hour on a regular basis. Why would we do that to kids? And if we’re doing that, then we’re not teaching, right?”

“Right. That’s a great point! Furthermore,” I said, “where in the standards is ‘stamina’? Nowhere. It’s not a thing. So if, in order to do well on the assessments, you need to demonstrate stamina for testing, then what are the tests really testing?”

“Right,” Emily agreed.

“And so you’re giving students passages and questions that they otherwise could answer if they were presented in a different context, like over multiple days in thirty-minute chunks, but now you’re putting them in conditions that make the results invalid and unreliable. Even if the questions are aligned to the standards,” which would mean the questions are valid, “the conditions in which you’re asking the questions are not,” I shared.

A False Negative

This conversation came just a day or two after I got an email from a middle school ELA teacher who wrote, “...we are concerned again with refusals and how this will impact the ELA department. There have been a handful of students that are turning in opt-out notes today [on day 2] and we are curious how that will be viewed when scoring the test.”

I responded, “With regard to opting out on day 2, you are correct. Students who answer even one question over the course of both test days have their tests scored. This will have a deceptively negative impact on the results, since it will appear that the student is a level 1 or 2 when, in reality, had they completed the test, their score might have been much higher.” Again, there’s the issue of the validity of the results. If a student didn’t complete a test, that lack of completion doesn’t actually demonstrate what the student knows and is able to do.

I don’t know what the answer is here. I’m sure there’s no way to score a student only on the questions and parts they completed. However, using incomplete data is not a great solution, nor does it really provide the information you’re looking for. If you asked me to recite the alphabet and I quit after the letter C, you wouldn’t know whether I know the rest of the letters; all you’d know is that I stopped at C. Students who do not take any part of the test are not given a zero; they are excluded from the denominator of the calculation (although this is messy too, since there is a requirement that at least 95 percent of students take the test).
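To make the reporting wrinkle concrete, here is a simplified sketch. This is not NYSED’s actual accountability formula; the school size, score distribution, and counts are all made up, but it shows how full opt-outs drop out of the scoring denominator while partial attempts stay in it, typically landing as a level 1 or 2.

```python
# A simplified illustration of the reporting math described above.
# All numbers are hypothetical; this is not NYSED's actual formula.

def participation_rate(enrolled: int, scored: int) -> float:
    """Share of enrolled students whose tests were scored (the 95% expectation)."""
    return scored / enrolled

def proficiency_rate(levels: list[int]) -> float:
    """Share of scored students at performance level 3 or 4."""
    return sum(1 for level in levels if level >= 3) / len(levels)

full_opt_outs = 10                              # never answered a question: excluded from scoring
partial_attempts = [1] * 5                      # answered something on day 1, opted out on day 2
full_attempts = [2] * 10 + [3] * 60 + [4] * 15  # completed both days

enrolled = full_opt_outs + len(partial_attempts) + len(full_attempts)  # 100 students
scored = full_attempts + partial_attempts       # full opt-outs never enter this list

print(f"Participation: {participation_rate(enrolled, len(scored)):.0%}")   # 90%, below the 95% expectation
print(f"Proficiency:   {proficiency_rate(scored):.0%}")                    # 83%, pulled down by partial attempts
print(f"Proficiency without partial attempts: {proficiency_rate(full_attempts):.0%}")  # 88%
```

In this made-up school, five partial attempts shave several points off the proficiency rate and ten full opt-outs push participation below the 95 percent expectation, even though neither figure tells you much about what those fifteen students actually know and can do.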

Participation Grade

In NYS, when the Common Core Learning Standards (CCLS) were rolled out—launching CCLS-aligned assessments—there was a great deal of pushback regarding the assessments. This became what is known as the “opt-out movement.” You might wonder why there is a participation expectation. The answer is actually a pretty good one. The 95 percent participation rate was intended to prevent mass cheating. After all, without the participation expectation, a school or district that is worried about the ratings might tell students who are unlikely to do well to stay home and therefore only have the “good kids” take the assessments.


Why participate at all, you may wonder? The US Department of Education provides states money through Title grants. This amounts to millions of dollars for each state. In order for the state’s department of education to access those funds, the states have to agree to administer 3-8 Assessments and commencement exams. Thus, in order for schools and districts to get the money from the state and federal governments, they administer these assessments and exams. I don’t blame the government for having some accountability connected to the money. I certainly don’t want to pay for things that are not meeting my expectations either.

Theory and Practice

In theory, I don’t have a problem with testing. I like that my car gets diagnostic assessments to know everything is working well. I like that my doctors have sent me for testing to make sure I am not treated for things I don’t have (even when I might have thought I had any number of problems) and am treated for things I do have. For example, I hadn’t been feeling well, so I took a COVID test and it came back negative. I was grateful to know I had a run-of-the-mill cold and not COVID. I get a physical annually (which is a test) and have my teeth cleaned twice a year (another test). I’m thrilled people have to take driver’s tests before getting their licenses.

I also think tests in school are a good way to know how kids are doing with regard to their learning. That said, the practice of state testing in schools is what causes me concern. If we want to know what kids know and can do, then we have to have mechanisms that are both valid (they measure what we intend to measure) and reliable (the results would be consistent if repeated). “To understand the basics of test reliability, think of a bathroom scale that gave you drastically different readings every time you stepped on it, regardless of whether you had gained or lost weight. If such a scale existed, it would be considered not reliable.” Test validity is requisite to test reliability: if a test is not valid, there is no point in discussing reliability, because validity has to be established before reliability can be considered in any meaningful way. Likewise, if a test is not reliable, it is also not valid.
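If it helps to see the bathroom-scale analogy played out, here is a toy sketch. The “true weight” and the two imaginary scales are numbers I made up purely for illustration, not a real measurement model.

```python
# A toy version of the bathroom-scale analogy. All numbers are invented.
import random

random.seed(0)
true_weight = 150  # the quantity we actually want to measure

# Not reliable: readings swing wildly from one step to the next, so no single
# reading can be trusted -- and an unreliable measure can't be valid either.
unreliable_scale = [true_weight + random.uniform(-20, 20) for _ in range(5)]

# Consistent but off: perfectly repeatable readings that consistently measure
# the wrong thing (always 10 lbs heavy), so consistency alone isn't enough.
consistent_but_off = [true_weight + 10 for _ in range(5)]

print("Unreliable scale:  ", [round(reading) for reading in unreliable_scale])
print("Consistent but off:", consistent_but_off)
```

This is the same logic behind the concern above: a question can be aligned to a standard, but if the conditions under which it is asked don’t let kids show what they know, the results lose their meaning.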

I’m not saying “don’t take the tests,” and I’m not saying “don’t give the tests.” I’m saying that, since we should be testing kids because we want to know what they know, we should do it in a manner that actually achieves this outcome. If we expect this for our cars and for our health, then we should also expect it for our kids, and I’m sure you would agree. What I’m wondering is how we make improvements to the system. After all, the issue, at least for me, is not about eliminating state assessments; it’s about designing, implementing, and using them in a manner that is aligned with the intended outcomes—i.e., knowing what our students know and can do so that we can best prepare them to be successful today and tomorrow.

~Heather


P.S. My Catch of the Week this week is this video about gratitude. There's so much I love about it, but most importantly, I love that it is based on science AND it gives advice that we can all put into practice TODAY to increase our happiness. It's amazing the difference that one phone call can make! I hope this video inspires you to pick up a pen or, better yet, the phone.




1 Comment


Karen Kool
April 7, 2022

LOVED the Catch of the Week and "Grateful" you shared it. There truly is so much to be thankful for yet most don't take the time to think about it. Thank you for having us stop and take the time to be "Happy." 😊
