Teaching & Learning
A POSSIBLE MODEL FOR HIGHER EDUCATION: THE PHYSICS REFORM EFFORT
What to Measure And How to Measure
Investigation of the extent to which a paradigm shift from teaching to learning is taking place requires measurement of students' learning in college classrooms. But Wilbert McKeachie 1987 has pointed out that the time-honored gauge of student learning - course exams and final grades - typically measures lower-level educational objectives such as memory of facts and definitions rather than higher-level outcomes such as critical thinking and problem solving.
The same criticism (Hake 2002a) as to assessing only lower-level learning applies to Student Evaluations of Teaching (SET's), since their primary justification as measures of student learning appears to lie in the modest correlation of overall ratings of course (+0.47) and instructor (+0.43) with "achievement" as measured by course exams or final grades (Cohen 1981). For general characterizations of higher-order learning see Anderson & Krathwohl 2001 and Shavelson & Huang 2003. In their "Chart 1," the latter display higher-level learning such as "procedural" (see, e.g., Anderson 2004), "schematic," and "strategic" knowledge within knowledge domains, as might be measured and enhanced by disciplinary experts.
How then can we measure students' higher-level learning in college courses? Several indirect (and therefore in my view problematic) gauges have been developed; e.g., Reformed Teaching Observation Protocol (RTOP), National Survey Of Student Engagement (NSSE), Student Assessment of Learning Gains (SALG), and Knowledge Surveys (KS's) (Nuhfer & Knipp 2003). (For a discussion and references for all but the last see Hake 2005.)
On the other hand, Richard Hersh 2005 has discussed two types of direct measures developed by the Learning Assessment Project <http://www.cae.org/content/pro_collegiate.htm> (of which he is co-director) that "evaluate students' ability to articulate complex ideas, examine claims and evidence, support ideas with relevant reasons and examples, sustain a coherent discussion, and use standard written English."
But Shavelson & Huang 2003 warn that "learning and knowledge are highly domain-specific - as, indeed, is most reasoning. Consequently, the direct impact of college is most likely to be seen at the lower levels of Chart 1 - domain-specific knowledge and reasoning. Yet, in the formulation of most college goal statements for learning - and consequently in choices about the kinds of tests to be used on a large scale to hold higher education accountable - the focus is usually in large part on the upper regions of Chart 1" (those emphasized by the Learning Assessment Project; my italics).
In sharp contrast to the invalid or indirect measures discussed in the above two paragraphs is the direct measure of students' higher-level domain-specific learning through pre/post testing using (a) valid and consistently reliable tests devised by disciplinary experts, and (b) traditional courses as controls. Such pre/post testing, pioneered by economists (Paden & Moyer 1969) and physicists (Halloun & Hestenes 1985a,b), is rarely employed in higher education, in part because of the tired old canonical objections recently lodged by Suskie 2004 and countered by Hake 2004a and Scriven 2004. Despite the nay-sayers, pre/post testing is gradually gaining a foothold in introductory astronomy, biology, chemistry, computer science, economics, engineering, and physics courses (see Hake 2004b for references). It should be emphasized that such low-stakes formative pre/post testing is the polar opposite of the high-stakes summative testing mandated by the U.S. Department of Education's No Child Left Behind Act for K-12 (USDE 2005a) that is now contemplated for higher education (USDE 2005b).
As the NCLB experience shows, such testing often falls victim to "Campbell's Law" (Campbell 1975, Nichols & Berliner 2005): "The more any quantitative social indicator is used for social decision making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor."
What Physics Has Learned
Physics education researchers (PER's) have employed formative pre/post testing to show that traditional (T) introductory physics courses promote very little change in students' understanding of basic physics concepts, regardless of the experience, enthusiasm, talents, and motivation of their professors. This has driven some physicists to develop novel "interactive engagement" (IE) methods, among them: Microcomputer-based Labs, Concept Tests, Modeling, Active Learning Problem Sets, Overview Case Studies, and Socratic Dialogue Inducing Labs (for references see Hake 2002b). That such Interactive Engagement methods are relatively effective in promoting students' higher-level learning has been demonstrated by the nearly two-standard deviation (cf. Bloom's 1984 "two sigma problem") superiority in normalized average learning gains <g> of IE courses over T (traditional) courses (Hake 1998a,b, 2002b,c and corroborative references therein). Notable examples are large enrollment courses at Harvard (Crouch & Mazur 2001), North Carolina State University (Beichner & Saul 2004), MIT (Dori & Belcher 2004), the University of Colorado at Boulder (Pollock 2005), and California Polytechnic State University at San Luis Obispo (Hoellwarth et al. 2005).
Some definitions are in order. In the above paragraph (a) the average normalized gain <g> is the actual gain [<%post> - <%pre>] divided by the maximum possible gain [100% - <%pre>], where the angle brackets indicate the class averages; (b) T courses are operationally defined as those reported by instructors to make little or no use of IE methods, relying primarily on passive-student lectures, recipe labs, and algorithmic problem exams; (c) IE courses are operationally defined as those designed at least in part to promote conceptual understanding through interactive engagement of students in heads-on (always) and hands-on (usually) activities which yield immediate feedback through discussion with peers and/or instructors.
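As a minimal sketch of definition (a), the Python fragment below computes <g> from class-average pretest and posttest percentages. The course scores are made-up illustrations, not data from any study cited here, and the function names normalized_gain and effect_size are hypothetical. The second function expresses the difference between two sets of course gains in units of their pooled standard deviation; that is one common convention, and only an assumption about how a "two-standard deviation" comparison might be computed.

```python
from statistics import mean, stdev


def normalized_gain(pre_avg: float, post_avg: float) -> float:
    """Average normalized gain <g> for one course.

    pre_avg and post_avg are class-average scores in percent:
    <g> = (<%post> - <%pre>) / (100% - <%pre>)
    """
    if not 0.0 <= pre_avg < 100.0:
        raise ValueError("pre_avg must satisfy 0 <= pre_avg < 100")
    if not 0.0 <= post_avg <= 100.0:
        raise ValueError("post_avg must satisfy 0 <= post_avg <= 100")
    return (post_avg - pre_avg) / (100.0 - pre_avg)


def effect_size(ie_gains, t_gains):
    """Difference of mean <g> in units of the pooled standard deviation.

    Assumption: pooled-SD convention; each list needs at least two courses.
    """
    n_ie, n_t = len(ie_gains), len(t_gains)
    var_ie, var_t = stdev(ie_gains) ** 2, stdev(t_gains) ** 2
    pooled_sd = (((n_ie - 1) * var_ie + (n_t - 1) * var_t)
                 / (n_ie + n_t - 2)) ** 0.5
    return (mean(ie_gains) - mean(t_gains)) / pooled_sd


# Hypothetical class averages (percent), for illustration only.
g_traditional = normalized_gain(pre_avg=40.0, post_avg=55.0)   # 15 / 60
g_interactive = normalized_gain(pre_avg=40.0, post_avg=70.0)   # 30 / 60
print(g_traditional, g_interactive)
```

Because <g> divides by the maximum possible gain, two courses with different pretest averages can be compared on the same scale; a course that starts at 40% and ends at 70% has the same <g> as one that starts at 60% and ends at 80%.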
For links to over 50 U.S. PER groups, over 200 PER papers published in the American Journal of Physics since 1972, and tests of cognitive and affective conditions see, respectively, Meltzer 2005a, Meltzer 2005b, and NCSU 2005. The very active PER discussion list PhysLrnR <http://listserv.boisestate.edu/archives/physlrnr.html> logged over 750 posts in 2005. As far as I know, no other discipline is so actively researching undergraduate student learning. For reviews see McDermott & Redish 1999, Redish 1999, Thacker 2003, Heron & Meltzer 2005, and Wieman & Perkins 2005.
The March of Synapses
The fact that IE methods are far more effective in promoting conceptual understanding than traditional passive-student methods is probably related to the "enhanced synapse addition and modification" induced by those methods.
Bransford et al. 2000 wrote: ". . . synapse addition and modification are lifelong processes, driven by experience. In essence, the quality of information to which one is exposed and the amount of information one acquires is reflected throughout life in the structure of the brain. This process is probably not the only way that information is stored in the brain, but it is a very important way that provides insight into how people learn." Leamnson 1999, 2000 has also stressed the relationship of biological brain change to student learning. In his Chapter 5 "Teaching and Pedagogy," Leamnson 1999 wrote, "Teaching must involve telling, but learning will only start when something persuades students to engage their minds and do what it takes to learn." This is another reminder that the affective and the cognitive are inextricably linked, as recently emphasized by Ed Nuhfer 2005 in this Forum.
I see no reason that student learning gains far larger than those in traditional courses could not eventually be achieved and documented in other disciplines from arts through philosophy to zoology if their practitioners would (a) reach a consensus on the crucial concepts that all beginning students should be brought to understand, (b) undertake the lengthy qualitative and quantitative research required to develop multiple-choice tests (MCT's) of higher-level learning of those concepts, so as to (c) gauge the need for and effects of non-traditional pedagogy, and (d) develop Interactive Engagement methods suitable to their disciplines.
Why MCT's? So that the tests can be given to thousands of students in hundreds of courses under varying conditions in such a manner that meta-analyses can be performed, thus establishing general causal relationships in a convincing manner.
But can multiple-choice tests measure higher-order learning? Wilson & Bertenthal 2005 think so, writing: "Performance assessment is an approach that offers great potential for assessing complex thinking and learning abilities, but multiple choice items also have their strengths. For example, although many people recognize that multiple-choice items are an efficient and effective way of determining how well students have acquired basic content knowledge, many do not recognize that they can also be used to measure complex cognitive processes. For example, the Force Concept Inventory [Hestenes et al. 1992] . . . is an assessment that uses multiple-choice items to tap into higher-level cognitive processes."
Can nearly all university disciplines develop synapse-stimulating interactive engagement methods, and also valid and reliable multiple-choice tests of affective and cognitive conditions to measure their effectiveness? I would bet "Yes," provided they care enough about student learning to mount the necessary research and development effort.
Aside from the advantages of pre/post testing, perhaps physics education researchers' most important lessons (Hake 2002b) for higher education are Lessons #1, 3, and 4:
L1: The use of Interactive Engagement strategies can increase the effectiveness of conceptually difficult courses well beyond that obtained with traditional methods.
L3: High-quality standardized tests of the cognitive and affective impact of courses are essential for gauging the relative effectiveness of non-traditional and traditional educational methods. For examples of such physics tests see the listing at <http://www.ncsu.edu/per/TestInfo.html> (NCSU 2005).
L4: Education Research and Development by disciplinary experts (DEs), and of the same quality and nature as traditional science/engineering R&D, is needed to develop potentially effective educational methods within each discipline. But the DEs should take advantage of the insights of DEs engaged in education R&D in other disciplines, cognitive scientists, faculty and graduates of education schools, and classroom teachers.
Calls for the accountability of higher education in promoting student learning are becoming more forceful, both from inside the university, e.g., Duderstadt 2000, Hersh 2005, Hersh & Merrow 2005, Bok 2005a,b; and from outside the university, e.g., by the U.S. Dept. of Education's new "Commission on the Future of Higher Education" (USDE 2005b). For reports on the Commission's first two meetings and commissioners' comments on the possibility of NCLB-like testing in higher education, and on the declining literacy of college graduates (NAAL 2005), see Lederman 2005a,b.
As Hersh 2005 observes: ". . . in an era when the importance of a college diploma is increasing while public support for universities is diminishing, [assessment of student learning] is desperately needed. The real question is who will control it. Legislators are prepared to force the issue: Congress raised the question of quality during its recent hearings on the reauthorization of the Higher Education Act; all regional accrediting agencies and more than forty states now require evidence of student learning from their colleges and universities; and pressure is rising to extend a No Child Left Behind-style testing regime to higher education" (see USDE 2005a,b).
Thus it would appear to be high time for faculty members to turn more of their attention to shifting the higher education paradigm from teaching to learning, both because it's the right thing to do, and because not doing so may invite stifling oversight by state and national bureaucrats.
(Professor Hake's extensive list of references may be found posted in the ancillary materials section for this issue of the FORUM on
Emeritus Professor of Physics
24245 Hatteras Street
Woodland Hills, CA 91367
THE STANFORD UNIVERSITY CENTER FOR TEACHING AND LEARNING