Interrogating the Advanced Placement Exams

Back in 2014, I was pleased to see my daughter earn a ‘5’ on her AP European History test (this will not be a complaint about victimization by standardized testing).  Gratified, but curious about what a ‘5’ really means, I went searching for a translation: how good is that top score of ‘5’, restated in more conventional terms?  About 90th percentile, it appears.

The trouble began as I scanned the percentile breakdowns for other AP exams:*

                          % receiving    # taking the
                          top score      test (000)

Humanities
English literature         8%            386
English language          10%            476
European history          10%            110
US History                11%            443

Math and Science
Physics B                 17%             89
Chemistry                 19%            140
Calculus AB               24%            283
Physics C mechanics       28%             43

*media.collegeboard.com/digitalServices/pdf/research/2013/STUDENT-SCORE-DISTRIBUTIONS-2013.pdf

Turns out that a ‘5’ on calculus does not correspond to a 90th percentile score, but something rather lower; and the same holds for physics and chemistry.  Only in the humanities does a ‘5’ correspond to a performance better than that of roughly 90% of the test takers.  Put another way, a much smaller share of test takers reach the top score in literature and history than in several STEM subjects.
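To make the translation concrete, here is a minimal sketch in Python that restates each top-score share from the table as the percentile at which a ‘5’ begins. It assumes, as the reasoning above does, that the students receiving a ‘5’ are simply the top slice of all test takers; the names AP_2013 and cutoff_percentile are my own labels for illustration, not anything the College Board publishes.

```python
# Percent of test takers receiving a '5', copied from the 2013 table above.
AP_2013 = {
    "English literature": 8,
    "English language": 10,
    "European history": 10,
    "US History": 11,
    "Physics B": 17,
    "Chemistry": 19,
    "Calculus AB": 24,
    "Physics C mechanics": 28,
}


def cutoff_percentile(top_share_pct):
    """Percentile at which a '5' begins, i.e. the percent of test takers scoring below a '5'.

    Assumption: the top-score group is exactly the top slice of all test takers.
    """
    return 100 - top_share_pct


if __name__ == "__main__":
    for subject, share in AP_2013.items():
        print(f"{subject:<20} a 5 begins near percentile {cutoff_percentile(share)}")
```

On these figures a ‘5’ in Calculus AB begins near the 76th percentile, against the 92nd for English literature.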

How can this divergence be explained?

  • Perhaps literature and history are intrinsically more difficult subjects than math and science, which is why fewer students do really well on humanities AP tests
  • American high school students’ math and science skills may be stronger than their literary skills or historical knowledge

Both of these explanations can be put aside as risible.  Here are two more attempts:

  • It may be easier to “teach to the test” in the case of math and science than literature or history. In math and science the introductory curriculum has been fixed almost since Newton and Boyle: no one questions whether a physics student should be taught the ideal gas law and how to calculate with it. Compare the more vexed issue of whether Hawthorne’s The Scarlet Letter depicts Hester Prynne as fairly treated or poorly done by.
  • Alternatively, literature and history teachers by and large have lesser qualifications and weaker skill sets; the good ones have so many other job opportunities in industry (!) that only the weaker ones go into teaching. Being more poorly taught, students do less well on humanities AP tests.

The fourth explanation can be readily dismissed, while the third one is incomplete as it stands.  If developed further, it might be stated thus:

  • By the 1980s the literary canon of my youth had been dismissed as the work of “dead white males.” Decades on, there is no longer any consensus among teachers of literature and history as to what constitutes excellent performance; plus, there may be an undercurrent of resistance to such judgments of quality, which are seen as likely infested with patriarchal and ethnocentric bias.

Perhaps because their authors cannot agree on standards, the guidelines for AP exams in language and literature list a huge number of eligible works for students to read; and since no student will actually cover more than a fraction, odds are good that most students will come to the exam unprepared for some portion of the multiple choice questions they will confront. Top scores thus become less common than in math or science, where the curriculum is more tightly focused, and test takers know exactly what to expect.

Furthermore, when it comes to the written portion of these exams, taste judgments seem likely to reign supreme (the idea that one can take an objective stance toward literary texts is now heavily contested in the humanities).  Students taught by instructors from the same taste communities as the individuals the College Board taps to grade AP exams will be more likely to get a top score on the essays, and thus on the test as a whole; students of equal native ability, but not part of that taste community, will pass the exam but fail to excel because their essays are never judged quite so apt. The distribution tabled above then results.

This last assertion produces a testable hypothesis.  The prediction is that for math and science AP exams, top scores will be dispersed across the cultural regions of the US, after adjusting for such well known factors as the income and educational levels of the zip codes where students reside.  Specifically, students attending good suburban schools in North Dakota and Mississippi will get their share of ‘5’ scores in calculus or physics.  But top scores in literature and history will not be well-dispersed.  These will be concentrated in the Northeast and on the West Coast, i.e., the same taste communities from which the test-makers employed by the College Board in Princeton, NJ likely come.

Regardless of whether the hypothesis checks out, I think the disproportionate distribution of top scores on AP tests, for humanities versus the sciences, should give the reader pause.  Might it be that literary skill and historical acumen are per se more difficult to test reliably than calculation ability?  If so, what are the implications for No Child Left Behind, Race to the Top, and sundry other efforts to test students with an eye to improving American education?

If we can’t reliably test literary skill or historical acumen, should the AP exams in the humanities continue?  The assessment movement beloved of accreditation agencies has increasingly moved away from single test administrations to holistic assessment of a student’s portfolio of work product. Perhaps teachers of literature should put their foot down on the question of testing, and while teaching the same curriculum as today, let student portfolios henceforth be the measure of performance.  There’s no harm in letting a private university selectively admit students who display desired tastes.  The danger lies in letting the College Board paint the lipstick of merit on that pig.  In a democratic society, taste compatibility masquerading as merit should be abhorrent.

Leaving the humanities alone for a moment, and putting math and science teachers under the lens, perhaps, contra the above, the humanities disciplines have got it exactly right: on a properly constructed AP exam, a ‘5’, a top score, a designation that the student is extremely well qualified, should only be achieved by the top 10% of students. Accordingly, the tabled math and science scores may reflect a dumbing down of math and science curricula to match the increasing innumeracy of Americans.  Perhaps it really isn’t college-level math and science that’s being tested there anymore.  This too can be studied empirically; simply administer the math and science AP tests to a control sample of foreign students. If even larger percentages earn a ‘5’, then perhaps only humanities teachers in America have succeeded in maintaining standards, so that my aspersions about taste are misbegotten.

It gets worse.  In the 2011 AP Biology exam 18.5% of test takers got a 5, and in 2012, 19.4%, consistent with the physics and chemistry scores above.  Then, in 2013, the share who got a 5 plunged to 5.4%, and in 2014 it was 6.5% (source: totalregistration.com).  As a passerby remarked, hearing two men in the park each proclaim himself to be Jesus, “One of them has got to be wrong.”

The explanation for the divergence across time, of course, is that the biology exam was extensively revised for 2013.  Should biology scores continue to increase over the next few years, back to the math and science norm, maybe we need to acknowledge the unpalatable prospect that some portion of a science AP score represents test-taking competence and teaching to the test, rather than scientific knowledge gained.  Hats off to the humanities, perhaps, for finding a way to avoid the insidious effects of insider knowledge and test preparation in the inflation of math and science AP scores.

One way or the other, I think it is time that the Advanced Placement examination system be held up to the same kind of searching scrutiny long turned on the SAT exam.  It’s a mistake to approach AP exams as an unproblematic exemplar of what good testing looks like.

And if the AP testing system can’t consistently sort students into five groups by ability across subject matter, what are we to make of the SAT’s attempt to distinguish 61 levels of aptitude? Are the aptitude questions on the SAT so much more powerful and fine-grained than the knowledge questions on the AP tests?  Funny, they both look like multiple choice questions to me. And what are we to make of the pretensions of an IQ test, with its standard deviation of 16 points, to group humans into almost 100 different ranks?

And that is the real specter I would wish to see exorcised: the Charles Murrays of the world stand ready to tell us that humans can indeed be ranked, reliably and precisely, according to 100 different levels of intellectual ability.  That sorry stance reminds me of an old Jesuit jest, updated for the modern world: “Give me a No. 2 pencil and three hours with the child, and I will foretell his unchanging fate.”

It certainly would be convenient, in a mass society that hopes to be meritocratic, to set a child’s entire life course, given just three hours and a No. 2 pencil. The importance of challenging the AP testing system is precisely to call into question the fitness of those three hours and that No. 2 pencil.

There once was a society that made social advancement absolutely dependent on the results of testing: mandarin China. But mandarin is not a term of praise in the West, nor would many of us say that total reliance on a testing regime worked out so well for China. And yet, forces within contemporary American society appear to be pushing us in a mandarin direction.  Not good.

Can we here in 21st century America let go, even a little bit, of the hope for that great Test in the Sky, which will rank each according to his absolute merit, once and for all?  Maybe it’s never going to be that easy.

Published in college admissions
