The current admissions process for elite colleges is a mess. I’m not here to defend it.
My goal: to expose the problems with Steven Pinker’s proposed solution.
I take Professor Pinker to be an eminent member of the scientific elite: he’s published dozens of papers and books, and he’s been on the faculty at MIT and Harvard for decades. His many books also lend him credibility as a man of letters and a public intellectual—one of the few whose repute is founded on their scientific contributions.
As we will see, Pinker’s errors are exemplary as well. They anchor a type: the misunderstandings that occur when a quantitatively trained scientist ventures into the human social sphere. I first encountered such errors among colleagues trained in economics and operations research. The root misconception reduces to a mistake about what can be measured, and how well.
Pinker’s proposal
Pinker’s essay is too well-written to contain a bald statement of recommended reforms. But after decrying the emphasis on extra-curricular activities (“it’s common knowledge that Harvard selects at most 10 percent (some say 5 percent) of its students on the basis of academic merit”), and lauding standardized testing, he asks: what if the Ivy League “admitted the highest scoring kids?”
That’s a simple recommendation: find a good quantitative metric for academic merit. Then, if you have a thousand places available in the entering freshman class, admit the top 1000 scorers on that measure. Done.
Candidates for this metric would include the SAT and the ACT; possibly the applicant’s high school GPA; possibly Advanced Placement tests; and maybe SAT subject tests.
For the first part of my argument, I’ll assume that all these measures have a near perfect correlation, so that any can substitute for all the rest; and I’ll name the SAT aptitude tests as the criterion for academic merit, and as a perfect measure of it.
With these assumptions, plus a few others, Pinker’s recommendation can be made to work; but you may find some of the actions needed to put it in place both unexpected, and disturbing.
Last year, in spring 2015, the eight Ivy League schools plus Stanford and MIT accepted about 26,000 students, for perhaps 17,000 places. The year before, about 1.67 million students took the SAT. (See Toptieradmissions for statistics, and the College Board site for SAT data.)
These numbers make the application of Pinker’s principle straightforward: accept the 99th percentile of SAT test-takers into the best schools, and leave the rest to populate other lesser institutions. In raw numbers, students averaging 740 and above across each of the [old] SAT’s verbal, math, and writing components would be admitted to the Ivy League, and students with SAT component scores averaging 730 or below could attend Amherst, Northwestern, Carnegie Mellon, or some other pretender to top tier status.* These second tier schools would in turn take the 95th percentile through the 98th, the next 67,000 students or so; and then the state schools could sweep up the remainder.
* Yes, an undercurrent of mockery runs through this essay. It was inspired by the late Fred Hoar, who once told me this joke. Q: How can you tell a Harvard man? A: He’ll tell you!
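The cut-offs above follow from simple proportions. Here is a minimal sketch in Python that uses only the round figures already quoted (17,000 places, 1.67 million SAT takers). The variable and function names are my own, and the result is only as good as those rounded inputs.

```python
# Round figures quoted in the text; not official statistics.
SAT_TAKERS = 1_670_000      # students who took the SAT the year before
IVY_PLUS_PLACES = 17_000    # places at the eight Ivies plus Stanford and MIT


def top_share(places: int, takers: int) -> float:
    """Fraction of test-takers that a given number of places can hold."""
    return places / takers


ivy_share = top_share(IVY_PLUS_PLACES, SAT_TAKERS)
# The 95th through 98th percentiles span four bands, i.e. 4% of test-takers.
second_tier_students = round(0.04 * SAT_TAKERS)

print(f"Top-tier places cover the top {ivy_share:.2%} of test-takers")
print(f"95th-98th percentile band: about {second_tier_students:,} students")
```

With these inputs the top-tier share comes out very close to 1 percent, which is why the 99th percentile serves as the cut-off, and the next band holds roughly 67,000 students.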
Pinker’s rule would drastically simplify the admissions process. No expensive staff to read tens of thousands of essays. No acrimonious arguments over borderline cases. The student takes a four hour multiple choice test one morning, and in six weeks, learns whether he or she is one of the elect. No more agonizing over the Common Application and its proliferation of supplemental questions!
It worked for the mandarins in China.
Tracing the implications
Remember, here in the beginning we stipulate that the SAT is a perfect and perfectly fine-grained measure of academic merit. The student averaging a 730, 2190 in total, at the 98.86th percentile, is measurably less capable than the student at the 99.00th percentile, averaging 740; and the student averaging 770, at the 99.56th percentile, leaves them both in the dust. To average 770, he or she had to get correct half a dozen, maybe eight or nine, more multiple choice questions than the student averaging 730. That shows greater merit. Enough said. Or as Pinker states the belief: “test scores, as far up the upper tail as you can go, predict a vast range of intellectual, practical, and artistic accomplishments.”
The first unexpected consequence of Pinker’s rule: all eight Ivy League schools would have to use the same admissions criteria, and accept or deny the same individuals. They would all have to follow the same rule. There’s no longer any subjective element to the determination of merit, which, under Pinker’s rule, is the only thing that should matter to these elite schools.
It would be inefficient to have every applicant either accepted by all eight Ivies, or by none. So, students would submit their test scores to one Ivy League clearing house; then, score by score, applicants would be randomly allocated to each school, in proportion to the size of their entering class.
* I trust no one thinks that instead, Harvard would fill its class with the first thousand students, averaging 790 or more; that Yale would fill its class with the second thousand, averaging 780 or so…no, let’s not go there. Students can be ranked precisely, but not schools. Right?
Piece of cake; a smartphone app could handle the whole thing.
Objections easily handled
A couple of objections occur, but these are easily dispatched without fundamental alteration to Pinker’s rule.
- What about children of alumni and donors? Or athletes, violinists, sculptors? Or affirmative action for the marginalized?
No problem: simply have a separate admissions track for these special cases. They can’t possibly account for more than 10,000 of the available 17,000 slots. It’s the remaining 7000 slots that would be awarded strictly on merit. We simply shift the threshold for merit-based admission to the Ivy League to an average SAT score of 770, or the 99.56th percentile.
- What about the deluded high-scoring applicant who chooses to apply to Carnegie Mellon instead of MIT? Or Williams, instead of Yale?
No problem: accept everyone with scores of 770 or better into the Ivy League, and if the number of those applicants comes in low, because of mis-directed applications to Carnegie, Williams, and their ilk, accept students scoring at 760, or even as low as 750 (2250 total score). That preserves the principle of merit-based admissions. The beauty of Pinker’s proposal: it scales. Start at the top and work down from there.
Taking apart the proposal
I hope to have established that Pinker’s rule is feasible, and would at least simplify and streamline the current onerous admissions process, while having a strong claim to be scrupulously fair, and totally based on academic merit, in so far as possible (can’t have an orchestra without violinists, or field a football team without a quarterback; and no reason to suppose that the SAT measures merit in these non-academic spheres).
The question now is whether the assumptions underlying the initial application of Pinker’s rule can pass muster. Is it true that:
- Any of the widely recognized metrics for academic merit is a perfect substitute for each of the others, allowing admissions to focus on the SAT alone;
- The SAT perfectly measures academic merit;
- The SAT perfectly discriminates degrees of academic merit, down to its level of resolution (i.e., a score of 740 reflects greater merit than a score of 730).
Of course not; no one with any expertise in psychometrics can accept these assertions. No extant test of mental ability meets any one of these standards.
The problems: tests may correlate highly, but do not coincide exactly; academic merit may have several dimensions, so that no single linear scale can capture it; and all tests have errors of measurement, which reduce their true resolution to less than their stated resolution (a score of 740 is not to be understood as an exact scalar value, but as an estimate, as a prediction that the true score is 740, +/- 30, to capture the error of estimate).
Nonetheless, tests may be highly correlated, a single scale may capture the primary factor in a multidimensional concept like merit, and good tests are accurate within the bounds of their error. If the SAT is a pretty good test, then 740 really is a higher score than 680.
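To see how measurement error separates a 740-versus-730 comparison from a 740-versus-680 one, here is a rough sketch. It assumes (my assumption, not anything the College Board states) that the +/- 30 above acts as one standard error, that the errors on two students' scores are independent and normally distributed, and that nothing else is known about either student. The function name is hypothetical.

```python
import math

SEM = 30.0  # ASSUMPTION: the "+/- 30" is treated as one standard error


def prob_higher_scorer_is_better(score_a: float, score_b: float,
                                 sem: float = SEM) -> float:
    """Probability that the higher observed score reflects the higher true score.

    Assumes independent, normally distributed errors of equal spread,
    and no prior information about either student.
    """
    diff = abs(score_a - score_b)
    se_diff = sem * math.sqrt(2)                    # standard error of a difference
    z = diff / se_diff
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))   # standard normal CDF


for a, b in [(740, 730), (740, 680)]:
    p = prob_higher_scorer_is_better(a, b)
    print(f"{a} vs {b}: P(higher scorer truly higher) = {p:.2f}")
```

Under these assumptions a 10-point gap is only modestly better than a coin flip, while a 60-point gap is a fairly safe call.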
The fair question is whether Pinker’s rule still works, if we substitute “pretty good” for “perfect” in each of the three initial assumptions. Let’s play it out.
The challenge from psychometrics
Tests which are only pretty good cannot be perfect substitutes for one another. That produces multiple routes to achieving a position in the 99th percentile of all seniors applying to college. For instance, about two million students took the ACT test in 2014-2015, and about 10,000 received a score of either 36 (n = 1600) or 35 (n = 8500), placing them in the 99.92nd and 99.6th percentiles respectively—on this non-SAT test.
Unfortunately, the entire entering class of the Ivy League, after removing alumni, athletes, and violinists, only has about 7000 places, and these have already been filled by students scoring 770 (total 2310) and above on the SAT. There’s really no room for students scoring a mere 35 on the ACT. For that matter, once those 1600 students getting a perfect score on the ACT are deemed eligible for the Ivy League, we had better hike the SAT threshold, to make room. Both have to be deemed “pretty good” tests.
You see the problem: once we accept multiple routes to being judged among the top 1% of students, we also inflate the raw count of students with a claim to that status.
It gets worse. The SAT score distribution chart contains this caveat: “composite scores are not the best way to view SAT scores because important differences between the performances on each section are obscured.” In plain English: we might be more interested in the student who scores at the very top in math, as opposed to the all-around student who scores only near the top, but does so on each of several sections. That argues for looking at SAT Verbal and the SAT Math scores separately, and applying cut-offs to these scores rather than the composite.*
*Since the SAT writing subtest is going away, I’ll exclude it from what follows.
But now we are really in trouble. It is not possible to get a 99.5th percentile on the SAT math section. Because almost 17,000 students got an 800 on the SAT math section last year, 99.00th percentile is as high as that test measures. There were also 10,000 students who got an 800 on their SAT Verbal section. In turn, breaking up the ACT composite gives 32,000 students who got perfect scores on either the reading, math, or science sections.
If we translated Pinker’s rule to “you must have at least one perfect test score to be admitted to the Ivy League,” every available slot would still be filled eight to ten times over. And even this estimate is a lowball. We have not yet accounted for: 1) the SAT subject tests, which also allow for perfect scores; 2) students who take the SAT or ACT more than once, which implies that the number of applicants with perfect scores will exceed the count of test takers with perfect scores in any one period, as students make multiple attempts across periods to get a perfect score; or 3) AP or IB tests, where some exams indicate a 95th percentile score, and enough such scores might indicate a 99+ percentile student.
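The "eight to ten times over" estimate can be reproduced from the counts quoted above. This sketch simply adds the counts, which assumes that no student appears in more than one of them. Overlap would lower the total, while the three factors just listed would raise it.

```python
# Counts of perfect section scores quoted in the text (rounded).
perfect_scores = {
    "SAT Math 800": 17_000,
    "SAT Verbal 800": 10_000,
    "ACT section 36 (reading, math, or science)": 32_000,
}
MERIT_SLOTS = 7_000  # Ivy League places left after the special-case tracks

# ASSUMPTION: no overlap between the three groups, so the sum is an upper
# bound on the number of distinct students among these counts.
total_perfect = sum(perfect_scores.values())
oversubscription = total_perfect / MERIT_SLOTS

print(f"{total_perfect:,} perfect scores for {MERIT_SLOTS:,} slots")
print(f"Each slot is claimed about {oversubscription:.1f} times over")
```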
If we further translated Pinker’s rule as “Two perfect test scores, or no Ivy League for you,” that might reduce the number of applicants to match the 7000 slots available; but even that might not be enough (no one publishes cross-tabs across all tests, so we can only guess). We might have to further refine the criterion as “at least two perfect scores, and counting only your first try on each test.”
Given multiple eligible tests, converting the Ivy League admissions process to run strictly on test-based academic merit produces absurd results like “two perfect scores or out.” There simply aren’t enough slots available in the Ivy League for the volume of would-be applicants* (there were about 3.3 million high school seniors last year). We’d be forced to make judgments like, “a 790 Verbal won’t cut it here, maybe you should look at Northwestern?” or “too bad about that 99.6th percentile ACT score of 35; have you considered a state school?”
*You must read Frank Bruni’s essay in the New York Times. It’s both a tonic for the soul, and a role model for some of the mockery in this post.
Now what?
To this point, we have accepted the SAT and other standardized tests as at least “pretty good.” And they are pretty good, overall, per the research cited by Pinker, and especially if we narrow our scope to “how well do they distinguish academic ability among affluent suburban kids?” That way we don’t have to deal with issues of cultural bias etc.
Unfortunately, pretty good isn’t good enough to implement Pinker’s proposal. Remember, in Pinker’s Harvard world, we have Carnegie Mellon, Williams and the rest to take the merely 99th percentile kids, the ones that manage just a single not quite perfect test score. And we have the better state schools to take the 90th percentile students. A pretty good test probably can discriminate the 90th percentile student from the 99th percentile student.
But to manage Ivy League admissions we need something more. We need a test that’s pretty good at distinguishing the 790 SAT student from the 800 student, or the 36 ACT student from the 35 ACT student. Turns out, no test is that good. The math is straightforward; see the sub-page.
Probing the underlying error
Existing tests cannot discriminate accurately enough to identify the top 7000 students deserving of admission to the Ivy League strictly on academic merit. Pinker’s solution can’t be implemented with today’s testing technology. In another sub-page, I contemplate what kind of future test might be constructed that could do the job; and also, why such a custom-designed super test might nonetheless fail of its aim.
Enough has been said to uncover the exemplary error behind Pinker’s proposal: that humans can be measured to any desired degree of accuracy, on a single linear scale. It’s the illusion that we can tell apart the better from the good, the way better from the somewhat better, and the absolute best from the merely outstanding, with perfect fidelity. It’s a fantasy to believe we can make this discrimination in any life sphere, in a few hours, using a paper and pencil test.
Only a university professor could take this fantasy at face value. Only a quantitatively-trained scientist could ever have conceived such a fantasy.
By contrast, we can measure matter or motion to any desired degree of accuracy: an ounce, a microgram, a molecule, an atom, an electron, or a quark can all be measured equally well. That’s why scholars trained in the physical sciences are so likely to over-estimate the accuracy of measurements taken on humans. I see it in engineers who go back to school to get an MBA.
Pinker makes the paradigmatic post-Enlightenment error: he extrapolates from the great successes in the physical sciences to the far more fraught issues in judging human potential, and measuring the worth of individual human beings. We might alternatively describe the exemplary error as a Marxism of the spirit. As money commoditizes labor, multiple choice questions commoditize human potential. All young people are placed on a single scale; every human capacity is priced out. Every Ivy League college likewise becomes interchangeable.
Pinker makes a second error, the characteristic error among contemporary psychologists. I call it the tyranny of the aggregate. [Michael Billig writes about this propensity in his How to Succeed in the Social Sciences.]
For example: a laboratory psychologist might put two groups of 20 students through an experimental treatment. One group performs an arithmetic task while listening to Mozart; the other group attempts the same task while exposed to a cacophony of city street noise: horns honking, engines revving, brakes screeching. The mean performance on the task, when listening to Mozart, is found to be significantly better, p < .05. By the time Malcolm Gladwell or the New York Times glosses the result of the experiment, the finding has been transmogrified to “music boosts mathematical ability.”
Two things have gone awry in the sequence of inferences that got us from experiment to popularized finding.
- The experimental finding might have been glossed as “loud noises interfere with concentration.” That’s an equally apt interpretation of the mean comparison, but is too boring to publish. It’s a common problem in social and consumer psychology. First you create a broken or aversive condition—loud noises. Then you compare it to an intact condition—any of the thousand ordinary, unbroken conditions found in everyday life, such as playing music in the background while working on a task. Next, forget that the mean comparison includes a broken half. Instead, reinterpret the loud noises condition as a “no music” condition. Last, soar upward from the specific experimental task—calculating sales tax, say—to the most general category of which it is an instance—here, mathematical ability. Et voila: listening to music improves mathematical ability.
- Next, let’s drill down to the experimental data, the set of 40 individual scores on the mathematical task assigned. An even more basic error of interpretation comes into view. Psychologists have taught themselves to stay focused at the level of aggregate performance. The mean task performance for the 20 subjects in the music condition was significantly higher; end of story. No one ever thinks to place the two sets of 20 raw scores side by side. You certainly don’t want to tell journal reviewers that the effect was driven by the four subjects in the music condition who got perfect scores, and the three subjects in the annoying noise condition whose performance was devastated. You want the reader to assume a uniform shift upward in the music condition; which readers all too readily do.
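The second habit is easy to demonstrate with made-up numbers. In the sketch below the scores are invented for illustration (they are not data from any study): most subjects in both conditions score the same, and the gap in means comes entirely from four perfect scores in one group and three collapsed scores in the other.

```python
from statistics import mean, median

# HYPOTHETICAL raw scores (0-100) for 20 subjects per condition.
music = [70] * 16 + [100] * 4   # four subjects got perfect scores
noise = [70] * 17 + [30] * 3    # three subjects were devastated

print(f"mean:   music {mean(music):.1f}  noise {mean(noise):.1f}")
print(f"median: music {median(music):.1f}  noise {median(noise):.1f}")

# How many subjects in each group scored exactly the typical 70?
typical_music = sum(score == 70 for score in music)
typical_noise = sum(score == 70 for score in noise)
print(f"typical scorers: music {typical_music}/20, noise {typical_noise}/20")
```

The means differ by 12 points while the medians are identical, so a reader who assumes a uniform upward shift in the music condition would be misled.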
Given these two bad habits, once you can cite a study of SAT scores that shows a significant positive correlation with some specific outcome (even if it’s only “some other multiple choice tests, taken later, in college”), you are good to go. The possibility of a ceiling effect out at the extremes will never cross your mind.* The significant correlation in the aggregate gets treated like gravity, and taken as uniform throughout its range. Play music, and you will do better on your math homework; that holds for any and every person who acts on this experimental finding. Select students with the highest SAT scores, and you’ll have the strongest possible entering class, with the greatest potential to contribute to society.
* See the sub-page titled “how to dismiss a correlation study” for more on ceiling effects and the blindness of correlation tests with respect to detecting a ceiling.
Part of what I did to construct a rebuttal of Pinker was to insist on looking at actual numbers: I called out the fact that there are only 7000, or at best 17,000, slots in the Ivy League class to be allocated. With 3.3 million high school seniors, 7000 slots leave room only for roughly the top 0.2 percent, that is, the 99.8th percentile. The absurdity of expecting a test, any test, to distinguish its 99.8th from its 99.7th percentile then makes my case. Factors other than test-measured academic merit have to be used to select students for admission into Harvard or any other elite institution.
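The percentile implied by a given number of slots is easy to check. This sketch uses the essay's round figures. The function name is mine, and the 700 figure is the Harvard-only merit count that comes up in the conclusion.

```python
SENIORS = 3_300_000  # high school seniors, per the essay's round figure


def cutoff_percentile(slots: int, population: int = SENIORS) -> float:
    """Percentile rank a student must exceed to land inside `slots` places."""
    return 100 * (1 - slots / population)


# 17,000: all places; 7,000: merit places; 700: Harvard-only merit places.
for slots in (17_000, 7_000, 700):
    print(f"{slots:>6,} slots -> {cutoff_percentile(slots):.2f}th percentile")
```

Whichever count is used, the cut-off lies above the 99th percentile of all seniors, the region where this essay argues that tests lose their resolving power.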
Conclusion
Let’s return to this quite nonsensical statement of Pinker’s: “it’s common knowledge that Harvard selects at most 10 percent (some say 5 percent) of its students on the basis of academic merit.” This statement would have some meaning if a linear regression had been constructed to predict Harvard acceptance. It is conventional to translate R² to “percent of variance explained.” The claim would be that knowing SAT scores would only improve prediction of Harvard acceptance by 5% or 10%. But acceptance is a binary outcome, which calls for a logistic regression; and there is no exact equivalent, for that technique, of percent of variance explained.
Pinker is using a rhetorical technique, the appearance of quantification, to buttress his opinion: that Harvard isn’t paying enough attention to academic merit in its admissions decisions.
But as we’ve seen, Harvard can’t just take the highest scores on the SAT—too many bodies. On the other hand, statistics indicate that to be at the 75th percentile among admitted students at Harvard requires very strong test scores indeed, well up in the 99th percentile (to see this data for any school of interest, search on the school name plus the phrase “student profile”). Harvard is already admitting lots of kids way out on the tail of the test score distribution. How can we reconcile Pinker’s perceptions, that academic merit plays too little a role, and my calculations in this essay, which show that academic merit can’t be the sole criterion for Ivy League admission?
I suspect that what a Harvard does is divide the tens of thousands of applications received into two streams: 1) special cases (alumni, athletes, affirmative action, whatever); and 2) the rest, which will be the bulk of applications, 20,000 or more. Triage is performed on this second stream. The top 3000-5000 or so based on test scores and similar metrics (e.g., top of his class at prominent private school X) are set aside to be read by senior admissions staff. The remaining 15,000 to 20,000 or so are handed off to junior staff, who make a quick perusal to see if there is anything extraordinary hidden in the essays or the record, for these applicants who have pretty strong but not superlative scores. Shouldn’t take more than 30 seconds per file to fish out the few extraordinary cases.
Back to the 5000 applications being read by senior staff, all with superlative test scores: that’s still way too many for the 700+ slots to be awarded on merit. Here is where essays, extra-curricular activities, and intangible fit are assessed. There are too many top-scorers to admit; extra-academic factors have to be brought into play. Those extra-academic factors lead to up to 90% of the top pool being turned away or waitlisted; and that’s what causes Pinker to lament that academic merit drives only 10% of Harvard admissions decisions. But 90% of the top pool had to be discarded, somehow, some way, to make the numbers work.
The root problem: it’s been decades since Pinker faced a class full of 85th-90th percentile students. He misinterprets the lack of interest in class and the diffident anti-intellectualism of the students he does see as artifacts of a cock-eyed Harvard admission process, when in fact these traits have always been characteristic of American undergraduates, now as 50 and 100 years ago, whether at Harvard or at State U. Few college students will grow up to be professors. Tsk, tsk. With a nod to Mr. Deresiewicz, even fewer will grow up to be English professors. Very disappointing.
Much later in his essay, Pinker discovers the real reason students attend Harvard or any other Ivy League school: prestige. Students want a hall pass to join the elite. Families already part of the elite need to reproduce that status.
* The late sociologist Pierre Bourdieu wrote eloquently on these matters. Before tossing off a casual endorsement of the entrance exams used in Europe to admit students to elite schools, as does Pinker, please look over Bourdieu’s book translated as The State Nobility. It will not increase your confidence in such exams.
Pinker is at Harvard for all the right reasons; however much he enjoys the prestige, he earned his position fair and square, and appreciates Harvard for its real virtues: the great libraries, the fruits of a rigorous meritocracy in faculty selection, yada yada. But those virtues do not explain why there’s a crush at the door among 18-year-olds hoping to be admitted to an Ivy League school.
* Regarding Pinker the scholar: It goes without saying, but I’ll say it anyway, that I would not spend an entire essay critiquing the views of someone who lacked stature, or whose acumen I did not respect. If you haven’t read Steven Pinker, you should. He has a lot to offer.
If we look at Harvard and the Ivy League through the lens of social class reproduction, everything clicks into place. Of course most Harvard students have scintillating test scores: there are tens of thousands of such students produced every year in an America of 300 million souls. And of course, Harvard also turns away thousands and thousands of students with equally scintillating test scores, producing a superficial appearance of academic merit playing only a small role in admissions.
Test scores are only a ticket to the real contest, the winnowing that must be performed among the meritorious to determine who shall be named a Harvard man (or woman). If Harvard wants to reproduce its own class position at the top of the heap, it behooves the admissions committee to seek out the following:
- The student with a dominant personality and extraordinary energy, who is going to be attractive to Goldman Sachs et al., and who is going to succeed out there in the material world, bringing glory to Harvard, and money.
- The student from a good family, defined as a family already part of the power structure, a family that will be grateful to Harvard for helping them to maintain their own position, and will show that gratitude in concrete ways.
- The star, the truly outstanding polymath, someone whose success and fame is almost certain, someone with whom Harvard would like to be associated.
Don’t get me wrong; the admission committee doesn’t cross-check applicants against the Social Register. There are far more subtle, largely unconscious ways to reach the same end. And mistake me not: every year, Harvard admits one or two hundred nobodies, students who don’t fit any of the above criteria, but who showed such unusual pluck as to make them a good bet to produce children who might one day fit those criteria.
Last, if we put on our hard-headed economist hat, and approached Harvard University as an institution, now centuries old, dominant in America for almost as long, with every hope and desire of continuing that dominance for centuries to come; and if we asked how a self-interested institution could best sustain its top position for the indefinite future; then we would conclude that its admission decisions should hew to the three criteria above, and not be based solely on test scores or any other scale of pure academic merit. Only a life-tenured professor could think otherwise.
That’s what Pinker got wrong about college admissions. It’s a very American error; for no one wants to talk about class here.