Political Polls and the (Mis)Use of Statistics [Reader Post]


The political campaign season is upon us (Did Obama ever leave it?), and we are currently being inundated with polls on all kinds of subjects: the debt ceiling compromise, presidential approval, who won the Republican debate. Being an informed citizen who knows how to interpret poll results validly is therefore imperative, and given the MSM's bias when reporting poll results, this guide is even more important. Being knowledgeable will have you screaming at your TV and/or newspaper.

First, this is NOT meant to be an introduction to statistics; far from it. It is nothing more than a collection of famous and humorous quotes, plus an explanation of how politicians and the MSM (can) misuse statistics.

Second, let me establish my bona fides, definition 4. I have a Ph.D. in statistics from the Florida State University. I have also done quite a bit of consulting on (among other things) marketing projects, so I have taken samples and formulated questions to ask. The companies with which I consulted are doing well, so I must have done something correct!

Third, take a moment to look at these two very short, excellent articles about polls by Rosslyn Smith: Poll Games: when the goal is not to inform but to persuade, and Poll Games: why one should always follow the link to the poll itself, concentrating on the crosstabs while ignoring the media spin. These articles explain why this post is important.

Fourth, before y’all go glassy-eyed: the subject of “statistics” has NOTHING to do with mathematics. It is quite unfortunate that statistics has been lumped in with mathematics. The reason for the “lumping” (IMHO) is that (1) almost all early statisticians were mathematicians, and (2) before computers were widely available, the mathematical formulas used were nothing more than shortcuts to make statistical calculations easier.

We have all seen the quote attributed to Benjamin Disraeli (1804 – 1881), Prime Minister of Great Britain under Queen Victoria: “There are three kinds of lies: lies, damned lies, and statistics.” (The line has some other attributions as well.)

Here are some more humorous (?) quotes about statistics and statisticians:

  • Statistics: The only science that enables different experts using the same figures to draw different conclusions. Evan Esar (1899 – 1995)
  • Statistician: A man who believes figures don’t lie, but admits that under analysis some of them won’t stand up either. Evan Esar
  • Statistics can be made to prove anything – even the truth. Author Unknown
  • There are two kinds of statistics: the kind you look up and the kind you make up. Rex Stout

Just Remember, This Is A Quick (and dirty) Guide

First, NOTHING is ever proven with statistics. All statistics can ever do is provide additional information to back up a decision and/or judgment.

“Statistics are no substitute for judgment”: Henry Clay.

Statistics may be defined as “a body of methods for making wise decisions in the face of uncertainty.”: W.A. Wallis, famous statistician

When you hear on a TV commercial that something is “clinically proven” to work, that statement refers to a sample whose results a reasonable person (whatever that is) would interpret as “proof.” (See “significance level” and hypothesis testing below.)

Second, there is a difference between a parameter (the unknown quantity of interest, describing an entire population) and a statistic (a quantity calculated from a sample, or subset, of a population). Populations are usually too large to observe every member, so a sample is taken, a statistic that estimates the unknown population parameter is calculated from it, and a decision is made based on that statistic. So Ernie Banks had it wrong when he said, “Awards mean a lot, but they don’t say it all. The people in baseball mean more to me than statistics.” What are called “statistics” in baseball are actually “parameters.” Think about it: in baseball EVERY at-bat is included, so a batting average is a parameter, not a statistic. This link, slides 2 through 4, explains the difference between parameters and statistics.

If the sample (subset) is randomly drawn (by random, statisticians mean that every member of the population being studied, referred to as the population of interest, has an equal chance of being included in the sample), then a statistic calculated from the sample can be useful for making a decision. Notice that a statistic is never exactly correct, hence the “margin of error” (see below).
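To make “random” concrete, here is a minimal Python sketch of a simple random sample. The population of 10,000 voters and its 55% support level are made-up numbers for illustration only. Drawing without replacement with the standard library's `random.Random.sample` gives every member the same chance of inclusion, and the resulting statistic lands close to, but usually not exactly on, the parameter.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n members without replacement; every member has the same
    chance (n / len(population)) of being included."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical population of 10,000 voters: 1 = favors the initiative, 0 = does not.
# The 55% support level is an illustrative assumption, not real data.
population = [1] * 5_500 + [0] * 4_500
sample = simple_random_sample(population, 1_000, seed=42)

parameter = sum(population) / len(population)   # the (normally unknown) population value
statistic = sum(sample) / len(sample)           # the estimate calculated from the sample
print(f"parameter = {parameter:.3f}, statistic = {statistic:.3f}")
```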

When a sample is taken (drawn) from a population, an error occurs: the sample is never an exact miniature of the population from which it is taken. So any statistic calculated from it is, by definition, also in error. Again, hence the “margin of error.”

The larger the sample, the less likely it is to differ from the population from which it is taken. So pollsters like to take “large” samples, because large samples reduce the “margin of error.” But (and there is always a “but”) the larger the sample, the more it costs. The margin of error shrinks only with the square root of the sample size, so cutting it in half takes roughly four times as many respondents. Pollsters therefore try to strike a balance between accuracy and cost, and that balance is usually expressed in the “margin of error.” Reducing the “margin of error” means increasing costs (and vice versa).
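The accuracy-versus-cost trade-off can be sketched with the usual normal-approximation formula for a proportion, z·√(p(1 − p)/n). This assumes a simple random sample and uses p = 0.5, the worst case that gives the widest margin; real pollsters also adjust for weighting and design effects, which this sketch ignores. Each fourfold increase in n (and, if cost is roughly proportional to n, in cost) only halves the margin: about ±9.8%, ±4.9%, ±2.5%, and ±1.2% for the sizes below.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion p from a
    simple random sample of size n (normal approximation)."""
    return z * math.sqrt(p * (1 - p) / n)

# p = 0.5 is the "worst case": it gives the widest margin for a given n.
for n in (100, 400, 1_600, 6_400):
    print(f"n = {n:>5}: margin of error = ±{margin_of_error(0.5, n):.1%}")
```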

Of course, the “margin of error” cannot be meaningfully reduced, regardless of sample size, if the sample is taken from a non-representative population. That is why you should read carefully about the population being studied. For example, there is a difference between a population of “voters” and a population of “likely voters.”
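Here is a small simulation of why a bigger sample cannot rescue a non-representative one. The electorate below is hypothetical: two groups with different levels of support, 52% overall. A sampling frame that reaches only one group (say, the people who answer the phone) settles near that group's 70% no matter how large the sample is, while a frame covering everyone settles near the true 52%.

```python
import random

rng = random.Random(7)

# Hypothetical electorate (illustrative numbers only):
#   group A: 6,000 people, 40% of whom favor the measure
#   group B: 4,000 people, 70% of whom favor the measure
group_a = [1] * 2_400 + [0] * 3_600
group_b = [1] * 2_800 + [0] * 1_200
everyone = group_a + group_b
true_support = sum(everyone) / len(everyone)   # 0.52

def estimate(frame, n):
    """Sample proportion from a simple random sample of size n drawn from `frame`."""
    sample = rng.sample(frame, n)
    return sum(sample) / n

# Larger samples tighten the estimate around whatever the frame contains;
# they do nothing to move a biased frame toward the true value.
for n in (500, 1_500, 3_000):
    print(f"n = {n}: frame = everyone -> {estimate(everyone, n):.3f}, "
          f"frame = group B only -> {estimate(group_b, n):.3f}  (truth {true_support:.2f})")
```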

Third, there are four levels (scales) of measurement: nominal, ordinal, interval, and ratio. Measurement scales and the permissible (valid) statistics for each are explained in the linked article. ANY statistic calculated from inappropriately measured data cannot be validly interpreted and is therefore meaningless, especially when trying to make a decision.
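One way to picture “permissible statistics” is as a lookup table. The sketch below follows S. S. Stevens' classic scheme as it is commonly taught; the exact lists vary somewhat between textbooks, so treat the entries as illustrative. Each level also permits everything allowed at the levels below it. The party-code example is hypothetical.

```python
# Stevens' four levels of measurement and the statistics generally considered
# permissible at each; each level also allows everything permitted below it.
PERMISSIBLE = {
    "nominal":  ["mode", "frequency counts"],
    "ordinal":  ["median", "percentiles"],
    "interval": ["mean", "standard deviation"],
    "ratio":    ["geometric mean", "coefficient of variation"],
}
LEVELS = list(PERMISSIBLE)  # ordered from weakest to strongest

def permissible(level):
    """All statistics allowed for data measured at `level`."""
    allowed = []
    for lvl in LEVELS[: LEVELS.index(level) + 1]:
        allowed += PERMISSIBLE[lvl]
    return allowed

# Party affiliation coded 1 = Democrat, 2 = Republican, 3 = Independent is nominal:
# the codes are labels, so their "average" means nothing.
print("mean" in permissible("nominal"))   # False
print("mean" in permissible("interval"))  # True
```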

Fourth, statistics NEVER shows causality. Causality is a management interpretation. “The invalid assumption that correlation implies cause is probably among the two or three most serious and common errors of human reasoning,” says Stephen Jay Gould in The Mismeasure of Man.
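A minimal sketch of Gould's point: two series that never influence each other can still be strongly correlated when a hidden third factor drives both. The “temperature,” “ice cream sales,” and “swimming accidents” series are simulated, hypothetical data. With these settings the correlation comes out around 0.9, even though neither variable appears in the formula for the other.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(3)
# A hidden third factor (think "daily temperature") drives BOTH series.
temperature = [rng.uniform(0, 30) for _ in range(1_000)]
ice_cream_sales = [t + rng.gauss(0, 3) for t in temperature]
swimming_accidents = [t + rng.gauss(0, 3) for t in temperature]

# Neither series is computed from the other, yet they correlate strongly.
print(f"r = {pearson_r(ice_cream_sales, swimming_accidents):.2f}")
```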

Margin of Error or Confidence Interval

Technically, there IS a difference between a margin of error and a confidence interval: the margin of error is the “±” part, half the width of the interval. (So, please, all you statisticians out there, don’t write to me – we are basing our decision on only one sample.) For our purposes here, we can consider them equivalent.

You often hear politicians, their campaign managers, and media types say, “Polls show the race within the ‘margin of error’.” You can interpret it that way if you want, but that is an incorrect interpretation of the margin of error, what is properly known as a “confidence interval.” This link explains what a confidence interval is, and how it is properly interpreted. So the next time you hear that the race is “within the margin of error,” disregard it as wishful thinking. Sample results are what they are. The “margin of error” simply gives you a “feel” for what error could have been committed.

For example, say (for illustrative purposes only) that a poll result is reported as “the citizens are in favor of this initiative, 60% to 40%,” with a margin of error of ± 3% at a confidence level of 95%. This means that the pollster is (at least) 95% confident that he/she is correct when stating that the true (unknown) population proportion of citizens favoring the initiative is between 57% (60% – 3%) and 63% (60% + 3%). Notice that the sample size does not have to be reported. Notice, also, that the population being studied is assumed to be representative of the citizens who will vote on the initiative in question.
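The arithmetic behind this example can be sketched in a few lines. The interval is simply 60% ± 3%. Even when the sample size is not reported, a rough figure can be backed out, assuming a simple random sample, the normal approximation, and a margin computed at the reported 60%: about 1,025 respondents (about 1,068 if the pollster used the conservative p = 0.5). Real polls that weight their data will differ, so treat this as a sanity check rather than the pollster's actual method.

```python
import math

Z_95 = 1.96  # standard normal critical value for 95% confidence

def confidence_interval(p_hat, moe):
    """The interval reported as 'p_hat plus or minus the margin of error'."""
    return p_hat - moe, p_hat + moe

def implied_sample_size(p_hat, moe, z=Z_95):
    """Smallest simple-random-sample size whose normal-approximation
    margin of error at proportion p_hat is no larger than moe."""
    return math.ceil(z ** 2 * p_hat * (1 - p_hat) / moe ** 2)

low, high = confidence_interval(0.60, 0.03)
print(f"95% confidence interval: {low:.0%} to {high:.0%}")
print("implied sample size:", implied_sample_size(0.60, 0.03))
```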

We have not gotten into how the question was phrased, nor how it was asked. That is another subject in and of itself.

Confidence and Significance Level

You often hear someone specify the “confidence level” or “significance level” at 95% or 99%. They are NOT the same thing! Hopefully, this link will explain the difference.

The confidence level refers only to a confidence interval, and refers only to the probability (usually expressed as a percentage) that the calculated confidence interval embraces the (unknown) population parameter.
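Strictly speaking, the 95% describes the procedure rather than any single interval: if the same poll were repeated many times, about 95% of the intervals built this way would contain the true value. The simulation below illustrates this, assuming a hypothetical true support of 52%, samples of 1,000, and the normal-approximation interval. The reported share should come out close to 95%.

```python
import math
import random

def wald_interval(successes, n, z=1.96):
    """Normal-approximation 95% confidence interval for a proportion."""
    p_hat = successes / n
    moe = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - moe, p_hat + moe

def coverage(true_p=0.52, n=1_000, trials=1_000, seed=2011):
    """Fraction of simulated polls whose interval contains true_p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # One simulated poll: each respondent favors with probability true_p.
        successes = sum(rng.random() < true_p for _ in range(n))
        low, high = wald_interval(successes, n)
        hits += low <= true_p <= high
    return hits / trials

print(f"share of intervals that captured the true value: {coverage():.1%}")
```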

The significance level refers only to hypothesis testing. A hypothesis is nothing more than a statement or belief held by a manager, and that statement is either correct or incorrect. The “significance level” states, unambiguously (usually as a percentage), the risk you are willing to accept of declaring the hypothesis incorrect when it is in fact correct. There is a lot more to this, but we can ignore it for now.
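Here is a minimal hypothesis-testing sketch using a standard two-sided z-test for one proportion. The manager's hypothetical belief is that the public is evenly split (50/50). The made-up poll finds 540 of 1,000 in favor, and the significance level of 5% is fixed before looking at the data. Here z is about 2.53 and the p-value about 0.011, so at the 5% level the 50/50 hypothesis is rejected.

```python
import math
from statistics import NormalDist

def one_proportion_z_test(successes, n, p0):
    """Two-sided z-test of the hypothesis that the population proportion equals p0."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)           # standard error if the hypothesis is true
    z = (p_hat - p0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

alpha = 0.05  # significance level, chosen before looking at the data
z, p_value = one_proportion_z_test(540, 1_000, 0.50)
decision = "reject" if p_value < alpha else "do not reject"
print(f"z = {z:.2f}, p-value = {p_value:.4f}: {decision} the 50/50 hypothesis")
```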

From this explanation I hope you can see how confusion has arisen. BTW, there is nothing sacred about a 90% or 95% or 99% confidence level or significance level. Those “levels” were chosen for convenience.

Why bother with all of this?

If you are a politician (or any kind of manager, political or otherwise, or just a political junkie) you (are paid to) make decisions. Just guessing won’t do. You had better have some (valid) analysis to back yourself up. You never know when you may encounter someone like me.

Ultimately, the onus is upon you, the information consumer, to make a decision. Forewarned is forearmed.

“Statistics is the grammar of science.” Karl Pearson, famous statistician.

FWIW, here is a great statistics glossary.

But that’s just my opinion.

9 Comments

WB, you said you are a person who can properly formulate the questions to ask in a proper poll.
I just saw this from Gallup:
“From what you know or have read about it, would you want your member of Congress to vote for or against a jobs bill similar to the one President Obama has proposed?”

The results, based on a random sample of 1,010 adults aged 18 and older living in all 50 U.S. states and the District of Columbia (Sept 12-13), were: 45% would want it voted for, 32% would not want it voted for, and 23% don’t know.

Would you have written the question differently?

More here.
http://www.gallup.com/poll/149447/Americans-Favor-Obama-Jobs-Plan.aspx

Methodology, full question results, and trend data:
http://www.gallup.com/file/poll/149450/Obama_Job_Plan_110914.pdf

Nate Silver was interviewed yesterday on Keith Olbermann’s show about the implications of yesterday’s two special House races.

Silver is unquestionably the rising star in polling analysis. His projections/predictions in both 2008 and 2010 were dead on. I think it’s evident that his personal political leanings are in the Leftward direction, but, unlike Rasmussen (in the other direction), Silver’s personal leanings don’t contaminate his work.

I got the idea that Silver pretty much thinks that Obama is dead meat, 2012 general-election-wise.

– Larry Weisenthal/Huntington Beach, CA

I am also a mathematician.
I would like to strongly emphasize the concept of the random choice of a sample.
Look before you leap: in any poll, look to see how the different parts of the voting population are represented in the sample. Ideally these percentages should be about the same as in the population as a whole. This is not the case in many polls.
Look before you leap: in any poll, the answers can be slanted by the way in which the questions are asked. Pollsters learned long ago how to bias the result by the order in which questions are asked and the way in which each question is asked. Are the actual poll questions published?
In most polls, the sample is biased: polls made by telephone reach only those people who are home at the time of the call and who answer the phone when called by an unknown caller. This leaves out a relatively large segment of the population. For obvious reasons, pollsters are reluctant to reveal how many of the people they call do not respond at all. Polls also suffer from a non-response phenomenon: if a person agrees to take the poll but then refuses to answer a question, that non-response is often not counted.
Some have gone so far as to argue that our entire voting system is not representative. Not all eligible persons are registered. Not all of those who are registered actually vote (in primaries it can be as few as 20%). India does far better, with voting spread out over three days.
Methods to improve this system invariably run into roadblocks set up by incumbents who were elected under the old rules and want to be re-elected. Hence the vociferous objections to “voter ID,” that is, requiring a photo ID at the polling place.
The objections are preposterous, in view of the necessity for photo ID to cash a check, board an interstate bus, board a train, or fly on a plane. And in view of the minuscule amount of money necessary to obtain such an ID.
The most important polls are those taken by the same organization over a period of time, where trends can be seen, anyway.

It is a well known fact that 5 out of 4 people have trouble with fractions.

As for statistics, half of the people don’t believe them.

Half of the people don’t use them.

And half of the people rely on them.

Oops, three halves.

*shrug*

Well I told you 5 out of 4 people have trouble with fractions…

Excellent article. The lone statistics course I took when studying engineering was almost useless, and when I went to work in R&D I had to learn ‘real’ statistics on the job. Fortunately, we had some good in-house teachers. I wish someone had taught something like the above as a freshman.

I have tried to explain to laymen what statistics are really good for, and it is very difficult. So, I like the definition of ‘a body of methods for making wise decisions in the face of uncertainty.’ Most people’s eyes will still glaze over, but it does explain it pretty succinctly.

@Nan G: Nan G, re: comment # 1
First, far be it from me to tell Gallup how to phrase a question.

Second, rather than rephrase the question, look at WHAT it is asking: “From what you know or have read about it, …” How many people do you think have read all 155 pages and/or know more than superficially about it? IMHO the results are a “popularity contest” orchestrated by the MSM. It precisely illustrates the point I am trying to make: think about what the question is REALLY assessing before deciding. As mathman succinctly says in comment # 3: look before you leap. Very good question and point, Nan G.

For years I have said that when the results of a survey or poll are given, the actual questions should be given, along with how many times the survey or poll was administered before they got these results. I also want to know how much was paid to the outfit that took the survey, how many other surveys they did for this company and other companies, and what percentage of those were favorable.

There is a lot of money to be made in the survey business. One negative result and a company might go to another survey outfit. How do we know there was even a survey done?  I don’t go by surveys at all.

Garbage in, garbage out. Remember that it is also the pollsters who rate the accuracy of their own polls, so take the results of all polls with a variable amount of salt, depending on the pollster’s reputation for impartiality. Statistics are only as accurate as the statistical criteria, the data-gathering process, and the interpretation of the data allow. Statistics are not fact; they are a collection of data with suggestions of what that data might mean. The problem with statistics is that the data may not necessarily support the interpretation given. This is especially true if the data gathering introduces its own bias.

Philosopher: “Deep Thought, is there an answer to ‘the Ultimate Question of Life, the Universe, and Everything’?”
Deep Thought: “Yes.”
Philosopher: “What is the answer?”
Deep Thought: “I’ll have to think about it.”
(7½ million years later)
Descendants of Philosopher: “Deep Thought, what is the answer to ‘the Ultimate Question of Life, the Universe, and Everything’?”
Deep Thought: “You’re not going to like it.”
Descendants of Philosopher: “Please, what is the answer?”
Deep Thought: “42.”

It’s not just about the question.

anticsrocks: you have plagiarized one of my favorite t-shirts.
It always gets a laugh.
Ditto: you left out the actual “ultimate question”:
What is 6 times 9?
Which makes the gag even better!