On 7/29/09 3:09 PM, "Alexander Hoffman" <ahoffman@aledev.com> wrote:
> On Jul 28, 2009, at 10:51 AM, John C. Welch wrote:
>> As much as I love Consumer Reports, surveys that rely on human
>> memory have
>> all the scientific validity of "I like cake",
>
>
> That -- or at least the implication -- is quite untrue, as anyone who
> has truly studied survey design and analysis methodology could tell
> you. (If you are interested, some great texts are out there by F.
> Fowler, A. Gelman, R. Groves, S. Lohr, R. Parker, L. Rea. Those are
> just the main book authors I've read on the topic. There are journals
> devoted to this stuff, and articles in other journals, too. There are
> classes at every major university. If you want to learn about survey
> methodology, there are a lot of ways to do it.)
>
> The key is to make sure that any potential biases affect all
> groups equally. To the extent that they do not, account for that in
> your analysis and/or your write-up.
Nonsense. In fact, unlike most other surveys, CR is upfront about the fact
that the reader score measures overall satisfaction, and that is nothing but a
big batch of bias:
"Reader score reflects respondents' overall satisfaction with their
cell-phone service and is not limited to factors listed under connectivity
results."
Translation: how happy are you with your results?
Unlike most "satisfaction surveys," CR also admits the limitations of its
survey:
"Respondents might not reflect the general U.S. population."
Finally, from "How we survey":
"We conduct most of our surveys by polling a portion of the several million
readers who subscribe to Consumer Reports or to ConsumerReports.org. Our
biggest effort, the Annual Questionnaire, is sent to all subscribers each
spring. Our surveys of our subscribers afford us very large sample sizes and
permit extensive and detailed analysis, which allows us to rate a large
number of brands for quality of service and for product reliability. We also
survey consumers outside our readership to get the most accurate
representation of U.S. households."
So there's a bias in the sample source: most respondents are CR subscribers,
who, by nature, are not 'average' customers. More bias.
None of this is *bad*, but it is there, and it is unavoidable. The same
thing shows up in the bias that creeps into phone polls. If you run the
polls during 'normal' working hours, you're going to skew your sample toward
the young, the retired, and those who work from home, the majority of whom are
going to be stay-at-home parents. You will, however, miss the vast majority
of Americans with jobs outside the home. So you'll have a valid sample size,
but it's going to be terribly skewed, especially when you take into account
the demographics of who will even answer phone polls in the first place.
>
> The specific problem that Mr. Welch cites (i.e., the greater impact
> on memory of poor results than of more average results) affects all
> cellular service providers equally. Therefore, it cancels out.
The survey measured 23 cities, all of them fairly large. I
think St. Louis or Charlotte would be the 'smallest' cities in the group. So
this survey is already biased, in that all respondents live in largish
metropolitan areas. It's of little to no use to people outside of large
metropolitan areas. For example, the closest city to me in the survey is
Jacksonville, FL, followed by Atlanta. Jacksonville is several hours away
from, and quite a bit larger than, Tallahassee; Atlanta even more so.
We haven't even looked at the specific stats measured, and we already see
evidence of noticeable bias in the source for the survey: CR subscribers in
major metropolitan areas.
>
> The more general problem that Mr. Welch cites (i.e. manipulation of
> memory) is certainly an issue with interviews and more qualitative
> data gathering. However, the surveys at issue here are quantitative
> and everyone faces the same questions. Again, whatever issues there
> might be cancel out. Of course, survey design (e.g. wording of
> questions) is very important. However, if the same question is
> mechanically asked of every brand mentioned, even those problems tend
> to cancel out -- at least for the purposes of comparing brands.
Actually, they didn't even use the same brands consistently. For example,
out of the 23 cities measured, Alltel only shows up in 4. It happens to do
well, but what the survey itself leaves out is that Alltel is part of
Verizon. This is mentioned in the separate "Guide to cell phone carriers,"
but that's not part of the survey.
>
> In fact, Mr. Welch makes a horrible comparison. The surveys at
> issue have thousands of respondents, and therefore have far greater
> reliability than a single person saying "I like cake." Moreover, "I
> like cake" is presumably a voluntary qualitative answer, quite
> different from the forced-choice quantitative answers in these surveys.
Actually, "overall satisfaction" is VERY much an "I like cake" question,
because satisfaction is not objectively measurable. The concept is
subjective, no matter how hard you try to make it otherwise, and CR, to
its credit, admits that in the survey. Again:
"Reader score reflects respondents' overall satisfaction with their
cell-phone service and is not limited to factors listed under connectivity
results."
Another problem with the survey: the connectivity portion only covers a
week.
"Connectivity reflects how many times respondents said they experienced
these problems making calls on their phones in the previous seven days: no
service, circuits full, dropped calls, and static, or difficulty hearing."
That is a TERRIBLY short period on which to judge a 24x7 service, and it is
made worse by the limitation to major metropolitan areas, which, due to the
nature of RF and of topology (artificial and natural), are going to have the
most issues with cell coverage in the first place.
Another point from the survey:
"Differences of fewer than seven points are not meaningful."
Well, the average difference among the four majors, from best to worst,
was 11.39 points. If you discard the bottom performers, you only have a
meaningful difference in performance between the top 3 (top 2 in NYC) in 15
of 23 markets. So in 35% of the markets surveyed (8 of 23), there's not even
a meaningful difference between the top three performers. In other words,
it's a wash.
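A minimal Python sketch of the arithmetic above, assuming CR's note means a gap of 7 or more points counts as meaningful. The 15-of-23 tally is taken from the paragraph above; the `is_meaningful` helper and the sample scores in the last line are hypothetical illustrations, not CR data.

```python
# Minimal sketch of the reader-score arithmetic above.
# Assumption: per CR's note, gaps of 7 or more points count as meaningful.

MEANINGFUL_GAP = 7  # "Differences of fewer than seven points are not meaningful."


def is_meaningful(score_a: float, score_b: float, threshold: float = MEANINGFUL_GAP) -> bool:
    """Hypothetical helper: True if two reader scores differ by at least `threshold`."""
    return abs(score_a - score_b) >= threshold


TOTAL_MARKETS = 23
MEANINGFUL_MARKETS = 15  # markets with a meaningful top-3 spread (tally from the text)

not_meaningful = TOTAL_MARKETS - MEANINGFUL_MARKETS
share = not_meaningful / TOTAL_MARKETS

print(f"{not_meaningful} of {TOTAL_MARKETS} markets = {share:.0%}")  # 8 of 23 markets = 35%

# Hypothetical scores, not from the survey: a 6-point gap is a wash, 7 is not.
print(is_meaningful(80, 74), is_meaningful(80, 73))  # False True
```

8/23 is about 34.8%, which rounds to the 35% figure.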
As for the bias that 'opinion' creates, which you can supposedly get away
from: in 8 cities, the top- and bottom-rated carriers had either
identical connectivity ratings or differed by only one step of a four-step
scale in a single column. *Objectively*, or as objectively as human memory
can be, in 8 cities, #1 and #4 were the same or almost the same. So much for
personal opinion not being a deciding factor. In fact, in 9 cases, the
bottom-rated company was 'objectively' *better* than carriers with a higher
opinion rating.
I'm not seeing where personal bias was parsed out.
>
> And last, when it comes to scientifically determining whether or not
> Mr. Welch likes cake, can someone propose a better methodology than
> asking him, in an open-ended fashion, what he thinks about cake? From a
> scientific perspective -- points out this Harvard and Columbia trained
> researcher -- that's a perfectly valid statement. (It's the resulting
> inferences about generalizability that one might make that might be at
> issue, but that's more an issue of sample size than of anything Mr.
> Welch is alluding to.)
Testing signal quality can be done via any number of methods that completely
remove human memory and opinion, and can be done across a far longer period
of time than a week. Measuring RF strength and signal quality is a
well-solved problem. By testing across a longer time period, you remove
aberrations. For example, anyone trying to use AT&T in parts of San
Francisco during the weeks of Macworld or WWDC is going to have a rather
different experience than they would at other times of the year. But if that's the
week they're asked to remember, then the results are artificially depressed.
>
> Frankly, the positivist perspective that Mr. Welch seems to be
> advocating is a product of 19th and early 20th century thinking (i.e.
> "modernist" thinking). Those of us who actually know something about
> research methodology understand that it has come quite far in the last
> 100 years.
When you show me a better survey that isn't so blatantly biased towards
opinion and personal bias, I'll take it more seriously.
--
John C. Welch