"Tom Knapp's books and articles are at the top of my list
for statistical literacy." Milo Schield (2012). For more
information, see his home page:
Tom is Professor Emeritus of Education and Nursing at the University of
Rochester and The Ohio State University.
Latest Unpublished Book:
Quantitative Research Methods
2016. [Highly recommended! An excellent collection of
important ideas. Ed.]
Significance Test, Confidence Interval, Both or Neither?
n: Answers to "How
Many?" questions 2012
Correlations to Teach or Learn Statistics. 2012
Ordinal Scales as Ordinal Scales. 1993 Nursing Research
Treating Ordinal Scales as
Interval Scales: An Attempt to Resolve the Controversy. 1990
Instances of Simpson's Paradox.
MAA College Math. Journal, 16:3, 209–211 1985.
Knapp, T. R. (1977) The unit-of-analysis
problem in applications of simple correlation analysis to educational
research. Journal of Educational [and Behavioral] Statistics, 2,
171-186. Abstract: http://jeb.sagepub.com/content/2/3/171.abstract
Knapp, T.R. (1982) 'The unit and context of the
analysis for research in educational administration', Educational
Administration Quarterly, 18(1), 1-13. Abstract: http://eaq.sagepub.com/content/18/1/1.abstract
Visual Analog Scales
N versus (N-1)
To pool or not to
What is the meaning of the word "random"? What
is the difference between random sampling and random assignment? If and when
a researcher finds that some data are missing in a particular study, under
what circumstances can such data be regarded as "missing at random"? These
are just a few of the many questions that are addressed in this monograph. I
have divided it into 20 sections--one section for each of 20 questions--a
feeble attempt at humor by appealing to an analogy with that once-popular
game. But it is my sincere hope that when you get to the end of the
monograph you will have a better understanding of this crucial term than you
had at the beginning.
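The distinction between random sampling and random assignment can be illustrated with a minimal Python sketch. This is not taken from the monograph; the function names, the population of 100 numbered units, and the seeds are all hypothetical choices made for illustration.

```python
import random


def random_sample(population, n, seed=None):
    """Random sampling: chance decides WHICH units enter the study."""
    rng = random.Random(seed)
    return rng.sample(list(population), n)


def random_assignment(units, seed=None):
    """Random assignment: chance decides which GROUP each unit
    already in the study ends up in (two groups, split in half)."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical population of 100 numbered units.
population = range(1, 101)

# Step 1: draw 10 units at random from the population.
sample = random_sample(population, 10, seed=1)

# Step 2: split those 10 units at random into two groups of 5.
treatment, control = random_assignment(sample, seed=2)

print("sample:   ", sorted(sample))
print("treatment:", sorted(treatment))
print("control:  ", sorted(control))
```

A study can use either step without the other: a convenience sample can still be randomly assigned to groups, and a random sample can be studied with no assignment at all.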
To give you some
idea of the importance of the term, the widely-used search engine Google
returns a list of approximately 1.4 billion web pages when given the prompt
"random". Many of them are duplicates or near-duplicates, and some of them
have nothing to do with the meaning of the term as treated in this monograph
(for example, the web pages that are concerned with the rock group Random),
but many of those pages do contain some very helpful information about the
use of "random" in the scientific sense with which I am concerned.
I suggest that
you pay particular attention to the connection between randomness and
probability (see Section 2), especially the matter of which of those is
defined in terms of the other. The literature is quite confusing in that
There are very
few symbols and no formulas, but there are LOTS of important concepts. A
basic knowledge of statistics, measurement, and research design should be
sufficient to follow the narrative (and even to catch me when I say
something stupid). Thanks for stopping by, and enjoy!
Table of Contents
Section 1: What is the meaning of the word "random"?
Section 2: Which comes first, randomness or probability?
Section 3: Are randomness and chance the same thing?
Section 4: Is randomness a characteristic of a process or a product?
Section 5: What is a random-number generator?
Section 6: Where can you find tables of random numbers?
Section 7: What are tests of randomness?
Section 8: What is "random sampling" and why is it important?
Section 9: What is the difference between random sampling and random assignment?
Section 10: What is the randomized response method in survey research?
Section 11: Under what circumstances can data be regarded as missing at random?
Section 12: What is a random variable?
Section 13: What is the difference between random effects and fixed effects in experimental
Section 14: Can random sampling be either with replacement or without replacement?
Section 15: What is stratified random sampling and how does it differ from stratified random
Section 16: What is the difference between stratified random sampling and quota sampling?
Section 17: What is
Section 18: Does classical reliability theory necessarily assume random error?
Percentages: The Most Useful Statistics Ever Invented
Table of Contents
Chapter 1: The
Percentages and probability
Chapter 4: Sample
percentages vs. population percentages
Statistical inferences for differences between percentages and ratios of
Percentage overlap of two frequency distributions
Dichotomizing continuous variables: Good idea or bad idea?
Percentages and reliability
You know what a percentage is. 2 out of 4 is 50%. 3 is 25%
of 12. Etc. But do you know enough about percentages? Is a percentage the
same thing as a fraction or a proportion? Should we take the difference
between two percentages or their ratio? If their ratio, which percentage
goes in the numerator and which goes in the denominator? Does it matter?
What do we mean by something being statistically significant at the 5%
level? What is a 95% confidence interval? Those questions, and much more,
are what this book is all about.
In his fine article regarding nominal and ordinal
bivariate statistics, Buchanan (1974) provided several criteria for a good
statistic, and concluded: “The percentage is the most useful statistic ever
invented…” (p. 629). I agree, and thus my choice for the title of this book.
In the ten chapters that follow, I hope to convince you of the defensibility
of that claim.
The first chapter is on basic concepts (what a percentage
is, how it differs from a fraction and a proportion, what sorts of
percentage calculations are useful in statistics, etc.). If you’re pretty
sure you already understand such things, you might want to skip that chapter
(but be prepared to return to it if you get stuck later on!).
In the second chapter I talk about the interpretation of
percentages, differences between percentages, and ratios of percentages,
including some common mis-interpretations and pitfalls in the use of
Chapter 3 is devoted to probability and its explanation in
terms of percentages. I also include in that chapter a discussion of the
concept of “odds” (both in favor of, and against, something). Probability
and odds, though related, are not the same thing (but you wouldn’t know that
from reading much of the scientific and lay literature).
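A small worked example may help show why probability and odds are related but not the same. The sketch below is not from the book; it uses the standard definitions (odds in favor = p / (1 - p), odds against = (1 - p) / p), and the function names and the value 1/4 are hypothetical.

```python
from fractions import Fraction


def odds_in_favor(p):
    """Odds in favor of an event that has probability p (requires p < 1)."""
    p = Fraction(p)
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1 - p)


def odds_against(p):
    """Odds against an event that has probability p (requires p > 0)."""
    p = Fraction(p)
    if not 0 < p <= 1:
        raise ValueError("p must satisfy 0 < p <= 1")
    return (1 - p) / p


def probability_from_odds(odds):
    """Convert odds in favor back to a probability."""
    odds = Fraction(odds)
    if odds < 0:
        raise ValueError("odds must be non-negative")
    return odds / (1 + odds)


# Hypothetical event with probability 1/4, i.e. 25%.
p = Fraction(1, 4)
favor = odds_in_favor(p)      # 1/3, read as "1 to 3 in favor"
against = odds_against(p)     # 3, read as "3 to 1 against"
back = probability_from_odds(favor)

print(f"probability = {float(p):.0%}")
print(f"odds in favor = {favor}, odds against = {against}")
print(f"probability recovered from odds = {back}")
```

A probability of 25% corresponds to odds of 1 to 3 in favor, not 1 to 4, which is the kind of mix-up the paragraph above alludes to.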
Chapter 4 is concerned with a percentage in a sample
vis-à-vis the percentage in the population from which the sample has been
drawn. In my opinion, that is the most elementary notion in inferential
statistics, as well as the most important. Point estimation, interval
estimation (confidence intervals), and hypothesis testing (significance
testing) are all considered.
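As a rough illustration of point and interval estimation for a percentage, here is a minimal Python sketch. It assumes the simple normal-approximation (Wald) interval, which is only one of several available methods and is not necessarily the one used in the book; the function name and the 40-out-of-100 data are hypothetical.

```python
import math


def percentage_ci(successes, n, z=1.96):
    """Point estimate and approximate confidence interval for a
    population percentage, using the normal-approximation (Wald) method.

    z = 1.96 corresponds to roughly 95% confidence.
    Returns (estimate, lower, upper), all expressed as percentages.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n                    # sample proportion
    se = math.sqrt(p * (1 - p) / n)      # estimated standard error
    lower = max(0.0, p - z * se)         # clip to the valid range
    upper = min(1.0, p + z * se)
    return 100 * p, 100 * lower, 100 * upper


# Hypothetical sample: 40 "yes" answers out of 100 respondents.
estimate, lower, upper = percentage_ci(40, 100)
print(f"sample percentage: {estimate:.1f}%")
print(f"approximate 95% CI: {lower:.1f}% to {upper:.1f}%")
```

With these made-up numbers the interval runs from roughly 30.4% to 49.6%. The approximation is known to behave poorly for small samples or percentages near 0% or 100%, so this should be read as a sketch of the idea rather than a recommended procedure.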
The following chapter goes one step further by discussing
inferential statistical procedures for examining the difference between two
percentages and the ratio of two percentages, with special attention to
applications in epidemiology.
The next four chapters are devoted to special topics
involving percentages. Chapter 6 treats graphical procedures for displaying
and interpreting percentages. It is followed by a chapter that deals with
the use of percentages to determine the extent to which two frequency
distributions overlap. Chapter 8 discusses the pros and cons of
dichotomizing a continuous variable and using percentages with the resulting
dichotomy. Applications to the reliability of measuring instruments (my
second most favorite statistical concept--see Knapp, 2009) are explored in
Chapter 9. The final chapter attempts to summarize things and tie up loose ends.
There is an extensive list of references, all of which are
cited in the text proper. You may regard some of them as “old” (they
actually range from 1919 to 2009). I like old references, especially those
that are classics and/or are particularly apt for clarifying certain points.
[And I’m old too.]
Learning Statistics through Playing Cards (1996 Sage; 2003, 2012)
A one-of-a-kind volume, Learning Statistics Through
Playing Cards uniquely utilizes a simple deck of playing cards to explain
the important concepts in statistics. Covering many of the topics included
in introductory college statistics courses, author Thomas R. Knapp escorts
the student through populations and variables, parameters, percentages,
probability and sampling, sampling distribution, estimation, hypothesis
testing, and two-by-two tables. Each chapter ends with a series of exercises
designed to help the student actually manipulate the concept under
discussion (the answers are provided at the back of the text). Also included
is an annotated bibliography that directs the student toward further
readings. This simple approach to teaching the elementary principles of
statistics and probabilities makes this an exceptional supplementary text
for undergraduates and first-year graduates in the social, behavioral, and
The Reliability of Measuring Instruments (2009)
[Tom is continually revising his reliability
book. You can read the latest version, download it, or whatever, by
visiting his www.tomswebpage.net website.]
Can you say "reliability" without saying "validity"?
(Can you say "Rosencrantz" without saying "Guildenstern"?) I hope so,
because this book is all about reliability, except for five appendices
in which I discuss validity and for occasional comments in the text
proper regarding the difference between reliability and validity. But
isn't validity more important than reliability? Of course; a reliable
instrument that doesn't measure what you want it to measure is
essentially worthless. The problem is that the validity of a measurement
device ultimately relies on the subjective judgment of experts in the
field (all of the current emphasis on construct validity to the contrary
notwithstanding), and my primary purpose in writing this book is to
pursue those statistical features of measuring instruments that tell you
whether or not, or to what extent, such instruments are consistent.
There are 14 chapters in the book. Chapter 1 is an
introductory treatment of the concept of reliability, with special
attention given to its many synonyms and nuances. The following chapter
addresses the associated concept of measurement error, with an extended
discussion of "randomness". Chapter 3 is devoted to classical
reliability theory and is the most technical section of the book, but if
you think back to your high school mathematics you will recognize the
similarity to plane geometry, with its counterpart definitions, axioms,
and theorems. (It is assumed that you are also familiar with descriptive
statistics such as means, variances, and correlation coefficients, and
with the basic principles of inferential statistics.)
Chapters 4 and 5 treat, respectively, the concept of
attenuation and the interpretation of individual measurements. In
Chapter 6 I try to summarize the literature regarding the reliability of
difference scores of various types and the controversies concerning some
of those types.
The matter of the reliability of individual test items
is explored in Chapter 7. Discussion of the internal consistency
reliability of the total score on a test that consists of more than one
item (the usual case) follows naturally in Chapter 8, where the primary
emphasis is on coefficient alpha (Cronbach's alpha). That chapter
(Chapter 8) also includes a brief section in which I point out the
methodological equivalence of internal consistency reliability and both
inter-rater and intra-rater reliability.
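For readers who have not met coefficient alpha before, the following Python sketch computes it from the usual textbook formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). It is not code from the book; the function name and the small data set (5 people, 3 items) are hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Coefficient alpha for a persons-by-items score table.

    scores: list of rows, one row per person, each row holding
    that person's scores on the same k items (k >= 2).
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least 2 items and 2 persons")
    if any(len(row) != k for row in scores):
        raise ValueError("every person must have a score on every item")
    items = list(zip(*scores))                        # one tuple per item
    sum_item_var = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum_item_var / total_var)


# Hypothetical data: 5 people answering 3 items.
data = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 4, 4],
    [3, 3, 4],
]
print(f"coefficient alpha = {cronbach_alpha(data):.3f}")
```

The same function could be applied to a table whose columns are raters rather than items, which is one way to see the methodological equivalence mentioned above.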
Chapter 9 on intraclass correlations is my favorite
chapter. Although their principal application has been to the
reliability of ratings, they come up in all sorts of interesting
contexts, including those concerned with the unit of analysis and the
independence of observations.
Relative agreement vs. absolute agreement and ordinal
vs. interval measurement provide the focus of Chapter 10. Most
discussions of instrument reliability are concerned with the relative
agreement between two equal-status operationalizations of a particular
construct, but some are devoted exclusively to absolute agreement.
Likert-type scales and other instruments that do not have equal units
require special considerations. (Some of this material was originally
included in various other chapters in previous editions of this book.)
Chapter 11 is concerned mostly with statistical
inferences from samples of "measurees" to populations of "measurees",
but some attention is also given to statistical inferences from samples
of “measurers” to populations of “measurers”.
In Chapter 12 I try to bring everything together by
applying classical reliability theory to a set of data that were
generated in a study of alternative ways of measuring height. (The data,
which have been graciously provided to me by Dr. Jean K. Brown, Dean,
School of Nursing, University at Buffalo, State University of New York,
are in Appendix A.)
The following chapter (Chapter 13) deals with a
variety of special topics regarding instrument reliability. And a final
chapter (Chapter 14) attempts to extend the concept of reliability of
measuring instruments to the reliability of claims.
There is an appendix (Appendix B) on the validity of
measuring instruments in general, an appendix (Appendix C) on the
reliability and validity of birth certificates and death certificates,
an appendix (Appendix D) on the reliability and validity of height and
weight measurements, an appendix (Appendix E) on the reliability and
validity of the four gospels, and an appendix (Appendix F) on the
reliability and validity of claims regarding the effects of secondhand
smoke. A list of references completes the work.
The book is replete with examples of various
measurement situations (real and hypothetical), drawn from both the
physical sciences and the social sciences. Measurement is at the heart
of all sciences. Without reliable (and valid) instruments science would
You may find my writing style to be a bit breezy. I
can't help that; I write just like I talk (and nobody talks like some
academics write!). I hope that my informal style has not led me to be
any less rigorous in my arguments regarding the reliability of measuring
instruments. If it has, I apologize to you and ask you to read no
further if or when that happens. You may also feel that many of the
references are old. Since I am a proponent of the "classical" approach
to reliability, their inclusion is intentional.
I would like to thank Dr. Brown and Dr. Shlomo S.
Sawilowsky (Wayne State University) for their very helpful comments
regarding earlier manuscript versions of the various chapters in this
But don't hold them accountable for any mistakes that
might remain. They're all mine.
TABLE OF CONTENTS:
Chapter 1 What do we mean by the reliability of a measuring
instrument? Terminology Illustrative examples Necessity vs. sufficiency
Chapter 2 Measurement error. Attribute vs. variable When is something random? Obtained score, true
score, and error score Dunn's example Continuous vs. discrete variables
The controversial true score Some more thoughts about randomness
Chapter 3 Reliability theory (abridged, with examples). The basic concepts The first few axioms, definitions, and theorems A
hypothetical example A different approach Some other concepts and
terminology The key theorem A caution concerning parallelism and
reliability Truman Kelley on parallelism and reliability Examples (one
hypothetical, one real) Hypothetical data Real data Additional reading
Chapter 4 Attenuation. What happens, and why The "correction" What can go wrong? How many
ways are there to get a particular correlation between two variables?
The effect of attenuation on other statistics Additional reading
Chapter 5 The interpretation of individual measurements. Back to our hypothetical example, and a little more theory How to
interpret an individual measurement Point estimation Interval estimation
Hypothesis testing Compounded measurement error Additional reading
Chapter 6 The reliability of difference scores. Types of difference scores The general case Measure-remeasure
differences Between-object differences Change scores Simple change
Controversy regarding the measurement of simple change Modified change
Percent change Weighted change Residual change Other difference scores
that are not change scores Inter-instrument differences Inter- and
intra-rater differences Our flow meter example (revisited) Additional reading
Chapter 7 The reliability of a single item. Single-item examples X, T, and E for single dichotomous items Some
approaches to the estimation of the reliability of single items The
Knapp method (and comparison to the phi coefficient) The Guttman method
Percent agreement and Cohen's kappa Spearman-Brown in reverse Visual
analog(ue) scales Additional reading
Chapter 8 The internal consistency of multi-item tests. A little history Kuder and Richardson Cronbach How many items? Factor
analysis and internal consistency reliability Inter-item and
item-to-total correlations Other approaches to internal consistency
Inter-rater reliability and intra-rater reliability Additional reading
Chapter 9 Intraclass correlations. The most useful one The one that equals Cronbach’s alpha Additional reading
Chapter 10 Two vexing problems. Absolute vs. relative agreement Mean and median absolute differences
Ordinal vs. interval measurement Kendall’s tau-b Goodman & Kruskal’s
gamma Williams’ method Back to John and Mary Additional reading
Chapter 11 Statistical inferences regarding instrument reliability. Parallel forms reliability coefficients Test-retest reliability
coefficients Intraclass correlations Coefficient alpha Cohen's kappa
Reliability and power Sample size for reliability studies The effect of
reliability on confidence intervals in general Our flow meter example
(re-revisited) Random samples vs. "convenience" samples Additional reading
Chapter 12 A very nice real-data example. Background and the study itself Over-all parallelism Over-all
reliability The 82 measurers Tidbits
Chapter 13 Special topics. Some other conceptualizations of reliability Generalizability theory
Item response theory Structural equation modeling Norm-referenced vs.
criterion-referenced reliability Unit-of-analysis problems Weighting
Missing-data problems Some miscellaneous educational testing examples
Some more esoteric contributions
Chapter 14 The reliability of claims
Appendix A The very nice data set
Appendix B The validity of measuring instruments
Appendix C The reliability and validity of birth and death certificates
Appendix D The reliability and validity of height and weight measurements
Appendix E The reliability and validity of the four gospels
Appendix F The reliability and validity of claims regarding the
effects of secondhand smoke