RRFSS Sample Design and Analysis
RRFSS is a series of ongoing
monthly telephone surveys designed to monitor community trends in risk
factors within the service area of participating health units. The sample has been designed to represent
the adult population 18 years and over, who speak either English or French
and who reside in private households. Note that occupied households
without telephones are not included in the sample population, but according
to Statistics Canada, these households are only about 3% of all Ontario households.
Choosing individuals in RRFSS
follows a two-stage probability selection process. The first
stage involves the selection of households by randomly selecting residential
telephone numbers. A Random Digit Dialing (RDD) approach is used to
select the phone numbers by randomly selecting from a commercially available
list of telephone numbers, as well as using telephone numbers on either side
of the listed numbers (to cater for numbers that might be unlisted or new).
The second stage, which is the disproportionate selection of an adult from the
cluster of adults in the household, is made by choosing the person
with the most recent birthday. Overall, RRFSS can be considered a
disproportionate sample design within each public health unit (PHU).
Main Features of the Rapid Risk Factor Surveillance System
Objective | Cross-sectional estimates of risk factors for participating Ontario PHUs
Target | Persons 18+ living in private dwellings
Sample size | Approximately 100 per PHU per month
Sampling plan | Two-stage cluster design (1st stage: households within PHU; 2nd stage: persons within household)
Frames for selecting households | Random digit dialing (RDD) from telephone lists plus telephone numbers on either side of numbers on lists
Selection of respondents | Select 1 adult using next birthday
Survey Analysis in RRFSS
Unweighted data in RRFSS are the actual responses
of each participant. Unweighted data represent
results before any adjustment is made either for variation in respondents'
probability of selection, for disproportionate selection of population groups
or subgroups relative to the overall population distribution, or for nonresponse. Weighted RRFSS data represent results that
have been adjusted to compensate for such differences.
Once the sample weight has been computed, generating a point estimate (a single
number that is the best estimate of the indicator) is a simple matter
of applying the appropriate sample design formula to the data. Statistics can also be computed to assess
the precision of the estimates, including the standard error (abbreviated
‘se’ and defined as the square root of the sample variance), the
confidence interval (a range of values that describes the uncertainty around a
point estimate) and the coefficient of variation (which measures the relative
variability around a point estimate and is defined as the standard error
divided by the point estimate).
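The three precision statistics defined above can be sketched in a few lines. This is a minimal illustration, not RRFSS code: the function name `precision_stats` and the example numbers are hypothetical, and the 1.96 multiplier assumes a 95% confidence interval.

```python
import math

def precision_stats(estimate, variance, z=1.96):
    """Return (se, (ci_low, ci_high), cv) for a point estimate and its variance."""
    if variance < 0:
        raise ValueError("variance must be non-negative")
    if estimate == 0:
        raise ValueError("the coefficient of variation is undefined for a zero estimate")
    se = math.sqrt(variance)                      # standard error
    ci = (estimate - z * se, estimate + z * se)   # confidence interval
    cv = se / estimate                            # coefficient of variation
    return se, ci, cv

# Hypothetical example: a proportion of 0.25 with variance 0.0004
se, ci, cv = precision_stats(0.25, 0.0004)
```

The coefficient of variation is returned as a fraction; multiply by 100 to express it as a percentage.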
Although
standard data analysis software can be used to compute RRFSS point estimates,
unless the survey design is a simple random sample (SRS), accurate generation
of precision statistics requires appropriate survey analysis software that can
accommodate complex survey designs. In
the absence of such tools, analysts can assume a simpler survey
design or compute proxy measures for estimating the precision statistics.
When
RRFSS commenced in 2001, the majority of data analysts did not have access to
the survey analysis tools to compute accurate precision measures associated
with the point estimates. By assuming
that the sample of respondents was representative of a random sample of 18+
individuals in the PHU, analysts applied standard formulas for estimating the
variance and other precision statistics.
These estimates are presented in the RQ System under Applying Simple
Random Sampling.
More
recently most of the popular statistical analysis programs, including SPSS
which is the standard analysis tool for RRFSS analysts, have incorporated
functionality for handling complex survey designs. As a result some analysts have begun to
apply this methodology for generating point estimates. The RQ allows RRFSS analysts to access
these estimates under Applying Complex Survey Sampling.
Methodology for Applying Simple
Random Sampling
Household
weights
This
method of computing household level estimates is based on unweighted
data so no household weights were generated.
Person
weights
Formulation
of person weights with this method is based on the principle that the number
of observations in the unweighted data set must
equal the observations in the weighted dataset. The weights are calculated as follows:
· Inclusion Probability
If we let:
$n_h$ ≡ number of sample phone numbers selected from the h-th PHU
$A_i$ ≡ number of adults at the residence of the i-th respondent, then
$$\text{Prob(include individual in sample)} = \frac{n_h A_i}{\sum n_h A_i}$$
· Person Weight $w_{ri}$ = 1 / Prob(include individual in sample)
$$w_{ri} = \frac{\sum n_h A_i}{n_h A_i}$$
Point
Estimates
The
table below outlines the formula for computing point estimates for the
weighted percentage and weighted total.
Formula for computing point estimates - Applying Simple Random Sampling
Statistic | Formula | Description
Proportion (characteristic is either present (1) or absent (0)) | $\sum y_i w_i / \sum w_i$ | The sum of the products of each weight and the corresponding value of y, divided by the sum of all the weights
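The weighted proportion can be illustrated with a short sketch. The function name `weighted_proportion` and the example data are hypothetical; `y` holds the 0/1 indicator for each respondent and `w` the corresponding weight.

```python
def weighted_proportion(y, w):
    """Weighted proportion: sum(y_i * w_i) / sum(w_i), with y_i coded 0 or 1."""
    if len(y) != len(w):
        raise ValueError("y and w must have the same length")
    total_weight = sum(w)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(yi * wi for yi, wi in zip(y, w)) / total_weight

# Hypothetical example: four respondents, two with the characteristic
p = weighted_proportion([1, 0, 1, 0], [2.0, 1.0, 1.0, 4.0])
```

Because the weights appear in both the numerator and the denominator, multiplying every weight by the same constant leaves the proportion unchanged.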
Sample
Variance
For a simple random sample, the sample variance of an estimated proportion is given by
$$\left(1 - \frac{n_h}{N_h}\right) \frac{s^2}{n_h}$$
where $n_h$ is the size of the sample, $N_h$ is the size of the target
population and $s^2 = p(1-p)$, with p the proportion as computed from the
above formula. The $(1 - n_h/N_h)$ component of the sample variance formula is
referred to as the finite population correction factor, which can be omitted if
$n_h$ is small relative to $N_h$. Therefore the sample variance for
Applying Simple Random Sampling is estimated as $s^2/n_h = p(1-p)/n_h$.
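A minimal sketch of the simple random sampling variance of a proportion follows. It assumes $s^2$ is taken as $p(1-p)$, and the function name `srs_variance` and its arguments are hypothetical. The finite population correction is applied only when a population size is supplied.

```python
def srs_variance(p, n, N=None):
    """Variance of a proportion p from a simple random sample of size n.

    If the population size N is given, the finite population correction
    (1 - n/N) is applied; otherwise it is omitted.
    """
    if not 0 <= p <= 1:
        raise ValueError("p must lie between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    variance = p * (1 - p) / n
    if N is not None:
        variance *= 1 - n / N
    return variance

# Hypothetical example: p = 0.2 from 400 respondents
v = srs_variance(0.2, 400)            # correction omitted
v_fpc = srs_variance(0.2, 400, 4000)  # correction applied
```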
Methodology for Applying Complex
Survey Sampling
The steps
involved in formulating the sample weights for the Complex Survey Sampling method are outlined below for both households
and persons. In addition to formulating
the basic sample weight, adjustments have been made for differential seasonal
effects and for representation of the overall number of households or persons
in the population (post-stratification).
The adjustments for seasonal effects ensure
that each month (or survey wave) is represented by one twelfth (or the
reciprocal of the number of available months) of the total sample.
· Inclusion Probability
If we let:
$n_h$ ≡ number of sample phone numbers selected from the h-th PHU
$N_h$ ≡ number of phone numbers on the sample frame in the h-th PHU
[approximated by the number of households in the last census]
$$\text{Prob(include individual's residence)} = \frac{n_h}{N_h}$$
If $A_i$ ≡ number of adults at the residence of the i-th respondent, then
$$\text{Prob(include individual within the individual's residence)} = \frac{1}{A_i}$$
and
$$\text{Prob(include individual within h-th PHU)} = \frac{n_h}{N_h} \cdot \frac{1}{A_i}$$
· Initial Household Weight $w_{hi}$ = 1 / Prob(include residence within h-th PHU)
$$w_{hi} = \frac{N_h}{n_h}$$
Initial Person Weight $w_{pi}$ = 1 / Prob(include individual within h-th PHU)
$$w_{pi} = \frac{N_h}{n_h} \cdot A_i$$
Note that for the Ottawa
PHU, households were divided into French and
non-French strata.
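The initial weights are the reciprocals of the inclusion probabilities. A minimal sketch, using a hypothetical function name `initial_weights` and made-up counts:

```python
def initial_weights(N_h, n_h, adults):
    """Return (household_weight, person_weight) for one respondent.

    N_h    : households in the PHU (frame size, approximated from the census)
    n_h    : sample phone numbers selected in the PHU
    adults : number of adults at the respondent's residence (A_i)
    """
    if N_h <= 0 or n_h <= 0:
        raise ValueError("N_h and n_h must be positive")
    if adults < 1:
        raise ValueError("a responding household has at least one adult")
    household_weight = N_h / n_h               # 1 / (n_h / N_h)
    person_weight = household_weight * adults  # 1 / ((n_h / N_h) * (1 / A_i))
    return household_weight, person_weight

# Hypothetical example: 50,000 households, 1,000 sampled, 2 adults at home
w_h, w_p = initial_weights(50000, 1000, 2)
```

A respondent from a household with more adults receives a larger person weight, because that person had a smaller chance of being the adult selected.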
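· Weight adjustment for seasonal effect (household: $w_{hi}^*$, person: $w_{pi}^*$)
If s represents one of twelve seasonal periods (survey wave), $\sum_s$ denotes a sum over the respondents in the same wave as the i-th respondent and $\sum$ a sum over all respondents in the PHU, then each weight is rescaled so that every wave carries one twelfth of the total weight:
$$w_{hi}^* = w_{hi} \cdot \frac{\sum w_{hi}}{12 \times \sum_s w_{hi}}$$
$$w_{pi}^* = w_{pi} \cdot \frac{\sum w_{pi}}{12 \times \sum_s w_{pi}}$$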
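The stated aim of the seasonal adjustment is that each survey wave contributes an equal share of the total weight. The sketch below is one way to achieve that under stated assumptions, not the RRFSS implementation: the function name `seasonal_adjust` is hypothetical, and the number of waves defaults to the number of waves actually present in the data (the "available months").

```python
from collections import defaultdict

def seasonal_adjust(weights, waves, n_waves=None):
    """Rescale weights so that every wave carries an equal share of the total.

    weights : initial weights, one per respondent
    waves   : wave (e.g. month) label for each respondent
    n_waves : number of waves; defaults to the number of distinct waves observed
    """
    total = sum(weights)
    wave_totals = defaultdict(float)
    for w, s in zip(weights, waves):
        wave_totals[s] += w
    if n_waves is None:
        n_waves = len(wave_totals)
    # Each wave's adjusted weights sum to total / n_waves.
    return [w * total / (n_waves * wave_totals[s]) for w, s in zip(weights, waves)]

# Hypothetical example: three interviews in wave 1, one in wave 2
adjusted = seasonal_adjust([10.0, 10.0, 10.0, 10.0], [1, 1, 1, 2])
```

In this example the single wave-2 respondent is weighted up so that wave 2 carries the same total weight as wave 1, and the overall sum of weights is unchanged.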
· Post-stratified weights
Household weight $w_{hi}^{**}$
If $N_w$ ≡ sum of seasonally adjusted household weights ($\sum w_{hi}^*$) for the PHU, and $N_h$ is the census count of households in the PHU, then
$$w_{hi}^{**} = w_{hi}^* \cdot \frac{N_h}{N_w}$$
Note: Total occupied
private dwellings from the 2001 census was used to estimate total households
within PHU for 2001-2005, and total occupied private dwellings from the 2006
census was used to estimate total households within PHU for 2006-2007.
Post-stratified person weight $w_{pi}^{**}$ for
each sex and age group (18-44,
45-64, 65+), and for the Ottawa French and non-French speaking strata.
If $M_p$ ≡ population estimate for the PHU, sex and age group of the i-th
respondent, and
$M_w$ ≡ sum of seasonally adjusted weights ($\sum w_{pi}^*$) for the PHU, sex and age group of the i-th
respondent, then
$$w_{pi}^{**} = w_{pi}^* \cdot \frac{M_p}{M_w}$$
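Post-stratification is conventionally a ratio adjustment: within each cell, the weights are scaled so that they sum to the known population total for that cell. The sketch below assumes that conventional form; the function name `post_stratify`, the cell labels and the totals are hypothetical.

```python
def post_stratify(weights, cells, population_totals):
    """Scale weights so that, within each cell, they sum to the population total.

    weights           : seasonally adjusted weights, one per respondent
    cells             : post-stratification cell of each respondent
                        (e.g. a sex by age group label within a PHU)
    population_totals : dict mapping each cell to its population estimate
    """
    cell_sums = {}
    for w, c in zip(weights, cells):
        cell_sums[c] = cell_sums.get(c, 0.0) + w
    return [w * population_totals[c] / cell_sums[c] for w, c in zip(weights, cells)]

# Hypothetical example with two cells
final = post_stratify(
    [10.0, 30.0, 20.0],
    ["M 18-44", "M 18-44", "F 65+"],
    {"M 18-44": 400.0, "F 65+": 100.0},
)
```

For household weights the same function can be used with a single cell per PHU and the census household count as its total.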
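Estimate of variance
Variance estimates of RRFSS indicators have been computed
using Taylor series linearization. The approximate
formula for the variance of the mean (ignoring the finite population
correction factor) is given below:
$$v_L(\bar{y}) = \sum_{h=1}^{H} \frac{n_h}{n_h - 1} \sum_{i=1}^{n_h} \left(z_{hi} - \bar{z}_{h\cdot}\right)^2$$
where h indexes the H PHUs, i indexes the respondents in the total sample of $n_h$ households in the h-th
PHU, $z_{hi}$ is the linearized value for the i-th respondent, and
$$z_{hi} = \frac{w_{hi}\,(y_{hi} - \bar{y}_{\cdot\cdot})}{w_{\cdot\cdot}}$$
$$\bar{z}_{h\cdot} = \frac{1}{n_h} \sum_{i=1}^{n_h} z_{hi}$$
$$\bar{y}_{\cdot\cdot} = \frac{\sum_{h=1}^{H} \sum_{i=1}^{n_h} w_{hi}\, y_{hi}}{w_{\cdot\cdot}}$$
$$w_{\cdot\cdot} = \sum_{h=1}^{H} \sum_{i=1}^{n_h} w_{hi}$$
The weight enters through $z_{hi}$, so $\bar{z}_{h\cdot}$ is the simple mean of the linearized values within the h-th PHU.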
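A minimal sketch of a Taylor series linearization variance for a weighted mean is given below. It follows the standard textbook form (each observation is replaced by a weighted deviation from the overall mean, and between-unit variability is summed within each stratum) and ignores the finite population correction. The function name `taylor_variance_of_mean` is hypothetical, and this is not a substitute for dedicated survey software.

```python
def taylor_variance_of_mean(y, w, strata):
    """Return (weighted mean, linearized variance of the mean).

    y      : observed values (0/1 for a proportion)
    w      : sample weights
    strata : stratum label (e.g. PHU) for each observation
    """
    w_total = sum(w)
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / w_total

    # Linearized value for each observation, grouped by stratum
    z_by_stratum = {}
    for yi, wi, h in zip(y, w, strata):
        z_by_stratum.setdefault(h, []).append(wi * (yi - y_bar) / w_total)

    variance = 0.0
    for z in z_by_stratum.values():
        n_h = len(z)
        if n_h < 2:
            raise ValueError("each stratum needs at least two observations")
        z_bar = sum(z) / n_h
        variance += n_h / (n_h - 1) * sum((zi - z_bar) ** 2 for zi in z)
    return y_bar, variance

# Hypothetical example: one stratum with equal weights
mean, var = taylor_variance_of_mean([1, 0, 1, 0], [1.0, 1.0, 1.0, 1.0], ["A"] * 4)
```

With a single stratum and equal weights the result reduces to the usual simple random sampling variance, $s^2/n$ with $s^2$ computed using the $n - 1$ divisor, which is a convenient check on the formula.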
RRFSS General
Guidelines for Analysis
These guidelines are based on the analysis of the 1999
RRFSS pilot project, the 2001 RRFSS data and the general knowledge,
experience and technical expertise of the RRFSS Analysis Group.
1. Unweighted denominator data: cell sizes less than 30 should be suppressed.
2. Unweighted numerator data: cell sizes less than 5 should be suppressed.
3. The following categories for the coefficient of variation (CV) determine the reliability of the estimates:
a. CVs between 0 and 16.5% are deemed to be acceptable for reporting.
b. CVs between 16.6% and 33.3% are to be ‘Interpreted with caution’.
c. CVs greater than 33.3% should be suppressed.
4. 95% confidence intervals (CI) should accompany all point estimates. If we define the percentage of a particular characteristic as p, then the 95% CI can be computed as p ± 1.96 * se.
5. If weighted cell sizes of the ‘Don’t know’ or ‘Refusal’ responses are 5% or greater, then these responses should be included in the analysis and separately reported.
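The suppression and reliability rules in guidelines 1 to 3 can be expressed as a small helper. This is an illustrative sketch only: the function name `reporting_flag` and the returned labels are hypothetical, and a CV falling between 16.5% and 16.6% is assumed here to belong to the ‘Interpreted with caution’ category.

```python
def reporting_flag(numerator_n, denominator_n, cv_percent):
    """Classify an estimate using the cell-size and CV guidelines.

    numerator_n   : unweighted numerator cell size
    denominator_n : unweighted denominator cell size
    cv_percent    : coefficient of variation, in percent
    """
    if denominator_n < 30 or numerator_n < 5:
        return "suppress"                 # guidelines 1 and 2
    if cv_percent > 33.3:
        return "suppress"                 # guideline 3c
    if cv_percent > 16.5:
        return "interpret with caution"   # guideline 3b (boundary is an assumption)
    return "acceptable"                   # guideline 3a

# Hypothetical example: 40 of 250 respondents, CV of 14.2%
flag = reporting_flag(40, 250, 14.2)
```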