
RRFSS Sample Design and Analysis

RRFSS is a series of ongoing monthly telephone surveys designed to monitor community trends in risk factors within the service area of participating health units.  The sample has been designed to represent the adult population 18 years and over, who speak either English or French and who reside in private households.  Note that occupied households without telephones are not included in the sample population, but according to Statistics Canada, these households are only about 3% of all Ontario households.

Respondent selection in RRFSS follows a two-stage probability selection process.  The first stage involves the selection of households by randomly selecting residential telephone numbers.  A Random Digit Dialing (RDD) approach is used: phone numbers are selected at random from a commercially available list of telephone numbers, together with the telephone numbers on either side of the listed numbers (to account for numbers that might be unlisted or new).  The second stage, which is the disproportionate selection of an adult from a cluster of adult respondents in the household, is made by choosing the person with the most recent birthday.  Overall, RRFSS can be considered a disproportionate sample design within each public health unit (PHU).

Main Features of the Rapid Risk Factor Surveillance System

Objective: Cross-sectional estimates of risk factors for participating Ontario PHUs

Target: Persons 18+ living in private dwellings

Sample size: Approximately 100 per PHU per month

Sampling plan: Two-stage cluster design (1st stage: households within PHU; 2nd stage: persons within household)

Frames for selecting households: Random digit dialing (RDD) from telephone lists plus telephone numbers on either side of numbers on lists

Selection of respondents: Select 1 adult using next birthday

 

Survey Analysis in RRFSS
Unweighted data in RRFSS are the actual responses of each participant. Unweighted data represent results before any adjustment is made either for variation in respondents' probability of selection, for disproportionate selection of population groups or subgroups relative to the overall population distribution, or for nonresponse. Weighted RRFSS data represent results that have been adjusted to compensate for such differences. 

 

Once the sample weight has been computed, generating a point estimate (a single number that is the best estimate of the indicator) is a simple matter of applying the appropriate sample design formula to the data.  Statistics can also be computed to assess the precision of the estimates, including the standard error (also referred to as 'se' and defined as the square root of the sample variance of the estimate), confidence intervals (a range of values that describes the uncertainty around a point estimate) and the coefficient of variation (which measures the relative variability around a point estimate and is defined as the standard error divided by the point estimate).
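The relationships between these precision statistics can be sketched in a few lines of Python. This is only an illustration: the function name `precision_stats` and the input values are hypothetical, and the 1.96 multiplier assumes a 95% confidence interval.

```python
import math

def precision_stats(estimate: float, variance: float, z: float = 1.96) -> dict:
    """Derive se, confidence interval and CV from a point estimate and its variance."""
    se = math.sqrt(variance)  # standard error = square root of the variance
    # CV is undefined when the point estimate is zero
    cv = se / estimate if estimate != 0 else float("nan")
    return {
        "se": se,
        "ci_lower": estimate - z * se,
        "ci_upper": estimate + z * se,
        "cv": cv,
    }

# Hypothetical indicator: 25% prevalence with an estimated variance of 0.0004
stats = precision_stats(estimate=0.25, variance=0.0004)
```

The variance passed in can come from either of the two methods described in this document; only the way the variance is obtained differs.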

 

Although standard data analysis software can be used to compute RRFSS point estimates, unless the survey design is a simple random sample (SRS), accurate generation of precision statistics requires survey analysis software that can accommodate complex survey designs.  In the absence of such tools, analysts can assume a simpler survey design or compute proxy measures for estimating the precision statistics.

When RRFSS commenced in 2001, the majority of data analysts did not have access to survey analysis tools for computing accurate precision measures for the point estimates.  By assuming that the sample of respondents was representative of a random sample of individuals aged 18+ in the PHU, analysts applied standard formulas for estimating the variance and other precision statistics.  These estimates are presented in the RQ System under Applying Simple Random Sampling.

 

More recently, most of the popular statistical analysis programs, including SPSS, which is the standard analysis tool for RRFSS analysts, have incorporated functionality for handling complex survey designs.  As a result, some analysts have begun to apply this methodology for generating point estimates.  The RQ System allows RRFSS analysts to access these estimates under Applying Complex Survey Sampling.

 

 

Methodology for Applying Simple Random Sampling

 

Household weights

This method of computing household level estimates is based on unweighted data so no household weights were generated.

 

Person weights

Formulation of person weights with this method is based on the principle that the number of observations in the unweighted data set must equal the number of observations in the weighted data set.  The weights are calculated as follows:

 

·          Inclusion Probability
If we let:
$n_h \equiv$ number of sample phone numbers selected from the h-th PHU
$A_i \equiv$ number of adults at the residence of the i-th respondent, then

$$\text{Prob(include individual in sample)} = \frac{n_h A_i}{\sum n_h A_i}$$

·          Person Weight  $w_{ri}$ = 1 / Prob(include individual in sample)

$$w_{ri} = \frac{\sum n_h A_i}{n_h A_i}$$

 

 

Point Estimates

 

The table below outlines the formula for computing point estimates for the weighted percentage and weighted total.

Formula for computing point estimates - Applying Simple Random Sampling

Statistic: Proportion (characteristic is either present (1) or absent (0))

Formula: $$p = \frac{\sum y_i w_i}{\sum w_i}$$

Description: The sum of the products of each weight and the corresponding value of y, divided by the sum of all the weights
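As a minimal sketch of the proportion formula, the following Python function divides the weighted sum of a 0/1 indicator by the sum of the weights. The function name and the toy data are illustrative assumptions, not part of the RQ System.

```python
def weighted_proportion(y, w):
    """Weighted proportion: sum(y_i * w_i) / sum(w_i), with y_i coded 1 or 0."""
    if len(y) != len(w):
        raise ValueError("y and w must have the same length")
    total_weight = sum(w)
    if total_weight <= 0:
        raise ValueError("sum of weights must be positive")
    return sum(yi * wi for yi, wi in zip(y, w)) / total_weight

# Toy data: four respondents, two of whom have the characteristic
y = [1, 0, 1, 0]
w = [1.0, 2.0, 1.0, 4.0]
p = weighted_proportion(y, w)  # (1 + 1) / 8 = 0.25
```

With all weights equal, the result reduces to the ordinary unweighted proportion.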

 

 

Sample Variance

For a simple random sample, the sample variance of the proportion is given by $(1 - n_h/N_h)\, s^2 / n_h$, where $n_h$ is the size of the sample, $N_h$ is the size of the target population and $s^2 = p(1-p)$, with p the proportion computed from the above formula.  The $(1 - n_h/N_h)$ component of the sample variance formula is referred to as the finite population correction factor, which can be omitted if $n_h$ is small relative to $N_h$.  Therefore the sample variance for Applying Simple Random Sampling is estimated as $s^2 / n_h = p(1-p)/n_h$.
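The calculation can be illustrated with a short Python sketch. It assumes $s^2 = p(1-p)$, so that the variance of the proportion without the finite population correction is $p(1-p)/n$; the function name and the numbers are hypothetical.

```python
import math

def srs_variance_of_proportion(p, n, N=None):
    """Variance of a proportion under simple random sampling.

    If the population size N is given, the finite population correction
    (1 - n/N) is applied; otherwise it is omitted.
    """
    s2 = p * (1.0 - p)
    fpc = 1.0 if N is None else 1.0 - n / N
    return fpc * s2 / n

# Hypothetical example: p = 0.25 from 100 respondents
var_no_fpc = srs_variance_of_proportion(0.25, 100)            # 0.1875 / 100
var_with_fpc = srs_variance_of_proportion(0.25, 100, N=1000)  # multiplied by 0.9
se = math.sqrt(var_no_fpc)
```

Because a PHU sample is small relative to the PHU population, the two results differ very little in practice, which is why the correction is dropped.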

 

 

Methodology for Applying Complex Survey Sampling

 

The steps involved in formulating the sample weights for the Complex Survey Sampling method are outlined below for both households and persons.  In addition to formulating the basic sample weight, adjustments have been made for differential seasonal effects and for representation of the overall number of households or persons in the population (post-stratification).  The adjustments for seasonal effects ensure that each month (or survey wave) is represented by one twelfth of the total sample (or, when fewer months are available, by the reciprocal of the number of available months).

 

·          Inclusion Probability
If we let:
$n_h \equiv$ number of sample phone numbers selected from the h-th PHU
$N_h \equiv$ number of phone numbers on the sample frame in the h-th PHU [approximated by the number of households in the last census]

$$\text{Prob(include individual's residence)} = \frac{n_h}{N_h}$$

If $A_i \equiv$ number of adults at the residence of the i-th respondent, then

$$\text{Prob(include individual within the individual's residence)} = \frac{1}{A_i}$$

and

$$\text{Prob(include individual within h-th PHU)} = \frac{n_h}{N_h} \cdot \frac{1}{A_i}$$

·          Initial Household Weight  $w_{hi}$ = 1 / Prob(include residence within h-th PHU)

$$w_{hi} = \frac{N_h}{n_h}$$

Initial Person Weight  $w_{pi}$ = 1 / Prob(include individual within h-th PHU)

$$w_{pi} = \frac{N_h}{n_h} \cdot A_i$$


Note that for the Ottawa PHU, households were divided into French and non-French strata.

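·         Weight adjustment for seasonal effect – household: $w_{hi}^{*}$, person: $w_{pi}^{*}$
If s represents one of twelve seasonal periods (survey waves), $\sum_s$ denotes a sum over the respondents in the wave of the i-th respondent, and $\sum$ denotes a sum over all respondents in the PHU, then

$$w_{hi}^{*} = w_{hi} \cdot \frac{\sum w_{hi}}{12 \times \sum_s w_{hi}}$$

$$w_{pi}^{*} = w_{pi} \cdot \frac{\sum w_{pi}}{12 \times \sum_s w_{pi}}$$

so that the adjusted weights in each wave sum to one twelfth of the total of the initial weights.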

 

 

·         Post stratified weights

Household weight $w_{hi}^{**}$
If $N_w \equiv$ sum of seasonally adjusted household weights ($\sum w_{hi}^{*}$) for the PHU, then

$$w_{hi}^{**} = \frac{N_h}{N_w} \, w_{hi}^{*}$$







Note: Total occupied private dwellings from the 2001 census were used to estimate total households within each PHU for 2001-2005, and total occupied private dwellings from the 2006 census were used to estimate total households within each PHU for 2006-2007.

Post stratified person weight $w_{pi}^{**}$ for each sex, age group (18-44, 45-64, 65+), and for the Ottawa French and non-French speaking strata.

If
$M_p \equiv$ population estimate for the PHU, sex and age group of the i-th respondent, and
$M_w \equiv$ sum of seasonally adjusted weights ($\sum w_{pi}^{*}$) for the PHU, sex and age group of the i-th respondent, then

$$w_{pi}^{**} = \frac{M_p}{M_w} \, w_{pi}^{*}$$
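The three weighting steps for a single PHU can be sketched in Python as follows. This is an illustrative sketch only: the function names, the toy data and the control totals are hypothetical. It assumes that the seasonal adjustment scales each weight by (total of all weights) / (number of waves × total of the weights in the respondent's wave), so that every wave carries an equal share of the total.

```python
from collections import defaultdict

def initial_weights(N_h, n_h, adults):
    """Return (household weights, person weights) for one PHU.

    Household weight is N_h / n_h; the person weight multiplies it by A_i,
    the number of adults in the respondent's household.
    """
    base = N_h / n_h
    return [base for _ in adults], [base * a for a in adults]

def seasonal_adjust(weights, waves):
    """Rescale weights so that every wave carries an equal share of the total."""
    total = sum(weights)
    wave_totals = defaultdict(float)
    for w, s in zip(weights, waves):
        wave_totals[s] += w
    n_waves = len(wave_totals)  # 12 for a full year of monthly waves
    return [w * total / (n_waves * wave_totals[s]) for w, s in zip(weights, waves)]

def post_stratify(weights, cells, control_totals):
    """Rescale weights so that each cell sums to its population control total."""
    cell_totals = defaultdict(float)
    for w, c in zip(weights, cells):
        cell_totals[c] += w
    return [w * control_totals[c] / cell_totals[c] for w, c in zip(weights, cells)]

# Toy data: four respondents in one PHU, two waves, two post-strata
adults = [1, 2, 2, 3]          # A_i
waves = [1, 1, 1, 2]           # survey wave of each respondent
cells = ["F", "M", "F", "M"]   # post-stratification cell (sex only, for brevity)

household_w, person_w = initial_weights(N_h=400, n_h=4, adults=adults)
person_adj = seasonal_adjust(person_w, waves)
person_post = post_stratify(person_adj, cells, {"F": 600.0, "M": 500.0})
```

In this toy example the two waves each end up with half of the total weight after the seasonal step, and the final weights sum to the control total within each cell. In RRFSS the person cells are defined by sex and age group (and by language stratum in Ottawa) rather than by sex alone.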







 

 

Estimate of variance

Variance estimates of RRFSS indicators have been computed using Taylor’s Series Linearization.  The approximate formula for the variance of the mean (ignoring the finite population correction factor) is given below:

 

 

 

 

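$$v_L(\bar{y}) = \sum_{h=1}^{H} \frac{n_h}{n_h - 1} \sum_{i=1}^{n_h} \left( \tilde{y}_{hi} - \bar{\tilde{y}}_{h\cdot} \right)^2$$

where h indexes the PHUs (H in total), i indexes the respondents (one per sampled household) among the $n_h$ sampled households in the h-th PHU, and

$$\tilde{y}_{hi} = \frac{w_{hi} \, (y_{hi} - \bar{y}_{\cdot\cdot})}{w_{\cdot\cdot}}$$

$$\bar{\tilde{y}}_{h\cdot} = \frac{1}{n_h} \sum_{i=1}^{n_h} \tilde{y}_{hi}$$

$$\bar{y}_{\cdot\cdot} = \frac{\sum_{h=1}^{H} \sum_{i=1}^{n_h} w_{hi} \, y_{hi}}{w_{\cdot\cdot}}$$

$$w_{\cdot\cdot} = \sum_{h=1}^{H} \sum_{i=1}^{n_h} w_{hi}$$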
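The linearization variance can be sketched in Python as below. This follows the standard Taylor series form for a weighted mean with one respondent per sampled household, ignoring the finite population correction; the function name and the toy data are hypothetical, and the sketch is not the SPSS implementation.

```python
from collections import defaultdict

def taylor_variance_of_mean(y, w, stratum):
    """Return (weighted mean, linearization variance) for a stratified sample.

    y: responses, w: final weights, stratum: PHU (stratum) label per respondent.
    """
    w_total = sum(w)
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / w_total

    # Linearized values: w_hi * (y_hi - y_bar) / w_total, grouped by stratum
    linearized = defaultdict(list)
    for yi, wi, h in zip(y, w, stratum):
        linearized[h].append(wi * (yi - y_bar) / w_total)

    variance = 0.0
    for values in linearized.values():
        n_h = len(values)
        if n_h < 2:
            raise ValueError("each stratum needs at least two respondents")
        mean_h = sum(values) / n_h  # simple mean of the linearized values
        variance += n_h / (n_h - 1) * sum((v - mean_h) ** 2 for v in values)
    return y_bar, variance

# Toy check: one stratum with equal weights reduces to s^2 / n
y = [1, 0, 1, 0]
w = [1.0, 1.0, 1.0, 1.0]
mean, var = taylor_variance_of_mean(y, w, stratum=["A"] * 4)
```

With a single stratum and equal weights the result equals the simple random sampling variance $s^2/n$ (here $(1/3)/4 = 1/12$), which is a convenient sanity check on any implementation.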

 

 

 

 

 

 

 

 

 

 

RRFSS General Guidelines for Analysis

 

These guidelines are based on the analysis of the 1999 RRFSS pilot project, the 2001 RRFSS data and the general knowledge, experience and technical expertise of the RRFSS Analysis Group. 

 

1.        Unweighted denominator data – cell sizes less than 30 should be suppressed.

2.        Unweighted numerator data – cell sizes less than 5 should be suppressed.

3.        The following categories for Coefficient of variation (CV) determine the reliability of the estimates:

a.        CV between 0 and 16.5% are deemed to be acceptable for reporting

b.        CV between 16.6% and 33.3% are to be ‘Interpreted with caution’

c.        CV greater than 33.3% should be suppressed.

4.        95% confidence intervals (CI) should accompany all point estimates.  If we define the percentage of a particular characteristic as p and its standard error as se, then the 95% CI can be computed as
p ± 1.96 * se.

5.        If weighted cell sizes of the ‘Don’t know’ or ‘Refusal’ responses are 5% or greater then these responses should be included in the analysis and separately reported.
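Guidelines 1 to 3 can be expressed as a small Python helper. This is a sketch under stated assumptions: the function name and the returned labels are hypothetical, the CV is supplied as a percentage, and a CV falling between 16.5% and 16.6% is treated as 'Interpreted with caution'.

```python
def reporting_flag(numerator_n, denominator_n, cv_percent):
    """Classify an estimate using the unweighted cell sizes and the CV (in %)."""
    # Guidelines 1 and 2: suppress small unweighted cells
    if denominator_n < 30 or numerator_n < 5:
        return "suppress"
    # Guideline 3: reliability categories based on the coefficient of variation
    if cv_percent > 33.3:
        return "suppress"
    if cv_percent > 16.5:
        return "interpret with caution"
    return "acceptable"

# Hypothetical estimates
flags = [
    reporting_flag(numerator_n=40, denominator_n=200, cv_percent=12.0),
    reporting_flag(numerator_n=12, denominator_n=60, cv_percent=25.0),
    reporting_flag(numerator_n=3, denominator_n=60, cv_percent=10.0),
]
```

The cell-size checks are applied before the CV check because an estimate based on too few unweighted cases should be suppressed whatever its CV.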