Statistical Services Offered

Design and Analysis of Probability Surveys

A survey is the process of collecting data from a sample of a defined population in order to estimate attributes of that population. A sampling population is developed from the target population, and a sampling frame identifies every unit in the sampling population.

Simple Random Sampling (SRS) is an equal-probability design in which every unit in the sampling population has the same probability of selection, n/N (where n is the sample size and N is the sampling population size). SRS continues to be an extremely useful and commonly used design. When auxiliary information is available about the units in the sampling frame (e.g., a stratifying variable), a more efficient design such as stratified random sampling may be more appropriate. However, SRS is the lingua franca in situations where data analysts will be using standard inferential methods (which assume SRS) and where the collected survey data will be analyzed by statisticians from different organizations, professions, or interests.
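
As a rough illustration of the n/N selection probability described above, the following Python sketch draws an SRS without replacement from a made-up frame. The function name `simple_random_sample`, the frame of 100 labeled units, and the seed are assumptions chosen for the example, not part of any particular survey.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement from the frame.

    Every unit has the same inclusion probability, n / N,
    where N is the number of units in the frame.
    """
    units = list(frame)
    if not 0 <= n <= len(units):
        raise ValueError("sample size must be between 0 and the frame size")
    rng = random.Random(seed)  # seeded so the draw can be reproduced
    return rng.sample(units, n)


# Hypothetical sampling frame of N = 100 units.
frame = [f"unit-{i}" for i in range(1, 101)]
sample = simple_random_sample(frame, 10, seed=1)

# Each unit's probability of selection under SRS: n / N.
inclusion_probability = len(sample) / len(frame)
```

Sampling without replacement is used here because a unit is normally surveyed at most once; a with-replacement draw would need a different routine.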

Stratified Random Sampling (STR) uses an auxiliary (or stratifying) variable, e.g., income or education level, to divide the units in the sampling population into non-overlapping groups (strata) and then selects a probability sample from each group. Stratification can lower the total sample size required to reach a given level of precision. STR is appropriate when positive sample sizes and/or estimates for identifiable subpopulations are needed, when it is necessary to control data collection costs or workloads for identifiable subpopulations, and when increased precision from using homogeneous strata is desired.
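
The grouping and per-group selection described above might be sketched in Python as follows. This is a minimal example under stated assumptions: the helper `stratified_sample`, the education labels, the stratum sizes, and the proportional allocation (10% from each stratum) are all hypothetical, and a simple random sample is taken within each stratum, although other within-stratum designs are possible.

```python
import random
from collections import defaultdict


def stratified_sample(frame, stratum_of, sizes, seed=None):
    """Select a simple random sample of sizes[h] units from each stratum h.

    frame      -- iterable of sampling units
    stratum_of -- function mapping a unit to its stratum label
    sizes      -- dict of stratum label -> sample size for that stratum
    """
    rng = random.Random(seed)

    # Divide the frame into non-overlapping groups.
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)

    # Draw an independent sample within each group.
    selected = {}
    for label, n_h in sizes.items():
        units = strata[label]
        if n_h > len(units):
            raise ValueError(f"stratum {label!r} has only {len(units)} units")
        selected[label] = rng.sample(units, n_h)
    return selected


# Hypothetical frame: (unit id, education level) pairs.
frame = (
    [(i, "primary") for i in range(60)]
    + [(i, "secondary") for i in range(60, 90)]
    + [(i, "tertiary") for i in range(90, 100)]
)

# Assumed proportional allocation: 10% of each stratum.
sizes = {"primary": 6, "secondary": 3, "tertiary": 1}
selected = stratified_sample(frame, lambda unit: unit[1], sizes, seed=1)
```

Because a sample size is fixed for every stratum in advance, even the smallest group (here "tertiary") is guaranteed to appear in the sample, which an SRS of the same total size would not ensure.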

Cluster Sampling is designed to reduce data collection costs by selecting units that are close to each other. This is accomplished by first selecting clusters of units and then selecting individual units within the selected clusters. For household surveys, it is common to sample clusters of households by first selecting city blocks or census blocks and then sampling households within the selected blocks. This greatly reduces the travel costs per interview, thereby reducing the total cost of the survey.

In the first stage of two-stage sampling, clusters or primary sampling units (PSUs) are selected. In the second stage, elements are randomly selected, but only from the PSUs selected in the first stage. Elements within PSUs are called secondary sampling units (SSUs). Examples of PSU/SSU combinations are: City Block/Household, Household/Family Member, and Clinic/Patient.
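
The two stages described above can be sketched in Python as follows, using the City Block/Household pairing. The function `two_stage_sample`, the block and household labels, and the sample sizes are assumptions made for the example. Simple random sampling is used at both stages for simplicity; in practice PSUs are often selected with other designs (for instance, with probability proportional to size).

```python
import random


def two_stage_sample(psus, n_psus, n_ssus, seed=None):
    """Two-stage sample: select PSUs, then select SSUs within them.

    psus   -- dict mapping each PSU label to the list of its SSUs
    n_psus -- number of PSUs to select in the first stage
    n_ssus -- number of SSUs to select within each selected PSU
    """
    rng = random.Random(seed)

    # Stage 1: select clusters (PSUs).
    chosen = rng.sample(sorted(psus), n_psus)

    # Stage 2: select elements (SSUs), but only within the chosen PSUs.
    # A PSU with fewer than n_ssus elements contributes all of them.
    return {
        psu: rng.sample(psus[psu], min(n_ssus, len(psus[psu])))
        for psu in chosen
    }


# Hypothetical frame: 10 city blocks with 20 households each.
blocks = {
    f"block-{b}": [f"block-{b}/household-{h}" for h in range(1, 21)]
    for b in range(1, 11)
}

# Select 3 blocks, then 5 households in each selected block.
selected = two_stage_sample(blocks, n_psus=3, n_ssus=5, seed=7)
```

Interviewers in this example would visit only 3 blocks rather than households scattered across all 10, which is the source of the travel cost savings.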