Statistical Services Offered

Longitudinal Data Analysis

The essential feature of longitudinal data is that repeated measurements are taken on an experimental unit (or subject) over time. Special methods of statistical analysis are needed for longitudinal data because the set of measurements on one subject tends to be correlated, because measurements on the same subject close in time tend to be more highly correlated than measurements far apart in time, and because the variances of longitudinal data often are not constant over time.
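As a rough illustration of these three features, the sketch below builds the covariance matrix for one subject under an AR(1)-type correlation pattern with unequal variances. The time points, standard deviations, and correlation value are made-up numbers chosen only for illustration, and AR(1) is just one of many possible covariance structures.

```python
# Hypothetical illustration: covariance matrix for one subject measured
# at four time points, with AR(1)-type correlation and unequal variances.

def ar1_covariance(times, sds, rho):
    """Return cov[j][k] = sd_j * sd_k * rho ** |t_j - t_k|."""
    n = len(times)
    return [
        [sds[j] * sds[k] * rho ** abs(times[j] - times[k]) for k in range(n)]
        for j in range(n)
    ]

def correlation(cov, j, k):
    """Correlation between measurements j and k implied by cov."""
    return cov[j][k] / (cov[j][j] * cov[k][k]) ** 0.5

times = [0, 1, 2, 4]        # assumed measurement times (e.g. months)
sds = [1.0, 1.2, 1.5, 2.0]  # assumed standard deviations, growing over time
rho = 0.7                   # assumed correlation for measurements one time unit apart

cov = ar1_covariance(times, sds, rho)
for row in cov:
    print(["%.3f" % v for v in row])
```

In this sketch the correlation between the first two measurements is larger than the correlation between the first and last, and the diagonal (the variances) increases over time.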

Longitudinal data sets differ from time series data sets in that longitudinal data usually consist of many short series, one for each subject, whereas a time series data set usually consists of a single, long series. For example, the monthly average of the Dow Jones Industrials Index over several years is a time series data set, while measurements over time of the effectiveness of a drug treatment for several patients are a longitudinal data set.

Longitudinal data analysis examines and compares responses over time. A longitudinal model can study both changes over time within subjects and differences between groups. For example, longitudinal models can estimate individual-level (subject-specific) regression parameters as well as population-level regression parameters.
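The distinction between subject-specific and population-level parameters can be sketched with a simple two-stage summary: fit a straight line to each subject separately, then average the coefficients. This is only an illustration of the two levels, not how a mixed model is estimated (a mixed model estimates both levels jointly), and the data and subject labels below are invented for the example.

```python
# Two-stage sketch: per-subject least-squares lines, then their average.
# All data values are hypothetical.

def ols_line(times, values):
    """Ordinary least-squares intercept and slope for one subject."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    return mean_v - slope * mean_t, slope

# Subjects are observed at different times (unbalanced data).
data = {
    "s1": ([0, 1, 2, 3], [10.0, 12.1, 13.9, 16.0]),
    "s2": ([0, 1, 3], [8.0, 9.0, 11.1]),
    "s3": ([0, 2, 3], [11.0, 16.8, 20.1]),
}

# Stage 1: subject-specific intercepts and slopes.
subject_fits = {sid: ols_line(t, y) for sid, (t, y) in data.items()}

# Stage 2: population-level parameters as the average over subjects.
n_subjects = len(subject_fits)
population_intercept = sum(a for a, _ in subject_fits.values()) / n_subjects
population_slope = sum(b for _, b in subject_fits.values()) / n_subjects

for sid, (a, b) in subject_fits.items():
    print(sid, "intercept %.2f" % a, "slope %.2f" % b)
print("population", "intercept %.2f" % population_intercept,
      "slope %.2f" % population_slope)
```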

The linear mixed model provides a flexible approach to modeling longitudinal data. It handles unbalanced data, including unequally spaced time points and subjects observed at different time points; it uses all the available data in the analysis; it models the covariance structure directly; and it provides valid standard errors and efficient statistical tests.
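For reference, the linear mixed model is commonly written in the following general textbook form. The symbols are introduced here for illustration: y_i is the vector of responses for subject i, X_i and Z_i are the design matrices for the fixed and random effects, beta holds the population-level (fixed) effects, b_i the subject-specific (random) effects, and G and R_i are covariance matrices.

```latex
\begin{aligned}
\mathbf{y}_i &= \mathbf{X}_i \boldsymbol{\beta} + \mathbf{Z}_i \mathbf{b}_i + \boldsymbol{\varepsilon}_i, \\
\mathbf{b}_i &\sim N(\mathbf{0}, \mathbf{G}), \qquad
\boldsymbol{\varepsilon}_i \sim N(\mathbf{0}, \mathbf{R}_i), \\
\operatorname{Var}(\mathbf{y}_i) &= \mathbf{Z}_i \mathbf{G} \mathbf{Z}_i^{\top} + \mathbf{R}_i .
\end{aligned}
```

Because y_i, X_i, Z_i, and R_i may have a different number of rows for each subject, the same formulation covers subjects observed at different and unequally spaced times. Choosing a covariance structure amounts to choosing the form of G and R_i.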

If the model is correctly specified, including the covariance structure, violation of the normality assumption for the random effects has little effect on the fixed-effect parameter estimates and their standard errors. However, it has a substantial effect on the random-effect parameter estimates and their standard errors.

Longitudinal models fit with the SAS MIXED procedure are based on the assumption that the responses are normally distributed. However, the assumption of normality may not always be valid, especially when the response variable is discrete. A widely used method for handling correlated discrete response data is Generalized Estimating Equations (GEEs). This method is suitable for modeling longitudinal data whose response variables are, for example, binary outcomes or discrete counts. GEEs are fit in SAS with the GENMOD procedure.
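To give a feel for what a GEE does, the sketch below works through the simplest possible case by hand: an intercept-only logistic model for binary responses with an independence working correlation. In that special case the estimating equation is solved by the overall proportion of ones, and the robust (sandwich) standard error is built from per-subject sums of residuals. This is a hand-written illustration in Python, not the GENMOD procedure, and the data are invented.

```python
import math

def gee_intercept_only(clusters):
    """Intercept-only logistic GEE with independence working correlation.

    clusters: list of lists of 0/1 responses, one inner list per subject.
    Returns (beta, naive_se, robust_se) for the log-odds intercept.
    """
    n_total = sum(len(c) for c in clusters)
    p = sum(sum(c) for c in clusters) / n_total  # solves the estimating equation
    beta = math.log(p / (1 - p))                 # intercept on the logit scale

    # "Bread": sum over subjects of n_i * p * (1 - p).
    bread = n_total * p * (1 - p)
    # "Meat": sum over subjects of squared within-subject residual totals.
    meat = sum((sum(c) - len(c) * p) ** 2 for c in clusters)

    naive_se = math.sqrt(1.0 / bread)      # treats all observations as independent
    robust_se = math.sqrt(meat) / bread    # sandwich estimator, allows within-subject correlation
    return beta, naive_se, robust_se

# Hypothetical binary responses: four subjects, four visits each.
clusters = [[1, 1, 1, 1], [0, 0, 0, 0], [1, 1, 1, 0], [0, 0, 0, 1]]
beta, naive_se, robust_se = gee_intercept_only(clusters)
print("beta %.3f  naive SE %.3f  robust SE %.3f" % (beta, naive_se, robust_se))
```

With responses that are strongly alike within each subject, as in this made-up data set, the robust standard error comes out larger than the naive one, which reflects the fact that correlated measurements carry less information than the same number of independent ones.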