Modeling early alcohol initiation: A comparison of linear regression, logistic regression, and discrete time hazard models

Danelia, Ketevan
Scope and Method of Study: Social science research often needs to study the occurrence of a rare event whose distribution is not normal and whose data structure is nested. Common statistical methods for these questions require either violating important statistical assumptions or mishandling missing data. For data that record whether an event occurs and when it occurs, the most appropriate statistical models are discrete-time hazard models. However, until recently no method existed that uses discrete-time hazard models and appropriately adjusts the standard errors to account for the nested structure of the data. The present study develops three models that combine discrete-time hazard models with hierarchical linear modeling to model Age of First Use of alcohol, and compares these models with the more commonly used multiple regression and logistic regression models. To illustrate the advantages of this method, the study evaluates the effects of several common covariates of alcohol use, such as Age of First Opportunity (AFO) to use alcohol, Family Attention (FA), Externalizing Behavior (EXT), Socioeconomic Status (SES), and Gender, in a sample of 1785 youth from Caracas, Venezuela.
Findings and Conclusions: Age of first opportunity to use alcohol appears to be the most influential variable in the models. The highest hazard rate of alcohol initiation was found in the first year of opportunity to use alcohol. The results varied across models depending on whether AFO was included as a covariate. When models did not control for AFO, all other independent variables became significant predictors of alcohol initiation in all models except the logistic regression model, where controlling for AFO did not make a statistically significant difference in predicting alcohol use. Although all models considered in the present study have their own advantages, hazard models are seen as the most appropriate for modeling age of first alcohol use. The main advantage of hazard models is their ability to handle a particular kind of missing data called right censoring, as with youth who have not initiated alcohol use in any of the years covered by a given study. In this study only about 18% of participants reported no use of alcohol, but in studies of illicit drugs many more participants will fall in the non-user group. For modeling early ages of drug initiation or any other event occurrence, when a vast majority of participants have not yet experienced the event, hazard models should be used.
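
The way a discrete-time hazard handles right censoring can be illustrated with a minimal sketch. It is not the study's hierarchical model: it is only a plain life-table estimate, and the function name, the record layout, and the toy data are hypothetical, chosen for illustration. Each record gives the last period in which a participant was observed (assumed here to be years since first opportunity) and whether initiation occurred in that period. Censored participants stay in the risk set for every period they were observed but never count as events.

```python
from collections import defaultdict


def discrete_time_hazard(records):
    """Life-table estimate of the discrete-time hazard.

    records: iterable of (last_period, event) pairs (hypothetical layout).
      last_period: last period (1, 2, ...) in which the person was observed.
      event: True if initiation occurred in last_period,
             False if the person is right-censored at last_period.
    Returns {period: events in period / persons at risk in period}.
    """
    at_risk = defaultdict(int)
    events = defaultdict(int)
    for last_period, event in records:
        # The person contributes one person-period row for every period
        # up to and including the last one observed.
        for t in range(1, last_period + 1):
            at_risk[t] += 1
        if event:
            events[last_period] += 1
    return {t: events[t] / at_risk[t] for t in sorted(at_risk)}


# Toy data (invented, not from the study): three initiators and two
# participants who are right-censored after period 3.
toy_records = [(1, True), (1, True), (2, True), (3, False), (3, False)]
hazard = discrete_time_hazard(toy_records)

# Survival: probability of not yet having initiated by the end of each period.
survival = {}
remaining = 1.0
for t, h in hazard.items():
    remaining *= 1.0 - h
    survival[t] = remaining
```

In this toy example the hazard is highest in period 1, and the two censored participants are neither dropped nor treated as initiators. A regression version would fit a logistic model to the same person-period rows, with the study's covariates added as predictors.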