Faculty in the Department of Biostatistics & Informatics carry out research to develop, evaluate, and improve statistical methods for designing and analyzing health care studies. This research is often motivated by work with clinical and public health co-investigators who encounter design or analysis questions not adequately addressed in the existing literature. The following are some of the areas in which we work.
New technologies in the 'omics field have changed the landscape of biomedical research over the last twenty years. Data generated from a variety of high-throughput assays (microarrays, sequencing, mass spectrometry, etc.) pose interesting analytical challenges due to their high dimensionality, large volume, and complex structure. Many of our faculty develop methods across all stages of the omics data life cycle, including study design, data generation, quality control, pre-processing, modeling, analysis, interpretation, archiving, data dissemination, and software development. Depending on the question, statistical and computational approaches include machine learning, high-dimensional methods, latent variable models, statistical genetics, genetic risk prediction, sub-typing, causal modeling, data integration, and non-parametric methods. Faculty work closely with the core facilities generating the data and with a variety of biomedical investigators on campus and beyond who provide interesting applications for specific systems and diseases.
Observations in health sciences studies are often made sequentially on subjects, resulting in temporal patterns and/or serial correlation. Studies also commonly involve subjects treated at the same hospital or by the same provider, which gives rise to similarities among those subjects. These situations require special statistical methods to maintain validity when observations are dependent and to learn about these sources of variation. In many cases multiple outcomes are also of interest. Faculty members are developing new statistical methods for many such situations, including patterns of missing data that depend on outcomes; complex temporal patterns in subjects' longitudinal data; use of surrogate measures when true covariates are not observable; and joint modeling of longitudinal measures and time-to-event outcomes.
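One common summary of such dependence is the intraclass correlation (ICC): the share of total variance attributable to between-cluster differences. The sketch below simulates subjects clustered within sites and recovers the ICC with the standard one-way ANOVA moment estimator; the data, variance components, and code are purely illustrative, not any specific faculty method.

```python
import random
import statistics

random.seed(42)

# Simulate subjects clustered within sites: each site has its own random
# effect, so observations from the same site are correlated.
n_sites, per_site = 30, 20
sigma_site, sigma_resid = 1.0, 2.0  # illustrative standard deviations

data = []
for site in range(n_sites):
    site_effect = random.gauss(0, sigma_site)
    for _ in range(per_site):
        data.append((site, site_effect + random.gauss(0, sigma_resid)))

# One-way ANOVA moment estimator of the ICC (balanced design).
grand_mean = statistics.fmean(y for _, y in data)
site_means = {s: statistics.fmean(y for s2, y in data if s2 == s)
              for s in range(n_sites)}

msb = per_site * sum((m - grand_mean) ** 2
                     for m in site_means.values()) / (n_sites - 1)
msw = sum((y - site_means[s]) ** 2
          for s, y in data) / (n_sites * (per_site - 1))

var_between = (msb - msw) / per_site
icc = var_between / (var_between + msw)
# True ICC here is 1.0 / (1.0 + 4.0) = 0.2; the estimate should be nearby.
```

Ignoring an ICC of this size when analyzing clustered data understates standard errors, which is exactly why the special methods described above are needed.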
Estimating the number of subjects required to conduct a sensitive, efficient and ethical study is an extremely common and important question in health care research. Faculty are developing methods to carry out such estimation in situations where the study design is complex, for example involving longitudinal measurement of subjects over time, or subjects clustered within sites or clinicians. Some current faculty research can be found at SampleSizeShop.org, which provides all researchers, including behavioral and social scientists, with free, open-source, peer-reviewed power and sample size software, tutorials, and educational materials.
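As a minimal sketch of one standard calculation (the normal-approximation sample size formula for a two-arm mean comparison, inflated by the design effect for clustered subjects; the function name and numbers are illustrative, and the SampleSizeShop.org tools handle far richer designs):

```python
import math
from statistics import NormalDist

def two_sample_n(delta, alpha=0.05, power=0.80):
    """Per-group n for a two-sided two-sample z-test detecting a
    standardized mean difference `delta` (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / delta) ** 2)

# A medium standardized effect (delta = 0.5) needs 63 subjects per group.
n_simple = two_sample_n(0.5)  # -> 63

# With subjects clustered within sites, inflate by the design effect
# 1 + (m - 1) * icc for cluster size m and intraclass correlation icc.
m, icc = 10, 0.05
n_clustered = math.ceil(n_simple * (1 + (m - 1) * icc))  # -> 92
```

Even a modest intraclass correlation of 0.05 inflates the required sample size by nearly half here, which is why clustered and longitudinal designs need specialized sample size methods.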
Outcomes in biomedical and public health studies are often more complex than simple numerical measures, and normality assumptions are not appropriate. When time to an event such as death, hospital admission, or disease recurrence is of interest, models are available to account for censoring (when only ranges of event times are known). These models have been extended to events that can occur more than once, a situation that is common in clinical medicine. Faculty members are studying situations in which only counts of events in intervals are available, and the implications of this limitation for required sample sizes. Other examples of statistical methods for non-normal outcomes being studied by faculty include counts of bacteria of different species in lung microbiome samples, and health care costs across multiple cost categories or when cost may be zero.
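For instance, the classical Kaplan-Meier estimator accommodates right-censored event times by discounting survival only at observed events; a minimal pure-Python sketch with made-up data (not a production implementation):

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival curve with right-censoring.
    events[i] is 1 if the event was observed at times[i], 0 if censored."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv, curve = 1.0, []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = sum(e for tt, e in data if tt == t)        # events at time t
        removed = sum(1 for tt, _ in data if tt == t)  # events + censorings
        if d > 0:
            surv *= 1 - d / n_at_risk  # survival drops only at event times
            curve.append((t, surv))
        n_at_risk -= removed
        i += removed
    return curve

# Five subjects; a 0 marks a censored time (only a lower bound is known).
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
# -> [(1, 0.8), (2, 0.6), (3, 0.3)], up to floating-point rounding
```

Note how the subject censored at time 2 still contributes to the risk set at time 2 but not afterward; discarding censored subjects entirely would bias the curve downward.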
Exploring multiple types of imaging data, data processing algorithms, and new data analysis approaches for imaging data.
Various aspects of study design and analysis necessary for moving beyond correlation and into causation.
Distributions of variables and relationships among variables in health sciences studies are often more complex than can be captured with traditional parametric approaches. Non-parametric and semi-parametric methods relax some of the assumptions of parametric approaches. Non-parametric approaches do not assume a particular distribution in order to perform inference, which is useful when the distribution is unknown or difficult to work with, situations that are especially common with biological data such as microbiome or genomic data. Common non-parametric approaches include permutation, bootstrapping, and other resampling-based methods. In other situations trends or relationships may be complex and nonlinear, in which case semi-parametric methods such as smoothing or splines are useful. Faculty are developing methods for situations such as drop-out patterns that depend in complex ways on patients' health status in HIV/AIDS studies.
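As a small illustration of the resampling idea (a two-sided permutation test for a difference in group means; the function and data are hypothetical):

```python
import random

def permutation_test(x, y, n_perm=5000, seed=0):
    """Two-sided permutation test for a difference in means: no
    distributional assumption, only exchangeability under the null."""
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    observed = abs(sum(x) / len(x) - sum(y) / len(y))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random
        px, py = pooled[:len(x)], pooled[len(x):]
        if abs(sum(px) / len(px) - sum(py) / len(py)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction

# Two clearly separated groups: the permutation p-value is small.
p = permutation_test([1, 2, 3, 4, 5], [11, 12, 13, 14, 15])
```

The p-value is simply the fraction of random relabelings whose mean difference is at least as extreme as the observed one, so no normality assumption enters anywhere.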
All faculty in the department collaborate with groups on and off campus to better equip researchers to apply statistical methodology appropriately and to interpret results soundly. Here are some of the specific areas and groups we collaborate with.