2024-03-28T09:34:15Z
http://biostats.bepress.com/do/oai/
oai:biostats.bepress.com:harvardbiostat-1000
2003-09-10T18:21:12Z
publication:harvardbiostat
Nonparametric Comparison of Two Survival-Time Distributions in the Presence of Dependent Censoring
DiRienzo, Greg
2003-09-10T07:00:00Z
text
application/pdf
https://biostats.bepress.com/harvardbiostat/paper1
https://biostats.bepress.com/context/harvardbiostat/article/1000/viewcontent/biom__497_504_.pdf
Harvard University Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
oai:biostats.bepress.com:harvardbiostat-1003
2003-10-16T18:38:30Z
publication:harvardbiostat
The Effects of Misspecifying Cox's Regression Model on Randomized Treatment Group Comparisons
DiRienzo, Greg
2003-10-16T07:00:00Z
text
application/pdf
https://biostats.bepress.com/harvardbiostat/paper3
https://biostats.bepress.com/context/harvardbiostat/article/1003/viewcontent/HS23022.pdf
Harvard University Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
oai:biostats.bepress.com:harvardbiostat-1011
2003-12-01T17:10:11Z
publication:harvardbiostat
Empirical and Kernel Estimation of Covariate Distribution Conditional on Survival Time
Li, Xiaochun
Xu, Ronghui
2003-12-01T08:00:00Z
text
application/pdf
https://biostats.bepress.com/harvardbiostat/paper11
https://biostats.bepress.com/context/harvardbiostat/article/1011/viewcontent/epress.pdf
Harvard University Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
oai:biostats.bepress.com:harvardbiostat-1016
2004-10-15T14:05:56Z
publication:harvardbiostat
A Robust Regression Model for a First-Order Autoregressive Time Series with Unequal Spacing: Technical Report
Houseman, E. Andres
2004-10-15T07:00:00Z
text
application/pdf
https://biostats.bepress.com/harvardbiostat/paper16
https://biostats.bepress.com/context/harvardbiostat/article/1016/viewcontent/robustARTechReport1.pdf
Harvard University Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
oai:biostats.bepress.com:harvardbiostat-1002
2003-10-16T16:50:54Z
publication:harvardbiostat
Nonparametric Methods to Predict HIV Drug Susceptibility Phenotype from Genotype
DiRienzo, Greg
2003-10-16T07:00:00Z
text
application/pdf
https://biostats.bepress.com/harvardbiostat/paper2
https://biostats.bepress.com/context/harvardbiostat/article/1002/viewcontent/StMed_article.pdf
Harvard University Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
oai:biostats.bepress.com:harvardbiostat-1018
2004-10-18T20:17:58Z
publication:harvardbiostat
A Functional-Based Distribution Diagnostic for a Linear Model with Correlated Outcomes: Technical Report
Houseman, E. Andres
Coull, Brent
Ryan, Louise
Despite the widespread popularity of linear models for correlated outcomes (e.g. linear mixed models and time series models), distribution diagnostic methodology remains relatively underdeveloped in this context. In this paper we present an easy-to-implement approach that lends itself to graphical displays of model fit. Our approach involves multiplying the estimated marginal residual vector by the Cholesky decomposition of the inverse of the estimated marginal variance matrix. Linear functions of the resulting "rotated" residuals are used to construct an empirical cumulative distribution function (ECDF), whose stochastic limit is characterized. We describe a resampling technique that serves as a computationally efficient parametric bootstrap for generating representatives of the stochastic limit of the ECDF. Through functionals, such representatives are used to construct global tests for the hypothesis of normal marginal errors. In addition, we demonstrate that the ECDF of the predicted random effects, as described by Lange and Ryan (1989), can be formulated as a special case of our approach. Thus, our method supports both omnibus and directed tests. Our method works well in a variety of circumstances, including models having independent units of sampling (clustered data) and models for which all observations are correlated (e.g., a single time series).
2004-10-18T07:00:00Z
text
application/pdf
https://biostats.bepress.com/harvardbiostat/paper18
https://biostats.bepress.com/context/harvardbiostat/article/1018/viewcontent/hcr2004.pdf
Harvard University Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
oai:biostats.bepress.com:jhubiostat-1037
2004-04-23T19:21:10Z
publication:jhubiostat
Point Process Methodology for On-line Spatio-temporal Disease Surveillance
Diggle, Peter J.
Rowlingson, Barry
Su, Ting-li
The AEGISS (Ascertainment and Enhancement of Gastrointestinal Infection Surveillance and Statistics) project aims to use spatio-temporal statistical methods to identify anomalies in the space-time distribution of non-specific, gastrointestinal infections in the UK, using the Southampton area in southern England as a test-case. In this paper, we use the AEGISS project to illustrate how spatio-temporal point process methodology can be used in the development of a rapid-response, spatial surveillance system.
Current surveillance of gastroenteric disease in the UK relies on general practitioners reporting cases of suspected food-poisoning through a statutory notification scheme, voluntary laboratory reports of the isolation of gastrointestinal pathogens and standard reports of general outbreaks of infectious intestinal disease by public health and environmental health authorities. However, most statutory notifications are made only after a laboratory reports the isolation of a gastrointestinal pathogen. As a result, detection is delayed and the ability to react to an emerging outbreak is reduced. For more detailed discussion, see Diggle et al. (2003).
A new and potentially valuable source of data on the incidence of non-specific gastro-enteric infections in the UK is NHS Direct, a 24-hour phone-in clinical advice service. NHS Direct data are less likely than reports by general practitioners to suffer from spatially and temporally localized inconsistencies in reporting rates. Also, reporting delays by patients are likely to be reduced, as no appointments are needed. Against this, NHS Direct data sacrifice specificity. Each call to NHS Direct is classified only according to the general pattern of reported symptoms (Cooper et al., 2003). The current paper focuses on the use of spatio-temporal statistical analysis for early detection of unexplained variation in the spatio-temporal incidence of non-specific gastroenteric symptoms, as reported to NHS Direct. Section 2 describes our statistical formulation of this problem, the nature of the available data and our approach to predictive inference. Section 3 describes the stochastic model. Section 4 gives the results of fitting the model to NHS Direct data. Section 5 shows how the model is used for spatio-temporal prediction. The paper concludes with a short discussion.
2004-02-17T08:00:00Z
text
application/pdf
https://biostats.bepress.com/jhubiostat/paper37
https://biostats.bepress.com/context/jhubiostat/article/1037/viewcontent/diggle_1.pdf
Johns Hopkins University, Dept. of Biostatistics Working Papers
Collection of Biostatistics Research Archive
oai:biostats.bepress.com:jhubiostat-1110
2008-02-20T17:17:30Z
publication:jhubiostat
MULTIPLE DISEASES IN CARRIER PROBABILITY ESTIMATION: ACCOUNTING FOR SURVIVING ALL CANCERS OTHER THAN BREAST AND OVARY IN BRCAPRO
Katki, Hormuzd A
Blackford, Amanda
Chen, Sining
Parmigiani, Giovanni
Mendelian models can predict who carries an inherited deleterious mutation of known disease genes based on family history. For example, the BRCAPRO model is commonly used to identify families who carry mutations of BRCA1 and BRCA2, based on familial breast and ovarian cancers. These models incorporate the age of diagnosis of diseases in relatives and current age or age of death. We develop a rigorous foundation for handling multiple diseases with censoring. We prove that any disease unrelated to mutations can be excluded from the model, unless it is sufficiently common and dependent on a mutation-related disease time. Furthermore, if a family member has a disease with higher probability density among mutation carriers, but the model does not account for it, then the carrier probability is deflated. However, even if a family only has diseases the model accounts for, if the model excludes a mutation-related disease, then the carrier probability will be inflated. In light of these results, we extend BRCAPRO to account for surviving all non-breast/ovary cancers as a single outcome. The extension also enables BRCAPRO to extract more useful information from male relatives. Using 1500 families from the Cancer Genetics Network, accounting for surviving other cancers improves BRCAPRO’s concordance index from 0.758 to 0.762 (p = 0.046), improves its positive predictive value from 35% to 39% (p < 10^-6) without impacting its negative predictive value, and improves its overall calibration, although calibration slightly worsens for those with carrier probability < 10%.
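The mechanism behind the deflation/inflation results above is Bayes' rule over carrier status, with each disease (or the survival of it) contributing a likelihood factor. A minimal sketch, with entirely hypothetical penetrance numbers (not BRCAPRO's parameters), shows how adding a survival term for other cancers shifts the posterior:

```python
def carrier_posterior(prior, lik_carrier, lik_noncarrier):
    """Posterior probability of carrying a mutation, by Bayes' rule."""
    num = prior * lik_carrier
    return num / (num + (1 - prior) * lik_noncarrier)

# Hypothetical inputs: breast cancer at 45 has probability density 0.030
# among carriers vs 0.005 among non-carriers; surviving all other cancers
# to age 45 has probability 0.95 among carriers vs 0.99 among non-carriers.
prior = 0.01
post_breast_only = carrier_posterior(prior, 0.030, 0.005)
post_with_survival = carrier_posterior(prior, 0.030 * 0.95, 0.005 * 0.99)
```

In this toy example, surviving other cancers is slightly more likely for non-carriers, so adding the survival factor deflates the carrier probability, illustrating the direction-of-bias arguments in the abstract.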
2007-02-20T08:00:00Z
text
application/pdf
https://biostats.bepress.com/jhubiostat/paper110
https://biostats.bepress.com/context/jhubiostat/article/1110/viewcontent/StatMedCompRisk.pdf
Johns Hopkins University, Dept. of Biostatistics Working Papers
Collection of Biostatistics Research Archive
Mendelian models; Competing risks; Risk assessment; Mendelian mutation prediction models; BRCA1; BRCA2; MMRpro
Genetics
oai:biostats.bepress.com:jhubiostat-1071
2004-12-27T19:39:43Z
publication:jhubiostat
Multiple Lab Comparison of Microarray Platforms
Irizarry, Rafael A., et al.
Microarray technology is a powerful tool able to measure RNA expression for thousands of genes at once. Various studies have been published comparing competing platforms with mixed results: some find agreement, others do not. As the number of researchers starting to use microarrays and the number of cross-platform meta-analysis studies rapidly increase, appropriate platform assessments become more important.
Here we present results from a comparison study that offers important improvements over those previously described in the literature. In particular, we notice that none of the previously published papers consider differences between labs. For this paper, a consortium of ten labs from the Washington DC/Baltimore (USA) area was formed to compare three heavily used platforms using identical RNA samples. Appropriate statistical analysis demonstrates that relatively large differences exist between labs using the same platform, but that the results from the best performing labs agree rather well. Supplemental material is available from http://www.biostat.jhsph.edu/~ririzarr/techcomp/
2004-11-30T08:00:00Z
text
application/pdf
https://biostats.bepress.com/jhubiostat/paper71
https://biostats.bepress.com/context/jhubiostat/article/1071/viewcontent/natmeth_12_27.pdf
Johns Hopkins University, Dept. of Biostatistics Working Papers
Collection of Biostatistics Research Archive
oai:biostats.bepress.com:jhubiostat-1105
2006-03-27T16:41:17Z
publication:jhubiostat
POOR PERFORMANCE OF BOOTSTRAP CONFIDENCE INTERVALS FOR THE LOCATION OF A QUANTITATIVE TRAIT LOCUS
Manichaikul, Ani
Dupuis, Josee
Sen, Saunak
Broman, Karl W
The aim of many genetic studies is to locate the genomic regions (called quantitative trait loci, QTLs) that contribute to variation in a quantitative trait (such as body weight). Confidence intervals for the locations of QTLs are particularly important for the design of further experiments to identify the gene or genes responsible for the effect. Likelihood support intervals are the most widely used method to obtain confidence intervals for QTL location, but the non-parametric bootstrap has also been recommended. Through extensive computer simulation, we show that bootstrap confidence intervals are poorly behaved and so should not be used in this context. The profile likelihood (or LOD curve) for QTL location has a tendency to peak at genetic markers, and so the distribution of the maximum likelihood estimate (MLE) of QTL location has the unusual feature of point masses at genetic markers; this contributes to the poor behavior of the bootstrap. Likelihood support intervals and approximate Bayes credible intervals, on the other hand, are shown to behave appropriately.
2006-03-24T08:00:00Z
text
application/pdf
https://biostats.bepress.com/jhubiostat/paper105
https://biostats.bepress.com/context/jhubiostat/article/1105/viewcontent/qtlboot_v9.pdf
Johns Hopkins University, Dept. of Biostatistics Working Papers
Collection of Biostatistics Research Archive
QTL; LOD support intervals; Confidence intervals; Bootstrap; Bayes credible intervals
Genetics
oai:biostats.bepress.com:jhubiostat-1032
2004-03-02T17:53:03Z
publication:jhubiostat
A Cox Model for Biostatistics of the Future
Zeger, Scott L.
Diggle, Peter J.
Liang, Kung-Yee
Professor Sir David R. Cox (DRC) is widely acknowledged as among the most important scientists of the second half of the twentieth century. He inherited the mantle of statistical science from Pearson and Fisher, advanced their ideas, and translated statistical theory into practice so as to forever change the application of statistics in many fields, but especially biology and medicine. The logistic and proportional hazards models he substantially developed are arguably among the most influential biostatistical methods in current practice.
This paper looks forward over the period from DRC's 80th to 90th birthdays, to speculate about the future of biostatistics, drawing lessons from DRC's contributions along the way. We consider "Cox's model" of biostatistics, an approach to statistical science that: formulates scientific questions or quantities in terms of parameters gamma in probability models f(y; gamma) that represent, in a parsimonious fashion, the underlying scientific mechanisms (Cox, 1997); partitions the parameters gamma = (theta, eta) into a subset of interest theta and other "nuisance parameters" eta necessary to complete the probability distribution (Cox and Hinkley, 1974); develops methods of inference about the scientific quantities that depend as little as possible upon the nuisance parameters (Barndorff-Nielsen and Cox, 1989); and thinks critically about the appropriate conditional distribution on which to base inferences.
We briefly review exciting biomedical and public health challenges that are capable of driving statistical developments in the next decade. We discuss the statistical models and model-based inferences central to the CM approach, contrasting them with computationally-intensive strategies for prediction and inference advocated by Breiman and others (e.g. Breiman, 2001) and with more traditional design-based methods of inference (Fisher, 1935). We discuss the hierarchical (multi-level) model as an example of the future challenges and opportunities for model-based inference. We then consider the role of conditional inference, a second key element of the CM. Recent examples from genetics are used to illustrate these ideas. Finally, the paper examines causal inference and statistical computing, two other topics we believe will be central to biostatistics research and practice in the coming decade. Throughout the paper, we attempt to indicate how DRC's work and the "Cox Model" have set a standard of excellence to which all can aspire in the future.
2004-03-02T08:00:00Z
text
application/pdf
https://biostats.bepress.com/jhubiostat/paper32
https://biostats.bepress.com/context/jhubiostat/article/1032/viewcontent/new.paper.pdf
Johns Hopkins University, Dept. of Biostatistics Working Papers
Collection of Biostatistics Research Archive
oai:biostats.bepress.com:jhubiostat-1063
2004-11-22T17:03:31Z
publication:jhubiostat
Squared Extrapolation Methods (SQUAREM): A New Class of Simple and Efficient Numerical Schemes for Accelerating the Convergence of the EM Algorithm
Varadhan, Ravi
Roland, Ch.
We derive a new class of iterative schemes for accelerating the convergence of the EM algorithm, by exploiting the connection between fixed-point iterations and extrapolation methods. First, we present a general formulation of one-step iterative schemes, which are obtained by cycling with the extrapolation methods. We then square the one-step schemes to obtain the new class of methods, which we call SQUAREM. Squaring a one-step iterative scheme is simply applying it twice within each cycle of the extrapolation method. Here we focus on the first-order or rank-one extrapolation methods for two reasons: (1) simplicity, and (2) computational efficiency. In particular, we study two first-order extrapolation methods, the reduced-rank extrapolation (RRE1) and minimal polynomial extrapolation (MPE1). The convergence of the new schemes, both one-step and squared, is non-monotonic with respect to the residual norm. The first-order one-step and SQUAREM schemes are linearly convergent, like the EM algorithm, but with a faster rate of convergence. We demonstrate, through five different examples, the effectiveness of the first-order SQUAREM schemes, SqRRE1 and SqMPE1, in accelerating the EM algorithm. The SQUAREM schemes are also shown to be vastly superior to their one-step counterparts, RRE1 and MPE1, in terms of computational efficiency. The proposed extrapolation schemes can fail due to the numerical problems of stagnation and near breakdown. We have developed a new hybrid iterative scheme that combines the RRE1 and MPE1 schemes in such a manner that it overcomes both stagnation and near breakdown. The squared first-order hybrid scheme, SqHyb1, emerges as the iterative scheme of choice based on our numerical experiments. It combines the fast convergence of SqMPE1, while avoiding near breakdowns, with the stability of SqRRE1, while avoiding stagnations. The SQUAREM methods can be incorporated very easily into an existing EM algorithm.
They only require the basic EM step for their implementation and do not require any other auxiliary quantities, such as the complete-data log-likelihood or its gradient or Hessian. They are an attractive option in problems with a very large number of parameters, and in problems where the statistical model is complex, the EM algorithm is slow, and each EM step is computationally demanding.
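The squaring idea is easiest to see in a scalar toy problem: accelerating the EM fixed-point map for the mixing weight of a two-component Poisson mixture. This is a minimal sketch, not the paper's implementation: the mixture, the data, the clamping safeguard, and the steplength choice (an S3-type alpha = -|r|/|v|, which reduces the RRE1/MPE1 choices to the same thing in one dimension) are all illustrative assumptions.

```python
import math

def pois_pmf(lam):
    """Poisson pmf with mean lam, as a function of the count y."""
    return lambda y: math.exp(-lam) * lam ** y / math.factorial(y)

def em_step(p, counts, f1, f2):
    """One EM update for the mixing weight p in the mixture p*f1 + (1-p)*f2."""
    post = [p * f1(y) / (p * f1(y) + (1 - p) * f2(y)) for y in counts]
    return sum(post) / len(post)

def squarem(x0, F, tol=1e-10, max_iter=500):
    """Scalar SQUAREM: a squared one-step extrapolation of the EM map F."""
    x = x0
    for _ in range(max_iter):
        x1 = F(x)
        x2 = F(x1)
        r = x1 - x            # EM residual
        v = (x2 - x1) - r     # change in residual
        if abs(v) < 1e-15:    # numerically at a fixed point already
            return x2
        alpha = -abs(r) / abs(v)                  # S3-type steplength
        x_new = x - 2 * alpha * r + alpha ** 2 * v
        x_new = min(max(x_new, 1e-8), 1 - 1e-8)   # keep the weight in (0, 1)
        x_new = F(x_new)      # stabilizing EM step after the extrapolation
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

# Illustrative data: a mix of small and large counts.
counts = [0, 1, 1, 2, 0, 4, 5, 6, 7, 5, 1, 0, 8, 2, 1]
f1, f2 = pois_pmf(1.0), pois_pmf(5.0)
F = lambda p: em_step(p, counts, f1, f2)

p_sq = squarem(0.5, F)   # accelerated estimate of the mixing weight
p_em = 0.5               # plain EM, brute-forced with many iterations
for _ in range(5000):
    p_em = F(p_em)
```

Both iterations should land on the same fixed point; the point of SQUAREM is that it gets there in far fewer applications of `F`.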
2004-11-19T08:00:00Z
text
application/pdf
https://biostats.bepress.com/jhubiostat/paper63
https://biostats.bepress.com/context/jhubiostat/article/1063/viewcontent/Ravi_ms.pdf
Johns Hopkins University, Dept. of Biostatistics Working Papers
Collection of Biostatistics Research Archive
oai:biostats.bepress.com:jhubiostat-1095
2006-03-29T20:00:02Z
publication:jhubiostat
THE ROLE OF AN EXPLICIT CAUSAL FRAMEWORK IN AFFECTED SIB PAIR DESIGNS WITH COVARIATES
Frangakis, Constantine E.
Li, Fan
Doan, Betty Q.
The affected sib/relative pair (ASP/ARP) design is often used with covariates to find genes that can cause a disease in pathways other than through those covariates. However, such "covariates" can themselves have genetic determinants, and the validity of existing methods has so far only been argued under implicit assumptions. We propose an explicit causal formulation of the problem using potential outcomes and principal stratification. The general role of this formulation is to identify and separate the meaning of the different assumptions that can provide valid causal inference in linkage analysis. This separation helps to (a) develop better methods under explicit assumptions, and (b) show the different ways in which these assumptions can fail, which is necessary for developing further specific designs to test these assumptions and confirm or improve the inference. Using this formulation in the specific problem above, we show that, when the "covariate" (e.g., addiction to smoking) also has genetic determinants, then existing methods, including those previously thought as valid, can declare linkage between the disease and marker loci even when no such linkage exists. We also introduce design strategies to address the problem.
2005-12-01T08:00:00Z
text
application/pdf
https://biostats.bepress.com/jhubiostat/paper95
https://biostats.bepress.com/context/jhubiostat/article/1095/viewcontent/paper_dec2_2005.pdf
Johns Hopkins University, Dept. of Biostatistics Working Papers
Collection of Biostatistics Research Archive
Affected sib pairs; Causal effects; Genetic linkage; Partially controlled studies; Potential outcomes; Principal stratification
Genetics
oai:biostats.bepress.com:jhubiostat-1125
2006-12-05T17:03:19Z
publication:jhubiostat
USE OF HIDDEN MARKOV MODELS FOR QTL MAPPING
Broman, Karl W
An important aspect of the QTL mapping problem is the treatment of missing genotype data. If complete genotype data were available, QTL mapping would reduce to the problem of model selection in linear regression. However, in the consideration of loci in the intervals between the available genetic markers, genotype data is inherently missing. Even at the typed genetic markers, genotype data is seldom complete, as a result of failures in the genotyping assays or for the sake of economy (for example, in the case of selective genotyping, where only individuals with extreme phenotypes are genotyped). We discuss the use of algorithms developed for hidden Markov models (HMMs) to deal with the missing genotype data problem.
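The HMM calculation referred to above can be sketched for the simplest setting, a backcross with two genotype states (AA and AB). Forward-backward recursions convert observed marker genotypes, possibly missing or erroneous, into genotype probabilities at every position along the chromosome. The genotyping-error rate, recombination fractions, and example data below are illustrative assumptions, not values from the paper:

```python
def genotype_probs(obs, rec_fracs, error=0.01):
    """Forward-backward genotype probabilities for one backcross individual.

    obs: observed genotypes at ordered positions (0 = AA, 1 = AB, None = missing).
    rec_fracs: recombination fractions between adjacent positions.
    """
    n = len(obs)

    def emit(state, o):
        # Emission probability: observed genotype equals the true state
        # with probability 1 - error; missing observations are uninformative.
        if o is None:
            return 1.0
        return 1 - error if o == state else error

    # Forward pass (prior 1/2 per state at the first position).
    fwd = [[0.5 * emit(s, obs[0]) for s in (0, 1)]]
    for i in range(1, n):
        r = rec_fracs[i - 1]
        prev = fwd[-1]
        fwd.append([
            (prev[0] * (1 - r) + prev[1] * r) * emit(0, obs[i]),
            (prev[0] * r + prev[1] * (1 - r)) * emit(1, obs[i]),
        ])

    # Backward pass.
    bwd = [[1.0, 1.0] for _ in range(n)]
    for i in range(n - 2, -1, -1):
        r = rec_fracs[i]
        nxt = bwd[i + 1]
        bwd[i] = [
            (1 - r) * emit(0, obs[i + 1]) * nxt[0] + r * emit(1, obs[i + 1]) * nxt[1],
            r * emit(0, obs[i + 1]) * nxt[0] + (1 - r) * emit(1, obs[i + 1]) * nxt[1],
        ]

    # Combine and normalize to posterior genotype probabilities.
    probs = []
    for f, b in zip(fwd, bwd):
        raw = [f[0] * b[0], f[1] * b[1]]
        tot = raw[0] + raw[1]
        probs.append([raw[0] / tot, raw[1] / tot])
    return probs

# A pseudomarker (None) midway between two typed markers with opposite genotypes:
probs = genotype_probs([0, None, 1], [0.1, 0.1])
```

With equal recombination fractions on each side, the untyped middle position gets probability 1/2 for each genotype, while the typed flanking positions keep high probability on their observed genotypes.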
2006-12-05T08:00:00Z
text
application/pdf
https://biostats.bepress.com/jhubiostat/paper125
https://biostats.bepress.com/context/jhubiostat/article/1125/viewcontent/hmm.pdf
Johns Hopkins University, Dept. of Biostatistics Working Papers
Collection of Biostatistics Research Archive
Genetics
oai:biostats.bepress.com:jhubiostat-1028
2004-01-05T20:20:12Z
publication:jhubiostat
Power and Robustness of Linkage Tests for Quantitative Traits in General Pedigrees
Chen, Weimin
Broman, Karl
Liang, Kung-Yee
There are numerous statistical methods for quantitative trait linkage analysis in human studies. An ideal such method would have high power to detect genetic loci contributing to the trait, would be robust to non-normality in the phenotype distribution, would be appropriate for general pedigrees, would allow the incorporation of environmental covariates, and would be appropriate in the presence of selective sampling. We recently described a general framework for quantitative trait linkage analysis, based on generalized estimating equations, for which many current methods are special cases. This procedure is appropriate for general pedigrees and easily accommodates environmental covariates. In this paper, we use computer simulations to investigate the power and robustness of a variety of linkage test statistics built upon our general framework. We also propose two novel test statistics that take account of higher moments of the phenotype distribution, in order to accommodate non-normality. These new linkage tests are shown to have high power and to be robust to non-normality. While we have not yet examined the performance of our procedures in the context of selective sampling via computer simulations, the proposed tests satisfy all of the other qualities of an ideal quantitative trait linkage analysis method.
2004-01-05T08:00:00Z
text
application/pdf
https://biostats.bepress.com/jhubiostat/paper28
https://biostats.bepress.com/context/jhubiostat/article/1028/viewcontent/ms_2004.pdf
Johns Hopkins University, Dept. of Biostatistics Working Papers
Collection of Biostatistics Research Archive
oai:biostats.bepress.com:jhubiostat-1029
2004-01-28T17:07:56Z
publication:jhubiostat
Inequity Measures for Evaluations of Environmental Justice: A Case Study of Close Proximity to Highways in NYC
Jacobson, Jerry O.
Hengartner, Nicolas W.
Louis, Thomas A.
Assessments of environmental and territorial justice are similar in that both assess whether empirical relations between the spatial arrangement of undesirable hazards (or desirable public goods and services) and socio-demographic groups are consistent with notions of social justice, evaluating the spatial distribution of benefits and burdens (outcome equity) and the process that produces observed differences (process equity). Using proximity to major highways in NYC as a case study, we review methodological issues pertinent to both fields and discuss choice and computation of exposure measures, but focus primarily on measures of inequity. We present inequity measures computed from the empirically estimated joint distribution of exposure and demographics and compare them to traditional measures such as linear regression, logistic regression and Theil’s entropy index. We find that measures computed from the full joint distribution provide more unified, transparent and intuitive operational definitions of inequity and show how the approach can be used to structure siting and decommissioning decisions.
2004-01-21T08:00:00Z
text
application/pdf
https://biostats.bepress.com/jhubiostat/paper29
https://biostats.bepress.com/context/jhubiostat/article/1029/viewcontent/nyc_paper.complete.pdf
Johns Hopkins University, Dept. of Biostatistics Working Papers
Collection of Biostatistics Research Archive
oai:biostats.bepress.com:jhubiostat-1030
2004-02-03T15:44:56Z
publication:jhubiostat
Choosing Smoothness Parameters for Smoothing Splines by Minimizing an Estimate of Risk
Irizarry, Rafael A
Smoothing splines are a popular approach for non-parametric regression problems. We use periodic smoothing splines to fit a periodic signal plus noise model to data for which we assume there are underlying circadian patterns. In the smoothing spline methodology, choosing an appropriate smoothness parameter is an important step in practice. In this paper, we draw a connection between smoothing splines and REACT estimators that provides motivation for the creation of criteria for choosing the smoothness parameter. The new criteria are compared to three existing methods, namely cross-validation, generalized cross-validation, and generalized maximum likelihood criteria, by a Monte Carlo simulation and by an application to the study of circadian patterns. For most of the situations presented in the simulations, including the practical example, the new criteria outperform the three existing criteria.
2004-02-03T08:00:00Z
text
application/pdf
https://biostats.bepress.com/jhubiostat/paper30
https://biostats.bepress.com/context/jhubiostat/article/1030/viewcontent/react_splines.pdf
Johns Hopkins University, Dept. of Biostatistics Working Papers
Collection of Biostatistics Research Archive
oai:biostats.bepress.com:jhubiostat-1047
2004-08-02T18:44:50Z
publication:jhubiostat
The Genomes of Recombinant Inbred Lines: The Gory Details
Broman, Karl W
Recombinant inbred lines (RILs) can serve as powerful tools for genetic mapping. Recently, members of the Complex Trait Consortium have proposed the development of a large panel of eight-way RILs in the mouse, derived from eight genetically diverse parental strains. Such a panel would be a valuable community resource. The use of such eight-way RILs will require a detailed understanding of the relationship between alleles at linked loci on an RI chromosome. We extend the work of Haldane and Waddington (1931) on two-way RILs and describe the map expansion, clustering of breakpoints, and other features of the genomes of multiple-strain RILs as a function of the level of crossover interference in meiosis.
In this technical report, we present all of our results, in their gory detail. We don’t intend to include such details in the final publication, but want to present them here for those who might be interested.
2004-08-02T07:00:00Z
text
application/pdf
https://biostats.bepress.com/jhubiostat/paper47
https://biostats.bepress.com/context/jhubiostat/article/1047/viewcontent/rigenome_goryK.pdf
Johns Hopkins University, Dept. of Biostatistics Working Papers
Collection of Biostatistics Research Archive
oai:biostats.bepress.com:jhubiostat-1181
2009-01-29T18:03:06Z
publication:jhubiostat
ASSOCIATION TESTS THAT ACCOMMODATE GENOTYPING ERRORS
Ruczinski, Ingo
Li, Qing
Carvalho, Benilton
Fallin, M. Daniele
Irizarry, Rafael A.
Louis, Thomas A.
High-throughput SNP arrays provide estimates of genotypes for up to one million loci, often used in genome-wide association studies. While these estimates are typically very accurate, genotyping errors do occur, which can influence in particular the most extreme test statistics and p-values. Estimates for the genotype uncertainties are also available, although typically ignored. In this manuscript, we develop a framework to incorporate these genotype uncertainties in case-control studies for any genetic model. We verify that using the assumption of a “local alternative” in the score test is very reasonable for effect sizes typically seen in SNP association studies, and show that the power of the score test is simply a function of the correlation of the genotype probabilities with the true genotypes. We demonstrate that the power to detect a true association can be substantially increased for difficult to call genotypes, resulting in improved inference in association studies.
2009-01-29T08:00:00Z
text
application/pdf
https://biostats.bepress.com/jhubiostat/paper181
https://biostats.bepress.com/context/jhubiostat/article/1181/viewcontent/ruczinski.pdf
Johns Hopkins University, Dept. of Biostatistics Working Papers
Collection of Biostatistics Research Archive
Association studies; Genotypes; Genotype uncertainty; Score tests; Single nucleotide polymorphisms
Genetics
oai:biostats.bepress.com:jhubiostat-1195
2009-07-01T18:25:41Z
publication:jhubiostat
TRIO LOGIC REGRESSION - DETECTION OF SNP-SNP INTERACTIONS IN CASE-PARENT TRIOS
Li, Qing
Louis, Thomas A.
Fallin, M. Daniele
Ruczinski, Ingo
Statistical approaches to evaluate higher order SNP-SNP and SNP-environment interactions are critical in genetic association studies, as susceptibility to complex disease is likely to be related to the interaction of multiple SNPs and environmental factors. Logic regression (Kooperberg et al., 2001; Ruczinski et al., 2003) is one such approach, where interactions between SNPs and environmental variables are assessed in a regression framework, and interactions become part of the model search space. In this manuscript we extend the logic regression methodology, originally developed for cohort and case-control studies, for studies of trios with affected probands. Trio logic regression accounts for the linkage disequilibrium (LD) structure in the genotype data, and accommodates missing genotypes via haplotype-based imputation. We also derive an efficient algorithm to simulate case-parent trios where genetic risk is determined via epistatic interactions.
2009-07-01T07:00:00Z
text
application/pdf
https://biostats.bepress.com/jhubiostat/paper194
https://biostats.bepress.com/context/jhubiostat/article/1195/viewcontent/manuscript_methods.pdf
Johns Hopkins University, Dept. of Biostatistics Working Papers
Collection of Biostatistics Research Archive
Case-parent trios; Interaction; Logic regression; Single nucleotide polymorphisms
Genetics
oai:biostats.bepress.com:mskccbiostat-1002
2005-04-18T18:14:11Z
publication:mskccbiostat
Power Calculations for Preclinical Studies Using a K-Sample Rank Test and the Lehmann Alternative Hypothesis
Heller, Glenn
Power calculations in a small sample comparative study, with a continuous outcome measure, are typically undertaken using the asymptotic distribution of the test statistic. When the sample size is small, this asymptotic result can be a poor approximation. An alternative approach, using a rank-based test statistic, is an exact power calculation. When the number of groups is greater than two, the number of calculations required to perform an exact power calculation is prohibitive. To reduce the computational burden, a Monte Carlo resampling procedure is used to approximate the exact power function of a k-sample rank test statistic under the family of Lehmann alternative hypotheses. The motivating example for this approach is the design of animal studies, where the number of animals per group is typically small.
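The resampling idea can be sketched as follows: under the Lehmann alternative F_j = F^gamma_j, group j can be simulated as U^(1/gamma_j) with U uniform, and power is estimated as the rejection rate of a rank test over simulated datasets. This is a rough sketch, not the paper's procedure: it uses the Kruskal-Wallis statistic calibrated by a permutation distribution, and the group sizes, gamma values, and simulation settings are illustrative assumptions.

```python
import random

def kruskal_wallis(groups):
    """Kruskal-Wallis H statistic (no tie correction; continuous data assumed)."""
    data = sorted((v, g) for g, grp in enumerate(groups) for v in grp)
    n_total = len(data)
    rank_sums = [0.0] * len(groups)
    for rank, (_, g) in enumerate(data, start=1):
        rank_sums[g] += rank
    h = sum(rs * rs / len(grp) for grp, rs in zip(groups, rank_sums))
    return 12.0 / (n_total * (n_total + 1)) * h - 3 * (n_total + 1)

def mc_power(group_sizes, gammas, n_sims=200, n_perms=200, alpha=0.05, seed=1):
    """Monte Carlo power of a permutation-calibrated Kruskal-Wallis test
    under the Lehmann alternative: group j is simulated as U**(1/gamma_j)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        groups = [[rng.random() ** (1.0 / g) for _ in range(n)]
                  for n, g in zip(group_sizes, gammas)]
        observed = kruskal_wallis(groups)
        pooled = [v for grp in groups for v in grp]
        count = 0
        for _ in range(n_perms):
            rng.shuffle(pooled)           # relabel group membership at random
            start, perm = 0, []
            for n in group_sizes:
                perm.append(pooled[start:start + n])
                start += n
            if kruskal_wallis(perm) >= observed:
                count += 1
        if (count + 1) / (n_perms + 1) <= alpha:   # permutation p-value
            rejections += 1
    return rejections / n_sims

power_null = mc_power([7, 7, 7], [1.0, 1.0, 1.0])  # all groups identical
power_alt = mc_power([7, 7, 7], [1.0, 1.0, 6.0])   # third group shifted up
```

Under the null (all gammas equal to 1) the rejection rate should sit near the nominal alpha, while a large gamma in one group yields substantially higher power, even with only seven animals per group.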
2005-04-15T07:00:00Z
text
application/pdf
https://biostats.bepress.com/mskccbiostat/paper3
https://biostats.bepress.com/context/mskccbiostat/article/1002/viewcontent/mcpower_sim_r.pdf
Memorial Sloan-Kettering Cancer Center, Dept. of Epidemiology & Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
animal study design
exact test
permutation distribution
sample size calculation
oai:biostats.bepress.com:mskccbiostat-1006
2007-03-05T15:37:57Z
publication:mskccbiostat
Sequential Quantitative Trait Locus Mapping in Experimental Crosses
Satagopan, Jaya M
Sen, Saunak
Churchill, Gary A
The etiology of complex diseases is heterogeneous. The presence of risk alleles in one or more genetic loci affects the function of a variety of intermediate biological pathways, resulting in the overt expression of disease. Hence, there is an increasing focus on identifying the genetic basis of disease by systematically studying phenotypic traits pertaining to the underlying biological functions. In this paper we focus on identifying genetic loci linked to quantitative phenotypic traits in experimental crosses. Such genetic mapping methods often use a one-stage design, genotyping all the markers of interest on the available subjects; a genome scan based on single-locus or multi-locus models is then used to identify the putative loci. Since the number of quantitative trait loci (QTLs) is very likely to be small relative to the number of markers genotyped, a one-stage selective genotyping approach is commonly used to reduce the genotyping burden, whereby markers are genotyped solely on individuals with extreme trait values. This approach is powerful in the presence of a single QTL but may result in substantial loss of information in the presence of multiple QTLs. Here we investigate the efficiency of sequential two-stage designs to identify QTLs in experimental populations. Our investigations for backcross and F2 crosses suggest an efficient approach: genotype all the markers on 60% of the subjects in Stage 1, genotype the chromosomes significant at the 20% level using additional subjects in Stage 2, and test using all the subjects. This design identifies the QTLs while requiring only 70% of the genotyping burden of a one-stage design, regardless of the heritability and genotyping density. Complex traits are a consequence of multiple QTLs conferring main effects as well as epistatic interactions.
We propose a two-stage analytic approach where a single-locus genome scan is conducted in Stage 1 to identify promising chromosomes, and interactions are examined using the loci on these chromosomes in Stage 2. We examine settings under which the two-stage analytic approach provides sufficient power to detect the putative QTLs.
2007-03-01T08:00:00Z
text
application/pdf
https://biostats.bepress.com/mskccbiostat/paper7
https://biostats.bepress.com/context/mskccbiostat/article/1006/viewcontent/sequential.pdf
Memorial Sloan-Kettering Cancer Center, Dept. of Epidemiology & Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
Genetics
Genetics
oai:biostats.bepress.com:mskccbiostat-1009
2006-11-20T16:43:18Z
publication:mskccbiostat
A Faster Circular Binary Segmentation Algorithm for the Analysis of Array CGH Data
Venkatraman, E S
Olshen, Adam
Motivation: Array CGH technologies enable the simultaneous measurement of DNA copy number for thousands of sites on a genome. We developed the circular binary segmentation (CBS) algorithm to divide the genome into regions of equal copy number (Olshen et al., 2004). The algorithm tests for change-points using a maximal $t$-statistic with a permutation reference distribution to obtain the corresponding $p$-value. The number of computations required for the maximal test statistic is $O(N^2)$, where $N$ is the number of markers. This makes the full permutation approach computationally prohibitive for the newer arrays that contain tens of thousands of markers and highlights the need for a faster algorithm.
Results: We present a hybrid approach to obtain the $p$-value of the test statistic in linear time. We also introduce a rule for stopping early when there is strong evidence for the presence of a change. We show through simulations that the hybrid approach provides a substantial gain in speed with only a negligible loss in accuracy and that the stopping rule further increases speed. We also present the analysis of array CGH data from a breast cancer cell line to show the impact of the new approaches on the analysis of real data.
Availability: An R (R Development Core Team, 2006) version of the CBS algorithm has been implemented in the "DNAcopy" package of the Bioconductor project (Gentleman et al., 2004). The proposed hybrid method for the $p$-value is available in version 1.2.1 or higher and the stopping rule for declaring a change early is available in version 1.5.1 or higher.
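As a toy illustration of the permutation reference distribution described in this abstract — not the CBS implementation in DNAcopy — the sketch below scans a single (non-circular) change-point with a maximal t-type statistic and estimates its p-value by full permutation. CBS proper maximizes over pairs of change-points (the O(N^2) cost the paper's hybrid method avoids); this simplification keeps each permutation O(N).

```python
import math
import random

def max_abs_t(x):
    # maximal |t|-type statistic over single split points; the overall
    # sample variance is used as a simple (not CBS-exact) scale estimate
    n = len(x)
    total = sum(x)
    mean = total / n
    var = sum((v - mean) ** 2 for v in x) / (n - 1)  # requires nonconstant x
    best, s = 0.0, 0.0
    for i in range(1, n):
        s += x[i - 1]
        diff = s / i - (total - s) / (n - i)
        t = abs(diff) / math.sqrt(var * (1.0 / i + 1.0 / (n - i)))
        best = max(best, t)
    return best

def perm_pvalue(x, n_perm=200, seed=0):
    # full permutation reference distribution for the maximal statistic
    obs = max_abs_t(x)
    rng = random.Random(seed)
    y = list(x)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(y)
        if max_abs_t(y) >= obs:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)  # add-one permutation p-value
```

The paper's hybrid approach replaces this full permutation loop with a tail approximation computed in linear time, and its early-stopping rule terminates the loop once a change is clearly present.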
2006-06-07T07:00:00Z
text
application/pdf
https://biostats.bepress.com/mskccbiostat/paper9
https://biostats.bepress.com/context/mskccbiostat/article/1009/viewcontent/cbs_corba1.pdf
Memorial Sloan-Kettering Cancer Center, Dept. of Epidemiology & Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
Genetics
Genetics
oai:biostats.bepress.com:mskccbiostat-1001
2005-04-07T16:56:09Z
publication:mskccbiostat
Concordance Probability and Discriminatory Power in Proportional Hazards Regression
Gonen, Mithat
Heller, Glenn
The concordance probability is used to evaluate the discriminatory power and the predictive accuracy of nonlinear statistical models. We derive an analytic expression for the concordance probability in the Cox proportional hazards model. The proposed estimator is a function of the regression parameters and the covariate distribution only and does not use the observed event and censoring times. For this reason it is asymptotically unbiased, unlike Harrell's c-index based on informative pairs. The asymptotic distribution of the concordance probability estimate is derived using U-statistic theory and the methodology is applied to a predictive model in lung cancer.
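A sketch of the kind of estimator described — a function of the fitted linear predictors only, not the observed event or censoring times. Under proportional hazards, the probability that the member of a pair with the lower risk score outlives the other depends on the pair only through d, the difference in linear predictors, via 1/(1 + exp(-|d|)); averaging over all pairs gives a concordance estimate in the spirit of the paper. This is an illustrative reconstruction, not the authors' code.

```python
import math
from itertools import combinations

def concordance_probability(linear_predictors):
    # average over all pairs (needs >= 2 subjects) of
    # P(lower-risk subject survives longer) = 1/(1 + exp(-|d|)),
    # where d is the difference in linear predictors beta'x
    pairs = list(combinations(linear_predictors, 2))
    return sum(1.0 / (1.0 + math.exp(-abs(a - b))) for a, b in pairs) / len(pairs)
```

With identical predictors every pair is a coin flip (0.5); widely separated predictors push the estimate toward 1, reflecting strong discriminatory power.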
2005-04-07T07:00:00Z
text
application/pdf
https://biostats.bepress.com/mskccbiostat/paper2
https://biostats.bepress.com/context/mskccbiostat/article/1001/viewcontent/pa.gonen.heller.pdf
Memorial Sloan-Kettering Cancer Center, Dept. of Epidemiology & Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
c-index
censored data
Cox model
predictive accuracy
oai:biostats.bepress.com:umichbiostat-1018
2003-10-03T15:30:06Z
publication:umichbiostat
A Fully Bayesian Approach for Combining Multilevel Failure Information in Fault Tree Quantification and Corresponding Optimal Resource Allocation
Hamada, M
Martz, H. F.
Reese, C S
Graves, T.
Johnson, Valen
Wilson, A. G.
This paper presents a fully Bayesian approach that simultaneously combines basic event and statistically independent higher event-level failure data in fault tree quantification. Such higher-level data could correspond to train, sub-system or system failure events. The full Bayesian approach also allows the highest-level data that are usually available for existing facilities to be automatically propagated to lower levels. A simple example illustrates the proposed approach. The optimal allocation of resources for collecting additional data from a choice of different level events is also presented. The optimization is achieved using a genetic algorithm.
2003-09-11T07:00:00Z
text
application/pdf
https://biostats.bepress.com/umichbiostat/paper19
https://biostats.bepress.com/context/umichbiostat/article/1018/viewcontent/auto_convert.pdf
The University of Michigan Department of Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
oai:biostats.bepress.com:umichbiostat-1022
2004-02-10T20:36:07Z
publication:umichbiostat
Monotone Constrained Tensor-product B-spline with application to screening studies
Wang, Yue
Taylor, Jeremy
When different markers are responsive to different aspects of a disease, a combination of multiple markers could provide a better screening test for early detection. It is also reasonable to assume that the risk of disease changes smoothly as the biomarker values change and that the change in risk is monotone with respect to each biomarker. In this paper, we propose a boundary-constrained tensor-product B-spline method to estimate the risk of disease by maximizing a penalized likelihood. To choose the optimal amount of smoothing, two scores are proposed which extend the GCV score (O'Sullivan et al., 1986) and the GACV score (Xiang and Wahba, 1996) to incorporate linear constraints. Simulation studies are carried out to investigate the performance of the proposed estimator and the selection scores. In addition, sensitivities and specificities based on approximate leave-one-out estimates are proposed to generate more realistic ROC curves. Data from a pancreatic cancer study are used for illustration.
2004-02-10T08:00:00Z
text
application/pdf
https://biostats.bepress.com/umichbiostat/paper23
https://biostats.bepress.com/context/umichbiostat/article/1022/viewcontent/101503.pdf
The University of Michigan Department of Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
oai:biostats.bepress.com:umichbiostat-1041
2004-06-08T16:49:29Z
publication:umichbiostat
Classification and selection of biomarkers in genomic data using LASSO
Ghosh, Debashis
Chinnaiyan, Arul
High-throughput gene expression technologies such as microarrays have been utilized in a variety of scientific applications. Most of the work has been on assessing univariate associations between gene expression and clinical outcome (variable selection) or on developing classification procedures with gene expression data (supervised learning). We consider a hybrid variable selection/classification approach based on linear combinations of the gene expression profiles that maximize an accuracy measure summarized using the receiver operating characteristic curve. Under a specific probability model, this leads to consideration of linear discriminant functions. We incorporate an automated variable selection approach using LASSO. An equivalence between LASSO estimation and support vector machines allows for model fitting using standard software. We apply the proposed method to simulated data as well as data from a recently published prostate cancer study.
2004-06-08T07:00:00Z
text
application/pdf
https://biostats.bepress.com/umichbiostat/paper42
https://biostats.bepress.com/context/umichbiostat/article/1041/viewcontent/svmpath6.pdf
The University of Michigan Department of Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
oai:biostats.bepress.com:umichbiostat-1029
2004-05-10T15:51:17Z
publication:umichbiostat
Bayes Factors Based on Test Statistics
Johnson, Valen
Traditionally, the use of Bayes factors has required the specification of proper prior distributions on model parameters implicit to both null and alternative hypotheses. In this paper, I describe an approach to defining Bayes factors based on modeling test statistics. Because the distributions of test statistics do not depend on unknown model parameters, this approach eliminates the subjectivity normally associated with the definition of Bayes factors. For standard test statistics, including the χ2, F, t and z statistics, the values of Bayes factors that result from this approach can be expressed simply in closed form.
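For intuition, here is one closed-form member of this family for the z statistic, under an assumed N(0, τ²) model for the noncentrality parameter (a standard calculation, not necessarily the prior used in the paper): under H0, z ~ N(0, 1); under H1, z ~ N(λ, 1) with λ ~ N(0, τ²), so marginally z ~ N(0, 1 + τ²) and the Bayes factor is a ratio of two normal densities.

```python
import math

def z_bayes_factor(z, tau2=1.0):
    # BF10 = phi(z; 0, 1 + tau2) / phi(z; 0, 1)
    #      = exp(z^2 * tau2 / (2 * (1 + tau2))) / sqrt(1 + tau2)
    return math.exp(z * z * tau2 / (2.0 * (1.0 + tau2))) / math.sqrt(1.0 + tau2)
```

Because z has a known distribution under each hypothesis, no prior on the original model parameters is needed — only on the noncentrality of the statistic itself, which is the point of the test-statistic approach.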
2004-04-19T07:00:00Z
text
application/pdf
https://biostats.bepress.com/umichbiostat/paper30
https://biostats.bepress.com/context/umichbiostat/article/1029/viewcontent/bf.pdf
The University of Michigan Department of Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
oai:biostats.bepress.com:uwbiostat-1003
2003-01-24T21:15:12Z
publication:uwbiostat
Performance of the Halex in Longitudinal Studies of Older Adults
Diehr, Paula
Patrick, Donald L.
Burke, Gregory L.
Williamson, Jeff D.
Goal: The Halex is an indicator of health status that combines self-rated health and activity limitations; it has been used by NCHS to predict future years of healthy life. The scores for each health state were developed based on strong assumptions, notably that a person in excellent health with ADL disabilities is as healthy as a person in poor health with no disabilities. Our goal was to examine the performance of the Halex as a longitudinal measure of health for older adults, and to improve the scoring if necessary.
Methods: We used data from the Cardiovascular Health Study (CHS) to compare the relationship of baseline health to health 2 years later. Subject ages ranged from 65 to 103 (mean age 75). A total of 40,827 transitions were available for analysis. We examined whether Halex scores at time 0 were related monotonically to scores two years later, and iterated the original scores to improve the fit over time.
Findings: The original Halex scores were not consistent over time. Persons in excellent health with ADL limitations were much healthier 2 years later than people in poor health with no limitations, even though they had been assumed to have identical health. People with ADL limitations had higher scores than predicted. The assumptions made in creating the Halex were not upheld in the data.
Conclusions: The new iterated scores are specific to older adults, are appropriate for longitudinal data, and are relatively assumption-free. We recommend the use of these new scores for longitudinal studies of older adults that use the Halex health states.
2002-01-25T08:00:00Z
text
application/pdf
https://biostats.bepress.com/uwbiostat/paper176
https://biostats.bepress.com/context/uwbiostat/article/1003/viewcontent/Diehr1_cropped.pdf
UW Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
oai:biostats.bepress.com:uwbiostat-1032
2003-06-13T16:54:37Z
publication:uwbiostat
Tests for Comparing Mark-Specific Hazards and Cumulative Incidence Functions
Gilbert, Peter B.
McKeague, Ian W.
Sun, Yanqing
It is of interest in some applications to determine whether there is a relationship between a hazard rate function (or a cumulative incidence function) and a mark variable which is only observed at uncensored failure times. We develop nonparametric tests for this problem when the mark variable is continuous. Tests are developed for the null hypothesis that the mark-specific hazard rate is independent of the mark versus ordered and two-sided alternatives expressed in terms of mark-specific hazard functions and mark-specific cumulative incidence functions. The test statistics are based on functionals of a bivariate test process equal to a weighted average of differences between a Nelson--Aalen-type estimator of the mark-specific cumulative hazard function and a nonparametric estimator of this function under the null hypothesis. The weight function in the test process can be chosen so that the test statistics are asymptotically distribution-free. Asymptotically correct critical values are obtained through a simple simulation procedure. The testing procedures are shown to perform well in numerical studies, and are illustrated with an AIDS clinical trial example. Specifically, the tests are used to assess if the instantaneous or absolute risk of treatment failure depends on the amount of accumulation of drug resistance mutations in a subject's HIV virus. This assessment helps guide development of anti-HIV therapies that surmount the problem of drug resistance.
2003-06-13T07:00:00Z
text
application/pdf
https://biostats.bepress.com/uwbiostat/paper209
https://biostats.bepress.com/context/uwbiostat/article/1032/viewcontent/Gilbert209_cropped.pdf
UW Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
oai:biostats.bepress.com:uwbiostat-1015
2003-02-24T16:20:03Z
publication:uwbiostat
Whither PQL?
Breslow, Norm
Generalized linear mixed models (GLMM) are generalized linear models with normally distributed random effects in the linear predictor. Penalized quasi-likelihood (PQL), an approximate method of inference in GLMMs, involves repeated fitting of linear mixed models with “working” dependent variables and iterative weights that depend on parameter estimates from the previous cycle of iteration. The generality of PQL, and its implementation in commercially available software, has encouraged the application of GLMMs in many scientific fields. Caution is needed, however, since PQL may sometimes yield badly biased estimates of variance components, especially with binary outcomes.
Recent developments in numerical integration, including adaptive Gaussian quadrature, higher order Laplace expansions, stochastic integration and Markov chain Monte Carlo (MCMC) algorithms, provide attractive alternatives to PQL for approximate likelihood inference in GLMMs. Analyses of some well known datasets, and simulations based on these analyses, suggest that PQL still performs remarkably well in comparison with more elaborate procedures in many practical situations. Adaptive Gaussian quadrature is a viable alternative for nested designs where the numerical integration is limited to a small number of dimensions. Higher order Laplace approximations hold the promise of accurate inference more generally. MCMC is likely the method of choice for the most complex problems that involve high dimensional integrals.
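The integral that PQL and these quadrature methods approximate can be made concrete for a random-intercept logistic model. This sketch evaluates one cluster's marginal likelihood, the integral over b of the product of Bernoulli likelihoods times the N(0, σ²) density, by brute-force Riemann quadrature. It is illustrative only — adaptive Gaussian quadrature instead centers and scales a handful of nodes at the integrand's mode, and PQL avoids the integral altogether via a working linear mixed model.

```python
import math

def cluster_marginal_loglik(ys, beta, sigma, grid=201, lim=6.0):
    # log marginal likelihood of one cluster's binary outcomes ys under
    # logit P(Y=1 | b) = beta + b, with random intercept b ~ N(0, sigma^2);
    # brute-force Riemann sum over [-lim*sigma, lim*sigma]; sigma must be > 0
    h = 2.0 * lim * sigma / (grid - 1)
    total = 0.0
    for k in range(grid):
        b = -lim * sigma + k * h
        dens = math.exp(-b * b / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))
        p = 1.0 / (1.0 + math.exp(-(beta + b)))  # success probability given b
        lik = 1.0
        for y in ys:
            lik *= p if y else (1.0 - p)
        total += lik * dens * h
    return math.log(total)
```

As σ → 0 this reduces to the ordinary independent logistic likelihood; for σ > 0 the shared intercept induces the within-cluster dependence that makes the integral, and hence approximations like PQL, necessary.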
2003-01-24T08:00:00Z
text
application/pdf
https://biostats.bepress.com/uwbiostat/paper192
https://biostats.bepress.com/context/uwbiostat/article/1015/viewcontent/Breslow191.pdf
UW Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
oai:biostats.bepress.com:uwbiostat-1040
2003-06-25T20:34:38Z
publication:uwbiostat
Selection of Matching Variables in Community Health Intervention Trials
Dunning, Andrew J.
In a matched experimental design, the effectiveness of matching in reducing bias and increasing power depends on the strength of the association between the matching variable and the outcome of interest. In particular, in the design of a community health intervention trial, the effectiveness of a matched design, where communities are matched according to some community characteristic, depends on the strength of the correlation between the matching characteristic and the change in the health behavior being measured.
We attempt to estimate the correlation between community characteristics and changes in health behaviors in four datasets from community intervention trials and observational studies. Community characteristics that are highly correlated with changes in health behaviors would potentially be effective matching variables in studies of health intervention programs designed to change those behaviors.
Among the community characteristics considered, the urban-rural character of the community was the most highly correlated with changes in health behaviors. The correlations between Per Capita Income, Percent Low Income & Percent aged over 65 and changes in health behaviors were marginally statistically significant (p < 0.08).
1998-12-15T08:00:00Z
text
https://biostats.bepress.com/uwbiostat/paper161
UW Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
oai:biostats.bepress.com:uwbiostat-1000
2003-01-22T18:25:09Z
publication:uwbiostat
Large Sample Theory for Semiparametric Regression Models with Two-Phase, Outcome Dependent Sampling
Breslow, Norm
McNeney, Brad
Wellner, Jon A.
Outcome-dependent, two-phase sampling designs can dramatically reduce the costs of observational studies by judicious selection of the most informative subjects for purposes of detailed covariate measurement. Here we derive asymptotic information bounds and the form of the efficient score and influence functions for the semiparametric regression models studied by Lawless, Kalbfleisch, and Wild (1999) under two-phase sampling designs. We show that the maximum likelihood estimators for both the parametric and nonparametric parts of the model are asymptotically normal and efficient. The efficient influence function for the parametric part agrees with the more general information bound calculations of Robins, Hsieh, and Newey (1995). By verifying the conditions of Murphy and Van der Vaart (2000) for a least favorable parametric submodel, we provide asymptotic justification for statistical inference based on profile likelihood.
2002-02-22T08:00:00Z
text
application/pdf
https://biostats.bepress.com/uwbiostat/paper183
https://biostats.bepress.com/context/uwbiostat/article/1000/viewcontent/Breslow1_cropped.pdf
UW Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
oai:biostats.bepress.com:uwbiostat-1030
2003-06-11T17:52:31Z
publication:uwbiostat
Asymptotics for Marginal Generalized Linear Models With Sparse Correlations
Lumley, Thomas
Mayer Hamblett, Nicole
Marginal generalized linear models can be used for clustered and longitudinal data by fitting a model as if the data were independent and using an empirical estimator of parameter standard errors. We extend this approach to data where the number of observations correlated with a given one grows with sample size and show that parameter estimates are consistent and asymptotically Normal with a slower convergence rate than for independent data, and that an information sandwich variance estimator is consistent. We present two problems that motivated this work, the modelling of patterns of HIV genetic variation and the behavior of clustered data estimators when clusters are large.
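The information sandwich idea is easiest to see in the simplest marginal model — estimating a common mean from clustered data with an independence working model. The sketch below is illustrative only; the paper's results cover general marginal GLMs where the correlation neighborhoods grow with sample size.

```python
def sandwich_se_mean(clusters):
    # clusters: list of lists of outcomes sharing within-cluster correlation;
    # fit the mean as if observations were independent, then use the
    # empirical (sandwich) variance: "bread" is the total n, "meat" sums
    # squared cluster-level score contributions
    all_y = [y for c in clusters for y in c]
    n = len(all_y)
    mu = sum(all_y) / n                       # point estimate ignores correlation
    meat = sum(sum(y - mu for y in c) ** 2 for c in clusters)
    return mu, meat ** 0.5 / n
```

Perfectly correlated cluster members inflate the sandwich standard error relative to the naive iid formula, while cancellation within clusters deflates it — exactly the dependence the empirical "meat" term captures without modeling the correlation.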
2003-06-11T07:00:00Z
text
application/pdf
https://biostats.bepress.com/uwbiostat/paper207
https://biostats.bepress.com/context/uwbiostat/article/1030/viewcontent/Lumley207_cropped.pdf
UW Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
oai:biostats.bepress.com:uwbiostat-1031
2003-06-13T16:33:21Z
publication:uwbiostat
Sensitivity Analysis for the Assessment of Causal Vaccine Effects on Viral Load in HIV Vaccine Trials
Gilbert, Peter B.
Bosch, Ronald J.
Hudgens, Michael G.
Vaccines with limited ability to prevent HIV infection may positively impact the HIV/AIDS pandemic by preventing secondary transmission and disease in vaccine recipients who become infected. To evaluate the impact of vaccination on secondary transmission and disease, efficacy trials assess vaccine effects on HIV viral load and other surrogate endpoints measured after infection. A standard test that compares the distribution of viral load between the infected subgroups of vaccine and placebo recipients does not assess a causal effect of vaccine, because the comparison groups are selected after randomization. To address this problem, we formulate clinically relevant causal estimands using the principal stratification framework developed by Frangakis and Rubin (2002), and propose a class of logistic selection bias models whose members identify the estimands. Given a selection model in the class, procedures are developed for testing and estimation of the causal effect of vaccination on viral load in the principal stratum of subjects who would be infected regardless of randomization assignment. We show how the procedures can be used for a sensitivity analysis that quantifies how the causal effect of vaccination varies with the presumed magnitude of selection bias.
2003-06-13T07:00:00Z
text
application/pdf
https://biostats.bepress.com/uwbiostat/paper208
https://biostats.bepress.com/context/uwbiostat/article/1031/viewcontent/Gilbert208_cropped.pdf
UW Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
oai:biostats.bepress.com:ucbbiostat-1009
2002-08-06T15:34:53Z
publication:ucbbiostat
Detection of Progressive Deterioration in Early Onset Schizophrenia with a New Statistical Method
Chen, Ying Qing
Wang, Mei-Cheng
Eaton, William W.
Much controversy exists over whether the course of schizophrenia, as defined by the lengths of repeated community tenures, is progressively ameliorating or deteriorating. This article employs a new statistical method proposed by Wang and Chen (2000) to analyze the Denmark registry data in Eaton et al. (1992). The new statistical method correctly handles the bias caused by induced informative censoring, which arises from the interaction of the heterogeneity of schizophrenia patients and long-term follow-up. The analysis shows a progressive deterioration pattern in terms of community tenures for the full registry cohort, rather than the progressive amelioration pattern reported for a selected sub-cohort in Eaton et al. (1992). When adjusted for the long-term chronicity of calendar time, no significant progressive pattern was found for the full cohort.
2001-12-01T08:00:00Z
text
application/pdf
https://biostats.bepress.com/ucbbiostat/paper102
https://biostats.bepress.com/context/ucbbiostat/article/1009/viewcontent/Chen_102.pdf
U.C. Berkeley Division of Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
oai:biostats.bepress.com:bioconductor-1000
2008-01-16T18:42:31Z
publication:bioconductor
Bioconductor: Open software development for computational biology and bioinformatics
Gentleman, Robert C.
Carey, Vincent J.
Bates, Douglas J.
Bolstad, Benjamin M.
Dettling, Marcel
Dudoit, Sandrine
Ellis, Byron
Gautier, Laurent
Ge, Yongchao
Gentry, Jeff
Hornik, Kurt
Hothorn, Torsten
Huber, Wolfgang
Iacus, Stefano
Irizarry, Rafael
Leisch, Friedrich
Li, Cheng
Maechler, Martin
Rossini, Anthony J.
Sawitzki, Guenther
Smith, Colin
Smyth, Gordon K.
Tierney, Luke
Yang, Yee Hwa
Zhang, Jianhua
The Bioconductor project is an initiative for the collaborative creation of extensible software for computational biology and bioinformatics. We detail some of the design decisions, software paradigms and operational strategies that have allowed a small number of researchers to provide a wide variety of innovative, extensible, software solutions in a relatively short time. The use of an object oriented programming paradigm, the adoption and development of a software package system, designing by contract, distributed development and collaboration with other projects are elements of this project's success. Individually, each of these concepts is useful and important, but when combined they have provided a strong basis for rapid development and deployment of innovative and flexible research software for scientific computation. A primary objective of this initiative is achievement of total remote reproducibility of novel algorithmic research results.
2004-01-01T08:00:00Z
text
application/pdf
https://biostats.bepress.com/bioconductor/paper1
https://biostats.bepress.com/context/bioconductor/article/1000/viewcontent/viewcontent.pdf
Bioconductor Project Working Papers
Collection of Biostatistics Research Archive
bioinformatics
computational biology
reproducible research
scientific computation
Bioinformatics
Computational Biology
Numerical Analysis and Computation
oai:biostats.bepress.com:bioconductor-1001
2004-05-30T04:04:57Z
publication:bioconductor
Statistical Analyses and Reproducible Research
Gentleman, Robert
Temple Lang, Duncan
For various reasons, it is important, if not essential, to integrate the computations and code used in data analyses, methodological descriptions, simulations, etc. with the documents that describe and rely on them. This integration allows readers to both verify and adapt the statements in the documents. Authors can easily reproduce them in the future, and they can present the document's contents in a different medium, e.g. with interactive controls. This paper describes a software framework for authoring and distributing these integrated, dynamic documents that contain text, code, data, and any auxiliary content needed to recreate the computations. The documents are dynamic in that the contents, including figures, tables, etc., can be recalculated each time a view of the document is generated. Our model treats a dynamic document as a master or ``source'' document from which one can generate different views in the form of traditional, derived documents for different audiences.
We introduce the concept of a compendium as both a container for the different elements that make up the document and its computations (i.e. text, code, data, ...), and as a means for distributing, managing and updating the collection.
The step from disseminating analyses via a compendium to reproducible research is a small one. By reproducible research, we mean research papers with accompanying software tools that allow the reader to directly reproduce the results and employ the methods that are presented in the research paper. Some of the issues involved in paradigms for the production, distribution and use of such reproducible research are discussed.
2004-05-29T07:00:00Z
text
application/pdf
https://biostats.bepress.com/bioconductor/paper2
https://biostats.bepress.com/context/bioconductor/article/1001/viewcontent/RR2.pdf
Bioconductor Project Working Papers
Collection of Biostatistics Research Archive
Compendium
Dynamic documents
Literate programming
Markup language
Perl
Python
R
Bioinformatics
Computational Biology
Numerical Analysis and Computation
oai:biostats.bepress.com:bioconductor-1002
2004-05-30T03:56:26Z
publication:bioconductor
Reproducible Research: A Bioinformatics Case Study
Gentleman, Robert
While scientific research and the methodologies involved have gone through substantial technological evolution, the technology involved in the publication of the results of these endeavors has remained relatively stagnant. Publication is largely done in the same manner today as it was fifty years ago. Many journals have adopted electronic formats; however, their orientation and style differ little from a printed document. The documents tend to be static and take little advantage of computational resources that might be available. Recent work (Gentleman and Temple Lang, 2004) suggests a methodology and basic infrastructure that can be used to publish documents in a substantially different way. Their approach is suitable for the publication of papers whose message relies on computation. Stated quite simply, Gentleman and Temple Lang propose a paradigm where documents are mixtures of code and text. Such documents may be self-contained or they may be a component of a compendium which provides the infrastructure needed to provide access to data and supporting software. These documents, or compendiums, can be processed in a number of different ways. One transformation will be to replace the code with its output -- thereby providing the familiar, but limited, static document.
In this paper we apply these concepts to a seminal paper in bioinformatics, namely The Molecular Classification of Cancer, Golub et al. (1999). The authors of that paper have generously provided data and other information that have allowed us to largely reproduce their results. Rather than reproduce this paper exactly, we demonstrate that such a reproduction is possible and instead concentrate on demonstrating the usefulness of the compendium concept itself.
2004-05-20T07:00:00Z
text
application/pdf
https://biostats.bepress.com/bioconductor/paper3
https://biostats.bepress.com/context/bioconductor/article/1002/viewcontent/Golub.pdf
Bioconductor Project Working Papers
Collection of Biostatistics Research Archive
Compendium
Expression Analysis
Reproducible Research
Bioinformatics
Computational Biology
oai:biostats.bepress.com:bioconductor-1003
2004-06-02T17:59:16Z
publication:bioconductor
A graph theoretic approach to testing associations between disparate sources of functional genomic data
Balasubramanian, Raji
LaFramboise, Thomas
Scholtens, Denise
Gentleman, Robert
The last few years have seen the advent of high-throughput technologies to analyze various properties of the transcriptome and proteome of several organisms. The congruency of these different data sources, or lack thereof, can shed light on the mechanisms that govern cellular function. A central challenge for bioinformatics research is to develop a unified framework for combining the multiple sources of functional genomics information and testing associations between them, thus obtaining a robust and integrated view of the underlying biology.
We present a graph theoretic approach to test the significance of the association between multiple disparate sources of functional genomics data by proposing two statistical tests, namely edge permutation and node label permutation tests. We demonstrate the use of the proposed tests by finding significant association between a Gene Ontology-derived "predictome" and data obtained from mRNA expression and phenotypic experiments for Saccharomyces cerevisiae. Moreover, we employ the graph theoretic framework to recast a surprising discrepancy presented in Giaever et al. (2002) between gene expression and knockout phenotype, using expression data from a different set of experiments.
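The node label permutation test mentioned above can be sketched in a few lines. This is an illustrative implementation, not the authors' code: the concordant-edge count is one simple choice of association statistic between a graph and node labels, and all names here are ours.

```python
import random

def node_label_permutation_test(edges, labels, n_perm=1000, seed=0):
    """Test association between a graph and node labels by permuting
    the labels over the nodes and recomputing the number of concordant
    edges (edges whose two endpoints carry the same label)."""
    rng = random.Random(seed)

    def concordant(lab):
        return sum(lab[u] == lab[v] for u, v in edges)

    observed = concordant(labels)
    nodes = list(labels)
    null = []
    for _ in range(n_perm):
        shuffled = nodes[:]
        rng.shuffle(shuffled)
        # reassign the existing labels to nodes at random
        perm = {n: labels[s] for n, s in zip(nodes, shuffled)}
        null.append(concordant(perm))
    # one-sided p-value with the +1 correction for the observed statistic
    p = (1 + sum(x >= observed for x in null)) / (1 + n_perm)
    return observed, p
```

An edge permutation test would instead hold the labels fixed and resample the edge set, preserving a different aspect of the graph's structure under the null.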
2004-06-02T07:00:00Z
text
application/pdf
https://biostats.bepress.com/bioconductor/paper4
https://biostats.bepress.com/context/bioconductor/article/1003/viewcontent/graphAT.pdf
Bioconductor Project Working Papers
Collection of Biostatistics Research Archive
Graph theory
permutation testing
tests of association
yeast genomics
Genetics
Numerical Analysis and Computation
Statistical Methodology
Statistical Theory
oai:biostats.bepress.com:bioconductor-1013
2008-01-16T01:17:20Z
publication:bioconductor
Data Quality Assessment of Ungated Flow Cytometry Data in High
Le Meur, Nolwenn
Rossini, Anthony
Gasparetto, Maura
Smith, Clay
Brinkman, Ryan R
Gentleman, Robert
Background: The recent development of semi-automated techniques for staining and analyzing flow cytometry samples has presented new challenges. Quality control and quality assessment are critical when developing new high throughput technologies and their associated information services. Our experience suggests that significant bottlenecks remain in the development of high throughput flow cytometry methods for data analysis and display. In particular, data quality control and quality assessment are crucial steps in processing and analyzing high throughput flow cytometry data.
Methods: We propose a variety of graphical exploratory data analytic tools for exploring ungated flow cytometry data. We have implemented a number of specialized functions and methods in the Bioconductor package rflowcyt. We demonstrate the use of these approaches by investigating two independent sets of high throughput flow cytometry data.
Results: We found that graphical representations can reveal substantial non-biological differences in samples. Empirical Cumulative Distribution Function and summary scatterplots were especially useful in the rapid identification of problems not identified by manual review.
Conclusions: Graphical exploratory data analytic tools are a quick and useful means of assessing data quality. We propose that the described visualizations be used as quality assessment tools and, where possible, for quality control.
2007-02-12T08:00:00Z
text
application/pdf
https://biostats.bepress.com/bioconductor/paper11
https://biostats.bepress.com/context/bioconductor/article/1013/viewcontent/Article_1013.pdf
Bioconductor Project Working Papers
Collection of Biostatistics Research Archive
flow cytometry; quality assessment; visualization; exploratory data
Bioinformatics
Computational Biology
oai:biostats.bepress.com:bioconductor-1011
2006-08-21T21:37:40Z
publication:bioconductor
Extensions to Gene Set Enrichment
Jiang, Zhen
Gentleman, Robert
Motivation: Gene Set Enrichment Analysis (GSEA) has been developed recently to capture moderate but coordinated changes in the expression of sets of functionally related genes. We propose a number of extensions to GSEA that use different statistics to describe the association between genes and a phenotype of interest. We make use of dimension reduction procedures, such as principal component analysis, to identify gene sets containing coordinated genes. We also address the problem of overlap among gene sets.
Results: We applied our methods to data from a clinical trial in acute lymphoblastic leukemia (ALL) [1]. We identified interesting gene sets using the different statistics, and found that gender may affect gene expression in addition to the phenotype effects. Investigating overlap among the interesting gene sets indicates that overlap can alter the interpretation of significant results.
2006-08-02T07:00:00Z
text
application/pdf
https://biostats.bepress.com/bioconductor/paper12
https://biostats.bepress.com/context/bioconductor/article/1011/viewcontent/ExtGSEA4wp.pdf
Bioconductor Project Working Papers
Collection of Biostatistics Research Archive
Gene Set Enrichment; microarray; ALL;
Bioinformatics
Computational Biology
oai:biostats.bepress.com:bioconductor-1007
2005-10-01T03:59:32Z
publication:bioconductor
On the Synthesis of Microarray Experiments
Gentleman, Robert
Ruschhaupt, Markus
Huber, Wolfgang
With many different investigators studying the same disease and with a strong commitment to publish supporting data in the scientific community, there are often many different datasets available for any given disease. Hence there is substantial interest in finding methods for combining these datasets to provide better and more detailed understanding of the underlying biology. We consider the synthesis of different microarray data sets using a random effects paradigm and demonstrate how relatively standard statistical approaches yield good results. We identify a number of important and substantive areas which require further investigation.
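The "relatively standard statistical approaches" for random-effects synthesis can be illustrated with a common estimator. The DerSimonian-Laird method below is a standard choice for combining per-study effect estimates, offered as a sketch rather than the authors' exact procedure; the function name and inputs are ours.

```python
def dersimonian_laird(effects, variances):
    """Random-effects meta-analytic combination of per-study effect
    estimates (DerSimonian-Laird). Returns the combined effect and
    its variance."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    # fixed-effect pooled estimate and Cochran's Q heterogeneity statistic
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sw
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    # method-of-moments between-study variance, truncated at zero
    tau2 = max(0.0, (q - df) / c)
    w_star = [1.0 / (v + tau2) for v in variances]
    combined = sum(wi * e for wi, e in zip(w_star, effects)) / sum(w_star)
    return combined, 1.0 / sum(w_star)
```

With microarray data this combination would typically be applied gene by gene, after putting the datasets on a comparable scale.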
2005-09-30T07:00:00Z
text
application/pdf
https://biostats.bepress.com/bioconductor/paper8
https://biostats.bepress.com/context/bioconductor/article/1007/viewcontent/Synthesis.pdf
Bioconductor Project Working Papers
Collection of Biostatistics Research Archive
meta-analysis; microarray; random effects model; synthesis of experiments
Microarrays
oai:biostats.bepress.com:bioconductor-1006
2004-06-19T00:58:49Z
publication:bioconductor
Differential Expression with the Bioconductor Project
von Heydebreck, Anja
Huber, Wolfgang
Gentleman, Robert
A basic, yet challenging, task in the analysis of microarray gene expression data is the identification of changes in gene expression that are associated with particular biological conditions. We discuss different approaches to this task and illustrate how they can be applied using software from the Bioconductor Project. A central problem is the high dimensionality of gene expression space, which prohibits a comprehensive statistical analysis without focusing on particular aspects of the joint distribution of the genes' expression levels. Possible strategies are univariate gene-by-gene analysis and data-driven nonspecific filtering of genes before the actual statistical analysis. However, more focused strategies that make use of biologically relevant knowledge are more likely to increase our understanding of the data.
2004-06-18T07:00:00Z
text
application/pdf
https://biostats.bepress.com/bioconductor/paper7
https://biostats.bepress.com/context/bioconductor/article/1006/viewcontent/vhhg.pdf
Bioconductor Project Working Papers
Collection of Biostatistics Research Archive
Biological metadata
differential gene expression
microarrays
multiple testing
statistical software
Bioinformatics
Computational Biology
Genetics
Microarrays
Numerical Analysis and Computation
oai:biostats.bepress.com:bioconductor-1009
2008-01-16T19:10:53Z
publication:bioconductor
Visualizing Genomic Data
Gentleman, Robert
Hahne, Florian
Huber, Wolfgang
The advent of experimental techniques capable of probing biomolecules and cells at high levels of resolution has led to a rapid change in the methods used for the analysis of experimental molecular biology data. In this article we give an overview of visualization techniques and methods that can be used to assess various aspects of genomic data.
2006-02-01T08:00:00Z
text
application/pdf
https://biostats.bepress.com/bioconductor/paper10
https://biostats.bepress.com/context/bioconductor/article/1009/viewcontent/viewcontent.pdf
Bioconductor Project Working Papers
Collection of Biostatistics Research Archive
Visualization
Bioinformatics
Computational Biology
oai:biostats.bepress.com:bioconductor-1005
2004-06-18T17:34:45Z
publication:bioconductor
Error models for microarray intensities
Huber, Wolfgang
von Heydebreck, Anja
Vingron, Martin
We derive the additive-multiplicative error model for microarray intensities, and describe two applications. For the detection of differentially expressed genes, we obtain a statistic whose variance is approximately independent of the mean intensity. For the post hoc calibration (normalization) of data with respect to experimental factors, we describe a method for parameter estimation.
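A statistic whose variance is approximately independent of the mean intensity is commonly obtained from an arsinh (generalized log) transform associated with an additive-multiplicative error model. The sketch below treats the calibration parameters `a` and `b` as already known, whereas the paper is concerned with estimating them; it is an illustration of the transform, not the paper's estimation method.

```python
import math

def glog(y, a=0.0, b=1.0):
    """Variance-stabilizing arsinh (generalized log) transform: roughly
    linear for intensities near zero, roughly log(y) for large y."""
    return math.asinh((y - a) / b)

def glog_ratio(y1, y2, a=0.0, b=1.0):
    """Generalized log-ratio of two intensities: close to the ordinary
    log-ratio for large intensities, but with bounded variance near zero
    where ordinary log-ratios blow up."""
    return glog(y1, a, b) - glog(y2, a, b)
```

In practice `a` and `b` differ between arrays, so they double as the post hoc calibration (normalization) parameters mentioned above.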
2004-03-09T08:00:00Z
text
application/pdf
https://biostats.bepress.com/bioconductor/paper6
https://biostats.bepress.com/context/bioconductor/article/1005/viewcontent/errmod_2.pdf
Bioconductor Project Working Papers
Collection of Biostatistics Research Archive
Calibration
differential expression
error model
microarrays
normalization
parameter estimation
variance stabilization.
Microarrays
Statistical Models
oai:biostats.bepress.com:bioconductor-1004
2004-06-02T18:09:18Z
publication:bioconductor
Classification Using Generalized Partial Least Squares
Ding, Beiying
Gentleman, Robert
Advances in computational biology have made simultaneous monitoring of thousands of features possible. High throughput technologies not only bring about a much richer information context in which to study various aspects of gene function, but also present the challenge of analyzing data with a large number of covariates and few samples. As an integral part of machine learning, classification of samples into two or more categories is almost always of interest to scientists. In this paper, we address the question of classification in this setting by extending partial least squares (PLS), a popular dimension reduction tool in chemometrics, to the context of generalized linear regression, building on a previous approach, Iteratively ReWeighted Partial Least Squares, i.e., IRWPLS (Marx, 1996). We compare our results with two-stage PLS (Nguyen and Rocke, 2002A; Nguyen and Rocke, 2002B) and other classifiers. We show that by phrasing the problem in a generalized linear model setting and applying bias correction to the likelihood to avoid (quasi)separation, we often obtain lower classification error rates.
2004-05-01T07:00:00Z
text
application/pdf
https://biostats.bepress.com/bioconductor/paper5
https://biostats.bepress.com/context/bioconductor/article/1004/viewcontent/pls_jcgs_blinded_resubmission.pdf
Bioconductor Project Working Papers
Collection of Biostatistics Research Archive
Cross-validation
Firth's procedure
gene expression
Iteratively Reweighted Partial Least Squares
(Quasi)separation
Two-stage PLS
Bioinformatics
Computational Biology
Genetics
Microarrays
Multivariate Analysis
Statistical Models
oai:biostats.bepress.com:bioconductor-1008
2008-01-16T19:03:38Z
publication:bioconductor
An introduction to low-level analysis methods of DNA microarray data
Huber, Wolfgang
von Heydebreck, Anja
Vingron, Martin
This article gives an overview of the methods used in the low-level analysis of gene expression data generated using DNA microarrays. This type of experiment makes it possible to determine relative levels of nucleic acid abundance in a set of tissues or cell populations for thousands of transcripts or loci simultaneously. Careful statistical design and analysis are essential to improve the efficiency and reliability of microarray experiments throughout the data acquisition and analysis process. This includes the design of probes, the experimental design, the analysis of scanned microarray images, the normalization of fluorescence intensities, the assessment of the quality of microarray data and the incorporation of quality information in subsequent analyses, the combination of information across arrays and across sets of experiments, the discovery and recognition of patterns in expression at the single-gene and multiple-gene levels, and the assessment of the significance of these findings, given the substantial noise and hence random features in the data. For all of these components, access to a flexible and efficient statistical computing environment is essential.
2005-11-15T08:00:00Z
text
application/pdf
https://biostats.bepress.com/bioconductor/paper9
https://biostats.bepress.com/context/bioconductor/article/1008/viewcontent/viewcontent.pdf
Bioconductor Project Working Papers
Collection of Biostatistics Research Archive
microarray
preprocessing
normalization
quality control
Bioinformatics
Computational Biology
oai:biostats.bepress.com:bioconductor-1014
2008-01-16T18:14:35Z
publication:bioconductor
Assessing The Role Of Multi-protein Complexes In Determining Phenotype
Le Meur, Nolwenn
Gentleman, Robert
Understanding regulatory mechanisms in complex biological systems is an important challenge, in particular to understand disease mechanisms, and to discover new therapies and drugs. In this paper, we consider the important question of cellular regulation of phenotype. Using single gene deletion data, we address the problem of linking a phenotype to underlying functional roles in the organism and provide a sound computational and statistical paradigm that can be extended to address more complex experimental settings such as multiple deletions. We apply the proposed approaches to publicly available data sets to demonstrate strong evidence for the involvement of multi-protein complexes in the phenotypes studied.
2008-01-16T08:00:00Z
text
application/pdf
https://biostats.bepress.com/bioconductor/paper13
https://biostats.bepress.com/context/bioconductor/article/1014/viewcontent/PCpheno_article.pdf
https://biostats.bepress.com/context/bioconductor/article/1014/filename/0/type/additional/viewcontent/PCpheno_supplement.pdf
Bioconductor Project Working Papers
Collection of Biostatistics Research Archive
Systems biology; Graph theory; Proteomic; Phenotype
Bioinformatics
Computational Biology
oai:biostats.bepress.com:harvardbiostat-1130
2010-09-27T15:09:22Z
publication:harvardbiostat
Landmark Prediction of Survival
Parast, Layla
Cai, Tianxi
2010-09-27T07:00:00Z
text
application/pdf
https://biostats.bepress.com/harvardbiostat/paper123
https://biostats.bepress.com/context/harvardbiostat/article/1130/viewcontent/LandmarkJRSSB_1.pdf
Harvard University Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
Biomarkers; Disease prognosis; Predictive accuracy; Risk prediction; Survival analysis
Biostatistics
Clinical Epidemiology
Statistical Methodology
Statistical Theory
Survival Analysis
oai:biostats.bepress.com:harvardbiostat-1015
2004-10-12T15:56:15Z
publication:harvardbiostat
Semiparametric Methods for Semi-competing Risks Problem with Censoring and Truncation
Jiang, Hongyu
Fine, Jason
Chappell, Richard J
Studies of chronic life-threatening diseases often involve both mortality and morbidity. In observational studies, the data may also be subject to administrative left truncation and right censoring. Since mortality and morbidity may be correlated and mortality may censor morbidity, the Lynden-Bell estimator for left truncated and right censored data may be biased for estimating the marginal survival function of the non-terminal event. We propose a semiparametric estimator for this survival function based on a joint model for the two time-to-event variables, which utilizes the gamma frailty specification in the region of the observable data. Firstly, we develop a novel estimator for the gamma frailty parameter under left truncation. Using this estimator, we then derive a closed form estimator for the marginal distribution of the non-terminal event. The large sample properties of the estimators are established via asymptotic theory. The methodology performs well with moderate sample sizes, both in simulations and in an analysis of data from a diabetes registry.
2004-10-12T07:00:00Z
text
application/pdf
https://biostats.bepress.com/harvardbiostat/paper15
https://biostats.bepress.com/context/harvardbiostat/article/1015/viewcontent/truncTR.pdf
Harvard University Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
Bivariate survival function
Concordance probability
Copula
Semi-competing risks
truncation
Statistical Methodology
Statistical Theory
Survival Analysis
oai:biostats.bepress.com:harvardbiostat-1019
2004-10-18T20:26:18Z
publication:harvardbiostat
Cholesky Residuals for Assessing Normal Errors in a Linear Model with Correlated Outcomes: Technical Report
Houseman, E. Andres
Ryan, Louise
Coull, Brent
Despite the widespread popularity of linear models for correlated outcomes (e.g. linear mixed models and time series models), distribution diagnostic methodology remains relatively underdeveloped in this context. In this paper we present an easy-to-implement approach that lends itself to graphical displays of model fit. Our approach involves multiplying the estimated marginal residual vector by the Cholesky decomposition of the inverse of the estimated marginal variance matrix. The resulting "rotated" residuals are used to construct an empirical cumulative distribution function and pointwise standard errors. The theoretical framework, including conditions and asymptotic properties, involves technical details that are motivated by Lange and Ryan (1989), Pierce (1982), and Randles (1982). Our method appears to work well in a variety of circumstances, including models having independent units of sampling (clustered data) and models for which all observations are correlated (e.g., a single time series). Our methods can produce satisfactory results even for models that do not satisfy all of the technical conditions stated in our theory.
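The rotation described above can be sketched in a few lines. This is an illustrative pure-Python version (in practice a linear algebra library would be used), where `resid` stands for the estimated marginal residual vector and `cov` for the estimated marginal variance matrix; if cov = C C' with C lower triangular, then C^{-1} resid has approximately identity covariance under a correctly specified model.

```python
def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite
    matrix given as a list of lists."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = (a[i][i] - s) ** 0.5
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def forward_solve(L, b):
    """Solve L y = b for lower-triangular L by forward substitution."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def cholesky_residuals(resid, cov):
    """Rotate marginal residuals by the inverse Cholesky factor of the
    marginal covariance; the rotated residuals are then approximately
    iid and can be compared to a standard normal via an empirical CDF."""
    return forward_solve(cholesky(cov), resid)
```

The empirical cumulative distribution function of the rotated residuals is then plotted against the standard normal CDF as the graphical display of model fit.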
2004-10-18T07:00:00Z
text
application/pdf
https://biostats.bepress.com/harvardbiostat/paper19
https://biostats.bepress.com/context/harvardbiostat/article/1019/viewcontent/hrc2003.pdf
Harvard University Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
Cumulative distribution function
Goodness-of-fit
Linear mixed model
Random effects
Residual diagnostics
Longitudinal Data Analysis and Time Series
Statistical Methodology
Statistical Models
Statistical Theory
oai:biostats.bepress.com:harvardbiostat-1068
2006-11-28T18:54:25Z
publication:harvardbiostat
Semiparametric Regression of Multi-Dimensional Genetic Pathway Data: Least Squares Kernel Machines and Linear Mixed Models
Liu, Dawei
Lin, Xihong
Ghosh, Debashis
2006-11-28T08:00:00Z
text
application/pdf
https://biostats.bepress.com/harvardbiostat/paper62
https://biostats.bepress.com/context/harvardbiostat/article/1068/viewcontent/lskm.pdf
Harvard University Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
BLUPs; Kernel function; Model/variable selection; Nonparametric regression; Penalized likelihood; REML; Score test; Smoothing parameter; Support vector machines
Genetics
Bioinformatics
Biostatistics
Computational Biology
Epidemiology
Genetics
Laboratory and Basic Science Research
Microarrays
Multivariate Analysis
Statistical Methodology
Statistical Models
Statistical Theory
oai:biostats.bepress.com:harvardbiostat-1087
2008-06-20T14:35:00Z
publication:harvardbiostat
Estimation and Testing for the Effect of a Genetic Pathway on a Disease Outcome Using Logistic Kernel Machine Regression via Logistic Mixed Models
Liu, Dawei
Ghosh, Debashis
Lin, Xihong
2008-06-20T07:00:00Z
text
application/pdf
https://biostats.bepress.com/harvardbiostat/paper81
https://biostats.bepress.com/context/harvardbiostat/article/1087/viewcontent/logistic_km.pdf
Harvard University Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
Genetics
Bioinformatics
Biostatistics
Computational Biology
Genetics
Microarrays
Statistical Methodology
Statistical Theory
oai:biostats.bepress.com:harvardbiostat-1093
2008-09-09T13:50:01Z
publication:harvardbiostat
Measurement Error Caused by Spatial Misalignment in Environmental Epidemiology
Gryparis, Alexandros
Paciorek, Christopher J.
Zeka, Ariana
Schwartz, Joel
Coull, Brent A
2008-09-09T07:00:00Z
text
application/pdf
https://biostats.bepress.com/harvardbiostat/paper87
https://biostats.bepress.com/context/harvardbiostat/article/1093/viewcontent/MEpaper.pdf
Harvard University Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
Spatial misalignment; measurement error; predictor; air pollution
Categorical Data Analysis
Statistical Methodology
Statistical Models
Statistical Theory
oai:biostats.bepress.com:harvardbiostat-1097
2008-10-15T12:57:31Z
publication:harvardbiostat
Evaluating Subject-level Incremental Values of New Markers for Risk Classification Rule
Cai, Tianxi
Tian, Lu
Lloyd-Jones, Donald M.
Wei, L. J.
2008-10-14T07:00:00Z
text
application/pdf
https://biostats.bepress.com/harvardbiostat/paper91
https://biostats.bepress.com/context/harvardbiostat/article/1097/viewcontent/AddValue_RS_V1.pdf
Harvard University Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
Coronary heart disease; nonparametric functional estimation; risk factors/markers; pointwise and simultaneous confidence interval; subgroup analysis
Biostatistics
Clinical Epidemiology
Disease Modeling
Statistical Methodology
Statistical Models
Statistical Theory
oai:biostats.bepress.com:harvardbiostat-1100
2008-11-12T16:42:51Z
publication:harvardbiostat
The Highest Confidence Density Region and Its Usage for Inferences about the Survival Function with Censored Data
Tian, Lu
Wang, Rui
Cai, Tianxi
Wei, L. J.
2008-11-12T08:00:00Z
text
application/pdf
https://biostats.bepress.com/harvardbiostat/paper94
https://biostats.bepress.com/context/harvardbiostat/article/1100/viewcontent/HCDR.pdf
Harvard University Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
Confidence distribution; highest posterior density region; Markov chain Monte Carlo; simultaneous confidence intervals; survival analysis
Biostatistics
Statistical Methodology
Statistical Theory
oai:biostats.bepress.com:harvardbiostat-1109
2010-03-02T20:02:46Z
publication:harvardbiostat
A New Class of Dantzig Selectors for Censored Linear Regression Models
Li, Yi
Dicker, Lee
Zhao, Sihai Dave
2010-03-02T08:00:00Z
text
application/pdf
https://biostats.bepress.com/harvardbiostat/paper102
https://biostats.bepress.com/context/harvardbiostat/article/1109/viewcontent/jrssb_301.pdf
Harvard University Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
Adaptive Dantzig variable selector; Censored linear regression; Buckley-James imputation; Model selection consistency; Asymptotic normality
Microarrays
Statistical Methodology
Statistical Theory
Survival Analysis
oai:biostats.bepress.com:harvardbiostat-1110
2009-06-24T14:31:02Z
publication:harvardbiostat
Spatial Cluster Detection for Repeatedly Measured Outcomes while Accounting for Residential History
Cook, Andrea J.
Gold, Diane
Li, Yi
2009-06-24T07:00:00Z
text
application/pdf
https://biostats.bepress.com/harvardbiostat/paper103
https://biostats.bepress.com/context/harvardbiostat/article/1110/viewcontent/SpatClustRepeatedManuscript_revision090619.pdf
Harvard University Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
Asthma; Cumulative Residuals; Repeated Measures; Spatial Cluster Detection; Wheeze
Epidemiology
Longitudinal Data Analysis and Time Series
Statistical Methodology
Statistical Theory
oai:biostats.bepress.com:harvardbiostat-1123
2010-04-21T13:44:36Z
publication:harvardbiostat
Nonparametric Regression with Missing Outcomes Using Weighted Kernel Estimating Equations
Wang, Lu
Rotnitzky, Andrea
Lin, Xihong
2010-04-21T07:00:00Z
text
application/pdf
https://biostats.bepress.com/harvardbiostat/paper116
https://biostats.bepress.com/context/harvardbiostat/article/1123/viewcontent/JASA_T08_463R2_merged.pdf
Harvard University Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
Asymptotics; Augmented kernel estimating equations; Double robustness; Efficiency; Inverse probability weighted kernel estimating equations; Kernel smoothing
Biostatistics
Categorical Data Analysis
Clinical Epidemiology
Clinical Trials
Epidemiology
Statistical Methodology
Statistical Models
Statistical Theory
oai:biostats.bepress.com:harvardbiostat-1125
2010-05-11T15:50:38Z
publication:harvardbiostat
Powerful SNP Set Analysis for Case-Control Genome Wide Association Studies
Wu, Michael C.
Kraft, Peter
Epstein, Michael P
Taylor, Deanne M.
Chanock, Stephen J.
Hunter, David J.
Lin, Xihong
2010-05-11T07:00:00Z
text
application/pdf
https://biostats.bepress.com/harvardbiostat/paper118
https://biostats.bepress.com/context/harvardbiostat/article/1125/viewcontent/gwa_kmtest_fin.pdf
Harvard University Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
Genetics
Bioinformatics
Biostatistics
Categorical Data Analysis
Computational Biology
Genetics
oai:biostats.bepress.com:harvardbiostat-1129
2010-09-07T14:01:50Z
publication:harvardbiostat
Stratifying Subjects for Treatment Selection with Censored Event Time Data from a Comparative Study
Zhao, Lihui
Cai, Tianxi
Tian, Lu
Uno, Hajime
Solomon, Scott D
Wei, L. J.
2010-09-07T07:00:00Z
text
application/pdf
https://biostats.bepress.com/harvardbiostat/paper122
https://biostats.bepress.com/context/harvardbiostat/article/1129/viewcontent/appliedpaper2.pdf
Harvard University Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
Cox's model; Nonparametric function estimation; Personalized medicine; Perturbation-resampling method; Stratified medicine; Subgroup analysis; Survival analysis
Biostatistics
Clinical Trials
Statistical Methodology
Statistical Models
Statistical Theory
oai:biostats.bepress.com:harvardbiostat-1134
2011-04-05T19:59:09Z
publication:harvardbiostat
Bayesian Effect Estimation Accounting for Adjustment Uncertainty
Wang, Chi
Parmigiani, Giovanni
Dominici, Francesca
2011-04-05T07:00:00Z
text
application/pdf
https://biostats.bepress.com/harvardbiostat/paper127
https://biostats.bepress.com/context/harvardbiostat/article/1134/viewcontent/BAC11.pdf
Harvard University Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
Epidemiology
oai:biostats.bepress.com:harvardbiostat-1139
2011-07-18T15:03:02Z
publication:harvardbiostat
On the Covariate-adjusted Estimation for an Overall Treatment Difference with Data from a Randomized Comparative Clinical Trial
Tian, Lu
Cai, Tianxi
Zhao, Lihui
Wei, L. J.
2011-07-18T07:00:00Z
text
application/pdf
https://biostats.bepress.com/harvardbiostat/paper132
https://biostats.bepress.com/context/harvardbiostat/article/1139/viewcontent/eff_bepress.pdf
Harvard University Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
ANCOVA; cross validation; efficiency augmentation; Mayo PBC data; semi-parametric efficiency
Biostatistics
Clinical Trials
Multivariate Analysis
Statistical Methodology
Statistical Models
Statistical Theory
Survival Analysis
oai:biostats.bepress.com:harvardbiostat-1017
2005-09-13T14:43:31Z
publication:harvardbiostat
A Nonstationary Negative Binomial Time Series with Time-Dependent Covariates: Enterococcus Counts in Boston Harbor
Houseman, E. Andres
Coull, Brent
Shine, James P.
Boston Harbor has had a history of poor water quality, including contamination by enteric pathogens. We conduct a statistical analysis of data collected by the Massachusetts Water Resources Authority (MWRA) between 1996 and 2002 to evaluate the effects of court-mandated improvements in sewage treatment. Motivated by the ineffectiveness of standard Poisson mixture models and their zero-inflated counterparts, we propose a new negative binomial model for time series of Enterococcus counts in Boston Harbor, where nonstationarity and autocorrelation are modeled using a nonparametric smooth function of time in the predictor. Without further restrictions, this function is not identifiable in the presence of time-dependent covariates; consequently we use a basis orthogonal to the space spanned by the covariates and use penalized quasi-likelihood (PQL) for estimation. We conclude that Enterococcus counts were greatly reduced near the Nut Island Treatment Plant (NITP) outfalls following the transfer of wastewaters from NITP to the Deer Island Treatment Plant (DITP) and that the transfer of wastewaters from Boston Harbor to the offshore diffusers in Massachusetts Bay reduced the Enterococcus counts near the DITP outfalls.
2005-09-13T07:00:00Z
text
application/pdf
https://biostats.bepress.com/harvardbiostat/paper17
https://biostats.bepress.com/context/harvardbiostat/article/1017/viewcontent/msA04_079_R1Houseman.pdf
Harvard University Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
B-splines
Enterococcus
Fourier series
Penalized spline
Poisson-gamma
Orthogonal Basis
Categorical Data Analysis
Epidemiology
Longitudinal Data Analysis and Time Series
Statistical Models
oai:biostats.bepress.com:harvardbiostat-1020
2005-01-03T18:17:14Z
publication:harvardbiostat
Robust Inferences For Covariate Effects On Survival Time With Censored Linear Regression Models
Leon, Larry
Cai, Tianxi
Wei, L. J.
Various inference procedures for linear regression models with censored failure times have been studied extensively. Recent developments on efficient algorithms to implement these procedures enhance the practical usage of such models in survival analysis. In this article, we present robust inferences for certain covariate effects on the failure time in the presence of "nuisance" confounders under a semiparametric, partial linear regression setting. Specifically, the estimation procedures for the regression coefficients of interest are derived from a working linear model and are valid even when the function of the confounders in the model is not correctly specified. The new proposals are illustrated with two examples and their validity for cases with practical sample sizes is demonstrated via a simulation study.
2005-01-03T08:00:00Z
text
application/pdf
https://biostats.bepress.com/harvardbiostat/paper20
https://biostats.bepress.com/context/harvardbiostat/article/1020/viewcontent/Robust.pdf
Harvard University Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
Censored linear regression; Partial linear model; Resampling method; Rank estimation
Numerical Analysis and Computation
Statistical Methodology
Statistical Models
Statistical Theory
Survival Analysis
oai:biostats.bepress.com:harvardbiostat-1024
2006-10-13T19:31:47Z
publication:harvardbiostat
Bayesian Hidden Markov Modeling of Array CGH Data
Guha, Subharup
Li, Yi
Neuberg, Donna
Genomic alterations have been linked to the development and progression of cancer. The technique of Comparative Genomic Hybridization (CGH) yields data consisting of fluorescence intensity ratios of test and reference DNA samples. The intensity ratios provide information about DNA copy number. Practical issues such as the contamination of tumor cells in tissue specimens and normalization errors necessitate the use of statistics for learning about genomic alterations from array-CGH data. As increasing amounts of array CGH data become available, there is a growing need for automated algorithms for characterizing genomic profiles. Specifically, there is a need for algorithms that can identify gains and losses in copy number based on statistical considerations, rather than merely detect trends in the data.
We adopt a Bayesian approach, relying on the hidden Markov model to account for the inherent dependence in the intensity ratios. Posterior inferences are made about gains and losses in copy number. Localized amplifications (associated with oncogene mutations) and deletions (associated with mutations of tumor suppressors) are identified using posterior probabilities. Global trends such as extended regions of altered copy number are detected. Since the posterior distribution is analytically intractable, we implement a Metropolis-within-Gibbs algorithm for efficient simulation-based inference. Publicly available data on pancreatic adenocarcinoma, glioblastoma multiforme and breast cancer are analyzed, and comparisons are made with some widely-used algorithms to illustrate the reliability and success of the technique.
2006-10-13T07:00:00Z
text
application/pdf
https://biostats.bepress.com/harvardbiostat/paper24
https://biostats.bepress.com/context/harvardbiostat/article/1024/viewcontent/guha.cgh.pdf
Harvard University Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
Amplifications
Cancer
Deletions
DNA
Copy number
Genomic alterations
Intensity ratios
MCMC
Tumor
Longitudinal Data Analysis and Time Series
Multivariate Analysis
Statistical Methodology
Statistical Theory
oai:biostats.bepress.com:harvardbiostat-1045
2006-05-25T19:25:53Z
publication:harvardbiostat
Posterior Simulation in the Generalized Linear Model with Semiparametric Random Effects
Guha, Subharup
Generalized linear mixed models with semiparametric random effects are useful in a wide variety of Bayesian applications. When the random effects arise from a mixture of Dirichlet process (MDP) model, normal base measures and Gibbs sampling procedures based on the Pólya urn scheme are often used to simulate posterior draws. These algorithms are applicable in the conjugate case when (for a normal base measure) the likelihood is normal. In the non-conjugate case, the algorithms proposed by MacEachern and Müller (1998) and Neal (2000) are often applied to generate posterior samples. Some common problems associated with simulation algorithms for non-conjugate MDP models include convergence and mixing difficulties.
This paper proposes an algorithm based on the Pólya urn scheme that extends the Gibbs sampling algorithms to non-conjugate models with normal base measures and exponential family likelihoods. The algorithm proceeds by making Laplace approximations to the likelihood function, thereby reducing the procedure to that of conjugate normal MDP models. To ensure the validity of the stationary distribution in the non-conjugate case, the proposals are accepted or rejected by a Metropolis-Hastings step. In the special case where the data are normally distributed, the algorithm is identical to the Gibbs sampler.
2006-05-25T07:00:00Z
text
application/pdf
https://biostats.bepress.com/harvardbiostat/paper42
https://biostats.bepress.com/context/harvardbiostat/article/1045/viewcontent/technical.report.pdf
Harvard University Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
Biostatistics
Numerical Analysis and Computation
oai:biostats.bepress.com:harvardbiostat-1047
2006-06-01T15:40:10Z
publication:harvardbiostat
PLASQ: A Generalized Linear Model-Based Procedure to Determine Allelic Dosage in Cancer Cells from SNP Array Data
LaFramboise, Thomas
Harrington, David P.
Weir, Barbara A.
2006-06-01T07:00:00Z
text
application/pdf
https://biostats.bepress.com/harvardbiostat/paper44
https://biostats.bepress.com/context/harvardbiostat/article/1047/viewcontent/LaFramboiseRevision.pdf
Harvard University Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
Bioinformatics
Computational Biology
oai:biostats.bepress.com:harvardbiostat-1050
2006-07-24T18:50:10Z
publication:harvardbiostat
Mixed Multiplicative Factor Analysis Model for Air Pollution Exposure Assessment
Nikolov, Margaret C.
Coull, Brent A
Catalano, Paul J.
Godleski, John J.
2006-07-24T07:00:00Z
text
application/pdf
https://biostats.bepress.com/harvardbiostat/paper47
https://biostats.bepress.com/context/harvardbiostat/article/1050/viewcontent/NikolovP2.pdf
Harvard University Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
Multivariate Analysis
Statistical Models
oai:biostats.bepress.com:harvardbiostat-1054
2006-08-10T13:11:44Z
publication:harvardbiostat
Bayesian Smoothing of Irregularly-spaced Data Using Fourier Basis Functions
Paciorek, Christopher J.
2006-08-10T07:00:00Z
text
application/pdf
https://biostats.bepress.com/harvardbiostat/paper49
https://biostats.bepress.com/context/harvardbiostat/article/1054/viewcontent/paci.2006.pdf
Harvard University Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
Bayesian statistics; Fourier basis; FFT; geostatistics; generalized linear mixed model; generalized additive model; Markov chain Monte Carlo; spatial statistics; spectral representation
Numerical Analysis and Computation
Statistical Models
oai:biostats.bepress.com:harvardbiostat-1083
2008-02-25T14:57:50Z
publication:harvardbiostat
Empirical Null and False Discovery Rate Inference for Exponential Families
Schwartzman, Armin
2008-02-25T08:00:00Z
text
application/pdf
https://biostats.bepress.com/harvardbiostat/paper77
https://biostats.bepress.com/context/harvardbiostat/article/1083/viewcontent/empNull2008.pdf
Harvard University Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
multiple testing; multiple comparisons; mixture models; Poisson regression; genome-wide association
Bioinformatics
Computational Biology
Multivariate Analysis
Statistical Methodology
Statistical Theory
oai:biostats.bepress.com:harvardbiostat-1098
2008-10-29T13:15:22Z
publication:harvardbiostat
Calibrating Parametric Subject-specific Risk Estimation
Cai, Tianxi
Tian, Lu
Uno, Hajime
Solomon, Scott D.
Wei, L. J.
2008-10-29T07:00:00Z
text
application/pdf
https://biostats.bepress.com/harvardbiostat/paper92
https://biostats.bepress.com/context/harvardbiostat/article/1098/viewcontent/CalibRisk_V1.pdf
Harvard University Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
Cardiovascular diseases; Cox's model; nonparametric functional estimation; risk index; ROC analysis; survival analysis
Biostatistics
Clinical Epidemiology
Disease Modeling
Statistical Methodology
Statistical Models
Statistical Theory
Survival Analysis
oai:biostats.bepress.com:harvardbiostat-1104
2009-03-10T13:39:00Z
publication:harvardbiostat
Analysis of Randomized Comparative Clinical Trial Data for Personalized Treatment Selections
Cai, Tianxi
Tian, Lu
Wong, Peggy H
Wei, L. J.
2009-03-10T07:00:00Z
text
application/pdf
https://biostats.bepress.com/harvardbiostat/paper97
https://biostats.bepress.com/context/harvardbiostat/article/1104/viewcontent/SubgroupTrt_V1.pdf
Harvard University Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
Cross-validation; HIV-infection; Nonparametric function estimation; Personalized medicine; Subgroup analysis
Biostatistics
Disease Modeling
Epidemiology
Statistical Methodology
Statistical Models
Statistical Theory
Survival Analysis
oai:biostats.bepress.com:harvardbiostat-1105
2009-03-24T13:22:53Z
publication:harvardbiostat
The Importance of Scale for Spatial-confounding Bias and Precision of Spatial Regression Estimators
Paciorek, Christopher J.
Increasingly, regression models are used when residuals are spatially correlated. Prominent examples include studies in environmental epidemiology to understand the chronic health effects of pollutants. I consider the effects of residual spatial structure on the bias and precision of regression coefficients, developing a simple framework in which to understand the key issues and derive informative analytic results. When the spatial residual is induced by an unmeasured confounder, regression models with spatial random effects and closely-related models such as kriging and penalized splines are biased, even when the residual variance components are known. Analytic and simulation results show how the bias depends on the spatial scales of the covariate and the residual; bias is reduced only when there is variation in the covariate at a scale smaller than the scale of the unmeasured confounding. I also discuss how the scales of the residual and the covariate affect efficiency and uncertainty estimation when the residuals can be considered independent of the covariate. In an application on the association between black carbon particulate matter air pollution and birth weight, controlling for large-scale spatial variation appears to reduce bias from unmeasured confounders, while increasing uncertainty in the estimated pollution effect.
2009-03-24T07:00:00Z
text
application/pdf
https://biostats.bepress.com/harvardbiostat/paper98
https://biostats.bepress.com/context/harvardbiostat/article/1105/viewcontent/paci.2009.pdf
Harvard University Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
Epidemiology
Statistical Methodology
Statistical Theory
oai:biostats.bepress.com:harvardbiostat-1107
2009-05-07T14:47:33Z
publication:harvardbiostat
Estimating Subject-Specific Dependent Competing Risk Profile with Censored Event Time Observations
Li, Yi
Tian, Lu
Wei, L. J.
2009-05-07T07:00:00Z
text
application/pdf
https://biostats.bepress.com/harvardbiostat/paper100
https://biostats.bepress.com/context/harvardbiostat/article/1107/viewcontent/jasa_ltw_1.pdf
Harvard University Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
Local likelihood function; nonparametric function estimation; perturbation-resampling method; Risk index score
Biostatistics
Statistical Methodology
Statistical Theory
Survival Analysis
oai:biostats.bepress.com:harvardbiostat-1116
2009-11-17T15:35:22Z
publication:harvardbiostat
A New Class of Minimum Power Divergence Estimators with Applications to Cancer Surveillance
Martin, Nirian
Li, Yi
2009-11-17T08:00:00Z
text
application/pdf
https://biostats.bepress.com/harvardbiostat/paper109
https://biostats.bepress.com/context/harvardbiostat/article/1116/viewcontent/AnnalsAppliedStatisticsMartinLi.pdf
Harvard University Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
Minimum power divergence estimators
Age-adjusted cancer rates
Annual percent change (APC)
Trends
Poisson sampling
Categorical Data Analysis
Statistical Methodology
Statistical Theory
oai:biostats.bepress.com:harvardbiostat-1117
2009-11-20T21:13:28Z
publication:harvardbiostat
Survival Analysis with Error-prone Time-varying Covariates: A Risk Set Calibration Approach
Liao, Xiaomei
Zucker, David M.
Li, Yi
Spiegelman, Donna
2009-11-20T08:00:00Z
text
application/pdf
https://biostats.bepress.com/harvardbiostat/paper110
https://biostats.bepress.com/context/harvardbiostat/article/1117/viewcontent/rrcpaper_harvard_posting.pdf
Harvard University Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
Epidemiology
Statistical Methodology
Statistical Theory
Survival Analysis
oai:biostats.bepress.com:harvardbiostat-1053
2006-08-09T18:57:13Z
publication:harvardbiostat
Predicting Future Responses Based on Possibly Misspecified Working Models
Cai, Tianxi
Tian, Lu
Solomon, Scott D
Wei, L.J.
2006-08-09T07:00:00Z
text
application/pdf
https://biostats.bepress.com/harvardbiostat/paper48
https://biostats.bepress.com/context/harvardbiostat/article/1053/viewcontent/cai_tian_solomon.pdf
Harvard University Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
Heterogeneous regression; K-fold cross validation; misspecified regression model; optimal prediction region; prediction error rate
Biostatistics
Statistical Methodology
Statistical Models
Statistical Theory
oai:biostats.bepress.com:harvardbiostat-1118
2011-04-08T20:10:48Z
publication:harvardbiostat
Principled Sure Independence Screening for Cox Models with Ultra-high-dimensional Covariates
Zhao, Sihai Dave
Li, Yi
2010-07-19T07:00:00Z
text
application/pdf
https://biostats.bepress.com/harvardbiostat/paper111
https://biostats.bepress.com/context/harvardbiostat/article/1118/viewcontent/Zhao_and_Li_4_2011.pdf
Harvard University Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
Principled sure independence screening; Multiple myeloma; Variable selection; Sure independence screening; Cox model; Ultra-high-dimensional covariates
Biostatistics
Microarrays
Statistical Methodology
Statistical Models
Statistical Theory
Survival Analysis
oai:biostats.bepress.com:harvardbiostat-1121
2010-02-11T15:26:27Z
publication:harvardbiostat
Modeling Dependent Gene Expression
Telesca, Donatello
Muller, Peter
Parmigiani, Giovanni
Freedman, Ralph S.
2010-02-11T08:00:00Z
text
application/pdf
https://biostats.bepress.com/harvardbiostat/paper114
https://biostats.bepress.com/context/harvardbiostat/article/1121/viewcontent/DepPOE3.pdf
Harvard University Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
Conditional independence; Microarray data; Probability of expression; Probit models; Reciprocal graphs; Reversible jumps MCMC
Bioinformatics
Computational Biology
oai:biostats.bepress.com:harvardbiostat-1131
2010-11-17T21:01:53Z
publication:harvardbiostat
Improving the Power of Chronic Disease Surveillance by Incorporating Residential History
Manjourides, Justin
Pagano, Marcello
2010-11-17T08:00:00Z
text
application/pdf
https://biostats.bepress.com/harvardbiostat/paper124
https://biostats.bepress.com/context/harvardbiostat/article/1131/viewcontent/ResidentialHistory_WorkingPaper.pdf
Harvard University Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
Statistical Methodology
Statistical Theory
oai:biostats.bepress.com:harvardbiostat-1132
2011-03-07T20:38:18Z
publication:harvardbiostat
Estimating Subject-Specific Treatment Differences for Risk-Benefit Assessment with Competing Risk Event-Time Data
Claggett, Brian
Zhao, Lihui
Tian, Lu
Castagno, Davide
Wei, L. J.
2011-03-07T08:00:00Z
text
application/pdf
https://biostats.bepress.com/harvardbiostat/paper125
https://biostats.bepress.com/context/harvardbiostat/article/1132/viewcontent/bepress_bc_lz_ljw_lt_dc_mar7.pdf
Harvard University Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
clinical trial; Cox model; nonparametric estimation; personalized medicine; perturbation-resampling method; stratified medicine; subgroup analysis; survival analysis
Clinical Trials
Statistical Methodology
Statistical Models
Statistical Theory
Survival Analysis
oai:biostats.bepress.com:harvardbiostat-1154
2012-03-12T15:33:47Z
publication:harvardbiostat
Robustness of Measures of Interaction to Unmeasured Confounding
Tchetgen Tchetgen, Eric J
VanderWeele, Tyler J
2012-03-12T07:00:00Z
text
application/pdf
https://biostats.bepress.com/harvardbiostat/paper146
https://biostats.bepress.com/context/harvardbiostat/article/1154/viewcontent/tech_report___3_12_12.pdf
Harvard University Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
Biostatistics
oai:biostats.bepress.com:harvardbiostat-1006
2003-10-30T16:39:54Z
publication:harvardbiostat
Semi-parametric Box-Cox Power Transformation Models for Censored Survival Observations
Cai, Tianxi
Tian, Lu
Wei, L. J.
2003-10-30T08:00:00Z
text
application/pdf
https://biostats.bepress.com/harvardbiostat/paper6
https://biostats.bepress.com/context/harvardbiostat/article/1006/viewcontent/cai_tian.pdf
Harvard University Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
Numerical Analysis and Computation
Statistical Methodology
Statistical Models
Statistical Theory
Survival Analysis
oai:biostats.bepress.com:harvardbiostat-1012
2004-05-06T17:32:37Z
publication:harvardbiostat
One- and Two-Sample Nonparametric Inference Procedures in the Presence of Dependent Censoring
Park, Yuhyun
Tian, Lu
Wei, L. J.
2004-04-13T07:00:00Z
text
application/pdf
https://biostats.bepress.com/harvardbiostat/paper12
https://biostats.bepress.com/context/harvardbiostat/article/1012/viewcontent/park_tian1.pdf
Harvard University Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
Competing risks; Martingale; Simultaneous confidence interval; Sensitivity analysis; Survival analysis
Statistical Methodology
Statistical Theory
Survival Analysis
oai:biostats.bepress.com:harvardbiostat-1025
2005-09-22T19:57:21Z
publication:harvardbiostat
Semiparametric Estimation in General Repeated Measures Problems
Lin, Xihong
Carroll, Raymond J
This paper considers a wide class of semiparametric problems with a parametric part for some covariate effects and repeated evaluations of a nonparametric function. Special cases in our approach include marginal models for longitudinal/clustered data, conditional logistic regression for matched case-control studies, multivariate measurement error models, generalized linear mixed models with a semiparametric component, and many others. We propose profile-kernel and backfitting estimation methods for these problems, derive their asymptotic distributions, and show that in likelihood problems the methods are semiparametric efficient. Although profiling and backfitting are not equivalent in general, with our methods they are asymptotically equivalent. We also consider pseudolikelihood methods where some nuisance parameters are estimated by a different algorithm. The proposed methods are evaluated using simulation studies and applied to the Kenya hemoglobin data.
2005-09-06T07:00:00Z
text
application/pdf
https://biostats.bepress.com/harvardbiostat/paper25
https://biostats.bepress.com/context/harvardbiostat/article/1025/viewcontent/kernel_like.pdf
Harvard University Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
Clustered/longitudinal data; Generalized estimating equations; Generalized linear mixed models; Kernel method
Longitudinal Data Analysis and Time Series
Multivariate Analysis
Statistical Methodology
Statistical Theory
oai:biostats.bepress.com:harvardbiostat-1028
2005-09-06T19:52:52Z
publication:harvardbiostat
Inference on Survival Data with Covariate Measurement Error - An Imputation-based Approach
Li, Yi
Ryan, Louise
We propose a new method for fitting proportional hazards models with error-prone covariates. Regression coefficients are estimated by solving an estimating equation that is the average of the partial likelihood scores based on imputed true covariates. For the purpose of imputation, a linear spline model is assumed on the baseline hazard. We discuss consistency and asymptotic normality of the resulting estimators, and propose a stochastic approximation scheme to obtain the estimates. The algorithm is easy to implement, and reduces to the ordinary Cox partial likelihood approach when the measurement error has a degenerate distribution. Simulations indicate high efficiency and robustness. We consider the special case where error-prone replicates are available on the unobserved true covariates. As expected, increasing the number of replicates for the unobserved covariates increases efficiency and reduces bias. We illustrate the practical utility of the proposed method with an Eastern Cooperative Oncology Group clinical trial where a genetic marker, c-myc expression level, is subject to measurement error.
2005-09-01T07:00:00Z
text
application/pdf
https://biostats.bepress.com/harvardbiostat/paper28
https://biostats.bepress.com/context/harvardbiostat/article/1028/viewcontent/Inference_on_Survival_Data_with_Covariate_Measurement_Error___An_Imputation_based_Approach.pdf
Harvard University Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
Statistical Methodology
Statistical Theory
Survival Analysis
oai:biostats.bepress.com:harvardbiostat-1029
2005-09-13T16:37:29Z
publication:harvardbiostat
Simultaneous and Exact Interval Estimates for the Contrast of Two Groups Based on an Extremely High Dimensional Response Variable: Application to Mass Spec Data Analysis
Park, Yuhyun
Downing, Sean R.
Li, Cheng
Hahn, William C.
Kantoff, Philip W.
Wei, L. J.
2005-09-09T07:00:00Z
text
application/pdf
https://biostats.bepress.com/harvardbiostat/paper29
https://biostats.bepress.com/context/harvardbiostat/article/1029/viewcontent/cb_park_downing.pdf
Harvard University Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
Bioinformatics
Computational Biology
oai:biostats.bepress.com:harvardbiostat-1032
2005-10-07T17:04:42Z
publication:harvardbiostat
Computational Techniques for Spatial Logistic Regression with Large Datasets
Paciorek, Christopher J.
Ryan, Louise
In epidemiological work, outcomes are frequently non-normal, sample sizes may be large, and effects are often small. To relate health outcomes to geographic risk factors, fast and powerful methods for fitting spatial models, particularly for non-normal data, are required. We focus on binary outcomes, with the risk surface a smooth function of space. We compare penalized likelihood models, including the penalized quasi-likelihood (PQL) approach, and Bayesian models based on fit, speed, and ease of implementation.
A Bayesian model using a spectral basis representation of the spatial surface provides the best tradeoff of sensitivity and specificity in simulations, detecting real spatial features while limiting overfitting and being more efficient computationally than other Bayesian approaches. One of the contributions of this work is further development of this underused representation. The spectral basis model outperforms the penalized likelihood methods, which are prone to overfitting, but is slower to fit and not as easily implemented. Conclusions based on a real dataset of cancer cases in Taiwan are similar albeit less conclusive with respect to comparing the approaches.
The success of the spectral basis with binary data and similar results with count data suggest that it may be generally useful in spatial models and more complicated hierarchical models.
2005-10-07T07:00:00Z
text
application/pdf
https://biostats.bepress.com/harvardbiostat/paper32
https://biostats.bepress.com/context/harvardbiostat/article/1032/viewcontent/paci.ryan.2005.pdf
Harvard University Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
Bayesian statistics; Fourier basis; FFT; generalized linear mixed model; geostatistics; spatial statistics
Epidemiology
Numerical Analysis and Computation
Statistical Models
oai:biostats.bepress.com:harvardbiostat-1035
2005-12-06T15:18:38Z
publication:harvardbiostat
Model Checking for ROC Regression Analysis
Cai, Tianxi
Zheng, Yingye
The Receiver Operating Characteristic (ROC) curve is a prominent tool for characterizing the accuracy of a continuous diagnostic test. To account for factors that might influence the test accuracy, various ROC regression methods have been proposed. However, as in any regression analysis, when the assumed models do not fit the data well, these methods may produce invalid and misleading results. To date, practical model-checking techniques suitable for validating existing ROC regression models are not yet available. In this paper, we develop cumulative residual based procedures to graphically and numerically assess the goodness-of-fit for some commonly used ROC regression models, and show how specific components of these models can be examined within this framework. We derive asymptotic null distributions for the residual process and discuss resampling procedures to approximate these distributions in practice. We illustrate our methods with a dataset from the Cystic Fibrosis registry.
2005-12-06T08:00:00Z
text
application/pdf
https://biostats.bepress.com/harvardbiostat/paper34
https://biostats.bepress.com/context/harvardbiostat/article/1035/viewcontent/mcheck_V2.pdf
Harvard University Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
Cumulative Residual
Diagnostic Accuracy
Generalized Linear Model
Model Checking
ROC Regression
Biostatistics
Disease Modeling
Statistical Models
oai:biostats.bepress.com:harvardbiostat-1034
2005-11-07T20:23:43Z
publication:harvardbiostat
Model Evaluation Based on the Distribution of Estimated Absolute Prediction Error
Tian, Lu
Cai, Tianxi
Goetghebeur, Els
Wei, L. J.
The construction of a reliable, practically useful prediction rule for future response is heavily dependent on the "adequacy" of the fitted regression model. In this article, we consider the absolute prediction error, the expected value of the absolute difference between the future and predicted responses, as the model evaluation criterion. This prediction error is easier to interpret than the average squared error and is equivalent to the mis-classification error for the binary outcome. We show that the distributions of the apparent error and its cross-validation counterparts are approximately normal even under a misspecified fitted model. When the prediction rule is "unsmooth", the variance of the above normal distribution can be estimated well via a perturbation-resampling method. We also show how to approximate the distribution of the difference of the estimated prediction errors from two competing models. With two real examples, we demonstrate that the resulting interval estimates for prediction errors provide much more information about model adequacy than the point estimates alone.
2005-11-07T08:00:00Z
text
application/pdf
https://biostats.bepress.com/harvardbiostat/paper35
https://biostats.bepress.com/context/harvardbiostat/article/1034/viewcontent/asa1_tianxi.pdf
Harvard University Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
0.632 resampling; Bootstrap; K-fold cross-validation; Model and variable selections; Perturbation-resampling; Prediction
Biostatistics
Statistical Models
oai:biostats.bepress.com:harvardbiostat-1041
2006-03-10T14:55:05Z
publication:harvardbiostat
Evaluating Prediction Rules for t-Year Survivors With Censored Regression Models
Uno, Hajime
Cai, Tianxi
Tian, Lu
Wei, L.J.
Suppose that we are interested in establishing simple, but reliable rules for predicting future t-year survivors via censored regression models. In this article, we present inference procedures for evaluating such binary classification rules based on various prediction precision measures quantified by the overall misclassification rate, sensitivity and specificity, and positive and negative predictive values. Specifically, under various working models we derive consistent estimators for the above measures via substitution and cross validation estimation procedures. Furthermore, we provide large sample approximations to the distributions of these nonsmooth estimators without assuming that the working model is correctly specified. Confidence intervals, for example, for the difference of the precision measures between two competing rules can then be constructed. All the proposals are illustrated with two real examples and their finite sample properties are evaluated via a simulation study.
2006-03-10T08:00:00Z
text
application/pdf
https://biostats.bepress.com/harvardbiostat/paper38
https://biostats.bepress.com/context/harvardbiostat/article/1041/viewcontent/asa_uno.pdf
Harvard University Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
cross validation; gene expression; model selection; positive and negative predictive values; prediction error; ROC curve; survival analysis
Statistical Methodology
Statistical Theory
Survival Analysis
oai:biostats.bepress.com:harvardbiostat-1042
2006-04-06T17:56:27Z
publication:harvardbiostat
Selecting 'Significant' Differentially Expressed Genes from the Combined Perspective of the Null and the Alternative
Moerkerke, Beatrijs
Goetghebeur, Els
2006-04-01T08:00:00Z
text
application/pdf
https://biostats.bepress.com/harvardbiostat/paper39
https://biostats.bepress.com/context/harvardbiostat/article/1042/viewcontent/Moerkerke2.pdf
Harvard University Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
alternative p-values
balanced testing
gene expression
Bioinformatics
Computational Biology
oai:biostats.bepress.com:harvardbiostat-1061
2006-09-14T18:04:26Z
publication:harvardbiostat
Spatial Cluster Detection for Censored Outcome Data
Cook, Andrea J
Gold, Diane
Li, Yi
2006-09-13T07:00:00Z
text
application/pdf
https://biostats.bepress.com/harvardbiostat/paper56
https://biostats.bepress.com/context/harvardbiostat/article/1061/viewcontent/censoredbiometricsFINAL.pdf
Harvard University Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
Asthma; Cluster Detection; Cumulative Residuals; Martingales; Spatial Scan Statistic
Epidemiology
Statistical Methodology
Statistical Models
Statistical Theory
Survival Analysis
oai:biostats.bepress.com:harvardbiostat-1075
2007-08-13T15:10:27Z
publication:harvardbiostat
Effectively Combining Independent 2 x 2 Tables for Valid Inferences in Meta Analysis with all Available Data but no Artificial Continuity Corrections for Studies with Zero Events and its Application to the Analysis of Rosiglitazone's Cardiovascular Disease Related Event Data
Tian, Lu
Cai, Tianxi
Piankov, Nikita
Cremieux, Pierre-Yves
Wei, L. J.
2007-08-13T07:00:00Z
text
application/pdf
https://biostats.bepress.com/harvardbiostat/paper69
https://biostats.bepress.com/context/harvardbiostat/article/1075/viewcontent/MetaBpress.pdf
Harvard University Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
Meta analysis; cardiovascular toxicity; combining 2 x 2 tables; continuity correction for zero events
Type 2 Diabetes
Biostatistics
Clinical Trials
oai:biostats.bepress.com:harvardbiostat-1095
2008-11-04T14:45:03Z
publication:harvardbiostat
Limitations of Remotely-sensed Aerosol as a Spatial Proxy for Fine Particulate Matter
Paciorek, Christopher J.
Liu, Yang
Recent research highlights the promise of remotely-sensed aerosol optical depth (AOD) as a proxy for ground-level PM2.5. Particular interest lies in the information on spatial heterogeneity potentially provided by AOD, with important application to estimating and monitoring pollution exposure for public health purposes. Given the temporal and spatio-temporal correlations reported between AOD and PM2.5, it is tempting to interpret the spatial patterns in AOD as reflecting patterns in PM2.5. Here we find only limited spatial associations of AOD from three satellite retrievals with PM2.5 over the eastern U.S. at the daily and yearly levels in 2004. We then use statistical modeling to show that the patterns in monthly average AOD poorly reflect patterns in PM2.5 because of systematic, spatially-correlated error in AOD as a proxy for PM2.5. Furthermore, when we include AOD as a predictor of monthly PM2.5 in a statistical prediction model, AOD provides little additional information to improve predictions of PM2.5 when included in a model that already accounts for land use, emission sources, meteorology and regional variability. These results suggest caution in using spatial variation in AOD to stand in for spatial variation in ground-level PM2.5 in epidemiological analyses, and indicate that when PM2.5 monitoring is available, careful statistical modeling outperforms the use of AOD.
2008-09-22T07:00:00Z
text
application/pdf
https://biostats.bepress.com/harvardbiostat/paper89
https://biostats.bepress.com/context/harvardbiostat/article/1095/viewcontent/paci.liu.2008.tr.pdf
Harvard University Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
Multivariate Analysis
Statistical Models
oai:biostats.bepress.com:harvardbiostat-1096
2008-10-08T15:36:35Z
publication:harvardbiostat
A Functional Random Effects Model for Flexible Assessment of Susceptibility in Longitudinal Designs
Coull, Brent A
2008-10-08T07:00:00Z
text
application/pdf
https://biostats.bepress.com/harvardbiostat/paper90
https://biostats.bepress.com/context/harvardbiostat/article/1096/viewcontent/fre_submitted.pdf
Harvard University Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
latent variable; particulate matter; Laird-Ware model; random intercept - random slope model; semiparametric regression; penalized spline
Longitudinal Data Analysis and Time Series
oai:biostats.bepress.com:harvardbiostat-1094
2008-09-15T17:40:04Z
publication:harvardbiostat
Expanded Technical Report: Mapping Ancient Forests: Bayesian Inference for Spatio-temporal Trends in Forest Composition Using the Fossil Pollen Proxy Record
Paciorek, Christopher J.
McLachlan, Jason S
2008-09-11T07:00:00Z
text
application/pdf
https://biostats.bepress.com/harvardbiostat/paper88
https://biostats.bepress.com/context/harvardbiostat/article/1094/viewcontent/paciorek_trees.pdf
Harvard University Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
Dirichlet-multinomial; Gaussian process; paleoecology; radial basis functions; smoothing; spatial statistics
Statistical Models
oai:biostats.bepress.com:harvardbiostat-1099
2008-11-12T16:37:04Z
publication:harvardbiostat
A New Class of Rank Tests for Interval-censored Data
Gomez, Guadalupe
Oller Pique, Ramon
2008-11-12T08:00:00Z
text
application/pdf
https://biostats.bepress.com/harvardbiostat/paper93
https://biostats.bepress.com/context/harvardbiostat/article/1099/viewcontent/GomezOller08.pdf
Harvard University Biostatistics Working Paper Series
Collection of Biostatistics Research Archive
interval-censored data; treatment comparison; weighted log-rank test; permutation test
Biostatistics
Survival Analysis