Statistics Colloquium on Jun 04, 2013 (in the afternoon)

 

Tuesday, Jun 4, 2013 03:15 pm

Multivariate count data with censoring

Dimitris Karlis (Athens University of Economics and Business)

Abstract:
Censoring is widely used for survival data. With count data, it is often the case that the
counts are not fully observed, but we know that they may exceed a certain number, leading
to right-censored data. In the univariate case there are papers treating such data. The aim
of the present work is to explore models for multivariate counts with censoring. The
motivation for this work lies in modelling the number of subscription renewals for a
large number of distinct magazines from the same publisher, which leads to multivariate count
data with right censoring. Note that only non-informative censoring is treated in this
work. We propose a model based on copulas. The basic idea is fully explored for the
bivariate case. Interestingly, the application of copulas is easier when censoring occurs. We
then extend to the multivariate case. For this, instead of writing down the complicated
likelihood, we switch to a composite likelihood approach. Simulation results show the
good behaviour of the approach in both the bivariate and the multivariate case. A real-data
application is also provided.
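
As a rough illustration of the right-censoring idea in the univariate case only, the following Python sketch evaluates a censored-count log-likelihood. It is not the speaker's copula model: the Poisson marginal, the function names, and the convention that a censored record c means "the true count is at least c" are all assumptions made for this example.

```python
import math


def poisson_logpmf(y, mu):
    """Log of P(Y = y) for Y ~ Poisson(mu)."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)


def poisson_tail(c, mu):
    """P(Y >= c) for Y ~ Poisson(mu), computed as 1 - P(Y <= c - 1).

    Plain summation; adequate for small thresholds, not tuned for
    extreme tails where 1 - cdf loses precision.
    """
    cdf_below = sum(math.exp(poisson_logpmf(k, mu)) for k in range(c))
    return 1.0 - cdf_below


def censored_loglik(mu, counts, censored):
    """Log-likelihood of counts under Poisson(mu) with right censoring.

    counts[i]   -- observed value (exact count, or censoring threshold)
    censored[i] -- True if the true count is only known to be >= counts[i]
                   (assumed convention for this sketch)
    """
    total = 0.0
    for y, is_censored in zip(counts, censored):
        if is_censored:
            # Censored record contributes a tail probability.
            total += math.log(poisson_tail(y, mu))
        else:
            # Fully observed record contributes the usual pmf.
            total += poisson_logpmf(y, mu)
    return total
```

For example, `censored_loglik(2.0, [1, 3, 5], [False, False, True])` treats the last record as "five or more renewals". Non-informative censoring, as assumed in the abstract, is what justifies using the tail probability alone for such records.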
 

Time: 15:15 - 16:00

Location: Lecture hall HKW 4 (the so-called "Toaster"), Room 503, Wüllnerstr. 1, 52062 Aachen

Power-Expected-Posterior Priors for Variable Selection in Gaussian Linear Models

Ioannis Ntzoufras (Athens University of Economics and Business) (joint work with D. Fouskakis and D. Draper)

Abstract:
In the context of the expected-posterior prior (EPP) approach to Bayesian variable selection in linear
models, we combine ideas from power-prior and unit-information-prior methodologies to
simultaneously (a) produce a minimally informative prior and (b) diminish the effect of training
samples. The result is that in practice our power-expected-posterior (PEP) methodology is
sufficiently insensitive to the size n* of the training sample, due to PEP's unit-information
construction, that one may take n* equal to the full-data sample size n and dispense with training
samples altogether. This promotes stability of the resulting Bayes factors, removes the arbitrariness
arising from individual training-sample selections, and greatly increases computational speed,
allowing many more models to be compared within a fixed CPU budget. In this work we focus on Gaussian
linear models and develop our PEP method under two different baseline prior choices: the
independence Jeffreys (or reference) prior, yielding the J-PEP posterior, and the Zellner g-prior,
leading to Z-PEP. The first is the usual choice in the literature related to our work, since it results in
an objective model-selection technique, while the second simplifies and accelerates computations
due to its conjugate structure (this also provides significant computational acceleration with the
Jeffreys prior, because the J-PEP posterior is a special case of the Z-PEP posterior). We find that,
under the reference baseline prior, the asymptotics of PEP Bayes factors are equivalent to those of
Schwarz's BIC criterion, ensuring consistency of the PEP approach to model selection. We compare
the performance of our method, in simulation studies and a real example involving prediction of
air-pollutant concentrations from meteorological covariates, with that of a variety of previously defined
variants on Bayes factors for objective variable selection. Our PEP prior, due to its unit-information
structure, leads to a variable-selection procedure that (1) is systematically more parsimonious than
the basic EPP with minimal training sample, while sacrificing no desirable performance characteristics
to achieve this parsimony; (2) is robust to the size of the training sample, thus enjoying the
advantages described above arising from the avoidance of training samples altogether; and (3)
identifies maximum-a-posteriori models that achieve good out-of-sample predictive performance.
Moreover, PEP priors are diffuse even when n is not much larger than the number of covariates p, a
setting in which EPPs can be far more informative than intended.
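
For readers unfamiliar with the BIC connection mentioned in the abstract, the standard large-sample relation between BIC and a Bayes factor can be written as below. The notation is assumed for this note and is not taken from the talk: \hat{L}_k is the maximized likelihood of model M_k, d_k its number of free parameters, n the sample size, and BF_{10} the Bayes factor of M_1 against M_0.

```latex
\mathrm{BIC}_k = -2 \log \hat{L}_k + d_k \log n ,
\qquad
-2 \log \mathrm{BF}_{10} \approx \mathrm{BIC}_1 - \mathrm{BIC}_0 .
```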
 

Time: 16:15 - 17:00

Location: Lecture hall HKW 4 (the so-called "Toaster"), Room 503, Wüllnerstr. 1, 52062 Aachen