This is the code accompanying the paper "Modelling uncertainty in Correspondence Analysis: a Bayesian framework for imputation and credibility ellipses" by Nils Müller-Scheeßel, Martin Hinz and Andrea Göhring.
Correspondence Analysis (CA) is widely used in archaeology to explore associations in contingency tables and to visualize underlying structural gradients. While bootstrapped confidence regions have been proposed to express sampling uncertainty in CA ordinations, missing data – ubiquitous in archaeological datasets – are usually handled separately by imputation, without accounting for the additional uncertainty introduced. As a consequence, combined workflows of imputation followed by bootstrapped CA tend to underestimate total uncertainty. In this paper, we present a fully Bayesian approach that integrates contingency table imputation and CA within a single probabilistic framework. Using a Poisson log-linear model estimated via Markov Chain Monte Carlo, missing cell counts are treated as parameters and sampled jointly with observed data. Repeated CAs on posterior predictive samples allow the construction of credibility regions that simultaneously reflect sampling variation and imputation uncertainty. Instabilities in axis order and sign are addressed systematically using assignment algorithms. The method is demonstrated on two archaeological case studies: Romano-British small-find assemblages and European Iron Age sites with single human bones. Results show that, in the absence of missing data, Bayesian credibility ellipses and bootstrapped confidence ellipses are broadly comparable. When missing data are present, however, bootstrapped ellipses remain unrealistically narrow, whereas Bayesian credibility regions expand appropriately and reflect both data scarcity and imputation uncertainty. We conclude that Bayesian CA offers a coherent and conservative framework for analyzing incomplete archaeological contingency tables. Its main advantage lies in enabling the joint visualization of uncertainty arising from multiple analytical steps.
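The axis alignment step mentioned above can be illustrated with a minimal base-R sketch. It is not taken from the numbered scripts: the names `all_perms` and `align_axes` are hypothetical, and an exhaustive search over permutations stands in for a proper assignment algorithm, which is only reasonable for the two or three axes usually plotted. Each CA of a posterior draw is matched to a reference solution by maximising the absolute correlation between axes, and axes that correlate negatively are flipped.

```r
# Hypothetical helper: all permutations of 1..k as rows of a matrix.
all_perms <- function(k) {
  if (k == 1) return(matrix(1L, 1, 1))
  prev <- all_perms(k - 1)
  out <- NULL
  for (pos in seq_len(k)) {
    # Insert k at position `pos` of every shorter permutation.
    left  <- prev[, seq_len(pos - 1), drop = FALSE]
    right <- prev[, seq_len(k - pos) + pos - 1, drop = FALSE]
    out <- rbind(out, cbind(left, k, right))
  }
  unname(out)
}

# Hypothetical helper: reorder and flip the columns of `draw` (coordinates
# from one replicate CA) so that they match the axes of `ref`.
align_axes <- function(ref, draw) {
  k <- ncol(ref)
  r <- cor(ref, draw)  # r[i, j]: correlation of ref axis i with draw axis j
  perms <- all_perms(k)
  # Score each assignment of draw axes to ref axes by total |correlation|.
  score <- apply(perms, 1, function(p) sum(abs(r[cbind(seq_len(k), p)])))
  best <- perms[which.max(score), ]
  signs <- sign(r[cbind(seq_len(k), best)])
  signs[signs == 0] <- 1  # leave uncorrelated axes unflipped
  sweep(draw[, best, drop = FALSE], 2, signs, `*`)
}

# Toy check: swap the two axes and flip one, then recover the original.
set.seed(1)
ref <- matrix(rnorm(20), nrow = 10, ncol = 2)
draw <- cbind(-ref[, 2], ref[, 1])
aligned <- align_axes(ref, draw)
```

For more axes, the exhaustive search would be replaced by a linear assignment solver applied to the same matrix of absolute correlations.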
To reproduce the analysis, create a local RStudio project and run the numbered R files in order, starting with 1_start.R …
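As an alternative to opening each file by hand, the scripts could be run in sequence with a small helper like the one sketched below. This assumes that every script follows the `<number>_<name>.R` naming pattern seen in 1_start.R and that the scripts are meant to run from the project root; `numbered_scripts` and `run_numbered_scripts` are hypothetical names, not part of the code provided here.

```r
# Hypothetical helper: list scripts named "<number>_<name>.R" in numeric order,
# so that e.g. a file starting with "10_" sorts after one starting with "2_".
numbered_scripts <- function(dir = ".") {
  files <- list.files(dir, pattern = "^[0-9]+_.*\\.R$")
  files[order(as.integer(sub("_.*$", "", files)))]
}

# Hypothetical helper: source each numbered script in turn.
run_numbered_scripts <- function(dir = ".") {
  files <- numbered_scripts(dir)
  for (f in files) {
    message("Running ", f)
    source(file.path(dir, f))
  }
  invisible(files)
}
```

Calling `run_numbered_scripts()` from the project root would then execute the files one after another. Stepping through them manually remains the safer route if a script expects interactive input or takes long to sample.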