Article
CONIPHER: a computational framework for scalable phylogenetic reconstruction with error correction
Affiliations
- [1] University College London [NORA names: United Kingdom; Europe, Non-EU; OECD];
- [2] The Francis Crick Institute [NORA names: United Kingdom; Europe, Non-EU; OECD];
- [3] Aarhus University [NORA names: AU Aarhus University; University; Denmark; Europe, EU; Nordic; OECD];
- [4] Aarhus University Hospital [NORA names: Central Denmark Region; Hospital; Denmark; Europe, EU; Nordic; OECD];
- [5] University College London Hospitals [NORA names: United Kingdom; Europe, Non-EU; OECD]
Abstract
Intratumor heterogeneity provides the fuel for the evolution and selection of subclonal tumor cell populations. However, accurate inference of tumor subclonal architecture and reconstruction of tumor evolutionary histories from bulk DNA sequencing data remain challenging. Frequently, sequencing and alignment artifacts are not fully filtered out from cancer somatic mutations, and errors in the identification of copy number alterations or complex evolutionary events (e.g., mutation losses) affect the estimated cellular prevalence of mutations. Together, such errors propagate into mutation clustering and phylogenetic reconstruction. In this Protocol, we present a new computational framework, CONIPHER (COrrecting Noise In PHylogenetic Evaluation and Reconstruction), that accurately infers subclonal structure and phylogenetic relationships from multisample tumor sequencing, accounting for both copy number alterations and mutation errors. CONIPHER has been used to reconstruct subclonal architecture and tumor phylogeny from multisample tumors with high-depth whole-exome sequencing from the TRACERx421 dataset, as well as from matched primary-metastatic cases. CONIPHER outperforms similar methods on simulated datasets and, in particular, scales to a large number of tumor samples and clones while completing in under 1.5 h on average. CONIPHER enables automated phylogenetic analysis that can be effectively applied to large sequencing datasets generated with different technologies. Running CONIPHER requires only basic knowledge of bioinformatics and of the R and bash scripting languages.
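To illustrate why an error in copy number calling affects the estimated cellular prevalence of a mutation, the following minimal sketch uses a widely used textbook relation between variant allele frequency (VAF), tumor purity, and copy number. It is not taken from CONIPHER and does not represent its implementation; the function name `cancer_cell_fraction` and all parameter names are hypothetical, chosen only for illustration.

```python
def cancer_cell_fraction(vaf, purity, tumor_total_cn, multiplicity=1, normal_total_cn=2):
    """Estimate the fraction of cancer cells carrying a mutation.

    Illustrative only: a standard relation, not CONIPHER's method.
    vaf            -- observed variant allele frequency (0..1)
    purity         -- fraction of tumor cells in the sample (0 < purity <= 1)
    tumor_total_cn -- total copy number of the locus in tumor cells
    multiplicity   -- number of mutated copies per mutated tumor cell
    normal_total_cn -- total copy number of the locus in normal cells
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    if multiplicity < 1:
        raise ValueError("multiplicity must be at least 1")
    # Average number of copies of the locus per cell across the whole sample.
    average_copies = purity * tumor_total_cn + (1 - purity) * normal_total_cn
    # Rescale the VAF by the share of copies contributed by mutated tumor alleles.
    return vaf * average_copies / (purity * multiplicity)


# Same observed VAF and purity, two different copy number calls.
correct_call = cancer_cell_fraction(vaf=0.25, purity=0.5, tumor_total_cn=2)
wrong_call = cancer_cell_fraction(vaf=0.25, purity=0.5, tumor_total_cn=4)
print(correct_call, wrong_call)
```

Under these assumed inputs, the diploid call yields a prevalence of 1.0 (a clonal mutation), whereas miscalling the locus as four copies yields 1.5, a value above 1 that cannot correspond to a real fraction of cells. Estimates distorted in this way are the kind of error that can propagate into mutation clustering and tree building.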