open access publication

Article, 2024

CONIPHER: a computational framework for scalable phylogenetic reconstruction with error correction

Nature Protocols, ISSN 1754-2189, Volume 19, Issue 1, Pages 159-183, DOI 10.1038/s41596-023-00913-9

Contributors

  • Grigoriadis K. (ORCID 0000-0002-8435-0176) [1] [2]
  • Huebner A. (ORCID 0000-0001-8009-4999) [1] [2]
  • Bunkum A. (ORCID 0000-0002-7753-9421) [1]
  • Colliver E. (ORCID 0000-0002-1497-0740) [2]
  • Frankell A.M. (ORCID 0000-0002-0341-7878) [1] [2]
  • Hill M.S. (ORCID 0000-0003-0718-8934) [2]
  • Thol K. (ORCID 0000-0002-4624-118X) [1]
  • Birkbak N.J. (ORCID 0000-0003-1613-9587) [1] [2] [3] [4]
  • Swanton C. (ORCID 0000-0002-4299-3018; corresponding author) [1] [2] [5]
  • Zaccaria S. (ORCID 0000-0002-5265-7392; corresponding author) [1]
  • McGranahan N. (ORCID 0000-0001-9537-4045; corresponding author) [1]

Affiliations

  [1] University College London [NORA names: United Kingdom; Europe, Non-EU; OECD]
  [2] The Francis Crick Institute [NORA names: United Kingdom; Europe, Non-EU; OECD]
  [3] Aarhus University [NORA names: AU Aarhus University; University; Denmark; Europe, EU; Nordic; OECD]
  [4] Aarhus University Hospital [NORA names: Central Denmark Region; Hospital; Denmark; Europe, EU; Nordic; OECD]
  [5] University College London Hospitals [NORA names: United Kingdom; Europe, Non-EU; OECD]

Abstract

Intratumor heterogeneity provides the fuel for the evolution and selection of subclonal tumor cell populations. However, accurately inferring tumor subclonal architecture and reconstructing tumor evolutionary histories from bulk DNA sequencing data remain challenging. Sequencing and alignment artifacts are frequently not fully filtered out of somatic mutation calls, and errors in identifying copy number alterations or complex evolutionary events (e.g., mutation losses) affect the estimated cellular prevalence of mutations. Together, such errors propagate into mutation clustering and phylogenetic reconstruction. In this Protocol, we present a new computational framework, CONIPHER (COrrecting Noise In PHylogenetic Evaluation and Reconstruction), that accurately infers subclonal structure and phylogenetic relationships from multisample tumor sequencing, accounting for both copy number alterations and mutation errors. CONIPHER has been used to reconstruct subclonal architecture and tumor phylogenies from multisample tumors with high-depth whole-exome sequencing in the TRACERx421 dataset, as well as from matched primary-metastatic cases. CONIPHER outperforms similar methods on simulated datasets; in particular, it scales to large numbers of tumor samples and clones while completing in under 1.5 h on average. CONIPHER enables automated phylogenetic analysis that can be applied effectively to large sequencing datasets generated with different technologies. CONIPHER can be run with basic knowledge of bioinformatics and of the R and bash scripting languages.
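The abstract's point that copy number errors distort the estimated cellular prevalence of mutations can be made concrete with a small worked example. The sketch below uses one widely used conversion from variant allele frequency (VAF) to cancer cell fraction (CCF), assuming a diploid normal genome, a known tumor purity and a known mutation multiplicity. The formula as written and the function name `cancer_cell_fraction` are illustrative assumptions, not CONIPHER's own code, API or exact model.

```python
def cancer_cell_fraction(vaf: float, purity: float, tumor_cn: int,
                         multiplicity: int = 1, normal_cn: int = 2) -> float:
    """Convert a VAF into a CCF (hypothetical helper, not part of CONIPHER).

    Assumes the common mixture model:
        expected VAF = purity * CCF * multiplicity
                       / (purity * tumor_cn + normal_cn * (1 - purity))
    and solves it for CCF. No capping at 1.0 is applied.
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    if multiplicity < 1:
        raise ValueError("multiplicity must be >= 1")
    # Average copies of the locus per cell across tumor and normal cells
    total_copies = purity * tumor_cn + normal_cn * (1 - purity)
    return vaf * total_copies / (purity * multiplicity)


# Same read evidence (VAF 0.3, purity 0.6), two different copy number calls
ccf_correct = cancer_cell_fraction(vaf=0.3, purity=0.6, tumor_cn=2)
ccf_miscalled = cancer_cell_fraction(vaf=0.3, purity=0.6, tumor_cn=1)
print(f"{ccf_correct:.2f}")    # 1.00
print(f"{ccf_miscalled:.2f}")  # 0.70
```

With identical read counts, calling the locus as one tumor copy instead of two lowers the estimated CCF from 1.0 to 0.7, so a clonal mutation appears subclonal and may be placed in the wrong cluster and branch of the tree; this is the kind of error propagation the abstract describes.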

Funders

  • National Institute for Health Research
  • UCLH Biomedical Research Centre
  • Stand Up To Cancer-LUNGevity-American Lung Association Lung Cancer Interception
  • American Association for Cancer Research
  • Novo Nordisk Fonden
  • University College London
  • European Research Council
  • Royal Society
  • Cancer Research UK
  • Medical Research Council
  • Experimental Cancer Medicine Centre
  • Entertainment Industry Foundation
  • Rosetrees Trust
  • Cancer Research UK–University College London Centre
  • Horizon 2020 Framework Programme
  • CRUK Lung Cancer Centre of Excellence
  • Stoneygate Trusts
  • UCL Experimental Cancer Research Centre
  • Wellcome Trust
  • Francis Crick Institute
  • Mark Foundation For Cancer Research
  • Butterfield
  • Breast Cancer Research Foundation
