Current variant callers are not suitable for single-cell DNA sequencing, as they do not account for allelic dropout, false-positive errors and coverage nonuniformity. We developed Monovar (https://bitbucket.org/hamimzafar/monovar), a statistical method for detecting and genotyping single-nucleotide variants in single-cell data. Monovar exhibited superior performance over standard algorithms on benchmarks and in identifying driver mutations and delineating clonal substructure in three different human tumor data sets.

... and MD Anderson Knowledge Gap and Center for Genetics & Genomics. ... is a Damon Runyon-Rachleff Innovator (DRR-25-13). ... is a Sabin Fellow and was supported by an NCI grant (RO1CA172652), ... Chapman and Dell Foundations and NCI (CA016672).

Hamim Zafar and Yong Wang: These authors contributed equally to this work. Department of Bioinformatics and Computational Biology. ... analyzed the data and wrote the manuscript. The authors declare no competing financial interests.

GATK HaplotypeCaller and Samtools were compared using single-cell exome sequencing data generated from a normal isogenic fibroblast cell line in terms of SNV detection: (a) precision versus detection efficiency (recall) and (b) SNV transition and transversion spectrum for FP errors.

Monovar and GATK HaplotypeCaller were compared in terms of (a) precision and (b) detection efficiency (recall) acquired via down-sampling the SKN2 SCS data.

The SNV detection (a) precision and (b) detection efficiency of Monovar were measured by comparing SNVs detected from a set of datasets created by in silico intermixing of variable numbers of SKN2 and 12 TNBC cells with SNVs detected from SKN2 bulk sequencing data.

Supplementary Tables 1–6 and Supplementary Note 1 (PDF 2042 kb)

A newly developed computer program is so sophisticated that it can spot DNA mutations in a single cancer cell. That huge increase in accuracy could have a major impact on the way that doctors diagnose the disease, and it brings us closer to treatments that forgo brute-force methods like chemo and radiotherapy for a more personalised approach.

Whereas existing 'next-generation sequencing' (NGS) techniques measure genomes derived from millions of cells, Monovar is capable of pinpointing important variations within tissue samples that would normally get lost in all the noise.

Monovar is built on technology known as single cell sequencing (SCS), which is used not just in cancer research, but also in neurobiology, microbiology, and immunology. The newer SCS method pulls genome data from individual cells and spots anomalies with a high degree of accuracy.

Monovar is capable of spotting very slight DNA variations known as single nucleotide variants (SNVs), which could help in the diagnosis of certain types of cancer. That's a lot of acronyms to keep on top of, but the upshot is that we should see improvements in cancer diagnosis and treatment thanks to the more accurate detection of these SNVs.

The Monovar algorithm essentially gives doctors more accurate data to work with and helps them to spot subtle differences they might not otherwise be aware of. And it's all built on statistical analysis: the system is able to extract data from multiple single cells to discover SNVs and provide highly detailed genetic data on each, explains one of the team, Nicholas Navin from the University of Texas MD Anderson Cancer Centre.

The accurate detection of SNVs is critical for patient care because they affect how an individual develops a disease and responds to various drugs and vaccines. These molecular variations are crucial in applying personalised medicines and treatments that
are specifically tailored to the patient's body.

"Monovar is capable of analysing large-scale datasets and handling different whole-genome protocols, therefore it is well-suited for many types of studies," says one of the team, Ken Chen.

The Monovar program, which you can actually check out for yourself online, has been shown to be more accurate than standard algorithms at identifying mutations and variations, according to the benchmark tests run by the team.

This isn't the first time that 'big data' and statistical analysis have been used in the fight against cancer. Labs across the world are collecting vast amounts of data on how cancers work and how they react to various treatments, data that can be used to refine our approach to them.

Let's hope cancer patients can be given more personalised treatments in the future based on much smarter data analysis. The program has been described in Nature Methods.

Single-cell omics technologies enable molecular characterization of diverse cell types and states, but how the resulting transcriptional and epigenetic profiles depend on the cell’s genetic background remains understudied. Monopogen is a computational tool to detect single-nucleotide variants (SNVs) from single-cell sequencing data. Monopogen leverages linkage disequilibrium from external reference panels to identify germline SNVs and detects putative somatic SNVs using allele cosegregating patterns at the cell population level. It can identify 100 K to 3 M germline SNVs, achieving a genotyping accuracy of 95%, together with hundreds of putative somatic SNVs. Monopogen-derived genotypes enable global and local ancestry inference and identification of admixed samples. It identifies variants associated with cardiomyocyte metabolic levels and epigenomic programs. It also improves putative somatic SNV detection that enables clonal lineage tracing in primary human clonal hematopoiesis. Monopogen brings together population genetics, cell lineage tracing and single-cell omics to
uncover genetic determinants of cellular processes.

The genetic ancestry of the samples and its contribution to cellular molecular traits are largely unexplored. It is often necessary to resequence the study samples using bulk whole-genome sequencing (WGS) or whole-exome sequencing, which requires additional sequencing efforts and costs. Possible reasons for low variant detection are as follows: (1) the single-cell RNA sequencing (scRNA-seq) reads are usually enriched in specific genomic regions such as the 5′ or 3′ end of genes; (2) genes are usually expressed in cell-type/state-specific patterns and thus are highly variable across genome regions, leading to uneven sequencing depth distribution; (3) coverage is likely affected by allelic imbalance inherent in RNA profiles; and (4) sequencing reads tend to have many errors due to technological infidelity.

Monopogen includes germline and putative somatic SNV calling modules. Monopogen starts from individual bam files produced by single-cell sequencing technologies. Sequencing reads with multiple alignment mismatches (default four) are removed. Putative SNVs are identified sensitively from pooled pileup containing at least one nonreference read. For SNVs present in the external reference panel (such as 1KG3), genotype likelihoods are further refined based on LD in the reference panel. The loci showing persistent discordance are used to estimate a sequencing error model. We identify putative somatic SNVs by focusing on those with sufficient sequencing depth and alternative allele frequency (calibrated by a sequencing error model). The SVM module is designed to remove low-quality SNVs. Variant calling metrics, including the quality score (QS) for calling, are used as features. The germline SNVs are considered as the positive training sets, while the continuous de novo SNV chunks (>2 SNVs) that do not include any germline SNV are set as the negative sets. The remaining de novo SNVs are considered as the test set. The alleles observed at a de novo SNV site are statistically
phased together with adjacent germline alleles to calculate an LD refinement score that estimates the percentage of cells in which the alleles do not cosegregate with neighboring germline alleles. De novo SNVs with high LD refinement scores are classified as the putative somatic SNVs, and their genotypes at the single cell/cluster level are inferred using Monovar. Projection of study samples onto the HGDP enables genetic ancestry inference. Genome-wide association study of cellular quantitative traits can be performed when there is sufficient sample size. Lineage tracing at single cell or clonal level. Monopogen is implemented in Python, automatically splitting the genome into small chunks (defined by the users), performing variant scan and LD refinement in massive parallelization for individual chunks and merging the results (Supplementary Note).

Overall accuracy and SNV detection sensitivity (recall) in representative snRNA-seq data (n = 4), sci-ATAC-seq data (n = 2) and scDNA-seq data (n = 1) using matched WGS data as the gold standard. The x axis denotes the overall accuracy and the y axis denotes the detection sensitivity (recall). The closer a dot is to the top-right corner, the better the corresponding method has performed. Only the SNVs present in the 1KG3 were considered. Median sequencing depth of SNVs found from snRNA-seq data (b) and sci-ATAC-seq data (c) over gene annotations. The pie charts show the percentage of SNVs in each category. Number of SNVs versus the number of cells in the retina data via downsampling. Pearson’s correlations were applied to calculate the R and the P values. Number of SNVs detected from seven single-cell sequencing datasets. The sequencing coverage was calculated as $L\times n/(3.2\times {10}^{9})$, where $L$ is the read length and $n$ is the total number of reads in one sample. Each big dot is the mean value of a dataset. The top ellipse covers samples from scATAC-seq data and the bottom ellipse samples from scRNA-seq data.

Sequencing depth was
higher in genes than in intergenic regions. Off-target reads appear sufficiently leveraged to derive accurate genotypes through LD-based refinement. In the colon sci-ATAC-seq data, Monopogen detected 752 K to 1.1 M germline SNVs, achieving a recall of 25%. In contrast, the recall for Samtools, GATK and FreeBayes was less than 12%. Strelka2 detected ~30% of SNVs with an accuracy lower than 40%. Most (57.4%) of the SNVs from Monopogen were found in intergenic regions and 38.6% in gene regions (Fig. 2c). We also included two SNV callers, cellSNP and scAllele, that were designed for single-cell sequencing data. cellSNP had the lowest SNV detection (<5%) and scAllele had the lowest accuracy (<10%) across three benchmarking datasets.

... demonstrating the efficiency of LD-based genotyping refinement in challenging scenarios. ... further demonstrating the robustness of Monopogen SNV calling on various sequencing platforms. ... demonstrating the possibility of distinguishing individuals from the same ancestry. This indicates that the LD-based genotyping refinement from the commonly used 1KG3 panel did not over-correct genotypes on subpopulation or individual levels.

To demonstrate the utilization of Monopogen in establishing the link between genetic variants and cellular quantitative traits in a cell-type-specific or cell-state-specific manner, we characterize the genetic contribution to metabolic processes (such as ATP production) and epigenetic programs in healthy cardiomyocytes. These relationships are usually disguised by previous bulk-based data analysis. Variant calls were further merged for samples of paired modalities. Ancestry admixture analysis using inferred genotypes shows that this cohort contains samples with diverse ancestry, which are as follows: European (71.1%), Asian (10.2%) and African (8.5%). Six samples appeared admixed (Supplementary Fig.
6a).

Manhattan plot showing the association of SNVs with the GATA4 motif-based transcription factor activity level in cardiomyocytes. The gray line denotes the P value threshold $10^{-6}$. Boxplot shows the difference in GATA4 activity level across the three genotypes of rs17745507 (one of the leading variants in ADAM12). The height of the box is given by the interquartile range (IQR) and the whiskers are given by 1.5× IQR.

We were able to reveal potential genetic determinants of cardiac health via metabolic and epigenomic trait mapping of cardiomyocytes. Associations identified in this fashion may lead to a better understanding of the pathogenicity of noncoding variants in a cell-type-aware manner.

a,b, LD refinement scores on germline SNVs from the TNBC single-cell DNA data. It is shown with two-locus model in a and three-locus model in b. c, Evaluation of de novo SNVs from Monopogen by comparison with categories defined in matched bulk DNA sample (Methods). Distribution of LD refinement scores for de novo SNVs that are classified as germline and somatic SNVs from the bulk sample. Boxplot displaying the relationship between LD refinement score and BAF. The height of the box is given by the interquartile range (IQR), the whiskers are given by 1.5× IQR and outliers are given as points beyond the minimum or maximum whisker. LD refinement scores on germline SNVs from the bone-marrow sample measured in single-cell RNA data. It is shown with two-locus model in g and three-locus model in h. The length of haplotypes is grouped into 13 bins (Methods). The y axis shows the mean value of LD refinement score within each bin together with the 95% confidence interval. The total number of haplotypes used for evaluation is labeled at the bottom right of each panel. Number of SNVs detected in each step from Monopogen. Heatmap displaying the detected percentage of putative somatic SNVs in each mtDNA clone (the sum of each row is 1). UMAPs displaying the cell types annotated in myeloid and erythroid
lineages. UMAPs displaying the mutated cell distribution for mtDNA variant 2593G:A (l) and three selected putative somatic SNVs from scRNA-seq (m). Heatmap displaying the detected percentage of putative somatic SNVs in each TRB clone. UMAPs displaying the cell types annotated in T/NK cell lineages (o), the mutated cell distribution for TRB region CASAPNFGQELTYEQYF (p) and the putative somatic SNV chr20:2904623A:G (q), the mutated cell distribution for TRB region CASSQAGAANTEAFF (r) and the somatic SNV chr1:91689518A:G (s).

There were 11 known oncogenes and 12 tumor suppressors. The unknown SNVs from Monopogen may contain low-abundance somatic SNVs that were missed by matched bulk sequencing. ... indicating these putative somatic SNVs may represent multiple T-cell clonotypes that have occurred from multipotent hematopoietic stem cells.

Global and local ancestry inference can be reliably performed in studies that have only single-cell sequencing but not bulk sequencing or array-based genotyping data, which greatly increases the chance of discovering genetic factors underlying diverse cellular quantitative traits and disease. Leveraging the power of having phased haplotypes from germline SNVs, the LD refinement models applied at the cell population level enabled us to substantially increase the accuracy of somatic SNV detection in sparse single-cell sequencing data.

Although Monopogen can potentially detect putative somatic SNVs, it is challenging to separate germline from truncal somatic SNVs whose BAFs are close to 0.5. Those SNVs can be easily detected via bulk sequencing. In the human heart left ventricle analysis, we demonstrated the utilization of Monopogen-called genotypes to identify associations of ATP metabolism and GATA4 activity levels in one cell type. Such analysis can be extended to other cell types and cellular quantitative traits of interest that could be objectively measured. Such association analysis should be guided by strong prior knowledge to reduce the burden of multiple hypothesis testing. With the
increasing generation of sparse single-cell sequencing data and expansion of data modalities, our work will become increasingly relevant for assessing the effects of genetic ancestry and discovering genetic mechanisms underlying complex traits in human populations and diseases.

Monopogen starts from individual bam files of single-cell sequencing data. Reads with high alignment mismatches (default four mismatches) and lower mapping quality (default 20) are removed. We first scan for putative SNVs in a sensitive manner: a locus is detected from the pooled (across cells) read alignment of one sample wherever an alternative allele is found in at least one read. For each candidate SNV locus $m$ with observed sequencing data information $d$, we record its genotype likelihoods (GL) that incorporate errors from base calling and alignment as ... For each locus $m$, we calculate the observed genotype as the one with the highest posterior probability from Eqs. (1) and (2). The final genotype of locus $m$ is set as ${G}_{{m|H},d}$ if ${G}_{{m|H},d}={G}_{{m|d}}$. The heterozygous loci that are imputed to homozygotes are considered as sequencing errors (that is, ${G}_{{m|H},d}=0$ and ${G}_{{m|d}}=1,2$). We classify this discordance into 12 categories: ... The median BAF across all inconsistent loci in each category $c$ is denoted as $\mathrm{BAF}_{c}$. This is considered the threshold to separate sequencing errors from true heterozygous SNVs. SNVs with ${G}_{{m|H},d}={G}_{{m|d}}$ are retained as the germline SNVs. Others are only used to build the sequencing error model and are not included in the final genotyping call set.

We implement the following two filters: (1) the total sequencing depth filtering (default 100); and (2) BAF less than the threshold from the above sequencing error model. For example, a putative SNV genotyped as A/T with its BAF lower than $\max\{\mathrm{BAF}_{AT\to AA},\mathrm{BAF}_{AT\to TT}\}$ is removed due to difficulties in separating true heterozygotes from sequencing errors.

The somatic
SNV calling includes the following two major modules: (1) removing low-quality SNVs using an SVM and (2) distinguishing somatic from germline SNVs using LD refinement models at the cell population level.

All detected germline SNVs overlapped with 1KG3 are considered as the positive set. We define de novo SNVs found consecutively (default >2 SNVs) in genomic chunks that do not contain any germline SNV as the negative set. This is because the chance of only detecting multiple somatic SNVs in one region without any germline SNVs is typically low due to the low average somatic mutation rate in most datasets. SNV calling quality metrics, including quality score for calling, variant distance bias for filtering splice-site artifacts, Mann–Whitney U test of ratio of mapping quality and strand bias, segregation-based metric and BAF, are selected as features. The model is trained using the svm function implemented in R package e1071. The de novo SNVs with a predicted probability of positive labels less than 0.5 are set as sequencing errors and excluded from downstream analysis.

The de novo SNVs passing the SVM filtering are further interrogated using the LD refinement models. The LD refinement models assume that only two alleles are present in the cell population. We first estimate the LD refinement scores on germline SNVs that quantify the degree of their LD, taking into consideration widespread sparseness and allelic dropout in single-cell sequencing data. We then use germline LD patterns to statistically phase the observed alleles of de novo SNVs in the cell population.

We assume that the germline SNV block includes $n_m$ SNVs with genotype vector being $\left\{{G}_{1},{G}_{2},\cdots ,{G}_{{n}_{m}}\right\}$. Denote ${G}_{i}={A}_{i}^{1}|{A}_{i}^{2}$. The cell-level genotype matrix $G$ on these germline SNVs can be represented as ... Not all adjacent germline SNVs are informative for LD refinement. Here we first define a two-locus neighborhood index in cell $j$ to identify informative germline
SNV pairs as ... An illustration of the two-locus neighborhood index can be seen in Supplementary Fig. 1b. Denote ${\mathscr{H}}_{2}$ as the set including all two-locus neighborhoods. We next group elements in ${\mathscr{H}}_{2}$ based on the distance of SNVs as ... The two-locus haplotype in ${\mathscr{H}}_{2}$ with allele cosegregated can be represented as ... The two-locus LD refinement score with physical distance being $d$ is calculated as ...

We first define the three-locus neighborhood index in cell $j$ as ... The three-locus neighborhood means that the upper and lower SNVs detect the same allele. An illustration of the three-locus neighborhood index can be seen in Supplementary Fig. 1b. Denote ${\mathscr{H}}_{3}$ as the set including all three-locus neighborhoods. We next group ${\mathscr{H}}_{3}$ based on the length of haplotype as ... The three-locus haplotype in ${\mathscr{H}}_{3}$ with allele cosegregated can be represented as ... The three-locus LD refinement score with physical distance being $d$ is defined as ...

The two-locus and three-locus LD refinement scores $p({\mathscr{H}}_{2}^{d}),\,p({\mathscr{H}}_{3}^{d})$ can largely represent the colocalization for neighboring SNVs on a DNA haplotype or RNA transcript at the cell population level. The physical distance $d$ is grouped into 13 bins with <100 bp ...

We next phase the de novo SNVs based on germline SNVs. Assume the genotype of de novo SNV $s$ is ${A}_{s}^{1}/{A}_{s}^{2}$ and its adjacent germline SNV profile for cell $j$ as follows: ... where ${\mathrm{Neighb}}_{2}\left(k,s,j\right)=1$ and ${\mathrm{Neighb}}_{2}\left(s,l,j\right)=1$. ${c}_{sj}^{1}$ and ${c}_{sj}^{2}$ are the number of reads supporting allele ${A}_{s}^{1}$ and ${A}_{s}^{2}$. It is difficult to detect allele ${A}_{s}^{1}$ and ${A}_{s}^{2}$ simultaneously in each cell. We set $\left|{d}_{k}-{d}_{s}\right| < \left|{d}_{s}-{d}_{l}\right|$. The probability of phased genotype ${A}_{s}^{1}|{A}_{s}^{2}$ under two-locus model is ... To derive the probability of haplotype
${A}_{s}^{1}|{A}_{s}^{2}$ under three-locus model, we need to search germline SNV $k$ and $l$ satisfying ${\mathrm{Neighb}}_{3}\left(k,s,l,j\right)=1$ ... The probability of phased genotype ${A}_{s}^{1}|{A}_{s}^{2}$ by combining two models is ... The probability of phased genotype ${A}_{s}^{1}|{A}_{s}^{2}$ for de novo SNV $s$ across the cell population is ... The probability of phased genotype ${A}_{s}^{2}|{A}_{s}^{1}$ for de novo SNV $s$ across the cell population is ... We have $p\left({A}_{s}^{1}|{A}_{s}^{2}\right)+p\left({A}_{s}^{2}|{A}_{s}^{1}\right)=1$. The genotype of $s$ is set to ${A}_{s}^{1}|{A}_{s}^{2}$ if $p\left({A}_{s}^{1}|{A}_{s}^{2}\right) > p\left({A}_{s}^{2}|{A}_{s}^{1}\right)$ and to ${A}_{s}^{2}|{A}_{s}^{1}$ otherwise. The LD refinement score $p_s$ is defined as ${p}_{s}=\min \left\{p\left({A}_{s}^{1}|{A}_{s}^{2}\right),p\left({A}_{s}^{2}|{A}_{s}^{1}\right)\right\}$.

The LD refinement score $p_s$ ranges from 0 to 0.5. It is closer to 0 for a germline SNV, as it has strong LD with the adjacent germline SNVs sharing the same two haplotypes in all the cells. The score is greater than 0 for a somatic SNV, as the recently gained somatic allele cosegregates with germline alleles in only a subpopulation of cells. SNVs with a larger LD refinement score are classified as putative somatic SNVs (default value 0.25).

Only reads covering these candidate loci are extracted and then split into different bam files based on their cluster identities. Monovar can be run on these bam files (each is one cluster or cell type) with default parameter settings.

Seven single-cell samples in our study have matched WGS data that were treated as the gold standard. Only bi-allelic loci having at least one alternative allele (that is, genotype is 0/1 or 1/1) were extracted from the two call sets, denoted as $N$ (Monopogen-called) and $W$ (WGS-called). The sensitivity (recall) was defined as $|N\cap W|/|W|$ and specificity (precision) as $|N\cap W|/|N|$. The genotyping accuracy was defined as the fraction of
identical genotypes in the $\left|N\cap W\right|$ overlapping SNVs. The overall accuracy was defined as the specificity multiplied by the genotype accuracy.

The genotype concordance of the Monopogen-called genotype data versus the AIDA Illumina GSAv3 genotype data was computed by first counting the number of matching alleles between the Monopogen and the Illumina GSAv3 results for loci found in both sets. The minimum possible concordance score per Monopogen call (accounting for some match always being possible in the case of heterozygous genotypes) was subtracted, and the resulting scores were then normalized against the number of loci evaluated.

Two PCA coordinates were calculated as ${\boldsymbol{Y}}_{n\times K}$ and $\left[\begin{array}{c}{\widetilde{\boldsymbol{Y}}}_{n\times K'}\\ {\widetilde{\boldsymbol{y}}}_{1\times K'}\end{array}\right]\,(K'\ge K)$ by applying eigenvalue decomposition on the genetic relationship matrices (GRMs) ${\boldsymbol{R}}{\boldsymbol{R}}^{T}$ and $\widetilde{\boldsymbol{R}}{\widetilde{\boldsymbol{R}}}^{T}$. Projection Procrustes analysis was used to find an orthonormal projection matrix ${\boldsymbol{A}}_{K'\times K}$ and an isotropic scaling factor $\rho$ such that ${\left\Vert \rho \widetilde{\boldsymbol{Y}}{\boldsymbol{A}}-{\boldsymbol{Y}}\right\Vert }_{F}^{2}$ is minimized, where ${\left\Vert \cdot \right\Vert }_{F}^{2}$ represents the square of the Frobenius norm. Once ${\boldsymbol{A}}_{K'\times K}$ and $\rho$ were solved, the sample-specific PCA-projection coordinates on the HGDP panel can be calculated as ${\boldsymbol{y}}=\rho \widetilde{\boldsymbol{y}}{\boldsymbol{A}}$. The PC coordinates of $\left[\begin{array}{c}{\boldsymbol{Y}}_{n\times K}\\ {\boldsymbol{y}}_{1\times K}\end{array}\right]$ were used for PCA-projection visualization.

Monopogen-called genotypes were input to the PopPhased module with the following flags: -w 0.2. The RFMix output was collapsed into haploid bed files and
‘UNK’ or unknown ancestry was assigned where the posterior probability of a given ancestry was <0.90. These collapsed haploid tracts were used for local ancestry component visualization (segment size was set as 1 cM). The RFMix tool was also run on WGS genotypes from matched samples. The ancestry component percentage for each source population was recorded. The local ancestry consistency index was calculated as the correlation of the ancestry component vector between the two call sets.

There are 54 donors sequenced with snRNA-seq and 65 with snATAC-seq, among which 54 are paired. For the downstream association study, SNV calling of the 54 snRNA-seq and 65 snATAC-seq samples was performed separately using Monopogen, followed by removal of variants with MAF < 10%. Variant calls were further merged for samples of paired modalities (Supplementary Table 4). Cell type annotation was performed by uploading all the cells of each sample to the online Azimuth heart database in Seurat V4 (ref. 19). Cells with predicted cell type probability scores lower than 0.9 were removed. Only cells annotated as cardiomyocytes were extracted for the downstream association study. The gene-level chromatin accessibility was derived using the GeneActivity module by aggregating peaks in gene promoters plus upstream 2 kb. The cell type annotation was also performed using the online Azimuth heart database under the same quality control criteria as in the snRNA-seq analysis.

GCTA (ref. 22) was used to calculate a GRM among single-cell sequencing samples. The association studies on ATP metabolism level and GATA4 activity level were performed using its fastGWA-mlm option with the input of the GRM and, as covariates, the top five ancestry PCs. Only variants with MAF > 10% were considered for association studies. The inflation factor of Quantile–Quantile plots was calculated using the R package qqman to examine whether there is population stratification in our genome-wide scan. A Manhattan plot was used to show the P value across the whole genome with P
= $10^{-5}$ as the threshold for potentially significant associations with cellular traits. The significant loci were further grouped into bins based on their closest genes. The nearest genes to significant loci were annotated.

The mpileup option was used to transform base calling and alignment information into the GL, followed by variant calling using Bcftools. GATK was run using the HaplotypeCaller mode with default settings.

A P value lower than 0.01 was reported as enrichment in the specific mtDNA clone. The putative somatic SNVs were grouped based on whether they were enriched in the same mtDNA clone. We then calculated the cellular concordance of each mtDNA clone as the number of cells detected in both the mtDNA clone and its matched somatic SNV group divided by the total number of cells in the mtDNA clone. The overall concordance was the mean across all the mtDNA clones. The same scheme was used to compare somatic SNVs against TRB/A regions.

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

The scDNA-seq data from the TNBC sample were downloaded from a breast cancer study (ref. 33). The single-cell profiles of 20 HBCA samples, 20 AIDA samples, and four retina samples were generated as part of the cell atlas and genetic ancestry networks organized by the Chan Zuckerberg Initiative. The 20 AIDA single-cell samples can be downloaded from https://data.humancellatlas.org/explore/projects/f0f89c14-7460-4bab-9d42-22228a91f185. The four retina single-cell samples can be downloaded from https://data.humancellatlas.org/explore/projects/f0f89c14-7460-4bab-9d42-22228a91f185. The 20 HBCA single-cell samples can be accessed through GSE195665 (https://navinlabcode.github.io/HumanBreastCellAtlas.github.io/dataAccess.html).

Monopogen is available in open source at https://github.com/KChen-lab/Monopogen.
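The benchmarking metrics defined in the Methods (sensitivity, specificity, genotyping accuracy and overall accuracy) can be illustrated with a minimal sketch. It assumes, hypothetically, that each call set has already been loaded into a Python dictionary that maps a (chromosome, position) locus to an unphased genotype string such as 0/1. The function name `benchmark_calls` and this data layout are illustrative only and are not part of Monopogen.

```python
from typing import Dict, Tuple

Locus = Tuple[str, int]    # (chromosome, position); an assumed layout
Calls = Dict[Locus, str]   # locus -> unphased genotype string, e.g. "0/1"


def benchmark_calls(monopogen: Calls, wgs: Calls) -> Dict[str, float]:
    """Compare a single-cell call set N against a WGS gold standard W."""
    # Keep only loci with at least one alternative allele (0/1 or 1/1).
    keep = {"0/1", "1/1"}
    n = {locus: gt for locus, gt in monopogen.items() if gt in keep}
    w = {locus: gt for locus, gt in wgs.items() if gt in keep}

    shared = n.keys() & w.keys()  # N intersect W

    # Sensitivity (recall) = |N ∩ W| / |W|
    sensitivity = len(shared) / len(w) if w else 0.0
    # Specificity (precision, in the paper's wording) = |N ∩ W| / |N|
    specificity = len(shared) / len(n) if n else 0.0
    # Genotyping accuracy = fraction of identical genotypes among shared loci
    identical = sum(1 for locus in shared if n[locus] == w[locus])
    genotyping_accuracy = identical / len(shared) if shared else 0.0

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "genotyping_accuracy": genotyping_accuracy,
        # Overall accuracy = specificity multiplied by genotyping accuracy
        "overall_accuracy": specificity * genotyping_accuracy,
    }
```

For instance, with three qualifying single-cell calls, four qualifying WGS calls and two shared loci of which one genotype agrees, this sketch gives a sensitivity of 0.5, a specificity of 2/3, a genotyping accuracy of 0.5 and an overall accuracy of 1/3.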
Scripts for reproducing key analysis results are also available at https://github.com/KChen-lab/Monopogen.

GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues
Large-scale cis- and trans-eQTL analyses identify thousands of genetic loci and polygenic scores that regulate blood gene expression
Identification of context-dependent expression quantitative trait loci in whole blood
Single-cell RNA sequencing identifies celltype-specific cis-eQTLs and co-expression QTLs
Single-cell RNA-seq reveals new types of human blood dendritic cells
Single-cell RNA-sequencing of differentiating iPS cells reveals dynamic genetic effects on gene expression
Cellular deconvolution of GTEx tissues powers eQTL studies to discover thousands of novel disease and cell-type associated regulatory variants
An integrative approach for building personalized gene regulatory networks for precision medicine
Population genetics meets single-cell sequencing
The human cell atlas: from vision to reality
The human tumor atlas network: charting tumor transitions across space and time at single-cell resolution
Low-coverage sequencing: implications for design of complex trait association studies
Using off-target data from whole-exome sequencing to improve genotyping accuracy, association analysis and polygenic risk prediction
Reliable identification of genomic variants from RNA-seq data
The sequence alignment/map format and SAMtools
The genome analysis toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data
Systematic comparative analysis of single-nucleotide variant detection methods from single-cell RNA sequencing data
Monovar: single-nucleotide variant detection in single cells
Integrated analysis of multimodal single-cell data
RFMix: a discriminative modeling approach for rapid and robust local-ancestry inference
GCTA: a tool for genome-wide complex trait analysis
Wnt signaling exerts an antiproliferative effect on adult cardiac progenitor cells through IGFBP3
F-box and leucine-rich repeat protein 22 is a cardiac-enriched F-box protein that regulates sarcomeric protein turnover and is essential for maintenance of contractile function in vivo
Conserved N-terminal cysteine dioxygenases transduce responses to hypoxia in animals and plants
Cardiac metabolism and its interactions with contraction
Cardiac metabolism in heart failure: implications beyond ATP production
Mutation in myosin heavy chain 6 causes atrial septal defect
Interaction of Gata4 and Gata6 with Tbx5 is critical for normal cardiac development
Combinatorial transcriptional control in blood stem/progenitor cells: genome-wide analysis of ten major transcriptional regulators
Complex interdependence regulates heterotypic transcription factor distribution and coordinates cardiogenesis
Cardiac hypertrophy is inhibited by antagonism of ADAM12 processing of HB-EGF: metalloproteinase inhibitors as a new therapy
Breast tumours maintain a reservoir of subclonal diversity during expansion
Mitochondrial variant enrichment from high-throughput single-cell RNA sequencing resolves clonal populations
Target-enrichment strategies for next-generation sequencing
Ancestry estimation and control of population stratification for sequence-based association studies
Full-length RNA-seq from single cells using Smart-seq2
Clinical use of current polygenic risk scores may exacerbate health disparities
Large-scale whole-genome sequencing of three diverse Asian populations in Singapore
Sequencing of 53,831 diverse genomes from the NHLBI TOPMed program
Single-cell RNA-seq reveals cell type-specific molecular and genetic associations to lupus
Single-cell eQTL mapping identifies cell type-specific genetic control of autoimmune disease
Pan-cancer single-cell landscape of tumor-infiltrating T cells
Advances and applications of single-cell sequencing technologies
Lineage tracing meets single-cell omics: opportunities and challenges
A unified approach to genotype imputation and haplotype-phase inference for large data sets of trios and unrelated individuals
Fast two-stage phasing of large-scale sequence data
Improved ancestry estimation for both genotyping and sequencing data using projection procrustes analysis and genotype imputation
Single-cell chromatin state analysis with Signac
chromVAR: inferring transcription-factor-associated accessibility from single-cell epigenomic data
A framework for variation discovery and genotyping using next-generation DNA sequencing data
Garrison, E. & Marth, G. Haplotype-based variant detection from short-read sequencing. Preprint at arXiv https://doi.org/10.48550/arXiv.1207.3907 (2012)
Strelka2: fast and accurate calling of germline and somatic variants
Cellsnp-lite: an efficient tool for genotyping single cells
scAllele: a versatile tool for the detection and analysis of variants in scRNA-seq
Integrated informatics analysis of cancer-related variants
CScape: a tool for predicting oncogenic single-point mutations in the cancer genome
Rozowsky, J. et al. The EN-TEx resource of multi-tissue personal epigenomes & variant-impact models. Preprint at bioRxiv https://doi.org/10.1101/2021.04.26.441442 (2021)
Single-nucleus cross-tissue molecular reference maps toward understanding disease gene function
An integrated map of structural variation in 2,504 human genomes
Worldwide human relationships inferred from genome-wide patterns of variation

This project has been made possible in part by Human Cell Atlas Seed Network (grants CZF2019-02425 and CZF2019-002432), Genetic Ancestry Network (grant CZF2021-239847) from the Chan Zuckerberg Initiative DAF, an advised fund of Silicon Valley Community Foundation, (grant U01CA247760 to K.Chen), the Cancer Center Support (grant P30 CA016672 to P the Chan Zuckerberg Foundation (grant CZF2019-002446 to S Technology and Research (A*STAR) in Singapore (grant IAF-PP-H18/01/a0/020 to S This work is also supported by the CPRIT Single-Cell Genomics Center (grant RP180684 to N CPRIT Training Program (grant RP210028 to H.Jin) and National Cancer Institute (grant U24CA264010 to L Xu from Baylor College of Medicine for suggestions on left ventricle single-cell studies Nakhleh for Monovar implementation/maintenance

The University of Texas MD Anderson Cancer Center; Department of Molecular and Human Genetics; Laboratory for Genome Information Analysis; RIKEN Center for Integrative Medical Sciences; Graduate School of Integrated Sciences for Life; RIKEN Center for Integrative Medical Sciences; The University of Texas Health Science Center; McWilliams School of Biomedical Informatics

... conceived the project and designed the experiments. ... participated in the discussion of manuscript writing. All authors read and approved the final version of the manuscript.

The authors declare no competing interests.

Nature Biotechnology thanks Alejo Fraticelli and the other reviewer(s) for their contribution to the peer review of this work.

Supplementary Tables
1–8 and Supplementary Note. DOI: https://doi.org/10.1038/s41587-023-01873-x

The Phoblographer may receive affiliate compensation for products purchased using links in this article.

All images by Miguel A. S. Used with permission. Please follow Miguel’s Flickr and Instagram @redcosmonautgirl. Some translation corrections were implemented.

Miguel started photography around the summer of 2014 and originally took inspiration from Alberto Verdú. “He is an amazing wedding and street photographer in Monovar (Alicante),” Miguel tells us. Upon hearing that Alberto was putting on a local workshop … That’s how he started to see people and his environment in a brand new way.

Part of Miguel’s creative vision is both color and composition. He genuinely believes that this is how you make street photography that gets people to keep looking at your work. He uses a Fujifilm X Pro 2 along with the 23mm and 18mm lenses. “…I really love to use the flash during the day or night,” he tells us. “Fujifilm cameras are pretty small and easy to manage. Their portability makes it easy for me to get lost in the moment while taking pictures.”

Miguel heads out to shoot whenever he can and doesn’t like AI imagery. He’s been fooled several times via social media.

The Phoblographer works with human photographers to verify that they’ve actually created their work through shoots.
These are done by providing us assets such as BTS captures and screenshots of post-production. We do this to help our readers realize that this is authentically human work. Here’s what this photographer provided for us.

Of all the mentors Begonya Klumb has had in her career, one stands out as being particularly influential: her grandmother. Josefa Bonastre founded a shoe manufacturing and export company in the 1940s, and by the 1970s it was one of the biggest employers in Monóvar. It was Bonastre, “la Jefa” to the people of Monóvar, who encouraged Klumb to study business and economics and who taught her that treating employees with respect is the key to building a successful organization.

Now the head of health care services at UMB Financial in Kansas City, Klumb has established UMB as one of the nation’s foremost administrators of health savings accounts and related products. When Klumb took over as CEO of the division in 2015, UMB was the nation’s eighth-largest HSA administrator, with more than 1 million accounts and $2 billion of assets and deposits under management. The HSA division is the fastest-growing business unit within the $20.1 billion-asset UMB, and Klumb aims to maintain that momentum by pursuing strategic acquisitions (it recently bought the health savings portfolio of a bank that has exited the business) and investing in technology that will help customers manage their accounts.

Before taking the helm of the HSA division, Klumb launched and built the first mergers-and-acquisitions unit at UMB, and she remains frustrated that female executives still endure what she refers to as “a prevalence of latent biases.” These include off-handed comments such as one she heard recently from a man praising a female business leader’s accomplishments while raising three kids. “… but in my mind he was contributing to a bias that we expect women to raise children when we should expect parents to raise children,” she said.

Condolences for the death of José Luis Monreal

Málaga CF deeply regrets the death of former player and coach of CD Málaga, José Luis Monreal, and sends deepest sympathies to his family and friends. Rest in peace.

Highlights | José Luis Monreal Sotillo (Madrid …

… remembers Monreal: “He was my teammate for 8 or 9 years at Málaga. He came from Madrid to play with Malagueño as a midfielder; he was moved on to the left. He was very competitive, a winner who never wanted to leave the pitch (…) He got married in Málaga and lived well here. He was a true sportsman and his death came as quite a shock. Everything said about him is good; we’re deeply saddened by his loss.”

…, president of the Association of Ex Málaga Football Players: “José Luis was a regular at the monthly meals with the Veterans. During the last one in January we talked a lot about football.
You learnt from him as a coach, and as a person he was a good friend and adviser. His death came as such a shock and it still hasn’t sunk in. He was fine, he was an athlete. I was a pupil of his and I knew him as a coach and personally for many years.”

Monovar Hussain-Butt was sentenced to thirteen months for battery, dangerous driving and driving while disqualified.

A man has been jailed for taking his family on a terrifying tour of Bristol and assaulting his wife in the back of his Jaguar in front of their four-year-old child before trying to run from police.

… tried to escape from police in his Jaguar XF after a member of the public called 999 and reported seeing his wife sprawled on the back seat.

… said she had been surprised to see him as he was already disqualified from driving following a previous offence on April 2.

Mr Gordon said it soon became obvious Hussain-Butt, of Cranmore Crescent in Southmead, … The court heard the defendant was furious because his wife was applying to be a taxi driver to supplement the family income, the same job he had done before he was disqualified. He shouted at her: “You’re not f***ing doing it.”

They pulled up outside Luckwell Primary School, where Mr Gordon said the defendant leaned round into the back seat while his wife was screaming to be let out. A caretaker at the school reported hearing a “blood-curdling scream” and a man running an after-school club said he saw the defendant’s elbow “pumping up and down” inside the car with “significant force”.

Meanwhile, the prosecution described how Hussain-Butt’s four-year-old daughter put up her hands as if to try and stop her father. The woman tried to get out of the car but the defendant drove off. The next time they were seen was on Clifton Road by a member of the public who saw the woman sprawled in the back seat of the Jaguar.

Hussain-Butt was sentenced at Bristol Crown Court.

The police caught up with the car on St Luke’s Road but the court heard the defendant refused to stop.
A police car with lights and its siren on blocked his path and a policeman grabbed the rear door handle to try and rescue the wife but Hussain-Butt reversed away from the police car, dragging the officer with him and crashing into a post box before driving away. The ordeal came to an end when the defendant finally let his wife and child out of the car at Temple Meads railway station, where she called the police and was taken to hospital with bruising and abrasion to her face. After he was arrested and interviewed Hussain-Butt denied hitting his wife and stopping her leaving the car – saying it was only a verbal argument. Shortly after that his wife retracted her statement saying she was not trapped in the car or punched. “It was clear she didn’t wish to pursue the allegation,” said Mr Gordon. The prosecution said they were able to make a case on the basis of what she had already told police and what was seen by independent witnesses. In passing sentence Judge Patrick called it “an ugly incident.” He said: “No matter what was going on in your life with your wife, you had no business to deal with her in the way that you did. “The fact that it was distressing to a child makes this all the more serious.” He also said the defendant had tried to make good his escape, and could have injured pedestrians while trying to get away from police. Hussain-Butt was found guilty of battery, driving while disqualified and dangerous driving and given 13 months in prison, half to be served on licence. He was also disqualified from driving for two and a half years and made to pay a surcharge. 
Methods that account for the dynamics of mutational signatures in cellular evolution will improve the diagnosis and prognosis of diseases for which somatic alterations are a key factor. Obtaining accurate profiles of the genetic variation affecting single cells is essential. There is no statistical model that allows for both local variation of bias and errors due to amplification and for statistically sound false discovery rate control when calling and genotyping SNVs in single cells. ProSolo’s statistical rigor allows for accurate control of the false discovery rate when calling alternative alleles or identifying other relevant effects. It achieves a higher variant calling accuracy compared to state-of-the-art tools.

We name the central innovations of our model and demonstrate its advantages in comparison to existing approaches. A more detailed description of the innovations is available in the Methods section. Our model addresses the two major issues of MDA: (i) the differential amplification of the two alleles present in a diploid cell (“amplification bias” in the following); (ii) MDA-induced errors (“amplification errors” in the following), which are copy errors introduced by the Φ29 polymerase used in MDA. … empirically derived model of differential amplification of alleles. We evaluate single-cell samples together with a bulk sample from which the single cell is supposed to stem. We argue that a bulk sample should be added to single-cell sequencing experiments wherever possible: it samples from the same cell population without requiring amplification and is therefore unaffected by amplification bias and errors, and thus makes a particularly useful background sample to address the statistical uncertainties and biases induced by MDA. One of the major features of the core model and its implementation is that it can easily be adapted to flexibly deal with other sampling setups, so it could be extended to further scenarios.

The most precise single-cell variant callers to date … the absence of an alternative allele (i.e. the heterozygous and the homozygous alternative genotypes called jointly). We thus focused on this for the main benchmarking.

All panels are strong zoom-ins, focusing on (different) areas of interest. Global views of these panels are provided in Supplementary Fig. 7. a Precision and recall of an average of two whole-genome sequenced single cells IL-11 and IL-12 against their kindred clone IL-1C as ground truth genotypes. b Precision and recall average of the five whole-exome sequenced single granulocytes against their pedigree-based germline genotype ground truth. c Precision and recall average of 16 tumor and 16 normal single cells sequenced at the whole-exome level. The germline ground truth induces an artificial increase of recall for SCIPhI’s sensitive and ProSolo’s imputation mode; these modes should thus be disregarded for a fair comparison on the granulocyte dataset in panel b. Threshold parameters (not comparable across tools): MonoVar --t; ProSolo --fdr; SCAN-SNV --fdr; SCcaller -a cutoff; SCIPhI prosolo --fdr. Software modes: MonoVar with consensus filtering (default) or without (no consensus); ProSolo with minimum coverage 1 in single-cell (default) or imputing zero coverage sites based on bulk sample (imputation); SCcaller with recommended settings (default) or with a more sensitive calling; SCIPhI with default parameters (default) or all heuristics off (sensitive).

MonoVar achieved a maximum precision of only 0.962; this was at a much higher recall (0.141) than, for example, SCcaller (0.095 at a precision of 0.972). SCcaller’s decreased recall on this dataset might be due to its estimation of local allelic bias by also taking biases at neighboring sites into account: in whole-exome data, the number of neighboring sites available for this estimation will be limited and might lead to less reliable estimates. SCAN-SNV’s recall increased to 0.0016 at a decreased maximum precision of 0.897. This decreased precision is an artifact of using the germline genotype as ground truth. At the sites with somatic mutations in single cells, this ground truth will instead contain the homozygous reference germline genotype and will incorrectly classify (existing) alternative alleles as false positives. We also expect the calculated precision of all the other tools to be underestimated. As the other tools also provide alternative allele calls for all sites where the single cells retained this germline genotype, the relative effect on their precision will be smaller. But this still leaves ProSolo as the only tool that provides the user with the choice of either aiming for more discoveries at the cost of a higher rate of false discoveries or aiming for a more limited number of discoveries with higher confidence in each of them.

We suggest not imputing zero-coverage single-cell sites whenever possible and instead recommend using and developing downstream software that can deal with both these missing values and the event probabilities that ProSolo provides. Uncertainties in the probabilities and information about missing data are passed on and will allow for more accurate statistical modeling in those downstream analyses.

… which might explain the higher overall coverage and points to a possible source for the discrepancy between the naively calculated and the estimated allele dropout rates. If samples PAG1 and PAG10 were doublets as well, this would indicate that our use of empirical distributions in ProSolo provides for more robust event probabilities in the presence of doublets, while heuristic thresholding (as in our naive allele dropout estimate) is very sensitive to such perturbations.

The parameters of the mechanistically motivated combination of beta-binomial distributions for modeling heterozygous genotypes could be learned per single-cell sample at germline heterozygous sites, similar to the approaches of SCcaller and SCAN-SNV, but globally per cell with their local variation modeled by
their dependence on a site’s coverage. When bulk coverage is low and an alternative allele is not sampled by any read, the prior will result in nonzero alternative allele frequencies in the bulk being assigned nonzero likelihoods, which is the desired behavior for this constellation. The increased amount of data will progressively overrule the prior. Studying and implementing the corresponding changes in the future has the potential to further improve the accurate site-specific event probabilities that ProSolo already provides through the joint modeling with a (sufficiently deep) bulk background sample.

ProSolo provides an accurate and easy-to-use variant caller for single-cell MDA sequencing data, which will easily scale to calling variation on more cells and broader genomic coverage. It will thus empower more research using single-cell DNA sequencing data.

More details and a detailed derivation of all model elements can be found in the Supplement.

To account for MDA amplification bias up to the complete dropout of individual alleles, we distinguish between two alternative allele frequencies:

The true (but usually unknown) underlying allele frequency at a site in a single cell: $\theta_s$. This can be assigned one of three possible values, where 0 and 1 represent the homozygous reference and alternative genotype and 0.5 a heterozygous genotype. The ratio of reads harboring the different alleles from a single-cell sequencing experiment does not reflect the true allele frequency because of the biases induced by amplification.

The allele frequency after its distortion through amplification bias: $\rho_s$. For $l$ reads covering a site, of which $k$ reads bear the alternative allele, the formal definition of this measurable frequency is $\rho_s = \frac{k}{l}$.

The goal is to estimate the likelihood density across the three possible underlying allele frequencies in the single cell (we denote $\tilde{\theta}_s$ as the density estimate across all $\theta_s \in \{0, 0.5, 1\}$). To accurately quantify the uncertainty introduced by amplification bias, … the probability distributions that reflect the shift from the true allele frequency $\theta_s$ to the distorted allele frequency $\rho_s$, as induced by MDA (Fig. 3c). We thus formally describe the statistics of read counts skewed by MDA at all sites, encompassing sites that are homozygous for the reference allele, to calculate likelihoods for each of these possible true allele frequencies.

The absence of a single-cell candidate mutation in a bulk sample (with sufficient sequencing depth) increases the probability of an amplification error. Bulk samples can be employed to improve both the sensitivity and specificity of variant calls in the single cell, where the increasing depth of coverage of the bulk sample increases the accuracy of the calls. We derive likelihood density estimates for all possible alternative allele frequencies in the background bulk sample. Given a set of $n$ reads from the bulk (b) read data $\boldsymbol{Z}^{b} = \{\boldsymbol{Z}_{1}^{b}, \boldsymbol{Z}_{2}^{b}, \ldots, \boldsymbol{Z}_{n}^{b}\}$ and discrete possible allele frequencies $\frac{m}{n}$ ($m \in \{0, 1, \ldots, n\}$), we compute the probability of the data given a particular allele frequency as the product of the probabilities of all the reads: $\mathbf{P}(\boldsymbol{Z}^{b} \mid \theta_b = \frac{m}{n}) = \prod_{i=1}^{n} \mathbf{P}(\boldsymbol{Z}_{i}^{b} \mid \theta_b = \frac{m}{n})$. When referring to the likelihood density estimates across all possible allele frequencies (as opposed to a particular allele frequency), we denote this with $\tilde{\theta}_b$.

Our model fully defines the two-dimensional space of possible underlying alternative allele frequencies in the two samples as: …

Please note that we here resolve a contradiction between $\tilde{\theta}_s = 0$, which indicates a homozygous reference single cell, and …, which indicates that the bulk does not contain any homozygous reference cells. We decide to trust the bulk sample over the single-cell sample, and with our above assumption that the bulk is a mixture of a maximum of two subpopulations, the bulk can only contain heterozygous and homozygous alternative
cells. This renders a homozygous reference single cell impossible, and we classify this event as evidence for an allele dropout of the alternative allele. We thus obtain a set of mutually exclusive single-cell events (Fig. 3d): …

Accounting for the sample likelihoods based on Supplementary Equation S23 (assuming $\rho_b = \theta_b$ for the bulk that has no amplification step, Supplementary Equation S3) and evaluating only point estimates of these likelihoods at possible alternative allele frequencies, … whose posterior probability we can obtain from this sum: …

To calculate the posterior probability of an allele dropout at a particular site, we sum up the posterior probabilities of the two ADO events.

A large enough sampling of the bulk cell population that the single cell comes from should contain the single cell’s genotype at a particular site, unless this cell is genuinely the first cell to harbor a mutation at that site. This bulk background sample can thereby lend credibility to single-cell variants with low coverage while at the same time eliminating amplification errors in the single-cell sample, as these will not exist in the bulk sample.

The bulk sample also provides a mechanism of biologically meaningful imputation at sites that have no read coverage in the single cell. If imputation is desired for sites with no read coverage in the single cell, we set $\mathbf{P}(\boldsymbol{Z}^{s} \mid \theta_s = 0) = \mathbf{P}(\boldsymbol{Z}^{s} \mid \theta_s = \frac{1}{2}) = \mathbf{P}(\boldsymbol{Z}^{s} \mid \theta_s = 1) = 1$, rendering all (unknown underlying) single-cell genotypes equally likely. The posterior probabilities of events at sites with no read coverage become solely dependent on the read data from the bulk sample, providing the most common genotype in the bulk. While this is a biologically meaningful way of imputation at the vast majority of genomic sites, it should be noted that this imputation will favor germline genotypes over any existing (lower frequency) somatic genotypes at a site, unless such a somatic genotype is present in a majority of cells. Such an imputation carries the potential to introduce erroneous calls (especially when looking at subclonal somatic mutations), and we recommend instead using downstream tools that can accommodate missing data and data uncertainty wherever they are available.

We benchmarked ProSolo on three experimental datasets (Fig. 4), each with a different kind of ground truth: …

A single cell was selected as the founder for the secondary IL expansion into 20–30 cells. … two cells were extracted (IL-11 and IL-12) and sequenced following MDA. The remaining kindred cells from that clone were used as a bulk sequencing sample without amplification (IL-1C), as these cells are only very few cell divisions away from IL-11 and IL-12 and thus have almost no difference in the somatic mutations acquired. The ground truth genotype was generated using GATK HaplotypeCaller to call variant sites and bcftools mpileup to identify homozygous reference sites (with read coverage above 25 but no alternative allele present). IL-1C was only used as ground truth and not provided as input to any of the software compared here. … generated from cells after the first mini-expansion were merged into a further bulk sample for SCcaller and ProSolo (see Software and Parameters below). Unlike other callers (all of which finished in less than 5 days), SCIPhI took 5 weeks to finish on this dataset in sensitive mode and 7.5 weeks in default mode.

… we selected granulocytes where at least 15 of these loci were properly amplified. We also extracted bulk DNA and submitted it to whole-exome capture and paired-end Illumina sequencing without MDA to generate a bulk background sample for ProSolo and SCcaller.

We analyzed the tumor and normal cells separately, ensuring that normal cells were called with the normal bulk background sample and tumor cells with the tumor bulk background sample. We
then used the normal bulk sample augmented with the clonal tumor mutations confirmed by targeted duplex sequencing, … removing the confirmed clonal and subclonal tumor mutations. This experimental setup aims for fairness across all competitors. Variant calling for the ground truths was performed as for cell line data above.

For the allele dropout rate, we will focus on the set of sites where the respective ground truth call is heterozygous, as these are the sites where the dropout of one of the alleles can be identified in a nonambiguous manner. More details for the three ways in which we calculate allele dropout rates are given in the Supplement (Supplementary Section 2.7).

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

This work has been supported by the Helmholtz Association, in particular through a Helmholtz Incubator grant (Sparse2Big ZT-I-0007) … by the compute cluster at the Helmholtz Institute for Infection Research. Alexander Schönhuth was supported by the Netherlands Organisation for Scientific Research (NWO: Vidi grant 639.072.309). Arndt Borkhardt and Ute Fischer were further supported by the German Federal Office for Radiation Protection (BfS) grant nos … Open Access funding enabled
and organized by Projekt DEAL.

These authors jointly supervised this work: Alice C

Department for Computational Biology of Infection Research
Braunschweig Integrated Centre of Systems Biology (BRICS)
Faculty of Mathematics and Natural Sciences
Algorithms for Reproducible Bioinformatics

conceived the original project to study single immune cells, formulated the statistical model and wrote the manuscript.

Peer review information: Nature Communications thanks reviewer Yong Wang for their contribution to the peer review of this work.

DOI: https://doi.org/10.1038/s41467-021-26938-w

Reconstructing the evolution of tumors is a key aspect towards the identification of appropriate cancer therapies. The task is challenging because tumors evolve as heterogeneous cell populations. Single-cell sequencing holds the promise of resolving the heterogeneity of tumors; however, it has its own challenges, including elevated error rates. We develop a new approach to mutation detection in individual tumor cells by leveraging the evolutionary relationship among cells. SCIΦ jointly calls mutations in individual cells and estimates the tumor phylogeny among these cells. Employing a Markov Chain Monte Carlo scheme enables us to reliably call mutations in each single cell, even in experiments with high drop-out rates and missing data. We show that SCIΦ outperforms existing methods on simulated data and apply it to different real-world datasets, namely a whole-exome breast cancer dataset and a panel acute lymphoblastic leukemia dataset.

Due to recent technological advances, it is now possible to sequence the genome of individual cells1. This allows genetic cell-to-cell variability to be studied directly and gives unprecedented insights into somatic cell evolution in development and disease. The two main issues are that mutational signals of small subclones cannot be distinguished from noise and that the
deconvolution of the aggregate measurements into clones is

While this process is very efficient at amplifying the overall DNA material, it is prone to allelic drop-out, the random non-amplification of one allele of a heterozygous genotype site. All evidence of a heterozygous mutation is lost when the mutated allele drops out. False-positive artifacts can arise in the MDA amplification when random errors introduced early in the process end up with high frequencies due to allelic amplification biases. Further challenges arise from uneven amplification across the genome, which results in non-uniform coverage that will leave some sites with insufficient coverage depth for reliable base calling.

Both methods take raw sequencing data (BAM files) and output the inferred genotypes of the cells. Monovar specifically addresses the problem of low and uneven coverage in mutation calling by pooling sequencing information across cells while assuming that no dependencies exist across sites. SCcaller detects variants independently for each cell and accounts for local allelic amplification biases. The identification of such biases is based on germline single-nucleotide polymorphisms (SNPs). It cannot recover mutations from drop-out events or loss of heterozygosity.

SCIΦ is a new single-cell-specific variant caller that combines single-cell genotyping with reconstruction of the cell lineage tree. SCIΦ leverages the fact that the somatic cells of an organism are related via a phylogenetic tree where mutations are propagated along tree branches. SCIΦ can reliably identify single-nucleotide variants (SNVs) in single cells with very low or even no variant allele support and is robust to copy number changes. Monovar is the only other tool able to transfer information between cells.

These loci are then used to infer the underlying phylogenetic tree and the parameters of the model. In a last step, the mutation-to-cell assignment is sampled from the posterior distribution.

the only previously published single-cell mutation caller sharing information
across cells.

We start by analyzing the results of the simulated data.

Performance of SCIΦ and Monovar on simulated data with different number of cells.
Summary statistics of the F1 performance of SCIΦ and Monovar on simulated data.
F1 performance depending on different levels of drop-out events (a)

We found SCIΦ to be more robust to increasing drop-out rates in comparison to Monovar (Fig. 3a). In addition to using the phylogenetic tree structure, SCIΦ also learns the drop-out rate of the experiment during the MCMC scheme and uses 10% only as a starting condition.

An additional experiment was conducted to investigate the effects of loss of heterozygosity. Monovar as well as SCIΦ perform better with increasing levels of homozygous mutations present in the experiment (Fig. 3b). Monovar particularly benefits from homozygous mutations, as these are very unlikely to be classified as wild type. SCIΦ experiences a more modest benefit from homozygous mutations, since it already starts with high performance due to the usage of the phylogenetic tree structure to accurately call mutations.

Because copy number events play a prominent role in tumor evolution, we investigated the performance of Monovar and SCIΦ in the presence of additional wild type alleles (Fig.
3c). Similar to the dependence on the homozygosity rate, SCIΦ shows a fairly stable performance for copy number events affecting up to 50% of the mutated loci and outperforms Monovar for all settings. The performance of Monovar drops more quickly with increasing rate of copy number events.

This dataset is particularly challenging because cells are aneuploid.

a Cell lineage tree with average number of mutations per inner node as identified by SCIΦ. The area of a node is proportional to its number of assigned mutations. b Posterior probability of SCIΦ mutation calls clustered according to the tree in a. c Probability of Monovar mutation calls for loci identified as mutated by SCIΦ and clustered according to the tree in a. d Probability of Monovar mutation calls for loci identified as mutated by SCIΦ and clustered hierarchically.

its placement earlier in the tree above those cells is much more evolutionarily plausible.

a Monovar mutation calls for loci identified as mutated by SCIΦ clustered hierarchically. b Monovar mutation calls clustered according to the tree inferred by SCIΦ. c SCIΦ mutation calls clustered according to its inferred tree.

This is of particular interest in cancer genomics because tumors show heterogeneous cell compositions, often resulting in the failure of targeted cancer therapies. SCIΦ is the first single-cell mutation caller that simultaneously infers the mutational landscape and the phylogenetic history of a tumor sample. SCIΦ accounts for the elevated noise levels of single-cell data by appropriately approximating the genomic amplification process and the high fraction of drop-out events. In combination with a Markov Chain Monte Carlo phylogenetic tree inference scheme, mutations are reliably assigned to individual cells.

We have compared SCIΦ to Monovar11 on both simulated and real datasets. Both SCIΦ and Monovar show a precision of almost one, but SCIΦ shows a substantially higher recall and F1 score. Simulating different MDA amplifications, we showed that SCIΦ is not
sensitive to the amplification process. We showed that SCIΦ achieves a much cleaner assignment of mutations to cells within subclones. SCIΦ recovered mutations from drop-out events using the inferred phylogenetic tree structure of the sample to share information across cells. The phylogenetic tree inferred by SCIΦ reflects the evolutionary history more accurately than a hierarchical clustering from Monovar results.

Further improvements could be the inclusion of copy number information into the tree reconstruction. This comes at the cost of losing the assumption of independence between mutations, which is computationally expensive to overcome as groups of mutations would have to be identified. Mutation calling and lineage tree building are two interdependent tasks, and addressing them in a single statistical model provides both improved mutation calls and a better estimate of the underlying cell lineage tree, and hence a better understanding of tumor heterogeneity.

The nucleotide counts are modeled with the beta-binomial distribution $P_{bb}(s \mid c, \alpha, \beta) = \binom{c}{s} \frac{B(s + \alpha, c - s + \beta)}{B(\alpha, \beta)}$ with parameters α and β and where B is the beta function. For better interpretability, in our implementation we will employ an alternative parametrization of the beta-binomial distribution with $f = {\textstyle{\alpha \over {\alpha + \beta }}}$ being the frequency of a nucleotide and ω = α + β an overdispersion term determining the shape of the distribution, which decreases with increasing variance. In the absence of a mutation, the probability of the observed count (support) sij for a specific nucleotide follows this distribution with parameters fwt and ωwt, where cij is the coverage and fwt is the expected frequency of the observed nucleotide. Large values of ωwt lead to a binomial distribution representing independent sequencing errors.

In the presence of a heterozygous mutation (a mutation affecting one of the two homologous chromosomes)

The underlying allele frequency of ${\textstyle{1 \over 2}}$ is corrected by sequencing errors producing any of the other two bases. Low values of the overdispersion term ωa reflect a small number of initial genomic fragments and any additional feedback
in the amplification. SCIΦ generally assumes copy number neutrality, but learning ωa allows for additional shifts in the mean variant allele frequency away from ${\textstyle{1 \over 2}}$ due to copy number changes.

Likely mutated loci are identified using the posterior probability of observing at least one mutated cell at a specific locus. By Bayes' rule, the probability of observing no mutation at locus i across all cells is $P(K = 0 \mid D_i) = \frac{P(D_i \mid K = 0)\,(1 - \lambda)}{P(D_i)}$, where K is a random variable indicating the number of mutated cells and λ is the prior probability of a mutation occurring at the locus. The probability of observing the mutation in k cells is

We do not need to compute P(Di), as it cancels out when computing the likelihood ratio or posterior odds. The likelihood of the data given that exactly k of the m cells possess the mutation

The prior probability of a mutation in a phylogeny affecting k descendant cells is determined by placing mutations uniformly among the edges of the tree (Supplementary Section A), leading to

Along with the uncertainty in the supporting read counts due to the amplifications in each cell when a mutation is present, an additional artifact is drop-out, whereby one allele is not amplified at all. To account for allelic drop-out occurring with probability μ, we introduce the following mixture for the likelihood of the observations for each cell:

where the first term describes the loss of the mutant allele, the second the loss of the wild-type allele and the third term describes a heterozygous mutation. The case μ = 0 reduces to Eq.
(3).

while each of the n mutations can be attached to the (2m − 1) edges, leading to $(2m - 3)!!\,(2m - 1)^n$ possible configurations for the discrete component (T, σ). It is infeasible to enumerate all solutions. Instead, we employ a Markov Chain Monte Carlo approach to search and sample from the tree space.

We take the likelihood of a specific tree realization with the mutation attachment parameter σ and the parameters θ to be

where P(Dij | T) = Pa(Dij) if the cell j is below mutation i (on the path from leaf j to the root) and P(Dij | T) = Pwt(Dij) otherwise. The first set of products describes the loci identified to be likely mutated (section Identification of candidate mutated loci), which are placed on the tree and used together to infer its phylogenetic structure. The second half represents all loci where no mutation is present, which inform the inference of the sequencing error parameters.

We marginalize out the attachment points of the mutations, analogously to ref. 20. Assuming each mutation is equally likely to attach to any edge in the tree and the attachment probability to be independent between mutations, we have $P(\sigma \mid T, \theta) = {\textstyle{1 \over {(2m - 1)^n}}}$, so that the sum over σi can be written explicitly as

where I is the indicator function and $\left( {\sigma _i \prec j} \right)$ indicates that cell j sits below the attachment point σi of mutation i in the tree T. The sum can be computed in O(m) time using the binary tree structure: we propagate the probability of attaching a mutation to a specific node from the leaves toward the root. This can be implemented using the depth-first search (DFS) algorithm, combining in each node the probabilities from two previously computed subtrees. Computing Eq.
(10) is therefore in O(mn), while the marginalization has the benefit of reducing the search space by a factor of $(2m - 1)^n$. In addition, we employ the marginalization to focus on the tree structure of the cell lineage rather than the attachment points of mutations.

Since that number is typically much smaller than mn

Because tumor cells show chromosomal abnormalities, mutations can be observed as homozygous variants even without drop-out events. In order to also account for loss of heterozygosity, we adapt the scheme introduced in section Tree likelihood. Instead of computing the likelihood of the data when attaching a mutation to a node in the lineage tree in the heterozygous state only, we additionally compute the likelihood when attaching each mutation in the homozygous state, involving the nucleotide model when only alternative alleles are present. Note that homozygous mutations are only attached to inner nodes, as the probability of observing a drop-out event in a single cell is assumed to be higher than that of a single homozygous mutation. Utilizing the tree structure, the sum can again be computed in O(m) time for each mutation on the tree. The overall likelihood (Eq.
(10)) for each mutation becomes a weighted sum of the two possibilities, leading to

We employ an MCMC scheme to sample from the posterior distribution of mutation assignments as well as tree structures given the data (for simplicity with uniform priors). We change one parameter at a time with transition probability q(T′ θ) and accept the new configuration with probability

and can be verified by computing the correlation between two runs in practice. The overall runtime complexity is O(x × max(mn, c)), with c being the number of unique coverage values of the experiment.

From the sample of trees and parameters, we could also conditionally sample the placement of the mutations for the full joint posterior sample. Utilizing the full weights of attaching each mutation to different edges, we record the probability of each cell possessing each mutation. Averaging over the MCMC chain provides the posterior genotype matrix and hence our single-cell variant calls.

In order to benchmark the performance of SCIΦ, we simulated tumor evolution by introducing a cell lineage tree and simulated read counts by mimicking the noisy MDA process. We created a random binary genealogical cell lineage tree with 100 mutations attached to the edges. The placement of the mutations defines which cells possess each mutation. We chose the placement such that each mutation is shared by at least two cells, because mutations in only one cell may be false positives from sequencing errors and are filtered out in practice as well as in our benchmark. Among all the mutations present in cells, a specified fraction μ was randomly selected as drop-outs: ${\textstyle{\mu \over 2}}$ of the mutations became wild type and ${\textstyle{\mu \over 2}}$ became homozygous alternative genotype. Then we generated an artificial reference chromosome of 1 million base pairs (bp) and divided it into segments of ~1000 bp. For each cell individually, we generated a coverage distribution following a negative binomial distribution with a mean of
25 nucleotides and a variance of 50. 10% of the segments were assigned 0 coverage to include missing information. The coverage c of specific positions was additionally randomized following a discretized Gaussian distribution with the segment coverage as mean and a standard deviation of 10% of that mean, in order to simulate the uneven coverage profiles of real single-cell sequencing experiments.

This process is repeated c times and the copies are retained. We change the number of initial copies of the wild type allele for a specific locus. We set the probability of x extra copies to be ${\textstyle{1 \over {2^x}}}$. This strategy assumes all copy number changes happened prior to mutation events. The strategy provides lower bounds on the performance measures because the variant allele frequency decreases with increasing copy number. A nucleotide is mutated to account for sequencing errors, and the resulting simulated data was embedded into a multi-pileup file.

Both experiments were in line with the previously reported results.

The first five years of single-cell cancer genomics and beyond
Tumour heterogeneity and the evolution of polyclonal drug resistance
A population genetics perspective on the determinants of intra-tumor heterogeneity
Computational approaches for inferring tumor evolution from single-cell genomic data
Genomic DNA amplification by the multiple displacement amplification (MDA) method
Dissecting the clonal origins of childhood acute lymphoblastic leukemia by single-cell genomics
Reliable detection of subclonal single-nucleotide variants in tumour cell populations
A mechanistic beta-binomial probability model for mRNA sequencing data
SNP detection and genotyping from low-coverage sequencing data on multiple diploid samples
Enumerative Combinatorics (Cambridge University Press
Single-cell sequencing data reveal widespread recurrence and loss of mutational hits in the life histories of tumors
Calibrating genomic and allelic coverage bias in single-cell sequencing
Rapid amplification of plasmid and phage DNA using phi29 DNA polymerase and multiply-primed rolling circle amplification
Genome-wide detection of single-nucleotide and copy-number variations of a single human cell
Snakemake-a scalable bioinformatics workflow engine
ggplot2: Elegant Graphics for Data Analysis

We thank David Seifert for constructive discussions and C++ support as well as Franziska Singer for critical feedback. J.S. and J.K. were supported by ERC Synergy Grant 609883 (http://erc.europa.eu/). K.J. was supported by SystemsX.ch RTD Grant 2013/150 (http://www.systemsx.ch/).

These authors contributed equally: Jochen Singer
Department of Biosystems Science and Engineering
All authors drafted the manuscript and approved the final version.
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
DOI: https://doi.org/10.1038/s41467-018-07627-7

01 Oct 2017
UK crime sentencing guidelines

From a murderer to a drunk who forced a plane to make a landing, here are some of those jailed in September 2017.

From a dealer found with drugs outside a school to paedophiles and a lying businessman, these are just some of the people either from the Bristol region or who have been locked up after committing crimes in the area.

was jailed for 28 months for dealing cocaine and cannabis.

A drug dealer was found with more than £3,500 of cannabis and cocaine outside a Bristol school. Nathan Larcombe was discovered by officers at Somerset Square near St Mary Redcliffe School smoking cannabis. Officers finally caught up with him before a violent struggle in which they had to pepper spray him. On searching Larcombe they discovered a JD Sports bag full of drugs. Larcombe admitted possessing both drugs with intent to supply, on the basis that he was not selling to schoolchildren.

A woman had her face smashed after she hit her violent partner with an
ironing board. Christina Page “got in first” with the board when David Cox turned verbally aggressive, Bristol Crown Court heard. Cox then pushed her and she struck her face on a marble fire surround.
pleaded guilty to sending a malicious communication and inflicting grievous bodily harm in July.
The court heard how the pair had met on an internet dating site, with the relationship quickly deteriorating as Cox acted “controlling and jealous”, fixated that she was unfaithful to him.

'American gangster movie-style car chase' ends in hammer and baseball bat attack
Jailed: Kaluba for 15 months and Lucas for 18 months

An attempt to reclaim cash resulted in an American "gangster movie-style" car chase through Weston-super-Mare. Mutilla Kaluba and Oliver Lucas targeted doormen Kurt Duddridge and Robert Carlucci to get back some £2,000. In the bizarre events that followed, Kaluba and Lucas stopped their victims’ Citroen and caused it to reverse in the road. Kaluba and Lucas then did a u-turn in their Ford and the two cars went bonnet-to-bonnet before the Citroen reversed into a post. The wrecked Citroen was then further smashed up by Kaluba and Lucas.
having an offensive weapon and criminal damage and was jailed for 15 months.
criminal damage and dangerous driving and was jailed for 18 months.

A man from Bristol admitted a string of sexual abuse offences against a 10-year-old girl.
carried out the abuse around 25 years ago
was living in Chippenham in Wiltshire and was in his late 20s.
Police said it had been both traumatic and upsetting for her to relive the awful abuse she suffered at the hands of Loosemore.

When Mohammed Khalid went on a burglary spree he made a fatal error. He forgot Bristol is a city that never sleeps. As he broke into one property in the early hours, he was disturbed by a chef returning home from work. And when he went on to try his luck at a property nearby, he was disturbed by a postman getting ready for
work. Frustrated by people cropping up everywhere he went, he turned his attention to an empty Peugeot car, smashed a window and stole around £5 in change.
Khalid, 41, of Barbour Road in Hartcliffe

Kieron Wood has been cleared of a samurai sword attack but sent to two years’ youth custody for drugs offences.

Self-confessed drug dealer Kieron Wood was cleared of a samurai sword attack on a customer after stating that it was Christopher Hill who stormed into his flat, put a knife to his throat and took his drugs.
said he knew nothing about a later attack where a group of men went to Mr Hill's flat and attacked him.
But Wood did admit possessing Class A drugs heroin and crack cocaine with intent to supply, as well as possessing cannabis and a blade, and so was jailed anyway.

Shelvi Varkey

A Southmead Hospital nurse who failed to declare he was being investigated for not dispensing medicine has been jailed. Whilst under investigation by the Nursing and Midwifery Council, he attended a jobs fair at Southmead Hospital and applied for a nursing position. Not only did he fail to disclose he was being investigated, he provided two fake references and was taken on.
a married father-of-three of Parade Court in Speedwell denied wrongdoing but was found guilty of fraud by Bristol magistrates.

Jailed: Derek Root was jailed for 12 months

A pair of strangers got so drunk and abusive on a holiday jet the pilot was forced to land at Bristol Airport. Derek Root necked 10 shots of Jack Daniel's whiskey in the departure lounge when his flight from Glasgow to Alicante was delayed by three hours on July 8 this year. Root finally boarded the plane and found himself sitting near passenger Alexander Gray, who handed him a bottle of Jagermeister which he also started drinking. The two had never met before. The pair soon became abusive to passengers and crew
with Root asking stewards if they 'wanted his c*ck' before Gray was sick on the floor. They were subdued by staff and returned to their seats, where they promptly fell fast asleep as the plane diverted to Bristol Airport.

A sexual deviant who used two young girls as ‘sex toys’ was finally brought to justice after a victim disclosed his crimes during a psychic reading. Paedophile Roger Britton systematically abused the two girls during the 1990s, “targeting occasions to abuse them” at his home and workplace. Britton admitted 14 charges of indecent assaults and gross indecency on the children, who finally spoke out about what he had done last year. Bristol Crown Court heard how he abused one of the girls from the age of seven through to 11 and another from the age of nine to 14. One of his victims had disclosed what had happened to her during her childhood during a psychic reading. Britton was confronted about what had happened and admitted everything before sending out a text message apologising to one of his victims.

Shaun Hudd has been jailed for eight years for a series of online sex offences against children.

Shaun Hudd was a chef at soft play business Jump in Cribbs Causeway and a scoutmaster with 36th Bristol Scout Group.
from behind the safety of his computer screen sexually exploiting children as young as nine.
who had also worked at award-winning Bristol restaurant Casamia posed as a youngster himself before contacting children through Facebook and Skype.
Bristol Crown Court heard that he became hooked on exploiting boys and girls, persuading them to expose themselves to him and recording indecent photos and videos. When youngsters tried to rid themselves of Hudd, he threatened to distribute the humiliating footage, and carried out the threat on two occasions.

Businessman William Irving lied to a court in a bid to avoid points on his driving licence.
near Thornbury told North Avon Magistrates’ Court in August last year that he had no points at all.
It came as the 62-year-old
was midway through a trial for driving without insurance. Prosecutor Julian Howells said Irving already had three points on his licence from May 2014 for driving while using a phone and another three points from October 2015 for failing to give a driver’s details. The court heard a conviction for driving his Mercedes car without insurance on December 7, 2015 would have meant he was banned under the ‘totting up’ rules. And during a trial contesting the charge that he was driving without insurance, Irving told magistrates he had a clean licence.

Jailed: Two years with another two years extended licence

A violent steroid addict left a man in danger of going blind in one eye after he punched him. Restaurant owner Detjon Prenci suffered three fractures to his face and was told he could lose his sight after Peter Clark delivered a single blow in an “unprovoked” attack in the garden of Wees Lounge Bar on Park Street. Bristol Crown Court heard how 27-year-old Clark had a history of violence when he was locked up for smashing someone over the head with a piece of wood.
has been jailed after punching a man in the garden of Wees bar on Park Street.
The latest attack was his sixth violent assault to be brought before the courts, including previous attacks using glasses and bottles in the city centre. Prosecutor David Maunder told the court how Mr Prenci had been out with friends on April 22 this year and ended up in the bar garden having a cigarette with a friend. The next thing he knew he was in the Bristol Royal Infirmary and unable to see from his left eye.

Daniel Povey has been jailed for three years and three months after he was caught by undercover officers.

A Kingswood man was caught trying to get naked pictures of underage girls on social media. Daniel Povey used online messaging app Kik to approach someone he thought was a 12-year-old girl. The security officer repeatedly asked for nude pictures and videos from the girl and sent images of himself carrying out a sexual act. But the
'girl' was in fact a fake profile set up by undercover police to catch online paedophile predators.

Alexander David Densley stole bicycles from Fishponds Police Station and drove a stolen Porsche during his seven-month crime spree.
Jailed: 12 months in a young offender institution

A prolific thief who counted stealing goods from a Bristol police station and driving a stolen Porsche among the highlights of a seven-month crime spree has finally been caught. Teenager Alexander David Densley was jailed last week after admitting to a string of offences including driving a stolen Porsche and breaking in to a retirement home. The 18-year-old, from Bath, also admitted to stealing eight bicycles worth an eye-watering £3,390 from the bike shed at Fishponds Police Station in February. Densley’s crime spree came to a sudden halt this month after he was caught entering The Moorings retirement home.

Jamie Mitchell has been jailed for six years for robbery.
Jailed: Six years and four months in prison with an extended licence period of five years

A Bristol man grabbed cash from a shop till and then pulled a knife on staff who tried to restrain him.
Jamie Mitchell just weeks before the incident at the Co-op in Station Road
The court heard he walked into the store and was approached by staff because he is banned, then while being removed from the store he grabbed £50 from the till.
of Brigstocke Road in St Paul's then he pulled out and brandished a large bread knife and told staff he had enough before running out of the shop.

dangerous driving and driving while disqualified

A man took his family on a terrifying tour of Bristol and assaulted his wife in the back of his Jaguar in front of their four-year-old.
Monovar Hussain-Butt
Mr Gordon said it soon became obvious Hussain-Butt, of Cranmore Crescent in Southmead

Martyn Ford given a life sentence for the murder of his stepfather Ian Baker.
Jailed: Mandatory life sentence and will spend at least 20
years and four months behind bars before being considered for parole

A troublesome stepson battered his “kind and gentle” stepfather to death with a hammer just days after being released from prison. Martyn Ford launched a “vicious and fatal” attack on Ian Baker at his Hungerford Road home in Brislington before ensuring he was dead and then searching his home for cash. Bristol Crown Court heard how Ford stashed the murder weapon and his blood-stained clothes in a blue suitcase before throwing it into the Feeder Canal, never to be discovered.
then went out drinking with Mr Baker’s cash, visited KFC and even returned to the murder scene hours later to discover it cordoned off by police before heading back to a guest house.

jailed after abusing a 14-year-old girl and concocting a story with his girlfriend to cover up his crimes.
Jailed: Lukas Deacon seven-and-a-half years and Lisa Watson 18 months

A couple concocted a story to tell police and the courts in a bid to cover up his abuse of a 14-year-old girl. Lukas Deacon and his partner of 10 years were both jailed after DNA proved their story could not be true. Bristol Crown Court heard how Deacon, of Jacob Street in Old Market, had abused the girl while Watson
But after the girl broke down and told her mum what had happened, the pair came up with a lie to tell police, stating her claims could not be true as they were both awake and asleep at the same time that evening. The court was told how even when the teenager’s DNA was discovered on his boxer shorts, the couple continued in their story and denials of what had happened.
was jailed for seven-and-a-half years for the abuse, perverting the course of justice and breaking into a neighbour's flat to steal a bottle of whiskey.
Watson, of Ridgeway Court in Ridgeway Lane in Whitchurch
who was described by the judge as the most intellectual of the pair and the one to come up with the lie.

A conman who used 'sleight of hand' to steal jewellery and coins from a
shop in Bedminster has been locked up following similar cons across the country. Staff at East Street Jewellers in Bristol didn’t even know they had been the victim of the crime when a necklace and a bracelet valued at £750 were stolen. But it has since emerged that they were the same thieves hitting jewellers across the UK, taking almost £24,000 worth of goods from unsuspecting shop owners. One of the two tricksters, Romanian national Mircea Rostas, was subsequently identified from CCTV footage and a fingerprint and was jailed for a total of 26 months for his role in the thefts.
who had only been in the UK since October last year admitted stealing the jewellery from the East Street shop in Bristol city centre on February 17 this year when he appeared at Shrewsbury Crown Court.
Rostas also admitted thefts from seven other shops in Shrewsbury, Ely near Cambridge, and Oxford between January and April this year.

A burglar came up with an outlandish reason his blood was at a crime scene after he was arrested by police. Wayne Morris’ blood was discovered in a Clevedon home after he had smashed a window and made his way in. But when arrested, the 48-year-old came up with a story as to why his DNA was left at the crime scene. Bristol Crown Court heard he had tried to tell police he had cut his elbow after tripping while down the street, before using a sock to soak up his blood and then chucking it in the bin. He said that the real thief must have fished his blood-soaked sock out of the rubbish and put it on their hands and used them as gloves during the burglary on August 4.
must have realised his story wasn’t going to wash and pleaded guilty to the burglary at court.

Stephen Priddis has been jailed for six months for threatening lampposts and bollards with a knife.

A man has been jailed for carrying a knife after he was seen threatening lamp-posts and bollards in the centre of Bristol. Stephen Priddis was seen in Upper Maudlin Street pointing a blade at street furniture and making
slashing motions in the air.
where he helped himself to pasties from the Pumpkin Café and wandered off.
Police found a “very drunk” Priddis with a half-eaten pasty at St James Barton roundabout. Priddis, 38, of no fixed address, pleaded guilty to possessing a blade, theft and a public order offence at Bristol Crown Court. Judge William Hart jailed him for six months.