Current variant callers are not suitable for single-cell DNA sequencing, as they do not account for allelic dropout, false-positive errors and coverage nonuniformity. We developed Monovar (https://bitbucket.org/hamimzafar/monovar), a statistical method for detecting and genotyping single-nucleotide variants in single-cell data. Monovar exhibited superior performance over standard algorithms on benchmarks and in identifying driver mutations and delineating clonal substructure in three different human tumor data sets
and MD Anderson Knowledge Gap and Center for Genetics & Genomics
is a Damon Runyon-Rachleff Innovator (DRR-25-13)
is a Sabin Fellow and was supported by an NCI grant (RO1CA172652)
Chapman and Dell Foundations and NCI (CA016672)
Hamim Zafar and Yong Wang: These authors contributed equally to this work
Department of Bioinformatics and Computational Biology
analyzed the data and wrote the manuscript
The authors declare no competing financial interests
GATK HaplotypeCaller and Samtools were compared using single cell exome sequencing data generated from a normal isogenic fibroblast cell line in terms of SNV detection (a) Precision versus Detection Efficiency (Recall) and (b) SNV transition and transversion spectrum for FP errors
Monovar and GATK HaplotypeCaller were compared in terms of (a) Precision and (b) Detection Efficiency (Recall)
acquired via down-sampling the SKN2 SCS data
The SNV detection (a) Precision and (b) Detection Efficiency of Monovar were measured by comparing SNVs detected from a set of datasets
created by in silico intermixing of variable numbers of SKN2 and 12 TNBC cells
with SNVs detected from SKN2 bulk sequencing data
Supplementary Tables 1–6 and Supplementary Note 1 (PDF 2042 kb)
A newly developed computer program is so sophisticated, it can spot DNA mutations in a single cancer cell
That huge increase in accuracy could have a major impact on the way that doctors diagnose the disease, and brings us closer to treatments that forgo brute force methods like chemo and radiotherapy for a more personalised approach
Whereas existing 'next-generation sequencing' (NGS) techniques measure genomes derived from millions of cells
is capable of pinpointing important variations within tissue samples that would normally get lost in all the noise
Monovar is built on technology known as single cell sequencing (SCS), which is used not just in cancer research, but also in neurobiology, microbiology, and immunology. The newer SCS method pulls genome data from individual cells
and spot anomalies with a high degree of accuracy
Monovar is capable of spotting very slight DNA variations, known as single nucleotide variants (SNVs), which could help in the diagnosis of certain types of cancer
That's a lot of acronyms to keep on top of, but the upshot is that we should see improvements in cancer diagnosis and treatment, thanks to the more accurate detection of these SNVs
The Monovar algorithm essentially gives doctors more accurate data to work with, and helps them to spot subtle differences they might not otherwise be aware of
And it's all built on statistical analysis - the system is able to extract data from multiple single cells to discover SNVs and provide highly detailed genetic data on each, explains one of the team, Nicholas Navin, from the University of Texas MD Anderson Cancer Centre
The accurate detection of SNVs is critical for patient care, because they affect how an individual develops a disease and responds to various drugs and vaccines
These molecular variations are crucial in applying personalised medicines and treatments that are specifically tailored to the patient's body
"Monovar is capable of analysing large-scale datasets and handling different whole-genome protocols, therefore it is well-suited for many types of studies," says one of the team, Ken Chen
The Monovar program – which you can actually check out for yourself online – has been shown to be more accurate than standard algorithms at identifying mutations and variations, according to the benchmark tests run by the team
This isn't the first time that 'big data' and statistical analysis have been used in the fight against cancer. Labs across the world are collecting vast amounts of data on how cancers work and how they react to various treatments - data that can be used to refine our approach to them
Let's hope cancer patients can be given more personalised treatments in the future based on much smarter data analysis
The program has been described in Nature Methods
Single-cell omics technologies enable molecular characterization of diverse cell types and states, but how the resulting transcriptional and epigenetic profiles depend on the cell’s genetic background remains understudied
Monopogen is a computational tool to detect single-nucleotide variants (SNVs) from single-cell sequencing data
Monopogen leverages linkage disequilibrium from external reference panels to identify germline SNVs and detects putative somatic SNVs using allele cosegregating patterns at the cell population level
It can identify 100 K to 3 M germline SNVs achieving a genotyping accuracy of 95%, together with hundreds of putative somatic SNVs
Monopogen-derived genotypes enable global and local ancestry inference and identification of admixed samples
It identifies variants associated with cardiomyocyte metabolic levels and epigenomic programs
It also improves putative somatic SNV detection that enables clonal lineage tracing in primary human clonal hematopoiesis
Monopogen brings together population genetics, cell lineage tracing and single-cell omics to uncover genetic determinants of cellular processes
the genetic ancestry of the samples and its contribution to cellular molecular traits are largely unexplored
it is often necessary to resequence the study samples using bulk whole-genome sequencing (WGS)/whole-exome sequencing
which requires additional sequencing efforts and costs
Possible reasons for low variant detection are as follows: (1) the single-cell RNA sequencing (scRNA-seq) reads are usually enriched in specific genomic regions, such as 5′ or 3′ end of genes; (2) genes are usually expressed in cell-type/state-specific patterns and thus are highly variable across genome regions, leading to uneven sequencing depth distribution; (3) coverage is likely affected by allelic imbalance inherent in RNA profiles and (4) sequencing reads tend to have many errors due to technological infidelity
Monopogen includes germline and putative somatic SNV calling modules
Monopogen starts from individual bam files produced by single-cell sequencing technologies
Sequencing reads with multiple alignment mismatches (default four) are removed
Putative SNVs are identified sensitively from pooled pileup containing at least one nonreference read
For SNVs present in the external reference panel (such as 1KG3), genotype likelihoods are further refined based on LD in the reference panel
The loci showing persistent discordance are used to estimate a sequencing error model
we identify putative somatic SNVs by retaining only those with sufficient sequencing depth and alternative allele frequency (calibrated by a sequencing error model)
The SVM module is designed to remove low-quality SNVs
The variant calling metrics including the QS for calling
The germline SNVs are considered as the positive training sets, while the continuous de novo SNV chunks (>2 SNVs) that do not include any germline SNV are set as the negative sets
The remaining de novo SNVs are considered as the test set
The alleles observed at a de novo SNV site are statistically phased together with adjacent germline alleles to calculate an LD refinement score that estimates the percentage of cells in which the alleles do not cosegregate with neighboring germline alleles
De novo SNVs with high LD refinement scores are classified as the putative somatic SNVs, and their genotypes at the single cell/cluster level are inferred using Monovar
Projection of study samples onto the HGDP enables genetic ancestry inference
Genome-wide association study of cellular quantitative traits can be performed when there is sufficient sample size
Lineage tracing at single cell or clonal level
Monopogen is implemented in Python, automatically splitting the genome into small chunks (defined by the users), performing variant scan and LD refinement in massive parallelization for individual chunks and merging the results (Supplementary Note)
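The chunk-and-merge scheme described here can be sketched in a few lines. This is only an illustration under stated assumptions, not Monopogen's own code: the function names `split_into_chunks`, `process_chunk` and `run_chunks` are hypothetical, the per-chunk work is a placeholder, and a thread pool stands in for whatever parallel backend the tool actually uses.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(chrom_length, chunk_size):
    """Return half-open (start, end) intervals that tile [0, chrom_length)."""
    return [(start, min(start + chunk_size, chrom_length))
            for start in range(0, chrom_length, chunk_size)]


def process_chunk(interval):
    """Hypothetical placeholder for the per-chunk variant scan and LD refinement."""
    start, end = interval
    return {"start": start, "end": end, "n_bases": end - start}


def run_chunks(chrom_length, chunk_size, workers=4):
    """Process every chunk in parallel and return the results in genomic order."""
    chunks = split_into_chunks(chrom_length, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in the same order as the input chunks,
        # so the merge step is a simple concatenation.
        return list(pool.map(process_chunk, chunks))
```

Because the chunks are independent, the user-defined chunk size mainly trades scheduling overhead against per-chunk memory use.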
Overall accuracy and SNV detection sensitivity (recall) in representative snRNA-seq (n = 4), sci-ATAC-seq data (n = 2) and scDNA-seq data (n = 1) using matched WGS data as the gold standard
The x axis denotes the overall accuracy and y axis denotes the detection sensitivity (recall)
The closer a dot is to the top-right corner, the better the corresponding method has performed
only the SNVs present in the 1KG3 were considered
Median sequencing depth of SNVs found from snRNA-seq data (b) and sci-ATAC-seq data (c) over gene annotations
The pie charts show the percentage of SNVs in each category
Number of SNVs versus the number of cells in the retina data via downsampling
Pearson’s correlations were applied to calculate the R and the P values
Number of SNVs detected from seven single-cell sequencing datasets
The sequencing coverage was calculated as \(L\times n/(3.2\times {10}^{9})\), where \(L\) is the read length and \(n\) is the total number of reads in one sample
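As a quick worked example of this coverage formula, the short function below evaluates it directly. The function and constant names are illustrative only; the genome size of 3.2 × 10^9 bp is the one given in the formula.

```python
GENOME_SIZE_BP = 3.2e9  # genome size used in the coverage formula


def sequencing_coverage(read_length, n_reads, genome_size=GENOME_SIZE_BP):
    """Average coverage = read length x number of reads / genome size."""
    return read_length * n_reads / genome_size


# 320 million reads of 100 bp give roughly 10x average coverage
coverage = sequencing_coverage(100, 320_000_000)
```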
while each big dot is the mean value of a dataset
The top ellipse covers samples from scATAC-seq data and the bottom ellipse samples from scRNA-seq data
sequencing depth was higher in genes than in intergenic regions
Off-target reads appear sufficiently leveraged to derive accurate genotypes through LD-based refinement
In the colon sci-ATAC-seq data, Monopogen detected 752 K to 1.1 M germline SNVs, achieving a recall of 25%. In contrast, the recall for Samtools, GATK and FreeBayes was less than 12%. Strelka2 detected ~30% SNVs with an accuracy lower than 40%. Most (57.4%) of the SNVs from Monopogen were found in intergenic regions and 38.6% in gene regions (Fig. 2c)
We also included two SNV callers cellSNP and scAllele that were designed for single-cell sequencing data
cellSNP had the lowest SNV detection (<5%), and scAllele had the lowest accuracy (<10%) across three benchmarking datasets
demonstrating the efficiency of LD-based genotyping refinement on challenging scenarios
further demonstrating the robustness of Monopogen SNVs calling on various sequencing platforms
demonstrating the possibility of distinguishing individuals from the same ancestry
This indicates that the LD-based genotyping refinement from the commonly used 1KG3 panel did not over-correct genotypes on subpopulation or individual levels
To demonstrate the utility of Monopogen in establishing the link between genetic variants and cellular quantitative traits in a cell-type or cell-state-specific manner, we characterize the genetic contribution to metabolic processes (such as ATP production) and epigenetic programs in healthy cardiomyocytes
These relationships are usually obscured in bulk-based data analysis
Variant calls were further merged for samples of paired modalities
Ancestry admixture analysis using inferred genotypes shows that this cohort contains samples with diverse ancestry, which are as follows: European (71.1%), Asian (10.2%) and African (8.5%). Six samples appeared admixed (Supplementary Fig. 6a)
Manhattan plot showing the association of SNVs with the GATA4 motif-based transcription factor activity level in cardiomyocytes
The gray line denotes the P value threshold \({10}^{-6}\)
Boxplot shows the difference in GATA4 activity level across the three genotypes of rs17745507 (one of the leading variants in ADAM12)
the height of the box is given by the interquartile range (IQR) and the whiskers are given by 1.5× IQR
we were able to reveal potential genetic determinants of cardiac health via metabolic and epigenomic trait mapping of cardiomyocytes
Associations identified in this fashion may lead to a better understanding of the pathogenicity of noncoding variants in a cell-type-aware manner
a,b, LD refinement scores on germline SNVs from the TNBC single-cell DNA data. It is shown with two-locus model in a and three-locus model in b. c, Evaluation of de novo SNVs from Monopogen by comparison with categories defined in matched bulk DNA sample (Methods)
Distribution of LD refinement scores for de novo SNVs that are classified as germline and somatic SNVs from the bulk sample
Boxplot displaying the relationship between LD refinement score and BAF
the height of the box is given by the interquartile range (IQR), the whiskers are given by 1.5× IQR and outliers are given as points beyond the minimum or maximum whisker
LD refinement scores on germline SNVs from the bone-marrow sample measured in single-cell RNA data
It is shown with two-locus model in g and three-locus model in h
the length of haplotypes is grouped into 13 bins (Methods)
The y axis shows the mean value of LD refinement score within each bin together with the 95% confidence interval
The total number of haplotypes used for evaluation is labeled at the right-bottom of each panel
Number of SNVs detected in each step from Monopogen
Heatmap displaying the detected percentage of putative somatic SNVs in each mtDNA clone (the sum of each row is 1)
UMAPs displaying the cell types annotated in myeloid and erythroid lineages
UMAPs displaying the mutated cell distribution for mtDNA variant 2593G:A (l) and three selected putative somatic SNVs from scRNA-seq (m)
Heatmap displaying the detected percentage of putative somatic SNVs in each TRB clone
UMAPs displaying the cell types annotated in T/NK cell lineages (o)
the mutated cell distribution for TRB region CASAPNFGQELTYEQYF (p) and the putative somatic SNV chr20:2904623A:G (q)
the mutated cell distribution for TRB region CASSQAGAANTEAFF (r) and the somatic SNV chr1:91689518A:G (s)
there were 11 known oncogenes and 12 tumor suppressors
The unknown SNVs from Monopogen may contain low-abundance somatic SNVs that were missed by matched bulk sequencing
indicating these putative somatic SNVs may represent multiple T-cell clonotypes that have occurred from multipotent hematopoietic stem cells
global and local ancestry inference can be reliably performed in studies that have only single-cell sequencing but not bulk sequencing or array-based genotyping data
which greatly increases the chance of discovering genetic factors underlying diverse cellular quantitative traits and disease
leveraging the power of having phased haplotypes from germline SNVs
the LD refinement models applied at cell population level enabled us to substantially increase the accuracy of somatic SNV detection in sparse
Although Monopogen can potentially detect putative somatic SNVs
it is challenging to separate germline from truncal somatic SNVs whose BAFs are close to 0.5
those SNVs can be easily detected via bulk sequencing
In the human heart left ventricle analysis
we demonstrated the utilization of Monopogen-called genotypes to identify associations of ATP metabolism and GATA4 activity levels in one cell type
such analysis can be extended to other cell types and cellular quantitative traits of interest that could be objectively measured
such association analysis should be guided by strong prior knowledge to reduce the burden of multiple hypothesis testing
with the increasing generation of sparse single-cell sequencing data and expansion of data modalities
our work will become increasingly relevant for assessing the effects of genetic ancestry and discovering genetic mechanisms underlying complex traits in human populations and diseases
Monopogen starts from individual bam files of single-cell sequencing data
Reads with high alignment mismatches (default four mismatches) and lower mapping quality (default 20) are removed
We first scan for putative SNVs in a sensitive manner
A locus is recorded as a candidate wherever the pooled (across cells) read alignments from one sample contain an alternative allele in at least one read
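The read filter and the sensitive candidate scan can be illustrated with a toy pileup. This is a sketch under assumptions rather than the tool's implementation: the `Read` record and the `reference` dictionary are hypothetical stand-ins for BAM records and a reference genome, and the text does not say whether the mismatch default is inclusive, so the sketch assumes reads with four or more mismatches, or mapping quality below 20, are dropped.

```python
from dataclasses import dataclass


@dataclass
class Read:
    mismatches: int  # number of alignment mismatches
    mapq: int        # mapping quality
    bases: dict      # genomic position -> observed base


def keep_read(read, max_mismatches=4, min_mapq=20):
    # Assumption: drop reads with >= max_mismatches mismatches or MAPQ < min_mapq.
    return read.mismatches < max_mismatches and read.mapq >= min_mapq


def candidate_loci(reads, reference):
    """Positions where at least one retained read carries a non-reference base."""
    candidates = set()
    for read in filter(keep_read, reads):
        for pos, base in read.bases.items():
            if pos in reference and base != reference[pos]:
                candidates.add(pos)
    return sorted(candidates)
```

A single supporting read is enough to nominate a locus, which is why the later LD-based refinement and error-model filters are needed to control false positives.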
For each candidate SNV locus m with observed sequencing data information d
we record its genotype likelihoods (GL) that incorporate errors from base calling and alignment as
For each locus m, we calculate the observed genotype as the one with the highest posterior probability from Eqs. (1) and (2)
The final genotype of locus m is set as \({G}_{{m|H},d}\) if \({G}_{{m|H},d}={G}_{{m|d}}\)
The heterozygous loci that are imputed to homozygotes are considered as sequencing errors (that is, \({G}_{{m|H},d}=0\) and \({G}_{{m|d}}=\mathrm{1,2}\))
We classify this discordance into 12 categories:
The median BAF across all inconsistent loci in each category c is denoted as \({\mathrm{BAF}}_{c}\)
This is considered the threshold to separate sequencing errors from true heterozygotes
SNVs with \({G}_{{m|H},d}={G}_{{m|d}}\) are retained as the germline SNVs (that is
Others are only used to build the sequencing error model and are not included in the final genotyping call set
we implement the following two filters: (1) the total sequencing depth filtering (default 100); and (2) BAF less than the threshold from the above sequencing error model
one putative SNV genotyped as A/T with its BAF lower than \({{\max }}\,\{\mathrm{BA{F}_{AT\to AA},BA{F}_{AT\to TT}}\}\) is removed due to difficulties in separating true heterozygotes from sequencing errors
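The error model and the two filters can be sketched as follows. The category labels such as `"AT->AA"`, the function names, and the reading of the depth filter as a minimum depth of 100 are assumptions made for illustration; only the use of a per-category median BAF and the max-of-two-categories threshold come from the text.

```python
from statistics import median


def fit_error_model(discordant_loci):
    """discordant_loci: iterable of (category, baf). Returns median BAF per category."""
    by_category = {}
    for category, baf in discordant_loci:
        by_category.setdefault(category, []).append(baf)
    return {cat: median(values) for cat, values in by_category.items()}


def passes_filters(ref, alt, depth, baf, model, min_depth=100):
    """Keep a heterozygous call only if depth and BAF clear both filters."""
    het = ref + alt
    # Threshold is the larger of the two relevant category medians;
    # a category never observed is assumed to contribute 0.
    threshold = max(model.get(f"{het}->{ref}{ref}", 0.0),
                    model.get(f"{het}->{alt}{alt}", 0.0))
    return depth >= min_depth and baf >= threshold
```

For instance, with medians of 0.04 for `AT->AA` and 0.10 for `AT->TT`, an A/T call with BAF 0.05 would be discarded while one with BAF 0.30 would be kept.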
Somatic SNV calling includes the following two major modules: (1) removing low-quality SNVs using an SVM and (2) distinguishing somatic from germline SNVs using LD refinement models at the cell population level
all detected germline SNVs overlapped with 1KG3 are considered as the positive set
We define de novo SNVs found consecutively (default >2 SNVs) in genomic chunks that do not contain any germline SNV as the negative set
This is because the chance of only detecting multiple somatic SNVs in one region without any germline SNVs is typically low due to the low average somatic mutation rate in most datasets
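One way to read this labeling rule is as a run-length scan over position-sorted SNVs, sketched below. The interpretation of a "chunk" as an uninterrupted run of de novo SNVs between germline SNVs is an assumption, as are the function name and the input layout; the `is_germline` flag is taken to mean a germline SNV overlapping 1KG3.

```python
from itertools import groupby


def build_training_sets(snvs, min_run=3):
    """snvs: list of (position, is_germline), sorted by position.

    Returns (positive, negative, test) lists of positions.
    """
    positive, negative, test = [], [], []
    # groupby splits the sorted list into maximal runs sharing the same flag
    for is_germline, group in groupby(snvs, key=lambda snv: snv[1]):
        positions = [pos for pos, _ in group]
        if is_germline:
            positive.extend(positions)
        elif len(positions) >= min_run:  # more than 2 consecutive de novo SNVs
            negative.extend(positions)
        else:
            test.extend(positions)
    return positive, negative, test
```

Isolated de novo SNVs flanked by germline SNVs end up in the test set, which is where true somatic variants are expected to be found.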
SNV calling quality metrics, including quality score for calling, variant distance bias for filtering splice-site artifacts, Mann–Whitney U test of ratio of mapping quality and strand bias, segregation-based metric and BAF, are selected as features
The model is trained using the svm function implemented in R package e1071
The de novo SNVs with a predicted probability of positive labels less than 0.5 are set as sequencing errors and excluded from downstream analysis
The de novo SNVs passing the SVM filtering are further interrogated using the LD refinement models
The LD refinement models assume that only two alleles are present in the cell population
We first estimate the LD refinement scores on germline SNVs that quantify the degree of their LD
taking into consideration widespread sparseness and allelic dropout in single-cell sequencing data
We then implement germline LD patterns to statistically phase the observed alleles of de novo SNVs in the cell population
We assume that the germline SNV block includes \({n}_{m}\) SNVs with genotype vector being \(\left\{{G}_{1},{G}_{2},\cdots ,{G}_{{n}_{m}}\right\}\)
Denote \({G}_{i}={A}_{i}^{1}|{A}_{i}^{2}\)
The cell level genotype matrix G on these germline SNVs can be represented as
not all adjacent germline SNVs are informative for LD refinement
Here we first define a two-locus neighborhood index in cell j to identify informative germline SNV pairs as
Illustration of two-locus neighborhood index can be seen in Supplementary Fig. 1b
Denote \({{\mathscr{H}}}_{2}\) as the set including all two-locus neighborhoods
We next group elements in \({{\mathscr{H}}}_{2}\) based on the distance of SNVs as
The two-locus haplotype in \({{\mathcal{H}}}_{2}\) with allele cosegregated can be represented as
the two-locus LD refinement score with physical distance being d is calculated as
we first define the three-locus neighborhood index in cell j as
The three-locus neighborhood means that the upper and lower SNVs detect the same allele. Illustration of three-locus neighborhood index can be seen in Supplementary Fig. 1b
Denote \({{\mathscr{H}}}_{3}\) as the set including all three-locus neighborhoods
We next group \({{\mathscr{H}}}_{3}\) based on the length of haplotype as
The three-locus haplotype in \({{\mathscr{H}}}_{3}\) with allele cosegregated can be represented as
the three-locus LD refinement score with physical distance being d is defined as
The two-locus and three-locus LD refinement scores \(p({\mathscr{H}}_{2}^{d}),\,p({\mathscr{H}}_{3}^{d})\) can largely represent the colocalization for neighboring SNVs on a DNA haplotype or RNA transcript at the cell population level
the physical distance d is grouped into 13 bins with <100 bp
We next phase the de novo SNVs based on germline SNVs
Assume the genotype of de novo SNV s is \({A}_{s}^{1}/{A}_{s}^{2}\) and its adjacent germline SNV profile for cell j as follows:
where \({{\mathrm{Neighb}}}_{2}\left(k,s,j\right)=1\) and \({{\mathrm{Neighb}}}_{2}\left(s,l,j\right)=1\)
\({c}_{{sj}}^{1}\) and \({c}_{{sj}}^{2}\) are the number of reads supporting allele \({A}_{s}^{1}\) and \({A}_{s}^{2}\)
it is difficult to detect allele \({A}_{s}^{1}\) and \({A}_{s}^{2}\) simultaneously in each cell
we set \(\left|{d}_{k}-{d}_{s}\right| < \left|{d}_{s}-{d}_{l}\right|\). The probability of phased genotype \({A}_{s}^{1}|{A}_{s}^{2}\) under two-locus model is
To derive the probability of haplotype \({A}_{s}^{1}|{A}_{s}^{2}\) under three-locus model
we need to search germline SNV k and l satisfying \({{\mathrm{Neighb}}}_{3}\left(k,s,l,j\right)=1\)
The probability of phased genotype \({A}_{s}^{1}|{A}_{s}^{2}\) by combining two models is
the probability of phased genotype \({A}_{s}^{1}|{A}_{s}^{2}\) for de novo SNV s across the cell population is
the probability of phased genotype \({A}_{s}^{2}|{A}_{s}^{1}\) for de novo SNV s across the cell population is
we have \(p\left({A}_{s}^{1}|{A}_{s}^{2}\right)+p\left({A}_{s}^{2}|{A}_{s}^{1}\right)=1\)
The genotype of s is set to \({A}_{s}^{1}|{A}_{s}^{2}\) if \(p\left({A}_{s}^{1}|{A}_{s}^{2}\right) > p\left({A}_{s}^{2}|{A}_{s}^{1}\right)\) and to \({A}_{s}^{2}|{A}_{s}^{1}\) otherwise
The LD refinement score \({p}_{s}\) is defined as \({p}_{s}={{\min }}\left\{p\left({A}_{s}^{1}|{A}_{s}^{2}\right),p\left({A}_{s}^{2}|{A}_{s}^{1}\right)\right\}\)
The LD refinement score \({p}_{s}\) ranges from 0 to 0.5
It is closer to 0 for a germline SNV as it has strong LD with the adjacent germline SNVs
sharing the same two haplotypes in all the cells
The score is greater than 0 for a somatic SNV as the recently gained somatic allele cosegregates with germline alleles in only a subpopulation of cells
SNVs with a larger LD refinement score are classified as putative somatic SNVs (default value 0.25)
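Given the population-level phasing probability, the score and the final call reduce to a few lines. The sketch below assumes the probability of one phase is already computed and uses the fact that the two phase probabilities sum to 1; the function names, the output labels, and the choice of a strict "greater than" comparison against the 0.25 default are illustrative assumptions.

```python
def ld_refinement_score(p_phase_12):
    """p_phase_12 is p(A1|A2); the opposite phase has probability 1 - p_phase_12."""
    if not 0.0 <= p_phase_12 <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return min(p_phase_12, 1.0 - p_phase_12)


def classify_de_novo(p_phase_12, cutoff=0.25):
    # Assumption: scores strictly above the cutoff are called putative somatic.
    score = ld_refinement_score(p_phase_12)
    return "putative_somatic" if score > cutoff else "germline_like"
```

A de novo SNV phased with probability 0.98 scores 0.02 and looks germline, whereas one phased with probability 0.6 scores 0.4 and is flagged as putative somatic.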
only reads covering these candidate loci are extracted and then split into different bam files based on their cluster identities
Monovar can be run on these bam files (each is one cluster or cell type) with default parameter settings
Seven single-cell samples in our study have matched WGS data that were treated as the gold standard
only bi-allelic loci having at least one alternative allele (that is
genotype is 0/1 or 1/1) were extracted from the two call sets
denoting as N (Monopogen-called) and W (WGS-called)
The sensitivity (recall) was defined as \(|N\cap W|/|W|\) and specificity (precision) as \(|N\cap W|/|N|\)
The genotyping accuracy was defined as the fraction of identical genotypes in the \(\left|N\cap W\right|\) overlapping SNVs
The overall accuracy was defined as the specificity multiplied by the genotype accuracy
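These four benchmarking quantities can be computed from two call sets as sketched below. The dictionary layout (locus key mapped to a genotype string) and the function name are assumptions for illustration, and the sketch assumes both call sets and their overlap are non-empty.

```python
def benchmark(called, truth):
    """called, truth: dicts mapping locus -> genotype string such as '0/1'."""
    shared = called.keys() & truth.keys()          # N intersect W
    recall = len(shared) / len(truth)              # |N ∩ W| / |W|
    precision = len(shared) / len(called)          # |N ∩ W| / |N|
    genotype_accuracy = sum(called[k] == truth[k] for k in shared) / len(shared)
    return {
        "recall": recall,
        "precision": precision,
        "genotype_accuracy": genotype_accuracy,
        "overall_accuracy": precision * genotype_accuracy,
    }
```

Note that the overall accuracy penalizes both loci absent from the gold standard and shared loci with a wrong genotype.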
The genotype concordance of the Monopogen-called genotype data versus the AIDA Illumina GSAv3 genotype data was computed by first counting the number of matching alleles between the Monopogen and the Illumina GSAv3 results for loci found in both sets
The minimum possible concordance score per Monopogen calls (accounting for some match always being possible in the case of heterozygous genotypes) was subtracted
and the resulting scores were then normalized against the number of loci evaluated
two PCA coordinates were calculated as \({{\boldsymbol{Y}}}_{n\times K}\) and \(\left[\begin{array}{c}{\widetilde{{\boldsymbol{Y}}}}_{n\times {K}^{{\prime} }}\\ {\widetilde{{\boldsymbol{y}}}}_{1\times {K}^{{\prime} }}\end{array}\right]\,({K}^{{\prime}}\ge K)\) by applying eigenvalue decomposition on the genetic relationship matrix (GRM) \({\boldsymbol{R}}{{\boldsymbol{R}}}^{T}\) and \(\widetilde{{\boldsymbol{R}}}{\widetilde{{\boldsymbol{R}}}}^{T}\)
Projection Procrustes analysis was used to find an orthonormal projection matrix \({{\boldsymbol{A}}}_{{K}^{{\prime} }\times K}\) and an isotropic scaling factor ρ such that \({\Vert \rho \widetilde{{\boldsymbol{Y}}}{\boldsymbol{A}}-{\boldsymbol{Y}}\Vert }_{F}^{2}\) is minimized, where \({\Vert \cdot \Vert }_{F}^{2}\) represents the square of the Frobenius norm
Once \({{\boldsymbol{A}}}_{{K}^{{\prime} }\times K}\) and ρ were solved
the sample-specific PCA-projection coordinates on HGDP panel can be calculated as \({\boldsymbol{y}}=\rho \widetilde{{\boldsymbol{y}}}{\boldsymbol{A}}\)
The PC coordinates of \(\left[\begin{array}{c}{{\boldsymbol{Y}}}_{n\times K}\\ {{\boldsymbol{y}}}_{1\times K}\end{array}\right]\) were used for PCA-projection visualization
Monopogen-called genotypes were input to the PopPhased module with the following flags: -w 0.2
The RFMix output was collapsed into haploid bed files
and ‘UNK’ or unknown ancestry was assigned where the posterior probability of a given ancestry was <0.90
These collapsed haploid tracts were used for local ancestry component visualization (segment size was set as 1 cM)
The RFMix tool was also run on WGS genotypes from matched samples
the ancestry component percentage for each source population was recorded
The local ancestry consistency index was calculated as the correlation of the ancestry component vector between the two call sets
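The consistency index can be illustrated with a plain correlation between two ancestry component vectors. The text does not state which correlation was used, so Pearson's correlation is assumed here; the function name and the example proportions are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical ancestry component percentages from the two call sets
single_cell_components = [0.70, 0.20, 0.10]
wgs_components = [0.68, 0.22, 0.10]
consistency_index = pearson(single_cell_components, wgs_components)
```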
There are 54 donors sequenced with snRNA-seq and 65 with snATAC-seq, among which 54 are paired. For the downstream association study, SNV calling of the 54 snRNA-seq and 65 snATAC-seq samples was performed separately using Monopogen, followed by removal of variants with MAF < 10%. Variant calls were further merged for samples of paired modalities (Supplementary Table 4)
Cell type annotation was performed by uploading all the cells of each sample to the online Azimuth heart database in Seurat V4 (ref. 19)
Cells with predicted cell type probability scores lower than 0.9 were removed
Only cells annotated as cardiomyocytes were extracted for the downstream association study
The gene-level chromatin accessibility was derived using GeneActivity module by aggregating peaks in gene promoters plus upstream 2 kb
The cell type annotation was also performed using the online Azimuth heart database under the same quality control criteria as in the snRNA-seq analysis
GCTA22 was used to calculate a GRM among single-cell sequencing samples
The association studies on ATP metabolism level and GATA4 activity level were performed using its fastGWA-mlm option with the input of GRM and covariates as the top five ancestry PCs
Only variants with MAF > 10% were considered for association studies
The inflation factor of Quantile–Quantile plots was calculated using the R package qqman to examine whether there is population stratification in our genome-wide scan
A Manhattan plot was used to show the P values across the whole genome, with \(P={10}^{-5}\) as the threshold for potentially significant associations with cellular traits
The significant loci were further grouped into bins based on their closest genes
The nearest genes to significant loci were annotated
the mpileup option was used to transform base calling and alignment information into the GL
followed by variant calling using Bcftools
The GATK was run using the HaplotypeCaller mode with default settings
A P value lower than 0.01 was reported as enrichment in the specific mtDNA clone
The putative somatic SNVs were grouped based on whether they were enriched in the same mtDNA clone
We then calculated the cellular concordance of each mtDNA clone as the number of cells detected in both the mtDNA clone and its matched somatic SNV group, divided by the total number of cells in the mtDNA clone
The overall concordance was the mean across all the mtDNA clones
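The per-clone and overall concordance can be sketched with set operations on cell barcodes. The input layout (clone identifier mapped to a set of barcodes) and the function name are assumptions for illustration; a clone with no matched SNV group is treated as having zero concordance.

```python
def cellular_concordance(clone_cells, snv_group_cells):
    """Both arguments map a clone id to the set of cell barcodes assigned to it.

    Returns (per-clone concordance, mean concordance across clones).
    """
    per_clone = {}
    for clone, cells in clone_cells.items():
        matched = snv_group_cells.get(clone, set())
        # cells seen in both the mtDNA clone and its matched SNV group,
        # divided by all cells in the mtDNA clone
        per_clone[clone] = len(cells & matched) / len(cells)
    overall = sum(per_clone.values()) / len(per_clone)
    return per_clone, overall
```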
The same scheme was used to compare somatic SNVs against TRB/A regions
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article
The scDNA-seq data from the TNBC sample were downloaded from a breast cancer study33
The single-cell profiles of 20 HBCA samples, 20 AIDA samples, and four retina samples were generated as part of the cell atlas and genetic ancestry networks organized by the Chan Zuckerberg Initiative. The 20 AIDA single-cell samples could be downloaded from https://data.humancellatlas.org/explore/projects/f0f89c14-7460-4bab-9d42-22228a91f185
The four retina single-cell samples could be downloaded from https://data.humancellatlas.org/explore/projects/f0f89c14-7460-4bab-9d42-22228a91f185
The 20 HBCA single-cell samples could be accessed through GSE195665 (https://navinlabcode.github.io/HumanBreastCellAtlas.github.io/dataAccess.html)
Monopogen is available in open source at https://github.com/KChen-lab/Monopogen. Scripts for reproducing key analysis results are also available at https://github.com/KChen-lab/Monopogen
GTEx Consortium The GTEx Consortium atlas of genetic regulatory effects across human tissues
This project has been made possible in part by Human Cell Atlas Seed Network (grants CZF2019-02425 and CZF2019-002432)
Genetic Ancestry Network (grant CZF2021-239847) from the Chan Zuckerberg Initiative DAF
an advised fund of Silicon Valley Community Foundation (grant U01CA247760 to K.Chen)
the Cancer Center Support (grant P30 CA016672 to P
the Chan Zuckerberg Foundation (grant CZF2019-002446 to S
Technology and Research (A*STAR) in Singapore (grant IAF-PP-H18/01/a0/020 to S
This work is also supported by the CPRIT Single-Cell Genomics Center (grant RP180684 to N
CPRIT Training Program (grant RP210028 to H.Jin) and National Cancer Institute (grant U24CA264010 to L
Xu from Baylor College of Medicine for suggestions on left ventricle single-cell studies
Nakhleh for Monovar implementation/maintenance
The University of Texas MD Anderson Cancer Center
Department of Molecular and Human Genetics
Laboratory for Genome Information Analysis
RIKEN Center for Integrative Medical Sciences
Graduate School of Integrated Sciences for Life
RIKEN Center for Integrative Medical Sciences
The University of Texas Health Science Center
McWilliams School of Biomedical Informatics
conceived the project and designed the experiments
participated in the discussion of manuscript writing
All authors read and approved the final version of the manuscript
The authors declare no competing interests
Nature Biotechnology thanks Alejo Fraticelli and the other
reviewer(s) for their contribution to the peer review of this work
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations
Supplementary Tables 1–8 and Supplementary Note
DOI: https://doi.org/10.1038/s41587-023-01873-x
All images by Miguel A. S. Used with permission. Please follow Miguel’s Flickr and Instagram @redcosmonautgirl. Some translation corrections were implemented
Miguel started photography around the summer of 2014 and originally took inspiration from Alberto Verdú
“He is an amazing wedding and street photographer in Monovar (Alicante),” Miguel tells us
Upon hearing that Alberto was putting on a local workshop
That’s how he started to see people and his environment in a brand new way
Part of Miguel’s creative vision is both color and composition. He genuinely believes that this is how you make street photography that gets people to keep looking at your work
he uses a Fujifilm X Pro 2 along with the 23mm and 18mm lenses
“…I really love to use the flash
during the day or night,” he tells us
“Fujifilm cameras are pretty small and easy to manage
Their portability makes it easy for me to get lost in the moment while taking pictures.”
Miguel heads out to shoot whenever he can and doesn’t like AI imagery
He’s been fooled several times via social media
The Phoblographer works with human photographers to verify that they’ve actually created their work through shoots. These are done by providing us assets such as BTS captures, screenshots of post-production
We do this to help our readers realize that this is authentically human work
Here’s what this photographer provided for us
Of all the mentors Begonya Klumb has had in her career
one stands out as being particularly influential: her grandmother
Josefa Bonastre founded a shoe manufacturing and export company in the 1940s and by the 1970s it was one of the biggest employers in Monóvar
It was Bonastre — “la Jefa” to the people of Monóvar — who encouraged Klumb to study business and economics and who taught her that treating employees with respect is the key to building a successful organization
Now the head of health care services at UMB Financial in Kansas City
established UMB as one of the nation’s foremost administrators of health savings accounts and related products
When Klumb took over as CEO of the division in 2015
UMB was the nation’s eighth-largest HSA administrator
with more than 1 million accounts and $2 billion of assets and deposits under management
The HSA division is the fastest-growing business unit within the $20.1 billion-asset UMB and Klumb aims to maintain that momentum by pursuing strategic acquisitions — it recently bought the health savings portfolio of a bank that has exited the business — and investing in technology that will help customers manage their accounts
Before taking the helm of the HSA division
Klumb launched and built the first mergers-and-acquisitions unit at UMB and
she remains frustrated that female executives still endure what she refers to as “a prevalence of latent biases.” These include off-handed comments such as one she heard recently from a man praising a female business leader’s accomplishments while raising three kids
but in my mind he was contributing to a bias that we expect women to raise children when we should expect parents to raise children,” she said
Condolences for the death of José Luis Monreal
Málaga CF deeply regrets the death of former player and coach of CD Málaga, José Luis Monreal, and sends deepest sympathies to his family and friends. Rest in peace.
José Luis Monreal Sotillo (Madrid
remembers Monreal: “He was my teammate for 8 or 9 years at Málaga. He came from Madrid to play with Malagueño as a midfielder
he was moved on to the left. He was very competitive
a winner who never wanted to leave the pitch (…) He got married in Málaga and lived well here. He was a true sportsman and his death came as quite a shock. Everything said about him is good
we’re deeply saddened by his loss.”
president of the Association of Ex Málaga Football Players: “José Luis was a regular at the monthly meals with the Veterans. During the last one in January
we talked a lot about football. You learnt from him as a coach
and as a person he was a good friend and adviser. His death came as such a shock and it still hasn’t sunk in. He was fine
he was an athlete. I was a pupil of his
and I knew him as a coach and personally for many years.”
Monovar Hussain-Butt was sentenced to thirteen months for battery, dangerous driving and driving while disqualified
A man has been jailed for taking his family on a terrifying tour of Bristol and assaulting his wife in the back of his Jaguar in front of their four-year-old child before trying to run from police
tried to escape from police in his Jaguar XF after a member of the public called 999 and reported seeing his wife sprawled on the back seat
said she had been surprised to see him as he was already disqualified from driving following a previous offence on April 2
Mr Gordon said it soon became obvious Hussain-Butt, of Cranmore Crescent in Southmead
The court heard the defendant was furious because his wife was applying to be a taxi driver to supplement the family income – the same job he had done before he was disqualified
He shouted at her: “You’re not f***ing doing it.”
They pulled up outside Luckwell Primary School where Mr Gordon said the defendant leaned round into the back seat while his wife was screaming to be let out
A caretaker at the school reported hearing a “blood-curdling scream” and a man running an after-school club said he saw the defendant’s elbow “pumping up and down” inside the car with “significant force”
Meanwhile the prosecution described how Hussain-Butt’s four-year-old daughter put up her hands as if to try and stop her father
The woman tried to get out of the car but the defendant drove off
The next time they were seen was on Clifton Road by a member of the public who saw the woman sprawled in the back seat of the Jaguar
Hussain-Butt was sentenced at Bristol Crown Court
The police caught up with the car on St Luke’s Road but the court heard the defendant refused to stop.
A police car with lights and its siren on blocked his path and a policeman grabbed the rear door handle to try and rescue the wife but Hussain-Butt reversed away from the police car, dragging the officer with him and crashing into a post box before driving away.
The ordeal came to an end when the defendant finally let his wife and child out of the car at Temple Meads railway station, where she called the police and was taken to hospital with bruising and abrasion to her face.
After he was arrested and interviewed Hussain-Butt denied hitting his wife and stopping her leaving the car – saying it was only a verbal argument.
Shortly after that his wife retracted her statement saying she was not trapped in the car or punched.
“It was clear she didn’t wish to pursue the allegation,” said Mr Gordon.
The prosecution said they were able to make a case on the basis of what she had already told police and what was seen by independent witnesses.
In passing sentence Judge Patrick called it “an ugly incident.”
He said: “No matter what was going on in your life with your wife, you had no business to deal with her in the way that you did.
“The fact that it was distressing to a child makes this all the more serious.”
He also said the defendant had tried to make good his escape, and could have injured pedestrians while trying to get away from police.
Hussain-Butt was found guilty of battery, driving while disqualified and dangerous driving and given 13 months in prison, half to be served on licence.
He was also disqualified from driving for two and a half years and made to pay a surcharge.
methods that account for the dynamics of mutational signatures in cellular evolution will improve the diagnosis
and prognosis of diseases for which somatic alterations are a key factor
obtaining accurate profiles of the genetic variation affecting single cells is essential
there is no statistical model that allows for both local variation of bias and errors due to amplification
and for statistically sound false discovery rate control when calling and genotyping SNVs in single cells
ProSolo’s statistical rigor allows for accurate control of the false discovery rate when calling alternative alleles or identifying other relevant effects
It achieves a higher variant calling accuracy compared to state-of-the-art tools
we name the central innovations of our model and demonstrate its advantages in comparison to existing approaches
A more detailed description of the innovations is available in the Methods section
our model addresses the two major issues of MDA: (i) the differential amplification of the two alleles present in a diploid cell (“amplification bias” in the following); (ii) MDA-induced errors (“amplification errors” in the following), which are copy errors introduced by the Φ29 polymerase used in MDA
empirically derived model of differential amplification of alleles
we evaluate single-cell samples together with a bulk sample from which the single cell is supposed to stem
we argue that a bulk sample should be added to single-cell sequencing experiments wherever possible: it samples from the same cell population without requiring amplification
and is therefore unaffected by amplification bias and errors and thus makes a particularly useful background sample to address the statistical uncertainties and biases induced by MDA
one of the major features of the core model and its implementation is that it can easily be adapted to flexibly deal with other sampling setups
so it could be extended to further scenarios
The most precise single-cell variant callers to date
the absence of an alternative allele (i.e.
the heterozygous and the homozygous alternative genotypes called jointly)
We thus focused on this for the main benchmarking
All panels are strong zoom-ins, focusing on (different) areas of interest. Global views of these panels are provided in Supplementary Fig. 7
a Precision and recall of an average of two whole genome sequenced single cells IL-11 and IL-12 against their kindred clone IL-1C as ground truth genotypes
b Precision and recall average of the five whole-exome sequenced single granulocytes against their pedigree-based germline genotype ground truth
c Precision and recall average of 16 tumor and 16 normal single cells sequenced at the whole exome level
−b The germline ground truth induces an artificial increase of recall for SCIPhI’s sensitive and ProSolo’s imputation mode; these modes should thus be disregarded for a fair comparison on the granulocyte dataset in panel b
Threshold parameters (not comparable across tools): MonoVar --t; ProSolo --fdr; SCAN-SNV --fdr; SCcaller -a cutoff; SCIPhI prosolo --fdr
Software modes: MonoVar with consensus filtering (default) or without (no consensus); ProSolo with minimum coverage 1 in single-cell (default)
or imputing zero coverage sites based on bulk sample (imputation); SCcaller with recommended settings (default) or with a more sensitive calling; SCIPhI with default parameters (default) or all heuristics off (sensitive)
MonoVar achieved a maximum precision of only 0.962
this was at a much higher recall (0.141) than for example SCcaller (0.095 at a precision of 0.972)
SCcaller’s decreased recall on this dataset might be due to its estimation of local allelic bias by also taking biases at neighboring sites into account—in whole-exome data the number of neighboring sites available for this estimation will be limited and might lead to less reliable estimates
SCAN-SNV’s recall increased to 0.0016 at a decreased maximum precision of 0.897
this decreased precision is an artifact of using the germline genotype as ground truth
At the sites with somatic mutations in single cells
this ground truth will instead contain the homozygous reference germline genotype and will incorrectly classify (existing) alternative alleles as false positives
we also expect the calculated precision of all the other tools to be underestimated
as the other tools also provide alternative allele calls for all sites where the single cells retained this germline genotype
the relative effect on their precision will be smaller
But this still leaves ProSolo as the only tool that provides the user with the choice of either aiming for more discoveries at the cost of a higher rate of false discoveries
or aiming for a more limited number of discoveries with higher confidence in each of them
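This trade-off between more discoveries and higher confidence can be illustrated with a small sketch of false discovery rate control on posterior probabilities. It assumes that every candidate site comes with a posterior probability that an alternative allele is present. The function name `select_calls_at_fdr` and the example values are hypothetical, and the procedure shown (accept the largest set of calls whose mean posterior error probability stays at or below the threshold) is one common Bayesian approach, not necessarily ProSolo's exact implementation.

```python
def select_calls_at_fdr(posteriors, fdr):
    """Return indices of accepted calls with expected FDR <= fdr.

    posteriors: per-site posterior probabilities that the call is correct
                (hypothetical input; any calibrated probabilities work).
    fdr:        maximum tolerated expected false discovery rate.
    """
    # Most confident calls first, so the running mean error never decreases.
    order = sorted(range(len(posteriors)),
                   key=lambda i: posteriors[i], reverse=True)
    accepted = []
    error_sum = 0.0
    for i in order:
        error = 1.0 - posteriors[i]  # posterior error probability
        # Expected FDR of the enlarged set is its mean error probability.
        if (error_sum + error) / (len(accepted) + 1) > fdr:
            break
        error_sum += error
        accepted.append(i)
    return accepted


if __name__ == "__main__":
    example = [0.99, 0.60, 0.95, 0.20]
    print(select_calls_at_fdr(example, 0.05))  # strict threshold
    print(select_calls_at_fdr(example, 0.20))  # lenient threshold
```

A stricter threshold returns fewer sites with higher confidence in each, while a more lenient one returns more sites at the cost of more expected false discoveries.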
we suggest not imputing zero-coverage single-cell sites whenever possible and instead recommend using and developing downstream software that can deal with both these missing values and the event probabilities that ProSolo provides
uncertainties in the probabilities and information about missing data are passed on and will allow for more accurate statistical modeling in those downstream analyses
which might explain the higher overall coverage and points to a possible source for the discrepancy between the naively calculated and the estimated allele dropout rates
If samples PAG1 and PAG10 were doublets as well
this would indicate that our use of empirical distributions in ProSolo provides for more robust event probabilities in the presence of doublets
while heuristic thresholding (as in our naive allele dropout estimate) is very sensitive to such perturbations
the parameters of the mechanistically motivated combination of beta-binomial distributions for modeling heterozygous genotypes could be learned per single cell sample at germline heterozygous sites—similar to the approaches of SCcaller and SCAN-SNV
but globally per cell with their local variation modeled by their dependence on a site’s coverage
When bulk coverage is low and an alternative allele is not sampled by any read
the prior will result in nonzero alternative allele frequencies in the bulk being assigned nonzero likelihoods
which is the desired behavior for this constellation
the increased amount of data will progressively overrule the prior
Studying and implementing the corresponding changes in the future has the potential to further improve the accurate site-specific event probabilities that ProSolo already provides through the joint modeling with a (sufficiently deep) bulk background sample
ProSolo provides an accurate and easy-to-use variant caller for single-cell MDA sequencing data
which will easily scale to calling variation on more cells and broader genomic coverage
It will thus empower more research using single-cell DNA sequencing data
More details and a detailed derivation of all model elements can be found in the Supplement
To account for MDA amplification bias up to the complete dropout of individual alleles
we distinguish between two alternative allele frequencies:
The true (but usually unknown) underlying allele frequency at a site in a single cell: θs
This can be assigned one of three possible values
where 0 and 1 represent the homozygous reference and alternative genotype and 0.5 a heterozygous genotype
the ratio of reads harboring the different alleles from a single cell sequencing experiment does not reflect the true allele frequency
because of the biases induced by amplification
the allele frequency after its distortion through amplification bias
of which k reads bear the alternative allele
the formal definition of this measurable frequency is \({\rho }_{s}=\frac{k}{l}\)
The goal is to estimate the likelihood density across the three possible underlying allele frequencies in the single-cell (we denote \({\tilde{\theta }}_{s}\) as the density estimate across all θs ∈ {0.0
To accurately quantify the uncertainty introduced by amplification bias
the probability distributions that reflect the shift from the true allele frequency θs to the distorted allele frequency ρs, as induced by MDA (Fig. 3c)
We thus formally describe the statistics of read counts skewed by MDA at all sites
encompassing sites that are homozygous for the reference allele
to calculate likelihoods for each of these possible true allele frequencies
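The likelihood calculation for the three possible true allele frequencies can be sketched as follows. The sketch assumes that the number k of alternative-allele reads among l reads follows a beta-binomial distribution whose shape depends on the true genotype. The shape parameters in `SHAPES` are purely illustrative placeholders, not the empirically derived values used by ProSolo, and the function names are hypothetical.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_pmf(k, l, a, b):
    """P(k alternative reads out of l) under a beta-binomial(a, b)."""
    log_choose = lgamma(l + 1) - lgamma(k + 1) - lgamma(l - k + 1)
    return exp(log_choose + log_beta(k + a, l - k + b) - log_beta(a, b))


# Hypothetical shape parameters per true allele frequency theta_s:
# mass near 0 for homozygous reference, a U-shaped spread for
# heterozygous sites (allowing strong bias up to dropout), mass near 1
# for homozygous alternative.
SHAPES = {0.0: (0.1, 10.0), 0.5: (0.5, 0.5), 1.0: (10.0, 0.1)}


def genotype_likelihoods(k, l):
    """Likelihood of observing k of l alternative reads for each theta_s."""
    return {theta: beta_binomial_pmf(k, l, a, b)
            for theta, (a, b) in SHAPES.items()}


if __name__ == "__main__":
    for k in (0, 5, 10):
        print(k, genotype_likelihoods(k, 10))
```

With these placeholder shapes, a balanced read count favors the heterozygous genotype, while a heterozygous site still retains non-negligible likelihood when all reads show only one allele.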
the absence of a single cell candidate mutation in a bulk sample (with sufficient sequencing depth) increases the probability of an amplification error
bulk samples can be employed to improve both the sensitivity and specificity of variant calls in the single cell
where the increasing depth of coverage of the bulk sample increases the accuracy of the calls
we derive likelihood density estimates for all possible alternative allele frequencies in the background bulk sample
Given a set of n reads from the bulk (b) read data \(\boldsymbol{Z}^{b}=\{\boldsymbol{Z}_{1}^{b},\boldsymbol{Z}_{2}^{b},\ldots ,\boldsymbol{Z}_{n}^{b}\}\)
and discrete possible allele frequencies \(\frac{m}{n}\) (m ∈ 0
we compute the probability of the data given a particular allele frequency as the product of the probabilities of all the reads:
when referring to the likelihood density estimates across all possible allele frequencies (as opposed to a particular allele frequency)
we denote this with \({\tilde{\theta }}_{b}\)
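The product over reads described above can be sketched in a few lines. The per-read probability used here is an assumption for illustration only: each read is reduced to whether it shows the alternative allele and a base error probability, and the read is treated as drawn from an alternative-allele molecule with probability m/n. The names `read_probability` and `bulk_likelihoods` are hypothetical, and the model's actual per-read terms are derived in the Supplement.

```python
def read_probability(is_alt, error, freq):
    """P(read | allele frequency freq) under a simple error model.

    is_alt: True if the read shows the alternative allele.
    error:  probability that the observed base is wrong (assumed input).
    freq:   candidate alternative allele frequency in the bulk.
    """
    p_if_alt_molecule = (1.0 - error) if is_alt else error
    p_if_ref_molecule = error if is_alt else (1.0 - error)
    return freq * p_if_alt_molecule + (1.0 - freq) * p_if_ref_molecule


def bulk_likelihoods(reads):
    """Likelihood of the bulk reads for each frequency m/n, m = 0..n."""
    n = len(reads)
    if n == 0:
        raise ValueError("at least one bulk read is required")
    likelihoods = []
    for m in range(n + 1):
        freq = m / n
        prob = 1.0
        for is_alt, error in reads:
            # Reads are treated as independent given the frequency.
            prob *= read_probability(is_alt, error, freq)
        likelihoods.append(prob)
    return likelihoods


if __name__ == "__main__":
    reads = [(True, 0.01)] * 3 + [(False, 0.01)] * 7
    values = bulk_likelihoods(reads)
    print(max(range(len(values)), key=values.__getitem__))
```

For deep bulk samples the plain product underflows, so a practical implementation would sum log probabilities instead.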
our model fully defines the two-dimensional space of possible underlying alternative allele frequencies in the two samples as:
please note that we here resolve a contradiction between \({\tilde{\theta }}_{s}=0\)
which indicates a homozygous reference single cell
which indicates that the bulk does not contain any homozygous reference cells
We decide to trust the bulk sample over the single-cell sample
and with our above assumption that the bulk is a mixture of a maximum of two subpopulations
the bulk can only contain heterozygous and homozygous alternative cells
This renders a homozygous reference single-cell impossible
and we classify this event as evidence for an allele dropout of the alternative allele
We thus obtain a set of mutually exclusive single-cell events (Fig. 3d):
Accounting for the sample likelihoods based on Supplementary Equation S 23 (assuming ρb = θb for the bulk that has no amplification step, Supplementary Equation S 3)
and evaluating only point estimates of these likelihoods at possible alternative allele frequencies
whose posterior probability we can obtain from this sum:
to calculate the posterior probability of an allele dropout at a particular site
we sum up the posterior probabilities of the two ADO events
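Because the events are mutually exclusive, the posterior of a composite event is the sum of the posteriors of its members. The sketch below assumes that an unnormalized joint score (prior times likelihood) is already available for each event. The event labels and numbers are hypothetical and only illustrate the summation.

```python
def event_posteriors(joint_scores):
    """Normalize unnormalized joint scores into posterior probabilities."""
    total = sum(joint_scores.values())
    if total <= 0.0:
        raise ValueError("joint scores must have a positive sum")
    return {event: score / total for event, score in joint_scores.items()}


def combined_posterior(posteriors, events):
    """Posterior of a composite event made of mutually exclusive events."""
    return sum(posteriors[event] for event in events)


if __name__ == "__main__":
    # Hypothetical scores for one site; labels are illustrative only.
    scores = {
        "hom_ref": 2.0,
        "het": 5.0,
        "hom_alt": 1.0,
        "ado_to_ref": 1.5,
        "ado_to_alt": 0.5,
    }
    posteriors = event_posteriors(scores)
    print(combined_posterior(posteriors, ["ado_to_ref", "ado_to_alt"]))
```

The same helper gives the posterior for the presence of an alternative allele by summing over every event that implies one.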
a large enough sampling of the bulk cell population that the single cell comes from should contain the single cell’s genotype at a particular site
unless this cell is genuinely the first cell to harbor a mutation at that site
This bulk background sample can thereby lend credibility to single-cell variants with low coverage
while at the same time eliminating amplification errors in the single-cell sample
as these will not exist in the bulk sample
the bulk sample also provides a mechanism of biologically meaningful imputation at sites that have no read coverage in the single cell
If imputation is desired for sites with no read coverage in the single cell
we set \(\mathbf{P}(\boldsymbol{Z}^{s}\mid \theta_{s}=0)=\mathbf{P}(\boldsymbol{Z}^{s}\mid \theta_{s}=\frac{1}{2})=\mathbf{P}(\boldsymbol{Z}^{s}\mid \theta_{s}=1)=1\)
rendering all (unknown underlying) single-cell genotypes equally likely
the posterior probabilities of events at sites with no read coverage become solely dependent on the read data from the bulk sample
providing the most common genotype in the bulk
while this is a biologically meaningful way of imputation at the vast majority of genomic sites
it should be noted that this imputation will favor germline genotypes over any existing (lower frequency) somatic genotypes at a site
unless such a somatic genotype is present in a majority of cells
such an imputation carries the potential to introduce erroneous calls (especially when looking at subclonal somatic mutations)
and we recommend instead using downstream tools that can accommodate missing data and data uncertainty wherever they are available
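The effect of this imputation can be seen in a deliberately simplified sketch. It assumes that the bulk evidence has already been summarized into one non-negative weight per single-cell genotype. That summary, the names used, and the example numbers are hypothetical simplifications of the full two-sample model.

```python
GENOTYPES = (0.0, 0.5, 1.0)  # possible true allele frequencies theta_s


def single_cell_posteriors(sc_likelihoods, bulk_weights):
    """Combine single-cell likelihoods with bulk-derived weights."""
    joint = {g: sc_likelihoods[g] * bulk_weights[g] for g in GENOTYPES}
    total = sum(joint.values())
    if total <= 0.0:
        raise ValueError("no genotype has positive support")
    return {g: value / total for g, value in joint.items()}


def impute_without_coverage(bulk_weights):
    """Zero single-cell coverage: every likelihood is set to 1."""
    flat = {g: 1.0 for g in GENOTYPES}
    return single_cell_posteriors(flat, bulk_weights)


if __name__ == "__main__":
    # Hypothetical bulk support: mostly homozygous reference cells,
    # with a small subclone carrying a heterozygous somatic variant.
    bulk_weights = {0.0: 0.90, 0.5: 0.08, 1.0: 0.02}
    print(impute_without_coverage(bulk_weights))
```

With flat single-cell likelihoods the result simply mirrors the bulk support, which is why a low-frequency somatic genotype loses out to the majority genotype.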
We benchmarked ProSolo on three experimental datasets (Fig. 4)
each with a different kind of ground truth:
a single cell was selected as the founder for the secondary IL expansion into 20–30 cells
two cells were extracted (IL-11 and IL-12) and sequenced following MDA
The remaining kindred cells from that clone were used as a bulk sequencing sample without amplification (IL-1C)
as these cells are only very few cell divisions away from IL-11 and IL-12
and thus have almost no difference in the somatic mutations acquired
The ground truth genotype was generated using GATK HaplotypeCaller to call variant sites and bcftools mpileup to identify homozygous reference sites (with read coverage above 25 but no alternative allele present)
IL-1C was only used as ground truth and not provided as input to any of the software compared here
generated from cells after the first mini-expansion
were merged into a further bulk sample for SCcaller and ProSolo (see Software and Parameters below)
Unlike other callers (all of which finished in less than 5 days)
SCIPhI took 5 weeks to finish on this dataset in sensitive mode and 7.5 weeks in default mode
we selected granulocytes where at least 15 of these loci were properly amplified
we also extracted bulk DNA and submitted it to whole-exome capture and paired-end Illumina sequencing without MDA
to generate a bulk background sample for ProSolo and SCcaller
We analyzed the tumor and normal cells separately
ensuring that normal cells were called with the normal bulk background sample and tumor cells with the tumor bulk background sample
we then used the normal bulk sample augmented with the clonal tumor mutations confirmed by targeted duplex sequencing
removing the confirmed clonal and subclonal tumor mutations
This experimental setup aims for fairness across all competitors
Variant calling for the ground truths was performed as for cell line data above
For the allele dropout rate, we will focus on the set of sites where the respective ground truth call is heterozygous, as these are the sites where the dropout of one of the alleles can be identified unambiguously. More details for the three ways in which we calculate allele dropout rates are given in the Supplement (Supplementary Section 2.7)
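A naive count-based estimate on heterozygous ground truth sites could look like the sketch below. It is an illustration under stated assumptions, not a reproduction of the three calculations described in the Supplement. Genotypes are encoded with the hypothetical labels "hom_ref", "het" and "hom_alt", and sites without a single-cell call are skipped.

```python
def naive_ado_rate(truth, calls):
    """Fraction of heterozygous truth sites called homozygous in a cell.

    truth: dict mapping site -> ground truth genotype label.
    calls: dict mapping site -> single-cell genotype label (called sites only).
    """
    het_sites = [site for site, genotype in truth.items()
                 if genotype == "het" and site in calls]
    if not het_sites:
        raise ValueError("no heterozygous ground truth site has a call")
    dropped = sum(1 for site in het_sites
                  if calls[site] in ("hom_ref", "hom_alt"))
    return dropped / len(het_sites)


if __name__ == "__main__":
    truth = {1: "het", 2: "het", 3: "hom_ref", 4: "het", 5: "het"}
    calls = {1: "het", 2: "hom_ref", 3: "hom_ref", 4: "hom_alt"}
    print(naive_ado_rate(truth, calls))
```

Such hard counting depends on thresholded genotype calls, so it is sensitive to perturbations that a probabilistic estimate can absorb.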
Further information on research design is available in the Nature Research Reporting Summary linked to this article
This work has been supported by the Helmholtz Association
in particular through a Helmholtz Incubator grant (Sparse2Big ZT-I-0007)
by the compute cluster at the Helmholtz Institute for Infection Research
Alexander Schönhuth was supported by the Netherlands Organisation for Scientific Research (NWO: Vidi grant 639.072.309)
Arndt Borkhardt and Ute Fischer were further supported by the German Federal Office for Radiation Protection (BfS) grant nos
Open Access funding enabled and organized by Projekt DEAL
These authors jointly supervised this work: Alice C
Department for Computational Biology of Infection Research
Braunschweig Integrated Centre of Systems Biology (BRICS)
Faculty of Mathematics and Natural Sciences
Algorithms for Reproducible Bioinformatics
conceived the original project to study single immune cells
formulated the statistical model and wrote the manuscript
Peer review information Nature Communications thanks Yong Wang
reviewer for their contribution to the peer review of this work
DOI: https://doi.org/10.1038/s41467-021-26938-w
Reconstructing the evolution of tumors is a key step towards identifying appropriate cancer therapies
The task is challenging because tumors evolve as heterogeneous cell populations
Single-cell sequencing holds the promise of resolving the heterogeneity of tumors; however, it has its own challenges, including elevated error rates
we develop a new approach to mutation detection in individual tumor cells by leveraging the evolutionary relationship among cells
jointly calls mutations in individual cells and estimates the tumor phylogeny among these cells
Employing a Markov Chain Monte Carlo scheme enables us to reliably call mutations in each single cell even in experiments with high drop-out rates and missing data
We show that SCIΦ outperforms existing methods on simulated data and apply it to different real-world datasets, namely a whole-exome breast cancer dataset and a panel sequencing acute lymphoblastic leukemia dataset
Due to recent technological advances it is now possible to sequence the genome of individual cells1
to directly study genetic cell-to-cell variability and gives unprecedented insights into somatic cell evolution in development and disease
The two main issues are that mutational signals of small subclones cannot be distinguished from noise and that the deconvolution of the aggregate measurements into clones is
While this process is very efficient at amplifying the overall DNA material
the random non-amplification of one allele of a heterozygous genotype site
all evidence of a heterozygous genotype mutation is lost when the mutated allele drops out
false positive artifacts can arise in the MDA amplification when random errors introduced early in the process end up with high frequencies due to allelic amplification biases
Further challenges arise from uneven amplification across the genome which results in non-uniform coverage that will leave some sites with insufficient coverage depth for reliable base calling
Both methods take raw sequencing data (BAM files) and output the inferred genotypes of the cells
Monovar specifically addresses the problem of low and uneven coverage in mutation calling by pooling sequencing information across cells, while assuming that no dependencies exist across sites
SCcaller detects variants independently for each cell and accounts for local allelic amplification biases
the identification of such biases is based on germline single-nucleotide polymorphisms (SNPs)
it cannot recover mutations from drop-out events or loss of heterozygosity
a new single-cell-specific variant caller that combines single-cell genotyping with reconstruction of the cell lineage tree
SCIΦ leverages the fact that the somatic cells of an organism are related via a phylogenetic tree where mutations are propagated along tree branches
SCIΦ can reliably identify single-nucleotide variants (SNVs) in single cells with very low or even no variant allele support and is robust to copy number changes
the only other tool able to transfer information between cells
These loci are then used to infer the underlying phylogenetic tree and the parameters of the model
In a last step the mutation to cell assignment is sampled from the posterior distribution
the only previously published single-cell mutation caller sharing information across cells
We start by analyzing the results of the simulated data
Performance of SCIΦ and Monovar on simulated data with different numbers of cells
Summary statistics of the F1 performance of SCIΦ and Monovar on simulated data
F1 performance depending on different levels of drop-out events (a)
We found SCIΦ to be more robust to increasing drop-out rates in comparison to Monovar (Fig. 3a)
In addition to using the phylogenetic tree structure, SCIΦ also learns the drop-out rate of the experiment during the MCMC scheme and uses 10% only as a starting condition
An additional experiment was conducted to investigate the effects of loss of heterozygosity. Monovar as well as SCIΦ perform better with increasing levels of homozygous mutations present in the experiment (Fig. 3b)
Monovar particularly benefits from homozygous mutations as these are very unlikely to be classified as wild type
SCIΦ experiences a more modest benefit from homozygous mutations since it already starts with high performance due to the usage of the phylogenetic tree structure to accurately call mutations
Because copy number events play a prominent role in tumor evolution, we investigated the performance of Monovar and SCIΦ in the presence of additional wild type alleles (Fig. 3c)
Similar to the dependence on the homozygosity rate
SCIΦ shows a fairly stable performance for copy number events affecting up to 50% of the mutated loci and outperforms Monovar for all settings
the performance of Monovar drops more quickly with increasing rate of copy number events
This dataset is particularly challenging because cells are aneuploid
a Cell lineage tree with average number of mutations per inner node as identified by SCIΦ
The area of a node is proportional to its number of assigned mutations
b Posterior probability of SCIΦ mutation calls clustered according to the tree in a
c Probability of Monovar mutation calls for loci identified as mutated by SCIΦ and clustered according to the tree in a
d Probability of Monovar mutation calls for loci identified as mutated by SCIΦ and clustered hierarchically
its placement earlier in the tree above those cells is much more evolutionarily plausible
a Monovar mutation calls for loci identified as mutated by SCIΦ clustered hierarchically
b Monovar mutation calls clustered according to the tree inferred by SCIΦ
c SCIΦ mutation calls clustered according to its inferred tree
This is of particular interest in cancer genomics because tumors show heterogeneous cell compositions often resulting in the failure of targeted cancer therapies
the first single-cell mutation caller that simultaneously infers the mutational landscape and the phylogenetic history of a tumor sample
SCIΦ accounts for the elevated noise levels of single-cell data by appropriately approximating the genomic amplification process and the high fraction of drop-out events
In combination with a Markov Chain Monte Carlo phylogenetic tree inference scheme, mutations are reliably assigned to individual cells
We have compared SCIΦ to Monovar11 on both simulated and real datasets
both SCIΦ and Monovar show a precision of almost one
SCIΦ shows a substantially higher recall and F1 score
simulating different MDA amplifications we showed that SCIΦ is not sensitive to the amplification process
we showed that SCIΦ achieves a much cleaner assignment of mutations to cells within subclones
SCIΦ recovered mutations from drop-out events using the inferred phylogenetic tree structure of the sample to share information across cells
the phylogenetic tree inferred by SCIΦ reflects the evolutionary history more accurately than a hierarchical clustering from Monovar results
Further improvements could be the inclusion of copy number information into the tree reconstruction
this comes at the cost of losing the assumption of independence between mutations, which is computationally expensive to overcome as groups of mutations would have to be identified
Mutation calling and lineage tree building are two interdependent tasks, and addressing them in a single statistical model provides both improved mutation calls and a better estimate of the underlying cell lineage tree, and hence a better understanding of tumor heterogeneity
with parameters α and β and where B is the beta function
For better interpretability, our implementation employs an alternative parametrization of the beta-binomial distribution, with \(f = \frac{\alpha}{\alpha + \beta}\) being the frequency of a nucleotide and ω = α + β an overdispersion term that determines the shape of the distribution and decreases with increasing variance
the probability of the observed count (support) sij for a specific nucleotide in the absence of a mutation is
cij) and fwt is the expected frequency of the observed nucleotide
Large values of ωwt lead to a binomial distribution representing independent sequencing errors
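As an illustration of this parametrization, the following minimal Python sketch evaluates the beta-binomial probability from f and ω by converting back to α = fω and β = (1 − f)ω. The function names are hypothetical and not taken from the SCIΦ implementation; the binomial function is included only to compare against the large-ω limit described above.

```python
import math


def log_beta(a: float, b: float) -> float:
    """Logarithm of the beta function B(a, b), computed via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_pmf(s: int, c: int, f: float, omega: float) -> float:
    """Probability of support s at coverage c for frequency f and overdispersion omega.

    Uses alpha = f * omega and beta = (1 - f) * omega, so f = alpha / (alpha + beta)
    and omega = alpha + beta.
    """
    alpha, beta = f * omega, (1.0 - f) * omega
    log_choose = math.lgamma(c + 1) - math.lgamma(s + 1) - math.lgamma(c - s + 1)
    return math.exp(log_choose + log_beta(s + alpha, c - s + beta) - log_beta(alpha, beta))


def binomial_pmf(s: int, c: int, f: float) -> float:
    """Plain binomial probability, the limit of the beta-binomial for large omega."""
    return math.comb(c, s) * f**s * (1.0 - f) ** (c - s)
```

For a very large overdispersion term, for example omega = 1e6, the two functions should agree closely for every support value, which mirrors the statement that large ωwt yields independent sequencing errors.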
In the presence of a heterozygous mutation (a mutation affecting one of the two homologous chromosomes)
The underlying allele frequency of \(\tfrac{1}{2}\) is corrected by sequencing errors producing any of the other two bases
Low values of the overdispersion term ωa reflect a small number of initial genomic fragments and any additional feedback in the amplification
SCIΦ generally assumes copy number neutrality, but learning ωa allows for additional shifts in the mean variant allele frequency away from \(\tfrac{1}{2}\) due to copy number changes
Likely mutated loci are identified using the posterior probability of observing at least one mutated cell at a specific locus
The probability of observing no mutation at locus i across all cells is
where K is a random variable indicating the number of mutated cells and λ is the prior probability of a mutation occurring at the locus
The probability of observing the mutation in k cells is
We do not need to compute P(Di) as it cancels out when computing the likelihood ratio or posterior odds
The likelihood of the data given that exactly k of the m cells possess the mutation
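One way to evaluate such a likelihood without enumerating all subsets of cells is a dynamic program over the cells. The sketch below assumes that all sets of k mutated cells are equally likely, which is an assumption of this example and not a statement of the exact formula used here; p_mut and p_wt are hypothetical per-cell likelihoods of the data with and without the mutation.

```python
from math import comb


def likelihood_k_mutated(p_mut, p_wt):
    """Return [P(D | K = k) for k = 0..m], averaging uniformly over all k-subsets of cells.

    p_mut[j] and p_wt[j] are the likelihoods of cell j's data with and without the mutation.
    """
    m = len(p_mut)
    # e[k] accumulates the sum over all k-subsets S of the cells processed so far of
    # prod_{j in S} p_mut[j] * prod_{j not in S} p_wt[j].
    e = [1.0] + [0.0] * m
    for j in range(m):
        # Update in descending k so that e[k - 1] still holds the value from before cell j.
        for k in range(j + 1, 0, -1):
            e[k] = e[k] * p_wt[j] + e[k - 1] * p_mut[j]
        e[0] *= p_wt[j]
    return [e[k] / comb(m, k) for k in range(m + 1)]
```

The recursion takes O(m²) operations per locus instead of the exponential cost of summing over every subset explicitly.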
The prior probability of a mutation in a phylogeny affecting k descendant cells is determined by placing mutations uniformly among the edges of the tree (Supplementary Section A) leading to
Along with the uncertainty in the supporting read counts due to the amplifications in each cell when a mutation is present
an additional artifact is drop-out whereby one allele is not amplified at all
To account for allelic drop-out occurring with probability μ, we introduce the following mixture for the likelihood of the observations for each cell:
where the first term describes the loss of the mutant allele, the second the loss of the wild-type allele and the third term describes a heterozygous mutation. The case μ = 0 reduces to Eq. (3)
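The three-term mixture can be sketched in Python as follows. The equal weights μ/2 for the two drop-out terms and the use of 1 − f_wt as the expected frequency when only the mutant allele is amplified are assumptions of this example, chosen to agree with the description of the terms above; all function and argument names are hypothetical.

```python
import math


def bb_pmf(s: int, c: int, f: float, omega: float) -> float:
    """Beta-binomial probability with mean frequency f and overdispersion omega."""
    a, b = f * omega, (1.0 - f) * omega
    log_choose = math.lgamma(c + 1) - math.lgamma(s + 1) - math.lgamma(c - s + 1)
    log_num = math.lgamma(s + a) + math.lgamma(c - s + b) - math.lgamma(c + a + b)
    log_den = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp(log_choose + log_num - log_den)


def mutated_cell_likelihood(s, c, f_wt, omega_wt, f_a, omega_a, mu):
    """Likelihood of support s at coverage c for a cell carrying the mutation."""
    lost_mutant = bb_pmf(s, c, f_wt, omega_wt)          # only the wild-type allele was amplified
    lost_wildtype = bb_pmf(s, c, 1.0 - f_wt, omega_wt)  # assumed model: only the mutant allele was amplified
    heterozygous = bb_pmf(s, c, f_a, omega_a)           # both alleles were amplified
    return 0.5 * mu * lost_mutant + 0.5 * mu * lost_wildtype + (1.0 - mu) * heterozygous
```

Setting mu to 0 leaves only the heterozygous term, in line with the statement that the case μ = 0 reduces to Eq. (3).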
while each of the n mutations can be attached to the (2m − 1) edges, leading to \((2m - 3)!!\,(2m - 1)^n\) possible configurations for the discrete component (T
it is infeasible to enumerate all solutions
Instead we employ a Markov Chain Monte Carlo approach to search and sample from the tree space
we employ the likelihood of a specific tree realization with the mutation attachment parameter σ and the parameters θ to be
where P(Dij | T) = Pa(Dij) if the cell j is below mutation i (on the path from leaf j to the root) and P(Dij | T) = Pwt(Dij) otherwise
The first set of products describes the loci identified to be likely mutated (section Identification of candidate mutated loci) which are placed on the tree and used together to infer its phylogenetic structure
The second half represents all loci where no mutation is present which inform the inference of the sequencing error parameters
We marginalize out the attachment points of the mutations, analogously to ref. 20
Assuming each mutation is equally likely to attach to any edge in the tree and the attachment probability to be independent between mutations, we have \(P(\sigma \mid T, \theta) = \frac{1}{(2m - 1)^n}\) so that
the sum over σi can be written explicitly as
where I is the indicator function and \((\sigma_i \prec j)\) indicates that cell j sits below the attachment point σi of mutation i in the tree T
The sum can be computed in O(m) time using the binary tree structure
we propagate the probability of attaching a mutation to a specific node from the leaves toward the root
This can be implemented using the depth-first search (DFS) algorithm
combining in each node the probabilities from two previously computed subtrees
Computing Eq. (10) is therefore in O(mn), while the marginalization has the benefit of reducing the search space by a factor of \((2m - 1)^n\)
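The leaf-to-root propagation can be sketched for a single mutation as follows. This is an illustrative Python version, not the SCIΦ code: the tree encoding (a dictionary mapping each inner node to its two children, with leaves labelled by cell index) and all names are assumptions of the example, and it requires every p_wt[j] to be strictly positive because it works with likelihood ratios.

```python
from math import prod


def marginal_mutation_likelihood(children, root, p_mut, p_wt):
    """Average the likelihood of one mutation over all 2m - 1 attachment points.

    children maps each inner node to a pair (left, right); leaves are the integers
    0..m-1 and index p_mut / p_wt, the per-cell likelihoods with and without the mutation.
    """
    subtree_ratio = {}  # node -> product of p_mut[j] / p_wt[j] over the cells below it
    total = 0.0
    stack = [(root, False)]
    while stack:  # iterative post-order depth-first search
        node, expanded = stack.pop()
        if node not in children:  # leaf: a single cell
            subtree_ratio[node] = p_mut[node] / p_wt[node]
        elif not expanded:  # inner node seen for the first time: visit children first
            stack.append((node, True))
            stack.extend((child, False) for child in children[node])
            continue
        else:  # inner node after both subtrees: combine their ratios
            left, right = children[node]
            subtree_ratio[node] = subtree_ratio[left] * subtree_ratio[right]
        total += subtree_ratio[node]
    m = len(p_mut)
    # Multiplying by prod(p_wt) turns each ratio back into the full likelihood
    # "cells below the attachment point mutated, all others wild type".
    return prod(p_wt) * total / (2 * m - 1)
```

Each node is visited a constant number of times, so one mutation costs O(m) and n mutations cost O(mn), matching the complexity stated above.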
In addition we employ the marginalization to focus on the tree structure of the cell lineage rather than the attachment points of mutations
Since that number is typically much smaller than mn
Because tumor cells show chromosomal abnormalities, mutations can be observed as homozygous variants even without drop-out events
In order to also account for loss of heterozygosity, we adapt the scheme introduced in section Tree likelihood
Instead of computing the likelihood of the data when attaching a mutation to a node in the lineage tree in the heterozygous state only, we additionally compute the likelihood when attaching each mutation in the homozygous state, involving the nucleotide model when only alternative alleles are present
Note that homozygous mutations are only attached to inner nodes as the probability of observing a drop-out event in a single cell is assumed to be higher than a single homozygous mutation
Utilizing the tree structure, the sum can again be computed in O(m) time for each mutation on the tree. The overall likelihood (Eq. (10)) for each mutation becomes a weighted sum of the two possibilities leading to
we employ an MCMC scheme to sample from the posterior distribution of mutation assignments as well as tree structures given the data (for simplicity with uniform priors)
We change one parameter at a time with transition probability q(T′
θ) and accept the new configuration with probability
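For orientation, a generic Metropolis-Hastings acceptance step is sketched below in Python. It assumes the standard acceptance rule, namely the minimum of 1 and the ratio of posterior times reverse proposal probability to posterior times forward proposal probability; whether this matches the exact expression used here cannot be read off the text. The functions log_post and propose are hypothetical placeholders.

```python
import math
import random


def acceptance_probability(log_post_new, log_post_old, log_q_forward=0.0, log_q_reverse=0.0):
    """Standard Metropolis-Hastings acceptance probability, computed in log space."""
    log_ratio = (log_post_new + log_q_reverse) - (log_post_old + log_q_forward)
    return 1.0 if log_ratio >= 0.0 else math.exp(log_ratio)


def mh_step(state, log_post, propose, rng):
    """Propose one change and either accept it or keep the current state.

    propose(state, rng) must return (candidate, log q(candidate | state), log q(state | candidate)).
    """
    candidate, log_q_forward, log_q_reverse = propose(state, rng)
    rho = acceptance_probability(log_post(candidate), log_post(state), log_q_forward, log_q_reverse)
    return candidate if rng.random() < rho else state
```

With symmetric proposals the two proposal terms cancel, and the rule reduces to comparing the posterior values of the new and the old configuration.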
and can be verified by computing the correlation between two runs in practice
The overall runtime complexity is O(x × max(mn, c)) with c being the number of unique coverage values of the experiment
From the sample of trees and parameters we could also conditionally sample the placement of the mutations for the full joint posterior sample
utilizing the full weights of attaching each mutation to different edges we record the probability of each cell possessing each mutation
Averaging over the MCMC chain provides the posterior genotype matrix and hence our single-cell variant calls
In order to benchmark the performance of SCIΦ, we simulated tumor evolution by introducing a cell lineage tree and simulated read counts by mimicking the noisy MDA process
we created a random binary genealogical cell lineage tree with 100 mutations attached to the edges
The placement of the mutations defines which cells possess each mutation
We chose the placement such that each mutation is shared by at least two cells because mutations in only one cell may be false positives from sequencing errors and are filtered out in practice as well as in our benchmark
among all the mutations present in cells a specified fraction μ was randomly selected as drop-outs
\(\tfrac{\mu}{2}\) of the mutations became wild type and \(\tfrac{\mu}{2}\) became homozygous alternative genotype
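The drop-out step can be sketched in Python with NumPy as follows. The genotype encoding (0 for wild type, 1 for a heterozygous mutation, 2 for homozygous alternative) and the function name are assumptions of this example, not part of the described simulator.

```python
import numpy as np


def apply_dropout(genotypes, mu, seed=0):
    """Turn a fraction mu of heterozygous entries into drop-outs.

    Half of the selected entries become wild type (0) and the other half
    homozygous alternative (2); the input array is left unchanged.
    """
    rng = np.random.default_rng(seed)
    out = genotypes.copy()
    mutated = np.argwhere(genotypes == 1)  # (row, column) pairs of heterozygous entries
    n_drop = int(round(mu * len(mutated)))
    chosen = mutated[rng.permutation(len(mutated))[:n_drop]]
    half = n_drop // 2
    to_wt, to_hom = chosen[:half], chosen[half:]
    out[to_wt[:, 0], to_wt[:, 1]] = 0    # the mutated allele dropped out
    out[to_hom[:, 0], to_hom[:, 1]] = 2  # the wild-type allele dropped out
    return out
```

When the number of selected entries is odd, this sketch assigns the extra entry to the homozygous group, which is an arbitrary choice.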
Then we generated an artificial reference chromosome of 1 million base pairs (bp) and divided it into segments of ~1000 bp for each cell individually
we generated a coverage distribution following a negative binomial distribution with a mean of 25 nucleotides and a variance of 50
10% of the segments were assigned 0 coverage to include missing information
The coverage c of specific positions was additionally randomized following a discretized Gaussian distribution with the segment coverage as mean and a standard deviation of 10% of that mean in order to simulate the uneven coverage profiles of real single-cell sequencing experiments
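The coverage simulation described above can be sketched in Python with NumPy. The parameter values follow the text; the function name, the fixed segment length of exactly 1000 bp, rounding to the nearest integer and clipping negative values to zero are simplifying assumptions of this example.

```python
import numpy as np


def simulate_coverage(genome_length=1_000_000, segment_length=1_000, mean=25.0,
                      variance=50.0, zero_fraction=0.10, rel_sd=0.10, seed=0):
    """Simulate per-position coverage for one cell."""
    rng = np.random.default_rng(seed)
    n_segments = genome_length // segment_length
    # NumPy's negative binomial has mean n(1-p)/p and variance n(1-p)/p^2,
    # so p = mean / variance and n = mean * p / (1 - p).
    p = mean / variance
    n = mean * p / (1.0 - p)
    segment_cov = rng.negative_binomial(n, p, size=n_segments).astype(float)
    # Assign zero coverage to a fixed fraction of segments to mimic missing data.
    n_zero = int(round(zero_fraction * n_segments))
    segment_cov[rng.choice(n_segments, size=n_zero, replace=False)] = 0.0
    # Discretized Gaussian noise around each segment's coverage.
    per_site_mean = np.repeat(segment_cov, segment_length)
    noisy = rng.normal(per_site_mean, rel_sd * per_site_mean)
    return np.clip(np.rint(noisy), 0, None).astype(int)
```

With the default values the negative binomial parameters come out as p = 0.5 and n = 25, which gives the stated mean of 25 and variance of 50.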
This process is repeated c times and the copies are retained
we change the number of initial copies of the wild type allele for a specific locus
We set the probability of x extra copies to be \(\tfrac{1}{2^x}\)
This strategy assumes all copy number changes happened prior to mutation events
the strategy provides lower bounds on the performance measures because the variant allele frequency decreases with increasing copy number
a nucleotide is mutated to account for sequencing errors
and the resulting simulated data was embedded into a multi-pileup file
Both experiments were in line with the previously reported results
The first five years of single-cell cancer genomics and beyond
Tumour heterogeneity and the evolution of polyclonal drug resistance
A population genetics perspective on the determinants of intra-tumor heterogeneity
Computational approaches for inferring tumor evolution from single-cell genomic data
Genomic DNA amplification by the multiple displacement amplification (MDA) method
Dissecting the clonal origins of childhood acute lymphoblastic leukemia by single-cell genomics
Reliable detection of subclonal single-nucleotide variants in tumour cell populations
A mechanistic beta-binomial probability model for mRNA sequencing data
SNP detection and genotyping from low-coverage sequencing data on multiple diploid samples
Enumerative Combinatorics (Cambridge University Press
Single-cell sequencing data reveal widespread recurrence and loss of mutational hits in the life histories of tumors
Calibrating genomic and allelic coverage bias in single-cell sequencing
Rapid amplification of plasmid and phage DNA using phi29 DNA polymerase and multiply-primed rolling circle amplification
Genome-wide detection of single-nucleotide and copy-number variations of a single human cell
Snakemake-a scalable bioinformatics workflow engine
ggplot2: Elegant Graphics for Data Analysis
We thank David Seifert for constructive discussions and C++ support as well as Franziska Singer for critical feedback. J.S. and J.K. were supported by ERC Synergy Grant 609883 (http://erc.europa.eu/). K.J. was supported by SystemsX.ch RTD Grant 2013/150 (http://www.systemsx.ch/)
These authors contributed equally: Jochen Singer
Department of Biosystems Science and Engineering
All authors drafted the manuscript and approved the final version
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations
DOI: https://doi.org/10.1038/s41467-018-07627-7
01 Oct 2017
UK crime sentencing guidelines
From a murderer to a drunk who forced a plane to make a landing
here are some of those jailed in September 2017
From a dealer found with drugs outside a school to paedophiles and a lying businessman, these are just some of the people either from the Bristol region or who have been locked up after committing crimes in the area
was jailed for 28 months for dealing cocaine and cannabis
A drug dealer was found with more than £3,500 of cannabis and cocaine outside a Bristol school
Nathan Larcombe was discovered by officers at Somerset Square near St Mary Redcliffe School smoking cannabis
finally catching up with him before a violent struggle where they had to pepper spray him
On searching Larcombe they discovered a JD Sports bag full of drugs
Larcombe admitted possessing both drugs with intent to supply on the basis that he was not selling to schoolchildren
A woman had her face smashed after she hit her violent partner with an ironing board
Christina Page “got in first” with the board when David Cox turned verbally aggressive
Bristol Crown Court heard Cox then pushed her and she struck her face on a marble fire surround
pleaded guilty to sending a malicious communication and inflicting grievous bodily harm in July
The court heard how the pair had met on an internet dating site
with the relationship quickly deteriorating as Cox acted “controlling and jealous” - fixated that she was unfaithful to him
'American gangster movie-style car chase' ends in hammer and baseball bat attack
Jailed: Kaluba for 15 months and Lucas for 18 months
An attempt to reclaim cash resulted in an American "gangster movie-style" car chase through Weston-super-Mare
Mutilla Kaluba and Oliver Lucas targeted doormen Kurt Duddridge and Robert Carlucci to get back some £2,000
In the bizarre events that followed Kaluba and Lucas stopped their victims’ Citroen and caused it to reverse in the road
Kaluba and Lucas then did a u-turn in their Ford and the two cars went bonnet-to-bonnet before the Citroen reversed into a post
The wrecked Citroen was then further smashed up by Kaluba and Lucas
having an offensive weapon and criminal damage and was jailed for 15 months
criminal damage and dangerous driving and was jailed for 18 months
A man from Bristol admitted a string of sexual abuse of a 10-year-old girl
carried out the abuse around 25 years ago was living in Chippenham in Wiltshire and was in his late 20s
Police said it had been both traumatic and upsetting for her to relive the awful abuse she suffered at the hands of Loosemore
When Mohammed Khalid went on a burglary spree he made a fatal error
He forgot Bristol is a city that never sleeps
As he broke into one property in the early hours he was disturbed by a chef returning home from work
And when he went on to try his luck at a property nearby he was disturbed by a postman getting ready for work
Frustrated by people cropping up everywhere he went
he turned his attention to an empty Peugeot car
smashed a window and stole around £5 in change
Khalid, 41, of Barbour Road in Hartcliffe
Kieron Wood has been cleared of a samurai sword attack but sent to two years’ youth custody for drugs offences (Image: Avon and Somerset Police)
Self-confessed drug dealer Kieron Wood was cleared of a samurai sword attack on a customer
after stating that it was Christopher Hill who stormed into his flat put a knife to his throat and took his drugs
said he knew nothing about a later attack where a group of men went to Mr Hill's flat and attacked him
But Wood did admit possessing Class A drugs heroin and crack cocaine with intent to supply
as well as possessing cannabis and a blade and so was jailed anyway
Shelvi Varkey(Image: Avon and Somerset police)
A Southmead Hospital nurse who failed to declare he was being investigated for not dispensing medicine has been jailed
Whilst under investigation by the Nursing and Midwifery Council he attended a jobs fair at Southmead Hospital and applied for a nursing position
Not only did he fail to disclose he was being investigated
he provided two fake references and was taken on
a married father-of-three of Parade Court in Speedwell
denied wrongdoing but was found guilty of fraud by Bristol magistrates
Derek Root (Image: Avon and Somerset police)
Jailed: Derek Root was jailed for 12 months
A pair of strangers got so drunk and abusive on a holiday jet
the pilot was forced to land at Bristol Airport
Derek Root
necked 10 shots of Jack Daniel's whiskey in the departure lounge when his flight from Glasgow to Alicante was delayed by three hours on July 8 this year
Police on the runway(Image: Jason Wassall)
Root finally boarded the plane and found himself sitting near passenger Alexander Gray
who handed him a bottle of Jagermeister which he also started drinking – the two had never met before
The pair soon became abusive to passengers and crew
with Root asking stewards if they 'wanted his c*ck' before Gray was sick on the floor
They were subdued by staff and returned to their seats - where they promptly fell fast asleep as the plane diverted to Bristol Airport
A sexual deviant who used two young girls as ‘sex toys’ was finally brought to justice after a victim disclosed his crimes during a psychic reading
Paedophile Roger Britton systematically abused the two girls during the 1990s, “targeting occasions to abuse them” at his home and workplace
Britton admitted 14 charges of indecent assaults and gross indecency on the children
who finally spoke out about what he had done last year
Bristol Crown Court heard how he abused one of the girls from the age of seven through to 11 and another from the age of nine to 14
One of his victims had disclosed what had happened to her during her childhood during a psychic reading
Britton was confronted about what had happened and admitted everything before sending out a text message apologising to one of his victims
Shaun Hudd has been jailed for eight years for a series of online sex offences against children
Shaun Hudd was a chef at soft play business Jump in Cribbs Causeway and a scoutmaster with 36th Bristol Scout Group
from behind the safety of his computer screen
sexually exploiting children as young as nine
who had also worked at award-winning Bristol restaurant Casamia
posed as a youngster himself before contacting children through Facebook and Skype
Bristol Crown Court heard that he became hooked on exploiting boys and girls
persuading them to expose themselves to him and recording indecent photos and videos
When youngsters tried to rid themselves of Hudd
he threatened to distribute the humiliating footage - and carried out the threat on two occasions
Businessman William Irving lied to a court in a bid to avoid points on his driving licence
near Thornbury told North Avon Magistrates’ Court in August last year that he had no points at all
It came as the 62-year-old was midway through a trial for driving without insurance
Prosecutor Julian Howells said Irving already had three points on his licence from May 2014 for driving while using a phone and another three points from October 2015 for failing to give a driver’s details
The court heard a conviction for driving his Mercedes car without insurance on December 7
2015 would have meant he was banned under the ‘totting up’ rules
And during a trial contesting the charge that he was driving without insurance
Irving told magistrates he had a clean licence
Jailed: Two years with another two years extended licence
A violent steroid addict left a man in danger of going blind in one eye after he punched him
Restaurant owner Detjon Prenci suffered three fractures to his face and was told he could lose his sight after Peter Clark delivered a single blow in an “unprovoked” attack in the garden of Wees Lounge Bar on Park Street
Bristol Crown Court heard how 27-year-old Clark had a history of violence
when he was locked up for smashing someone over the head with a piece of wood
has been jailed after punching a man in the garden of Wees bar on Park Street
The latest attack was his sixth violent assault to be brought before the courts
including previous attacks using glasses and bottles in the city centre
Prosecutor David Maunder told the court how Mr Prenci had been out with friends on April 22 this year and ended up in the bar garden having a cigarette with a friend
The next thing he knew he was in the Bristol Royal Infirmary and unable to see from his left eye
Daniel Povey has been jailed for three years and three months after he was caught by undercover officers
A Kingswood man was caught trying to get naked pictures of underage girls on social media
Daniel Povey used online messaging app Kik to approach someone he thought was a 12-year-old girl
The security officer repeatedly asked for nude pictures and videos from the girl
and sent images of himself carrying out a sexual act
But the 'girl' was in fact a fake profile set up by undercover police to catch online paedophile predators
Alexander David Densley stole bicycles from Fishponds Police Station (pictured) and drove a stolen Porsche during his seven-month crime spree
Jailed: 12 months in a young offender institution
A prolific thief who counted stealing goods from a Bristol police station and driving a stolen Porsche among the highlights of a seven-month crime spree has finally been caught
Teenager Alexander David Densley was jailed last week after admitting to a string of offences including driving a stolen Porsche and breaking in to a retirement home
The 18-year-old, from Bath, also admitted to stealing eight bicycles worth an eye-watering £3,390 from the bike shed at Fishponds Police Station in February
Densley’s crime spree came to a sudden halt this month
after he was caught entering The Moorings retirement home
Jamie Mitchell has been jailed for six years for robbery
Jailed: Six years and four months in prison with an extended licence period of five years
A Bristol man grabbed cash from a shop till and then pulled a knife on staff who tried to restrain him
Jamie Mitchell
just weeks before the incident at the Co-op in Station Road
The court heard he walked into the store and was approached by staff because he is banned
then while being removed from the store he grabbed £50 from the till
of Brigstocke Road in St Paul's then he pulled out and brandished a large bread knife and told staff he had enough before running out of the shop
dangerous driving and driving while disqualified
A man took his family on a terrifying tour of Bristol and assaulted his wife in the back of his Jaguar in front of their four-year-old
Monovar Hussain-Butt
Mr Gordon said it soon became obvious Hussain-Butt, of Cranmore Crescent in Southmead
Martyn Ford given a life sentence for the murder of his stepfather Ian Baker(Image: Avon and Somerset Police)
Jailed: Mandatory life sentence and will spend at least 20 years and four months behind bars before being considered for parole
A troublesome stepson battered his “kind and gentle” stepfather to death with a hammer just days after being released from prison
Martyn Ford launched a “vicious and fatal” attack on Ian Baker at his Hungerford Road home in Brislington before ensuring he was dead and then searching his home for cash
Bristol Crown Court heard how Ford stashed the murder weapon and his blood stained clothes in a blue suitcase before throwing it into the Feeder Canal – never to be discovered
then went out drinking with Mr Baker’s cash
visited KFC and even returned to the murder scene hours later to discover it cordoned off by police before heading back to a guest house
jailed after abusing a 14-year-old girl and concocting a story with his girlfriend to cover up his crimes (Image: Avon and Somerset Police)
Jailed: Lukas Deacon seven-and-a-half years and Lisa Watson 18 months
A couple concocted a story to tell police and the courts in a bid to cover up his abuse of a 14-year-old girl
Lukas Deacon and his partner of 10 years
were both jailed after DNA proved their story could not be true
Bristol Crown Court heard how Deacon, of Jacob Street in Old Market, had abused the girl while Watson
But after the girl broke down and told her mum what had happened
the pair came up with a lie to tell police stating her claims could not be true as they were both awake and asleep at the same time that evening
The court was told how even when the teenager’s DNA was discovered on his boxer shorts
the couple persisted with their story and their denials of what had happened
was jailed for seven-and-a-half years for the abuse
perverting the course of justice and breaking into a neighbour's flat to steal a bottle of whiskey
Watson of Ridgeway Court in Ridgeway Lane in Whitchurch
who was described by the judge as the more intellectual of the pair and the one to come up with the lie
A conman who used 'sleight of hand' to steal jewellery and coins from a shop in Bedminster
has been locked up following a similar con across the country
Staff at East Street Jewellers in Bristol didn’t even know they had been the victim of the crime when a necklace and a bracelet valued at £750 were stolen
But it has since emerged that they were the same thieves hitting jewellers across the UK – taking almost £24,000 worth of goods from unsuspecting shop owners
One of the two tricksters - Romanian national Mircea Rostas - was subsequently identified from CCTV footage and a fingerprint and was jailed for a total of 26 months for his role in the thefts
who had only been in the UK since October last year
admitted stealing the jewellery from the East Street shop in Bristol city centre on February 17
this year when he appeared at Shrewsbury Crown Court
Rostas also admitted thefts from seven other shops in Shrewsbury
Ely near Cambridge and Oxford between January and April this year
A burglar came up with an outlandish reason his blood was at a crime scene after he was arrested by police
Wayne Morris’ blood was discovered in a Clevedon home after he had smashed a window and made his way in
But when arrested the 48-year-old came up with a story as to why his DNA was left at the crime scene
Bristol Crown Court heard he had tried to tell police he had cut his elbow after tripping while down the street
before using a sock to soak up his blood and then chucking it in the bin
He said that the real thief must have fished his blood-soaked sock out of the rubbish and put it on their hands and used them as gloves during the burglary on August 4
must have realised his story wasn’t going to wash and pleaded guilty to the burglary at court
Stephen Priddis has been jailed for six months for threatening lamp-posts and bollards with a knife
A man has been jailed for carrying a knife after he was seen threatening lamp-posts and bollards in the centre of Bristol
Stephen Priddis was seen in Upper Maudlin Street pointing a blade at street furniture and making slashing motions in the air
where he helped himself to pasties from the Pumpkin Café and wandered off
Police found a “very drunk” Priddis with a half-eaten pasty at St James Barton roundabout
Priddis, 38, of no fixed address, pleaded guilty to possessing a blade, theft and a public order offence at Bristol Crown Court.
Judge William Hart jailed him for six months.