Splice, valued in 2021 at nearly $500 million after securing $55 million in funding, has acquired UK-based “high-end” virtual instrument library Spitfire Audio
Financial terms of the acquisition were not disclosed, but according to the Financial Times, which cited a person familiar with the matter, the deal was worth around $50 million
The acquisition marks Splice’s entry into the plugin sector
and aligns with its existing subscription and rent-to-own businesses
The move positions Splice to capitalize on the growing music creation market, forecast (by Midia Research) to nearly double to $14 billion by 2031
New York-based Splice generates more than $100 million in annual revenue with about 600,000 paying subscribers
The company was valued at nearly $500 million in a 2021 funding round led by Goldman Sachs and entrepreneur Matt Pincus’s investment firm MUSIC
Spitfire Audio, established in 2007, provides sampled virtual instruments — including recordings by Hans Zimmer, Olafur Arnalds, the BBC Radiophonic Workshop
and Abbey Road Studios — to professional composers and producers
“The teams at Spitfire Audio and Splice have deep respect for composers, musicians and producers and are committed to celebrating and supporting their work”, said Kakul Srivastava
“With Spitfire’s expressive instruments and Splice’s AI-powered platform
we’re just beginning to explore what’s possible.”
creator-led companies who believe great software and technology can supercharge the creative experience
Our shared vision is to develop tools that expand — not replace — human creativity,” Srivastava added
“We’ve always focused on inspiring people to create extraordinary music
we can now bring that inspiration to a whole new generation of artists
both Splice and Spitfire Audio will continue to operate independently
with Olivier Robert-Murphy remaining as CEO of Spitfire Audio
Thomson will continue to oversee Spitfire Audio’s creative direction
“Splice has already built an incredible business,” said Robert-Murphy
“Joining forces means Spitfire Audio’s sounds will find new homes in studios around the world—whether that’s a bedroom producer or a blockbuster composer.”
The acquisition comes as Splice has been expanding its AI capabilities, with about 40% of its users embracing the platform’s AI tools, the company said
Splice said its sound samples were downloaded nearly 350 million times across all genres in 2024, without disclosing whether the figure marked growth or a decline from 2023
Speaking with Thomson in a one-on-one at London’s AIR Studios
Srivastava said Splice has been building new AI components “that are ethical
where artists are fairly compensated for their work
So bringing together some of that technology together
Thomson acknowledged concerns around the use of AI in the music creation community
around AI in the music creation community.”
or new ways of doing things… It’s just a tool to help you be more creative
and I think that’s where we should be focusing on,” Thomson added
to analyze recordings and identify sounds that match the harmony
the platform that helps music creators bring their ideas to life
announced a significant update to Splice Mobile
songwriters can also effortlessly record riffs
Splice Mic Video 1: https://www.instagram.com/p/DHADAzguyZE/
and the results take songwriters so much deeper into the finished process
This is so valuable to our community"
Splice Mic utilizes Splice's Create AI to analyze the recording and find sounds that perfectly match the harmony
Not only can songwriters hear their vocal ideas in full musical context using Splice sounds
they can also unlock fresh creative possibilities by switching genres within the stack
"The phone is already a huge part of music making," says Kenny Ochoa
"About 1 million users have made more than 28 million stacks so far
and now songwriters and producers can record vocal ideas over stacks of samples
and genre and have even more control over their creative vision
and now those stacks can be merged with vocals"
Leland Video: https://www.youtube.com/watch?v=RYP2pHqFtHw
"We got the team together to see who could start the best new Stacks
Bridging the gap between inspiration and production
Splice Mobile users can easily share ideas with collaborators directly from the app or AirDrop the stems to a digital audio workstation (DAW)
making the jump from mobile to studio seamless
Musicians are already using voice recording functions on their phones to capture ideas away from the studio
giving songwriters the creative depth of the Splice Sounds catalog and Create
Record your ideas with Splice Mic: https://youtu.be/JDiEjKuj9o0?si=sguk4Erpwnamf1Vo
Splice Mic is available now via Splice Mobile
He co-wrote the Grammy-nominated song "Rush" by Troye Sivan as well as the majority of Sivanʼs latest release
Leland has written songs for artists such as Cher ("DJ Play A Christmas Song")
Leland currently serves as in-house composer and on-camera mentor for the Emmy Award-winning competition series RuPaulʼs Drag Race
having worked on more than 15 seasons of the show
Laurelvale Studios is a premier full-service recording studio nestled in the serene Studio City Hills
offering breathtaking views of the iconic Mulholland Drive
Laurelvale Studios creates an inspiring environment where creativity can flourish
ensuring that every session is as seamless and enjoyable as possible
Laurelvale Studios provides the perfect balance of state-of-the-art equipment and a relaxed
welcoming atmosphere that fosters the best work from every artist
Splice helps music creators bring their ideas to life
A subscription to Splice's vast industry-leading sounds catalog includes high-quality
to accelerate deep sound discovery and inspiration
The company also provides affordable access to plugins and DAWs through a rent-to-own Gear marketplace
The New York-based startup (co-founded by Steve Martocci and Matt Aimonetti) closed a Series D round in November 2020
John Vlautin, Splice, 1 818-763-9800, [email protected]
f Source data are provided as a Source Data file
we focus on the variants that form a novel splice donor/acceptor motif and generate a new splicing junction at that location
We exclude from consideration variants that are distant from the novel splice-site and act by modifying the efficacy of splicing enhancers
we do not include variants that disrupt the normally used splice-sites
leading to the use of preexisting cryptic splice-sites
To develop a methodology for identifying SSCVs from transcriptome sequence data
particularly those associated with diseases
the corresponding splicing junctions are typically not observed in the general population
Mismatch bases corresponding to SSCVs are often observed in the short reads of transcriptome sequence data
The term “primary” in the “primary novel SS” signifies the direct formation of a novel SS by an SSCV, particularly considering situations where an SSCV within a deep intronic region generates a cryptic exon and subsequently leads to the formation of another novel SS (referred to as secondary SS, as shown in the right panel of Fig. 1a)
The slightly lower sensitivity compared to the 1000 Genomes Project dataset may be attributed to the fact that the benchmark variant set was limited to somatic variants
which often have lower variant allele frequencies and smaller splicing changes than germline variants
juncmut achieves a certain level of sensitivity and a high rate of precision
even though it uses only transcriptome data
which is typically challenging to detect without whole-genome analysis
These results indicate that the juncmut approach can effectively catalog disease-associated variants
a Frequencies of transcriptome sequence data analyzed
Transcriptome data were also grouped by the number of detected SSCVs
whose base counts are equal to or more than 1.0 Gbp and less than 2.0 Gbp
three or more SSCVs were identified in 74,805
b Base substitution patterns of SSCVs according to their relative position to primary novel SSs
Different colors are used to display different types of alternative bases
The x-axes represent different reference bases
and the y-axes represent the numbers of variants
c Histogram showing the distribution of relative position of primary novel SSs to their hijacked SSs (original SSs) for donor (left) and acceptor (right) creating SSCVs
Red dashed lines represent exon-intron boundaries
d Fraction of SSCVs with multiples of three shift sizes (difference between primary novel SSs and hijacked SSs) stratified by coding and non-coding genes
e Sequence motifs of SSCVs for which the relative position of the primary novel SS to the hijacked SS is -4 (left) and +5 (right)
The “GT” dinucleotides at the intrinsic intron edge endow the -4 bp position with the potential to form a novel donor site
featuring “GT” at the fifth and sixth positions within the new intron
the inherent intron’s fifth and sixth base pairs often comprise “GT” at the donor site
this configuration frequently corresponding to the first two intronic bases of a novel splice donor at the +5 bp position
e Source data are provided as a Source Data file
a Counts of distinct SSCVs creating novel donor (left) and acceptor (right) sites stratified by splicing consequences at each relative position of primary novel SSs compared to hijacked SSs
b Counts of distinct SSCVs leading to in-frame (left) and frameshift (right) partial exon loss
stratified by PTC generation and NMD susceptibility
c Counts of distinct SSCVs at each size of augmented exon (restricted to multiples of three)
for both exon extension and cryptic exon inclusion
Each red point represents the ratio of PTC generation
d Counts of distinct SSCVs located in coding regions
categorized by mutation type assuming no abnormal splicing (silent
These counts are further stratified by PTC generation and NMD susceptibility
d Source data are provided as a Source Data file
using a richer set of SSCVs collected in this study
we explored higher resolution relationships between Alu elements and SSCVs
a Counts of distinct SSCVs within Alu sequences
at each primary novel SS mapped to the reference Alu sequence coordinates
The counts are stratified by Alu family (AluJ
These counts are faceted by the creation of donor and acceptor sites
and the orientation of the Alu sequences relative to transcripts (sense and antisense)
b The ratio of SSCVs forming novel exons (classified as cryptic exon inclusion by splicing consequence) at each motif creation type (donor and acceptor)
and in Alu sequence orientations (sense and antisense)
These ratios are further stratified based on whether the novel exons are confined within Alu sequences or not
c Typical splicing consequences of SSCVs within Alu sequences
SSCVs located on sense-inserted Alu sequences do not form exons and may create novel transcription start sites in an ambiguous manner
SSCVs on antisense-inserted Alu sequences are likely to form novel exons within the Alu sequences
d Frequently exonized parts by SSCVs in antisense-inserted Alu sequences
The green lines indicate the exonized parts
and the numbers on the right represent the counts observed in this study
the thickness of these green lines corresponds to the frequency
e Pairwise alignment of the Alu reference subsequences (reverse complemented) containing the Alu-antisense donor clusters in the left arm and right arm
It is observed that the 22nd nucleotide corresponds with the 157th
b Source data are provided as a Source Data file
a Source data are provided as a Source Data file
this indicates that our approach can successfully detect known pathogenic SSCVs and also suggests the potential pathogenicity of the other SSCVs
a Sashimi plot for samples with NOTCH1 c.5048-132 G > C (TCGA-A7-A13E
upper) and c.5048-132 G > T (SRR8951275
These mutations were expected to result in a 129 bp exon extension (without any stop codon within it)
leading to the production of a protein with an additional 43 amino acids
b Predicted schematics of the mechanisms for ligand-independent cleavage of NOTCH1 juxtamembrane expansion (JME) induced by the SSCVs
c (left) Sequencing chromatograms of two NOTCH1 DNA derived from single clones of two CRISPR-edited PC-9 cell lines (c.5048-132 G > C and c.5048-132 G > T)
(right) The PCR amplicons spanning NOTCH1 exon 27 and exon 28 show a 129 bp exon extension in clones with the indicated NOTCH1 genotype
‘M’ in the lane stands for the 100 bp marker
d Western blot analysis of the NOTCH intracellular domain (NICD) in CRISPR-edited clones
analysis is also provided on the Jurkat cell line
which is known to have an internal tandem duplication in exon 28
resulting in the insertion of 17 amino acids in the extracellular juxtamembrane domain
e (left) Schema depicting the design of splice-switching ASOs targeting c.5048-132 G > C (ASO1
(right) Images of PCR amplicons spanning NOTCH1 exon 27 and exon 28 generated from the cDNA of CRISPR-edited clones treated with indicated ASOs for two days
f Western blot analysis of the NOTCH intracellular domain (NICD) in CRISPR-edited clones treated with indicated ASOs for three days
All experiments were performed at least twice independently
f Source data are provided as a Source Data file
These findings demonstrate that these SSCVs lead to the activation of NOTCH1
which can be suppressed by splice-switching ASOs
the ability to acquire a catalog of SSCVs through reanalysis of existing transcriptome sequence data is an attractive feature
Particularly because juncmut can be performed on individual transcriptome sequence data
execution on large-scale transcriptome sequences is highly convenient
Our saturation analysis indicates that the continued application of this method will identify an increasing number of SSCVs as more sequence data are incorporated into the repository (Supplementary Fig. 25)
The next important challenge will be to systematically and accurately predict the variants responsible for rare diseases and cancers from a vast list of SSCVs
with loss-of-function and gain-of-function variants intermingling for each gene
we can develop a system that autonomously archives important disease-related variants
some of which are targetable by splice-switching ASOs
We then extracted samples with a base number of ≥ 1 billion to ensure sufficient sequence coverage for reliable mutation detection
We removed run data that could not be downloaded even after repeated attempts (likely due to technical issues)
We also excluded sequence data that had severe issues
such as inconsistencies between the two paired-end files
discrepancies between sequence letters and base qualities
we discarded run data with an extremely high number of SSCVs
attributable to potential DNA contamination and other factors
We utilized the SRA Toolkit version 2.11.0
we executed the ‘prefetch’ command with the ‘--max-size 100000000’ option to download the SRA format file
we used the ‘fasterq-dump’ command with the options ‘-v --split-files’
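The two download steps can be sketched as a thin Python wrapper. This is a minimal sketch, assuming the SRA Toolkit binaries ‘prefetch’ and ‘fasterq-dump’ are on the PATH; the function names are hypothetical, and only the options quoted in the text are passed.

```python
import subprocess
from typing import List


def build_prefetch_cmd(run_id: str) -> List[str]:
    """Command that downloads the SRA format file for one run."""
    return ["prefetch", "--max-size", "100000000", run_id]


def build_fasterq_dump_cmd(run_id: str) -> List[str]:
    """Command that converts the downloaded run to split FASTQ files."""
    return ["fasterq-dump", "-v", "--split-files", run_id]


def download_run(run_id: str) -> None:
    """Run both steps in order; raises CalledProcessError if either fails."""
    for cmd in (build_prefetch_cmd(run_id), build_fasterq_dump_cmd(run_id)):
        subprocess.run(cmd, check=True)
```

Keeping command construction separate from execution makes it easy to log or retry a failed run before deciding to discard it.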
We initiate the identification of aberrant splicing junctions within transcriptome sequence reads (more specifically
A splicing junction is characterized by its chromosomal location
and the end coordinate of the intron within each transcript
we established control panels of splicing junctions
which were consistently observed across multiple samples within specific cohorts
We processed transcriptome sequences from two cohorts
the Cancer Genome Atlas (TCGA) and the Genotype-Tissue Expression (GTEx) Project
we processed 742 transcriptome sequence samples from non-tumorous tissues of cancer patients
We then identified a list of splicing junctions supported by a minimum of two reads across at least four different samples
we analyzed 8656 transcriptome sequences (comprising 479 individuals across 53 different tissues)
We extracted splicing junctions present in any tissue with two or more supporting reads in at least eight individuals
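The control panel rule (a junction is kept when enough samples each support it with enough reads) can be illustrated as follows. This is a sketch under stated assumptions, not the juncmut implementation: the per-sample input format (a mapping from junction to read count) and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# (chromosome, intron start, intron end); this tuple layout is an assumption.
Junction = Tuple[str, int, int]


def build_control_panel(
    samples: Iterable[Dict[Junction, int]],
    min_reads: int = 2,
    min_samples: int = 4,
) -> Set[Junction]:
    """Return junctions supported by >= min_reads in >= min_samples samples."""
    support: Dict[Junction, int] = defaultdict(int)
    for junction_counts in samples:
        for junction, reads in junction_counts.items():
            if reads >= min_reads:
                support[junction] += 1
    return {junction for junction, n in support.items() if n >= min_samples}
```

The defaults mirror the thresholds described for the TCGA cohort. For the GTEx rule, one would first merge tissues per individual (for example, taking the maximum read count per junction across that individual's tissues) and call the function with min_samples=8.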
For each transcriptome sequence aligned by STAR
we extract splicing junctions focusing on those possibly associated with novel splice-site creation by a mutation
where one edge matches within a 5 bp margin to the exon-intron boundary of a known transcript (GENCODE Comprehensive gene annotation Release 31)
and the other edge does not correspond to any known exon-intron boundary (suggesting the formation of a novel splice-site)
we adjust the positions to ensure that one side of the edge perfectly aligns with the exon-intron boundary
The obtained splicing junctions at this stage are referred to as “primary novel splicing junctions (SJs).” For each primary novel SJ
we term the edge of a splicing junction that deviates from the exon-intron annotation as the “primary novel splice-site (SS),” and the edge that aligns with the annotation as the “matching splice-site (SS).”
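The selection of primary novel SJs can be sketched as below. This is a simplified illustration with several assumptions: annotated exon-intron boundaries are given as a set of integer positions on a single chromosome, "does not correspond to any known boundary" is read as "is not exactly an annotated position", and the position adjustment step is omitted. All names are hypothetical.

```python
from typing import Optional, Set, Tuple


def near_boundary(pos: int, boundaries: Set[int], margin: int = 5) -> bool:
    """True if pos lies within `margin` bp of any annotated boundary."""
    return any(abs(pos - b) <= margin for b in boundaries)


def classify_junction(
    start: int, end: int, boundaries: Set[int], margin: int = 5
) -> Optional[Tuple[int, int]]:
    """Return (matching_ss, primary_novel_ss), or None if not a candidate."""
    if near_boundary(start, boundaries, margin) and end not in boundaries:
        return (start, end)
    if near_boundary(end, boundaries, margin) and start not in boundaries:
        return (end, start)
    return None
```

A junction whose two edges are both annotated, or neither of which is close to an annotated boundary, returns None and is not considered further.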
We next remove primary novel SJs according to the following criteria:
Primary novel SJs with fewer than three supporting reads are excluded
Primary novel SJs that are registered in the splicing junction control panels generated in the previous section are excluded
we count the total number of supporting reads of splicing junctions sharing the matching SS
If the supporting read count for primary novel SJ constitutes less than 5% of this total
we count the number of intersecting splicing junctions
If the number of intersecting splicing junctions is five or more
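The removal criteria listed above can be expressed as a single predicate. This is a minimal sketch: the record fields are hypothetical, and it assumes that the last two criteria (a read share below 5%, and five or more intersecting junctions) both lead to exclusion, which the surrounding text implies but does not state in full.

```python
from dataclasses import dataclass


@dataclass
class NovelJunction:
    supporting_reads: int            # reads supporting the primary novel SJ
    in_control_panel: bool           # registered in a control panel
    reads_sharing_matching_ss: int   # total reads over junctions sharing the matching SS
    intersecting_junctions: int      # number of intersecting splicing junctions


def passes_filters(j: NovelJunction) -> bool:
    """Apply the four removal criteria; True means the junction is kept."""
    if j.supporting_reads < 3:
        return False
    if j.in_control_panel:
        return False
    total = j.reads_sharing_matching_ss
    if total > 0 and j.supporting_reads / total < 0.05:
        return False
    if j.intersecting_junctions >= 5:  # assumed to trigger exclusion
        return False
    return True
```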
we perform ‘samtools mpileup’ and search for mutations that can explain the formation of the associated primary novel SS
For splice donor site creation (where the matching SS is an annotated acceptor)
we focus on the two exonic bases (positions -2 and -1) and the six intronic bases (positions +1 to +6)
we restrict our search to mutations that result in
we require that the first two bp of the intron in the primary novel SS be ‘GT’ following the mutation
For splice acceptor site creation (where the matching SS is an annotated donor)
we focus on the six intronic bases (positions -6 to -1) and the one exonic base (position +1) relative to the primary novel SS
we require that the last two bp of the intron in the primary novel SS be ‘AG’ following the mutation
If at least two mismatches corresponding to the relevant mutation are detected in the short reads of transcriptome sequence data
it is set as the candidate for splice-site creating variants (SSCVs) and subjected to validation via the subsequent realignment procedure
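The dinucleotide requirement for donor and acceptor creation can be illustrated with a small check on the search window. This is a sketch under assumptions: windows are given in transcript orientation, the mutation is a single-base substitution addressed by a 0-based index into the window, and the function names are hypothetical.

```python
def apply_snv(window: str, index: int, alt: str) -> str:
    """Return the window with the base at `index` replaced by `alt`."""
    if not 0 <= index < len(window) or len(alt) != 1:
        raise ValueError("index out of range or alt is not a single base")
    return window[:index] + alt + window[index + 1:]


def donor_motif_ok(window: str, index: int, alt: str) -> bool:
    """Window: exonic -2, -1 then intronic +1..+6 (8 bases).

    The first two intronic bases (window[2:4]) must read GT after mutation.
    """
    if len(window) != 8:
        raise ValueError("donor window must contain 8 bases")
    return apply_snv(window, index, alt)[2:4] == "GT"


def acceptor_motif_ok(window: str, index: int, alt: str) -> bool:
    """Window: intronic -6..-1 then exonic +1 (7 bases).

    The last two intronic bases (window[4:6]) must read AG after mutation.
    """
    if len(window) != 7:
        raise ValueError("acceptor window must contain 7 bases")
    return apply_snv(window, index, alt)[4:6] == "AG"
```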
we perform an additional filtering step based on the realignment
For each candidate SSCV and its associated primary novel SJ
we prepare three types of “mini-transcripts”:
Primary novel transcript: extending 10 base pairs of the transcript sequence from both edges of the primary novel SJ
Reference transcripts: extending 10 base pairs of transcript sequences from both edges of the splicing junctions of known transcripts that share the matching SS (thus potentially resulting in multiple mini-transcripts)
Intron retention transcripts: extending 10 base pairs of genome sequences in both directions from the position of the SSCV
if the region of the transcript includes the SSCV
we also supply a version of the transcript with the SSCV mutation inserted
we are able to generate at most six types of mini-transcripts: primary novel transcripts with and without the SSCV
reference transcripts with and without the SSCV
and intron retention transcripts with and without the SSCV
The edit distance of the alignment must be two or less
There should be no mutations within 5 bp from the position of the candidate SSCVs
we choose the mini-transcript with the minimum edit distance
we choose in the following order: reference transcript without the SSCV
intron retention transcript without the SSCV
and primary novel transcript with the SSCV
If at least one read is classified as aligning with the primary novel
reference or intron-retention transcript with the SSCV
then the SSCV is retained as a final output
For each SSCV and its associated primary novel SJ detected by juncmut
we classify the types of splicing consequences
we predict the resulting amino acid changes
the generation of premature termination codons (PTCs)
and assess the susceptibility to nonsense-mediated decay (NMD)
the coordinate of the matching SS (the edge of the splicing junction that matches the exon-intron boundary of known transcripts) is identified from the corresponding primary novel SJ
we extract all transcripts that possess this matching SS within their exon-intron boundaries
we determine the transcript based on the following priorities:
The largest transcript (in the case of a tie
the transcript with the earlier ENST transcript ID is selected)
we identify the exon affected by the SSCV (referred to as the “affected exon”)
the affected exon is defined as the one whose start position is closest to
The end position of this affected exon is termed the “hijacked SS.” Conversely
the affected exon is the one whose end position is closest to
with the start position of this affected exon being the “hijacked SS.”
The splicing consequences of the SSCV are classified as follows:
“Partial exon loss” if the primary novel SS is located within the affected exon
“Cryptic exon” if the primary novel SS is located downstream or upstream (for donor and acceptor creation
and if there is a splicing junction with one edge corresponding to the hijacked SS and the other edge within 300 bp upstream or downstream (for donor and acceptor creation
“Exon extension” if the SSCV is not classified as a “cryptic exon,” and there is a sequence depth of one or greater observed from the hijacked SS to the primary novel SS
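The three-way classification above can be written as an ordered decision. This is a minimal sketch: the inputs are reduced to precomputed flags with hypothetical names, and the exact geometry of the 300 bp search is left to the caller because the text does not fully specify it here.

```python
from typing import Optional


def classify_consequence(
    novel_ss_in_affected_exon: bool,
    has_junction_from_hijacked_ss: bool,
    min_depth_between_sites: int,
) -> Optional[str]:
    """Classify an SSCV's splicing consequence.

    has_junction_from_hijacked_ss: a junction exists with one edge at the
        hijacked SS and the other within 300 bp (computed by the caller).
    min_depth_between_sites: lowest sequence depth observed from the
        hijacked SS to the primary novel SS.
    """
    if novel_ss_in_affected_exon:
        return "partial exon loss"
    if has_junction_from_hijacked_ss:
        return "cryptic exon"
    if min_depth_between_sites >= 1:
        return "exon extension"
    return None  # none of the three categories applies
```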
For SSCVs predicted to result in “partial exon loss,” “cryptic exon,” or “exon extension,” we investigate the consequent protein changes
determine whether they are in-frame or not
we verify whether the primary novel SJ is completely contained within the 5′UTR or the 3′UTR
and we exclude those scenarios from further analysis
We also ignore the cases where SSCVs cause skipping of the start or stop codon
we performed ‘samtools mpileup’ directly on the corresponding CRAM file stored in Amazon Web Services (s3://1000genomes/1000G_2504_high_coverage/data/)
We decided that the SSCV predicted from RNAseq is a genuine genomic mutation (although the effect on splicing is uncertain at this point) if more than two reads support the base corresponding to the SSCV
and the proportion of these supporting reads exceeds 5% of all reads covering that position
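The genomic support rule can be checked with a one-line predicate. This is a sketch assuming the read counts have already been extracted from the pileup; the function name is hypothetical.

```python
def is_supported_in_genome(alt_reads: int, total_reads: int) -> bool:
    """True if more than two reads carry the SSCV base and they exceed 5% of coverage."""
    if total_reads <= 0:
        return False
    return alt_reads > 2 and alt_reads / total_reads > 0.05
```

Both thresholds are strict inequalities, so exactly two supporting reads, or a share of exactly 5%, does not count as support.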
We utilized GTEx data to verify whether SSCVs identified by juncmut in the 1000 Genomes Project transcriptome truly lead to significant splicing changes
We downloaded the GTEx transcriptome sequence data from the Sequence Read Archive and aligned them following the juncmut workflow
we downloaded GTEx V7 whole genome genotype calls
we counted the number of supporting reads for the corresponding primary novel SJ and hijacked SJ (splice junction connecting the matching SS and the hijacked SS in the reference transcript)
and we calculated the ratio of the primary novel SJ (#primary novel SJ / (#primary novel SJ + #hijacked SJ)) by parsing the SJ.out.tab file
we calculated the p-value that measures the difference in the ratio of the primary novel SJ between samples with and without the SSCV using a one-sided Wilcoxon rank-sum test with the wilcox.test function in the R language
We integrated these p-values using Fisher’s method across tissues
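The junction ratio and the cross-tissue combination can be sketched in plain Python. This is an illustration rather than the authors' code (the source uses R for the rank-sum test, which is not reproduced here). Fisher's method uses the statistic X = -2 Σ ln p_i, which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis; for even degrees of freedom the tail probability has the closed form exp(-X/2) Σ_{i=0}^{k-1} (X/2)^i / i!, so no statistics library is needed. Function names are hypothetical.

```python
import math
from typing import Sequence


def novel_sj_ratio(novel_reads: int, hijacked_reads: int) -> float:
    """#primary novel SJ / (#primary novel SJ + #hijacked SJ)."""
    total = novel_reads + hijacked_reads
    if total == 0:
        raise ValueError("no reads support either junction")
    return novel_reads / total


def fisher_combined_p(p_values: Sequence[float]) -> float:
    """Combine independent p-values (one per tissue) with Fisher's method."""
    k = len(p_values)
    if k == 0:
        raise ValueError("at least one p-value is required")
    if any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    half_stat = -sum(math.log(p) for p in p_values)  # X / 2
    term = 1.0
    series = 1.0
    for i in range(1, k):
        term *= half_stat / i  # (X/2)^i / i!
        series += term
    return math.exp(-half_stat) * series
```

With a single tissue the combined value equals the input p-value, which is a quick sanity check on the formula.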
We also evaluated variants predicted to cause splice-site activation via SpliceAI as a comparison to juncmut
we downloaded VCF files from the 1000 Genomes Project from s3://1000genomes/1000G_2504_high_coverage/working/20201028_3202_phased/
we added the SpliceAI score using the precomputed file for all SNVs and 1 base insertions
we extracted SNVs with allele frequencies ≤ 0.01 satisfying the following criteria:
SNVs possessed by any of the 445 individuals whose matched transcriptome sequence data are available
SNVs where SpliceAI Delta score for acceptor or donor gain (DS_AG or DS_DG) is equal to or above 0.1
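The SNV selection for the SpliceAI comparison can be sketched as a filter over annotated records. This is a minimal sketch assuming the allele frequency and Delta scores have already been parsed from the VCF; the record type and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Snv:
    allele_frequency: float
    ds_ag: float                  # SpliceAI Delta score, acceptor gain
    ds_dg: float                  # SpliceAI Delta score, donor gain
    carried_by_rna_sample: bool   # present in an individual with matched RNA-seq


def keep_snv(v: Snv) -> bool:
    """Rare SNV, carried by a sample with RNA-seq, with DS_AG or DS_DG >= 0.1."""
    return (
        v.allele_frequency <= 0.01
        and v.carried_by_rna_sample
        and max(v.ds_ag, v.ds_dg) >= 0.1
    )
```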
we identified a novel splice-site (corresponding to primary novel SS in the juncmut) using the information on Delta positions (DP_AG or DP_DG) provided by the SpliceAI annotation
based on GENCODE Basic annotation (Release 39)
identified the matching SS and hijacked SS
and obtained the corresponding hijacked SJ
for variants that were called in at least one GTEx sample
we calculated the combined p-values across tissues as above
To ensure a fair comparison between juncmut and SAVNet
we excluded variants identified by SAVNet if (1) the corresponding splicing junction is included in the control panel constructed from the GTEx transcriptome
(2) the support read count for the splicing junction is two or fewer
or (3) the proportion of the splicing junction is less than 0.05
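The three exclusion conditions applied to SAVNet variants can be combined into one predicate. This is a sketch with hypothetical argument names, assuming the control-panel membership, read count and junction proportion are computed beforehand.

```python
def exclude_savnet_variant(
    in_gtex_control_panel: bool,
    supporting_reads: int,
    junction_proportion: float,
) -> bool:
    """True if a SAVNet variant should be dropped before the comparison."""
    return (
        in_gtex_control_panel          # condition (1)
        or supporting_reads <= 2       # condition (2)
        or junction_proportion < 0.05  # condition (3)
    )
```

These conditions mirror the juncmut filters (control panel, at least three reads, at least 5% share), which is what makes the comparison like-for-like.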
we restricted to those splicing-associated variants in SAVNet that exhibit a pattern of substitution within novel splice motifs
we assessed the overlap of the variants identified by SAVNet and juncmut
we excluded splicing-associated variants detected by SAVNet
as was done in the comparison using 1000 Genomes Project data
we confined our analysis to SSCVs classified as “somatic,” aligning with SAVNet
which exclusively targets somatic variants
we convert the coordinates of the SSCV position
and the secondary SS (for SSCVs resulting in cryptic exons) in the human reference genome to the coordinate system in the reference Alu sequence
Japan) in 2005 and authenticated in 2022 using the Promega GenePrint 10 System (BEX)
Leukemia Jurkat cell line was purchased from RIKEN BioResource Research Center in 2022
PC-9 and Jurkat cells were grown in RPMI1640 (Gibco) with 10% FBS (Gibco) and 1% penicillin/streptomycin (Wako)
Lenti-X 293T cells were purchased from Takara in 2021 and cultured in DMEM (Gibco) with 10% FBS and 1% penicillin/streptomycin
All cell lines tested negative for mycoplasma using the Mycoplasma Plus PCR Primer Set (Agilent)
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article
The list of splice-site creating variants is accessible through SSCV DB (https://sscvdb.io) and Zenodo (https://doi.org/10.5281/zenodo.14053979). Source data are provided with this paper
The workflow of juncmut is available at GitHub (https://github.com/ncc-gap/juncmut). The code of the version used in this study is available at Zenodo (https://doi.org/10.5281/zenodo.14011414)
The authors thank Erika Kawasaki and Rika Murakami (Division of Molecular Pathology
National Cancer Center Research Institute) for technical assistance
These authors contributed equally: Naoko Iida
Division of Genome Analysis Platform Development
developed the software for detecting splice-site creating variants
developed a platform for analyzing massive transcriptome sequence data deposited in the Sequence Read Archive
organized and interpreted the list of SSCVs
including the development of model systems using CRISPR editing
and splice-switching antisense oligonucleotide administration
provided computational assistance across various aspects of the project
The authors declare no competing interests
who co-reviewed with Sirui Zhang; and the other
reviewers for their contribution to the peer review of this work
DOI: https://doi.org/10.1038/s41467-024-55185-y
The two companies plan to work together on new products that "blend Spitfire Audio’s cinematic soundscapes and orchestral expertise with Splice’s sample catalog and AI-powered discovery engine"
Music creation platform Splice has acquired Spitfire Audio, the British sample library and virtual instrument developer, for around $50 million, according to reports in The Financial Times
A press release shared by Splice states that the two companies are planning to work together on developing new products that blend Spitfire Audio's "cinematic soundscapes and orchestral expertise" with Splice's sample catalogue and AI-powered discovery engine
Spitfire Audio is a UK-based independent company known for producing high-end orchestral sample libraries and virtual instruments
Founded in 2007 by composers Paul Thomson and Christian Henson
the company's software tools are popular with composers and producers working in film and television
Splice has confirmed that both companies will continue to operate independently "in the near term"
Olivier Robert-Murphy will remain in place as Spitfire Audio's CEO
while Paul Thomson will continue to oversee creative direction for the company
“The teams at Spitfire Audio and Splice have deep respect for composers
musicians and producers and are committed to celebrating and supporting their work”
Splice CEO Kakul Srivastava said in a statement
Our shared vision is to develop tools that expand - not replace - human creativity
With Spitfire’s expressive instruments and Splice’s AI-powered platform
we’re just beginning to explore what’s possible.”
Thomson reassured viewers that Spitfire Audio will continue releasing perpetual-license sample libraries and supporting its existing products
allaying fears that the company could transition entirely to a Splice-style subscription model
Reactions to the video have not been wholly positive
"As an owner of a significant investment in Spitfire Audio sample libraries
I don’t know what to believe about future stability."
“Most musicians do not want to make music that way
but AI will enable [artists] to do things they could not do today,” she said
"They could use string quartets from Spitfire
but you might want to invent your own instrument
You can start with a particular sound and merge instruments together to get a novel sound that has never been heard before..
Read our 2024 interview with Splice CEO Kakul Srivastava.
Matt Mullen, Tech Editor. I'm MusicRadar's Tech Editor
working across everything from product news and gear-focused features to artist interviews and tech tutorials
I love electronic music and I'm perpetually fascinated by the tools we use to make it
you'll probably find me behind a MIDI keyboard
carefully crafting the beginnings of another project that I'll ultimately abandon to the creative graveyard that is my overstuffed hard drive
Wesleyan students, faculty, and staff can RSVP on WesNest
The SPLICE Ensemble features Keith Kirchoff on piano
Focused on cultivating a canon of electroacoustic chamber music
the group has previously premiered works by students of Professor of Music and Director of Graduate Studies Paula Matthusen in 2020
and has also performed Matthusen's works including site-specific recordings in Mammoth Cave in Kentucky. Featuring works by graduate music students Lea Bertucci
and Carl Testa ’06. The evening also features a sound installation by Sam Boston ’25
Music creation platform Splice has acquired Spitfire Audio
the UK-based developer of high-end virtual instrument libraries.
The acquisition marks Splice’s entry into the fast-growing plugin space
adding to the company’s Splice Sounds subscriptions and rent-to-own businesses
The plugin market alone is valued at $640 million
while the wider music software and services sector exceeds $7 billion
Since launching its Splice Sounds platform in 2015
Splice has become a key player in modern music production
One million sounds are downloaded every day from its sample catalogue
Splice has more than 10 million music producers and creators using its ethical AI-powered platform.
Founded in 2007, Spitfire Audio has become an established platform for composers, producers, artists and musicians. The British company provides virtual instrument libraries, including recordings by Hans Zimmer, Olafur Arnalds
“The teams at Spitfire Audio and Splice have deep respect for composers, musicians and producers and are committed to celebrating and supporting their work”, said Kakul Srivastava, CEO of Splice
“Our shared vision is to develop tools that expand – not replace – human creativity,” Srivastava added
“With Spitfire’s expressive instruments and Splice’s AI-powered platform
The companies are set to start work on new products that blend Spitfire Audio’s cinematic soundscapes and orchestral expertise with Splice’s sample catalogue and AI-powered discovery engine.
“We’ve always focused on inspiring people to create extraordinary music,” said Paul Thomson, who co-founded Spitfire Audio with Christian Henson
The combined company is well positioned to capitalise on growth in the music creation market, which is projected to nearly double to $14 billion by 2031, according to MIDiA Research.
“Splice has already built an incredible business,” added Olivier Robert-Murphy
“Joining forces means Spitfire Audio’s sounds will find new homes in studios around the world—whether that’s a bedroom producer or a blockbuster composer.”
Both Splice and Spitfire Audio will continue to operate independently in the near term
Robert-Murphy will remain CEO of Spitfire Audio
while Thomson will continue to oversee Spitfire Audio’s creative direction
PHOTO: (L-R) Paul Thomson and Kakul Srivastava (photo by Matthew Johnson)
a group company of Sumitomo Electric Industries
announced that it was named an honoree in the 2025 Lightwave+BTR Innovation Reviews for Lynx-CustomFit™ Splice-On Connectors.*1 Lynx-CustomFit™ was recognized as an innovative product that excels in both error-free assembly and reliability
earning the highest score in the Optical Components category
the largest global conference and exhibition for optical communications and networking professionals
Our shared vision is to develop tools that expand – not replace – human creativity.”
Credit: Matthew Johnson
the company says its decision to purchase Spitfire comes as the plugin market is valued at $640 million
and the wider music software and services sector “exceeds $7 billion”
Described as the “leading platform for music creation”
Splice hosts a sample library with thousands of royalty-free sounds
and a growing suite of AI tools to help creators “unlock inspiration
experiment with sound and generate unique compositions”
offers a collection of virtual instrument libraries
including collections made in collaboration with Hans Zimmer
“The teams at Spitfire Audio and Splice have deep respect for composers
musicians and producers and are committed to celebrating and supporting their work”
Our shared vision is to develop tools that expand – not replace – human creativity
Splice and Spitfire are planning to release new products which “blend Spitfire Audio’s cinematic soundscapes and orchestral expertise with Splice’s sample catalogue and AI-powered discovery engine”
“We’ve always focused on inspiring people to create extraordinary music,” says Paul Thomson
The music creation market is reportedly projected to nearly double to $14 billion by 2031 [per MIDiA Research]
Splice hopes to position itself to lead the market
“Splice has already built an incredible business,” added Olivier Robert-Murphy
“Joining forces means Spitfire Audio’s sounds will find new homes in studios around the world – whether that’s a bedroom producer or a blockbuster composer.”
both Splice and Spitfire Audio will continue to operate independently for the time being
Olivier Robert-Murphy will remain as Spitfire CEO
while Paul Thomson will continue to oversee Spitfire Audio’s creative direction
To read some FAQs surrounding the acquisition, head to Spitfire Audio.
RNA splicing is a cellular process that is critical for gene expression
After genes are copied from DNA into messenger RNA
portions of the RNA that don’t code for proteins
are cut out and the coding portions are spliced back together
This process is controlled by a large protein-RNA complex called the spliceosome
MIT biologists have now discovered a new layer of regulation that helps to determine which sites on the messenger RNA molecule the spliceosome will target
The research team discovered that this type of regulation
which appears to influence the expression of about half of all human genes
The findings suggest that the control of RNA splicing
a process that is fundamental to gene expression
is more complicated than it is in some model organisms like yeast
even though it’s a very conserved molecular process
There are bells and whistles on the human spliceosome that allow it to process specific introns more efficiently
One of the advantages of a system like this may be that it allows more complex types of gene regulation,” says Connor Kenny
an MIT graduate student and the lead author of the study
Christopher Burge, the Uncas and Helen Whitaker Professor of Biology at MIT, is the senior author of the study, which appears today in Nature Communications
allows cells to precisely control the content of the mRNA transcripts that carry the instructions for building proteins
Each mRNA transcript contains coding regions
They also include sites that act as signals for where splicing should occur
allowing the cell to assemble the correct sequence for a desired protein
This process enables a single gene to produce multiple proteins; over evolutionary timescales
splicing can also change the size and content of genes and proteins
when different exons become included or excluded
is composed of proteins and noncoding RNAs called small nuclear RNAs (snRNAs)
an snRNA molecule known as U1 snRNA binds to the 5’ splice site at the beginning of the intron
it had been thought that the binding strength between the 5’ splice site and the U1 snRNA was the most important determinant of whether an intron would be spliced out of the mRNA transcript
the MIT team discovered that a family of proteins called LUC7 also helps to determine whether splicing will occur
but only for a subset of introns — in human cells
it was known that LUC7 proteins associate with U1 snRNA
There are three different LUC7 proteins in human cells
and Kenny’s experiments revealed that two of these proteins interact specifically with one type of 5’ splice site
which the researchers called “right-handed.” A third human LUC7 protein interacts with a different type
The researchers found that about half of human introns contain a right- or left-handed site
while the other half do not appear to be controlled by interaction with LUC7 proteins
This type of control appears to add another layer of regulation that helps remove specific introns more efficiently
“The paper shows that these two different 5’ splice site subclasses exist and can be regulated independently of one another,” Kenny says
“Some of these core splicing processes are actually more complex than we previously appreciated
which warrants more careful examination of what we believe to be true about these highly conserved molecular processes.”
Previous work has shown that mutation or deletion of one of the LUC7 proteins that bind to right-handed splice sites is linked to blood cancers
including about 10 percent of acute myeloid leukemias (AMLs)
the researchers found that AMLs that lost a copy of the LUC7L2 gene have inefficient splicing of right-handed splice sites
These cancers also developed the same type of altered metabolism seen in earlier work
“Understanding how the loss of this LUC7 protein in some AMLs alters splicing could help in the design of therapies that exploit these splicing differences to treat AML,” Burge says
“There are also small molecule drugs for other diseases such as spinal muscular atrophy that stabilize the interaction between U1 snRNA and specific 5’ splice sites
So the knowledge that particular LUC7 proteins influence these interactions at specific splice sites could aid in improving the specificity of this class of small molecules.”
Working with a lab led by Sascha Laubinger
a professor at Martin Luther University Halle-Wittenberg
the researchers found that introns in plants also have right- and left-handed 5’ splice sites that are regulated by Luc7 proteins
The researchers’ analysis suggests that this type of splicing arose in a common ancestor of plants
but it was lost from fungi soon after they diverged from plants and animals
“A lot of what we know about how splicing works and what are the core components actually comes from relatively old yeast genetics work,” Kenny says
“What we see is that humans and plants tend to have more complex splicing machinery
with additional components that can regulate different introns independently.”
The researchers now plan to further analyze the structures formed by the interactions of Luc7 proteins with mRNA and the rest of the spliceosome
which could help them figure out in more detail how different forms of Luc7 bind to different 5’ splice sites
National Institutes of Health and the German Research Foundation
A Publisher Correction to this article was published on 18 December 2024
This article has been updated
Mutations that affect RNA splicing significantly impact human diversity and disease
Here we present a method using transformers
to detect splicing from raw 45,000-nucleotide sequences
We generate embeddings with residual neural networks and apply hard attention to select splice site candidates
enabling efficient training on long sequences
in detecting splice sites in GENCODE and ENSEMBL annotations
Using extensive RNA sequencing data from an Icelandic cohort of 17,848 individuals and the Genotype-Tissue Expression (GTEx) project
our method demonstrates superior performance in detecting splice junctions compared to SpliceAI-10k (PR-AUC = 0.834 vs
PR-AUC = 0.820) and is more effective at identifying disease-related splice variants in ClinVar (PR-AUC = 0.997 vs
These advancements hold promise for improving genetic research and clinical diagnostics
potentially leading to better understanding and treatment of splicing-related diseases
it is difficult to scale them to long sequence context
This is because self-attention scales quadratically with sequence length
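To make the quadratic scaling concrete, the rough sketch below counts the pairwise scores in a single self-attention matrix for a full 45,000-nucleotide input and for a reduced set of 512 candidate positions (both figures appear elsewhere in the text). The function name is illustrative, and the count ignores heads, layers and batch size.

```python
def attention_entries(seq_len: int) -> int:
    """Number of pairwise scores in one self-attention matrix."""
    return seq_len * seq_len


# Attending over every nucleotide of a 45,000-nt input.
full = attention_entries(45_000)
# Attending only over 512 selected candidate positions.
selected = attention_entries(512)
# How many times larger the full attention matrix is.
ratio = full / selected

print(full, selected, round(ratio))
```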
a For each position in an input DNA sequence
the method looks at the surrounding context region and outputs a predicted score for three options: no splicing
b Comparison of Transformer-45k with SpliceAI-10k on both ENSEMBL and GENCODE annotations with regard to area under the precision-recall curve (PR-AUC) and top-k accuracy
95% confidence intervals (CIs) are shown in brackets
N denotes the number of splice sites in the test set
the total size of the ENSEMBL test set is 664,940,000 nt
c Receiver operating characteristic (ROC) curve and precision-recall curve for cases where SpliceAI and Transformer-45k disagree (TVD ≥0.1)
d The total number of false positive and true positive splice sites as a function of the decision threshold for cases where SpliceAI and Transformer-45k disagree (TVD ≥0.1)
We can also look at the top-k decision thresholds to look at the agreement between predicted splice sites
here we see that the models agree on 175,825 splice sites and 664,757,322 non-splice sites
and they disagree on 3599 splice sites and 3254 non-splice sites
For cases where the predictions disagree
Transformer-45k has 0.609 accuracy (4172 correct sites) and SpliceAI-10k has 0.391 accuracy (2681 correct sites)
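The reported accuracies on the disagreement set can be re-derived from the counts above. This is plain arithmetic on the published numbers; the variable names are illustrative.

```python
# Positions where the two models disagree: splice sites + non-splice sites.
disagreements = 3599 + 3254

transformer_correct = 4172
spliceai_correct = 2681

# On a disagreement exactly one of the two models is right,
# so the correct counts should add up to the total.
assert transformer_correct + spliceai_correct == disagreements

transformer_acc = transformer_correct / disagreements
spliceai_acc = spliceai_correct / disagreements
print(f"{transformer_acc:.3f} {spliceai_acc:.3f}")  # prints "0.609 0.391"
```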
The predictions are mostly in agreement
except SpliceAI-10k does not detect the acceptor for the final exon
our method detects 98.5% of junctions annotated in ENSEMBL and 71.8% of unannotated junctions
while SpliceAI-10k with pre-trained weights detects 96.6% of the annotated junctions and 53.8% of unannotated junctions
Transformer-45k fine-tuned only on GTEx splice site annotations detects 98.1% of junctions annotated in ENSEMBL and 67.6% of unannotated junctions
a PR-AUC plotted against maximum distance from an sQTL to the closest splice site annotation
b Precision-recall curve for sQTLs determined to be splice-disrupting or splice-creating
c Precision-recall curve for 35,464 pathogenic splice variants in ClinVar
d A scatter plot showing the distribution of delta scores for non-splicing variants (n = 40,528)
and pathogenic splice variants (n = 35,464)
This variability suggests a more complex prediction landscape for benign splice variants
we have looked at predicting splice sites with transformers and shown that they can learn to utilize long sequence contexts to predict splicing with better classification accuracy than the current best splice site prediction methods in the literature
We tested our method on splice site annotations from ENSEMBL and GENCODE and showed that it was able to predict splicing with greater accuracy than SpliceAI-10k
both with regard to PR-AUC and top-k accuracy
Focusing on the splice site predictions where our method disagreed with SpliceAI-10k
we saw that our method makes fewer false positive predictions while making about as many true positive predictions
By providing the transformer with a list of 512 potential splice sites
we enable it to produce more accurate predictions than those achieved with SpliceAI alone
This improvement may be attributed to the model’s ability to learn the dependencies between splice sites over a larger sequence context
supporting the hypothesis that longer raw sequences are beneficial for capturing splice site interactions
When classifying unannotated splice junctions and splice variants
we found that fine-tuning the model on RNA-Seq data was necessary to achieve better performance than SpliceAI-10k
This is likely due to our training set only consisting of protein-coding ENSEMBL transcripts
Genes can have multiple transcript annotations and splice sites observed in RNA-Seq can come from any one of these transcripts
The ENSEMBL annotations can be combined into a single gene annotation and this could improve performance for detecting splice sites in RNA-Seq data
and many splice sites observed in RNA-Seq would still be missing from the annotations
We conducted an additional experiment in Fig. 1b to determine whether the observed performance improvement can be attributed to the transformer architecture or to the increased sequence context
We trained a transformer model using a 10kb context and compared its performance to both SpliceAI and our 45kb context transformer model
The 10kb context transformer model outperformed SpliceAI
confirming that the transformer architecture contributes to more accurate predictions
the 45kb context transformer model achieved the highest performance
highlighting that an extended sequence context is a significant factor in improving model accuracy
our method can be trained on larger contexts than 45,000 nt
and there is no reason to assume that increasing the context further will not be beneficial
The same applies to the number of selected splice sites and parameters such as depth and number of heads
these models may need more training data or longer training to show any improvements over our current method
A known issue with policy gradient methods is their tendency to exhibit high gradient variance
this can slow down convergence or prevent the model from reaching optimal policies
our model quickly learned a policy that selected almost all annotated splice sites
reducing gradient variance could potentially further refine the policy and improve model performance
we designed a splice site prediction method that utilizes transformers and showed that they can significantly improve the state-of-the-art
Our method utilizes hard attention to reduce pre-mRNA sequences to a set of potential splice sites
that have a manageable length for transformers to learn long-range dependencies between splice sites
The model is trained on a context about four times larger than that of SpliceAI and Nucleotide Transformer v2
we showed that the Transformer-45k makes fewer false positive predictions than SpliceAI while predicting about as many true positives
that Transformer-45k primarily attends to other annotated splice sites when performing splice site predictions
when our method is fine-tuned on RNA-Seq data from a large Icelandic cohort and GTEx V8
it detects more unannotated splice junctions
and pathogenic splice variants than SpliceAI
Only protein-coding transcripts with one or more splice junctions were used and transcripts on chromosomes 1
In ENSEMBL we only selected transcripts with support level 1
This resulted in an ENSEMBL training set that has 22,375 transcripts and a GENCODE training set with 13,384 transcripts
The corresponding ENSEMBL test set includes 8955 transcripts and the GENCODE test set includes 1652
Before training we removed 10% of transcripts from the training set and placed them in a validation set
in the ENSEMBL-based annotations 21,432 splice junctions were selected into this set
Nucleotides were one-hot encoded as A = [1,0,0,0]
The labels were encoded as ’no splicing’ = [1,0,0]
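A minimal sketch of this encoding follows. Only A = [1,0,0,0] and 'no splicing' = [1,0,0] are given in the text; the order of the remaining nucleotides and labels, and the all-zero vector for unknown bases, are assumptions made for illustration.

```python
# Only the "A" row is stated in the text; the C, G, T order is assumed.
NUCLEOTIDES = {
    "A": [1, 0, 0, 0],
    "C": [0, 1, 0, 0],
    "G": [0, 0, 1, 0],
    "T": [0, 0, 0, 1],
}

# Only "no splicing" is stated in the text; the acceptor/donor order is assumed.
LABELS = {
    "no splicing": [1, 0, 0],
    "acceptor": [0, 1, 0],
    "donor": [0, 0, 1],
}


def one_hot(seq: str) -> list:
    """Encode a nucleotide string; unknown bases map to all zeros (assumption)."""
    return [NUCLEOTIDES.get(base, [0, 0, 0, 0]) for base in seq.upper()]
```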
Nucleotide sequences are stored in sparse arrays split by chromosome
where nucleotides outside of genes are stored as zeros
The array indices correspond to nucleotide chromosome position and to deal with negative-strand genes we reverse complement the nucleotide sequences on the fly
This allows us to easily change the context and sequence length without needing to write a copy of the sequence to disk
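The on-the-fly handling of negative-strand genes might look roughly like the sketch below. It uses a plain Python string in place of the sparse per-chromosome arrays, and `fetch` is a hypothetical helper, not a function from the authors' code.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def fetch(chrom_seq: str, start: int, end: int, strand: str) -> str:
    """Slice a window by chromosome position; flip it for negative-strand genes."""
    window = chrom_seq[start:end]
    return reverse_complement(window) if strand == "-" else window
```

Because the window is cut at read time, the context or sequence length can be changed by passing different `start` and `end` values, with no second copy of the sequence stored.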
The proposed method consists of three main parts
All three parts of the model are trained simultaneously from scratch and optimized with the following loss function:
This combined loss function is designed to simultaneously quantify the models' proficiency at splice site classification (Cross-entropy loss) and the selector modules' ability to select relevant splice sites (Policy loss)
The Policy loss is scaled by a factor λ and during training
To train the model we use a 2D cross-entropy loss:
where N is the length of the sequence context
yi,j is a one-hot encoded splice site label (‘no splice’
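As a hedged illustration of how the two loss terms could be combined, the sketch below computes a cross-entropy over positions and classes and adds a policy loss scaled by λ. The averaging over the N positions and the simple additive form are assumptions; the exact normalisation and the schedule for λ are not recoverable from the text here.

```python
import math


def cross_entropy(labels, probs, eps=1e-12):
    """Mean over positions i of -sum_j y[i][j] * log(p[i][j]).

    labels: one-hot rows (one per position); probs: predicted class
    probabilities with the same shape. Averaging over N is an assumption.
    """
    total = 0.0
    for y_i, p_i in zip(labels, probs):
        total -= sum(y * math.log(p + eps) for y, p in zip(y_i, p_i))
    return total / len(labels)


def combined_loss(ce_loss: float, policy_loss: float, lam: float) -> float:
    """Classification loss plus the policy loss scaled by lambda (assumed additive)."""
    return ce_loss + lam * policy_loss
```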
The encoder module takes a pre-mRNA sequence as input and maps each position in the sequence to a 32-dimensional (32D) feature space based on its context. Nucleotides in the input sequence are one-hot encoded and mapped using a CNN that has the same architecture as SpliceAI-10k (Fig. 1a)
the module can learn to encode information for each sequence position from its surrounding 10k context
We base the encoder architecture on SpliceAI since it has been thoroughly tested and shown to be effective at splice site prediction
This allows us to focus on designing other parts of the model
where the policy πθ is the probability of taking action \({a}_{s}^{t}\) at step t and trajectory s
given an embedding Xs and previous actions \({C}_{s}^{t-1}\)
\({C}_{s}^{t-1}\) is an indicator vector that masks out previous actions and prevents the policy from selecting the same splice site twice
S is the total number of trajectories and \({R}_{s}^{t}\) is the reward
we found that using one trajectory for each sequence was enough to achieve stable training
We want the module to select the annotated splice sites and also select promising functional splice sites that are not in the annotations
annotated splice sites receive reward \({R}_{s}^{t}=1\) and other sites \({R}_{s}^{t}=0\)
This ensures that the policy is not penalized for selecting non-splice sites
An exception is made if the acceptor selector selects an annotated donor
to discourage the selector from selecting the wrong splice type
here a \({R}_{s}^{t}=-1\) penalty is given
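The reward scheme can be summarised in a few lines. The text only states the penalty for the acceptor selector picking an annotated donor; applying the same penalty symmetrically to the donor selector is an assumption of this sketch, as are the string labels.

```python
def reward(selector: str, site_label: str) -> int:
    """Reward for one selection step.

    selector: "acceptor" or "donor" (which selector made the pick).
    site_label: "acceptor", "donor", or "none" (annotation at the picked site).
    """
    if site_label == "none":
        return 0   # non-splice sites are not penalised
    if site_label == selector:
        return 1   # annotated site of the matching type
    return -1      # annotated site of the wrong type
```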
The policy is parameterized by a fully connected feed-forward network with 32D vector input
one hidden layer with four units and a leaky ReLU activation
The policy network learns to take embeddings from the encoder module as inputs and returns acceptor and donor site logits as outputs
During training these logits are used to parameterize two categorical distributions
The policy alternates between sampling acceptor and donor sites from the distributions
until it has selected 512 potential splice sites
we simply select the acceptors and donors with the largest logits
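The sampling procedure could be sketched as follows, in pure Python. It alternates between the acceptor and donor distributions, masks positions that were already picked, and falls back to the largest logits when not training. Sharing one mask between the two selectors and starting with the acceptor are assumptions; `sample_sites` and `greedy_sites` are illustrative names.

```python
import math
import random


def sample_sites(acc_logits, don_logits, n_sites, rng):
    """Training-time selection: alternate acceptor/donor sampling without repeats.

    Requires n_sites <= len(acc_logits) == len(don_logits).
    """
    chosen = set()  # plays the role of the indicator mask over previous actions
    picks = []
    for t in range(n_sites):
        kind, logits = ("acceptor", acc_logits) if t % 2 == 0 else ("donor", don_logits)
        free = [i for i in range(len(logits)) if i not in chosen]
        top = max(logits[i] for i in free)
        # Unnormalised softmax weights over the positions still available.
        weights = [math.exp(logits[i] - top) for i in free]
        idx = rng.choices(free, weights=weights, k=1)[0]
        chosen.add(idx)
        picks.append((kind, idx))
    return picks


def greedy_sites(acc_logits, don_logits, n_each):
    """Inference-time selection: take the positions with the largest logits."""
    def top(logits):
        return sorted(range(len(logits)), key=lambda i: -logits[i])[:n_each]
    return top(acc_logits), top(don_logits)
```

In the paper's setting `n_sites` would be 512; a small value is enough to try the sketch, for example `sample_sites([0.0] * 6, [0.0] * 6, 4, random.Random(0))`.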
The output of the transformer module is finally sent to the prediction head
This is a convolutional layer with kernel size one and a softmax activation function
it maps 32D feature maps down to three feature maps
where the three possible outputs correspond to ‘no splice’
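A convolution with kernel size one applies the same linear map independently at every position, so the prediction head can be illustrated without a deep-learning framework. The sketch below is a pure-Python stand-in under that reading; the weight and bias values are placeholders, not trained parameters.

```python
import math


def softmax(values):
    """Numerically stable softmax over a short list."""
    top = max(values)
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def prediction_head(features, weight, bias):
    """Map each 32-dim position vector to three class probabilities.

    features: list of 32-dim vectors (one per position)
    weight:   3 x 32 matrix, bias: 3 values (placeholders here)
    """
    out = []
    for x in features:
        logits = [sum(w * xi for w, xi in zip(row, x)) + b
                  for row, b in zip(weight, bias)]
        out.append(softmax(logits))
    return out
```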
All models were trained for 10 epochs with the AdamW optimizer44 and with 96 samples per batch
We used linear warm-up for the first 1000 optimization steps
After five epochs the learning rate was reduced by half each epoch
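One possible reading of this schedule is sketched below. The base learning rate is not given in the surrounding text, so it is left as a parameter, and treating "after five epochs" as "from the sixth epoch onward, halve once per epoch" is an assumption.

```python
def learning_rate(step, epochs_done, base_lr, warmup_steps=1000, decay_after=5):
    """Linear warm-up, then constant, then halved each epoch.

    step:        global optimisation step (0-based)
    epochs_done: number of fully completed epochs
    base_lr:     peak learning rate (not stated in the text; caller supplies it)
    """
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    # No halving during the first five epochs, one halving in the sixth, and so on.
    halvings = max(0, epochs_done - (decay_after - 1))
    return base_lr * 0.5 ** halvings
```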
The model weights were randomly initialized ten times and trained
Training the model for ten epochs with 3 NVIDIA A100 GPUs takes about 9 hours
SpliceAI-10k was retrained on data and code made available by Jaganathan et al.8
The original model was implemented using Keras (version 2.0.5) with a TensorFlow backend and was trained on GENCODE annotations constructed by the authors
we implemented the model using PyTorch and constructed a training set using ENSEMBL annotations
The reported results for the methods trained on ENSEMBL are the average predictions of ten models
To fine-tune the models on data from the Icelandic RNA-Seq cohort and GTEx V8
weights from the ENSEMBL dataset training runs were used as a starting point and trained for four additional epochs on splice sites obtained from RNA-Seq
During fine-tuning all weights were kept trainable and the learning rate was set to 2e−4
The RNA-Seq data from the Icelandic cohort consist of 17,848 samples drawn from blood
from the same number of individuals (9784 females
8064 males) collected using Illumina NovaSeq and HiSeq machines with read length 2 × 125 and poly-A mRNA isolation
These samples were aligned separately to the maternal and paternal inherited genome references using STAR v2.5.3a
we transferred the alignment files (BAM) to GRCh38 reference space (updating CIGAR and POS fields)
merged the two files into a single BAM file
and annotated the parental alignment with a higher alignment score as primary alignment
The alignment files were scanned to detect splice sites from the CIGAR strings of primary alignment
Alignment counts per splice site were gathered on the fragment level and annotated with information on multi-mapping and length of sequence overhang aligned to aside exons
Splice sites were included if one individual fulfilled the following splice count requirements; (1) at least 4 fragments mapped
(2) maximum of shorter overhang is larger than 7 base pairs
(3) log2 entropy of left and right overhang length is greater than or equal to two and (4) donor or acceptor site is within annotated gene boundary
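The four per-individual requirements can be expressed as a small predicate. This sketch assumes the overhang lengths of each fragment are already extracted, and that the entropy condition must hold for the left and right overhang lengths separately; both points are interpretations rather than details confirmed by the text.

```python
import math
from collections import Counter


def log2_entropy(values):
    """Shannon entropy (base 2) of the empirical distribution of values."""
    counts = Counter(values)
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def passes_individual_filter(overhangs, in_gene):
    """overhangs: list of (left_len, right_len), one pair per fragment."""
    if len(overhangs) < 4:                                # (1) at least 4 fragments
        return False
    if max(min(l, r) for l, r in overhangs) <= 7:         # (2) shorter overhang > 7 bp
        return False
    lefts = [l for l, _ in overhangs]
    rights = [r for _, r in overhangs]
    if log2_entropy(lefts) < 2 or log2_entropy(rights) < 2:  # (3) entropy >= 2
        return False
    return in_gene                                        # (4) within gene boundary
```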
Using aggregated data from all individuals
splice sites were filtered out if multi-mapped alignments exceeded 20% of mapped alignments or if the maximum fragment count was less than 5% of the expected transcript abundance
After filtering 351,546 splice sites were used in subsequent analysis
These sets of splice sites allowed us to quantify alternative splicing by calculating the percentage spliced in (PSI) per individual; the proportion of splice count divided by the total number of fragments aligned to any of the splice sites in the SOSJ
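The PSI definition above amounts to a simple proportion within each set of splice sites. The sketch below returns it as a fraction for one individual; the dictionary layout and the NaN for empty sets are illustrative choices.

```python
def psi(counts):
    """Proportion spliced in for each splice site within one set.

    counts: dict mapping splice-site id -> fragment count for one individual.
    """
    total = sum(counts.values())
    if total == 0:
        # No fragments aligned to any site in the set: PSI is undefined.
        return {site: float("nan") for site in counts}
    return {site: c / total for site, c in counts.items()}
```

For example, `psi({"a": 30, "b": 10})` gives 0.75 for site `a` and 0.25 for site `b`.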
A cis-sQTL scan was carried out by testing for association between PSI and sequence variants closer than 30kb to annotated gene overlapping SOSJ
The most significant sequence variants associated with PSI were annotated as lead-sQTLs
The cohort was a homogeneous population of 17,848 Icelanders (9784 females
The year of birth (YOB) data was binned into 5-year intervals
with the oldest participants born closest to 1920 and the youngest born closest to 2005
we adjusted for both technical covariates and kinship
since the pedigree of Icelanders was available
PSI values were adjusted for technical covariates (median coverage variation
and age was evaluated as a potential covariate but excluded due to minimal contribution to PSI variation
we detected 257,372 lead-sQTL of which 146,372 are within genes and pass a basic quality filter (REF ≠ ALT)
We detect 80,976 lead-sQTLs with p-values below the Bonferroni threshold (\(\frac{0.05}{146,372}\))
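The Bonferroni threshold is the significance level divided by the number of tests. A short sketch using the figures from the text (the helper name is illustrative):

```python
alpha = 0.05
n_tests = 146_372  # lead-sQTLs within genes that pass the quality filter

# Bonferroni-corrected p-value threshold, roughly 3.4e-7.
threshold = alpha / n_tests


def is_significant(p_value: float) -> bool:
    """True if the p-value falls below the Bonferroni threshold."""
    return p_value < threshold
```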
1588 sQTLs disrupt highly conserved splice motifs GT/AG while 2113 sQTL
These variants are highly likely to truly affect splicing and we refer to them as splice-disrupting if they remove a splice motif and splice-creating if they create a splice motif
we constructed a list of variants in the vicinity of the lead-sQTL that RNA sequencing never detects to affect splicing
we randomly select one of these negative examples for each lead-sQTL
The splice site annotations used for fine-tuning our model were constructed by combining RNA-Seq splice junctions detected in all 49 tissues in GTEx and the Icelandic blood samples
Junction reads were selected if they were present in four or more individuals
and if either end of the junction was present in the canonical transcript for a gene
The combined set of splice site annotations consists of 360,601 acceptors and 359,934 donors from 17,239 genes
using the same method to construct annotations using exclusively reads from tissues in GTEx V8
we identified 310,532 acceptors and 311,499 donors from 16,308 genes
as the fraction of k positions that are correctly predicted to belong to a class
where k is the number of positions truly belonging to the class and the decision threshold is chosen so that exactly k positions are predicted for this class
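The top-k accuracy definition above can be sketched as follows. This is a minimal NumPy illustration, not the authors' evaluation code; the function name and toy inputs are hypothetical, and ties in the scores are broken by position order, which is an assumption.

```python
import numpy as np


def top_k_accuracy(y_true, scores):
    """Fraction of the k highest-scoring positions that truly belong to the class.

    k is the number of true positions, so the decision threshold is implicitly
    chosen such that exactly k positions are predicted for the class.
    """
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    k = int(y_true.sum())
    if k == 0:
        return float("nan")  # undefined when the class has no true positions
    top_idx = np.argsort(-scores, kind="stable")[:k]
    return float(y_true[top_idx].mean())


# Toy example: two true sites, one of which is ranked in the top two.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.1, 0.2]
toy_top_k = top_k_accuracy(toy_labels, toy_scores)
```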
To calculate 95% confidence intervals for PR-AUC and top-k accuracy we performed bootstrapping with 1000 samples
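One common way to obtain such intervals is a percentile bootstrap over positions, sketched below. The text does not state which bootstrap variant or resampling unit was used, so the percentile method, the resampling of individual positions, the function names and the toy data are all assumptions.

```python
import numpy as np


def top_k_metric(y_true, scores):
    """Top-k accuracy with k equal to the number of true positions."""
    y_true = np.asarray(y_true, dtype=bool)
    k = int(y_true.sum())
    if k == 0:
        return float("nan")
    top_idx = np.argsort(-np.asarray(scores, dtype=float), kind="stable")[:k]
    return float(y_true[top_idx].mean())


def bootstrap_ci(y_true, scores, metric, n_boot=1000, seed=0):
    """Percentile 95% confidence interval from n_boot resamples with replacement."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    scores = np.asarray(scores)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample positions with replacement
        stats.append(metric(y_true[idx], scores[idx]))
    lower, upper = np.nanpercentile(stats, [2.5, 97.5])
    return float(lower), float(upper)


# Toy data in which the scores separate the classes perfectly.
toy_y = np.array([1, 0] * 50)
toy_s = toy_y.astype(float)
ci_low, ci_high = bootstrap_ci(toy_y, toy_s, top_k_metric)
```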
where P and Q are probability distributions
we visualized the attention in transformer encoders by calculating
the average value of all attention matrices
Statistical analyses were conducted to identify and replicate splicing quantitative trait loci (sQTLs) in our cohort compared to those reported in the GTEx V8 whole blood dataset
significant sQTLs were determined using a false discovery rate (FDR) threshold of 5% (q-value < 0.05) to control for multiple testing
Replicates were defined as the lead-sQTLs identified in GTEx that were also present in our dataset
We assessed replication by testing these variants for association with the corresponding splicing events in our cohort
A replication was considered successful if the variant showed a significant association at a Bonferroni-adjusted p-value threshold (\(\frac{0.05}{1,972}\))
The majority (94.2% [1858 out of 1972]) of lead-sQTLs from GTEx were replicated in our cohort
indicating high reproducibility of the findings
To compute the delta score we followed the procedure outlined by Jaganathan et al.8
We first calculate the difference between the predictions for an alternative sequence that includes a sequence variant and the prediction for the reference sequence
Then the position and splice site type (acceptor or donor) with the highest absolute difference in the predictions are identified
This difference is defined as the delta score and if the score is sufficiently high
it indicates a splice site gain or loss at that location
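The delta score procedure described above can be sketched as follows. This is an illustration, not the published implementation: the array layout (one row per position, columns for acceptor and donor probabilities), the function name and the toy values are assumptions, and the sketch presumes the reference and alternative predictions are already aligned position by position, as for a single-nucleotide substitution. Returning the signed difference, with positive read as a gain and negative as a loss, is also an interpretation.

```python
import numpy as np

SITE_TYPES = ("acceptor", "donor")


def delta_score(ref_pred, alt_pred):
    """Locate the largest absolute change between alternative and reference predictions.

    ref_pred, alt_pred: arrays of shape (L, 2) holding acceptor and donor
    probabilities per position (assumed layout).
    """
    diff = np.asarray(alt_pred, dtype=float) - np.asarray(ref_pred, dtype=float)
    pos, site = np.unravel_index(np.argmax(np.abs(diff)), diff.shape)
    return {
        "position": int(pos),
        "site": SITE_TYPES[site],
        "delta": float(diff[pos, site]),  # > 0 suggests a gain, < 0 a loss
    }


# Toy predictions: the variant weakens a donor and creates a weaker acceptor.
toy_ref = [[0.1, 0.0], [0.0, 0.9], [0.0, 0.0]]
toy_alt = [[0.1, 0.0], [0.0, 0.2], [0.6, 0.0]]
toy_delta = delta_score(toy_ref, toy_alt)
```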
We downloaded the ClinVar variants in variant call format and selected variants that were marked as splice variants
These variants were then labeled as pathogenic if their clinical significance was annotated as pathogenic or likely pathogenic and benign if their clinical significance was annotated as benign or likely benign
This resulted in 35,464 variants labeled as pathogenic and 1001 labeled as benign
To calculate PR-AUC for delta scores we used 40,528 variants as negative examples
that had been determined to be highly unlikely to affect splicing based on differential splicing analysis in whole blood
This research received approval from the National Bioethics Committee of Iceland (approval number VSN 14-015) and was conducted in accordance with guidelines from the Icelandic Data Protection Authority (PV_2017060950þS/–)
Informed consent was obtained from all participants
and an external party encrypted all personal identifiers before they were added to the deCODE database
All ethical regulations relevant to human research participants were followed
Local researchers from deCODE genetics in Iceland were actively involved throughout the research process
The research was developed in collaboration with local partners to ensure its relevance to the Icelandic population and the broader scientific community
The study did not involve any activities that are restricted or prohibited in the researchers’ setting
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article
The Icelandic RNA-Seq data used in this study are not publicly available due to information
that could compromise research participant privacy
and releasing this information publicly is against Icelandic state law
Other data supporting the findings of this study are available from the corresponding authors upon reasonable request
A Correction to this paper has been published: https://doi.org/10.1038/s42003-024-07379-9
RNA splicing is a primary link between genetic variation and disease
Pre-mRNA splicing in disease and therapeutics
Recommendations for clinical interpretation of variants found in non-coding regions of the genome
Pathogenic variants that alter protein code often disrupt splicing
Improving genetic diagnosis in Mendelian disease with transcriptome sequencing
Comparison of in silico strategies to prioritize rare genomic variants impacting RNA splicing for the diagnosis of genomic disorders
Cadd-splice—improving genome-wide variant effect prediction using deep learning-derived splice scores
CI-SpliceAI—improving machine learning predictions of disease causing splicing variants using curated alternative splice sites
Romero, D. W. et al. Towards a general purpose CNN for long range dependencies in ND. Preprint at https://arxiv.org/abs/2206.03398 (2022)
Dosovitskiy, A. et al. An image is worth 16x16 words: transformers for image recognition at scale. Preprint at https://arxiv.org/abs/2010.11929 (2020)
Highly accurate protein structure prediction with alphafold
Effective gene expression prediction from sequence by integrating long-range interactions
Dalla-Torre, H. et al. The nucleotide transformer: building and evaluating robust foundation models for human genomics. Preprint at bioRxiv https://doi.org/10.1101/2023.01.11.523679v1 (2023)
DNABERT-2: efficient foundation model and benchmark for multi-species genome
Flashattention: fast and memory-efficient exact attention with io-awareness
Child, R., Gray, S., Radford, A. & Sutskever, I. Generating long sequences with sparse transformers. Preprint at https://arxiv.org/abs/1904.10509 (2019)
Dai, Z. et al. Transformer-xl: attentive language models beyond a fixed-length context. Preprint at https://arxiv.org/abs/1901.02860 (2019)
Beltagy, I., Peters, M. E. & Cohan, A. Longformer: the long-document transformer. Preprint at https://arxiv.org/abs/2004.05150 (2020)
Big bird: Transformers for longer sequences
Efficiently modeling long sequences with structured state spaces
Sun, Y. et al. Retentive network: a successor to transformer for large language models. Preprint at https://arxiv.org/abs/2307.08621 (2023)
Hyena hierarchy: towards larger convolutional language models
Hyenadna: long-range genomic sequence modeling at single nucleotide resolution
attend and tell: Neural image caption generation with visual attention
In International Conference on Machine Learning 2048–2057 (PMLR
Saccader: improving accuracy of hard attention models for vision
The gtex consortium atlas of genetic regulatory effects across human tissues
Clinvar: public archive of relationships among sequence variation and human phenotype
Splicevault predicts the precise nature of variant-associated mis-splicing
Predicting RNA splicing from DNA sequence using pangolin
Gencode: the reference human genome annotation for the encode project
Reinforcement Learning: An Introduction (MIT Press
Ba, J. L., Kiros, J. R. & Hinton, G. E. Layer normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016)
On layer normalization in the transformer architecture
In International Conference on Machine Learning
Hendrycks, D. & Gimpel, K. Gaussian error linear units (gelus). Preprint at https://arxiv.org/abs/1606.08415 (2016)
Annotation-free quantification of RNA splicing using leafcutter
Jónsson, B. et al. Supplementary data for ’transformers significantly improve splice site prediction’. Zenodo https://doi.org/10.5281/zenodo.14109868 (2024)
Jónsson, B. et al. Transformers significantly improve splice site prediction (figure data). figshare https://doi.org/10.6084/m9.figshare.27607056 (2024)
Jónsson, B. et al. Spliceformer: transformer model for splice site prediction. Zenodo https://doi.org/10.5281/zenodo.14019451 (2024)
developed the method and designed statistical experiments
oversaw the processing of RNA-Seq data and analysis of sQTLs in the Icelandic cohort
contributed to writing the final version of the manuscript
All authors are employed by deCODE Genetics/Amgen
Communications Biology thanks Peter ’t Hoen
reviewer(s) for their contribution to the peer review of this work
Primary Handling Editors: Laura Rodríguez Pérez and Johannes Stortz
DOI: https://doi.org/10.1038/s42003-024-07298-9
Splice — valued in 2021 at nearly USD $500 million after securing $55 million in funding – has named pluggnb the fastest-growing music genre on its platform in 2024
The music creation platform has also confirmed that it clocked “nearly 350 million” downloads of its sound samples across all genres last year
The company didn’t say whether that marked growth or decline from 2023. Back in 2020, when pandemic lockdowns caused indoor activities – like music creation – to spike in popularity, Splice reported 1.1 million daily downloads
implying an annual pace of more than 400 million at the time
In a new report co-authored with market research firm MIDiA and released on Wednesday (January 22)
Splice said pluggnb – a blend of trap subgenre plugg and 1990s R&B – was the fastest-growing genre on its platform in 2024
as measured by downloads of pluggnb sample packs
Splice says it tracks “hundreds” of music genres on its platform
Downloads of pluggnb sample packs jumped 342.8% YoY in 2024
putting the genre ahead of second-place K-pop
with 328.2% YoY growth (832,058 downloads)
That was followed by house/hip-hop hybrid Jersey club (up 281.3% YoY
to 1,298,679 downloads) thanks in part to the genre’s growing popularity among music creators in Berlin
“Unofficial pluggnb remixes dominated TikTok in 2024 and led to adoption of the genre by K-pop heavyweights like LE SSERRAFIM and ILLIT,” the report said
It also noted that some genres declined in popularity in 2024
with neo soul recording the largest drop in sample pack downloads – down 46.8% YoY
That was followed by future soul (down 37.8% YoY) and dancehall (down 35% YoY)
and Jersey club may be the fastest-growing genres
with none of the three making Splice’s list of the top 10 genres by download
The roughly 700,000 downloads of pluggnb sample packs amount to a small fraction of the more than 48 million downloads of hip-hop sample packs
And the report isn’t entirely convinced that pluggnb will be the next big thing in music
“Pluggnb’s rapid ascent may raise questions about its long-term viability and sustainability
It has benefited from trends in an environment – internet culture – where trends are often short-lived,” the report said
pluggnb has laid a foundation strong enough to shape the sound of one of K-pop’s biggest hits of 2024
The coming year is a critical window for continued growth.”
Splice’s list of top genres continues to be dominated by more established musical styles
which retained their places as the most popular and second most popular genres
Hip-hop sample packs were downloaded 48.7 million times
pop music – which had a banner year in 2024 – experienced a decline on Splice’s platform
dropping from the third most popular genre in 2023 to fifth place in 2024
“This is not to say that pop is on its way out; instead
with new regional styles coming to the fore,” the report said
“East Asian offshoots like K-pop and Japanese city pop are growing fast
In its year-end report for 2024, market monitor Luminate identified pop as the fastest-growing core genre in the US
Luminate attributed pop’s strength to acts like Taylor Swift and other female artists such as Billie Eilish
Splice says its download data can be used to predict coming trends in music in part because the platform “overindexes” among creators aged 16 to 24
meaning that youth culture trends are easier to spot on the platform than elsewhere
“Splice is uniquely positioned to see the sounds that are driving music production globally
This report gives us a sense of how music producers are bending and blending genres in real time and just what genres and regions will define what we hear in 2025 and beyond,” Splice CEO Kakul Srivastava said
Managing Director and Music Analyst at MIDiA
said there is “perhaps no more forward-looking cultural trend” than sample usage
“The sounds that producers use to create today give us a window not only into the genres that will be big tomorrow but also early indications of entirely new genres that will emerge from the process of mixing samples across styles and cultures,” he said
“The genres that stand out in this report also underline wider trends: the growing importance of scenes and fan remixing in shaping the sounds of the future.”
Mutation or deletion of the U1 snRNP-associated factor LUC7L2 is associated with myeloid neoplasms
and knockout of LUC7L2 alters cellular metabolism
we show that members of the LUC7 protein family differentially regulate two major classes of 5′ splice sites (5′SS) and broadly regulate mRNA splicing in both human cell lines and leukemias with LUC7L2 copy number variation
We describe distinctive 5′SS features of exons impacted by the three human LUC7 paralogs: LUC7L2 and LUC7L enhance splicing of “right-handed” 5′SS with stronger consensus matching on the intron side of the near invariant /GU
while LUC7L3 enhances splicing of “left-handed” 5′SS with stronger consensus matching upstream of the /GU
We validated our model of sequence-specific 5′SS regulation both by mutating splice sites and swapping domains between human LUC7 proteins
Evolutionary analysis indicates that the LUC7L2/LUC7L3 subfamilies evolved before the split between animals and plants
Analysis of Arabidopsis thaliana mutants confirmed that plant LUC7 orthologs possess similar specificity to their human counterparts
indicating that 5′SS regulation by LUC7 proteins is highly conserved
C Representative sequence logos of 5′SS for all cassette exons (left) and exons with significantly increased or decreased inclusion upon depletion of LUC7L2 or LUC7L3
which is defined as the number of consensus bases at positions +4/+5/+6 minus the number of consensus bases at –3/–2/–1 of a 5′SS
E Mean phyloP score for human 5′SS of each possible 5′SS Balance score
F Log-odds estimates of 5′SS differential inclusion versus skipping as a function of the 5′SS Balance score for each human LUC7 RNA-seq dataset
Point estimates are derived from 100 bootstrap samples
each containing 500 differentially included (FDR < 0.1
dPSI > 0) and 500 differentially skipped (FDR < 0.1
Error bars represent the 95% confidence intervals
calculated as the standard deviation of the 100 bootstrap estimates
Source data are provided as a Source Data file
a non-constitutively bound U1 snRNP auxiliary factor
presented the strongest metabolic phenotype and altered the splicing of several hundred exons
The shift toward OXPHOS could be partially explained by changes in the splicing of key metabolic genes
and their broad impact on the transcriptome
human LUC7 proteins are absent from all published cryo-EM spliceosome structures
suggesting that LUC7L2’s influence on splicing and cellular metabolism may contribute to leukemogenesis
the molecular functions of the LUC7 paralogs remain incompletely understood
we investigated the distinct roles of LUC7s in pre-mRNA splicing
we found that different human LUC7 paralogs broadly impact the splicing of thousands of exons in a predictable
Experiments and analyses in a broad range of systems
demonstrate that two subfamilies of LUC7 proteins regulate two newly defined classes of “left-handed” (LH) and “right-handed” (RH) 5′SS in opposing manners
helping to explain the distinct phenotypes of these paralogs
These observations suggest 5′SS strength contributes at most minimally to exon responsiveness to Luc7 perturbations
supporting that the 5′SS sequence of the impacted cassette exon is the primary feature that predicts LUC7 regulation
and position +3 is ignored for the time being so that Balance scores are centered on 0
where LH 5′SS with more consensus matching on the exon side receive negative scores and RH 5′SS with more consensus matching on the intron side receive positive scores
“Balanced” exons with equal consensus matching on both sides of the /GU motif receive a score of 0
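The Balance score described above can be sketched for a 5′SS 9-mer spanning positions –3 to +6. The text does not list the consensus bases, so the sketch assumes the commonly cited consensus CAG/GTAAGT (CAG at –3/–2/–1 and AGT at +4/+5/+6); the function name and example sequences are hypothetical.

```python
# Assumed consensus bases; these are not stated in the text.
CONSENSUS_EXON_SIDE = "CAG"    # positions -3, -2, -1
CONSENSUS_INTRON_SIDE = "AGT"  # positions +4, +5, +6


def balance_score(nine_mer):
    """Consensus matches at +4/+5/+6 minus consensus matches at -3/-2/-1.

    nine_mer: 5'SS sequence covering positions -3 to +6; position +3 is ignored.
    Negative scores indicate left-handed (LH) sites, positive scores
    right-handed (RH) sites, and 0 a balanced site.
    """
    seq = nine_mer.upper().replace("U", "T")
    if len(seq) != 9:
        raise ValueError("expected a 9-mer spanning positions -3 to +6")
    exon_matches = sum(a == b for a, b in zip(seq[0:3], CONSENSUS_EXON_SIDE))
    intron_matches = sum(a == b for a, b in zip(seq[6:9], CONSENSUS_INTRON_SIDE))
    return intron_matches - exon_matches


balanced = balance_score("CAGGTAAGT")      # full consensus on both sides
right_handed = balance_score("TTTGTAAGT")  # consensus only on the intron side
left_handed = balance_score("CAGGTACCC")   # consensus only on the exon side
```

Because three positions are counted on each side, the score ranges from –3 to +3 and is centered on 0.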
these results imply that LH and RH 5′SS subclasses are abundant and stably maintained as subclasses of 5′SS
supporting our conclusions that the splicing of LH 5′SS is promoted by LUC7L3
while RH 5′SS are promoted by LUC7L2 and LUC7L
To systematically identify 5′SS motifs impacted by each LUC7 protein
we measured the impact of LUC7 proteins on the enrichment of each 5′SS 9mer (spanning positions –3 to +6) in exons differentially included or skipped
uses a Dirichlet-multinomial model to approximate the log-odds of a given 5′SS sequence occurring in significantly included or skipped exons versus unchanged exons within an RNA-seq experiment
Positive enrichment values indicate over-representation of a 5′SS sequence in included exons and negative values indicate enrichment in skipped exons
while values near zero indicate the absence of bias for inclusion/skipping (or the presence in unchanged exons only)
A Volcano plot of 5′SS enrichment scores of individual 5′SS 9mers (each dot represents a 9-mer) from LUC7 meta-analysis (left)
Sequence logos derived from significant 9mers (qvalue < 0.01) from LUC7 meta-analysis (right)
B Receiver operating characteristic (ROC) curve of LUC7 score’s ability to discriminate held-out differentially included vs skipped splicing events (FDR < 0.1
C Distribution of LUC7 score and frequency for distinct 9mer 5′SS sequences (with /GT) in Gencode human genome protein-coding exons
D Heat map showing Pearson correlation between new measures – 5′SS Balance and LUC7 score – versus standard 5′SS measures for /GT donors in protein-coding exons
The individual PWMs for the LH and RH motifs generated from LUC7 protein RNA-seq data are weakly
with the predicted minimum free energy of interaction with U5 and U6 snRNAs
these findings indicate that the LUC7 Score quantitatively summarizes the sequence features of exons whose splicing depends on different LUC7 family members
Sequence alignment of the LUC7L2 AE2 and LUC7L PTC exons revealed that they are 82% identical at the nucleotide level
implying a common evolutionary origin; no comparable exon was found in LUC7L3
A Schematic of the pSpliceExpress minigene construct used
in which an internal exon of interest with flanking splice sites is inserted into a minigene expressing rat insulin exons 2 and 3 along with the intervening intron
B Bar plots of the cassette exon’s mean percent spliced in (by RT-PCR with primers in flanking exons) across all experiments; bars are color-coded to match LUC7 paralog colors in (E)
C Representative gel images from LUC7-related minigenes
Percent Spliced In values are shown at the bottom of each lane
D Mean PSI for the LUC7L2 AE2 minigene with mutagenized 5′SS (left)
and representative RT-PCR gel images (right)
E Proposed regulatory relationships between human LUC7 family members
F Bar plot of PSI for the SNRPC E2 minigene with wildtype RH 5′SS (left)
G Bar plot of PSI for the XPA E3 minigene with wildtype LH 5′SS (left)
For all minigene experiments reported (panels B
error bars represent the standard error of the mean
These results indicate that RH 5′SS are necessary, but not sufficient, to confer positive regulation by LUC7L or LUC7L2, but may be sufficient to confer repression by LUC7L3 (Fig. 3G)
our mutagenesis experiments suggest that the “handedness” (LH or RH character) of the 5′SS sequence is a key determinant of regulation by LUC7 family members
A eCLIP enrichment of LUC7L2 around windows of constitutive and cassette exon splice sites
with crosslinks aggregated into 10 nt bins (top)
Pearson correlation between LUC7 score and LUC7L2 eCLIP enrichment is shown
with significant correlations (p < 0.05) identified by Pearson’s correlation test
B Protein domain structure of human LUC7 proteins and experimentally investigated chimeric proteins
C Change (“delta”) in percent spliced in (qRT-PCR) from transfected minigenes containing different internal exons of introns following overexpression of LUC7 WT or chimeric cDNA (shown at right)
Mean is plotted with error bars representing standard error of the mean
and bars are color-coded by the Balance score of the internal exon’s 5′SS
D Heat map of Pearson correlations of delta percent spliced values for each LUC7 WT or chimeric cDNA overexpressed in (C)
our results indicate that the C-terminal regions of these paralogs perform similar
A Normalized gene expression values for human LUC7 family in LAML samples
colored by whether samples possess a LUC7L2 CNV loss (CNV log2 value < –0.5)
A two-sided t-test was used to test differences in LUC7 paralog expression between LUC7L2+/+ (n = 159) and LUC7L2 CNV loss (n = 14)
B Overlap of LUC7L2 CNV loss samples and LUC7L2Low expression samples
C Mean dPSI per 5′ splice site sequence for differentially spliced exons when comparing LUC7L2Ctrl versus LUC7L2Low expression samples
A linear regression line (black) with a 95% confidence interval is shown
and the Pearson correlation and Pearson correlation test are displayed
D Gene set enrichment analysis of differentially expressed genes comparing LUC7L2Low versus LUC7L2Ctrl expression samples
E Copy-number variation analysis for LUC7L2 and LUC7L3 loci across all TCGA cancer types
Bonferroni-corrected two-sided Wilcoxon-rank sum test
boxplots display the data distribution within each group
The box represents the interquartile range (IQR)
spanning from the 25th percentile (Q1) to the 75th percentile (Q3)
with the horizontal line inside the box indicating the median (50th percentile)
The whiskers extend to the most extreme data points within 1.5 times the IQR from Q1 and Q3
these analyses show that LUC7L2Low AMLs inefficiently splice RH 5′SS relative to LH 5′SS
supporting a role for reduced LUC7L2 levels in shaping the transcriptomes of these tumors
kidney renal papillary cell carcinoma (KIRP) had increased copy number of LUC7L3
Some of these observations reflect well-established chromosomal aberrations commonly found in specific tumor subtypes
which is common in GBM and SKCM associated with EGFR amplification
would also yield an additional copy of LUC7L2 in these tumors
Changes in LUC7 expression due to CNVs may generally contribute to the observed splicing variation found in many cancer subtypes and might contribute to metabolic changes as well
A Maximum likelihood phylogenetic tree built from multiple sequence alignment of Luc7 proteins from 33 animal
adding Trichomonas vaginalis Luc7 protein as an outgroup
Two main clusters are shaded to indicate Luc7 subfamilies; individual proteins represented by symbols indicating clade of origin
B Representation of presence/absence and likely duplication/loss events for Luc7 subfamilies overlaid on the eukaryotic phylogenetic tree
C Correlation matrix of 5′SS models learned from dinucleotide features of 5′SS differentially spliced in human and plant Luc7 RNA-seq experiments
D Sequence logos of top or bottom 10% unique 5′SS sequences identified by dinucleotide 5′SS models
E Residue conservation estimated from multiple sequence alignment of LUC7L2-type proteins from 6A)
projected onto the yLUC7p (from Bai et al.
To assess whether evolutionarily related LUC7 proteins possess analogous 5′SS specificities
we performed RNA-seq on every possible combination of Arabidopsis thaliana luc7 single
and carried out differential splicing analysis
No bias for LH or RH 5′SS in differentially skipped or included exons was observed in the luc7 triple mutant
These observations indicate that the two subfamilies of LUC7 proteins in plants have distinct activities on 5′SS subclasses
While the 5′SS subclasses impacted by human and plant LUC7 proteins are very similar overall, we do observe some subtler species-specific differences. For example, Arabidopsis LUC7RL promotes LH 5′SS with –1 G, rather than LH 5′SS with a –2 A/–1 G pair promoted by human LUC7L3 (Fig. 6D)
our observations support that orthologous human and A. thaliana LUC7 proteins have largely retained their ancestral specificities for specific 5′SS subclasses over 1.5 billion years of evolution
A Mean position-specific information content (calculated as in Irimia et al., 201938) of GT-type 5′SS motifs color-coded by animals (n = 13)
Error bars represent standard error of the mean
B Density plot of the distribution of 5′SS subclass frequencies for each organism
vertical dashed line reflects mean LUC7 score of the clade
C Frequency of dinucleotide features that define classical LH and RH 5′SS subclasses in 5 representative eukaryotes
Source data are provided as a Source Data file
these data suggest long-term coevolution between LUC7 subfamilies and 5′SS subclasses
with depletion of LH 5′SS and concomitant loss of the LUC7L3-subfamily occurring early in fungal evolution
Our minigene experiments validate that the influence LUC7 proteins have on pre-mRNA splicing is dependent on specific nucleotide features of 5′SS
which are succinctly summarized by the 5′SS Balance score
Our domain-swapping experiments reveal that LUC7 structured regions are largely sufficient to confer specificity for 5′SS subclasses
The most straightforward model is a U1 stabilization model in which the LUC7L/LUC7L2 structured regions stabilize the interaction of U1 with RH 5′SS
and the LUC7L3 structured region stabilizes U1 interactions with LH 5′SS
a model in which LUC7L/LUC7L2 and LUC7L3 preferentially destabilize interactions with LH and RH 5′SS
These findings are consistent with our own experimental and evolutionary observations
in which the functionally similar LUC7L and LUC7L2 promote splicing of exons with fungal-like RH 5′SS
while the human LUC7L3 ZnF2 promotes recognition of 5′SS unlike those seen in the budding yeast genome
so regulation of 5′SS subclasses at later stages of splicing is also possible
the absence of U1-associated proteins like LUC7L3 from S. pombe suggests that these common model organisms of splicing may not recapitulate all aspects of mammalian 5′SS choice
and reasonably well-defined rules of pre-mRNA splicing make it a compelling target for the therapeutic modulation of gene expression
Our findings linking the activity of LUC7 proteins with specific 5′SS sequence features may have implications for the future advancement of small molecule regulators of splicing
our proposed mechanism in which LUC7 proteins modulate splicing via recognition of U1:5′SS RNA duplex structures implies that LUC7 proteins will likely influence the specificity of small molecules that stabilize U1 snRNP:5′SS interactions
the synthetic lethal relationship between LUC7L and LUC7L2 suggests that splicing therapeutics specifically targeting the recognition of RH 5′SS will be more effective in AML patients with monosomy 7 or LUC7L2 mutations
LUC7L2 and LUC7L3 ORFs were amplified from human cDNA and cloned into pcDNA3.1(+)IRES GFP (Addgene #: 51406)
Domain swap constructs were synthesized as gblocks from IDT and cloned into pcDNA3.1(+)IRES GFP
Exons and flanking intronic regions used for pSpliceExpress minigenes were PCR amplified from human male genomic DNA using primers with attB overhangs and subsequently recombined into pSpliceExpress using BP Clonase II (Thermo Fisher
HEK293T cell line authentication was performed at ATCC using STR profiling and referenced to ATCC’s internal database
HEK293T RMCE cell lines were cultured in Advanced DMEM supplemented with 5% FBS
25 mM HEPES and Glutamax and tested negative for mycoplasma
cells were plated 24 h in advance in 24-well plates
cells were transfected with 500 ng of 95:5 w/v of a cDNA overexpression vector and minigene reporter respectively using Lipofectamine LTX (Thermo Fisher
RNA was extracted 24 h after transfection using the Qiagen RNeasy Mini kit (cat. 74104) according to the manufacturer’s instructions with the optional on-column DNase digestion (cat
RNA was eluted in nuclease free water and quantified using Nanodrop
we used 125 ng of RNA input into a 12.5 µL LunaScript Multiplex One Step Master Mix for RT-PCR (cat
we mixed PCR samples with NEB 6X loading dye and loaded 5 µL of PCR products on a 3% agarose gel infused with ethidium bromide
Images were acquired using Azure Biosystems c600 with UV imager
Agarose gel images were manually quantified using chromatograms in ImageJ
Percent Spliced In values were calculated by dividing the signal intensity of the larger (included) band by the sum of the signal intensities of the included and skipped products
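The gel-based PSI calculation above amounts to the following arithmetic. The function name and the band intensities are hypothetical, and the sketch assumes background-subtracted intensities in arbitrary units.

```python
def psi_from_bands(included_intensity, skipped_intensity):
    """Percent spliced in from RT-PCR band intensities.

    The larger (exon-included) band is divided by the sum of the included
    and skipped bands, then expressed as a percentage.
    """
    total = included_intensity + skipped_intensity
    if total <= 0:
        raise ValueError("band intensities must sum to a positive value")
    return 100.0 * included_intensity / total


# Hypothetical intensities for one lane.
lane_psi = psi_from_bands(300.0, 100.0)
```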
LUC7L2 and LUC7L3 ORFs were transfected into HEK293 RMCE cells as described above
RNA was eluted in nuclease-free water and quantified using Nanodrop
Illumina-compatible libraries were prepared by MIT BioMicroCenter using NEB II Ultra Directional RNA with poly(A) selection and sequenced on NovaSeq 6000 with 2 × 150 bp reads
only a subset of exons likely change in splicing
are direct summaries of the read counts and are affected by sampling variation in the read counts which may artificially inflate changes for exons with low read counts
shrinkage estimates are used in differential expression analyses to account for this issue
Shrinkage considers the set of all effect sizes to constrain the noise in estimates from low read count events
requires parameter estimates and associated uncertainty
we reconstruct the effect size in log-odds scale δ from the rMATS read counts and approximate a standard deviation σ describing the uncertainty in δ using the rMATS p-value
We pass these effect sizes and standard deviations to ashr using the ‘normal’ option
which assumes that the proportions of up- and down-regulated exons are equal (Stephens
we reconstruct an estimate of Δψ∗ (see Supplemental Methods)
We used a Dirichlet-multinomial model to calculate the log-odds of whether a given 5′SS was more likely to be involved in a significantly included event vs significantly skipped event
we excluded all events with fewer than 10 junction-count reads on average across all samples
we combined the counts of 5′SS from significant included and skipped events (FDR < 0.1) and their respective background sets
which consisted of an equal number of unregulated 5′SS that were matched for both PSI and expression level
we added a pseudocount of 1 to every observed 5′SS sequence and accounted for class imbalance by dividing each column by a weight that reflected the fraction of included or skipped events
if there were 4000 significantly included events and 1000 significantly skipped events
the significantly included counts and their respective background set were divided by 0.8 and skipped events and their respective background were divided by 0.2
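The pseudocount and class-imbalance weighting just described can be sketched as follows, using the 4000 versus 1000 example from the text. The function name, array layout and per-sequence counts are hypothetical; the same function would be applied to the matched background sets.

```python
import numpy as np


def reweight_counts(included, skipped, n_included_events, n_skipped_events,
                    pseudocount=1.0):
    """Add a pseudocount to each 5'SS count, then divide each column by its class weight.

    The weight of a class is its fraction of all significant events, so the
    smaller class is scaled up relative to the larger one.
    """
    total_events = n_included_events + n_skipped_events
    w_included = n_included_events / total_events
    w_skipped = n_skipped_events / total_events
    inc = (np.asarray(included, dtype=float) + pseudocount) / w_included
    skp = (np.asarray(skipped, dtype=float) + pseudocount) / w_skipped
    return inc, skp, (w_included, w_skipped)


# Hypothetical counts for two 5'SS sequences; event totals follow the text.
inc_w, skp_w, weights = reweight_counts([7, 3], [1, 0], 4000, 1000)
```

In this toy case the reweighted columns are equal, which shows how a sequence observed four times as often among included events is treated as unbiased when included events are four times as numerous.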
we used this count matrix as the alpha parameters for the Dirichlet-multinomial model
and simulated drawing from the posterior distribution 2500 times
For each draw we calculated the log-odds of a given 5′SS being enriched in the included versus skipped set
The posterior distribution of log odds generated from the Dirichlet-multinomial model was used to calculate the posterior mean and the posterior standard deviation
which were both passed to ashr for shrinkage using the uniform option
we plotted the PosteriorMean estimates in 5′SS Enrichment plots
which can be directly interpreted as the log-odds of a given sequence occurring in the differentially included exon set over the differentially skipped exon set
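The posterior simulation described above can be sketched with NumPy. This is a simplified illustration under stated assumptions, not the authors' code: it omits the matched background sets and the ashr shrinkage step, it computes the log-odds as the difference of logit-transformed frequencies between the included and skipped sets, and the function name and alpha values are hypothetical.

```python
import numpy as np


def posterior_log_odds(alpha_included, alpha_skipped, n_draws=2500, seed=0):
    """Posterior mean and standard deviation of included-vs-skipped log-odds.

    alpha_included, alpha_skipped: reweighted count vectors (one entry per
    5'SS sequence) used as Dirichlet alpha parameters.
    """
    rng = np.random.default_rng(seed)
    p_inc = rng.dirichlet(alpha_included, size=n_draws)   # shape (n_draws, n_seq)
    p_skip = rng.dirichlet(alpha_skipped, size=n_draws)

    def logit(p):
        return np.log(p) - np.log1p(-p)

    log_odds = logit(p_inc) - logit(p_skip)
    return log_odds.mean(axis=0), log_odds.std(axis=0, ddof=1)


# Hypothetical alpha vectors for three 5'SS sequences.
post_mean, post_sd = posterior_log_odds([50.0, 5.0, 5.0], [5.0, 50.0, 5.0])
```

The posterior mean and standard deviation returned here correspond to the two quantities that the text passes on for shrinkage.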
we used scaled 5′SS enrichment scores to cluster 5′SS sequences by their activity across each LUC7 paralog RNA-seq dataset using Euclidean distance and ward.D2 linkage
The events were then aggregated into a single table such that each RNA-seq dataset was equally represented in the final 5′SS enrichment analysis
To account for opposing effects on LH and RH 5′SS subclasses for different experiments
the direction of differentially included and skipped events (and their respective background sets) from LUC7L3 KD
LUC7L OE and LUC7L2 OE analyses were flipped
such that included events represented RH 5′SS and skipped events represented LH 5′SS
Then we performed a 5′SS enrichment analysis as described above using a Dirichlet-multinomial model and simulated drawing from the posterior 10,000 times
the distribution of log odds generated from the Dirichlet-multinomial model was used to calculate the posterior mean and the posterior standard deviation
we plotted the PosteriorMean estimates in 5′SS enrichment plots and the associated qvalue
LH 5′SS (LUC7L2-repressed/LUC7L3-promoted) were defined as 5′SS with negative meta-5′SS enrichment scores with qvalue < 0.01 and RH 5′SS (LUC7L2-promoted/LUC7L3-repressed) were defined as 5′SS with positive meta-5′SS enrichment scores with qvalue < 0.01
Sequence logos were created from LH and RH 5′SS sequences identified from meta-5′SS enrichment analyses
Individual PWMs were created by calculating the observed frequency of a nucleotide at a given position
assuming a uniform nucleotide distribution
Pseudocounts of 0.1 were used to avoid division by zero
The LUC7 score was calculated by taking the ratio of the LUC7L2-promoted/LUC7L3-repressed RH PWM over the LUC7L3-promoted/LUC7L2-repressed LH PWM
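A minimal sketch of the PWM construction and LUC7 score follows. The function names are hypothetical, and summing log2 ratios across positions is an assumption about how the per-position RH/LH ratios are combined into one score; the toy sequences in the usage note are illustrative only and are not real LH or RH 5′SS sets.

```python
import math

NUCLEOTIDES = "ACGT"


def build_pwm(seqs, pseudocount=0.1, background=0.25):
    """Per-position nucleotide frequencies relative to a uniform background."""
    pwm = []
    for pos in range(len(seqs[0])):
        # Start every nucleotide at the pseudocount so no frequency is zero.
        counts = {n: pseudocount for n in NUCLEOTIDES}
        for s in seqs:
            counts[s[pos]] += 1
        total = sum(counts.values())
        pwm.append({n: (counts[n] / total) / background for n in NUCLEOTIDES})
    return pwm


def luc7_score(seq, rh_pwm, lh_pwm):
    """Sum over positions of log2(RH weight / LH weight); positive means RH-like."""
    return sum(math.log2(rh_pwm[i][n] / lh_pwm[i][n]) for i, n in enumerate(seq))
```

For example, with toy sets `rh = ["CAGGTAAGT", "AAGGTAAGT", "CAGGTGAGT"]` and `lh = ["CAGGTAATA", "TTGGTAAGA", "CTGGTACGA"]`, a sequence drawn from the first set scores above zero and one drawn from the second scores below zero.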
We used ViennaRNA 2.534 to model free energy predictions between all 5′SS and the 5′end of U1 snRNA (ATACTTACCUG)
U6 snRNA (ATACAGAGA) and U5 snRNA loop 1 (GCCUUUUAC) using RNAcofold with default parameters
Publicly available LUC7L2 eCLIP data was downloaded from European Read Archive (PRJNA663333)
10 nucleotide UMIs were extracted from reads and appended to read name
Then Illumina adaptors were removed from reads using cutadapt with the following settings (-a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA --minimum-length 18 --quality-cutoff 6 --match-read-wildcards -e 0.1)
This was performed twice to ensure the complete removal of adaptor sequences
we created a genomic index using GRCh38.primary_assembly.genome.fa and gencode.v38.primary_assembly.annotation.gtf with --sjdbOverhang 65
Trimmed eCLIP reads were then aligned to this genome using STAR with default settings and resulting bam files were then deduplicated with umi-tools
Crosslink counting was restricted to cassette exons observed in LUC7L2 KO datasets around 250 nt windows surrounding four splice sites (upstream 5′SS
cassette exon 5′SS and downstream 3′SS) and crosslinks in 10 nt bins were summed together
2 pulldowns and 2 size-matched (SM) inputs were considered
Crosslink counts in each library were first normalized by library size
Cassette exons were then stratified into 10 bins using their 5′SS LUC7 score (1 = most LH
10 = most RH) and the number of crosslinks for each LUC7 score bin were summed together
Position-based eCLIP enrichment was calculated as the ratio of normalized eCLIP crosslinks to normalized SM-input crosslinks at each 10 nt bin
eCLIP enrichment within each 10 nt bin was then correlated with the mean LUC7 score of binned cassette exons
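The normalization, enrichment ratio and correlation steps can be sketched as below. This is an illustration under assumptions: the function names are hypothetical, counts are scaled per million reads, Pearson correlation is used because the text does not name the correlation type, and a single pulldown/input pair is handled because the way replicates were combined is not stated.

```python
import math


def normalize_by_library_size(counts, library_size, scale=1e6):
    """Scale raw crosslink counts to counts per `scale` reads."""
    return [c * scale / library_size for c in counts]


def eclip_enrichment(ip_counts, ip_library, sm_counts, sm_library):
    """Ratio of normalized pulldown to normalized size-matched input crosslinks.

    Inputs are parallel lists (one entry per LUC7 score bin for a given
    10 nt positional bin). Input counts of zero are not handled here.
    """
    ip = normalize_by_library_size(ip_counts, ip_library)
    sm = normalize_by_library_size(sm_counts, sm_library)
    return [a / b for a, b in zip(ip, sm)]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)
```

Applied per positional bin, `pearson_r(enrichment, mean_luc7_scores)` gives one correlation value for each 10 nt bin around a splice site.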
For each of the species used in our phylogenetic analysis
we downloaded their genomes and associated gene annotation files from Ensembl and extracted their splice site sequences using bedtools 2.29.2
we filtered for only /GT splice sites and calculated the information content per position of the 5′SS
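Per-position information content can be computed as in the sketch below, which assumes a uniform background over four nucleotides (maximum 2 bits) and applies no small-sample correction; neither detail is specified in the text, and the function name is hypothetical.

```python
import math
from collections import Counter


def information_content(seqs, alphabet="ACGT"):
    """Information content in bits at each position of equal-length sequences."""
    n = len(seqs)
    bits = []
    for pos in range(len(seqs[0])):
        counts = Counter(s[pos] for s in seqs)
        # Shannon entropy of the observed nucleotide frequencies at this position.
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        bits.append(math.log2(len(alphabet)) - entropy)
    return bits
```

An invariant position, such as the GT dinucleotide after filtering, yields 2 bits, and a position with all four nucleotides at equal frequency yields 0 bits.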
luc7a-1 luc7rl-1 and luc7b-1 luc7rl-1) and a luc7 triple mutant (luc7a-2 luc7b-1 luc7rl-1) were surface sterilized with chlorine-gas and then grown on half-strength Murashige Skoog (MS) plates containing 0.8% phytoagar in continuous light at 22 °C for 10 days
Seedlings were collected and flash-frozen in liquid nitrogen
Total RNA was isolated using RNeasy® Plant Mini Kit (Qiagen
74904) according to the manufacturer’s instructions
mRNA stranded library preparation and sequencing (PE150) was done by Novogene (Cambridge
United Kingdom) using an Illumina Novaseq6000 system
To validate the alternative splicing found in RNA-seq
1 µg of RNA was treated with DNase I (Thermo Fisher Scientific)
cDNA synthesis was carried out using the RevertAid First Strand cDNA Synthesis kit (Thermo Fisher Scientific) with 100 µM oligo-dT
RT-PCR was then performed using Taq DNA Polymerase and the products were analyzed on a 2% agarose gel
Gene set enrichment was performed in the Xena browser web portal
The same 16 samples identified as LUC7L2 low-expressing AMLs were selected and compared to the remainder of the cohort with available data
To identify and score dinucleotide features contributing to 5′SS inclusion or skipping
we calculated the frequency of adjacent and non-adjacent dinucleotide pairs in the 5′SS of differentially spliced exons
normalized by the background frequency from a set of unregulated exons
The log-odds ratio of a dinucleotide feature’s occurrence in differentially included versus skipped exons was calculated
and the variance was estimated by sampling the unregulated set 100 times
The resulting mean and standard deviation were shrunk using the ashr package
and only significant dinucleotides (q-value < 0.01) were retained
the posterior log-odds of dinucleotides in the 9mer were summed to score the 5′SS
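The feature enumeration and final scoring step can be sketched as follows. The representation of a dinucleotide feature as a tuple (position i, position j, nucleotide at i, nucleotide at j) and the function names are assumptions made for illustration; the estimation and shrinkage of the log-odds themselves are not reproduced here.

```python
from itertools import combinations


def dinucleotide_features(seq):
    """All adjacent and non-adjacent position pairs (i < j) with their nucleotides."""
    return [(i, j, seq[i], seq[j]) for i, j in combinations(range(len(seq)), 2)]


def score_five_ss(seq, posterior_log_odds):
    """Sum the posterior log-odds of the retained features present in seq.

    posterior_log_odds maps feature tuples to shrunken log-odds and should
    contain only the features that passed the significance filter.
    """
    return sum(posterior_log_odds.get(f, 0.0) for f in dinucleotide_features(seq))
```

A 9mer has 36 such position pairs, so each 5′SS contributes at most 36 terms to its score.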
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article
The code required to perform shrinkage on differential splicing analyses can be found at https://gitlab.com/LaptopBiologist/spliceformats
The 5′ terminus of the RNA moiety of U1 small nuclear ribonucleoprotein particles is required for the splicing of messenger RNA precursors
Large-scale comparative analysis of splicing signals and their corresponding splicing factors in eukaryotes
Quantitative activity profile and context dependence of all human 5′ splice sites
Loss of LUC7L2 and U1 snRNP subunits shifts energy metabolism from glycolysis to OXPHOS
a novel yeast U1 snRNP protein with a role in 5’ splice site recognition
The U1 snRNP-associated factor Luc7p affects 5’ splice site selection in yeast and human
Structure–function analysis and genetic interactions of the Luc7 subunit of the Saccharomyces cerevisiae U1 snRNP
Structures of the fully assembled Saccharomyces cerevisiae spliceosome before activation
Prespliceosome structure provides insights into spliceosome assembly and regulation
A unified mechanism for intron and exon definition and back-splicing
Functional analyses of human LUC7-like proteins involved in splicing regulation and myeloid neoplasms
Functional analysis of a chromosomal deletion associated with myelodysplastic syndromes using isogenic human induced pluripotent stem cells
Putative RNA-splicing gene LUC7L2 on 7q34 represents a candidate gene in pathogenesis of myeloid malignancies
Maximum entropy modeling of short sequence motifs with applications to RNA Splicing Signals
Comparative analysis detects dependencies among the 5’ splice-site positions
but be quick: 5’ splice sites and the problems of too many choices
m6A modification of U6 snRNA modulates usage of two major classes of pre-mRNA 5’ splice site
The U1 spliceosomal RNA is recurrently mutated in multiple cancers
Crossregulation and Functional Redundancy between the Splicing Regulator PTB and its paralogs nPTB and ROD1
Poison exon splicing regulates a coordinated network of SR protein expression during differentiation and tumorigenesis
Genomics of deletion 7 and 7q in myeloid neoplasm: from pathogenic culprits to potential synthetic lethal therapeutic targets
Complex landscape of alternative splicing in myeloid neoplasms
Mitochondrial metabolism as a potential therapeutic target in myeloid leukaemia
The U1 snRNP subunit LUC7 modulates plant development and stress responses via regulation of alternative splicing
Introns and splicing elements of five diverse fungi
A computational analysis of sequence features involved in recognition of short introns
ERISdb: A Database of Plant Splice Sites and Splicing Signals
Functional Analysis of the Zinc Finger Modules of the S
A single m6A modification in U6 snRNA diversifies exon sequence at the 5’ splice site
Extended base pair complementarity between U1 snRNA and the 5’ splice site does not inhibit splicing in higher eukaryotes
but rather increases 5’ splice site recognition
An RNA Switch at the 5′ Splice Site Requires ATP and the DEAD Box Protein Prp28p
rMATS: Robust and flexible detection of differential alternative splicing from replicate RNA-Seq data
MAFFT multiple sequence alignment software version 7: improvements in performance and usability
RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models
ggtree: an r package for visualization and annotation of phylogenetic trees with their covariates and other associated data
Coevolution of genomic intron number and splice sites
We thank the staff of the MIT BioMicro Center for Illumina NovaSeq library preparation and sequencing
and Gordon Simpson for their helpful discussions
This work was supported by grant GM085319 from the NIH (C.B.B.) and DFG grant LA2633-4/2 and 400681449/GRK2498 TP13 (S.L.)
and analyzed all experiments and computational analyses under the supervision of C.B.B
conducted mouse and human evolutionary analyses under the supervision of C.J.K
assisted with statistical approach development and eCLIP analysis
performed Arabidopsis RNA-seq and qRT-PCR experiments
wrote the manuscript with input from all authors
All authors contributed to manuscript revisions
is a member of the Scientific Advisory Board of Remix Therapeutics and has equity interests in Remix Therapeutics and Arrakis Therapeutics: both companies are developing small molecule therapeutics targeting RNA
The authors claim no other competing interests with respect to this work
DOI: https://doi.org/10.1038/s41467-025-56577-4
Splice site recognition is essential for defining the transcriptome
Drugs like risdiplam and branaplam change how human U1 snRNP recognizes particular 5′ splice sites (5′SS) and promote U1 snRNP binding and splicing at these locations
Despite the therapeutic potential of 5′SS modulators
the complexity of their interactions and snRNP substrates has precluded defining a mechanism for 5′SS modulation
We have determined a sequential binding mechanism for modulation of −1A bulged 5′SS by branaplam using a combination of ensemble kinetic measurements and colocalization single molecule spectroscopy (CoSMoS)
Our mechanism establishes that U1-C protein binds reversibly to U1 snRNP
and branaplam binds to the U1 snRNP/U1-C complex only after it has engaged with a −1A bulged 5′SS
Obligate orders of binding and unbinding explain how reversible branaplam interactions cause formation of long-lived U1 snRNP/5′SS complexes
and its action depends on fundamental properties of 5′SS recognition
it is thought that these drugs enhance U1 snRNP affinity for the SMN2 exon 7 5′SS
which in turn promotes spliceosome assembly
these drugs appear to modulate the SS recognition process to ultimately alter the nucleotide sequences of spliced mRNA products
by how quickly U1 can associate with a 5′SS and how long it may remain bound in order to promote spliceosome assembly before the RNA is degraded
Splice site modulation by drugs such as branaplam is likely also restricted to the same
To elucidate 5′SS recognition and modulation in humans
we reconstituted a model human U1 snRNP and assayed its interactions with RNA oligos in the presence and absence of branaplam
A combination of surface plasmon resonance (SPR)
microscale thermophoresis (MST) and colocalization single molecule spectroscopy (CoSMoS) assays reveals how 5′SS containing a bulged adenosine at the −1 position (−1A) are recognized and modulated by drugs working collaboratively with protein splicing factors
Branaplam reversibly binds to the U1 snRNP/5′SS complex
and drug modulation of this complex is strictly dependent on reversible binding of U1-C
U1-C in turn can only bind to the snRNP if the 5′SS has not yet been engaged
our sequential binding mechanism predicts that 5′SS modulation by branaplam depends on an ordered series of events: U1-C binds to U1 snRNP
and finally branaplam binds to the U1 snRNP/U1-C/5′SS ternary complex
This mechanism reveals how a reversibly binding splicing modulator can elicit formation of long-lived U1 snRNP/5′SS interactions as well as fundamental features of human 5′SS recognition
A Crystal structure of minimal U1 snRNP (PDB: 4PJO)
Arrows indicate relative placement of modifications for single molecule measurements
Shaded region indicates predicted base pairing interactions with the U1 snRNA (shown at the top
above a schematic of the exon (box)/intron (line) junction)
C SPR sensorgrams showing the association and dissociation of surface-tethered (top) 11bp and (bottom) 11bp-1A RNAs at various U1 snRNP concentrations (0.02 to 100 nM)
D Cartoon schematic of the two-color CoSMoS assay for monitoring U1/RNA interactions
immobilized U1 snRNP molecules) and 532 nm (right
interacting RNA oligos) excitation (scale bar = 20 µm)
Inset highlights colocalization (scale bar 1 µm)
Fluorescent beads were included as fiducial markers (yellow arrow)
Images are rendered by averaging three consecutive images and applying uniform brightness and contrast values
F Fluorescence in arbitrary units (au) across time in seconds (s) showing the binding of 9bp (top) and 9bp-1A (bottom) to surface tethered U1 snRNP (0.33 frames/s
G Linear regression (solid line) of keq values (circle) on 9bp concentration
The shaded region is the 95% confidence interval of the linear regression
the kon value is fixed from maximum likelihood estimations of unbound dwell times (see F) at kon = 3.9 × 10⁶ M⁻¹ s⁻¹
H Cumulative probability distributions of (left) unbound dwell times (0.5 nM
N = 4373) and (right) bound dwell times (0.5 nM
N = 4895) across a range of 9bp-1A RNA concentrations
I Cumulative probability distribution of bound dwell times of the 9bp-1A RNA at 1 nM (grey
N = 5216) overlaid with MLE of single (blue dashed) and biexponential (red) distributions
Source data are provided as a Source data file
we used single-molecule co-localization spectroscopy (CoSMoS) to observe U1 snRNP/5′SS interactions
we reconstituted our U1 snRNP particle without U1-C
The U1-C zinc-finger domain was then separately purified and added directly into solution as required (typically at 100 nM unless otherwise noted; U1-C binding was also analyzed in depth as described subsequently)
This demonstrates that non-specific binding did not meaningfully impact our analysis under the experimental conditions
The slope was constrained at the kon value from MLE
resulting in a koff = 6.4 ± 2.3 × 10⁻⁴ s⁻¹
we estimate a KD ≈ 1.2 × 10⁻¹⁰ M for a 9bp 5′SS RNA
which closely matches our SPR data of an 11bp 5′SS RNA (KD = 2.03 × 10⁻¹⁰ M)
The similarity of these values provides high confidence that neither photobleaching nor surface immobilization is significantly impacting our analysis
Together the SPR and CoSMoS data indicate that U1 snRNP binds highly complementary RNAs very tightly with a KD of ~100 pM
and this affinity is primarily attributable to formation of very stable bound complexes with lifetimes of ~27 min rather than rapid association kinetics
These data also indicate that additional base pairing interactions between the +7 and +8 positions of a 5′SS with the AU dinucleotide present at the 5′ end of the snRNA do not necessarily confer significant changes in the dissociation constant
At the single molecule level, the introduction of the -1A bulge (9bp-1A) into the 5′SS results in dynamic RNA binding to U1 snRNP (Fig. 1F, Supplementary Fig. 7)
Given the faster kinetics from the weaker binding
we were able to perform equilibrium measurements whereby imaging commenced after equilibrium was reached
we see only a slight decrease in kon to 2.9 ± 0.2 × 10⁶ M⁻¹ s⁻¹
indicating the energetic penalty of -1A bulge stems from duplex stability and not recruitment
the 9bp-1A 5′SS RNA exhibits a koff of 3.4 ± 0.2 × 10⁻³ s⁻¹ and KD = 1.2 × 10⁻⁹ M
These results are also close to our SPR data for the 11bp-1A 5′SS RNA and confirm an order of magnitude reduction in kinetic stability from the -1A substitution
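The relationships among the rate constants quoted above can be checked with a short calculation. This sketch assumes simple two-state binding, for which KD = koff/kon and the mean bound lifetime is 1/koff; the function names are hypothetical and the numbers are the point estimates reported in the text.

```python
def dissociation_constant(k_off, k_on):
    """Equilibrium dissociation constant (M) for simple two-state binding."""
    return k_off / k_on


def mean_lifetime_minutes(k_off):
    """Mean bound lifetime in minutes for a single exponential dissociation step."""
    return 1.0 / k_off / 60.0


# Point estimates quoted in the text (s^-1 and M^-1 s^-1).
kd_9bp_1a = dissociation_constant(k_off=3.4e-3, k_on=2.9e6)
lifetime_9bp = mean_lifetime_minutes(k_off=6.4e-4)
lifetime_9bp_1a = mean_lifetime_minutes(k_off=3.4e-3)
```

Under these assumptions the 9bp-1A values give a KD near 1.2 nM, and the koff values correspond to mean lifetimes of roughly 26 min for 9bp and about 5 min for 9bp-1A.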
A RNA oligos containing -1A bulged 5′SS
Shaded region indicates predicted base pairing interactions with the U1 snRNA (top)
C SPR showing the association and dissociation of 11bp-1A RNA across various concentrations of branaplam (0–5 µM) and at 10 nM U1 snRNP
D Dose response curve showing the fitted dissociation rates (koff) of response units vs branaplam concentration for 11bp-1A (white circles) and 11bp (black circles) RNAs
Data for 11bp-1A is overlaid with the fitted equation to determine EC50 value (1.3 ± 0.1 µM)
E Fluorescence in arbitrary units (au) across time in seconds (s) of 1 nM 9bp-1A RNA binding to immobilized U1 in the presence of 100 nM U1-C plus (top) DMSO (0.33 Hz) or (bottom) 10 µM branaplam (0.11 Hz) overlaid with idealizations (black lines)
F Dose response curves of 9bp-1A RNA binding to U1 snRNP vs branaplam concentration
The fraction bound at each branaplam concentration (white circle) is overlaid with a fitted equation (solid red line) and the 95% confidence interval of the fitted equation (shaded red region) to estimate an EC50 value (0.45 ± 0.17 µM)
G Cumulative probability distribution of bound dwell times across branaplam concentrations (DMSO
H Cumulative probability distribution of bound dwell times at 1 nM 9bp-1A and 100 nM U1-C (grey circles
N = 3640) in presence of 1 µM branaplam overlaid with MLE of mono (blue dashed) and biexponential distributions (solid red)
I Maximum likelihood estimates of (left) time constants (\({\tau }_{B}^{1}\) and \({\tau }_{B}^{2}\)) and (right) amplitude of \({\tau }_{B}^{2}\) of a biexponential distribution for bound dwell times at 1 nM 9bp-1A RNA and 100 nM U1-C across branaplam concentrations (mean ± SEM)
Plotted parameters are computed across all single molecules for each branaplam concentration (DMSO
J Contour plots showing the correlation between successive bound event durations (\(i\) and \(i+1\)) within individual molecules
consistent with branaplam specifically perturbing 5′SS RNA dissociation
an order of magnitude increase in \({\tau }_{B}^{2}\) for the 9bp-1A 5′SS RNA is observed between the absence (342 ± 8 s) and presence of 10 µM branaplam (4426 ± 458 s)
the SPR and CoSMoS data indicate that branaplam does not facilitate -1A 5′SS RNA association and primarily functions to stabilize the U1 snRNP/-1A 5′SS RNA complex
only a subset of U1 snRNP/-1A 5′SS interactions are branaplam-sensitive
This result suggests the presence of two types of U1 snRNP molecules on the surface
We were able to robustly detect the smaller population due to the analysis of many thousands of single molecule binding events and by avoiding ensemble averaging which would have obscured their presence
the relative amplitudes from MLE of our bound time distributions in the presence of branaplam would presumably correspond to the fractions of U1 snRNP molecules without (the amplitudes of the short-lived parameters) and with U1-C (the amplitudes of the long-lived parameters)
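As an illustration of how time constants and amplitudes can be obtained from bound dwell times, the sketch below fits a two-component exponential mixture by expectation-maximization, which converges to a maximum likelihood estimate. It is not the authors' fitting procedure: the function name and starting values are hypothetical, and the sketch ignores the finite frame rate and finite observation window of a real experiment.

```python
import math


def fit_biexponential(dwells, n_iter=200):
    """EM fit of p(t) = a1/tau1 * exp(-t/tau1) + (1 - a1)/tau2 * exp(-t/tau2).

    Returns (tau_short, tau_long, amplitude_of_tau_short).
    """
    n = len(dwells)
    mean = sum(dwells) / n
    tau1, tau2, a1 = 0.5 * mean, 2.0 * mean, 0.5  # crude starting values
    for _ in range(n_iter):
        # E-step: probability that each dwell belongs to component 1.
        resp = []
        for t in dwells:
            d1 = a1 / tau1 * math.exp(-t / tau1)
            d2 = (1.0 - a1) / tau2 * math.exp(-t / tau2)
            resp.append(d1 / (d1 + d2))
        # M-step: responsibility-weighted mean dwell time per component.
        r1 = sum(resp)
        tau1 = sum(r * t for r, t in zip(resp, dwells)) / r1
        tau2 = sum((1.0 - r) * t for r, t in zip(resp, dwells)) / (n - r1)
        a1 = r1 / n
    if tau1 > tau2:  # report the short-lived component first
        tau1, tau2, a1 = tau2, tau1, 1.0 - a1
    return tau1, tau2, a1
```

In the interpretation given above, the amplitude of the long-lived component (one minus the returned amplitude) would correspond to the fraction of binding events on U1 snRNP molecules carrying U1-C.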
A Fluorescence in arbitrary units (au) across time in seconds (s) of 1 nM 9bp-1A binding in the absence of U1-C with (top) DMSO and (bottom) 10 µM branaplam overlaid with idealizations (black lines)
B Violin plots of bound dwell time distributions at 1 nM 9bp-1A RNA with and without U1-C and/or branaplam
DMSO was included in the absence of branaplam
C Change in fluorescence due to branaplam binding to a duplex of 11bp-1A and U1 snRNP in the absence (grey triangles) and presence (white circles) of U1-C by microscale thermophoresis (MST)
The change in fluorescence in the presence of U1-C is overlaid with a fitted equation (solid purple) and 95% confidence interval of the fitted equation (shaded purple region) to estimate a KD value (2.69 ± 0.36 µM)
D Violin plots showing the bound dwell time distributions across different permutations of U1-C and branaplam concentrations in solution across indicated RNA oligo sequences (SMN2
Each violin plot is overlaid with a box plot that shows the median (horizontal line)
and whiskers representing data within 1.5\(\times\)IQR
Highlighted nucleotides in the 5′SS sequences above each plot indicate predicted base pairs to the U1 snRNA
The lower case letters in HTT* indicate +7 G:A and +8 G:U substitutions included to enable RNA synthesis
numbers above the violins indicate the number of bound lifetimes included in each distribution
these data show that branaplam binds a -1A bulged U1 snRNA/5′SS duplex only in the presence of the U1 snRNP and U1-C and support the U1-C-first model
the endogenous sequence proved to be synthetically intractable due to a stretch of guanine bases
we substituted guanines at the +7 and +8 positions with UA (HTT*)
all four 5′SS display weaker binding to U1 and no effect upon addition of branaplam in the absence of U1-C
we were not even able to observe enough binding events above background to determine a bound lifetime in the absence of U1-C
our combined single molecule and ensemble data show that U1-C must be present for branaplam to bind to and modulate U1 snRNP/-1A 5′SS complexes and that the extent of binding enhancement is sequence-dependent
A Dose response curves of 9bp-1A RNA binding to U1 snRNP vs
The red line and shading indicate the fit and 95% CI to the fitted equation to estimate an EC50 value (1.3 ± 0.20 nM)
B Cumulative probability distribution of unbound dwell times at 1 nM 9bp-1A RNA across U1-C concentrations (0 nM
C Unbound time constants (\({\tau }_{U}\)) determined from MLE of a monoexponential distribution for unbound dwell times overlaid with a fit to the fitted equation (EC50 = 1.8 ± 0.5 nM)
Plotted \({\tau }_{U}\) values (circles) are shown as mean ± SEM and are computed across all single molecules for each U1-C concentration (0 nM
The fitted equation is shown as the fit (solid line) and 95% confidence interval of the fit (shaded region)
D Cumulative probability distribution of bound dwell times at 1 nM 9bp-1A RNA across U1-C concentrations (0 nM
E Cumulative probability distribution of bound dwell times at 1 nM 9bp-1A RNA and 100 nM U1-C (grey circles) overlaid with MLE of mono (blue dashed) and biexponential distributions (solid red)
F MLE of bound time constants (\({\tau }_{B}^{1}\) and \({\tau }_{B}^{2}\)
left) and amplitude of \({\tau }_{B}^{2}\) (right) of a biexponential distribution for bound dwell times at 1 nM 9bp-1A RNA across U1-C concentrations (mean ± SEM)
The amplitudes of \({\tau }_{B}^{2}\) are overlaid with a fit to the fitted equation (EC50 = 1.0 ± 0.2 nM)
Plotted parameters and errors are computed across all single molecules for each U1-C concentration (0 nM
G Contour plots showing the correlation between successive bound event durations (\(i\) and \(i+1\)) within individual molecules
Source data are provided as a Source data file
we see the association rate at 1 nM 9bp-1A 5′SS RNA double between the absence and presence of saturating U1-C
the 9bp-1A 5′SS RNA duplex exhibits \({\tau }_{B}^{1}\) ≈ 30 s and \({\tau }_{B}^{2}\) ≈ 330 s
U1 snRNP can still form the longer-lived complexes with this RNA; however
these are rare relative to the shorter-lived interactions
the amplitude of \({\tau }_{B}^{1}\) is dominant at low concentrations of U1-C
but \({\tau }_{B}^{2}\) dominates at high concentrations
The change in the amplitude of \({\tau }_{B}^{2}\) across U1-C concentrations yielded an EC50 = 1.0 ± 0.2 nM
To test whether these two time constants reflect dynamic association/dissociation of U1-C with a kinetically homogenous population of U1 snRNP molecules, we correlated the dwell time durations of successive binding events within individual immobilized U1 snRNPs (Fig. 4G)
At the extremes of either no U1-C or saturating U1-C
bound lifetimes of the 9bp-1A 5′SS RNA predominantly align to a single cluster at the level of individual molecules corresponding to either faster \({\tau }_{B}^{1}\) durations or slower \({\tau }_{B}^{2}\)
we observe both short and long events corresponding to dynamic interconversion of the two time constants
which likely stems from the association and dissociation of U1-C
Higher concentrations of U1-C increase the probability of binding to U1 snRNP and the probability of observing a more stable duplex
these data show that U1-C dynamically binds U1 snRNP
and that its presence can help recruit and stabilize a -1A bulged 5′SS RNA
Shaded region indicates predicted base pair interactions with the U1 snRNA (top)
B Apparent association rates for 9 bp (left) and 9bp-1A RNAs (right) at various concentrations in the absence and presence of saturating U1-C (100 nM)
Data are overlaid with linear fits (solid line) to determine kon (9 bp with 100 nM U1-C: kon = 3.9 ± 0.4 × 10⁶ M⁻¹ s⁻¹
R² = 0.97; 9 bp without U1-C: kon = 1.8 ± 0.5 × 10⁶ M⁻¹ s⁻¹
R² = 0.95; 9bp-1A with 100 nM U1-C: kon = 3.2 ± 0.6 × 10⁶ M⁻¹ s⁻¹
R² = 0.98; 9bp-1A without U1-C: kon = 3.2 ± 0.6 × 10⁶ M⁻¹ s⁻¹
Shaded region indicates the 95% confidence interval of the linear regression
C Violin plots showing the distributions of unbound (top) and bound (bottom) dwell times for various RNAs at 0 and 100 nM U1-C
N indicates the number of single molecule events included in the violin plot
D Scatter plot showing the average unbound (top) and bound (bottom) dwell times of RNAs at 0 (x-axis) or 100 nM (y-axis) U1-C
we designed two +1/+2 GU 5′SS-containing RNAs with 6 base pairs of complementarity to the U1 snRNA either on the exonic (-4 to +2 positions) or intronic (+1 to +6 positions) side of the exon/intron boundary
All experiments were conducted at a single concentration of the RNA (9bp-1C: 1 nM
6bp-exon: 3 nM; 6bp-intron: 3 nM; depending on their affinity) and either in the absence or presence of 100 nM U1-C
For all of these RNAs, we see a decrease in the average unbound lifetimes, corresponding to a faster association rate, when U1-C is included relative to its absence (Fig. 5D
U1-C increased the apparent association rate by 2.5-fold
This suggests that U1-C does not have a single
presence of a GU 5′SS) for facilitating RNA association to U1 snRNP
and U1-C is ineffective at stabilizing the bound state
This suggests that U1 snRNP may enforce the requirement for a +1G at the 5′SS through kinetic selection against mismatches at this position
This selection results from both U1-C independent (poor binding in the absence of U1-C) and dependent (failure of U1-C to stabilize the bound state) components
reinforcing that splicing outcomes in cells are often dependent on U1-binding kinetics
A kinetic model for -1A bulged 5′SS association and dissociation in the presence and absence of U1-C and branaplam. Optimized rate transitions for 9bp-1A are provided in Table 1
Equilibrium arrows in grey indicate transitions that are not supported by our experimental data or mathematical modeling
reversible drug binding can nonetheless contribute to formation of very long-lived U1 snRNP/5′SS interactions
we used a reconstituted U1 snRNP to study the detailed kinetics of its interactions with 5′SS RNA oligos and how these interactions change upon the inclusion of a small molecule splicing modulator
Both single molecule and bulk biophysical measurements show that U1 snRNP binds RNA in a sequence-specific manner
branaplam extends U1 snRNP/oligo lifetimes of -1A bulged 5′SS if U1-C is present
and that splicing modulation can involve a complex
The origin of this complexity is in part due to reversible binding of the U1-C component
which dynamically interacts with the snRNP
U1-C itself both promotes RNA binding by U1 snRNP and stabilizes the U1/RNA complex in a sequence-specific manner
We were able to use our feature-rich and large single molecule data sets to determine a sequential binding mechanism for U1 snRNP
and branaplam interactions with a -1A 5′SS-containing oligo
U1-C associates with the snRNP prior to 5′SS binding and decreases the KD for the 5′SS sixteen-fold
Branaplam then reversibly associates with this complex with a moderate KD (~0.7 µM)
kinetic selection of substrates chemically competent for splicing is a conserved feature of both human and yeast U1 snRNPs even though neither U1 snRNP is present during the transesterification steps
Another prediction of our kinetic model is that U1-C can only associate with U1 snRNPs in the absence of 5′SS pairing
U1-C must be pre-associated with U1 snRNP prior to its engagement with RNA in order to modulate 5′SS recognition
it should not be assumed that a U1-C-containing U1 snRNP that is recruited to the transcriptional machinery for co-transcriptional spliceosome assembly still contains U1-C at the moment a 5′SS is transcribed
our work suggests that understanding the correlations between predicted base-pairing strength
location of the base pairs within the U1 snRNA/5′SS duplex
and dependency on U1-C for exon inclusion in vivo are all critical for predictive modeling of U1 snRNP occupancy in cells
While it may seem counterintuitive that a reversibly-binding splicing modulator leads to formation of long-lived interaction between U1 snRNP and -1A bulged 5′SS
our kinetic mechanism provides a rationale for this observation
The -1A 5′SS RNA can only dissociate from U1 snRNP when branaplam is not bound and rapid re-binding of branaplam limits the lifetime of this state
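The consequence of this obligate order can be illustrated with a reduced two-state scheme, which is an assumption made here for clarity and not the optimized model of Table 1. If the RNA leaves only from the drug-free complex at rate k_off, and the drug binds and unbinds that complex at rates k_bind and k_unbind, the mean time to RNA release starting from the drug-free state is (1 + k_bind/k_unbind)/k_off. The sketch below gives this expression together with a stochastic simulation of the same scheme; all names and any rate values passed in are hypothetical.

```python
import random


def mean_bound_lifetime(k_off, k_bind, k_unbind):
    """Analytic mean RNA lifetime when release is blocked while drug is bound."""
    return (1.0 + k_bind / k_unbind) / k_off


def simulate_bound_lifetime(k_off, k_bind, k_unbind, rng):
    """One stochastic trajectory of the same scheme; returns time to RNA release."""
    t = 0.0
    drug_bound = False
    while True:
        if drug_bound:
            # The only allowed exit from the drug-bound state is drug release.
            t += rng.expovariate(k_unbind)
            drug_bound = False
        else:
            # Drug-free state: RNA release competes with drug re-binding.
            t += rng.expovariate(k_off + k_bind)
            if rng.random() < k_off / (k_off + k_bind):
                return t
            drug_bound = True
```

For simple binding, k_bind/k_unbind equals the drug concentration divided by its KD, so in this reduced scheme the mean lifetime grows linearly with drug concentration even though every individual drug binding event is short-lived.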
Recently, a thermodynamic model for splicing modulator drug action has been proposed based on RNA-Seq data and measurements of mRNA production in cells14
the authors proposed two branaplam-binding modes for U1 snRNP: a risdiplam-like binding mode that occurs on -1A bulged 5′SS that also contain a -2G and a second state that leads to hyperactivation of some 5′SS that additionally contain a -3A
It is unlikely that these two states are due to presence/absence of U1-C since our data shows that branaplam can only bind U1 snRNP when U1-C is present
While we did not study the sequence requirements for hyperactivation explicitly
we do note that the 5′SS RNA oligos that showed the largest changes in U1 snRNP bound state lifetimes were also those with the hyperactivation AGA motif (9 bp -1A
These results suggest that the hyperactivation phenotype has a kinetic basis and might be due to larger changes in the lifetime of the U1 snRNP/5′SS interaction
while the thermodynamic model included two different branaplam-binding modes
the authors were not able to determine if the risdiplam-like binding mode is a necessary precursor for formation of the hyperactivated state
Our single molecule data supports only a single branaplam-bound state for the U1 snRNP/U1-C/5′SS complex
The risdiplam-like and hyperactivated binding modes of branaplam likely occur independently of one another
each involving particular molecular interactions with their corresponding 5′SS
be limited in part by the inherent kinetic properties of the factors and processes involved
Key chemicals and materials are described in Supplementary Table 6
RNA oligonucleotides (Supplementary Table 1) for SPR
and single-molecule experiments were purchased from Integrated DNA Technologies (IDT
Stocks of fluorescent RNAs intended for single-molecule experiments were prepared by resuspending the lyophilized oligonucleotides in nuclease-free water (20-50 µM
RNA concentrations were calculated from their absorbance values at 260 nm using a NanoDrop and the extinction coefficients from IDT via the Beer-Lambert law
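The Beer-Lambert calculation is c = A / (ε × l). A minimal sketch follows; the function name is hypothetical, the default 1 cm path length assumes the instrument reports absorbance normalized to 10 mm, and the extinction coefficient in the usage note is a made-up value, not one from Supplementary Table 1.

```python
def concentration_molar(a260, extinction_coeff, path_cm=1.0):
    """Concentration (M) from absorbance, extinction coefficient (M^-1 cm^-1) and path (cm)."""
    return a260 / (extinction_coeff * path_cm)
```

For instance, an absorbance of 0.5 with a hypothetical extinction coefficient of 100,000 M⁻¹ cm⁻¹ corresponds to 5 µM.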
All plasmids were purchased from GenScript (Piscataway
USA) based on the pET-28a(+) backbone and codon optimized then transformed into Escherichia coli BL21 Star (DE3) cells (Cat# C601003
For the U1-70K_SmD1/D2 polycistronic construct
an N-terminal thioredoxin-6xHis-tag followed by a tobacco etch virus (TEV) protease cleavage site was appended to the U1-70K fragment comprised of residues 2-59 followed by a Gly-Ser triplet linker then residues 7-91 of SmD1
A second open reading frame encoded residues 1−118 of SmD2
an N-terminal 6xHis-tag followed by a TEV cleavage site was appended to residues 1−126 of SmD3
which was followed by a second open reading frame for residues 1-95 of SmB
an N-terminal His-SUMO-Avi tag was introduced prior to residues 1-75 of SmF
followed by additional open reading frames for SmE (residues 1-92) and SmG (1-76)
a construct with only a His-SUMO tag was used
a C-terminal 6xHis-tag was added after residues 1-61 of U1-C
Cells were cultured at 37 °C in 1 L of 2xYT media supplemented with kanamycin (50 µg/mL) then induced with 0.5 mM IPTG at 16 °C overnight
Cell pellets were resuspended in lysis buffer (20 mM HEPES
pH 7.5) plus cOmplete ULTRA EDTA-free protease inhibitor cocktail (Roche
Clarified lysates were diluted in IMAC Buffer A (20 mM HEPES
pH 7.5) then loaded onto a HisTrap HP 5 mL Ni-NTA column (Cytiva
USA) and eluted with a gradient of IMAC Buffer B (20 mM HEPES
Eluted fractions were pooled in dialysis tubing (Cat# 68035
ThermoFisher Scientific) with TEV protease and dialyzed overnight against 20 mM HEPES
the solution was adjusted to 1 M KCl and loaded onto a HisTrap column equilibrated in IMAC Buffer A
The flow-through was collected and injected onto a Superdex HiLoad 75 26/60 column (Cytiva) equilibrated in 20 mM HEPES
pH 7.5 and the fractions were collected then concentrated by centrifugation (Cat# UFC9003
cells were cultured like the other constructs with the addition of 1% (w/v) glucose to the culture media
Protein was similarly purified via the 6xHis-tag
and finally purified by IMAC and SEC as described
the SmF/E/G trimer was biotinylated on the SmF AviTag using the BirA biotin-protein ligase reaction kit (Avidity LLC
USA) and biotinylation was confirmed by MALDI-TOF MS
The U1 snRNA used for reconstitution of the miniU1 particle was purchased from AxoLabs (LGC Group
Germany) and dissolved to a concentration of 500 µM in RNase-free ddH2O. Its sequence is 5′-AmUmACψψACCU GGCAGUGACC ACCACACACU GCAUAAUUUG UGGUAGUGGG CGAAAGCCCG-3′
where Am and Um represent 2′-O-methyl nucleotides
a U1 snRNA of the same sequence was produced with an aminohexyl linker on the 3′ end that was subsequently labeled with Cy5 NHS ester as the fluorophore
U1 snRNA was prepared by refolding at 80 °C for 3 min and then cooling on ice for 10 min
In a pre-warmed solution of Reconstitution Buffer (20 mM HEPES
pH 7.5) containing 40 U/mL RNAsin (Cat#N2111
each Sm protein sub-complex was combined to a final concentration of 8 µM and incubated for 5 min at 37 °C
U1 snRNA was added to a final concentration of 4 µM and incubated for 45 min at 37 °C
U1-C_61 can be added to a final concentration of 8 µM
then the complex is cooled overnight at 4 °C
The crude complex was then loaded onto a MonoQ 10/100 GL column (Cytiva) in Reconstitution Buffer and eluted with a gradient of Reconstitution Buffer containing 1 M KCl
Eluted fractions were pooled and loaded onto a Superdex column (Cytiva) equilibrated in Reconstitution Buffer
fractions corresponding to miniU1 were concentrated using a 30 kDa MWCO centrifugal filter (Cat#UFC9030
a Biacore 8K (Cytiva) was used with a streptavidin-coated Series S Sensor Chip SA (Catalog #BR100531)
The instrument was equilibrated in 20 mM HEPES
RNA was synthesized with a 3′-biotin (-1A bulge: CAGAGUAAGUAU; SMN2: AGGAGUAAGUCU; Match: CAGGUAAGUAU; Reverse: UAUGAAUGGAC; Dharmacon) and injected at 1 nM with 30 s contact time at 10 µL/min to afford 3-5 RU of capture
U1 snRNP binding studies were performed in duplicate by injecting a titration of complex that was serially diluted from 100 nM, with a contact time of 180 s and a dissociation time of 600 s at a flow rate of 30 µL/min
the chip surface was regenerated by an injection of 3 M MgCl2
All data were analyzed after reference subtraction
and a 1.5-2.5% (v/v) DMSO solvent correction applied
Data were analyzed using a two-state binding model in the Biacore Insight Software
To assess the effect of ligand on U1 snRNP binding kinetics
a co-inject format was used to allow for compound to be present during the association and dissociation phases of the experiment
A protein solution of U1 snRNP was prepared at 10 nM with varying concentrations of branaplam, serially diluted from 5 µM, and injected across the immobilized RNA surface as previously described
a serial dilution was performed in DMSO at 25x final concentration
followed by dilution with assay buffer (20 mM HEPES
a pre-formed complex of U1 snRNP ΔU1-C was prepared at 200 nM in the presence of 20 nM of a 5′SS oligonucleotide labeled with Cy5 (5′-Cy5-CAGAGUAAGUAU; Metabion) with or without the addition of 500 nM U1-C supplementation
The branaplam titration was then mixed 1:1 with the pre-formed U1 snRNP complex and incubated at room temperature for 15 min before loading Monolith LabelFree Premium Capillaries (MO-Z025; NanoTemper Tech)
Capillaries were analyzed by a Monolith X red-continuous instrument (NanoTemper Tech) at 25 °C with 100% LED and laser power
U1-C protein was serially diluted at 2x final concentration in assay buffer
a pre-formed complex of U1 snRNP ΔU1-C was prepared at 200 nM in the presence of 20 nM of a 5′SS oligonucleotide labeled with Cy5 with the addition of branaplam at 5 µM or DMSO
to yield a final DMSO concentration of 2% (v/v)
The U1-C titration was mixed 1:1 with the pre-formed U1 snRNP complex and assessed on a Monolith X as previously described
Curves were fit by nonlinear regression using custom MATLAB code built on the function nlinfit
95% confidence intervals were computed from the estimated Jacobian returned by nonlinear least squares fitting via nlparci
Single molecule imaging chambers were prepared using microscope slides (24 mm × 60 mm
GoldSeal) and cover glasses (25 mm × 25 mm
Corning) at least one day before each experiment
Substrates were first cleaned by successive sonication in 2% v/v Micro-90
and 1 M KOH for 60 min each in slide-mailers (Fisher Scientific)
Cleaned substrates were then dried with high purity nitrogen (Airgas) and aminosilanized with 1.5% (v/v) VECTABOND (Vector Laboratories) in acetone (Spectrophotometric Grade
and passivated by incubation of a 1:100 w/w mixture of mPEG-biotin-SVA (Laysan Bio) and mPEG-SVA (Laysan Bio) in 100 mM NaHCO3 (pH 8) overnight
the substrates were rinsed with MilliQ water and dried with nitrogen
Imaging chambers were created by placing thin strips of double-sided tape and vacuum grease along the glass slide and adhering a cover glass on top
This typically resulted in three 25 µL volume lanes per slide
assembled chambers were rinsed with at least 200 µL of wash buffer (WB: 20 mM HEPES pH 7.5
All images were collected with 2 × 2 pixel binning and active hot-pixel correction
streptavidin-labeled fluorescent beads (T10711
Invitrogen) were flowed into the lane at a low concentration (~5 × 10^4-fold dilution from stock in WB) to serve as fiducial markers for channel alignment and lateral drift correction
The lane was then washed with 50 µL of 0.2 mg/mL streptavidin (SA10-10
followed by 50 µL of WB to remove unbound beads and streptavidin
The U1 snRNP particle labeled with Cy5 was then diluted to 10-20 pM in WB and incubated in a lane for one minute
The surface density of Cy5-labeled U1 snRNP-∆U1-C was checked by flowing in imaging buffer (IB: 20 mM HEPES pH 7.5
Two imaging schemes were used for data collection: alternating laser excitation or sequential laser excitation
a 50 µL solution containing variable concentrations of RNA
and branaplam in IB was added and successive images were captured with a 1 s exposure under 532 nm then 633 nm excitation
approximately 30 s were first recorded (633 nm
1 Hz) to identify areas of interest (AOIs), after which a 50 µL solution containing variable concentrations of RNA
and branaplam in IB was added and images were captured sequentially at 1 Hz (532 nm
a total of 600 frames were collected across varying frame rates (0.11 to 1 Hz) to minimize photobleaching
the lane was washed with WB to remove oxygen scavengers and images were collected under 633 nm excitation until all surface tethered molecules photobleached (typically 30-60 frames)
this step allowed us to ensure we only analyzed AOIs featuring a single U1 snRNP molecule
Detected AOIs were fit to a two-dimensional Gaussian function within a 5 × 5 pixel space
AOIs were filtered by removing those with intensity values more than three scaled median absolute deviations from the median (e.g.
multiple overlapping molecules) and those lying less than 5 pixels (Euclidean distance) from a neighboring AOI
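The two filtering criteria can be sketched as follows. This is an illustrative reimplementation, not the smVideoProcessing code: the function name and data layout are hypothetical, the 1.4826 scaling factor is the conventional one that makes the median absolute deviation consistent with the standard deviation of normally distributed data, and neighbor distances are measured against all detected AOIs (including intensity outliers), which is an assumption.

```python
import math
from statistics import median

MAD_SCALE = 1.4826  # conventional scaling of the MAD for normally distributed data


def filter_aois(aois, n_mads=3.0, min_dist=5.0):
    """Filter AOIs given as (x, y, intensity) tuples.

    Removes AOIs whose intensity lies more than n_mads scaled MADs from the
    median, and AOIs closer than min_dist pixels to any other detected AOI.
    """
    intensities = [a[2] for a in aois]
    med = median(intensities)
    scaled_mad = MAD_SCALE * median([abs(i - med) for i in intensities])

    kept = []
    for idx, (x, y, inten) in enumerate(aois):
        # Intensity criterion (e.g. multiple overlapping molecules)
        if abs(inten - med) > n_mads * scaled_mad:
            continue
        # Proximity criterion: reject if any other AOI is too close
        crowded = any(
            j != idx and math.hypot(x - ox, y - oy) < min_dist
            for j, (ox, oy, _) in enumerate(aois)
        )
        if crowded:
            continue
        kept.append((x, y, inten))
    return kept
```

Measuring proximity against every detected AOI is the more conservative choice, because a bright aggregate can still contaminate the signal of a nearby, otherwise acceptable spot.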
Accepted AOIs were then mapped to the 532 nm channel using the mathematical transformations described above
The time-dependent fluorescence of each AOI in each channel was computed by integrating over all frames in a 3 × 3 pixel space centered on each AOI’s sub-pixel location
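A minimal sketch of the intensity integration is given below. It is not the authors' implementation: the function name is hypothetical, frames are represented as plain nested lists, and the sub-pixel location is simply rounded to the nearest pixel, whereas the original software may center the window differently.

```python
import math


def integrate_aoi(frames, x, y, half_width=1):
    """Return the per-frame summed intensity in a (2*half_width+1)^2 window.

    frames: sequence of 2-D images indexed as frame[row][col].
    x, y:   sub-pixel AOI location (column, row); rounded to the nearest pixel
            here as a simplifying assumption.
    """
    col = math.floor(x + 0.5)
    row = math.floor(y + 0.5)
    trace = []
    for frame in frames:
        n_rows, n_cols = len(frame), len(frame[0])
        inside = (half_width <= row < n_rows - half_width
                  and half_width <= col < n_cols - half_width)
        if not inside:
            raise ValueError("AOI window extends beyond the image border")
        total = sum(
            frame[r][c]
            for r in range(row - half_width, row + half_width + 1)
            for c in range(col - half_width, col + half_width + 1)
        )
        trace.append(total)
    return trace
```

Calling the function once per channel, with the AOI coordinates mapped into that channel, would yield one fluorescence time series per AOI and channel.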
All the steps of this process were incorporated into a graphical user interface (smVideoProcessing)
Only molecules exhibiting at least one binding event were included for further analysis
All the steps of this process were incorporated into a graphical user interface (smTraceViewer)
Dwell time distributions are visualized as cumulative probability plots, violin plots, or histograms. Cumulative probability plots were constructed by computing a cumulative distribution function (CDF) estimate in which the value of each bin \(v_i\) is computed by Eq. 5
Overlays of the maximum likelihood estimates (MLE) of mono- and biexponential distributions were computed by integrating PDF1 and PDF2 over the range \([t_{\min}, t_{\max}]\), as provided by Eqs. 6 and 7
only the initial unbound event (i.e. the time to first binding) was considered for MLE to reduce the potential bias introduced by photobleaching of tight binders
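The following sketch shows one way an empirical CDF and a monoexponential MLE over a finite observation window could be computed. It is an assumption-laden illustration rather than the authors' code: Eqs. 5-7 are not reproduced here, the CDF below is unbinned, the density is assumed to be a single exponential renormalized on \([t_{\min}, t_{\max}]\), and all function names are hypothetical. The maximum is located with a golden-section search, which is not necessarily the optimizer used in the study.

```python
import math
import random


def ecdf(dwells):
    """Unbinned empirical CDF: list of (t, fraction of dwells <= t)."""
    xs = sorted(dwells)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]


def neg_log_likelihood(tau, dwells, t_min, t_max):
    """Negative log-likelihood of an exponential truncated to [t_min, t_max].

    Assumed density: p(t) = exp(-t/tau) / (tau * (exp(-t_min/tau) - exp(-t_max/tau))).
    """
    n = len(dwells)
    # log of the normalization term, written in a numerically stable form
    log_norm = -t_min / tau + math.log(-math.expm1(-(t_max - t_min) / tau))
    return sum(dwells) / tau + n * (math.log(tau) + log_norm)


def fit_tau(dwells, t_min, t_max, lo=1e-3, hi=1e6, iters=100):
    """Golden-section search for the MLE of tau on a log scale."""
    phi = (math.sqrt(5) - 1) / 2
    a, b = math.log(lo), math.log(hi)

    def f(u):
        return neg_log_likelihood(math.exp(u), dwells, t_min, t_max)

    c, d = b - phi * (b - a), a + phi * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc < fd:  # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - phi * (b - a)
            fc = f(c)
        else:        # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + phi * (b - a)
            fd = f(d)
    return math.exp((a + b) / 2)


def simulate_truncated_dwells(tau, t_min, t_max, n, seed=0):
    """Draw n synthetic dwell times observable within [t_min, t_max]."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        t = rng.expovariate(1.0 / tau)
        if t_min <= t <= t_max:
            out.append(t)
    return out
```

On synthetic data generated with `simulate_truncated_dwells`, the fitted time constant should approach the simulated value as the number of dwells grows; a biexponential fit would require a multi-parameter optimizer and is not sketched here.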
the likelihood of 2 clusters vs 1 cluster was determined by computing a Bayesian Information Criterion (BIC) score for each model by
and SSE is the sum of squared errors within a cluster summed across all clusters by
\({x}_{j}\) is the data point in cluster \({C}_{i}\)
and \({\mu }_{i}\) is the centroid of cluster \({C}_{i}\)
A lower BIC value was used as evidence for a better model
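As an illustration of this model comparison, the sketch below scores one- and two-cluster descriptions of one-dimensional data. The exact BIC expression used in the study is not reproduced above, so the code assumes a commonly used SSE-based form, BIC = n ln(SSE/n) + k ln(n), with k taken as the number of clusters. The clustering step is an exhaustive search over split points of the sorted data, a stand-in for whichever clustering algorithm the authors used; all names are hypothetical.

```python
import math


def sse(values):
    """Sum of squared errors of the values about their centroid (mean)."""
    if not values:
        return 0.0
    mu = sum(values) / len(values)
    return sum((v - mu) ** 2 for v in values)


def best_two_cluster_sse(values):
    """Smallest total within-cluster SSE over all splits of the sorted data."""
    xs = sorted(values)
    return min(sse(xs[:i]) + sse(xs[i:]) for i in range(1, len(xs)))


def bic(total_sse, n, k):
    """Assumed BIC form: n*ln(SSE/n) + k*ln(n); a small floor avoids log(0)."""
    variance = max(total_sse / n, 1e-12)
    return n * math.log(variance) + k * math.log(n)


def preferred_cluster_count(values):
    """Return 1 or 2, whichever model has the lower BIC under the assumed form."""
    n = len(values)
    bic_one = bic(sse(values), n, 1)
    bic_two = bic(best_two_cluster_sse(values), n, 2)
    return 2 if bic_two < bic_one else 1
```

Because the penalty term in this assumed form is weak, the comparison is most informative when the two candidate populations are well separated.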
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article
Single molecule data (raw images and analyzed .mat files) can be accessed at https://doi.org/10.5281/zenodo.13738649. Source data are provided with this paper
Scripts for single molecule analysis and figure generation can be found at https://github.com/David-Scott-White/White_2024
This work was supported by funding from Remix Therapeutics
grants from the National Institutes of Health (R35 GM136261 to A.A.H.) with additional support from a Research Forward grant award from the Wisconsin Alumni Research Foundation and a NIH postdoctoral fellowship award (F32 GM143780 to D.S.W.)
and Amira Yazidi at NMX Research Solutions for assistance with protein production
Maria McGresham at August Bioservices for assistance with SPR experiments
and Maximilian Plach at 2bind GmbH for assistance with MST experiments
and members of the Hoskins and Herschlag labs for helpful discussions
and A.A.H. wrote the manuscript with input from B.M.D
All authors contributed to reviewing and revising the manuscript
is a member of the Scientific Advisory Board (SAB) for Remix Therapeutics and is carrying out sponsored research in collaboration with Remix
are paid employees and interest holders of Remix Therapeutics
completed this work while employed as a postdoctoral scientist in the Hoskins Laboratory at UW-Madison and is a current employee of Element Biosciences
Nature Communications thanks the anonymous reviewers for their contribution to the peer review of this work
DOI: https://doi.org/10.1038/s41467-024-53124-5
Genetic and experimental findings point to a crucial role of RNA dysfunction in the pathogenesis of Amyotrophic Lateral Sclerosis (ALS)
Evidence suggests that mutations in RNA binding proteins (RBPs) such as FUS
affect the regulation of alternative splicing
We have previously shown that the overexpression of wild-type FUS in mice
a condition that induces ALS-like phenotypes
a protein with key roles in RNA metabolism
suggesting that a pathological connection between FUS and hnRNP A2/B1 might promote FUS-associated toxicity
Here we report that the expression and distribution of different hnRNP A2/B1 splice variants are modified in the affected tissues of mice overexpressing wild-type FUS
degenerating motor neurons are characterized by the cytoplasmic accumulation of splice variants of hnRNP A2/B1 lacking exon 9 (hnRNP A2b/B1b)
In vitro studies show that exon 9 skipping affects the nucleocytoplasmic distribution of hnRNP A2/B1
promoting its localization into stress granules (SGs)
and demonstrate that cytoplasmic localization is the primary driver of hnRNP A2b recruitment into SGs and cell toxicity
boosting exon 9 skipping using splicing switching oligonucleotides exacerbates disease phenotypes in wild-type FUS mice
these findings reveal that alterations of the nucleocytoplasmic distribution of hnRNP A2/B1
likely contribute to motor neuron degeneration in ALS
implying that pathological FUS may affect the overall expression
that might in turn promote motor neuron degeneration
the actual contribution of these splicing changes in ALS pathogenesis is essentially unknown
in this study we aimed to verify whether the changes in hnRNP A2/B1 splicing induced by pathological FUS affect hnRNP A2/B1-related activities that might in turn contribute to motor neuron degeneration induced by FUS
here we uncover for the first time the existence of a functional connection between hnRNP A2/B1 splicing isoforms and ALS pathology
disease progression is marked by the accumulation of hnRNP A2/B1 splicing isoforms lacking exon 9 (hnRNP A2b/B1b)
which preferentially localize in the cytoplasm of degenerating motor neurons
In vitro experiments demonstrate that the hnRNP A2b variant exhibits an increased propensity to relocalize into the cytoplasm
and that its de-localization is sufficient to drive SG formation and cell toxicity
disease phenotypes in hFUS mice worsen upon treatment with splicing switching oligonucleotides that enhance exon 9 skipping
these findings support the existence of a pathological cascade orchestrated by FUS and hnRNP A2/B1 and strengthen the idea that the functional network connecting RBPs is widely affected by ALS conditions
A Schematic representation of alternative splicing of exon 2 (green) and exon 9 (orange) of hnRNP A2/B1
The filled rectangles represent included exons
while the empty rectangles represent skipped exons
Arrows represent the specific primers used for cDNA amplification
B The alternative splicing pattern of hnRNP A2/B1 exon 2 and exon 9 was monitored by semiquantitative RT-PCR analysis in spinal cords from hFUS transgenic mice
along with age-matched non-transgenic (Ctrl) animals
Bands were quantified through densitometric analysis
and a splicing index was calculated as the ratio between the upper and lower band and plotted considering the corresponding ratio in a Ctrl mouse equal to 1
Data are expressed as means ± SD (n = 4–5 mice/group)
Statistical significance was calculated by Student’s t-test
D Lumbar spinal cord lysates from control (Ctrl) and hFUS mice at the symptomatic (C)
and end-stage (D) phases of the disease were subjected to western blot analysis using anti-exon 2
Data are expressed as means ± SD (n = 4–5 mice/group), with the relative expression in Ctrl mice set to 1
Statistical significance was calculated by Student’s t-test referred to Ctrl
and anti-exon 9-immunoreactive isoforms are strongly downregulated
while anti-exon 8/10 signal is significantly upregulated in hFUS mice compared to control animals
these effects appear enhanced at this stage of the disease
suggesting that the observed alterations in the expression of hnRNP A2/B1 match disease progression
these results show that ALS disease in hFUS mice is characterized by a shift in the expression of hnRNP A2/B1 towards isoforms lacking exon 9 (either A2b
A–C Immunofluorescence staining on spinal cord sections of non-transgenic (Ctrl) and end stage hFUS mice with antibodies against Exon 9 (green) (A)
Exon 2 (green) (B) or Exon 8/10 (green) (C) and SMI32 (red)
Nuclei were detected by DAPI staining (blue)
The dotted white lines mark the separation between white and grey matter of the spinal cord
Magnifications of the highlighted areas are also shown
Exon 2 (E) and Exon 8/10 (F) staining in hemisections of Ctrl and end stage hFUS mice
****p < 0.0001 (n = 4 animals per group
G Quantification of nuclear/cytoplasmic distribution of Exon 8/10 signal in motor neurons (MNs) of Ctrl and end stage hFUS mice
The bar plot shows the percentage distribution of Exon 8/10 staining in the nuclear (nuc) and cytosolic (cyt) compartments for the Ctrl and hFUS mice
Statistical significance was assessed using Two-way ANOVA followed by Šidák’s multiple comparisons test
Asterisks indicate the level of statistical significance: ****p < 0.0001 (n = 4 animals per group
HeLa cells were transfected with the HA-tagged hnRNP A2
B1 and B1b (B) isoform constructs and analysed 24 h after transfection by immunofluorescence
using an anti-HA antibody (red) and anti-TIA1 antibody (green)
Magnifications of the highlighted areas are also shown (zoom)
For fluorescent distribution across the cell
A straight line was overlaid across the cell and then the fluorescence intensity was measured along the line using the built-in function
Graphs (lower panels) represent fluorescence intensity along the line in images; the yellow shaded area denotes the nucleus
SuperPlots on the right show the percentage of cells with cytosolic HA signal or cells where the HA signal colocalizes with TIA1-positive SGs
calculated for both the A2/A2b (A) and B1/B1b isoforms (B)
The distribution of measures from n = 3 independent experiments is reported
with each biological replicate color-coded: the mean value from each of the three replicates is represented by black dots
and the mean ± SD of the three replicates is shown as a black line
as well as the nuclear (nuc) and cytosolic (cyt) fractions from HeLa cells transfected with HA-tagged hnRNP A2/B1 isoform constructs were analyzed by western blot
hnRNP A2/A2b/B1/B1b expression levels were monitored using an anti-HA antibody
lamin B1 and β-actin levels were included to assess the purity of the cytosolic
A HeLa cells were transfected with the HA-tagged hnRNP A2 and hnRNP A2b isoform constructs for 24 h
untreated or treated with 0.5 mM sodium arsenite (NaAs) for 20
40 and 60 minutes and analysed by immunofluorescence using an anti-HA antibody (red) and anti-TIA1 antibody (green)
B SuperPlots showing the percentage of cells with the HA signal colocalizing with TIA1-positive SGs
Statistical significance was calculated by One-way ANOVA test
and the significant differences between A2 and A2b isoforms at the same time point are shown
as well as the nuclear (nuc) and cytosolic (cyt) fractions from HeLa cells transfected with HA-tagged hnRNP A2 and A2b isoform constructs
hnRNP A2 and A2b expression levels were monitored using an anti-HA antibody
Lamin B1 and β-actin levels were analyzed to assess the purity of the cytosolic
and the presence of the D290V mutation does not further enhance this effect
Representative western blot (A) and quantification (B) of total (INPUT) protein extracts
as well as the insoluble (INS) and soluble (SOL) fractions from HeLa cells transfected with HA-tagged hnRNP A2
A2-D290V and A2b-D290V isoform constructs
Isoform expression levels were monitored using an anti-HA antibody
β-actin levels were used as a loading control
Data are reported as mean value ± SD (n = 3 independent experiments)
Statistical significance was calculated by Two-way ANOVA
comparing soluble and insoluble fractions between groups
C Numerical output scores resulting from GraPES analysis of hnRNP A2 compared to hnRNP A2b
with higher values indicating increased likelihood of the protein being localized in a biological condensate (a value greater than 0.90 suggests a high propensity); Disorder
representing the percentage of protein residues predicted to be disordered by DISOPRED3; Net charge
that is the overall sum of the positively and negatively charged residues at neutral pH; PScore
reflecting the quantity of π-π interactions
that are linked to the propensity of the protein to phase separate in vitro; Soluprot
a protein solubility score where higher values correspond to higher solubility; GRAVY Score
a measure of protein hydrophobicity; and RBP Pred
a likelihood prediction for a protein to exhibit RNA-binding capabilities
D Graphical plots generated by GraPES analysis showing MaGSeq Z-score (upper panel) and Disorder score (lower panel) of hnRNP A2 and hnRNP A2b along the precomputed score distributions of human proteome (shown in gray)
The y-axis represents the percentage (density) of proteins associated with a given score (the x-axis)
Scores relative to known markers for SGs and processing bodies (p-bodies) are also shown as references
A Schematic representation of hnRNP A2/B1 variants used
which encodes a portion of the M9-NLS (A2_ΔNLS and A2b_ΔNLS)
hnRNP A2 and A2b isoforms fused at the C-terminus to an extra NLS were also produced (A2+NLS and A2b+NLS)
All six variants contain an HA epitope at the N-terminus
B HeLa cells were transfected with the indicated plasmids and analyzed by immunofluorescence using anti-HA (red) and anti-TIA1 (green) antibodies
C SuperPlots showing the percentage of cells with cytosolic HA signal (upper panel) and cells where the HA signal colocalizes with TIA1-positive SGs (lower panel)
Statistical significance was calculated by one-way ANOVA test
and the significant differences between the A2 variants or between the A2b variants are shown
D Representative western blot and relative quantification of SH-SY5Y cells untreated (NT) or transfected with an empty plasmid (mock)
or with plasmids coding for HA-tagged hnRNP A2 (A2) and HA-tagged hnRNP A2b (A2b)
as well as the nuclear and cytosolic fractions were analysed by western blot with anti-cleaved PARP (cPARP) and anti-cleaved Caspase 3 (cCasp3) antibodies
A2 and A2b expression levels were monitored using an anti-HA antibody
Lamin B1 and β-actin levels were monitored to assess the fractions’ purity and normalize the fractions
E Representative western blot and relative quantification of SH-SY5Y cells untreated (NT) or transfected with an empty plasmid (mock)
or with plasmids coding for HA-tagged hnRNP A2 (A2)
hnRNP A2b+NLS (A2b+NLS) and hnRNP A2b_ΔNLS (A2b_ΔNLS)
cytosolic protein extracts were analyzed by western blot with an anti-cCasp3 antibody
Expression of hnRNP A2/B1 isoforms was evaluated in total lysates (input) with an anti-HA antibody
An anti-β-actin antibody was used as a loading control
B Overlapping genes whose alternative splicing is misregulated upon A2/B1 and FUS downregulation (ASO)/knock-out (KO) were analysed with the Enrichr analysis tool
The top 10 terms enriched in Gene Ontology biological process are listed according to their decreasing −log10 (p value) (enrichment)
The colour code indicates the adjusted p value
and the bubble size reflects the number of genes enriching that annotation (count)
C The alternative splicing pattern of 11 selected common hnRNP A2/B1 and FUS target genes was assessed through semiquantitative RT-PCR analysis in the spinal cords of end-stage hFUS mice
along with age-matched non-transgenic control (Ctrl) animals
Representations of constitutive exons (dark grey rectangles) and alternatively spliced exons (light grey and white rectangles) analysed are shown
Bands were quantified by densitometric analysis
and a splicing index was calculated as follows
For genes that are expressed in more than one isoform
the ratio between the upper and the lower band was calculated and plotted considering the corresponding ratio in Ctrl mice equal to 1
For genes that are expressed as a single isoform
the splicing index was calculated as the ratio between band intensity and the relative intensity of the housekeeping Gapdh gene
Data are expressed as means ± SD (n ≥ 3 mice/group)
Statistical significance was calculated by Student’s t-test
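The two splicing-index definitions given in this legend can be written out as a short sketch. The function names and the band intensities in the usage note are hypothetical; only the ratios themselves follow the description above.

```python
from statistics import mean, stdev


def splicing_index_two_isoforms(upper, lower, ctrl_upper, ctrl_lower):
    """Upper/lower band ratio, normalized so that the Ctrl ratio equals 1."""
    return (upper / lower) / (ctrl_upper / ctrl_lower)


def splicing_index_single_isoform(band, gapdh):
    """Band intensity relative to the housekeeping Gapdh band."""
    return band / gapdh


def summarize(indices):
    """Return (mean, sample SD) for a group of per-animal splicing indices."""
    return mean(indices), stdev(indices)
```

For example, hypothetical densitometry values of 30 (upper) and 60 (lower) in a transgenic animal, against 50 and 50 in a control, would give a splicing index of 0.5, that is, a two-fold shift toward the lower band relative to control.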
A Nissl-stained spinal cord sections of male non-transgenic (Ctrl) and symptomatic hFUS mice treated with PBS (Veh) or 40 µg SSO A were analyzed 35 days after ICV injection
Quantification of motor neuron (MNs) numbers/ventral horn is provided
Statistical significance was calculated using ANOVA
E Spinal cord sections from non-transgenic (Ctrl) and symptomatic hFUS mice treated with PBS (Veh) or 40 µg SSO A were analyzed 35 days after ICV injection and subjected to immunofluorescence staining with an antibody against NeuN (green) (B)
D Iba1 positive cells in vehicle- and SSO A treated hFUS mice were analyzed by ImageJ software for different size descriptors (Area and Perimeter)
Statistical significance was calculated using Student’s t-test
F The alternative splicing pattern of Atxn2
Fchsd2 and Sorbs1 target genes was assessed through semiquantitative RT-PCR analysis in the spinal cords of non-transgenic (Ctrl) and symptomatic hFUS mice treated with PBS (Veh) or 40 µg SSO A
and a splicing index was calculated as the ratio between band intensity
Data are expressed as means ± SD (at least n = 3 mice/group)
Statistical significance was calculated using unpaired t-test
the functional network connecting RBPs is widely affected by ALS conditions
and an extensive process of mislocalization and/or aggregation of RBPs can occur along with the progression of the disease and might have significant implications for the overall pathological process
a significant number of genes that are regulated by hnRNP A2/B1 display alterations in their splicing patterns in the spinal cord of symptomatic and end-stage hFUS mice
suggesting that a loss of splicing regulation by hnRNP A2/B1 has a role in this process
Whether these changes are caused by a decreased nuclear pool of hnRNP A2/B1 or by the concurrent cytoplasmic mis-localization of exon 9-lacking isoforms is still to be fully defined
SSOs that increase exon 9 skipping in hFUS mice enhance the observed splicing alterations
suggesting that these changes are tightly linked to A2b/B1b accumulation and to the overall disease progression
and showed that modifying hnRNP A2/B1 expression affects the dynamics of pathological SGs induced by mutant FUS
thus supporting the hypothesis that A2b/B1b isoforms might impact on this function
the accumulation of hnRNP A2/B1 in the cytoplasm of degenerating motor neurons in hFUS mice is a circumstantial but compelling suggestion that altered SG dynamics might be involved in disease progression in ALS mice
Results from cultured neuronal cells support the conclusion that an increased cytoplasmic expression of hnRNP A2/B1 might be harmful to cells
cytoplasmic A2b isoform promotes a significant increase in activated caspase-3 expression compared to controls and to cells expressing A2
relocates to the cytosol upon removal of the M9_NLS
demonstrating that uncontrolled cytoplasmic delocalization of A2 is sufficient to promote cellular toxicity
This further suggests that the exclusion of exon 9
and the subsequent cytoplasmic accumulation of the A2b isoform
may play a role in the pathogenic mechanism of FUS-related ALS
The experiments performed by in-vivo injection of SSOs strengthen this conclusion
enhanced exon 9 skipping induced by SSO treatment increases motor neuron degeneration and neuroinflammation that characterize the disease course in hFUS mice
demonstrating that the accumulation of A2b/B1b isoforms contributes to FUS-associated toxicity in mice
control mice treated with SSO A do not display significant phenotypic changes within the observed period
indicating the need for further investigation into the long-term consequences of exon 9 skipping
considering that the alterations in alternative splicing that we detect in hnRNP A2/B1 coincide with the onset of symptoms
these data suggest that changes in isoform expression alone are not sufficient to induce an ALS phenotype but rather play an active role in disease progression
our findings support the notion that ALS conditions broadly impact the functional network of RNA-binding proteins and identify hnRNP A2/B1 mislocalization as a possible player in the pathological process characterizing FUS-ALS
Plasmids containing degenerated protein-coding sequences of wild type and mutated D290V hnRNP A2 isoforms were purchased from Addgene
The hnRNP B1 sequence was amplified by PCR from the hnRNP A2 plasmid
were produced by PCR-driven overlap extension
Δ323–341 deletion variants (hnRNP A2_ΔNLS and hnRNP A2b_ΔNLS) were generated by PCR mutagenesis from wild type hnRNP A2 and hnRNP A2b plasmids
An extra canonical SV-40 nuclear localization signal was introduced into wild type hnRNP A2 and hnRNP A2b plasmids by PCR to produce the hnRNP A2+NLS and hnRNP A2b+NLS variants
All hnRNP A2/B1 variants were cloned into pcDNA3.1 plasmid vector (Invitrogen) and fused with an HA epitope at the N-terminal end
Fully modified 2’-O-methyl splice-switching oligonucleotides (SSOs) with a phosphorothioate backbone were synthesized and HPLC-purified by Eurofins Genomics for subsequent in vitro testing
with low endotoxin level (guaranteed < 0.5 EU/mg)
SSO was synthesized by Microsynth AG for studies in vivo
The sequences of SSOs used are: SSO A 5’-UUUAUUACCUCCUCCA-3’ and scramble SSO 5’-ACCUUCUUACUUCAUC-3’
and mouse NSC34 cells (originally obtained from Neil Cashman
Canada) were cultured in Dulbecco’s modified Eagle’s medium (DMEM) with Glutamax (Corning)
supplemented with 10% fetal bovine serum (FBS
and 1% penicillin/streptomycin (Sigma-Aldrich) at 37 °C in a 5% CO2 atmosphere
cells were treated with sodium arsenite (NaAs
Sigma Aldrich) at a concentration of 0.5 mM
cells at 80% confluence were transfected with appropriate plasmids or antisense oligonucleotides using Lipofectamine 2000 (Invitrogen) according to the manufacturer’s instructions
Cells were collected after 24 and/or 48 hours for subsequent analysis
digested with 0.25% trypsin (Gibco) and 0.2 mg/ml DNase (Sigma-Aldrich) in DMEM
After dissociation with a fire-polished Pasteur pipette and passage through 70 μm filters
primary cortical neurons were suspended and plated in 12-well plates previously coated with poly-L-lysine (PLL) (1 mg/mL) or on PLL-coated coverslips and maintained in Neurobasal® medium (Gibco Life Technologies) supplemented with B-27® (Life Technologies) at densities ranging from 4 × 10⁴ cells/cm² to 6 × 10⁴ cells/cm²
Lumbar spinal cords from n = 4 animals per group were dissected and homogenized with a homogenizer (MICCRA D-1) in lysis buffer containing 20 mM Hepes pH 7.4
10 mM EDTA and a protease inhibitor cocktail (Sigma-Aldrich)
the lysates were centrifuged for 20 minutes at 16,000 × g at 4 °C
The supernatant was quantified using the Bradford assay and resuspended in Laemmli buffer (Biorad)
SH-SY5Y and HeLa cells were lysed in RIPA buffer (50 mM Tris-HCl pH 7.4
5 mM MgCl2) containing a protease inhibitor cocktail
incubated for 30 minutes on ice and centrifuged for 10 minutes at 16,000 × g at 4 °C
Supernatants were quantified using the Bradford protein assay (Bio-Rad) and resuspended in Laemmli buffer
For the preparation of insoluble protein extracts
the pellets were resuspended in Laemmli Buffer
HeLa cells were centrifuged at 600 × g for 5 minutes at 4 °C and washed with cold PBS
Cell pellet was resuspended by gentle pipetting with cold hypotonic lysis buffer (HLB
Tris 10 mM pH 7.5; NaCl 10 mM; MgCl2 3 mM; NP-40 0.1%; glycerol 10%)
1 mM sodium fluoride and a cocktail of protease inhibitors (Sigma-Aldrich)
Laemmli buffer was added to a portion of lysates to obtain the total fraction
The remnant cell suspension was centrifuged at 1000 × g for 3 minutes at 4 °C
Supernatant containing the cytoplasmic fraction was clarified at 5000 × g for 5 minutes at 4 °C
quantified by Bradford assay and then resuspended in Laemmli Buffer
Pellet containing the nuclear fraction was washed by carefully pipetting with cold PBS
centrifuged at 300 × g for 2 minutes at 4 °C
Protein lysates were separated by SDS-PAGE and transferred to a nitrocellulose membrane
The membranes were blocked at room temperature for 1 hour in Tris-buffered saline solution with 0.1% Tween-20 (TBS-T) containing 5% non-fat dry milk
and then incubated with primary antibodies
diluted in TBS-T containing 2% non-fat dry milk
at 4 °C overnight or for 2 hours at room temperature
HRP-conjugated secondary antibodies (Jackson ImmunoResearch) were applied at room temperature for 1 hour
Chemiluminescent detection was performed using ECL solution (Roche)
Following densitometry-based quantification and analysis using ImageJ software (National Institutes of Health),
the relative density of each identified protein was calculated
Spinal cords from n = 4 animals per group were fixed using a 4% paraformaldehyde solution (PFA) in 0.1 M PBS for 12 hours and tissues were cryoprotected in 30% sucrose in PBS solution at 4 °C
spinal cords were cut into 30-μm-thick slices with a freezing cryostat (Leica Biosystems)
After blocking for 1 hour in 10% normal donkey serum (NDS) in PBS containing 0.3% Triton X-100
spinal cord slices were incubated for 3 days at 4 °C with primary antibodies diluted in 2% NDS in PBS
and then for 3 h at room temperature with appropriate fluorescent secondary antibody
Nuclei were stained with 1 μg/ml DAPI (Sigma-Aldrich) for 10 min
The slides were coverslipped with Fluoromount Aqueous Mounting Medium (Sigma-Aldrich)
HeLa and SH-SY5Y cells were fixed using 4% PFA in PBS for 10 minutes
permeabilized with a 0.1% Triton X-100 solution in PBS for 5 minutes and blocked with 2% FBS diluted in PBS for 30 minutes at room temperature
Cells were then incubated with primary antibodies diluted in 2% FBS in PBS for 1 hour at 37 °C and
with appropriate fluorescent-conjugated secondary antibodies in PBS
Nuclei were stained with 1 μg/ml DAPI (Sigma-Aldrich) for 5 min
Immunofluorescence images were analysed using a LEICA TCS SP5 confocal microscope
Images were captured under constant exposure time
Digital image brightness and contrast were adjusted using the LAS AF software (Leica)
Background subtraction was performed after defining a region of interest
and the average pixel intensity was calculated
All image quantifications were done using ImageJ software (NIH)
To assess the nuclear and cytoplasmic distribution of the Exon 8/10 fluorescence signal in motor neurons
images were analyzed using ImageJ software (NIH)
Regions of interest (ROIs) were drawn around the nucleus and the entire cell of individual motor neurons
The mean fluorescence intensity was measured for each ROI
The cytoplasmic signal was calculated by subtracting the nuclear intensity from the total cellular intensity
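The subtraction described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, assuming the mean ROI intensities have already been exported from ImageJ; the function names and the example values are hypothetical and are not taken from the study.

```python
from statistics import mean


def cytoplasmic_signal(cell_intensity: float, nuclear_intensity: float) -> float:
    """Cytoplasmic signal = whole-cell ROI intensity minus nuclear ROI intensity."""
    return cell_intensity - nuclear_intensity


def mean_cytoplasmic_signal(rois):
    """Average the cytoplasmic signal over several motor neurons.

    `rois` is a list of (cell_intensity, nuclear_intensity) pairs,
    one pair per neuron (hypothetical input layout).
    """
    return mean(cytoplasmic_signal(cell, nucleus) for cell, nucleus in rois)


# Made-up intensities for two neurons, for illustration only.
example_rois = [(120.0, 80.0), (100.0, 70.0)]
print(mean_cytoplasmic_signal(example_rois))
```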
For the quantification of cells displaying cytoplasmic and SG localization of HA-hnRNP A2/B1 isoforms
at least 50–100 HA-positive cells per condition from randomly selected fields in n = 3 independent experiments were visually scored using a Zeiss Axioplan fluorescence microscope
The total number of motor neurons in the L3–L5 segments of the lumbar spinal cord was quantified by analyzing serial sections from each mouse
To visualize the Nissl substance within neurons
the sections were stained with 0.02% cresyl violet solution
the sections underwent a graded dehydration process using ethanol (50% to 100%)
Images of the sections were captured using a Zeiss Axioskop 2 microscope at 20x magnification
Both the right and left ventral horns were examined to count neurons
characterized by cell bodies exceeding 200 μm²
and the average count from the sections was calculated for each mouse
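As a rough sketch of the counting rule above, the snippet below keeps only cell bodies larger than 200 μm² and averages the counts over the sections of one mouse. The input layout (one pair of left and right ventral-horn area lists per section) and the choice to sum both horns within a section are assumptions made for illustration; the text does not specify them.

```python
from statistics import mean

AREA_THRESHOLD_UM2 = 200.0  # from the text: cell bodies exceeding 200 μm²


def count_motor_neurons(cell_areas_um2):
    """Count cell bodies whose area exceeds the size threshold."""
    return sum(1 for area in cell_areas_um2 if area > AREA_THRESHOLD_UM2)


def mean_count_per_mouse(sections):
    """Average motor neuron count across the serial sections of one mouse.

    `sections` is a list of (left_horn_areas, right_horn_areas) pairs.
    Summing both horns per section is an assumption of this sketch.
    """
    counts = [
        count_motor_neurons(left) + count_motor_neurons(right)
        for left, right in sections
    ]
    return mean(counts)


# Made-up cell areas (μm²) for two sections of one mouse.
example_sections = [
    ([150.0, 250.0, 300.0], [210.0, 190.0]),
    ([400.0, 500.0], [220.0, 230.0, 100.0]),
]
print(mean_count_per_mouse(example_sections))
```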
Immunofluorescence (IF) and immunoblot (WB) analyses were performed with the following primary antibodies: rabbit anti-Exon 2 (1:1000-WB
1:500-IF) and rabbit anti-Exon 8/10 (1:500-WB
Rothnagel from the University of Queensland (Australia); rabbit anti-Exon 8/10 (1:5000-WB
custom antibody produced by Bio-Fab Research)
Secondary antibodies for WB were anti-rabbit (1:2500) and anti-mouse (1:5000) IgG peroxidase-conjugated from Bio-Rad Laboratories (Hercules
Secondary fluorescent antibodies for IF were Alexa Fluor 488-Donkey anti-rabbit (1:200)
from Jackson ImmunoResearch Laboratories (West Grove
PCR products were run in 2% agarose gels and visualized by SYBR Safe DNA Gel Stain (Invitrogen) staining
Images were acquired on a ChemiDoc™ Imaging System (Bio-Rad)
bands were quantified using the ImageJ software (NIH) and the splicing indices were calculated as the ratio between the upper and the lower bands
The expression levels of the isoform lacking exon 9 were calculated as percentages relative to the total expression of the isoform containing exon 9 and the one lacking exon 9
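The two quantities defined above reduce to simple ratios of band intensities. A minimal sketch follows, assuming the densitometry values have already been measured in ImageJ; the function names and the numbers in the usage lines are illustrative only.

```python
def splicing_index(upper_band: float, lower_band: float) -> float:
    """Ratio between the upper and the lower band intensities."""
    if lower_band <= 0:
        raise ValueError("lower band intensity must be positive")
    return upper_band / lower_band


def percent_exon9_lacking(with_exon9: float, without_exon9: float) -> float:
    """Exon 9-lacking isoform as a percentage of both isoforms combined."""
    total = with_exon9 + without_exon9
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    return 100.0 * without_exon9 / total


# Made-up band intensities (arbitrary densitometry units).
print(splicing_index(300.0, 150.0))
print(percent_exon9_lacking(750.0, 250.0))
```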
using the MaGSeq (MaGS Sequence-based tool) predictive model
MaGSeq is a generalized linear model (GLM) based only on protein sequence features and provides a Z-score
representing the propensity of proteins to localize into biological condensates
as well as the feature scores used to generate the predictions
A MaGSeq value greater than or equal to 0.90 for human proteins suggests that the protein is highly likely to be part of phase-separated organelles within the cell
Sterile PBS or SSO A (20 μg or 40 μg) diluted in 0.01% Fast Green (Sigma-Aldrich) was injected intracerebroventricularly (ICV) in newborn pups (P0-P1) using a glass syringe (Hamilton
the needle was inserted at the midpoint of a line defined between the right eye and the lambda intersection of the skull
The needle was carefully advanced into the lateral ventricle to a depth of approximately 3 mm
the pups were allowed to recover on a heating pad under a heat lamp before being returned to their mother
statistical significance was assessed using a two-tailed Student’s t-test
One-way analysis of variance (ANOVA) or two-way ANOVA
All statistical analyses were conducted using GraphPad Prism 9.0 software (GraphPad Software
the number of animals in the experimental groups was determined through power analysis
The parameters were calculated based on previous experiments using the same animal model
groups were balanced for age or disease stage before randomization
Investigators remained blinded to treatment allocation during outcome assessment to reduce bias
Data available on request from the authors
This work was supported by Fondazione Arisla ETS (Project Spliceals to M.C.
and European Union—Next Generation EU and funded by the Ministry of University and Research (MUR)
National Recovery and Resilience Plan (PNRR)
project MNESYS (PE0000006)—A Multiscale Integrated Approach to the Study of the Nervous System in Health and Disease (DN
are supported by European Union—Next Generation EU
within the PNRR project “Rome Technopole—Innovation Ecosystem”
receive funding from the European Union—Next-GenerationEU—National Recovery and Resilience Plan (NRRP)—MISSION 4 COMPONENT 2
Dr Valeria Gerbino (Fondazione Santa Lucia
Italy) is gratefully acknowledged for providing help with ICV in vivo injection of SSOs
Joseph Rothnagel (School of Chemistry and Molecular Biosciences
Australia) for providing hnRNP A2/B1 isoform specific antibodies
PhD Program in Cellular and Molecular Biology
Institute of Biology and Molecular Pathology
and interpreted most of the molecular and cell biology experiments
and interpreted most of the mouse biology experiments
IDV helped with mouse experiments and ICV injection of SSO
SB helped with cell biology experiments and with the design and in vitro testing of SSO
MA helped with the molecular analysis of SSO effects
EDA aided with the maintenance of cell cultures and sample generation
MDS aided with sample preparation and analysis
AS helped with confocal microscopy analysis
supervised and interpreted the experiments
All animal procedures were performed according to the European Guidelines for the use of animals in research (2010/63/EU) and the requirements of Italian laws (D.L
The ethical procedure was approved by the Italian Ministry of Health (protocol number 383/2022 PR/G)
DOI: https://doi.org/10.1038/s41419-025-07538-8
We present SpliceTransformer (SpTransformer)
a deep-learning framework that predicts tissue-specific RNA splicing alterations linked to human diseases based on genomic sequence
SpTransformer outperforms all previous methods on splicing prediction
Application to approximately 1.3 million genetic variants in the ClinVar database reveals that splicing alterations account for 60% of intronic and synonymous pathogenic mutations
and occur at different frequencies across tissue types
tissue-specific splicing alterations match their clinical manifestations independent of gene expression variation
We validate the enrichment in three brain disease datasets involving over 164,000 individuals
we identify single nucleotide variations that cause brain-specific splicing alterations
and find disease-associated genes harboring these single nucleotide variations with distinct expression patterns involved in diverse biological processes
SpTransformer analysis of whole-exome sequencing data from blood samples of patients with diabetic nephropathy predicts kidney-specific RNA splicing alterations with 83% accuracy
demonstrating the potential to infer disease-causing tissue-specific splicing events
SpTransformer provides a powerful tool to guide biological and clinical interpretations of human diseases
recognizing variations in alternative splicing becomes an essential task for clinical diagnosis
For instance, aberrant splicing in CPEB4 has been reported to be highly associated with an autism-like phenotype [19]
these alternative splicing events may not be detectable in clinically accessible tissues such as blood
accurate prediction of splice-altering mutations in a tissue-specific manner holds significant clinical importance for genetic diagnosis
most existing algorithms do not incorporate the tissue specificity of splicing into their models
a The SpTransformer model takes only sequence as input and predicts tissue-specific splicing in 15 human tissues
The model can be used to evaluate genetic variants and predict tissue-specific splicing alterations
b Performance of six algorithms in the splice site prediction task
Top-k accuracy is calculated by choosing the threshold at which the number of predicted positive sites equals the number of actual splice sites,
then computing the fraction of predicted sites that are actual splice sites
PR-AUC is the area under the precision-recall curve
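The top-k accuracy described in this caption can be illustrated as follows. This is a minimal sketch, not the authors' evaluation code: `scores` stands for per-position model outputs and `labels` for 0/1 annotations, both hypothetical names. Choosing a threshold so that the number of predicted positives equals the number of true sites is equivalent to taking the k highest-scoring positions, where k is the number of true sites.

```python
def top_k_accuracy(scores, labels):
    """Fraction of the k highest-scoring positions that are true splice sites,
    where k is the number of true splice sites in `labels` (0/1 values)."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    k = sum(labels)
    if k == 0:
        raise ValueError("at least one true splice site is required")
    # Rank positions by predicted score, highest first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top_k = ranked[:k]
    return sum(labels[i] for i in top_k) / k


# Toy example: two true sites, one of which is ranked in the top two.
example_scores = [0.9, 0.1, 0.8, 0.3, 0.7]
example_labels = [1, 0, 0, 0, 1]
print(top_k_accuracy(example_scores, example_labels))
```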
c Tissue-usage prediction of SpTransformer in comparison with other models
d The distribution of SpTransformer prediction score for tissue usages of splice sites in the test dataset
Tissue usages were grouped into low (<0.5) and high (≥0.5) by their original usage ratio across all samples in the same tissue types
was released under a Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International license
Tissue usage was not totally dominated by gene expression
b Impact of in silico mutation around intron in the GLA gene
SpTransformer considers sequence features both proximal and distal to the splice donor site
Mutagenesis weight was calculated by the decrease in the predicted strength of the splice site when that nucleotide is mutated
c Impact of in silico mutation around exons in the APBB2 gene
Several known RBP motifs were found in regions of large weight
d De novo motifs that influence the tissue-usage prediction of SpTransformer (left) and their presentations in different tissues (right)
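The mutagenesis weight mentioned in panel b can be sketched as an in silico scan: each nucleotide is replaced in turn and the drop in predicted splice-site strength is recorded. The snippet below is a toy illustration under stated assumptions. `toy_donor_score` is a hypothetical stand-in for a trained model, and averaging the drop over the three alternative bases is an assumption, since the caption does not say how substitutions are aggregated.

```python
NUCLEOTIDES = "ACGT"


def mutagenesis_weights(seq, score_fn):
    """For each position, the mean decrease in `score_fn` when that
    nucleotide is replaced by each alternative base."""
    reference = score_fn(seq)
    weights = []
    for i, base in enumerate(seq):
        drops = []
        for alt in NUCLEOTIDES:
            if alt == base:
                continue
            mutated = seq[:i] + alt + seq[i + 1:]
            drops.append(reference - score_fn(mutated))
        weights.append(sum(drops) / len(drops))
    return weights


def toy_donor_score(seq):
    """Hypothetical scorer: 1.0 if a GT dinucleotide sits at positions 3-4
    (0-based), else 0.0. A real model would return a continuous strength."""
    return 1.0 if seq[3:5] == "GT" else 0.0


# Only the two GT positions carry weight under the toy scorer.
print(mutagenesis_weights("CAGGTAAG", toy_donor_score))
```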
we illustrate the sequence feature identified by SpTransformer using the GLA gene as an example
which encodes the enzyme alpha-galactosidase A
SpTransformer detected the “GT” sequence around the exon–intron junction
SpTransformer recognized regions with relatively high mutagenesis weight at 300 nt
demonstrating its capability to detect sequence features in distal intronic regions
There were also de novo motifs that were not similar to known RBP motifs
by learning the tissue-specific splicing events
SpTransformer was able to implicitly learn the joint contribution of expression and sequence context to the tissue-specific regulatory code
a SpTransformer is applied to evaluate the splicing effect of a single nucleotide variant by calculating a ΔSplice score and matching graphical representations
b Examples of two pathogenic mutations in the ClinVar database
SpTransformer successfully predicted splicing changes even far from variants (right panel)
Both cases were validated by RT-PCR in previous studies
c The distribution of mutations classified by clinical significance within several intervals of ΔSplice scores
the ratio of pathogenic mutations becomes larger
d Distributions of ΔSplice scores of all SNVs
grouped by both pathogenicity in ClinVar database and annotated variant type
The number of SNVs and the proportion of SNVs above/below the cutoff were annotated
The bar chart on the left aggregates the data by rows
while the bar chart at the top tabulates the data by columns
SNVs with alternative pathogenicity annotations (e.g.
“conflicting interpretations”) were excluded from the analysis
Identifying pathogenic variants and interpreting variants of uncertain significance (VUS) in noncoding regions and synonymous mutations has been a long-standing challenge in the field
Our analysis unveils a significant contribution of splicing alterations in intronic and synonymous pathogenic mutations
underscoring the value of applying SpTransformer in regions beyond splicing sites for diagnosis and interpretation of candidate pathogenic mutations or VUS
a The strategy to derive tissue specificity variants from model prediction
We created a reference set of common splicing sites to derive background distribution and calculate tissue-specific z-scores for new variants in order to make fair comparisons across tissues
and gene enrichment is calculated based on tissue-specific splice-altering SNVs
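The normalization in panel a can be pictured as a per-tissue z-score against a background distribution. The sketch below assumes that each tissue has a list of background scores derived from the reference set of common splicing sites and that a new variant has one predicted score per tissue; the dictionary layout, the names, and the use of the sample standard deviation are assumptions made for illustration.

```python
from statistics import mean, stdev


def tissue_z_scores(variant_scores, background_scores):
    """Standardize a variant's per-tissue score against that tissue's
    background distribution: z = (score - mean) / standard deviation."""
    z_scores = {}
    for tissue, score in variant_scores.items():
        reference = background_scores[tissue]
        mu = mean(reference)
        sd = stdev(reference)
        if sd == 0:
            raise ValueError(f"background for {tissue} has zero variance")
        z_scores[tissue] = (score - mu) / sd
    return z_scores


# Made-up background and variant scores for two tissues.
background = {"brain": [0.0, 0.1, 0.2], "blood": [0.0, 0.2, 0.4]}
variant = {"brain": 0.3, "blood": 0.3}
print(tissue_z_scores(variant, background))
```

Standardizing per tissue puts scores from tissues with different baseline distributions on a common scale, which is the stated purpose of the background set.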
b Top five genes enriched for tissue-specific splice-altering SNVs for each of the 15 tissues as predicted by SpTransformer
The size of the bubbles represents the number of SNVs in each gene
and the color of the bubbles represents the significance level of enrichment
one-sided hypergeometric test was used for statistics
We manually examined genes associated with tissue-specific phenotypes from the HPO database and marked them with a black rectangular box
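A one-sided hypergeometric enrichment test of the kind named in this caption can be written with the standard library alone. The sketch below is generic: the mapping of the four counts onto SNVs and genes is an assumption, since the caption does not spell out how the test was parameterized.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided (upper-tail) hypergeometric p-value P(X >= k).

    N: total SNVs considered (population size)
    K: SNVs predicted as tissue-specific splice-altering (successes)
    n: SNVs located in the gene of interest (draws)
    k: tissue-specific splice-altering SNVs in that gene (observed)
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= n):
        raise ValueError("inconsistent counts")
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Toy numbers: 10 SNVs in total, 4 tissue-specific, 3 in the gene, 2 of them tissue-specific.
print(hypergeom_enrichment_p(k=2, n=3, K=4, N=10))
```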
c Expression pattern of top 3 genes in enrichment result of each tissue
d Proportion of pathogenic SNVs predicted as tissue-specific splice altering in different tissues
Only genes that have a p-value < 0.05 in enrichment were included
The box extends from the first quartile to the third quartile of the data
The dashed line represents the median proportions of SNVs in each tissue
e Number of tissue-specific splice-altering SNVs grouped by pathogenic classifications on TTN gene in different tissues
f Genome coordinate and Tissue z-score of SNVs on a sub-region of TTN gene
It is worth noting that both Blood and Skin
which are considered the most clinically accessible
displayed lower proportions of tissue-specific splice alterations compared to the median across all tissues
This observation suggests that Blood and Skin may not be suitable alternatives for estimating splicing events in other tissues
The identification of numerous tissue-specific splicing alterations in the heart further supports the capabilities of the SpTransformer algorithm
Together, these results suggest that SpTransformer has the capability to discern sequence features unique to tissue-specific isoforms of genes associated with disease clinical manifestations. Moreover, the SpTransformer annotation provided mechanistic insights for numerous VUS labeled in ClinVar, specifically regarding tissue-specific splicing alterations (Fig. 4e)
suggesting SpTransformer as a powerful tool to be used for genetic diagnosis and VUS interpretation purposes
a Statistical data for the three analyzed databases
b Splicing effect prediction for different variant types in the three brain disorder datasets: ASC
c Enrichment of tissue-specific splicing alterations in ASD
A two-sided z-test for two groups was performed
The dashed line represents threshold powers for p = 0.05
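A two-sided z-test for two groups is commonly implemented as a two-proportion test with a pooled standard error. The sketch below follows that textbook form and assumes the compared quantity is the proportion of variants with a tissue-specific splicing alteration in each group; the caption does not state the exact formulation, so this is an illustration rather than the authors' procedure.

```python
from math import erfc, sqrt


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test comparing proportions x1/n1 and x2/n2.

    Returns (z, p_value) using the pooled standard error.
    """
    if n1 <= 0 or n2 <= 0:
        raise ValueError("group sizes must be positive")
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    if se == 0:
        raise ValueError("standard error is zero; proportions are degenerate")
    z = (p1 - p2) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = erfc(abs(z) / sqrt(2.0))
    return z, p_value


# Toy counts: 60 of 100 variants in one group versus 40 of 100 in the other.
print(two_proportion_z_test(60, 100, 40, 100))
```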
d Number of tissues showing expression for genes filtered by brain-specific splicing altering SNVs in the case group
e Enriched GO term for genes in (d) that are expressed only in brain tissue (left) and those expressed in 11–15 tissues (right)
f Network view of enriched biological processes of genes carrying brain-specific splice-altering SNVs from case group in three brain disorders
g Detailed visualization of genes enriched in GO pathway GO:0007610 “Behavior” in three brain disorders
This analysis underscores the contribution of brain-specific splicing alteration of cytoskeleton-related genes to multiple brain disorders
we believe that in addition to considering only gene expression
tissue-specific splicing is also crucial in clinical diagnosis
Although not all those genes were investigated
our analysis did find evidence of associations between ASD and genes out of overlap
These findings underscore the importance of incorporating tissue-specific splicing patterns into the investigation of ASD genetics in order to better understand the missing heritability of this disorder
the findings of SpTransformer underscore the importance of investigating brain-specific splicing dysregulation as a disorder-causing mechanism for brain disorders
which holds great promise for advancing our understanding of these conditions and developing targeted therapies
a Overview of DN patients involved and samples collected for SpTransformer prediction and RNA-seq-based validation
b Flow chart showing the filtering steps of kidney-specific splicing variants for variants called directly from WES data
c, d Examples of heterozygous variants predicted as kidney-specific splice-altering and validated by matched renal tubule RNA-seq
SpTransformer prediction on WES identified variants (upper) and sashimi plot of matched RNA-seq data (lower) for CLCNKA (c) and BTN3A2 (d) gene
e Top ten GO terms enriched from genes harboring kidney-specific splicing SNVs
f Top ten terms enriched in the DisGeNet database from genes harboring kidney-specific splicing SNVs
Our findings in this pathway suggested that aberrant splicing may represent a potential mechanism underlying the abnormalities in AA metabolism in DN
these results support the reliability of SpTransformer prediction and enable us to explore DN candidate pathogenic variants from the perspective of splicing alterations
These findings are concordant with the known pathology of DN and highlight key genes harboring kidney-specific aberrant splicing that may contribute to renal dysfunction in DN
the application of SpTransformer helps effectively prioritize disease-associated mutations and sheds light on unresolved disease mechanisms
Predicting RNA splicing directly from sequence data has been a long-standing challenge in the field
we have developed a novel computational framework
utilizing an attention-based deep-learning neural network
SpTransformer stands out as the pioneering method to employ a transformer model for predicting RNA splicing with tissue specificity
This transformer architecture benefits SpTransformer from large-scale
SpTransformer emphasizes the tissue specificity of these events
an aspect often overlooked by most existing splicing prediction methods
This unique feature enables a more comprehensive understanding of the splicing landscape across different tissue types
SpTransformer has been successfully applied to mutation databases and disease-specific datasets
identifying tissue-specific splicing alterations and their associated disease manifestations
Splice-altering mutations make up an essential class of known disease-causing mutations
and accurate prediction methods are crucial for interpreting VUS and pathogenic variants in clinical diagnostic tasks
While ideal scenarios would include RNA-seq profiles of diseased tissue together with the genotyping data
practically most disease-manifested tissues are not accessible or easily accessible
we identified genes enriched in mutations that may alter splicing in various tissue types
which provides stronger supporting evidence for clinical interpretation of VUS pathogenicity in the relevant tissues
there is an increasing need for accurate pathogenicity prediction of mutations in the noncoding regions
Through extensive analysis of the ClinVar database
we identified a significant proportion of intronic pathogenic/likely pathogenic mutations that may affect splicing
may provide valuable information for pathogenicity prediction and interpretation of variants in the noncoding region
The success of SpTransformer in achieving tissue specificity is attributed to the application of NLP models and large datasets
Although previous studies have demonstrated the effectiveness of deep convolutional networks in this domain
the convolution and transformer architectures we designed have several advantages
the tissue-specific splicing is presumably achieved through CREs as represented by sequence motifs
The attention mechanism in the transformer can help capture such CREs much more effectively
transformers have demonstrated advantages in capturing distal information and can perform better at capturing CREs located far away
we utilize RNA-seq data from four distinct mammalian species
This approach enables our model to discern similarities and homologies of splicing sites across species
we feed the GTEx data and evolutionary data into two convolution encoders before the transformer to extract different layers of splicing information
which supports the model with more comprehensive but structured input
has exhibited a clear advantage over other state-of-the-art methods in tissue-specific splicing prediction on the GTEx dataset
SpTransformer is the first application of the transformer deep learning architecture that achieves remarkable performance in tissue-specific splicing prediction
The limitations of the model come in several aspects
there is room for increasing the training data and including more rare splicing events
our splicing annotation is based on annotation from GTEx common splicing events
which does not account for splicing variability at the individual level
the model’s inclusion of tissues is still limited
While the SpTransformer model handles approximately 15 different tissue types with high accuracy
its performance decreases as the number of multitask events in the model increases
a combined result of the splicing events of all single cells in the tissue
experimental technologies that measure splicing at the single-cell level are currently limited
we envision that collecting more individualized and cell-specific splicing events could enhance the deep learning model and enable more precise splicing predictions in the future
The input for the model is pre-mRNA sequences
These encoded nucleotides are combined to form a 4 × N matrix
where N stands for the length of the sequence
[0,0,0,0] is used to pad sequences of insufficient length or to represent “unknown” nucleotides in unclear regions
The model utilizes this input matrix to capture sequence features
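The encoding described above can be sketched as follows. This is a minimal illustration, not the authors' code: the channel order A, C, G, T and the helper name `encode_sequence` are assumptions, since the text only fixes the 4 × N shape and the all-zero vector for padding and unknown bases.

```python
# Channel order A, C, G, T is an assumption; the text only states a 4 x N matrix.
ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}
UNKNOWN = (0, 0, 0, 0)  # padding and "unknown" nucleotides such as N


def encode_sequence(seq, length=None):
    """Return a 4 x N matrix (a list of 4 rows) for a nucleotide string."""
    seq = seq.upper().replace("U", "T")
    if length is not None:
        # Truncate long input and right-pad short input with unknown bases.
        seq = seq[:length].ljust(length, "N")
    columns = [ONE_HOT.get(base, UNKNOWN) for base in seq]
    # Transpose N columns of 4 values into 4 rows of N values.
    return [[col[channel] for col in columns] for channel in range(4)]


for row in encode_sequence("ACGTN", length=7):
    print(row)
```

Each printed row is one nucleotide channel; the last three columns are all zero because they hold the unknown base and the padding.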
we denote the length of the input sequence as N = Ncontext + Ntarget + Ncontext
where Ntarget represents the length of the target region that we aim to predict
and Ncontext represents the length of the flanking sequence to each side of the target region
each nucleotide in the target region is assigned a splice label set [SN
SD] and a numerical tissue usage label set [S1
SN represent the possibility that a position is an “acceptor”
indicate the possibility that a position is used as a splice site in a certain tissue
the model produces a matrix with the shape of (3 + t) × Ntarget as its output
SpTransformer utilized the sequence context to predict splicing sites and their corresponding usage in 15 tissues for the central 1000 nt sequence
providing 17,382 samples from 53 tissues and two cell lines
To obtain meaningful splice sites and the corresponding tissue usage ratio
we processed the exon-exon junction read counts file from the dataset for 15 representative tissue types
sequences around splice junctions in the GRCh38 reference genome were extracted
The base preceding and following each splice junction (i.e.
the 5’ and 3’ ends of exons) were defined as splice sites
Only samples from the 15 selected target tissues were considered
A splice site position was labeled as “acceptor” or “donor” if it was supported by any sample and had no conflict
The splice site at the exon start site was labeled as “acceptor”
and the splice site at the end site was labeled as “donor”
All other positions were labeled as “neither”
the tissue usage label was calculated for each splice site
representing the proportion of samples belonging to the tissue that contained corresponding splice junctions
The SpTransformer code frameworks also support other combinations of tissue types
any splice site with a maximum usage label of less than 0.05 across all tissue classes was excluded and re-labeled as “neither” class
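A minimal sketch of the usage label and the 0.05 filter described above. The dictionary layout and the function names `tissue_usage` and `is_kept` are hypothetical choices made for illustration; only the proportion definition and the 0.05 cutoff come from the text.

```python
def tissue_usage(samples_with_site, samples_per_tissue):
    """Usage label of one splice site in each tissue.

    samples_with_site  : {tissue: number of samples whose junction reads
                          contain the site}
    samples_per_tissue : {tissue: total number of samples of that tissue}
    """
    return {
        tissue: samples_with_site.get(tissue, 0) / total
        for tissue, total in samples_per_tissue.items()
    }


def is_kept(usage, min_usage=0.05):
    """A site whose maximum usage is below min_usage is relabeled "neither"."""
    return max(usage.values()) >= min_usage


usage = tissue_usage(
    {"brain": 30, "liver": 1},
    {"brain": 40, "liver": 50, "lung": 20},
)
print(usage, is_kept(usage))
```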
The independent RNA-seq dataset underwent similar processing steps
We utilized mammalian organ transcriptomes
the genes that show orthology or paralogy to human genes in the test dataset were excluded
A splice site was identified if it was in the gene body and supported by at least one split read in each of at least two different samples
The dataset was partitioned following the same strategy as the GTEx dataset
The training portion of this data was treated as an extension of the training dataset
while the test data segment remained unused
We excluded gene sequences that have paralogs from the testing dataset
Despite splitting the two datasets independently
we made sure that there was no overlap between the training and testing data after the steps to split data by chromosomes and paralogs
the pre-mRNA sequences of each gene were extracted
the extracted sequence began from the most upstream site observed across all transcripts and ended at the most downstream site observed across all transcripts
each sequence was divided into blocks of length 1000 nt
Blocks that did not contain any splice sites were discarded
the flanking sequence with 4000 nt + 4000 nt and the corresponding 1000 nt label was packaged as a single training (testing) data entry
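The block construction above can be sketched as follows, under stated assumptions: labels are coded as integers (0 = neither, 1 = acceptor, 2 = donor), gene ends are padded with "N" (which would be encoded as the all-zero vector), and the function name `make_entries` is hypothetical.

```python
N_CONTEXT = 4000  # flanking sequence on each side, as stated in the text
N_TARGET = 1000   # length of each labeled block


def make_entries(seq, labels, n_context=N_CONTEXT, n_target=N_TARGET):
    """Split one gene into target blocks with flanking context.

    seq    : pre-mRNA string for one gene
    labels : per-position integer labels (0 = neither, 1 = acceptor,
             2 = donor); this coding is an assumption of the sketch
    """
    pad = "N" * n_context
    padded = pad + seq + pad
    window_len = n_target + 2 * n_context
    entries = []
    for start in range(0, len(seq), n_target):
        block_labels = list(labels[start:start + n_target])
        if not any(block_labels):
            continue  # discard blocks that contain no splice site
        # Pad the labels of a short final block with "neither".
        block_labels += [0] * (n_target - len(block_labels))
        # Position p of seq sits at index p + n_context of padded, so this
        # slice covers the left context, the target block and the right context.
        window = padded[start:start + window_len].ljust(window_len, "N")
        entries.append((window, block_labels))
    return entries
```

With the default sizes each entry pairs a 9000 nt input window with 1000 labels, matching N = Ncontext + Ntarget + Ncontext.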
The architecture of our model is shown in Supplementary Fig. 1
The input is an RNA sequence of length N = Ncontext + Ntarget + Ncontext
where Ntarget denotes the length of the target sequence
and Ncontext represents the sequence context of the target region (Ncontext = 4000 in our pipeline)
The parameter L dictates the number of channels in each convolution layer
with L1 = 192 and L2 = 64 used in this study
The convolution layer in ResBlocks is characterized by parameters L
Following the calculation of the encoder module
and a truncation operation are applied to ensure that the input to the attention module does not exceed a length of 8192
The Sinkhorn Transformer module has 256 channels
8 attention heads (including two local attention heads) per layer
The final output is a (N × 3) shaped matrix and a (N × 15) shaped matrix
representing splice site prediction and tissue usage prediction
2) The dimension of the encoder layers was determined by grid search in {32
Multiple hyperparameters of the transformer module were also tried
batch size = 12 and learning rate = 0.001 were used for consistency with SpliceAI
3) Other parameters were selected from: batch size = {6
Those options were established in reference to previous publications
the combination with the best performance on the validation dataset was subsequently used
Further details of input and output have been provided in the “Data representation” section
Different activation functions were applied to the two sets of output scores
we applied a Softmax activation function to produce probability prediction of “Acceptor”
we applied a sigmoid activation for each tissue type
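The two activations can be sketched for a single position as follows. The class order (neither, acceptor, donor) and the function name `activate_position` are assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over mutually exclusive classes."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Numerically stable logistic function for one independent score."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def activate_position(splice_logits, tissue_logits):
    """Softmax over the 3 splice classes, sigmoid for each tissue."""
    return softmax(splice_logits), [sigmoid(t) for t in tissue_logits]


splice_probs, tissue_probs = activate_position([0.2, 2.5, -1.0], [3.0, -3.0])
print(splice_probs, tissue_probs)
```

The splice probabilities sum to 1 because a position belongs to exactly one class, whereas each tissue usage score lies in (0, 1) independently of the others.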
The whole network was then trained on the GTEx training dataset to get the final model
This approach enabled SpTransformer to learn from multiple datasets with similar biological meanings but different data formats
minimizing the need for extensive coding or conversion when differently sourced data was received
Despite both datasets being under the sequence model
the diverse splicing representations and distinct data content encourage the deep model to comprehend latent sequence features from various aspects
akin to a visual model examining a human face in multiple ways
The strategy improved the model’s performance compared to only using one dataset
demonstrating the potential to integrate diverse bioinformatics data for a single task
Task-specific loss functions are used for backpropagation during training
Each sequence in the training dataset is a contiguous nucleotide sequence of length n
The i-th position has a splicing label Ai and a tissue-usage label Bti for T different tissues
the model outputs si for splice site prediction
and outputs uti for tissue-usage prediction
we compute the categorical cross-entropy loss
as it is a multi-class classification task
which is a multi-label classification task (meaning one sample can belong to multiple classes)
we calculate the Binary Cross-Entropy loss
We apply mean reduction for the two loss functions
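A minimal sketch of the two losses with mean reduction, using the notation above (Ai and si for splice labels and predictions, Bti and uti for tissue usage). The small constant EPS is added here only for numerical safety, and how the two terms are weighted and combined is not specified in this text, so the sketch keeps them separate.

```python
import math

EPS = 1e-12  # guards against log(0); an implementation detail of this sketch


def categorical_cross_entropy(labels, probs):
    """Mean over positions of -sum_c A_i[c] * log(s_i[c])."""
    total = 0.0
    for a, s in zip(labels, probs):
        total -= sum(a_c * math.log(s_c + EPS) for a_c, s_c in zip(a, s))
    return total / len(labels)


def binary_cross_entropy(labels, probs):
    """Mean over positions and tissues of the per-tissue binary loss."""
    total, count = 0.0, 0
    for b_row, u_row in zip(labels, probs):
        for b, u in zip(b_row, u_row):
            total -= b * math.log(u + EPS) + (1 - b) * math.log(1 - u + EPS)
            count += 1
    return total / count
```

The categorical loss treats the three splice classes as mutually exclusive, while the binary loss scores each tissue on its own, which is what allows one site to be used in several tissues at once.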
The loss function above is sufficient for the encoders to learn basic sequence patterns
as the number of supported tissues increases
models struggle to learn the features of different tissues in a balanced manner during the training process
occasionally demonstrating superior predictive performance for specific tissues only
there is a relative scarcity of samples with strong tissue specificity compared to those with weak tissue specificity
This imbalance leads to models tending to produce similar tissue usage scores for splice sites
in order to persuade the transformer module to overcome these difficulties:
We use this method to encourage the model to balance the performance on multiple tissues and pay attention to those tissues that are harder to classify
a weight wi was applied to encourage the model to pay more attention to splice sites with stronger tissue specificity
wi is related to the variance of the tissue-usage labels at the i-th position
The model was trained for 12 epochs in each stage
Adam optimizer was used to minimize the combined loss
and was multiplied by 0.7 after every epoch
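As an illustration only, the decay can be written as a per-epoch schedule. The sketch assumes that the quantity multiplied by 0.7 is the learning rate and that it starts at the 0.001 value mentioned earlier; neither assumption is confirmed by the surrounding lines.

```python
def learning_rates(initial_lr=0.001, decay=0.7, epochs=12):
    """Learning rate used in each epoch under multiplicative decay."""
    return [initial_lr * decay ** epoch for epoch in range(epochs)]


for epoch, lr in enumerate(learning_rates(), start=1):
    print(f"epoch {epoch:2d}: lr = {lr:.6f}")
```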
We evaluated the performance of SpTransformer on two tasks using the compiled test dataset: 1) splice site prediction in long sequences: the model took each pre-mRNA sequence as input and identified every splice acceptor and donor within a target region of 1000 nt
Given that most positions in the sequences are not splice sites
we computed the top-k accuracy and the area under the precision-recall curve (AU-PRC) for splice site prediction
The top-k accuracy was defined as follows: if a sequence has k positive positions that truly belong to the class
a threshold is selected so that exactly k positions are predicted to be positive
The fraction of these k predicted positions that truly belong to the class is reported as the top-k accuracy
We calculated the top-k accuracy and AU-PRC value for the acceptor and donor classes separately
and reported the average performance of the two classes
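The top-k accuracy defined above can be sketched for one class of one sequence as follows. Returning None when a sequence has no positive position, and breaking ties by position order, are choices of this sketch rather than details given in the text.

```python
def top_k_accuracy(scores, truth):
    """Top-k accuracy for one class.

    scores : predicted score per position
    truth  : 1 if the position truly belongs to the class, else 0
    """
    k = sum(truth)
    if k == 0:
        return None  # undefined without positive positions
    # Rank positions by descending score; the stable sort keeps the
    # original position order among equal scores.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(truth[i] for i in ranked[:k]) / k


print(top_k_accuracy([0.9, 0.1, 0.8, 0.3, 0.7], [1, 0, 0, 0, 1]))
```

In this example k = 2 and the two highest-scoring positions are 0 and 2, of which only position 0 is a true site, so the function returns 0.5.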
2) Tissue usage level prediction: the model was tasked with predicting the usage level in each of the 15 tissue classes for each position of the sequence
Given the absence of a widely accepted “tissue usage” protocol
we divided all splice sites in the test dataset based on their tissue usage label
Since most tissue usage labels were close to (or equal to) 0 or 1
we defined a tissue usage label greater than 0.5 as “high usage”
while the remaining sites were classified as “low usage.” The usage was set to 0 if a position was not a splice site
The model was then tasked to classify the usage of each position in the test dataset
positions that did not pass the top-k threshold in task 1 were forcibly masked as negative in the prediction result
We calculated AU-PRC for each tissue class
including an ablation test to highlight the advantages of our method
we prepared three different versions of the SpTransformer model: SpTransformer-noextra
which was trained only on the GTEx training dataset; SpTransformer-extra1
which used an extra training dataset of two species
which used the full training dataset of four mammalian species
All versions were trained with the same configuration
we applied the published version of SpliceAI to the tasks for comparison
The key difference was a specific alteration: the output channel number of the final convolution layer was adjusted from 3 to 18
This modification enabled the model to predict tissue usage at each splice site
The modification was a permissible adjustment within the conventional design framework of CNN
This network was trained on the GTEx dataset with the same hyperparameter and loss functions as the first stage of SpTransformer
We also retrained SpliceAI on our training dataset using the same hyperparameters as the original version
We then included the published version of Pangolin
Pangolin supports the prediction of four tissues (brain
and testis) and does not distinguish between acceptor and donor
Their maximum splice scores were used in task 1
Their tissue scores were used in task 2
It is worth noting that SpliceAI-modified has a remarkably similar structure to Pangolin
despite the differences in their last output layers
SpliceAI-modified predicts splicing effects across 15 tissues using a single model
whereas Pangolin utilizes four distinct models
we adapted the task for earlier methods that were not entirely compatible with our tasks
and MaxEntScan were designed for classifying a single position with short flanking sequences
We modified the input format to enable them to predict each position of the long sequences individually
the lengths of input sequence were carefully selected based on the recommendations provided by their respective publications
we excluded the “Acceptor” class when evaluating HAL
The “MMSplice_MTSplice” tool is a combined model where “MMSplice” predicts splice sites and “MTSplice” predicts tissue-specific usage of those splice sites
we evaluated MMSplice in two different scenarios: the full dataset
as well as a simpler task where it was restricted to predicting positions within a 20 nucleotide range of each splice site
The performance on the simpler task was marked as “MMSplice-short”
MTSplice was able to predict usage scores in 56 detailed tissue types
We took the maximum score of corresponding types as the prediction of 15 classes in our dataset
Since “MMSplice-short” exhibited an advantage over “MMSplice”
MTSplice was also tested on the restricted task
To identify important regions within the input sequences
we performed a procedure referred to as “in silico mutagenesis”
The “mutagenesis weight” of a nucleotide with respect to a splice site is defined as follows: Let sref denote the splice (or tissue usage) score of the target splice site
The score is recalculated by replacing the nucleotide under consideration with A
The mutagenesis weight of the nucleotide is estimated as:
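One common form of this estimate, used in SpliceAI-style in silico mutagenesis and assumed here rather than confirmed by this text, subtracts the mean score over the four possible substitutions from sref. In the sketch below, `score_fn` stands for any function that returns the score of the target splice site for a given sequence; `toy_score` is a hypothetical stand-in, not a model.

```python
def mutagenesis_weight(sequence, position, score_fn, alphabet="ACGT"):
    """s_ref minus the mean score after substituting each base at position."""
    s_ref = score_fn(sequence)
    substituted = []
    for base in alphabet:
        mutant = sequence[:position] + base + sequence[position + 1:]
        substituted.append(score_fn(mutant))
    return s_ref - sum(substituted) / len(substituted)


def toy_score(seq):
    """Hypothetical scorer: 1.0 if a GT dinucleotide sits at positions 2-3."""
    return 1.0 if seq[2:4] == "GT" else 0.0


print(mutagenesis_weight("AAGTAA", 2, toy_score))
print(mutagenesis_weight("AAGTAA", 0, toy_score))
```

Position 2 receives a weight of 0.75 because three of the four substitutions destroy the scored dinucleotide, while position 0 receives 0.0 because no substitution there changes the score.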
Multiple associated motifs were enriched for each tissue through the outlined methodology
the motifs were treated as the same term if their IUPAC codes (given by XSTREME) have a Levenshtein Distance not greater than 1
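The merging rule can be sketched with a standard dynamic-programming Levenshtein distance. The sketch compares IUPAC codes as plain strings, so degenerate symbols are not expanded; whether the original analysis did the same is an assumption.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def same_motif(code_a, code_b, max_distance=1):
    """Treat two motif codes as the same term if they differ by at most 1 edit."""
    return levenshtein(code_a, code_b) <= max_distance


print(same_motif("YCAY", "YCAYC"), same_motif("UGCAUG", "GGGAUG"))
```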
In order to quantify the splicing alterations caused by SNVs
we calculated the difference in scores between the original sequences and the altered sequences
we used SpTransformer to score the reference sequence of length 2R + 1 surrounding the mutation (R is the number of flanking nucleotides on each side
and was set to 100 by default in our analysis)
We then used the alternative sequence for prediction
resulting in prediction scores for each position of the sequence
Ignoring the “not a splice site” class
scores for each class were represented by vectors in the shape of (2R + 1) × 1
for two splice site types and 15 tissue types
Δscore for other tissues was calculated in the same way
where abs() refers to the function to calculate the absolute value for each position
and max() is the function to find the max value in the vector
We defined ΔSplice as the max value between ΔAcceptor and ΔDonor to quantify the effect of splice alteration caused by a variant:
was processed to quantify the change in tissue usage
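The Δscore definitions above, built from abs() and max(), can be sketched as follows. The dictionary keys and function names are hypothetical; each score vector holds one value for each of the 2R + 1 positions.

```python
def delta_score(ref_scores, alt_scores):
    """Largest absolute per-position change between reference and variant."""
    return max(abs(alt - ref) for ref, alt in zip(ref_scores, alt_scores))


def delta_splice(ref, alt):
    """Maximum of the acceptor and donor delta scores."""
    d_acceptor = delta_score(ref["acceptor"], alt["acceptor"])
    d_donor = delta_score(ref["donor"], alt["donor"])
    return max(d_acceptor, d_donor)


ref = {"acceptor": [0.1, 0.9, 0.0], "donor": [0.0, 0.0, 0.8]}
alt = {"acceptor": [0.1, 0.4, 0.0], "donor": [0.0, 0.3, 0.8]}
print(delta_splice(ref, alt))
```

The same `delta_score` function applies unchanged to each tissue usage vector.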
We created a reference mutation set from GTEx SNV data
Upon checking the GTEx Genotype calls vcf file
we identified a total of 734,509,842 variants
We selected SNVs that met the following conditions: 1) For each tissue
there should be at least one available RNA-seq sample from an individual carrying the SNV
there should be at least one available RNA-seq sample from an individual not carrying the SNV
3) Within a 100nt range of the SNV location
should be observable across all tissues when comparing RNA-seq data between individuals with and without the SNV
“Low” and “high” followed the same definitions as those used in the training dataset
this filtering yielded 27,843 mutations that cause splice alterations in all tissue classes
These mutations were expected to have minimal impact on tissue specificity
and their Δscore was used to build a reference distribution hereafter
SpTransformer predicted the ΔSplice score and tissue Δscores for each mutation
A tissue z-score was then calculated based on the following formula
and Zi representing the adipose tissue z-score for the i-th mutation
The distribution of the tissue z-score for each tissue was defined as the reference distribution mentioned above (Supplementary Fig. 7)
we similarly calculated its z-score using previously calculated μtissue and σtissue
we consider any real SNVs with a tissue z-score greater than X% of the reference distribution to be tissue-specific
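The standardization and thresholding steps can be sketched generically as follows. The exact per-mutation quantity that enters the z-score is not recoverable from this text, so the sketch takes it as an arbitrary input value; the use of the population standard deviation is an assumption, and the cutoff X is left as a required argument because no value is given here.

```python
import statistics


def fit_reference(values):
    """Mean and standard deviation of the reference mutation set."""
    return statistics.mean(values), statistics.pstdev(values)


def z_score(value, mu, sigma):
    """Standardize one value against the reference mean and deviation."""
    return (value - mu) / sigma


def is_tissue_specific(z, reference_z, x_percent):
    """True if z is greater than at least x_percent % of reference z-scores."""
    below = sum(1 for r in reference_z if r < z)
    return 100.0 * below / len(reference_z) >= x_percent


reference = [0.0, 1.0, 2.0, 3.0, 4.0]  # hypothetical reference values
mu, sigma = fit_reference(reference)
reference_z = [z_score(v, mu, sigma) for v in reference]
print(is_tissue_specific(z_score(3.5, mu, sigma), reference_z, x_percent=75))
```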
We classified gene expression into “Low” (0–1 NAUC)
and “High” (over 20 NAUC) according to the recommended standard
This classification was used during analysis to exclude genes with “Low” expression in the tissues of interest from our investigation
All the gene expression values presented in the figures were based on the NAUC values
The SNVs were categorized based on their consequence annotations (e.g.
etc.) and clinical significance labels (e.g.
SNVs with ambiguous labels such as “Conflicting interpretations of pathogenicity” or “Likely risk allele” were excluded
We further utilized SNVs from the ClinVar dataset to establish a score threshold for SpTransformer (Supplementary Fig. 5)
The “Strict” panel presents the performance of our SpTransformer model in distinguishing between pathogenic and benign mutations
while the “Soft” panel presents the performance in distinguishing between pathogenic/likely pathogenic/uncertain mutations and benign/likely benign mutations
For each gene enriched with splice-altering SNVs
we examined HPO terms for tissue-related phenotypes in the corresponding Mendelian disorders
we employed this test to examine whether there exists a larger proportion of brain-specific splicing variants among case SNVs that are related to splicing
Group A consisted of all SNVs with OR > 3.5 and ΔSplice ≥ 0.27
while Group B included all SNVs with OR ≤ 3.5 and ΔSplice ≥ 0.27
Genes were classified as “previously reported” if any publication in PubMed explicitly stated in the abstract
or discussion that the gene is associated with the disorders
f) was calculated by Metascape based on the hypergeometric test and Benjamini–Hochberg p-value correction algorithm
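For readers unfamiliar with the two statistical steps named above, a generic sketch follows. It illustrates a one-sided hypergeometric enrichment p-value and the Benjamini–Hochberg adjustment in plain Python; it is not Metascape's implementation, and the function names are hypothetical.

```python
from math import comb


def hypergeom_pvalue(k, population, annotated, drawn):
    """P(X >= k) when `drawn` genes are sampled from `population` genes,
    of which `annotated` carry the term of interest."""
    upper = min(annotated, drawn)
    hits = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(population, drawn)


def benjamini_hochberg(pvalues):
    """Adjusted p-values that control the false discovery rate."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```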
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article
You, N. et al. Splicetransformer predicts tissue-specific splicing linked to human diseases. Splicetransformer v1.0.0. https://doi.org/10.5281/zenodo.13824839 (2024)
Department of Obstetrics and Gynecology of Sir Run Run Shaw Hospital & Liangzhu Laboratory
National Clinical Research Center for Kidney Diseases
Research Institute of Intelligent Complex Systems
drafted the original manuscript and revised the manuscript
performed data analysis and drafted the original manuscript; H.J
collected clinical samples and performed experiments to generate data regarding DN patients; J.S
supervised the work and contributed to the manuscript; N.S
and revised the manuscript with inputs from S.P.
All authors read and approved the manuscript
The authors have submitted a patent application for the method
the authors declare that they do not have any competing interests
Nature Communications thanks Dadi Gao and the other
DOI: https://doi.org/10.1038/s41467-024-53088-6
Individuals with heritable thoracic aortic disease (HTAD) face a high risk of deadly aortic dissections
but genetic testing identifies causative variants in only a minority of cases
We explored the contribution of non-canonical splice variants (NCVAS) to thoracic aortic disease (TAD) using SpliceAI and sequencing data from diverse cohorts
including 551 early-onset sporadic dissection cases and 437 HTAD probands with exome sequencing
57 HTAD pedigrees with whole genome sequencing
and select sporadic cases with clinical panel testing
NCVAS were identified in syndromic HTAD genes such as FBN1
including intronic variants in FBN1 in two Marfan syndrome (MFS) families
Validation in the Penn Medicine BioBank and UK Biobank showed enrichment of NCVAS in HTAD-associated genes among dissections
These findings suggest NCVAS are an underrecognized contributor to TAD
particularly in sporadic dissection and unsolved MFS cases
highlighting the potential of advanced splice prediction tools in genetic diagnostics
to our cohorts of unsolved thoracic aortic disease (TAD) cases to assess the role of NCVAS
DNA samples from affected individuals and relevant family members were collected after obtaining informed consent and human subject research approval
The UTHealth HTAD cohort includes families with two or more members affected by thoracic aortic disease
as well as trios of probands with aneurysm surgery or dissection at ≤ 40 years of age
with unaffected parents confirmed by imaging
The ESTAD cohort focuses on sporadic dissection cases in individuals ≤ 60 years of age without syndromic features or a family history
Genetic testing reports from patients with early onset sporadic aortic dissection undergoing clinical panel testing were obtained and reviewed
Exome sequencing (ES) was performed on the full HTAD and ESTAD cohorts
with select cases from unsolved HTAD pedigrees also undergoing whole genome sequencing (WGS)
Dissection cases were identified in individuals of European ancestry with an aortic dissection International Classification of Diseases
10th Revision (ICD10) diagnosis or cause of death code (I71.0) or surgical code for aortic dissection (L27.4
resulting in 467 cases available for analysis
individuals of European ancestry with a thoracic aortic aneurysm (TAA) were identified using the ICD10 code for thoracic aortic aneurysm
After excluding individuals with dissections as previously defined
a total of 1084 TAA cases remained for further analysis
A subset of 263 TAA patients requiring surgery was identified using surgical codes for open repair of the thoracic aorta or aortic root
the remaining 447,570 individuals of European ancestry without any ICD10 codes for aortic disease (I71) or congenital malformations
deformations and chromosomal abnormalities (Q00-Q99) were included for comparison
Thoracic aortic dissection was defined as having an encounter with an ICD10 diagnosis code of I71.01 or I71.03
or International Classification of Diseases
Ninth Revision (ICD9) codes 441.01 or 441.03
TAA was defined as having an encounter with an ICD10 diagnosis code of I71.1
Sanger sequencing of UTHealth probands and any available affected family members was done to confirm variants identified through ES or WGS
and further analyzed with custom Python scripts
single nucleotide variants (SNVs) were filtered for a read depth ≥ 7 and were retained if they either had one or more heterozygous variant genotypes with an allele balance ratio ≥ 0.15
Dermal fibroblasts from a normal control and the individual with the FBN1 c.2294-3 C > A variant were grown
One of two culture plates was incubated in the presence of cycloheximide (100 μg/ml
Sigma-Aldrich) for 6 hours before extraction of total RNA from both plates with the RNeasy Mini kit (Qiagen)
Complementary DNA (cDNA) was synthesized with random hexamers and SuperScript™ III reverse transcriptase (Invitrogen)
The FBN1 region of interest was amplified by PCR with a sense primer in exon 17 (5’- GAATGACGTCAGCAGGCAGT) and an antisense primer in exon 21 (5’- GGAGCAGCACTGGGACTTTA)
The products were separated on 7% polyacrylamide gel
and visualized using the “Carestream 212PRO” camera
The normal and all abnormal products were excised from the gel
The DNA was retrieved by submersion of the gel slices
in 100 μl of sterile water at room temperature overnight
and 1 μl of each was reamplified using the same primers in exons 17 and 21
The amplicons were sequenced with BigDye™ Terminator v3.1 and capillary electrophoresis on the ABI 3500 Genetic Analyzer
and the data were analyzed with the Chromas software
Dermal fibroblasts from a patient with the FBN1 c.7820-3 C > A splicing variant and a gender and age-matched healthy control were grown in DMEM/High Glucose media (Hyclone) plus 10% FBS (Sigma) and antibiotic antimycotic solution (Sigma) in a 37°C
patient and control cells were treated for 8 hours with either cycloheximide (100 µg/ml
Total RNA from each plate was prepared with Trizol reagent (Thermo Fisher)
cDNAs were generated using SuperScript IV VILO (Thermo Fisher)
PCR products crossing the putative mutation site were amplified with primers E61-F (5’-CAGACCGGCTCCAGCTGTGAAGA-3’) and E65-R (5’-CATTGGCTTCTGTCTCAGACTG-3’) with KAPA HIFI PCR kit (Roche)
The PCR products were Sanger sequenced with primers E61-Fa (5’-CCAGCTGTGAAGACGTGGAC-3’) and E64-Ra (5’-CAAGCCTCTGGGGAGAGTGA-3’)
a Family members carrying the FBN1 variant are marked with a ‘+’
including the proband’s brother with a systemic score (SS) of 8
Individuals with normal aortic imaging are marked with a ‘*’
b Gel electrophoresis image of RT-PCR from skin fibroblasts of proband’s brother
Band 1 (blue) represents a small amount of two abnormal splice products generated by the rare use of two cryptic acceptor sites in the exon
Bands 2-5 (blue) are two heteroduplex pairs formed between normal and abnormal products
Ao: Aortic diameter at the sinuses of Valsalva; Z-score: normalized aortic diameter
an inhibitor of nonsense-mediated decay (NMD)
ES exome sequencing; WGS whole genome sequencing; ESTAD sporadic aortic dissection cohort (< 60 years of age without family history or syndromic features); HTAD families with multiple members affected by heritable thoracic aortic disease; TAA thoracic aortic aneurysm; HI haploinsufficiency; MAF minor allele frequency; LB/B likely benign or benign; NS not significant (p > 0.05); *number of HTAD pedigrees
Pedigree 1 (a) had extensive aortic disease and multiple individuals with a clinical diagnosis of MFS but negative genetic testing
predicted to activate a cryptic splice site within intron 56
Pedigree 2 (b) was similarly affected with aortic disease and clinical MFS diagnoses but negative molecular testing
WGS of the proband revealed a novel FBN1 variant
predicted to cause intron retention and extension of exon 11 of the mRNA transcript
aortic diameters at the sinuses of Valsalva (Ao)
and normalized aortic diameter Z-scores are shown for individuals who underwent clinical assessment
MFS – Marfan syndrome; WGS – whole genome sequencing
A possible contributor to the varied phenotype associated with NCVAS is the efficiency of aberrant splicing
which may lead to varied levels of the wild-type transcript in different tissues
The effect of the variant also depends on whether a new donor or acceptor site is created and how many nucleotides are inserted or deleted
the translational reading frame is preserved
If the insertion or deletion of nucleotides is not a multiple of three
leading to an unstable mRNA molecule that may be degraded via NMD
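The multiple-of-three rule described above can be illustrated with a minimal Python sketch. The function name and its parameters are hypothetical and are not part of the study's analysis code. It only classifies the net change in transcript length and does not model splicing itself; an in-frame insertion can still introduce a premature stop codon, which this check does not detect.

```python
def splice_frame_effect(inserted_nt: int = 0, deleted_nt: int = 0) -> str:
    """Classify the reading-frame consequence of an aberrant splice product.

    inserted_nt / deleted_nt are hypothetical inputs: the number of
    nucleotides added to or lost from the mature transcript, for example
    by use of a cryptic acceptor site or by pseudoexon inclusion.
    """
    if inserted_nt < 0 or deleted_nt < 0:
        raise ValueError("nucleotide counts must be non-negative")

    # Only the net length change decides whether downstream codons shift.
    net_change = inserted_nt - deleted_nt
    if net_change == 0:
        return "unchanged"
    # A net change divisible by three keeps the downstream codons in register.
    if net_change % 3 == 0:
        return "in-frame"
    # Any other net change shifts the frame; the shifted frame typically
    # reaches a premature stop codon, making the mRNA a candidate for NMD.
    return "frameshift"


if __name__ == "__main__":
    print(splice_frame_effect(inserted_nt=3))   # in-frame
    print(splice_frame_effect(deleted_nt=4))    # frameshift
```

A frameshift result here marks a transcript as a candidate for NMD, not a confirmed target; as the text notes, only RNA assays establish what is actually produced.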
While bioinformatic computational tools can predict such events
only RNA-splicing assays can functionally validate the impact and extent of splicing changes
the FBN1 c.6872-1003C>T variant identified on WGS in two brothers with MFS had a lower score (SpliceAI = 0.39) than the c.2294-3C>A variant (SpliceAI = 0.45) found in a sporadic dissection case and family members without any aortic enlargement
additional genetic and non-genetic factors
such as dissection-specific polygenic risk due to common variants and hypertension
may also augment the likelihood of aortic dissection in carriers of NCVAS
The absence of these variants in TAA cases suggests that alternative mechanisms
may contribute to aneurysm formation in the general population rather than rare variants disrupting HTAD genes
these results indicate that WGS should be considered for individuals and families who meet the diagnostic criteria for MFS but a causative variant is not identified with clinical genetic testing
allowing more accurate assessments of aberrant splicing and its role in disease
multiple splice products are possible and should be considered due to the potential impact on pathogenicity and phenotype
variants outside the canonical ±1,2 splice sites may be an underrecognized contributor to TAD
specifically for early-onset sporadic dissection cases and MFS patients meeting diagnostic criteria but with negative molecular testing
show promise in identifying such variants that have been excluded or not identified in bioinformatic analyses
Despite the observed overall enrichment of these variants in dissection cases
further investigation is required to predict the penetrance of disease in carriers
The UTHealth datasets are available in dbGaP Study Accession: phs000693.v7.p3
The PMBB dataset is not publicly available due to IRB restrictions requiring a collaboration with a Penn investigator to access PMBB data
The UKB data is available to researchers upon approval by an expert access committee
The underlying code for this study is not publicly available but may be made available to qualified researchers on reasonable request from the corresponding author
This work was supported by NHLBI R01HL109942 (D.M.M)
the Remembrin’ Benjamin and John Ritter Foundations (D.M.M) and funds from the American Heart Association 23POST1011251 (J.D.)
Sequencing and data analysis were provided by the University of Washington Center for Rare Disease Research (UW-CRDR)
with support from NHGRI grants U01 HG011744 and U24 HG011746
The PMBB is supported by the Perelman School of Medicine at the University of Pennsylvania
and the National Center for Advancing Translational Sciences of the National Institutes of Health under CTSA award number UL1TR001878
University of Texas Health Science Center at Houston (UTHealth)
University of Pennsylvania Perelman School of Medicine
Department of Laboratory Medicine and Pathology
New York City Office of Chief Medical Examiner
Brotman-Baty Institute for Precision Medicine
wrote the manuscript with input from other co-authors
All authors read and approved the final manuscript
DOI: https://doi.org/10.1038/s41525-025-00472-w
Music creation platform Splice has appointed Jeff Roberto as Senior Vice President of Marketing
Roberto will be responsible for overseeing global marketing initiatives and supporting the platform’s strategic growth
the company announced Thursday (December 5)
Roberto brings extensive experience to the role, having served as Chief Marketing Officer at Nodle and DistroKid
He also served as SVP Marketing at Picsart
where he contributed to the platform’s growth to 100 million monthly users and helped secure a $130 million funding raise in 2021 at a $1 billion+ valuation
Roberto also held roles at Shazam, Napster
He began his career at the University of Connecticut’s college radio station
someone that can help us scale an already thriving business
who understands the music industry and most importantly
cares deeply about the creative communities we support,” said Kakul Srivastava
“Our AI Roadmap gives us a huge opportunity to build revolutionary creative experiences for musicians.”
“Music and creators are two of my greatest passions
and Splice is perfectly positioned at their intersection
“I’m excited to support music producers of all levels
I look forward to expanding the brand across new horizons as we continue to deliver innovative AI tools that enhance creative workflows.”
Roberto’s appointment came five months after Splice appointed music industry veteran Kenny Ochoa as Senior Vice President of Content
Splice provides a catalog that includes high-quality
The company also offers access to plugins and DAWs through a rent-to-own “Gear” marketplace
and craft music directly from their phones
merges the company’s sample library with its Create technology
In 2022, Splice introduced an AI-powered app called “CoSo” that finds sound samples from across the Splice catalog that work together
Splice was valued at nearly USD $500 million in 2021 after securing $55 million in funding, according to Bloomberg
Behind The Scenes
28.04.25
Sound decisions: Splice's work on Adolescence
Adolescence has become the third most-watched English-language Netflix show of all time
racking up over 130 million views and counting
crafted by Splice’s James Drake and Jules Woods
played a key role in shaping its emotional depth and immersive storytelling
the series continues to resonate with global audiences in an unprecedented way
Watch the video case study to see how sound helped define the world of Adolescence – and why it made all the difference
The new Splice Mic feature is available now on iOS
Splice has launched a new feature for its iOS app called Splice Mic
The new addition lets you record vocals straight into the app over existing Stacks — Splice’s name for a selection of loops from their catalogue that make up an idea
The vocals can then be saved along with your Stack
or you can use Splice’s built-in AI tools to analyse the recording and recommend more loops from Splice’s library that match your vocal
Once you’re happy with the Stack and vocals you’ve created
you can save them and export them to your DAW to continue working on the idea
Splice will maintain all pitch shifting and warping to ensure your samples remain compatible
“The phone is already a huge part of music making,” said Splice’s SVP of content
“Now songwriters and producers can record vocal ideas over Stacks of samples
and now those Stacks can be merged with vocals.”
Splice Mic is now available in beta on the free Splice Mobile app.
Last year, Splice updated its mobile app with an AI interface. Following that, the sample giant added custom sample uploads to its AI-powered Create engine.
by Nilay Patel
The head of the sample platform thinks creatives “deserve better” than AI tools that do all the work for them
Today, I’m talking with Kakul Srivastava, CEO of music creation platform Splice. I don’t think I need to really introduce Splice, actually — I just need to play this clip:
If you exist on planet Earth, you know that as the guitar loop from Sabrina Carpenter’s “Espresso,” which is an inescapable pop music phenomenon. You can check out the sample in full right here in the “Espresso” chorus
you know that some of my favorite conversations are with people building technology products for creatives and that I am obsessed with how technology changes the music industry
because it feels like whatever happens to music happens to everything else five years later
So this one was really interesting because Splice is all wrapped in all that — and some of its new products
might change how music is made all over again
Srivastava joined Splice as its CEO three years ago
so she has a lot of experience working at a company that makes tools for a creative user base that’s threatened by things like automation and AI
But if you’ve listened to any of our Adobe episodes
you know that the flip side of that is people actually using these tools at high rates
because they’re fun to play with and make some parts of the creative process easier
So I really wanted to dig into that with Srivastava
not only to understand where Splice stands
but also to see how the broader music industry can try and make sense of this technology and what it could do to music
I also wanted to talk about how the company navigates the incredibly complex minefield of copyright law and attribution on the internet — something that’s only getting more complicated with AI and the increasing number of copyright lawsuits filed against big AI companies
There’s a lot in this one — and Srivastava was willing to fall pretty deep down some of these rabbit holes with me
This interview has been edited for length and clarity
I have wanted to have this conversation forever
we ran into each other at the Code Conference last year and we just were off to the races talking about music and technology and AI
and I’m glad you’re finally here because so much has changed since then
But all of the issues are kind of still there and still working towards resolution
some of the core issues are still the core issues
at least one massive hit single has been made using loops from Splice
I think “Espresso.” There’s “Espresso” and there’s a lot of other ones
but a very large proportion of top music everywhere uses Splice
Let’s start with the very basics for people who maybe aren’t familiar with how music is made today or with Splice
Splice is a music creation platform that is used by music creators and musicians
we have this tagline “starts with sound,” so we do start with sound and we provide them with probably the world’s most diverse
I was looking at a report from our team that just came back from Brazil
We’re meeting artists on the ground, so we’re capturing the sounds of the world, and we make that available through our platform. We also provide AI-based creative tools that help you start with a sound, but make it your own. We have compositional AI. We just launched something brand new at SXSW called Splice Mic
start with a musical idea right in your phone
and we’ll help you compose around that by putting the right samples next to it
which is “here are the foundational pieces of making a song,” right
we’re going to have this library of audio,” and then there’s this turn
which I see a lot of companies that make creative software starting to make
Adobe I think is the paradigmatic example of this right now
You can just push generate to fill in Photoshop and it just does a bunch of stuff for you
You can prompt Photoshop now in various ways and it does stuff for you
Are you all the way there with Splice and where you’re going
“Write me a country song,” and Splice will just do it for you
That’s totally not what we’re trying to do
and I’m so glad you asked this question because I really want to put this idea out there
the people who we think about all day long
the last thing they want is someone to make the song for them
one of the things that we learned when we launched Create
How are they able to get the tools to capture what’s happening inside and turn it into a song
turn it into something that they can share with other people
So it is absolutely not push-button creation
You must at the company have some sense of how people perceive building music out of sample packs
And even before “Espresso,” like “Umbrella” by Rihanna was [made with] GarageBand
which I think is just a moment in music that should belong in the history books
music is now just assembling a bunch of pre-made samples and that’s good or bad
Is that a framework you’re using as you enter the AI generation era
I’m going to take exception with what you just said
I don’t think music making today is putting just a bunch of samples together
I think that using samples to create music is a really profound creative process
I will concede that that is a very reductive criticism
it’s a process that’s been developed over decades that it’s really powerful
So I think of samples as the building blocks for how modern music is made
One of our largest growing genres is country music
You’re using samples to make country music
I think the artistry of using samples to make music is that you start with a sample
what you do inside the digital audio workstation
I asked that question somewhat to provoke that response
you have that response to the criticism of sample usage
Is that informing how you’re thinking about the criticism of AI usage
And I personally spend a ton of time with creators
and what they are telling me over and over and over again is
“I want better tools.” And when I was at Adobe
this is also something that we heard from people
“I want better tools.” And so the work for us
the work for any company that’s wanting to really meet the needs of this growing and large market is
how do you build better tools in this era of AI
let me type a bunch of prompts and I get a song out at the end,” but what happens next
How do I change this particular part of the song and get it to sound a certain different way
How do I take this sample and make it into something else
You saw this with the Splice Mic launch as well
A lot of it is: how do we get more of you into the music creation process as quickly as possible
whether we’re talking about using a synthesizer to make music
how do you make sure the creative process is respected throughout those different transitions in music innovation
That’s a lot of incoming about what your product should look like
Now there are these huge private equity companies that own huge catalogs that want to assert their rights in various ways
Do they have a point of view that’s informing how you’re using AI
or how you’re thinking about sample licensing
Because that seems like the most complicated part of your business
We are aligned across the industry — whether it’s with Universal Music or any of the other key
high-quality players in the industry — we are very aligned that the rights of the creator have to be respected
We’re going to focus on the creators and what creators want
and we’re going to try to meet their needs
So the rights of the creators have to be respected
we take this pretty seriously and we take it seriously throughout the entirety of our process
How does a sample producer or sample pack creator come into the Splice platform
We have an entire organization that does the intake
So that process of ingesting is something we take seriously
Is it tagged appropriately all the way through getting onto the platform
And all the quality stuff: Is the sound clear
what is the experience of someone who is downloading a sample from Splice and able to use it
We want to make sure that every single download that you do on Splice lets you download the PDF that says
you’ve got full rights to this material to use it for any kind of creation.” So that’s something that’s a basic part of what we do
and that’s the big story that everyone’s talking about
which is if you’re going to use content to train
you should train on content that you have rights to
It’s not okay to disrespect the rights of creators
most players in this space are pretty aligned on that
It occurs to me just as you described that
that you are a creator platform for creators
There are people who sit around making sample packs and then they might make money uploading sample packs to Splice
you’ve got artists who are downloading the sample packs
paying you money to go use some in other songs
and I wanted to come back to that for one second
That’s what’s really magical about using samples to make music
It’s not just a random sound that you got on Splice
There’s an artist at the other end of that
we work with some of the people who are really at the forefront of funk
and what that means and what that sound is
So when you’re using a sample pack from Splice
It’s a storytelling between those two different artists coming together
Are there creators who make their entire living just making sample packs for you
Is that a viable approach to being a professional musician
they make hundreds of thousands of dollars
they’re building their own musical career and this is part of what they’re doing
I will say that the revenue that we’ve shared with the artists on our platform over time
So it’s nice to be able to feel good about that
I’ve been spending a lot of time just thinking about the economics of creator platforms
you see that creators have to augment their income
they all have to do brand expansions or sponsored content or whatever
Is there a ceiling to how successful you can be on a Splice
Sabrina Carpenter makes “Espresso.” I’m guessing the person who made that sample pack did not get paid more money because that song was a hit
that’s the pro and the con of being royalty-free
what that means for the creators is they don’t have to get stressed about it
You don’t have to worry about clearing the rights
The downside is you don’t get to share in the sort of upside when something big like that happens
We’re really here to make sure that as many people can create as possible
and that gives them access to this unlimited library of sounds
along with the creative tools that we’re investing in heavily for the future
You get a certain number of credits per month
and use those credits to download sounds that you can then use as you want
And is growth just getting more and more artists to use Splice on both sides
as creators and as people who are subscribers
it’s been an interesting journey over the last three years while I’ve been here
I think that brings us to the Decoder questions
I think you had two different stints at Adobe
Right out of business school and when it was a perpetual business and more recently
Adobe is the creative software company. They have a very, I would say, back-and-forth relationship with creatives. We had [Adobe CEO] Shantanu Narayen on the show
We got feedback on that episode of Decoder like nothing else we’ve ever experienced
what AI means to Adobe as a company and its user base
You obviously have some of that experience
how have you thought about applying those lessons to what is well on its way to being one of those companies for the musical community
Adobe has been a really important part of my career journey
I was also one of the early people at Flickr
There’s some Flickr users right now who are writing us emails
I was also head of product and marketing at GitHub
So I’ve had a chance to see creator tools in multiple different places
and all of that has really informed what I’m bringing here to Splice
The journey for me has been a little bit around pattern recognition
that I see here at Splice is that you have a business that’s centered around content
and you have a lot of rich metadata around that content
and you have lots and lots of impressions around that content so that users are giving you information about it
we have about a million songs that are samples that are sounds that are downloaded today
We have 28 million stacks that have been created using our AI tools
So we have a lot of impressions of what sounds people are listening to
you can use that to build rich experiences on top
which is what we’re doing now with the creative tools
That feels very familiar to bring to Splice
to bring to the music industry where I’ve seen it at GitHub
“We are going to make the tools that actually help you create the music.” You can look at it in a slightly more abstract way
Splice doesn’t see what you’re doing in those apps
but those are the dominant music creation apps
You’re suggesting with something like Splice Mic or Splice Create
you’re going to take some of that creation
I think there’s a lot of opportunity to reinvent how we make music
The features you launched at SXSW are interesting
because they use AI to make that a little bit faster
You can sketch an idea very quickly on a phone now. Is that the extension — “We’re going to take some of Pro Tools market share
We’re going to go take some of Logic’s market share”
So I think that word “take” suggests a zero-sum game. This is not a zero-sum game, right? It’s about expanding and exploring the creative process. Many of our users use Splice Mic
or use our mobile app as an adjunct part of their process that they will ultimately finish inside a digital audio workstation (DAW)
super top-end producers has worked with many of the big names that you would recognize
I’ll generate a bunch of stacks so that by the time I get to the studio
I’ve got a bunch of ideas that I can show the artist right away to say
‘Do you want to go this way or do you want to go that way?’” And that’s a really core part of his creative process
I was just at my kid’s school where they have a digital music production class
listening to sounds on Splice is a really core part of learning
What does it mean to create a Bollywood hit
What does it mean to create something that’s a K-pop sound?” And I think that’s a different way to use this experience
it’s not that we’re going to take [market share] away from this place or this place
but how do we expand how much we’re part of the creative journey in different ways
But the idea that you’re going to start and finish a song in a Pro Tools
Do we think that we’re going to directly compete with Pro Tools
There are people who tell us every single day
“You will take Ableton out of my cold dead hands
It’s not going to happen.” And there are a lot of other parts of the creative process that are painful
inevitably there will be a situation where they’re like
we need to find a certain kind of kick drum.” And they’ll find a folder and they’ll do a subfolder and they’ll do a sub-folder
and then they’ll finally find the sub-sub-sub folder that has 20 kick drum sounds that they have saved
You just go through and you listen to these sounds
we’ve just done this new experience that we launched in October last year where we integrated with Studio One
and there’s a Splice integrated search with sound experience
So we listen to what you’re creating inside Studio One
and we’ll suggest the samples that go with it right there integrated as part of your creative workflow
Do I think I’m going to replace Studio One
Can I make the Studio One experience a lot better
because Splice is there and Splice is smart with AI
How do those conversations work with all those digital audio workstation providers
The companies that make them are all very different
All of the music tech industry is very quirky
eccentric Europeans floating around this industry in particular
It’s one of my favorite parts of the tech industry to cover
they’re just going to do whatever it wants to do
How does that competition and cooperation work
I’ve actually been really impressed at how collaborative the industry really is
So the conversations with Studio One was very
and we’re working with other partners as well to bring that integration
I think there’s generally a recognition that we’re good at what we do
the kind of work that we can do in terms of bringing these sample packs to the world
It’s not something that they want to replicate
They want to make great experiences inside Ableton
I think the AI stuff is new to a lot of people in the industry
A lot of the team that I’ve brought into Splice over the last few years comes from a core tech background
which is unique in some ways for the music tech space
So I think there’s a lot of respect around that
I think there’s an attractiveness to a subscription business model that has been difficult for this industry to adopt
I think there’s a lot of curiosity about that
Could we use a content business model to get more recurring revenue
But I think many people have found that it’s not as easy as it looks
One of the things you say about bringing people who have a core tech background is that helps you innovate in things like AI
Where you just need to be on the cutting edge of the technology
Tech and music in particular have always just crashed into each other
The thing I say on the show over and over again is if you pay attention to the music industry and what tech is doing to the music industry
you have the view into what tech will do to everything else five years out
How are you thinking about that dynamic right now
“I need to hire more tech people.” Is it just for AI or is there something else you’re trying to accomplish with the addition of that talent
and when I look at the music creation process
I feel like these music creators have been underserved with great innovative experiences
and I think it’s important to focus on the creative workflow and provide people better tools over time
When I think about the collision between tech and music
it’s weird because there’s actually more similarity than dissimilarity in Splice
we have some really great software developers who love music
We have a whole bunch of musicians and artists who think in that same weird mathy way that great software developers think
I also think that there’s this mindset out there that musicians are scared of technology
I actually think that musicians love hacking
there was all this threat around synthesizers and all of that stuff
and then Stevie Wonder took it to a totally different
and it allows them better tools to get to the other place
What they don’t love is push-button creation
you will find the right ways to bring technology innovation here
I think there’s something else that you’re pushing on here that I think is important
and maybe it’s one of your Decoder questions
around how do you bring the cultural mindset from the tech industry and meld it with the music industry
I might as well ask you the Decoder questions
We are fundamentally a product company first
So my largest organization at Splice is the product development organization
Because I think that tight loop is super important
they’re all in one org and that’s product development
Our second largest org is our content team
And those are the people that they’re going to Brazil
Maybe the third thing that I’ll point out that’s really important to me and how I structure the org
Is we have a very strong central data organization that reports directly to me
So a lot of people put that inside product dev
Data’s obviously important for finance and how we run the business
And how is it split between those three groups
they say they have investments in content teams
but they really just hope the scale carries them forward
Instagram does not have some huge content team that is traveling the world to get content
It’s the same with YouTube or TikTok or whoever else
They might manage some of their top influencers
but really the volume of content comes to them
Is that a tipping point that you think Splice can reach
or do you want to maintain control over the library
It’s really important for us to make sure our library is the highest quality that it can be
it’s not going to be a free-for-all where anyone is uploading anything they want
because we need to maintain that high quality
There’s all kinds of stuff that’s being uploaded to all of these big platforms
So I think the biggest change has been around
The reason that’s important for me is because I need to understand what our customers actually care about
how are they voting with their clicks as opposed to whatever opinions everybody else has
and that is really where data and the math and the science turns into something else
which is a real experience that people can feel
The reason that’s important is because we’re serving creative people
and that’s what creative people do as well — they take all of these inputs and they turn it into something new
So building a strong design team that is either made up of music creators themselves or people who spend a lot of time with music creators is really important. And the third thing that I really brought in that’s important is that we build our products with the customers. So everything that we launch, there are tools that we built in to allow people to give us feedback. In fact, when we launched Create
the biggest button in the Create experience was the feedback button
Every single time someone typed in something to give us feedback
it comes into a Slack channel that’s with all the designers and the engineers and the product managers
So we’re actively talking about the feedback from the customers as it’s coming in
I absolutely love that we build product that way
I think everyone should build product that way
One of the reasons I always ask about structure on the show is that it’s a proxy for culture
You make some big choices about how things are organized
You’re in an interesting spot because you took over for co-founders
How have you thought about changing the culture
The reason I love your question around structure is because I do see that it’s a proxy for values
and that’s why I answered it the way I did around data
Those are fundamental values that I want to bring and inculcate into the company
There’s something else that we also did that was around building culture
I spent a lot of time listening to the team
trying to learn what made this culture unique
and then I reflected back to the organization
these are the values that I’m hearing from you all
And we came up with something that we call our DISCO values: direct
And even though these are new values that we came up with after I joined
they have felt so authentic to the culture that we have that’s existed for a long time
Every single new employee that comes on talks about which DISCO value they resonate with most
which is also in many ways a proxy for culture and values
I’ve always been a very math and science kind of person
I’ve always been someone who’s very analytical
I study all the different tools for decision-making
but as the decision sets that come to me become more complex
and as we operate in an increasingly more complex world
I have found myself relying more and more on intuition
I would say that my decision-making process is
People know in my team that I spend a lot of time on our dashboards
I will spend a lot of time watching research videos and understanding how people are using our tools
I will spend a lot of time personally talking to different customers
and once I’ve kind of drowned myself in all this information
We’re going to put this into practice because the “making creative software for creative people in the age of AI” is about as tense as it gets in the balance between what the numbers are telling us and how the people feel
And what I mean specifically is the numbers are telling everyone that people are using the AI tools
every software maker I’ve talked to has introduced AI tools with any meaningful value
they’re doing generative fill all day long
Then what you hear from the creatives on social media or online
They stole everything from me.” And that is about as big of a divide in tech
I think that is challenging a lot of how everyone is going to make decisions
So I’m going to read you a quote from one of your ostensible competitors
and it tracks with everything you’re saying
but I suspect you are going to disagree with this quote
and I just want to sit with that for a minute
So you have said, “Right, creators just want to create, they want all this stuff to get out of their way.” So here’s the CEO of Suno, Mikey Shulman. Suno is just “push a button, it makes you a song,” right? You say country song, it just spits out a country song at you. And here’s what he recently said: “It takes a lot of time
You have to get really good at an instrument or really good at a piece of production software
I think the majority of people don’t enjoy the majority of time they spend making music
It is not really enjoyable to make music now.”
I have no idea what Mikey Shulman is talking about
but that does track with what you’re saying
that you just want to get the software out of the way
But he spun the knob all the way to “just prompt me for a song.” And a lot of people reacted to this quote very strongly
How do you sit in the middle of that to say
“There’s a line and I’m going to enforce the line
and we’re not just going to prompt it all the way to a song?” Also
Do you think people don’t enjoy making music
Here’s what I have learned by serving creative people for most of my career: the creative process is essential for people who create
but the struggle is to authentically translate what is inside you into something else
your tools will help you, will enable you to do that, and other times your tools will get in the way
Understanding the distinction between those two is the whole ball game
but it’s really about allowing the struggle to come to life
and to dismiss it by this push-button set of tools
I think the creative process and creative people deserve better
They deserve better technology that enables them
as opposed to reducing this profound activity to a button
So this is where I think the line is inherently qualitative
here’s what we’re going to do and here’s what we’re not going to do.” And the tension of
“It’s not really enjoyable to make music now,” you can describe that as using the software sucks
or I just want to have an idea and hear it as fast as I can
And then you can describe it the way you’re saying
which is there’s some parts of the struggle that are the creative process
If the data tells you that people really want to just click the button and make the music
are your values strong enough to not send you all the way down the road
I think it depends on which people you’re listening to
We are really clear about the people that we’re listening to
We are listening to creative people who love the process of music creation
when we gave them Create for the first time
I want something that gives me more creative freedom
the people we’re listening to are super clear in the signals they’re giving us
The other side of this marketplace is consumers
We see consumers and fans all the time now react very strongly to AI generated imagery
you make a movie poster and it’s got a bunch of AI in it
You can’t see that the characters in the movie poster have 12 fingers and their hair bleeds into the skyscraper behind them
Do you perceive that kind of consumer or fan backlash to AI in music the same way that we see it in visual art
I have seen really clear signals from our customers that they are not really interested in computer-generated samples
We are in fact investing in more human-created samples
This is why we’re sending people out to the sort of subgenre locations
It’s really important for our strategy to continue to do that
because people want to connect with the stories of the real artists on the other side of the sample
I think what an end user who’s listening to Sabrina Carpenter today and will listen to somebody else’s music tomorrow
what they can hear is going to be interesting
I love that Kendrick Lamar won the Pulitzer Prize for music
and the people who won the Pulitzer Prize for music
So what art is and what is acceptable changes over time
I would expect it to continue to change over time
I know that artists will use different tools
discussion about watermarks and encryption
and letting people know when images were edited by AI or created by AI
and I would say there are some deep and meaningful challenges with even making that technology work consistently
There’s not anything quite like that on the music side
I think it’s going to be really hard to disambiguate around sound
You’ve had some really great conversations about this topic on your podcast
I think it’s a really important debate and discussion to have
There is going to be a bunch of bad AI-generated content out there
I think that the toothpaste is out of the tube
we have to do the right thing around respecting the rights of creators
and doing the right thing with respect to training data
Maybe some of these cases that are open will help us get to the right answer
but I don’t think it’s going to come out of watermarking
You talked about the flood of AI content that’s coming
The big consumer platforms are embracing it
Mark Zuckerberg would love it if all the content on Facebook was AI and he was paying zero out to creators
YouTube is really leaning into the idea that you should interact with your favorite creators through AI avatars
and that they should make even more videos or AI should help them make even more videos to increase the volume of content that appears
I don’t know exactly how it’s going to play out
but I understand the incentives for those platforms to make those choices to say
what we want always is more content because that will create more attention and we can serve more ads
and we’re in this finite zero-sum attention game.” You’re not in that game specifically
and you do allow artists to make music with AI using your tools
Do you allow AI-generated samples to enter your library
Why is it okay to make music with AI but not to have it in the sample library
I think it’s what users are coming to Splice for today
They are coming to find those authentic sounds made by humans
That’s not to say that people aren’t using AI to master sounds or things like that
You’re using AI to master your audio and video probably here
and that’s fine as long as there’s an authentic artist’s artistic vision and voice behind it
that’s super important for us to continue to be focused there
With respect to these social platforms that you’re talking about
And inasmuch as these social platforms are important for our creators as a way to share their output
But these social platforms have grown because they allow people to have emotional connection with each other
“I’m really angry about this particular issue,” or “I’m reaching out for support for these fires in LA,” or these connections that we make
and finding support around this very specific cancer that I have that I can’t find other people to connect with online
if we erode those actual emotional connections between people in order to save a buck in paying out creators
I think the value of these platforms will diminish over time
Maybe we shouldn’t spend so much time on TikTok
Maybe we should spend more time creating music on our own
So I think these are really interesting evolutions that are going to happen in the industry
I care a lot about where some of this stuff goes
and so many things our users create just to hang out on their desktop because it was just for the joy of creating
And some of it goes on and becomes a Billboard top 100 hit
but I’m just as happy that someone is spending time creating
Let me ask that again in just a different frame
We’ve talked a lot about active creation and what the tools are for and the fact that your customers
They want to add something to what the computer-generated product is giving them
and that process of addition creates additional value
Some very important songs have been made that way using Splice and other tools
But you’re saying that is not a good enough argument to get AI-generated audio into the sample library
If it’s good enough for me to send to a major label and play on the radio
shouldn’t it be good enough to get into the Splice sample library
So I think that the distinction in my mind
and I think for many of our creators is that
or was AI used as a tool to bring a human creator’s idea to life
Do people use technology to create the samples that end up on Splice
People are using lots of tools and technology
“I’ve created an algorithm to pump out a whole bunch of samples that are computer-generated for the mass market.” Those are not going to end up on Splice
who is creating art that they care deeply about
and they’re using AI tools as part of that process
I just don’t know how to write it down in a way that can be consistently enforced across all the geographies that you’re operating in
with all of your teams going out in the world
or in a way that’s understandable to artists who might want to be part of Splice
Is there a definition you have of where the line is
I will take it down to something very simple
There’s a human being who we have a relationship with on both sides of our platform
and so on the side of the platform where we are working with a musician
an instrumentalist who wants to provide a sample to Splice
and we talk to them about what they’re trying to do
How many sample packs do we need every quarter
there’s a Japanese potter who is making handmade percussion instruments that he then records
we’ve got crazy kids making all kinds of super electronic
and they’ve got a different tool set that they’re using as part of their process
you can’t use this tool because it’s AI-generated or not,” but do you have that authentic vision for what you’re creating
And it’s not that difficult to tell the difference between a person who is creating that way
“I typed in a bunch of prompts and I got a whole plethora of computer-generated sounds.”
The other extremely challenging piece of the puzzle with AI-generated content is when you veer into impersonation. We’ve seen this in the hip-hop industry a lot recently. We’ve seen it with OpenAI and Scarlett Johansson’s voice. There’s a lawsuit. The voice got pulled. Who knows how that’s going to play out? We see there’s the Elvis Act
in Tennessee where impersonation is illegal
and I don’t think there’s a great answer for whether Elvis impersonators themselves are now illegal
Are you playing in that space where you’re letting people use artist voices or sound-alikes
I think there are lots of people who are playing in that space or interested in that space
and creative people are actually really clear with us
They are coming to Splice because they want to find their authentic sound
and so we work really hard at the very other end of that
which is how do we allow our users to authentically find their own vibe
Voices is one thing, right? They’re pretty recognizable. The fake Drake song set the industry ablaze
It was just very obviously a fake Drake song
There’s not a great legal system for saying
It seems like we’re on our way to understanding how to get there
Then there’s kind of the existing mess of music copyright. We talk about the “Blurred Lines” case on this show a lot
I think more than any other podcast we’ve talked about “Blurred Lines,” a song which came and went and whose moment is over
but it continues to come up on Decoder maybe once a month
That lawsuit is “you guys stole a vibe from Marvin Gaye
not anything direct,” but the jury was like
Robin Thicke and Pharrell have to pay the money.”
That’s something you could very easily see a user of Splice wandering into
We’re going to layer some samples and we’re going to get to a vibe that’s too close to another artist.” Is that something you worry about
Is that something you try to protect users from
It is, and it’s also been a core part of how music evolves over time. There’s this whole conversation around reheated nachos and what that means
and I think artists and musicians build upon each other’s work
and this conversation’s been around since the beginning of sampling
which is “what am I referring to when I use this sample
and what’s the story that I’m trying to tell?”
or you could argue that it’s building on a shared piece of work that’s a community piece of work that continues to evolve over time
I think that that’s what makes art and music in particular super fascinating
I love that you guys have this whole debate around that particular song
and what’s right and wrong should be defined by the artists
But the idea that you would accidentally boost too much of an existing song by using an AI tool
which is trained on bits and pieces of existing songs
The push and pull is people being very unhappy about the money, and now we’re at a place where it’s easier than ever to be derivative, and the money is absolutely not clear — that artists are very upset about their work being trained on, maybe not in your tools, but certainly in other tools. The labels are suing Suno and Udio, its competitor, for training on their data
Because it seems like the problem is going to get worse faster than the legal system will even comprehend the technology
Most of these problems get worse before the legal system catches up
Technology outpaces how quickly legislative action catches up
we’re doing a lot of work to try to create standards within
great companies in the music space that are saying
We have to make sure our training data is clean.”
I think there are a lot of companies that are trying to do the right thing
Is there one standard that has won out amongst all the others
but I know that a lot of people are working really hard on this problem
We care deeply about the rights of creators
so that’s going to stay really important for us
How do you feel about the labels suing Suno and Udio
Is that something that’s a warning sign for you
Do you think that that is going to get resolved
I think what the labels are trying to do is support the rights of the creators
so we absolutely support the rights of the creators
it’s always going to be about the creators first
and I know my customers deeply care about the fact that they have rights to the content they create using Splice
That’s why we allow people to download the rights PDF
even if they’re not putting their song up on Spotify or trying to make a billion dollars from it
they want to know that they have the ability to do that
So that’s what governs our decisions around clean training data
If I wanted to sign up for a Splice account
You say you can’t train AI on these tracks
Yeah, I don’t either. It’s such an important issue. And the scale of the Internet, the scale of content on the internet is so vast that — What is fair use? What is not fair use? What is public consumption? What is public record? What is public ownership? We are in uncharted territory, and we’re going to be watching it just like you are.
How would you write a fairer system if you were clean sheeting this? How would you write a fairer system that makes creators feel valued, gets them paid, and still allows people to build these AI systems that a lot of people are getting some value out of?
I would love to say that I’m the expert who could write something like that. I have a much more straightforward problem to look after, which is, how do I help creative people be creative and get the ideas from their hearts and minds out there? Yeah, I’m going to leave that problem to people way smarter than me, who are legal minds who are working really hard on this.
Well, if I get anyone on the show who has an answer, I’ll let you know.
I just talk for a living. I haven’t done anything useful in a long time. Kakul, you’ve given us so much time. What’s next for Splice? What should people be looking out for?
All right. We’ll have to have you back soon as some of these issues play out. Thank you so much for coming by.
I would love to. I had such an enjoyable conversation. Thank you so much, Nilay.
Questions or comments about this episode? Hit us up at decoder@theverge.com. We really do read every email!
Johns Hopkins researchers have developed a powerful new AI tool called Splam that can identify where splicing occurs in genes—an advance that could help scientists analyze genetic data with greater accuracy
offering new insights into how genes function and mutations contribute to disease
Their results appear in Genome Biology
"Precisely identifying splicing sites is key to understanding how cells interpret genetic instructions," says co-lead author Kuan-Hao Chao, a doctoral student in the Whiting School of Engineering's Department of Computer Science who is affiliated with the Center for Computational Biology (CCB)
"Splam lets us analyze genetic data with accuracy and efficiency
showing how mutations affect our health and why the same gene can produce different proteins in different conditions."
He is joined on the project by his advisors—Steven Salzberg, the Bloomberg Distinguished Professor of Computational Biology and Genomics and the director of the CCB, and Mihaela Pertea
an associate professor of biomedical engineering and genetic medicine with a secondary appointment in the Department of Computer Science—as well as Alan Mao
a fourth-year undergraduate double majoring in biomedical engineering and computer science
Cells rely on genes to guide their functions
with each gene containing both useful instructions (called exons) and non-essential segments (called introns)
Splicing is the process by which cells trim away the non-essential portions
recognizing splice sites computationally is a crucial step in accurately assembling gene transcripts in modern genetics studies
where RNA sequencing experiments measure the level at which a gene is expressed—basically
whether it's turned on or off—in different conditions
"Cancer researchers often use RNA sequencing techniques to compare gene expression in healthy versus cancerous cells," says Chao
Identifying splice sites is also important in annotating genomes
which involves identifying which parts of our DNA are functional and what roles they play in the body
One familiar application of genome annotation is in genetic testing services
such as those offered by companies like 23andMe
These tests analyze parts of your genome to tell you about your ancestry
Genome annotation makes this possible by identifying and interpreting these regions of the human genome
Compared to the state-of-the-art "SpliceAI" tool
the Hopkins team's "Splam" method uses a much shorter DNA sequence window to predict RNA splice sites
making its model more biologically realistic and feasible for use in research
The team's Splam algorithm takes a DNA sequence of 800 nucleotides, made up of adenine (A), cytosine (C), guanine (G), and thymine (T) bases from both sides of potential donor and acceptor sites (400 nucleotides around each), and outputs the probability for every base of being a donor site, an acceptor site, or neither
"Our algorithm attempts to recognize these donor/acceptor sites in pairs
just as a spliceosome 'molecular machine' does in the cell when it cuts out an intron," says Chao
The researchers developed their algorithm to recognize splice junctions within a window of 800 nucleotides, a far smaller region than the 10,000 nucleotides required by SpliceAI
The team reports that despite requiring less genomic data
Splam achieves better splice junction recognition accuracy than SpliceAI
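The 800-nucleotide input described above can be sketched in a few lines of Python. This is an illustration only, not the Splam implementation: it assumes the window is split as 200 nucleotides on each side of the donor site plus 200 on each side of the acceptor site, pads with N where the window runs past the end of the sequence, and uses a plain one-hot encoding. The function names are hypothetical.

```python
from typing import List

BASES = "ACGT"
FLANK = 200  # assumption: 200 nt on each side of each site, 4 * 200 = 800 in total


def site_window(genome: str, pos: int, flank: int = FLANK) -> str:
    """Return 2 * flank nucleotides centred on pos, padded with 'N' past either end."""
    start, end = pos - flank, pos + flank
    left_pad = "N" * max(0, -start)
    right_pad = "N" * max(0, end - len(genome))
    return left_pad + genome[max(0, start):min(len(genome), end)] + right_pad


def junction_input(genome: str, donor: int, acceptor: int) -> str:
    """Concatenate the donor window and the acceptor window into one 800-nt string."""
    return site_window(genome, donor) + site_window(genome, acceptor)


def one_hot(seq: str) -> List[List[int]]:
    """Encode each base as a 4-element indicator row; 'N' becomes all zeros."""
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


if __name__ == "__main__":
    toy_genome = "ACGT" * 300  # 1,200-nt toy sequence, not real data
    window = junction_input(toy_genome, donor=100, acceptor=900)
    print(len(window), len(one_hot(window)))
```

A model built on such an input would score the donor and acceptor together, which matches the pairwise recognition Chao describes; a 10,000-nucleotide window would need far more sequence context per prediction.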
After training their deep learning model on human DNA
the researchers ran additional tests on other species' genetic codes
"A frequent concern about deep learning methods is whether they simply memorize their training data or if their predictive models will work on data that diverges from what they have seen in training," Chao says
"So to evaluate whether Splam had learned more general splicing rules
we collected data from three successively more distant species and applied the algorithm to each of them without re-training."
The team chose the genomes of a chimpanzee, a mouse, and a flowering plant in the mustard family
Their subsequent experiments demonstrated that Splam's biologically inspired design still produced highly accurate results on these more distant DNA sequences—showing that their method had indeed learned essential splicing patterns shared across many animals and plants
The team's next steps include applying its model to more species and integrating its method into existing RNA sequencing pipelines for practical use in transcriptome assembly
"Our method has immediate applications in improving transcriptome assembly and reducing splicing noise
making it valuable for a wide range of genomic studies," says Chao
"We hope that Splam will contribute to the better understanding of our genomes and the genes within them."
What will the music of tomorrow sound like
According to the latest report from Splice
the creator economy and cross-entertainment
and creative shifts that are set to define the musical landscape in 2025
The "Sounds of 2025" report reveals that music will be characterized by genre fusions and global influence. Among the hundreds of genres across Splice, one is raising its volume louder than the rest: "pluggnb"
Fusing the trap sub-genre "plugg" with '90s R&B and gospel harmonies
"pluggnb" is the fastest-growing genre on the platform with downloads spiking 342.8% in 2024
Unofficial "pluggnb" remixes dominated TikTok in 2024 and led to adoption of the genre by K-pop heavyweights like LE SSERAFIM and ILLIT
"Splice is uniquely positioned to see the sounds that are driving music production globally
This report gives us a sense of how music producers are bending and blending genres in real time and just what genres and regions will define what we hear in 2025 and beyond." Kakul Srivastava
Creators in LA drive the most Splice downloads by far
with New York creators driving roughly half as many
And while American cities make up half of the ranking for Splice's top ten cities
the Splice user base is increasingly global
Australia has both Sydney and Melbourne in the Top 10
"The music industry is always trying to get ahead of trends
but there is perhaps no more forward-looking cultural trend than sample usage," says Mark Mulligan
"The sounds that producers use to create today give us a window not only into the genres that will be big tomorrow but also early indications of entirely new genres that will emerge from the process of mixing samples across styles and cultures
The genres that stand out in this report also underline wider trends: the growing importance of scenes and fan remixing in shaping the sounds of the future."
● Pluggnb: The Fastest Growing Genre of 2024
Pluggnb—a fusion of trap's plugg subgenre with '90s R&B and gospel harmonies—rose an incredible 342.8% in downloads
while Seoul has emerged as a key hub for this genre's growth
TikTok and digital culture are driving its global rise
making it a key contender for the mainstream in 2025
● Jersey Club: From Underground to Global Phenomenon
Jersey club, the high-energy hip-hop/house hybrid, saw exponential growth in 2024, particularly in Berlin, where it became the city's fastest-growing genre on Splice. Its influence is growing worldwide, with artists like UNIIQU3 and Cookiee Kawaii bringing it to new audiences
Expect Jersey Club to continue its global domination as 2025 progresses
"Rage," a harder
and YoungBoy Never Broke Again are introducing this sound to the mainstream
This genre is likely to continue its rapid rise
blending intense energy with a more experimental
● A New Era of Dance Music: Melodic House & Techno
cinematic approach gaining traction globally
● Afro House and the Global South's Influence
The global rise of "Afro House"
particularly from South African creators like Black Coffee
As international audiences increasingly embrace the genre's fusion of African rhythms and house beats
it's expected to play a significant role in the sounds of 2025
it highlights the emergence of 'Brazilian phonk'
which was made popular by European producers who fused "phonk" with the sounds of Latin America
and its reclamation by Brazilian producers
While Los Angeles and New York remain the top cities for Splice downloads
the rise of international hubs is undeniable
Tokyo and Berlin are two of the fastest-growing music production cities globally
with new sounds and genres emerging from these places
● Los Angeles: Dominating trends like K-pop and pluggnb
LA continues to be a hotbed for genre experimentation and cross-cultural influence
● São Paulo: Home to the rapidly growing drift phonk sound
this Brazilian city is becoming a critical center for phonk's evolution
● Johannesburg: As Afro House continues to soar
South Africa is establishing itself as a global powerhouse for innovative
● Berlin: Jersey club's unexpected rise in Berlin exemplifies the city's place in the global music scene
John Vlautin, Splice, 1 818-763-9800, www.splice.com
NOVA1, a neuronal RNA-binding protein expressed in the central nervous system, is essential for survival in mice and normal development in humans
A single amino acid change (I197V) in NOVA1’s second RNA binding domain is unique to modern humans
we generated mice carrying the human-specific I197V variant (Nova1hu/hu) and analyzed the molecular and behavioral consequences
While the I197V substitution had minimal impact on NOVA1’s RNA binding capacity
it led to specific effects on alternative splicing
and CLIP revealed multiple binding peaks in mouse brain transcripts involved in vocalization
These molecular findings were associated with behavioral differences in vocalization patterns in Nova1hu/hu mice as pups and adults
Our findings suggest that this human-specific NOVA1 substitution may have been part of an ancient evolutionary selective sweep in a common ancestral population of Homo sapiens
possibly contributing to the development of spoken language through differential RNA regulation during brain development
the genetic basis underlying these specialized human traits remains to be fully identified
These findings underscore the importance of incorporating diverse human samples to identify and validate the genetic background of modern human traits through genomic comparisons
technical concerns continue to make definitive conclusions about the nature of the NOVA1 I197V variant in brain challenging
we generated humanized mice harboring this variant to study its consequences for RNA regulation and behavior in vivo
we used gene-editing to substitute the NOVA1 isoleucine (I) isoform present in most mammals and archaic hominids (Neanderthals and Denisovans) with the human-specific valine (V) variant at position 197 in mice
Comparison of these humanized NOVA1 mice (Nova1hu/hu) with wild-type mice carrying the ancestral Nova1 gene (Nova1wt/wt) revealed specific transcriptomic and behavioral differences related to vocalization
and evidence that the human-specific amino acid 197 variant confers vocalization changes in humanized mice suggest a role for NOVA1 in the evolution of human-specific language
d Comparison of normalized Tajima’s D values
The first gene set includes NOVA1 and NOVA2
and NOVA1-neighboring genes (FOXG1 and STXBP6) on chromosome 14
The second gene set includes all genes on chromosome 14
e Model of the evolutionary timing for the 197th amino acid change in the NOVA1 gene
noting the Nova1hu/hu mice generated in this study
Nova1hu/hu mice express the modern human-specific amino acid in the NOVA1 protein
The bottom panel shows the corresponding position within the KH2 domain of the NOVA1 protein
Amino acids structurally proximal (<5 Å) to the 197th amino acid
Using human genetic data from the 1000 Genomes Project
we calculated the DH value for the NOVA1 locus
which was nominally significant at p = 0.046
given the multiple hypothesis testing involved in our exploration of several selection tests
the observation that the NOVA1 197 V allele became nearly fixed and is shared across human population groups suggests it arose and increased to high frequency before their divergence
Our analyses support the idea that the NOVA1 197 V variant was part of an ancient selective sweep in modern humans
predating many other known sweeps in the human genome
and showed down-regulation in the P21 midbrain of Nova1hu/hu mice (average TPM 18.6 in Nova1wt/wt
these in vivo and in vitro studies reveal not only the resilience of the I197V variant in maintaining the biophysical properties of RNA binding with minimal global disruption but also its remarkable conservation of overall function
this variant exerts specific effects on alternative splicing (AS)
prompting us to investigate the I197V variant’s impact on RNA regulation
We first determined the brain regions for AS analysis based on the expression patterns of NOVA1
Expressed genes in the P21 midbrain were used as background for this analysis
f Percentage of genes with differential AS changes in each behavior-related gene ontology category
The number of transcripts with differential AS events in Nova1hu/hu is shown relative to the total number of genes in each category
it is plausible that these vocalization related transcripts are similarly affected in a context-dependent manner
such as in response to sensory cues from surrounding environment
these findings indicate that Nova1hu/hu mice with the I197V substitution exhibit a subtle but specific impact on RNAs in the brain
particularly in genes involved in animal behavior and vocalization
These data strengthen the potential relationship between Nova1hu/hu and vocalization
suggesting that vocalization studies in these mice would be valuable
a Isolation-induced ultrasonic vocalization (USV) test in pups
b USV parameters and syllable classification
c Fqmax distribution and two-Gaussian fit for pup USVs
Ashman’s D score (a measure of separation of two distributions
where a score above 2 indicates good separation) is shown
Each Gaussian center and weight are labeled
with the intercept of the two Gaussian distributions (black triangle) used as the cutoff between high and low Fqmax USVs
d Ratio of high or low Fqmax in syllables “d” and “m”
The ratio of syllables belonging to each distribution (high or low) is calculated for the total number of each syllable type
e Courtship-induced USV test for adult mice
f Duration distribution and two-Gaussian fit for syllable “s” in adult USVs
The intercept of the two Gaussian distributions (black triangle) was used as the cutoff between long and short duration
Short and long “s” syllable examples are shown at the top of the plot
g Peak frequency parameters in long duration “s”
h Fqmax distribution and two-Gaussian fit in adult USVs
The black star marks the 100 kHz cutoff between high and low Fqmax
Examples of low and high Fqmax syllables are shown at the top of the plot
i Frequency variance (Fq variance) in high Fqmax in adult USVs
h) at the bottom of the density plots show the mean (black dots) and standard deviation (whiskers) by peak for each genotype
No significant differences were observed between genotypes
each circle represents data from a single pup: Nova1hu/hu N = 41
three experiments were conducted over consecutive weeks
and the average value for each mouse is plotted (white circles): Nova1hu/hu N = 13
Statistical analysis was performed by Wilcoxon rank sum tests (two-sided
We also tested the bimodality and syllable ratios with Fqmin and confirmed the same trend (increased ratio in high Fq in Nova1hu/hu pups)
Heterozygous Nova1wt/hu pups showed intermediate values between Nova1hu/hu and Nova1wt/wt pups for these parameters
suggesting that the effect of the I197V substitution in NOVA1 protein on pup USVs is dosage-dependent
These observations demonstrate distinct changes in the vocalizations of Nova1hu/hu pups
changes in vocal quality in Nova1hu/hu pups had no impact on the behavior of the mother mice in this assay
These changes were not observed in pup isolation-induced USVs
indicating that this effect is developmentally specific and/or context-dependent
with simple syllables like “s” having lower values and more complex syllables like “m” with multiple jumps having higher values
This suggests that Nova1hu/hu mice produce more complex high-frequency USVs than Nova1wt/wt mice
These findings demonstrate that vocal behavior is altered in both pups and adults in Nova1hu/hu mice
we investigated the biological effect of a single amino acid substitution
By analyzing Nova1hu/hu mice carrying this allele
we identified molecular changes in alternative splicing in the brain
including brain regions associated with vocal behavior
and identified changes in vocalization patterns in pups and adult mice
These findings suggest that during human evolution
the I197V substitution in NOVA1 protein may have contributed to the development of neural systems involved in more complex vocal communication
This underscores the unique nature of the I197V variant
which occurred within a region of the genome resistant to change
These results confirm that NOVA1 has undergone strong positive selection and that the I197V variant is part of an evolutionary selective sweep in the emergence of Homo sapiens
may leave subtler genetic signatures that require novel detection methods
This suggests that the ancient NOVA1 selective sweep may represent part of a broader set of undiscovered ancient sweeps
it is plausible that the I197V substitution affects cortical regulation of vocalization
the Y-maze test results indicated that Nova1hu/hu mice had spatial working memory comparable to that of control mice
Future studies will be necessary to investigate the effects of the I197V substitution on USVs in female mice
as well as adult female preferences to USVs in adult Nova1hu/hu male mice
These observations may indicate a common or related molecular alteration in the neural circuits involved in the USV production between humanized Nova1 mice and humanized Foxp2 mice
Future studies should aim to identify the molecular and neural basis of these alterations
as well as the physiological significance of these vocalization changes in the context of social behavior
Our molecular analysis showed that the sequence-specific RNA binding of NOVA1 was unaffected by the human substitution and that steady-state gene expression levels in the brains of Nova1hu/hu mice were nearly identical to those of wild type mice
we detected alternative splicing changes in several transcripts associated with vocalization
The expression pattern of NOVA1 in the brain and the enrichment of its target transcripts to specific biological pathways support a link between NOVA1 function and vocal behavior
Uncovering the precise molecular mechanisms underlying the phenotypes in Nova1hu/hu mice will require further study of the neural circuits for vocalization
as well as on regulatory factors influencing NOVA protein function
This study sets the groundwork for understanding molecular mechanisms driving the evolution of human vocal communication
we analyzed a single amino acid unique to modern humans in the RNA binding protein NOVA1 and examined its biological effects in vivo by introducing this amino acid in mice
NOVA1 is highly intolerant to changes in amino acid sequences during evolution with the exception of this single amino acid change in humans
We propose that this change was part of an evolutionary sweep associated with specific changes in the neuronal transcriptome and vocal communication
All procedures were performed according to the guidelines of the Institutional Animal Care and Use Committee (IACUC) under the IACUC protocol # 23014 at the Rockefeller University
000664) mice were obtained from the Jackson Lab
Nova1hu/hu mice generated in this study were backcrossed to C57BL/6J strain at least 8 times
The mice were housed in individually ventilated cages (five per cage) under conditions of a 12 h light/dark cycle and ambient temperature of 21 ± 4 °C with 40–70% humidity
Male or female mice aged 7 days (for isolation induced pup USV test) and 8–20 weeks (for playback behavioral experiment and courtship-induced adult USV test) were used for animal experiments
Littermates of the same sex were randomly assigned to experimental groups
Nova1hu/hu mice were generated by directly injecting the sgRNA/Cas9 RNP with a single-stranded repair template DNA (ssDNA) into C57BL/6 zygotes to substitute valine for isoleucine at amino acid 197 of mouse NOVA1
gRNA and the ssDNA were designed as follows
gRNA (5’-TGCTACTGTGAAGGCTATAA-3’): overlapping the DNA sequence (mm10/ chr12: 46,700,902–46,700,904) of the mouse Nova1 genomic locus encoding the 197th amino acid of NOVA1
ssDNA: 140 nt length DNA homologous to the NOVA1 locus with a nucleotide substitution (A to G) to cause an amino acid change from isoleucine to valine at the 197th position
Two silent mutations were also designed to create BtsaI restriction enzyme recognition site for genotyping
Genomic DNA was extracted from the tail of the F0 animals
and the DNA corresponding to the area around the 197th amino acid was amplified by PCR and subsequently cloned into a plasmid for determining the sequence of the modified allele
Genomic sequence analysis revealed that among 13 F0 animals
8 animals harbored the designed allele (with three nucleotide substitutions: one causing I197V amino acid substitution
two for restriction enzyme recognition site for genotyping (not causing amino acid changes))
Animals carrying the designed humanized Nova1 allele were crossed to wild-type C57BL/6 mice
and this process was continuously repeated for subsequent generations to eliminate possible off-target mutations
sequences around the genomic DNA encoding the 197th amino acid were amplified by PCR and subsequently digested with the BtsaI restriction enzyme
Each mouse genotype, wild type (Nova1wt/wt), homozygous (Nova1hu/hu) or
heterozygous (Nova1hu/wt), was determined by band size obtained by electrophoresis
Siblings obtained by crossing heterozygous parents were used in the experiment
DNA band size after restriction enzyme treatment: wild type (613 bp), homozygous (389 bp and 224 bp), heterozygous (613 bp/ 389 bp + 224 bp) (see Supplementary Fig. 3c)
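The banding pattern above maps onto genotypes directly. The following is a minimal Python sketch of that mapping, assuming band sizes have already been estimated from the gel. The function name `call_genotype` and the sizing tolerance `tol_bp` are hypothetical and do not come from the source.

```python
# Expected fragment sizes (bp) after restriction digestion, taken from the text above.
UNCUT = 613        # wild-type allele: no restriction site, amplicon stays intact
CUT = (389, 224)   # humanized allele: site present, amplicon is cleaved in two


def call_genotype(bands, tol_bp=15):
    """Return a genotype label from observed band sizes in bp.

    tol_bp is a hypothetical sizing tolerance, not a value from the source.
    """
    def seen(size):
        # A band counts as present if any observed band is within the tolerance.
        return any(abs(b - size) <= tol_bp for b in bands)

    has_uncut = seen(UNCUT)
    has_cut = all(seen(s) for s in CUT)
    if has_uncut and has_cut:
        return "Nova1hu/wt"
    if has_cut:
        return "Nova1hu/hu"
    if has_uncut:
        return "Nova1wt/wt"
    return "undetermined"
```

A lane showing all three bands is called heterozygous, and any pattern that matches none of the expected sizes is reported as undetermined rather than guessed.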
Primary antibodies used for immunohistochemistry and western blotting were as follows: rabbit anti-NOVA1 (1/1000 dilution) [EPR13847] (ab183024
rabbit anti-NOVA1 C-terminal (1/1000 dilution) [EPR13848] (ab183723
human anti-pan NOVA (1/10,000 dilution) (anti-Nova paraneoplastic human serum) and rabbit anti-ACTB (1/10,000 dilution) (ab8227
3 or 12-week-old mice were perfused with PBS and 4% paraformaldehyde (PFA)
The solution was sequentially replaced with 15% sucrose/ PBS and 30% sucrose/ PBS
Frozen brains were sliced into 30–50 μm thick sections in a cryostat (CM3050S
Slices were washed three times with PBS at room temperature (RT)
incubated in 0.2% Triton X-100/PBS for 15 min at RT
blocked in 1.5% normal donkey serum (NDS)/PBS for 1 h at RT
incubated overnight at 4 °C with primary antibody in 1.5% NDS/PBS
then incubated with Alexa 488,
555 or 647 conjugated donkey secondary antibody
The nuclei were stained using 4’,6-diamidino-2-phenylindole (DAPI) solution (1 μg/ml)
Images of specimens were collected with a BZ-X700 (KEYENCE) microscope
midbrain and cerebellum) of P21 mouse brains were lysed in RIPA buffer (50 mM Tris-HCl; 150 mM NaCl; 0.1% SDS; 0.5% sodium deoxycholate; 1% NP-40)
and subjected to immunoblotting using the antibodies described above
Quantification of western blots was done with ImageJ (v1.53)
Each band signal was quantified and normalized with ACTB signal to control for differences in loading
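The loading-control normalization described above is a per-lane ratio. A minimal Python sketch follows, assuming band intensities have already been exported from ImageJ as plain numbers. The function names are hypothetical, and expressing values relative to a reference group is an added assumption, not a step stated in the source.

```python
from statistics import mean


def normalize_bands(nova1, actb):
    """Divide each NOVA1 band intensity by the ACTB intensity of the same lane."""
    if len(nova1) != len(actb):
        raise ValueError("one ACTB value is needed per lane")
    return [n / a for n, a in zip(nova1, actb)]


def relative_to_reference(values, reference_values):
    """Express normalized values relative to the mean of a reference group.

    This rescaling step is an assumption for illustration only.
    """
    ref = mean(reference_values)
    return [v / ref for v in values]
```

Dividing by the ACTB signal of the same lane removes lane-to-lane differences in the amount of protein loaded, so the remaining differences reflect NOVA1 abundance.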
The genes encoding each NOVA1 protein (NOVA1wt and NOVA1hu) were cloned into the pGEX6p1 vector and expressed in E
N-terminally GST (Glutathione S-Transferase) fused NOVA1 was induced by the addition of IPTG (final conc
and then incubated in the presence of Triton-X (final conc
Cleared supernatant was collected after centrifugation (12,000 x g 10 min 4 °C)
After incubating with Glutathione Sepharose beads (GE Healthcare Biosciences
the mixture was washed three times with PBS
The GST tag was cleaved from the NOVA1 protein by PreScission Protease treatment (GE Healthcare
4 °C for 4 h) to obtain purified NOVA1 protein
The concentration of each purified NOVA1 protein was determined by SDS-PAGE followed by GelCode Blue staining (Thermo Fisher Scientific
The single-stranded RNA probe was designed as previously described41
The following single-stranded RNA was synthesized by IDT:
CCTTATCATGCTGACTCACGTCATTTCATCTCATCAAGGGAGTCAGTGGGATA
Synthesized RNA was first incubated at 80 °C for 10 min
and labeled at the 5’ end by T4 polynucleotide kinase treatment (New England BioLabs
The labeled probes were purified by G-25 column (VWR
95017-621) and diluted to the appropriate concentration with water
or dissected midbrain at E18.5 of Nova1hu/hu and Nova1wt/wt mice
The mRNA-seq library was prepared from RNA extracted with Trizol following the Illumina TruSeq protocol of polyA selection
Multiplex libraries were sequenced as 125 nt paired-end runs on the HiSeq-2500 platform at Rockefeller University Genomic Core
These raw datasets and processed data files have been deposited with Gene Expression Omnibus (GSE253297)
NOVA1-CLIP was performed in P21 dissected cortex
midbrain and cerebellum of Nova1hu/hu and Nova1wt/wt mice using three biological replicates each
triturated using a 20 G needle and crosslinked three times on ice at 400 mJ/cm2 using a Stratalinker
Crosslinked material was collected by centrifugation
0.5% deoxycholate and 0.1% SDS with protease inhibitor)
and subjected to DNase (RQ1 DNase: Promega) and RNase (RNase A: Affymetrix) treatment at a final dilution of 1:20,000 for 5 min
The lysate was clarified by centrifugation at 20,000 × g for 20 min
The supernatant was used for immunoprecipitation with 200 μL of Protein A Dynabeads (Invitrogen) loaded with 18 μg anti-Nova1 antibody (abcam) for 2 h at 4 °C
The samples were washed as follows: twice with wash buffer
twice with Nelson stringent wash buffer (15 mM Tris pH 7.4
twice with Nelson high salt buffer (15 mM Tris pH 7.4
twice with Nelson low salt buffer (15 mM Tris pH 7.4
and twice with PNK wash buffer (50 mM Tris pH 7.4
RNA fragments were dephosphorylated using FastAP Alkaline phosphatase (Thermo Fisher Scientific) and subjected to 3′ ligation overnight at 16 °C with a pre-adenylated linker (preA-L32) using truncated KQ T4 RNA Ligase2 (NEB)
The RNA-protein complexes were labeled with 32P-γ-ATP using T4 PNK (NEB) and subjected to SDS-PAGE and transfer to nitrocellulose membrane
Appropriate regions of the membrane were cut out and RNA was extracted according to the following conditions: 100 mM Tris pH 7.5
RNA was purified by phenol-chloroform extraction method
Cloning was performed using the BrdU-CLIP protocol
the reverse transcription reaction was performed using Superscript III (Thermo Fisher Scientific)
and the cDNA was BrdU-labeled by including BrdU in the reaction solution
Immunoprecipitation was performed with 5 μg anti-BrdU antibody (abcam) and 25 μg protein G Dynabeads per reaction (45 min at room temperature)
followed by washing with the following solutions (including Denhardt’s solution): once with IP buffer (0.3x SSPE
BrdU-immunoprecipitation was performed again under the same conditions
cDNA was circularized on beads using CircLigase II (Epicentre) and PCR was performed using Accuprime Pfx supermix (Thermo Fisher Scientific) and SYBR Green until RFU 250–500
PCR products were purified using Agencourt AMPure XP (Beckman Coulter) and concentrations were measured by TapeStation
High-throughput sequencing was performed at the Rockefeller University Genome Resource Center
These raw datasets and processed data files have been deposited with Gene Expression Omnibus (GSE253296)
The data set for Nova1 knockout mouse (E18.5 midbrain) was kindly provided by Dr
The data are available from GEO submission GSE69711
Data visualizations were done using R (v4.2.0)
Correlation matrix was visualized using corrplot package
PCA analysis was performed using FactoMineR and factoextra packages and visualized using ggplot2 package
Sequencing tracks were visualized using Integrative Genomic Viewer (IGV
De novo motif analysis and motif density analysis were done using findMotifsGenome.pl and annotatePeaks.pl commands in HOMER (v4.11)
7-day-old pups were isolated from their mother and littermates
Each pup was placed quietly on a small open-faced plastic plate in the sound attenuating chamber (15” × 24” × 12” Igloo® beach cooler with a tube for pumped air circulation input
An ultrasonic microphone was suspended a small distance from the pup
the recording box was cleaned with 70% alcohol and distilled water
and allowed to fully dry before the next experiments
Vocalizations were recorded with UltraSoundGateCM16/CMPA ultrasonic microphones connected to an Ultrasound Gate USGH amplifier
Recordings were saved using the AvisoftRecorderUSG software (Sampling frequency: 250 kHz; FFT-length: 1024 points; 16-bits)
All acoustic hardware was obtained from Avisoft Bioacoustics® (Berlin
Mothers rearing 7-day-old offspring were used in the playback experiment
We used a three-chamber box (12” × 23.5” × 15.5”) connected by a passageway through which a mouse could pass for the test
Each chamber at both ends was equipped with a speaker (Vifa ultrasonic speaker
FLIR) was placed on the ceiling of the chamber to record the behavior of the mouse
The speakers were connected to an UltraSoundGate Player 216H (Avisoft Bioacoustics)
using Avisoft Recorder USGH and had a frequency range (±12 dB as the maximum deviation from the average sound volume) of 25–125 kHz
We adjusted the loudness between the channels by controlling the level of the peak power before the experiment
we made sure that both songs were audible at the entrance of both rooms so that the mother could respond to the songs, but not so loud that the microphones could detect the song being played in the other room
playbacks were triggered when the mouse broke an infrared sensor located in the center of the three-chamber box
One speaker on one side played one pup-USV recording and the other speaker simultaneously played another pup-USV recording, both of which were previously recorded during the pup isolation-induced USV test for 5 min
Pup-USV recording was prepared in Audacity® by stitching vocalizations from 4-5 pups for each genotype
These recording files contained an equivalent number of pup-USVs (Nova1wt/wt 1689 USVs
Nova1hu/hu 1561 USVs) and were confirmed to reflect the vocal characteristics of each genotype
a second 5 min playback session was conducted after 1 min quiet period
the two recordings playing from the speakers were switched to eliminate any possible preference for location
the mother was allowed to explore freely in the box and the time she spent in each room was counted
The box was cleaned between experiments with 70% alcohol and distilled water and allowed to fully dry before the next experiments
and the male mice housed in the same cage until the test day
the males were placed in a new cage and then singly habituated in the sound recording environment (as described for pup USV test) for 15 min
The males were then exposed to adult female mice for 5 min
We used the females (8-12 weeks old) in estrus (selected visually for wide vaginal opening and pink surrounding)
The test was conducted three times per mouse
and a different female mouse was used as the stimulus each time to avoid familiarity effects
The order of mice tested each time was shuffled to avoid the possible order effects
the mouse cage was cleaned with 70% alcohol and distilled water
where μ and σ are the center and standard deviation of each Gaussian
The cutoff Fqmax between low and high pup USVs was defined as the intercept of the two Gaussian fits to the Fqmax distribution, rounded to the nearest kHz
The cutoff duration between short and long USVs was defined as the intercept of the two Gaussian fits to the USV duration distribution, rounded to the nearest millisecond
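The cutoff and separation score described here can be computed in closed form once the two Gaussian components have been fitted. The Python sketch below assumes the fit has already produced a weight, center (μ) and standard deviation (σ) for each component. The function names and the example values are hypothetical. The Ashman's D formula used is the commonly cited one, D = √2·|μ1 − μ2| / √(σ1² + σ2²), which is assumed rather than confirmed to be the exact form used in the source.

```python
import math


def weighted_pdf(x, w, mu, sigma):
    """Density of one mixture component: weight times a normal density."""
    return w * math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def gaussian_intercept(w1, mu1, s1, w2, mu2, s2):
    """Return the point between the two centers where the weighted densities are equal.

    Setting the two log-densities equal gives a quadratic a*x^2 + b*x + c = 0.
    """
    a = 1 / (2 * s2 ** 2) - 1 / (2 * s1 ** 2)
    b = mu1 / s1 ** 2 - mu2 / s2 ** 2
    c = (mu2 ** 2 / (2 * s2 ** 2) - mu1 ** 2 / (2 * s1 ** 2)
         + math.log((w1 * s2) / (w2 * s1)))
    if abs(a) < 1e-12:
        # Equal standard deviations: the equation is linear.
        if b == 0:
            raise ValueError("components have identical center and spread")
        roots = [-c / b]
    else:
        disc = b * b - 4 * a * c
        if disc < 0:
            raise ValueError("the two components do not cross")
        r = math.sqrt(disc)
        roots = [(-b + r) / (2 * a), (-b - r) / (2 * a)]
    lo, hi = sorted((mu1, mu2))
    between = [x for x in roots if lo <= x <= hi]
    if not between:
        raise ValueError("no crossing between the two centers")
    return between[0]


def ashman_d(mu1, s1, mu2, s2):
    """Ashman's D; values above 2 indicate good separation of the two components."""
    return math.sqrt(2) * abs(mu1 - mu2) / math.sqrt(s1 ** 2 + s2 ** 2)


# Illustrative values only, not data from the study.
cutoff = round(gaussian_intercept(0.5, 60.0, 5.0, 0.5, 90.0, 10.0))
```

Rounding the intercept, as in the last line, mirrors the "nearest kHz" and "nearest millisecond" convention in the text.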
Statistical analysis was performed by pairwise Wilcoxon rank sum tests
with correction (Bonferroni) for multiple comparisons between genotypes
Correction was not applied for the parameters in call structure
This is because individual properties are assumed to be related to each other
which increases type 2 errors caused by overcorrection
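For the comparisons where correction was applied, the Bonferroni adjustment itself is a single multiplication. A minimal Python sketch follows, assuming the raw p-values from the pairwise tests are already available. The function name and the example p-values are hypothetical.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Illustrative p-values only, not results from the study.
adjusted = bonferroni([0.01, 0.04, 0.5])
```

Because every p-value is scaled by the full number of comparisons, the method is conservative when the tested parameters are correlated, which is the reason given above for not applying it to the call-structure parameters.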
The tests were performed using the elevated revolving rod (Stoelting
Mice were placed on the apparatus and habituated for a few minutes
The rod accelerated at a constant rate (4 to 40 rpm in 300 s) and the time it took the animals to fall was recorded
Tests were performed three times and the average value was calculated
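The acceleration schedule and trial averaging above reduce to simple arithmetic. The Python sketch below assumes a linear ramp and holds the speed at 40 rpm after 300 s. The hold is an assumption, since the source does not say what happens after the ramp ends, and the function names are hypothetical.

```python
from statistics import mean

START_RPM = 4.0    # starting speed, from the text
END_RPM = 40.0     # final speed, from the text
RAMP_S = 300.0     # ramp duration in seconds, from the text


def rpm_at(t_s):
    """Rod speed at time t_s under a linear ramp, clamped to the ramp window."""
    t = min(max(t_s, 0.0), RAMP_S)
    return START_RPM + (END_RPM - START_RPM) * t / RAMP_S


def mean_latency(latencies_s):
    """Average latency to fall across the repeated trials of one mouse."""
    return mean(latencies_s)
```

For example, a mouse that falls at 150 s was on a rod turning at 22 rpm, halfway through the ramp.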
Statistical analysis was performed by Wilcoxon rank sum test
The Y-maze tests were conducted according to the described procedure130
The tests were performed in a Y-maze with three arms of equal length at 120° angles to each other (Stoelting
Mice were placed in the center of the maze and had free access to all three arms
If the animal chooses an arm different from the arm it arrived in,
this is considered a correct response; conversely,
returning to the previous arm is considered an error
The number of times and the order in which the animals entered the arms are recorded and used to calculate the alternation rate
The behavior of the mice was recorded for 8 min
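The alternation rate can be computed from the recorded order of arm entries. The Python sketch below uses the common triplet-based definition of spontaneous alternation, in which an alternation is three consecutive entries into three different arms. This definition is an assumption: the source does not give its exact scoring formula, and the function name is hypothetical.

```python
def alternation_rate(entries):
    """Percent of overlapping arm triplets that visit three different arms.

    entries is the ordered sequence of arm labels, e.g. ["A", "B", "C", "A"].
    """
    if len(entries) < 3:
        raise ValueError("at least three arm entries are needed")
    # Slide a window of three consecutive entries over the sequence.
    triplets = [entries[i:i + 3] for i in range(len(entries) - 2)]
    alternations = sum(1 for t in triplets if len(set(t)) == 3)
    return 100.0 * alternations / len(triplets)
```

Under this definition, a mouse entering arms in the order A, B, C, B scores one alternation out of two possible triplets.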
The sequencing data of three Neanderthal genomes were obtained from The Draft Neanderthal Genome Project (https://www.ebi.ac.uk/ena/browser/view/PRJEB2065). The sequencing data of the Denisovan genome, together with nine modern human genomes, were obtained from the Denisovan Genome Project (http://cdna.eva.mpg.de/denisova/)
The fastq files of each sample were aligned to the human genome (hg19) by Burrows-Wheeler Aligner (BWA
The aligned SAM files were processed into BAM files by Picard (v2.18.7) and Genome Analysis Toolkit (GATK
Variant calling for each sample was processed with Mutect2 of GATK4
The variants were annotated by using ANNOVAR (v2)
The variants in modern human populations were obtained from the ExAC database (v0.3.1) (https://gnomad.broadinstitute.org/downloads)
which contains 60,706 exomes mapped to hg19
the variants in NOVA1 loci (chromosome 14: 26912296 ~ 27067239) of Neanderthal
and modern human populations were subsetted by using bcftools (v1.19)
The frequency of minor alleles for each position in NOVA1 loci was calculated
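The per-position frequency calculation can be illustrated with allele calls for a single site. The Python sketch below assumes the calls have already been extracted as a flat list of allele symbols. It takes the minor allele to be the second most common allele at the site, which is a common convention but is not stated in the source. The function name is hypothetical.

```python
from collections import Counter


def minor_allele_frequency(alleles):
    """Frequency of the second most common allele at one position.

    Returns 0.0 when only one allele is observed.
    """
    counts = Counter(alleles)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no allele calls at this position")
    if len(counts) == 1:
        return 0.0
    # most_common(2) returns the two most frequent alleles in descending order.
    return counts.most_common(2)[1][1] / total
```

A position where every sampled chromosome carries the same allele returns 0.0, which corresponds to a fixed site.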
Information on the statistical methods and the number of biological replicates in each analysis is provided in the figure legends and methods section as appropriate
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article
Revising the human mutation rate: implications for understanding human evolution
Deciphering African late middle Pleistocene hominin diversity and the origin of our species
Evolution of vocal learning and spoken language
Evidence of a vocalic proto-system in the Baboon (Papio papio) suggests pre-hominin speech precursors
Evolutionary loss of complexity in human vocal anatomy as an adaptation for speech
A high-coverage genome sequence from an archaic Denisovan individual
The complete genome sequence of a Neanderthal from the Altai Mountains
A high-coverage Neandertal genome from Vindija Cave in Croatia
No evidence for recent selection at FOXP2 among diverse human populations
Reintroduction of the archaic variant of NOVA1 in cortical organoids alters neurodevelopment
Comment on ‘Human TKTL1 implies greater neurogenesis in frontal neocortex of modern humans than Neanderthals’
A forkhead-domain gene is mutated in a severe speech and language disorder
Identification of FOXP2 truncation as a novel cause of developmental speech and language deficits
A Foxp2 mutation implicated in human speech deficits alters sequencing of ultrasonic vocalizations in adult male mice
Knockout of Foxp2 disrupts vocal development in mice
A humanized version of Foxp2 affects cortico-basal ganglia circuits in mice
A humanized version of Foxp2 does not affect ultrasonic vocalization in adult mice
A humanized version of Foxp2 affects ultrasonic vocalization in adult female and male mice
The derived FOXP2 variant of modern humans was shared with Neandertals
Human TKTL1 implies greater neurogenesis in frontal neocortex of modern humans than Neanderthals
is homologous to an RNA-binding protein and is specifically expressed in the developing motor system
The human pancreatic islet transcriptome: expression of candidate genes for type 1 diabetes and the impact of pro-inflammatory cytokines
Nova1 is a master regulator of alternative splicing in pancreatic beta cells
NOVA1 prevents overactivation of the unfolded protein response and facilitates chromatin access during human white adipogenesis
Paraneoplastic syndromes involving the nervous system
The neuronal RNA-binding protein Nova-2 is implicated as the autoantigen targeted in POMA patients with dementia
Nova-1 regulates neuron-specific alternative splicing and is essential for neuronal viability
CLIP identifies Nova-regulated RNA networks in the brain
HITS-CLIP yields genome-wide insights into brain alternative RNA processing
Nova regulates GABA(A) receptor gamma2 alternative splicing via a distal downstream UCAU-rich intronic splicing enhancer
NOVA1 acts on Impact to regulate hypothalamic function and translation in inhibitory neurons
Common molecular pathways mediate long-term potentiation of synaptic excitation and slow synaptic inhibition
Integrative modeling defines the Nova splicing-regulatory network and its combinatorial controls
Response to comment on ‘Reintroduction of the archaic variant of NOVA1 in cortical organoids alters neurodevelopment’
Efficient high-precision homology-directed repair-dependent genome editing by HDRobust
The neuronal RNA binding protein Nova-1 recognizes specific RNA targets in vitro and in vivo
The neuronal splicing factor nova co-localizes with target RNAs in the dendrite
NOVA2-mediated RNA regulation is required for axonal pathfinding during development
Sequence-specific RNA binding by a Nova KH domain: implications for paraneoplastic disease and the fragile X syndrome
Crystal structures of Nova-1 and Nova-2 K-homology RNA-binding domains
Molecular population genetics of sequence length diversity in the Adh region of Drosophila pseudoobscura
Statistical tests for detecting positive selection by utilizing high-frequency variants
Mapping gene flow between ancient hominins through demography-aware inference of the ancestral recombination graph
Fast and accurate estimation of selection coefficients and allele histories from ancient and modern DNA
marks gastric atrophy and shows evidence of adaptive gene loss in humans
A point mutation in the FMR-1 gene associated with fragile X mental retardation
Essential role for KH domains in RNA binding: impaired RNA binding by a mutation in the KH domain of FMR1 that causes fragile X syndrome
a female germ cell-specific tumor suppressor gene in Caenorhabditis elegans
affect a conserved domain also found in Src-associated protein Sam68
The onconeural antigen Nova-1 is a neuron-specific RNA-binding protein
the activity of which is inhibited by paraneoplastic antibodies
Binding of a novel SMG-1-Upf1-eRF1-eRF3 complex (SURF) to the exon junction complex triggers Upf1 phosphorylation and nonsense-mediated mRNA decay
Human Upf1 is a highly processive RNA helicase and translocase with RNP remodelling activities
The mechanism of eukaryotic translation initiation and principles of its regulation
Mammalian WTAP is a regulatory subunit of the RNA N6-methyladenosine methyltransferase
A METTL3–METTL14 complex mediates mammalian nuclear RNA N6-adenosine methylation
From unwinding to clamping—the DEAD box RNA helicase family
Differential NOVA2-mediated splicing in excitatory and inhibitory neurons regulates cortical development and cerebellar function
Protein-RNA and protein-protein recognition by dual KH1/2 domains of the neuronal splicing factor Nova-1
Nova autoregulation reveals dual functions in neuronal splicing
Microstimulation in different parts of the periaqueductal gray generates different types of vocalizations in the cat
Yuhki Saito for providing guidance on the method and analysis of the transcriptome experiments
J Lomax Boyd for assistance in the design of the playback experiments
We are deeply grateful to the Rockefeller University Resource Centers: the CRISPR and genome editing center
the Transgenic and Reproductive Technology Center and the Genomics Resource Center
David Reich for critical review and constructive comments
as well as members of the Darnell lab for discussions of the manuscript
Japan Society for the Promotion of Science postdoctoral fellowship for research abroad (J.S.P.S.) (YT)
NIH Awards NINDS Outstanding Investigator Award R35NS097404 (R.B.D.)
Keck Foundation Award and NIH Transformative Research Award R01DC018691 (E.D.J.)
are Howard Hughes Medical Institute Investigators
This research was supported by US National Institutes of Health grant R35-GM127070 (to A.S.) and the Simons Center for Quantitative Biology at Cold Spring Harbor Laboratory
The content is solely the responsibility of the authors and does not necessarily represent the official views of the US National Institutes of Health
The Laboratory of Molecular Neuro-oncology
The Laboratory of Neurogenetics of Language
The Laboratory of Biochemistry and Molecular Biology
Conceptualization: R.B.D.; Methodology: Y.T.; Investigation: Y.T.; Statistics on population genetics: J.D.L.
A.S.; Visualization: Y.T.; Funding acquisition: Y.T.
DOI: https://doi.org/10.1038/s41467-025-56579-2
A Correction to this article was published on 01 April 2025
The etiology of congenital heart disease (CHD) is complex
comprising both genetic and environmental factors
the genetic etiology remains largely elusive
Trio exome sequencing identified heterozygous FLT4 splice site variants in two families, one with tetralogy of Fallot (TOF)
and one with variable CHD comprising both the TOF spectrum and aortic coarctation
Sanger sequencing on cDNA confirmed aberrant splicing for the c.985+1G>A variant
transcriptome sequencing uncovered altered splicing for the c.1657+6T>C variant
our study establishes FLT4 splice site variants as a molecular cause of both left and right-sided isolated CHD
RNA sequencing emerges as a valuable technique in unraveling the missing heritability of CHD
Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study
All variants have been submitted to the ClinVar database and can be accessed using the following accession numbers: SCV005407801 for NM_182925.5:c.1657+6T>C
and SCV005407803 for NM_182925.5:c.985+1G>A
The original online version of this article was revised: Author Tim Van Damme’s name was incorrectly written as Tim Vandamme
A Correction to this paper has been published: https://doi.org/10.1038/s41431-025-01831-y
The authors thank the families for their kind availability in sharing the findings within the scientific community
This project was supported by a grant “Scientific research on heart diseases (2023)” from the Philanthropic Center Pelicano to BC and by a Research Grant of the Research Foundation—Flanders (G035620N) to BC
BC is a senior clinical investigator of the Research Foundation—Flanders
Kristof Vandekerckhove & Joseph Panzer
and SV; writing—original draft preparation
All authors have read and agreed to the published version of the manuscript
This study was conducted in accordance with the 1964 Declaration of Helsinki and its subsequent revisions
The legal guardians of the individuals involved in this study provided written informed consent for the disclosure of case details
DOI: https://doi.org/10.1038/s41431-025-01788-y
The approval of splice-switching oligonucleotides with phosphorodiamidate morpholino oligomers (PMOs) for treating Duchenne muscular dystrophy (DMD) has advanced the field of oligonucleotide therapy
PMOs encounter challenges such as poor tissue uptake
thereby affecting patients’ prognosis and quality of life
we have developed a PMOs-based heteroduplex oligonucleotide (HDO) technology
This innovation involves a lipid-ligand-conjugated complementary strand hybridized with PMOs
significantly enhancing delivery to key tissues in mdx mice
and serum creatine kinase by restoring internally deleted dystrophin expression
PMOs-based HDOs normalized cardiac and CNS abnormalities without adverse effects
Our technology increases serum albumin binding to PMOs and improves blood retention and cellular uptake
Here we show that PMOs-based HDOs address the limitations in oligonucleotide therapy for DMD and offer a promising approach for diseases amenable to exon-skipping therapy
however since PMO cannot cross the blood-brain barrier (BBB)
it is not expected to have a therapeutic effect on these symptoms
we have developed a new type of HDO with a PMO in place of the gapmer-type ASO and a different intracellular mechanism from that of conventional HDOs
We also assessed whether PMO/HDO improves treatment efficacy using a dystrophic mdx mouse model
a Structure of phosphorodiamidate morpholino oligomers (PMOs) duplexed with lipid ligand (tocopherol (Toc) or cholesterol (Cho))-conjugated complementary strand
b Confirmation of annealing between PMOs and the complementary strand with lipid ligands electrophoresed on a 16% acrylamide gel
d) Pharmacokinetics of PMO after intravenous injection of a single 100 mg/kg (11.88 μmol/kg) PMO dose or molar equivalent of Toc-HDO or Chol-HDO
The hybridization-based ELISA shows the pharmacokinetic (c)
and biodistribution (d) data in mdx mice (n = 4) injected with PMO
lipid conjugated PMO/HDO with mouse albumin
HDOs showed highly significant enhancements in the binding affinity for albumin for which the parent PMO showed no affinity
PMO conjugated directly with cholesterol could be synthesized
thereby preventing their administration to mdx mice
This experience highlighted the benefit of attaching lipids to PMO with a complementary strand
a Timeline of PMO or HDO administration and animal sacrifice
b Detection of exon 23-skipped dystrophin mRNA in the heart
and skeletal muscles of mdx mice 2 weeks after once-weekly systemic intravenous (IV) injections for a total of 1
or Chol-HDO at a dose equimolar to PMO (11.88 μmol/kg)
(n = 4–9 per group). Data are presented as mean ± S.E.M
c Time course of exon 23-skipped dystrophin mRNA 14
or 112 days after five injections of PMO or Chol-HDO (11.88 μmol/kg) in indicated muscles of mdx mouse (n = 4–9 per group)
a Images of dystrophin immunostaining (red) in indicated muscles 2 weeks after the fifth PMO dose (100 mg/kg) or Chol-HDO at an equimolar dose to PMO (11.88 μmol/kg)
b Quantification of dystrophin-expressing fibers (%) to the total number of fibers in the indicated tissues (n = 4 per group)
and (d) percentage of centrally nucleated fibers (CNF) in quadriceps 2 weeks after five weekly injections of phosphate-buffered saline (PBS)
e Representative images of caveolin 3 immunostaining (red) in quadriceps femoris (QF) counterstained with DAPI (blue) to evaluate CNF and CSA
f Western blot showing robust dystrophin expression in the heart and QF from HDO-treated mice
Data are presented as mean ± S.E.M and were analyzed using one-way analysis of variance followed by Tukey’s tests (b–d)
Production of dystrophin was confirmed using western blot analysis (Fig. 3f)
We found that PMO/HDOs restored markedly higher levels of dystrophin than PMOs in the heart and QF
compared with those in wild-type B10 control mice
a Serum Creatine Kinase (CK) levels in mice injected once weekly for a total of 5 doses with PBS or HDOs (11.88 μmol/kg)
Serum CK levels are reduced after treatment with Toc- or Chol-HDO
correlating with the levels of dystrophin restoration (n = 5-9 per group)
b Forelimb grip test (n = 9–15 per group) and (c) treadmill test (n = 4–7 per group) performances evaluated in mdx mice injected once weekly for a total of 5 doses with PBS or HDOs (11.88 μmol/kg)
ECG abnormalities observed in mdx are prevented in the treated mdx mice (d) QTc and (e) QRS duration (n = 6–11 per group)
f Quantification of the heart fibrosis-stained regions in the left ventricle (n = 4 per group)
Transverse sections revealing the level of the papillary muscles 8 weeks (2 months) after once-weekly injections of PBS or HDOs (11.88 μmol/kg) for a total of 5 doses
g Representative images of the heart after Masson’s trichrome staining in the left ventricle
Data are presented as mean ± S.E.M and were analyzed using one-way analysis of variance followed by Tukey–Kramer tests (a–f)
Chol-HDO treatment normalized both functions of mdx mice to the level of those in B10 mice
a Detection of exon 23-skipped dystrophin mRNA in the whole brain of mdx mice 2 weeks after once-weekly injections of PMO or Chol-HDO (11.88 μmol/kg) for a total of 5 doses (n = 4 per group)
b Duration of tonic immobility (freezing) expressed as a percentage of freezing time (n = 4–6 per group)
c Total horizontal movement distance traveled (distance run in 10 min) (n = 4–6 per group)
d Representative trajectory diagram of B10 and mdx mice treated with PBS
a–c and were analyzed using one-way analysis of variance followed by Tukey–Kramer tests (b
a Detection of exon 23-skipped dystrophin mRNA in the indicated muscle of mdx mice after five weekly SC injections of Chol-HDO (11.88 μmol/kg) (n = 4–7 per group)
b Serum CK levels in mice with five weekly SC injections of Chol-HDO (11.88 μmol/kg) (n = 4–9 per group)
c Treadmill test (n = 4–7 per group) and (d) forelimb grip test (n = 4–13 per group) were also evaluated in mdx mice subcutaneously injected five times with Chol-HDO (11.88 μmol/kg)
Data are presented as mean ± S.E.M (a–d) and were analyzed using one-way analysis of variance followed by Tukey–Kramer tests (b–d)
a Structure of the cholesterol-conjugated complementary strand DNA gap and 2’-OMe gap
b Detection of exon 23-skipped dystrophin mRNA in the indicated tissues of mdx mice 2 weeks after weekly injections for a total of 5 doses of Chol-HDO (11.88 μmol/kg) with DNA gap or 2’-OMe gap (n = 4–6 per group)
c Serum CK levels (n = 4–9 per group) and (d) forelimb grip test (n = 4–13 per group) results in mice treated with Chol-HDO (11.88 μmol/kg) with DNA gap or 2’-OMe gap
b–d and were analyzed using one-way analysis of variance followed by Tukey–Kramer tests (c
small foci of inflammatory cell infiltration were observed in the liver parenchyma of mdx mice treated with any of the interventions
No other significant lesions were observed in mdx mice treated with Chol-HDO
there was occasional increased size heterogeneity of hepatocyte nuclei
no significant lesions were noted in mdx mice following treatment with PBS or Chol-HDO
mdx mice treated with Toc-HDO occasionally showed a slight increase in cellular density in glomeruli
the normalization of cardiac dysfunction with robust expression of dystrophin in the heart of mdx mice indicates a potential for improved prognosis in patients with DMD
The present results show that Chol-HDO improved freezing and the movement distance traveled in response to restraint in the mdx mice
This suggests that the expression of dystrophin contributes to this normalization not only in the skeletal muscle but also in the CNS
Improving the freezing behavior could be associated with improving CNS/psychological symptoms in patients with DMD
the DLS results showed that no particle formation was observed in PMO/HDO
Cholesterol and tocopherol-conjugation increased PMO delivery to the liver and kidneys
such as altered serum hepatic or renal function indices
or adverse clinical outcomes were observed during the course of our experiments (up to 4 months after the last injection)
even following administration of multiple high doses (100 mg/kg; 11.88 μmol/kg)
SC administration of PMO/HDO induced a slightly weaker skipping effect than IV administration; however
improved functioning was observed with the former
long-term SC administration is expected to have higher efficacy and provides the option of self-administration
these results indicate that HDO technology may represent a new avenue for novel exon-skipping drugs for DMD and other multisystemic disorders
The basic concept of HDO originally assumed that the complementary strand is cleaved by RNase H in the cell. However, as shown in Supplementary Fig. 1A
the complementary strand of this new type of HDO might be cleaved by other RNases
DNase would cleave the complementary strand when natural DNA is used instead of natural RNA in the center portion of the complementary strand
PMO/HDOs consisting of a complementary strand fully composed of 2’OMe modifications showed no in vivo skipping activity
likely because the complementary strand was not cleaved owing to the high resistance of 2’-OMe RNA to nucleases
the increased skipping efficiency achieved by single dosing may not correspond with the extreme increase in PMO concentration (100–150-fold) within the muscles of mice treated with PMO/HDO
We initially postulated that the complementary strand separation was poor
most of the complementary strand was likely already separated from the PMO/HDO in the muscle tissue, since ISH and HELISA use a probe complementary to the PMO sequence that binds only to single-stranded PMO
the endosomal escape of PMO from PMO-HDO may be inefficient
increased delivery into necrotic fibers might be unproductive
PMO was distributed in normal-sized muscle fibers but was highly abundant
especially in necrotic fibers and small-diameter fibers that appeared to be regenerating fibers
necrotic and regenerating fibers were relatively absent
suggesting that long-term administration may decrease the concentration of PMO in the QF
as it was taken up by normal-sized muscle fibers
ligands must be developed that will be preferentially taken up by normal-sized muscle fibers
we have developed a new type of lipid-conjugated HDO using parent PMOs
resulting in a functionally normal motor phenotype in a mouse model of DMD
including the normalization of abnormalities in cardiovascular and behavioral symptoms
Although further optimization of intracellular complementary strand cleavage is necessary
these PMO/HDO properties make it particularly attractive as a treatment for patients with DMD and other genetic diseases affecting the heart
who are eligible for exon-skipping therapy
All complementary strands for the experiment were synthesized by GeneDesign (Osaka
PMOs target the donor splice site of exon 23 (+7–18) of the mouse dystrophin pre-mRNA
All animals were maintained on a 12 h light/12 h dark cycle in a pathogen-free animal facility (temperature: 18–24 °C; humidity: 40–70%) with free access to food (CLEA Rodent Diet CE-2
6–8 week-old males) were injected intravenously in the retro-orbital sinus or subcutaneously once per week with AONs
They were randomly assigned to experimental or control groups
All studies were conducted in accordance with the ethical guidelines of Tokyo Medical and Dental University
and in strict compliance with the Fundamental Guidelines for Proper Conduct of Animal Experiment and Related Activities in Academic Research Institutions as set forth by the Ministry of Education
Approval for the experiments was granted by TMDU (Approval number A2022-085A)
Experiments were conducted in accordance with the ARRIVE guidelines
All possible efforts were made to minimize the number of animals used and to alleviate their discomfort
Each antisense oligonucleotide (AON) against exon 23 of the dystrophin gene was dissolved in PBS (stock concentrations: 2 mM)
and 11.88 μmol/kg of each AON was injected into the retro-orbital sinus once weekly for a total of 1
mice were sacrificed under anesthesia with 4% isoflurane (Wako
Total RNA was extracted from cells or muscle tissues using ISOGEN 2 (NIPPON GENE
and 300 ng or 500 ng of total RNA was processed using the QIAGEN OneStep RT-PCR Kit (QIAGEN
according to the manufacturer’s instructions
The primer sequences were mEx22F 5’-ATCCAGCAGTCAGAAAGCAAA-3’ and mEx24R 5’-CAGCCATCCATTTCTGTAAGG-3’ for amplification from exons 22 to 24
The PCR conditions were 50 °C for 30 min and 95 °C for 15 min
The PCR bands were analyzed using Bioanalyzer 2100 (Agilent
and the resulting PCR bands were extracted using a QIAquick Gel extraction Kit (QIAGEN
Nederland) for direct sequencing using an ABI 3100 (Thermo Fisher Scientific
Skipping efficiency was calculated using the following formula:
[(molarity of skipped PCR products) × 100% / (molarity of skipped PCR products + molarity of unskipped PCR products)]
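The ratio above can be sketched in a few lines of Python. This is a minimal illustration only, not the authors' analysis script: the function name and the example band concentrations are hypothetical, and the inputs are assumed to be the concentrations of the skipped and unskipped bands in the same units.

```python
def skipping_efficiency(skipped: float, unskipped: float) -> float:
    """Return exon-skipping efficiency as a percentage.

    Both arguments are band concentrations in the same units
    (assumed here to be nmol/L); only their ratio matters.
    """
    if skipped < 0 or unskipped < 0:
        raise ValueError("band concentrations must be non-negative")
    total = skipped + unskipped
    if total == 0:
        raise ValueError("no product detected in either band")
    return 100.0 * skipped / total


# Hypothetical band concentrations, for illustration only.
print(skipping_efficiency(3.0, 1.0))  # 75.0
```

Because the formula is a ratio, any consistent concentration unit gives the same percentage.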
Ten-micrometer cryosections were cut from flash-frozen muscle using the Leica CM3050 S
Germany) placed on MAS-coated glass slides (Matsunami Glass Industrial
and blocked for 1 h with 5% goat serum (S-1000 Vector Laboratories
USA) in PBS or mouse-on-mouse blocking buffer containing mouse IgG blocking reagent (#MKB-2213 Vector Laboratories) at room temperature (~25 °C)
The tissues were then incubated with the following primary antibodies overnight at 4 °C: rabbit anti-dystrophin against C-terminus (ab15277
tissue sections were treated with secondary antibodies (Alexa Fluor 546 goat anti-mouse #A-11030 and Alexa Fluor 568 goat anti-rabbit #A-11011 Thermo Fisher Scientific) for 1 h (1:1000)
Coverslips were mounted using VECTASHIELD Antifade Mounting Medium with 4’,6-diamidino-2-phenylindole (DAPI) (VECTOR H-1200)
Centrally nucleated fibers and the myofiber cross-sectional area of QF were measured using HALO® Image Analysis (Indica labs
In situ hybridization of the morpholino oligomer was performed using the miRNAscope® HD (RED) Assay Kit (Advanced Cell Diagnostics [ACD]
Fresh-frozen QF muscles and heart tissues were sectioned (10 μm) using the Leica CM3050 S
Germany) and placed on SuperFrost Plus slides (Thermo Fisher Scientific)
Slides were fixed in 4% paraformaldehyde for 1 h at 4 °C
and washed in 100% ethanol twice for 5 min each
Sections were then incubated in hydrogen peroxide for 10 min at room temperature and washed in distilled water twice for 1 min each
Protease IV treatment was applied to the tissues
which were incubated in a chamber at room temperature for 30 min
Slides were incubated with PMO sequence probes (SR-ASO-PMO-S1
Further amplification of the target probe signal was performed according to the manufacturer’s instructions (miRNAscope HD detection protocol Amp 1-6)
Fast red was prepared by combining Red-A and Red-B (1:60)
and incubated for 10 min at room temperature
and imaged on SLIDEVIEW VS200 (Evident Co.
the hybridization of probe to PMO was performed according to the miRCURY® LNA® miRNA ISH Optimization Kits (FFPE) protocol (Qiagen
The LNA-modified probe with overhang for signal amplification was designed and synthesized at Qiagen
The sequence is 5’-CTCTATATCTCCAACCCGAATTTCAGGTAAGCCGAGGTTT-3’
Slides were washed in 5X SSCT for 10 min and then incubated in amplification buffer (5X SSCT
10% low molecular weight dextran sulfate) for 30 min at room temperature. Hybridization chain reaction was performed in amplification buffer containing 6 μmol/L hairpin amplifiers
Slides were mounted in Prolong Diamond with DAPI (P36966
Sections were imaged on a STELLARIS 8 confocal microscope (Leica Microsystems
Imaging analysis was conducted with Imaris (ver
Proteins were extracted from sliced frozen muscle using SDS buffer (0.125 M Tris/HCl with pH 6.4
and 0.005% BPB) supplemented with 1X Protease Inhibitor (Complete Mini
The normal control lysate from a B10 mouse was prepared as a reference for dystrophin expression
Subject and normal control lysates were denatured at 100 °C for 3 min and electrophoresed in a Tris-acetate 3–8% gradient polyacrylamide gel (Thermo Fisher Scientific) at 150 V for 40 min
The proteins were transferred to a PVDF membrane (Bio-Rad
After incubation with 5% nonfat milk (NACALAI TESQUE
the membrane was incubated at 4 °C overnight with an anti-dystrophin antibody (ab15277
The membrane was washed three times for 10 min each in TBST and incubated with horseradish peroxidase-conjugated anti-rabbit (#111-035-003
1:10,000) antibodies (Jackson ImmunoResearch
followed by six washes with TBST and allowed to develop with West Dura Extended Duration Substrate (Thermo Fisher Scientific)
The immunoreactive bands were detected using the ChemiDoc XRS Image System (Bio-Rad Laboratories
PMOs in the blood were quantified using sera from blood samples of treated mdx mice or age-matched samples
homogenized in RIPA buffer (Thermo Fisher Scientific)
and incubated with proteinase K (NACALAI TESQUE
lysates were spun at maximum speed for 15 min to collect the supernatant
Probes with complementary sequences to the PMOs used were synthesized and conjugated at the 5′ and 3′ ends with digoxigenin and biotin
The first and last seven nucleotides of the probes were fully phosphorothioated
PMO amounts were calculated in reference to a standard curve constructed from fluorescence values given by the respective PMO standards
Muscle strength was measured using the forelimb grip test with a grip strength meter (MK-380CM/FM; Muromachi Kikai
The average of three measurements per animal per time point was recorded for comparative analysis
Running sessions were performed on a four-lane motorized treadmill equipped with electric shock (Treadmill for Rats and Mice Model MK-680 S; Muromachi Kikai Co.
Ltd) at least 1 week after the last injection
The treadmill was set at an inclination of 0°
All mice were acclimated to the treadmill belt for 5 min before starting to walk and then forced to run at 5 m/min for 5 min
the speed was increased by 1 m/min each minute
The test was stopped when the mouse was exhausted
or spent 5 continuous seconds on the shock grid
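As a rough illustration of the running protocol described above, the following sketch converts a time to exhaustion into a distance run. It is not the authors' code, and it rests on assumptions the text does not state: the clock starts when running begins (acclimation excluded), the ramp begins immediately after the initial 5 min at 5 m/min, and the first ramp minute is run at 6 m/min. The function name and parameters are hypothetical.

```python
def treadmill_distance_m(run_time_min: float,
                         base_speed: float = 5.0,
                         base_duration: float = 5.0,
                         step: float = 1.0) -> float:
    """Distance (m) covered after `run_time_min` minutes of running.

    Assumed protocol: `base_duration` minutes at `base_speed` (m/min),
    then the speed rises by `step` m/min at the start of each minute.
    """
    if run_time_min < 0:
        raise ValueError("run time must be non-negative")
    # Constant-speed phase.
    t_base = min(run_time_min, base_duration)
    distance = base_speed * t_base
    # Ramp phase, one speed step per (possibly partial) minute.
    remaining = run_time_min - t_base
    speed = base_speed + step
    while remaining > 0:
        dt = min(1.0, remaining)
        distance += speed * dt
        remaining -= dt
        speed += step
    return distance


# Hypothetical animal exhausted 7.5 min after running began:
# 5 min at 5 m/min, then 1 min at 6, 1 min at 7 and 0.5 min at 8 m/min.
print(treadmill_distance_m(7.5))  # 42.0
```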
This was quantified as the time the mouse moved <0.5 cm (2 cm) per second
Unconditioned fear responses induced by this acute stress were characterized by periods of tonic immobility (freezing) during the 10 min recording period
Body-surface electrocardiography (ECG) was performed in a blinded manner, as described previously (ref. 63)
ECG in lead II configuration was recorded using the PowerLab system (PowerLab 4/26
ADInstruments) under anesthesia with 1% isoflurane
ECG parameters were obtained by averaging those from three different ECGs
The QT interval was defined as an interval between the onset of the QRS complex and the end of the negative component of the T wave
QTc was calculated using the following formula: QTc = QT interval (ms)/√(RR interval (s) × 10)
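A minimal sketch of this rate correction is given below, assuming the QT interval is supplied in milliseconds and the RR interval in seconds, as in the formula above. The function name and the example values are hypothetical and are not taken from the study data.

```python
import math


def qtc_ms(qt_ms: float, rr_s: float) -> float:
    """Rate-corrected QT interval: QT (ms) / sqrt(RR (s) x 10)."""
    if rr_s <= 0:
        raise ValueError("RR interval must be positive")
    return qt_ms / math.sqrt(rr_s * 10)


# Hypothetical recording: QT = 50 ms at RR = 0.1 s (600 beats/min).
print(round(qtc_ms(50.0, 0.1), 3))
```

With this scaling the correction factor equals 1 at an RR interval of 0.1 s, so QTc equals the measured QT at that heart rate, and longer RR intervals give a smaller corrected value.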
Blood chemistry was assessed in the SRL Laboratory (Tokyo
and the blood cell count was measured at LSI Medicine (Tokyo
The size and size distribution of nanoparticles were determined via DLS using a Zetasizer Pro instrument (Malvern Instrument Ltd.
The sample solutions were loaded into a low-volume cuvette (ZEN2112), and the measurements were carried out with a detection angle of 173° and a temperature of 25 °C
PMO and PMO/HDO were labeled at the 5′ terminus of the PMO with Alexa Fluor 647
Binding measurements were conducted in 1X DPBS (Gibco) in flat-bottom non-binding 96-well plates (Corning
Alexa 647-labeled PMO or PMO/HDO were added at a final concentration of 2 nM to solutions of albumin ranging from sub nM to mM concentrations
Solutions were equilibrated for at least 30 min before measuring fluorescence polarization (λex = 635 nm, λem = 675 nm) on a Tecan Infinite M1000 Pro (Baldwin Park
The GraphPad Prism 9 software (version 9.5.0) and Microsoft Excel for Microsoft 365 MSO (version 2211) were used to analyze the data
All numerical values were presented as mean ± standard error of the mean (SEM)
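For reference, the mean ± SEM summary can be sketched as below, taking SEM as the sample standard deviation divided by the square root of n. The function name and the values are hypothetical and are not data from the study.

```python
from math import sqrt
from statistics import mean, stdev


def mean_sem(values):
    """Return (mean, SEM), where SEM = sample SD / sqrt(n)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to estimate SEM")
    return mean(values), stdev(values) / sqrt(n)


# Hypothetical measurements for one group.
group_mean, group_sem = mean_sem([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```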
Differences among more than three groups were analyzed using one-way analysis of variance followed by Tukey-Kramer tests
Statistical differences between two groups were analyzed using the Student’s one-tailed t-test
Significance levels were set at *P < 0.05
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article
All data supporting the findings of this study are available within the paper and supplementary information files. Source data are provided with this paper
Neonatal screening for Duchenne muscular dystrophy: a novel semiquantitative application of the bioluminescence test for creatine kinase in a pilot national program in Cyprus
Diagnosis and management of Duchenne muscular dystrophy
and pharmacological and psychosocial management
Current and emerging treatment strategies for Duchenne muscular dystrophy
and smooth muscle failure in Duchenne muscular dystrophy
Function and genetics of dystrophin and dystrophin-related proteins in muscle
NS-065/NCNP-01: an antisense oligonucleotide for potential treatment of exon 53 skipping in Duchenne muscular dystrophy
Systemic administration of the antisense oligonucleotide NS-065/NCNP-01 for skipping of exon 53 in patients with Duchenne muscular dystrophy
and efficacy of viltolarsen in boys with Duchenne muscular dystrophy amenable to exon 53 skipping: a phase 2 randomized clinical trial
Eteplirsen for the treatment of Duchenne muscular dystrophy
Increased dystrophin production with golodirsen in patients with Duchenne muscular dystrophy
and pharmacokinetics of casimersen in patients with Duchenne muscular dystrophy amenable to exon 45 skipping: a randomized
Viltolarsen in Japanese Duchenne muscular dystrophy patients: a phase 1/2 study
Low dystrophin levels increase survival and improve muscle pathology and function in dystrophin/utrophin double-knockout mice
Dose-dependent restoration of dystrophin expression in cardiac muscle of dystrophic mice by systemically delivered morpholino
One-year treatment of morpholino antisense oligomer improves skeletal and cardiac muscle functions in dystrophic mdx mice
Functional correction in mouse models of muscular dystrophy using exon-skipping tricyclo-DNA oligomers
Cognitive dysfunction in Duchenne muscular dystrophy: a possible role for neuromodulatory immune molecules
Control of backbone chemistry and chirality boost oligonucleotide splice switching activity
Palmitic acid conjugation enhances potency of tricyclo-DNA splice switching oligonucleotides
Antibody-oligonucleotide conjugates enter the clinic
Enhanced exon skipping and prolonged dystrophin restoration achieved by TfR1-targeted delivery of antisense oligonucleotide using FORCE conjugation in mdx mice
A cell-penetrating peptide enhances delivery and efficacy of phosphorodiamidate morpholino oligomers in mdx mice
Peptide-conjugated oligonucleotides evoke long-lasting myotonic dystrophy correction in patient-derived cells and mice
The endosomal escape vehicle platform enhances delivery of oligonucleotides in preclinical models of neuromuscular disorders
DNA/RNA heteroduplex oligonucleotide for highly efficient gene silencing
Cholesterol-functionalized DNA/RNA heteroduplexes cross the blood–brain barrier and knock down genes in the rodent CNS
DNA/RNA heteroduplex oligonucleotide technology for regulating lymphocytes in vivo
Development and application of an ultrasensitive hybridization-based ELISA method for the determination of peptide-conjugated phosphorodiamidate morpholino oligonucleotides
Combined microRNA and mRNA detection in mammalian retinas by in situ hybridization chain reaction
Triggered amplification by hybridization chain reaction
and oxidative phosphorylation in mdx mouse muscular dystrophy
Human dystrophin expression corrects the myopathic phenotype in transgenic mdx mice
Functional rescue of dystrophin-deficient mdx mice by a chimeric peptide-PMO
Truncated dystrophin ameliorates the dystrophic phenotype of mdx mice by reducing sarcolipin-mediated SERCA inhibition
Myostatin propeptide gene delivery by adeno-associated virus serotype 8 vectors enhances muscle growth and ameliorates dystrophic phenotypes in mdx mice
Multiple pathological events in exercised dystrophic mdx mice are targeted by pentoxifylline: outcome of a large array of in vivo and ex vivo tests
Electrocardiographic findings in mdx mice: a cardiac phenotype of Duchenne muscular dystrophy
Regulation of the cardiac L-type Ca2+ channel by the actin-binding proteins alpha-actinin and dystrophin
Challenges and opportunities in dystrophin-deficient cardiomyopathy gene therapy
Evolution of the mdx mouse cardiomyopathy: physiological and morphological findings
Increased connective tissue growth factor associated with cardiac fibrosis in the mdx mouse model of dystrophic cardiomyopathy
Adeno-associated virus serotype-9 microdystrophin gene therapy ameliorates electrocardiographic abnormalities in mdx mice
Blunted cardiac beta-adrenergic response as an early indication of cardiac dysfunction in Duchenne muscular dystrophy
The association of cardiac muscle necrosis and inflammation with the degenerative and persistent myopathy of MDX mice
Early right ventricular fibrosis and reduction in biventricular cardiac reserve in the dystrophin-deficient mdx heart
Accelerating the mdx heart histo-pathology through physical exercise
A deficit of brain dystrophin impairs specific amygdala GABAergic transmission and enhances defensive behaviour in mice
Serum transaminase levels in boys with Duchenne and Becker muscular dystrophy
Ratio of creatine kinase to alanine aminotransferase as a biomarker of acute liver injury in dystrophinopathy
Dystrophins carrying spectrin-like repeats 16 and 17 anchor nNOS to the sarcolemma and enhance exercise performance in a mouse model of muscular dystrophy
Functional deficits in nNOSmu-deficient skeletal muscle: myopathy in nNOS knockout mice
Mechanisms of palmitic acid-conjugated antisense oligonucleotide distribution in mice
Conjugation of hydrophobic moieties enhances potency of antisense oligonucleotides in the muscle of rodents and non-human primates
Self-assembly into nanoparticles is essential for receptor mediated uptake of therapeutic antisense oligonucleotides
Morpholino oligomer-mediated exon skipping averts the onset of dystrophic pathology in the mdx mouse
Mdx mice inducibly expressing dystrophin provide insights into the potential of gene therapy for Duchenne muscular dystrophy
A morpholino oligomer therapy regime that restores mitochondrial function and prevents mdx cardiomyopathy
Repeat-dose toxicology evaluation in cynomolgus monkeys of AVI-4658, a phosphorodiamidate morpholino oligomer (PMO) drug for the treatment of Duchenne muscular dystrophy
Efficacy of multi-exon skipping treatment in Duchenne muscular dystrophy dog model neonates
Low immunogenicity of LNP allows repeated administrations of CRISPR-Cas9 mRNA into skeletal muscle in mice
Innate and conditioned reactions to threat in rats with amygdaloid lesions
Characterization of the interactions of chemically-modified therapeutic nucleic acids with plasma proteins using a fluorescence polarization assay
Abe for their care of the laboratory animals
We appreciate the access to the slide scanner VS200 (Olympus) granted by the Research Core of Tokyo Medical and Dental University
This research was supported by the Basic Science and Platform Technology Programs for Innovative Biological Medicine (18am0301003h0005) and Advanced Biological Medicine (23am0401006h0005) to T.Y.
from the Japan Agency for Medical Research and Development (AMED) and a JSPS KAKENHI Grant-in-Aid for Scientific Research (A) (19H01016 to T.N
(A) (22H00440 to T.N.) and (B) (16H05221 to T.N.) from the Ministry of Education
Science and Technology (MEXT) of Japan (Tokyo)
This research was also supported by the Joint Research Fund with Takeda Pharmaceutical Company
These authors contributed equally: Juri Hasegawa
Department of Neurology and Neurological Science
Graduate School of Medical and Dental Sciences
NucleoTIDE and PepTIDE Drug Discovery Center
Department of Bio-informational Pharmacology
performed the experiments and analyzed data
All authors have read and approved the final manuscript
has ongoing collaborations with Takeda Pharmaceutical Co.
and serves as an academic advisor for Rena Therapeutics Inc
The other authors declare no competing interests
are paid employees of Takeda Pharmaceutical Company Limited
DOI: https://doi.org/10.1038/s41467-024-48204-5
Heteroduplex oligonucleotide technology was applied to morpholino oligomers and normalized motor
and central nervous system functions of Duchenne muscular dystrophy model mice
CAG 3' splice sites (3'ss) are more than twice as frequent as TAG 3'ss
The greater abundance of the former has been attributed to a higher probability of exon skipping upon cytosine-to-thymine transitions at intron position -3 (-3C > T) than thymine-to-cytosine variants (-3T > C)
The molecular mechanisms underlying this bias and its clinical impact are poorly understood
Base-pairing probabilities (BPPs) and RNA secondary structures were compared between CAG 3'ss that produced more skipping of downstream exons than their mutated UAG versions (termed “laggard” CAG 3'ss) and UAG 3'ss that resulted in more skipping than their mutated CAG counterparts (canonical 3'ss)
The laggard CAG 3'ss showed significantly higher BPPs across intron-exon boundaries than canonical 3'ss
The difference was centered on positions -5 to -1 relative to the intron-exon junction, the region previously shown to exhibit the strongest high-resolution ultraviolet crosslinking to the small subunit of auxiliary factor of U2 snRNP (U2AF1)
RNA secondary structure predictions suggested that laggard CAG 3'ss were more often sequestered in paired conformations and in longer stem structures while canonical 3'ss were more frequently unpaired
The excess of base-pairing at 3'ss has the potential to alter the hierarchy in intrinsic splicing efficiency of human YAG 3'ss from canonical CAG > UAG to non-canonical UAG > CAG, to modify the clinical impact of transitions at this position and to change their classification from pathogenic to benign or vice versa
The translational potential of genomics has remained limited by our inability to reliably predict which variants lead to actionable phenotypes, often prohibiting accurate diagnosis and counseling
This challenge is magnified by the realization that even identical mutations at the same position of traditional splice-site consensus sequences may have unexpected or even opposite phenotypic effects
The importance of intramolecular RNA base-pairing at individual splice-site positions is poorly understood
if this bias can explain the higher abundance of CAG 3'ss in mammalian genomes
No ab initio tools exist to identify anomalous YAG 3'ss that increase exon skipping when mutated from UAG to CAG
It has been unclear why the non-canonical -3T alleles can promote exon inclusion as compared to the -3C alleles and thus become superior to canonical -3C alleles
This study has compared base-pairing probabilities (BPPs) of transcript pairs with laggard CAG 3'ss and canonical 3'ss
Even the small number of informative transcript pairs (n = 22) has revealed higher average BPPs across intron-exon junctions of laggard CAG 3'ss (i.e., 3'ss with the hierarchy in splicing efficiency of UAG > CAG) as compared to 3'ss with the canonical order CAG > UAG
The maximum discrimination was observed for positions -5 to -1 relative to 3'ss
These results suggest that the accessibility of pyrimidine bases at position -3 can control not only splicing efficiency but also clinical outcome of these mutations on a scale benign to pathogenic or vice versa
PU values range between 0 (completely base-paired) and 1 (completely unpaired)
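The relationship between the two measures can be illustrated for single nucleotides. This sketch assumes that a symmetric base-pairing probability matrix is already available from a folding program, and that the probability of a position being unpaired is one minus the sum of its pairing probabilities; the text does not say how PU values were derived, so this is an illustration rather than the study's procedure. The function names and the toy matrix are hypothetical.

```python
def paired_probabilities(bpp):
    """Sum each row of a symmetric base-pairing probability matrix."""
    return [sum(row) for row in bpp]


def unpaired_probabilities(bpp):
    """PU_i = 1 - sum_j BPP[i][j], clamped to [0, 1] to absorb rounding error."""
    return [min(1.0, max(0.0, 1.0 - p)) for p in paired_probabilities(bpp)]


# Hypothetical 4-nt example: positions 0 and 3 pair with probability 0.8.
toy_bpp = [
    [0.0, 0.0, 0.0, 0.8],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.8, 0.0, 0.0, 0.0],
]
toy_pu = unpaired_probabilities(toy_bpp)
```

In the toy example the two pairing positions have PU close to 0.2 and the two interior positions have PU of 1, matching the 0 (fully paired) to 1 (fully unpaired) scale described above.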
BPP and PU values were averaged and means and standard deviations of the two groups of 3'ss were compared using an unpaired t-test
Nucleotide distribution across 3'ss and the distribution of paired and unpaired nucleotides in the most stable structures were compared using χ2 tests
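A χ2 comparison of nucleotide counts between two groups can be sketched as below. This computes only the Pearson statistic and degrees of freedom for a contingency table; the P-value would come from a χ2 distribution, which is omitted here. The function name and the counts are hypothetical and are not data from the study.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    Assumes every row and column total is non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, dof


# Hypothetical 2 x 4 counts of A, C, G and U for two groups of 3'ss.
counts = [
    [10, 20, 30, 40],
    [20, 20, 20, 40],
]
chi2, dof = chi_square_statistic(counts)
```

A 2 × 4 table has three degrees of freedom.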
b Average BPPs across laggard and canonical 3'ss (n = 4 and 18, respectively) and their allelic counterparts
dashed lines represent BPP values for alternate pyrimidines
Asterisks represent the region with significant differences between the two groups of 3'ss
c, d Mean BPP values for the indicated regions and associated P-values for McCaskill (c) and CONTRAFold (d) algorithms
e Mean BPPs across laggard and canonical 3'ss and across their allelic counterparts
PU values across laggard and canonical 3' splice sites
a Mean PU values across 3'ss sequences of the two groups of 3'ss
b Comparison of average PU values for the indicated positions relative to the intron-exon junction (vertical line)
A lack of adenines and uridines between positions -3 and -20 of laggard CAG 3' splice sites
uridines upstream (a) but not downstream (b) of the intron-exon boundary
χ2 values for 2 × 4 contingency tables were 31.7 (P < 0.0001) (a) and 4.2 (P = 0.2) (b)
c Adenines were absent just upstream of laggard CAG 3'ss
d Nucleotide distribution upstream of 195,404 human 3’ss
Independent profiling of BPPs and PU values across the two groups of 3'ss identified a significant increase in predicted base-pairing in the group of transcripts where UAG 3'ss were more efficient than their CAG 3'ss versions
RNA secondary structure has the potential to alter the hierarchy in intrinsic efficiency of human 3'ss from canonical CAG > UAG (> AAG > GAG) to non-canonical UAG > CAG (> AAG > GAG) (3'ss in parentheses have not been tested in this work)
The same C > T or T > C mutations at position -3 of 3'ss can have distinct phenotypic outcomes in different sequence and structural contexts
rather than secondary structure constraints could switch CAG versus UAG 3'ss preferences in splicing efficiency
Establishing a larger group of laggard CAG 3’ss and their local folding patterns should help define molecular interactions at this position and 3'ss responses to dynamic secondary structure formation across intron-exon junctions
the more abundant and generally more splice-proficient CAG 3'ss may turn into “laggards” and skip the downstream exon more than their intrinsically weaker UAG 3’ss counterparts
This work identifies a collection of 3'ss that provide a starting point for exploring structural requirements for their usage in much greater detail, which should facilitate our understanding of structural interactions that involve position -3
These results also suggest that prediction of splicing and clinical outcomes of DNA mutations and polymorphisms in mammalian genes may never be 100% accurate without considering RNA structure of primary transcripts, particularly across traditional and auxiliary splicing motifs
The data generated or analyzed during this study can be found within this article and its supplementary file
Generation of data used in this work was funded by inventor royalties (to IV) from a licensing agreement unrelated to this work (US patents 9,714,422 and 10,196,639) personally contributed to the University of Southampton and administered as a research grant by the same institution
Funding for open access charge was provided by the University of Southampton
IV conceived the scientific question addressed in this work
analyzed and interpreted the data and wrote the manuscript
The author declares no competing interests
DOI: https://doi.org/10.1038/s10038-024-01308-8
A new study combines massively parallel assays, transcriptomics and biophysical modeling to provide a framework for analyzing the effects of compounds that modulate pre-mRNA splicing
The results lend important insights into the mechanisms of drug action and facilitate the design of splicing therapies
The Barcelona Institute of Science and Technology
Jorge Herrero-Vicente & Juan Valcárcel
Institució Catalana de Recerca i Estudis Avançats (ICREA)
is a member of the scientific advisory boards of Remix Therapeutics
DOI: https://doi.org/10.1038/s41589-024-01678-2
Splice announced it has moved into the plugin space with the acquisition of Spitfire Audio, a prominent UK-based maker of virtual instrument libraries
Spitfire Audio has developed a reputation for its virtual instrumentation among composers
The company’s virtual instrument libraries have been used in recordings by high-profile creatives and organizations such as Hans Zimmer
“The teams at Spitfire Audio and Splice have deep respect for composers, musicians and producers and are committed to celebrating and supporting their work,” said Kakul Srivastava
Our shared vision is to develop tools that expand—not replace—human creativity.”
“We’ve always focused on inspiring people to create extraordinary music,” said Paul Thomson
The financial terms of the acquisition were not disclosed, but the Financial Times, citing a person with knowledge of the deal, reported that the transaction closed at about $50 million
Volume 17 - 2024 | https://doi.org/10.3389/fnmol.2024.1412964
This article is part of the Research Topic "Come as You R(NA): Post-transcriptional Regulation Will Do the Rest"
Pediatric neurological disorders are frequently devastating and present unmet needs for effective medicine
The successful treatment of spinal muscular atrophy with splice-switching antisense oligonucleotides (SSO) indicates a feasible path to targeting neurological disorders by redirecting pre-mRNA splicing
One direct outcome is the development of SSOs to treat haploinsufficient disorders by targeting naturally occurring non-productive splice isoforms
The development of personalized SSO treatment further inspired the therapeutic exploration of rare diseases
This review will discuss the recent advances that utilize SSOs to treat pediatric neurological disorders
ASO gapmers have been recently approved by the FDA to treat SOD1 ALS
This review focuses on the progress of SSOs in targeting pediatric neurological conditions
The natural occurrence of alternative splicing and the identification of splicing enhancers/suppressors indicate that re-directing splicing holds its own dimension for gene regulation and therapeutic intervention
About 10% of exonic human mutations are estimated to cause diseases by disrupting pre-mRNA splicing (Soemedi et al., 2017). While whole-exome sequencing detects exonic and splice site mutations for genetically defined disorders, integrating transcriptome and whole-genome analysis uncovers more causal intronic splicing mutations (Cummings et al., 2017; Kim et al., 2023)
These splicing mutations frequently introduce aberrant splice sites that lead to loss-of-function or hypomorphic alleles
Disease-causing splicing variants can be suppressed to treat human diseases
Redirecting splicing can also lead to beneficial effects by (1) bypassing nonessential in-frame exons that carry pathogenic mutations, (2) bypassing an additional exon to correct the reading frame, and (3) redirecting alternative splicing to promote functional isoform production
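The reading-frame logic behind strategy (2) can be sketched with simple arithmetic: the frame is preserved when the total coding length removed from the transcript is a multiple of three. The function name and the exon lengths below are hypothetical, and the check deliberately ignores other requirements such as premature stop codons or loss of essential protein domains.

```python
def restores_reading_frame(deleted_exon_lengths, skipped_exon_lengths):
    """True if the total removed coding length (nt) is a multiple of three."""
    removed = sum(deleted_exon_lengths) + sum(skipped_exon_lengths)
    return removed % 3 == 0


# Hypothetical case: a 100-nt deletion shifts the frame on its own,
# but additionally skipping a 50-nt exon removes 150 nt in total.
out_of_frame = restores_reading_frame([100], [])
rescued = restores_reading_frame([100], [50])
```

The same check applies to strategy (1): an exon can be bypassed without shifting the frame only if its own length is divisible by three.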
This review focuses on recently reported SSO strategies targeting pediatric neurological conditions and the value of genetic tools
(A) Variant-specific SSOs suppress the gain of cryptic splice sites in the introns (top) or exons (bottom)
(B) Bypassing a non-essential exon that carries pathogenic mutations (top), skipping an additional non-essential exon (orange) to correct the translational reading frame (middle), or switching for a functional mutually exclusive exon (bottom)
(C) Gene-specific SSOs treating recessive or haploinsufficient conditions by converting naturally occurring non-functional (or unstable) splice isoforms to functional isoforms
Genetic suppression of non-productive splicing, mimicking the maximal and constant effect of an SSO, can provide in vivo evidence about the neurological and organismal functions of the non-productive isoform, to what extent the protein level can be restored, and whether it can rescue phenotypes associated with loss-of-function alleles
Recessive diseases frequently involve loss-of-function alleles, and several SSO-based therapeutic strategies have been reported (Figure 1)
SSOs can promote the inclusion or exclusion of specific exons
It is straightforward to use SSOs to suppress undesired exons
SSOs can also block splicing silencers and promote exon inclusion to make functional proteins
These works suggest a promising exon-skipping strategy for CLN3 (Δex78) Batten’s disease
This work paved the path for expedited genetic diagnosis and individualized drug development
over 1,400 SCN1A mutations have been reported as pathogenic in ClinVar (a public database to aggregate genetic variants and clinical findings)
and a significant fraction of such mutations cause severe loss of function (frameshift
causal mutations for neurodevelopmental disorders have been reported in dozens to hundreds of genes
targeting such a vast number of mutated alleles using variant- or exon-specific SSOs presents a daunting task
the naturally occurring non-productive alternative splicing in disease-associated genes can be targetable switches for gene regulation
Clinical trials of the SSO in Dravet patients are ongoing and appear promising
These studies suggest that targeting the non-productive isoform can be a promising therapeutic approach
indicating the existence of a splicing enhancer for the A3SS-NMD
This study indicates that switching functionally equivalent but mutually exclusive exons can bypass deleterious effects and demonstrates the application of a human organoid-rat chimeric system
and completely blocking AS-NMD may have undesired consequences
mimicking the maximum effect of SSO treatment
can rescue or alleviate phenotypes in mouse models of human diseases
The active research and collaborative efforts in the field are drawing a promising future for SSO therapy
The author(s) declare that financial support was received for the research
XZ was supported by grants from the National Institutes of Health (DP2-GM137423 and R01-MH130594)
The author would like to thank Oriane Mauger
and Michael Kiebler for the opportunity to contribute this review; and thank Runwei Yang and other colleagues for critically reading this manuscript
The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations
Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher
Consensus guidelines for the design and in vitro preclinical efficacy testing N-of-1 exon skipping antisense oligonucleotides
Barbosa-Morais
The evolutionary landscape of alternative splicing in vertebrate species
Spliced segments at the 5′ terminus of adenovirus 2 late mRNA
Bhattacharyya
The TREAT-NMD DMD global database: analysis of more than 7,000 Duchenne muscular dystrophy mutations
Widespread intron retention in mammals functionally tunes transcriptomes
RNA-based translation activators for targeted gene upregulation
Aberrant inclusion of a poison exon causes Dravet syndrome and related SCN1A-associated genetic epilepsies
Therapeutic efficacy of antisense oligonucleotides in mouse models of CLN3 Batten disease
Protracted CLN3 Batten disease in mice that genetically model an exon-skipping therapeutic approach
splicing with antisense oligonucleotides reduces toxic amyloid-beta production
Antisense oligonucleotide therapeutic approach for Timothy syndrome
An amazing sequence arrangement at the 5′ ends of adenovirus 2 messenger RNA
Dawicki-McKenna
Mapping PTBP2 binding in human brain identifies SYNGAP1 as a target for therapeutic splice switching
Correction of prototypic ATM splicing mutations and aberrant ATM function with antisense morpholino oligonucleotides
structure and function of approved oligonucleotide therapeutics
Very mild muscular dystrophy associated with the deletion of 46% of dystrophin
NOVA-dependent regulation of cryptic NMD exons controls synaptic protein levels after seizure
Nusinersen versus sham control in infantile-onset spinal muscular atrophy
Alternative splicing: increasing diversity in the proteomic world
Single-cell isoform RNA sequencing characterizes isoforms in thousands of cerebellar cells
mutations in SYNGAP1 in autosomal nonsyndromic mental retardation
Understanding genotypes and phenotypes in epileptic encephalopathies
Peripheral SMN restoration is essential for long-term rescue of a severe spinal muscular atrophy mouse model
Targeted deubiquitination rescues distinct trafficking-deficient ion channelopathies
RNA therapeutics: beyond RNA interference and antisense oligonucleotides
Mitochondrial clearance and maturation of autophagosomes are compromised in LRRK2 G2019S familial Parkinson's disease patient fibroblasts
Unproductive splicing of SR genes associated with highly conserved and ultraconserved DNA elements
Rescue of hearing and vestibular function by antisense oligonucleotides in a mouse model of human deafness
Evidence for the widespread coupling of alternative splicing and nonsense-mediated mRNA decay in humans
Integrative functional genomic analysis of human brain development and neuropsychiatric risks
Antisense oligonucleotide modulation of non-productive alternative splicing upregulates gene expression
Developmental attenuation of neuronal apoptosis by neural-specific splicing of Bak1 microexon
Disrupted auto-regulation of the spliceosomal gene SNRPB causes cerebro-costo-mandibular syndrome
Targeted intron retention and excision for rapid gene regulation in response to neuronal activity
Towards a therapy for Angelman syndrome by targeting a long non-coding RNA
Nusinersen versus sham control in later-onset spinal muscular atrophy
Evolutionary dynamics of gene and isoform regulation in mammalian tissues
Antisense oligonucleotide-mediated correction of CFTR splicing improves chloride secretion in cystic fibrosis patient-derived bronchial epithelial cells
Evaluating human mutation databases for "treatability" using patient-customized therapy
A single nucleotide difference that alters splicing patterns distinguishes the SMA gene SMN1 from the copy gene SMN2
Alternative splicing in neurodegenerative disease and the promise of RNA therapies
Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing
Panagiotakos
Aberrant calcium channel splicing drives defects in cortical differentiation in Timothy syndrome
ASO targeting RBM3 temperature-controlled poison exon splicing prevents neurodegeneration in vivo
Targeted exon skipping of a CEP290 mutation rescues Joubert syndrome phenotypes in vitro and in a murine model
a selective survival of motor Neuron-2 (SMN2) gene splicing modifier for the treatment of spinal muscular atrophy (SMA)
Antisense oligonucleotides: the next frontier for treatment of neurological disorders
SMN gene duplication and the emergence of the SMN2 gene occurred in distinct hominids: SMN2 is unique to Homo sapiens
Satterstrom
A single ataxia telangiectasia gene with a product similar to PI-3 kinase
The novel neuronal ceroid lipofuscinosis gene MFSD8 encodes a putative lysosomal transporter
RNA-targeting splicing modifiers: drug development and screening assays
Splicing defects in the ataxia-telangiectasia gene
ATM: underlying mutations and consequences
RNA isoform screens uncover the essentiality and tumor-suppressor activity of ultraconserved poison exons
Evaluation of 36 patients from Turkey with neuronal ceroid lipofuscinosis: clinical
neuroradiological and histopathologic studies
Van Nostrand
A large-scale binding and functional map of human RNA-binding proteins
a PDZ domain-containing protein expressed in the inner ear sensory hair cells
Safety and efficacy of drisapersen for the treatment of Duchenne muscular dystrophy (DEMAND II): an exploratory
Alternative isoform regulation in human tissue transcriptomes
Single-cell long-read sequencing in human cerebral organoids uncovers cell-type-specific and autism-associated exons
Coordination of alternative splicing and alternative polyadenylation revealed by targeted long read sequencing
Cell-type-specific alternative splicing governs cell fate in the developing cerebral cortex
PSD-95 is post-transcriptionally repressed during early neural development by PTBP1 and PTBP2
Citation: Zhang X (2024) Splice-switching antisense oligonucleotides for pediatric neurological disorders
Received: 06 April 2024; Accepted: 12 July 2024; Published: 25 July 2024
Copyright © 2024 Zhang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY)
distribution or reproduction in other forums is permitted
provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited
in accordance with accepted academic practice
distribution or reproduction is permitted which does not comply with these terms
*Correspondence: Xiaochang Zhang
Splice SVP of Content Kenny Ochoa notes how the “phone is already a huge part of music making”
Music creation platform and sample library Splice has launched Splice Mic
a new update to Splice Mobile which allows creators to record vocals over instrumentals created within the app
Since its revamp in 2023
users have been able to use Splice Create’s AI power to generate arrangements – or Stacks – using loops and sounds from the Splice library
Splice Mic enables producers to record vocals or other instruments over the top of those Stacks
using the microphone built into their smartphone
noting how the “phone is already a huge part of music making”
“About one million users have made more than 28 million stacks so far
and now songwriters and producers can record vocal ideas over stacks of samples,” he says
And now those stacks can be merged with vocals.”
Splice has teamed up with songwriter and DJ Leland – who has worked with the likes of Troye Sivan
Ariana Grande and Charli XCX – and LA’s Laurelvale Studios
inviting teams of songwriters to create Stacks with Create on Splice Mobile
Songwriters invited to participate in the project – dubbed 60 Second Stack – include Madison Love (Lady Gaga
“We got the team together to see who could start the best new Stacks,” says Leland
Designed to make on-the-go collaboration easier than ever
Splice Mobile allows users to share ideas directly within the app
“Musicians are already using voice recording functions on their phones to capture ideas away from the studio,” Splice says
giving songwriters the creative depth of the Splice Sounds catalogue and Create.”
Splice Mic is just the latest push by Splice to make songwriting and creative workflows more seamless. Last year, PreSonus Studio One became the first DAW to integrate Splice
bringing millions of royalty-free samples into the Studio One workflow
Splice Mic is now available in Splice Mobile. For more information, head to Splice.
Interpreting the clinical significance of putative splice-altering variants outside canonical splice sites remains difficult without time-intensive experimental studies
we introduce Parallel Splice Effect Sequencing (ParSE-seq)
a multiplexed assay to quantify variant effects on RNA splicing
We first apply this technique to study hundreds of variants in the arrhythmia-associated gene SCN5A
Variants are studied in ‘minigene’ plasmids with molecular barcodes to allow pooled variant effect quantification
including disease-relevant induced pluripotent stem cell-derived cardiomyocytes (iPSC-CMs)
The assay strongly separates known control variants from ClinVar
enabling quantitative calibration of the ParSE-seq assay
Using these evidence strengths and experimental data
we reclassify 29 of 34 variants with conflicting interpretations and 11 of 42 variants of uncertain significance
we show that many synonymous and missense variants disrupted RNA splicing
Two splice-altering variants in the assay also disrupt splicing and sodium current when introduced into iPSC-CMs by CRISPR-Cas9 editing
ParSE-seq provides high-throughput experimental data for RNA-splicing to support precision medicine efforts and can be readily adopted to study other loss-of-function genotype-phenotype relationships
functional assays of splice-altering variants have thus far not
high-throughput splicing assay would facilitate the reclassification of variants of uncertain significance (VUS) and conflicting interpretation (CI) variants in disease-associated genes
Enabling larger scale investigations into splice-altering variants in SCN5A would decrease clinical uncertainty when managing patients at risk for this potentially fatal arrhythmia syndrome
Parallel Splice Effect-sequencing (ParSE-seq)
to determine the splice-altering consequences of hundreds of intronic and exonic variants in SCN5A
We implement ParSE-seq for 244 SCN5A variants in human embryonic kidney (HEK293) cells and 224 variants in induced pluripotent stem cell-derived cardiomyocytes (iPSC-CMs)
We calibrate the assay with nearly 50 ClinVar-annotated benign and pathogenic variants and compare our experimental outcomes to the in silico tool SpliceAI
and contribute functional data to help adjudicate 29 CI variants in ClinVar
we demonstrate that some missense variants may be incorrectly described as having normal function by conventional cDNA-based patch clamping assays that cannot assess splicing outcomes
This design enabled the linkage of reads to variants even if the variant was not included in the spliced transcript
E PSI for all WT exons in iPSC-CMs and HEK cells
Data are averaged across three replicates and error bars represent the standard error of the mean
A Example lollipop diagram showing variants superimposed along construct
The y-axis represents mean ΔPSI_norm across three biological replicates
and the x-axis represents genomic position along the exonic (box) and intronic (line) segments of the synthetic insert
blue intronic variants outside the 2-bp canonical splice sites
B Lollipop diagram showing distribution of ParSE-seq investigated variants in HEK cells
An average of three experimental replicates is shown
C Lollipop diagram showing distribution of ParSE-seq investigated variants in iPSC-CMs
D Waterfall plot of mean ΔPSI_norm by variant (N = 243)
Red dashed line corresponds to −50% normalized ΔPSI
E Spearman correlation (two-sided test) of mean ΔPSI_norm between HEK and iPSC-CMs (N = 207)
Error bars refer to 95% confidence interval
F Volcano plot of normalized ΔPSI and −log10(FDR)
Each dot represents a variant studied in iPSC-CMs (N = 243)
Most variants fall within normal (blue) or abnormal (red) quadrants
but some remain indeterminate due to statistical or biological ambiguity (gray)
G Barplot of ParSE-seq variant outcomes by variant mutation type in iPSC-CMs
2-bp indicates the conserved 2-base pair canonical splice sites AG-GT
Raw data for plots available in Source Data
multiple aberrant splicing events were observed at appreciable levels
Many of these altered transcripts resulted in changes to the reading frame
some small and large in-frame insertions and deletions were also observed
there were 614 SCN5A-BrS patients harboring a variety of SCN5A variants
there were a total of 27 splice-altering variants affecting 43 patients (functionally abnormal non-consensus splice variants or consensus splice site variants)
we proactively investigated 18 unique variants harbored by 36 patients
which complemented our previous low-throughput investigations
Correlation of normalized ΔPSI for non-canonical splice site variants against aggregate SpliceAI scores (N = 140; D)
Confidence interval fit using LOESS (see “Methods” section)
P values were determined using a Pearson correlation
In addition to aggregate SpliceAI scores and ParSE-seq ΔPSI_norm comparisons across the library, we also compared specific SpliceAI molecular predictions to observed ParSE-seq splice outcomes (Supplementary Fig. 13)
SpliceAI predictions typically matched experimental data for activation of cryptic splice sites (c.3358A > T
although in the ParSE-seq assay a different cryptic site may become activated (c.4220C > G) or exon skipping may result (c.4298G > T and c.3564G > T)
Some variants led to multiple splice aberrations in the ParSE-seq experiment (e.g.
exon skipping and exon truncation/intron retention)
despite only having a single SpliceAI predicted aberrant event (c.3358A > T and c.2024-11T > A)
D Classifications of conflicting interpretation (CI) variants using functional evidence
Calibration control outcomes and ClinVar classifications in Source Data
We classified 1 CI variant as LP due to its splice-altering effect
ParSE-seq can help identify a class of missense
splice-altering variants for which cDNA-based assays of protein function yield incorrect conclusions about variant pathogenicity
This result highlights that for missense variants
ParSE-seq can be used to complement traditional cDNA-based assays of protein function
we developed a high-throughput method to assess the splicing consequences of hundreds of variants (ParSE-seq)
We implemented barcoding of a pool of minigene plasmids to enable multiplexed splicing readouts using high-throughput sequencing and applied the method to study variants in the cardiac sodium channel gene SCN5A
we quantified variant effects on splicing for 224 variants
and detected 78 variants with abnormal splicing
We observed concordance of splicing results for 45/47 B/LB and P/LP variants
and we determined that our assay could be applied at the strong level in the ACMG classification scheme (BS3 and PS3) when assuming ClinVar as ground truth
Leveraging these calibrated strengths of evidence
We determined that the in silico tool SpliceAI has high
concordance with experimentally measured splicing effects
We also demonstrated examples of missense variants that had normal electrophysiology using conventional heterologous expression cDNA-based approaches but disrupt splicing
we showed that our ParSE-seq results predict aberrant splicing in a disease-relevant iPSC-CM model
with consequences at both the RNA and protein levels
We envision that ParSE-seq will be applicable to many disease genes and will be accessible using openly available computational pipelines and democratized gene synthesis available to the community
this method may help classify large sets of variants in Mendelian disease-associated genes that act through a loss-of-function mechanism
Although we validate two ParSE-seq splice-altering variants by CRISPR editing of the iPSC-CMs
most variants were tested only in multiplexed minigene assays
While we anticipate most splice-altering variants to result in loss-of-function (NaV1.5 peak current abrogation for SCN5A variants)
there may be alternative mechanisms revealed by functional assessment of the CRISPR-edited iPSC-CM model
while ParSE-seq quantifies broad molecular impacts such as exon skipping
it is possible that some splicing abnormalities may not have a detrimental effect
on downstream protein function (protein tolerant in-frame insertion/deletions)
there are no observed indel variants >3 amino acids that are reported as B/LB
The frequency of affected variant heterozygotes is difficult to ascertain based on ClinVar data alone
as detailed patient phenotypes and case counts are not routinely reported from submitting centers
There may be examples where the ParSE-seq minigene assay does not fully capture all nuances of biology at the endogenous locus
splicing regulatory motifs in the native context may have a long-distance effect not captured in the minigene-based assay
This incomplete ascertainment may lead to discordant results with in silico predictors for a subset of variants
exons using non-canonical 2-bp splice sites
and exons that were difficult to synthesize due to high GC content or restriction enzyme incompatibility were not included in the library
Exons that undergo extensive alternative splicing in the endogenous tissue (e.g.
and 24) may have low intrinsic PSI in the minigene assay
which may limit the use of the assay for these exons
we anticipate that ParSE-seq will be a useful method for rapidly assessing variant splicing effects
Given the plethora of variants that may act through disrupting splicing
our method can be used to efficiently characterize the splicing effects of variants in disease-associated genes
The participant (male, age 30–40) from whom these cells were derived provided informed consent
As we were interested in variant-level effects
we did not consider sex/gender or genetic ancestry in the selection of this cell line
The Vanderbilt University Medical Center IRB (#9047) approved the use of the induced pluripotent stem cells used in this study
we conservatively assumed at least one affected patient per reported variant
The minigene-based assay requires an acceptor and donor splice site on each end of the test exon
and is therefore incompatible with the first or last coding exons (2 and 28)
SCN5A uses two instances of non-canonical AC/AT splice sites between exons 3 and 4
we did not study variants in these four exons or in adjacent intronic locations
we were unable to include plasmids with exon 15 due to synthesis incompatibility (high GC content) and exon 17 due to overlap of restriction enzymes used for barcoding
primers with AscI/MfeI sequences flanking the region of interest were used in a Q5 PCR reaction (NEB) following manufacturer’s protocol
The amplicon was PCR purified (Qiagen) per manufacturer’s protocol
pAG424 and the amplicon were then each digested with AscI and MfeI (NEB) at 37 °C for 1 h
followed by separation on a 1% agarose gel
and then purified following instructions from a Gel Extraction Kit (QIAGEN)
Each component was then ligated with supplies from a T4 ligation kit (NEB) for 1 h
followed by heat inactivation at 65 °C for 10 min
A 1 μL aliquot was then used to transform 50 μl competent cells (NEB)
and DNA extraction using a Spin Miniprep Kit (QIAGEN)
The plasmids were sequence verified by Genewiz before use in the ParSE-seq assay
the double stranded DNA was then phenol/chloroform extracted and digested using AscI and MfeI (NEB)
and was again purified by phenol/chloroform extraction
The pool of minigene plasmids was also digested with AscI and MfeI and cleaned by gel extraction (QIAGEN)
The digested vector pool and barcode insert were ligated using T4 ligase (NEB)
The ligation product was PCR purified (QIAGEN) and electroporated into ElectroMax DH10B cells (ThermoFisher) using a Gene Pulser Electroporator (BioRad; 2.0 kV
The resulting bacterial culture was then grown overnight
and DNA was isolated by a maxiprep (QIAGEN) to yield the barcoded plasmid library
Barcode diversity was estimated by plating dilutions of the library on LB-ampicillin plates and counting colonies
and used to generate a SMRT Bell 3.0 library (PacBio) according to the manufacturer’s instructions
The library was sequenced with PacBio Sequel II 8M SMRT Cell by Maryland Genomics
We recorded 30 h of PacBio SMRT cell sequencing
To mitigate sequencing errors in the raw PacBio data
we only analyzed Circular Consensus Sequence (CCS) reads
A total of 4,136,990 CCS reads were obtained as fastq files
The median Q score was 48 across CCS reads
The barcode identity was assigned as the most frequently aligned insert if that insert represented more than 50% of the read counts
After implementation of these quality control cutoffs
284 of the 290 targeted plasmids were successfully detected in the plasmid pool
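The barcode-to-insert assignment rule described above (keep a barcode only when its most frequent insert accounts for more than 50% of its reads) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name, the input layout of (barcode, insert) pairs, and the toy barcode and insert labels are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def assign_barcodes(observations, min_fraction=0.5):
    """Map each barcode to its dominant insert.

    observations: iterable of (barcode, insert_id) pairs, one per aligned read.
    A barcode is kept only if its most frequent insert represents more than
    `min_fraction` of that barcode's reads (the >50% rule from the text).
    """
    counts = defaultdict(Counter)
    for barcode, insert_id in observations:
        counts[barcode][insert_id] += 1

    lookup = {}
    for barcode, insert_counts in counts.items():
        insert_id, top = insert_counts.most_common(1)[0]
        if top / sum(insert_counts.values()) > min_fraction:
            lookup[barcode] = insert_id
    return lookup


# Hypothetical toy reads: barcode AAAC is dominated by insert_A (2 of 3 reads),
# while GGTT is split 1:1 and therefore fails the >50% rule.
reads = [
    ("AAAC", "insert_A"), ("AAAC", "insert_A"), ("AAAC", "insert_B"),
    ("GGTT", "insert_B"), ("GGTT", "insert_A"),
]
barcode_lookup = assign_barcodes(reads)
```

Using a strict "greater than" comparison means an exact 50:50 barcode is discarded rather than assigned arbitrarily.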
The PCR protocol included a single denaturation step of 98 °C for 30 s
touchdown with 10 cycles of 98 °C for 10 s
10 cycles of 65–55 °C (decreasing by 1 °C/cycle) for 15 s
followed by an additional 20 cycles of 98 °C 10 s
followed by a final extension of 72 °C for 5 min
PCR amplicons were purified with a PCR purification kit (QIAGEN)
Libraries were then sequenced using Illumina NovaSeq paired-end 150 base sequencing to ~50M reads/sample
A diagram of the computational pipeline is presented in Supplementary Fig. 3
Reads were filtered for correct barcode prefix and suffix sequences and were divided into separate files by barcode as described above
each barcode was required to be present in at least 25 reads in each replicate to be included
The PSI metric was calculated using grep searches for splice junctions corresponding to the WT exon splicing to the reference exons in the R1 and R2 reads:
For variants that would alter the coding sequence of the WT exon
a bespoke R1 or R2 junction was created and used for those specific variants
PSIs were then averaged across barcodes for each variant
using the barcode-variant lookup table from the assembly step described above
We used the assigned PSI as the average PSI for each variant across 3 independent transfections into HEK or iPSC-CMs
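The per-barcode PSI calculation and averaging steps can be illustrated with a short sketch. The PSI equation itself is not reproduced in this text, so the sketch assumes that a barcode's PSI is the fraction of its read pairs in which R1 and R2 both contain the expected WT junction sequence. That definition, the junction strings, the function names, and the toy reads are hypothetical; only the 25-read minimum per barcode comes from the text.

```python
from statistics import mean

MIN_READS_PER_BARCODE = 25  # cutoff stated in the text


def barcode_psi(read_pairs, r1_junction, r2_junction):
    """Assumed PSI: fraction of pairs whose R1 and R2 both carry the WT junctions."""
    included = sum(
        1 for r1, r2 in read_pairs if r1_junction in r1 and r2_junction in r2
    )
    return included / len(read_pairs)


def variant_psi(reads_by_barcode, barcode_to_variant, r1_junction, r2_junction):
    """Average barcode-level PSI values for each variant via the lookup table."""
    per_variant = {}
    for barcode, pairs in reads_by_barcode.items():
        variant = barcode_to_variant.get(barcode)
        if variant is None or len(pairs) < MIN_READS_PER_BARCODE:
            continue  # unassigned or under-sampled barcodes are skipped
        psi = barcode_psi(pairs, r1_junction, r2_junction)
        per_variant.setdefault(variant, []).append(psi)
    return {variant: mean(values) for variant, values in per_variant.items()}


UP = "ACGTAG"    # hypothetical junction expected in R1
DOWN = "TTGCAA"  # hypothetical junction expected in R2
included = ("NN" + UP + "NN", "NN" + DOWN + "NN")
skipped = ("NNACGGGGNN", "NNCCCCCCNN")
reads_by_barcode = {
    "BC1": [included] * 24 + [skipped] * 6,   # 24 of 30 pairs included
    "BC2": [included] * 16 + [skipped] * 24,  # 16 of 40 pairs included
    "BC3": [included] * 10,                   # below the 25-read cutoff
}
barcode_to_variant = {"BC1": "variant_A", "BC2": "variant_A", "BC3": "variant_B"}
psi = variant_psi(reads_by_barcode, barcode_to_variant, UP, DOWN)
```

Substring matching here plays the role of the grep searches mentioned above; a variant that changes the junction sequence would need its own junction strings, as the text notes.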
This ΔPSI_norm value represents the change in splicing of the variant compared to the level of splicing of the corresponding WT exon
The value is normalized to the level of WT splicing to determine the percent change of splicing regardless of the baseline level of WT PSI
if a WT exon had a PSI of 80% and the variant exon had a PSI of 40%
A ΔPSI_norm value of −100% indicates a complete loss of normal splicing
whereas a value of 0 indicates identical splicing to WT
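The normalization can be made concrete with a small worked example. The formula below is inferred from the description (change relative to WT, expressed as a percentage of WT PSI) rather than copied from the paper, and the function name is hypothetical.

```python
def delta_psi_norm(psi_variant, psi_wt):
    """Percent change in PSI relative to the WT exon (assumed formula).

    -100 means complete loss of normal splicing; 0 means identical to WT.
    """
    if psi_wt <= 0:
        raise ValueError("WT PSI must be positive to normalize against it")
    return 100.0 * (psi_variant - psi_wt) / psi_wt


# WT exon at 80% PSI and variant exon at 40% PSI, as in the example above.
halved = delta_psi_norm(40.0, 80.0)
complete_loss = delta_psi_norm(0.0, 80.0)
unchanged = delta_psi_norm(80.0, 80.0)
```

Under this reading, a drop from 80% to 40% PSI is a 50% reduction relative to WT, even though the absolute difference is 40 percentage points.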
Comparison of WT and variant PSI (using mean of three replicate samples) used a two-sided t-test implemented in R
FDR were calculated using the R command p.adjust
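The text names the R command p.adjust but not the adjustment method. The sketch below assumes the Benjamini-Hochberg procedure, which is what "FDR" usually refers to; that choice and the example p-values are assumptions, not details taken from the paper.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted values
    # stay monotone with respect to the rank of the raw p-values.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values from four WT-versus-variant t-tests.
fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```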
To also account for statistical significance
variants with FDR < 0.1 and ΔPSI_norm < −50% were considered splice-altering
Variants with FDR > 0.1 and ΔPSI_norm ≥ −20% were considered non-splice altering
All other variants were labeled indeterminate in the assay
We excluded variants from our analysis if the standard error of the PSI among the three replicates was >0.15
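The decision rules above translate directly into a small classifier. The thresholds are the ones stated in the text; the function name, argument names, and label strings are hypothetical, and ΔPSI_norm is assumed to be given in percent with the PSI standard error on a 0 to 1 scale.

```python
def classify_variant(delta_psi_norm, fdr, psi_standard_error):
    """Apply the assay cutoffs described in the text to one variant."""
    if psi_standard_error > 0.15:
        return "excluded"  # replicates too variable to interpret
    if fdr < 0.1 and delta_psi_norm < -50.0:
        return "splice-altering"
    if fdr > 0.1 and delta_psi_norm >= -20.0:
        return "non-splice-altering"
    return "indeterminate"


# Hypothetical variants covering each outcome.
calls = [
    classify_variant(-80.0, 0.01, 0.02),  # large, significant drop
    classify_variant(-5.0, 0.60, 0.03),   # small, non-significant change
    classify_variant(-35.0, 0.01, 0.03),  # significant but moderate drop
    classify_variant(-80.0, 0.01, 0.20),  # noisy replicates
]
```

The third case shows why an indeterminate category exists: a statistically significant change between −20% and −50% meets neither the splice-altering nor the non-splice-altering definition.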
and barplots were plotted in R using ggplot2 (see GitHub for code)
We used locally estimated scatterplot smoothing (LOESS) as a non-parametric regression model for comparing in silico splicing predictors with experimental ParSE-seq data
LOESS was selected as a smoothing method due to the largely bimodal distribution of our experimental data
A 95% confidence interval is displayed alongside the line of fit
LOESS was implemented in ggplot2 using default settings, with the ΔPSI_norm of each variant plotted as a function of aggregate SpliceAI predictions
Full code describing this analysis is available on GitHub
we calculate the likelihood ratio of pathogenicity (termed OddsPath) for both splice-altering (pathogenic) and non-splice-altering (benign) assay results
we removed all variants with indeterminate scores
we calculated benign and pathogenic OddsPath values using the equations:
The benign and pathogenic posterior P2 was then calculated:
Following the approach recommended by Brnich et al.24
if P2Pathogenic = 1 it was conservatively estimated to have one additional discordant variant (a functionally abnormal benign variant) by the following equation:
an additional discordant variant was included (a functionally normal pathogenic variant):
Each posterior was combined with the prior to derive an OddsPath and assign evidence for PS3 and BS3 criteria
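The equations are not reproduced in this text, so the sketch below follows the general OddsPath form recommended by Brnich et al., OddsPath = [P2 × (1 − P1)] / [(1 − P2) × P1], where P1 is the proportion of pathogenic variants among all controls and P2 is the proportion of pathogenic variants among controls with a given assay result. The function names, the choice to compute P1 from observed controls only, and the control counts are assumptions made for illustration and are not the counts from this study.

```python
def odds_path(p1, p2):
    """OddsPath from prior P1 and posterior P2 (general Brnich et al. form)."""
    return (p2 * (1.0 - p1)) / ((1.0 - p2) * p1)


def calibrate(n_path_abnormal, n_benign_abnormal, n_path_normal, n_benign_normal):
    """Return (OddsPath_pathogenic, OddsPath_benign) from control counts.

    If a result class has no discordant control, one discordant variant is
    added so that P2 is never exactly 1 or 0, as described in the text.
    """
    n_path = n_path_abnormal + n_path_normal
    n_benign = n_benign_abnormal + n_benign_normal
    p1 = n_path / (n_path + n_benign)  # assumption: prior from observed controls

    benign_abnormal = n_benign_abnormal if n_benign_abnormal > 0 else 1
    path_normal = n_path_normal if n_path_normal > 0 else 1

    p2_pathogenic = n_path_abnormal / (n_path_abnormal + benign_abnormal)
    p2_benign = path_normal / (path_normal + n_benign_normal)
    return odds_path(p1, p2_pathogenic), odds_path(p1, p2_benign)


# Hypothetical, perfectly concordant control set: 20 pathogenic controls all
# abnormal in the assay, 25 benign controls all normal in the assay.
odds_pathogenic, odds_benign = calibrate(20, 0, 0, 25)
```

With these hypothetical counts the results are 25 and 0.05. In the Brnich et al. recommendations, an OddsPath above 18.7 corresponds to PS3 at the strong level and one below 0.053 to BS3 at the strong level, so this toy control set would support strong evidence in both directions; the thresholds should be checked against that publication before use.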
In the manual iPSC-CM patch clamp experiments
the series resistance (Rs) was monitored using Seal Test (Clampex 10.9 software) to achieve a range of 5–10 MΩ
Current–voltage curves were generated by repeated voltage changes to the same cells
Two trials were performed for each cell line and data were then averaged across all measured cells
Two complementary oligonucleotides were phosphorylated and annealed using T4 PNK (NEB) per protocol
followed by simultaneous digestion and ligation of pX458 with BbsI-HF (NEB) and T4 ligase (NEB) per manufacturer’s protocol
The sample was transformed into competent E
and then colony expansion and miniprep (Qiagen) as described above
The cloned guide plasmid and a 151-nucleotide repair template bearing the desired change and PAM site variant were co-electroporated into dissociated iPSCs using the Neon Transfection System (ThermoFisher MPK5000)
and sorted for GFP+ cells using a BD Fortessa 5-laser instrument
DNA extracted using QuickExtract (Lucigen)
PCR amplified using primers mo198 and mo199
and Sanger sequenced to identify a colony with a heterozygous edit
During manual patch clamp experiments on iPSC-CMs
membrane resistance (Rm) was monitored throughout using the Membrane Test (Clampex 10.9)
We first optimized the electrode capacitance compensation on the amplifier
performed following giga-seal formation and before achievement of the whole-cell configuration
the capacitive transients were completely and well-compensated by ~80% when whole-cell capacitance compensation was enabled
we used Seal Test (Clampex 10.9) as an oscilloscope window for monitoring the current signal to achieve a reading close to 10 MΩ
Optimization of the capacitance compensation was an extremely important step for accurate Cm and Rm measurements in Membrane Test
To achieve high quality giga-seal formation before cell membrane break-in
we chose cells with giga-seal of 1–2 GΩ for the experiments
Whole-cell voltage-clamp experiments in iPSC-CMs were conducted at room temperature (22–23 °C)
Glass microelectrodes were heat polished to tip resistances of 0.5–1 MΩ
Data acquisition was carried out using MultiClamp 700B patch clamp amplifier and pCLAMP 10 software suite (Molecular Device Corp.
four-pole Bessel filter) and digitized at a frequency of 2–20 kHz by using an analog-to-digital interface (DigiData 1550B
capacitance and series resistance were corrected ~80%
Voltage-clamp protocols used are shown on the figures
Electrophysiological data were analyzed using Clampfit 10 software and the figures were prepared by using Graphing & Analysis software OriginPro 8.5.1 (OriginLab Corp.
To provide better voltage control of sodium current
the external solution was K+-free and Ca2+-free with a lower sodium concentration (50 mmol/L)
The pipette (intracellular) solution had (in mM) NaF 5
To eliminate the overlapped L-/T-type inward calcium currents and outward potassium currents
and 200 µM 4-aminopyridine) were added into the cell bath solution
cells were held at −100 mV and current was elicited with a 50-ms pulse from −100 to +40 mV in 10 mV increments
Current densities were expressed in the unit of pA/pF after normalization to cell size (pF)
generated from the cell capacitance calculated by the function of Membrane Test (OUT 0) in pCLAMP 10 software
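As a small illustration of the voltage protocol and the pA/pF normalization, the sketch below lists the step potentials and divides a peak current by cell capacitance. The function names and the peak current value are hypothetical; the step range, the increment, and the 56.9 pF capacitance are taken from the text.

```python
def step_voltages(start_mv=-100, stop_mv=40, increment_mv=10):
    """Test potentials for the step protocol, inclusive of both ends (mV)."""
    return list(range(start_mv, stop_mv + 1, increment_mv))


def current_density(peak_current_pa, capacitance_pf):
    """Normalize a peak current (pA) to cell size (pF), giving pA/pF."""
    if capacitance_pf <= 0:
        raise ValueError("capacitance must be positive")
    return peak_current_pa / capacitance_pf


voltages = step_voltages()
# Hypothetical peak inward current of -5690 pA in a cell with the reported
# mean WT capacitance of 56.9 pF.
density = current_density(-5690.0, 56.9)
```

Normalizing to capacitance lets cells of different sizes be compared on the same current-voltage curve.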
The average capacitances of the iPSC-CMs were 56.9 ± 4.8 pF (WT)
The average membrane resistances of the iPSC-CMs were 1.54 ± 0.09 GΩ (WT)
and 1.55 ± 0.1 GΩ (c.4220G > C variant)
These parameters were not statistically significantly different from each other (p > 0.05)
We did not measure membrane potentials during the sodium current (INa) measurements under modified experimental conditions (see below)
However, differentiated iPSC-CMs from the same population control iPSC line studied with physiological intra- and extracellular solutions had potentials ranging from −75 to −90 mV
with an average potential of −82.2 ± 1.2 mV
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article
Strategic vision for improving human health at The Forefront of Genomics
Spectrum of splicing variants in disease genes and the ability of RNA analysis to reduce uncertainty in clinical interpretation
Cummings, B. B. et al. Improving genetic diagnosis in Mendelian disease with transcriptome sequencing. Sci. Transl. Med. 9 https://doi.org/10.1126/scitranslmed.aal5209 (2017)
Genetic diagnosis of Mendelian disorders via RNA sequencing
Standardized practices for RNA diagnostics using clinically accessible specimens reclassifies 75% of putative splicing variants
Use of splicing reporter minigene assay to evaluate the effect on splicing of unclassified genetic variants
Valenzuela-Palomo, A. et al. Splicing predictions, minigene analyses, and ACMG-AMP clinical classification of 42 germline PALB2 splice-site variants. J. Pathol. https://doi.org/10.1002/path.5839 (2021)
Functional classification of DNA variants by hybrid minigenes: identification of 30 spliceogenic variants of BRCA2 exons 17 and 18
Intronic CRISPR repair in a preclinical model of Noonan syndrome-associated cardiomyopathy
O’Neill, M. J. et al. Functional assays reclassify suspected splice-altering variants of uncertain significance in Mendelian channelopathies. Circ. Genom. Precis. Med. https://doi.org/10.1161/circgen.122.003782 (2022)
Tobert, K. E. et al. Genome sequencing in a genetically elusive multi-generational long QT syndrome pedigree identifies a novel LQT2-causative deeply intronic KCNH2 variant. Heart Rhythm https://doi.org/10.1016/j.hrthm.2022.02.004 (2022)
High-throughput splicing assays identify missense and silent splice-disruptive POU1F1 variants underlying pituitary hormone deficiency
High-throughput mutagenesis identifies mutations and RNA-binding proteins controlling CD19 splicing and CART-19 therapy resistance
A multiplexed assay for exon recognition reveals that an unappreciated fraction of rare genetic variants cause large-effect splicing disruptions
Vex-seq: high-throughput identification of the impact of genetic variation on pre-mRNA splicing efficiency
Contribution of noncanonical splice variants to TTN truncating variant cardiomyopathy
Identification of pathogenic gene mutations in LMNA and MYBPC3 that alter RNA splicing
Recommendations for application of the functional evidence PS3/BS3 criterion using the ACMG/AMP sequence variant interpretation framework
Closing the gap: systematic integration of multiplexed functional data resolves variants of uncertain significance in BRCA1
A calibrated functional patch-clamp assay to enhance clinical variant interpretation in KCNH2-related long QT syndrome
Saturation-scale functional evidence supports clinical variant interpretation in Lynch syndrome
Genome-wide association analyses identify new Brugada syndrome risk loci and highlight a new mechanism of sodium channel regulation in disease susceptibility
Reappraisal of reported genes for sudden arrhythmic death: evidence-based evaluation of gene validity for Brugada syndrome
Arrhythmic phenotypes are a defining feature of dilated cardiomyopathy-associated SCN5A variants: a systematic review
An international compendium of mutations in the SCN5A-encoded cardiac sodium channel in patients referred for Brugada syndrome genetic testing
These authors jointly supervised this work: Dan M
Vanderbilt Center for Arrhythmia Research and Therapeutics (VanCART)
developed the ParSE-seq experimental methodology and completed all computational analysis
provided experimental support for manual patch clamp of iPSC-CMs
assisted in iPSC-CM CRISPR editing and cell culture
All authors reviewed and edited the manuscript
All remaining authors declare no competing interests
DOI: https://doi.org/10.1038/s41467-024-52474-4
Splice, a royalty-free sample and loop marketplace and music creation platform,
has appointed music industry veteran Kenny Ochoa as Senior Vice President of Content
Ochoa will oversee Splice’s global content team
Ochoa’s remit will also include responsibility for Splice’s content pipeline
which according to Splice will include “building out robust Artist partnerships and driving the platform’s overall industry outreach”
Kenny Ochoa joins Splice from Snap where he served as Head of Music Curation & Licensing (responsible for the platform’s music programming and curation in the US
so it was vital that we found someone with the relationships
taste and passion to help us build and prepare for the future of music creation,” said Splice CEO
Added Srivastava: “Kenny is what we were looking for, someone with deep experience of both the tech and music industries, who is also empathetic to the creator community.”
Ochoa said: “I’m excited about where music might go in the framework of a creator first platform
Splice is already one of the most talked about
“Their success is fueled by their commitment to their catalog and the creator community
I am thrilled to be working with Kakul and the amazing team she’s assembled to build tools that will serve the next generation of music creators around the world.”
In 2021, Splice was valued at nearly USD $500 million after securing $55 million in funding, according to Bloomberg
In March, Splice launched a new mobile app experience
powered by AI technology
Music Business Worldwide
Huntington’s disease (HD) is caused by a CAG repeat expansion in the HTT gene
the mechanisms leading to disrupted RNA processing in HD remain unclear
Here we identify TDP-43 and the N6-methyladenosine (m6A) writer protein METTL3 to be upstream regulators of exon skipping in multiple HD systems
Disrupted nuclear localization of TDP-43 and cytoplasmic accumulation of phosphorylated TDP-43 occur in HD mouse and human brains
with TDP-43 also co-localizing with HTT nuclear aggregate-like bodies distinct from mutant HTT inclusions
The binding of TDP-43 onto RNAs encoding HD-associated differentially expressed and aberrantly spliced genes is decreased
m6A RNA modification is reduced on RNAs abnormally expressed in the striatum of HD R6/2 mouse brain
including at clustered sites adjacent to TDP-43 binding sites
Our evidence supports TDP-43 loss of function coupled with altered m6A modification as a mechanism underlying alternative splicing in HD
no disease-modifying treatment is available
Although abnormal interactions between HTT and RBPs were reported
suggesting disruption of RNA processing in HD
the mechanisms by which mHTT leads to alterations of RNA expression and splicing—a hallmark of HD and other neuropathological disorders—remain undetermined
It is unknown whether HD-associated expression and splicing alterations are related to TDP-43 disruption in HD
Novel processing alterations were confirmed by long-read RNA-seq
We identified primary sequence motifs—RNA binding sites associated with mHTT-dependent changes in alternative splicing (AS) using primary sequence models—that implicated two HTT-interacting RBPs: TDP-43 and methyltransferase 3 (METTL3)
Using molecular and neuropathological measures and TDP-43 enhanced crosslinking and immunoprecipitation sequencing (eCLIP-seq) and m6A eCLIP-seq
we determined that mHTT disrupts TDP-43 and METTL3 function in post-transcriptional processing of their RNA targets in HD
We further show a nuclear aggregate-like structure in the brains of patients with HD that contains TDP-43 and HTT
This study provides evidence for functional disruption of TDP-43 in HD and an association with abnormal m6A RNA modification in HD
This work also suggests that TDP-43 dysregulation may be an important component of pathogenesis in a broader group of diseases than previously thought
oligodendrocyte progenitor cells; premyelin cells; and vascular cells
Pie chart showing the detection of significant excluded exons in PacBio Iso-seq long-read sequencing
Canonical binding motif for TDP-43 (UG rich) and METTL3 (DRACH)
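To make the DRACH consensus concrete, the following minimal sketch scans an RNA string for the motif using the standard IUPAC codes (D = A/G/U, R = A/G, H = A/C/U). The function name `find_drach_sites` is hypothetical and is not part of the study's analysis pipeline; it only illustrates what a DRACH match looks like.

```python
import re

# IUPAC expansion of DRACH: D = A/G/U, R = A/G, A, C, H = A/C/U.
# The lookahead wrapper lets overlapping motif instances be reported.
DRACH = re.compile(r"(?=([AGU][AG]AC[ACU]))")


def find_drach_sites(rna):
    """Return (0-based start, 5-mer) for every DRACH match in the sequence."""
    seq = rna.upper().replace("T", "U")  # accept DNA-style input as well
    return [(m.start(), m.group(1)) for m in DRACH.finditer(seq)]


if __name__ == "__main__":
    for start, motif in find_drach_sites("GGACUAAACA"):
        print(start, motif)
```

The central A of each reported 5-mer (offset 2 from the start) is the candidate methylated adenosine.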
Our data suggest that TDP-43 contributes to transcriptional dysregulation in the HD R6/2
which may involve an exciting intersection between TDP-43 and m6A in HD that has not been previously described
we investigated how TDP-43 and the m6A RNA modification might contribute to altered splicing and HD pathology
Gradient scale represents z-scores of normalized gene counts
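For readers unfamiliar with heatmap scaling, a per-gene z-score can be sketched as follows. This is an illustration only: the helper name `row_zscores` is hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, because the legend does not say which convention was used.

```python
from statistics import mean, stdev


def row_zscores(counts):
    """Z-score one gene's normalized counts across samples: (x - mean) / SD."""
    mu = mean(counts)
    sd = stdev(counts)  # sample SD (n - 1); an assumption, not stated in the source
    if sd == 0:
        # A gene with identical counts in every sample has no variation to scale.
        return [0.0 for _ in counts]
    return [(x - mu) / sd for x in counts]


if __name__ == "__main__":
    print(row_zscores([10, 20, 30]))
```

Scaling each gene separately puts high- and low-abundance genes on a shared color gradient, so the heatmap shows relative change across samples and not absolute expression.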
Reproducible R6/2 HD and NT IDR TDP-43 peaks were centered and plotted on all TDP-43 binding sites
TDP-43 mRNA level by qPCR normalized to cyclophilin as a percentage of PBS control
Heatmap showing clustering of 3mos HD R6/2
TDP-43 ASO treated and control PBS treated on TDP-43 KD-dependent DGE changes
Schematic of iPSC differentiation into MSNs with TDP-43 KD by siRNA
Left: western blot for TDP-43 protein levels after treatment of MSNs with TDP-43 siRNA
n = 3 differentiation replicates per condition
Right: bar graph plots TDP-43 intensity normalized to Revert total protein stain
Statistical significance was determined by two-way ANOVA with Sidak’s multiple comparisons test (18Q: P < 0.0001
Venn diagram showing the overlap of DEGs between HTT-18Q MSNs scramble control versus TDP-43 siRNA and HTT-18Q MSNs versus mHTT-50Q
Example of key gene expression changes anticipated from TDP-43 KD
Statistical significance was determined by unpaired two-tailed t-test
(TDP-43 KD versus Ctrl STMN2: P < 0.0001
95% CI: −768.6 to −611.6; 18Q versus 50Q STMN2: P = 0.0001
95% CI: −496.4 to −336.7; TDP-43 KD versus Ctrl UNC13B: P = 0.0076
95% CI: −64.21 to −18.29; 18Q versus 50Q UNC13B: P = 0.0132
95% CI: −36.78 to −7.682; TDP-43 KD versus Ctrl CAMK2B: P = 0.0006
95% CI: −47.26 to −26.79; 18Q versus 50Q CAMK2B: P = 0.0287
our analysis shows that TDP-43 KD drives similar gene expression pattern changes as mHTT in HD MSNs
Representative IF staining images of SFG from patients with HD compared to non-HD control individuals showing decreased TDP-43 (yellow) signal intensity
Left: quantification of decreased nuclear TDP-43 signal intensity; five representative images were taken at ×40 from five HD and two control individuals
A CellProfiler pipeline was created to identify larger nuclei (enriched for neurons) by DAPI staining
The average TDP-43 nuclear signal was obtained by measuring the signal intensity within a mask defined by DAPI
Each cell’s mean nuclear TDP-43 intensity is plotted
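The masked measurement described here reduces to averaging the TDP-43 channel over the pixels that fall inside a DAPI-defined nucleus. The sketch below shows that arithmetic on plain nested lists; it is not the CellProfiler pipeline itself, and the function name and toy image are hypothetical.

```python
def mean_nuclear_intensity(signal, nuclear_mask):
    """Average the signal over pixels where the nuclear mask is True."""
    total = 0.0
    n_pixels = 0
    for signal_row, mask_row in zip(signal, nuclear_mask):
        for value, inside in zip(signal_row, mask_row):
            if inside:
                total += value
                n_pixels += 1
    if n_pixels == 0:
        raise ValueError("nuclear mask contains no pixels")
    return total / n_pixels


if __name__ == "__main__":
    # Toy 2 x 2 image: only the two pixels on the diagonal lie inside the nucleus.
    tdp43 = [[10, 20], [30, 40]]
    dapi_mask = [[True, False], [False, True]]
    print(mean_nuclear_intensity(tdp43, dapi_mask))
```

Applied once per segmented nucleus, this yields one value per cell, which is the quantity plotted in the panel.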
One-way ANOVA was performed with multiple comparisons and resulted in significant changes between all HD versus control comparisons (data not shown)
Numbers on top of each group indicate the number of cells plotted
Right: dot plot showing grouped data by genotype
Statistical significance was derived from unpaired two-tailed t-test between control versus HD (P < 0.0001
Representative IF staining images of the motor cortex from a patient with ALS (positive control) compared to patients with HD
using antibodies against total TDP-43 (yellow)
phosphorylated TDP-43 (purple) and nuclear stain DAPI
White arrowhead indicates pTDP-43 cytoplasmic aggregation; red arrowhead (HD1) indicates cytoplasmic aggregate verified by orthogonal view
IF images showing pTDP-43 AL bodies (yellow) within MAP2-positive neurons (white)
Each bar is derived from five random ×20 images from each patient
%AL bodies is the number of AL bodies per patient normalized to the total number of Map2-positive neurons
Statistical significance was determined by unpaired two-tailed t-test (number of neurons: P = 0.3655
95% CI: −48.51 to 20.01; AL bodies (%): P = 0.0146
Experiments in a–e were repeated at least three times with similar results represented above
This striking accumulation of nuclear pTDP-43 and HTT into distinct spherical AL bodies in MAP2-positive neurons from HD patient brains represents a type of TDP-43 pathology not previously described that may be unique to HD
b and h were repeated at least three times with similar results represented above
These human data are consistent with the dysregulation of m6A modification in the R6/2 mice
These results support that (1) there is a connection between m6A RNA modification and TDP-43 and (2) TDP-43 dysfunction occurs before m6A dysregulation
we generated and integrated multi-omics data
to investigate mechanisms involved in aberrant splicing
We demonstrated that the RBP TDP-43 and the m6A writer METTL3 have altered protein subcellular localization and protein expression
These alterations accompanied a corresponding enrichment in HD-specific AS and decreased interaction with dysregulated RNAs defining the striatal HD signature
IF imaging in HD mice and HD patient brain tissue revealed co-localization of TDP-43 with mutant HTT in nuclear inclusions
decreased nuclear TDP-43 and a corresponding increase in aggregated phosphorylated TDP-43 in the cytoplasm
We also found an accumulation of spherical
fibrous-like pTDP-43 in the nucleus of Map2-positive neurons that co-localize with HTT
Our analysis of RNA-seq data from both mouse and human samples revealed changes in AS with increased exon exclusion events in HD
and toxicity is modulated through genetic perturbation of m6A machinery
Our finding that TDP-43 binding corresponds with m6A deposition on downregulated striatal genes in HD suggests a co-regulatory role for m6A modification with TDP-43 in HD
We also observed that the presence of aberrant
previously unannotated exon splicing corresponds to hallmark HD genes that are primarily downregulated; however
both an increase and a decrease in novel unannotated exon expression were identified to result in DGE
We propose a mechanism in which the interaction of HTT with multiple RBPs regulates CE splicing
Further studies are required to identify additional CE-regulating RBPs
Our TDP-43 eCLIP-seq detected decreased TDP-43 binding to RNAs
making it plausible that these AL bodies may be similar to anisosomes and contain RNA-free TDP-43; this can be addressed in future studies
we identified three TDP-43 aggregation phenotypes in HD
one of which has not previously been observed
which presents the possibility that our molecular readouts result from a combinatorial effect of known TDP-43-dependent regulation and regulation not previously described
Our future efforts will be aimed at elucidating the role of AL bodies in HD progression and neurodegeneration
No statistical methods were used to pre-determine sample sizes; however
sample size selections were made to be similar to previously published studies
and all experiments were repeated at least three times with similar results as represented in this study
Data were assumed to be normally distributed
all treatment groups were randomly selected
and analyses were performed blinded to treatment/disease status
Data were excluded only for animals that did not survive to endpoint
Differential expression statistics were performed within cited packages below
Statistical testing was performed in GraphPad Prism 10 (GraphPad Software)
Human brain samples were obtained in collaboration with the Netherlands Brain Bank (NBB), the Netherlands Institute for Neuroscience, Amsterdam (open access: https://www.brainbank.nl/) and the Neurological Foundation of New Zealand Human Brain Bank (NZBB). Information on patients can be found in Supplementary Data 9
All patient identifiers have been removed from the materials
All material was collected from donors for whom or from whom a written informed consent for a brain autopsy and the use of the material and clinical information for research purposes had been obtained by the NBB and the NZBB
approved by the Health and Disability Ethics Committee (ethics no.: 14/NTA/208/AM02)
Animal experiments were carried out in accordance with the National Institutes of Health Guide for the Care and Use of Laboratory Animals and an approved animal research protocol by the Institutional Animal Care and Use Committee (IACUC
AUP-21-087) at the University of California
an institution accredited by the Association for Assessment and Accreditation of Laboratory Animal Care
all procedures were conducted in accordance with the guidelines of the University of California
iPSC work carried out in this study was approved by the UCI Human Stem Cell Research Oversight Committee (UCI hSCRO no
118) and the UCI Institutional Review Board (UCI IRB no
000664)) in this study were obtained from The Jackson Laboratory at approximately 5 weeks of age
The sex of animals was balanced and age matched
All mice were housed on a 12-h light/dark schedule with ad libitum access to food and water
Animals were housed at controlled temperature and humidity: 70 ± 2 °F and 50 ± 5% humidity (RASL-seq: (68–79 °F) and humidity (30–70%))
Animals were aged and then euthanized with Euthasol overdose (pentobarbital sodium and phenytoin sodium)
Cardiac perfusion was performed with 0.01 M PBS
followed by brain harvesting and isolation of striatum and cortex from the left hemisphere that was flash frozen in liquid nitrogen and stored at −80 °C until use for biochemical analysis
The other halves were post-fixed in 4% paraformaldehyde
cryoprotected in 30% sucrose and cut at 30 μm on a sliding vibratome for immunohistochemistry
frozen tissues were lysed (lysis buffer: 50 mM Tris-HCl pH 7.4
1:200 Protease Inhibitor Cocktail III (add fresh)
samples were homogenized by douncing in lysis buffer followed by incubation on ice for 30 min
Lysate was then sonicated 3× for 10 s at 40% amplitude
Protein quantification was performed by Lowry protein assay with linear range dilution
C57Bl/6 males were dosed at 5 weeks of age with PBS control (n = 5) or ASO (Ionis Pharmaceuticals) targeting TDP-43 (n = 5) at 500 µg by ICV bolus injection
cortex and striatum were collected for downstream analysis
iPSC colonies switched to neural induction medium (Advanced DMEM/F12 (1:1) supplemented with 2 mM GlutaMAX
cells were passaged 1:2 with Accutase and replated on Matrigel
Cells were passaged again 1:2 at day 8 with Accutase and replated on Matrigel in a different medium (Advanced DMEM/F12 (1:1) supplemented with 2 mM GlutaMAX
cells were dissociated with Accutase and plated at a density of 111,000 per cm2 on NUNC-treated tissue culture plastic treated with poly-d-lysine and Matrigel in SCM1 medium (Advanced DMEM/F12 (1:1) supplemented with 2 mM GlutaMAX
Full medium change to SCM2 medium at day 23 (Advanced DMEM/F12 (1:1): Neurobasal A (50:50) supplemented with 2 mM GlutaMAX
50% medium change every 2–3 d until day 37
Human TDP-43 and non-targeting control siRNA were obtained from Horizon Discovery (Accell SMARTPool
E-012394-00-0050); cells were treated with siRNA on day 23 and harvested at day 37
the optimal protein concentration and primary antibody concentration were determined by linear range according to LI-COR’s protocol
1 mg of protein was used for each IP—1:1,000 dilution of primary antibody and 30 µl of Dynabeads sheep anti-rabbit or goat anti-mouse
IP was carried out by incubation for 1 h at room temperature
and then a wash was performed with 3× high-salt wash buffer (50 mM Tris-HCl pH 7.4
followed by 2× low-salt wash buffer (20 mM Tris-HCl pH 7.4
Elution was achieved by 10-min incubation at 80 °C in 1× LDS and 1 mM DTT
Co-IP and regular western blot samples were run on 4–12% Bis-Tris gels and 3–8% Tris-acetate
Specific protein band analysis was performed using LI-COR Empira Studio software with normalization to Revert total protein stain
A list of primary and secondary antibodies with dilutions used in this study can be found in Supplementary Data 10
coronal sections that included the striatum were selected
Antigen retrieval (AR) was performed for 20 min at 80 °C (AR buffer: 10 mM Tri-Na citrate buffer
Tissue slices were permeabilized for 10 min at room temperature (permeabilization buffer: PBS + 2.5% BSA and 0.2% Triton X-100)
followed by blocking for 2 h at room temperature (blocking buffer: PBS + 5% NGS (or NDS) + 1% BSA + 0.1% Triton X-100)
Primary antibodies were added at the indicated concentration in blocking buffer and incubated overnight at 4 °C
Secondary antibody incubation was performed for 2 h at room temperature
followed by Hoechst (1:3,000) for 10 min at room temperature
Tissues were then mounted onto slides and coverslips with Fluoromount-G (Southern Biotech
5-µm paraffin-embedded sections were used
Tissue sections were heated at 65 °C for 30 min and then deparaffinized with 100% CitriSolv (Thermo Fisher Scientific
Milli-Q water for 5 min two times and rehydrated
and then AR was performed with antigen unmasking solution (Vector Laboratories
Sections were blocked for 1 h at room temperature with 5% normal goat or donkey serum in 0.1% Triton X-100
Sections were incubated in primary antibody overnight at 4 °C in 1% normal donkey serum in 0.1% Triton X-100
Sections were then incubated in secondary antibodies (1:400 dilution) for 1 h at room temperature in 1× PBS
Secondary antibodies used included Alexa Fluor 488 (1:400; Thermo Fisher Scientific
A-21202) and Alexa Fluor 555 (1:400; Thermo Fisher Scientific
Tissues were then treated with TrueBlack Lipofuscin Autofluorescence Quencher (Biotium
23007) and incubated in Hoechst for 10 min at room temperature
Sections were then mounted with coverslips using Fluoromount-G
Images were taken on a Zeiss AiryScan 900 and an Olympus FluoView FV2000 confocal system
Images were processed using AiryScan software
Images were taken with the same acquisition settings
Images were then imported to Imaris imaging software version 9 for post-imaging analysis
AL body images had background subtraction to make the phenotype clearer; however
no intensity measurements or statistics were performed
images had all ‘auto-adjustment’ settings reset to raw values
minimum/maximum and gamma (default value of 1) were applied to all images for comparison and analysis
each cell containing nuclear IF signal was quantified with the Imaris surface tool (version 10) and CellProfiler (version 4.2.6)
Normalization was performed between animals by dividing by surface volume
Statistical analysis was performed with unpaired two-tailed t-test between HD versus NT and one-way ANOVA with multiple comparisons where appropriate
The module talon_label_reads with option --ar 20 was used to compute the fraction of As at the ends of read alignments
TALON databases of mouse (Ensembl 87 annotations) were created using talon_initialize_database with options --l 0 --5p 500 --3p 300
The TALON module was run with default parameters to identify transcripts using the initialized database
which are limited to known and consistently observed transcripts
were generated using talon_filter_transcripts
Filtered and unfiltered transcript abundances were obtained using the talon_abundance module
Processing of RNA for LC–MS-based m6A analysis was performed as described in Mathur et al.112
100 ng of twice-purified polyA-RNA was digested with 1 U of nuclease P1 (Sigma-Aldrich
followed by treatment with 1 U of alkaline phosphatase (Sigma-Aldrich
21.5 μl of the purified nucleoside sample (equivalent of 25 ng of RNA) was mixed with 2× volume (43 μl) of acetonitrile
Samples were centrifuged at 16,000g for 10 min at 4 °C
and 40 µl of supernatant was loaded into MS vials
Adenosine and m6A signals were analyzed by a quadrupole Orbitrap mass spectrometer (Thermo Fisher Scientific) coupled to hydrophilic interaction chromatography (HILIC) via electrospray ionization
LC separation was performed on an Xbridge BEH amide column (2.1 mm × 150 mm
130-Å pore size; Waters) at 25 °C using a gradient of solvent A (5% acetonitrile in water with 20 mM ammonium acetate and 20 mM ammonium hydroxide) and solvent B (100% acetonitrile)
The autosampler temperature was set at 4 °C
and the injection volume of the sample was 3 μl
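As a rough check on how much material reaches the column, the stated volumes can be combined in a short calculation. This sketch assumes that the volumes are additive and that no nucleosides are lost during centrifugation; neither assumption is stated in the methods, and the function name is hypothetical.

```python
def rna_equivalent_injected(rna_ng, sample_ul, acetonitrile_ul, injection_ul):
    """RNA-equivalent mass (ng) delivered to the column per injection."""
    total_ul = sample_ul + acetonitrile_ul   # assumes additive volumes
    conc_ng_per_ul = rna_ng / total_ul       # assumes no loss to the pellet
    return conc_ng_per_ul * injection_ul


if __name__ == "__main__":
    # 21.5 ul of nucleosides (25 ng RNA) + 43 ul acetonitrile, 3 ul injected
    injected = rna_equivalent_injected(25.0, 21.5, 43.0, 3.0)
    print(f"{injected:.2f} ng RNA equivalent per injection")
```

Under these assumptions, the 64.5 µl mixture contains about 0.39 ng/µl of RNA equivalent, so each 3 µl injection carries a little over 1 ng.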
MS data were acquired in positive ion mode in full-scan mode from m/z 240 to 290 at a resolution of 140,000
Data were analyzed using El-MAVEN software (version 0.12.0)
S3190) standards were quantified based on standard calibration curves using authentic standards
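Quantification against a calibration curve can be sketched as a linear fit of signal versus known standard amount, which is then inverted for the samples. Everything below is illustrative: the function names and numbers are hypothetical, a straight-line response is assumed, and reporting the result as an m6A-to-adenosine ratio is an assumption about the downstream calculation, not something the methods specify.

```python
def fit_line(amounts, signals):
    """Ordinary least-squares fit of signal = slope * amount + intercept."""
    n = len(amounts)
    mean_x = sum(amounts) / n
    mean_y = sum(signals) / n
    sxx = sum((x - mean_x) ** 2 for x in amounts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(amounts, signals))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def amount_from_signal(signal, slope, intercept):
    """Invert the calibration line to recover an amount from a measured signal."""
    return (signal - intercept) / slope


def m6a_to_a_ratio(m6a_signal, a_signal, m6a_cal, a_cal):
    """Ratio of m6A to adenosine amounts, each read off its own calibration."""
    m6a = amount_from_signal(m6a_signal, *m6a_cal)
    adenosine = amount_from_signal(a_signal, *a_cal)
    return m6a / adenosine


if __name__ == "__main__":
    # Made-up standards: amount (arbitrary units) versus peak area.
    calibration = fit_line([1, 2, 3], [2, 4, 6])
    print(m6a_to_a_ratio(2.0, 200.0, calibration, calibration))
```

Fitting a separate curve for each nucleoside matters because m6A and adenosine ionize with different efficiencies, so raw peak areas are not directly comparable.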
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article
A novel gene containing a trinucleotide repeat that is expanded and unstable on Huntington’s disease chromosomes
The Huntington’s Disease Collaborative Research Group
Huntington’s disease: underlying molecular mechanisms and emerging concepts
Somatic expansion of the Huntington’s disease CAG repeat in the brain is associated with an earlier age of disease onset
The contribution of somatic expansion of the CAG repeat to symptomatic development in Huntington’s disease: a historical perspective
Propensity for somatic expansion increases over the course of life in Huntington disease
Disease-associated repeat instability and mismatch repair
Special issue: DNA repair and somatic repeat expansion in Huntington’s disease
The pathogenic exon 1 HTT protein is produced by incomplete splicing in Huntington’s disease patients
HDinHD: a rich data portal for Huntington’s disease research
Proteomic analysis of wild-type and mutant huntingtin-associated proteins in mouse brains identifies unique interactions and involvement in protein synthesis
Proteins with intrinsically disordered domains are preferentially recruited to polyglutamine aggregates
RNA-binding protein TLS is a major nuclear aggregate-interacting protein in huntingtin exon 1 with expanded polyglutamine-expressing cells
Huntingtin protein interactions altered by polyglutamine expansion as determined by quantitative proteomic analysis
Huntington’s disease mice and human brain tissue exhibit increased G3BP1 granules and TDP43 mislocalization
Colocalization of transactivation-responsive DNA-binding protein 43 and huntingtin in inclusions of Huntington disease
Interaction with polyglutamine aggregates reveals a Q/N-rich domain in TDP-43
Ubiquitinated TDP-43 in frontotemporal lobar degeneration and amyotrophic lateral sclerosis
Long pre-mRNA depletion and RNA missplicing contribute to neuronal vulnerability from loss of TDP-43
TDP-43 is a component of ubiquitin-positive tau-negative inclusions in frontotemporal lobar degeneration and amyotrophic lateral sclerosis
TDP-43 and FUS/TLS: emerging roles in RNA processing and neurodegeneration
Phosphorylated TDP-43 in frontotemporal lobar degeneration and amyotrophic lateral sclerosis
TDP-43 proteinopathy presenting with typical symptoms of Parkinson’s disease
TARDBP mutations in a cohort of Italian patients with Parkinson’s disease and atypical parkinsonisms
Coexistence of Huntington’s disease and amyotrophic lateral sclerosis: a clinicopathologic study
Loss of TDP‐43 oligomerization or RNA binding elicits distinct aggregation patterns
Characterizing the RNA targets and position-dependent splicing regulation by TDP-43
TDP-43 repression of nonconserved cryptic exons is compromised in ALS-FTD
TDP-43 loss and ALS-risk SNPs drive mis-splicing and depletion of UNC13A
TDP-43 represses cryptic exon inclusion in the FTD–ALS gene UNC13A
Premature polyadenylation-mediated loss of stathmin-2 is a hallmark of TDP-43-dependent neurodegeneration
ALS-implicated protein TDP-43 sustains levels of STMN2
a mediator of motor neuron growth and repair
Region-specific RNA m6A methylation represents a new layer of control in the gene regulatory network in the mouse brain
A majority of m6A residues are in the last exons
allowing the potential for 3′ UTR regulation
Transient N-6-methyladenosine transcriptome sequencing reveals a regulatory role of m6A in splicing efficiency
Methylation of structured RNA by the m6A writer METTL16 is essential for mouse embryonic development
The m6A writer: rise of a machine for growing tasks
Dynamic m6A modification regulates local translation of mRNA in axons
METTL3-mediated m6A modification is required for cerebellar development
Temporal control of mammalian cortical neurogenesis by m6A methylation
m6A-Atlas: a comprehensive knowledgebase for unraveling the N6-methyladenosine (m6A) epitranscriptome
Landscape and regulation of m6A and m6Am methylome across human and mouse tissues
Topology of the human and mouse m6A RNA methylomes revealed by m6A-seq
N6-methyladenosine (m6A) recruits and repels proteins to regulate mRNA homeostasis
Epitranscriptomes in the adult mammalian brain: dynamic changes regulate behavior
RNA methylation influences TDP43 binding and disease pathogenesis in models of amyotrophic lateral sclerosis and frontotemporal dementia
Altered m6A RNA methylation contributes to hippocampal memory deficits in Huntington’s disease mice
Exon 1 of the HD gene with an expanded CAG repeat is sufficient to cause a progressive neurological phenotype in transgenic mice
Neurological abnormalities in a knock-in mouse model of Huntington’s disease
Longitudinal evaluation of the Hdh(CAG)150 knock-in murine model of Huntington’s disease
Comprehensive behavioral and molecular characterization of a new knock-in mouse model of Huntington’s disease: ZQ175
SONAR discovers RNA-binding proteins from analysis of large-scale protein–protein interactomes
Transcriptional signatures in Huntington’s disease
Obenauer, J. C. et al. Expression analysis of Huntington disease mouse models reveals robust striatum disease signatures. Preprint at bioRxiv https://doi.org/10.1101/2022.02.04.479180 (2023)
rMATS: robust and flexible detection of differential alternative splicing from replicate RNA-seq data
Versatile pathway-centric approach based on high-throughput sequencing to anticancer drug discovery
The Akt-SRPK-SR axis constitutes a major pathway in transducing EGF signaling to regulate alternative splicing in the nucleus
Single-nucleus RNA-seq identifies Huntington disease astrocyte states
Wyman, D. et al. A technology-agnostic long-read analysis pipeline for transcriptome discovery and quantification. Preprint at bioRxiv https://doi.org/10.1101/672931 (2019)
Huntington disease oligodendrocyte maturation deficits revealed by single-nucleus RNAseq are rescued by thiamine-biotin supplementation
Single-cell differential splicing analysis reveals high heterogeneity of liver tumor-infiltrating T cells
Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities
Single-nucleotide-resolution mapping of m6A and m6Am throughout the transcriptome
Network organization of the huntingtin proteomic interactome in mammalian brain
TDP-43 regulates its mRNA levels through a negative feedback loop
Cell environment shapes TDP-43 function with implications in neuronal and muscle disease
Truncated stathmin-2 is a marker of TDP-43 pathology in frontotemporal dementia
UPFront and center in RNA decay: UPF1 in nonsense-mediated mRNA decay and beyond
A new view of transcriptome complexity and regulation through the lens of local splicing variations
RNA sequence analysis of human Huntington disease brain reveals an extensive increase in inflammatory and developmental gene expression
Robust transcriptome-wide discovery of RNA-binding protein binding sites with enhanced CLIP (eCLIP)
Aberrant development corrected in adult-onset Huntington’s disease iPSC-derived neuronal cultures via WNT signaling modulation
N6-methyladenosine-dependent RNA structural switches regulate RNA–protein interactions
We would like to thank the patients affected by HD and their families for their critical contributions to research
This work was supported by a Chan Zuckerberg Initiative Collaborative Pairs grant (L.M.T. and R.C.S.) and by the following National Institutes of Health (NIH) grants: R35 NS116872 (L.M.T.), R01 AA029124 (C.J.) and K22CA234399 (G.L.)
It was also supported by US Department of Defense grant TS200022 (G.L.)
Additional support was provided by the National Institute of Neurological Disorders and Stroke of the NIH under award number F31NS124293T32 (T.B.N.), a Hereditary Disease Foundation postdoctoral fellowship (R. Maimon) and a postdoctoral fellowship from the ALS Association (S.V.-S.)
The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH
We would like to thank the Netherlands Brain Bank and R. Curtis (New Zealand Brain Bank) for supplying the human brain tissue for our study
Petrucelli for the generous gift of the S409/S410 TDP-43 antibody
Brown for help on MAJIQ analysis and for scientific discussions
We would like to thank the laboratory of G
Van Nostrand and ECLIPSEBIO for the wonderful scientific discussions
Department of Cellular and Molecular Medicine
Department of Psychiatry & Human Behavior
Department of Microbiology and Molecular Genetics
Broad Institute of Harvard University and MIT
Nature Neuroscience thanks the anonymous reviewers for their contribution to the peer review of this work
figure legends for Supplementary Data 1–12 and supplementary figures and references
DESeq2 output from HTT (18Q) non-targeting siRNA versus TDP-43 KD siRNA at day 37 MSNs
Statistical P values listed were determined using the DESeq2 package
Filtered for events with adjusted P < 0.05
Supplementary Data 8: List of DEGs that overlapped between TDP-43 KD and mHTT dependent (P < 0.05) as determined by the hypergeometric test
Column A contains genes that overlap with upregulated genes in the mHTT condition; column B contains genes that overlap with downregulated genes in the mHTT condition
Supplementary Data 9: Summary of human postmortem brain tissues used in this study
Supplementary Data 10: List of antibodies used in this study
Supplementary Data 11: List of primer pairs used for RASL-seq
Supplementary Data 12: RASL-seq raw data for R6/2 (starting at row 1), Q150 (starting at row 1,748) and Q175 (starting at row 3,538)
DOI: https://doi.org/10.1038/s41593-024-01850-w
Pick a genre and the app will automatically detect the key and find samples that match it
Sample platform Splice has launched an update to its mobile app that lets songwriters and producers record vocal ideas over tracks sketched out using its AI-powered Stacks feature
Stacks can be used to generate track ideas by layering samples from Splice's library
Pick a genre and the app will instantly create a Stack that layers multiple samples in that genre that share the same key and tempo; these can then be mixed, muted or swapped out for new samples from Splice's library, while the global key and tempo can be adjusted across the whole Stack
Splice Mic lets app users record over ideas generated using Stacks, and it'll even analyse the vocal recording to find additional samples that match it harmonically
After recording a loop of up to one minute in length, users can then trim it using the app's audio editor before snapping it to the beat grid
Give Splice one of your own loops and it will now use AI to find a compatible stack of sounds to go with it
“The phone is already a huge part of music making,” says Splice's SVP of Content Kenny Ochoa
and now those stacks can be merged with vocals”
The company invited two opposing teams of songwriters and producers to create tracks in 60 seconds using Splice's mobile app
"We got the team together to see who could start the best new Stacks," said artist and producer Leland