Splice, valued in 2021 at nearly $500 million after securing $55 million in funding, has acquired UK-based “high-end” virtual instrument library Spitfire Audio.

Financial terms of the acquisition were not disclosed, but the deal was worth around $50 million, according to the Financial Times, which cited a person familiar with the matter.

The acquisition marks Splice’s entry into the plugin sector and aligns with its existing subscription and rent-to-own businesses. The move positions Splice to capitalize on the growing music creation market, forecast (by Midia Research) to nearly double to $14 billion by 2031.

New York-based Splice generates more than $100 million in annual revenue with about 600,000 paying subscribers. The company was valued at nearly $500 million in a 2021 funding round led by Goldman Sachs and entrepreneur Matt Pincus’s investment firm MUSIC.

Spitfire Audio, established in 2007, provides sampled virtual instruments (including recordings by Hans Zimmer, Olafur Arnalds, the BBC Radiophonic Workshop and Abbey Road Studios) to professional composers and producers.

“The teams at Spitfire Audio and Splice have deep respect for composers, musicians and producers and are committed to celebrating and supporting their work,” said Splice CEO Kakul Srivastava. “With Spitfire’s expressive instruments and Splice’s AI-powered platform we’re just beginning to explore what’s possible.”

“…creator-led companies who believe great software and technology can supercharge the creative experience. Our shared vision is to develop tools that expand, not replace, human creativity,” Srivastava added.

Spitfire Audio co-founder Paul Thomson added: “We’ve always focused on inspiring people to create extraordinary music… we can now bring that inspiration to a whole new generation of artists.”

Both Splice and Spitfire Audio will continue to operate independently, with Olivier Robert-Murphy remaining as CEO of Spitfire Audio. Thomson will continue to oversee Spitfire Audio’s creative direction.

“Splice
has already built an incredible business,” said Robert-Murphy. “Joining forces means Spitfire Audio’s sounds will find new homes in studios around the world, whether that’s a bedroom producer or a blockbuster composer.”

The acquisition comes as Splice has been expanding its AI capabilities, with about 40% of its users embracing the platform’s AI tools, the company said. Splice said it hit nearly 350 million downloads of its sound samples across all genres in 2024, without disclosing whether the figure marked growth or a decline from 2023.

Speaking with Thomson in a one-on-one at London’s AIR Studios, Srivastava said Splice has been building new AI components “that are ethical, where artists are fairly compensated for their work. So bringing together some of that technology…”

Thomson acknowledged concerns around the use of AI in the music creation community. “…or new ways of doing things… It’s just a tool to help you be more creative and I think that’s where we should be focusing on,” Thomson added.

Splice, the platform that helps music creators bring their ideas to life, announced a significant update to Splice Mobile. …songwriters can also effortlessly record riffs…

Splice Mic Video 1: https://www.instagram.com/p/DHADAzguyZE/

"…and the results take songwriters so much deeper into the finished process. This is so valuable to our community."

Splice Mic utilizes Splice's Create AI to analyze the recording and find sounds that perfectly match the harmony. Not only can songwriters hear their vocal ideas in full musical context using Splice sounds, they can also unlock fresh creative possibilities by switching genres within the stack.

"The phone is already a huge part of music making," says
Kenny Ochoa. "About 1 million users have made more than 28 million stacks so far, and now songwriters and producers can record vocal ideas over stacks of samples… and genre and have even more control over their creative vision… and now those stacks can be merged with vocals."

Leland Video: https://www.youtube.com/watch?v=RYP2pHqFtHw

"We got the team together to see who could start the best new Stacks…"

Bridging the gap between inspiration and production, Splice Mobile users can easily share ideas with collaborators directly from the app or AirDrop the stems to a Digital Audio Workstation, making the jump from mobile to studio seamless.

Musicians are already using voice recording functions on their phones to capture ideas away from the studio. …giving songwriters the creative depth of the Splice Sounds catalog and Create…

Record your ideas with Splice Mic: https://youtu.be/JDiEjKuj9o0?si=sguk4Erpwnamf1Vo

Splice Mic is available now via Splice Mobile.

Leland co-wrote the Grammy-nominated song "Rush" by Troye Sivan, as well as the majority of Sivan's latest release. Leland has written songs for artists such as Cher ("DJ Play A Christmas Song"). Leland currently serves as in-house composer and on-camera mentor for the Emmy Award-winning competition series RuPaul's Drag Race, having worked on more than 15 seasons of the show.

Laurelvale Studios is a premier full-service recording studio nestled in the serene Studio City Hills, offering breathtaking views of the iconic Mulholland Drive. Laurelvale Studios creates an inspiring environment where creativity can flourish, ensuring that every session is as seamless and enjoyable as possible. Laurelvale Studios provides the perfect balance of state-of-the-art equipment and a relaxed, welcoming atmosphere that fosters the best work from every artist.

Splice helps music creators bring their ideas to life. A subscription to Splice's vast industry-leading sounds catalog includes high-quality… to accelerate deep sound
discovery and inspiration. The company also provides affordable access to plugins and DAWs through a rent-to-own Gear marketplace. The New York-based startup (co-founded by Steve Martocci and Matt Aimonetti) closed a Series D round in November 2020.

We focus on the variants that form a novel splice donor/acceptor motif and generate a new splicing junction at that location. We exclude from consideration variants that are distant from the novel splice-site and act by modifying the efficacy of splicing enhancers. We do not include variants that disrupt the normally used splice-sites, leading to the use of preexisting cryptic splice-sites.

To develop a methodology for identifying splice-site creating variants (SSCVs) from transcriptome sequence data… particularly those associated with diseases, the corresponding splicing junctions are typically not observed in the general population. Mismatch bases corresponding to SSCVs are often observed in the short reads of transcriptome sequence data. The term “primary” in the “primary novel SS” signifies the direct formation of a novel SS by an SSCV, particularly considering situations where an SSCV within a deep intronic region generates a cryptic exon and subsequently leads to the formation of another novel SS (referred to as secondary SS, as shown in the right panel of Fig.
1a).

The slightly lower sensitivity compared to the 1000 Genomes Project dataset may be attributed to the fact that the benchmark variant set was limited to somatic variants, which often have lower variant allele frequencies and smaller splicing changes than germline variants. juncmut achieves a certain level of sensitivity and a high rate of precision even though it uses only transcriptome data… which is typically challenging to detect without whole-genome analysis. These results indicate that the juncmut approach can effectively catalog disease-associated variants.

a Frequencies of transcriptome sequence data analyzed. Transcriptome data were also grouped by the number of detected SSCVs… whose base counts are equal to or more than 1.0 Gbp and less than 2.0 Gbp… three or more SSCVs were identified in 74,805…
b Base substitution patterns of SSCVs according to their relative position to primary novel SSs. Different colors are used to display different types of alternative bases. The x-axes represent different reference bases and the y-axes represent the numbers of variants.
c Histogram showing the distribution of relative position of primary novel SSs to their hijacked SSs (original SSs) for donor (left) and acceptor (right) creating SSCVs. Red dashed lines represent exon-intron boundaries.
d Fraction of SSCVs with multiples of three shift sizes (difference between primary novel SSs and hijacked SSs), stratified by coding and non-coding genes.
e Sequence motifs of SSCVs where the relative position of primary novel SS to hijacked SS is -4 (left) and +5 (right). The “GT” dinucleotides at the intrinsic intron edge endow the -4 bp position with the potential to form a novel donor site featuring “GT” at the fifth and sixth positions within the new intron. …the inherent intron’s fifth and sixth base pairs often comprise “GT” at the donor site, this configuration frequently corresponding to the first two intronic bases of a novel splice donor at the +5 bp position.
Source data are provided as
a Source Data file.

a Counts of distinct SSCVs creating novel donor (left) and acceptor (right) sites, stratified by splicing consequences at each relative position of primary novel SSs compared to hijacked SSs.
b Counts of distinct SSCVs leading to in-frame (left) and frameshift (right) partial exon loss, stratified by PTC generation and NMD susceptibility.
c Counts of distinct SSCVs at each size of augmented exon (restricted to multiples of three) for both exon extension and cryptic exon inclusion. Each red point represents the ratio of PTC generation.
d Counts of distinct SSCVs located in coding regions, categorized by mutation type assuming no abnormal splicing (silent, …). These counts are further stratified by PTC generation and NMD susceptibility.
Source data are provided as a Source Data file.

Using a richer set of SSCVs collected in this study, we explored higher resolution relationships between Alu elements and SSCVs.

a Counts of distinct SSCVs within Alu sequences at each primary novel SS mapped to the reference Alu sequence coordinates. The counts are stratified by Alu family (AluJ, …). These counts are faceted by the creation of donor and acceptor sites and the orientation of the Alu sequences relative to transcripts (sense and antisense).
b The ratio of SSCVs forming novel exons (classified as cryptic exon inclusion by splicing consequence) at each motif creation type (donor and acceptor) and in Alu sequence orientations (sense and antisense). These ratios are further stratified based on whether the novel exons are confined within Alu sequences or not.
c Typical splicing consequences of SSCVs within Alu sequences. SSCVs located on sense-inserted Alu sequences do not form exons and may create novel transcription start sites in an ambiguous manner. SSCVs on antisense-inserted Alu sequences are likely to form novel exons within the Alu sequences.
d Frequently exonized parts by SSCVs in antisense-inserted Alu sequences. The green lines indicate the exonized parts and the numbers on
the right represent the counts observed in this study; the thickness of these green lines corresponds to the frequency.
e Pairwise alignment of the Alu reference subsequences (reverse complemented) containing the Alu-antisense donor clusters in the left arm and right arm. It is observed that the 22nd nucleotide corresponds with the 157th…
Source data are provided as a Source Data file.

This indicates that our approach can successfully detect known pathogenic SSCVs and also suggests the potential pathogenicity of the other SSCVs.

a Sashimi plot for samples with NOTCH1 c.5048-132G>C (TCGA-A7-A13E, upper) and c.5048-132G>T (SRR8951275, …). These mutations were expected to result in a 129 bp exon extension (without any stop codon within it), leading to the production of a protein with an additional 43 amino acids.
b Predicted schematics of the mechanisms for ligand-independent cleavage of NOTCH1 juxtamembrane expansion (JME) induced by the SSCVs.
c (left) Sequencing chromatograms of two NOTCH1 DNA derived from single clones of two CRISPR-edited PC-9 cell lines (c.5048-132G>C and c.5048-132G>T). (right) The PCR amplicons spanning NOTCH1 exon 27 and exon 28 show a 129 bp exon extension in clones with the indicated NOTCH1 genotype. ‘M’ in the lane stands for the 100 bp marker.
d Western blot analysis of the NOTCH intracellular domain (NICD) in CRISPR-edited clones. …analysis is also provided on the Jurkat cell line, which is known to have an internal tandem duplication in exon 28 resulting in the insertion of 17 amino acids in the extracellular juxtamembrane domain.
e (left) Schema depicting the design of splice-switching ASOs targeting c.5048-132G>C (ASO1, …). (right) Images of PCR amplicons spanning NOTCH1 exon 27 and exon 28 generated from the cDNA of CRISPR-edited clones treated with indicated ASOs for two days.
f Western blot analysis of the NOTCH intracellular domain (NICD) in CRISPR-edited clones treated with indicated ASOs for
three days. All experiments were performed at least twice independently.
Source data are provided as a Source Data file.

These findings demonstrate that these SSCVs lead to the activation of NOTCH1, which can be suppressed by splice-switching ASOs.

The ability to acquire a catalog of SSCVs through reanalysis of existing transcriptome sequence data is an attractive feature. Particularly because juncmut can be performed on individual transcriptome sequence data, execution on large-scale transcriptome sequences is highly convenient. Our saturation analysis indicates that the continuous application of this method will lead to the identification of an increasing number of SSCVs as more sequence data is incorporated into the repository (Supplementary Fig. 25).

The next important challenge will be to become capable of systematically and accurately predicting variants responsible for rare diseases and cancers from a vast list of SSCVs, with loss-of-function and gain-of-function variants intermingling for each gene. …we can develop a system that autonomously archives important disease-related variants, some of which are targetable by splice-switching ASOs.

We then extracted samples with a base number of ≥ 1 billion to ensure sufficient sequence coverage for reliable mutation detection. We removed run data that could not be downloaded even after repeated attempts (likely due to technical issues). We also excluded sequence data that had severe issues, such as inconsistencies between the two paired-end files, discrepancies between sequence letters and base qualities… We discarded run data with an extremely high number of SSCVs attributable to potential DNA contamination and other factors.

We utilized the SRA Toolkit version 2.11.0. We executed the ‘prefetch’ command with the ‘--max-size 100000000’ option to download the SRA format file. We used the ‘fasterq-dump’ command with the options ‘-v --split-files’.

We initiate the identification of aberrant splicing junctions within
transcriptome sequence reads (more specifically, …). A splicing junction is characterized by its chromosomal location and the end coordinate of the intron within each transcript.

We established control panels of splicing junctions which were consistently observed across multiple samples within specific cohorts. We processed transcriptome sequences from two cohorts: the Cancer Genome Atlas (TCGA) and the Genotype-Tissue Expression (GTEx) Project. For TCGA, we processed 742 transcriptome sequence samples from non-tumorous tissues of cancer patients. We then identified a list of splicing junctions supported by a minimum of two reads across at least four different samples. For GTEx, we analyzed 8656 transcriptome sequences (comprising 479 individuals across 53 different tissues). We extracted splicing junctions present in any tissue with two or more supporting reads in at least eight individuals.

For each transcriptome sequence aligned by STAR, we extract splicing junctions, focusing on those possibly associated with novel splice-site creation by a mutation, where one edge matches within a 5 bp margin to the exon-intron boundary of a known transcript (GENCODE Comprehensive gene annotation Release 31) and the other edge does not correspond to any known exon-intron boundary (suggesting the formation of a novel splice-site). We adjust the positions to ensure that one side of the edge perfectly aligns with the exon-intron boundary. The obtained splicing junctions at this stage are referred to as “primary novel splicing junctions (SJs).” For each primary novel SJ, we term the edge of a splicing junction that deviates from the exon-intron annotation as the “primary novel splice-site (SS),” and the edge that aligns with the annotation as the “matching splice-site (SS).”

We next remove primary novel SJs according to the following criteria: Primary novel SJs with fewer than three supporting reads are excluded. Primary novel SJs that are registered in the splicing junction control panels generated in the previous
section are excluded. We count the total number of supporting reads of splicing junctions sharing the matching SS. If the supporting read count for the primary novel SJ constitutes less than 5% of this total, the primary novel SJ is excluded. We count the number of intersecting splicing junctions. If the number of intersecting splicing junctions is five or more, the primary novel SJ is excluded.

We perform ‘samtools mpileup’ and search for mutations that can explain the formation of the associated primary novel SS. For splice donor site creation (where the matching SS is an annotated acceptor), we focus on the two exonic bases (positions -2 and -1) and the six intronic bases (positions +1 to +6). We restrict our search to mutations that result in… We require that the first two bp of the intron in the primary novel SS be ‘GT’ following the mutation. For splice acceptor site creation (where the matching SS is an annotated donor), we focus on the six intronic bases (positions -6 to -1) and the one exonic base (position +1) relative to the primary novel SS. We require that the last two bp of the intron in the primary novel SS be ‘AG’ following the mutation. If at least two mismatches corresponding to the relevant mutation are detected in the short reads of the transcriptome sequence data, the mutation is set as a candidate splice-site creating variant (SSCV) and subjected to validation via the subsequent realignment procedure.

We perform an additional filtering step based on the realignment. For each candidate SSCV and its associated primary novel SJ, we prepare three types of “mini-transcripts”:
Primary novel transcript: extending 10 base pairs of the transcript sequence from both edges of the primary novel SJ.
Reference transcripts: extending 10 base pairs of transcript sequences from both edges of the splicing junctions of known transcripts that share the matching SS (thus potentially resulting in multiple mini-transcripts).
Intron retention transcripts: extending 10 base pairs of genome sequences in both directions from the position of the SSCV.
If the region of
the transcript includes the SSCV, we also supply a version of the transcript with the SSCV mutation inserted. We are able to generate at most six types of mini-transcripts: primary novel transcripts with and without the SSCV, reference transcripts with and without the SSCV, and intron retention transcripts with and without the SSCV.

The edit distance of the alignment must be two or less. There should be no mutations within 5 bp from the position of the candidate SSCVs. We choose the mini-transcript with the minimum edit distance. When several mini-transcripts share the minimum edit distance, we choose in the following order: reference transcript without the SSCV, intron retention transcript without the SSCV, and primary novel transcript with the SSCV. If at least one read is classified as aligning with the primary novel, reference, or intron-retention transcript with the SSCV, then the SSCV is retained as a final output.

For each SSCV and its associated primary novel SJ detected by juncmut, we classify the types of splicing consequences. We predict the resulting amino acid changes and the generation of premature termination codons (PTCs), and assess the susceptibility to nonsense-mediated decay (NMD).

The coordinate of the matching SS (the edge of the splicing junction that matches the exon-intron boundary of known transcripts) is identified from the corresponding primary novel SJ. We extract all transcripts that possess this matching SS within their exon-intron boundaries. We determine the transcript based on the following priorities: … The largest transcript (in the case of a tie, the transcript with the earlier ENST transcript ID is selected).

We identify the exon affected by the SSCV (referred to as the “affected exon”). …the affected exon is defined as the one whose start position is closest to… The end position of this affected exon is termed the “hijacked SS.” Conversely, the affected exon is the one whose end position is closest to…, with the start position of this affected exon being the “hijacked SS.” The splicing consequences of the SSCV are
classified as follows:
“Partial exon loss” if the primary novel SS is located within the affected exon.
“Cryptic exon” if the primary novel SS is located downstream or upstream (for donor and acceptor creation, …) and if there is a splicing junction with one edge corresponding to the hijacked SS and the other edge within 300 bp upstream or downstream (for donor and acceptor creation, …).
“Exon extension” if the SSCV is not classified as a “cryptic exon,” and there is a sequence depth of one or greater observed from the hijacked SS to the primary novel SS.

For SSCVs predicted to result in “partial exon loss,” “cryptic exon,” or “exon extension,” we investigate the consequent protein changes… determine whether they are in-frame or not. We verify whether the primary novel SJ is completely contained within the 5′UTR or the 3′UTR, and we exclude those scenarios from further analysis. We also ignore the cases where SSCVs cause skipping of the start or stop codon.

We performed ‘samtools mpileup’ directly on the corresponding CRAM file stored in Amazon Web Services (s3://1000genomes/1000G_2504_high_coverage/data/). We decided that the SSCV predicted from RNAseq is a genuine genomic mutation (although the effect on splicing is uncertain at this point) if more than two reads support the base corresponding to the SSCV and the proportion of these supporting reads exceeds 5% of all reads covering that position.

We utilized GTEx data to verify whether SSCVs identified by juncmut in the 1000 Genomes Project transcriptome truly lead to significant splicing changes. We downloaded the GTEx transcriptome sequence data from the Sequence Read Archive and aligned them following the juncmut workflow. We downloaded GTEx V7 whole genome genotype calls. We counted the number of supporting reads for the corresponding primary novel SJ and hijacked SJ (splice junction connecting the matching SS and the hijacked SS in the reference transcript), and we calculated the ratio of the primary novel SJ (#primary novel SJ /
(#primary novel SJ + #hijacked SJ)) by parsing the SJ.out.tab file. We calculated the p-value that measures the difference in the ratio of the primary novel SJ between samples with and without the SSCV using a one-sided Wilcoxon rank-sum test with the wilcox.test function in the R language. We integrated these p-values using Fisher’s method across tissues.

We also evaluated variants predicted to cause splice-site activation via SpliceAI as a comparison to juncmut. We downloaded VCF files from the 1000 Genomes Project from s3://1000genomes/1000G_2504_high_coverage/working/20201028_3202_phased/. We added the SpliceAI score using the precomputed file for all SNVs and 1 base insertions… We extracted SNVs with allele frequencies ≤ 0.01 satisfying the following criteria: SNVs possessed by any of the 445 individuals whose matched transcriptome sequence data are available; SNVs where the SpliceAI Delta score for acceptor or donor gain (DS_AG or DS_DG) is equal to or above 0.1. We identified a novel splice-site (corresponding to the primary novel SS in juncmut) using the information on Delta positions (DP_AG or DP_DG) provided by the SpliceAI annotation. …based on GENCODE Basic annotation (Release 39)… identified the matching SS and hijacked SS and obtained the corresponding hijacked SJ. For variants that were called in at least one GTEx sample, we calculated the combined p-values across tissues as above.

To ensure a fair comparison between juncmut and SAVNet, we excluded variants identified by SAVNet if (1) the corresponding splicing junction is included in the control panel constructed from the GTEx transcriptome, (2) the support read count for the splicing junction is two or fewer, or (3) the proportion of the splicing junction is less than 0.05. We restricted to those splicing-associated variants in SAVNet that exhibit a pattern of substitution within novel splice motifs. We assessed the overlap of the variants identified by SAVNet and juncmut.

…we excluded splicing-associated variants
detected by SAVNet, as was done in the comparison using 1000 Genomes Project data. We confined our analysis to SSCVs classified as “somatic,” aligning with SAVNet, which exclusively targets somatic variants.

We convert the coordinates of the SSCV position and the secondary SS (for SSCVs resulting in cryptic exons) in the human reference genome to the coordinate system in the reference Alu sequence.

…Japan) in 2005 and authenticated in 2022 using the Promega GenePrint 10 System (BEX). The leukemia Jurkat cell line was purchased from RIKEN BioResource Research Center in 2022. PC-9 and Jurkat cells were grown in RPMI1640 (Gibco) with 10% FBS (Gibco) and 1% penicillin/streptomycin (Wako). Lenti-X 293T cell lines were purchased from Takara in 2021 and were cultured in DMEM (Gibco) with 10% FBS and 1% penicillin/streptomycin. All cell lines tested negative for mycoplasma using the Mycoplasma Plus PCR Primer Set (Agilent).

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

The list of splice-site creating variants is accessible through SSCV DB (https://sscvdb.io) and Zenodo (https://doi.org/10.5281/zenodo.14053979). Source data are provided with this paper.

The workflow of juncmut is available at GitHub (https://github.com/ncc-gap/juncmut).
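As a rough illustration of the junction-level removal criteria and the primary novel SJ ratio described in the methods, the logic can be sketched in Python as follows. This is not the juncmut code. The `PrimaryNovelSJ` record, its field names, and the function names are hypothetical, and the sketch assumes that the total for the matching SS includes the reads of the primary novel SJ itself and that the 5% and "five or more" conditions both lead to exclusion.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class PrimaryNovelSJ:
    """Hypothetical record for one primary novel splicing junction (SJ)."""
    key: str                # illustrative identifier, e.g. "chr9:500-900"
    supporting_reads: int   # reads supporting this primary novel SJ
    matching_ss_reads: int  # reads over all SJs sharing the matching SS (assumed to include this SJ)
    intersecting_sjs: int   # number of intersecting splicing junctions


def passes_filters(sj: PrimaryNovelSJ, control_panel: Set[str]) -> bool:
    """Return True if the junction survives the four removal criteria."""
    # Fewer than three supporting reads: excluded.
    if sj.supporting_reads < 3:
        return False
    # Registered in a control panel of commonly observed junctions: excluded.
    if sj.key in control_panel:
        return False
    # Less than 5% of the reads over junctions sharing the matching SS: excluded.
    if sj.matching_ss_reads > 0 and sj.supporting_reads / sj.matching_ss_reads < 0.05:
        return False
    # Five or more intersecting splicing junctions: excluded.
    if sj.intersecting_sjs >= 5:
        return False
    return True


def primary_novel_sj_ratio(novel_reads: int, hijacked_reads: int) -> Optional[float]:
    """#primary novel SJ / (#primary novel SJ + #hijacked SJ); None if no reads."""
    total = novel_reads + hijacked_reads
    if total == 0:
        return None
    return novel_reads / total


if __name__ == "__main__":
    panel = {"chr1:100-200"}  # toy control panel
    candidate = PrimaryNovelSJ("chr9:500-900", supporting_reads=12,
                               matching_ss_reads=60, intersecting_sjs=1)
    print(passes_filters(candidate, panel))
    print(primary_novel_sj_ratio(12, 48))
```

Keeping each criterion as a separate early return makes it straightforward to log which rule removed a junction, which may help when tuning thresholds on a new cohort.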
The code of the version used in this study is available at Zenodo (https://doi.org/10.5281/zenodo.14011414).

The authors thank Erika Kawasaki and Rika Murakami (Division of Molecular Pathology, National Cancer Center Research Institute) for technical assistance.

These authors contributed equally: Naoko Iida, … Division of Genome Analysis Platform Development, …

…developed the software for detecting splice-site creating variants. …developed a platform for analyzing massive transcriptome sequence data deposited in the Sequence Read Archive. …organized and interpreted the list of SSCVs… including the development of model systems using CRISPR editing and splice-switching antisense oligonucleotide administration. …provided computational assistance across various aspects of the project.

The authors declare no competing interests.

…who co-reviewed with Sirui Zhang; and the other reviewers for their contribution to the peer review of this work.

DOI: https://doi.org/10.1038/s41467-024-55185-y

The two companies plan to work together on new products that "blend Spitfire Audio’s cinematic soundscapes and orchestral expertise
with Splice’s sample catalog and AI-powered discovery engine"

Music creation platform Splice has acquired Spitfire Audio, the British sample library and virtual instrument developer, for around $50 million, according to reports in The Financial Times.

A press release shared by Splice states that the two companies plan to develop new products that blend Spitfire Audio's "cinematic soundscapes and orchestral expertise" with Splice's sample catalogue and AI-powered discovery engine.

Spitfire Audio is a UK-based independent company known for producing high-end orchestral sample libraries and virtual instruments. Founded in 2007 by composers Paul Thomson and Christian Henson, the company makes software tools that are popular with composers and producers working in film and television.

Splice has confirmed that both companies will continue to operate independently "in the near term". Olivier Robert-Murphy will remain in place as Spitfire Audio's CEO, while Paul Thomson will continue to oversee creative direction for the company.

“The teams at Spitfire Audio and Splice have deep respect for composers, musicians and producers and are committed to celebrating and supporting their work,” Splice CEO Kakul Srivastava said in a statement. “Our shared vision is to develop tools that expand - not replace - human creativity. With Spitfire’s expressive instruments and Splice’s AI-powered platform, we’re just beginning to explore what’s possible.”

Thomson reassured viewers that Spitfire Audio will continue releasing perpetual-license sample libraries and supporting its existing products, allaying fears that the company could transition entirely to a Splice-style subscription model.

Reactions to the video have not been wholly positive. "As an owner of a significant investment in Spitfire Audio sample libraries I don’t know what to believe about future stability."
“Most musicians do not want to make music that way, but AI will enable [artists] to do things they could not do today,” she said. "They could use string quartets from Spitfire, but you might want to invent your own instrument. You can start with a particular sound and merge instruments together to get a novel sound that has never been heard before."

Matt Mullen, Tech Editor, MusicRadar

Wesleyan students, faculty, and staff can RSVP on WesNest. The SPLICE Ensemble features Keith
Kirchoff on piano. Focused on cultivating a canon of electroacoustic chamber music, the group has previously premiered works by students of Professor of Music and Director of Graduate Studies Paula Matthusen in 2020 and has also performed Matthusen's works, including site-specific recordings in Mammoth Cave in Kentucky. Featuring works by graduate music students Lea Bertucci and Carl Testa ’06. The evening also features a sound installation by Sam Boston ’25.

Music creation platform Splice has acquired Spitfire Audio, the UK-based developer of high-end virtual instrument libraries.

The acquisition marks Splice’s entry into the fast-growing plugin space, adding to the company’s Splice Sounds subscriptions and rent-to-own businesses. The plugin market alone is valued at $640 million, while the wider music software and services sector exceeds $7 billion.

Since launching its Splice Sounds platform in 2015, Splice has become a key player in modern music production. One million sounds are downloaded every day from its sample catalogue. Splice has more than 10 million music producers and creators using its ethical AI-powered platform.

Founded in 2007, Spitfire Audio has become an established platform for composers, producers, artists and musicians.
The British company provides virtual instrument libraries, including recordings by Hans Zimmer, Olafur Arnalds, the BBC Radiophonic Workshop and Abbey Road Studios.

“The teams at Spitfire Audio and Splice have deep respect for composers, musicians and producers and are committed to celebrating and supporting their work,” said Kakul Srivastava, CEO of Splice. “Our shared vision is to develop tools that expand – not replace – human creativity,” Srivastava added. “With Spitfire’s expressive instruments and Splice’s AI-powered platform, we’re just beginning to explore what’s possible.”

The companies are set to start work on new products that blend Spitfire Audio’s cinematic soundscapes and orchestral expertise with Splice’s sample catalogue and AI-powered discovery engine.

“We’ve always focused on inspiring people to create extraordinary music,” said Paul Thomson, who co-founded Spitfire Audio with Christian Henson.

The combined company is well positioned to capitalise on growth in the music creation market, which is projected to nearly double to $14 billion by 2031, according to MIDiA Research.

“Splice has already built an incredible business,” added Olivier Robert-Murphy. “Joining forces means Spitfire Audio’s sounds will find new homes in studios around the world – whether that’s a bedroom producer or a blockbuster composer.”

Both Splice and Spitfire Audio will continue to operate independently in the near term. Robert-Murphy will remain CEO of Spitfire Audio, while Thomson will continue to oversee Spitfire Audio’s creative direction.

PHOTO: (L-R) Paul Thomson and Kakul Srivastava (photo by Matthew Johnson)
a group company of Sumitomo Electric Industries announced that it was named an honoree in the 2025 Lightwave+BTR Innovation Reviews for Lynx-CustomFit™ Splice-On Connectors. Lynx-CustomFit™ was recognized as an innovative product that excels in both error-free assembly and reliability, earning the highest score in the Optical Components category.

the largest global conference and exhibition for optical communications and networking professionals

The company says its decision to purchase Spitfire comes as the plugin market is valued at $640 million and the wider music software and services sector “exceeds $7 billion”.

Described as the “leading platform for music creation”, Splice hosts a sample library with thousands of royalty-free sounds and a growing suite of AI tools to help creators “unlock inspiration, experiment with sound and generate unique compositions”. Spitfire Audio offers a collection of virtual instrument libraries, including collections made in collaboration with Hans Zimmer.

“The teams at Spitfire Audio and Splice have deep respect for composers, musicians and producers and are committed to celebrating and supporting their work,” says Splice CEO Kakul Srivastava. “Our shared vision is to develop tools that expand – not replace – human creativity.”

Splice and Spitfire are planning to release new products which “blend Spitfire Audio’s cinematic soundscapes and orchestral expertise with Splice’s sample catalogue and AI-powered discovery engine”.

“We’ve always focused on inspiring people to create extraordinary music,” says Paul Thomson.

The music creation market is reportedly projected to nearly
double to $14 billion by 2031 [per MIDiA Research]. Splice hopes to position itself to lead the market.

“Splice has already built an incredible business,” added Olivier Robert-Murphy. “Joining forces means Spitfire Audio’s sounds will find new homes in studios around the world – whether that’s a bedroom producer or a blockbuster composer.”

Both Splice and Spitfire Audio will continue to operate independently for the time being. Olivier Robert-Murphy will remain as Spitfire CEO, while Paul Thomson will continue to oversee Spitfire Audio’s creative direction.

To read some FAQs surrounding the acquisition, head to Spitfire Audio.

RNA splicing is a cellular process that is critical for gene expression. After genes are copied from DNA into messenger RNA, portions of the RNA that don’t code for proteins are cut out and the coding portions are spliced back together. This process is controlled by a large protein-RNA complex called the spliceosome.

MIT biologists have now discovered a new layer of regulation that helps to determine which sites on the messenger RNA molecule the spliceosome will target. This type of regulation appears to influence the expression of about half of all human genes.

The findings suggest that the control of RNA splicing, a process that is fundamental to gene expression, is more complicated than it is in some model organisms like yeast, even though it’s a very conserved molecular process.

“There are bells and whistles on the human spliceosome that allow it to process specific introns more efficiently. One of the advantages of a system like this may be that it allows more complex types of gene regulation,” says Connor Kenny, an MIT graduate student and the lead author of the study.

Christopher Burge, the Uncas and Helen Whitaker Professor of Biology at MIT, is the senior
author of the study, which appears today in Nature Communications.

RNA splicing allows cells to precisely control the content of the mRNA transcripts that carry the instructions for building proteins. Each mRNA transcript contains coding regions. They also include sites that act as signals for where splicing should occur, allowing the cell to assemble the correct sequence for a desired protein. This process enables a single gene to produce multiple proteins; over evolutionary timescales, splicing can also change the size and content of genes and proteins, when different exons become included or excluded.

The spliceosome is composed of proteins and noncoding RNAs called small nuclear RNAs (snRNAs). An snRNA molecule known as U1 snRNA binds to the 5’ splice site at the beginning of the intron. It had been thought that the binding strength between the 5’ splice site and the U1 snRNA was the most important determinant of whether an intron would be spliced out of the mRNA transcript.

The MIT team discovered that a family of proteins called LUC7 also helps to determine whether splicing will occur, but only for a subset of introns in human cells.

It was known that LUC7 proteins associate with U1 snRNA. There are three different LUC7 proteins in human cells, and Kenny’s experiments revealed that two of these proteins interact specifically with one type of 5’ splice site, which the researchers called “right-handed.” A third human LUC7 protein interacts with a different type, referred to as “left-handed.”

The researchers found that about half of human introns contain a right- or left-handed site, while the other half do not appear to be controlled by interaction with LUC7 proteins. This type of control appears to add another layer of regulation that helps remove specific introns more efficiently.

“The paper shows that these two different 5’ splice site subclasses exist and can be regulated independently of one another,” Kenny says. “Some of these core splicing processes are actually more complex than we previously appreciated, which warrants more
careful examination of what we believe to be true about these highly conserved molecular processes.”

Previous work has shown that mutation or deletion of one of the LUC7 proteins that bind to right-handed splice sites is linked to blood cancers, including about 10 percent of acute myeloid leukemias (AMLs). The researchers found that AMLs that lost a copy of the LUC7L2 gene have inefficient splicing of right-handed splice sites. These cancers also developed the same type of altered metabolism seen in earlier work.

“Understanding how the loss of this LUC7 protein in some AMLs alters splicing could help in the design of therapies that exploit these splicing differences to treat AML,” Burge says. “There are also small molecule drugs for other diseases such as spinal muscular atrophy that stabilize the interaction between U1 snRNA and specific 5’ splice sites. So the knowledge that particular LUC7 proteins influence these interactions at specific splice sites could aid in improving the specificity of this class of small molecules.”

Working with a lab led by Sascha Laubinger, a professor at Martin Luther University Halle-Wittenberg, the researchers found that introns in plants also have right- and left-handed 5’ splice sites that are regulated by Luc7 proteins. The researchers’ analysis suggests that this type of splicing arose in a common ancestor of plants, animals, and fungi, but it was lost from fungi soon after they diverged from plants and animals.

“A lot of what we know about how splicing works and what are the core components actually comes from relatively old yeast genetics work,” Kenny says. “What we see is that humans and plants tend to have more complex splicing machinery, with additional components that can regulate different introns independently.”

The researchers now plan to further analyze the structures formed by the interactions of Luc7 proteins with mRNA and the rest of the spliceosome, which could help them figure out in more detail how different forms of Luc7 bind to different 5’ splice
sites.

National Institutes of Health and the German Research Foundation

A Publisher Correction to this article was published on 18 December 2024. This article has been updated.

Mutations that affect RNA splicing significantly impact human diversity and disease. Here we present a method using transformers to detect splicing from raw 45,000-nucleotide sequences. We generate embeddings with residual neural networks and apply hard attention to select splice site candidates, enabling efficient training on long sequences. Our method outperforms SpliceAI-10k in detecting splice sites in GENCODE and ENSEMBL annotations. Using extensive RNA sequencing data from an Icelandic cohort of 17,848 individuals and the Genotype-Tissue Expression (GTEx) project, our method demonstrates superior performance in detecting splice junctions compared to SpliceAI-10k (PR-AUC = 0.834 vs PR-AUC = 0.820) and is more effective at identifying disease-related splice variants in ClinVar (PR-AUC = 0.997 vs …). These advancements hold promise for improving genetic research and clinical diagnostics, potentially leading to better understanding and treatment of splicing-related diseases.

It is difficult to scale transformers to long sequence contexts. This is because self-attention scales quadratically with sequence length.

(a) For each position in an input DNA sequence, the method looks at the surrounding context region and outputs a predicted score for three options: no splicing, acceptor, or donor. (b) Comparison of Transformer-45k with SpliceAI-10k on both ENSEMBL and GENCODE annotations with regard to area under the precision-recall curve (PR-AUC) and top-k accuracy. 95% confidence intervals (CIs) are shown in brackets. N denotes the number of splice sites in the test set; the total size of the ENSEMBL test set is 664,940,000 nt. (c) Receiver operating characteristic (ROC) curve and precision-recall curve for cases where
SpliceAI and Transformer-45k disagree (TVD ≥0.1). (d) The total number of false positive and true positive splice sites as a function of the decision threshold for cases where SpliceAI and Transformer-45k disagree (TVD ≥0.1).

We can also use the top-k decision thresholds to examine the agreement between predicted splice sites. Here we see that the models agree on 175,825 splice sites and 664,757,322 non-splice sites, and they disagree on 3599 splice sites and 3254 non-splice sites. For cases where the predictions disagree, Transformer-45k has 0.609 accuracy (4172 correct sites) and SpliceAI-10k has 0.391 accuracy (2681 correct sites).

The predictions are mostly in agreement, except that SpliceAI-10k does not detect the acceptor for the final exon.

Our method detects 98.5% of junctions annotated in ENSEMBL and 71.8% of unannotated junctions, while SpliceAI-10k with pre-trained weights detects 96.6% of the annotated junctions and 53.8% of unannotated junctions. Transformer-45k fine-tuned only on GTEx splice site annotations detects 98.1% of junctions annotated in ENSEMBL and 67.6% of unannotated junctions.

(a) PR-AUC plotted against maximum distance from an sQTL to the closest splice site annotation. (b) Precision-recall curve for sQTLs determined to be splice-disrupting or splice-creating. (c) Precision-recall curve for 35,464 pathogenic splice variants in ClinVar. (d) A scatter plot showing the distribution of delta scores for non-splicing variants (n = 40,528) and pathogenic splice variants (n = 35,464).

This variability suggests a more complex prediction landscape for benign splice variants.

We have looked at predicting splice sites with transformers and shown that they can learn to utilize long sequence contexts to predict splicing with better classification accuracy than the current best splice site prediction methods in the literature. We tested our method on splice site annotations from ENSEMBL and GENCODE and showed that it was able to predict splicing with greater accuracy than
SpliceAI-10k, both with regard to PR-AUC and top-k accuracy. Focusing on the splice site predictions where our method disagreed with SpliceAI-10k, we saw that our method makes fewer false positive predictions while making about as many true positive predictions. By providing the transformer with a list of 512 potential splice sites, we enable it to produce more accurate predictions than those achieved with SpliceAI alone. This improvement may be attributed to the model’s ability to learn the dependencies between splice sites over a larger sequence context, supporting the hypothesis that longer raw sequences are beneficial for capturing splice site interactions.

When classifying unannotated splice junctions and splice variants, we found that fine-tuning the model on RNA-Seq data was necessary to achieve better performance than SpliceAI-10k. This is likely due to our training set only consisting of protein-coding ENSEMBL transcripts. Genes can have multiple transcript annotations, and splice sites observed in RNA-Seq can come from any one of these transcripts. The ENSEMBL annotations can be combined into a single gene annotation, and this could improve performance for detecting splice sites in RNA-Seq data, but many splice sites observed in RNA-Seq would still be missing from the annotations.

We conducted an additional experiment in Fig.
1b to determine whether the observed performance improvement can be attributed to the transformer architecture or to the increased sequence context. We trained a transformer model using a 10kb context and compared its performance to both SpliceAI and our 45kb context transformer model. The 10kb context transformer model outperformed SpliceAI, confirming that the transformer architecture contributes to more accurate predictions. The 45kb context transformer model achieved the highest performance, highlighting that an extended sequence context is a significant factor in improving model accuracy.

Our method can be trained on larger contexts than 45,000 nt, and there is no reason to assume that increasing the context further will not be beneficial. The same applies to the number of selected splice sites and parameters such as depth and number of heads. These models may need more training data or longer training to show any improvements over our current method.

A known issue with policy gradient methods is their tendency to exhibit high gradient variance; this can slow down convergence or prevent the model from reaching optimal policies. Our model quickly learned a policy that selected almost all annotated splice sites, but reducing gradient variance could potentially further refine the policy and improve model performance.

We designed a splice site prediction method that utilizes transformers and showed that they can significantly improve the state-of-the-art. Our method utilizes hard attention to reduce pre-mRNA sequences to a set of potential splice sites that have a manageable length for transformers to learn long-range dependencies between splice sites. The model is trained on a context about four times larger than that of SpliceAI and Nucleotide Transformer v2. We showed that the Transformer-45k makes fewer false positive predictions than SpliceAI while predicting about as many true positives, and that Transformer-45k primarily attends to other annotated splice sites when performing splice site
predictions. When our method is fine-tuned on RNA-Seq data from a large Icelandic cohort and GTEx V8, it detects more unannotated splice junctions and pathogenic splice variants than SpliceAI.

Only protein-coding transcripts with one or more splice junctions were used, and transcripts on chromosomes 1, … In ENSEMBL we only selected transcripts with support level 1. This resulted in an ENSEMBL training set that has 22,375 transcripts and a GENCODE training set with 13,384 transcripts. The corresponding ENSEMBL test set includes 8955 transcripts and the GENCODE test set includes 1652. Before training, we removed 10% of transcripts from the training set and placed them in a validation set. In the ENSEMBL-based annotations, 21,432 splice junctions were selected into this set.

Nucleotides were one-hot encoded, e.g. A = [1,0,0,0]. The labels were encoded likewise, e.g. ‘no splicing’ = [1,0,0]. Nucleotide sequences are stored in sparse arrays split by chromosome, where nucleotides outside of genes are stored as zeros. The array indices correspond to nucleotide chromosome position, and to deal with negative-strand genes we reverse complement the nucleotide sequences on the fly. This allows us to easily change the context and sequence length without needing to write a copy of the sequence to disk.

The proposed method consists of three main parts. All three parts of the model are trained simultaneously from scratch and optimized with the following loss function: \(L={L}_{{\rm{CE}}}+\lambda {L}_{{\rm{Policy}}}\), where \({L}_{{\rm{CE}}}\) is the cross-entropy loss and \({L}_{{\rm{Policy}}}\) is the policy loss. This combined loss function is designed to simultaneously quantify the model's proficiency at splice site classification (cross-entropy loss) and the selector modules' ability to select relevant splice sites (policy loss). The policy loss is scaled by a factor λ and during training …

To train the model we use a 2D cross-entropy loss, in which N is the length of the sequence context and \({y}_{i,j}\) is a one-hot encoded splice site label (‘no splice’, …

The encoder module takes a pre-mRNA sequence as input and maps each position in the sequence to a 32-dimensional (32D)
feature space based on its context. Nucleotides in the input sequence are one-hot encoded and mapped using a CNN that has the same architecture as SpliceAI-10k (Fig. 1a). The module can learn to encode information for each sequence position from its surrounding 10k context. We base the encoder architecture on SpliceAI since it has been thoroughly tested and shown to be effective at splice site prediction. This allows us to focus on designing other parts of the model.

In the policy loss, the policy \({\pi }_{\theta }\) is the probability of taking action \({a}_{s}^{t}\) at step t and trajectory s, given an embedding \({X}_{s}\) and previous actions \({C}_{s}^{t-1}\). \({C}_{s}^{t-1}\) is an indicator vector that masks out previous actions and prevents the policy from selecting the same splice site twice. S is the total number of trajectories and \({R}_{s}^{t}\) is the reward. We found that using one trajectory for each sequence was enough to achieve stable training.

We want the module to select the annotated splice sites and also select promising functional splice sites that are not in the annotations. Annotated splice sites receive reward \({R}_{s}^{t}=1\) and other sites \({R}_{s}^{t}=0\). This ensures that the policy is not penalized for selecting non-splice sites. An exception is made if the acceptor selector selects an annotated donor: to discourage the selector from selecting the wrong splice type, a penalty of \({R}_{s}^{t}=-1\) is given.

The policy is parameterized by a fully connected feed-forward network with a 32D vector input, one hidden layer with four units and a leaky ReLU activation. The policy network learns to take embeddings from the encoder module as inputs and returns acceptor and donor site logits as outputs. During training, these logits are used to parameterize two categorical distributions. The policy alternates between sampling acceptor and donor sites from the distributions until it has selected 512 potential splice sites. Outside of training, we simply select the acceptors and donors with the largest logits.

The output
of the transformer module is finally sent to the prediction head. This is a convolutional layer with kernel size one and a softmax activation function; it maps 32D feature maps down to three feature maps, where the three possible outputs correspond to ‘no splice’, acceptor and donor.

All models were trained for 10 epochs with the AdamW optimizer (ref. 44) and with 96 samples per batch. We used linear warm-up for the first 1000 optimization steps. After five epochs, the learning rate was reduced by half each epoch. The model weights were randomly initialized ten times and trained. Training the model for ten epochs with 3 NVIDIA A100 GPUs takes about 9 hours.

SpliceAI-10k was retrained on data and code made available by Jaganathan et al. (ref. 8). The original model was implemented using Keras (version 2.0.5) with a TensorFlow backend and was trained on GENCODE annotations constructed by the authors. We implemented the model using PyTorch and constructed a training set using ENSEMBL annotations. The reported results for the methods trained on ENSEMBL are the average predictions of ten models.

To fine-tune the models on data from the Icelandic RNA-Seq cohort and GTEx V8, weights from the ENSEMBL dataset training runs were used as a starting point and trained for four additional epochs on splice sites obtained from RNA-Seq. During fine-tuning, all weights were kept trainable and the learning rate was set to 2e−4.

The RNA-Seq data from the Icelandic cohort consist of 17,848 samples drawn from blood from the same number of individuals (9784 females, 8064 males), collected using Illumina NovaSeq and HiSeq machines with read length 2 × 125 and poly-A mRNA isolation. These samples were aligned separately to the maternal and paternal inherited genome references using STAR v2.5.3a. We transferred the alignment files (BAM) to GRCh38 reference space (updating CIGAR and POS fields), merged the two files into a single BAM file and annotated the parental alignment with a higher alignment score as primary alignment. The alignment files were
scanned to detect splice sites from the CIGAR strings of primary alignments. Alignment counts per splice site were gathered on the fragment level and annotated with information on multi-mapping and length of sequence overhang aligned to adjacent exons. Splice sites were included if one individual fulfilled the following splice count requirements: (1) at least 4 fragments mapped, (2) maximum of shorter overhang is larger than 7 base pairs, (3) log2 entropy of left and right overhang length is larger than or equal to two, and (4) donor or acceptor site is within an annotated gene boundary. Using aggregated data from all individuals, splice sites were filtered out if multi-mapped alignments exceeded 20% of mapped alignments or if the maximum fragment count was less than 5% of the expected transcript abundance. After filtering, 351,546 splice sites were used in subsequent analysis.

These sets of splice sites allowed us to quantify alternative splicing by calculating the percentage spliced in (PSI) per individual: the proportion of splice count divided by the total number of fragments aligned to any of the splice sites in the SOSJ. A cis-sQTL scan was carried out by testing for association between PSI and sequence variants closer than 30kb to annotated gene overlapping SOSJ. The most significant sequence variants associated with PSI were annotated as lead-sQTLs.

The cohort was a homogeneous population of 17,848 Icelanders (9784 females, 8064 males). The year of birth (YOB) data was binned into 5-year intervals, with the oldest participants born closest to 1920 and the youngest born closest to 2005. We adjusted for both technical covariates and kinship, since the pedigree of Icelanders was available. PSI values were adjusted for technical covariates (median coverage variation, …). Age was evaluated as a potential covariate but excluded due to minimal contribution to PSI variation.

We detected 257,372 lead-sQTLs, of which 146,372 are within genes and pass a basic quality filter (REF ≠ ALT). We detect 80,976
lead-sQTLs with p-values below the Bonferroni threshold (\(\frac{0.05}{146,372}\)). 1588 sQTLs disrupt the highly conserved splice motifs GT/AG, while 2113 sQTLs create such motifs. These variants are highly likely to truly affect splicing, and we refer to them as splice-disrupting if they remove a splice motif and splice-creating if they create a splice motif. We constructed a list of variants in the vicinity of the lead-sQTL that RNA sequencing never detects to affect splicing. We randomly select one of these negative examples for each lead-sQTL.

The splice site annotations used for fine-tuning our model were constructed by combining RNA-Seq splice junctions detected in all 49 tissues in GTEx and the Icelandic blood samples. Junction reads were selected if they were present in four or more individuals and if either end of the junction was present in the canonical transcript for a gene. The combined set of splice site annotations consists of 360,601 acceptors and 359,934 donors from 17,239 genes. Using the same method to construct annotations exclusively from reads from tissues in GTEx V8, we identified 310,532 acceptors and 311,499 donors from 16,308 genes.

Top-k accuracy is defined as the fraction of k positions that are correctly predicted to belong to a class, where k is the number of positions truly belonging to the class and the decision threshold is chosen so that exactly k positions are predicted for this class. To calculate 95% confidence intervals for PR-AUC and top-k accuracy, we performed bootstrapping with 1000 samples.

In the definition of TVD, P and Q are probability distributions.

We visualized the attention in transformer encoders by calculating the average value of all attention matrices.

Statistical analyses were conducted to identify and replicate splicing quantitative trait loci (sQTLs) in our cohort compared to those reported in the GTEx V8 whole blood dataset. Significant sQTLs were determined using a false discovery rate (FDR) threshold of 5% (q-value < 0.05) to control for multiple testing. Replicates were defined as the
lead-sQTLs identified in GTEx that were also present in our dataset. We assessed replication by testing these variants for association with the corresponding splicing events in our cohort. A replication was considered successful if the variant showed a significant association at a Bonferroni-adjusted p-value threshold (\(\frac{0.05}{1{,}972}\)). The majority (94.2% [1858 out of 1972]) of lead-sQTLs from GTEx were replicated in our cohort, indicating high reproducibility of the findings. To compute the delta score, we followed the procedure outlined by Jaganathan et al. (ref. 8). We first calculate the difference between the predictions for an alternative sequence that includes a sequence variant and the predictions for the reference sequence. Then the location and splice site with the highest absolute difference in either the acceptor or donor site predictions is located. This difference is defined as the delta score, and if the score is sufficiently high it indicates a splice site gain or loss at that location. We downloaded the ClinVar variants in variant call format and selected variants that were marked as splice variants. These variants were then labeled as pathogenic if their clinical significance was annotated as pathogenic or likely pathogenic, and benign if their clinical significance was annotated as benign or likely benign. This resulted in 35,464 variants labeled as pathogenic and 1001 labeled as benign. To calculate PR-AUC for delta scores, we used 40,528 variants as negative examples that had been determined to be highly unlikely to affect splicing based on differential splicing analysis in whole blood. This research received approval from the National Bioethics Committee of Iceland (approval number VSN 14-015) and was conducted in accordance with guidelines from the Icelandic Data Protection Authority (PV_2017060950þS/–). Informed consent was obtained from all participants, and an external party encrypted all personal identifiers before they were added to the deCODE database. All
ethical regulations relevant to human research participants were followed. Local researchers from deCODE genetics in Iceland were actively involved throughout the research process. The research was developed in collaboration with local partners to ensure its relevance to the Icelandic population and the broader scientific community. The study did not involve any activities that are restricted or prohibited in the researchers’ setting. Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article. The Icelandic RNA-Seq data used in this study are not publicly available due to information that could compromise research participant privacy, and releasing this information publicly is against Icelandic state law. Other data supporting the findings of this study are available from the corresponding authors upon reasonable request. A Correction to this paper has been published: https://doi.org/10.1038/s42003-024-07379-9

RNA splicing is a primary link between genetic variation and disease.
Pre-mRNA splicing in disease and therapeutics.
Recommendations for clinical interpretation of variants found in non-coding regions of the genome.
Pathogenic variants that alter protein code often disrupt splicing.
Improving genetic diagnosis in Mendelian disease with transcriptome sequencing.
Comparison of in silico strategies to prioritize rare genomic variants impacting RNA splicing for the diagnosis of genomic disorders.
CADD-Splice—improving genome-wide variant effect prediction using deep learning-derived splice scores.
CI-SpliceAI—improving machine learning predictions of disease causing splicing variants using curated alternative splice sites.
Romero, D. W. et al. Towards a general purpose CNN for long range dependencies in ND. Preprint at https://arxiv.org/abs/2206.03398 (2022).
Dosovitskiy, A. et al. An image is worth 16x16 words: transformers for image recognition at scale. Preprint at https://arxiv.org/abs/2010.11929 (2020).
Highly accurate protein structure prediction with AlphaFold.
Effective gene expression prediction from sequence by integrating long-range interactions.
Dalla-Torre, H. et al. The nucleotide transformer: building and evaluating robust foundation models for human genomics. Preprint at bioRxiv https://doi.org/10.1101/2023.01.11.523679v1 (2023).
DNABERT-2: efficient foundation model and benchmark for multi-species genome.
FlashAttention: fast and memory-efficient exact attention with IO-awareness.
Child, R., Gray, S., Radford, A. & Sutskever, I. Generating long sequences with sparse transformers. Preprint at https://arxiv.org/abs/1904.10509 (2019).
Dai, Z. et al. Transformer-XL: attentive language models beyond a fixed-length context. Preprint at https://arxiv.org/abs/1901.02860 (2019).
Beltagy, I., Peters, M. E. & Cohan, A. Longformer: the long-document transformer. Preprint at https://arxiv.org/abs/2004.05150 (2020).
Big Bird: transformers for longer sequences.
Efficiently modeling long sequences with structured state spaces.
Sun, Y. et al. Retentive network: a successor to transformer for large language models. Preprint at https://arxiv.org/abs/2307.08621 (2023).
Hyena hierarchy: towards larger convolutional language models.
HyenaDNA: long-range genomic sequence modeling at single nucleotide resolution.
Show, attend and tell: neural image caption generation with visual attention. In International Conference on Machine Learning 2048–2057 (PMLR).
Saccader: improving accuracy of hard attention models for vision.
The GTEx Consortium atlas of genetic regulatory effects across human tissues.
ClinVar: public archive of relationships among sequence variation and human phenotype.
SpliceVault predicts the precise nature of variant-associated mis-splicing.
Predicting RNA splicing from DNA sequence using Pangolin.
GENCODE: the reference human genome annotation for the ENCODE project.
Reinforcement Learning: An Introduction (MIT Press).
Ba, J. L., Kiros, J. R. & Hinton, G. E. Layer normalization. Preprint at https://arxiv.org/abs/1607.06450 (2016).
On layer normalization in the transformer architecture. In International Conference on Machine Learning.
Hendrycks, D. & Gimpel, K. Gaussian error linear units (GELUs). Preprint at https://arxiv.org/abs/1606.08415 (2016).
Annotation-free quantification of RNA splicing using LeafCutter.
Jónsson, B. et al. Supplementary data for ‘transformers significantly improve splice site prediction’. Zenodo https://doi.org/10.5281/zenodo.14109868 (2024).
Jónsson, B. et al. Transformers significantly improve splice site prediction (figure data). figshare https://doi.org/10.6084/m9.figshare.27607056 (2024).
Jónsson, B. et al. Spliceformer: transformer model for splice site prediction. Zenodo https://doi.org/10.5281/zenodo.14019451 (2024).

developed the method and designed statistical experiments. oversaw the processing of RNA-Seq data and analysis of sQTLs in the Icelandic cohort. contributed to writing the final version of the manuscript. All authors are employed by deCODE Genetics/Amgen. Communications Biology thanks Peter ’t Hoen and reviewer(s) for their contribution to the peer review of this work. Primary Handling Editors: Laura Rodríguez Pérez and Johannes Stortz. DOI: https://doi.org/10.1038/s42003-024-07298-9

Splice, valued in 2021 at nearly USD $500 million after securing $55 million in funding, has named pluggnb the fastest-growing music genre on its platform in 2024. The music creation platform has also confirmed that it clocked “nearly 350 million” downloads of its sound samples across all genres last year. The company didn’t say whether that marked growth or decline from 2023.
Back in 2020, when pandemic lockdowns caused indoor activities like music creation to spike in popularity, Splice reported 1.1 million daily downloads, implying an annual pace of more than 400 million at the time. In a new report co-authored with market research firm MIDiA and released on Wednesday (January 22), Splice said pluggnb, a blend of trap subgenre plugg and 1990s R&B, was the fastest-growing genre on its platform in 2024, as measured by downloads of pluggnb sample packs. Splice says it tracks “hundreds” of music genres on its platform. Downloads of pluggnb sample packs jumped 342.8% YoY in 2024, putting the genre ahead of second-place K-pop with 328.2% YoY growth (832,058 downloads). That was followed by house/hip-hop hybrid Jersey club (up 281.3% YoY to 1,298,679 downloads), thanks in part to the genre’s growing popularity among music creators in Berlin. “Unofficial pluggnb remixes dominated TikTok in 2024 and led to adoption of the genre by K-pop heavyweights like LE SSERAFIM and ILLIT,” the report said. It also noted that some genres declined in popularity in 2024, with neo soul recording the largest drop in sample pack downloads, down 46.8% YoY. That was followed by future soul (down 37.8% YoY) and dancehall (down 35% YoY). Pluggnb, K-pop and Jersey club may be the fastest-growing genres, but none of the three made Splice’s list of the top 10 genres by download. The roughly 700,000 downloads of pluggnb sample packs amount to a small fraction of the more than 48 million downloads of hip-hop sample packs. And the report isn’t entirely convinced that pluggnb will be the next big thing in music. “Pluggnb’s rapid ascent may raise questions about its long-term viability and sustainability. It has benefited from trends in an environment – internet culture – where trends are often short-lived,” the report said. “Pluggnb has laid a foundation strong enough to shape the sound of one of K-pop’s biggest hits of 2024. The coming year is a critical window for continued growth.”

Splice’s list of top genres continues to be dominated by more established musical styles, which retained their places as the most popular and second most popular genres. Hip-hop sample packs were downloaded 48.7 million times. Pop music, which had a banner year in 2024, experienced a decline on Splice’s platform, dropping from the third most popular genre in 2023 to fifth place in 2024. “This is not to say that pop is on its way out; instead […] with new regional styles coming to the fore,” the report said. “East Asian offshoots like K-pop and Japanese city pop are growing fast.” In its year-end report for 2024, market monitor Luminate identified pop as the fastest-growing core genre in the US. Luminate attributed pop’s strength to acts like Taylor Swift and other female artists such as Billie Eilish. Splice says its download data can be used to predict coming trends in music, in part because the platform “overindexes” among creators aged 16 to 24, meaning that youth culture trends are easier to spot on the platform than elsewhere. “Splice is uniquely positioned to see the sounds that are driving music production globally. This report gives us a sense of how music producers are bending and blending genres in real time and just what genres and regions will define what we hear in 2025 and beyond,” Splice CEO Kakul Srivastava said. The Managing Director and Music Analyst at MIDiA said there is “perhaps no more forward-looking cultural trend” than sample usage. “The sounds that producers use to create today give us a window not only into the genres that will be big tomorrow but also early indications of entirely new genres that will emerge from the process of mixing
samples across styles and cultures,” he said. “The genres that stand out in this report also underline wider trends: the growing importance of scenes and fan remixing in shaping the sounds of the future.”

Mutation or deletion of the U1 snRNP-associated factor LUC7L2 is associated with myeloid neoplasms, and knockout of LUC7L2 alters cellular metabolism. We show that members of the LUC7 protein family differentially regulate two major classes of 5′ splice sites (5′SS) and broadly regulate mRNA splicing in both human cell lines and leukemias with LUC7L2 copy number variation. We describe distinctive 5′SS features of exons impacted by the three human LUC7 paralogs: LUC7L2 and LUC7L enhance splicing of “right-handed” 5′SS with stronger consensus matching on the intron side of the near invariant /GU, while LUC7L3 enhances splicing of “left-handed” 5′SS with stronger consensus matching upstream of the /GU. We validated our model of sequence-specific 5′SS regulation both by mutating splice sites and by swapping domains between human LUC7 proteins. Evolutionary analysis indicates that the LUC7L2/LUC7L3 subfamilies evolved before the split between animals and plants. Analysis of Arabidopsis thaliana mutants confirmed that plant LUC7 orthologs possess similar specificity to their human counterparts, indicating that 5′SS regulation by LUC7 proteins is highly conserved.

C Representative sequence logos of 5′SS for all cassette exons (left) and exons with significantly increased or decreased inclusion upon depletion of LUC7L2 or LUC7L3. The 5′SS Balance score is defined as the number of consensus bases at positions +4/+5/+6 minus the number of consensus bases at –3/–2/–1 of a 5′SS. E Mean phyloP score for human 5′SS of each possible 5′SS Balance score. F Log-odds estimates of 5′SS differential inclusion versus skipping as a function of the 5′SS Balance score for each human LUC7 RNA-seq dataset. Point estimates are derived from 100 bootstrap samples, each containing 500 differentially included
(FDR < 0.1, dPSI > 0) and 500 differentially skipped (FDR < 0.1) exons. Error bars represent the 95% confidence intervals, calculated as the standard deviation of the 100 bootstrap estimates. Source data are provided as a Source Data file.

A non-constitutively bound U1 snRNP auxiliary factor presented the strongest metabolic phenotype and altered the splicing of several hundred exons. The shift toward OXPHOS could be partially explained by changes in the splicing of key metabolic genes and their broad impact on the transcriptome. Human LUC7 proteins are absent from all published cryo-EM spliceosome structures. LUC7L2’s influence on splicing and cellular metabolism may contribute to leukemogenesis. The molecular functions of the LUC7 paralogs remain incompletely understood. We investigated the distinct roles of LUC7s in pre-mRNA splicing. We found that different human LUC7 paralogs broadly impact the splicing of thousands of exons in a predictable manner. Experiments and analyses in a broad range of systems demonstrate that two subfamilies of LUC7 proteins regulate two newly defined classes of “left-handed” (LH) and “right-handed” (RH) 5′SS in opposing manners, helping to explain the distinct phenotypes of these paralogs. These observations suggest that 5′SS strength contributes at most minimally to exon responsiveness to LUC7 perturbations, supporting the conclusion that the 5′SS sequence of the impacted cassette exon is the primary feature that predicts LUC7 regulation. Position +3 is ignored for the time being so that Balance scores are centered on 0, where LH 5′SS with more consensus matching on the exon side receive negative scores and RH 5′SS with more consensus matching on the intron side receive positive scores. “Balanced” exons with equal consensus matching on both sides of the /GU motif receive a score of 0. These results imply that LH and RH 5′SS subclasses are abundant and stably maintained as subclasses of 5′SS, supporting our conclusions that the splicing of LH 5′SS is promoted by
LUC7L3, while RH 5′SS are promoted by LUC7L2 and LUC7L. To systematically identify 5′SS motifs impacted by each LUC7 protein, we measured the impact of LUC7 proteins on the enrichment of each 5′SS 9mer (spanning positions –3 to +6) in exons differentially included or skipped. The enrichment analysis uses a Dirichlet-multinomial model to approximate the log-odds of a given 5′SS sequence occurring in significantly included or skipped exons versus unchanged exons within an RNA-seq experiment. Positive enrichment values indicate over-representation of a 5′SS sequence in included exons and negative values indicate enrichment in skipped exons, while values near zero indicate the absence of bias for inclusion/skipping (or the presence in unchanged exons only).

A Volcano plot of 5′SS enrichment scores of individual 5′SS 9mers (each dot represents a 9mer) from LUC7 meta-analysis (left). Sequence logos derived from significant 9mers (qvalue < 0.01) from LUC7 meta-analysis (right). B Receiver operating characteristic (ROC) curve of the LUC7 score’s ability to discriminate held-out differentially included vs skipped splicing events (FDR < 0.1). C Distribution of LUC7 score and frequency for distinct 9mer 5′SS sequences (with /GT) in Gencode human genome protein-coding exons. D Heat map showing Pearson correlation between new measures (5′SS Balance and LUC7 score) versus standard 5′SS measures for /GT donors in protein-coding exons.

The individual PWMs for the LH and RH motifs generated from LUC7 protein RNA-seq data are weakly correlated with the predicted minimum free energy of interaction with U5 and U6 snRNAs. These findings indicate that the LUC7 score quantitatively summarizes the sequence features of exons whose splicing depends on different LUC7 family members. Sequence alignment of the LUC7L2 AE2 and LUC7L PTC exons revealed that they are 82% identical at the nucleotide level, implying a common evolutionary origin; no comparable exon was found in LUC7L3.

A Schematic of the pSpliceExpress minigene construct used, in which
an internal exon of interest with flanking splice sites is inserted into a minigene expressing rat insulin exons 2 and 3 along with the intervening intron. B Bar plots of the cassette exon’s mean percent spliced in (by RT-PCR with primers in flanking exons) across all experiments; bars are color-coded to match LUC7 paralog colors in (E). C Representative gel images from LUC7-related minigenes. Percent Spliced In values are shown at the bottom of each lane. D Mean PSI for the LUC7L2 AE2 minigene with mutagenized 5′SS (left) and representative RT-PCR gel images (right). E Proposed regulatory relationships between human LUC7 family members. F Bar plot of PSI for the SNRPC E2 minigene with wildtype RH 5′SS (left). G Bar plot of PSI for the XPA E3 minigene with wildtype LH 5′SS (left). For all minigene experiments reported (panels B, …), error bars represent the standard error of the mean.

These results indicate that RH 5′SS are necessary, but not sufficient, to confer positive regulation by LUC7L or LUC7L2, but may be sufficient to confer repression by LUC7L3 (Fig.
3G). Our mutagenesis experiments suggest that the “handedness” (LH or RH character) of the 5′SS sequence is a key determinant of regulation by LUC7 family members.

A eCLIP enrichment of LUC7L2 around windows of constitutive and cassette exon splice sites, with crosslinks aggregated into 10 nt bins (top). Pearson correlation between LUC7 score and LUC7L2 eCLIP enrichment is shown, with significant correlations (p < 0.05) identified by Pearson’s correlation test. B Protein domain structure of human LUC7 proteins and experimentally investigated chimeric proteins. C Change (“delta”) in percent spliced in (qRT-PCR) from transfected minigenes containing different internal exons of introns following overexpression of LUC7 WT or chimeric cDNA (shown at right). Mean is plotted with error bars representing standard error of the mean, and bars are color-coded by the Balance score of the internal exon’s 5′SS. D Heat map of Pearson correlations of delta percent spliced in values for each LUC7 WT or chimeric cDNA overexpressed in (C).

Our results indicate that the C-terminal regions of these paralogs perform similar functions.

A Normalized gene expression values for the human LUC7 family in LAML samples, colored by whether samples possess a LUC7L2 CNV loss (CNV log2 value < –0.5). A two-sided t-test was used to test differences in LUC7 paralog expression between LUC7L2+/+ (n = 159) and LUC7L2 CNV loss (n = 14). B Overlap of LUC7L2 CNV loss samples and LUC7L2Low expression samples. C Mean dPSI per 5′ splice site sequence for differentially spliced exons when comparing LUC7L2Ctrl versus LUC7L2Low expression samples. A linear regression line (black) with a 95% confidence interval is shown, and the Pearson correlation and Pearson correlation test are displayed. D Gene set enrichment analysis of differentially expressed genes comparing LUC7L2Low versus LUC7L2Ctrl expression samples. E Copy-number variation analysis for LUC7L2 and LUC7L3 loci across all TCGA cancer types. Bonferroni-corrected two-sided Wilcoxon-rank sum
test. Boxplots display the data distribution within each group. The box represents the interquartile range (IQR), spanning from the 25th percentile (Q1) to the 75th percentile (Q3), with the horizontal line inside the box indicating the median (50th percentile). The whiskers extend to the most extreme data points within 1.5 times the IQR from Q1 and Q3.

These analyses show that LUC7L2Low AMLs inefficiently splice RH 5′SS relative to LH 5′SS, supporting a role for reduced LUC7L2 levels in shaping the transcriptomes of these tumors. Kidney renal papillary cell carcinoma (KIRP) had increased copy number of LUC7L3. Some of these observations reflect well-established chromosomal aberrations commonly found in specific tumor subtypes. One such aberration, which is common in GBM and SKCM and associated with EGFR amplification, would also yield an additional copy of LUC7L2 in these tumors. Changes in LUC7 expression due to CNVs may generally contribute to the observed splicing variation found in many cancer subtypes and might contribute to metabolic changes as well.

A Maximum likelihood phylogenetic tree built from a multiple sequence alignment of Luc7 proteins from 33 animal […], adding the Trichomonas vaginalis Luc7 protein as an outgroup. Two main clusters are shaded to indicate Luc7 subfamilies; individual proteins are represented by symbols indicating clade of origin. B Representation of presence/absence and likely duplication/loss events for Luc7 subfamilies, overlaid on the eukaryotic phylogenetic tree. C Correlation matrix of 5′SS models learned from dinucleotide features of 5′SS differentially spliced in human and plant Luc7 RNA-seq experiments. D Sequence logos of the top or bottom 10% of unique 5′SS sequences identified by dinucleotide 5′SS models. E Residue conservation estimated from multiple sequence alignment of LUC7L2-type proteins (from 6A) projected onto the yLUC7p (from Bai et al.
To assess whether evolutionarily related LUC7 proteins possess analogous 5′SS specificities, we performed RNA-seq on every possible combination of Arabidopsis thaliana luc7 single, double and triple mutants and carried out differential splicing analysis. No bias for LH or RH 5′SS in differentially skipped or included exons was observed in the luc7 triple mutant. These observations indicate that the two subfamilies of LUC7 proteins in plants have distinct activities on 5′SS subclasses. While the 5′SS subclasses impacted by human and plant LUC7 proteins are very similar overall, we do observe some subtler species-specific differences. For example, Arabidopsis LUC7RL promotes LH 5′SS with –1 G, rather than LH 5′SS with a –2 A/–1 G pair promoted by human LUC7L3 (Fig. 6D). Our observations support the view that orthologous human and A. thaliana LUC7 proteins have largely retained their ancestral specificities for specific 5′SS subclasses over 1.5 billion years of evolution.

A Mean position-specific information content (calculated as in Irimia et al., 2019 (ref. 38)) of GT-type 5′SS motifs, color-coded by animals (n = 13). Error bars represent standard error of the mean. B Density plot of the distribution of 5′SS subclass frequencies in each organism; the vertical dashed line reflects the mean LUC7 score of the clade. C Frequency of dinucleotide features that define classical LH and RH 5′SS subclasses in 5 representative eukaryotes. Source data are provided as a Source Data file.

These data suggest long-term coevolution between LUC7 subfamilies and 5′SS subclasses, with depletion of LH 5′SS and concomitant loss of the LUC7L3-subfamily occurring early in fungal evolution. Our minigene experiments validate that the influence LUC7 proteins have on pre-mRNA splicing is dependent on specific nucleotide features of 5′SS, which are succinctly summarized by the 5′SS Balance score. Our domain-swapping experiments reveal that LUC7 structured regions are largely sufficient to confer specificity for 5′SS subclasses. The most straightforward model is a U1
stabilization model in which the LUC7L/LUC7L2 structured regions stabilize the interaction of U1 with RH 5′SS and the LUC7L3 structured region stabilizes U1 interactions with LH 5′SS. An alternative is a model in which LUC7L/LUC7L2 and LUC7L3 preferentially destabilize interactions with LH and RH 5′SS, respectively. These findings are consistent with our own experimental and evolutionary observations, in which the functionally similar LUC7L and LUC7L2 promote splicing of exons with fungal-like RH 5′SS, while the human LUC7L3 ZnF2 promotes recognition of 5′SS unlike those seen in the budding yeast genome. Regulation of 5′SS subclasses at later stages of splicing is also possible. The absence of U1-associated proteins like LUC7L3 from S. pombe suggests that these common model organisms of splicing may not recapitulate all aspects of mammalian 5′SS choice. The reasonably well-defined rules of pre-mRNA splicing make it a compelling target for the therapeutic modulation of gene expression. Our findings linking the activity of LUC7 proteins with specific 5′SS sequence features may have implications for the future advancement of small molecule regulators of splicing. Our proposed mechanism, in which LUC7 proteins modulate splicing via recognition of U1:5′SS RNA duplex structures, implies that LUC7 proteins will likely influence the specificity of small molecules that stabilize U1 snRNP:5′SS interactions. The synthetic lethal relationship between LUC7L and LUC7L2 suggests that splicing therapeutics specifically targeting the recognition of RH 5′SS will be more effective in AML patients with monosomy 7 or LUC7L2 mutations.

LUC7L2 and LUC7L3 ORFs were amplified from human cDNA and cloned into pcDNA3.1(+)IRES GFP (Addgene #: 51406). Domain swap constructs were synthesized as gBlocks from IDT and cloned into pcDNA3.1(+)IRES GFP. Exons and flanking intronic regions used for pSpliceExpress minigenes were PCR amplified from human male genomic DNA using primers with attB overhangs and subsequently recombined into
pSpliceExpress using BP Clonase II (Thermo Fisher). HEK293T cell line authentication was performed at ATCC using STR profiling and referenced to ATCC’s internal database. HEK293T RMCE cell lines were cultured in Advanced DMEM supplemented with 5% FBS, 25 mM HEPES and GlutaMAX, and tested negative for mycoplasma. Cells were plated 24 h in advance in 24-well plates. Cells were transfected with 500 ng of 95:5 w/v of a cDNA overexpression vector and minigene reporter, respectively, using Lipofectamine LTX (Thermo Fisher). RNA was extracted 24 h after transfection using the Qiagen RNeasy Mini kit (cat 74104) according to the manufacturer’s instructions, with the optional on-column DNase digestion. RNA was eluted in nuclease-free water and quantified using Nanodrop. We used 125 ng of RNA as input into a 12.5 µL LunaScript Multiplex One Step Master Mix reaction for RT-PCR. We mixed PCR samples with NEB 6X loading dye and loaded 5 µL of PCR products on a 3% agarose gel infused with ethidium bromide. Images were acquired using an Azure Biosystems c600 with UV imager. Agarose gel images were manually quantified using chromatograms in ImageJ. Percent Spliced In values were calculated by dividing the signal intensity of the larger band (the included product) by the sum of the signal intensities of the included product and the skipped product.

LUC7L2 and LUC7L3 ORFs were transfected into HEK293 RMCE cells as described above. RNA was eluted in nuclease-free water and quantified using Nanodrop. Illumina-compatible libraries were prepared by MIT BioMicroCenter using NEB II Ultra Directional RNA with poly(A) selection and sequenced on NovaSeq 6000 with 2 × 150 bp reads.

Only a subset of exons likely change in splicing. Effect size estimates are direct summaries of the read counts and are affected by sampling variation in the read counts, which may artificially inflate changes for exons with low read counts. Shrinkage estimates are used in differential expression analyses to account for this issue. Shrinkage considers the set of all effect sizes to
constrain the noise in estimates from low read count events. Shrinkage requires parameter estimates and their associated uncertainty. We reconstruct the effect size in log-odds scale δ from the rMATS read counts and approximate a standard deviation σ describing the uncertainty in δ using the rMATS p-value. We pass these effect sizes and standard deviations to ashr using the ‘normal’ option, which assumes that the proportions of up- and down-regulated exons are equal (Stephens). We reconstruct an estimate of Δψ∗ (see Supplemental Methods).

We used a Dirichlet-multinomial model to calculate the log-odds of whether a given 5′SS was more likely to be involved in a significantly included event vs a significantly skipped event. We excluded all events with fewer than 10 junction-count reads on average across all samples. We combined the counts of 5′SS from significantly included and skipped events (FDR < 0.1) and their respective background sets, which consisted of an equal number of unregulated 5′SS that were matched for both PSI and expression level. We added a pseudocount of 1 to every observed 5′SS sequence and accounted for class imbalance by dividing each column by a weight that reflected the fraction of included or skipped events. For example, if there were 4000 significantly included events and 1000 significantly skipped events, the significantly included counts and their respective background set were divided by 0.8, and the skipped events and their respective background were divided by 0.2. We used this count matrix as the alpha parameters for the Dirichlet-multinomial model and simulated drawing from the posterior distribution 2500 times. For each draw, we calculated the log-odds of a given 5′SS being enriched in the included versus skipped set. The posterior distribution of log-odds generated from the Dirichlet-multinomial model was used to calculate the posterior mean and the posterior standard deviation, which were both passed to ashr for shrinkage using the uniform option. We plotted the PosteriorMean estimates in
5′SS enrichment plots, which can be directly interpreted as the log-odds of a given sequence occurring in the differentially included exon set over the differentially skipped exon set. We used scaled 5′SS enrichment scores to cluster 5′SS sequences by their activity across each LUC7 paralog RNA-seq dataset using Euclidean distance and ward.D2 linkage.

The events were then aggregated into a single table such that each RNA-seq dataset was equally represented in the final 5′SS enrichment analysis. To account for opposing effects on LH and RH 5′SS subclasses for different experiments, the direction of differentially included and skipped events (and their respective background sets) from the LUC7L3 KD, LUC7L OE and LUC7L2 OE analyses was flipped such that included events represented RH 5′SS and skipped events represented LH 5′SS. Then we performed a 5′SS enrichment analysis as described above, using a Dirichlet-multinomial model and simulated drawing from the posterior 10,000 times. The distribution of log-odds generated from the Dirichlet-multinomial model was used to calculate the posterior mean and the posterior standard deviation. We plotted the PosteriorMean estimates in 5′SS enrichment plots and the associated qvalue. LH 5′SS (LUC7L2-repressed/LUC7L3-promoted) were defined as 5′SS with negative meta-5′SS enrichment scores with qvalue < 0.01, and RH 5′SS (LUC7L2-promoted/LUC7L3-repressed) were defined as 5′SS with positive meta-5′SS enrichment scores with qvalue < 0.01.

Sequence logos were created from LH and RH 5′SS sequences identified from meta-5′SS enrichment analyses. Individual PWMs were created by calculating the observed frequency of a nucleotide at a given position, assuming a uniform nucleotide distribution. Pseudocounts of 0.1 were used to avoid division by zero. The LUC7 score was calculated by taking the ratio of the LUC7L2-promoted/LUC7L3-repressed RH PWM over the LUC7L3-promoted/LUC7L2-repressed LH PWM. We used ViennaRNA 2.534 to model free energy predictions between all
5′SS and the 5′ end of U1 snRNA (ATACTTACCUG), U6 snRNA (ATACAGAGA) and U5 snRNA loop 1 (GCCUUUUAC) using RNAcofold with default parameters.

Publicly available LUC7L2 eCLIP data was downloaded from European Read Archive (PRJNA663333). 10 nucleotide UMIs were extracted from reads and appended to the read name. Then Illumina adaptors were removed from reads using cutadapt with the following settings (-a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA --minimum-length 18 --quality-cutoff 6 --match-read-wildcards -e 0.1). This was performed twice to ensure the complete removal of adaptor sequences. We created a genomic index using GRCh38.primary_assembly.genome.fa and gencode.v38.primary_assembly.annotation.gtf with --sjdbOverhang 65. Trimmed eCLIP reads were then aligned to this genome using STAR with default settings, and the resulting bam files were then deduplicated with umi-tools. Crosslink counting was restricted to cassette exons observed in LUC7L2 KO datasets around 250 nt windows surrounding four splice sites (upstream 5′SS, cassette exon 5′SS and downstream 3′SS), and crosslinks in 10 nt bins were summed together. 2 pulldowns and 2 size-matched (SM) inputs were considered. Crosslink counts in each library were first normalized by library size. Cassette exons were then stratified into 10 bins using their 5′SS LUC7 score (1 = most LH, 10 = most RH), and the number of crosslinks for each LUC7 score bin were summed together. Position-based eCLIP enrichment was calculated as the ratio of normalized eCLIP crosslinks to normalized SM-input crosslinks at each 10 nt bin. eCLIP enrichment within each 10 nt bin was then correlated with the mean LUC7 score of binned cassette exons.

For each of the species used in our phylogenetic analysis, we downloaded their genomes and associated gene annotation files from Ensembl and extracted their splice site sequences using bedtools 2.29.2. We filtered for only /GT splice sites and calculated the information content per position of the 5′SS.

luc7a-1 luc7rl-1 and
luc7b-1 luc7rl-1) and a luc7 triple mutant (luc7a-2 luc7b-1 luc7rl-1) were surface sterilized with chlorine gas and then grown on half-strength Murashige and Skoog (MS) plates containing 0.8% phytoagar in continuous light at 22 °C for 10 days. Seedlings were collected and flash-frozen in liquid nitrogen. Total RNA was isolated using the RNeasy® Plant Mini Kit (Qiagen 74904) according to the manufacturer’s instructions. mRNA stranded library preparation and sequencing (PE150) was done by Novogene (Cambridge, United Kingdom) using an Illumina Novaseq6000 system. To validate the alternative splicing found in RNA-seq, 1 µg of RNA was treated with DNase I (Thermo Fisher Scientific). cDNA synthesis was carried out using the RevertAid First Strand cDNA Synthesis kit (Thermo Fisher Scientific) with 100 µM oligo-dT. RT-PCR was then performed using Taq DNA Polymerase, and the products were analyzed on a 2% agarose gel.

Gene set enrichment was performed in the Xena browser web portal. The same 16 samples identified as LUC7L2 low-expressing AMLs were selected and compared to the remainder of the cohort with available data.

To identify and score dinucleotide features contributing to 5′SS inclusion or skipping, we calculated the frequency of adjacent and non-adjacent dinucleotide pairs in the 5′SS of differentially spliced exons, normalized by the background frequency from a set of unregulated exons. The log-odds ratio of a dinucleotide feature’s occurrence in differentially included versus skipped exons was calculated, and the variance was estimated by sampling the unregulated set 100 times. The resulting mean and standard deviation were shrunk using the ashr package, and only significant dinucleotides (q-value < 0.01) were retained. The posterior log-odds of dinucleotides in the 9mer were summed to score the 5′SS.

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

The code required to perform shrinkage on differential splicing analyses can
be found at https://gitlab.com/LaptopBiologist/spliceformats.

We thank the staff of the MIT BioMicro Center for Illumina NovaSeq library preparation and sequencing and Gordon Simpson for their helpful discussions. This work was supported by grant GM085319 from the NIH (C.B.B.) and DFG grant LA2633-4/2 and 400681449/GRK2498 TP13 (S.L.)
and analyzed all experiments and computational analyses under the supervision of C.B.B.; conducted mouse and human evolutionary analyses under the supervision of C.J.K.; assisted with statistical approach development and eCLIP analysis; performed Arabidopsis RNA-seq and qRT-PCR experiments; wrote the manuscript with input from all authors. All authors contributed to manuscript revisions.

[…] is a member of the Scientific Advisory Board of Remix Therapeutics and has equity interests in Remix Therapeutics and Arrakis Therapeutics: both companies are developing small molecule therapeutics targeting RNA. The authors claim no other competing interests with respect to this work.

DOI: https://doi.org/10.1038/s41467-025-56577-4

Splice site recognition is essential for defining the transcriptome. Drugs like risdiplam and branaplam change how human U1 snRNP recognizes particular 5′ splice sites (5′SS) and promote U1 snRNP binding and splicing at these locations. Despite the therapeutic potential of 5′SS modulators, the complexity of their interactions and snRNP substrates has precluded defining a mechanism for 5′SS modulation. We have determined a sequential binding mechanism for modulation of −1A bulged 5′SS by branaplam using a combination of ensemble kinetic measurements and colocalization single molecule spectroscopy (CoSMoS). Our mechanism establishes that U1-C protein binds reversibly to U1 snRNP and branaplam binds to the U1 snRNP/U1-C complex only after it has engaged with a −1A bulged 5′SS. Obligate orders of binding and unbinding explain how reversible branaplam interactions cause formation of long-lived U1 snRNP/5′SS complexes […] and its action depends on fundamental properties of 5′SS recognition.

It is thought that these drugs enhance U1 snRNP affinity for the SMN2 exon 7 5′SS, which in turn promotes spliceosome assembly. These drugs appear to modulate the SS recognition process to ultimately alter the nucleotide sequences of spliced mRNA products by
how quickly U1 can associate with a 5′SS and how long it may remain bound in order to promote spliceosome assembly before the RNA is degraded. Splice site modulation by drugs such as branaplam is likely also restricted to the same […].

To elucidate 5′SS recognition and modulation in humans, we reconstituted a model human U1 snRNP and assayed its interactions with RNA oligos in the presence and absence of branaplam. A combination of surface plasmon resonance (SPR), microscale thermophoresis (MST) and colocalization single molecule spectroscopy (CoSMoS) assays reveals how 5′SS containing a bulged adenosine at the −1 position (−1A) are recognized and modulated by drugs working collaboratively with protein splicing factors. Branaplam reversibly binds to the U1 snRNP/5′SS complex, and drug modulation of this complex is strictly dependent on reversible binding of U1-C. U1-C, in turn, can only bind to the snRNP if the 5′SS has not yet been engaged. Our sequential binding mechanism predicts that 5′SS modulation by branaplam depends on an ordered series of events: U1-C binds to U1 snRNP […] and finally branaplam binds to the U1 snRNP/U1-C/5′SS ternary complex. This mechanism reveals how a reversibly binding splicing modulator can elicit formation of long-lived U1 snRNP/5′SS interactions as well as fundamental features of human 5′SS recognition.

(A) Crystal structure of minimal U1 snRNP (PDB: 4PJO). Arrows indicate relative placement of modifications for single molecule measurements. […] Shaded region indicates predicted base pairing interactions with the U1 snRNA (shown at the top above a schematic of the exon (box)/intron (line) junction). (C) SPR sensorgrams showing the association and dissociation of surface-tethered (top) 11bp and (bottom) 11bp-1A RNAs at various U1 snRNP concentrations (0.02 to 100 nM). (D) Cartoon schematic of the two-color CoSMoS assay for monitoring U1/RNA interactions. […] immobilized U1 snRNP molecules) and 532 nm (right, interacting RNA oligos) excitation (scale bar = 20 µm). Inset
highlights colocalization (scale bar 1 µm). Fluorescent beads were included as fiducial markers (yellow arrow). Images are rendered by averaging three consecutive images and applying uniform brightness and contrast values. (F) Fluorescence in arbitrary units (au) across time in seconds (s) showing the binding of 9bp (top) and 9bp-1A (bottom) to surface-tethered U1 snRNP (0.33 frames/s). (G) Linear regression (solid line) of keq values (circle) on 9bp concentration. The shaded region is the 95% confidence interval of the linear regression. The kon value is fixed from maximum likelihood estimations of unbound dwell times (see F) at kon = 3.9 × 10⁶ M⁻¹ s⁻¹. (H) Cumulative probability distributions of (left) unbound dwell times (0.5 nM, N = 4373) and (right) bound dwell times (0.5 nM, N = 4895) across a range of 9bp-1A RNA concentrations. (I) Cumulative probability distribution of bound dwell times of the 9bp-1A RNA at 1 nM (grey, N = 5216) overlaid with MLE of single (blue dashed) and biexponential (red) distributions. Source data are provided as a Source data file.

We used single-molecule co-localization spectroscopy (CoSMoS) to observe U1 snRNP/5′SS interactions. We reconstituted our U1 snRNP particle without U1-C. The U1-C zinc-finger domain was then separately purified and added directly into solution as required (typically at 100 nM unless otherwise noted; U1-C binding was also analyzed in depth as described subsequently). […] This demonstrates that non-specific binding did not meaningfully impact our analysis under the experimental conditions. The slope was constrained at the kon value from MLE, resulting in a koff = 6.4 ± 2.3 × 10⁻⁴ s⁻¹. We estimate a KD ≈ 1.2 × 10⁻¹⁰ M for a 9bp 5′SS RNA, which closely matches our SPR data of an 11bp 5′SS RNA (KD = 2.03 × 10⁻¹⁰ M). The similarity of these values provides high confidence that neither photobleaching nor surface immobilization is significantly impacting our analysis. Together, the SPR and CoSMoS data indicate that U1 snRNP binds highly complementary
RNAs very tightly with a KD of ~100 pM, and this affinity is primarily attributable to formation of very stable bound complexes with lifetimes of ~27 min rather than rapid association kinetics. These data also indicate that additional base pairing interactions between the +7 and +8 positions of a 5′SS with the AU dinucleotide present at the 5′ end of the snRNA do not necessarily confer significant changes in the dissociation constant.

At the single molecule level, the introduction of the −1A bulge (9bp-1A) into the 5′SS results in dynamic RNA binding to U1 snRNP (Fig. 1F, Supplementary Fig. 7). Given the faster kinetics from the weaker binding, we were able to perform equilibrium measurements whereby imaging commenced after equilibrium was reached. We see only a slight decrease in kon to 2.9 ± 0.2 × 10⁶ M⁻¹ s⁻¹, indicating that the energetic penalty of the −1A bulge stems from duplex stability and not recruitment. The 9bp-1A 5′SS RNA exhibits a koff of 3.4 ± 0.2 × 10⁻³ s⁻¹ and KD = 1.2 × 10⁻⁹ M. These results are also close to our SPR data for the 11bp-1A 5′SS RNA and confirm an order of magnitude reduction in kinetic stability from the −1A substitution.

(A) RNA oligos containing −1A bulged 5′SS. Shaded region indicates predicted base pairing interactions with the U1 snRNA (top). […] (C) SPR showing the association and dissociation of 11bp-1A RNA across various concentrations of branaplam (0–5 µM) and at 10 nM U1 snRNP. (D) Dose response curve showing the fitted dissociation rates (koff) of response units vs branaplam concentration for 11bp-1A (white circles) and 11bp (black circles) RNAs. Data for 11bp-1A is overlaid with the fitted equation to determine the EC50 value (1.3 ± 0.1 µM). (E) Fluorescence in arbitrary units (au) across time in seconds (s) of 1 nM 9bp-1A RNA binding to immobilized U1 in the presence of 100 nM U1-C plus (top) DMSO (0.33 Hz) or (bottom) 10 µM branaplam (0.11 Hz), overlaid with idealizations (black lines). (F) Dose response curves of 9bp-1A RNA binding to U1 snRNP vs branaplam
concentration. The fraction bound at each branaplam concentration (white circle) is overlaid with a fitted equation (solid red line) and the 95% confidence interval of the fitted equation (shaded red region) to estimate an EC50 value (0.45 ± 0.17 µM). (G) Cumulative probability distribution of bound dwell times across branaplam concentrations (DMSO […]). (H) Cumulative probability distribution of bound dwell times at 1 nM 9bp-1A and 100 nM U1-C (grey circles, N = 3640) in the presence of 1 µM branaplam, overlaid with MLE of mono (blue dashed) and biexponential distributions (solid red). (I) Maximum likelihood estimates of (left) time constants (\(\tau_{B}^{1}\) and \(\tau_{B}^{2}\)) and (right) amplitude of \(\tau_{B}^{2}\) of a biexponential distribution for bound dwell times at 1 nM 9bp-1A RNA and 100 nM U1-C across branaplam concentrations (mean ± SEM). Plotted parameters are computed across all single molecules for each branaplam concentration (DMSO […]). (J) Contour plots showing the correlation between successive bound event durations (\(i\) and \(i+1\)) within individual molecules.

Consistent with branaplam specifically perturbing 5′SS RNA dissociation, an order of magnitude increase in \(\tau_{B}^{2}\) for the 9bp-1A 5′SS RNA is observed between the absence (342 ± 8 s) and presence of 10 µM branaplam (4426 ± 458 s). The SPR and CoSMoS data indicate that branaplam does not facilitate −1A 5′SS RNA association and primarily functions to stabilize the U1 snRNP/−1A 5′SS RNA complex.

Only a subset of U1 snRNP/−1A 5′SS interactions are branaplam-sensitive. This result suggests the presence of two types of U1 snRNP molecules on the surface. We were able to robustly detect the smaller population due to the analysis of many thousands of single molecule binding events and by avoiding ensemble averaging, which would have obscured their presence. The relative amplitudes from MLE of our bound time distributions in the presence of branaplam would presumably correspond to the fractions of U1 snRNP
molecules without (the amplitudes of the short-lived parameters) and with U1-C (the amplitudes of the long-lived parameters).

(A) Fluorescence in arbitrary units (au) across time in seconds (s) of 1 nM 9bp-1A binding in the absence of U1-C with (top) DMSO and (bottom) 10 µM branaplam, overlaid with idealizations (black lines). (B) Violin plots of bound dwell time distributions at 1 nM 9bp-1A RNA with and without U1-C and/or branaplam. DMSO was included in the absence of branaplam. (C) Change in fluorescence due to branaplam binding to a duplex of 11bp-1A and U1 snRNP in the absence (grey triangles) and presence (white circles) of U1-C by microscale thermophoresis (MST). The change in fluorescence in the presence of U1-C is overlaid with a fitted equation (solid purple) and 95% confidence interval of the fitted equation (shaded purple region) to estimate a KD value (2.69 ± 0.36 µM). (D) Violin plots showing the bound dwell time distributions across different permutations of U1-C and branaplam concentrations in solution across indicated RNA oligo sequences (SMN2 […]). Each violin plot is overlaid with a box plot that shows the median (horizontal line) and whiskers representing data within 1.5 × IQR. Highlighted nucleotides in the 5′SS sequences above each plot indicate predicted base pairs to the U1 snRNA. The lower case letters in HTT* indicate +7 G:A and +8 G:U substitutions included to enable RNA synthesis. Numbers above the violins indicate the number of bound lifetimes included in each distribution.

These data show that branaplam binds a −1A bulged U1 snRNA/5′SS duplex only in the presence of the U1 snRNP and U1-C and support the U1-C-first model.

The endogenous sequence proved to be synthetically intractable due to a stretch of guanine bases; we substituted guanines at the +7 and +8 positions with UA (HTT*). All four 5′SS display weaker binding to U1 and no effect upon addition of branaplam in the absence of U1-C. […] we were not even able to observe enough binding events above background
to determine a bound lifetime in the absence of U1-C. Our combined single molecule and ensemble data show that U1-C must be present for branaplam to bind to and modulate U1 snRNP/−1A 5′SS complexes and that the extent of binding enhancement is sequence-dependent.

(A) Dose response curves of 9bp-1A RNA binding to U1 snRNP vs […]. The red line and shading indicate the fit and 95% CI to the fitted equation to estimate an EC50 value (1.3 ± 0.20 nM). (B) Cumulative probability distribution of unbound dwell times at 1 nM 9bp-1A RNA across U1-C concentrations (0 nM […]). (C) Unbound time constants (\(\tau_{U}\)) determined from MLE of a monoexponential distribution for unbound dwell times, overlaid with a fit to the fitted equation (EC50 = 1.8 ± 0.5 nM). Plotted \(\tau_{U}\) values (circles) are shown as mean ± SEM and are computed across all single molecules for each U1-C concentration (0 nM […]). The fitted equation is shown as the fit (solid line) and 95% confidence interval of the fit (shaded region). (D) Cumulative probability distribution of bound dwell times at 1 nM 9bp-1A RNA across U1-C concentrations (0 nM […]). (E) Cumulative probability distribution of bound dwell times at 1 nM 9bp-1A RNA and 100 nM U1-C (grey circles […]) overlaid with MLE of mono (blue dashed) and biexponential distributions (solid red). (F) MLE of bound time constants (\(\tau_{B}^{1}\) and \(\tau_{B}^{2}\), left) and amplitude of \(\tau_{B}^{2}\) (right) of a biexponential distribution for bound dwell times at 1 nM 9bp-1A RNA across U1-C concentrations (mean ± SEM). The amplitudes of \(\tau_{B}^{2}\) are overlaid with a fit to the fitted equation (EC50 = 1.0 ± 0.2 nM). Plotted parameters and errors are computed across all single molecules for each U1-C concentration (0 nM […]). (G) Contour plots showing the correlation between successive bound event durations (\(i\) and \(i+1\)) within individual molecules. Source data are provided as a Source data file.

We see the association rate at 1 nM 9bp-1A 5′SS RNA double between the
absence and presence of saturating U1-C. The 9bp-1A 5′SS RNA duplex exhibits \(\tau_{B}^{1}\) ≈ 30 s and \(\tau_{B}^{2}\) ≈ 330 s. U1 snRNP can still form the longer-lived complexes with this RNA; however, these are rare relative to the shorter-lived interactions. The amplitude of \(\tau_{B}^{1}\) is dominant at low concentrations of U1-C, but \(\tau_{B}^{2}\) dominates at high concentrations. The change in the amplitude of \(\tau_{B}^{2}\) across U1-C concentrations yielded an EC50 = 1.0 ± 0.2 nM.

To test whether these two time constants reflect dynamic association/dissociation of U1-C with a kinetically homogeneous population of U1 snRNP molecules, we correlated the dwell time durations of successive binding events within individual immobilized U1 snRNPs (Fig. 4G). At the extremes of either no U1-C or saturating U1-C, bound lifetimes of the 9bp-1A 5′SS RNA predominantly align to a single cluster at the level of individual molecules, corresponding to either faster \(\tau_{B}^{1}\) durations or slower \(\tau_{B}^{2}\). […] We observe both short and long events corresponding to dynamic interconversion of the two time constants, which likely stems from the association and dissociation of U1-C. Higher concentrations of U1-C increase the probability of binding to U1 snRNP and the probability of observing a more stable duplex. These data show that U1-C dynamically binds U1 snRNP and that its presence can help recruit and stabilize a −1A bulged 5′SS RNA.

[…] Shaded region indicates predicted base pair interactions with the U1 snRNA (top). (B) Apparent association rates for 9bp (left) and 9bp-1A RNAs (right) at various concentrations in the absence and presence of saturating U1-C (100 nM). Data are overlaid with linear fits (solid line) to determine kon (9bp with 100 nM U1-C: kon = 3.9 ± 0.4 × 10⁶ M⁻¹ s⁻¹, R² = 0.97; 9bp without U1-C: kon = 1.8 ± 0.5 × 10⁶ M⁻¹ s⁻¹, R² = 0.95; 9bp-1A with 100 nM U1-C: kon = 3.2 ± 0.6 × 10⁶ M⁻¹ s⁻¹, R² = 0.98; 9bp-1A without U1-C: kon = 3.2 ± 0.6 × 
10⁶ M⁻¹ s⁻¹ […]). Shaded region indicates the 95% confidence interval of the linear regression. (C) Violin plots showing the distributions of unbound (top) and bound (bottom) dwell times for various RNAs at 0 and 100 nM U1-C. N indicates the number of single molecule events included in the violin plot. (D) Scatter plot showing the average unbound (top) and bound (bottom) dwell times of RNAs at 0 (x-axis) or 100 nM (y-axis) U1-C.

We designed two +1/+2 GU 5′SS-containing RNAs with 6 base pairs of complementarity to the U1 snRNA either on the exonic (−4 to +2 positions) or intronic (+1 to +6 positions) side of the exon/intron boundary. All experiments were conducted at a single concentration of the RNA (9bp-1C: 1 nM; 6bp-exon: 3 nM; 6bp-intron: 3 nM; depending on their affinity) and either in the absence or presence of 100 nM U1-C. For all of these RNAs, we see a decrease in the average unbound lifetimes, corresponding to a faster association rate, when U1-C is included relative to its absence (Fig. 5D). U1-C increased the apparent association rate by 2.5-fold. This suggests that U1-C does not have a single […] presence of a GU 5′SS) for facilitating RNA association to U1 snRNP. […] and U1-C is ineffective at stabilizing the bound state. This suggests that U1 snRNP may enforce the requirement for a +1G at the 5′SS through kinetic selection against mismatches at this position. This selection results from both U1-C independent (poor binding in the absence of U1-C) and dependent (failure of U1-C to stabilize the bound state) components, reinforcing that splicing outcomes in cells are often dependent on U1-binding kinetics.

A kinetic model for −1A bulged 5′SS association and dissociation in the presence and absence of U1-C and branaplam. 
Optimized rate transitions for 9bp-1A are provided in Table 1. Equilibrium arrows in grey indicate transitions that are not supported by our experimental data or mathematical modeling.

Reversible drug binding can nonetheless contribute to formation of very long-lived U1 snRNP/5′SS interactions.

We used a reconstituted U1 snRNP to study the detailed kinetics of its interactions with 5′SS RNA oligos and how these interactions change upon the inclusion of a small molecule splicing modulator. Both single molecule and bulk biophysical measurements show that U1 snRNP binds RNA in a sequence-specific manner, that branaplam extends U1 snRNP/oligo lifetimes of −1A bulged 5′SS if U1-C is present and that splicing modulation can involve a complex […]. The origin of this complexity is in part due to reversible binding of the U1-C component, which dynamically interacts with the snRNP. U1-C itself both promotes RNA binding by U1 snRNP and stabilizes the U1/RNA complex in a sequence-specific manner.

We were able to use our feature-rich and large single molecule data sets to determine a sequential binding mechanism for U1 snRNP and branaplam interactions with a −1A 5′SS-containing oligo. U1-C associates with the snRNP prior to 5′SS binding and decreases the KD for the 5′SS sixteen-fold. Branaplam then reversibly associates with this complex with a moderate KD (~0.7 µM).

Kinetic selection of substrates chemically competent for splicing is a conserved feature of both human and yeast U1 snRNPs, even though neither U1 snRNP is present during the transesterification steps.

Another prediction of our kinetic model is that U1-C can only associate with U1 snRNPs in the absence of 5′SS pairing. U1-C must be pre-associated with U1 snRNP prior to its engagement with RNA in order to modulate 5′SS recognition. It should not be assumed that a U1-C-containing U1 snRNP that is recruited to the transcriptional machinery for co-transcriptional spliceosome assembly still contains U1-C at the moment a 5′SS is transcribed.
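The obligate order of unbinding in this mechanism (branaplam has to leave before the RNA can dissociate) can be illustrated with a small numerical sketch. It is not the authors' fitted model (their optimized rates are in Table 1); it is a two-state simplification that assumes the RNA dissociates only from the drug-free complex with the drug-free time constant of 342 s reported above, and that branaplam binds with the KD of ~0.7 µM quoted in the text. Under those assumptions the mean bound lifetime is \(\tau_{0}(1 + [D]/K_{D})\), whatever the individual drug on and off rates are. The drug release rate `k_unbind_per_s` used in the simulation is a hypothetical value chosen only to make the check run; it does not come from the paper.

```python
import random


def mean_bound_lifetime(tau_free_s, drug_um, kd_um):
    """Mean RNA-bound lifetime (s) when RNA can only leave the drug-free complex.

    tau_free_s: bound time constant without drug (s)
    drug_um:    drug concentration (uM)
    kd_um:      drug dissociation constant for the complex (uM)
    """
    if tau_free_s <= 0 or kd_um <= 0 or drug_um < 0:
        raise ValueError("tau_free_s and kd_um must be > 0, drug_um >= 0")
    return tau_free_s * (1.0 + drug_um / kd_um)


def simulate_mean_lifetime(tau_free_s, drug_um, kd_um,
                           k_unbind_per_s, n_traj, seed=0):
    """Stochastic check of the same scheme (hypothetical drug release rate)."""
    rng = random.Random(seed)
    k_off = 1.0 / tau_free_s                       # RNA release, drug-free state
    k_bind = (k_unbind_per_s / kd_um) * drug_um    # pseudo-first-order drug binding
    total = 0.0
    for _ in range(n_traj):
        t = 0.0
        while True:
            # Drug-free complex: RNA release competes with drug binding.
            k_exit = k_off + k_bind
            t += rng.expovariate(k_exit)
            if rng.random() < k_off / k_exit:
                break                              # RNA dissociated
            # Drug-bound complex: the only way out is drug release.
            t += rng.expovariate(k_unbind_per_s)
        total += t
    return total / n_traj


if __name__ == "__main__":
    analytic = mean_bound_lifetime(342.0, 10.0, 0.7)
    simulated = simulate_mean_lifetime(342.0, 10.0, 0.7,
                                       k_unbind_per_s=0.07, n_traj=4000)
    print(f"analytic mean lifetime:  {analytic:.0f} s")
    print(f"simulated mean lifetime: {simulated:.0f} s")
```

With 10 µM branaplam this simplified scheme gives a mean lifetime of roughly 5.2 × 10³ s, the same order of magnitude as the measured 4426 ± 458 s, which shows how a drug with only micromolar affinity can still lengthen the bound state more than ten-fold.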
Our work suggests that understanding the correlations between predicted base-pairing strength, location of the base pairs within the U1 snRNA/5′SS duplex and dependency on U1-C for exon inclusion in vivo are all critical for predictive modeling of U1 snRNP occupancy in cells.

While it may seem counterintuitive that a reversibly-binding splicing modulator leads to formation of long-lived interactions between U1 snRNP and −1A bulged 5′SS, our kinetic mechanism provides a rationale for this observation. The −1A 5′SS RNA can only dissociate from U1 snRNP when branaplam is not bound, and rapid re-binding of branaplam limits the lifetime of this state.

Recently, a thermodynamic model for splicing modulator drug action has been proposed based on RNA-Seq data and measurements of mRNA production in cells [14]. The authors proposed two branaplam-binding modes for U1 snRNP: a risdiplam-like binding mode that occurs on −1A bulged 5′SS that also contain a −2G, and a second state that leads to hyperactivation of some 5′SS that additionally contain a −3A. It is unlikely that these two states are due to presence/absence of U1-C, since our data show that branaplam can only bind U1 snRNP when U1-C is present. While we did not study the sequence requirements for hyperactivation explicitly, we do note that the 5′SS RNA oligos that showed the largest changes in U1 snRNP bound state lifetimes were also those with the hyperactivation AGA motif (9bp-1A […]). These results suggest that the hyperactivation phenotype has a kinetic basis and might be due to larger changes in the lifetime of the U1 snRNP/5′SS interaction. While the thermodynamic model included two different branaplam-binding modes, the authors were not able to determine if the risdiplam-like binding mode is a necessary precursor for formation of the hyperactivated state. Our single molecule data support only a single branaplam-bound state for the U1 snRNP/U1-C/5′SS complex. The risdiplam-like and hyperactivated binding modes of branaplam likely
occur independently of one another, each involving particular molecular interactions with their corresponding 5′SS.

[…] be limited in part by the inherent kinetic properties of the factors and processes involved.

Key chemicals and materials are described in Supplementary Table 6. RNA oligonucleotides (Supplementary Table 1) for SPR and single-molecule experiments were purchased from Integrated DNA Technologies (IDT […]). Stocks of fluorescent RNAs intended for single-molecule experiments were prepared by resuspending the lyophilized oligonucleotides in nuclease-free water (20-50 µM […]). RNA concentrations were calculated from their absorbance values at 260 nm using a NanoDrop and the extinction coefficients from IDT via the Beer-Lambert law.

All plasmids were purchased from GenScript (Piscataway, USA) based on the pET-28a(+) backbone and codon optimized, then transformed into Escherichia coli BL21 Star (DE3) cells (Cat# C601003 […]). For the U1-70K_SmD1/D2 polycistronic construct, an N-terminal thioredoxin-6xHis-tag followed by a tobacco etch virus (TEV) protease cleavage site was appended to the U1-70K fragment comprised of residues 2-59, followed by a Gly-Ser triplet linker, then residues 7-91 of SmD1. A second open reading frame containing SmD2 was comprised of residues 1−118. An N-terminal 6xHis-tag followed by a TEV cleavage site was appended to residues 1−126 of SmD3, which was followed by a second open reading frame for residues 1-95 of SmB. An N-terminal His-SUMO-Avi tag was introduced prior to residues 1-75 of SmF, followed by additional open reading frames for SmE (residues 1-92) and SmG (1-76). […] a construct with only a His-SUMO tag was used. A C-terminal 6xHis-tag was added after residues 1-61 of U1-C.

Cells were cultured at 37 °C in 1 L of 2xYT media supplemented with kanamycin (50 µg/mL), then induced with 0.5 mM IPTG at 16 °C overnight. Cell pellets were resuspended in lysis buffer (20 mM HEPES pH 7.5) plus cOmplete ULTRA EDTA-free protease inhibitor cocktail (Roche […]). Clarified lysates were
diluted in IMAC Buffer A (20 mM HEPES pH 7.5), then loaded onto a HisTrap HP 5 mL Ni-NTA column (Cytiva, USA) and eluted with a gradient of IMAC Buffer B (20 mM HEPES […]). Eluted fractions were pooled in dialysis tubing (Cat# 68035, ThermoFisher Scientific) with TEV protease and dialyzed overnight against 20 mM HEPES […]. The solution was adjusted to 1 M KCl and loaded onto a HisTrap column equilibrated in IMAC Buffer A. The flow-through was collected and injected onto a Superdex HiLoad 75 26/60 column (Cytiva) equilibrated in 20 mM HEPES pH 7.5, and the fractions were collected then concentrated by centrifugation (Cat# UFC9003 […]). […] cells were cultured like the other constructs with the addition of 1% (w/v) glucose to the culture media. Protein was similarly purified via the 6xHis-tag and finally purified by IMAC and SEC as described. The SmF/E/G trimer was biotinylated on the SmF AviTag using the BirA biotin-protein ligase reaction kit (Avidity LLC, USA), and biotinylation was confirmed by MALDI-TOF MS.

The U1 snRNA used for reconstitution of the miniU1 particle was purchased from AxoLabs (LGC Group, Germany) and dissolved to a concentration of 500 µM in RNAse-free ddH2O, comprising the sequence: 5′-AmUmACψψACCU GGCAGUGACC ACCACACACU GCAUAAUUUG UGGUAGUGGG CGAAAGCCCG-3′, where Am and Um represent 2′-O-methyl nucleotides. […] a U1 snRNA of the same sequence was produced with an aminohexyl linker on the 3′ end that was subsequently labeled with Cy5 NHS ester as the fluorophore. U1 snRNA was prepared by refolding at 80 °C for 3 min and then cooling on ice for 10 min. In a pre-warmed solution of Reconstitution Buffer (20 mM HEPES pH 7.5) containing 40 U/mL RNAsin (Cat#N2111 […]), each Sm protein sub-complex was combined to a final concentration of 8 µM and incubated for 5 min at 37 °C. U1 snRNA was added to a final concentration of 4 µM and incubated for 45 min at 37 °C. U1-C_61 can be added to a final concentration of 8 µM, then the complex is cooled overnight at 4 °C. The crude complex was then loaded onto a
MonoQ 10/100 GL column (Cytiva) in Reconstitution Buffer and eluted with a gradient of Reconstitution Buffer containing 1 M KCl. Eluted fractions were pooled and loaded onto a Superdex column (Cytiva) equilibrated in Reconstitution Buffer. Fractions corresponding to miniU1 were concentrated using a 30 kDa MWCO centrifugal filter (Cat# UFC9030). A Biacore 8K (Cytiva) was used with a streptavidin-coated Series S Sensor Chip SA (Catalog #BR100531). The instrument was equilibrated in 20 mM HEPES. RNA was synthesized with a 3′-biotin (-1A bulge: CAGAGUAAGUAU; SMN2: AGGAGUAAGUCU; Match: CAGGUAAGUAU; Reverse: UAUGAAUGGAC; Dharmacon) and injected at 1 nM with 30 s contact time at 10 µL/min to afford 3-5 RU of capture. U1 snRNP binding studies were performed by injecting a titration of complex that was serially diluted from 100 nM with a contact time of 180 s and a dissociation time of 600 s at a flow rate of 30 µL/min in duplicate. The chip surface was regenerated by an injection of 3 M MgCl2. All data were analyzed after reference subtraction and application of a 1.5-2.5% (v/v) DMSO solvent correction. Data were analyzed using a two-state binding model in the Biacore Insight Software. To assess the effect of ligand on U1 snRNP binding kinetics, a co-inject format was used to allow for compound to be present during the association and dissociation phases of the experiment. A protein solution of U1 snRNP was prepared at 10 nM with varying concentrations of branaplam serially diluted from 5 µM and injected across the immobilized RNA surface as previously described. A serial dilution was performed in DMSO at 25x final concentration followed by dilution with assay buffer (20 mM HEPES). A pre-formed complex of U1 snRNP ΔU1-C was prepared at 200 nM in the presence of 20 nM of a 5′SS oligonucleotide labeled with Cy5 (5′-Cy5-CAGAGUAAGUAU; Metabion) with or without the addition of 500 nM U1-C supplementation. The branaplam titration was then mixed 1:1 with the pre-formed U1 snRNP complex and incubated at room
temperature for 15 min before loading Monolith LabelFree Premium Capillaries (MO-Z025; NanoTemper Tech). Capillaries were analyzed by a Monolith X red-continuous instrument (NanoTemper Tech) at 25 °C with 100% LED and laser power. U1-C protein was serially diluted at 2x final concentration in assay buffer. A pre-formed complex of U1 snRNP ΔU1-C was prepared at 200 nM in the presence of 20 nM of a 5′SS oligonucleotide labeled with Cy5, with the addition of branaplam at 5 µM or DMSO to yield a final DMSO concentration of 2% (v/v). The U1-C titration was mixed 1:1 with the pre-formed U1 snRNP complex and assessed on a Monolith X as previously described. Curves were fit by nonlinear regression using custom code in MATLAB with the function nlinfit. 95% confidence intervals were computed from the estimated Jacobian returned by nonlinear least squares fitting via nlparci. Single molecule imaging chambers were prepared using microscope slides (24 mm × 60 mm, GoldSeal) and cover glasses (25 mm × 25 mm, Corning) at least one day before each experiment. Substrates were first cleaned by successive sonication in 2% v/v Micro-90 and 1 M KOH for 60 min each in slide-mailers (Fisher Scientific). Cleaned substrates were then dried with high purity nitrogen (Airgas) and aminosilanized with 1.5% (v/v) VECTABOND (Vector Laboratories) in acetone (Spectrophotometric Grade) and passivated by incubation of a 1:100 w/w mixture of mPEG-biotin-SVA (Laysan Bio) and mPEG-SVA (Laysan Bio) in 100 mM NaHCO3 (pH 8) overnight. The substrates were rinsed with MilliQ water and dried with nitrogen. Imaging chambers were created by placing thin strips of double-sided tape and vacuum grease along the glass slide and adhering a cover glass on top. This typically resulted in three 25 µL volume lanes per slide. Assembled chambers were rinsed with at least 200 µL of wash buffer (WB: 20 mM HEPES pH 7.5). All images were collected with 2x2 pixel binning and active hot pixel correction. Streptavidin-labeled fluorescent beads
(T10711, Invitrogen) were flowed into the lane at a low concentration (~5 × 10⁴-fold dilution from stock in WB) to serve as fiducial markers for channel alignment and lateral drift correction. The lane was then washed with 50 µL of 0.2 mg/mL streptavidin (SA10-10), followed by 50 µL of WB to remove unbound beads and streptavidin. The U1 snRNP particle labeled with Cy5 was then diluted to 10-20 pM in WB and incubated in a lane for one minute. The surface density of Cy5-labeled U1 snRNP-∆U1-C was checked by flowing in imaging buffer (IB: 20 mM HEPES pH 7.5). Two imaging schemes were used for data collection: alternating laser excitation or sequential laser excitation. A 50 µL solution containing variable concentrations of RNA and branaplam in IB was added and successive images were captured with a 1 s exposure under 532 nm then 633 nm excitation. Approximately 30 s were first recorded (633 nm, 1 Hz) to identify areas of interest (AOI), followed by addition of a 50 µL solution containing variable concentrations of RNA and branaplam in IB, and images were captured sequentially at 1 Hz (532 nm). A total of 600 frames were collected across varying frame rates (0.11 to 1 Hz) to minimize photobleaching. The lane was washed with WB to remove oxygen scavengers and images were collected under 633 nm excitation until all surface tethered molecules photobleached (typically 30-60 frames). This step allowed us to ensure we only analyzed AOIs featuring a single U1 snRNP molecule. Detected AOIs were fit to a two-dimensional Gaussian function within a 5x5 pixel space. AOIs were filtered by removing those with intensity values of greater than three scaled median absolute deviations from the median (e.g.
multiple overlapping molecules) and those with a Euclidean distance less than 5 pixels away from a neighboring AOI. Accepted AOIs were then mapped to the 532 nm channel using the mathematical transformations described above. The time-dependent fluorescence of each AOI in each channel was computed by integrating over all frames in a 3x3 pixel space centered on each AOI’s sub-pixel location. All the steps of this process were incorporated into a graphical user interface (smVideoProcessing). Only molecules exhibiting at least one binding event were included for further analysis. All the steps of this process were incorporated into a graphical user interface (smTraceViewer). Dwell time distributions are visualized as either their cumulative probabilities, violin plots, or histograms. Cumulative probability plots were constructed by computing a cumulative distribution function (CDF) estimate (Eq X), where the value of each bin (\(v_i\)) is computed by Eq. 5. Overlays of MLE of mono- and biexponential distributions were computed by integrating PDF1 and PDF2 over a range \([t_{\min}, t_{\max}]\) as provided by Eqs.
6 and 7. Only the initial unbound event (i.e., time to first binding) was considered for MLE estimation to reduce the potential bias introduced by photobleaching of tight binders. The likelihood of 2 clusters vs 1 cluster was determined by computing a Bayesian Information Criterion (BIC) score for each model, where SSE is the sum of squared errors within a cluster summed across all clusters, \(\mathrm{SSE} = \sum_{i} \sum_{x_j \in C_i} \lVert x_j - \mu_i \rVert^2\), in which \(x_j\) is the data point in cluster \(C_i\) and \(\mu_i\) is the centroid of cluster \(C_i\). A lower BIC value was used as evidence for a better model.

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article. Single molecule data (raw images and analyzed .mat files) can be accessed at https://doi.org/10.5281/zenodo.13738649. Source data are provided with this paper. Scripts for single molecule analysis and figure generation can be found at https://github.com/David-Scott-White/White_2024.

This work was supported by funding from Remix Therapeutics, grants from the National Institutes of Health (R35 GM136261 to A.A.H.) with additional support from a Research Forward grant award from the Wisconsin Alumni Research Foundation, and a NIH postdoctoral fellowship award (F32 GM143780 to D.S.W.). and Amira Yazidi at NMX Research Solutions for assistance with protein production, Maria McGresham at August Bioservices for assistance with SPR experiments, and Maximilian Plach at 2bind GmbH for assistance with MST experiments, and members of the Hoskins and Herschlag labs for helpful discussions. and A.A.H. wrote the manuscript with input from B.M.D. All authors contributed to reviewing and revising the manuscript. is a member of the Scientific Advisory Board (SAB) for Remix Therapeutics and is carrying out sponsored research in collaboration with Remix. are paid employees and interest holders of Remix Therapeutics. completed this work while employed as a postdoctoral scientist in the Hoskins Laboratory at UW-Madison and is a current employee of Element Biosciences. Nature Communications thanks the anonymous reviewers for their contribution to the peer review of this work. DOI: https://doi.org/10.1038/s41467-024-53124-5

Genetic and experimental findings point to a crucial role of RNA dysfunction in the pathogenesis of Amyotrophic Lateral
Sclerosis (ALS). Evidence suggests that mutations in RNA binding proteins (RBPs) such as FUS affect the regulation of alternative splicing. We have previously shown that the overexpression of wild-type FUS in mice, a condition that induces ALS-like phenotypes a protein with key roles in RNA metabolism, suggesting that a pathological connection between FUS and hnRNP A2/B1 might promote FUS-associated toxicity. Here we report that the expression and distribution of different hnRNP A2/B1 splice variants are modified in the affected tissues of mice overexpressing wild-type FUS. Degenerating motor neurons are characterized by the cytoplasmic accumulation of splice variants of hnRNP A2/B1 lacking exon 9 (hnRNP A2b/B1b). In vitro studies show that exon 9 skipping affects the nucleocytoplasmic distribution of hnRNP A2/B1, promoting its localization into stress granules (SGs), and demonstrate that cytoplasmic localization is the primary driver of hnRNP A2b recruitment into SGs and cell toxicity. Boosting exon 9 skipping using splicing switching oligonucleotides exacerbates disease phenotypes in wild-type FUS mice. These findings reveal that alterations of the nucleocytoplasmic distribution of hnRNP A2/B1 likely contribute to motor neuron degeneration in ALS. implying that pathological FUS may affect the overall expression that might in turn promote motor neuron degeneration. The actual contribution of these splicing changes in ALS pathogenesis is essentially unknown. In this study we aimed at verifying whether the changes in hnRNP A2/B1 splicing induced by pathological FUS affect hnRNP A2/B1-related activities that might in turn contribute to motor neuron degeneration induced by FUS. Here we uncover for the first time the existence of a functional connection between hnRNP A2/B1 splicing isoforms and ALS pathology. Disease progression is marked by the accumulation of hnRNP A2/B1 splicing isoforms lacking exon 9 (hnRNP A2b/B1b), which preferentially localize in the cytoplasm of degenerating
motor neurons. In vitro experiments demonstrate that the hnRNP A2b variant exhibits an increased propensity to relocalize into the cytoplasm and that its de-localization is sufficient to drive SG formation and cell toxicity. Disease phenotypes in hFUS mice worsen upon treatment with splicing switching oligonucleotides that enhance exon 9 skipping. These findings support the existence of a pathological cascade orchestrated by FUS and hnRNP A2/B1 and strengthen the idea that the functional network connecting RBPs is widely affected by ALS conditions. A Schematic representation of alternative splicing of exon 2 (green) and exon 9 (orange) of hnRNP A2/B1. The filled rectangles represent included exons while the empty rectangles represent skipped exons. Arrows represent the specific primers used for cDNA amplification. B The alternative splicing pattern of hnRNP A2/B1 exon 2 and exon 9 was monitored by semiquantitative RT-PCR analysis in spinal cords from hFUS transgenic mice along with age-matched non-transgenic (Ctrl) animals. Bands were quantified through densitometric analysis and a splicing index was calculated as the ratio between the upper and lower band and plotted considering the corresponding ratio in a Ctrl mouse equal to 1. Data are expressed as means ± SD (n = 4–5 mice/group). Statistical significance was calculated by Student’s t-test. D Lumbar spinal cord lysates from control (Ctrl) and hFUS mice at the symptomatic (C) and end-stage (D) phases of the disease were subjected to western blot analysis using anti-exon 2. Data are expressed as means ± SD (n = 4–5 mice/group), considering the relative expression of a Ctrl mouse equal to 1. Statistical significance was calculated by Student’s t-test referred to Ctrl. and anti-exon 9-immunoreactive isoforms are strongly downregulated while anti-exon 8/10 signal is significantly upregulated in hFUS mice compared to control animals. These effects appear enhanced at this stage of the disease, suggesting that the observed alterations
in the expression of hnRNP A2/B1 match disease progression. These results show that ALS disease in hFUS mice is characterized by a shift in the expression of hnRNP A2/B1 towards isoforms lacking exon 9 (either A2b or B1b). A–C Immunofluorescence staining on spinal cord sections of non-transgenic (Ctrl) and end stage hFUS mice with antibodies against Exon 9 (green) (A), Exon 2 (green) (B) or Exon 8/10 (green) (C) and SMI32 (red). Nuclei were detected by DAPI staining (blue). The dotted white lines mark the separation between white and grey matter of the spinal cord. Magnifications of the highlighted areas are also shown. Exon 2 (E) and Exon 8/10 (F) staining in hemisections of Ctrl and end stage hFUS mice. ****p < 0.0001 (n = 4 animals per group). G Quantification of nuclear/cytoplasmic distribution of Exon 8/10 signal in motor neurons (MNs) of Ctrl and end stage hFUS mice. The bar plot shows the percentage distribution of Exon 8/10 staining in the nuclear (nuc) and cytosolic (cyt) compartments for the Ctrl and hFUS mice. Statistical significance was assessed using two-way ANOVA followed by Šidák’s multiple comparisons test. Asterisks indicate the level of statistical significance: ****p < 0.0001 (n = 4 animals per group). HeLa cells were transfected with the HA-tagged hnRNP A2, B1 and B1b (B) isoform constructs and analysed 24 h after transfection by immunofluorescence using an anti-HA antibody (red) and anti-TIA1 antibody (green). Magnifications of the highlighted areas are also shown (zoom). For fluorescence distribution across the cell, a straight line was overlaid across the cell and then the fluorescence intensity was measured across the line using the built-in function. Graphs (lower panels) represent fluorescence intensity across the line in images; the yellow shaded area denotes the nucleus. SuperPlots on the right show the percentage of cells with cytosolic HA signal or cells where the HA signal colocalizes with TIA1-positive SGs, calculated for both the A2/A2b (A) and B1/B1b isoforms (B).
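The line-scan measurement mentioned in this legend (a straight line overlaid on a cell, with intensity read out along it) can be sketched as follows. This is a minimal illustration under stated assumptions and not the software or built-in function used for the figure, which the legend does not name: the function name `line_profile`, the toy image, and the use of bilinear interpolation between pixel centres are all hypothetical choices.

```python
from typing import List, Sequence, Tuple


def line_profile(image: Sequence[Sequence[float]],
                 start: Tuple[float, float],
                 end: Tuple[float, float],
                 n_samples: int) -> List[float]:
    """Sample intensity at n_samples evenly spaced points from start to end.

    Points are (row, col) pixel coordinates. Values between pixel centres
    are estimated by bilinear interpolation (an illustrative assumption).
    """
    if n_samples < 2:
        raise ValueError("n_samples must be at least 2")
    n_rows, n_cols = len(image), len(image[0])
    profile = []
    for k in range(n_samples):
        t = k / (n_samples - 1)  # fractional position along the line
        r = start[0] + t * (end[0] - start[0])
        c = start[1] + t * (end[1] - start[1])
        # Clamp to the image so the four neighbouring pixels always exist.
        r = min(max(r, 0.0), n_rows - 1)
        c = min(max(c, 0.0), n_cols - 1)
        r0, c0 = int(r), int(c)
        r1, c1 = min(r0 + 1, n_rows - 1), min(c0 + 1, n_cols - 1)
        fr, fc = r - r0, c - c0
        top = image[r0][c0] * (1 - fc) + image[r0][c1] * fc
        bottom = image[r1][c0] * (1 - fc) + image[r1][c1] * fc
        profile.append(top * (1 - fr) + bottom * fr)
    return profile


# Toy 3 x 5 "image": a bright pixel in the middle of the central row.
toy_image = [
    [0, 0, 0, 0, 0],
    [1, 2, 8, 2, 1],
    [0, 0, 0, 0, 0],
]
profile = line_profile(toy_image, start=(1, 0), end=(1, 4), n_samples=5)
```

Plotting such a profile against distance along the line gives the kind of intensity trace shown in the lower panels; marking the nucleus on that trace would require a separate nuclear mask, which is not sketched here.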
The distribution of measures from n = 3 independent experiments is reported with each biological replicate color-coded: the mean value from each of the three replicates is represented by black dots and the mean ± SD of the three replicates is shown as a black line. as well as the nuclear (nuc) and cytosolic (cyt) fractions from HeLa cells transfected with HA-tagged hnRNP A2/B1 isoform constructs were analyzed by western blot. hnRNP A2/A2b/B1/B1b expression levels were monitored using an anti-HA antibody. Lamin B1 and β-actin levels were included to assess the purity of the cytosolic and nuclear fractions. A HeLa cells were transfected with the HA-tagged hnRNP A2 and hnRNP A2b isoform constructs for 24 h, untreated or treated with 0.5 mM sodium arsenite (NaAs) for 20, 40 and 60 minutes, and analysed by immunofluorescence using an anti-HA antibody (red) and anti-TIA1 antibody (green). B SuperPlots showing the percentage of cells with the HA signal colocalizing with TIA1-positive SGs. Statistical significance was calculated by one-way ANOVA test and the significant differences between A2 and A2b isoforms at the same time point are shown. as well as the nuclear (nuc) and cytosolic (cyt) fractions from HeLa cells transfected with HA-tagged hnRNP A2 and A2b isoform constructs. hnRNP A2 and A2b expression levels were monitored using an anti-HA antibody. Lamin B1 and β-actin levels were analyzed to assess the purity of the cytosolic and nuclear fractions. and the presence of the D290V mutation does not further enhance this effect. Representative western blot (A) and quantification (B) of total (INPUT) protein extracts as well as the insoluble (INS) and soluble (SOL) fractions from HeLa cells transfected with HA-tagged hnRNP A2, A2-D290V and A2b-D290V isoform constructs. Isoform expression levels were monitored using an anti-HA antibody. β-actin levels were used as a loading control. Data are reported as mean value ± SD (n = 3 independent experiments). Statistical significance was calculated by two-way ANOVA comparing soluble and
insoluble fractions between groups. C Numerical output scores resulting from GraPES analysis of hnRNP A2 compared to hnRNP A2b, with higher values indicating increased likelihood of the protein being localized in a biological condensate (a value greater than 0.90 suggests a high propensity); Disorder, representing the percentage of protein residues predicted to be disordered by DISOPRED3; Net charge, that is, the overall sum of the positively and negatively charged residues at neutral pH; PScore, reflecting the quantity of π-π interactions that are linked to the propensity of the protein to phase separate in vitro; Soluprot, a protein solubility score where higher values correspond to higher solubility; GRAVY Score, a measure of protein hydrophobicity; and RBP Pred, a likelihood prediction for a protein to exhibit RNA-binding capabilities. D Graphical plots generated by GraPES analysis showing MaGSeq Z-score (upper panel) and Disorder score (lower panel) of hnRNP A2 and hnRNP A2b along the precomputed score distributions of the human proteome (shown in gray). The y-axis represents the percentage (density) of proteins associated with a given score (the x-axis). Scores relative to known markers for SGs and processing bodies (p-bodies) are also shown as references. A Schematic representation of hnRNP A2/B1 variants used. which encodes a portion of the M9-NLS (A2_ΔNLS and A2b_ΔNLS). hnRNP A2 and A2b isoforms fused at the C-terminus with an extra NLS were also produced (A2+NLS and A2b+NLS). All six variants contain an HA epitope at the N-terminus. B HeLa cells were transfected with the indicated plasmids and analyzed by immunofluorescence using anti-HA (red) and anti-TIA1 (green) antibodies. C SuperPlots showing the percentage of cells with cytosolic HA signal (upper panel) and cells where the HA signal colocalizes with TIA1-positive SGs (lower panel). Statistical significance was calculated by one-way ANOVA test and the significant differences between the A2 variants or between the A2b
variants are shown. D Representative western blot and relative quantification of SH-SY5Y cells untreated (NT) or transfected with an empty plasmid (mock) or with plasmids coding for HA-tagged hnRNP A2 (A2) and HA-tagged hnRNP A2b (A2b). as well as the nuclear and cytosolic fractions were analysed by western blot with anti-cleaved PARP (cPARP) and anti-cleaved Caspase 3 (cCasp3) antibodies. A2 and A2b expression levels were monitored using an anti-HA antibody. Lamin B1 and β-actin levels were monitored to assess the fractions’ purity and normalize the fractions. E Representative western blot and relative quantification of SH-SY5Y cells untreated (NT) or transfected with an empty plasmid (mock) or with plasmids coding for HA-tagged hnRNP A2 (A2), hnRNP A2b+NLS (A2b+NLS) and hnRNP A2b_ΔNLS (A2b_ΔNLS). Cytosolic protein extracts were analyzed by western blot with an anti-cCasp3 antibody. Expression of hnRNP A2/B1 isoforms was evaluated in total lysates (input) with an anti-HA antibody. β-actin antibody was used as a loading control. B Overlapping genes whose alternative splicing is misregulated upon A2/B1 and FUS downregulation (ASO)/knock-out (KO) were analysed by the Enrichr analysis tool. The top 10 terms enriched in Gene Ontology biological process are listed according to their decreasing −log10 (p value) (enrichment). The colour code indicates the adjusted p value and the bubble size reflects the number of genes enriching that annotation (count). C The alternative splicing pattern of 11 selected common hnRNP A2/B1 and FUS target genes was assessed through semiquantitative RT-PCR analysis in the spinal cords of end-stage hFUS mice along with age-matched non-transgenic control (Ctrl) animals. Representations of constitutive exons (dark grey rectangles) and alternatively spliced exons (light grey and white rectangles) analysed are shown. Bands were quantified by densitometric analysis and a splicing index was calculated as follows. For genes that are expressed in more
than one isoform, the ratio between the upper and the lower band was calculated and plotted considering the corresponding ratio in Ctrl mice equal to 1. For genes that are expressed as a unique isoform, the splicing index was calculated as the ratio between band intensity and the relative intensity of the housekeeping Gapdh gene. Data are expressed as means ± SD (n ≥ 3 mice/group). Statistical significance was calculated by Student’s t-test. A Nissl-stained spinal cord sections of male non-transgenic (Ctrl) and symptomatic hFUS mice treated with PBS (Veh) or 40 µg SSO A were analyzed 35 days after ICV injection. Quantification of motor neuron (MN) numbers/ventral horn is provided. Statistical significance was calculated using ANOVA. E Spinal cord sections from non-transgenic (Ctrl) and symptomatic hFUS mice treated with PBS (Veh) or 40 µg SSO A were analyzed 35 days after ICV injection and subjected to immunofluorescence staining with an antibody against NeuN (green) (B). D Iba1 positive cells in vehicle- and SSO A-treated hFUS mice were analyzed by ImageJ software for different size descriptors (Area and Perimeter). Statistical significance was calculated using Student’s t-test. F The alternative splicing pattern of Atxn2, Fchsd2 and Sorbs1 target genes was assessed through semiquantitative RT-PCR analysis in the spinal cords of non-transgenic (Ctrl) and symptomatic hFUS mice treated with PBS (Veh) or 40 µg SSO A, and a splicing index was calculated as the ratio between band intensity. Data are expressed as means ± SD (at least n = 3 mice/group). Statistical significance was calculated using unpaired t-test. The functional network connecting RBPs is widely affected by ALS conditions, and an extensive process of mislocalization and/or aggregation of RBPs can occur along with the progression of the disease and might have significant implications for the overall pathological process. A significant number of genes that are regulated by hnRNP A2/B1 display alterations in their splicing
patterns in the spinal cord of symptomatic and end-stage hFUS mice, suggesting that a loss of splicing regulation by hnRNP A2/B1 has a role in this process. Whether these changes are caused by a decreased nuclear pool of hnRNP A2/B1 or by the concurrent cytoplasmic mis-localization of exon 9-lacking isoforms is still to be fully defined. SSOs that increase exon 9 skipping in hFUS mice enhance the observed splicing alterations, suggesting that these changes are tightly linked to A2b/B1b accumulation and to the overall disease progression. and showed that modifying hnRNP A2/B1 expression affects the dynamics of pathological SGs induced by mutant FUS, thus supporting the hypothesis that A2b/B1b isoforms might impact this function. The accumulation of hnRNP A2/B1 in the cytoplasm of degenerating motor neurons in hFUS mice is a circumstantial but compelling suggestion that altered SG dynamics might be involved in disease progression in ALS mice. Results from cultured neuronal cells support the conclusion that an increased cytoplasmic expression of hnRNP A2/B1 might be harmful to cells. Cytoplasmic A2b isoform promotes a significant increase in activated caspase-3 expression compared to controls and to cells expressing A2. relocates to the cytosol upon removal of the M9_NLS, demonstrating that uncontrolled cytoplasmic delocalization of A2 is sufficient to promote cellular toxicity. This further suggests that the exclusion of exon 9 and the subsequent cytoplasmic accumulation of the A2b isoform may play a role in the pathogenic mechanism of FUS-related ALS. The experiments performed by in vivo injection of SSOs strengthen this conclusion. Enhanced exon 9 skipping induced by SSO treatment increases the motor neuron degeneration and neuroinflammation that characterize the disease course in hFUS mice, demonstrating that the accumulation of A2b/B1b isoforms contributes to FUS-associated toxicity in mice. Control mice treated with SSO A do not display significant phenotypic changes within
the observed period, indicating the need for further investigation into the long-term consequences of exon 9 skipping. Considering that the alterations in alternative splicing that we detect in hnRNP A2/B1 coincide with the onset of symptoms, these data suggest that changes in isoform expression alone are not sufficient to induce an ALS phenotype but rather play an active role in disease progression. Our findings support the notion that ALS conditions broadly impact the functional network of RNA-binding proteins and identify hnRNP A2/B1 mislocalization as a possible player in the pathological process characterizing FUS-ALS. Plasmids containing degenerated protein-coding sequences of wild type and mutated D290V hnRNP A2 isoforms were purchased from Addgene. hnRNP B1 sequence was amplified by PCR from hnRNP A2 plasmid. were produced by PCR-driven overlap extension. Δ323–341 deletion variants (hnRNP A2_ΔNLS and hnRNP A2b_ΔNLS) were mutagenized by PCR from wild type hnRNP A2 and hnRNP A2b plasmids. An extra canonical SV-40 nuclear localization signal was introduced in wild type hnRNP A2 and hnRNP A2b plasmids by PCR to produce hnRNP A2+NLS and hnRNP A2b+NLS variants. All hnRNP A2/B1 variants were cloned into pcDNA3.1 plasmid vector (Invitrogen) and fused with an HA epitope at the N-terminal end. Fully modified 2′-O-methyl splicing switching oligonucleotides (SSO) with a phosphorothioate backbone were synthesized and HPLC purified by Eurofins Genomics for subsequent testing in vitro. SSO with low endotoxin level (guaranteed < 0.5 EU/mg) was synthesized by Microsynth AG for studies in vivo. The sequences of SSOs used are: SSO A 5′-UUUAUUACCUCCUCCA-3′ and scramble SSO 5′-ACCUUCUUACUUCAUC-3′. and mouse NSC34 cells (originally obtained from Neil Cashman, Canada) were cultured in Dulbecco’s modified Eagle’s medium (DMEM) with Glutamax (Corning) supplemented with 10% fetal bovine serum (FBS) and 1% penicillin/streptomycin (Sigma-Aldrich) at 37 °C in a 5% CO2 atmosphere. Cells were
treated with sodium arsenite (NaAs; Sigma-Aldrich) at a concentration of 0.5 mM. [...] cells at 80% confluence were transfected with appropriate plasmids or antisense oligonucleotides using Lipofectamine 2000 (Invitrogen) according to the manufacturer’s instructions. Cells were collected after 24 and/or 48 hours for subsequent analysis.

[...] digested with 0.25% trypsin (Gibco) and 0.2 mg/ml DNase (Sigma-Aldrich) in DMEM. After dissociation with a fire-polished Pasteur pipette and passage through 70 μm filters, primary cortical neurons were suspended and plated in 12-well plates previously coated with poly-L-lysine (PLL) (1 mg/mL) or on PLL-coated coverslips and maintained in Neurobasal® medium (Gibco, Life Technologies) supplemented with B-27® (Life Technologies) at densities ranging from 4 × 10⁴ cells/cm² to 6 × 10⁴ cells/cm².

Lumbar spinal cords from n = 4 animals per group were dissected and homogenized with a homogenizer (MICCRA D-1) in lysis buffer containing 20 mM Hepes pH 7.4, 10 mM EDTA and a protease inhibitor cocktail (Sigma-Aldrich). The lysates were centrifuged for 20 minutes at 16,000 × g at 4 °C. The supernatant was quantified using the Bradford assay and resuspended in Laemmli buffer (Bio-Rad). SH-SY5Y and HeLa cells were lysed in RIPA buffer (50 mM Tris-HCl pH 7.4, 5 mM MgCl2) containing a protease inhibitor cocktail, incubated for 30 minutes on ice and centrifuged for 10 minutes at 16,000 × g at 4 °C. Supernatants were quantified using the Bradford protein assay (Bio-Rad) and resuspended in Laemmli buffer. For the preparation of insoluble protein extracts, the pellets were resuspended in Laemmli buffer.

HeLa cells were centrifuged at 600 × g for 5 minutes at 4 °C and washed with cold PBS. The cell pellet was resuspended by gentle pipetting with cold hypotonic lysis buffer (HLB: Tris 10 mM pH 7.5; NaCl 10 mM; MgCl2 3 mM; NP-40 0.1%; glycerol 10%), 1 mM sodium fluoride and a cocktail of protease inhibitors (Sigma-Aldrich). Laemmli buffer was added to a portion of the lysates to obtain the
total fraction. The remaining cell suspension was centrifuged at 1000 × g for 3 minutes at 4 °C. The supernatant containing the cytoplasmic fraction was clarified at 5000 × g for 5 minutes at 4 °C, quantified by Bradford assay and then resuspended in Laemmli buffer. The pellet containing the nuclear fraction was washed by careful pipetting with cold PBS and centrifuged at 300 × g for 2 minutes at 4 °C [...].

Protein lysates were separated by SDS-PAGE and transferred to a nitrocellulose membrane. The membranes were blocked at room temperature for 1 hour in Tris-buffered saline solution with 0.1% Tween-20 (TBS-T) containing 5% non-fat dry milk and then incubated with primary antibodies diluted in TBS-T containing 2% non-fat dry milk at 4 °C overnight or for 2 hours at room temperature. HRP-conjugated secondary antibodies (Jackson ImmunoResearch) were applied at room temperature for 1 hour. Chemiluminescent detection was performed using ECL solution (Roche). Following densitometry-based quantification and analysis using ImageJ software (National Institutes of Health), the relative density of each identified protein was calculated.

Spinal cords from n = 4 animals per group were fixed using a 4% paraformaldehyde (PFA) solution in 0.1 M PBS for 12 hours, and tissues were cryoprotected in 30% sucrose in PBS solution at 4 °C. Spinal cords were cut into 30-μm-thick slices with a freezing cryostat (Leica Biosystems). After blocking for 1 hour in 10% normal donkey serum (NDS) in PBS containing 0.3% Triton X-100, spinal cord slices were incubated for 3 days at 4 °C with primary antibodies diluted in 2% NDS in PBS and then for 3 hours at room temperature with the appropriate fluorescent secondary antibody. Nuclei were stained with 1 μg/ml DAPI (Sigma-Aldrich) for 10 minutes. The slides were coverslipped with Fluoromount Aqueous Mounting Medium (Sigma-Aldrich). HeLa and SH-SY5Y cells were fixed using 4% PFA in PBS for 10 minutes, permeabilized with a 0.1% Triton X-100 solution in PBS for 5 minutes and blocked with 2% FBS diluted
in PBS for 30 minutes at room temperature. Cells were then incubated with primary antibodies diluted in 2% FBS in PBS for 1 hour at 37 °C and with appropriate fluorescent-conjugated secondary antibodies in PBS. Nuclei were stained with 1 μg/ml DAPI (Sigma-Aldrich) for 5 minutes.

Immunofluorescence images were analyzed using a Leica TCS SP5 confocal microscope. Images were captured under constant exposure time. Digital image brightness and contrast were adjusted using the LAS AF software (Leica). Background subtraction was performed after defining a region of interest, and the average pixel intensity was calculated. All image quantifications were done using ImageJ software (NIH). To assess the nuclear and cytoplasmic distribution of the Exon 8/10 fluorescence signal in motor neurons, regions of interest (ROIs) were drawn around the nucleus and the entire cell of individual motor neurons. The mean fluorescence intensity was measured for each ROI. The cytoplasmic signal was calculated by subtracting the nuclear intensity from the total cellular intensity. For the quantification of cells displaying cytoplasmic and SG localization of HA-hnRNP A2/B1 isoforms, 50–100 HA-positive cells per condition from randomly selected fields in n = 3 independent experiments were visually scored using a Zeiss Axioplan fluorescence microscope.

The total number of motor neurons in the L3–L5 segments of the lumbar spinal cord was quantified by analyzing serial sections from each mouse. To visualize the Nissl substance within neurons, the sections were stained with 0.02% cresyl violet solution. The sections then underwent a graded dehydration process using ethanol (50% to 100%). Images of the sections were captured using a Zeiss Axioskop 2 microscope at 20x magnification. Both the right and left ventral horns were examined to count neurons characterized by cell bodies exceeding 200 μm², and the average count from the sections was calculated for each mouse.
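The two quantification rules described above (cytoplasmic signal as total minus nuclear intensity, and motor neuron counts averaged over serial sections using the 200 μm² cell-body cutoff) can be illustrated with a minimal Python sketch. This is not the authors' ImageJ workflow: the function names and the example numbers are hypothetical, and the inputs are assumed to be ROI measurements already exported from the imaging software.

```python
from statistics import mean

# Cell-body area cutoff stated in the text (in square micrometres).
AREA_THRESHOLD_UM2 = 200.0


def cytoplasmic_signal(total_intensity, nuclear_intensity):
    """Cytoplasmic signal = total cellular intensity - nuclear intensity."""
    return total_intensity - nuclear_intensity


def count_motor_neurons(section_areas_um2, threshold=AREA_THRESHOLD_UM2):
    """Count cell bodies in one section whose area exceeds the threshold."""
    return sum(1 for area in section_areas_um2 if area > threshold)


def average_count_per_mouse(sections):
    """Average the per-section neuron counts over all sections of one mouse."""
    if not sections:
        raise ValueError("at least one section is required")
    return mean(count_motor_neurons(areas) for areas in sections)


# Hypothetical measurements: cell-body areas (um^2) for three serial sections.
example_sections = [
    [150.0, 230.5, 410.2, 199.9],
    [305.0, 280.1],
    [120.0, 201.0, 650.3, 90.4, 222.2],
]

if __name__ == "__main__":
    print(cytoplasmic_signal(120.0, 45.0))
    print(average_count_per_mouse(example_sections))
```

The threshold is kept as a parameter so the same helper could be reused with a different size criterion; the strict `>` comparison mirrors the wording "exceeding 200 μm²".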
Immunofluorescence (IF) and immunoblots (WB) were performed with the following primary antibodies: rabbit anti-Exon 2 (1:1000-WB, 1:500-IF) and rabbit anti-Exon 8/10 (1:500-WB [...] Rothnagel from the University of Queensland (Australia); rabbit anti-Exon 8/10 (1:5000-WB [...] custom antibody produced by Bio-Fab Research). Secondary antibodies for WB were anti-rabbit (1:2500) and anti-mouse (1:5000) IgG peroxidase-conjugated from Bio-Rad Laboratories (Hercules, [...]). Secondary fluorescent antibodies for IF were Alexa Fluor 488 donkey anti-rabbit (1:200) from Jackson ImmunoResearch Laboratories (West Grove, [...]).

PCR products were run in 2% agarose gels and visualized by SYBR Safe DNA Gel Stain (Invitrogen) staining. Images were acquired on a ChemiDoc™ Imaging System (Bio-Rad). Bands were quantified using the ImageJ software (NIH), and the splicing indices were calculated as the ratio between the upper and the lower bands. The expression levels of the isoform lacking exon 9 were calculated as percentages relative to the total expression of the isoform containing exon 9 and the one lacking exon 9.

[...] using the MaGSeq (MaGS Sequence-based tool) predictive model. MaGSeq is a generalized linear model (GLM) based only on protein sequence features and provides a Z-score representing the propensity of proteins to localize into biological condensates, as well as the feature scores used to generate the predictions. A MaGSeq value greater than or equal to 0.90 for human suggests that the protein is highly likely to be part of phase-separated organelles within the cell.

Sterile PBS or SSO A (20 μg or 40 μg) diluted in 0.01% Fast Green (Sigma-Aldrich) was injected intracerebroventricularly (ICV) in newborn pups (P0-P1) using a glass syringe (Hamilton [...]). The needle was inserted at the midpoint of a line defined between the right eye and the lambda intersection of the skull. The needle was carefully advanced into the lateral ventricle to a depth of approximately 3 mm. [...] the pups were allowed to recover on a heating pad under a
heat lamp before being returned to their mother.

Statistical significance was assessed using a two-tailed Student’s t-test, one-way analysis of variance (ANOVA) or two-way ANOVA. All statistical analyses were conducted using GraphPad Prism 9.0 software (GraphPad Software [...]). The number of animals in the experimental groups was determined through power analysis. The parameters were calculated based on previous experiments using the same animal model. Groups were balanced for age or disease stage before randomization. Investigators remained blinded to treatment allocation during outcome assessment to reduce bias.

Data available on request from the authors.

This work was supported by Fondazione Arisla ETS (Project Spliceals to M.C.
and European Union – Next Generation EU and funded by the Ministry of University and Research (MUR) National Recovery and Resilience Plan (PNRR), project MNESYS (PE0000006) – A Multiscale Integrated Approach to the Study of the Nervous System in Health and Disease (DN [...] are supported by European Union – Next Generation EU within the PNRR project “Rome Technopole – Innovation Ecosystem” [...] receive funding from the European Union – Next-GenerationEU – National Recovery and Resilience Plan (NRRP) – MISSION 4 COMPONENT 2 [...].

Dr Valeria Gerbino (Fondazione Santa Lucia, Italy) is gratefully acknowledged for providing help with ICV in vivo injection of SSOs [...] Joseph Rothnagel (School of Chemistry and Molecular Biosciences, [...] Australia) for providing hnRNP A2/B1 isoform-specific antibodies.

PhD Program in Cellular and Molecular Biology [...] Institute of Biology and Molecular Pathology [...]

[...] and interpreted most of the molecular and cell biology experiments. [...] and interpreted most of the mouse biology experiments. IDV helped with mouse experiments and ICV injection of SSO. SB helped with cell biology experiments and with the design and in vitro testing of SSO. MA helped with the molecular analysis of SSO effects. EDA aided with the maintenance of cell cultures and sample generation. MDS aided with sample preparation and analysis. AS helped with confocal microscopy analysis. [...] supervised and interpreted the experiments.

All animal procedures were performed according to the European Guidelines for the use of animals in research (2010/63/EU) and the requirements of Italian laws (D.L. [...]). The ethical procedure was approved by the Italian Ministry of Health (protocol number 383/2022 PR/G).

DOI: https://doi.org/10.1038/s41419-025-07538-8

We present SpliceTransformer (SpTransformer), a deep-learning framework that predicts tissue-specific RNA splicing alterations linked to human diseases based on genomic sequence. SpTransformer outperforms all previous methods on splicing prediction. Application to
approximately 1.3 million genetic variants in the ClinVar database reveals that splicing alterations account for 60% of intronic and synonymous pathogenic mutations and occur at different frequencies across tissue types. Tissue-specific splicing alterations match their clinical manifestations independent of gene expression variation. We validate the enrichment in three brain disease datasets involving over 164,000 individuals. We identify single nucleotide variations that cause brain-specific splicing alterations and find disease-associated genes harboring these single nucleotide variations with distinct expression patterns involved in diverse biological processes. SpTransformer analysis of whole exome sequencing data from blood samples of patients with diabetic nephropathy predicts kidney-specific RNA splicing alterations with 83% accuracy, demonstrating the potential to infer disease-causing tissue-specific splicing events. SpTransformer provides a powerful tool to guide biological and clinical interpretations of human diseases.

[...] recognizing variations in alternative splicing becomes an essential task for clinical diagnosis. For instance, aberrant splicing in CPEB4 has been reported to be highly associated with an autism-like phenotype (ref. 19). [...] these alternative splicing events may not be detectable in clinically accessible tissues such as blood. Accurate prediction of splice-altering mutations in a tissue-specific manner therefore holds significant clinical importance for genetic diagnosis. Most existing algorithms did not incorporate the tissue specificity of splicing into their models.

(a) The SpTransformer model takes only sequence as input and predicts tissue-specific splicing in 15 human tissues. The model can be used to evaluate genetic variants and predict tissue-specific splicing alterations. (b) Performance of 6 algorithms in the splice site prediction task. Top-k accuracy is calculated by choosing a threshold such that the number of predicted positive sites equals the number of actual splice sites, then
computing the fraction of correctly predicted splice sites. PR-AUC is the area under the precision-recall curve. (c) Tissue-usage prediction of SpTransformer in comparison with other models. (d) The distribution of SpTransformer prediction scores for tissue usages of splice sites in the test dataset. Tissue usages were grouped into low (<0.5) and high (≥0.5) by their original usage ratio across all samples in the same tissue types. [...] was released under a Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International license.

[...] Tissue usage was not totally dominated by gene expression. (b) Impact of in silico mutation around an intron in the GLA gene. SpTransformer considers sequence features both proximal and distal to the splice donor site. Mutagenesis weight was calculated as the decrease in the predicted strength of the splice site when that nucleotide is mutated. (c) Impact of in silico mutation around exons in the APBB2 gene. Several known RBP motifs were found in regions of large weight. (d) De novo motifs that influence the tissue-usage prediction of SpTransformer (left) and their presentations in different tissues (right).

[...] we illustrate the sequence features identified by SpTransformer using the GLA gene, which encodes the enzyme alpha-galactosidase A, as an example. SpTransformer detected the “GT” sequence around the exon–intron junction. SpTransformer recognized regions with relatively high mutagenesis weight at 300 nt, demonstrating its capability to detect sequence features in distal intronic regions. There were also de novo motifs that were not similar to known RBP motifs. By learning the tissue-specific splicing events, SpTransformer was able to implicitly learn the joint contribution of expression and sequence context to the tissue-specific regulatory code.

(a) SpTransformer is applied to evaluate the splicing effect of a single nucleotide variant by calculating a ΔSplice score and matching graphical representations. (b) Examples of two pathogenic mutations in the ClinVar database.
SpTransformer successfully predicted splicing changes even far from variants (right panel). Both cases were validated by RT-PCR in previous studies. (c) The distribution of mutations classified by clinical significance within several intervals of ΔSplice scores. [...] the ratio of pathogenic mutations becomes larger. (d) Distributions of ΔSplice scores of all SNVs grouped by both pathogenicity in the ClinVar database and annotated variant type. The number of SNVs and the proportion of SNVs above/below the cutoff are annotated. The bar chart on the left aggregates the data by rows, while the bar chart at the top tabulates the data by columns. SNVs with alternative pathogenicity annotations (e.g. “conflicting interpretations”) were excluded from the analysis.

Identifying pathogenic variants and interpreting variants of uncertain significance (VUS) in noncoding regions and synonymous mutations has been a long-standing challenge in the field. Our analysis unveils a significant contribution of splicing alterations in intronic and synonymous pathogenic mutations, underscoring the value of applying SpTransformer in regions beyond splicing sites for diagnosis and interpretation of candidate pathogenic mutations or VUS.

(a) The strategy to derive tissue-specific variants from model prediction. We created a reference set of common splicing sites to derive a background distribution and calculate tissue-specific z-scores for new variants in order to make fair comparisons across tissues. [...] and gene enrichment is calculated based on tissue-specific splice-altering SNVs. (b) Top five genes enriched for tissue-specific splice-altering SNVs for each of the 15 tissues as predicted by SpTransformer. The size of the bubbles represents the number of SNVs in each gene, and the color of the bubbles represents the significance level of enrichment. A one-sided hypergeometric test was used for statistics. We manually examined genes associated with tissue-specific phenotypes from the HPO database and marked them with a black rectangle
box. (c) Expression pattern of the top 3 genes in the enrichment result of each tissue. (d) Proportion of pathogenic SNVs predicted as tissue-specific splice-altering in different tissues. Only genes that have a p-value < 0.05 in enrichment were included. The box extends from the first quartile to the third quartile of the data. The dashed line represents the median proportion of SNVs in each tissue. (e) Number of tissue-specific splice-altering SNVs grouped by pathogenic classifications on the TTN gene in different tissues. (f) Genome coordinates and tissue z-scores of SNVs on a sub-region of the TTN gene.

It is worth noting that both Blood and Skin, which are considered the most clinically accessible tissues, displayed lower proportions of tissue-specific splice alterations compared to the median across all tissues. This observation suggests that Blood and Skin may not be suitable alternatives for estimating splicing events in other tissues. The identification of numerous tissue-specific splicing alterations in the heart further supports the capabilities of the SpTransformer algorithm. Together, these results suggest that SpTransformer has the capability to discern sequence features unique to tissue-specific isoforms of genes associated with disease clinical manifestations. Moreover, the SpTransformer annotation provided mechanistic insights for numerous VUS labeled in ClinVar, specifically regarding tissue-specific splicing alterations (Fig.
4e), suggesting SpTransformer as a powerful tool for genetic diagnosis and VUS interpretation.

(a) Statistical data for the three analyzed databases. (b) Splicing effect prediction for different variant types in the three brain disorder datasets: ASC [...]. (c) Enrichment of tissue-specific splicing alterations in ASD. A two-sided z-test for two groups was performed. The dashed line represents the threshold for p = 0.05. (d) Number of tissues showing expression for genes filtered by brain-specific splice-altering SNVs in the case group. (e) Enriched GO terms for genes in (d) that are expressed only in brain tissue (left) and those expressed in 11–15 tissues (right). (f) Network view of enriched biological processes of genes carrying brain-specific splice-altering SNVs from the case group in three brain disorders. (g) Detailed visualization of genes enriched in GO pathway GO:0007610 “Behavior” in three brain disorders.

This analysis underscores the contribution of brain-specific splicing alteration of cytoskeleton-related genes to multiple brain disorders. We believe that, in addition to gene expression, tissue-specific splicing is also crucial in clinical diagnosis. Although not all those genes were investigated, our analysis did find evidence of associations between ASD and genes outside the overlap. These findings underscore the importance of incorporating tissue-specific splicing patterns into the investigation of ASD genetics in order to better understand the missing heritability of this disorder. The findings of SpTransformer underscore the importance of investigating brain-specific splicing dysregulation as a disorder-causing mechanism for brain disorders, which holds great promise for advancing our understanding of these conditions and developing targeted therapies.

(a) Overview of DN patients involved and samples collected for SpTransformer prediction and RNA-seq-based validation. (b) Flow chart showing the filtering steps of kidney-specific splicing variants for variants
called directly from WES data. (c, d) Examples of heterozygous variants predicted as kidney-specifically splice-altering, validated by matched renal tubule RNA-seq. SpTransformer prediction on WES-identified variants (upper) and sashimi plot of matched RNA-seq data (lower) for the CLCNKA (c) and BTN3A2 (d) genes. (e) Top ten GO terms enriched from genes harboring kidney-specific splicing SNVs. (f) Top ten terms enriched in the DisGeNet database from genes harboring kidney-specific splicing SNVs.

Our findings in this pathway suggest that aberrant splicing may represent a potential mechanism underlying the abnormalities in AA metabolism in DN. These results support the reliability of SpTransformer prediction and enable us to explore DN candidate pathogenic variants from the perspective of splicing alterations. These findings are concordant with the known pathology of DN and highlight key genes harboring kidney-specific aberrant splicing that may contribute to renal dysfunction in DN. The application of SpTransformer helps effectively prioritize disease-associated mutations and sheds light on unresolved disease mechanisms.

Predicting RNA splicing directly from sequence data has been a long-standing challenge in the field. We have developed a novel computational framework utilizing an attention-based deep-learning neural network. SpTransformer stands out as the pioneering method to employ a transformer model for predicting RNA splicing with tissue specificity. This transformer architecture benefits SpTransformer from large-scale [...]. SpTransformer emphasizes the tissue specificity of these events, an aspect often overlooked by most existing splicing prediction methods. This unique feature enables a more comprehensive understanding of the splicing landscape across different tissue types. SpTransformer has been successfully applied to mutation databases and disease-specific datasets, identifying tissue-specific splicing alterations and their associated disease manifestations.

Splice-altering mutations
make up an essential class of known disease-causing mutations, and accurate prediction methods are crucial for interpreting VUS and pathogenic variants in clinical diagnostic tasks. While ideal scenarios would include RNA-seq profiles of diseased tissue together with the genotyping data, in practice most disease-manifesting tissues are not accessible, or not easily accessible. We identified genes enriched in mutations that may alter splicing in various tissue types, which provides stronger supporting evidence for clinical interpretation of VUS pathogenicity in the relevant tissues. [...] there is an increasing need for accurate pathogenicity prediction of mutations in the noncoding regions. Through extensive analysis of the ClinVar database, we identified a significant proportion of intronic pathogenic/likely pathogenic mutations that may affect splicing. [...] may provide valuable information for pathogenicity prediction and interpretation of variants in the noncoding region.

The success of SpTransformer in achieving tissue specificity is attributed to the application of NLP models and large datasets. Although previous studies have demonstrated the effectiveness of deep convolutional networks in this domain, the convolution and transformer architectures we designed have several advantages. [...] the tissue-specific splicing is presumably achieved through CREs as represented by sequence motifs. The attention mechanism in the transformer can help capture such CREs much more effectively. [...] transformers have demonstrated advantages in capturing distal information and can perform better at capturing CREs located far away. [...] we utilize RNA-seq data from four distinct mammalian species. This approach enables our model to discern similarities and homologies of splicing sites across species. [...] we feed the GTEx data and evolutionary data into two convolution encoders before the transformer to extract different layers of splicing information, which supports the model with more comprehensive but structured input. [...] has exhibited
a clear advantage over other state-of-the-art methods in tissue-specific splicing prediction on the GTEx dataset. SpTransformer is the first application of the transformer deep learning architecture that achieves remarkable performance in tissue-specific splicing prediction.

The limitations of the model come in several aspects. [...] there is room for increasing the training data and including more rare splicing events. [...] our splicing annotation is based on annotation from GTEx common splicing events, which does not account for splicing variability at the individual level. [...] the model’s inclusion of tissues is still limited. While the SpTransformer model handles approximately 15 different tissue types with high accuracy, its performance decreases as the number of multitask events in the model increases. [...] a combined result of the splicing events of all single cells in the tissue. [...] experimental technologies that measure splicing at the single-cell level are currently limited. We envision that the collection of more individualized and cell-specific splicing events can potentially enhance the deep learning model and enable more precise splicing predictions in the future.

The input for the model is pre-mRNA sequences. [...] These encoded nucleotides are combined to form a 4 × N matrix, where N stands for the length of the sequence. [0,0,0,0] is used for padding sequences with insufficient length or to represent “unknown” nucleotides in unclear regions. The model utilizes this input matrix to capture sequence features. [...] we denote the length of the input sequence as N = N_context + N_target + N_context, where N_target represents the length of the target region that we aim to predict and N_context represents the length of the flanking sequence on each side of the target region. [...] each nucleotide in the target region is assigned a splice label set [SN [...] SD] and a numerical tissue usage label set [S1 [...] SN [...] represent the possibility that a position is an “acceptor” [...] indicate the possibility that a position is used as a splice site
in a certain tissue the model produces a matrix with the shape of (3 + t) × Ntarget as its output SpTransformer utilized the sequence context to predict splicing sites and their corresponding usage in 15 tissues for the central 1000 nt sequence providing 17,382 samples from 53 tissues and two cell lines To obtain meaningful splice sites and the corresponding tissue usage ratio we processed the exon-exon junction read counts file from the dataset for 15 representative tissue types sequences around splice junctions in the GRCh38 reference genome were extracted The base preceding and following each splice junction (i.e. the 5’ and 3’ ends of exons) were defined as splice sites Only samples from the 15 selected target tissues were considered A splice site position was labeled as “acceptor” or “donor” label if it was supported by any sample and had no conflict The splice site at the exon start site was labeled as “acceptor” and the splice site at the end site was labeled as “donor” All other positions were labeled as “neither” the tissue usage label was calculated for each splice site representing the proportion of samples belonging to the tissue that contained corresponding splice junctions The SpTransformer code frameworks also support other combinations of tissue types any splice site with a maximum usage label of less than 0.05 across all tissue classes was excluded and re-labeled as “neither” class The independent RNA-seq dataset underwent similar processing steps We utilized mammalian organ transcriptomes the genes that show orthology or paralogy to human genes in the test dataset were excluded A splice site was identified if it was in the gene body and supported by at least one split read in each of at least two different samples The dataset was partitioned following the same strategy as the GTEx dataset The part of the training data was considered an extension of the training dataset while the test data segment remained unused We excluded gene sequences that 
have paralogs from the testing dataset. Despite splitting the two datasets independently, we made sure that there was no overlap between the training and testing data.

After the steps to split data by chromosomes and paralogs, the pre-mRNA sequences of each gene were extracted. The extracted sequence began at the most upstream site observed across all transcripts and ended at the most downstream site observed across all transcripts. Each sequence was divided into blocks of length 1000 nt. Blocks that did not contain any splice sites were discarded. The flanking sequence of 4000 nt + 4000 nt and the corresponding 1000 nt label were packaged as a single training (testing) data entry.

The architecture of our model is shown in Supplementary Fig. 1. The input is an RNA sequence of length N = Ncontext + Ntarget + Ncontext, where Ntarget denotes the length of the target sequence and Ncontext represents the sequence context of the target region (Ncontext = 4000 in our pipeline). The parameter L dictates the number of channels in each convolution layer, with L1 = 192 and L2 = 64 used in this study. The convolution layer in ResBlocks is characterized by parameters L Following the calculation of the encoder module, a truncation operation is applied to ensure that the input to the attention module does not exceed a length of 8192. The Sinkhorn Transformer module has 256 channels and 8 attention heads (including two local attention heads) per layer. The final output is an (N × 3) shaped matrix and an (N × 15) shaped matrix, representing splice site prediction and tissue usage prediction, respectively.

2) The dimension of the encoder layers was obtained by grid search in {32 Multiple hyperparameters of the transformer module were also tried. Batch size = 12 and learning rate = 0.001 were used in order to stay consistent with SpliceAI. 3) Other parameters were selected from: batch size = {6 Those options were established in reference to previous publications, and the combination with the best performance on the validation
dataset was subsequently used. Further details of input and output have been provided in the “Data representation” section. Different activations were applied to the output scores. For the splice site output, we applied a Softmax activation function to produce probability predictions of “Acceptor” and the other splice classes; for the tissue usage output, we applied a sigmoid activation for each tissue type.

The whole network was then trained on the GTEx training dataset to get the final model. This approach enabled SpTransformer to learn from multiple datasets with similar biological meanings but different data formats, minimizing the need for extensive coding or conversion when differently sourced data was received. Although both datasets are handled by the same sequence model, the diverse splicing representations and distinct data content encourage the deep model to comprehend latent sequence features from various aspects, akin to a visual model examining a human face in multiple ways. The strategy improved the model’s performance compared to using only one dataset, demonstrating the potential to integrate diverse bioinformatics data for a single task.

Special loss functions are used in the backpropagation of deep learning. Each sequence in the training dataset is a contiguous nucleotide sequence of length n. The i-th position has a splicing label Ai and a tissue-usage label Bti for T different tissues. The model outputs si for splice site prediction and uti for tissue-usage prediction. For splice site prediction, we compute the categorical cross-entropy loss, as it is a multi-class classification task. For tissue-usage prediction, which is a multi-label classification task (meaning one sample can belong to multiple classes), we calculate the binary cross-entropy loss. We apply mean reduction for the two loss functions. The loss function above is sufficient for the encoders to learn basic sequence patterns. However, as the number of supported tissues increases, models struggle to learn the features of different tissues in a balanced manner during the training process, occasionally demonstrating superior predictive performance for specific
tissues only. In addition, there is a relative scarcity of samples with strong tissue specificity compared to those with weak tissue specificity. This imbalance leads to models tending to produce similar tissue usage scores for splice sites. Additional loss terms were used in order to persuade the transformer module to overcome these difficulties. We use this method to encourage the model to balance the performance on multiple tissues and pay attention to those tissues that are harder to classify. In addition, a weight wi was multiplied to encourage the model to pay more attention to splice sites with stronger tissue specificity; wi is related to the variance of the tissue-usage labels of the i-th position. The model was trained for 12 epochs in each stage. The Adam optimizer was used to minimize the combined loss, and the learning rate was multiplied by 0.7 after every epoch.

We evaluated the performance of SpTransformer on two tasks using the compiled test dataset.

1) Splice site prediction in long sequences: the model took each pre-mRNA sequence as input and identified every splice acceptor and donor within a target region of 1000 nt. Given that most positions in the sequences are not splice sites, we computed the top-k accuracy and the area under the precision-recall curve (AU-PRC) for splice site prediction. The top-k accuracy was defined as follows: if a sequence has k positive positions that truly belong to the class, a threshold is selected so that exactly k positions are predicted to be positive. The fraction of these k predicted positions that truly belong to the class is reported as the top-k accuracy. We calculated the top-k accuracy and AU-PRC value for the acceptor and donor classes separately and reported the average performance of the two classes.

2) Tissue usage level prediction: the model was tasked with predicting the usage level in each of the 15 tissue classes for each position of the sequence. Given the absence of a widely accepted “tissue usage” protocol, we divided all splice sites in the test dataset based on their tissue usage label. Since most
tissue usage labels were close to (or equal to) 0 or 1, we defined a tissue usage label greater than 0.5 as “high usage”, while the remaining sites were classified as “low usage.” The usage was set to 0 if a position was not a splice site. The model was then tasked to classify the usage of each position in the test dataset. Positions that did not pass the top-k threshold in task 1 were forcibly masked as negative in the prediction result. We calculated AU-PRC for each tissue class.

To highlight the advantages of our method, including through an ablation test, we prepared three different versions of the SpTransformer model: SpTransformer-noextra, which was trained only on the GTEx training dataset; SpTransformer-extra1, which used an extra training dataset of two species; and a third version, which used the full training dataset of four mammalian species. All versions were trained with the same configuration.

We applied the published version of SpliceAI to the tasks for comparison. For SpliceAI-modified, the key difference from SpliceAI was a specific alteration: the output channel number of the final convolution layer was adjusted from 3 to 18. This modification enabled the model to predict tissue usage at each splice site. The modification was a permissible adjustment within the conventional design framework of CNNs. This network was trained on the GTEx dataset with the same hyperparameters and loss functions as the first stage of SpTransformer. We also retrained SpliceAI on our training dataset using the same hyperparameters as the original version.

We then included the published version of Pangolin. Pangolin supports the prediction of four tissues (including brain and testis) and does not distinguish between acceptor and donor. Its maximum splice scores were used in task 1, and its tissue scores were used in task 2. It is worth noting that SpliceAI-modified has a remarkably similar structure to Pangolin, despite the differences in their last output layers. SpliceAI-modified predicts splicing effects across 15 tissues using a single model, whereas
Pangolin utilizes four distinct models.

We adapted the task for earlier methods that were not entirely compatible with our tasks. Several of them, such as MaxEntScan, were designed for classifying a single position with short flanking sequences. We modified the input format to enable them to predict each position of the long sequences individually. The lengths of the input sequences were carefully selected based on the recommendations provided by their respective publications. We excluded the “Acceptor” class when evaluating HAL. The “MMSplice_MTSplice” tool is a combined model where “MMSplice” predicts splice sites and “MTSplice” predicts tissue-specific usage of those splice sites. We evaluated MMSplice in two different scenarios: the full dataset, as well as a simpler task where it was restricted to predicting positions within a 20 nucleotide range of each splice site. The performance on the simpler task was marked as “MMSplice-short”. MTSplice was able to predict usage scores in 56 detailed tissue types. We took the maximum score of the corresponding types as the prediction for the 15 classes in our dataset. Since “MMSplice-short” exhibited an advantage over “MMSplice”, MTSplice was also tested on the restricted task.

To identify important regions within the input sequences, we performed a procedure referred to as “in silico mutagenesis”. The “mutagenesis weight” of a nucleotide with respect to a splice site is defined as follows. Let sref denote the splice (or tissue usage) score of the target splice site. The score is recalculated after replacing the nucleotide under consideration with A. The mutagenesis weight of the nucleotide is estimated from the reference and recalculated scores. Multiple associated motifs were enriched for each tissue through the outlined methodology. The motifs were treated as the same term if their IUPAC codes (given by XSTREME) have a Levenshtein distance not greater than 1.

In order to quantify the splicing alterations caused by SNVs, we calculated the difference in scores between the original sequences and the altered
sequences. Using SpTransformer, we first made predictions for the reference sequence of length 2R + 1 surrounding the mutation (R represents the length of flanking nucleotides on each side and was set to 100 by default in our analysis). We then used the alternative sequence for prediction, resulting in prediction scores for each position of the sequence. Leaving aside the “not a splice site” class, scores for each class were represented by vectors in the shape of (2R + 1) × 1 for two splice site types and 15 tissue types. Δscore for other tissues was calculated in the same way, where abs() refers to the function to calculate the absolute value for each position and max() is the function to find the max value in the vector. We defined ΔSplice as the max value between ΔAcceptor and ΔDonor to quantify the effect of splice alteration caused by a variant. The tissue Δscore was processed to quantify the change in tissue usage.

We created a reference mutation set with GTEx SNV data. Upon checking the GTEx Genotype calls vcf file, we identified a total of 734,509,842 variants. We selected SNVs that met the following conditions: 1) for each tissue, there should be at least one available RNA-seq dataset from an individual carrying the SNV; 2) there should be at least one available RNA-seq dataset from an individual not carrying the SNV; 3) within a 100 nt range of the SNV location, a splicing alteration should be observable across all tissues when comparing RNA-seq data between individuals with and without the SNV. The “low” and “high” categories were under the same definition as those used in the training dataset. After filtering, we obtained 27,843 mutations that cause splice alterations in all tissue classes. These mutations were expected to have minimal impact on tissue specificity, and their Δscore was used to build a reference distribution hereafter. SpTransformer predicted the ΔSplice score and tissue Δscores for each mutation. A tissue z-score was then calculated for each tissue, with Zi representing the adipose tissue z-score for the i-th mutation. The
distribution of the tissue z-score for each tissue was defined as the reference distribution mentioned above (Supplementary Fig. 7). For a real SNV, we similarly calculated its z-score using the previously calculated μtissue and σtissue. We consider any real SNV with a tissue z-score greater than X% of the reference distribution to be tissue-specific.

We classified gene expression into “Low” (0–1 NAUC) and “High” (over 20 NAUC) according to the recommended standard. This classification was used during analysis to exclude genes with “Low” expression in the tissues of interest from our investigation. All the gene expression values presented in the figures were based on the NAUC values.

The SNVs were categorized based on their consequence annotations and clinical significance labels. SNVs with ambiguous labels such as “Conflicting interpretations of pathogenicity” or “Likely risk allele” were excluded. We further utilized SNVs from the ClinVar dataset to establish a score threshold for SpTransformer (Supplementary Fig.
5). The “Strict” panel presents the performance of our SpTransformer model in distinguishing between pathogenic and benign mutations, while the “Soft” panel presents the performance in distinguishing between pathogenic/likely pathogenic/uncertain mutations and benign/likely benign mutations.

For each gene enriched with splicing-altering SNVs, we examined HPO terms for tissue-related phenotypes in the corresponding Mendelian disorders. We employed this test to examine whether there exists a larger proportion of brain-specific splicing variants among case SNVs that are related to splicing. Group A consisted of all SNVs with OR > 3.5 and ΔSplice ≥ 0.27, while Group B included all SNVs with OR ≤ 3.5 and ΔSplice ≥ 0.27. Genes were classified as “previously reported” if any publication in PubMed explicitly stated in the abstract or discussion that the gene is associated with the disorders. f) was calculated by Metascape based on the hypergeometric test and the Benjamini–Hochberg p-value correction algorithm.

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Code: You, N. et al. Splicetransformer predicts tissue-specific splicing linked to human diseases. Splicetransformer v1.0.0. https://doi.org/10.5281/zenodo.13824839 (2024).

Affiliations: Department of Obstetrics and Gynecology of Sir Run Run Shaw Hospital & Liangzhu Laboratory; National Clinical Research Center for Kidney Diseases; Research Institute of Intelligent Complex Systems.

Author contributions: drafted the original manuscript and revised the manuscript; performed data analysis and drafted the original manuscript; H.J collected clinical samples and performed experiments to generate data regarding DN patients; J.S supervised the work and contributed to the manuscript; N.S and revised the manuscript with inputs from S.P. All authors read and approved the manuscript.

Competing interests: The authors have submitted a patent application for the method; otherwise, the authors declare that they do not have any competing interests.

Peer review: Nature Communications thanks Dadi Gao and the other reviewers.

DOI: https://doi.org/10.1038/s41467-024-53088-6
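Three of the computations described in the SpTransformer methods above (the 4 × N input encoding with all-zero padding, the top-k accuracy, and the Δscore of a variant) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names are hypothetical, the A/C/G/T channel order is an assumption (the text only states the 4 × N shape and the [0,0,0,0] padding vector), and the Δscore is assumed to be the maximum absolute difference between alternative and reference score vectors, which is how the description of abs() and max() reads. Plain lists stand in for tensors.

```python
from typing import List, Sequence

# Assumed channel order A, C, G, T; the source only specifies a 4 x N matrix.
_ONE_HOT = {
    "A": [1, 0, 0, 0],
    "C": [0, 1, 0, 0],
    "G": [0, 0, 1, 0],
    "T": [0, 0, 0, 1],
}


def one_hot_encode(seq: str) -> List[List[int]]:
    """Encode a nucleotide string as a 4 x N matrix (a list of 4 rows).

    Unknown or padding characters (e.g. "N") become the all-zero column.
    """
    columns = [_ONE_HOT.get(base, [0, 0, 0, 0]) for base in seq.upper()]
    # Transpose the N columns into 4 rows of length N.
    return [[col[row] for col in columns] for row in range(4)]


def top_k_accuracy(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Top-k accuracy for one class.

    k is the number of true positives in `labels` (0/1 values). The k
    highest-scoring positions are called positive, and the fraction of
    them that are truly positive is returned.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    k = sum(labels)
    if k == 0:
        raise ValueError("top-k accuracy is undefined without positives")
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(labels[i] for i in ranked[:k]) / k


def delta_score(ref: Sequence[float], alt: Sequence[float]) -> float:
    """Assumed form of the delta score: max over positions of |alt - ref|."""
    if len(ref) != len(alt):
        raise ValueError("ref and alt must have the same length")
    return max(abs(a - r) for r, a in zip(ref, alt))


def delta_splice(ref_acc: Sequence[float], alt_acc: Sequence[float],
                 ref_don: Sequence[float], alt_don: Sequence[float]) -> float:
    """Delta-splice: the larger of the acceptor and donor delta scores."""
    return max(delta_score(ref_acc, alt_acc), delta_score(ref_don, alt_don))
```

For instance, `top_k_accuracy([0.9, 0.1, 0.8, 0.3], [1, 0, 0, 1])` has k = 2, selects positions 0 and 2, and therefore yields 0.5. In the setting described above, the score vectors passed to `delta_score` would have length 2R + 1.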
Individuals with heritable thoracic aortic disease (HTAD) face a high risk of deadly aortic dissections, but genetic testing identifies causative variants in only a minority of cases. We explored the contribution of non-canonical splice variants (NCVAS) to thoracic aortic disease (TAD) using SpliceAI and sequencing data from diverse cohorts, including 551 early-onset sporadic dissection cases and 437 HTAD probands with exome sequencing, 57 HTAD pedigrees with whole genome sequencing, and select sporadic cases with clinical panel testing. NCVAS were identified in syndromic HTAD genes such as FBN1, including intronic variants in FBN1 in two Marfan syndrome (MFS) families. Validation in the Penn Medicine BioBank and UK Biobank showed enrichment of NCVAS in HTAD-associated genes among dissections. These findings suggest NCVAS are an underrecognized contributor to TAD, particularly in sporadic dissection and unsolved MFS cases, highlighting the potential of advanced splice prediction tools in genetic diagnostics.

We applied SpliceAI to our cohorts of unsolved thoracic aortic disease (TAD) cases to assess the role of NCVAS.

DNA samples from affected individuals and relevant family members were collected after obtaining informed consent and human subject research approval. The UTHealth HTAD cohort includes families with two or more members affected by thoracic aortic disease, as well as trios of probands with aneurysm surgery or dissection at ≤ 40 years of age with unaffected parents confirmed by imaging. The ESTAD cohort focuses on sporadic dissection cases in individuals ≤ 60 years of age without syndromic features or a family history. Genetic testing reports from patients with early onset sporadic aortic dissection undergoing clinical panel testing were obtained and reviewed. Exome sequencing (ES) was performed on the full HTAD and ESTAD cohorts, with select cases from unsolved HTAD pedigrees also undergoing whole genome sequencing (WGS).

Dissection cases were identified in individuals of
European ancestry with an aortic dissection International Classification of Diseases 10th Revision (ICD10) diagnosis or cause of death code (I71.0) or a surgical code for aortic dissection (for example, L27.4), resulting in 467 cases available for analysis. Individuals of European ancestry with a thoracic aortic aneurysm (TAA) were identified using the ICD10 code for thoracic aortic aneurysm. After excluding individuals with dissections as previously defined, a total of 1084 TAA cases remained for further analysis. A subset of 263 TAA patients requiring surgery was identified using surgical codes for open repair of the thoracic aorta or aortic root. The remaining 447,570 individuals of European ancestry without any ICD10 codes for aortic disease (I71) or congenital malformations, deformations and chromosomal abnormalities (Q00-Q99) were included for comparison.

Thoracic aortic dissection was defined as having an encounter with an ICD10 diagnosis code of I71.01 or I71.03 or International Classification of Diseases Ninth Revision (ICD9) codes 441.01 or 441.03. TAA was defined as having an encounter with an ICD10 diagnosis code of I71.1.

Sanger sequencing of UTHealth probands and any available affected family members was done to confirm variants identified through ES or WGS. Variant calls were further analyzed with custom Python scripts. Single nucleotide variants (SNVs) were filtered for a read depth ≥ 7 and were retained if they either had one or more heterozygous variant genotypes with an allele balance ratio ≥ 0.15

Dermal fibroblasts from a normal control and the individual with the FBN1 c.2294-3 C > A variant were grown. One of two culture plates was incubated in the presence of cycloheximide (100 μg/ml, Sigma-Aldrich) for 6 hours before extraction of total RNA from both plates with the RNeasy Mini kit (Qiagen). Complementary DNA (cDNA) was synthesized with random hexamers and SuperScript™ III reverse transcriptase (Invitrogen). The FBN1 region of interest was amplified by PCR with a sense primer
in exon 17 (5’- GAATGACGTCAGCAGGCAGT) and an antisense primer in exon 21 (5’- GGAGCAGCACTGGGACTTTA). The products were separated on a 7% polyacrylamide gel and visualized using the “Carestream 212PRO” camera. The normal and all abnormal products were excised from the gel. The DNA was retrieved by submersion of the gel slices in 100 μl of sterile water at room temperature overnight, and 1 μl of each was reamplified using the same primers in exons 17 and 21. The amplicons were sequenced with BigDye™ Terminator v3.1 and capillary electrophoresis on the ABI 3500 Genetic Analyzer, and the data were analyzed with the Chromas software.

Dermal fibroblasts from a patient with the FBN1 c.7820-3 C > A splicing variant and a gender- and age-matched healthy control were grown in DMEM/High Glucose media (Hyclone) plus 10% FBS (Sigma) and antibiotic antimycotic solution (Sigma) at 37°C. Patient and control cells were treated for 8 hours with or without cycloheximide (100 µg/ml). Total RNA from each plate was prepared with Trizol reagent (Thermo Fisher). cDNAs were generated using SuperScript IV VILO (Thermo Fisher). PCR products crossing the putative mutation site were amplified with primers E61-F (5’-CAGACCGGCTCCAGCTGTGAAGA-3’) and E65-R (5’-CATTGGCTTCTGTCTCAGACTG-3’) with the KAPA HIFI PCR kit (Roche). The PCR products were Sanger sequenced with primers E61-Fa (5’-CCAGCTGTGAAGACGTGGAC-3’) and E64-Ra (5’-CAAGCCTCTGGGGAGAGTGA-3’).

Figure legend: a, Family members carrying the FBN1 variant are marked with a ‘+’, including the proband’s brother with a systemic score (SS) of 8. Individuals with normal aortic imaging are marked with a ‘*’. b, Gel electrophoresis image of RT-PCR from skin fibroblasts of the proband’s brother. Band 1 (blue) represents a small amount of two abnormal splice products generated by the rare use of two cryptic acceptor sites in the exon. Bands 2-5 (blue) are two heteroduplex pairs formed between normal and abnormal products. Ao: Aortic diameter at the sinuses of Valsalva; Z-score: normalized aortic diameter
an inhibitor of nonsense-mediated decay (NMD).

Table abbreviations: ES, exome sequencing; WGS, whole genome sequencing; ESTAD, sporadic aortic dissection cohort (< 60 years of age without family history or syndromic features); HTAD, families with multiple members affected by heritable thoracic aortic disease; TAA, thoracic aortic aneurysm; HI, haploinsufficiency; MAF, minor allele frequency; LB/B, likely benign or benign; NS, not significant (p > 0.05). *Number of HTAD pedigrees.

Figure legend: Pedigree 1 (a) had extensive aortic disease and multiple individuals with a clinical diagnosis of MFS but negative genetic testing. An FBN1 variant predicted to activate a cryptic splice site within intron 56 was identified. Pedigree 2 (b) was similarly affected with aortic disease and clinical MFS diagnoses but negative molecular testing. WGS of the proband revealed a novel FBN1 variant predicted to cause intron retention and extension of exon 11 of the mRNA transcript. Aortic diameters at the sinuses of Valsalva (Ao) and normalized aortic diameter Z-scores are shown for individuals who underwent clinical assessment. MFS: Marfan syndrome; WGS: whole genome sequencing.

A possible contributor to the varied phenotype associated with NCVAS is the efficiency of aberrant splicing, which may lead to varied levels of the wild-type transcript in different tissues. The effect of the variant also depends on whether a new donor or acceptor site is created and how many nucleotides are inserted or deleted. If the number of inserted or deleted nucleotides is a multiple of three, the translational reading frame is preserved. If it is not a multiple of three, a frameshift results, leading to an unstable mRNA molecule that may be degraded via NMD. While bioinformatic computational tools can predict such events, only RNA-splicing assays can functionally validate the impact and extent of splicing changes. For example, the FBN1 c.6872-1003 C > T variant identified on WGS in two brothers with MFS had a lower score (SpliceAI = 0.39) than the c.2294-3 C > A variant (SpliceAI = 0.45) found in a sporadic dissection case and family members without any
aortic enlargement. Additional genetic and non-genetic factors, such as dissection-specific polygenic risk due to common variants and hypertension, may also augment the likelihood of aortic dissection in carriers of NCVAS. The absence of these variants in TAA cases suggests that alternative mechanisms, rather than rare variants disrupting HTAD genes, may contribute to aneurysm formation in the general population.

These results indicate that WGS should be considered for individuals and families who meet the diagnostic criteria for MFS but in whom a causative variant is not identified with clinical genetic testing. allowing more accurate assessments of aberrant splicing and its role in disease. Multiple splice products are possible and should be considered due to the potential impact on pathogenicity and phenotype.

Variants outside the canonical ± 1,2 splice sites may be an underrecognized contributor to TAD, specifically for early-onset sporadic dissection cases and MFS patients meeting diagnostic criteria but with negative molecular testing. Splice prediction tools show promise in identifying such variants that have been excluded or not identified in bioinformatic analyses. Despite the observed overall enrichment of these variants in dissection cases, further investigation is required to predict the penetrance of disease in carriers.

The UTHealth datasets are available in dbGaP, Study Accession: phs000693.v7.p3. The PMBB dataset is not publicly available due to IRB restrictions requiring a collaboration with a Penn investigator to access PMBB data. The UKB data is available to researchers upon approval by an expert access committee.

The underlying code for this study is not publicly available but may be made available to qualified researchers on reasonable request from the corresponding author.
Marfan Foundation https://marfan.org/dx/z-score-adults/ (2021) Identification of the minimal combination of clinical features in probands for efficient mutation detection in the FBN1 gene Clinical and genetic analysis of Korean patients with Marfan syndrome: possible ethnic differences in clinical manifestation Predicting RNA splicing from DNA sequence using Pangolin A novel heterozygous intronic FBN1 variant contributes to aberrant RNA splicing in marfan syndrome Functional Analysis of an Intronic FBN1 Pathogenic Gene Variant in a Family With Marfan Syndrome Fifteen novel FBN1 mutations causing Marfan syndrome detected by heteroduplex analysis of genomic amplicons Download references This work was supported by NHLBI R01HL109942 (D.M.M) the Remebrin’ Benjamin and John Ritter Foundations (D.M.M) and funds from the American Heart Association 23POST1011251 (J.D.) Sequencing and data analysis were provided by the University of Washington Center for Rare Disease Research (UW-CRDR) with support from NHGRI grants U01 HG011744 and U24 HG011746 The PMBB is supported by the Perelman School of Medicine at the University of Pennsylvania and the National Center for Advancing Translational Sciences of the National Institutes of Health under CTSA award number UL1TR001878 University of Texas Health Science Center at Houston (UTHealth) University of Pennsylvania Perelman School of Medicine Department of Laboratory Medicine and Pathology New York City Office of Chief Medical Examiner Brotman-Baty Institute for Precision Medicine wrote the manuscript with input from other co-authors All authors read and approved the final manuscript Download citation DOI: https://doi.org/10.1038/s41525-025-00472-w Music creation platform Splice has appointed Jeff Roberto as Senior Vice President of Marketing Roberto will be responsible for overseeing global marketing initiatives and supporting the platform’s strategic growth the company announced Thursday (December 5) Roberto brings extensive 
experience to the role, having served as Chief Marketing Officer at Nodle and DistroKid. He also served as SVP Marketing at Picsart, where he contributed to the platform’s growth to 100 million monthly users and helped secure a $130 million funding raise in 2021 at a $1 billion+ valuation. Roberto also held roles at Shazam, Napster He began his career at the University of Connecticut’s college radio station. someone that can help us scale an already thriving business, who understands the music industry and, most importantly, cares deeply about the creative communities we support,” said Kakul Srivastava. “Our AI Roadmap gives us a huge opportunity to build revolutionary creative experiences for musicians.” “Music and creators are two of my greatest passions, and Splice is perfectly positioned at their intersection.” “I’m excited to support music producers of all levels. I look forward to expanding the brand across new horizons as we continue to deliver innovative AI tools that enhance creative workflows.” Roberto’s appointment came five months after Splice appointed music industry veteran Kenny Ochoa as Senior Vice President of Content. Splice provides a catalog that includes high-quality The company also offers access to plugins and DAWs through a rent-to-own “Gear” marketplace. and craft music directly from their phones merges the company’s sample library with its Create technology. In 2022, Splice introduced an AI-powered app called “CoSo” that uses AI to find sound samples from across the Splice catalog that work together. Splice was valued at nearly USD $500 million in 2021 after securing $55 million in funding, according to Bloomberg.
Behind The Scenes, 28.04.25. Sound decisions: Splice's work on Adolescence
Adolescence has become the third most-watched English-language Netflix show of all time, racking up over 130 million views and counting. crafted by Splice’s James Drake and Jules Woods, played a key role in shaping its
emotional depth and immersive storytelling. The series continues to resonate with global audiences in an unprecedented way. Watch the video case study to see how sound helped define the world of Adolescence, and why it made all the difference.
The new Splice Mic feature is available now on iOS
Splice has launched a new feature for its iOS app called Splice Mic. The new addition lets you record vocals straight into the app over existing Stacks, Splice’s name for a selection of loops from their catalogue that make up an idea. The vocals can then be saved along with your Stack, or you can use Splice’s built-in AI tools to analyse the recording and recommend more loops from Splice’s library that match your vocal. Once you’re happy with the Stack and vocals you’ve created, you can save them and export them to your DAW to continue working on the idea. Splice will maintain all pitch shifting and warping to ensure your samples remain compatible. “The phone is already a huge part of music making,” said Splice’s SVP of content. “Now songwriters and producers can record vocal ideas over Stacks of samples, and now those Stacks can be merged with vocals.” Splice Mic is now available in beta on the free Splice Mobile app. Last year, Splice updated its mobile app with an AI interface. Following that, the sample giant added custom sample uploads to its AI-powered Create engine.
by Nilay Patel
The head of the sample platform thinks creatives “deserve better” than AI tools that do all the work for them
Today, I’m talking with Kakul Srivastava, CEO of music creation platform Splice.
I don’t think I need to really introduce Splice, actually. I just need to play this clip: If you exist on planet Earth, you know that as the guitar loop from Sabrina Carpenter’s “Espresso,” which is an inescapable pop music phenomenon. You can check out the sample in full right here in the “Espresso” chorus. Listen to Decoder, a show hosted by The Verge’s Nilay Patel about big ideas and other problems. You know that some of my favorite conversations are with people building technology products for creatives, and that I am obsessed with how technology changes the music industry, because it feels like whatever happens to music happens to everything else five years later. So this one was really interesting because Splice is all wrapped up in all that, and some of its new products might change how music is made all over again. Srivastava joined Splice as its CEO three years ago so she has a lot of experience working at a company that makes tools for a creative user base that’s threatened by things like automation and AI. But if you’ve listened to any of our Adobe episodes, you know that the flip side of that is people actually using these tools at high rates because they’re fun to play with and make some parts of the creative process easier. So I really wanted to dig into that with Srivastava, not only to understand where Splice stands but also to see how the broader music industry can try and make sense of this technology and what it could do to music. I also wanted to talk about how the company navigates the incredibly complex minefield of copyright law and attribution on the internet, something that’s only getting more complicated with AI and the increasing number of copyright lawsuits filed against big AI companies. There’s a lot in this one, and Srivastava was willing to fall pretty deep down some of these rabbit holes with me. This interview has been edited for length and clarity. I have wanted to have this conversation forever. We ran into each other at the
Code Conference last year and we just were off to the races talking about music and technology and AI, and I’m glad you’re finally here because so much has changed since then. But all of the issues are kind of still there and still working towards resolution. Some of the core issues are still the core issues. At least one massive hit single has been made using loops from Splice. I think “Espresso.” There’s “Espresso” and there’s a lot of other ones, but a very large proportion of top music everywhere uses Splice. Let’s start with the very basics for people who maybe aren’t familiar with how music is made today or with Splice. Splice is a music creation platform that is used by music creators and musicians. We have this tagline, “starts with sound,” so we do start with sound and we provide them with probably the world’s most diverse I was looking at a report from our team that just came back from Brazil. We’re meeting artists on the ground, so we’re capturing the sounds of the world, and we make that available through our platform. We also provide AI-based creative tools that help you start with a sound, but make it your own. We have compositional AI.
We just launched something brand new at SXSW called Splice Mic. Start with a musical idea right in your phone and we’ll help you compose around that by putting the right samples next to it. which is “here are the foundational pieces of making a song,” right? we’re going to have this library of audio,” and then there’s this turn, which I see a lot of companies that make creative software starting to make. Adobe, I think, is the paradigmatic example of this right now. You can just push generate to fill in Photoshop and it just does a bunch of stuff for you. You can prompt Photoshop now in various ways and it does stuff for you. Are you all the way there with Splice and where you’re going? “Write me a country song,” and Splice will just do it for you? That’s totally not what we’re trying to do, and I’m so glad you asked this question because I really want to put this idea out there. The people who we think about all day long, the last thing they want is someone to make the song for them. One of the things that we learned when we launched Create How are they able to get the tools to capture what’s happening inside and turn it into a song, turn it into something that they can share with other people? So it is absolutely not push-button creation. You must at the company have some sense of how people perceive building music out of sample packs. And even before “Espresso,” like “Umbrella” by Rihanna was [made with] GarageBand, which I think is just a moment in music that should belong in the history books. music is now just assembling a bunch of pre-made samples and that’s good or bad Is that a framework you’re using as you enter the AI generation era? I’m going to take exception with what you just said. I don’t think music making today is putting just a bunch of samples together. I think that using samples to create music is a really profound creative process. I will concede that that is a very reductive criticism. It’s a process that’s been developed over decades that it’s really powerful. So I
think of samples as the building blocks for how modern music is made. One of our largest growing genres is country music. You’re using samples to make country music. I think the artistry of using samples to make music is that you start with a sample what you do inside the digital audio workstation I asked that question somewhat to provoke that response. You have that response to the criticism of sample usage. Is that informing how you’re thinking about the criticism of AI usage? And I personally spend a ton of time with creators, and what they are telling me over and over and over again is, “I want better tools.” And when I was at Adobe, this is also something that we heard from people: “I want better tools.” And so the work for us, the work for any company that’s wanting to really meet the needs of this growing and large market, is how do you build better tools in this era of AI? let me type a bunch of prompts and I get a song out at the end,” but what happens next? How do I change this particular part of the song and get it to sound a certain different way? How do I take this sample and make it into something else? You saw this with the Splice Mic launch as well. A lot of it is: how do we get more of you into the music creation process as quickly as possible? whether we’re talking about using a synthesizer to make music how do you make sure the creative process is respected throughout those different transitions in music innovation? That’s a lot of incoming about what your product should look like. Now there are these huge private equity companies that own huge catalogs that want to assert their rights in various ways. Do they have a point of view that’s informing how you’re using AI or how you’re thinking about sample licensing? Because that seems like the most complicated part of your business. We are aligned across the industry, whether it’s with Universal Music or any of the other key high-quality players in the industry. We are very aligned that the rights of the creator have to
be respected. We’re going to focus on the creators and what creators want, and we’re going to try to meet their needs. So the rights of the creators have to be respected. We take this pretty seriously, and we take it seriously throughout the entirety of our process. How does a sample producer or sample pack creator come into the Splice platform? We have an entire organization that does the intake. So that process of ingesting is something we take seriously. Is it tagged appropriately all the way through getting onto the platform? And all the quality stuff: Is the sound clear? What is the experience of someone who is downloading a sample from Splice and able to use it? We want to make sure that every single download that you do on Splice lets you download the PDF that says you’ve got full rights to this material to use it for any kind of creation.” So that’s something that’s a basic part of what we do. And that’s the big story that everyone’s talking about, which is, if you’re going to use content to train, you should train on content that you have rights to. It’s not okay to disrespect the rights of creators. Most players in this space are pretty aligned on that. It occurs to me, just as you described that, that you are a creator platform for creators. There are people who sit around making sample packs and then they might make money uploading sample packs to Splice. You’ve got artists who are downloading the sample packs, paying you money to go use some in other songs. And I wanted to come back to that for one second. That’s what’s really magical about using samples to make music. It’s not just a random sound that you got on Splice. There’s an artist at the other end of that. We work with some of the people who are really at the forefront of funk and what that means and what that sound is. So when you’re using a sample pack from Splice It’s a storytelling between those two different artists coming together. Are there creators who make their entire living just making sample packs for you? Is that
a viable approach to being a professional musician? they make hundreds of thousands of dollars they’re building their own musical career and this is part of what they’re doing I will say that the revenue that we’ve shared with the artists on our platform over time So it’s nice to be able to feel good about that. I’ve been spending a lot of time just thinking about the economics of creator platforms. You see that creators have to augment their income. They all have to do brand expansions or sponsored content or whatever. Is there a ceiling to how successful you can be on a Splice Sabrina Carpenter makes “Espresso.” I’m guessing the person who made that sample pack did not get paid more money because that song was a hit. That’s the pro and the con of being royalty-free. What that means for the creators is they don’t have to get stressed about it. You don’t have to worry about clearing the rights. The downside is you don’t get to share in the sort of upside when something big like that happens. We’re really here to make sure that as many people can create as possible and that gives them access to this unlimited library of sounds, along with the creative tools that we’re investing in heavily for the future. You get a certain number of credits per month and use those credits to download sounds that you can then use as you want. And is growth just getting more and more artists to use Splice on both sides, as creators and as people who are subscribers? It’s been an interesting journey over the last three years while I’ve been here. I think that brings us to the Decoder questions. I think you had two different stints at Adobe. Right out of business school and when it was a perpetual business and more recently Adobe is the creative software company. They have a very, I would say, back-and-forth relationship with creatives.
We had [Adobe CEO] Shantanu Narayen on the show. We got feedback on that episode of Decoder like nothing else we’ve ever experienced. what AI means to Adobe as a company and its user base You obviously have some of that experience. How have you thought about applying those lessons to what is well on its way to being one of those companies for the musical community? Adobe has been a really important part of my career journey. I was also one of the early people at Flickr. There’s some Flickr users right now who are writing us emails. I was also head of product and marketing at GitHub. So I’ve had a chance to see creator tools in multiple different places, and all of that has really informed what I’m bringing here to Splice. The journey for me has been a little bit around pattern recognition. that I see here at Splice is that you have a business that’s centered around content, and you have a lot of rich metadata around that content, and you have lots and lots of impressions around that content so that users are giving you information about it. We have about a million songs that are samples that are sounds that are downloaded today. We have 28 million stacks that have been created using our AI tools. So we have a lot of impressions of what sounds people are listening to. You can use that to build rich experiences on top, which is what we’re doing now with the creative tools. That feels very familiar to bring to Splice, to bring to the music industry, where I’ve seen it at GitHub “We are going to make the tools that actually help you create the music.” You can look at it in a slightly more abstract way. Splice doesn’t see what you’re doing in those apps, but those are the dominant music creation apps. You’re suggesting with something like Splice Mic or Splice Create you’re going to take some of that creation I think there’s a lot of opportunity to reinvent how we make music. The features you launched at SXSW are interesting because they use AI to make that a little bit faster. You can sketch an
idea very quickly on a phone now. Is that the extension: “We’re going to take some of Pro Tools’ market share. We’re going to go take some of Logic’s market share”? So I think that word “take” suggests a zero-sum game. This is not a zero-sum game, right? It’s about expanding and exploring the creative process. Many of our users use Splice Mic or use our mobile app as an adjunct part of their process that they will ultimately finish inside a digital audio workstation (DAW). super top-end producers has worked with many of the big names that you would recognize I’ll generate a bunch of stacks so that by the time I get to the studio, I’ve got a bunch of ideas that I can show the artist right away to say, ‘Do you want to go this way or do you want to go that way?’” And that’s a really core part of his creative process. I was just at my kid’s school where they have a digital music production class. listening to sounds on Splice is a really core part of learning What does it mean to create a Bollywood hit? What does it mean to create something that’s a K-pop sound?” And I think that’s a different way to use this experience. It’s not that we’re going to take [market share] away from this place or this place, but how do we expand how much we’re part of the creative journey in different ways? But the idea that you’re going to start and finish a song in a Pro Tools Do we think that we’re going to directly compete with Pro Tools? There are people who tell us every single day, “You will take Ableton out of my cold dead hands. It’s not going to happen.” And there are a lot of other parts of the creative process that are painful. inevitably there will be a situation where they’re like we need to find a certain kind of kick drum.” And they’ll find a folder and they’ll do a subfolder and they’ll do a sub-folder and then they’ll finally find the sub-sub-sub folder that has 20 kick drum sounds that they have saved. You just go through and you listen to these sounds. We’ve just done this new experience
that we launched in October last year where we integrated with Studio One, and there’s a Splice integrated search with sound experience. So we listen to what you’re creating inside Studio One and we’ll suggest the samples that go with it right there, integrated as part of your creative workflow. Do I think I’m going to replace Studio One? Can I make the Studio One experience a lot better because Splice is there and Splice is smart with AI? How do those conversations work with all those digital audio workstation providers? The companies that make them are all very different. All of the music tech industry is very quirky. eccentric Europeans floating around this industry in particular It’s one of my favorite parts of the tech industry to cover. they’re just going to do whatever it wants to do How does that competition and cooperation work? I’ve actually been really impressed at how collaborative the industry really is. So the conversations with Studio One was very and we’re working with other partners as well to bring that integration. I think there’s generally a recognition that we’re good at what we do, the kind of work that we can do in terms of bringing these sample packs to the world. It’s not something that they want to replicate. They want to make great experiences inside Ableton. I think the AI stuff is new to a lot of people in the industry. A lot of the team that I’ve brought into Splice over the last few years comes from a core tech background, which is unique in some ways for the music tech space. So I think there’s a lot of respect around that. I think there’s an attractiveness to a subscription business model that has been difficult for this industry to adopt. I think there’s a lot of curiosity about that. Could we use a content business model to get more recurring revenue? But I think many people have found that it’s not as easy as it looks. One of the things you say about bringing people who have a core tech background is that helps you innovate in things like AI Where you
just need to be on the cutting edge of the technology. Tech and music in particular have always just crashed into each other. The thing I say on the show over and over again is, if you pay attention to the music industry and what tech is doing to the music industry, you have the view into what tech will do to everything else five years out. How are you thinking about that dynamic right now? “I need to hire more tech people.” Is it just for AI, or is there something else you’re trying to accomplish with the addition of that talent? and when I look at the music creation process, I feel like these music creators have been underserved with great innovative experiences, and I think it’s important to focus on the creative workflow and provide people better tools over time. When I think about the collision between tech and music, it’s weird because there’s actually more similarity than dissimilarity. In Splice we have some really great software developers who love music. We have a whole bunch of musicians and artists who think in that same weird mathy way that great software developers think. I also think that there’s this mindset out there that musicians are scared of technology. I actually think that musicians love hacking. There was all this threat around synthesizers and all of that stuff, and then Stevie Wonder took it to a totally different and it allows them better tools to get to the other place. What they don’t love is push-button creation. you will find the right ways to bring technology innovation here. I think there’s something else that you’re pushing on here that I think is important, and maybe it’s one of your Decoder questions, around how do you bring the cultural mindset from the tech industry and meld it with the music industry. I might as well ask you the Decoder questions. We are fundamentally a product company first. So my largest organization at Splice is the product development organization. Because I think that tight loop is super important. they’re all in one org and that’s
product development. Our second largest org is our content team. And those are the people that they’re going to Brazil Maybe the third thing that I’ll point out that’s really important to me and how I structure the org is we have a very strong central data organization that reports directly to me. So a lot of people put that inside product dev. Data’s obviously important for finance and how we run the business. And how is it split between those three groups? they say they have investments in content teams, but they really just hope the scale carries them forward. Instagram does not have some huge content team that is traveling the world to get content. It’s the same with YouTube or TikTok or whoever else. They might manage some of their top influencers, but really the volume of content comes to them. Is that a tipping point that you think Splice can reach, or do you want to maintain control over the library? It’s really important for us to make sure our library is the highest quality that it can be. It’s not going to be a free-for-all where anyone is uploading anything they want, because we need to maintain that high quality. There’s all kinds of stuff that’s being uploaded to all of these big platforms. So I think the biggest change has been around The reason that’s important for me is because I need to understand what our customers actually care about. How are they voting with their clicks, as opposed to whatever opinions everybody else has? and that is really where data and the math and the science turns into something else, which is a real experience that people can feel. The reason that’s important is because we’re serving creative people, and that’s what creative people do as well: they take all of these inputs and they turn it into something new. So building a strong design team that is either made up of music creators themselves or people who spend a lot of time with music creators is really important.
And the third thing that I really brought in that’s important is that we build our products with the customers. So everything that we launch, there are tools that we built in to allow people to give us feedback. In fact, when we launched Create, the biggest button in the Create experience was the feedback button. Every single time someone typed in something to give us feedback, it comes into a Slack channel that’s with all the designers and the engineers and the product managers. So we’re actively talking about the feedback from the customers as it’s coming in. I absolutely love that we build product that way. I think everyone should build product that way. One of the reasons I always ask about structure on the show is that it’s a proxy for culture. You make some big choices about how things are organized. You’re in an interesting spot because you took over for co-founders. How have you thought about changing the culture? The reason I love your question around structure is because I do see that it’s a proxy for values, and that’s why I answered it the way I did around data. Those are fundamental values that I want to bring and inculcate into the company. There’s something else that we also did that was around building culture. I spent a lot of time listening to the team, trying to learn what made this culture unique, and then I reflected back to the organization: these are the values that I’m hearing from you all. And we came up with something that we call our DISCO values: direct And even though these are new values that we came up with after I joined, they have felt so authentic to the culture that we have that’s existed for a long time. Every single new employee that comes on talks about which DISCO value they resonate with most. which is also in many ways a proxy for culture and values. I’ve always been a very math and science kind of person. I’ve always been someone who’s very analytical. I study all the different tools for decision-making, but as the decision sets that come to me
become more complex, and as we operate in an increasingly complex world, I have found myself relying more and more on intuition. I would say that my decision-making process is People know in my team that I spend a lot of time on our dashboards. I will spend a lot of time watching research videos and understanding how people are using our tools. I will spend a lot of time personally talking to different customers. and once I’ve kind of drowned myself in all this information We’re going to put this into practice, because the “making creative software for creative people in the age of AI” is about as tense as it gets in the balance between what the numbers are telling us and how the people feel. And what I mean specifically is the numbers are telling everyone that people are using the AI tools. every software maker I’ve talked to has introduced AI tools with any meaningful value they’re doing generative fill all day long. Then what you hear from the creatives on social media or online They stole everything from me.” And that is about as big of a divide in tech I think that is challenging a lot of how everyone is going to make decisions. So I’m going to read you a quote from one of your ostensible competitors, and it tracks with everything you’re saying, but I suspect you are going to disagree with this quote, and I just want to sit with that for a minute. So you have said, “Right, creators just want to create, they want all this stuff to get out of their way.” So here’s the CEO of Suno, Mikey Shulman. Suno is just “push a button, it makes you a song,” right? You say country song, it just spits out a country song at you.
And here’s what he recently said: “It takes a lot of time You have to get really good at an instrument or really good at a piece of production software I think the majority of people don’t enjoy the majority of time they spend making music It is not really enjoyable to make music now.” I have no idea what Mikey Shulman is talking about but that does track with what you’re saying that you just want to get the software out of the way But he spun the knob all the way to “just prompt me for a song.” And a lot of people reacted to this quote very strongly How do you sit in the middle of that to say “There’s a line and I’m going to enforce the line and we’re not just going to prompt it all the way to a song?”Also Do you think people don’t enjoy making music Here’s what I have learned by serving creative people for most of my career: the creative process is essential for people who create but the struggle is to authentically translate what is inside you into something else your tools will help you —will enable you to do that — and other times your tools will get in the way Understanding the distinction between those two is the whole ball game but it’s really about allowing the struggle to come to life and to dismiss it by this push-button set of tools I think the creative process and creative people deserve better They deserve better technology that enables them as opposed to reducing this profound activity to a button So this is where I think the line is inherently qualitative here’s what we’re going to do and here’s what we’re not going to do.” And the tension of “It’s not really enjoyable to make music now,” you can describe that as using the software sucks or I just want to have an idea and hear it as fast as I can And then you can describe it the way you’re saying which is there’s some parts of the struggle that are the creative process If the data tells you that people really want to just click the button and make the music are your values strong enough to not send 
you all the way down the road I think it depends on which people you’re listening to We are really clear about the people that we’re listening to We are listening to creative people who love the process of music creation when we gave them Create for the first time I want something that gives me more creative freedom the people we’re listening to are super clear and the signals they’re giving us The other side of this marketplace is consumers We see consumers and fans all the time now react very strongly to AI generated imagery you make a movie poster and it’s got a bunch of AI in it You can’t see that the characters in the movie poster have 12 fingers and their hair bleeds into the skyscraper behind them Do you perceive that kind of consumer or fan backlash to AI in music the same way that we see it in visual art I have seen really clear signals from our customers that they are not really interested in computer-generated samples We are in fact investing in more human-created samples This is why we’re sending people out to the sort of subgenre locations It’s really important for our strategy to continue to do that because people want to connect with the stories of the real artists on the other side of the sample I think what an end user who’s listening to Sabrina Carpenter today and will listen to somebody else’s music tomorrow what they can hear is going to be interesting I love that Kendrick Lamar won the Pulitzer Prize for music and the people who won the Pulitzer Prize for music So what art is and what is acceptable changes over time I would expect it to continue to change over time I know that artists will use different tools discussion about watermarks and encryption and letting people know when images were edited by AI or created by AI and I would say there are some deep and meaningful challenges with even making that technology work consistently There’s not anything quite like that on the music side I think it’s going to be really hard to disambiguate around 
sound You’ve had some really great conversations about this topic on your podcast I think it’s a really important debate and discussion to have There is going to be a bunch of bad AI-generated content out there I think that the toothpaste is out of the tube we have to do the right thing around respecting the rights of creators and doing the right thing with respect to training data Maybe some of these cases that are open will help us get to the right answer but I don’t think it’s going to come out of watermarking You talked about the flood of AI content that’s coming The big consumer platforms are embracing it Mark Zuckerberg would love it if all the content on Facebook was AI and he was paying zero out to creators YouTube is really leaning into the idea that you should interact with your favorite creators through AI avatars and that they should make even more videos or AI should help them make even more videos to increase the volume of content that appears I don’t know exactly how it’s going to play out but I understand the incentives for those platforms to make those choices to say what we want always is more content because that will create more attention and we can serve more ads and we’re in this finite zero-sum intention game.” You’re not in that game specifically and you do allow artists to make music with AI using your tools Do you allow AI-generated samples to enter your library Why is it okay to make music with AI but not to have it in the sample library I think it’s what users are coming to Splice for today They are coming to find those authentic sounds made by humans That’s not to say that people aren’t using AI to master sounds or things like that You’re using AI to master your audio and video probably here and that’s fine as long as there’s an authentic artist’s artistic vision and voice behind it that’s super important for us to continue to be focused there With respect to these social platforms that you’re talking about And inasmuch as these social 
platforms are important for our creators as a way to share their output But these social platforms have grown because they allow people to have emotional connection with each other “I’m really angry about this particular issue,” or “I’m reaching out for support for these fires in LA,” or these connections that we make and finding support around this very specific cancer that I have that I can’t find other people to connect with online if we erode those actual emotional connections between people in order to save a buck in paying out creators I think the value of these platforms will diminish over time Maybe we shouldn’t spend so much time on TikTok Maybe we should spend more time creating music on our own So I think these are really interesting evolutions that are going to happen in the industry I care a lot about where some of this stuff goes and so many things our users create just to hang out on their desktop because it was just for the joy of creating And some of it goes on and becomes a Billboard top 100 hit but I’m just as happy that someone is spending time creating Let me ask that again in just a different frame We’ve talked a lot about active creation and what the tools are for and the fact that your customers They want to add something to what the computer-generated product is giving them and that process of addition creates additional value Some very important songs have been made that way using Splice and other tools But you’re saying that is not a good enough argument to get AI-generated audio into the sample library If it’s good enough for me to send to a major label and play on the radio shouldn’t it be good enough to get into the Splice sample library So I think that the distinction in my mind and I think for many of our creators is that or was AI used as a tool to bring a human creator’s idea to life Do people use technology to create the samples that end up on Splice People are using lots of tools and technology “I’ve created an algorithm to pump 
out a whole bunch of samples that are computer-generated for the mass market.” Those are not going to end up on Splice who is creating art that they care deeply about and they’re using AI tools as part of that process I just don’t know how to write it down in a way that can be consistently enforced across all the geographies that you’re operating in with all of your teams going out in the world or in a way that’s understandable to artists who might want to be part of Splice Is there a definition you have of where the line is I will take it down to something very simple There’s a human being who we have a relationship with on both sides of our platform and so on the side of the platform where we are working with a musician an instrumentalist who wants to provide a sample to Splice and we talk to them about what they’re trying to do How many sample packs do we need every quarter there’s a Japanese potter who is making handmade percussion instruments that he then records we’ve got crazy kids making all kinds of super electronic and they’ve got a different tool set that they’re using as part of their process you can’t use this tool because it’s AI-generated or not,” but do you have that authentic vision for what you’re creating And it’s not that difficult to tell the difference between a person who is creating that way “I typed in a bunch of prompts and I got a whole plethora of computer-generated sounds.” The other extremely challenging piece of the puzzle with AI-generated content is when you veer into impersonation. We’ve seen this in the hip-hop industry a lot recently. We’ve seen it with OpenAI and Scarlett Johansson’s voice. There’s a lawsuit. The voice got pulled. Who knows how that’s going to play out? 
We see there’s the Elvis Act in Tennessee where impersonation is illegal and I don’t think there’s a great answer for whether Elvis impersonators themselves are now illegal Are you playing in that space where you’re letting people use artist voices or sound-alikes I think there are lots of people who are playing in that space or interested in that space and creative people are actually really clear with us They are coming to Splice because they want to find their authentic sound and so we work really hard at the very other end of that which is how do we allow our users to authentically find their own vibe Voices is one thing, right? They’re pretty recognizable. The fake Drake song set the industry ablaze It was just very obviously a fake Drake song There’s not a great legal system for saying It seems like we’re on our way to understanding how to get there Then there’s kind of the existing mess of music copyright. We talk about the “Blurred Lines” case on this show a lot I think more than any other podcast we’ve talked about “Blurred Lines,” a song which came and went and whose moment is over but it continues to come up on Decoder maybe once a month That lawsuit is “you guys stole a vibe from Marvin Gaye not anything direct,” but the jury was like Robin Thicke and Pharrell have to pay the money.” That’s something you could very easily see a user of Splice wandering into We’re going to layer some samples and we’re going to get to a vibe that’s too close to another artist.” Is that something you worry about Is that something you try to protect users from It is, and it’s also been a core part of how music evolves over time. 
There’s this whole conversation around reheated nachos and what that means and I think artists and musicians build upon each other’s work and this conversation’s been around since the beginning of sampling which is “what am I referring to when I use this sample and what’s the story that I’m trying to tell?” or you could argue that it’s building on a shared piece of work that’s a community piece of work that continues to evolve over time I think that that’s what makes art and music in particular super fascinating I love that you guys have this whole debate around that particular song and what’s right and wrong should be defined by the artists But the idea that you would accidentally boost too much of an existing song by using an AI tool which is trained on bits and pieces of existing songs The push and pull is people being very unhappy about the money, and now we’re at a place where it’s easier than ever to be derivative, and the money is absolutely not clear — that artists are very upset about their work being trained on, maybe not in your tools, but certainly in other tools. 
The labels are suing Suno and Udio, its competitor, for training on their data Because it seems like the problem is going to get worse faster than the legal system will even comprehend the technology Most of these problems get worse before the legal system catches up Technology outpaces how quickly legislative action catches up we’re doing a lot of work to try to create standards within great companies in the music space that are saying We have to make sure our training data is clean.” I think there are a lot of companies that are trying to do the right thing Is there one standard that has won out amongst all the others but I know that a lot of people are working really hard on this problem We care deeply about the rights of creators so that’s going to stay really important for us How do you feel about the labels suing Suno and Udio Is that something that’s a warning sign for you Do you think that that is going to get resolved I think what the labels are trying to do is support the rights of the creators so we absolutely support the rights of the creators it’s always going to be about the creators first and I know my customers deeply care about the fact that they have rights to the content they create using Splice That’s why we allow people to download the rights PDF even if they’re not putting their song up on Spotify or trying to make a billion dollars from it they want to know that they have the ability to do that So that’s what governs our decisions around clean training data If I wanted to sign up for a Splice account You say you can’t train AI on these tracks Yeah, I don’t either. It’s such an important issue. And the scale of the Internet, the scale of content on the internet is so vast that — What is fair use? What is not fair use? What is public consumption? What is public record? What is public ownership? We are in uncharted territory, and we’re going to be watching it just like you are. How would you write a fairer system if you were clean sheeting this? 
How would you write a fairer system that makes creators feel valued, gets them paid, and still allows people to build these AI systems that a lot of people are getting some value out of? I would love to say that I’m the expert who could write something like that. I have a much more straightforward problem to look after, which is, how do I help creative people be creative and get the ideas from their hearts and minds out there? Yeah, I’m going to leave that problem to people way smarter than me, who are legal minds who are working really hard on this. Well, if I get anyone on the show who has an answer, I’ll let you know. I just talk for a living. I haven’t done anything useful in a long time. Kakul, you’ve given us so much time. What’s next for Splice? What should people be looking out for? All right. We’ll have to have you back soon as some of these issues play out. Thank you so much for coming by. I would love to. I had such an enjoyable conversation. Thank you so much, Nilay. Questions or comments about this episode? Hit us up at decoder@theverge.com. We really do read every email! 
Johns Hopkins researchers have developed a powerful new AI tool called Splam that can identify where splicing occurs in genes, an advance that could help scientists analyze genetic data with greater accuracy, offering new insights into how genes function and how mutations contribute to disease. Their results appear in Genome Biology. "Precisely identifying splicing sites is key to understanding how cells interpret genetic instructions," says co-lead author Kuan-Hao Chao, a doctoral student in the Whiting School of Engineering's Department of Computer Science who is affiliated with the Center for Computational Biology (CCB). "Splam lets us analyze genetic data with accuracy and efficiency, showing how mutations affect our health and why the same gene can produce different proteins in different conditions." He is joined on the project by his advisors, Steven Salzberg, the Bloomberg Distinguished Professor of Computational Biology and Genomics and the director of the CCB, and Mihaela Pertea, an associate professor of biomedical engineering and genetic medicine with a secondary appointment in the Department of Computer Science, as well as Alan Mao, a fourth-year undergraduate double majoring in biomedical engineering and computer science. Cells rely on genes to guide their functions, with each gene containing both useful instructions (called exons) and non-essential segments (called introns). Splicing is the process by which cells trim away the non-essential portions. Recognizing splice sites computationally is a crucial step in accurately assembling gene transcripts in modern genetics studies, where RNA sequencing experiments measure the level at which a gene is expressed (basically, whether it's turned on or off) in different conditions. "Cancer researchers often use RNA sequencing techniques to compare gene expression in healthy versus cancerous cells," says Chao. Identifying splice sites is 
also important in annotating genomes, which involves identifying which parts of our DNA are functional and what roles they play in the body. One familiar application of genome annotation is in genetic testing services, such as those offered by companies like 23andMe. These tests analyze parts of your genome to tell you about your ancestry. Genome annotation makes this possible by identifying and interpreting these regions of the human genome. Compared to the state-of-the-art "SpliceAI" tool, the Hopkins team's "Splam" method uses a much shorter DNA sequence window to predict RNA splice sites, making its model more biologically realistic and feasible for use in research. The team's Splam algorithm takes a DNA sequence of 800 nucleotides (400 around each potential donor and acceptor site) and outputs, for every base, the probability of its being a donor site, an acceptor site, or neither. "Our algorithm attempts to recognize these donor/acceptor sites in pairs, just as a spliceosome 'molecular machine' does in the cell when it cuts out an intron," says Chao. The researchers developed their algorithm to recognize splice junctions within a window of 800 nucleotides, a far smaller region than the 10,000 nucleotides required by SpliceAI. The team reports that despite requiring less genomic data, Splam achieves better splice junction recognition accuracy than SpliceAI. After training their deep learning model on human DNA, the researchers ran additional tests on other species' genetic codes. "A frequent concern about deep learning methods is whether they simply memorize their training data or if their predictive models will work on data that diverges from what they have seen in training," Chao says. "So to evaluate whether Splam had learned more general splicing rules, we collected data from three successively more distant species and applied the algorithm to each of them without re-training." 
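The 800-nucleotide input described above can be pictured as two 400-nucleotide windows, one around the donor site and one around the acceptor site, joined end to end and turned into numbers a neural network can read. The sketch below is only an illustration under stated assumptions and is not taken from the Splam code: the function names (`site_window`, `junction_window`, `one_hot`), the choice to center each window on its site (200 nucleotides on each side), the `N` padding at sequence ends, and the A/C/G/T row order are all hypothetical.

```python
FLANK = 200     # assumption: nucleotides kept on each side of a splice site
BASES = "ACGT"  # assumption: row order of the one-hot encoding


def site_window(seq: str, pos: int, flank: int = FLANK) -> str:
    """Return 2*flank nucleotides centered on index pos, padded with 'N'."""
    start, end = pos - flank, pos + flank
    left_pad = "N" * max(0, -start)            # site too close to the start
    right_pad = "N" * max(0, end - len(seq))   # site too close to the end
    return left_pad + seq[max(0, start):min(len(seq), end)] + right_pad


def junction_window(seq: str, donor: int, acceptor: int) -> str:
    """Join the donor and acceptor windows into one 800-nucleotide string."""
    return site_window(seq, donor) + site_window(seq, acceptor)


def one_hot(window: str) -> list[list[int]]:
    """Encode a window as 4 rows (A, C, G, T); 'N' gives an all-zero column."""
    return [[1 if base == b else 0 for base in window] for b in BASES]


if __name__ == "__main__":
    toy = "ACGT" * 300  # 1,200-nucleotide toy sequence, not real genomic data
    window = junction_window(toy, donor=250, acceptor=900)
    encoded = one_hot(window)
    print(len(window), len(encoded), len(encoded[0]))
```

Pairing the two windows in a single input mirrors the article's point that donor and acceptor sites are recognized together; a model that reads the pair can score a whole junction rather than each site in isolation.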
The team chose the genomes of a chimpanzee, a mouse, and a flowering plant in the mustard family. Their subsequent experiments demonstrated that Splam's biologically inspired design still produced highly accurate results on these more distant DNA sequences, showing that their method had indeed learned essential splicing patterns shared across many animals and plants. The team's next steps include applying its model to more species and integrating its method into existing RNA sequencing pipelines for practical use in transcriptome assembly. "Our method has immediate applications in improving transcriptome assembly and reducing splicing noise, making it valuable for a wide range of genomic studies," says Chao. "We hope that Splam will contribute to the better understanding of our genomes and the genes within them."

What will the music of tomorrow sound like? According to the latest report from Splice the creator economy and cross-entertainment and creative shifts that are set to define the musical landscape in 2025. The "Sounds of 2025" report reveals that music will be characterized by genre fusions and global influence. Among the hundreds of genres across Splice, one is raising its volume louder than the rest: "pluggnb." Fusing the trap sub-genre "plugg" with '90s R&B and gospel harmonies, "pluggnb" is the fastest-growing genre on the platform, with downloads spiking 342.8% in 2024. Unofficial "pluggnb" remixes dominated TikTok in 2024 and led to adoption of the genre by K-pop heavyweights like LE SSERAFIM and ILLIT. "Splice is uniquely positioned to see the sounds that are driving music production globally. This report gives us a sense of how music producers are bending and blending genres in real time and just what genres and regions will define what we hear in 2025 and beyond." 
Kakul Srivastava

Creators in LA drive the most Splice downloads by far, with New York creators driving roughly half as many. And while American cities make up half of the ranking for Splice's top ten cities, the Splice user base is increasingly global. Australia has both Sydney and Melbourne in the Top 10.

"The music industry is always trying to get ahead of trends, but there is perhaps no more forward-looking cultural trend than sample usage," says Mark Mulligan. "The sounds that producers use to create today give us a window not only into the genres that will be big tomorrow but also early indications of entirely new genres that will emerge from the process of mixing samples across styles and cultures. The genres that stand out in this report also underline wider trends: the growing importance of scenes and fan remixing in shaping the sounds of the future."

● Pluggnb: The Fastest Growing Genre of 2024
Pluggnb, a fusion of trap's plugg subgenre with '90s R&B and gospel harmonies, rose an incredible 342.8% in downloads, while Seoul has emerged as a key hub for this genre's growth. TikTok and digital culture are driving its global rise, making it a key contender for the mainstream in 2025.

● Jersey Club: From Underground to Global Phenomenon
Jersey club, the high-energy hip-hop/house hybrid, saw exponential growth in 2024, particularly in Berlin, where it became the city's fastest-growing genre on Splice. 
Its influence is growing worldwide, with artists like UNIIQU3 and Cookiee Kawaii bringing it to new audiences. Expect Jersey Club to continue its global domination as 2025 progresses.

"Rage," a harder and YoungBoy Never Broke Again are introducing this sound to the mainstream. This genre is likely to continue its rapid rise, blending intense energy with a more experimental

● A New Era of Dance Music: Melodic House & Techno
cinematic approach gaining traction globally

● Afro House and the Global South's Influence
The global rise of "Afro House" particularly from South African creators like Black Coffee. As international audiences increasingly embrace the genre's fusion of African rhythms and house beats, it's expected to play a significant role in the sounds of 2025.

it highlights the emergence of 'Brazilian phonk,' which was made popular by European producers who fused "phonk" with the sounds of Latin America, and its reclamation by Brazilian producers.

While Los Angeles and New York remain the top cities for Splice downloads, the rise of international hubs is undeniable. Tokyo and Berlin are two of the fastest-growing music production cities globally, with new sounds and genres emerging from these places.

● Los Angeles: Dominating trends like K-pop and pluggnb, LA continues to be a hotbed for genre experimentation and cross-cultural influence.
● São Paulo: Home to the rapidly growing drift phonk sound, this Brazilian city is becoming a critical center for phonk's evolution.
● Johannesburg: As Afro House continues to soar, South Africa is establishing itself as a global powerhouse for innovative
● Berlin: Jersey club's unexpected rise in Berlin exemplifies the city's place in the global music scene.

NOVA1, a neuronal RNA-binding protein expressed in the central nervous system, is essential for survival in mice and normal development in humans. A single amino acid change 
(I197V) in NOVA1’s second RNA binding domain is unique to modern humans we generated mice carrying the human-specific I197V variant (Nova1hu/hu) and analyzed the molecular and behavioral consequences While the I197V substitution had minimal impact on NOVA1’s RNA binding capacity it led to specific effects on alternative splicing and CLIP revealed multiple binding peaks in mouse brain transcripts involved in vocalization These molecular findings were associated with behavioral differences in vocalization patterns in Nova1hu/hu mice as pups and adults Our findings suggest that this human-specific NOVA1 substitution may have been part of an ancient evolutionary selective sweep in a common ancestral population of Homo sapiens possibly contributing to the development of spoken language through differential RNA regulation during brain development the genetic basis underlying these specialized human traits remains to be fully identified These findings underscore the importance of incorporating diverse human samples to identify and validate the genetic background of modern human traits through genomic comparisons technical concerns continue to make definitive conclusions about the nature of the NOVA1 I197V variant in brain challenging we generated humanized mice harboring this variant to study its consequences for RNA regulation and behavior in vivo we used gene-editing to substitute the NOVA1 isoleucine (I) isoform present in most mammals and archaic hominids (Neanderthals and Denisovans) with the human-specific valine (V) variant at position 197 in mice Comparison of these humanized NOVA1 mice (Nova1hu/hu) with wild-type mice carrying the ancestral Nova1 gene (Nova1wt/wt) revealed specific transcriptomic and behavior differences related to vocalization and evidence that the human-specific amino acid 197 variant confers vocalization changes in humanized mice suggest a role for NOVA1 in the evolution of human-specific language d Comparison of normalized Tajima’s D values 
The first gene set includes NOVA1 and NOVA2 and NOVA1-neighboring genes (FOXG1 and STXBP6) on chromosome 14 The second gene set includes all genes on chromosome 14 e Model of the evolutionary timing for the 197th amino acid change in the NOVA1 gene noting the Nova1hu/hu mice generated in this study Nova1hu/hu mice express the modern human-specific amino acid in the NOVA1 protein The bottom panel shows the corresponding position within the KH2 domain of the NOVA1 protein Amino acids structurally proximal (<5 Å) to the 197th amino acid Using human genetic data from the 1000 Genomes Project we calculated the DH value for the NOVA1 locus which was nominally significant at p = 0.046 given the multiple hypothesis testing involved in our exploration of several selection tests the observation that the NOVA1 197 V allele became nearly fixed and is shared across human population groups suggests it arose and increased to high frequency before their divergence Our analyses support the idea that the NOVA1 197 V variant was part of an ancient selective sweep in modern humans predating many other known sweeps in the human genome and showed down-regulation in the P21 midbrain of Nova1hu/hu mice (average TPM 18.6 in Nova1wt/wt these in vivo and in vitro studies reveal not only the resilience of the I197V variant in maintaining the biophysical properties of RNA binding with minimal global disruption but also its remarkable conservation of overall function this variant exerts specific effects on alternative splicing (AS) prompting us to investigate the I197V variant’s impact on RNA regulation We first determined the brain regions for AS analysis based on the expression patterns of NOVA1 Expressed genes in the P21 midbrain were used as background for this analysis f Percentage of genes with differential AS changes in each behavior-related gene ontology category The number of transcripts with differential AS events in Nova1hu/hu is shown relative to the total number of genes in each 
category it is plausible that these vocalization related transcripts are similarly affected in a context-dependent manner such as in response to sensory cues from surrounding environment these findings indicate that Nova1hu/hu mice with I197V substitution exhibit subtle but specific impact on RNAs in the brain particularly in genes involved in animal behavior and vocalization These data strengthen the potential relationship between Nova1hu/hu and vocalization suggesting that vocalization studies in these mice would be valuable a Isolation-induced ultrasonic vocalization (USV) test in pups b USV parameters and syllable classification c Fqmax distribution and two-Gaussian fit for pup USVs Ashman’s D score (a measure of separation of two distributions where a score above 2 indicates good separation) is shown Each Gaussian center and weight are labeled with the intercept of the two Gaussian distributions (black triangle) used as the cutoff between high and low Fqmax USVs d Ratio of high or low Fqmax in syllables “d” and “m” The ratio of syllables belonging to each distribution (high or low) is calculated for the total number of each syllable type e Courtship-induced USV test for adult mice f Duration distribution and two-Gaussian fit for syllable “s” in adult USVs The intercept of the two Gaussian distributions (black triangle) was used as the cutoff between long and short duration Short and long “s” syllable examples are shown at the top of the plot g Peak frequency parameters in long duration “s” h Fqmax distribution and two-Gaussian fit in adult USVs The black star marks the 100 kHz cutoff between high and low Fqmax Examples of low and high Fqmax syllables are shown at the top of the plot i Frequency variance (Fq variance) in high Fqmax in adult USVs h) at the bottom of the density plots show the mean (black dots) and standard deviation (whiskers) by peak for each genotype No significant differences were observed between genotypes each circle represents data from a 
single pup: Nova1hu/hu N = 41 three experiments were conducted over consecutive weeks and the average value for each mouse is plotted (white circles): Nova1hu/hu N = 13 Statistical analysis was performed by Wilcoxon rank sum tests (two-sided We also tested the bimodality and syllable ratios with Fqmin and confirmed the same trend (increased ratio in high Fq in Nova1hu/hu pups) Heterozygous Nova1wt/hu pups showed intermediate values between Nova1hu/hu and Nova1wt/wt pups for these parameters suggesting that the effect of the I197V substitution in NOVA1 protein on pup USVs is dosage-dependent These observations demonstrate distinct changes in the vocalizations of Nova1hu/hu pups changes in vocal quality in Nova1hu/hu pups had no impact on the behavior of the mother mice in this assay These changes were not observed in pup isolation-induced USVs indicating that this effect is developmentally specific and/or context-dependent with simple syllables like “s” having lower values and more complex syllables like “m” with multiple jumps having higher values This suggests that Nova1hu/hu mice produce more complex high-frequency USVs than Nova1wt/wt mice These findings demonstrate that vocal behavior is altered in both pups and adults in Nova1hu/hu mice we investigated the biological effect of a single amino acid substitution By analyzing Nova1hu/hu mice carrying this allele we identified molecular changes in alternative splicing in the brain including brain regions associated with vocal behavior and identified changes in vocalization patterns in pups and adult mice These findings suggest that during human evolution the I197V substitution in NOVA1 protein may have contributed to the development of neural systems involved in more complex vocal communication This underscores the unique nature of the I197V variant which occurred within a region of the genome resistant to change These results confirm that NOVA1 has undergone strong positive selection and that the I197V variant is 
part of an evolutionary selective sweep in the emergence of Homo sapiens. may leave subtler genetic signatures that require novel detection methods. This suggests that the ancient NOVA1 selective sweep may represent part of a broader set of undiscovered ancient sweeps.

It is plausible that the I197V substitution affects cortical regulation of vocalization. The Y-maze test results indicated that Nova1hu/hu mice had spatial working memory comparable to that of control mice. Future studies will be necessary to investigate the effects of the I197V substitution on USVs in female mice, as well as adult female preferences for USVs of adult Nova1hu/hu male mice. These observations may indicate a common or related molecular alteration in the neural circuits involved in USV production between humanized Nova1 mice and humanized Foxp2 mice. Future studies should aim to identify the molecular and neural basis of these alterations, as well as the physiological significance of these vocalization changes in the context of social behavior.

Our molecular analysis showed that the sequence-specific RNA binding of NOVA1 was unaffected by the human substitution and that steady-state gene expression levels in the brains of Nova1hu/hu mice were nearly identical to those of wild-type mice. We detected alternative splicing changes in several transcripts associated with vocalization. The expression pattern of NOVA1 in the brain and the enrichment of its target transcripts in specific biological pathways support a link between NOVA1 function and vocal behavior. Uncovering the precise molecular mechanisms underlying the phenotypes in Nova1hu/hu mice will require further study of the neural circuits for vocalization, as well as of regulatory factors influencing NOVA protein function.

This study sets the groundwork for understanding molecular mechanisms driving the evolution of human vocal communication. We analyzed a single amino acid unique to modern humans in the RNA binding protein NOVA1 and examined its
biological effects in vivo by introducing this amino acid in mice. NOVA1 is highly intolerant to changes in amino acid sequences during evolution, with the exception of this single amino acid change in humans. We propose that this change was part of an evolutionary sweep associated with specific changes in the neuronal transcriptome and vocal communication.

All procedures were performed according to the guidelines of the Institutional Animal Care and Use Committee (IACUC) under the IACUC protocol # 23014 at the Rockefeller University. C57BL/6J (stock no. 000664) mice were obtained from the Jackson Laboratory. Nova1hu/hu mice generated in this study were backcrossed to the C57BL/6J strain at least 8 times. The mice were housed in individually ventilated cages (five per cage) under conditions of a 12 h light/dark cycle and ambient temperature of 21 ± 4 °C with 40–70% humidity. Male or female mice aged 7 days (for the isolation-induced pup USV test) and 8–20 weeks (for the playback behavioral experiment and the courtship-induced adult USV test) were used for animal experiments. Littermates of the same sex were randomly assigned to experimental groups.

Nova1hu/hu mice were generated by directly injecting the sgRNA/Cas9 RNP with a single-stranded repair template DNA (ssDNA) into C57BL/6 zygotes to replace isoleucine with valine at amino acid 197 of mouse NOVA1. The gRNA and the ssDNA were designed as follows. gRNA (5’-TGCTACTGTGAAGGCTATAA-3’): overlapping the DNA sequence (mm10/ chr12: 46,700,902–46,700,904) of the mouse Nova1 genomic locus encoding the 197th amino acid of NOVA1. ssDNA: 140 nt DNA homologous to the NOVA1 locus with a nucleotide substitution (A to G) to cause an amino acid change from isoleucine to valine at the 197th position. Two silent mutations were also designed to create a BtsaI restriction enzyme recognition site for genotyping. Genomic DNA was extracted from the tail of the F0 animals, and the DNA corresponding to the area around the 197th amino acid was amplified by PCR and subsequently cloned into a
plasmid for determining the sequence of the modified allele. Genomic sequence analysis revealed that among 13 F0 animals, 8 animals harbored the designed allele (with three nucleotide substitutions: one causing the I197V amino acid substitution and two creating the restriction enzyme recognition site for genotyping (not causing amino acid changes)). Animals carrying the designed humanized Nova1 allele were crossed to wild-type C57BL/6 mice, and this process was continuously repeated for subsequent generations to eliminate possible off-target mutations.

Sequences around the genomic DNA encoding the 197th amino acid were amplified by PCR and subsequently digested with the BtsaI restriction enzyme. Each mouse genotype, wild type (Nova1wt/wt), heterozygous (Nova1hu/wt) or homozygous (Nova1hu/hu), was determined by band size obtained by electrophoresis. Siblings obtained by crossing heterozygous parents were used in the experiment. DNA band size after restriction enzyme treatment: wild type (613 bp), homozygous (389 bp and 224 bp), heterozygous (613 bp/ 389 bp + 224 bp) (see Supplementary Fig.
3c).

Primary antibodies used for immunohistochemistry and western blotting were as follows: rabbit anti-NOVA1 (1/1000 dilution) [EPR13847] (ab183024), rabbit anti-NOVA1 C-terminal (1/1000 dilution) [EPR13848] (ab183723), human anti-pan NOVA (1/10,000 dilution) (anti-Nova paraneoplastic human serum) and rabbit anti-ACTB (1/10,000 dilution) (ab8227).

3 or 12-week-old mice were perfused with PBS and 4% paraformaldehyde (PFA). The solution was sequentially replaced with 15% sucrose/PBS and 30% sucrose/PBS. Frozen brains were sliced into 30–50 μm thick sections in a cryostat (CM3050S). Slices were washed three times with PBS at room temperature (RT), incubated in 0.2% Triton X-100/PBS for 15 min at RT, blocked in 1.5% normal donkey serum (NDS)/PBS for 1 h at RT, incubated overnight at 4 °C with primary antibody in 1.5% NDS/PBS, then incubated with Alexa 488, 555 or 647 conjugated donkey secondary antibody. The nuclei were stained using 4’,6-diamidino-2-phenylindole (DAPI) solution (1 μg/ml). Images of specimens were collected with a BZ-X700 (KEYENCE) microscope.

midbrain and cerebellum) of P21 mouse brains were lysed in RIPA buffer (50 mM Tris-HCl; 150 mM NaCl; 0.1% SDS; 0.5% sodium deoxycholate; 1% NP-40) and subjected to immunoblotting using the antibodies described above. Quantification of western blots was done with ImageJ (v1.53). Each band signal was quantified and normalized to the ACTB signal to control for differences in loading.

The genes encoding each NOVA1 protein (NOVA1wt and NOVA1hu) were cloned into the pGEX6p1 vector and expressed in E. coli. N-terminally GST (Glutathione S-Transferase) fused NOVA1 was induced by the addition of IPTG (final conc. …) and then incubated in the presence of Triton-X (final conc. …). Cleared supernatant was collected after centrifugation (12,000 x g, 10 min, 4 °C). After incubating with Glutathione Sepharose beads (GE Healthcare Biosciences), the mixture was washed three times with PBS. The GST tag was cleaved from the NOVA1 protein by PreScission Protease
treatment (GE Healthcare; 4 °C for 4 h) to obtain purified NOVA1 protein. The concentration of each purified NOVA1 protein was determined by SDS-PAGE followed by GelCode Blue staining (Thermo Fisher Scientific).

The single-stranded RNA probe was designed as previously described41. The following single-stranded RNAs were synthesized by IDT: CCTTATCATGCTGACTCACGTCATTTCATCTCATCAAGGGAGTCAGTGGGATA. Synthesized RNA was first incubated at 80 °C for 10 min and labeled at the 5’ end by T4 polynucleotide kinase treatment (New England BioLabs). The labeled probes were purified by G-25 column (VWR 95017-621) and diluted to the appropriate concentration with water.

or dissected midbrain at E18.5 of Nova1hu/hu and Nova1wt/wt mice. The mRNA-seq library was prepared from RNA extracted with Trizol following the Illumina TruSeq protocol of polyA selection. Multiplex libraries were sequenced as 125 nt paired-end runs on the HiSeq-2500 platform at Rockefeller University Genomic Core. These raw datasets and processed data files have been deposited with Gene Expression Omnibus (GSE253297).

NOVA1-CLIP was performed in P21 dissected cortex, midbrain and cerebellum of Nova1hu/hu and Nova1wt/wt mice using three biological replicates each. triturated using a 20 G needle and crosslinked three times on ice at 400 mJ/cm2 using a Stratalinker. Crosslinked material was collected by centrifugation. 0.5% deoxycholate and 0.1% SDS with protease inhibitor) and subjected to DNase (RQ1 DNase: Promega) and RNase (RNase A: Affymetrix) treatment at a final dilution of 1:20,000 for 5 min. The lysate was clarified by centrifugation at 20,000 × g for 20 min. The supernatant was used for immunoprecipitation with 200 μL of Protein A Dynabeads (Invitrogen) loaded with 18 μg anti-Nova1 antibody (Abcam) for 2 h at 4 °C. The samples were washed as follows: twice with wash buffer, twice with Nelson stringent wash buffer (15 mM Tris pH 7.4, …), twice with Nelson high salt buffer (15 mM Tris pH 7.4, …), twice with Nelson low salt buffer (15 mM Tris pH 7.4, …), and twice
with PNK wash buffer (50 mM Tris pH 7.4, …). RNA fragments were dephosphorylated using FastAP Alkaline phosphatase (Thermo Fisher Scientific) and subjected to 3′ ligation overnight at 16 °C with a pre-adenylated linker (preA-L32) using truncated KQ T4 RNA Ligase2 (NEB). The RNA-protein complexes were labeled with 32P-γ-ATP using T4 PNK (NEB) and subjected to SDS-PAGE and transfer to nitrocellulose membrane. Appropriate regions of the membrane were cut out and RNA was extracted according to the following conditions: 100 mM Tris pH 7.5, …. RNA was purified by phenol-chloroform extraction.

Cloning was performed using the BrdU-CLIP protocol. The reverse transcription reaction was performed using Superscript III (Thermo Fisher Scientific) and the cDNA was BrdU-labeled by including BrdU in the reaction solution. Immunoprecipitation was performed with 5 μg anti-BrdU antibody (Abcam) and 25 μg protein G Dynabeads per reaction (45 min at room temperature), followed by washing with the following solutions (including Denhardt’s solution): once with IP buffer (0.3x SSPE, …). BrdU-immunoprecipitation was performed again under the same conditions. cDNA was circularized on beads using CircLigase II (Epicentre) and PCR was performed using Accuprime Pfx supermix (Thermo Fisher Scientific) and SYBR Green until RFU 250–500. PCR products were purified using Agencourt AMPure XP (Beckman Coulter) and concentrations were measured by TapeStation. High-throughput sequencing was performed at the Rockefeller University Genome Resource Center. These raw datasets and processed data files have been deposited with Gene Expression Omnibus (GSE253296).

The data set for Nova1 knockout mouse (E18.5 midbrain) was kindly provided by Dr. … The data are available from GEO submission GSE69711.

Data visualizations were done using R (v4.2.0). The correlation matrix was visualized using the corrplot package. PCA was performed using the FactoMineR and factoextra packages and visualized using the ggplot2 package. Sequencing tracks were
visualized using Integrative Genomics Viewer (IGV). De novo motif analysis and motif density analysis were done using the findMotifsGenome.pl and annotatePeaks.pl commands in HOMER (v4.11).

7-day-old pups were isolated from their mother and littermates. Each pup was placed quietly on a small open-faced plastic plate in the sound attenuating chamber (15” × 24” × 12” Igloo® beach cooler with a tube for pumped air circulation input). An ultrasonic microphone was suspended a small distance from the pup. The recording box was cleaned with 70% alcohol and distilled water and allowed to fully dry before the next experiment. Vocalizations were recorded with UltraSoundGate CM16/CMPA ultrasonic microphones connected to an Ultrasound Gate USGH amplifier. Recordings were saved using the Avisoft Recorder USG software (sampling frequency: 250 kHz; FFT length: 1024 points; 16 bits). All acoustic hardware was obtained from Avisoft Bioacoustics® (Berlin).

Mothers rearing 7-day-old offspring were used in the playback experiment. We used a three-chamber box (12” × 23.5” × 15.5”) connected by a passageway through which a mouse could pass for the test. Each chamber at both ends was equipped with a speaker (Vifa ultrasonic speaker, …). A camera (…, FLIR) was placed on the ceiling of the chamber to record the behavior of the mouse. The speakers were connected to an UltraSoundGate Player 216H (Avisoft Bioacoustics) using Avisoft Recorder USGH and had a frequency range (±12 dB as the maximum deviation from the average sound volume) of 25–125 kHz. We adjusted the loudness between the channels by controlling the level of the peak power before the experiment. We made sure that both songs were audible at the entrance of both rooms, so that the mother could respond to the songs, but not loud enough that the microphones could detect the song being played in the other room. Playbacks were triggered when the mouse broke an infrared sensor located in the center of the three-chamber box. One speaker on one side played one pup-USV recording and
the other speaker simultaneously played another pup-USV recording for 5 min; both recordings had been made previously during the pup isolation-induced USV test. Pup-USV recordings were prepared in Audacity® by stitching vocalizations from 4–5 pups for each genotype. These recording files contained a similar number of pup-USVs (Nova1wt/wt 1689 USVs, Nova1hu/hu 1561 USVs) and were confirmed to reflect the vocal characteristics of each genotype. A second 5 min playback session was conducted after a 1 min quiet period. The two recordings playing from the speakers were switched to eliminate a possible preference for location. The mother was allowed to explore freely in the box and the time she spent in each room was counted. The box was cleaned between experiments with 70% alcohol and distilled water and allowed to fully dry before the next experiment.

and the male mice housed in the same cage until the test day. The males were placed in a new cage and then singly habituated in the sound recording environment (as described for the pup USV test) for 15 min. The males were then exposed to adult female mice for 5 min. We used females (8–12 weeks old) in estrus (selected visually for wide vaginal opening and pink surrounding). The test was conducted three times per mouse, and a different female mouse was used as the stimulus each time to avoid familiarity effects. The order of mice tested each time was shuffled to avoid possible order effects. The mouse cage was cleaned with 70% alcohol and distilled water.

where μ and σ are the center and standard deviation of each Gaussian. The cutoff Fqmax between low and high USVs in pups was defined as the intersection of the two Gaussian fits to the distribution, to the nearest frequency (kHz). Cutoff durations between short and long USVs were defined as the intersection of the two Gaussian fits to the USV duration distribution, to the nearest millisecond. Statistical analysis was performed by pairwise Wilcoxon rank sum tests with correction
(Bonferroni) for multiple comparisons between genotypes. Correction was not applied for the parameters in call structure. This is because the individual properties are assumed to be related to each other, so correction would increase type 2 errors through overcorrection.

The tests were performed using the elevated revolving rod (Stoelting). Mice were placed on the apparatus and habituated for a few minutes. The rod accelerated at a constant rate (4 to 40 rpm in 300 s) and the time it took the animals to fall was recorded. Tests were performed three times and the average value was calculated. Statistical analysis was performed by Wilcoxon rank sum test.

The Y-maze tests were conducted according to the described procedure130. The tests were performed in a Y-maze with three arms of equal length at 120° angles to each other (Stoelting). Mice were placed in the center of the maze and had free access to all three arms. If the animal chooses an arm different from the arm it arrived in, this is considered a correct response; conversely, returning to the previous arm is considered an error. The number of times and the order in which the animals entered the arms were recorded and used to calculate the alternation rate. The behavior of the mice was recorded for 8 min.

The sequencing data of three Neanderthal genomes were obtained from The Draft Neanderthal Genome Project (https://www.ebi.ac.uk/ena/browser/view/PRJEB2065).
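For the Y-maze test described above, the alternation rate can be illustrated with a short sketch. This is not the authors' analysis script. It assumes the commonly used spontaneous-alternation definition, in which three consecutive entries into three different arms count as one correct alternation and the rate is the number of such triads divided by the total number of entries minus two. The function name `alternation_rate` and the arm labels "A", "B", "C" are hypothetical.

```python
from typing import Sequence


def alternation_rate(entries: Sequence[str]) -> float:
    """Return the percent spontaneous alternation for a sequence of arm entries.

    Assumed definition (not confirmed by the source): a triad of three
    consecutive entries is correct when all three arms differ; the rate is
    correct triads / (number of entries - 2), expressed as a percentage.
    """
    if len(entries) < 3:
        raise ValueError("need at least three arm entries to score a triad")
    n_triads = len(entries) - 2
    # Slide a window of three entries along the recorded order of arm visits.
    correct = sum(
        1 for i in range(n_triads) if len(set(entries[i:i + 3])) == 3
    )
    return 100.0 * correct / n_triads


if __name__ == "__main__":
    # Triads: ABC (correct), BCA (correct), CAC (error), ACB (correct) -> 75.0
    print(alternation_rate(list("ABCACB")))
```

Under this definition a mouse that always returns to the arm it just left scores 0%, and one that cycles through all three arms scores 100%.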
The sequencing data of the Denisovan genome, together with nine modern human genomes, were obtained from the Denisovan Genome Project (http://cdna.eva.mpg.de/denisova/). The fastq files of each sample were aligned to the human genome (hg19) by Burrows-Wheeler Aligner (BWA). The aligned SAM files were processed into BAM files by Picard (v2.18.7) and Genome Analysis Toolkit (GATK). Variant calling for each sample was performed with Mutect2 of GATK4. The variants were annotated using ANNOVAR (v2). The variants in modern human populations were obtained from the ExAC database (v0.3.1) (https://gnomad.broadinstitute.org/downloads), which contains 60,706 exomes mapped to hg19. The variants in the NOVA1 locus (chromosome 14: 26912296–27067239) of Neanderthal and modern human populations were subsetted using bcftools (v1.19). The frequency of minor alleles for each position in the NOVA1 locus was calculated.

Information on statistical methods and the number of biological replicates in the analysis is given in the figure legends and methods section of each analysis as appropriate.

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Revising the human mutation rate: implications for understanding human evolution
Deciphering African late middle Pleistocene hominin diversity and the origin of our species
Evolution of vocal learning and spoken language
Evidence of a vocalic proto-system in the Baboon (Papio papio) suggests pre-hominin speech precursors
Evolutionary loss of complexity in human vocal anatomy as an adaptation for speech
A high-coverage genome sequence from an archaic Denisovan individual
The complete genome sequence of a Neanderthal from the Altai Mountains
A high-coverage Neandertal genome from Vindija Cave in Croatia
No evidence for recent selection at FOXP2 among diverse human populations
Reintroduction of the archaic variant of NOVA1 in cortical organoids alters neurodevelopment
Comment on ‘Human TKTL1 implies greater
neurogenesis in frontal neocortex of modern humans than Neanderthals’
A forkhead-domain gene is mutated in a severe speech and language disorder
Identification of FOXP2 truncation as a novel cause of developmental speech and language deficits
A Foxp2 mutation implicated in human speech deficits alters sequencing of ultrasonic vocalizations in adult male mice
Knockout of Foxp2 disrupts vocal development in mice
A humanized version of Foxp2 affects cortico-basal ganglia circuits in mice
A humanized version of Foxp2 does not affect ultrasonic vocalization in adult mice
A humanized version of Foxp2 affects ultrasonic vocalization in adult female and male mice
The derived FOXP2 variant of modern humans was shared with Neandertals
Human TKTL1 implies greater neurogenesis in frontal neocortex of modern humans than Neanderthals
is homologous to an RNA-binding protein and is specifically expressed in the developing motor system
The human pancreatic islet transcriptome: expression of candidate genes for type 1 diabetes and the impact of pro-inflammatory cytokines
Nova1 is a master regulator of alternative splicing in pancreatic beta cells
NOVA1 prevents overactivation of the unfolded protein response and facilitates chromatin access during human white adipogenesis
Paraneoplastic syndromes involving the nervous system
The neuronal RNA-binding protein Nova-2 is implicated as the autoantigen targeted in POMA patients with dementia
Nova-1 regulates neuron-specific alternative splicing and is essential for neuronal viability
CLIP identifies Nova-regulated RNA networks in the brain
HITS-CLIP yields genome-wide insights into brain alternative RNA processing
Nova regulates GABA(A) receptor gamma2 alternative splicing via a distal downstream UCAU-rich intronic splicing enhancer
NOVA1 acts on Impact to regulate hypothalamic function and translation in inhibitory neurons
Common molecular pathways mediate long-term potentiation of synaptic excitation and slow synaptic inhibition
Integrative modeling defines the Nova splicing-regulatory network and its combinatorial controls
Response to comment on ‘Reintroduction of the archaic variant of NOVA1 in cortical organoids alters neurodevelopment’
Efficient high-precision homology-directed repair-dependent genome editing by HDRobust
The neuronal RNA binding protein Nova-1 recognizes specific RNA targets in vitro and in vivo
The neuronal splicing factor nova co-localizes with target RNAs in the dendrite
NOVA2-mediated RNA regulation is required for axonal pathfinding during development
Sequence-specific RNA binding by a Nova KH domain: implications for paraneoplastic disease and the fragile X syndrome
Crystal structures of Nova-1 and Nova-2 K-homology RNA-binding domains
Molecular population genetics of sequence length diversity in the Adh region of Drosophila pseudoobscura
Statistical tests for detecting positive selection by utilizing high-frequency variants
Mapping gene flow between ancient hominins through demography-aware inference of the ancestral recombination graph
Fast and accurate estimation of selection coefficients and allele histories from ancient and modern DNA
marks gastric atrophy and shows evidence of adaptive gene loss in humans
A point mutation in the FMR-1 gene associated with fragile X mental retardation
Essential role for KH domains in RNA binding: impaired RNA binding by a mutation in the KH domain of FMR1 that causes fragile X syndrome
a female germ cell-specific tumor suppressor gene in Caenorhabditis elegans affect a conserved domain also found in Src-associated protein Sam68
The onconeural antigen Nova-1 is a neuron-specific RNA-binding protein the activity of which is inhibited by paraneoplastic antibodies
Binding of a novel SMG-1-Upf1-eRF1-eRF3 complex (SURF) to the exon junction complex triggers Upf1 phosphorylation and nonsense-mediated mRNA decay
Human Upf1 is a highly processive RNA helicase and translocase with RNP remodelling activities
The mechanism of eukaryotic
translation initiation and principles of its regulation
Mammalian WTAP is a regulatory subunit of the RNA N6-methyladenosine methyltransferase
A METTL3–METTL14 complex mediates mammalian nuclear RNA N6-adenosine methylation
From unwinding to clamping—the DEAD box RNA helicase family
Differential NOVA2-mediated splicing in excitatory and inhibitory neurons regulates cortical development and cerebellar function
Protein-RNA and protein-protein recognition by dual KH1/2 domains of the neuronal splicing factor Nova-1
Nova autoregulation reveals dual functions in neuronal splicing
Microstimulation in different parts of the periaqueductal gray generates different types of vocalizations in the cat
Stimulation of the midbrain periaqueductal gray modulates preinspiratory neurons in the ventrolateral medulla in the rat in vivo
The midbrain periaqueductal gray control of respiration
Brain stem integration of vocalization: role of the midbrain periaqueductal gray
Anatomical study of the final common pathway for vocalization in the cat
Integrated defence reaction elicited by excitatory amino acid microinjection in the midbrain periaqueductal grey region of the unrestrained cat
Flight and immobility evoked by excitatory amino acid microinjection within distinct parts of the subtentorial midbrain periaqueductal gray of the cat
The emotional motor system and micturition control
GABAergic control of micturition within the periaqueductal grey matter of the male rat
The role of the periaqueductal grey in vocal behaviour
The midbrain periaqueductal gray as an integrative and interoceptive neural structure for breathing
The contribution of periaqueductal gray in the regulation of physiological and pathological behaviors
Genome-wide association studies establish that human intelligence is highly heritable and polygenic
highly polygenic and associated with FNBP1L
CTag-PAPERCLIP reveals alternative polyadenylation promotes cell-type specific protein diversity and shifts Araf isoforms with
microglia activation
The human language-associated gene SRPX2 regulates synapse formation and vocalization in mice
Sociability and synapse subtype-specific defects in mice lacking SRPX2
AUTS2 regulation of synapses for proper synaptic inputs and social communication
Truncating mutations in NRXN2 and NRXN1 in autism spectrum disorders and schizophrenia
Regulated intron removal integrates motivational state and experience
Mouse vocal communication system: are ultrasounds learned or innate
The neural control of vocalization in mammals: a review
Midbrain periaqueductal gray and vocal patterning in a teleost fish
The neurobiology of vocal communication in marmosets
Discrete subregions of the rat midbrain periaqueductal gray project to nucleus ambiguus and the periambigual region
Effects of midbrain lesions on lordosis and ultrasound production
Role of the periaqueductal grey in vocal expression of emotion
The effects of brainstem lesions on vocalization in the squirrel monkey
A specialized neural circuit gates social vocalizations in the mouse
Ultrasonic vocalisation emitted by infant rodents: a tool for assessment of neurobehavioural development
Neonatal behaviors associated with ultrasonic vocalizations in mice (mus musculus): a slow-motion analysis
Functional ontogeny of hypothalamic Agrp neurons in neonatal mouse behaviors
Male mice song syntax depends on social contexts and influences female preferences
Development of social vocalizations in mice
Detecting Bimodality in Astronomical Datasets
mixtools: an R package for analyzing mixture models
mothers rush: does maternal responsiveness affect the amount of ultrasonic vocalizations in mouse pups
Differences in patterns of pup care in Mus musculus domesticus Effects of previous experience and parity in XLII inbred mice
Ultrasonic vocalizations in mouse models for speech and socio-cognitive disorders: insights into the evolution of vocal communication
Chabout, J., Jones-Macopson, J. & Jarvis, E. D.
Eliciting and analyzing male mouse ultrasonic vocalization (USV) songs. J. Vis. Exp. https://doi.org/10.3791/54137 (2017)
Waidmann, E. N., Yang, V. H. Y., Doyle, W. C. & Jarvis, E. D. Mountable miniature microphones to identify and assign mouse ultrasonic vocalizations. bioRxiv 2024.02.05.579003 https://doi.org/10.1101/2024.02.05.579003 (2024)
The temporal organization of mouse ultrasonic vocalizations
Quantifying ultrasonic mouse vocalizations using acoustic analysis in a supervised statistical machine learning framework
and men: the mouse ultrasonic song system has some features similar to humans and song-learning birds
Longer metaphase and fewer chromosome segregation errors in modern human than Neanderthal brain development
Reduced purine biosynthesis in humans after their divergence from Neandertals
Ultraconserved elements in the human genome
GC-biased gene conversion drives accelerated evolution of ultraconserved elements in mammalian and avian genomes
Bayesian inference of ancient human demography from individual genome sequences
A novel reticular node in the brainstem synchronizes neonatal mouse crying with breathing
A functionally and anatomically bipartite vocal pattern generator in the rat brain stem
Large-scale mapping of vocalization-related activity in the functionally diverse nuclei in rat posterior brainstem
Social cognition and the evolution of language: constructing cognitive phylogenies
Vocal labeling of others by nonhuman primates
Social context increases ultrasonic vocalizations during restraint in adult mice
Spatial organization of receptive fields in the auditory midbrain of awake mouse
Ultrasonic emissions: Do they facilitate courtship of mice
Female mice ultrasonically interact with males during courtship displays
Ultrasonic vocalizations emitted during dyadic interactions in female mice: a possible index of sociability
OLego: fast and sensitive mapping of spliced mRNA-Seq reads using small seeds
Systematic discovery of regulated and
conserved alternative exons in the mammalian brain reveals NMD modulating chromatin regulators
edgeR: a Bioconductor package for differential expression analysis of digital gene expression data
CLIP Tool Kit (CTK): a flexible and robust pipeline to analyze CLIP sequencing data
Analysis of Mouse Vocal Communication (AMVOC): a deep analysis and classification of ultrasonic vocalisations
The Y-maze for assessment of spatial working and reference memory in mice
Statistical method for testing the neutral mutation hypothesis by DNA polymorphism
Hitchhiking under positive Darwinian selection
Wang, W. RockefellerUniversity/popgen_dbsnp: Popgen_v2. (Zenodo, 2024). https://doi.org/10.5281/ZENODO.14367749

Yuhki Saito for providing guidance on the method and analysis of the transcriptome experiments. J Lomax Boyd for assistance in the design of the playback experiments. We are deeply grateful to the Rockefeller University Resource Centers: the CRISPR and genome editing center, the Transgenic and Reproductive Technology Center and the Genomics Resource Center. David Reich for critical review and constructive comments, as well as members of the Darnell lab for discussions of the manuscript.

Japan Society for the Promotion of Science postdoctoral fellowship for research abroad (J.S.P.S.) (Y.T.). NIH Awards NINDS Outstanding Investigator Award R35NS097404 (R.B.D.). Keck Foundation Award and NIH Transformative Research Award R01DC018691 (E.D.J.). are Howard Hughes Medical Institute Investigators. This research was supported by US National Institutes of Health grant R35-GM127070 (to A.S.)
and the Simons Center for Quantitative Biology at Cold Spring Harbor Laboratory. The content is solely the responsibility of the authors and does not necessarily represent the official views of the US National Institutes of Health.

The Laboratory of Molecular Neuro-oncology; The Laboratory of Neurogenetics of Language; The Laboratory of Biochemistry and Molecular Biology

Conceptualization: R.B.D.; Methodology: Y.T.; Investigation: Y.T.; Statistics on population genetics: J.D.L. A.S.; Visualization: Y.T.; Funding acquisition: Y.T.

DOI: https://doi.org/10.1038/s41467-025-56579-2

A Correction to this article was published on 01 April 2025.

The etiology of congenital heart disease (CHD) is complex, comprising both genetic and environmental factors. the genetic etiology remains largely elusive. Trio exome sequencing identified a heterozygous FLT4 splice site variant in two families with, respectively, tetralogy of Fallot (TOF) and variable CHD comprising both the TOF spectrum and aortic coarctation. Sanger sequencing on cDNA confirmed aberrant splicing for the c.985+1G > A variant. Transcriptome sequencing uncovered altered splicing for the c.1657+6T > C variant. Our study establishes FLT4 splice site variants as a molecular cause of both left and right-sided isolated CHD. RNA-sequencing emerges as a valuable technique in unraveling the missing inheritability of CHD.

Data sharing is not applicable to this article as no datasets were generated or analyzed during the current study. All variants have been submitted to the ClinVar database and can be accessed using the following accession numbers: SCV005407801 for NM_182925.5: c.1657+6T > C and SCV005407803 for NM_182925.5: c.985+1G > A.

The original online version of this article was revised: Author Tim Van Damme’s name was incorrectly written as Tim Vandamme. A Correction to this paper has been published:
https://doi.org/10.1038/s41431-025-01831-y

The authors thank the families for their kind availability in sharing the findings within the scientific community. This project was supported by a grant “Scientific research on heart diseases (2023)” from the Philanthropic Center Pelicano to BC and by a Research Grant of the Research Foundation - Flanders (G035620N) to BC. BC is a senior clinical investigator of the Research Foundation - Flanders.

Kristof Vandekerckhove & Joseph Panzer [...] and SV; writing - original draft preparation [...]. All authors have read and agreed to the published version of the manuscript.

This study was conducted in accordance with the 1964 Declaration of Helsinki and its subsequent revisions. The legal guardians of the individuals involved in this study provided written informed consent for the disclosure of case details.

DOI: https://doi.org/10.1038/s41431-025-01788-y

The approval of splice-switching oligonucleotides with phosphorodiamidate morpholino oligomers (PMOs) for treating Duchenne muscular dystrophy (DMD) has advanced the field of oligonucleotide therapy. PMOs encounter challenges such as poor tissue uptake, thereby affecting patients’ prognosis and quality of life. We have developed a PMOs-based heteroduplex oligonucleotide (HDO) technology. This innovation involves a lipid-ligand-conjugated complementary strand hybridized with PMOs, significantly enhancing delivery to key tissues in mdx mice [...] and serum creatine kinase by restoring internal deleted dystrophin expression. PMOs-based HDOs normalized cardiac and CNS abnormalities without adverse effects. Our technology increases serum albumin binding to PMOs and improves blood retention
and cellular uptake Here we show that PMOs-based HDOs address the limitations in oligonucleotide therapy for DMD and offer a promising approach for diseases amenable to exon-skipping therapy however since PMO cannot cross the blood-brain barrier (BBB) it is not expected to have a therapeutic effect on these symptoms we have developed a new type of HDO with a PMO in place of the gapmer-type ASO and a different intracellular mechanism from that of conventional HDOs We also assessed whether PMO/HDO improves treatment efficacy using a dystrophic mdx mouse model a Structure of phosphorodiamidate morpholino oligomers (PMOs) duplexed with lipid ligand (tocopherol (Toc) or cholesterol (Cho))-conjugated complementary strand b Confirmation of annealing between PMOs and the complementary strand with lipid ligands electrophoresed on a 16% acrylamide gel d) Pharmacokinetics of PMO after intravenous injection of a single 100 mg/kg (11.88 μmol/kg) PMO dose or molar equivalent of Toc-HDO or Chol-HDO The hybridization-based ELISA shows the pharmacokinetic (c) and biodistribution (d) data in mdx mice (n = 4) injected with PMO lipid conjugated PMO/HDO with mouse albumin HDOs showed highly significant enhancements in the binding affinity for albumin for which the parent PMO showed no affinity PMO conjugated directly with cholesterol could be synthesized thereby preventing their administration to mdx mice This experience highlighted the benefit of attaching lipids to PMO with a complementary strand a Timeline of PMO or HDO administration and animal sacrifice b Detection of exon 23-skipped dystrophin mRNA in the heart and skeletal muscles of mdx mice 2 weeks after once-weekly systemic intravenous (IV) injections for a total of 1 or Chol-HDO at a dose equimolar to PMO (11.88 μmol/kg) (n = 4–9 per group) Data are presented as mean ± S.E.M c Time course of exon 23-skipped dystrophin mRNA 14 or 112 days after five injections of PMO or Chol-HDO (11.88 μmol/kg) in indicated muscles of mdx 
mouse (n = 4–9 per group) a Images of dystrophin immunostaining (red) in indicated muscles 2 weeks after the fifth PMO dose (100 mg/kg) or Chol-HDO at an equimolar dose to PMO (11.88 μmol/kg) b Quantification of dystrophin-expressing fibers (%) to the total number of fibers in the indicated tissues (n = 4 per group) and (d) percentage of centrally nucleated fibers (CNF) in quadriceps 2 weeks after five weekly injections of phosphate-buffered saline (PBS) e Representative images of caveolin 3 immunostaining (red) in quadriceps femoris (QF) counterstained with DAPI (blue) to evaluate CNF and CSA f Western blot showing robust dystrophin expression in the heart and QF from HDO-treated mice Data are presented as mean ± S.E.M and were analyzed using one-way analysis of variance followed by Tukey’s tests (b–d) Production of dystrophin was confirmed using western blot analysis (Fig. 3f) We found that PMO/HDOs restored markedly higher levels of dystrophin than PMOs in the heart and QF compared with those in wild-type B10 control mice a Serum Creatine Kinase (CK) levels in mice injected once weekly for a total of 5 doses with PBS or HDOs (11.88 μmol/kg) Serum CK levels are reduced after treatment with Toc- or Chol-HDO correlating with the levels of dystrophin restoration (n = 5-9 per group) b Forelimb grip test (n = 9–15 per group) and (c) treadmill test (n = 4–7 per group) performances evaluated in mdx mice injected once weekly for a total of 5 doses with PBS or HDOs (11.88 μmol/kg) ECG abnormalities observed in mdx are prevented in the treated mdx mice (d) QTc and (e) QRS duration (n = 6–11 per group) f Quantification of the heart fibrosis-stained regions in the left ventricle (n = 4 per group) Transverse sections revealing the level of the papillary muscles 8 weeks (2 months) after once weekly for a total of 5 doses of PBS or HDOs (11.88 μmol/kg) g Representative images of the heart after Masson’s trichrome staining in the left ventricle Data are presented as mean ± S.E.M 
and were analyzed using one-way analysis of variance followed by Tukey’s Kramer tests (a–f) Chol-HDO treatment normalized both functions of mdx mice to the level of those in B10 mice a Detection of exon 23-skipped dystrophin mRNA in the whole brain of mdx mice 2 weeks after once weekly for a total of 5 doses of PMO or Chol-HDO (11.88 μmol/kg) (n = 4 per group) b Duration of tonic immobility (freezing) expressed as a percentage of freezing time (n = 4–6 per group) c Total horizontal movement distance traveled (distance run in 10 min) (n = 4–6 per group) d Representative trajectory diagram of B10 and mdx mice treated with PBS a–c And were analyzed using one-way analysis of variance followed by Tukey’s Kramer tests (b a Detection of exon 23-skipped dystrophin mRNA in the indicated muscle of mdx mice after five weekly SC injections of Chol-HDO (11.88 μmol/kg) (n = 4–7 per group) b Serum CK levels in mice with five weekly SC injections of Chol-HDO (11.88 μmol/kg) (n = 4–9 per group) c Treadmill test (n = 4–7 per group) and (d) forelimb grip test (n = 4–13 per group) were also evaluated in mdx mice subcutaneously injected five times with Chol-HDO (11.88 μmol/kg) Data are presented as mean ± S.E.M (a–d) and were analyzed using one-way analysis of variance followed by Tukey’s Kramer tests (b–d) a Structure of the cholesterol-conjugated complementary strand DNA gap and 2’-OMe gap b Detection of exon 23-skipped dystrophin mRNA in the indicated tissues of mdx mice 2 weeks after weekly injections for a total 5 doses of Chol-HDO (11.88 μmol/kg) with DNA gap or 2’-OMe gap (n = 4–6 per group) c Serum CK levels (n = 4–9 per group) and (d) forelimb grip test (n = 4–13 per group) results in mice treated with Chol-HDO (11.88 μmol/kg) with DNA gap or 2’-OMe gap b–d and were analyzed using one-way analysis of variance followed by Tukey’s Kramer tests (c small foci of inflammatory cell infiltration were observed in the liver parenchyma of mdx mice treated with any of the interventions 
No other significant lesions were observed in mdx mice treated with Chol-HDO there was occasional increased size heterogeneity of hepatocyte nuclei no significant lesions were noted in mdx mice following treatment with PBS or Chol-HDO mdx mice treated with Toc-HDO occasionally showed a slight increase in cellular density in glomeruli the normalization of cardiac dysfunction with robust expression of dystrophin in the heart of mdx mice indicate a potential for improved prognosis in patients with DMD The present results show that Chol-HDO improved freezing and the movement distance traveled in response to restraint in the mdx mice This suggests that the expression of dystrophin contributes to this normalization not only in the skeletal muscle but also in the CNS Improving the freezing behavior could be associated with improving CNS/psychological symptoms in patients with DMD the DLS results showed that no particle formation was observed in PMO/HDO Cholesterol and tocopherol-conjugation increased PMO delivery to the liver and kidneys such as altered serum hepatic or renal function indices or adverse clinical outcomes were observed during the course of our experiments (up to 4 months after the last injection) even following administration of multiple high doses (100 mg/kg; 11.88 μmol/kg) SC administration of PMO/HDO induced a slightly weaker skipping effect than IV administration; however improved functioning was observed with the former long-term SC administration is expected to have higher efficacy and provides the option of self-administration these results indicate that HDO technology may represent a new avenue for novel exon-skipping drugs for DMD and other multisystemic disorders The basic concept of HDO originally assumed that the complementary strand is cleaved by RNase H in the cell. However, as shown in Supplementary Fig. 
1A the complementary strand of this new type of HDO might be cleaved by other RNases DNase would cleave the complementary strand in case of use natural DNA instead of natural RNA in center portion of the complementary strand PMO/HDOs consisting of a complementary strand fully composed of 2’OMe modifications showed no in vivo skipping activity likely because the complementary strand was not cleaved owing to the high resistance of 2’-OMe RNA to nucleases the increased skipping efficiency achieved by single dosing may not correspond with the extreme increase in PMO concentration (100–150-fold) within the muscles of mice treated with PMO/HDO We initially postulated that the complementary strand separation was poor most of the complementary strand was likely already separated from the PMO/HDO in the muscle tissue since ISH and HELISA use the complementary strand of the PMO sequence as a probe and binds only to single-stranded PMO the endosomal escape of PMO from PMO-HDO may be inefficient increased delivery into necrotic fibers might be unproductive PMO was distributed in normal-sized muscle fibers but was highly abundant especially in necrotic fibers and small-diameter fibers that appeared to be regenerating fibers necrotic and regenerating fibers were relatively absent suggesting that long-term administration may decrease the concentration of PMO in the QF as it was taken up by normal-sized muscle fibers ligands must be developed that will be preferentially taken up by normal-sized muscle fibers we have developed a new type of lipid-conjugated HDO using parent PMOs resulting in a functionally normal motor phenotype in a mouse model of DMD including the normalization of abnormalities in cardiovascular and behavioral symptoms Although further optimization of intracellular complementary strand cleavage is necessary these PMO/HDO properties make it particularly attractive as a treatment for patients with DMD and other genetic diseases affecting the heart who are eligible 
for exon-skipping therapy All complementary strands for the experiment were synthesized by GeneDesign (Osaka PMOs target the donor splice site of exon 23 (+7–18) of the mouse dystrophin pre-mRNA All animals were maintained on a 12 h light/12 h dark cycle in a pathogen-free animal facility (temperature: 18–24 °C; humidity: 40–70%) with free access to food (CLEA Rodent Diet CE-2 6–8 week-old males) were injected intravenously in the retro-orbital sinus or subcutaneously once per week with AONs They were randomly assigned to experimental or control groups All studies were conducted in accordance with the ethical guidelines of Tokyo Medical and Dental University and in strict compliance with the Fundamental Guidelines for Proper Conduct of Animal Experiment and Related Activities in Academic Research Institutions as set forth by the Ministry of Education Approval for the experiments was granted by TMDU (Approval number A2022-085A) Experiments are in accordance with the ARRIVE guidelines All possible efforts were made to minimize the number of animals used and to alleviate their discomfort Each antisense oligonucleotide (AON) against exon 23 of the dystrophin gene was dissolved in PBS (stock concentrations: 2 mM) and 11.88 μmol/kg of each AON was injected into the retro-orbital sinus once weekly for a total of 1 mice were sacrificed under anesthesia with 4% isoflurane (Wako Total RNA was extracted from cells or muscle tissues using ISOGEN 2 (NIPPON GENE and 300 ng or 500 ng of total RNA was processed using the QIAGEN OneStep RT-PCR Kit (QIAGEN according to the manufacturer’s instructions The primer sequences were mEx22F 5’-ATCCAGCAGTCAGAAAGCAAA-3’ and mEx24R 5’-CAGCCATCCATTTCTGTAAGG-3’ for amplification from exons 22 to 24 The PCR conditions were 50 °C for 30 min and 95 °C for 15 min The PCR bands were analyzed using Bioanalyzer 2100 (Agilent and the resulting PCR bands were extracted using a QIAquick Gel extraction Kit (QIAGEN Nederland) for direct sequencing using an 
ABI 3100 (Thermo Fisher Scientific Skipping efficiency was calculated using the following formula: [(molality of skipped translation products) × 100% / (molality of skipped translation products + molality of unskipped translation products)] Ten-micrometer cryosections were cut from flash-frozen muscle using the Leica CM3050 S Germany) placed on MAS-coated glass slides (Matsunami Glass Industrial and blocked for 1 h with 5% goat serum (S-1000 Vector Laboratories USA) in PBS or mouse-on-mouse blocking buffer containing mouse IgG blocking reagent (#MKB-2213 Vector Laboratories) at room temperature (~25 °C) The tissues were then incubated with the following primary antibodies overnight at 4 °C: rabbit anti-dystrophin against C-terminus (ab15277 tissue sections were treated with secondary antibodies (Alexa Fluor 546 goat anti-mouse #A-11030 and Alexa Fluor 568 goat anti-rabbit #A-11011 Thermo Fisher Scientific) for 1 h (1:1000) Coverslips were mounted using VECTASHIELD Antifade Mounting Medium with 4’,6-diamidino-2-phenylindole (DAPI) (VECTOR H-1200) Centrally nucleated fibers and the myofiber cross-sectional area of QF were measured using HALO® Image Analysis (Indica labs In situ hybridization of the morpholino oligomer was performed using the miRNAscope® HD (RED) Assay Kit (Advanced Cell Diagnostics [ACD] Fresh-frozen QF muscles and heart tissues were sectioned (10 μm) using the Leica CM3050 S Germany) and placed on SuperFrost Plus slides (Thermo Fisher Scientific) Slides were fixed in 4% paraformaldehyde for 1 h at 4 °C and washed in 100% ethanol twice for 5 min each Sections were then incubated in hydrogen peroxide for 10 min at room temperature and washed in distilled water twice for 1 min each Protease IV treatment was applied to the tissues which were incubated in a chamber at room temperature for 30 min Slides were incubated with PMO sequence probes (SR-ASO-PMO-S1 Further amplification of the target probe signal was performed according to the manufacturer’s 
instructions (miRNAscope HD detection protocol Amp 1-6) Fast red was prepared by combining Red-A and Red-B (1:60) and incubated for 10 min at room temperature and imaged on SLIDEVIEW VS200 (Evident Co. the hybridization of probe to PMO was performed according to the miRCURY® LNA® miRNA ISH Optimazation Kits (FFPE) protocol (Qiagen The LNA-modified probe with overhang for signal amplification was designed and synthesized at Qiagen The sequence is 5-CTCTATATCTCCAACCCGAATTTCAGGTAAGCCGAGGTTT-3’ Slides were washed in 5X SSCT for 10 min and then incubated in amplification buffer (5X SSCT 10% low molecular weight dextran sulfate) for 30 min at room temperature Hybridization chain reaction was performed in amplification buffer containing 6 μmol/L hairpin amplifiers Slides were mounted in Prolong Diamond with DAPI (P36966 Sections were imaged on an STELLARIS 8 confocal microscope (Leica Microsystems Imaging analysis was conducted with Imaris (ver Proteins were extracted from sliced frozen muscle using SDS buffer (0.125 M Tris/HCl with pH 6.4 and 0.005% BPB) supplemented with 1X Protease Inhibitor (Complete Mini The normal control lysate from a B10 mouse was prepared as a reference for dystrophin expression Subject and normal control lysates were denatured at 100 °C for 3 min and electrophoresed in a Tris-acetate 3–8% gradient polyacrylamide gel (Thermo Fisher Scientific) at 150 V for 40 min The proteins were transferred to a PVDF membrane (Bio-Rad After incubation with 5% nonfat milk (NACALAI TESQUE the membrane was incubated at 4 °C overnight with an anti-dystrophin antibody (ab15277 The membrane was washed three times for 10 min each in TBST and incubated with a horseradish peroxidase-conjugated anti-rabbit (#111-035-003 1:10,000) antibodies (Jackson ImmunoResearch followed by six washes with TBST and allowed to develop with West Dura Extended Duration Substrate (Thermo Fisher Scientific) The immunoreactive bands were detected using the ChemiDoc XRS Image System (Bio-Rad 
Laboratories). PMOs in the blood were quantified using sera from blood samples of treated mdx mice or age-matched samples homogenized in RIPA buffer (Thermo Fisher Scientific) and incubated with proteinase K (NACALAI TESQUE). Lysates were spun at maximum speed for 15 min to collect the supernatant. Probes with complementary sequences to the PMOs used were synthesized and conjugated at the 5′ and 3′ ends with digoxigenin and biotin. The first and last seven nucleotides of the probes were fully phosphorothioated. PMO amounts were calculated in reference to a standard curve constructed from fluorescence values given by the respective PMO standards. Muscle strength was measured using the forelimb grip test with a grip strength meter (MK-380CM/FM; Muromachi Kikai). The average of three measurements per animal per time point was recorded for comparative analysis. Running sessions were performed on a four-lane motorized treadmill equipped with electric shock (Treadmill for Rats and Mice Model MK-680 S; Muromachi Kikai Co.
Ltd) at least 1 week after the last injection. The treadmill was set at an inclination of 0°. All mice were acclimated to the treadmill belt for 5 min before starting to walk and then forced to run at 5 m/min for 5 min. The speed was increased by 1 m/min each minute. The test was stopped when the mouse was exhausted or spent 5 continuous seconds on the shock grid. This was quantified as the time the mouse moved <0.5 cm (2 cm) per second. Unconditioned fear responses induced by this acute stress were characterized by periods of tonic immobility (freezing) during the 10 min recording period. Body-surface electrocardiography (ECG) was performed in a blinded manner, as described previously [63]. ECG in lead II configuration was recorded using the PowerLab system (PowerLab 4/26, ADInstruments) under anesthesia with 1% isoflurane. ECG parameters were obtained by averaging those from three different ECGs. The QT interval was defined as the interval between the onset of the QRS complex and the end of the negative component of the T wave. QTc was calculated using the following formula: QTc = QT interval (ms) / √(RR interval (s) × 10). Blood chemistry was assessed in the SRL Laboratory (Tokyo) and the blood cell count was measured at LSI Medicine (Tokyo). The size and size distribution of nanoparticles were determined via DLS using a Zetasizer Pro instrument (Malvern Instrument Ltd.).
The sample solutions were loaded into a low-volume cuvette (ZEN2112), and the measurements were carried out with a detection angle of 173° and a temperature of 25 °C. PMO and PMO/HDO were labeled at the 5′ terminus of the PMO with Alexa Fluor 647. Binding measurements were conducted in 1X DPBS (Gibco) in flat-bottom non-binding 96-well plates (Corning). Alexa 647-labeled PMO or PMO/HDO were added at a final concentration of 2 nM to solutions of albumin ranging from sub-nM to mM concentrations. Solutions were equilibrated for at least 30 min before measuring fluorescence polarization (λex = 635 nm, λem = 675 nm) on a Tecan Infinite M1000 Pro (Baldwin Park).

GraphPad Prism 9 software (version 9.5.0) and Microsoft Excel for Microsoft 365 MSO (version 2211) were used to analyze the data. All numerical values were presented as mean ± standard error of the mean (SEM). Differences among more than three groups were analyzed using one-way analysis of variance followed by Tukey-Kramer tests. Statistical differences between two groups were analyzed using the Student’s one-tailed t-test. Significance levels were set at *P < 0.05.

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article. All data supporting the findings of this study are available within the paper and supplementary information files.
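As an illustration only, the two formulas stated in the methods (exon-skipping efficiency and the QTc correction) can be written as a short Python sketch. The function names and the example inputs below are hypothetical and are not taken from the study. The arithmetic simply follows the formulas as they are written in the text.

```python
import math


def skipping_efficiency(skipped: float, unskipped: float) -> float:
    """Percent exon skipping: skipped * 100 / (skipped + unskipped).

    Both arguments are the quantified amounts of the skipped and
    unskipped RT-PCR products, in the same unit.
    """
    total = skipped + unskipped
    if total <= 0:
        raise ValueError("total product amount must be positive")
    return skipped * 100.0 / total


def qtc_ms(qt_ms: float, rr_s: float) -> float:
    """QTc = QT interval (ms) / sqrt(RR interval (s) * 10), as stated in the methods."""
    if rr_s <= 0:
        raise ValueError("RR interval must be positive")
    return qt_ms / math.sqrt(rr_s * 10.0)


# Hypothetical example values, chosen only to show the arithmetic.
print(skipping_efficiency(3.0, 1.0))  # 3 parts skipped, 1 part unskipped
print(qtc_ms(50.0, 0.1))              # QT of 50 ms at an RR interval of 0.1 s
```

With an RR interval of 0.1 s the denominator equals 1, so QTc equals the measured QT interval. Shorter RR intervals raise QTc and longer ones lower it.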
Source data are provided with this paper Neonatal screening for Duchenne muscular dystrophy: a novel semiquantitative application of the bioluminescence test for creatine kinase in a pilot national program in Cyprus Diagnosis and management of Duchenne muscular dystrophy and pharmacological and psychosocial management Current and emerging treatment strategies for Duchenne muscular dystrophy and smooth muscle failure in Duchenne muscular dystrophy Function and genetics of dystrophin and dystrophin-related proteins in muscle NS-065/NCNP-01: an antisense oligonucleotide for potential treatment of exon 53 skipping in Duchenne muscular dystrophy Systemic administration of the antisense oligonucleotide NS-065/NCNP-01 for skipping of exon 53 in patients with Duchenne muscular dystrophy and efficacy of viltolarsen in boys With Duchenne muscular dystrophy amenable to exon 53 skipping: a phase 2 randomized clinical trial Eteplirsen for the treatment of Duchenne muscular dystrophy Increased dystrophin production with golodirsen in patients with Duchenne muscular dystrophy and pharmacokinetics of casimersen in patients with Duchenne muscular dystrophy amenable to exon 45 skipping: a randomized Viltolarsen in Japanese Duchenne muscular dystrophy patients: a phase 1/2 study Low dystrophin levels increase survival and improve muscle pathology and function in dystrophin/utrophin double-knockout mice Dose-dependent restoration of dystrophin expression in cardiac muscle of dystrophic mice by systemically delivered morpholino One-year treatment of morpholino antisense oligomer improves skeletal and cardiac muscle functions in dystrophic mdx mice Functional correction in mouse models of muscular dystrophy using exon-skipping tricyclo-DNA oligomers Cognitive dysfunction in Duchenne muscular dystrophy: a possible role for neuromodulatory immune molecules Control of backbone chemistry and chirality boost oligonucleotide splice switching activity Palmitic acid conjugation enhances potency 
of tricyclo-DNA splice switching oligonucleotides Antibody-oligonucleotide conjugates enter the clinic Enhanced exon skipping and prolonged dystrophin restoration achieved by TfR1-targeted delivery of antisense oligonucleotide using FORCE conjugation in mdx mice A cell-penetrating peptide enhances delivery and efficacy of phosphorodiamidate morpholino oligomers in mdx mice Peptide-conjugated oligonucleotides evoke long-lasting myotonic dystrophy correction in patient-derived cells and mice The endosomal escape vehicle platform enhances delivery of oligonucleotides in preclinical models of neuromuscular disorders DNA/RNA heteroduplex oligonucleotide for highly efficient gene silencing Cholesterol-functionalized DNA/RNA heteroduplexes cross the blood–brain barrier and knock down genes in the rodent CNS DNA/RNA heteroduplex oligonucleotide technology for regulating lymphocytes in vivo Development and application of an ultrasensitive hybridization-based ELISA method for the determination of peptide-conjugated phosphorodiamidate morpholino oligonucleotides Combined microRNA and mRNA detection in mammalian retinas by in situ hybridization chain reaction Triggered amplification by hybridization chain reaction and oxidative phosphorylation in mdx mouse muscular dystrophy Human dystrophin expression corrects the myopathic phenotype in transgenic mdx mice Functional rescue of dystrophin-deficient mdx mice by a chimeric peptide-PMO Truncated dystrophin ameliorates the dystrophic phenotype of mdx mice by reducing sarcolipin-mediated SERCA inhibition Myostatin propeptide gene delivery by adeno-associated virus serotype 8 vectors enhances muscle growth and ameliorates dystrophic phenotypes in mdx mice Multiple pathological events in exercised dystrophic mdx mice are targeted by pentoxifylline: outcome of a large array of in vivo and ex vivo tests Electrocardiographic findings in mdx mice: a cardiac phenotype of Duchenne muscular dystrophy Regulation of the cardiac L-type Ca2+ 
channel by the actin-binding proteins alpha-actinin and dystrophin Challenges and opportunities in dystrophin-deficient cardiomyopathy gene therapy Evolution of the mdx mouse cardiomyopathy: physiological and morphological findings Increased connective tissue growth factor associated with cardiac fibrosis in the mdx mouse model of dystrophic cardiomyopathy Adeno-associated virus serotype-9 microdystrophin gene therapy ameliorates electrocardiographic abnormalities in mdx mice Blunted cardiac beta-adrenergic response as an early indication of cardiac dysfunction in Duchenne muscular dystrophy The association of cardiac muscle necrosis and inflammation with the degenerative and persistent myopathy of MDX mice Early right ventricular fibrosis and reduction in biventricular cardiac reserve in the dystrophin-deficient mdx heart Accelerating the mdx heart histo-pathology through physical exercise A deficit of brain dystrophin impairs specific amygdala GABAergic transmission and enhances defensive behaviour in mice Serum transaminase levels in boys with Duchenne and becker muscular dystrophy Ratio of creatine kinase to alanine aminotransferase as a biomarker of acute liver injury in dystrophinopathy Dystrophins carrying spectrin-like repeats 16 and 17 anchor nNOS to the sarcolemma and enhance exercise performance in a mouse model of muscular dystrophy Functional deficits in nNOSmu-deficient skeletal muscle: myopathy in nNOS knockout mice Mechanisms of palmitic acid-conjugated antisense oligonucleotide distribution in mice Conjugation of hydrophobic moieties enhances potency of antisense oligonucleotides in the muscle of rodents and non-human primates Self-assembly into nanoparticles is essential for receptor mediated uptake of therapeutic antisense oligonucleotides Morpholino oligomer-mediated exon skipping averts the onset of dystrophic pathology in the mdx mouse Mdx mice inducibly expressing dystrophin provide insights into the potential of gene therapy for Duchenne 
muscular dystrophy A morpholino oligomer therapy regime that restores mitochondrial function and prevents mdx cardiomyopathy Repeat-dose toxicology evaluation in cynomolgus monkeys of AVI-4658 a phosphorodiamidate morpholino oligomer (PMO) drug for the treatment of Duchenne muscular dystrophy Efficacy of multi-exon skipping treatment in Duchenne muscular dystrophy dog model neonates Low immunogenicity of LNP allows repeated administrations of CRISPR-Cas9 mRNA into skeletal muscle in mice Innate and conditioned reactions to threat in rats with amygdaloid lesions Characterization of the interactions of chemically-modified therapeutic nucleic acids with plasma proteins using a fluorescence polarization assay Download references Abe for their care of the laboratory animals We appreciate the access to the slide scanner VS200 (Olympus) granted by the Research Core of Tokyo Medical and Dental University This research was supported by the Basic Science and Platform Technology Programs for Innovative Biological Medicine (18am0301003h0005) and Advanced Biological Medicine (23am0401006h0005) to T.Y. from the Japan Agency for Medical Research and Development (AMED) and a JSPS KAKENHI Grant-in-Aid for Scientific Research (A) (19H01016 to T.N (A) (22H00440 to T.N.) and (B) (16H05221 to T.N.) from the Ministry of Education Science and Technology (MEXT) of Japan (Tokyo) This research was also supported by the Joint Research Fund with Takeda Pharmaceutical Company These authors contributed equally: Juri Hasegawa Department of Neurology and Neurological Science Graduate School of Medical and Dental Sciences NucleoTIDE and PepTIDE Drug Discovery Center Department of Bio-informational Pharmacology performed the experiments and analyzed data All authors have read and approved the final manuscript has ongoing collaborations with Takeda Pharmaceutical Co. 
and serves as an academic advisor for Rena Therapeutics Inc. The other authors declare no competing interests. [...] are paid employees of Takeda Pharmaceutical Company Limited.
DOI: https://doi.org/10.1038/s41467-024-48204-5
Heteroduplex oligonucleotide technology was applied to morpholino oligomers and normalized motor and central nervous system functions of Duchenne muscular dystrophy model mice.

CAG 3' splice sites (3'ss) are more than twice as frequent as TAG 3'ss. The greater abundance of the former has been attributed to a higher probability of exon skipping upon cytosine-to-thymine transitions at intron position -3 (-3C > T) than thymine-to-cytosine variants (-3T > C). Molecular mechanisms underlying this bias and its clinical impact are poorly understood. Base-pairing probabilities (BPPs) and RNA secondary structures were compared between CAG 3'ss that produced more skipping of downstream exons than their mutated UAG versions (termed “laggard” CAG 3'ss) and UAG 3'ss that resulted in more skipping than their mutated CAG counterparts (canonical 3'ss). The laggard CAG 3'ss showed significantly higher BPPs across intron-exon boundaries than canonical 3'ss. The difference was centered on positions -5 to -1 relative to the intron-exon junction, the region previously shown to exhibit the strongest high-resolution ultraviolet crosslinking to the small subunit of auxiliary factor of U2 snRNP (U2AF1). RNA secondary structure predictions suggested that laggard CAG 3'ss were more often sequestered in paired conformations and in longer stem structures, while canonical 3'ss were more frequently unpaired. The excess of base-pairing at 3'ss has a potential to alter the hierarchy in intrinsic splicing efficiency of human YAG 3'ss from canonical CAG > UAG to non-canonical UAG > CAG, to modify the clinical impact of transitions at this position and to change their
classification from pathogenic to benign or vice versa.
The translational potential of genomics has remained limited by our inability to reliably predict which variants lead to actionable phenotypes, often prohibiting accurate diagnosis and counseling. This challenge is magnified by the realization that even identical mutations at the same position of traditional splice-site consensus sequences may have unexpected or even opposite phenotypic effects. The importance of intramolecular RNA base-pairing at individual splice-site positions is poorly understood. [...] if this bias can explain the higher abundance of CAG 3'ss in mammalian genomes. No ab initio tools exist to identify anomalous YAG 3'ss that increase exon skipping when mutated from UAG to CAG. It has been unclear why the non-canonical -3T alleles can promote exon inclusion as compared to the -3C alleles and thus become superior to canonical -3C alleles.
This study has compared base-pairing probabilities (BPPs) of transcript pairs with laggard CAG 3'ss and canonical 3'ss. Even the small number of informative transcript pairs (n = 22) has revealed higher average BPPs across intron-exon junctions of laggard CAG 3'ss (i.e., 3'ss with the hierarchy in splicing efficiency of UAG > CAG) as compared to 3'ss with the canonical order CAG > UAG. The maximum discrimination was observed for positions -5 to -1 relative to 3'ss. These results suggest that the accessibility of pyrimidine bases at position -3 can control not only splicing efficiency but also the clinical outcome of these mutations on a scale from benign to pathogenic or vice versa.
PU values range between 0 (completely base-paired) and 1 (completely unpaired). BPP and PU values were averaged, and means and standard deviations of the two groups of 3'ss were compared using an unpaired t-test. Nucleotide distribution across 3'ss and the distribution of paired and unpaired nucleotides in the most stable structures were compared using χ2 tests.
b Average BPPs across laggard and canonical 3'ss (n = 4 and 18
respectively) and their allelic counterparts; dashed lines represent BPP values for alternate pyrimidines. Asterisks represent the region with significant differences between the two groups of 3'ss. c, d Mean BPP values for the indicated regions and associated P-values for McCaskill (c) and CONTRAFold (d) algorithms. e Mean BPPs across laggard and canonical 3'ss and across their allelic counterparts.
PU values across laggard and canonical 3' splice sites. a Mean PU values across 3'ss sequences of the two groups of 3'ss. b Comparison of average PU values for the indicated positions relative to the intron-exon junction (vertical line).
A lack of adenines and uridines between positions -3 and -20 of laggard CAG 3' splice sites. [...] uridines upstream (a) but not downstream (b) of the intron-exon boundary. χ2 values for 2 × 4 contingency tables were 31.7 (P < 0.0001) (a) and 4.2 (P = 0.2) (b). c Adenines were absent just upstream of laggard CAG 3'ss. d Nucleotide distribution upstream of 195,404 human 3'ss.
Independent profiling of BPPs and PU values across the two groups of 3'ss identified a significant increase in predicted base-pairing in the group of transcripts where UAG 3'ss were more efficient than their CAG 3'ss versions. RNA secondary structure has a potential to alter the hierarchy in intrinsic efficiency of human 3'ss from canonical CAG > UAG (> AAG > GAG) to non-canonical UAG > CAG (> AAG > GAG) (3'ss in parentheses have not been tested in this work). The same C > T or T > C mutations at position -3 of 3'ss can have distinct phenotypic outcomes in different sequence and structural contexts. [...] rather than secondary structure constraints could switch CAG versus UAG 3'ss preferences in splicing efficiency. Establishing a larger group of laggard CAG 3'ss and their local folding patterns should help define molecular interactions at this position and 3'ss responses to dynamic secondary structure formation across intron-exon junctions.
The more abundant and generally more splice-proficient CAG
3'ss may turn into “laggards” and skip the downstream exon more than their intrinsically weaker UAG 3'ss counterparts. This work identifies a collection of 3'ss that provide a starting point for exploring structural requirements for their usage in much greater detail, which should facilitate our understanding of structural interactions that involve position -3. These results also suggest that prediction of splicing and clinical outcomes of DNA mutations and polymorphisms in mammalian genes may never be 100% accurate without considering RNA structure of primary transcripts, particularly across traditional and auxiliary splicing motifs.
The data generated or analyzed during this study can be found within this article and its supplementary file.
Generation of data used in this work was funded by inventor royalties (to IV) from a licensing agreement unrelated to this work (US patents 9,714,422 and 10,196,639), personally contributed to the University of Southampton and administered as a research grant by the same institution. Funding for open access charge was provided by the University of Southampton. IV conceived the scientific question addressed in this work, analyzed and interpreted the data and wrote the manuscript. The author declares no competing interests.
DOI: https://doi.org/10.1038/s10038-024-01308-8

A new study combines massively parallel assays, transcriptomics and biophysical modeling to provide a framework for analyzing the effects of compounds that modulate pre-mRNA splicing. The results lend
important insights into the mechanisms of drug action and facilitate the design of splicing therapies.
The Barcelona Institute of Science and Technology: Jorge Herrero-Vicente & Juan Valcárcel. Institució Catalana de Recerca i Estudis Avançats (ICREA). [...] is a member of the scientific advisory boards of Remix Therapeutics.
DOI: https://doi.org/10.1038/s41589-024-01678-2

Splice announced it has moved into the plugin space with the acquisition of Spitfire Audio, a prominent UK-based maker of virtual instrument libraries. Spitfire Audio has developed a reputation for its virtual instrumentation among composers. The company’s virtual instrument libraries have been used in recordings by high-profile creatives and organizations such as Hans Zimmer [...].
“The teams at Spitfire Audio and Splice have deep respect for composers, musicians and producers and are committed to celebrating and supporting their work,” said Kakul Srivastava. “Our shared vision is to develop tools that expand, not replace, human creativity.”
“We’ve always focused on inspiring people to create extraordinary music,” said Paul Thomson.
The financial terms of the acquisition were not disclosed, but the Financial Times, citing a person with knowledge of the deal, reported that the transaction closed at about $50 million.
Volume 17 - 2024 | https://doi.org/10.3389/fnmol.2024.1412964
This article is part of the Research Topic "Come as You R(NA): Post-transcriptional Regulation Will Do the Rest".

Pediatric neurological disorders are frequently devastating and present unmet needs for effective medicine. The successful treatment of spinal muscular atrophy with splice-switching antisense oligonucleotides (SSO) indicates a feasible path to targeting neurological disorders by redirecting pre-mRNA splicing. One direct outcome is the development of SSOs to treat haploinsufficient disorders by targeting naturally occurring non-productive splice isoforms. The development of personalized SSO treatment further inspired the therapeutic exploration of rare diseases. This review will discuss the recent advances that utilize SSOs to treat pediatric neurological disorders.
ASO gapmers have been recently approved by the FDA to treat SOD1 ALS. This review focuses on the progress of SSOs in targeting pediatric neurological conditions. The natural occurrence of alternative splicing and the identification of splicing enhancers/suppressors indicate that re-directing splicing holds its own dimension for gene regulation and therapeutic intervention. About 10% of exonic human mutations are estimated to cause diseases by disrupting pre-mRNA splicing (Soemedi et al., 2017).
While whole-exome sequencing detects exonic and splice site mutations for genetically defined disorders, integrating transcriptome and whole-genome analysis uncovers more causal intronic splicing mutations (Cummings et al., 2017; Kim et al., 2023). These splicing mutations frequently introduce aberrant splice sites that lead to loss-of-function or hypomorphic alleles. Disease-causing splicing variants can be suppressed to treat human diseases. Redirecting splicing can also lead to beneficial effects by (1) bypassing nonessential in-frame exons that carry pathogenic mutations, (2) bypassing an additional exon to correct the reading frame, and (3) redirecting alternative splicing to promote functional isoform production. This review focuses on recently reported SSO strategies targeting pediatric neurological conditions and the value of genetic tools.
(A) Variant-specific SSOs suppress the gain of cryptic splice sites in the introns (top) or exons (bottom). (B) Bypassing a non-essential exon that carries pathogenic mutations (top), skipping an additional non-essential exon (orange) to correct the translational reading frame (middle), or switching for a functional mutually exclusive exon (bottom). (C) Gene-specific SSOs treating recessive or haploinsufficient conditions by converting naturally occurring non-functional (or unstable) splice isoforms to functional isoforms.
Genetic suppression of non-productive splicing, mimicking the maximal and constant effect of an SSO, can provide in vivo evidence about the neurological and organismal functions of the non-productive isoform, to what extent the protein level can be restored, and whether it can rescue phenotypes associated with loss-of-function alleles.
Recessive diseases frequently involve loss-of-function alleles, and several SSO-based therapeutic strategies have been reported (Figure 1). SSOs can promote the inclusion or exclusion of specific exons. It is straightforward to use SSOs to suppress undesired exons. SSOs can also block splicing
silencers and promote exon inclusion to make functional proteins. These works suggest a promising exon-skipping strategy for CLN3 (Δex78) Batten’s disease. This work paved the path for expedited genetic diagnosis and individualized drug development.
Over 1,400 SCN1A mutations have been reported as pathogenic in ClinVar (a public database to aggregate genetic variants and clinical findings), and a significant fraction of such mutations cause severe loss of function (frameshift [...]). Causal mutations for neurodevelopmental disorders have been reported in dozens to hundreds of genes. Targeting such a vast number of mutated alleles using variant- or exon-specific SSOs presents a daunting task. The naturally occurring non-productive alternative splicing in disease-associated genes can be targetable switches for gene regulation. Clinical trials of the SSO in Dravet patients are ongoing and appear promising. These studies suggest that targeting the non-productive isoform can be a promising therapeutic approach. [...] indicating the existence of a splicing enhancer for the A3SS-NMD. This study indicates that switching functionally equivalent but mutually exclusive exons can bypass deleterious effects and demonstrates the application of a human organoid-rat chimeric system. [...] and completely blocking AS-NMD may have undesired consequences. [...] mimicking the maximum effect of SSO treatment can rescue or alleviate phenotypes in mouse models of human diseases. The active research and collaborative efforts in the field are drawing a promising future for SSO therapy.
The author(s) declare that financial support was received for the research. XZ was supported by grants from the National Institutes of Health (DP2-GM137423 and R01-MH130594).
The author would like to thank Oriane Mauger and Michael Kiebler for the opportunity to contribute this review, and thank Runwei Yang and other colleagues for critically reading this manuscript.
The author declares that the research was conducted in the absence of any commercial
or financial relationships that could be construed as a potential conflict of interest.
All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.

Consensus guidelines for the design and in vitro preclinical efficacy testing N-of-1 exon skipping antisense oligonucleotides
Barbosa-Morais. The evolutionary landscape of alternative splicing in vertebrate species
Spliced segments at the 5′ terminus of adenovirus 2 late mRNA
Bhattacharyya [...]
The TREAT-NMD DMD global database: analysis of more than 7,000 Duchenne muscular dystrophy mutations
Widespread intron retention in mammals functionally tunes transcriptomes
RNA-based translation activators for targeted gene upregulation
Aberrant inclusion of a poison exon causes Dravet syndrome and related SCN1A-associated genetic epilepsies
Therapeutic efficacy of antisense oligonucleotides in mouse models of CLN3 batten disease
Protracted CLN3 batten disease in mice that genetically model an exon-skipping therapeutic approach
[...] splicing with antisense oligonucleotides reduces toxic amyloid-beta production
Antisense oligonucleotide therapeutic approach for Timothy syndrome
An amazing sequence arrangement at the 5′ ends of adenovirus 2 messenger RNA
Dawicki-McKenna. Mapping PTBP2 binding in human brain identifies SYNGAP1 as a target for therapeutic splice switching
Correction of prototypic ATM splicing mutations and aberrant ATM function with antisense morpholino oligonucleotides
[...] structure and function of approved oligonucleotide therapeutics
Very mild muscular dystrophy associated with the deletion of 46% of dystrophin
NOVA-dependent regulation of cryptic NMD exons controls synaptic protein levels after seizure
Nusinersen versus sham control in infantile-onset spinal muscular atrophy
Alternative splicing: increasing diversity in the proteomic world
Single-cell isoform RNA sequencing characterizes isoforms in thousands of cerebellar cells
[...] mutations in SYNGAP1 in autosomal nonsyndromic mental retardation
Understanding genotypes and phenotypes in epileptic encephalopathies
Peripheral SMN restoration is essential for long-term rescue of a severe spinal muscular atrophy mouse model
Targeted deubiquitination rescues distinct trafficking-deficient ion channelopathies
RNA therapeutics: beyond RNA interference and antisense oligonucleotides
Mitochondrial clearance and maturation of autophagosomes are compromised in LRRK2 G2019S familial Parkinson's disease patient fibroblasts
Unproductive splicing of SR genes associated with highly conserved and ultraconserved DNA elements
Rescue of hearing and vestibular function by antisense oligonucleotides in a mouse model of human deafness
Evidence for the widespread coupling of alternative splicing and nonsense-mediated mRNA decay in humans
Integrative functional genomic analysis of human brain development and neuropsychiatric risks
Antisense oligonucleotide modulation of non-productive alternative splicing upregulates gene expression
Developmental attenuation of neuronal apoptosis by neural-specific splicing of Bak1 microexon
Disrupted auto-regulation of the spliceosomal gene SNRPB causes cerebro-costo-mandibular syndrome
Targeted intron retention and excision for rapid gene regulation in response to neuronal activity
Towards a therapy for Angelman syndrome by targeting a long non-coding RNA
Nusinersen versus sham control in later-onset spinal muscular atrophy
Evolutionary dynamics of gene and isoform regulation in mammalian tissues
Antisense oligonucleotide-mediated correction of CFTR splicing improves chloride secretion in cystic fibrosis patient-derived bronchial epithelial cells
Evaluating human mutation databases for "treatability" using patient-customized therapy
A single nucleotide difference that alters splicing patterns distinguishes the SMA gene SMN1 from the copy gene SMN2
Alternative splicing in neurodegenerative disease and the promise of RNA therapies
Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing
Panagiotakos. Aberrant calcium channel splicing drives defects in cortical differentiation in Timothy syndrome
ASO targeting RBM3 temperature-controlled poison exon splicing prevents neurodegeneration in vivo
Targeted exon skipping of a CEP290 mutation rescues Joubert syndrome phenotypes in vitro and in a murine model
[...] a selective survival of motor Neuron-2 (SMN2) gene splicing modifier for the treatment of spinal muscular atrophy (SMA)
Antisense oligonucleotides: the next frontier for treatment of neurological disorders
SMN gene duplication and the emergence of the SMN2 gene occurred in distinct hominids: SMN2 is unique to Homo sapiens
Satterstrom [...]
A single ataxia telangiectasia gene with a product similar to PI-3 kinase
The novel neuronal ceroid lipofuscinosis gene MFSD8 encodes a putative lysosomal transporter
RNA-targeting splicing modifiers: drug development and screening assays
Splicing defects in the ataxia-telangiectasia gene ATM: underlying mutations and consequences
RNA isoform screens uncover the essentiality and tumor-suppressor activity of ultraconserved poison exons
Evaluation of 36 patients from Turkey with neuronal ceroid lipofuscinosis: clinical, neuroradiological and histopathologic studies
Van Nostrand. A large-scale binding and functional map of human RNA-binding proteins
[...] a PDZ domain-containing protein expressed in the inner ear sensory hair cells
Safety and efficacy of drisapersen for the treatment of Duchenne muscular dystrophy (DEMAND II): an exploratory [...]
Alternative isoform regulation in human tissue transcriptomes
Single-cell long-read sequencing in human cerebral organoids uncovers cell-type-specific and autism-associated exons
Coordination of alternative splicing and alternative polyadenylation revealed by targeted long read sequencing
Cell-type-specific alternative splicing governs cell fate in the developing cerebral cortex
PSD-95 is post-transcriptionally repressed during early neural development by PTBP1 and PTBP2

Citation: Zhang X (2024) Splice-switching antisense oligonucleotides for pediatric neurological disorders.
Received: 06 April 2024; Accepted: 12 July 2024; Published: 25 July 2024
Copyright © 2024 Zhang.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
*Correspondence: Xiaochang Zhang, eGN6aGFuZ0B1Y2hpY2Fnby5lZHU=

Splice SVP of Content Kenny Ochoa notes how the “phone is already a huge part of music making”.
When you purchase through affiliate links on MusicTech.com, you may contribute to our site through commissions.
Music creation platform and sample library Splice has launched Splice Mic, a new update to Splice Mobile which allows creators to record vocals over instrumentals created within the app. Since its revamp in 2023, users have been able to use Splice Create’s AI power to generate arrangements, or Stacks, using loops and sounds from the Splice library. Splice Mic enables producers to record vocals or other instruments over the top of those Stacks using the microphone built into their smartphone.
Splice SVP of Content Kenny Ochoa notes how the “phone is already a huge part of music making”. “About one million users have made more than 28 million stacks so far and now songwriters and producers can record vocal ideas over stacks of samples,” he says. “[...] And now those stacks can be merged with vocals.”
Splice has teamed up with songwriter and DJ Leland, who has worked with the likes of Troye Sivan, Ariana Grande and Charli XCX, and LA’s Laurelvale Studios, inviting teams of songwriters to create Stacks with Create on Splice Mobile. Songwriters invited to participate in the project, dubbed 60 Second Stack, include Madison Love (Lady Gaga [...]). “We got the team together to see who could start the best new Stacks,” says Leland.
Designed to make on-the-go collaboration easier than ever, Splice Mobile allows users to share ideas directly within the app. “Musicians are already using voice recording functions on their phones to capture ideas away from the studio,” Splice says, “[...] giving songwriters the creative depth of the Splice Sounds catalogue and Create.”
Splice Mic is just the latest push by Splice to make songwriting and creative workflows more seamless. Last year, PreSonus Studio One became the first DAW to integrate Splice, offering millions of royalty-free samples into the Studio One workflow. Splice Mic is now available in Splice Mobile. For more information, head to Splice.
Interpreting the clinical significance of putative splice-altering variants outside canonical splice sites remains difficult without time-intensive experimental studies. We introduce Parallel Splice Effect Sequencing (ParSE-seq), a multiplexed assay to quantify variant effects on RNA splicing. We first apply this technique to study hundreds of variants in the arrhythmia-associated gene SCN5A. Variants are studied in ‘minigene’ plasmids with molecular barcodes to allow pooled variant effect quantification [...], including disease-relevant induced pluripotent stem cell-derived cardiomyocytes (iPSC-CMs). The assay strongly separates known control variants from ClinVar, enabling quantitative calibration of the ParSE-seq assay. Using these evidence strengths and experimental data, we reclassify 29 of 34 variants with conflicting interpretations and 11 of 42 variants of uncertain significance. We show that many synonymous and missense variants disrupted RNA splicing. Two splice-altering variants in the assay also disrupt splicing and sodium current when introduced into iPSC-CMs by CRISPR-Cas9 editing. ParSE-seq provides high-throughput experimental data for RNA-splicing to support precision medicine efforts and can be readily adopted to study other loss-of-function genotype-phenotype relationships.
Functional assays of splice-altering variants have thus far not [...]. A high-throughput splicing assay would facilitate the reclassification of variants of uncertain significance (VUS) and conflicting interpretation (CI) variants in disease-associated genes. Enabling larger scale investigations into splice-altering variants in SCN5A would decrease clinical uncertainty when managing patients at risk for this potentially fatal arrhythmia syndrome. [...] Parallel Splice Effect-sequencing (ParSE-seq) to determine the splice-altering consequences of hundreds of intronic and exonic variants in SCN5A. We implement ParSE-seq for 244 SCN5A variants in human embryonic kidney (HEK293) cells and 224 variants
in induced pluripotent stem cell-derived cardiomyocytes (iPSC-CMs). We calibrate the assay with nearly 50 ClinVar-annotated benign and pathogenic variants, compare our experimental outcomes to the in silico tool SpliceAI, and contribute functional data to help adjudicate 29 CI variants in ClinVar. We demonstrate that some missense variants may be incorrectly described as having normal function by conventional cDNA-based patch clamping assays that cannot assess splicing outcomes.
This design enabled the linkage of reads to variants even if the variant was not included in the spliced transcript. E PSI for all WT exons in iPSC-CMs and HEK cells. Data are averaged across three replicates and error bars represent the standard error of the mean.
A Example lollipop diagram showing variants superimposed along the construct. The y-axis represents mean ΔPSI_norm across three biological replicates and the x-axis represents genomic position along the exonic (box) and intronic (line) segments of the synthetic insert; [...] blue, intronic variants outside the 2-bp canonical splice sites. B Lollipop diagram showing the distribution of ParSE-seq investigated variants in HEK cells. An average of three experimental replicates is shown. C Lollipop diagram showing the distribution of ParSE-seq investigated variants in iPSC-CMs. D Waterfall plot of mean ΔPSI_norm by variant (N = 243). The red dashed line corresponds to −50% normalized ΔPSI. E Spearman correlation (two-sided test) of mean ΔPSI_norm between HEK and iPSC-CMs (N = 207). Error bars refer to the 95% confidence interval. F Volcano plot of normalized ΔPSI and −log10(FDR). Each dot represents a variant studied in iPSC-CMs (N = 243). Most variants fall within normal (blue) or abnormal (red) quadrants, but some remain indeterminate due to statistical or biological ambiguity (gray). G Barplot of ParSE-seq variant outcomes by variant mutation type in iPSC-CMs. 2-bp indicates the conserved 2-base pair canonical splice sites AG-GT. Raw data for plots are available in Source Data.
[...] multiple
aberrant splicing events were observed at appreciable levels Many of these altered transcripts resulted in changes to the reading frame some small and large in-frame insertions and deletions were also observed there were 614 SCN5A-BrS patients harboring a variety of SCN5A variants there were a total of 27 splice-altering variants affecting 43 patients (functionally abnormal non-consensus splice variants or consensus splice site variants) we proactively investigated 18 unique variants harbored by 36 patients which complemented our previous low-throughput investigations Correlation of normalized ΔPSI for non-canonical splice site variants against aggregate SpliceAI scores (N = 140; D) Confidence interval fit using LOESS (see “Methods” section) P values were determined using a Pearson correlation In addition to aggregate SpliceAI scores and ParSE-seq ΔPSI_norm comparisons across the library, we also compared specific SpliceAI molecular predictions to observed ParSE-seq splice outcomes (Supplementary Fig. 13) SpliceAI predictions typically matched experimental data for activation of cryptic splice sites (c.3358A > T although in the ParSE-seq assay a different cryptic site may become activated (c.4220C > G) or exon skipping may result (c.4298G > T and c.3564G > T) Some variants led to multiple splice aberrations in the ParSE-seq experiment (e.g. 
exon skipping and exon truncation/intron retention) despite only having a single SpliceAI predicted aberrant event (c.3358A > T and c.2024-11T > A) D Classifications of conflicting interpretation (CI) variants using functional evidence Calibration control outcomes and ClinVar classifications in Source Data We classified 1 CI variant as LP due to its splice-altering effect ParSE-seq can help identify a class of missense splice-altering variants for which cDNA-based assays of protein function yield incorrect conclusions about variant pathogenicity This result highlights that for missense variants ParSE-seq can be used to complement traditional cDNA-based assays of protein function we developed a high-throughput method to assess the splicing consequences of hundreds of variants (ParSE-seq) We implemented barcoding of a pool of minigene plasmids to enable multiplexed splicing readouts using high-throughput sequencing and applied the method to study variants in the cardiac sodium channel gene SCN5A we quantified variant effects on splicing for 224 variants and detected 78 variants with abnormal splicing We observed concordance of splicing results for 45/47 B/LB and P/LP variants and we determined that our assay could be applied at the strong level in the ACMG classification scheme (BS3 and PS3) when assuming ClinVar as ground truth Leveraging these calibrated strengths of evidence We determined that the in silico tool SpliceAI has high concordance with experimentally measured splicing effects We also demonstrated examples of missense variants that had normal electrophysiology using conventional heterologous expression cDNA-based approaches but disrupt splicing we showed that our ParSE-seq results predict aberrant splicing in a disease-relevant iPSC-CM model with consequences at both the RNA and protein levels We envision that ParSE-seq will be applicable to many disease genes and will be accessible using openly available computational pipelines and democratized gene 
synthesis available to the community this method may help classify large sets of variants in Mendelian disease-associated genes that act through a loss-of-function mechanism Although we validate two ParSE-seq splice-altering variants by CRISPR editing of the iPSC-CMs most variants were tested only in multiplexed minigene assays While we anticipate most splice-altering variants to result in loss-of-function (NaV1.5 peak current abrogation for SCN5A variants) there may be alternative mechanisms revealed by functional assessment of the CRISPR-edited iPSC-CM model while ParSE-seq quantifies broad molecular impacts such as exon skipping it is possible that some splicing abnormalities may not have a detrimental effect on downstream protein function (protein tolerant in-frame insertion/deletions) there are no observed indel variants >3 amino acids that are reported as B/LB The frequency of affected variant heterozygotes is difficult to ascertain based off ClinVar data alone as detailed patient phenotypes and case counts are not routinely reported from submitting centers There may be examples where the ParSE-seq minigene assay does not fully capture all nuances of biology at the endogenous locus splicing regulatory motifs in the native context may have a long-distance effect not captured in the minigene-based assay This incomplete ascertainment may lead to discordant results with in silico predictors for a subset of variants exons using non-canonical 2-bp splice sites and exons that were difficult to synthesize due to high GC content or restriction enzyme incompatibility were not included in the library Exons that undergo extensive alternative splicing in the endogenous tissue (e.g. 
and 24) may have low intrinsic PSI in the minigene assay, which may limit the use of the assay for these exons. We anticipate that ParSE-seq will be a useful method for rapidly assessing variant splicing effects. Given the plethora of variants that may act through disrupting splicing, our method can be used to efficiently characterize the splicing effects of variants in disease-associated genes.

The participant (male, age 30–40) from whom these cells were derived provided informed consent. As we were interested in variant-level effects, we did not consider sex/gender or genetic ancestry in the selection of this cell line. The Vanderbilt University Medical Center IRB (#9047) approved the use of the induced pluripotent stem cells used in this study. We conservatively assumed at least one affected patient per reported variant.

The minigene-based assay requires an acceptor and donor splice site on each end of the test exon and is therefore incompatible with the first or last coding exons (2 and 28). SCN5A uses two instances of non-canonical AC/AT splice sites between exons 3 and 4 … We did not study variants in these four exons or in adjacent intronic locations. We were unable to include plasmids with exon 15 due to synthesis incompatibility (high GC content) and exon 17 due to overlap of restriction enzymes used for barcoding.

Primers with AscI/MfeI sequences flanking the region of interest were used in a Q5 PCR reaction (NEB) following the manufacturer’s protocol. The amplicon was PCR purified (QIAGEN) per the manufacturer’s protocol. pAG424 and the amplicon were then each digested with AscI and MfeI (NEB) at 37 °C for 1 h, followed by separation on a 1% agarose gel, and then purified following instructions from a Gel Extraction Kit (QIAGEN). Each component was then ligated with supplies from a T4 ligation kit (NEB) for 1 h, followed by heat inactivation at 65 °C for 10 min. A 1 μL aliquot was then used to transform 50 μL competent cells (NEB), followed by DNA extraction using a Spin Miniprep Kit
(QIAGEN). The plasmids were sequence verified by Genewiz before use in the ParSE-seq assay.

The double-stranded DNA was then phenol/chloroform extracted and digested using AscI and MfeI (NEB) and was again purified by phenol/chloroform extraction. The pool of minigene plasmids was also digested with AscI and MfeI and cleaned by gel extraction (QIAGEN). The digested vector pool and barcode insert were ligated using T4 ligase (NEB). The ligation product was PCR purified (QIAGEN) and electroporated into ElectroMax DH10B cells (ThermoFisher) using a Gene Pulser Electroporator (BioRad; 2.0 kV …). The resulting bacterial culture was then grown overnight and DNA was isolated by a maxiprep (QIAGEN) to yield the barcoded plasmid library. Barcode diversity was estimated by plating dilutions of the library on LB-ampicillin plates and counting colonies.

… and used to generate a SMRT Bell 3.0 library (PacBio) according to the manufacturer’s instructions. The library was sequenced with a PacBio Sequel II 8M SMRT Cell by Maryland Genomics. We recorded 30 h of PacBio SMRT cell sequencing. To mitigate sequencing errors in the raw PacBio data, we only analyzed Circular Consensus Sequence (CCS) reads. A total of 4,136,990 CCS reads were obtained as fastq files. The median Q score was 48 across CCS reads. The barcode identity was assigned as the most frequently aligned insert if that insert represented more than 50% of the read counts. After implementation of these quality control cutoffs, 284 of the 290 targeted plasmids were successfully detected in the plasmid pool.

The PCR protocol included a single denaturation step of 98 °C for 30 s; touchdown with 10 cycles of 98 °C for 10 s, 10 cycles of 65–55 °C (decreasing by 1 °C/cycle) for 15 s …; followed by an additional 20 cycles of 98 °C for 10 s …; followed by a final extension of 72 °C for 5 min. PCR amplicons were purified with a PCR purification kit (QIAGEN). Libraries were then sequenced using Illumina NovaSeq paired-end 150 base sequencing to ~50M reads/sample. A
diagram of the computational pipeline is presented in Supplementary Fig. 3. Reads were filtered for correct barcode prefix and suffix sequences and were divided into separate files by barcode, as described above. Each barcode was required to be present in at least 25 reads in each replicate to be included. The PSI metric was calculated using grep searches for splice junctions corresponding to the WT exon splicing to the reference exons in the R1 and R2 reads. For variants that would alter the coding sequence of the WT exon, a bespoke R1 or R2 junction was created and used for those specific variants. PSIs were then averaged across barcodes for each variant using the barcode-variant lookup table from the assembly step described above. We used the assigned PSI as the average PSI for each variant across 3 independent transfections into HEK or iPSC-CMs. From these values we computed a normalized change in PSI:

$$\Delta\mathrm{PSI}_{\mathrm{norm}} = \frac{\mathrm{PSI}_{\mathrm{variant}} - \mathrm{PSI}_{\mathrm{WT}}}{\mathrm{PSI}_{\mathrm{WT}}} \times 100\%$$

This ΔPSI_norm value represents the change in splicing of the variant compared to the level of splicing of the corresponding WT exon. The value is normalized to the level of WT splicing to determine the percent change of splicing regardless of the baseline level of WT PSI. For example, if a WT exon had a PSI of 80% and the variant exon had a PSI of 40%, the ΔPSI_norm would be (40 − 80)/80 = −50%. A ΔPSI_norm value of −100% indicates a complete loss of normal splicing, whereas a value of 0 indicates identical splicing to WT.

Comparison of WT and variant PSI (using the mean of three replicate samples) used a two-sided t-test implemented in R. FDRs were calculated using the R command p.adjust. To also account for statistical significance, variants with FDR < 0.1 and ΔPSI_norm < −50% were considered splice-altering. Variants with FDR > 0.1 and ΔPSI_norm ≥ −20% were considered non-splice-altering. All other variants were labeled indeterminate in the assay. We excluded variants from our analysis if the standard error of the PSI among the three replicates was >0.15. Barplots were plotted in R using ggplot2 (see GitHub for code). We used locally estimated scatterplot smoothing (LOESS) as a non-parametric
regression model for comparing in silico splicing predictors with experimental ParSE-seq data. LOESS was selected as a smoothing method due to the largely bimodal distribution of our experimental data. A 95% confidence interval is displayed alongside the line of fit. LOESS was implemented in ggplot2 using default settings, with variant ΔPSI_norm plotted as a function of aggregate SpliceAI predictions. Full code describing this analysis is available on GitHub.

We calculated the likelihood ratio of pathogenicity (termed OddsPath) for both splice-altering (pathogenic) and non-splice-altering (benign) assay results. We removed all variants with indeterminate scores and then calculated benign and pathogenic OddsPath values. In the framework of Brnich et al. (ref. 24), the prior P1 is the proportion of pathogenic variants among all control variants, the posterior P2 is the proportion of pathogenic variants among the controls with a given assay result, and the two are combined as

$$\mathrm{OddsPath} = \frac{P_2\,(1 - P_1)}{(1 - P_2)\,P_1}$$

The benign and pathogenic posterior P2 was calculated separately for each assay result. Following the approach recommended by Brnich et al., if P2_Pathogenic = 1, it was conservatively estimated to have one additional discordant variant (a functionally abnormal benign variant). Likewise, for the benign calculation, an additional discordant variant was included (a functionally normal pathogenic variant). Each posterior was combined with the prior to derive an OddsPath and assign evidence for PS3 and BS3 criteria.

In the manual iPSC-CM patch clamp experiments, the series resistance (Rs) was monitored using Seal Test (Clampex 10.9 software) to achieve a range of 5–10 MΩ. Current–voltage curves were generated by repeated voltage changes to the same cells. Two trials were performed for each cell line, and data were then averaged across all measured cells.

Two complementary oligonucleotides were phosphorylated and annealed using T4 PNK (NEB) per protocol, followed by simultaneous digestion and ligation of pX458 with BbsI-HF (NEB) and T4 ligase (NEB) per the manufacturer’s protocol. The sample was transformed into competent E. coli, followed by colony expansion and miniprep (QIAGEN) as described above. The cloned guide plasmid and a 151-nucleotide repair template bearing the desired change and PAM site variant were
co-electroporated into dissociated iPSCs using the Neon Transfection System (ThermoFisher MPK5000) and sorted for GFP+ cells using a BD Fortessa 5-laser instrument DNA extracted using QuickExtract (Lucigen) PCR amplified using primers mo198 and mo199 and Sanger sequenced to identify a colony with a heterozygous edit During manual patch clamp experiments on iPSC-CMs membrane resistance (Rm) was monitored throughout using the Membrane Test (Clampex 10.9) We first optimized the electrode capacitance compensation on the amplifier performed following giga-seal formation and before achievement of the whole-cell configuration the capacitive transients were completely and well-compensated by ~80% when whole-cell capacitance compensation was enabled we used Seal Test (Clampex 10.9) as an oscilloscope window for monitoring the current signal to achieve a reading close to 10 MΩ Optimization of the capacitance compensation was an extremely important step for accurate Cm and Rm measurements in Membrane Test To achieve high quality giga-seal formation before cell membrane break-in we chose cells with giga-seal of 1–2 GΩ for the experiments Whole-cell voltage-clamp experiments in iPSC-CMs were conducted at room temperature (22–23 °C) Glass microelectrodes were heat polished to tip resistances of 0.5–1 MΩ Data acquisition was carried out using MultiClamp 700B patch clamp amplifier and pCLAMP 10 software suite (Molecular Device Corp. four-pole Bessel filter) and digitized at frequency of 2–20 kHz by using an analog-to-digital interface (DigiData 1550B capacitance and series resistance were corrected ~80% Voltage-clamp protocols used are shown on the figures Electrophysiological data were analyzed using Clampfit 10 software and the figures were prepared by using Graphing & Analysis software OriginPro 8.5.1 (OriginLab Corp. 
To provide better voltage control of sodium current the external solution was K+-free and Ca2+-free with a lower sodium concentration (50 mmol/L) The pipette (intracellular) solution had (in mM) NaF 5 To eliminate the overlapped L-/T-type inward calcium currents and outward potassium currents and 200 µM 4-aminopyride) were added into the cell bath solution cells were held at −100 mV and current was elicited with a 50-ms pulse from −100 to +40 mV in 10 mV increments Current densities were expressed in the unit of pA/pF after normalization to cell size (pF) generated from the cell capacitance calculated by the function of Membrane Test (OUT 0) in pCLAMP 10 software The average capacitances of the iPSC-CMs were 56.9 ± 4.8 pF (WT) The average membrane resistances of the iPSC-CMs were 1.54 ± 0.09 GΩ (WT) and 1.55 ± 0.1 GΩ (c.4220G > C variant) These parameters were not statistically significantly different from each other (p > 0.05) We did not measure membrane potentials during the sodium current (INa) measurements under modified experimental conditions (see below) However differentiated iPSC-CMs from the same population control iPSC line studied with physiological intra- and extracellular solutions had potentials ranging from −75 to −90 mV with an average potential of −82.2 ± 1.2 mV Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article Strategic vision for improving human health at The Forefront of Genomics Spectrum of splicing variants in disease genes and the ability of RNA analysis to reduce uncertainty in clinical interpretation Cummings, B. B. et al. Improving genetic diagnosis in Mendelian disease with transcriptome sequencing. Sci. Transl. Med. 
9 https://doi.org/10.1126/scitranslmed.aal5209 (2017)
Genetic diagnosis of Mendelian disorders via RNA sequencing
Standardized practices for RNA diagnostics using clinically accessible specimens reclassifies 75% of putative splicing variants
Use of splicing reporter minigene assay to evaluate the effect on splicing of unclassified genetic variants
Valenzuela-Palomo, A. et al. Splicing predictions, minigene analyses, and ACMG-AMP clinical classification of 42 germline PALB2 splice-site variants. J. Pathol. https://doi.org/10.1002/path.5839 (2021)
Functional classification of DNA variants by hybrid minigenes: identification of 30 spliceogenic variants of BRCA2 exons 17 and 18
Intronic CRISPR repair in a preclinical model of Noonan syndrome-associated cardiomyopathy
O’Neill, M. J. et al. Functional assays reclassify suspected splice-altering variants of uncertain significance in Mendelian channelopathies. Circ. Genom. Precis. Med. https://doi.org/10.1161/circgen.122.003782 (2022)
Tobert, K. E. et al. Genome sequencing in a genetically elusive multi-generational long QT syndrome pedigree identifies a novel LQT2-causative deeply intronic KCNH2 variant. Heart Rhythm https://doi.org/10.1016/j.hrthm.2022.02.004 (2022)
High-throughput splicing assays identify missense and silent splice-disruptive POU1F1 variants underlying pituitary hormone deficiency
High-throughput mutagenesis identifies mutations and RNA-binding proteins controlling CD19 splicing and CART-19 therapy resistance
A multiplexed assay for exon recognition reveals that an unappreciated fraction of rare genetic variants cause large-effect splicing disruptions
Vex-seq: high-throughput identification of the impact of genetic variation on pre-mRNA splicing efficiency
Contribution of noncanonical splice variants to TTN truncating variant cardiomyopathy
Identification of pathogenic gene mutations in LMNA and MYBPC3 that alter RNA splicing
Recommendations for application of the functional evidence PS3/BS3 criterion using the ACMG/AMP sequence variant interpretation framework
Closing the gap: systematic integration of multiplexed functional data resolves variants of uncertain significance in BRCA1
A calibrated functional patch-clamp assay to enhance clinical variant interpretation in KCNH2-related long QT syndrome
Saturation-scale functional evidence supports clinical variant interpretation in Lynch syndrome
Genome-wide association analyses identify new Brugada syndrome risk loci and highlight a new mechanism of sodium channel regulation in disease susceptibility
Reappraisal of reported genes for sudden arrhythmic death: evidence-based evaluation of gene validity for Brugada syndrome
Arrhythmic phenotypes are a defining feature of dilated cardiomyopathy-associated SCN5A variants: a systematic review
An international compendium of mutations in the SCN5A-encoded cardiac sodium channel in patients referred for Brugada syndrome genetic testing
Sudden cardiac arrest associated with use of a non-cardiac drug that reduces cardiac excitability: evidence from bench
Cryptic 5′ splice site activation in SCN5A associated with Brugada syndrome
Genetic effects on gene expression across human tissues
Enhancing rare variant interpretation in inherited arrhythmias through quantitative analysis of consortium disease cohorts and population controls
Genetics of congenital arrhythmia syndromes: the challenge of variant interpretation
Pathogenicity assignment of variants in genes associated with cardiac channelopathies evolve toward diagnostic uncertainty
Listening to silence and understanding nonsense: exonic mutations that affect splicing
Translation of human-induced pluripotent stem cells: from clinical trial in a dish to precision medicine
Quality and quantity control of gene expression by nonsense-mediated mRNA decay
Nonsense-mediated mRNA decay: an intricate machinery that shapes transcriptomes
Bersell, K. R. et al. Transcriptional dysregulation underlies both monogenic arrhythmia syndrome and common modifiers of cardiac repolarization. Circulation https://doi.org/10.1161/circulationaha.122.062193 (2022)
Functionally validated SCN5A variants allow interpretation of pathogenicity and prediction of lethal events in Brugada syndrome
SCN5A (NaV1.5) variant functional perturbation and clinical presentation: variants of a certain significance
Mechanism and modeling of human disease-associated near-exon intronic variants that perturb RNA splicing
SpliceVault predicts the precise nature of variant-associated mis-splicing
Biology of cardiac arrhythmias: ion channel protein trafficking
Deep mutational scanning: a new style of protein science
Variant interpretation: functional assays to the rescue
Massively parallel functional testing of MSH2 missense variants conferring Lynch syndrome risk
SpliceAI-visual: a free online tool to improve SpliceAI splicing variant interpretation
High-throughput discovery of trafficking-deficient variants in the cardiac potassium channel KV11.1
Deep mutational scan of an SCN5A voltage sensor
Using high-resolution variant frequencies to empower clinical genome interpretation
… and economical inverse polymerase chain reaction-based method for generating a site saturation mutant library
Measuring the activity of protein variants on a large scale using deep mutational scanning
Real-time DNA sequencing from single polymerase molecules
Examining sources of error in PCR by single-molecule sequencing
Wada, Y. et al. Common ancestry-specific ion channel variants predispose to drug-induced arrhythmias. Circulation https://doi.org/10.1161/circulationaha.121.054883 (2022)
Chemically defined generation of human cardiomyocytes
Bodbin, S. E., Denning, C. & Mosqueira, D. Transfection of hPSC-cardiomyocytes using Viafect™ transfection reagent. Methods Protoc. 3 https://doi.org/10.3390/mps3030057 (2020)
pROC: an open-source package for R and S+ to analyze and compare ROC curves
An openly available online tool for implementing the ACMG/AMP standards and guidelines for the interpretation of sequence variants
An improved platform for functional assessment of large protein libraries in mammalian cells
A platform for functional assessment of large variant libraries in mammalian cells
High-throughput reclassification of SCN5A variants
Dominant negative effects of SCN5A missense variants
Glazer, A. M. et al. Arrhythmia variant associations and reclassifications in the eMERGE-III sequencing study. Circulation https://doi.org/10.1161/circulationaha.121.055562 (2021)
O’Neill, M. J. et al. Multicenter clinical and functional evidence reclassifies a recurrent noncanonical filamin C splice-altering variant. Heart Rhythm https://doi.org/10.1016/j.hrthm.2023.05.006 (2023)
CRISPOR: intuitive guide selection for CRISPR/Cas9 genome editing experiments and screens
fastp: an ultra-fast all-in-one FASTQ preprocessor
Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype
The sequence alignment/map format and SAMtools
Matthew J. O’Neill, D. M. R., & Andrew M. Glazer.
https://doi.org/10.5281/zenodo.13170911 (Zenodo

These authors jointly supervised this work: Dan M … Vanderbilt Center for Arrhythmia Research and Therapeutics (VanCART) … developed the ParSE-seq experimental methodology and completed all computational analysis; … provided experimental support for manual patch clamp of iPSC-CMs; … assisted in iPSC-CM CRISPR editing and cell culture. All authors reviewed and edited the manuscript. All remaining authors declare no competing interests. DOI: https://doi.org/10.1038/s41467-024-52474-4

Splice, a royalty-free sample and loop marketplace and music creation platform, has appointed music industry veteran Kenny Ochoa as Senior Vice President of Content. Ochoa will oversee Splice’s global content team. Ochoa’s remit will also include responsibility for Splice’s content pipeline, which according to Splice will include “building out robust Artist partnerships and driving the platform’s overall industry outreach”. Kenny Ochoa joins Splice from Snap, where he served as Head of Music Curation & Licensing, responsible for the platform’s music programming and curation in the US. “… so it was vital that we found someone with the relationships, taste and passion to help us build and prepare for the future of music creation,” said Splice CEO Kakul Srivastava. Added Srivastava: “Kenny is what we were looking for: someone with deep experience of both the tech and music industries who is also empathetic to the creator community.” Ochoa said: “I’m excited about where music might go in the framework of a creator first platform. Splice is already one of the most talked about … Their success is fueled by their commitment to their catalog and the creator community. I am thrilled to be working with Kakul and the amazing team she’s assembled to build tools that will serve the next generation of music creators around the world.” In
2021, Splice was valued at nearly USD $500 million after securing $55 million in funding, according to Bloomberg. In March, Splice launched a new mobile app experience powered by AI technology. Music Business Worldwide

Huntington’s disease (HD) is caused by a CAG repeat expansion in the HTT gene. The mechanisms leading to disrupted RNA processing in HD remain unclear. Here we identify TDP-43 and the N6-methyladenosine (m6A) writer protein METTL3 to be upstream regulators of exon skipping in multiple HD systems. Disrupted nuclear localization of TDP-43 and cytoplasmic accumulation of phosphorylated TDP-43 occur in HD mouse and human brains, with TDP-43 also co-localizing with HTT nuclear aggregate-like bodies distinct from mutant HTT inclusions. The binding of TDP-43 onto RNAs encoding HD-associated differentially expressed and aberrantly spliced genes is decreased. m6A RNA modification is reduced on RNAs abnormally expressed in the striatum of HD R6/2 mouse brain, including at clustered sites adjacent to TDP-43 binding sites. Our evidence supports TDP-43 loss of function coupled with altered m6A modification as a mechanism underlying alternative splicing in HD.

No disease-modifying treatment is available. Although abnormal interactions between HTT and RBPs were reported, suggesting disruption of RNA processing in HD, the mechanisms by which mHTT leads to alterations of RNA expression and splicing (a hallmark of HD and other neuropathological disorders) remain undetermined. It is unknown whether HD-associated expression and splicing alterations are related to TDP-43 disruption in HD. Novel processing alterations were confirmed by long-read RNA-seq. We identified primary sequence motifs (RNA binding sites associated with mHTT-dependent changes in alternative splicing (AS), found using primary sequence models) that implicated two HTT-interacting RBPs: TDP-43 and methyltransferase 3 (METTL3). Using molecular and neuropathological measures and TDP-43 enhanced crosslinking and
immunoprecipitation sequencing (eCLIP-seq) and m6A eCLIP-seq we determined that mHTT disrupts TDP-43 and METTL3 function in post-transcriptional processing of their RNA targets in HD We further show a nuclear aggregate-like structure in the brains of patients with HD that contain TDP-43 and HTT This study provides evidence for functional disruption of TDP-43 in HD and an association with abnormal m6A RNA modification in HD This work also suggests that TDP-43 dysregulation may be an important component of pathogenesis in a broader group of diseases than previously thought oligodendrocyte progenitor cells; premyelin cells; and vascular cells Pie chart showing the detection of significant excluded exons in PacBio Iso-seq long-read sequencing Canonical binding motif for TDP-43 (UG rich) and METTL3 (DRACH) Our data suggest that TDP-43 contributes to transcriptional dysregulation in the HD R6/2 which may involve an exciting intersection between TDP-43 and m6A in HD that has not been previously described we investigated how TDP-43 and the m6A RNA modification might contribute to altered splicing and HD pathology Gradient scale represents z-scores of normalized gene counts Reproducible R6/2 HD and NT IDR TDP-43 peaks were centered and plotted on all TDP-43 binding sites TDP-43 mRNA level by qPCR normalized to cyclophilin as a percentage of PBS control Heatmap showing clustering of 3mos HD R6/2 TDP-43 ASO treated and control PBS treated on TDP-43 KD-dependent DGE changes Schematic of iPSC differentiation into MSNs with TDP-43 KD by siRNA Left: western blot for TDP-43 protein levels after treatment of MSNs with TDP-43 siRNA n = 3 differentiation replicates per condition Right: bar graph plots TDP-43 intensity normalized to Revert total protein stain Statistical significance was determined by two-way ANOVA with Sidak’s multiple comparisons test (18Q: P < 0.0001 Venn diagram showing the overlap of DEGs between HTT-18Q MSNs scramble control versus TDP-43 siRNA and HTT-18Q MSNs 
versus mHTT-50Q Example of key gene expression changes anticipated from TDP-43 KD Statistical significance was determined by unpaired two-tailed t-test (TDP-43 KD versus Ctrl STMN2: P < 0.0001 95% CI: −768.6 to −611.6; 18Q versus 50Q STMN2: P = 0.0001 95% CI: −496.4 to −336.7; TDP-43 KD versus Ctrl UNC13B: P = 0.0076 95% CI: −64.21 to −18.29; 18Q versus 50Q UNC13B: P = 0.0132 95% CI: −36.78 to −7.682; TDP-43 KD versus Ctrl CAMK2B: P = 0.0006 95% CI: −47.26 to −26.79; 18Q versus 50Q CAMK2B: P = 0.0287 our analysis shows that TDP-43 KD drives similar gene expression pattern changes as mHTT in HD MSNs Representative IF staining images of SFG from patients with HD compared to non-HD control individuals showing decreased TDP-43 (yellow) signal intensity Left: quantification of decreased nuclear TDP-43 signal intensity; five representative images were taken at ×40 from five HD and two control individuals A CellProfiler pipeline was created to identify larger nuclei (enriched for neurons) by DAPI staining The average of TPD-43 nuclear signal was obtained by measuring the intensity signal within a mask defined by DAPI Each cell’s mean nuclear TDP-43 intensity is plotted One-way ANOVA was performed with multiple comparisons and resulted in significant changes between all HD versus control comparisons (data not shown) Numbers on top of each group indicate the number of cells plotted Right: dot plot showing grouped data by genotype Statistical significance was derived from unpaired two-tailed t-test between control versus HD (P < 0.0001 Representative IF staining images of the motor cortex from a patient with ALS (positive control) compared to patients with HD using antibodies against total TDP-43 (yellow) phosphorylated TDP-43 (purple) and nuclear stain DAPI White arrowhead indicates pTDP-43 cytoplasmic aggregation; red arrowhead (HD1) indicates cytoplasmic aggregate verified by orthogonal view IF images showing pTDP-43 AL bodies (yellow) within MAP2-positive neurons (white) 
Each bar is derived from five random ×20 images from each patient %AL bodies is the number of AL bodies per patient normalized to the total number of Map2-positive neurons Statistical significance was determined by unpaired two-tailed t-test (number of neurons: P = 0.3655 95% CI: −48.51 to 20.01; AL bodies (%): P = 0.0146 Experiments in a–e were repeated at least three times with similar results represented above This striking accumulation of nuclear pTDP-43 and HTT into distinct spherical AL bodies in MAP2-positive neurons from HD patient brains represents a type of TDP-43 pathology not previously described that may be unique to HD b and h were repeated at least three times with similar results represented above These human data are consistent with the dysregulation of m6A modification in the R6/2 mice These results support that (1) there is a connection between m6A RNA modification and TDP-43 and (2) TDP-43 dysfunction occurs before m6A dysregulation we generated and integrated multi-omics data to investigate mechanisms involved in aberrant splicing We demonstrated that the RBP TDP-43 and the m6A writer METTL3 have altered protein subcellular localization and protein expression These alterations accompanied a corresponding enrichment in HD-specific AS and decreased interaction with dysregulated RNAs defining the striatal HD signature IF imaging in HD mice and HD patient brain tissue revealed co-localization of TDP-43 with mutant HTT in nuclear inclusions decreased nuclear TDP-43 and a corresponding increase in aggregated phosphorylated TDP-43 in the cytoplasm We also found an accumulation of spherical fibrous-like pTDP-43 in the nucleus of Map2-positive neurons that co-localize with HTT Our analysis of RNA-seq data from both mouse and human samples revealed changes in AS with increased exon exclusion events in HD and toxicity is modulated through genetic perturbation of m6A machinery Our finding that TDP-43 binding corresponds with m6A deposition on downregulated 
striatal genes in HD suggests a co-regulatory role for m6A modification with TDP-43 in HD. We also observed that the presence of aberrant, previously unannotated exon splicing corresponds to hallmark HD genes that are primarily downregulated; however, both an increase and a decrease in novel unannotated exon expression were identified to result in DGE. We propose a mechanism in which the interaction of HTT with multiple RBPs regulates CE splicing. Further studies are required to identify additional CE-regulating RBPs. Our TDP-43 eCLIP-seq detected decreased TDP-43 binding to RNAs, making it plausible that these AL bodies may be similar to anisosomes and contain RNA-free TDP-43; this can be addressed in future studies. We identified three TDP-43 aggregation phenotypes in HD, one of which has not previously been observed, which presents the possibility that our molecular readouts result from a combinatorial effect of known TDP-43-dependent regulation and regulation not previously described. Our future efforts will be aimed at elucidating the role of AL bodies in HD progression and neurodegeneration.
No statistical methods were used to pre-determine sample sizes; however, sample size selections were made to be similar to previously published studies, and all experiments were repeated at least three times with similar results as represented in this study. Data were assumed to be normally distributed. All treatment groups were randomly selected, and analyses were performed blinded to treatment/disease status. Data were excluded only for animals that did not survive to endpoint. Differential expression statistics were performed within the packages cited below. Statistical testing was performed in GraphPad Prism 10 (GraphPad Software).
Human brain samples were obtained in collaboration with the Netherlands Brain Bank (NBB), the Netherlands Institute for Neuroscience, Amsterdam (open access: https://www.brainbank.nl/) and the Neurological Foundation of New Zealand Human Brain Bank (NZBB).
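The unpaired two-tailed t-test and the %AL bodies normalization reported above can be illustrated with a minimal sketch. The authors ran their tests in GraphPad Prism; the Python below is not their code. It assumes the classical pooled-variance (Student) form of the test, since the text does not say whether a Welch correction was applied. It also assumes that counts are summed over a patient's images before normalizing. The function names and all numbers are hypothetical, and the p value is obtained by numerically integrating the t density so that only the standard library is needed.

```python
import math
from statistics import mean, variance


def percent_al_bodies(al_counts, neuron_counts):
    """Percent AL bodies for one patient.

    Assumption: counts from the patient's images are summed, then the
    AL-body total is normalized to the total of Map2-positive neurons.
    """
    return 100.0 * sum(al_counts) / sum(neuron_counts)


def _t_pdf(x, df):
    """Probability density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def unpaired_two_tailed_t_test(a, b, steps=2000):
    """Pooled-variance unpaired t-test; returns (t, df, two-tailed p).

    Assumption: equal variances in both groups (Student's test).
    `steps` must be even because Simpson's rule is used for the integral.
    """
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))

    # Integrate the density from 0 to |t| with Simpson's rule.
    upper = abs(t)
    h = upper / steps
    total = _t_pdf(0.0, df) + _t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * _t_pdf(i * h, df)
    central_half = total * h / 3

    # Two-tailed p is the probability mass outside [-|t|, |t|].
    p = max(0.0, 1.0 - 2.0 * central_half)
    return t, df, p


if __name__ == "__main__":
    # Hypothetical per-patient %AL values for two groups (not study data).
    control = [1.0, 2.0, 3.0, 4.0, 5.0]
    disease = [2.0, 3.0, 4.0, 5.0, 6.0]
    t, df, p = unpaired_two_tailed_t_test(control, disease)
    print(f"t = {t:.3f}, df = {df}, two-tailed P = {p:.4f}")
```

In practice a statistics package would be used instead of hand-rolled integration; the sketch only makes explicit what "unpaired two-tailed" means for two independent groups.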
Information on patients can be found in Supplementary Data 9. All patient identifiers have been removed from the materials. All material was collected from donors for whom or from whom written informed consent for a brain autopsy and the use of the material and clinical information for research purposes had been obtained by the NBB and the NZBB, approved by the Health and Disability Ethics Committee (ethics no.: 14/NTA/208/AM02).
Animal experiments were carried out in accordance with the National Institutes of Health Guide for the Care and Use of Laboratory Animals and an animal research protocol approved by the Institutional Animal Care and Use Committee (IACUC AUP-21-087) at the University of California, an institution accredited by the Association for Assessment and Accreditation of Laboratory Animal Care. All procedures were conducted in accordance with the guidelines of the University of California. iPSC work carried out in this study was approved by the UCI Human Stem Cell Research Oversight Committee (UCI hSCRO no. 118) and the UCI Institutional Review Board (UCI IRB no. 000664).
Animals in this study were obtained from The Jackson Laboratory at approximately 5 weeks of age. Animals were sex balanced and age matched. All mice were housed on a 12-h light/dark schedule with ad libitum access to food and water. Animals were housed at controlled temperature and humidity: 70 ± 2 °F and 50 ± 5% humidity (RASL-seq: 68–79 °F and 30–70% humidity). Animals were aged and then euthanized with Euthasol overdose (pentobarbital sodium and phenytoin sodium). Cardiac perfusion was performed with 0.01 M PBS, followed by brain harvesting and isolation of striatum and cortex from the left hemisphere, which was flash frozen in liquid nitrogen and stored at −80 °C until use for biochemical analysis. The other halves were post-fixed in 4% paraformaldehyde, cryoprotected in 30% sucrose and cut at 30 μm on a sliding vibratome for immunohistochemistry.
Frozen tissues were lysed (lysis buffer: 50 mM
Tris-HCl pH 7.4, 1:200 Protease Inhibitor Cocktail III (add fresh)). Samples were homogenized by douncing in lysis buffer, followed by incubation on ice for 30 min. Lysate was then sonicated 3× for 10 s at 40% amplitude. Protein quantification was performed by Lowry protein assay with linear range dilution.
C57Bl/6 males were dosed at 5 weeks of age with PBS control (n = 5) or ASO (Ionis Pharmaceuticals) targeting TDP-43 (n = 5) at 500 µg by ICV bolus injection. Cortex and striatum were collected for downstream analysis.
iPSC colonies were switched to neural induction medium (Advanced DMEM/F12 (1:1) supplemented with 2 mM GlutaMAX). Cells were passaged 1:2 with Accutase and replated on Matrigel. Cells were passaged again 1:2 at day 8 with Accutase and replated on Matrigel in a different medium (Advanced DMEM/F12 (1:1) supplemented with 2 mM GlutaMAX). Cells were dissociated with Accutase and plated at a density of 111,000 per cm² on NUNC-treated tissue culture plastic treated with poly-d-lysine and Matrigel in SCM1 medium (Advanced DMEM/F12 (1:1) supplemented with 2 mM GlutaMAX). A full medium change to SCM2 medium was performed at day 23 (Advanced DMEM/F12 (1:1): Neurobasal A (50:50) supplemented with 2 mM GlutaMAX), followed by a 50% medium change every 2–3 d until day 37. Human TDP-43 and non-targeting control siRNA were obtained from Horizon Discovery (Accell SMARTPool E-012394-00-0050); cells were treated with siRNA on day 23 and harvested at day 37.
The optimal protein concentration and primary antibody concentration were determined by linear range according to LI-COR’s protocol. For each IP, 1 mg of protein was used, with a 1:1,000 dilution of primary antibody and 30 µl of Dynabeads sheep anti-rabbit or goat anti-mouse. IP was carried out by incubation for 1 h at room temperature, and then a wash was performed with 3× high-salt wash buffer (50 mM Tris-HCl pH 7.4) followed by 2× low-salt wash buffer (20 mM Tris-HCl pH 7.4). Elution was achieved by 10-min incubation at 80 °C in 1× LDS and 1 mM DTT. Co-IP and regular western blot
samples were run on 4–12% Bis-Tris gels and 3–8% Tris-acetate. Specific protein band analysis was performed using LI-COR Empiria Studio software with normalization to Revert total protein stain. A list of primary and secondary antibodies with dilutions used in this study can be found in Supplementary Data 10.
Coronal sections that included the striatum were selected. Antigen retrieval (AR) was performed for 20 min at 80 °C (AR buffer: 10 mM Tri-Na citrate buffer). Tissue slices were permeabilized for 10 min at room temperature (permeabilization buffer: PBS + 2.5% BSA and 0.2% Triton X-100), followed by blocking for 2 h at room temperature (blocking buffer: PBS + 5% NGS (or NDS) + 1% BSA + 0.1% Triton X-100). Primary antibodies were added at the indicated concentration in blocking buffer and incubated overnight at 4 °C. Secondary antibody incubation was performed for 2 h at room temperature, followed by Hoechst (1:3,000) for 10 min at room temperature. Tissues were then mounted onto slides and coverslipped with Fluoromount-G (Southern Biotech).
Paraffin-embedded sections of 5 µm were used. Tissue sections were heated at 65 °C for 30 min and then deparaffinized with 100% CitriSolv (Thermo Fisher Scientific) and Milli-Q water for 5 min two times, rehydrated, and then AR was performed with antigen unmasking solution (Vector Laboratories). Sections were blocked for 1 h at room temperature with 5% normal goat or donkey serum in 0.1% Triton X-100. Sections were incubated in primary antibody overnight at 4 °C in 1% normal donkey serum in 0.1% Triton X-100. Sections were then incubated in secondary antibodies (1:400 dilution) for 1 h at room temperature in 1× PBS. Secondary antibodies used included Alexa Fluor 488 (1:400; Thermo Fisher Scientific A-21202) and Alexa Fluor 555 (1:400; Thermo Fisher Scientific). Tissues were then treated with TrueBlack Lipofuscin Autofluorescence Quencher (Biotium 23007) and incubated in Hoechst for 10 min at room temperature. Sections were then mounted with coverslips using
Fluoromount-G.
Images were taken on a Zeiss AiryScan 900 and an Olympus FluoView FV2000 confocal system. Images were processed using AiryScan software. Images were taken with the same acquisition settings. Images were then imported to Imaris imaging software version 9 for post-imaging analysis. AL body images had background subtraction to make the phenotype clearer; however, no intensity measurements or statistics were performed. Images had all ‘auto-adjustment’ settings reset to raw values. Minimum/maximum and gamma (default value of 1) were applied to all images for comparison and analysis. Each cell containing nuclear IF signal was quantified with the Imaris surface tool (version 10) and CellProfiler (version 4.2.6). Normalization was performed between animals by dividing by surface volume. Statistical analysis was performed with unpaired two-tailed t-test between HD versus NT and one-way ANOVA with multiple comparisons where appropriate.
The module talon_label_reads with option --ar 20 was used to compute the fraction of As at the ends of read alignments. TALON databases of mouse (Ensembl 87 annotations) were created using talon_initialize_database with options --l 0 --5p 500 --3p 300. The TALON module was run with default parameters to identify transcripts using the initialized database. Filtered transcripts, which are limited to known and consistently observed transcripts, were generated using talon_filter_transcripts. Filtered and unfiltered transcript abundances were obtained using the talon_abundance module.
Processing of RNA for LC–MS-based m6A analysis was performed as described in Mathur et al. (ref. 112). 100 ng of twice-purified polyA-RNA was digested with 1 U of nuclease P1 (Sigma-Aldrich), followed by treatment with 1 U of alkaline phosphatase (Sigma-Aldrich). 21.5 μl of the purified nucleoside sample (equivalent of 25 ng of RNA) was mixed with 2× volume (43 μl) of acetonitrile. Samples were centrifuged at 16,000g for 10 min at 4 °C, and 40 µl of supernatant was loaded into MS vials. Adenosine and m6A signals
were analyzed by a quadrupole Orbitrap mass spectrometer (Thermo Fisher Scientific) coupled to hydrophilic interaction chromatography (HILIC) via electrospray ionization. LC separation was performed on an Xbridge BEH amide column (2.1 mm × 150 mm, 130-Å pore size; Waters) at 25 °C using a gradient of solvent A (5% acetonitrile in water with 20 mM ammonium acetate and 20 mM ammonium hydroxide) and solvent B (100% acetonitrile). The autosampler temperature was set at 4 °C, and the injection volume of the sample was 3 μl. MS data were acquired in positive ion mode with a full-scan mode from m/z 240 to 290 at a resolution of 140,000. Data were analyzed using El-MAVEN software (version 0.12.0). S3190) standards were quantified based on standard calibration curves using authentic standards.
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
A novel gene containing a trinucleotide repeat that is expanded and unstable on Huntington’s disease chromosomes. The Huntington’s Disease Collaborative Research Group
Huntington’s disease: underlying molecular mechanisms and emerging concepts
Somatic expansion of the Huntington’s disease CAG repeat in the brain is associated with an earlier age of disease onset
The contribution of somatic expansion of the CAG repeat to symptomatic development in Huntington’s disease: a historical perspective
Propensity for somatic expansion increases over the course of life in Huntington disease
Disease-associated repeat instability and mismatch repair
Special issue: DNA repair and somatic repeat expansion in Huntington’s disease
The pathogenic exon 1 HTT protein is produced by incomplete splicing in Huntington’s disease patients
HDinHD: a rich data portal for Huntington’s disease research
Proteomic analysis of wild-type and mutant huntingtin-associated proteins in mouse brains identifies unique interactions and involvement in protein synthesis
Proteins with intrinsically disordered domains are
preferentially recruited to polyglutamine aggregates
RNA-binding protein TLS is a major nuclear aggregate-interacting protein in huntingtin exon 1 with expanded polyglutamine-expressing cells
Huntingtin protein interactions altered by polyglutamine expansion as determined by quantitative proteomic analysis
Huntington’s disease mice and human brain tissue exhibit increased G3BP1 granules and TDP43 mislocalization
Colocalization of transactivation-responsive DNA-binding protein 43 and huntingtin in inclusions of Huntington disease
Interaction with polyglutamine aggregates reveals a Q/N-rich domain in TDP-43
Ubiquitinated TDP-43 in frontotemporal lobar degeneration and amyotrophic lateral sclerosis
Long pre-mRNA depletion and RNA missplicing contribute to neuronal vulnerability from loss of TDP-43
TDP-43 is a component of ubiquitin-positive tau-negative inclusions in frontotemporal lobar degeneration and amyotrophic lateral sclerosis
TDP-43 and FUS/TLS: emerging roles in RNA processing and neurodegeneration
Phosphorylated TDP-43 in frontotemporal lobar degeneration and amyotrophic lateral sclerosis
TDP-43 proteinopathy presenting with typical symptoms of Parkinson’s disease
TARDBP mutations in a cohort of Italian patients with Parkinson’s disease and atypical parkinsonisms
Coexistence of Huntington’s disease and amyotrophic lateral sclerosis: a clinicopathologic study
Loss of TDP‐43 oligomerization or RNA binding elicits distinct aggregation patterns
Characterizing the RNA targets and position-dependent splicing regulation by TDP-43
TDP-43 repression of nonconserved cryptic exons is compromised in ALS-FTD
TDP-43 loss and ALS-risk SNPs drive mis-splicing and depletion of UNC13A
TDP-43 represses cryptic exon inclusion in the FTD–ALS gene UNC13A
Premature polyadenylation-mediated loss of stathmin-2 is a hallmark of TDP-43-dependent neurodegeneration
ALS-implicated protein TDP-43 sustains levels of STMN2, a mediator of motor neuron growth and repair
Region-specific RNA m6A
methylation represents a new layer of control in the gene regulatory network in the mouse brain
A majority of m6A residues are in the last exons, allowing the potential for 3′ UTR regulation
Transient N-6-methyladenosine transcriptome sequencing reveals a regulatory role of m6A in splicing efficiency
Methylation of structured RNA by the m6A writer METTL16 is essential for mouse embryonic development
The m6A writer: rise of a machine for growing tasks
Dynamic m6A modification regulates local translation of mRNA in axons
METTL3-mediated m6A modification is required for cerebellar development
Temporal control of mammalian cortical neurogenesis by m6A methylation
m6A-Atlas: a comprehensive knowledgebase for unraveling the N6-methyladenosine (m6A) epitranscriptome
Landscape and regulation of m6A and m6Am methylome across human and mouse tissues
Topology of the human and mouse m6A RNA methylomes revealed by m6A-seq
N6-methyladenosine (m6A) recruits and repels proteins to regulate mRNA homeostasis
Epitranscriptomes in the adult mammalian brain: dynamic changes regulate behavior
RNA methylation influences TDP43 binding and disease pathogenesis in models of amyotrophic lateral sclerosis and frontotemporal dementia
Altered m6A RNA methylation contributes to hippocampal memory deficits in Huntington’s disease mice
Exon 1 of the HD gene with an expanded CAG repeat is sufficient to cause a progressive neurological phenotype in transgenic mice
Neurological abnormalities in a knock-in mouse model of Huntington’s disease
Longitudinal evaluation of the Hdh(CAG)150 knock-in murine model of Huntington’s disease
Comprehensive behavioral and molecular characterization of a new knock-in mouse model of Huntington’s disease: ZQ175
SONAR discovers RNA-binding proteins from analysis of large-scale protein–protein interactomes
Transcriptional signatures in Huntington’s disease
Obenauer, J. C. et al.
Expression analysis of Huntington disease mouse models reveals robust striatum disease signatures. Preprint at bioRxiv https://doi.org/10.1101/2022.02.04.479180 (2023)
rMATS: robust and flexible detection of differential alternative splicing from replicate RNA-seq data
Versatile pathway-centric approach based on high-throughput sequencing to anticancer drug discovery
The Akt-SRPK-SR axis constitutes a major pathway in transducing EGF signaling to regulate alternative splicing in the nucleus
Single-nucleus RNA-seq identifies Huntington disease astrocyte states
Wyman, D. et al. A technology-agnostic long-read analysis pipeline for transcriptome discovery and quantification. Preprint at bioRxiv https://doi.org/10.1101/672931 (2019)
Huntington disease oligodendrocyte maturation deficits revealed by single-nucleus RNAseq are rescued by thiamine-biotin supplementation
Single-cell differential splicing analysis reveals high heterogeneity of liver tumor-infiltrating T cells
Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities
Single-nucleotide-resolution mapping of m6A and m6Am throughout the transcriptome
Network organization of the huntingtin proteomic interactome in mammalian brain
TDP-43 regulates its mRNA levels through a negative feedback loop
Cell environment shapes TDP-43 function with implications in neuronal and muscle disease
Truncated stathmin-2 is a marker of TDP-43 pathology in frontotemporal dementia
UPFront and center in RNA decay: UPF1 in nonsense-mediated mRNA decay and beyond
A new view of transcriptome complexity and regulation through the lens of local splicing variations
RNA sequence analysis of human Huntington disease brain reveals an extensive increase in inflammatory and developmental gene expression
Robust transcriptome-wide discovery of RNA-binding protein binding sites with enhanced CLIP (eCLIP)
Aberrant development corrected in adult-onset Huntington’s
disease iPSC-derived neuronal cultures via WNT signaling modulation
N6-methyladenosine-dependent RNA structural switches regulate RNA–protein interactions
Interaction of tau with HNRNPA2B1 and N6-methyladenosine RNA mediates the progression of tauopathy
Mechanism of STMN2 cryptic splice-polyadenylation and its correction for TDP-43 proteinopathies
Mis-spliced transcripts generate de novo proteins in TDP-43–related ALS/FTD
A fluid biomarker reveals loss of TDP-43 splicing repression in presymptomatic ALS–FTD
m1A in CAG repeat RNA binds to TDP-43 and induces neurodegeneration
Transcriptome sequencing reveals aberrant alternative splicing in Huntington’s disease
Widespread dysregulation of mRNA splicing implicates RNA processing in the development and progression of Huntington’s disease
Huntington’s disease-specific mis-splicing unveils key effector genes and altered splicing factors
Regulatory mechanisms of incomplete huntingtin mRNA splicing
Splicing repression is a major function of TDP-43 in motor neurons
Nuclear bodies: random aggregates of sticky proteins or crucibles of macromolecular assembly
Nuclear bodies in neurodegenerative disease
HSP70 chaperones RNA-free TDP-43 into anisotropic intranuclear liquid spherical shells
METTL3 from target validation to the first small-molecule inhibitors: a medicinal chemistry journey
Small-molecule inhibition of METTL3 as a strategy against myeloid leukaemia
Inducible and reversible RNA N6-methyladenosine editing
Li, H., Qiu, J. & Fu, X. D. RASL-seq for massively parallel and quantitative analysis of gene expression. Curr. Protoc. Mol. Biol.
https://doi.org/10.1002/0471142727.mb0413s98 (2012)
Integrated genome browser: visual analytics platform for genomics
Developmental alterations in Huntington’s disease neural cells and pharmacological rescue in cells and mice
featureCounts: an efficient general purpose program for assigning sequence reads to genomic features
Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2
Differential expression analysis of multifactor RNA-seq experiments with respect to biological variation
GOrilla: a tool for discovery and visualization of enriched GO terms in ranked gene lists
Full-length RNA-seq from single cells using Smart-seq2
Minimap2: pairwise alignment for nucleotide sequences
Rbfox proteins regulate alternative mRNA splicing through evolutionarily conserved RNA bridges
deepTools2: a next generation web server for deep-sequencing data analysis
MetaPlotR: a Perl/R pipeline for plotting metagenes of nucleotide modifications and other transcriptomic sites
Quantitative analysis of m6A RNA modification by LC-MS
We would like to thank the patients affected by HD and their families for their critical contributions to research. This work was supported by a Chan Zuckerberg Initiative Collaborative Pairs grant (L.M.T. and R.C.S.) and by the following National Institutes of Health (NIH) grants: R35 NS116872 (L.M.T.), R01 AA029124 (C.J.) and K22CA234399 (G.L.). It was also supported by US Department of Defense grant TS200022 (G.L.). Additional support was provided by the National Institute of Neurological Disorders and Stroke of the NIH under award number F31NS124293T32 (T.B.N.), a Hereditary Disease Foundation postdoctoral fellowship (R Maimon) and a postdoctoral fellowship from the ALS Association (S.V.-S.).
The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. We would like to thank the Netherlands Brain Bank and R Curtis (New Zealand Brain Bank) for supplying the human brain tissue for our study. Petrucelli for the generous gift of the S409/S410 TDP-43 antibody. Brown for help on MAJIQ analysis and for scientific discussions. We would like to thank the laboratory of G Van Nostrand and ECLIPSEBIO for the wonderful scientific discussions.
Department of Cellular and Molecular Medicine
Department of Psychiatry & Human Behavior
Department of Microbiology and Molecular Genetics
Broad Institute of Harvard University and MIT
Nature Neuroscience thanks the anonymous reviewers for their contribution to the peer review of this work.
figure legends for Supplementary Data 1–12 and supplementary figures and references
DESeq2 output from HTT (18Q) non-targeting siRNA versus TDP-43 KD siRNA at day 37 MSNs. Statistical P values listed were determined using the DESeq2 package. Filtered for events with adjusted P < 0.05.
Supplementary Data 8: List of DEGs that overlapped between TDP-43 KD and mHTT dependent (P < 0.05) as determined by the hypergeometric test. Column A contains genes that overlap with upregulated genes in the mHTT condition; column B contains genes that overlap with downregulated genes in the mHTT condition.
Supplementary Data 9: Summary of human postmortem brain tissues used in this study.
Supplementary Data 10: List of antibodies used in this study.
Supplementary Data 11: List of primer pairs used for RASL-seq.
Supplementary Data 12: RASL-seq raw data for R6/2 (starting at row 1), Q150 (starting at row 1,748) and Q175 (starting at row 3,538).
DOI: https://doi.org/10.1038/s41593-024-01850-w
Pick a genre and the app will automatically detect the key and find samples that match it.
Sample platform Splice has launched an update to its mobile app that lets songwriters and producers record vocal ideas over
tracks sketched out using its AI-powered Stacks feature.
Stacks can be used to generate track ideas by layering samples from Splice's library. The app will instantly create a Stack that layers multiple samples in the chosen genre that share the same key and tempo; these can then be mixed, muted or swapped out for new samples from Splice's library, while the global key and tempo can be adjusted across the whole Stack.
Splice Mic lets app users record over ideas generated using Stacks, and it'll even analyse the vocal recording to find additional samples that match it harmonically. After recording a loop of up to one minute in length, users can then trim it using the app's audio editor before snapping it to the beat grid.
Give Splice one of your own loops and it will now use AI to find a compatible stack of sounds to go with it.
“The phone is already a huge part of music making,” says Splice's SVP of Content Kenny Ochoa. “…and now those stacks can be merged with vocals.”
The company invited two opposing teams of songwriters and producers to create tracks in 60 seconds using Splice's mobile app. "We got the team together to see who could start the best new Stacks," said artist and producer Leland.