GENOME-WIDE ANALYSIS OF POLYADENYLATION SITES IN Glycine max
W. Shah1, M. Sajjad1, N. Akhtar2 and M. N. Akhtar1,*
1Department of Biosciences, COMSATS University Islamabad, Islamabad 45550, Pakistan;
2Department of Health Informatics, University of Hail, Hail, Saudi Arabia.
*Corresponding author’s email: nadeemakhtar@comsats.edu.pk
ABSTRACT
Alternative polyadenylation (APA) is a critical cellular process that dynamically regulates gene expression and contributes to transcriptome and proteome diversity, affecting about 70% of genes in animals and plants. However, the lack of extensive 3'-sequencing data limits a comprehensive understanding of polyadenylation in Glycine max. This study addressed this gap by identifying high-quality polyadenylation clusters (PACs) using 12 billion reads from 568 RNA-Seq samples. We identified 75,556 PACs in the Glycine max genome, primarily in 3'-UTRs but also in 5'-UTRs, introns, and intergenic regions. Intergenic PACs and RNA-Seq evidence extended the 3'-ends of many genes, revealing annotation gaps. APA was observed in 65% of the genes, much higher than the 19% noted in Ensembl annotations. APA genes showed complex PAC expression patterns, with dominant PACs linked to diverse cellular processes including translation, stability, transport, cellular organization, and stress response. Using uniform criteria, the nucleotide composition and motifs in Glycine max were extensively compared with those of other plants, including Oryza sativa, Arabidopsis thaliana, Medicago truncatula, and Zea mays. The results highlighted a preference for AAUAAA and its variant motifs, although these motifs were comparatively infrequent in all the plants examined. The top 3'-UTR motifs in Glycine max were nevertheless conserved and consistently appeared among the top motifs in the other plants. Additionally, the nucleotide composition of the AAUAAA region was conserved, but the far upstream region diverged between the monocotyledonous and dicotyledonous plant groups. Genes with AAUAAA were involved in metabolic processes, consistent with Zea mays, indicating evolutionary constraints. Taken together, our results offer a comprehensive resource for understanding polyadenylation-mediated gene regulation in Glycine max.
Keywords: Alternative Polyadenylation; Incomplete 3'-UTR; AAUAAA; Monocotyledonous; Dicotyledonous; Transcriptome; RNA-Seq; Annotation; Far Upstream Region; Soybean