Scientific Perspectives
Exploring potential RNA-binding protein interactions in conserved unstructured regions of SARS-CoV-2
Rojano-Nisimura, A.M., Lukasiewicz, A.J., and Contreras, L.M.*
*A.M. and A.J. contributed equally to this work.
The current coronavirus disease (COVID-19) pandemic has now spread to over 200 countries, with 1,356,780 confirmed cases as of April 8th, 2020 1. With these numbers growing rapidly, there is an urgent need for the scientific community to consolidate efforts and generate data to better understand this disease and develop diagnostic and treatment strategies.
COVID-19 is caused by Severe Acute Respiratory Syndrome coronavirus 2 (SARS-CoV-2), a member of the Betacoronavirus genus. These viruses are positive-sense single-stranded RNA ((+) ssRNA) viruses, meaning that their genomic RNA (gRNA) simultaneously serves as messenger RNA (mRNA) that encodes proteins 2. Coronaviruses have a relatively large genome of ~30 kilobases (kb); the SARS-CoV-2 genome is 29,903 nucleotides (nt) long 3.
The severity of this pandemic has prompted a global effort by the scientific research community, with many groups redirecting resources to understand the underlying biology of this virus and to develop treatments. Currently, databases such as the Global Initiative on Sharing All Influenza Data (GISAID) host thousands of newly sequenced isolates of the SARS-CoV-2 genome, allowing for identification of genomic features that contribute to its spread and survival. Comparing these sequences to those of other well-studied coronaviruses, such as SARS-CoV, which caused the global outbreak that began in 2002, allows us to generate hypotheses regarding conserved factors that contribute to SARS-CoV-2 biology. For single-stranded RNA viruses, some of these features include the secondary structure of the genomic RNA and its interactions with viral and cellular RNA-binding proteins.
Because these features are functionally essential, they are widely conserved among members of the same viral genus. Some of the best-understood structural elements of the coronavirus genome lie at the 5’ and 3’ ends. The 5’ end contains several cis-acting RNA elements that consist of highly structured stem loops, and recognition of these elements is needed for genome replication. At the 3’ end there are two conserved structures that are also essential for RNA synthesis: a bulged stem loop and a pseudoknot. Additionally, transcription-regulating sequences (TRSs) flank each gene and are recognized during coronavirus transcription 4.
Structural elements within protein-encoding regions have also been identified in the coronavirus genome. A three-stemmed pseudoknot was confirmed in SARS-CoV; it serves as a stimulatory element for ribosomal frameshifting, which is required for translation of the RNA-dependent RNA polymerase needed for viral replication 5.
Beyond conserved structured regions, identifying conserved regions that lack, or are less likely to form, secondary structure is important, since they represent potential drug targets for nucleic-acid therapeutics. In particular, antisense oligonucleotide (ASO)-based drugs are expected to bind more tightly to regions without stable secondary structural elements 6. ASOs can be designed to target specific viral RNA sequences through complementary base pairing. Once bound, an ASO can modulate RNA expression by blocking translation of the transcript, occluding binding sites needed by proteins or cis-acting RNA elements, promoting degradation by endogenous enzymes, or disrupting key structural elements 7. These mechanisms have been exploited in applied therapeutics: in 1998, fomivirsen, used to treat cytomegalovirus (CMV) retinitis, became the first antisense oligonucleotide-based therapeutic to receive FDA approval 8.
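The complementary base-pairing principle described above can be illustrated with a minimal Python sketch that derives the antisense sequence for a chosen genomic window. The function name `design_aso` and the 1-based inclusive coordinate convention are assumptions made for illustration. The sketch returns an unmodified RNA-alphabet sequence, whereas real ASOs typically use DNA or chemically modified backbones and require further design checks (length, GC content, off-target screening).

```python
# Watson-Crick complements for the RNA alphabet.
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def design_aso(genome: str, start: int, end: int) -> str:
    """Return the antisense sequence (5'->3') for a target window.

    `start` and `end` are 1-based, inclusive genome coordinates
    (an assumed convention). DNA-style input (T) is converted to U.
    """
    if not (1 <= start <= end <= len(genome)):
        raise ValueError("coordinates fall outside the genome")
    target = genome[start - 1:end].upper().replace("T", "U")
    # Reverse the target, then complement each base.
    return "".join(RNA_COMPLEMENT[base] for base in reversed(target))


if __name__ == "__main__":
    # Toy sequence, not taken from the SARS-CoV-2 genome.
    print(design_aso("AUGGCCUUA", 2, 5))  # target UGGC -> GCCA
```

In practice, the genome string would be read from a reference sequence file and the coordinates taken from a list of conserved unstructured regions.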
A recent study has identified conserved structured and unstructured regions of the SARS-CoV-2 genome that may contribute to viral replication 9. To investigate the potential of these conserved unstructured regions to serve as targets for ASO-based therapeutics and mechanistic studies, we searched for common RNA-binding motifs within these short regions. Modified oligonucleotides, such as Locked Nucleic Acids (LNAs), can hybridize without triggering RNA cleavage and are commonly used to study binding site-specific interactions. Using the ATtRACT database 10, we searched for RNA-binding protein motifs in the set of conserved unstructured regions identified in that study 9. Other computational approaches have identified possible RNA-binding proteins (RBPs) that interact with the SARS-CoV-2 genome 11. By filtering our predicted motifs against this published set of predicted RBPs and interaction domains, we identified a set of 40 possible RBP-gene target pairs of interest to the viral interactome community.
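The search-and-filter procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in this work: the motif list is a hypothetical stand-in for an export from a motif database, the RBP names `RBP_A`, `RBP_B`, and `RBP_C` are placeholders, and exact matching of IUPAC-coded motifs replaces whatever scoring a dedicated tool would apply.

```python
import re

# IUPAC nucleotide codes expanded to regex character classes (RNA alphabet).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "[AG]", "Y": "[CU]", "S": "[CG]", "W": "[AU]",
    "K": "[GU]", "M": "[AC]", "B": "[CGU]", "D": "[AGU]",
    "H": "[ACU]", "V": "[ACG]", "N": "[ACGU]",
}


def motif_to_regex(motif: str) -> "re.Pattern[str]":
    """Compile a motif into a lookahead regex so overlapping hits are found."""
    body = "".join(IUPAC[code] for code in motif.upper().replace("T", "U"))
    return re.compile("(?=(" + body + "))")


def scan_regions(regions, motifs, allowed_rbps):
    """Return (rbp, region_name, genome_position, matched_sequence) tuples.

    regions: {name: (genome_start, sequence)} with 1-based genome_start.
    motifs: iterable of (rbp_name, motif) pairs (hypothetical input format).
    allowed_rbps: RBPs retained after filtering against a published set.
    """
    hits = []
    for name, (genome_start, sequence) in regions.items():
        rna = sequence.upper().replace("T", "U")
        for rbp, motif in motifs:
            if rbp not in allowed_rbps:
                continue  # drop RBPs absent from the published prediction set
            for match in motif_to_regex(motif).finditer(rna):
                hits.append((rbp, name, genome_start + match.start(), match.group(1)))
    return hits


if __name__ == "__main__":
    # Toy inputs for illustration only; none of these values are results.
    toy_regions = {"region_1": (100, "AUCCCUUUUCCC")}
    toy_motifs = [("RBP_A", "CCC"), ("RBP_B", "UCUU"), ("RBP_C", "UUUU")]
    for hit in scan_regions(toy_regions, toy_motifs, {"RBP_A", "RBP_B"}):
        print(hit)
```

The lookahead pattern is a deliberate choice: short motifs frequently overlap in pyrimidine-rich stretches, and a plain regex search would report only non-overlapping matches.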
Of the RBPs in these 40 pairs, six have previously been implicated in viral translation and replication: Poly(rC)-binding proteins 1 and 2 (PCBP1, PCBP2), hnRNP A1 (ROA1), the lupus La protein (LA), polypyrimidine tract-binding protein 1 (PTBP1), and hnRNP Q (Figure 1b). Among the RNA-binding proteins that overlap with the predicted SARS-CoV-2 binding proteins 11, hnRNP Q was predicted to bind an unstructured region at the 3’ end of the viral genome (29865-29902). Prior coronavirus studies have identified hnRNP Q as contributing positively to viral RNA synthesis 12, so it may be a target of interest. In addition, hnRNP A1 is a key RBP involved in coronavirus translation initiation 13 and was predicted to bind many of the conserved unstructured elements used in this investigation.
Beyond these virus-associated RNA-binding proteins, many of the additional proteins identified either co-localize in the cytoplasm during stress granule formation or are found in regions that are key to viral protein expression, such as A1CF, which is found at the endoplasmic reticulum. The link between these RBPs and viral RNA is speculative, but the association could be indicative of a host response to invasion.
Within the set of 40 possible RBP-gene target pairs, we evaluated the sequence specificity of individual unstructured RNA regions for SARS-CoV-2 and identified three unstructured regions with high specificity (E-value < 0.15) and few predicted RBP interactions: the first (18219-18233) within the ORF1ab region, the second (22552-22566) within the region encoding the spike glycoprotein (S), and the third (29074-29088) within the sequence encoding the nucleocapsid phosphoprotein (N). Importantly, when these regions were aligned against the human genome, we did not find any significant match that would preclude their use as therapeutic target regions (E-value > 5).
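The selection criteria above amount to a simple filter, sketched below in Python. The field names, the interpretation of the two E-value thresholds (a viral-specificity E-value below 0.15 and a best human-genome hit with E-value above 5), and the `max_rbp_hits` cutoff are assumptions made for illustration; the text does not specify a numerical cutoff for "few predicted RBP interactions".

```python
from dataclasses import dataclass


@dataclass
class Region:
    start: int            # 1-based genome coordinate
    end: int
    viral_evalue: float   # E-value of the SARS-CoV-2 specificity search
    human_evalue: float   # E-value of the best hit against the human genome
    rbp_hits: int         # number of predicted RBP motif hits in the region


def select_targets(regions, viral_max=0.15, human_min=5.0, max_rbp_hits=0):
    """Keep regions that are virus-specific, lack significant human matches,
    and have at most `max_rbp_hits` predicted RBP sites (hypothetical cutoff)."""
    return [
        region for region in regions
        if region.viral_evalue < viral_max
        and region.human_evalue > human_min
        and region.rbp_hits <= max_rbp_hits
    ]


if __name__ == "__main__":
    # Placeholder values for illustration only; these are not results.
    candidates = [
        Region(1, 15, 0.01, 10.0, 0),
        Region(20, 34, 0.50, 10.0, 0),   # fails the viral specificity threshold
        Region(40, 54, 0.01, 1.0, 0),    # has a significant human match
        Region(60, 74, 0.01, 10.0, 3),   # too many predicted RBP sites
    ]
    for region in select_targets(candidates):
        print(region.start, region.end)
```

Keeping the thresholds as keyword arguments makes it easy to explore how sensitive the shortlist is to each criterion.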
The unstructured regions discussed in this article could be further studied as candidate regions for targeting SARS-CoV-2. Additionally, the predicted RBP-gene target pairs should be validated experimentally, since information about in vivo binding interactions would improve our understanding of the virus and its life cycle and provide additional guidance for the development of treatments.
References:
1. World Health Organization (WHO). (2020). Coronavirus disease (COVID-19) Pandemic. https://www.who.int/emergencies/diseases/novel-coronavirus-2019
2. Fehr, A. R., & Perlman, S. (2015). Coronaviruses: an overview of their replication and pathogenesis. Methods in molecular biology (Clifton, N.J.), 1282, 1–23. https://doi.org/10.1007/978-1-4939-2438-7_1
3. Wu, F., Zhao, S., Yu, B., Chen, Y. M., Wang, W., Song, Z. G., Hu, Y., Tao, Z. W., Tian, J. H., Pei, Y. Y., Yuan, M. L., Zhang, Y. L., Dai, F. H., Liu, Y., Wang, Q. M., Zheng, J. J., Xu, L., Holmes, E. C., & Zhang, Y. Z. (2020). A new coronavirus associated with human respiratory disease in China. Nature, 579(7798), 265–269. https://doi.org/10.1038/s41586-020-2008-3
4. Sola, I., Almazán, F., Zúñiga, S., & Enjuanes, L. (2015). Continuous and Discontinuous RNA Synthesis in Coronaviruses. Annual review of virology, 2(1), 265–288. https://doi.org/10.1146/annurev-virology-100114-055218
5. Plant, E. P., & Dinman, J. D. (2008). The role of programmed-1 ribosomal frameshifting in coronavirus propagation. Frontiers in bioscience: a journal and virtual library, 13, 4873–4881. https://doi.org/10.2741/3046
6. Priore, S. F., Moss, W. N., & Turner, D. H. (2013). Influenza B virus has global ordered RNA structure in (+) and (-) strands but relatively less stable predicted RNA folding free energy than allowed by the encoded protein sequence. BMC research notes, 6, 330. https://doi.org/10.1186/1756-0500-6-330
7. Bennett, C. F., & Swayze, E. E. (2010). RNA targeting therapeutics: molecular mechanisms of antisense oligonucleotides as a therapeutic platform. Annual review of pharmacology and toxicology, 50, 259–293. https://doi.org/10.1146/annurev.pharmtox.010909.105654
8. Bajan, S., & Hutvagner, G. (2020). RNA-Based Therapeutics: From Antisense Oligonucleotides to miRNAs. Cells, 9(1), 137. https://doi.org/10.3390/cells9010137
9. Rangan, R., Zheludev, I. N., & Das, R. (2020). RNA genome conservation and secondary structure in SARS-CoV-2 and SARS-related viruses. BioRxiv, 2020.03.27.012906. https://doi.org/10.1101/2020.03.27.012906
10. Giudice, G., Sanchez-Cabo, F., Torroja, C., & Lara-Pezzi, E. (2016). ATtRACT—a database of RNA-binding proteins and associated motifs. Database, 2016, baw035.
11. Vandelli, A., Monti, M., Milanetti, E., Ponti, R. D., & Tartaglia, G. G. (2020). Structural analysis of SARS-CoV-2 and prediction of the human interactome. BioRxiv, 2, 2020.03.28.013789. https://doi.org/10.1101/2020.03.28.013789
12. Galán, C., Sola, I., Nogales, A., Thomas, B., Akoulitchev, A., Enjuanes, L., & Almazán, F. (2009). Host cell proteins interacting with the 3′ end of TGEV coronavirus genome influence virus replication. Virology, 391(2), 304–314. https://doi.org/10.1016/j.virol.2009.06.006
13. Nakagawa, K., Lokugamage, K. G., & Makino, S. (2016). Viral and Cellular mRNA Translation in Coronavirus-Infected Cells. Advances in Virus Research, 96, 165–192.
Alejandra Matsuri Rojano-Nisimura
Matsuri Rojano-Nisimura is a third year Ph.D. student and Fulbright-Garcia Robles grantee in the Biochemistry program at The University of Texas at Austin. Her research explores how RNA-based regulation helps pathogenic bacteria survive and adapt to different stresses inside their host.