<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>jbiol174</ui>
   <ji>1475-4924</ji>
   <fm>
      <dochead>Minireview</dochead>
      <bibl>
         <title>
            <p>Motifs from the deep</p>
         </title>
         <aug>
            <au ca="yes" id="A1">
               <snm>Hwang</snm>
               <mi>W</mi>
               <fnm>Tony</fnm>
               <insr iid="I1"/>
               <email>tony.hwang@mail.utexas.edu</email>
            </au>
            <au id="A2">
               <snm>Codrea</snm>
               <fnm>Vlad</fnm>
               <insr iid="I1"/>
            </au>
            <au id="A3">
               <snm>Ellington</snm>
               <mi>D</mi>
               <fnm>Andrew</fnm>
               <insr iid="I1"/>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Department of Chemistry and Biochemistry, Institute for Cell and Molecular Biology, University of Texas, Austin, TX 78712, USA</p>
            </ins>
         </insg>
         <source>Journal of Biology</source>
         <issn>1475-4924</issn>
         <pubdate>2009</pubdate>
         <volume>8</volume>
         <issue>8</issue>
         <fpage>72</fpage>
         <url>http://jbiol.com/content/8/8/72</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">19735583</pubid>
               <pubid idtype="doi">10.1186/jbiol174</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <pub>
            <date>
               <day>2</day>
               <month>9</month>
               <year>2009</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2009</year>
         <collab>BioMed Central Ltd</collab>
      </cpyrt>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <p>With the importance of noncoding RNAs in gene regulation increasingly recognized, there is considerable interest in identifying RNA motifs in genomic data. In a recent report in <it>BMC Genomics</it>, Breaker and colleagues describe a new algorithm for identifying functional noncoding RNAs in metagenomic sequences of marine organisms, a strategy that may be particularly effective for discovering new and unique riboswitches.</p>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="BMC" subtype="bmcbiol_series_title" id="bmcbiolcommentary">Commentary</classification>
         <classification type="BMC" subtype="bmcbiol_series_editor" id="bmcbiolcommentary"/>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p/>
         </st>
         <p>Noncoding RNAs (ncRNAs) are increasingly recognized as mediators of disease <abbrgrp><abbr bid="B1">1</abbr></abbrgrp> and as fundamental regulators of metabolic pathways in prokaryotes <abbrgrp><abbr bid="B2">2</abbr></abbrgrp> and eukaryotes <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. An unexpectedly large number of ncRNAs have been found to have key roles in essential cellular functions, including chromosome maintenance and DNA replication, RNA processing and translation, and protein translocation and stability <abbrgrp><abbr bid="B2">2</abbr><abbr bid="B4">4</abbr></abbrgrp>. The largest class of regulatory RNAs comprises microRNAs (miRNAs) of fewer than 30 nucleotides that bind to mRNAs and promote degradation or repress translation <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>. Less numerous than miRNAs, but widespread among bacteria, are riboswitches: structured RNAs located primarily in the 3' or 5' untranslated regions (UTRs) of bacterial mRNAs that bind metabolites and change conformation to regulate gene expression. Riboswitches are characterized by conserved motifs that include an 'aptamer domain' that recognizes the metabolite ligands and an 'expression platform' that can alter the conformation and function of regulatory elements involved in transcription or translation.</p>
         <p>In recent years, experimental and bioinformatic strategies have been developed to discover ncRNA candidates in organisms ranging from <it>Escherichia coli </it>to humans. The laboratory of Ronald Breaker has now developed a novel method that extends this search to marine metagenomic data. The current work started with the genome of '<it>Candidatus </it>Pelagibacter ubique', which accounts for as much as 20% of all marine metagenomic sequence reads <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>, making it possibly the most abundant organism in the world, with estimates of approximately 10<sup>28 </sup>individual cells. '<it>C</it>. P. ubique' has the smallest genome yet found in a free-living organism, consisting of only 1.3 megabases, 1,354 genes, and very little noncoding DNA <abbrgrp><abbr bid="B6">6</abbr></abbrgrp>.</p>
         <p>Computational methods have previously been successful at identifying structured RNAs, and the authors <abbrgrp><abbr bid="B5">5</abbr></abbrgrp> developed a so-called comparative genomics pipeline for identifying regions in the '<it>C</it>. P. ubique' genome most likely to contain functional ncRNAs (Figure <figr fid="F1">1</figr>). Their strategy improves on a similar method developed by the same lab in 2007 <abbrgrp><abbr bid="B7">7</abbr></abbrgrp> to search bacterial genomes. The 2007 work used complex criteria to define UTRs and identify structured sequences within them, but the current method treats all intergenic regions (IGRs), whether transcribed or not, as potentially harboring ncRNAs. The average IGR length in '<it>C</it>. P. ubique' is a meager 3 nucleotides, so the authors narrowed their search to a short list of IGRs longer than 100 nucleotides, a size class known to contain the vast majority of previously identified functional RNAs. For comparison, the average IGR in <it>Saccharomyces cerevisiae </it>is 515 nucleotides and in humans is 12,000 nucleotides, so it is more difficult to construct a manageable list of potentially functional IGRs in these organisms using the size criterion alone.</p>
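         <p>The size criterion can be illustrated with a minimal sketch, which is not the authors' pipeline. It assumes that IGRs are already available as plain DNA strings keyed by hypothetical identifiers; the function names and toy sequences are invented for illustration, and only the 100-nucleotide cutoff comes from the text. The GC fraction is reported for each retained IGR because %GC content is also considered when IGRs are selected.</p>
         <p>
```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def select_long_igrs(igrs: dict[str, str], min_length: int = 100) -> dict[str, float]:
    """Keep IGRs longer than min_length and map each one to its GC fraction."""
    return {
        name: gc_fraction(seq)
        for name, seq in igrs.items()
        if len(seq) > min_length
    }


# Toy input: identifiers and sequences are made up for illustration.
example_igrs = {
    "igr_short": "ATA",
    "igr_long": "AT" * 40 + "GC" * 20,
}
selected = select_long_igrs(example_igrs)
print(selected)
```
         </p>
         <p>Only the 120-nucleotide toy IGR survives the filter here; a real analysis would apply further criteria, which this sketch does not attempt to reproduce.</p>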
         <fig id="F1">
            <title>
               <p>Figure 1</p>
            </title>
            <caption>
               <p>Flowchart of the computational methods used by Breaker <it>et al. </it><abbrgrp><abbr bid="B5">5</abbr></abbrgrp> in the identification of candidate ncRNA motifs</p>
            </caption>
            <text>
               <p>Flowchart of the computational methods used by Breaker <it>et al. </it><abbrgrp><abbr bid="B5">5</abbr></abbrgrp> in the identification of candidate ncRNA motifs. The steps in the process were as follows: <b>(a) </b>identify IGRs by size and %GC content; <b>(b) </b>eliminate ncRNA motifs of known structure, such as tRNAs, rRNAs and annotated riboswitches; <b>(c) </b>find conserved IGR sequences in other genomes using BLAST analysis of the CAMERA database, and exclude protein-coding regions; <b>(d) </b>align IGRs and predict conserved secondary structures using CMFinder; and <b>(e) </b>search for homologs using conserved secondary structure criteria. Green indicates a candidate ncRNA motif.</p>
            </text>
            <graphic file="jbiol174-1"/>
         </fig>
         <p>Once candidate ncRNAs had been identified, a series of homology searches was used to further filter the structural hypotheses <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. The first homology search used '<it>C</it>. P. ubique' IGRs as the reference sequences and looked for homologous IGRs in an ocean metagenomic database maintained by the CAMERA community (see references in <abbrgrp><abbr bid="B5">5</abbr></abbrgrp> for details of programs and databases used). The second homology search looked for similarities between the IGRs and proteins in the NCBI nucleotide/protein sequence and CAMERA databases. IGRs similar to proteins were excluded. The authors were not limited by the novelty and incomplete curation of metagenomic sequences, and were able to predict (and avoid) unannotated protein-coding regions using tools such as the MetaGene program and the Conserved Domain Database. CMFinder, a covariance analysis program that looks solely for RNA covariance and does not penalize sequences that show codon preservation, was used to align the '<it>C</it>. P. ubique' IGRs with their homologs and to predict a common secondary structure.</p>
         <p>The program RAVENNA performed the third homology search, but in this case the consensus structures of IGRs, not their individual primary sequences, were used as references <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. An ingenious aspect of this approach is that it is partially iterative: matches to the consensus secondary structures were used to refine those same secondary structures so that they could be used again to search for additional matches. This cycle can be performed any number of times, until a unique (with the exception of pseudoknots) and refined endpoint is achieved. This approach is reminiscent of its analog counterpart, the <it>in vitro </it>selection of functional RNA structures. In contrast, a purely statistical modeling of ncRNA secondary structure might not have been sufficient to identify homologs in other organisms, because ncRNAs with the same function may fold differently as a result of having become adapted to the needs of a specific organism. That said, an alternative solution would have been to run the first homology search against the expansive set of genomic and metagenomic databases as opposed to just the CAMERA database. The resulting matches would be more comprehensive and could possibly have reduced the number of structure-based searches needed to arrive at a unique consensus structure.</p>
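         <p>The partially iterative search can be sketched as a fixed-point loop. This is a toy stand-in, not RAVENNA and not a covariance model: the 'model' here is simply a consensus string, matching uses Hamming distance with an assumed mismatch allowance, and all function names and sequences are hypothetical. It is meant only to illustrate the control flow of using matches to refine the query until the set of matches stops changing.</p>
         <p>
```python
from collections import Counter


def hamming(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def consensus(seqs: list[str]) -> str:
    """Build a majority-rule consensus from equal-length sequences."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def iterative_search(seed: str, database: list[str],
                     max_mismatches: int = 1,
                     max_rounds: int = 10) -> tuple[str, list[str]]:
    """Alternate searching and model refinement until the hits stop changing."""
    model, hits = seed, []
    for _ in range(max_rounds):
        # Search step: collect entries close enough to the current model.
        new_hits = [s for s in database
                    if len(s) == len(model)
                    and max_mismatches >= hamming(model, s)]
        if new_hits == hits or not new_hits:
            break  # stable endpoint reached, or nothing matched
        hits = new_hits
        model = consensus(hits)  # refinement step: rebuild the query from its matches
    return model, hits


# Toy database, invented for illustration.
model, hits = iterative_search("GGCC", ["GGCA", "GGAA", "GACA", "TTTT"])
print(model, hits)
```
         </p>
         <p>In this toy run the seed matches only one database entry at first; the refined consensus then recovers two more, after which the hit list is stable and the loop stops.</p>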
         <p>By using consensus structures for comparison, the authors <abbrgrp><abbr bid="B5">5</abbr></abbrgrp> could extend their studies from '<it>C</it>. P. ubique' to the enormous number of metagenomic sequences gathered from various environments, ranging from the ocean to acid mine drainage to mammalian intestines. Finally, the authors <abbrgrp><abbr bid="B5">5</abbr></abbrgrp> looked at which genes appeared directly downstream of the putative functional ncRNAs in order to predict the pathways in which the ncRNAs might have a role.</p>
         <p>A wide array of known ncRNAs were identified in the metagenomic sequences. These known ncRNAs include rRNAs, tRNAs, riboswitches, and the RNA components of RNase P and signal recognition particle (SRP) <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. In addition, eight novel structured RNA motifs were found. Four of these were unique to the metagenomic data, whereas the other four motifs included three possible <it>cis</it>-regulatory elements and a new <it>S-</it>adenosylmethionine-V (SAM-V) riboswitch class. One of the <it>cis</it>-regulatory elements, present upstream of the <it>rpsB </it>gene, had previously been characterized <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>, but the others seemed to be novel, indicating that despite extensive genomic sequencing, many novel RNA motifs and functions may still remain to be found. The authors <abbrgrp><abbr bid="B5">5</abbr></abbrgrp> also identified many IGRs that did not exhibit RNA structure but contained relatively short, conserved segments; these sequences may be protein recognition sites on the prokaryotic genophore.</p>
         <p>The small '<it>C</it>. P. ubique' genome has a striking global AT bias (71% AT), hypothesized to be an adaptation to nutrient-poor environments such as the open ocean because A and T are energetically cheaper to synthesize <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. '<it>C</it>. P. ubique' probably cannot spend much energy creating regulatory proteins, but rather relies extensively on the interaction between metabolites and nucleic acids. Further evidence of the resourceful nature of these organisms is the fact that the motifs of RNase P RNA, SRP RNA, and two riboswitches were more than one standard deviation smaller in '<it>C</it>. P. ubique' than in other &#945;-proteobacteria <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>. Interestingly, precisely because most of the identified sequences are AT-rich, the metagenomic structures may prove to be especially useful for the identification of mechanistically important G and C residues in these structured RNAs. In prokaryotes, a high GC content is strongly correlated with structured RNAs and has been hypothesized to increase their stability, whereas there is no such correlation for genomes as a whole or for protein-coding regions <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>. This analysis suggests that the disproportionately represented G-C base-pairs in the newly revealed pseudoknot of the SAM-V riboswitch <abbrgrp><abbr bid="B5">5</abbr></abbrgrp> are probably particularly important for its structure and function.</p>
         <p>Indeed, it can be argued that choosing metagenomic sequence information to scour for new riboswitches may have been particularly inspired. Small or streamlined genomes tend to be particularly AT-rich <abbrgrp><abbr bid="B9">9</abbr></abbrgrp>. Such genomes may also have great need for small regulatory elements, and riboswitches are, in general, smaller than corresponding protein-based transcription or translation factors. Thus, examining the 'lifestyles of the small and AT-rich' may enable the quick identification not only of new and unique riboswitches, but also of their functional sequences and structures.</p>
      </sec>
   </bdy>
   <bm>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>Mechanisms of RNA-mediated disease</p>
            </title>
            <aug>
               <au>
                  <snm>O'Rourke</snm>
                  <fnm>JR</fnm>
               </au>
               <au>
                  <snm>Swanson</snm>
                  <fnm>MS</fnm>
               </au>
            </aug>
            <source>J Biol Chem</source>
            <pubdate>2009</pubdate>
            <volume>284</volume>
            <fpage>7419</fpage>
            <lpage>7423</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1074/jbc.R800025200</pubid>
                  <pubid idtype="pmpid" link="fulltext">18957432</pubid>
                  <pubid idtype="pmcid">2658036</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Regulatory RNAs in bacteria</p>
            </title>
            <aug>
               <au>
                  <snm>Waters</snm>
                  <fnm>LS</fnm>
               </au>
               <au>
                  <snm>Storz</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>2009</pubdate>
            <volume>136</volume>
            <fpage>615</fpage>
            <lpage>628</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.cell.2009.01.043</pubid>
                  <pubid idtype="pmpid" link="fulltext">19239884</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>RNA and transcriptional modulation of gene expression</p>
            </title>
            <aug>
               <au>
                  <snm>Hawkins</snm>
                  <fnm>PG</fnm>
               </au>
               <au>
                  <snm>Morris</snm>
                  <fnm>KV</fnm>
               </au>
            </aug>
            <source>Cell Cycle</source>
            <pubdate>2008</pubdate>
            <volume>7</volume>
            <fpage>602</fpage>
            <lpage>607</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmpid" link="fulltext">18256543</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Experimental approaches to identify non-coding RNAs</p>
            </title>
            <aug>
               <au>
                  <snm>H&#252;ttenhofer</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Vogel</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2006</pubdate>
            <volume>34</volume>
            <fpage>635</fpage>
            <lpage>646</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/gkj469</pubid>
                  <pubid idtype="pmcid">1351373</pubid>
                  <pubid idtype="pmpid" link="fulltext">16436800</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Identification of candidate structured RNAs in the marine organism '<it>Candidatus </it>Pelagibacter ubique'</p>
            </title>
            <aug>
               <au>
                  <snm>Meyer</snm>
                  <fnm>MM</fnm>
               </au>
               <au>
                  <snm>Ames</snm>
                  <fnm>TD</fnm>
               </au>
               <au>
                  <snm>Smith</snm>
                  <fnm>DP</fnm>
               </au>
               <au>
                  <snm>Weinberg</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Schwalbach</snm>
                  <fnm>MS</fnm>
               </au>
               <au>
                  <snm>Giovannoni</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Breaker</snm>
                  <fnm>RR</fnm>
               </au>
            </aug>
            <source>BMC Genomics</source>
            <pubdate>2009</pubdate>
            <volume>10</volume>
            <fpage>268</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1186/1471-2164-10-268</pubid>
                  <pubid idtype="pmcid">2704228</pubid>
                  <pubid idtype="pmpid" link="fulltext">19531245</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Genome streamlining in a cosmopolitan oceanic bacterium</p>
            </title>
            <aug>
               <au>
                  <snm>Giovannoni</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Tripp</snm>
                  <fnm>HJ</fnm>
               </au>
               <au>
                  <snm>Givan</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Podar</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Vergin</snm>
                  <fnm>KL</fnm>
               </au>
               <au>
                  <snm>Baptista</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Bibbs</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Eads</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Richardson</snm>
                  <fnm>TH</fnm>
               </au>
               <au>
                  <snm>Noordewier</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Rapp&#233;</snm>
                  <fnm>MS</fnm>
               </au>
               <au>
                  <snm>Short</snm>
                  <fnm>JM</fnm>
               </au>
               <au>
                  <snm>Carrington</snm>
                  <fnm>JC</fnm>
               </au>
               <au>
                  <snm>Mathur</snm>
                  <fnm>EJ</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2005</pubdate>
            <volume>309</volume>
            <fpage>1242</fpage>
            <lpage>1245</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1114057</pubid>
                  <pubid idtype="pmpid" link="fulltext">16109880</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Identification of 22 candidate structured RNAs in bacteria using the CMfinder comparative genomics pipeline</p>
            </title>
            <aug>
               <au>
                  <snm>Weinberg</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Barrick</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Yao</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Roth</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Kim</snm>
                  <fnm>JN</fnm>
               </au>
               <au>
                  <snm>Gore</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>JX</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>ER</fnm>
               </au>
               <au>
                  <snm>Block</snm>
                  <fnm>KF</fnm>
               </au>
               <au>
                  <snm>Sudarsan</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Neph</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Tompa</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Ruzzo</snm>
                  <fnm>WL</fnm>
               </au>
               <au>
                  <snm>Breaker</snm>
                  <fnm>RR</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2007</pubdate>
            <volume>35</volume>
            <fpage>4809</fpage>
            <lpage>4819</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/nar/gkm487</pubid>
                  <pubid idtype="pmcid">1950547</pubid>
                  <pubid idtype="pmpid" link="fulltext">17621584</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>A new regulatory circuit in ribosomal protein operons: S2-mediated control of the rpsB-tsf expression in vivo</p>
            </title>
            <aug>
               <au>
                  <snm>Aseev</snm>
                  <fnm>LV</fnm>
               </au>
               <au>
                  <snm>Levandovskaya</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>Tchufistova</snm>
                  <fnm>LS</fnm>
               </au>
               <au>
                  <snm>Scaptsova</snm>
                  <fnm>NV</fnm>
               </au>
               <au>
                  <snm>Boni</snm>
                  <fnm>IV</fnm>
               </au>
            </aug>
            <source>RNA</source>
            <pubdate>2008</pubdate>
            <volume>14</volume>
            <fpage>1882</fpage>
            <lpage>1894</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1261/rna.1099108</pubid>
                  <pubid idtype="pmcid">2525966</pubid>
                  <pubid idtype="pmpid" link="fulltext">18648071</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Base composition bias might result from competition for metabolic resources</p>
            </title>
            <aug>
               <au>
                  <snm>Rocha</snm>
                  <fnm>EP</fnm>
               </au>
               <au>
                  <snm>Danchin</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2002</pubdate>
            <volume>18</volume>
            <fpage>291</fpage>
            <lpage>294</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0168-9525(02)02690-2</pubid>
                  <pubid idtype="pmpid" link="fulltext">12044357</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>High guanine-cytosine content is not an adaptation to high temperature: a comparative analysis amongst prokaryotes</p>
            </title>
            <aug>
               <au>
                  <snm>Hurst</snm>
                  <fnm>LD</fnm>
               </au>
               <au>
                  <snm>Merchant</snm>
                  <fnm>AR</fnm>
               </au>
            </aug>
            <source>Proc Biol Sci</source>
            <pubdate>2001</pubdate>
            <volume>268</volume>
            <fpage>493</fpage>
            <lpage>497</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1098/rspb.2001.1430</pubid>
                  <pubid idtype="pmcid">1088632</pubid>
                  <pubid idtype="pmpid" link="fulltext">11296861</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
      </refgrp>
   </bm>
</art>
