free full text journal articles: genetics and proteomics



Recent Articles in Nucleic Acids Research

Duvick J, Fu A, Muppirala U, Sabharwal M, Wilkerson MD, Lawrence CJ, Lushbough C, Brendel V
PlantGDB: a resource for comparative plant genomics.
Nucleic Acids Res. 2007 Dec 6;
PlantGDB (http://www.plantgdb.org/) is a genomics database encompassing sequence data for green plants (Viridiplantae). PlantGDB provides annotated transcript assemblies for >100 plant species, with transcripts mapped to their cognate genomic context where available, integrated with a variety of sequence analysis tools and web services. For 14 plant species with emerging or complete genome sequence, PlantGDB's genome browsers (xGDB) serve as a graphical interface for viewing, evaluating and annotating transcript and protein alignments to chromosome or bacterial artificial chromosome (BAC)-based genome assemblies. Annotation is facilitated by the integrated yrGATE module for community curation of gene models. Novel web services at PlantGDB include Tracembler, an iterative alignment tool that generates contigs from GenBank trace file data and BioExtract Server, a web-based server for executing custom sequence analysis workflows. PlantGDB also hosts a plant genomics research outreach portal (PGROP) that facilitates access to a large number of resources for research and training. [Abstract/Link to Full Text]

Leulliot N, Bohnsack MT, Graille M, Tollervey D, Tilbeurgh HV
The yeast ribosome synthesis factor Emg1 is a novel member of the superfamily of alpha/beta knot fold methyltransferases.
Nucleic Acids Res. 2007 Dec 6;
Emg1 was previously shown to be required for maturation of the 18S rRNA and biogenesis of the 40S ribosomal subunit. Here we report the determination of the crystal structure of Emg1 at 2 Å resolution in complex with the methyl donor, S-adenosyl-methionine (SAM). This structure identifies Emg1 as a novel member of the alpha/beta knot fold methyltransferase (SPOUT) superfamily. In addition to the conserved SPOUT core, Emg1 has two unique domains that form an extended surface, which we predict to be involved in binding of RNA substrates. A point mutation within a basic patch on this surface almost completely abolished RNA binding in vitro. Three point mutations designed to disrupt the interaction of Emg1 with SAM each caused >100-fold reduction in SAM binding in vitro. Expression of only Emg1 with these mutations could support growth and apparently normal ribosome biogenesis in strains genetically depleted of Emg1. We conclude that the catalytic activity of Emg1 is not essential and that the presence of the protein is both necessary and sufficient for ribosome biogenesis. [Abstract/Link to Full Text]

Petersen J, Poulsen L, Petronis S, Birgens H, Dufva M
Use of a multi-thermal washer for DNA microarrays simplifies probe design and gives robust genotyping assays.
Nucleic Acids Res. 2007 Dec 6;
DNA microarrays are generally operated at a single condition, which severely limits the freedom of designing probes for allele-specific hybridization assays. Here, we demonstrate a fluidic device for multi-stringency posthybridization washing of microarrays on microscope slides. This device is called a multi-thermal array washer (MTAW), and it has eight individually controlled heating zones, each of which corresponds to the location of a subarray on a slide. Allele-specific oligonucleotide probes for nine mutations in the beta-globin gene were spotted in eight identical subarrays at positions corresponding to the temperature zones of the MTAW. After hybridization with amplified patient material, the slides were mounted in the MTAW, and each subarray was exposed to different temperatures ranging from 22 to 40 degrees C. When processed in the MTAW, probes selected without considering melting temperature resulted in improved genotyping compared with probes selected according to theoretical melting temperature and run under one condition. In conclusion, the MTAW is a versatile tool that can facilitate screening of a large number of probes for genotyping assays and can also enhance the performance of diagnostic arrays. [Abstract/Link to Full Text]

Katahira J, Miki T, Takano K, Maruhashi M, Uchikawa M, Tachibana T, Yoneda Y
Nuclear RNA export factor 7 is localized in processing bodies and neuronal RNA granules through interactions with shuttling hnRNPs.
Nucleic Acids Res. 2007 Dec 5;
The nuclear RNA export factor (NXF) family proteins have been implicated in various aspects of post-transcriptional gene expression. This study shows that mouse NXF7 exhibits heterologous localization, i.e. NXF7 associates with translating ribosomes, stress granules (SGs) and processing bodies (P-bodies), the latter two of which are believed to be cytoplasmic sites of storage, degradation and/or sorting of mRNAs. By yeast two-hybrid screening, a series of heterogeneous nuclear ribonucleoproteins (hnRNPs) were identified as possible binding partners for NXF7. Among them, hnRNP A3, which is believed to be involved in translational control and/or cytoplasmic localization of certain mRNAs, formed a stable complex with NXF7 in vitro. Although hnRNP A3 was not associated with translating ribosomes, it was co-localized with NXF7 in P-bodies. After exposure to oxidative stress, NXF7 trans-localized to SGs, whereas hnRNP A3 did not. In differentiated neuroblastoma Neuro2a cells, NXF7 was co-localized with hnRNP A3 in the cell body and neurites. The amino terminal half of NXF7, which was required for stable complex formation with hnRNP A3, coincided with the region required for localization in both P-bodies and neuronal RNA granules. These findings suggest that NXF7 plays a role in sorting, transport and/or storage of mRNAs through interactions with hnRNP A3. [Abstract/Link to Full Text]

Hoischen C, Bussiek M, Langowski J, Diekmann S
Escherichia coli low-copy-number plasmid R1 centromere parC forms a U-shaped complex with its binding protein ParR.
Nucleic Acids Res. 2007 Dec 3;
The Escherichia coli low-copy-number plasmid R1 contains a segregation machinery composed of parC, ParR and parM. The R1 centromere-like site parC contains two separate sets of repeats. By atomic force microscopy (AFM) we show here that ParR molecules bind to each of the 5-fold repeated iterons separately with the intervening sequence unbound by ParR. The two ParR protein complexes on parC do not complex with each other. ParR binds with a stoichiometry of about one ParR dimer per single iteron. The measured DNA fragment lengths agreed with B-form DNA and each of the two parC 5-fold iteron DNA stretches adopts a linear path in its complex with ParR. However, the overall parC/ParR complex with both iteron repeats bound by ParR forms an overall U-shaped structure: the DNA folds back on itself nearly completely, including an angle of approximately 150 degrees. Analysing linear DNA fragments, we never observed dimerized ParR complexes on one parC DNA molecule (intramolecular) nor a dimerization between ParR complexes bound to two different parC DNA molecules (intermolecular). This bacterial segrosome is compared to other bacterial segregation complexes. We speculate that partition complexes might have a similar overall structural organization and, at least in part, common functional properties. [Abstract/Link to Full Text]

Ruan J, Li H, Chen Z, Coghlan A, Coin LJ, Guo Y, Hériché JK, Hu Y, Kristiansen K, Li R, Liu T, Moses A, Qin J, Vang S, Vilella AJ, Ureta-Vidal A, Bolund L, Wang J, Durbin R
TreeFam: 2008 Update.
Nucleic Acids Res. 2007 Dec 1;
TreeFam (http://www.treefam.org) was developed to provide curated phylogenetic trees for all animal gene families, as well as orthologue and paralogue assignments. Release 4.0 of TreeFam contains curated trees for 1314 families and automatically generated trees for another 14 351 families. We have expanded TreeFam to include 25 fully sequenced animal genomes, as well as four genomes from plant and fungal outgroup species. We have also introduced more accurate approaches for automatically grouping genes into families, for building phylogenetic trees, and for inferring orthologues and paralogues. The user interface for viewing phylogenetic trees and family information has been improved. Furthermore, a new perl API lets users easily extract data from the TreeFam mysql database. [Abstract/Link to Full Text]

Tang X, Swaminathan J, Gewirtz AM, Dmochowski IJ
Regulating gene expression in human leukemia cells using light-activated oligodeoxynucleotides.
Nucleic Acids Res. 2007 Dec 1;
Light-activated antisense oligodeoxynucleotides (asODNs) were developed to control the degradation of target mRNA in living cells by RNase H. A 20-mer asODN previously shown to target c-myb, a hematopoietic transcription factor, was covalently attached via a photocleavable linker (PL) to partially complementary 20-mer sense strands (sODNs). In the 'caged' state, the sODN blocked hybridization of the asODN to c-myb mRNA. Six asODN-PL-sODN conjugates, C1-C6, were synthesized. C5, with twelve complementary bases, gave the largest decrease in melting temperature (T(m)) upon UV irradiation (ΔT(m) = -29 degrees C). The most thermally stable conjugate, C6 (T(m) = 84 degrees C), gave the lowest background RNase H activity, with just 8.6% degradation of an RNA 40-mer after 1 h incubation. In biochemical assays with C6, RNA digestion increased 10-fold 10 min after UV irradiation. Finally, phosphorothioated analogs S-C5 and S-C6 were synthesized to test activity in cultured K562 (human leukemia) cells. No knockdown of c-myb mRNA or protein was observed with intact S-C5 or S-C6, whereas more than half of c-myb mRNA was degraded 24 h after photoactivation. Two-fold photomodulation of c-MYB protein levels was also observed with S-C5. However, no photomodulation of c-MYB protein levels was observed with S-C6, perhaps due to the greater stability of this duplex. [Abstract/Link to Full Text]

Saiz L, Vilar JM
Ab initio thermodynamic modeling of distal multisite transcription regulation.
Nucleic Acids Res. 2007 Dec 1;
Transcription regulation typically involves the binding of proteins over long distances on multiple DNA sites that are brought close to each other by the formation of DNA loops. The inherent complexity of assembling regulatory complexes on looped DNA challenges the understanding of even the simplest genetic systems, including the prototypical lac operon. Here we implement a scalable approach based on thermodynamic molecular properties to model ab initio systems regulated through multiple DNA sites with looping. We show that this approach applied to the lac operon accurately predicts the system behavior for a wide range of cellular conditions, which include the transcription rate over five orders of magnitude as a function of the repressor concentration for wild type and all seven combinations of deletions of three operators, as well as the observed induction curves for cells with and without active catabolite activator protein. Our results provide new insights into the detailed functioning of the lac operon and reveal an efficient avenue to incorporate the required underlying molecular complexity into fully predictive models of gene regulation. [Abstract/Link to Full Text]

Wisniewski JR, Zougman A, Mann M
N(epsilon)-Formylation of lysine is a widespread post-translational modification of nuclear proteins occurring at residues involved in regulation of chromatin function.
Nucleic Acids Res. 2007 Dec 1;
Post-translational modification of histones and other chromosomal proteins regulates chromatin conformation and gene activity. Methylation and acetylation of lysyl residues are among the most frequently described modifications in these proteins. Whereas these modifications have been studied in detail, very little is known about a recently discovered chemical modification, the N(epsilon)-lysine formylation, in histones and other nuclear proteins. Here we mapped, for the first time, the sites of lysine formylation in histones and several other nuclear proteins. We found that core and linker histones are formylated at multiple lysyl residues located both in the tails and globular domains of histones. In core histones, formylation was found at lysyl residues known to be involved in organization of nucleosomal particles that are frequently acetylated and methylated. In linker histones and high mobility group proteins, multiple formylation sites were mapped to residues with an important role in DNA binding. N(epsilon)-lysine formylation in chromosomal proteins is relatively abundant, suggesting that it may interfere with epigenetic mechanisms governing chromatin function, which could lead to deregulation of the cell and disease. [Abstract/Link to Full Text]

Venditti V, Niccolai N, Butcher SE
Measuring the dynamic surface accessibility of RNA with the small paramagnetic molecule TEMPOL.
Nucleic Acids Res. 2007 Dec 1;
The surface accessibility of macromolecules plays a key role in modulating molecular recognition events. RNA is a complex and dynamic molecule involved in many aspects of gene expression. However, there are few experimental methods available to measure the accessible surface of RNA. Here, we investigate the accessible surface of RNA using NMR and the small paramagnetic molecule TEMPOL. We investigated two RNAs with known structures, one that is extremely stable and one that is dynamic. For helical regions, the TEMPOL probing data correlate well with the predicted RNA surface, and the method is able to distinguish subtle variations in atom depths, such as the relative accessibility of pyrimidine versus purine aromatic carbon atoms. Dynamic motions are also detected by TEMPOL probing, and the method accurately reports a previously characterized pH-dependent conformational transition involving formation of a protonated C-A pair and base flipping. Some loop regions are observed to exhibit anomalously high accessibility, reflective of motions that are not evident within the ensemble of NMR structures. We conclude that TEMPOL probing can provide valuable insights into the surface accessibility and dynamics of RNA, and can also be used as an independent means of validating RNA structure and dynamics in solution. [Abstract/Link to Full Text]

Pang CN, Lin K, Wouters MA, Heringa J, George RA
Identifying foldable regions in protein sequence from the hydrophobic signal.
Nucleic Acids Res. 2007 Dec 1;
Structural genomics initiatives aim to elucidate representative 3D structures for the majority of protein families over the next decade, but many obstacles must be overcome. The correct design of constructs is extremely important since many proteins will be too large or contain unstructured regions and will not be amenable to crystallization. It is therefore essential to identify regions in protein sequences that are likely to be suitable for structural study. Scooby-Domain is a fast and simple method to identify globular domains in protein sequences. Domains are compact units of protein structure and their correct delineation will aid structural elucidation through a divide-and-conquer approach. Scooby-Domain predictions are based on the observed lengths and hydrophobicities of domains from proteins with known tertiary structure. The prediction method employs an A*-search to identify sequence regions that form a globular structure and those that are unstructured. On a test set of 173 proteins with consensus CATH and SCOP domain definitions, Scooby-Domain has a sensitivity of 50% and an accuracy of 29%, which is better than current state-of-the-art methods. The method does not rely on homology searches and, therefore, can identify previously unknown domains. [Abstract/Link to Full Text]

Haque ME, Grasso D, Spremulli LL
The interaction of mammalian mitochondrial translational initiation factor 3 with ribosomes: evolution of terminal extensions in IF3mt.
Nucleic Acids Res. 2007 Dec 1;
Mammalian mitochondrial initiation factor 3 (IF3(mt)) has a central region with homology to bacterial IF3. This homology region is preceded by an N-terminal extension and followed by a C-terminal extension. The role of these extensions on the binding of IF3(mt) to mitochondrial small ribosomal subunits (28S) was studied using derivatives in which the extensions had been deleted. The K(d) for the binding of IF3(mt) to 28S subunits is approximately 30 nM. Removal of either the N- or C-terminal extension has almost no effect on this value. IF3(mt) has very weak interactions with the large subunit of the mitochondrial ribosome (39S) (K(d) = 1.5 µM). However, deletion of the extensions results in derivatives with significant affinity for 39S subunits (K(d) = 0.12-0.25 µM). IF3(mt) does not bind 55S monosomes, while the deletion derivative binds slightly to these particles. IF3(mt) is very effective in dissociating 55S ribosomes. Removal of the N-terminal extension has little effect on this activity. However, removal of the C-terminal extension leads to a complex dissociation pattern due to the high affinity of this derivative for 39S subunits. These data suggest that the extensions have evolved to ensure the proper dissociation of IF3(mt) from the 28S subunits upon 39S subunit joining. [Abstract/Link to Full Text]
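
To give a feel for what the dissociation constants in this abstract imply, the following Python sketch computes the fraction of factor bound at equilibrium. It assumes a simple one-site binding model with free subunit in excess, which the abstract does not state; the function name `fraction_bound` and the 100 nM subunit concentration are hypothetical, chosen only for illustration. The two K(d) values are the ones quoted above.

```python
def fraction_bound(subunit_nM: float, kd_nM: float) -> float:
    """Fraction of factor bound under an assumed one-site model: [S] / ([S] + Kd)."""
    if subunit_nM < 0 or kd_nM <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return subunit_nM / (subunit_nM + kd_nM)

KD_28S_NM = 30.0     # K(d) for 28S subunits, from the abstract (about 30 nM)
KD_39S_NM = 1500.0   # K(d) for 39S subunits, from the abstract (1.5 micromolar)
SUBUNIT_NM = 100.0   # hypothetical free subunit concentration, not from the abstract

for name, kd in (("28S", KD_28S_NM), ("39S", KD_39S_NM)):
    print(f"{name}: fraction bound = {fraction_bound(SUBUNIT_NM, kd):.3f}")
```

Under these assumptions, roughly three quarters of the factor would be bound to 28S subunits while only a few percent would be bound to 39S subunits, which matches the abstract's description of the 39S interaction as very weak.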

Bauer M, Marschaus L, Reuff M, Besche V, Sartorius-Neef S, Pfeifer F
Overlapping activator sequences determined for two oppositely oriented promoters in halophilic Archaea.
Nucleic Acids Res. 2007 Dec 1;
Transcription of the genomic region involved in gas vesicle formation in Halobacterium salinarum (p-vac) and Haloferax mediterranei (mc-vac) is driven by two divergent promoters, P(A) and P(D), separated by only 35 nt. Both promoters are activated by the transcription activator GvpE which in the case of P(mcA) requires a 20-nt sequence (UAS) consisting of two conserved 8-nt sequence portions located upstream of BRE. Here, we determined the two UAS elements in the promoter region of p-vac by scanning mutageneses using constructs containing P(pD) (without P(pA)) fused to the bgaH reporter gene encoding an enzyme with beta-galactosidase activity, or the dual reporter construct pApD with P(pD) fused to bgaH and P(pA) to an altered version of gvpA. The two UAS elements found exhibited a similar extension and distance to BRE as previously determined for the UAS in P(mcA). Their distal 8-nt portions almost completely overlapped in the centre of P(pD)-P(pA), and mutations in this region negatively affected the GvpE-mediated activation of both promoters. Any alteration of the distance between BRE and UAS resulted in the loss of the GvpE activation, as did a complete substitution of the proximal 8-nt portion, underlining that a close location of UAS and BRE was very important. [Abstract/Link to Full Text]

Berglund AC, Sjölund E, Ostlund G, Sonnhammer EL
InParanoid 6: eukaryotic ortholog clusters with inparalogs.
Nucleic Acids Res. 2007 Dec 5;
The InParanoid eukaryotic ortholog database (http://InParanoid.sbc.su.se/) has been updated to version 6 and is now based on 35 species. We collected all available 'complete' eukaryotic proteomes and Escherichia coli, and calculated ortholog groups for all 595 species pairs using the InParanoid program. This resulted in 2 642 187 pairwise ortholog groups in total. The orthology-based species relations are presented in an orthophylogram. InParanoid clusters contain one or more orthologs from each of the two species. Multiple orthologs in the same species, i.e. inparalogs, result from gene duplications after the species divergence. A new InParanoid website has been developed which is optimized for speed both for users and for updating the system. The XML output format has been improved for efficient processing of the InParanoid ortholog clusters. [Abstract/Link to Full Text]
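
The figure of 595 species pairs follows directly from the 35 species: it is the number of unordered pairs, 35 x 34 / 2. A minimal Python check, where the helper name `species_pairs` is chosen here for illustration and is not part of InParanoid:

```python
from math import comb

def species_pairs(n_species: int) -> int:
    """Number of unordered species pairs that need an ortholog comparison."""
    return comb(n_species, 2)

print(species_pairs(35))  # 595
```

Because the count grows quadratically with the number of species, each added proteome requires a comparison against every species already in the database.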

Birzele F, Csaba G, Zimmer R
Alternative splicing and protein structure evolution.
Nucleic Acids Res. 2007 Nov 30;
Alternative splicing is thought to be one of the major sources for functional diversity in higher eukaryotes. Interestingly, when mapping splicing events onto protein structures, about half of the events affect structured and even highly conserved regions, i.e. are non-trivial on the structure level. This has led to the controversial hypothesis that such splice variants result in nonsense-mediated mRNA decay or non-functional, unstructured proteins, which do not contribute to the functional diversity of an organism. Here we show in a comprehensive study on alternative splicing that proteins appear to be much more tolerant to structural deletions, insertions and replacements than previously thought. We find literature evidence that such non-trivial splicing isoforms exhibit different functional properties compared to their native counterparts and allow for interesting regulatory patterns on the protein network level. We provide examples that splicing events may represent transitions between different folds in the protein sequence-structure space and explain these links by a common genetic mechanism. Taken together, these findings hint at a more prominent role of splicing in protein structure evolution and at a different view of phenotypic plasticity of protein structures. [Abstract/Link to Full Text]

Burhans WC, Weinberger M
DNA replication stress, genome instability and aging.
Nucleic Acids Res. 2007 Nov 30;
Genome instability is a fundamentally important component of aging in all eukaryotes. How age-related genome instability occurs remains unclear. The free radical theory of aging posits oxidative damage to DNA and other cellular constituents as a primary determinant of aging. More recent versions of this theory predict that mitochondria are a major source of reactive oxygen species (ROS) that cause oxidative damage. Although substantial support for the free radical theory exists, the results of some tests of this theory have been contradictory or inconclusive. Enhanced growth signaling also has been implicated in aging. Many efforts to understand the effects of growth signaling on aging have focused on inhibition of oxidative stress responses that impact oxidative damage. However, recent experiments in the model organism Saccharomyces cerevisiae (budding yeast) and in higher eukaryotes suggest that growth signaling also impacts aging and/or age-related diseases, including cancer and neurodegeneration, by inducing DNA replication stress, which causes DNA damage. Replication stress, which has not been broadly considered as a factor in aging, may be enhanced by ROS that signal growth. In this article, we review evidence that points to DNA replication stress and replication stress-induced genome instability as important factors in aging. [Abstract/Link to Full Text]

Benson ML, Smith RD, Khazanov NA, Dimcheff B, Beaver J, Dresslar P, Nerothin J, Carlson HA
Binding MOAD, a high-quality protein ligand database.
Nucleic Acids Res. 2007 Nov 30;
Binding MOAD (Mother of All Databases) is a database of 9836 protein-ligand crystal structures. All biologically relevant ligands are annotated, and experimental binding-affinity data is reported when available. Binding MOAD has almost doubled in size since it was originally introduced in 2004, demonstrating steady growth with each annual update. Several technologies, such as natural language processing, help drive this constant expansion. Along with increasing data, Binding MOAD has improved usability. The website now showcases a faster, more featured viewer to examine the protein-ligand structures. Ligands have additional chemical data, allowing for cheminformatics mining. Lastly, logins are no longer necessary, and Binding MOAD is freely available to all at http://www.BindingMOAD.org. [Abstract/Link to Full Text]

Bird C
Editorial.
Nucleic Acids Res. 2007;35(20): [Abstract/Link to Full Text]

Todd BA, Rau DC
Interplay of ion binding and attraction in DNA condensed by multivalent cations.
Nucleic Acids Res. 2007 Nov 29;
We have measured forces generated by multivalent cation-induced DNA condensation using single-molecule magnetic tweezers. In the presence of cobalt hexammine, spermidine, or spermine, stretched DNA exhibits an abrupt configurational change from extended to condensed. This occurs at a well-defined condensation force that is nearly equal to the condensation free energy per unit length. The multivalent cation concentration dependence for this condensation force gives the apparent number of multivalent cations that bind DNA upon condensation. The measurements show that the lower critical concentration for cobalt hexammine as compared to spermidine is due to a difference in ion binding, not a difference in the electrostatic energy of the condensed state as previously thought. We also show that the resolubilization of condensed DNA can be described using a traditional Manning-Oosawa cation adsorption model, provided that cation-anion pairing at high electrolyte concentrations is taken into account. Neither overcharging nor significant alterations in the condensed state are required to describe the resolubilization of condensed DNA. The same model also describes the spermidine(3+)/Na(+) phase diagram measured previously. [Abstract/Link to Full Text]

El-Shemerly M, Hess D, Pyakurel AK, Moselhy S, Ferrari S
ATR-dependent pathways control hEXO1 stability in response to stalled forks.
Nucleic Acids Res. 2007 Nov 29;
Nucleases play important roles in DNA synthesis, recombination and repair. We have previously shown that human exonuclease 1 (hEXO1) is phosphorylated in response to agents stalling DNA replication and that hEXO1 consequently undergoes ubiquitination and degradation in a proteasome-dependent manner. In the present study, we have addressed the identity of the pathway transducing stalled-replication signals to hEXO1. Using chemical inhibitors, RNA interference, ATM- and ATR-deficient cell lines we have concluded that hEXO1 phosphorylation is ATR-dependent. By means of mass spectrometry, we have identified the sites of phosphorylation in hEXO1 in undamaged cells and in cells treated with hydroxyurea (HU). hEXO1 is phosphorylated at nine basal sites and three additional sites are induced by HU treatment. Analysis of single- and multiple-point mutants revealed that mutation to Ala of the three HU-induced sites of phosphorylation partially rescued HU-dependent degradation of hEXO1 and additionally stabilized the protein in non-treated cells. We have raised an antibody to pS(714), an HU-induced site of the S/T-Q type, and we provide evidence that S(714) is phosphorylated upon HU but not IR treatment. The antibody may be a useful tool to monitor signal transduction events triggered by stalled DNA replication. [Abstract/Link to Full Text]

Draper WE, Hayden EJ, Lehman N
Mechanisms of covalent self-assembly of the Azoarcus ribozyme from four fragment oligonucleotides.
Nucleic Acids Res. 2007 Nov 29;
RNA oligomers of length 40-60 nt can self-assemble into covalent versions of the Azoarcus group I intron ribozyme. This process requires a series of recombination reactions in which the internal guide sequence of a nascent catalytic complex makes specific interactions with a complement triplet, CAU, in the oligomers. However, if the CAU were mutated, promiscuous self-assembly may be possible, lessening the dependence on a particular set of oligomer sequences. Here, we assayed whether oligomers containing mutations in the CAU triplet could still self-construct Azoarcus ribozymes. The mutations CAC, CAG, CUU and GAU all inhibited self-assembly to some degree, but did not block it completely in 100 mM MgCl(2). Oligomers containing the CAC mutation retained the most self-assembly activity, while those containing GAU retained the least, indicating that mutations more 5' in this triplet are the most deleterious. Self-assembly systems containing additional mutant locations were progressively less functional. Analyses of properly self-assembled ribozymes revealed that, of two recombination mechanisms possible for self-assembly, termed 'tF2' and 'R2F2', the simpler one-step 'tF2' mechanism is utilized when mutations exist. These data suggest that self-assembling systems are more facile than previously believed, and have relevance to the origin of complex ribozymes during the RNA World. [Abstract/Link to Full Text]

Nolan T, Cecere G, Mancone C, Alonzi T, Tripodi M, Catalanotto C, Cogoni C
The RNA-dependent RNA polymerase essential for post-transcriptional gene silencing in Neurospora crassa interacts with replication protein A.
Nucleic Acids Res. 2007 Nov 29;
Post-transcriptional gene silencing (PTGS) pathways play a role in genome defence and have been extensively studied, yet how repetitive elements in the genome are identified is still unclear. It has been suggested that they may produce aberrant transcripts (aRNA) that are converted by an RNA-dependent RNA polymerase (RdRP) into double-stranded RNA (dsRNA), the essential intermediate of PTGS. However, how RdRP enzymes recognize aberrant transcripts remains a key question. Here we show that in Neurospora crassa the RdRP QDE-1 interacts with Replication Protein A (RPA), part of the DNA replication machinery. We show that both QDE-1 and RPA are nuclear proteins and that QDE-1 is specifically recruited onto the repetitive transgenic loci. We speculate that this localization of QDE-1 could allow the in situ production of dsRNA using transgenic nascent transcripts as templates, as in other systems. Supporting a link between the two proteins, we found that the accumulation of short interfering RNAs (siRNAs), the hallmark of silencing, is dependent on ongoing DNA synthesis. The interaction between QDE-1 and RPA is important since it should guide further studies aimed at understanding the specificity of the RdRP and it provides for the first time a potential link between a PTGS component and the DNA replication machinery. [Abstract/Link to Full Text]

Masson P, Leimgruber E, Creton S, Collart MA
The dual control of TFIIB recruitment by NC2 is gene specific.
Nucleic Acids Res. 2007 Nov 29;
Negative co-factor 2 (NC2) is a conserved eukaryotic complex composed of two subunits, NC2alpha (Drap1) and NC2beta (Dr1) that associate through a histone-fold motif. In this work, we generated mutants of NC2, characterized target genes for these mutants and studied the assembly of NC2 and general transcription factors on target promoters. We determined that the two NC2 subunits mostly function together to be recruited to DNA and regulate gene expression. We found that NC2 strongly controls promoter association of TFIIB, both negatively and positively. We could attribute the gene-specific repressor effect of NC2 on TFIIB to the C-terminal domain of NC2beta, and define that it requires ORF sequences of the target gene. In contrast, the positive function of NC2 on TFIIB targets is more general and requires adequate levels of the NC2 histone-fold heterodimer on promoters. Finally, we determined that NC2 becomes limiting for TATA-binding protein (TBP) association with a heat inducible promoter under heat stress. This study demonstrates an important positive role of NC2 for formation of the pre-initiation complex on promoters, under normal conditions through control of TFIIB, or upon activation by stress via control of TBP. [Abstract/Link to Full Text]

Wishart DS, Knox C, Guo AC, Cheng D, Shrivastava S, Tzur D, Gautam B, Hassanali M
DrugBank: a knowledgebase for drugs, drug actions and drug targets.
Nucleic Acids Res. 2007 Dec 11;
DrugBank is a richly annotated resource that combines detailed drug data with comprehensive drug target and drug action information. Since its first release in 2006, DrugBank has been widely used to facilitate in silico drug target discovery, drug design, drug docking or screening, drug metabolism prediction, drug interaction prediction and general pharmaceutical education. The latest version of DrugBank (release 2.0) has been expanded significantly over the previous release. With approximately 4900 drug entries, it now contains 60% more FDA-approved small molecule and biotech drugs including 10% more 'experimental' drugs. Significantly, more protein target data has also been added to the database, with the latest version of DrugBank containing three times as many non-redundant protein or drug target sequences as before (1565 versus 524). Each DrugCard entry now contains more than 100 data fields with half of the information being devoted to drug/chemical data and the other half devoted to pharmacological, pharmacogenomic and molecular biological data. A number of new data fields, including food-drug interactions, drug-drug interactions and experimental ADME data have been added in response to numerous user requests. DrugBank has also significantly improved the power and simplicity of its structure query and text query searches. DrugBank is available at http://www.drugbank.ca. [Abstract/Link to Full Text]

Pollard LM, Bourn RL, Bidichandani SI
Repair of DNA double-strand breaks within the (GAA*TTC)n sequence results in frequent deletion of the triplet-repeat sequence.
Nucleic Acids Res. 2007 Nov 27;
Friedreich ataxia is caused by an expanded (GAA*TTC)(n) sequence, which is unstable during intergenerational transmission and in most patient tissues, where it frequently undergoes large deletions. We investigated the effect of DSB repair on instability of the (GAA*TTC)(n) sequence. Linear plasmids were transformed into Escherichia coli so that each colony represented an individual DSB repair event. Repair of a DSB within the repeat resulted in a dramatic increase in deletions compared with circular templates, but DSB repair outside the repeat tract did not affect instability. Repair-mediated deletions were independent of the orientation and length of the repeat, the location of the break within the repeat or the RecA status of the strain. Repair at the center of the repeat resulted in deletion of approximately half of the repeat tract, and repair at an off-center location produced deletions that were equivalent in length to the shorter of the two repeats flanking the DSB. This is consistent with a single-strand annealing mechanism of DSB repair, and implicates erroneous DSB repair as a mechanism for genetic instability of the (GAA*TTC)(n) sequence. Our data contrast significantly with DSB repair within (CTG*CAG)(n) repeats, indicating that repair-mediated instability is dependent on the sequence of the triplet repeat. [Abstract/Link to Full Text]
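
The deletion sizes reported in this abstract can be restated as a one-line rule: the deletion matches the shorter of the two repeat stretches flanking the break. The sketch below is only a toy restatement of that observation, not the authors' analysis; the function name, the use of whole repeat units, and the example tract length of 100 units are all hypothetical.

```python
def predicted_deletion_units(repeat_units: int, break_position: int) -> int:
    """Toy model: repeat units deleted when a double-strand break
    falls `break_position` units into a tract of `repeat_units` units.

    Assumption (from the abstract's summary only): the deletion equals
    the shorter of the two repeat stretches flanking the break.
    """
    if not 0 <= break_position <= repeat_units:
        raise ValueError("break_position must lie within the repeat tract")
    left_flank = break_position                   # units upstream of the break
    right_flank = repeat_units - break_position   # units downstream of the break
    return min(left_flank, right_flank)


if __name__ == "__main__":
    # Hypothetical 100-unit tract: a central break removes half the tract,
    # an off-center break removes the shorter flank.
    for position in (50, 20, 80):
        print(position, predicted_deletion_units(100, position))
```

Under this rule a break at the center of the tract gives a deletion of half the repeat, which agrees with the central-break result described in the abstract.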

Wheeler DL, Barrett T, Benson DA, Bryant SH, Canese K, Chetvernin V, Church DM, Dicuccio M, Edgar R, Federhen S, Feolo M, Geer LY, Helmberg W, Kapustin Y, Khovayko O, Landsman D, Lipman DJ, Madden TL, Maglott DR, Miller V, Ostell J, Pruitt KD, Schuler GD, Shumway M, Sequeira E, Sherry ST, Sirotkin K, Souvorov A, Starchenko G, Tatusov RL, Tatusova TA, Wagner L, Yaschenko E
Database resources of the National Center for Biotechnology Information.
Nucleic Acids Res. 2007 Nov 27;
In addition to maintaining the GenBank(R) nucleic acid sequence database, the National Center for Biotechnology Information (NCBI) provides analysis and retrieval resources for the data in GenBank and other biological data available through NCBI's web site. NCBI resources include Entrez, the Entrez Programming Utilities, My NCBI, PubMed, PubMed Central, Entrez Gene, the NCBI Taxonomy Browser, BLAST, BLAST Link, Electronic PCR, OrfFinder, Spidey, Splign, RefSeq, UniGene, HomoloGene, ProtEST, dbMHC, dbSNP, Cancer Chromosomes, Entrez Genome, Genome Project and related tools, the Trace, Assembly, and Short Read Archives, the Map Viewer, Model Maker, Evidence Viewer, Clusters of Orthologous Groups, Influenza Viral Resources, HIV-1/Human Protein Interaction Database, Gene Expression Omnibus, Entrez Probe, GENSAT, Database of Genotype and Phenotype, Online Mendelian Inheritance in Man, Online Mendelian Inheritance in Animals, the Molecular Modeling Database, the Conserved Domain Database, the Conserved Domain Architecture Retrieval Tool and the PubChem suite of small molecule databases. Augmenting the web applications are custom implementations of the BLAST program optimized to search specialized data sets. These resources can be accessed through the NCBI home page at www.ncbi.nlm.nih.gov. [Abstract/Link to Full Text]

Amen M, Espinoza HM, Cox C, Liang X, Wang J, Link TM, Brennan RG, Martin JF, Amendt BA
Chromatin-associated HMG-17 is a major regulator of homeodomain transcription factor activity modulated by Wnt/beta-catenin signaling.
Nucleic Acids Res. 2007 Nov 27;
Homeodomain (HD) transcriptional activities are tightly regulated during embryogenesis and require protein interactions for their spatial and temporal activation. The chromatin-associated high mobility group protein (HMG-17) is associated with transcriptionally active chromatin; however, its role in regulating gene expression is unclear. This report reveals a unique strategy in which HMG-17 acts as a molecular switch regulating HD transcriptional activity. The switch utilizes the Wnt/beta-catenin signaling pathway and adds to the diverse functions of beta-catenin. A high-affinity HMG-17 interaction with the PITX2 HD protein inhibits PITX2 DNA-binding activity. The HMG-17/PITX2 inactive complex is concentrated to specific nuclear regions primed for active transcription. beta-Catenin forms a ternary complex with PITX2/HMG-17 to switch it from a repressor to an activator complex. Without beta-catenin, HMG-17 can physically remove PITX2 from DNA to inhibit its transcriptional activity. The PITX2/HMG-17 regulatory complex acts independently of promoter targets and is a general mechanism for the control of HD transcriptional activity. HMG-17 is developmentally regulated and its unique role during embryogenesis is revealed by the early embryonic lethality of HMG-17 homozygous mice. This mechanism provides a new role for canonical Wnt/beta-catenin signaling in regulating HD transcriptional activity during development using HMG-17 as a molecular switch. [Abstract/Link to Full Text]

Bardin C, Leroy JL
The formation pathway of tetramolecular G-quadruplexes.
Nucleic Acids Res. 2007 Nov 27;
Oligonucleotides containing guanosine stretches associate into tetrameric structures stabilized by monovalent ions. In order to describe the sequence of reactions leading to association of four identical strands, we measured by NMR the formation and dissociation rates of (TGnT)(4) quadruplexes (n = 3-6), their dissociation constants and the reaction orders for quadruplex formation. The quadruplex formation rates increase with the salt concentration but depend only weakly on the nature (K(+), Na(+) or Li(+)) of the counter ions. The activation energies for quadruplex formation are negative. The quadruplex lifetimes strongly increase with the G-tract length and are much longer in K(+) solution than in Na(+) or Li(+) solutions. The reaction order for quadruplex formation is 3 in 0.125 M KCl and 4 in LiCl solutions. The kinetics measurements suggest that quadruplex formation proceeds step by step via sequential strand association into duplex and triplex intermediate species. Triplex formation is rate limiting in 0.125 M KCl solution. In LiCl, each step of the association process depends on the strand concentration. Parallel reactions to formation of the fully matched canonical quadruplex may result in kinetically trapped mismatched quadruplexes, making the canonical quadruplex practically inaccessible, in particular at low temperature in KCl solution. [Abstract/Link to Full Text]


The Universal Protein Resource (UniProt).
Nucleic Acids Res. 2007 Nov 27;
The Universal Protein Resource (UniProt) provides a stable, comprehensive, freely accessible, central resource on protein sequences and functional annotation. The UniProt Consortium is a collaboration between the European Bioinformatics Institute (EBI), the Protein Information Resource (PIR) and the Swiss Institute of Bioinformatics (SIB). The core activities include manual curation of protein sequences assisted by computational analysis, sequence archiving, development of a user-friendly UniProt website, and the provision of additional value-added information through cross-references to other databases. UniProt is comprised of four major components, each optimized for different uses: the UniProt Knowledgebase, the UniProt Reference Clusters, the UniProt Archive and the UniProt Metagenomic and Environmental Sequences database. UniProt is updated and distributed every three weeks, and can be accessed online for searches or download at http://www.uniprot.org. [Abstract/Link to Full Text]

Dönitz J, Goemann B, Lizé M, Michael H, Sasse N, Wingender E, Potapov AP
EndoNet: an information resource about regulatory networks of cell-to-cell communication.
Nucleic Acids Res. 2007 Nov 27;
EndoNet is an information resource about intercellular regulatory communication. It provides information about hormones, hormone receptors, the sources (i.e. cells, tissues and organs) where the hormones are synthesized and secreted, and where the respective receptors are expressed. The database focuses on the regulatory relations between them. An elementary communication is displayed as a causal link from a cell that secretes a particular hormone to those cells which express the corresponding hormone receptor and respond to the hormone. Whenever expression, synthesis and/or secretion of another hormone are part of this response, it renders the corresponding cell an internal node of the resulting network. This intercellular communication network coordinates the function of different organs. Therefore, the database covers the hierarchy of cellular organization of tissues and organs as it has been modeled in the Cytomer ontology, which has now been directly embedded into EndoNet. The user can query the database; the results can be used to visualize the intercellular information flow. A newly implemented hormone classification enables browsing of the database and may be used as an alternative entry point. EndoNet is accessible at: http://endonet.bioinf.med.uni-goettingen.de/ [Abstract/Link to Full Text]


Recent Articles in Genome Research

Roca X, Olson AJ, Rao AR, Enerly E, Kristensen VN, Børresen-Dale AL, Andresen BS, Krainer AR, Sachidanandam R
Features of 5'-splice-site efficiency derived from disease-causing mutations and comparative genomics.
Genome Res. 2007 Nov 21;
Many human diseases, including Fanconi anemia, hemophilia B, neurofibromatosis, and phenylketonuria, can be caused by 5'-splice-site (5'ss) mutations that are not predicted to disrupt splicing, according to position weight matrices. By using comparative genomics, we identify pairwise dependencies between 5'ss nucleotides as a conserved feature of the entire set of 5'ss. These dependencies are also conserved in human-mouse pairs of orthologous 5'ss. Many disease-associated 5'ss mutations disrupt these dependencies, as can some human SNPs that appear to alter splicing. The consistency of the evidence signifies the relevance of this approach and suggests that 5'ss SNPs play a role in complex diseases. [Abstract/Link to Full Text]
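
A position weight matrix scores each nucleotide position independently, so it cannot capture the pairwise dependencies this abstract refers to. The abstract does not say how the dependencies were measured; the sketch below uses mutual information between two alignment columns purely as one common way to quantify such a dependency. The function name and the toy sequences are invented for illustration and are not data from the study.

```python
from collections import Counter
from math import log2


def mutual_information(seqs: list[str], i: int, j: int) -> float:
    """Mutual information (in bits) between columns i and j of
    equal-length aligned sequences.

    A value near 0 means the two positions vary independently, which is
    the assumption a position weight matrix makes; larger values mean
    the nucleotide at one position predicts the nucleotide at the other.
    """
    n = len(seqs)
    col_i = Counter(s[i] for s in seqs)           # marginal counts, column i
    col_j = Counter(s[j] for s in seqs)           # marginal counts, column j
    pairs = Counter((s[i], s[j]) for s in seqs)   # joint counts
    return sum(
        (c / n) * log2((c / n) / ((col_i[a] / n) * (col_j[b] / n)))
        for (a, b), c in pairs.items()
    )


if __name__ == "__main__":
    # Invented two-column toy alignments, not real splice sites.
    coupled = ["AA", "CC", "AA", "CC"]        # columns always agree
    independent = ["AA", "AC", "CA", "CC"]    # every combination once
    print(mutual_information(coupled, 0, 1))
    print(mutual_information(independent, 0, 1))
```

In the toy data the per-column nucleotide frequencies are the same in both alignments, so a position weight matrix could not tell them apart, while the pairwise measure can.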

Korshunova Y, Maloney RK, Lakey N, Citek RW, Bacher B, Budiman A, Ordway JM, McCombie WR, Leon J, Jeddeloh JA, McPherson JD
Massively parallel bisulphite pyrosequencing reveals the molecular complexity of breast cancer-associated cytosine-methylation patterns obtained from tissue and serum DNA.
Genome Res. 2007 Nov 21;
Cytosine-methylation changes are stable and thought to be among the earliest events in tumorigenesis. Theoretically, DNA carrying tumor-specifying methylation patterns escape the tumors and may be found circulating in the sera from cancer patients, thus providing the basis for development of noninvasive clinical tests for early cancer detection. Indeed, using methylation-specific PCR-based techniques, several groups reported the detection of tumor-associated methylated DNA in the sera from cancer patients with varying clinical success. However, by design, such analytical approaches allow assessment of the presence of molecules with only one methylation pattern, leaving the bigger picture unexplored. The limited knowledge about circulating DNA methylation patterns hinders the efficient development of clinical methylation tests and testing platforms. Here, we report the results of a comprehensive methylation pattern analysis from breast cancer clinical tissues and sera obtained using massively parallel bisulphite pyrosequencing. The four loci studied were recently discovered by our group, and demonstrated to be powerful epigenetic biomarkers of breast cancer. The detailed analysis of more than 700,000 DNA fragments derived from more than 50 individuals (cancer and cancer-free) revealed an unappreciated complexity of genomic cytosine-methylation patterns in both tissue derived and circulating DNAs. Both tumor and cancer-free tissues (as well as sera) contained molecules with nearly every conceivable cytosine-methylation pattern at each locus. Tumor samples displayed more variation in methylation level than normal samples. Importantly, by establishing the methylation landscape within circulating DNA, this study has better defined the development challenges facing DNA methylation-based cancer-detection tests. [Abstract/Link to Full Text]

Torres TT, Metta M, Ottenwälder B, Schlötterer C
Gene expression profiling by massively parallel sequencing.
Genome Res. 2007 Nov 21;
Massively parallel sequencing holds great promise for expression profiling, as it combines the high throughput of SAGE with the accuracy of EST sequencing. Nevertheless, until now only very limited information had been available on the suitability of the current technology to meet the requirements. Here, we evaluate the potential of 454 sequencing technology for expression profiling using Drosophila melanogaster. We show that short (< approximately 80 bp) and long (> approximately 300-400 bp) cDNA fragments are under-represented in 454 sequence reads. Nevertheless, sequencing of 3' cDNA fragments generated by nebulization could be used to overcome the length bias of the 454 sequencing technology. Gene expression measurements generated by restriction analysis and nebulization for fragments within the 80- to 300-bp range showed correlations similar to those reported for replicated microarray experiments (0.83-0.91); 97% of the cDNA fragments could be unambiguously mapped to the genomic DNA, demonstrating the advantage of longer sequence reads. Our analyses suggest that the 454 technology has a large potential for expression profiling, and the high mapping accuracy indicates that it should be possible to compare expression profiles across species. [Abstract/Link to Full Text]

Rasmussen MD, Kellis M
Accurate gene-tree reconstruction by learning gene- and species-specific substitution rates across multiple complete genomes.
Genome Res. 2007 Dec;17(12):1932-42.
Comparative genomics provides a general methodology for discovering functional DNA elements and understanding their evolution. The availability of many related genomes enables more powerful analyses, but requires rigorous phylogenetic methods to resolve orthologous genes and regions. Here, we use 12 recently sequenced Drosophila genomes and nine fungal genomes to address the problem of accurate gene-tree reconstruction across many complete genomes. We show that existing phylogenetic methods that treat each gene tree in isolation show large-scale inaccuracies, largely due to insufficient phylogenetic information in individual genes. However, we find that gene trees exhibit common properties that can be exploited for evolutionary studies and accurate phylogenetic reconstruction. Evolutionary rates can be decoupled into gene-specific and species-specific components, which can be learned across complete genomes. We develop a phylogenetic reconstruction methodology that exploits these properties and achieves significantly higher accuracy, addressing the species-level heterotachy and enabling studies of gene evolution in the context of species evolution. [Abstract/Link to Full Text]

Engström PG, Ho Sui SJ, Drivenes O, Becker TS, Lenhard B
Genomic regulatory blocks underlie extensive microsynteny conservation in insects.
Genome Res. 2007 Dec;17(12):1898-908.
Insect genomes contain larger blocks of conserved gene order (microsynteny) than would be expected under a random breakage model of chromosome evolution. We present evidence that microsynteny has been retained to keep large arrays of highly conserved noncoding elements (HCNEs) intact. These arrays span key developmental regulatory genes, forming genomic regulatory blocks (GRBs). We recently described GRBs in vertebrates, where most HCNEs function as enhancers and HCNE arrays specify complex expression programs of their target genes. Here we present a comparison of five Drosophila genomes showing that HCNE density peaks centrally in large synteny blocks containing multiple genes. Besides developmental regulators that are likely targets of HCNE enhancers, HCNE arrays often span unrelated neighboring genes. We describe differences in core promoters between the target genes and the unrelated genes that offer an explanation for the differences in their responsiveness to enhancers. We show examples of a striking correspondence between boundaries of synteny blocks, HCNE arrays, and Polycomb binding regions, confirming that the synteny blocks correspond to regulatory domains. Although few noncoding elements are highly conserved between Drosophila and the malaria mosquito Anopheles gambiae, we find that A. gambiae regions orthologous to Drosophila GRBs contain an equivalent distribution of noncoding elements highly conserved in the yellow fever mosquito Aëdes aegypti and coincide with regions of ancient microsynteny between Drosophila and mosquitoes. The structural and functional equivalence between insect and vertebrate GRBs marks them as an ancient feature of metazoan genomes and as a key to future studies of development and gene regulation. [Abstract/Link to Full Text]

Heger A, Ponting CP
Evolutionary rate analyses of orthologs and paralogs from 12 Drosophila genomes.
Genome Res. 2007 Dec;17(12):1837-49.
The newly sequenced genome sequences of 11 Drosophila species provide the first opportunity to investigate variations in evolutionary rates across a clade of closely related species. Protein-coding genes were predicted using established Drosophila melanogaster genes as templates, with recovery rates ranging from 81%-97% depending on species divergence and on genome assembly quality. Orthology and paralogy assignments were shown to be self-consistent among the different Drosophila species and to be consistent with regions of conserved gene order (synteny blocks). Next, we investigated the rates of diversification among these species' gene repertoires with respect to amino acid substitutions and to gene duplications. Constraints on amino acid sequences appear to have been most pronounced on D. ananassae and least pronounced on D. simulans and D. erecta terminal lineages. Codons predicted to have been subject to positive selection were found to be significantly over-represented among genes with roles in immune response and RNA metabolism, with the latter category including each subunit of the Dicer-2/r2d2 heterodimer. The vast majority of gene duplications (96.5%) and synteny rearrangements were found to occur, as expected, within single Müller elements. We show that the rate of ancient gene duplications was relatively uniform. However, gene duplications in terminal lineages are strongly skewed toward very recent events, consistent with either a rapid-birth and rapid-death model or the presence of large proportions of copy number variable genes in these Drosophila populations. Duplications were significantly more frequent among trypsin-like proteases and DM8 putative lipid-binding domain proteins. [Abstract/Link to Full Text]

Villasante A, Abad JP, Planelló R, Méndez-Lago M, Celniker SE, de Pablos B
Drosophila telomeric retrotransposons derived from an ancestral element that was recruited to replace telomerase.
Genome Res. 2007 Dec;17(12):1909-18.
Drosophila telomeres do not have arrays of simple telomerase-generated G-rich repeats. Instead, Drosophila maintains its telomeres by occasional transposition of specific non-long terminal repeat (non-LTR) retrotransposons to chromosome ends. The genus Drosophila provides a superb model system for comparative telomere analysis. Here we present an evolutionary study of Drosophila telomeric elements to ascertain the significance of telomeric retrotransposons (TRs) in the maintenance of Drosophila telomeres. PCR and in silico surveys in the sibling species of Drosophila melanogaster and in more distantly related species show that multiple TRs maintain telomeres in Drosophila. In addition to TRs with two open reading frames (ORFs) capable of autonomous transposition, there are deleted telomeric retrotransposons that have lost their ORF2, which we refer to as half telomeric-retrotransposons (HTRs). The phylogenetic relationship among these telomeric elements is congruent with the phylogeny of the species, suggesting that they have been vertically inherited from a common ancestor. Our results suggest that an existing non-LTR retrotransposon was recruited to perform the cellular function of telomere maintenance. [Abstract/Link to Full Text]

Stage DE, Eickbush TH
Sequence variation within the rRNA gene loci of 12 Drosophila species.
Genome Res. 2007 Dec;17(12):1888-97.
Concerted evolution maintains at near identity the hundreds of tandemly arrayed ribosomal RNA (rRNA) genes and their spacers present in any eukaryote. Few comprehensive attempts have been made to directly measure the identity between the rDNA units. We used the original sequencing reads (trace archives) available through the whole-genome shotgun sequencing projects of 12 Drosophila species to locate the sequence variants within the 7.8-8.2 kb transcribed portions of the rDNA units. Three to 18 variants were identified in >3% of the total rDNA units from 11 species. Species where the rDNA units are present on multiple chromosomes exhibited only minor increases in sequence variation. Variants were 10-20 times more abundant in the noncoding compared with the coding regions of the rDNA unit. Within the coding regions, variants were three to eight times more abundant in the expansion compared with the conserved core regions. The distribution of variants was largely consistent with models of concerted evolution in which there is uniform recombination across the transcribed portion of the unit with the frequency of standing variants dependent upon the selection pressure to preserve that sequence. However, the 28S gene was found to contain fewer variants than the 18S gene despite evolving 2.5-fold faster. We postulate that the fewer variants in the 28S gene is due to localized gene conversion or DNA repair triggered by the activity of retrotransposable elements that are specialized for insertion into the 28S genes of these species. [Abstract/Link to Full Text]

Stark A, Kheradpour P, Parts L, Brennecke J, Hodges E, Hannon GJ, Kellis M
Systematic discovery and characterization of fly microRNAs using 12 Drosophila genomes.
Genome Res. 2007 Dec;17(12):1865-79.
MicroRNAs (miRNAs) are short regulatory RNAs that inhibit target genes by complementary binding in 3' untranslated regions (3' UTRs). They are one of the most abundant classes of regulators, targeting a large fraction of all genes, making their comprehensive study a requirement for understanding regulation and development. Here we use 12 Drosophila genomes to define structural and evolutionary signatures of miRNA hairpins, which we use for their de novo discovery. We predict >41 novel miRNA genes, which encompass many unique families, and 28 of which are validated experimentally. We also define signals for the precise start position of mature miRNAs, which suggest corrections of previously known miRNAs, often leading to drastic changes in their predicted target spectrum. We show that miRNA discovery power scales with the number and divergence of species compared, suggesting that such approaches can be successful in human as dozens of mammalian genomes become available. Interestingly, for some miRNAs sense and anti-sense hairpins score highly and mature miRNAs from both strands can indeed be found in vivo. Similarly, miRNAs with weak 5' end predictions show increased in vivo processing of multiple alternate 5' ends and have fewer predicted targets. Lastly, we show that several miRNA star sequences score highly and are likely functional. For mir-10 in particular, both arms show abundant processing, and both show highly conserved target sites in Hox genes, suggesting a possible cooperation of the two arms, and their role as a master Hox regulator. [Abstract/Link to Full Text]

Ruby JG, Stark A, Johnston WK, Kellis M, Bartel DP, Lai EC
Evolution, biogenesis, expression, and target predictions of a substantially expanded set of Drosophila microRNAs.
Genome Res. 2007 Dec;17(12):1850-64.
MicroRNA (miRNA) genes give rise to small regulatory RNAs in a wide variety of organisms. We used computational methods to predict miRNAs conserved among Drosophila species and large-scale sequencing of small RNAs from Drosophila melanogaster to experimentally confirm and complement these predictions. In addition to validating 20 of our top 45 predictions for novel miRNA loci, the large-scale sequencing identified many miRNAs that had not been predicted. In total, 59 novel genes were identified, increasing our tally of confirmed fly miRNAs to 148. The large-scale sequencing also refined the identities of previously known miRNAs and provided insights into their biogenesis and expression. Many miRNAs were expressed in particular developmental contexts, with a large cohort of miRNAs expressed primarily in imaginal discs. Conserved miRNAs typically were expressed more broadly and robustly than were nonconserved miRNAs, and those conserved miRNAs with more restricted expression tended to have fewer predicted targets than those expressed more broadly. Predicted targets for the expanded set of microRNAs substantially increased and revised the miRNA-target relationships that appear conserved among the fly species. Insights were also provided into miRNA gene evolution, including evidence for emergent regulatory function deriving from the opposite arm of the miRNA hairpin, exemplified by mir-10, and even the opposite strand of the DNA, exemplified by mir-iab-4. [Abstract/Link to Full Text]

Lin MF, Carlson JW, Crosby MA, Matthews BB, Yu C, Park S, Wan KH, Schroeder AJ, Gramates LS, St Pierre SE, Roark M, Wiley KL, Kulathinal RJ, Zhang P, Myrick KV, Antone JV, Celniker SE, Gelbart WM, Kellis M
Revisiting the protein-coding gene catalog of Drosophila melanogaster using 12 fly genomes.
Genome Res. 2007 Dec;17(12):1823-36.
The availability of sequenced genomes from 12 Drosophila species has enabled the use of comparative genomics for the systematic discovery of functional elements conserved within this genus. We have developed quantitative metrics for the evolutionary signatures specific to protein-coding regions and applied them genome-wide, resulting in 1193 candidate new protein-coding exons in the D. melanogaster genome. We have reviewed these predictions by manual curation and validated a subset by directed cDNA screening and sequencing, revealing both new genes and new alternative splice forms of known genes. We also used these evolutionary signatures to evaluate existing gene annotations, resulting in the validation of 87% of genes lacking descriptive names and identifying 414 poorly conserved genes that are likely to be spurious predictions, noncoding, or species-specific genes. Furthermore, our methods suggest a variety of refinements to hundreds of existing gene models, such as modifications to translation start codons and exon splice boundaries. Finally, we performed directed genome-wide searches for unusual protein-coding structures, discovering 149 possible examples of stop codon readthrough, 125 new candidate ORFs of polycistronic mRNAs, and several candidate translational frameshifts. These results affect >10% of annotated fly genes and demonstrate the power of comparative genomics to enhance our understanding of genome organization, even in a model organism as intensively studied as Drosophila melanogaster. [Abstract/Link to Full Text]

Bhutkar A, Russo SM, Smith TF, Gelbart WM
Genome-scale analysis of positionally relocated genes.
Genome Res. 2007 Dec;17(12):1880-7.
During evolution, genome reorganization includes large-scale events such as inversions, translocations, and segmental or even whole-genome duplications, as well as fine-scale events such as the relocation of individual genes. This latter category, which we will refer to as positionally relocated genes (PRGs), is the subject of this report. Assessment of the magnitude of such PRGs and of possible contributing mechanisms is aided by a comparative analysis of related genomes, where conserved chromosomal organization can aid in identifying genes that have acquired a new location in a lineage of these genomes. Here we utilize two methods to comprehensively identify relocated protein-coding genes in the recently sequenced genomes of 12 species of genus Drosophila. We use exceptions to the general rule of maintenance of chromosome arm (Muller element) association for most Drosophila genes to identify one major class of PRGs. We also identify a partially overlapping set of PRGs among "embedded genes," located within the extents of other surrounding genes. We provide evidence that PRG movements have at least two different origins: Some events occur via retrotransposition of processed RNAs and others via a DNA-based transposition mechanism. Overall, we identify several hundred PRGs that arose within a lineage of the genus Drosophila phylogeny and provide suggestive evidence that a few thousand such events have occurred within the radiation of the insect order Diptera, thereby illustrating the magnitude of the contribution of PRG movement to chromosomal reorganization during evolution. [Abstract/Link to Full Text]

Kheradpour P, Stark A, Roy S, Kellis M
Reliable prediction of regulator targets using 12 Drosophila genomes.
Genome Res. 2007 Dec;17(12):1919-31.
Gene expression is regulated pre- and post-transcriptionally via cis-regulatory DNA and RNA motifs. Identification of individual functional instances of such motifs in genome sequences is a major goal for inferring regulatory networks yet has been hampered due to the motifs' short lengths that lead to many chance matches and poor signal-to-noise ratios. In this paper, we develop a general methodology for the comparative identification of functional motif instances across many related species, using a phylogenetic framework that accounts for the evolutionary relationships between species, allows for motif movements, and is robust against missing data due to artifacts in sequencing, assembly, or alignment. We also provide a robust statistical framework for evaluating motif confidence, which enables us to translate evolutionary conservation into a confidence measure for each motif instance, correcting for varying motif length, composition, and background conservation of the target regions. We predict targets of fly transcription factors and miRNAs in alignments of 12 recently sequenced Drosophila species. When compared to extensive genome-wide experimental data, predicted targets are of high quality, matching and surpassing ChIP-chip microarrays and recovering miRNA targets with high sensitivity. The resulting regulatory network suggests significant redundancy between pre- and post-transcriptional regulation of gene expression. [Abstract/Link to Full Text]

Cornell MJ, Alam I, Soanes DM, Wong HM, Hedeler C, Paton NW, Rattray M, Hubbard SJ, Talbot NJ, Oliver SG
Comparative genome analysis across a kingdom of eukaryotic organisms: Specialization and diversification in the Fungi.
Genome Res. 2007 Dec;17(12):1809-22.
The recent proliferation of genome sequencing in diverse fungal species has provided the first opportunity for comparative genome analysis across a eukaryotic kingdom. Here, we report a comparative study of 34 complete fungal genome sequences, representing a broad diversity of Ascomycete, Basidiomycete, and Zygomycete species. We have clustered all predicted protein-encoding gene sequences from these species to provide a means of investigating gene innovations, gene family expansions, protein family diversification, and the conservation of essential gene functions (empirically determined in Saccharomyces cerevisiae) among the fungi. The results are presented with reference to a phylogeny of the 34 fungal species, based on 29 universally conserved protein-encoding gene sequences. We contrast this phylogeny with one based on gene presence and absence and show that, while the two phylogenies are largely in agreement, there are differences in the positioning of some species. We have investigated levels of gene duplication and demonstrate that this varies greatly between fungal species, although there are instances of coduplication in distantly related fungi. We have also investigated the extent of orthology for protein families and demon