MetaDB: A Metadatabase for the Biological Sciences, brought to you by Neurotransmitter.net |
| 1. Glomerular Activity Response Archive for the Rat Olfactory Bulb |
URL: http://leonlab.bio.uci.edu/ Categories: Neuroscience Databases "This archive contains the averaged activity maps that we have generated from the glomerular response to selected odorants in rat olfactory bulbs, as assessed by [14C]2-deoxyglucose uptake. These response profiles may be searched either by the Chemical Abstracts Service Registry Number (CAS number) of the odorants, odorant name, chemical formula, or chemical features. A detailed description of the procedure used to generate the response maps is provided, along with templates for duplication of the technique. Finally, a profile of identified olfactory bulb glomerular response modules is provided. This template may be printed on a transparent page to allow individualized comparisons in response patterns among odorants to be made." |
| 2. Brain Biodiversity Bank |
URL: http://brancusi.usc.edu/bkms/ Categories: Neuroscience Databases "The Brain Biodiversity Bank refers to the repository of images of and information about brain specimens contained in the collections associated with the National Museum of Health and Medicine at the Armed Forces Institute of Pathology in Washington, DC. These collections include, besides the Michigan State University Collection, the Welker Collection from the University of Wisconsin, the Yakovlev-Haleem Collection from Harvard University, the Meyer Collection from the Johns Hopkins University, and the Huber-Crosby and Crosby-Lauer Collections from the University of Michigan. Our purpose here is to provide some examples of ways in which images and information from the Collections, in digital electronic format, can be used in educational, research and commercial enterprises." |
| 3. BrainInfo: A Primate Brain Information System |
URL: http://braininfo.rprc.washington.edu/ Categories: Neuroscience Databases "BrainInfo is a website that helps one identify structures in the brain and provides many different kinds of information about each structure. It consists of three basic knowledge bases: NeuroNames, which provides the index to brain structures and narrative information about them; the Template Atlas, which shows the structures that are found in the primate brain; and NeuroMaps, a set of several hundred overlays that will show the location of different kinds of information that have been mapped to the standard background maps (templates) of the Atlas. Information about brain structures in other species, particularly the human, is provided by links to other websites." |
| 4. BrainWeb: Simulated Brain Database |
URL: http://www.bic.mni.mcgill.ca/brainweb/ Categories: Neuroscience Databases "As the interest in the computer-aided, quantitative analysis of medical image data is growing, the need for the validation of such techniques is also increasing. Unfortunately, there exists no `ground truth' or gold standard for the analysis of in vivo acquired data. These pages provide a solution to the validation problem, in the form of a Simulated Brain Database (SBD). The SBD contains a set of realistic MRI data volumes produced by an MRI simulator. These data can be used by the neuroimaging community to evaluate the performance of various image analysis methods in a setting where the truth is known. Currently, the SBD contains simulated brain MRI data based on two anatomical models: normal and multiple sclerosis (MS). For both of these, full 3-dimensional data volumes have been simulated using three sequences (T1-, T2-, and proton-density- (PD-) weighted) and a variety of slice thicknesses, noise levels, and levels of intensity non-uniformity. These data are available for viewing in three orthogonal views (transversal, sagittal, and coronal), and for downloading." |
| 5. CoCoDat: Collation of Cortical [single neuron + neuronal microcircuitry] Data |
URL: http://www.cocomac.org/cocodat/ Categories: Neuroscience Databases "CoCoDat is a microcircuitry database that contains not only bibliographic references, but also data and parameter values from published experimental reports. The data characterize the experimental procedures, the brain structure (region, layer, neuron type and cellular compartment), as well as the experimental results obtained in the six categories: Morphology, Firing properties, Ionic currents, Ionic conductances, Synaptic currents, and Connectivity." |
| 6. CoCoMac |
URL: http://cocomac.org/ Categories: Neuroscience Databases "CoCoMac (Collations of Connectivity data on the Macaque brain) is our approach to produce a systematic record of the known wiring of the primate brain. The main database contains details of hundreds of tracing studies in their original descriptions. Further data are continuously added. To overcome the problem of divergent brain maps we developed ORT (Objective Relational Transformation), an algorithmic method to convert data in a coordinate-independent way based on logical relations between areas in different brain maps. We use CoCoMac data to analyse the organisation of the cerebral cortex, and to establish its structure-function relationships. This includes multi-variate statistics and computer simulation of models that take into account the real anatomy of the primate cerebral cortex." |
| 7. The Talairach Daemon |
URL: http://ric.uthscsa.edu/RIC_WWW.data/Components/talairach/talairachdaemon.html Categories: Neuroscience Databases "The Talairach Daemon (TD) is a high-speed database server for querying and retrieving data about human brain structure over the internet. The core components of this server are a unique memory-resident application and memory-resident databases. The memory-resident design of the TD server provides high-speed access to its data. This is supported by using TCP/IP sockets for communications and by minimizing the amount of data transferred during transactions. By keeping most transactions to a low number of bytes (less than 50 generally), even slow throughput network transfers (1 Kbyte/sec) should have reasonable response times. The TD server data is searched using x-y-z coordinates resolved to 1x1x1 mm volume elements within a standardized stereotaxic space. An array, indexed by x-y-z coordinates, that spans 170 mm (x), 210 mm (y) and 200 mm (z), provides high-speed access to data. Array dimensions were selected to be approximately 25% larger than those of the Co-planar Stereotaxic Atlas of the Human Brain (Talairach and Tournoux, 1988). Coordinates tracked by the TD server are spatially consistent with the Talairach Atlas. Each array location stores a pointer to a relation record that holds data describing what is present at the corresponding coordinate. Presently, the data in relation records are either Structure Probability Maps (SP Maps) or Talairach Atlas Labels, though others can be easily added. The relation records are implemented as linked lists to names and values for brain structures." |
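The coordinate-indexed lookup described for the Talairach Daemon can be illustrated with a minimal Python sketch. It is not the TD server's code. The array extents (170 x 210 x 200 at 1 mm resolution) come from the description above; the origin offsets, the class and method names, and the use of a list of (name, value) pairs in place of the linked lists are all assumptions made for illustration.

```python
from array import array

# Extents in mm, taken from the description; 1 mm voxels give one slot per mm.
DIM_X, DIM_Y, DIM_Z = 170, 210, 200

# Hypothetical offsets that map signed stereotaxic coordinates to array
# indices. The description does not say where the origin sits in the array.
OFF_X, OFF_Y, OFF_Z = 85, 120, 80

NO_RECORD = -1  # marker for voxels with no relation record


class CoordinateIndex:
    """Memory-resident array mapping x-y-z voxels to relation records."""

    def __init__(self):
        # One small integer per voxel: an index into self._records.
        self._slots = array("i", [NO_RECORD]) * (DIM_X * DIM_Y * DIM_Z)
        self._records = []

    def _flat(self, x, y, z):
        # Resolve to 1x1x1 mm voxels, shift by the assumed origin offsets.
        i, j, k = round(x) + OFF_X, round(y) + OFF_Y, round(z) + OFF_Z
        if not (0 <= i < DIM_X and 0 <= j < DIM_Y and 0 <= k < DIM_Z):
            raise ValueError("coordinate lies outside the indexed volume")
        return (i * DIM_Y + j) * DIM_Z + k

    def add_record(self, names_and_values):
        # A relation record here is a list of (name, value) pairs.
        self._records.append(list(names_and_values))
        return len(self._records) - 1

    def assign(self, x, y, z, record_id):
        self._slots[self._flat(x, y, z)] = record_id

    def lookup(self, x, y, z):
        record_id = self._slots[self._flat(x, y, z)]
        return None if record_id == NO_RECORD else self._records[record_id]
```

A lookup is a single index computation followed by one array read, which is why a memory-resident array of this kind can answer quickly. On the network side, a 50-byte reply over a 1 Kbyte/sec link takes roughly 0.05 seconds to transfer, consistent with the response-time claim above.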
| 8. International Consortium for Brain Mapping (ICBM) Subject Database |
URL: https://services.loni.ucla.edu/ida/login.jsp?project=ICBM&search=true Categories: Neuroscience Databases "The ICBM Subject Database has been constructed to provide an effective means for archival and protection of collaborator collected image data. The goal of this software is to provide a convenient mechanism for searching the existence of particular image data while protecting its usage at the same time. We have built the appropriate database query mechanisms to ensure that no image data or identifying patient information is accessible to the outside world or to any others without the appropriate authorization and the expressed permission to release data from the collaborator that acquired and provided the data. The ICBM Subject Database may be queried using a combination of demographic and image-related attributes. Authorized investigations may form collections of images to download." |
| 9. The GENESIS Neural Database and Modeler's Workspace ChannelDB |
URL: http://www.genesis-sim.org/hbp/ Categories: Neuroscience Databases "A realistic neuronal model represents a modeler's understanding of the structure and function of a part of the nervous system. As the number of neurobiologists constructing realistic models continues to grow, and as the models become ever more sophisticated, they collectively represent a significant accumulation of knowledge about the structural and functional organization of nervous systems. But at the same time, locating appropriate models and interpreting them becomes increasingly more difficult as the number of online model and experimental databases grows. The central motivation for the Modeler's Workspace project is to address these problems. With support from The Human Brain Project, we began by exploring the construction of a brain database based on our existing neural simulation system, GENESIS. This was a feasibility study for a novel approach to neural database construction, organization, and interaction. The Modeler's Workspace was originally conceived as the user interface to this system. As the design has evolved, the creation of a next-generation interface for collaborative neural simulations has become our goal. Although the initial version uses GENESIS as the simulator, the design permits the use of multiple simulation systems, with or without the use of a database. This allows modeling at multiple levels of scale from the molecular level, through the subcellular (e.g. ion channel), single cell, and network levels, to the systems level (e.g. relating models to fMRI studies). The Modeler's Workspace is a collection of software tools that enable users to interact over the WWW with databases of models and data. 
It provides facilities for: searching multiple remote databases for model components based on various criteria; visualizing the characteristics of the components retrieved; creating new components, either from scratch or derived from existing models; combining components into new models; linking models to experimental data as well as online publications; and interacting with simulation packages such as GENESIS to simulate the new constructs. ... We are now in the phase of implementing the core components of the Modeler's Workspace (MWS). The first of these is ChannelDB, an implementation of a database of ionic conductance models stored in simulator-independent NeuroML format. At present, ChannelDB is implemented as a stand-alone module, with its own graphical user interface to the database, which is implemented with MySQL. After further development, the ChannelDB GUI will be merged into the MWS." |
| 10. Identified Neuron Database Project |
URL: http://n002bsel.bios.uic.edu/ Categories: Neuroscience Databases "NEUROPAD is a database of identified insect neurons. It was developed during research studies of insect nervous systems. It is available on this web site as Version 3.0.1 which supersedes the earlier releases 1.1 and 2.0 (Version 2.0 also is available free to anyone interested). NEUROPAD allows structural information about nerve cells to be mapped into an idealized plan of the central nervous system [CNS] and it stores that information along with 1) relevant physiological and behavioral observations, and 2) reproductions of the original anatomical description of each cell, and other relevant data from peer reviewed publications. NEUROPAD was designed with orthopteroid insects (such as crickets and cockroaches) in mind. However, it contains information on cells from a number of different insect species, with slightly different CNS organizational schemes, such as differing numbers of ganglia." |
| 11. SumsDB: Surface Management Systems DataBase |
URL: http://brainmap.wustl.edu:8081/sums/index.jsp Categories: Neuroscience Databases "SumsDB (Surface Management Systems DataBase) is a repository of brain-mapping data developed in the Van Essen laboratory. It emphasizes cortical surface-based representations, but also contains whole-brain volume data. SumsDB includes surface-based atlases of cerebral and cerebellar cortex in primates (human, macaque) and rodents (mouse, rat). Many types of experimental data pertaining to cortical structure and function can be viewed on these atlases. SumsDB also contains extensive data from individual experimental hemispheres." |
| 12. IBSR: Internet Brain Segmentation Repository |
URL: http://www.cma.mgh.harvard.edu/ibsr/ Categories: Neuroscience Databases "Its purpose is to encourage the development and evaluation of segmentation methods by providing raw test and image data, human expert segmentation results, and methods for comparing segmentation results. ... This repository is meant to contain standard test image data sets which will permit a standardized mechanism for evaluation of the sensitivity of a given analysis method to signal to noise ratio, contrast to noise ratio, shape complexity, degree of partial volume effect, etc. This capability is felt to be essential to further development in the field since many published algorithms tend to only operate successfully under a narrow range of conditions which may not extend to those experienced under the typical clinical imaging setting. This repository is also meant to describe and discuss methods for the comparison of results." |
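The IBSR description does not name a specific comparison method. As one possible illustration, the Dice overlap is a widely used measure of agreement between two segmentations of the same image. The sketch below is in Python; the function name and the flat-list input format are assumptions for illustration, not part of the repository.

```python
def dice_overlap(seg_a, seg_b, label):
    """Dice overlap for one label between two equally sized segmentations.

    seg_a and seg_b are flat sequences of per-voxel labels, for example an
    automated result and a human expert segmentation of the same volume.
    """
    if len(seg_a) != len(seg_b):
        raise ValueError("segmentations must cover the same voxels")
    voxels_a = {i for i, value in enumerate(seg_a) if value == label}
    voxels_b = {i for i, value in enumerate(seg_b) if value == label}
    if not voxels_a and not voxels_b:
        return 1.0  # the label is absent from both, so they agree trivially
    return 2 * len(voxels_a & voxels_b) / (len(voxels_a) + len(voxels_b))
```

A score of 1.0 means the two segmentations mark exactly the same voxels with that label, and 0.0 means they share none.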
| 13. IBVD: Internet Brain Volume Database |
URL: http://www.cma.mgh.harvard.edu/ibvd/ Categories: Neuroscience Databases "The goal of IBVD is to provide a web-based searchable database of brain neuroanatomic volumetric observations. This is designed to access both group volumetric results as well as volume observations in individual cases. A major thrust of the effort is to enable electronic access to the results that exist in the published literature. Currently, there are quite limited electronic or searchable methods for the data observations that are contained in publications. This effort will facilitate the dissemination of volumetric observations by making a more complete corpus of volumetric observations findable to the neuroscience researcher. This also enhances the ability to perform comparative and integrative studies, as well as meta-analysis. Extensions that permit pre-published, non-published and other representation are planned, again to facilitate comparative analyses." |
| 14. CellPropDB: Cellular Properties Database |
URL: http://senselab.med.yale.edu/senselab/CellPropDB/ Categories: Neuroscience Databases "Cellular Properties Database (CellPropDB) provides a simple repository for data regarding membrane channels, receptor and neurotransmitters that are expressed in specific types of cells. The database is presently focused on neurons but will eventually include other cell types, such as glia, muscle, and gland cells." |
| 15. NeuronDB: Neuron Database |
URL: http://senselab.med.yale.edu/senselab/NeuronDB/ Categories: Neuroscience Databases "NeuronDB provides a dynamically searchable database of three types of neuronal properties: voltage gated conductances, neurotransmitter receptors, and neurotransmitter substances. It contains tools that provide for integration of these properties in a given type of neuron and compartment, and for comparison of properties across different types of neurons and compartments." |
| 16. ModelDB: Model Database |
URL: http://senselab.med.yale.edu/senselab/ModelDB/ Categories: Neuroscience Databases "ModelDB provides an accessible location for storing and efficiently retrieving compartmental neuron models. ModelDB is tightly coupled with NeuronDB. Models can be coded in any language for any environment, though ModelDB has been initially constructed for use with NEURON and GENESIS." |
| 17. OdorDB: Odor Molecule Database |
URL: http://senselab.med.yale.edu/senselab/OdorDB/ Categories: Microarray Data and other Gene Expression Databases "Odor molecule Database (OdorDB) contains data on the odor molecules that have been shown to interact with different olfactory receptors. It is aimed at helping to solve the unprecedented problem of identifying the preferred odor ligands among thousands of potential molecules for the hundreds of different olfactory receptors." |
| 18. OdorMapDB: Olfactory Bulb Odor Map DataBase |
URL: http://senselab.med.yale.edu/senselab/OdorMapDB/default.asp Categories: Neuroscience Databases "OdorMapDB is designed to be a database to support the experimental analysis of the molecular and functional organization of the olfactory bulb and its basis for the perception of smell. It is primarily concerned with archiving, searching and analysing maps of the olfactory bulb generated by different methods. The first aim is to facilitate comparison of activity patterns elicited by odor stimulation in the glomerular layer obtained by different methods in different species. It is further aimed at facilitating comparison of these maps with molecular maps of the projections of olfactory receptor neuron subsets to different glomeruli, especially for gene targeted animals and for antibody staining." |
| 19. GENSAT Bacterial Artificial Chromosome Transgenics Project |
URL: http://www.gensat.org/index.html Categories: Neuroscience Databases "The Gensat database contains a gene expression atlas of the central nervous system of the mouse based on bacterial artificial chromosomes (BACs). In each of the BAC transgenic vectors, endogenous protein coding sequences have been replaced by sequences encoding the EGFP reporter gene. As in any gene replacement experiment, the stability of the reporter gene can vary somewhat from the endogenous gene. Thus these results measure the relative rates of transcription for each gene; they are not a direct measure of mRNA accumulation or of protein abundance for the endogenous gene products. Furthermore, the enhanced sensitivity of reporter gene assays, particularly in BAC lines carrying multiple copies of the BAC transgene, may allow detection of sites of expression that are not evident in in situ hybridization experiments. This database contains histological data from given BAC transgenic mouse lines at three developmental stages – embryonic day 15.5 (E15.5), postnatal day 7 (P7) and adult; in all cases the data represent results of multiple transgenic lines. EGFP is visualized by staining with an anti-EGFP antibody using the DAB method, or by confocal microscopy of unstained tissue sections. Protocols for the modification of BACs, BAC transgenesis production and histology are provided." |
| 20. National Brain Databank: Brain Tissue Gene Expression Repository |
URL: http://132.183.217.124/brainbank/index.jsp Categories: Neuroscience Databases "The idea of creating a National Brain Databank has been on the agenda for the Harvard Brain Tissue Resource Center (HBTRC) for several years and the National Institute of Neurological Disorders and Stroke (NINDS) and the National Institute of Mental Health (NIMH) have funded the implementation of this proposal. In July 2003, the HBTRC initiated development of the National Brain Databank in conjunction with Akaza Research, a biomedical informatics consulting firm based in Cambridge, MA. The system was developed using the Java J2EE application platform and the PostgreSQL database. It is designed to incorporate MIAME and MAGE-ML based microarray data sharing standards in the future. The initial version of the National Brain Databank was publicly released in April 2004 and continues to be further developed, based on ongoing usage and feedback from users. All of the data that is derived from studies of the HBTRC collection is being incorporated into the National Brain Databank. This data is available to the general public, although strict precautions are undertaken to maintain the confidentiality of the brain donors and their family members. These precautions include the use of anonymized numbers and restricted access to demographic information. For professional scientists who will require access to confidential information to complete their studies, a username and access code will be made available, after they have reviewed the HIPAA requirements and have agreed to abide by them. 
Data from various types of studies conducted on brain tissue in the HBTRC collection will be available from studies using different technologies, such as gene expression profiling, quantitative RT-PCR, in situ hybridization, and immunocytochemistry and will have the potential for providing powerful insights into the subregional and cellular distribution of genes and/or proteins in different brain regions and eventually in specific subregions and cellular subtypes. All qualified investigators who would like to gain access to more detailed information regarding the subjects (including diagnostic reports on postmortem brain tissue) must demonstrate that they are aware of the HIPAA requirements for confidentiality by reviewing information that appears in the privacy policy on this website." |
| 21. BrainMap |
URL: http://brainmap.org/ Categories: Neuroscience Databases "BrainMap is an online database of published functional neuroimaging experiments with coordinate-based (Talairach) activation locations. The goal of BrainMap is to provide a vehicle to share methods and results of brain functional imaging studies. It is a tool to rapidly retrieve and understand studies in specific research domains, such as language, memory, attention, reasoning, emotion, and perception, and to perform meta-analyses of like studies." |
| 22. BrainML |
URL: http://brainml.org Categories: Neuroscience Databases "This site was created by the Laboratory of Neuroinformatics to describe BrainML and to serve as a repository for BrainML models. (A BrainML model is an XML Schema and optional vocabulary files describing a data model for electronic representation of neuroscience data, including data types, formats, and controlled vocabulary.)" |
| 23. Conexus |
URL: http://mallorn.ucdavis.edu/conexus/ Categories: Neuroscience Databases "The goal of the Conexus project is to build a graphical database of neuroanatomical connections, focussing on thalamo-cortical and cortico-cortical connections in the macaque monkey. Conexus will allow users to enter data, and also to search for and analyze patterns of data gathered from different experiments, by different investigators, reported in a variety of journals, together into one unified macaque atlas. Equipped with the proper search and 3D visualization tools, it will allow students, modelers, and experimentalists to learn about the available data on neuroanatomical connections, or to compare their own findings to existing data. Currently, the focus of the project is on the alignment tools that are necessary to visualize large numbers of immunohistologically stained sections together in 3D." |
| 24. FME: Foundational Model Explorer |
URL: http://sig.biostr.washington.edu/projects/fm/FME/index.html Categories: Anatomy Databases "The Foundational Model Explorer (FME) is an Internet based software application developed for viewing the content and organization of the Digital Anatomist Foundational Model (FMA). It was developed by the Structural Informatics Group at the University of Washington. The initial purpose of the FME was to provide a simple and intuitive interface to the FMA for domain experts, in the field of anatomy, participating in the evaluation of the FMA. The FME also provides an easily available method of exploring the FMA to individuals or groups considering the adoption of the Foundational Model of Anatomy knowledge base." |
| 25. Language Map Experiment Management System |
URL: http://tela.biostr.washington.edu/cgi-bin/repos/bmap_repo/main-menu.pl Categories: Neuroscience Databases "Bmap_repo is a web-based experiment management system for human brain mapping data. It is currently designed to manage language map data acquired during neurosurgery for tumors or intractable epilepsy, and during MR functional imaging studies. We are working to generalize these methods so that they are applicable to other brain mapping applications. ... Bmap_repo permits web-based collaborative experimental data management, currently among investigators in different departments at the University of Washington. The data are primarily obtained from patients of George Ojemann in Neurosurgery, as a result of cortical stimulation language mapping (CSM), which is performed to plan the operation for intractable epilepsy or tumors. The imaging protocols were designed by Ken Maravilla in Radiology, and David Corina in Psychology. The data are processed and managed under the direction of Jim Brinkley in the UW Structural Informatics Group in Biological Structure. Additional data analysis is done by members of David Corina's lab in Psychology." |
| 26. LONI (UCLA Laboratory of Neuro Imaging) Image Database |
URL: https://services.loni.ucla.edu/ida/login.jsp?search=true Categories: Neuroscience Databases "The LONI Image Database has been constructed to provide an effective means for archival and protection of collaborator collected image data. The goal of this software is to provide a convenient mechanism for searching the existence of particular image data while protecting its usage at the same time. We have built the appropriate database query mechanisms to ensure that no image data or identifying patient information is accessible to the outside world or to any others without the appropriate authorization and the expressed permission to release data from the collaborator that acquired and provided the data." |
| 27. Nervenet.org: The Informatics Center for Mouse Neurogenetics |
URL: http://www.nervenet.org Categories: Brain Atlases, Neuroscience Databases "This server hosts the Mouse Brain Library, an expanding collection of high-resolution histological images, atlases, MRIs, and databases on brain structure of more than 120 different lines of mice. Nervenet also includes several useful genetics and gene mapping databases to download (SNP databases, Map Manager databases, and the Portable Dictionary of the Mouse Genome). The publications section includes revised, expanded, and annotated papers, tutorials, and reviews on neurogenetics, gene mapping, complex trait analysis, stereology, and the control of neuron number." |
| 28. The NeSys Database on Brain Map Transformations in Cerebellar Systems |
URL: http://www.nesys.uio.no/ Categories: Neuroscience Databases "The aim of this database is to provide structure and structure-function data about brain map transformations in cerebellar systems. The present version is a web based archive based on data from 4 original publications and represents Project Phase 2 (reached in 2004). It includes data on the organization of projections to the pontine nuclei from three cortical areas: primary and secondary somatosensory areas (SI and SII), and the primary motor cortex (MI). Axonal tracer substances are injected into electrophysiologically defined locations in these areas, and distributions of terminal fields of labeling in the pontine nuclei are computer reconstructed in 3-D and transferred to a common, standardized coordinate system." |
| 29. AGNS (Arabidopsis GeneNet supplementary) Database |
URL: http://emj-pc.ics.uci.edu/mgs/dbases/agns/ Categories: Arabidopsis thaliana Databases "The aim of AGNS is to create an Internet available resource accumulating the data on detailed description of the experimental results and observed expression of the Arabidopsis genes at the levels of mRNA, protein, cell, tissue and ultimately at the levels of the organ and organism and in different genotypes from annotations of published papers. AGNS consists now of two databases, the Expression Database (ED) and the Phenotype Database (PD), and two controlled vocabularies. The ED describes gene expression in wild type, mutant and transgenic plants. The PD contains information on phenotypic abnormalities in mutant and transgenic plants. The RD contains references to the papers together with a description of plant growth conditions with an indication of the ecotypes used as control in the experiments. Both PD and ED have links to the PubMed and items in controlled vocabularies. Controlled vocabularies contain information on description of organs, tissues and cells both in the mature plant and at different developmental stages and description of developmental stages of the plant itself and of its separate organs. The most frequently used names of the stages, organs are highlighted and their synonyms are given. Every description of stages and organs is accompanied by detailed commentaries. All AGNS data have references to the papers from which they were annotated. Thus, AGNS accumulates information on the available Arabidopsis morphology and development and gene expression patterns in the wild type and in different mutants and transgenic lines, which is systematized and compared. AGNS makes possible search for genes expressed in particular organs, at particular stages, for genes whose expression is altered in particular mutants, and for mutants having similar phenotypic abnormalities." |
| 30. NTSA Workbench Database |
URL: http://soma.npa.uiuc.edu/ntsa/ Categories: Neuroscience Databases "The core of the database system is the 'card catalog' of the NTSA Workbench which describes the characteristics of the time series neuronal data that can be searched. This descriptive information is referred to as metadata, whereas the neuronal and behavioral time series data records themselves are referred to as raw data. The database Table Schema determines how the metadata are organized in the database. A hierarchical organization was adopted for the NTSA Workbench database Table Schema. This design provides a natural fit to the structure of neuroscience experiments, which can in most instances be described with reference to the following hierarchy: laboratory, experiment, subject, session, series and trial. Here, laboratory refers to a specific research group (e.g., the research of Co-PI David Clayton), experiment refers to a specific study in that laboratory (e.g., of songbird auditory thalamic neuron responses to natural and scrambled species-typical songs), subject refers to the experimental animal (e.g., zebra finch1255), session denotes the specific time and setting of neuronal data collection, series refers to sets of consecutive data collection episodes within a session having similar characteristics and trial refers to the individual data collection episodes, often defined by presentation of a single stimulus. Unlimited metadata descriptors can be used at each level to specify the experimental treatments." |
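The six-level metadata hierarchy described for the NTSA Workbench can be pictured as a simple tree. The Python sketch below is an illustration only, not the Workbench's Table Schema. The level names come from the description above; the `Node` class, the `find` helper, and all metadata keys are hypothetical.

```python
from dataclasses import dataclass, field

# Level names in the order given in the description above.
LEVELS = ("laboratory", "experiment", "subject", "session", "series", "trial")


@dataclass
class Node:
    """One entry in the hierarchy, carrying free-form metadata descriptors."""
    level: str
    name: str
    metadata: dict = field(default_factory=dict)
    children: list = field(default_factory=list)

    def add_child(self, name, **metadata):
        # A child always belongs to the next level down the hierarchy.
        depth = LEVELS.index(self.level)
        if depth + 1 >= len(LEVELS):
            raise ValueError("a trial is the lowest level and has no children")
        child = Node(LEVELS[depth + 1], name, dict(metadata))
        self.children.append(child)
        return child


def find(node, **criteria):
    """Yield every node at or below `node` whose metadata matches all criteria."""
    if all(node.metadata.get(key) == value for key, value in criteria.items()):
        yield node
    for child in node.children:
        yield from find(child, **criteria)
```

For example, a laboratory node could hold an experiment on songbird auditory responses, which holds the subject "zebra finch1255", and so on down to individual trials. Searching the metadata at any level then returns the matching nodes without touching the raw time series records.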
| 31. PPID: Protein-Protein Interaction Database |
URL: http://www.ppid.org/ Categories: Intermolecular Interactions and Signaling Pathways Databases "The Protein-Protein Interaction Database (PPID) was constructed to integrate a gamut of biological/bibliographical/molecular data and build a framework which might help understanding how cells orchestrate their protein content in order to become what they are: machines with a purpose. This is based on the simple paradigm that functionality like signal transduction cascades are held together in a close space, thereby allowing specific events to occur without the necessity of passive diffusion and random events." |
| 32. DDBJ: DNA Data Bank of Japan |
URL: http://www.ddbj.nig.ac.jp Categories: Nucleotide Sequences: International Nucleotide Sequence Database Collaboration Databases In the past year, we at DDBJ (DNA Data Bank of Japan; http://www.ddbj.nig.ac.jp) collected and released 1 066 084 entries or 718 072 425 bases including the whole chromosome 22 of chimpanzee, the whole-genome shotgun sequences of silkworm and various others. On the other hand, we hosted workshops for human full-length cDNA annotation and participated in jamborees of mouse full-length cDNA annotation. The annotated data are made public at DDBJ. We are also in collaboration with a RIKEN team to accept and release the CAGE (Cap Analysis Gene Expression) data under a new category, MGA (Mass Sequences for Genome Annotation). The data will be useful for studying gene expression control in many aspects. Citation for the above abstract: Tateno, Y., Saitou, N., Okubo, K., Sugawara, H., Gojobori, T. DDBJ in collaboration with mass-sequencing teams on annotation Nucl. Acids Res. 2005 33: D25-28 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D25 |
| 33. AluGene |
URL: http://alugene.tau.ac.il/ Categories: Human Genome Databases, Maps, and Viewers, Nucleotide Sequences: Coding and Non-coding DNA Databases Alu elements are short interspersed elements (SINEs) 300 nucleotides in length. More than 1 million Alus are found in the human genome. Despite their being genetically functionless, recent findings suggest that Alu elements may have a broad evolutionary impact by affecting gene structures, protein sequences, splicing motifs and expression patterns. Because of these effects, compiling a genomic database of Alu sequences that reside within protein-coding genes seemed a useful enterprise. Presently, such data are limited since the structural and positional information on genes and Alu sequences are scattered throughout incompatible and unconnected databases. AluGene (http://Alugene.tau.ac.il/) provides easy access to a complete Alu map of the human genome, as well as Alu-associated information. The Alu elements are annotated with respect to coding region and exon/intron location. This design facilitates queries on Alu sequences, locations, as well as motifs and compositional properties via a one-stop search page. Citation for the above abstract: Dagan, Tal, Sorek, Rotem, Sharon, Eilon, Ast, Gil, Graur, Dan AluGene: a database of Alu elements incorporated within protein-coding genes Nucl. Acids Res. 2004 32: D489-492 © 2004 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D489 |
| 34. EMBL Nucleotide Sequence Database |
URL: http://www.ebi.ac.uk/embl/ Categories: Nucleotide Sequences: International Nucleotide Sequence Database Collaboration Databases The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl), maintained at the European Bioinformatics Institute (EBI) near Cambridge, UK, is a comprehensive collection of nucleotide sequences and annotation from available public sources. The database is part of an international collaboration with DDBJ (Japan) and GenBank (USA). Data are exchanged daily between the collaborating institutes to achieve swift synchrony. Webin is the preferred tool for individual submissions of nucleotide sequences, including Third Party Annotation (TPA) and alignments. Automated procedures are provided for submissions from large-scale sequencing projects and data from the European Patent Office. New and updated data records are distributed daily and the whole EMBL Nucleotide Sequence Database is released four times a year. Access to the sequence data is provided via ftp and several WWW interfaces. With the web-based Sequence Retrieval System (SRS) it is also possible to link nucleotide data to other specialist molecular biology databases maintained at the EBI. Other tools are available for sequence similarity searching (e.g. FASTA and BLAST). Changes over the past year include the removal of the sequence length limit, the launch of the EMBLCDSs dataset, extension of the Sequence Version Archive functionality and the revision of quality rules for TPA data. Citation for the above abstract: Kanz, Carola, Aldebert, Philippe, Althorpe, Nicola, Baker, Wendy, Baldwin, Alastair, Bates, Kirsty, Browne, Paul, van den Broek, Alexandra, Castro, Matias, Cochrane, Guy, Duggan, Karyn, Eberhardt, Ruth, Faruque, Nadeem, Gamble, John, Diez, Federico Garcia, Harte, Nicola, Kulikova, Tamara, Lin, Quan, Lombard, Vincent, Lopez, Rodrigo, Mancuso, Renato, McHale, Michelle, Nardone, Francesco, Silventoinen, Ville, Sobhany, Siamak, Stoehr, Peter, Tuli, Mary Ann, Tzouvara, Katerina, Vaughan, Robert, Wu, Dan, Zhu, Weimin, Apweiler, Rolf The EMBL Nucleotide Sequence Database Nucl. Acids Res. 2005 33: D29-33 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/abstract/33/suppl_1/D29 |
| 35. GenBank |
URL: http://www.ncbi.nlm.nih.gov/Genbank/index.html Categories: Nucleotide Sequences: International Nucleotide Sequence Database Collaboration Databases GenBank (R) is a comprehensive database that contains publicly available DNA sequences for more than 205 000 named organisms, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects. Most submissions are made using the Web-based BankIt or standalone Sequin programs and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the EMBL Data Library in Europe and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through NCBI's retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, go to the NCBI Homepage at www.ncbi.nlm.nih.gov. Citation for the above abstract: Benson, Dennis A., Karsch-Mizrachi, Ilene, Lipman, David J., Ostell, James, Wheeler, David L. GenBank Nucl. Acids Res. 2006 34: D16-20 © 2006 Oxford University Press. The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D16 |
| 36. ACLAME: A CLAssification of Mobile genetic Elements |
URL: http://aclame.ulb.ac.be/ Categories: Nucleotide Sequences: Coding and Non-coding DNA Databases The ACLAME database (http://aclame.ulb.ac.be) is a collection and classification of prokaryotic mobile genetic elements (MGEs) from various sources, comprising all known phage genomes, plasmids and transposons. In addition to providing information on the full genomes and genetic entities, it aims to build a comprehensive classification of the functional modules of MGEs at the protein, gene and higher levels. This first version contains a comprehensive classification of 5069 proteins from 119 DNA bacteriophages into over 400 functional families. This classification was produced automatically using TRIBE-MCL, a graph-theory-based Markov clustering algorithm that uses sequence measures as input, and then manually curated. Manual curation was aided by consulting annotations available in public databases retrieved through additional sequence similarity searches using Psi-Blast and Hidden Markov Models. The database is publicly accessible and open to expert volunteers willing to participate in its curation. Its web interface allows browsing as well as querying the classification. The main objectives are to collect and organize in a rational way the complexity inherent to MGEs, to extend and improve the inadequate annotation currently associated with MGEs and to screen known genomes for the validation and discovery of new MGEs. Citation for the above abstract: Leplae, Raphael, Hebrant, Aline, Wodak, Shoshana J., Toussaint, Ariane ACLAME: A CLAssification of Mobile genetic Elements Nucl. Acids Res. 2004 32: D45-49 © 2004 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D45 |
| 37. Ciliate MDS/IES Database |
URL: http://oxytricha.princeton.edu/dimorphism/database.htm Categories: Nucleotide Sequences: Coding and Non-coding DNA Databases Ciliated protozoa have two kinds of nuclei: Macronuclei (MAC) and Micronuclei (MIC). In some ciliate classes, such as spirotrichs, most genes undergo several layers of DNA rearrangement during macronuclear development. Because of such processes, these organisms provide ideal systems for studying mechanisms of recombination and gene rearrangement. Here, we describe a database that contains all spirotrich genes for which both MAC and MIC versions are sequenced, with consistent annotation and easy access to all the features. An interface to query the database is available at http://oxytricha.princeton.edu/dimorphism/database.htm. Citation for the above abstract: Cavalcanti, Andre R. O., Clarke, Thomas H., Landweber, Laura F. MDS_IES_DB: a database of macronuclear and micronuclear genes in spirotrichous ciliates Nucl. Acids Res. 2005 33: D396-398 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D396 |
| 38. CORG: a database for COmparative Regulatory Genomics |
URL: http://corg.molgen.mpg.de/ Categories: Model Organisms and Comparative Genomics Databases, Nucleotide Sequences: Coding and Non-coding DNA Databases, RNA Sequence Databases Sequence conservation in non-coding, upstream regions of orthologous genes from man and mouse is likely to reflect common regulatory DNA sites. Motivated by this assumption we have delineated a catalogue of conserved non-coding sequence blocks and provide the CORG—‘COmparative Regulatory Genomics’—database. The data were computed based on statistically significant local suboptimal alignments of 15 kb regions upstream of the translation start sites of, currently, 10 793 pairs of orthologous genes. The resulting conserved non-coding blocks were annotated with EST matches for easier detection of non-coding mRNA and with hits to known transcription factor binding sites. CORG data are accessible from the ENSEMBL web site via a DAS service as well as a specially developed web service (http://corg.molgen.mpg.de) for query and interactive visualization of the conserved blocks and their annotation. Citation for the above abstract: Dieterich, C., Wang, H., Rateitschak, K., Luz, H., Vingron, M. CORG: a database for COmparative Regulatory Genomics Nucl. Acids Res. 2003 31: 55-57 © 2003 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/55 |
| 39. CUTG: Codon Usage Tabulated from GenBank |
URL: http://www.kazusa.or.jp/codon/ Categories: Nucleotide Sequences: Coding and Non-coding DNA Databases The frequencies of each of the 257 468 complete protein coding sequences (CDSs) have been compiled from the taxonomical divisions of the GenBank DNA sequence database. The sum of the codons used by 8792 organisms has also been calculated. The data files can be obtained from the anonymous ftp sites of DDBJ, Kazusa and EBI. A list of the codon usage of genes and the sum of the codons used by each organism can be obtained through the web site http://www.kazusa.or.jp/codon/ . The present study also reports recent developments on the WWW site. The new web interface provides data in the CodonFrequency-compatible format as well as in the traditional table format. The use of the database is facilitated by keyword based search analysis and the availability of codon usage tables for selected genes from each species. These new tools will provide users with the ability to further analyze for variations in codon usage among different genomes. Citation for the above abstract: Nakamura, Yasukazu, Gojobori, Takashi, Ikemura, Toshimichi Codon usage tabulated from international DNA sequence databases: status for the year 2000 Nucl. Acids Res. 2000 28: 292- © 2000 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/28/1/292 |
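The tabulation described in the entry above (codon counts per CDS, then summed per organism) can be sketched in a few lines. This is an illustration only and not the CUTG pipeline: the function names `codon_counts` and `per_thousand` are hypothetical, the two sequences are made up, and expressing frequencies per 1000 codons is an assumption about the output format.

```python
from collections import Counter


def codon_counts(cds):
    """Count codons in one coding sequence, read in frame from position 0."""
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    return Counter(cds[i:i + 3] for i in range(0, len(cds), 3))


def per_thousand(counts):
    """Convert raw codon counts to frequency per 1000 codons."""
    total = sum(counts.values())
    return {codon: 1000.0 * n / total for codon, n in counts.items()}


# Two toy CDSs standing in for all CDSs of one organism (made-up sequences).
toy_cdss = ["ATGGCTGCTTAA", "ATGAAAGCTTGA"]

organism_total = Counter()
for seq in toy_cdss:
    organism_total += codon_counts(seq)   # sum of codons used by the organism

freqs = per_thousand(organism_total)
for codon in sorted(freqs):
    print(codon, organism_total[codon], round(freqs[codon], 1))
```

Keeping the per-CDS counts and the per-organism sum as separate steps mirrors the two kinds of listing the entry mentions: codon usage of individual genes and the total for each organism.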
| 40. Entrez Gene |
URL: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene Categories: Nucleotide Sequences: Coding and Non-coding DNA Databases Entrez Gene (www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene) is NCBI's database for gene-specific information. It does not include all known or predicted genes; instead Entrez Gene focuses on the genomes that have been completely sequenced, that have an active research community to contribute gene-specific information, or that are scheduled for intense sequence analysis. The content of Entrez Gene represents the result of curation and automated integration of data from NCBI's Reference Sequence project (RefSeq), from collaborating model organism databases, and from many other databases available from NCBI. Records are assigned unique, stable and tracked integers as identifiers. The content (nomenclature, map location, gene products and their attributes, markers, phenotypes, and links to citations, sequences, variation details, maps, expression, homologs, protein domains and external databases) is updated as new information becomes available. Entrez Gene is a step forward from NCBI's LocusLink, with both a major increase in taxonomic scope and improved access through the many tools associated with NCBI Entrez. Citation for the above abstract: Maglott, Donna, Ostell, Jim, Pruitt, Kim D., Tatusova, Tatiana Entrez Gene: gene-centered information at NCBI Nucl. Acids Res. 2005 33: D54-58 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D54 |
| 41. FREP: Functional Repeats in Mouse cDNAs |
URL: http://facts.gsc.riken.go.jp/FREP/ Categories: Model Organisms and Comparative Genomics Databases, Nucleotide Sequences: Coding and Non-coding DNA Databases The FREP database (http://facts.gsc.riken.go.jp/FREP/) contains 31 396 RepeatMasker-identified non-redundant variant repeat sequences derived from 16 527 mouse cDNAs with protein-coding potential. The repeats were computationally associated with potential effects on transcriptional variation, translation, protein function or involvement in disease to identify Functional REPeats (FREPs). FREPs are defined by the (i) occurrence of exon–exon boundaries in repeats, (ii) presence of polyadenylation sites in 3'UTR-located repeats, (iii) effect on translation, (iv) position in the protein-coding region or protein domains or (v) conditional association with disease MeSH terms. Currently the database contains 9261 (29.5%) inferred FREPs derived from 6861 (41.5%) mouse cDNAs. Integrated evidence of the functional assignments and dynamically generated sequence similarity search results support the exploration and annotation of functional, ancestral or taxon-specific repeats. Keyword and pre-selected feature searches (e.g. coding sequence–repeat or splice site–repeat relations) support intuitive database querying as well as the retrieval of repeat sequences. Integrated sequence search and alignment tools allow the analysis of known or identification of new functional repeat candidates. FREP is a unique resource for illuminating the role of transposons and repetitive sequences in shaping the coding part of the mouse transcriptome and for selecting the appropriate experimental model to study diseases with suspected repeat etiology contributions. Citation for the above abstract: Nagashima, Takeshi, Matsuda, Hideo, Silva, Diego G., Petrovsky, Nikolai, RIKEN GER Group, , GSL Members, , Konagaya, Akihiko, Schonbach, Christian FREP: a database of functional repeats in mouse cDNAs Nucl. Acids Res. 2004 32: D471-475 © 2004 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D471 |
| 42. Genetic Codes |
URL: http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi Categories: Nucleotide Sequences: Coding and Non-coding DNA Databases "NCBI takes great care to ensure that the translation for each coding sequence (CDS) present in GenBank records is correct. Central to this effort is careful checking on the taxonomy of each record and assignment of the correct genetic code (shown as a /transl_table qualifier on the CDS in the flat files) for each organism and record. This page summarizes and references this work." |
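To show why the /transl_table assignment mentioned above matters, the following sketch translates the same short sequence under two genetic codes. The amino-acid strings are written in the usual 64-letter layout with bases ordered T, C, A, G; only table 1 (standard) and table 2 (vertebrate mitochondrial) are included. The function `translate`, the dictionary names, and the test sequence are hypothetical and serve illustration only; this is not NCBI's checking procedure.

```python
from itertools import product

BASES = "TCAG"

# One letter per codon, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AAS = {
    1: "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",  # standard
    2: "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG",  # vertebrate mitochondrial
}

# Build codon -> amino acid lookups for each table.
CODON_TABLES = {
    table_id: {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), aas)}
    for table_id, aas in AAS.items()
}


def translate(cds, transl_table=1):
    """Translate an in-frame DNA sequence; '*' marks a stop codon."""
    cds = cds.upper().replace("U", "T")
    lookup = CODON_TABLES[transl_table]
    return "".join(lookup[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


seq = "ATGTGAATAAGA"           # codons: ATG TGA ATA AGA (made-up sequence)
print(translate(seq, 1))       # standard code
print(translate(seq, 2))       # vertebrate mitochondrial code
```

Three of the four codons in this toy sequence are read differently by the two tables (TGA, ATA and AGA), so assigning the wrong table to a record would change the predicted protein.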
| 43. Islander: Database of Genomic Islands |
URL: http://129.79.232.60/cgi-bin/islander/islander.cgi Categories: Nucleotide Sequences: Coding and Non-coding DNA Databases, Prokaryote Databases Prokaryotic chromosomes often contain islands, such as temperate phages or pathogenicity islands, delivered by site-specific integrases. Integration usually occurs within a tRNA or tmRNA gene, splitting the gene, yet sequences within the island restore the disrupted gene. The regenerated RNA gene and the displaced fragment of that gene thus mark the endpoints of the island. We applied this principle to search for islands in genomic DNA sequences. Our algorithm generates a list of tRNA and tmRNA genes, uses each as the query for a BLAST search of the starting DNA and removes unlikely hits through a series of filters. A search for islands in 106 whole bacterial genomes produced 143 candidates, with the search itself providing an estimate of three false candidates among these. Preliminary phylogenetic analysis of the associated integrases reduced this set to 89 cases of independently evolved site specificity, which showed strong bias for the tmRNA gene. The website Islander (http://www.indiana.edu/islander) presents the candidate islands in GenBank-style files and correlates integrase phylogeny with site specificity. Citation for the above abstract: Mantri, Yogita, Williams, Kelly P. Islander: a database of integrative islands in prokaryotic genomes, the associated integrases and their DNA site specificities Nucl. Acids Res. 2004 32: D55-58 © 2004 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D55 |
| 44. L1Base |
URL: http://l1base.molgen.mpg.de/ Categories: Nucleotide Sequences: Coding and Non-coding DNA Databases L1Base is a dedicated database containing putatively active LINE-1 (L1) insertions residing in human and rodent genomes that are as follows: (i) intact in the two open reading frames (ORFs), full-length L1s (FLI-L1s) and (ii) intact ORF2 but disrupted ORF1 (ORF2-L1s). In addition, due to their regulatory potential, the full-length (>6000 bp) non-intact L1s (FLnI-L1s) were also included in the database. Application of a novel annotation methodology, L1Xplorer, allowed in-depth annotation of functional sequence features important for L1 activity, such as transcription factor binding sites and amino acid residues. The L1Base is available online at http://l1base.molgen.mpg.de. In addition, the data stored in the database can be accessed from the Ensembl web browser via a DAS service (http://l1das.molgen.mpg.de:8080/das). Citation for the above abstract: Penzkofer, Tobias, Dandekar, Thomas, Zemojtel, Tomasz L1Base: from functional annotation to prediction of active LINE-1 elements Nucl. Acids Res. 2005 33: D498-500 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D498 |
| 45. MethDB DNA Methylation Database |
URL: http://www.methdb.de/ Categories: Nucleotide Sequences: Coding and Non-coding DNA Databases Methylation of cytosine in the 5 position of the pyrimidine ring is a major modification of the DNA in most organisms. In eukaryotes, the distribution and number of 5-methylcytosines (5mC) along the DNA is heritable but can also change with the developmental state of the cell and as a response to modifications of the environment. While DNA methylation probably has a number of functions, scientific interest has recently focused on the gene silencing effect methylation can have in eukaryotic cells. In particular, the discovery of changes in the methylation level during cancer development has increased the interest in this field. In the past, a vast amount of data has been generated with different levels of resolution ranging from 5mC content of total DNA to the methylation status of single nucleotides. We present here a database for DNA methylation data that attempts to unify these results in a common resource. The database is accessible via WWW (http://www.methdb.de). It stores information about the origin of the investigated sample and the experimental procedure, and contains the DNA methylation data. Query masks allow for searching for 5mC content, species, tissue, gene, sex, phenotype, sequence ID and DNA type. The output lists all available information including the relative gene expression level. DNA methylation patterns and methylation profiles are shown both as a graphical representation and as G/A/T/C/5mC-sequences or tables with sequence positions and methylation levels, respectively. Citation for the above abstract: Grunau, Christoph, Renault, Eric, Rosenthal, Andre, Roizes, Gerard MethDB--a public database for DNA methylation data Nucl. Acids Res. 2001 29: 270-274 © 2001 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/29/1/270 |
| 46. MICdb: Database of Prokaryotic Microsatellites |
URL: http://210.212.212.7/MIC/index.html Categories: Nucleotide Sequences: Coding and Non-coding DNA Databases, Taxonomy and Identification Databases The MICdb (Microsatellites Database) (http://www.cdfd.org.in/micas) is a comprehensive relational database of non-redundant microsatellites extracted from fully sequenced prokaryotic genomes. The current version (1.0) of the database has been compiled from 83 genomes belonging to different phylogenetic groups. This database has been linked to MICAS, the web-based Microsatellite Analysis Server. MICAS provides a user-friendly front-end to systematically extract data on microsatellite tracts from genomes. The database contains the following information pertaining to the microsatellites: the regions (coding/non-coding, if coding, their GenBank annotations) containing microsatellite tracts; the frequencies of their occurrences, the size and the number of repeating motifs; and the sequences of the tracts. MICAS also provides an interface to Autoprimer, a primer design program to automatically design primers for selected microsatellite loci. Citation for the above abstract: Sreenu, Vattipally B., Alevoor, Vishwanath, Nagaraju, Javaregowda, Nagarajaram, Hampapathalu A. MICdb: database of prokaryotic microsatellites Nucl. Acids Res. 2003 31: 106-108 © 2003 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/106 |
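As an illustration of what extracting microsatellite tracts involves, here is a minimal sketch that finds perfect tandem repeats with a regular expression. It is not the MICAS algorithm: the function name `find_microsatellites`, the motif size range of 1 to 6 bp, the minimum of three repeats, and the example sequence are all assumptions, and imperfect repeats are ignored.

```python
import re


def find_microsatellites(seq, min_motif=1, max_motif=6, min_repeats=3):
    """Return sorted (start, motif, repeat_count) tuples for perfect tandem repeats.

    start is a 0-based position in seq.
    """
    seq = seq.upper()
    hits = []
    for k in range(min_motif, max_motif + 1):
        # A k-base motif followed by at least (min_repeats - 1) more copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "CACA" is "CA" x 2),
            # since the shorter motif already reports that tract.
            if any(motif == motif[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


example = "GGCACACACAGGTTTTTC"   # made-up sequence with a (CA)4 and a (T)5 tract
for start, motif, count in find_microsatellites(example):
    print(start, motif, count)
```

Each hit carries the three pieces of information the entry lists for a tract (its position, the repeating motif, and the number of repeats); annotating hits as coding or non-coding would require gene coordinates, which this sketch does not model.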
| 47. NPRD: Nucleosome Positioning Region Database |
URL: http://srs6.bionet.nsc.ru/srs6/ Categories: Nucleotide Sequences: Coding and Non-coding DNA Databases Nucleosome Positioning Region Database (NPRD), which is compiling the available experimental data on locations and characteristics of nucleosome formation sites (NFSs), is the first curated NFS-oriented database. The object of the database is a single NFS described in an individual entry. When annotating results of NFS experimental mapping, we pay special attention to several important functional characteristics, such as the relationship between type of gene activity and nucleosome positioning, the influence of non-histone proteins on nucleosome formation, type of the variant of nucleosome positioning (translational or rotational), indication of tissue types and states of cell activity, description of experimental methods used and accuracy of nucleosome position determination, and the results of applying theoretical and computer methods to the analysis of contextual and conformational DNA properties. At present, the NPRD database contains 438 entries and integrates the data described in 124 original papers. The database URL: http://srs6.bionet.nsc.ru/srs6/. Then click the button ‘Databank’ and open the link NUCLEOSOME. Citation for the above abstract: Levitsky, Victor G., Katokhin, Aleksey V., Podkolodnaya, Olga A., Furman, Dagmara P., Kolchanov, Nikolay A. NPRD: Nucleosome Positioning Region Database Nucl. Acids Res. 2005 33: D67-70 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D67 |
| 48. PACRAT |
URL: http://www.biosci.ohio-state.edu/~pacrat/ Categories: Nucleotide Sequences: Coding and Non-coding DNA Databases Analysis of intergenic sequences for purposes such as the investigation of transcriptional signals or the identification of small RNA genes is frequently complicated by traditional biological database structures. Genome data is commonly treated as chromosome-length sequence records, detailed by gene calls demarcating subsequences of the chromosomes. Given this model, the determination of non-called subsequences between any gene and its nearest neighbors requires an exhaustive search of all gene calls associated with the chromosome. Further compounding the issue, the location of intergenic regions for many called genes cannot be resolved unambiguously due to uncertainties in gene boundaries, as well as the presence of other conflicting gene calls. To address these difficulties we have constructed the PACRAT (http://www.biosci.ohio-state.edu/~pacrat/) database system. PACRAT preprocesses GenBank genome submissions, evaluates for every gene the character of its relationship to those genes nearest to it, and produces a relationally linked model of the gene ordering for the genome. Using this information, the interface allows the researcher to query gene data as well as intergenic sequence data based on a number of criteria. These include the ability to filter searches based on the status of start and stop positions, or upstream/downstream sequences as conflicting with called genes and automated extension of upstream or downstream searches to find probable operon promoters or terminators. The database is also indexed by KEGG classification, allowing, for example, functionally-related groups of high-quality promoter-containing regions to be easily retrieved as a group. Citation for the above abstract: Ray, William C., Daniels, Charles J. PACRAT: a database and analysis system for archaeal and bacterial intergenic sequence features Nucl. Acids Res. 2003 31: 109-113 © 2003 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/109 |
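The preprocessing idea in the entry above (ordering gene calls once so that the sequence between neighbors can be read off directly) can be sketched as a single sweep over sorted gene calls. This is a simplified illustration, not PACRAT's implementation: the function `intergenic_regions`, the tuple layout, the 1-based inclusive coordinates, and the treatment of the chromosome as linear (a circular chromosome would need the two end regions joined) are all assumptions.

```python
def intergenic_regions(genes, chrom_length):
    """Return non-called regions between neighboring gene calls.

    genes: list of (name, start, end) with 1-based inclusive coordinates.
    Returns a list of (left_gene, right_gene, start, end); left_gene or
    right_gene is None at the chromosome ends. Overlapping (conflicting)
    gene calls simply leave no intergenic region between them.
    """
    regions = []
    covered_to = 0      # rightmost position covered by any gene seen so far
    left_name = None    # gene whose end defines covered_to
    for name, start, end in sorted(genes, key=lambda g: (g[1], g[2])):
        if start > covered_to + 1:
            regions.append((left_name, name, covered_to + 1, start - 1))
        if end > covered_to:
            covered_to = end
            left_name = name
    if covered_to < chrom_length:
        regions.append((left_name, None, covered_to + 1, chrom_length))
    return regions


# Made-up gene calls: "a" and "b" overlap, "c" stands apart.
calls = [("c", 40, 50), ("a", 10, 20), ("b", 15, 30)]
for region in intergenic_regions(calls, chrom_length=60):
    print(region)
```

After the one-time sort, each neighbor relationship is found in a single pass, which avoids rescanning every gene call for each query.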
| 49. PANDIT: Protein and Associated Nucleotide Domains with Inferred Trees |
URL: http://www.ebi.ac.uk/goldman-srv/pandit Categories: Nucleotide Sequences: Coding and Non-coding DNA Databases, Protein Domain and Protein Classification Databases PANDIT is a database of homologous sequence alignments accompanied by estimates of their corresponding phylogenetic trees. It provides a valuable resource to those studying phylogenetic methodology and the evolution of coding-DNA and protein sequences. Currently in version 17.0, PANDIT comprises 7738 families of homologous protein domains; for each family, DNA and corresponding amino acid sequence multiple alignments are available together with high quality phylogenetic tree estimates. Recent improvements include expanded methods for phylogenetic tree inference, assessment of alignment quality and a redesigned web interface, available at the URL http://www.ebi.ac.uk/goldman-srv/pandit. Citation for the above abstract: Whelan, Simon, de Bakker, Paul I. W., Quevillon, Emmanuel, Rodriguez, Nicolas, Goldman, Nick PANDIT: an evolution-centric database of protein and associated nucleotide domains with inferred trees Nucl. Acids Res. 2006 34: D327-331 © 2006 Oxford University Press. The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D327 |
| 50. xBASE |
URL: http://xbase.bham.ac.uk/ Categories: Prokaryote Databases The schema of the previously described Escherichia coli database coliBASE has been applied to a number of other bacterial taxa, under the collective name xBASE. The new databases include CampyDB for Campylobacter, Helicobacter and Wolinella; PseudoDB for pseudomonads; ClostriDB for clostridia; RhizoDB for Rhizobium and Sinorhizobium; and MycoDB, for Mycobacterium, Streptomyces and related organisms. The databases provide user friendly access to annotation and genome comparisons through a web-based graphical interface. Newly developed features include whole genome displays, ‘painting’ of genes according to properties such as GC content, a pattern search system to identify conserved motifs and batch BLAST searching of every protein encoded by a region. Examples of how the databases have been, and continue to be, used to generate hypotheses for subsequent laboratory investigation are presented. xBASE is available online at http://xbase.bham.ac.uk. Citation for the above abstract: Chaudhuri, Roy R., Pallen, Mark J. xBASE, a collection of online databases for bacterial comparative genomics Nucl. Acids Res. 2006 34: D335-337 © 2006 Oxford University Press. The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D335 |
| 51. RECODE: The Database of the Translational Recoding Events |
URL: http://recode.genetics.utah.edu/ Categories: Nucleotide Sequences: Coding and Non-coding DNA Databases The RECODE database is a compilation of translational recoding events (programmed ribosomal frameshifting, codon redefinition and translational bypass). The database provides information about the genes utilizing these events for their expression, recoding sites, stimulatory sequences and other relevant information. The Database is freely available at http://recode.genetics.utah.edu/. Citation for the above abstract: Baranov, Pavel V., Gurvich, Olga L., Hammer, Andrew W., Gesteland, Raymond F., Atkins, John F. RECODE 2003 Nucl. Acids Res. 2003 31: 87-89 © 2003 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/87 |
| 52. RefSeq: NCBI Reference Sequence |
URL: http://www.ncbi.nlm.nih.gov/RefSeq/ Categories: General Protein Sequence Databases, Nucleotide Sequences: Coding and Non-coding DNA Databases The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) provides a non-redundant collection of sequences representing genomic data, transcripts and proteins. Although the goal is to provide a comprehensive dataset representing the complete sequence information for any given species, the database pragmatically includes sequence data that are currently publicly available in the archival databases. The database incorporates data from over 2400 organisms and includes over one million proteins representing significant taxonomic diversity spanning prokaryotes, eukaryotes and viruses. Nucleotide and protein sequences are explicitly linked, and the sequences are linked to other resources including the NCBI Map Viewer and Gene. Sequences are annotated to include coding regions, conserved domains, variation, references, names, database cross-references, and other features using a combined approach of collaboration and other input from the scientific community, automated annotation, propagation from GenBank and curation by NCBI staff. Citation for the above abstract: Pruitt, Kim D., Tatusova, Tatiana, Maglott, Donna R. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins Nucl. Acids Res. 2005 33: D501-504 © 2004 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D501 |
| 53. S/MARt DB: The S/MAR transaction DataBase |
URL: http://smartdb.bioinf.med.uni-goettingen.de/ Categories: Nucleotide Sequences: Coding and Non-coding DNA Databases S/MARt DB, the S/MAR transaction database, is a relational database covering scaffold/matrix attached regions (S/MARs) and nuclear matrix proteins that are involved in the chromosomal attachment to the nuclear scaffold. The data are mainly extracted from original publications, but a World Wide Web interface for direct submissions is also available. S/MARt DB is closely linked to the TRANSFAC database on transcription factors and their binding sites. It is freely accessible through the World Wide Web (http://transfac.gbf.de/SMARtDB/) for non-profit research. Citation for the above abstract: Liebich, Ines, Bode, Jurgen, Frisch, Matthias, Wingender, Edgar S/MARt DB: a database on scaffold/matrix attached regions Nucl. Acids Res. 2002 30: 372-374 © 2002 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/30/1/372 |
| 54. STRBase: Short Tandem Repeat DNA Internet DataBase |
URL: http://www.cstl.nist.gov/div831/strbase/ Categories: Nucleotide Sequences: Coding and Non-coding DNA Databases The National Institute of Standards and Technology (NIST) has compiled and maintained a Short Tandem Repeat DNA Internet Database (http://www.cstl.nist.gov/biotech/strbase/) since 1997 commonly referred to as STRBase. This database is an information resource for the forensic DNA typing community with details on commonly used short tandem repeat (STR) DNA markers. STRBase consolidates and organizes the abundant literature on this subject to facilitate on-going efforts in DNA typing. Observed alleles and annotated sequence for each STR locus are described along with a review of STR analysis technologies. Additionally, commercially available STR multiplex kits are described, published polymerase chain reaction (PCR) primer sequences are reported, and validation studies conducted by a number of forensic laboratories are listed. To supplement the technical information, addresses for scientists and hyperlinks to organizations working in this area are available, along with the comprehensive reference list of over 1300 publications on STRs used for DNA typing purposes. Citation for the above abstract: Ruitberg, Christian M., Reeder, Dennis J., Butler, John M. STRBase: a short tandem repeat DNA database for the human identity testing community Nucl. Acids Res. 2001 29: 320-322 © 2001 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/29/1/320 |
| 55. The TIGR Plant Repeat Databases |
URL: http://www.tigr.org/tdb/e2k1/plant.repeats/ Categories: General Plant Databases, Nucleotide Sequences: Coding and Non-coding DNA Databases In a number of higher plants, a substantial portion of the genome is composed of repetitive sequences that can hinder genome annotation and sequencing efforts. To better understand the nature of repetitive sequences in plants and provide a resource for identifying such sequences, we constructed databases of repetitive sequences for 12 plant genera: Arabidopsis, Brassica, Glycine, Hordeum, Lotus, Lycopersicon, Medicago, Oryza, Solanum, Sorghum, Triticum and Zea (www.tigr.org/tdb/e2k1/plant.repeats/index.shtml). The repetitive sequences within each database have been coded into super-classes, classes and sub-classes based on sequence and structure similarity. These databases are available for sequence similarity searches as well as downloadable files either as entire databases or subsets of each database. To further the utility for comparative studies and to provide a resource for searching for repetitive sequences in other genera within these families, repetitive sequences have been combined into four databases to represent the Brassicaceae, Fabaceae, Gramineae and Solanaceae families. Collectively, these databases provide a resource for the identification, classification and analysis of repetitive sequences in plants. Citation for the above abstract: Ouyang, Shu, Buell, C. Robin The TIGR Plant Repeat Databases: a collective resource for the identification of repetitive sequences in plants Nucl. Acids Res. 2004 32: D360-363 © 2004 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D360 |
| 56. UNIVEC |
URL: http://www.ncbi.nlm.nih.gov/VecScreen/UniVec.html Categories: Nucleotide Sequences: Coding and Non-coding DNA Databases "UniVec is a database that can be used to quickly identify segments within nucleic acid sequences which may be of vector origin (vector contamination). Screening using UniVec is efficient because a large number of redundant subsequences have been eliminated to create a database that contains only one copy of every unique sequence segment from a large number of vectors. In addition to vector sequences, UniVec also contains sequences for those adapters, linkers, and primers commonly used in the process of cloning cDNA or genomic DNA. This enables contamination with these oligonucleotide sequences to be found during the vector screen. UniVec can be obtained from the NCBI FTP directory: ftp://ftp.ncbi.nih.gov/pub/UniVec/." |
| 57. UTRdb/UTRsite |
URL: http://www.ba.itb.cnr.it/UTR/ Categories: Nucleotide Sequences: Coding and Non-coding DNA Databases, RNA Sequence Databases The 5' and 3' untranslated regions of eukaryotic mRNAs play crucial roles in the post-transcriptional regulation of gene expression through the modulation of nucleo-cytoplasmic mRNA transport, translation efficiency, subcellular localization and message stability. UTRdb is a curated database of 5' and 3' untranslated sequences of eukaryotic mRNAs, derived from several sources of primary data. Experimentally validated functional motifs are annotated (and also collated as the UTRsite database) and cross-links to genomic and protein data are provided. The integration of UTRdb with genomic and protein data has allowed the implementation of a powerful retrieval resource for the selection and extraction of UTR subsets based on their genomic coordinates and/or features of the protein encoded by the relevant mRNA (e.g. GO term, PFAM domain, etc.). All internet resources implemented for retrieval and functional analysis of 5' and 3' untranslated regions of eukaryotic mRNAs are accessible at http://www.ba.itb.cnr.it/UTR/. Citation for the above abstract: Mignone, Flavio, Grillo, Giorgio, Licciulli, Flavio, Iacono, Michele, Liuni, Sabino, Kersey, Paul J., Duarte, Jorge, Saccone, Cecilia, Pesole, Graziano UTRdb and UTRsite: a collection of sequences and regulatory motifs of the untranslated regions of eukaryotic mRNAs Nucl. Acids Res. 2005 33: D141-146 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D141 |
| 58. VectorDB: Molecular Biology Vector Sequence Database |
URL: http://seq.yeastgenome.org/vectordb/ Categories: Nucleotide Sequences: Coding and Non-coding DNA Databases "Space for VectorDB was provided by the Saccharomyces Genome Database (SGD) project. VectorDB contains annotations and sequence information for many vectors commonly used in molecular biology. Information for more than 2600 vectors is available with search facilities. Vectors which are also in GenBank have direct links to that database via NCBI's Entrez browser!" |
| 59. ASAP: the Alternative Splicing Annotation Project |
URL: http://www.bioinformatics.ucla.edu/ASAP/ Categories: Nucleotide Sequences: Gene Structure, Introns and Exons, & Splice Sites Databases Recently, genomics analyses have demonstrated that alternative splicing is widespread in mammalian genomes (30–60% of genes reported to have multiple isoforms), and may be one of their most important mechanisms of functional regulation. However, by comparison with other genomics data such as genome annotation, SNPs, or gene expression, there exists relatively little database infrastructure for the study of alternative splicing. We have constructed an online database ASAP (the Alternative Splicing Annotation Project) for biologists to access and mine the enormous wealth of alternative splicing information coming from genomics and proteomics. ASAP is based on genome-wide analyses of alternative splicing in human (30 793 alternative splice relationships found) from detailed alignment of expressed sequences onto the genomic sequence. ASAP provides precise gene exon–intron structure, alternative splicing, tissue specificity of alternative splice forms, and protein isoform sequences resulting from alternative splicing. Moreover, it can help biologists design probe sequences for distinguishing specific mRNA isoforms. ASAP is intended to be a community resource for collaborative annotation of alternative splice forms, their regulation, and biological functions. The URL for ASAP is http://www.bioinformatics.ucla.edu/ASAP. Citation for the above abstract: Lee, Christopher, Atanelov, Levan, Modrek, Barmak, Xing, Yi ASAP: the Alternative Splicing Annotation Project Nucl. Acids Res. 2003 31: 101-105 © 2003 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/101 |
| 60. ASD: Alternative Splicing Database |
URL: http://www.ebi.ac.uk/asd/ Categories: Nucleotide Sequences: Gene Structure, Introns and Exons, & Splice Sites Databases Alternative splicing is an important regulatory mechanism of mammalian gene expression. The alternative splicing database (ASD) consortium is systematically collecting and annotating data on alternative splicing. We present the continuation and upgrade of the ASD [T. A. Thanaraj, S. Stamm, F. Clark, J. J. Riethoven, V. Le Texier, J. Muilu (2004) Nucleic Acids Res. 32, D64–D69] that consists of computationally and manually generated data. Its largest parts are AltSplice, a value-added database of computationally delineated alternative splicing events. Its data include alternatively spliced introns/exons, events, isoform splicing patterns and isoform peptide sequences. AltSplice data are generated by examining gene-transcript alignments. The data are annotated for various biological features including splicing signals, expression states, (SNP)-mediated splicing and cross-species conservation. AEdb forms the manually curated component of ASD. It is a literature-based data set containing sequence and properties of alternatively spliced exons, functional enumeration of observed splicing events, characterization of observed splicing regulatory elements, and a collection of experimentally clarified minigene constructs. ASD includes a workbench, which is an analysis tool that enables users to carry out splicing related analysis such as characterization of introns for various splicing signals, identification of splicing regulatory elements on a given RNA sequence, prediction of putative exons and prediction of putative translation start codons. The different ASD modules are integrated and can be accessed through user-friendly interfaces and visualization tools. ASD data has been integrated with Ensembl genome annotation project as a Distributed Annotation System (DAS) resource and can be viewed on Ensembl genome browser. 
The ASD resource is presented at (http://www.ebi.ac.uk/asd). Citation for the above abstract: Stamm, Stefan, Riethoven, Jean-Jack, Le Texier, Vincent, Gopalakrishnan, Chellappa, Kumanduri, Vasudev, Tang, Yesheng, Barbosa-Morais, Nuno L., Thanaraj, Thangavel Alphonse ASD: a bioinformatics resource on alternative splicing Nucl. Acids Res. 2006 34: D46-55 © 2006 Oxford University Press. The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D46 |
| 61. ASDB: Alternative Splicing Database |
URL: http://hazelton.lbl.gov/~teplitski/alt/ Categories: Human ORFs, Nucleotide Sequences: Gene Structure, Introns and Exons, & Splice Sites Databases Version 2.1 of ASDB (Alternative Splicing Data Base) contains 1922 protein and 2486 DNA sequences. The protein entries from SWISS-PROT are joined into clusters corresponding to alternatively spliced variants of one gene. The DNA division consists of complete genes with alternative splicing mentioned or annotated in GenBank. The search engine allows one to search over SWISS-PROT and GenBank fields and then follow the links to all variants. Citation for the above abstract: Dralyuk, I., Brudno, M., Gelfand, M. S., Zorn, M., Dubchak, I. ASDB: database of alternatively spliced genes Nucl. Acids Res. 2000 28: 296-297 © 2000 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/28/1/296 |
| 62. ASHESdb: Alternatively Spliced Human genes by Exon Skipping - A Database |
URL: http://sege.ntu.edu.sg/wester/ashes/ Categories: Nucleotide Sequences: Gene Structure, Introns and Exons, & Splice Sites Databases "Alternative splicing is the major contributor to protein diversity in human. Some genes can generate as many as thousand protein isoforms by alternative splicing. The mechanism of alternative splicing in normal and diseased states is perplexing. Differential joining of exons during alternative splicing is important in detecting genetic disorders. Alternative splicing is reported to regulate the sub-cellular localization of divalent metal transporter 1 isoforms and the NMDA R1 receptor gene. Therefore, a comprehensive knowledge on alternative splicing (mechanism and combinatorial protein diversity) is critical in efficient gene discovery and target validation. Alternative splicing can change the mRNA product in several ways. At its simplest level, an exon can be removed (exon skip), lengthened or shortened (alternative 5' or 3' splicing). However, identification of splice variants remains tricky and arduous mainly due to large intervening sequences and lack of tissue specific cDNA sequence data. As can be seen majority of currently known splice variants are identified using EST and EST coverage in the protein coding sequence of many genes is still inadequate to predict splicing to a large extent. Moreover, there are limitations in accuracy resulting from the single-pass sequencing that has been used to identify ESTs. In this database, we describe alternatively spliced (exon skipping) human genes identified strictly using full-length cDNA sequences (MGC). This novel approach makes the detection of splice variants more reliable and accurate. This circumvents the greatest challenges in using EST databases to understand alternative splicing and thereby facilitates the task of comprehending the relationships of these short EST sequences to each other and to other genes. 
The database integrates a variety of data for each gene ranging from gene map, gene structure, splice variants and tissue information. Information on mouse orthologs showing exon-skipping patterns for these genes is also provided. This database can be used to study the impact of alternative splicing on protein function and could be a useful resource to researchers who have found a new cDNA or human gene and wish to find additional information." |
| 63. EASED: Extended Alternatively Spliced EST Database |
URL: http://www.bioinf.mdc-berlin.de/splice/db/ Categories: Nucleotide Sequences: Gene Structure, Introns and Exons, & Splice Sites Databases We established a database of alternative splice forms (ASforms) for nine eukaryotic organisms. ASforms are defined by comparing high-scoring ESTs with mRNA sequences using BLAST, taking known exon–intron information (from the Ensembl database). Filtering programs compare the ends of each aligned sequence pair for deletions or insertions in the EST sequence, which indicate the existence of alternative splice forms with respect to the exon–intron boundaries. Moreover, we defined the alternative splice profile of each human sequence. It indicates the number of alternatively spliced ESTs (NAE), the number of constitutively spliced ESTs (NCE) as well as the number of alternative splice sites (NSS) per mRNA. NAE and NCE correspond to the EST coverage and can be used as a quality indicator for the predicted alternative splice variants. The NSS value specifies the splice propensity of a gene. Additionally, the tissue type information of all ESTs was included. This allows (i) restriction of the search to certain tissues and (ii) calculation of the tissue-NAEs, tissue-NCEs and tissue-NSS. These scores are suitable for the estimation of tissue specificity of certain ASforms. Furthermore, the developmental stage and disease information of the ESTs is available. EASED is accessible at http://eased.bioinf.mdc-berlin.de/. Citation for the above abstract: Pospisil, Heike, Herrmann, Alexander, Bortfeldt, Ralf H., Reich, Jens G. EASED: Extended Alternatively Spliced EST Database Nucl. Acids Res. 2004 32: D70-74 © 2004 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D70 |
| 64. ECgene: Gene Modeling with Alternative Splicing |
URL: http://genome.ewha.ac.kr/ECgene/ Categories: Nucleotide Sequences: Gene Structure, Introns and Exons, & Splice Sites Databases ECgene provides annotation for gene structure, function and expression, taking alternative splicing events into consideration. The gene-modeling algorithm combines the genome-based expressed sequence tag (EST) clustering and graph-theoretic transcript assembly procedures. The website provides several viewers and applications that have many unique features useful for the analysis of the transcript structure and gene expression. The summary viewer shows the gene summary and the essence of other annotation programs. The genome browser and the transcript viewer are available for comparing the gene structure of splice variants. Changes in the functional domains by alternative splicing can be seen at a glance in the transcript viewer. We also provide two unique ways of analyzing gene expression. The SAGE tags deduced from the assembled transcripts are used to delineate quantitative expression patterns from SAGE libraries available publically. Furthermore, the cDNA libraries of EST sequences in each cluster are used to infer qualitative expression patterns. It should be noted that the ECgene website provides annotation for the whole transcriptome, not just the alternatively spliced genes. Currently, ECgene supports the human, mouse and rat genomes. The ECgene suite of tools and programs is available at http://genome.ewha.ac.kr/ECgene/. Citation for the above abstract: © 2004 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D75 |
| 65. EDAS: EST-Derived Alternative Splicing Database |
URL: http://www.genebee.msu.su/edas/ Categories: Nucleotide Sequences: Gene Structure, Introns and Exons, & Splice Sites Databases |
| 66. ExInt: an Exon Intron Database |
URL: http://sege.ntu.edu.sg/wester/exint/ Categories: Nucleotide Sequences: Gene Structure, Introns and Exons, & Splice Sites Databases The Exon/Intron Database (ExInt) stores information of all GenBank eukaryotic entries containing an annotated intron sequence. Data are available through a retrieval system, as flat-files and as a MySQL dump file. In this report we discuss several implementations added to ExInt, which is accessible at http://intron.bic.nus.edu.sg/exint/newexint/exint.html. Citation for the above abstract: Sakharkar, M., Passetti, F., de Souza, J. E., Long, M., de Souza, S. J. ExInt: an Exon Intron Database Nucl. Acids Res. 2002 30: 191-194 © 2002 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/30/1/191 |
| 67. FESD: a Functional Element SNPs Database |
URL: http://combio.kribb.re.kr/FESD/ Categories: Nucleotide Sequences: Gene Structure, Introns and Exons, & Splice Sites Databases We have created the Functional Element SNPs Database (FESD) that categorizes functional elements in human genic regions and provides a set of single nucleotide polymorphisms (SNPs) located within each area. In the FESD, the human genic regions were divided into 10 different functional elements, such as promoter regions, CpG islands, 5'-untranslated regions (5'-UTRs), translation start sites, splice sites, coding exons, introns, translation stop sites, polyadenylation signals and 3'-UTRs, and subsequently, all the known SNPs were assigned to each functional element at their respective position. With the FESD web interface, users can select a set of SNPs in the specific functional elements and get their flanking sequences for genotyping experiments, which will help in finding mutations that contribute to the common and polygenic diseases. A web interface for the FESD is freely available at http://combio.kribb.re.kr/ksnp/resd/. Citation for the above abstract: Kang, Hyo Jin, Choi, Kyoung Oak, Kim, Byung-Dong, Kim, Sangsoo, Kim, Young Joo FESD: a Functional Element SNPs Database in human Nucl. Acids Res. 2005 33: D518-522 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D518 |
| 68. FUGOID: Functional Genomics of Organellar Introns Database |
URL: http://web.austin.utexas.edu/fugoid/introndata/main.htm Categories: Nucleotide Sequences: Gene Structure, Introns and Exons, & Splice Sites Databases, Organelle Databases FUGOID is a web-based, taxonomically broad organelle intron database that collects and integrates various functional and structural data on organellar (mitochondrial and chloroplast) introns. The main information provided by FUGOID includes intron sequence, subclass, resident ORF, self-splicing capability, host gene, protein factor(s) involved in splicing, mobility, insertion site, twintron, seminal references and taxonomic position of host organism. It is implemented in a relational database management system, allowing sophisticated, user-friendly searching, data entry and revision. Users can access the database by any common web browser using a variety of operating systems. The main page of the database is available at http://wnt.cc.utexas.edu/~ifmr530/introndata/main.htm. Citation for the above abstract: Li, Fei, Herrin, David L. FUGOID: functional genomics of organellar introns database Nucl. Acids Res. 2002 30: 385-386 © 2002 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/30/1/385 |
| 69. HS3D: Homo Sapiens Splice Sites Dataset |
URL: http://www.sci.unisannio.it/docenti/rampone/ Categories: Human ORFs, Nucleotide Sequences: Gene Structure, Introns and Exons, & Splice Sites Databases "HS3D (Homo Sapiens Splice Sites Dataset) is a data set of Homo Sapiens Exon, Intron and Splice regions extracted from GenBank Rel.123. The aim of this data set is to give standardized material to train and to assess the prediction accuracy of computational approaches for gene identification and characterization. From the complete GenBank (Primate Sequences Division) Rel.123 (162,557 entries), entries of Human Nuclear DNA including Complete CDS and more than one Exon have been selected, and 4523 exons and 3802 introns have been extracted from these entries. Details about extracted exons and introns are reported (Locus, number, Start and End position in the entry, sequence, length, G+C content, presence of not AGCT data (nucleotide scan check)). Statistics are also reported (overall nucleotides, average G+C content, nucleotide scan check results, number of not GT starting / AG ending introns, minimum / maximum / average length, length standard deviation) ." |
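The per-sequence details listed for HS3D (G+C content, the check for non-AGCT characters, and the count of introns that do not start with GT and end with AG) can be illustrated with a minimal Python sketch. This is not HS3D's own code: the function names are hypothetical, and taking G+C content as a fraction of unambiguous bases only is an assumption.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T).

    Assumption: ambiguous symbols such as N are excluded from the denominator.
    """
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def has_non_acgt(seq: str) -> bool:
    """Nucleotide scan check: True if any symbol is not A, C, G or T."""
    return any(base not in "ACGT" for base in seq.upper())


def is_gt_ag_intron(intron: str) -> bool:
    """True if the intron starts with GT and ends with AG."""
    s = intron.upper()
    return len(s) >= 4 and s.startswith("GT") and s.endswith("AG")
```

Applied to a list of extracted introns, `sum(not is_gt_ag_intron(i) for i in introns)` would give the "number of not GT starting / AG ending introns" style of statistic mentioned above.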
| 70. The Intronerator |
URL: http://www.cse.ucsc.edu/~kent/intronerator/ Categories: Invertebrate Databases, Nucleotide Sequences: Gene Structure, Introns and Exons, & Splice Sites Databases, RNA Sequence Databases The Intronerator (http://www.cse.ucsc.edu/~kent/intronerator/) is a set of web-based tools for exploring RNA splicing and gene structure in Caenorhabditis elegans. It includes a display of cDNA alignments with the genomic sequence, a catalog of alternatively spliced genes and a database of introns. The cDNA alignments include >100 000 ESTs and almost 1000 full-length cDNAs. ESTs from embryos and mixed stage animals as well as full-length cDNAs can be compared in the alignment display with each other and with predicted genes. The alt-splicing catalog includes 844 open reading frames for which there is evidence of alternative splicing of pre-mRNA. The intron database includes 28 478 introns, and can be searched for patterns near the splice junctions. Citation for the above abstract: Kent, W. James, Zahler, Alan M. The Intronerator: exploring introns and alternative splicing in Caenorhabditis elegans Nucl. Acids Res. 2000 28: 91-93 © 2000 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/28/1/91 |
| 71. SpliceDB |
URL: http://www.softberry.com/berry.phtml?topic=splicedb&group=data&subgroup=spldb Categories: Nucleotide Sequences: Gene Structure, Introns and Exons, & Splice Sites Databases A database (SpliceDB) of known mammalian splice site sequences has been developed. We extracted 43 337 splice pairs from mammalian divisions of the gene-centered Infogene database, including sites from incomplete or alternatively spliced genes. Known EST sequences supported 22 815 of them. After discarding sequences with putative errors and ambiguous location of splice junctions the verified dataset includes 22 489 entries. Of these, 98.71% contain canonical GT–AG junctions (22 199 entries) and 0.56% have non-canonical GC–AG splice site pairs. The remainder (0.73%) occurs in a lot of small groups (with a maximum size of 0.05%). We especially studied non-canonical splice sites, which comprise 3.73% of GenBank annotated splice pairs. EST alignments allowed us to verify only the exonic part of splice sites. To check the conservative dinucleotides we compared sequences of human non-canonical splice sites with sequences from the high throughput genome sequencing project (HTG). Out of 171 human non-canonical and EST-supported splice pairs, 156 (91.23%) had a clear match in the human HTG. They can be classified after sequence analysis as: 79 GC–AG pairs (of which one was an error that corrected to GC–AG), 61 errors corrected to GT–AG canonical pairs, six AT–AC pairs (of which two were errors corrected to AT–AC), one case was produced from a non-existent intron, seven cases were found in HTG that were deposited to GenBank and finally there were only two other cases left of supported non-canonical splice pairs. The information about verified splice site sequences for canonical and non-canonical sites is presented in SpliceDB with the supporting evidence. We also built weight matrices for the major splice groups, which can be incorporated into gene prediction programs. 
SpliceDB is available at the computational genomic Web server of the Sanger Centre: http://genomic.sanger.ac.uk/spldb/SpliceDB.html and at http://www.softberry.com/spldb/SpliceDB.html. Citation for the above abstract: Burset, M., Seledtsov, I. A., Solovyev, V. V. SpliceDB: database of canonical and non-canonical mammalian splice sites Nucl. Acids Res. 2001 29: 255-259 © 2001 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/29/1/255 |
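The splice-pair categories and percentages quoted in the SpliceDB abstract can be sketched in a few lines of Python. This is only an illustration, not SpliceDB's pipeline: the function names are hypothetical, and the classification simply reads the first and last two bases of an intron sequence.

```python
def classify_splice_pair(intron: str) -> str:
    """Label an intron by its donor/acceptor dinucleotides.

    Returns "GT-AG" (canonical), "GC-AG" or "AT-AC" (the non-canonical
    groups named in the abstract), or "other" for anything else.
    """
    s = intron.upper()
    if len(s) < 4:
        raise ValueError("intron too short to carry both dinucleotides")
    pair = f"{s[:2]}-{s[-2:]}"
    return pair if pair in {"GT-AG", "GC-AG", "AT-AC"} else "other"


def percent(part: int, total: int) -> float:
    """Percentage rounded to two decimals, as reported in the abstract."""
    return round(100 * part / total, 2)


# Counts taken from the abstract above.
print(percent(22199, 22489))  # canonical GT-AG share of the verified dataset
print(percent(156, 171))      # non-canonical human pairs matched in HTG
```

The two printed values reproduce the 98.71% and 91.23% figures in the abstract, and the six sub-categories of the 156 matched pairs (79, 61, 6, 1, 7 and 2) sum to 156.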
| 72. SpliceInfo: An Information Repository for mRNA Alternative Splicing in Human Genome |
URL: http://spliceinfo.mbc.nctu.edu.tw/ Categories: Nucleotide Sequences: Gene Structure, Introns and Exons, & Splice Sites Databases We have developed an information repository named SpliceInfo to collect the occurrences of the four major alternative-splicing (AS) modes in human genome; these include exon skipping, 5'-alternative splicing, 3'-alternative splicing and intron retention. The dataset is derived by comparing the nucleotide and protein sequences available for a given gene for evidence of AS. Additional features such as the tissue specificity of the mRNA, the protein domain contained by exons, the GC-ratio of exons, the repeats contained within the exons, and the Gene Ontology are annotated computationally for each exonic region that is alternatively spliced. Motivated by a previous investigation of AS-related motifs such as exonic splicing enhancer and exonic splicing silencer, this resource also provides a means of identifying motifs candidates and this should help to identify potential regulatory mechanisms within a particular exonic sequence set and its two flanking intronic sequence sets. This is carried out using motif discovery tools to identify motif candidates related to alternative splicing regulation and together with a secondary structure prediction tool, will help in the identification of the structural properties of such regulatory motifs. The integrated resource is now available on http://SpliceInfo.mbc.NCTU.edu.tw/. Citation for the above abstract: Huang, Hsien-Da, Horng, Jorng-Tzong, Lin, Feng-Mao, Chang, Yu-Chung, Huang, Chen-Chia SpliceInfo: an information repository for mRNA alternative splicing in human genome Nucl. Acids Res. 2005 33: D80-85 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D80 |
| 73. SpliceNest |
URL: http://splicenest.molgen.mpg.de/ Categories: Nucleotide Sequences: Gene Structure, Introns and Exons, & Splice Sites Databases We have integrated the protein families from SYSTERS and the expressed sequence tag (EST) clusters from our database GeneNest with SpliceNest, a new database mapping EST contigs into genomic DNA. The SYSTERS protein sequence cluster set provides an automatically generated classification of all sequences of the SWISS-PROT, TrEMBL and PIR databases into disjoint protein family and superfamily clusters. GeneNest is a database and software package for producing and visualizing gene indices from ESTs and mRNAs. Currently, the database comprises gene indices of human, mouse, Arabidopsis thaliana and zebrafish. SpliceNest is a web-based graphical tool to explore gene structure, including alternative splicing, based on a mapping of the EST consensus sequences from GeneNest to the complete human genome. The integration of SYSTERS, GeneNest and SpliceNest into one framework now permits an overall exploration of the whole sequence space covering protein, mRNA and EST sequences, as well as genomic DNA. The databases are available for querying and browsing at http://cmb.molgen.mpg.de. Citation for the above abstract: Krause, Antje, Haas, Stefan A., Coward, Eivind, Vingron, Martin SYSTERS, GeneNest, SpliceNest: exploring sequence space from genome to protein Nucl. Acids Res. 2002 30: 299-300 © 2002 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/30/1/299 |
| 74. Xpro: Database of Eukaryotic Protein Encoding Genes |
URL: http://origin.bic.nus.edu.sg/xpro/ Categories: Model Organisms and Comparative Genomics Databases, Nucleotide Sequences: Gene Structure, Introns and Exons, & Splice Sites Databases Xpro is a relational database that contains all the eukaryotic protein-encoding DNA sequences contained in GenBank with associated data required for the analysis of eukaryotic gene architecture. In addition to the information found in the GenBank records, which includes properties such as sequence, position, length and description about introns, exons and protein-coding regions, Xpro provides annotations on the splice sites and intron phases. Furthermore, Xpro validates intron positions using alignment information between the record’s sequence and EST sequences found in dbEST. In the process of validation, alternative splicing information is also obtained and can be found in the database. The intron-containing genes in the Xpro are also classified as experimental or predicted based on the intron position validation and specific keywords in the GenBank records that are present in predicted genes. An Entrez-like query system, which is familiar to most biologists, is provided for accessing the information present in the database system. A non-redundant set of Xpro database contents is also obtained by cross-referencing to the Swiss-Prot/TrEMBL and Pfam databases. The database currently contains information for 493 983 genes: 351 918 intron-containing genes and 142 065 intron-less genes. Xpro is updated for each new GenBank release and is freely available via the internet at http://origin.bic.nus.edu.sg/xpro. Citation for the above abstract: Gopalan, Vivek, Tan, Tin Wee, Lee, Bernett T. K., Ranganathan, Shoba Xpro: database of eukaryotic protein-encoding genes Nucl. Acids Res. 2004 32: D59-63 © 2004 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D59 |
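The "intron phase" annotation mentioned in the Xpro abstract follows the standard definition: the number of coding bases preceding the intron, modulo 3. The sketch below assumes that definition and uses a hypothetical function name; it does not reflect how Xpro itself stores or computes phases.

```python
def intron_phase(coding_bases_before_intron: int) -> int:
    """Phase of an intron given the count of CDS bases upstream of it.

    0: the intron falls between two codons
    1: the intron falls after the first base of a codon
    2: the intron falls after the second base of a codon
    """
    if coding_bases_before_intron < 0:
        raise ValueError("base count must be non-negative")
    return coding_bases_before_intron % 3


# Gene counts quoted in the abstract: the two subsets add up to the total.
assert 351_918 + 142_065 == 493_983
```

For example, an intron that interrupts the coding sequence after 91 coding bases has phase 1.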
| 75. Ares lab Yeast Intron Database |
URL: http://www.cse.ucsc.edu/research/compbio/yeast_introns.html Categories: Fungal Genome Databases, Nucleotide Sequences: Gene Structure, Introns and Exons, & Splice Sites Databases "This site contains information about the spliceosomal introns of the yeast Saccharomyces cerevisiae. Introns present special problems for the annotation of eukaryotic genomes. Splice sites are information-poor, and their recognition by the splicing apparatus is highly context-dependent and regulated, making identification by computational gene prediction programs a challenge. At present we do not understand splice site context well enough to predict which potential splice sites will be used, and thus how the genomic sequences will be expressed. Understanding the how and why of introns will require genome level information about splicing. One element of this will involve understanding splicing patterns and how they are regulated globally. Another element will involve understanding how splicing patterns change during evolution. To begin we study yeast, since it has the simplest known eukaryotic genome. In these pages we have listed known spliceosomal introns in the yeast genome and documented the splice sites actually used. Through the use of microarrays designed to monitor splicing, we are beginning to identify and analyze splice site context in terms of the nature and activities of the trans-acting factors that mediate splice site recognition. In this edition (version 3.0), we include expression data that relates to the efficiency of splicing relative to other processes in strains of yeast lacking nonessential splicing factors. These data are displayed on each intron page for browsing and can be downloaded for other types of analysis." |
| 76. AAindex: Amino Acid Index Database |
URL: http://www.genome.jp/aaindex/ Categories: Protein Property Databases AAindex is a database of amino acid indices and amino acid mutation matrices. An amino acid index is a set of 20 numerical values representing various physico-chemical and biochemical properties of amino acids. An amino acid mutation matrix is generally 20 x 20 numerical values representing similarity of amino acids. AAindex consists of two sections: AAindex1 for the collection of published amino acid indices and AAindex2 for the collection of published amino acid mutation matrices. Each entry of either AAindex1 or AAindex2 consists of the definition, the reference information, a list of related entries in terms of the correlation coefficient and the actual data. The database may be accessed through the DBGET/LinkDB system at GenomeNet (http://www.genome.ad.jp/aaindex/) or may be downloaded by anonymous FTP (ftp://ftp.genome.ad.jp/db/genomenet/aaindex/). Citation for the above abstract: Kawashima, Shuichi, Kanehisa, Minoru AAindex: Amino Acid index database Nucl. Acids Res. 2000 28: 374- © 2000 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/28/1/374 |
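As a rough illustration of what "a set of 20 numerical values" means in practice, the sketch below stores one amino acid index as a dictionary and averages it over a sequence. The values are the widely used Kyte-Doolittle hydropathy scale, chosen here only as an example and not copied from this listing; check them against the corresponding AAindex entry before relying on them. The function name is hypothetical.

```python
# One amino acid index: 20 values keyed by one-letter residue code.
# Assumption: Kyte-Doolittle hydropathy values, used purely for illustration.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_index(sequence, index):
    """Average an amino acid index over a protein sequence."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    # A KeyError here signals a residue code outside the 20 standard ones.
    return sum(index[residue] for residue in sequence.upper()) / len(sequence)
```

Any other index of the same shape could be passed as `index`; the averaging step does not depend on which property the 20 values describe.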
| 77. PFD: Protein Folding Database |
URL: http://pfd.med.monash.edu.au/ Categories: Protein Property Databases, Protein Structure Databases We have developed a new database that collects all protein folding data into a single, easily accessible public resource. The Protein Folding Database (PFD) contains annotated structural, methodological, kinetic and thermodynamic data for more than 50 proteins, from 39 families. A user-friendly web interface has been developed that allows powerful searching, browsing and information retrieval, whilst providing links to other protein databases. The database structure allows visualization of folding data in a useful and novel way, with a long-term aim of facilitating data mining and bioinformatics approaches. PFD can be accessed freely at http://pfd.med.monash.edu.au. Citation for the above abstract: Fulton, Kate F., Devlin, Glyn L., Jodun, Rachel A., Silvestri, Linda, Bottomley, Stephen P., Fersht, Alan R., Buckle, Ashley M. PFD: a database for the investigation of protein folding kinetics and stability Nucl. Acids Res. 2005 33: D279-283 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D279 |
| 78. ProTherm Thermodynamic Database for Proteins and Mutants |
URL: http://gibk26.bse.kyutech.ac.jp/jouhou/protherm/protherm.html Categories: Protein Property Databases ProTherm and ProNIT are two thermodynamic databases that contain experimentally determined thermodynamic parameters of protein stability and protein-nucleic acid interactions, respectively. The current versions of both the databases have considerably increased the total number of entries and enhanced search interface with added new fields, improved search, display and sorting options. As of September 2005, ProTherm release 5.0 contains 17,113 entries from 771 proteins, retrieved from 1497 scientific articles (approximately 20% increase in data from the previous version). ProNIT release 2.0 contains 4900 entries from 273 research articles, representing 158 proteins. Both databases can be queried using WWW interfaces. Both quick search and advanced search are provided on this web page to facilitate easy retrieval and display of the data from these databases. ProTherm is freely available online at http://gibk26.bse.kyutech.ac.jp/jouhou/Protherm/protherm.html and ProNIT at http://gibk26.bse.kyutech.ac.jp/jouhou/pronit/pronit.html. Citation for the above abstract: Kumar, M. D. Shaji, Bava, K. Abdulla, Gromiha, M. Michael, Prabakaran, Ponraj, Kitajima, Koji, Uedaira, Hatsuho, Sarai, Akinori ProTherm and ProNIT: thermodynamic databases for proteins and protein-nucleic acid interactions Nucl. Acids Res. 2006 34: D204-206 © 2006 Oxford University Press. The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D204 |
| 79. REFOLD: a Database for Protein Renaturation |
URL: http://refold.med.monash.edu.au/ Categories: Protein Property Databases A large proportion of proteins expressed in Escherichia coli form inclusion bodies and thus require renaturation to attain a functional conformation for analysis. In this process, identifying and optimizing the refolding conditions and methodology is often rate limiting. In order to address this problem, we have developed REFOLD, a web-accessible relational database containing the published methods employed in the refolding of recombinant proteins. Currently, REFOLD contains >300 entries, which are heavily annotated such that the database can be searched via multiple parameters. We anticipate that REFOLD will continue to grow and eventually become a powerful tool for the optimization of protein renaturation. REFOLD is freely available at http://refold.med.monash.edu.au. Citation for the above abstract: Chow, Michelle K. M., Amin, Abdullah A., Fulton, Kate F., Fernando, Thushan, Kamau, Lawrence, Batty, Chris, Louca, Michael, Ho, Storm, Whisstock, James C., Bottomley, Stephen P., Buckle, Ashley M. The REFOLD database: a tool for the optimization of protein expression and refolding Nucl. Acids Res. 2006 34: D207-212 © 2006 Oxford University Press. The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D207 |
| 80. LOCATE: a mouse protein subcellular localization database |
URL: http://locate.imb.uq.edu.au/ Categories: Protein Localization and Targeting Databases We present here LOCATE, a curated, web-accessible database that houses data describing the membrane organization and subcellular localization of proteins from the FANTOM3 Isoform Protein Sequence set. Membrane organization is predicted by the high-throughput, computational pipeline MemO. The subcellular locations of selected proteins from this set were determined by a high-throughput, immunofluorescence-based assay and by manually reviewing >1700 peer-reviewed publications. LOCATE represents the first effort to catalogue the experimentally verified subcellular location and membrane organization of mammalian proteins using a high-throughput approach and provides localization data for 40% of the mouse proteome. It is available at http://locate.imb.uq.edu.au. Citation for the above abstract: Fink, J. Lynn, Aturaliya, Rajith N., Davis, Melissa J., Zhang, Fasheng, Hanson, Kelly, Teasdale, Melvena S., Kai, Chikatoshi, Kawai, Jun, Carninci, Piero, Hayashizaki, Yoshihide, Teasdale, Rohan D. LOCATE: a mouse protein subcellular localization database Nucl. Acids Res. 2006 34: D213-217 © 2006 Oxford University Press. The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D213 |
| 81. Proteome 2D-PAGE Database |
URL: http://www.mpiib-berlin.mpg.de/2D-PAGE/ Categories: Proteomics Databases "The Proteome 2D-PAGE Database is a curated database for storing and investigating proteomics data. The database currently contains about 2.500 identified spots and about 300 mass peaklists in 18 reference maps representing experiments from 13 different organisms." |
| 82. Biozon |
URL: http://biozon.org/ Categories: Protein Domain and Protein Classification Databases, Proteomics Databases Biological entities are strongly related and mutually dependent on each other. Therefore, there is a growing need to corroborate and integrate data from different resources and aspects of biological systems in order to analyze them effectively. Biozon is a unified biological database that integrates heterogeneous data types such as proteins, structures, domain families, protein-protein interactions and cellular pathways, and establishes the relationships between them. All data are integrated on to a single graph schema centered around the non-redundant set of biological objects that are shared by each source. This integration results in a highly connected graph structure that provides a more complete picture of the known context of a given object that cannot be determined from any one source. Currently, Biozon integrates roughly 2 million protein sequences, 42 million DNA or RNA sequences, 32,000 protein structures, 150,000 interactions and more from sources such as GenBank, UniProt, Protein Data Bank (PDB) and BIND. Biozon augments source data with locally derived data such as 5 billion pairwise protein alignments and 8 million structural alignments. The user may form complex cross-type queries on the graph structure, add similarity relations to form fuzzy queries and rank the results based on analysis of the edge structure similar to Google PageRank, online at Biozon.org. Citation for the above abstract: Birkland, Aaron, Yona, Golan BIOZON: a hub of heterogeneous biological data Nucl. Acids Res. 2006 34: D235-242 © 2006 Oxford University Press. The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D235 |
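The Biozon abstract mentions ranking results by an analysis of edge structure "similar to Google PageRank". The sketch below is a generic PageRank power iteration over a small directed graph, offered only to show the idea; it is not Biozon's actual ranking method, and the function name, the damping value and the node labels are assumptions made for illustration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Generic PageRank by power iteration over a list of (source, target) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # Every node receives the same baseline share of rank.
        new_rank = {node: (1.0 - damping) / len(nodes) for node in nodes}
        for source in nodes:
            # Nodes without outgoing edges spread their rank over all nodes.
            targets = out_links[source] or nodes
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy graph of biological objects.
ranks = pagerank([
    ("interaction", "protein"),
    ("domain", "protein"),
    ("protein", "structure"),
])
```

In this toy graph the nodes that collect incoming edges end up ranked above the nodes that only point outward, which is the behaviour a PageRank-style ordering is meant to capture.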
| 83. DynaProt 2D: Proteome Database of Lactococcus lactis |
URL: http://www.wzw.tum.de/proteomik/lactis/ Categories: Proteomics Databases DynaProt 2D presents an advanced online database for dynamic access to proteomes and two-dimensional (2D) gels. The database was designed to administer complete in silico proteomes and links them with experimental proteomic data in the manner of 2D electrophoresis gels (IPG-Dalt). The 2D gels serve as reference maps in 2D gel analysis as well as tools for navigation of the database to switch between experimental and predicted data. Therefore, all identified spots in the gels are clickable and linked with summarized protein information. The protein information tables contain calculated characteristics, which are often used in proteomics, such as the molecular weight, isoelectric point, codon adaptation index, grand average of hydropathicity, etc. The design of the database permits online extension of gel data and protein attributes without knowledge of any software language. Besides navigation via 2D gels, the clear graphical user interface permits quick and intuitive searching throughout complete proteomes and supports, e.g. the search for proteins with isoelectric points within pH ranges of interest or protein classes (e.g. ribosomal proteins or transporters). The first organism implemented in the database is Lactococcus lactis. The database is available at www.wzw.tum.de/proteomik/lactis. Citation for the above abstract: Drews, Oliver, Gorg, Angelika DynaProt 2D: an advanced proteomic database for dynamic online access to proteomes and two-dimensional electrophoresis gels Nucl. Acids Res. 2005 33: D583-587 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D583 |
| 84. GELBANK |
URL: http://gelbank.anl.gov/ Categories: Proteomics Databases GELBANK is a publicly available database of two-dimensional gel electrophoresis (2DE) gel patterns of proteomes from organisms with known genome information (available at http://gelbank.anl.gov and ftp://bioinformatics.anl.gov/gelbank/). Currently it includes 131 completed, mostly microbial proteomes available from the National Center for Biotechnology Information. A web interface allows the upload of 2D gel patterns and their annotation for registered users. The images are organized by species, tissue type, separation method, sample type and staining method. The database can be queried based on protein or 2DE-pattern attributes. A web interface allows registered users to assign molecular weight and pH gradient profiles to their own 2D gel patterns as well as to link protein identifications to a given spot on the pattern. The website presents all of the submitted 2D gel patterns where the end-user can dynamically display the images or parts of images along with molecular weight, pH profile information and linked protein identification. A collection of images can be selected for the creation of animations from which the user can select sub-regions of interest and unlimited 2D gel patterns for visualization. The website currently presents 233 identifications for 81 gel patterns for Homo sapiens, Methanococcus jannaschii, Pyrococcus furiosus, Shewanella oneidensis, Escherichia coli and Deinococcus radiodurans. Citation for the above abstract: Babnigg, Gyorgy, Giometti, Carol S. GELBANK: a database of annotated two-dimensional gel electrophoresis patterns of biological systems with completed genomes Nucl. Acids Res. 2004 32: D582-585 © 2004 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D582 |
| 85. OPD: Open Proteomics Database |
URL: http://bioinformatics.icmb.utexas.edu/OPD/ Categories: Proteomics Databases "OPD is a public database for storing and disseminating mass spectrometry based proteomics data. The database currently contains roughly 1,200,000 spectra representing experiments from 4 different organisms." |
| 86. PEP: Predictions for Entire Proteomes |
URL: http://cubic.bioc.columbia.edu/pep/ Categories: General Genomics Databases, Proteomics Databases PEP is a database of Predictions for Entire Proteomes. The database contains summaries of analyses of protein sequences from a range of organisms representing all three major kingdoms of life: eukaryotes, prokaryotes and archaea. All proteins publicly available for organisms were aligned against SWISS-PROT, TrEMBL and PDB. Additionally, the following annotations are provided: secondary structure, transmembrane helices, coiled coils, regions of low complexity, signal peptides, PROSITE motifs, nuclear localization signals and classes of cellular function. Proteins that contain long regions without regular secondary structure are also identified. We have produced a related database of structural domain-like fragments derived from PEP and clusters based on homology between all fragments. The PEP database, fragments and clusters are distributed freely as a set of flat files and have been integrated into SRS. The PEP group of databases can be accessed from: http://cubic.bioc.columbia.edu/pep. Citation for the above abstract: Carter, Phil, Liu, Jinfeng, Rost, Burkhard PEP: Predictions for Entire Proteomes Nucl. Acids Res. 2003 31: 410-413 © 2003 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/410 |
| 87. plantMarkers: a Database of Predicted Molecular Markers |
URL: http://markers.btk.fi/ Categories: Proteomics Databases Molecular markers are required in a broad spectrum of gene screening approaches, ranging from gene-mapping within traditional ‘forward’-genetics approaches through QTL identification studies to genotyping and haplotyping studies. As we enter the post-genomics era, the need for genetic markers does not diminish, even in the species with fully sequenced genomes. PlantMarkers is a genetic marker database that contains a comprehensive pool of predicted molecular markers. We have adopted contemporary techniques to identify putative single nucleotide polymorphism (SNP), simple sequence repeat (SSR) and conserved orthologue set markers. A systematic approach to identify as broad a range of putative markers has been undertaken by screening the available openSputnik unigene consensus sequences from over 50 plant species. A web presence at http://markers.btk.fi provides functionality so that a user may search for species-specific markers on the basis of many specific criteria not limited to non-synonymous SNPs segregating between different varieties or measured polymorphic SSRs. Feedback forms are provided with all sequence entries to enable inclusion of, for example, map location for markers validated by the research community. Citation for the above abstract: Rudd, Stephen, Schoof, Heiko, Mayer, Klaus PlantMarkers--a database of predicted molecular markers from plants Nucl. Acids Res. 2005 33: D628-632 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D628 |
| 88. RESID Database |
URL: http://www-nbrf.georgetown.edu/pirwww/dbinfo/resid.html Categories: Protein Structure Databases, Proteomics Databases The RESID Database is a comprehensive collection of annotations and structures for protein pre-, co- and post-translational modifications including amino-terminal, carboxyl-terminal and peptide chain cross-link modifications. The RESID Database includes: systematic and alternate names, atomic formulas and masses, enzyme activities generating the modifications, keywords, literature citations, Gene Ontology cross-references, Protein Information Resource (PIR) and SWISS-PROT protein sequence database feature table annotations, structure diagrams and molecular models. This database is freely accessible on the Internet through the European Bioinformatics Institute at http://srs.ebi.ac.uk/srs6bin/cgi-bin/wgetz?-page+LibInfo+-lib+RESID, through the National Cancer Institute — Frederick Advanced Biomedical Computing Center at http://www.ncifcrf.gov/RESID, or through the Protein Information Resource at http://pir.georgetown.edu/pirwww/dbinfo/resid.html. Citation for the above abstract: Garavelli, John S. The RESID Database of Protein Modifications: 2003 developments Nucl. Acids Res. 2003 31: 499-501 © 2003 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/499 |
| 89. SWISS-2DPAGE: Two-dimensional Polyacrylamide Gel Electrophoresis Database |
URL: http://www.expasy.org/ch2d/ Categories: Proteomics Databases SWISS-2DPAGE (http://www.expasy.ch/ch2d/) is an annotated two-dimensional polyacrylamide gel electrophoresis (2-DE) database established in 1993. The current release contains 24 reference maps from human and mouse biological samples, as well as from Saccharomyces cerevisiae, Escherichia coli and Dictyostelium discoideum origin. These reference maps now have 2824 identified spots, corresponding to 614 separate protein entries in the database, in addition to virtual entries for each SWISS-PROT sequence or any user-entered amino acid sequence. Last year's improvements in the SWISS-2DPAGE database are as follows: three new maps have been created and several others have been updated; cross-references to newly built federated 2-DE databases have been added; new functions to access the data have been provided through the ExPASy proteomics server. Citation for the above abstract: Hoogland, Christine, Sanchez, Jean-Charles, Tonella, Luisa, Binz, Pierre-Alain, Bairoch, Amos, Hochstrasser, Denis F., Appel, Ron D. The 1999 SWISS-2DPAGE database update Nucl. Acids Res. 2000 28: 286-288 © 1999 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/28/1/286 |
| 90. BRENDA: BRaunschweig ENzyme DAtabase |
URL: http://www.brenda.uni-koeln.de/ Categories: Enzyme and Enzyme Nomenclature Databases BRENDA (BRaunschweig ENzyme DAtabase) represents a comprehensive collection of enzyme and metabolic information, based on primary literature. The database contains data from at least 83,000 different enzymes from 9800 different organisms, classified in approximately 4200 EC numbers. BRENDA includes biochemical and molecular information on classification and nomenclature, reaction and specificity, functional parameters, occurrence, enzyme structure, application, engineering, stability, disease, isolation and preparation, links and literature references. The data are extracted and evaluated from approximately 46,000 references, which are linked to PubMed as long as the reference is cited in PubMed. In the past year BRENDA has undergone major changes including a large increase in updating speed with >50% of all data updated in 2002 or in the first half of 2003, the development of a new EC-tree browser, a taxonomy-tree browser, a chemical substructure search engine for ligand structure, the development of controlled vocabulary, an ontology for some information fields and a thesaurus for ligand names. The database is accessible free of charge to the academic community at http://www.brenda.uni-koeln.de. Citation for the above abstract: Schomburg, Ida, Chang, Antje, Ebeling, Christian, Gremse, Marion, Heldt, Christian, Huhn, Gregor, Schomburg, Dietmar BRENDA, the enzyme database: updates and major new developments Nucl. Acids Res. 2004 32: D431-433 © 2004 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D431 |
| 91. ENZYME: Enzyme Nomenclature Database |
URL: http://www.expasy.org/enzyme/ Categories: Enzyme and Enzyme Nomenclature Databases The ENZYME database is a repository of information related to the nomenclature of enzymes. In recent years it has become an indispensable resource for the development of metabolic databases. The current version contains information on 3705 enzymes. It is available through the ExPASy WWW server (http://www.expasy.ch/enzyme/). Citation for the above abstract: Bairoch, Amos The ENZYME database in 2000 Nucl. Acids Res. 2000 28: 304-305 © 2004 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/28/1/304 |
| 92. Enzyme Nomenclature |
URL: http://www.chem.qmul.ac.uk/iubmb/enzyme/ Categories: Enzyme and Enzyme Nomenclature Databases "The complete contents of Enzyme Nomenclature, 1992 (plus subsequent supplements and other changes) are listed below in enzyme number order giving just the recommended name. Each entry provides a link to details of that enzyme. Alternatively if looking for a specific reaction used in the classification of enzymes the broad outline defined by the first two numbers are given below. Each of these subclass entries is linked to a location where the category is subdivided to sub-subclasses. These in turn are linked to a list of recommended names for each enzyme in the sub-subclass." |
| 93. IntEnz |
URL: http://www.ebi.ac.uk/intenz/index.html Categories: Enzyme and Enzyme Nomenclature Databases IntEnz is the name for the Integrated relational Enzyme database and is the official version of the Enzyme Nomenclature. The Enzyme Nomenclature comprises recommendations of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (NC-IUBMB) on the nomenclature and classification of enzyme-catalysed reactions. IntEnz is supported by NC-IUBMB and contains enzyme data curated and approved by this committee. The database IntEnz is available at http://www.ebi.ac.uk/intenz. Citation for the above abstract: Fleischmann, Astrid, Darsow, Michael, Degtyarenko, Kirill, Fleischmann, Wolfgang, Boyce, Sinead, Axelsen, Kristian B., Bairoch, Amos, Schomburg, Dietmar, Tipton, Keith F., Apweiler, Rolf IntEnz, the integrated relational enzyme database Nucl. Acids Res. 2004 32: D434-437 © 2004 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D434 |
| 94. PDBrtf |
URL: http://cgl.imim.es/pdbrtf/ Categories: Enzyme and Enzyme Nomenclature Databases "Representativity of Target Families in the Protein Data Bank." |
| 95. PRECISE: Predicted and Consensus Interaction Sites in Enzymes |
URL: http://precise.bu.edu/precisedb/ Categories: Enzyme and Enzyme Nomenclature Databases PRECISE (Predicted and Consensus Interaction Sites in Enzymes) is a database of interactions between the amino acid residues of an enzyme and its ligands (substrate and transition state analogs, cofactors, inhibitors and products). It is available online at http://precise.bu.edu/. In the current version, all information on interactions is extracted from the enzyme–ligand complexes in the Protein Data Bank (PDB) by performing the following steps: (i) clustering homologous enzyme chains such that, in each cluster, the proteins have the same EC number and all sequences are similar; (ii) selecting a representative chain for each cluster; (iii) selecting ligand types; (iv) finding non-bonded interactions and hydrogen bonds; and (v) summing the interactions for all chains within the cluster. The output of the search is the color-coded sequence of the representative. The colors indicate the total number of interactions found at each amino acid position in all chains of the cluster. Clicking on a residue displays a detailed list of interactions for that residue. Optional filters allow restricting the output to selected chains in the cluster, to non-bonded or hydrogen bonding interactions, and to selected ligand types. The binding site information is essential for understanding and altering substrate specificity and for the design of enzyme inhibitors. Citation for the above abstract: Sheu, Shu-Hsien, Lancia, David R., Jr, Clodfelter, Karl H., Landon, Melissa R., Vajda, Sandor PRECISE: a Database of Predicted and Consensus Interaction Sites in Enzymes Nucl. Acids Res. 2005 33: D206-211 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D206 |
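Steps (iv) and (v) of the PRECISE procedure, finding interactions per chain and then summing them over the cluster, can be sketched as a simple tally per residue position. The data layout below (chain identifiers mapped to lists of position and interaction-kind pairs, with positions already aligned to the representative chain) and all names are hypothetical; the actual PRECISE pipeline is not described in enough detail here to reproduce.

```python
from collections import Counter


def sum_cluster_interactions(chain_interactions, kinds=None):
    """Count interactions per aligned residue position over all chains in a cluster.

    chain_interactions: dict mapping a chain ID to a list of
        (position, kind) tuples, where kind is e.g. "hbond" or "nonbonded".
    kinds: optional set of interaction kinds to keep (a filter like the
        optional filters the abstract mentions); None keeps everything.
    """
    totals = Counter()
    for interactions in chain_interactions.values():
        for position, kind in interactions:
            if kinds is None or kind in kinds:
                totals[position] += 1
    return totals


# Hypothetical cluster of three homologous chains.
cluster = {
    "chainA": [(57, "hbond"), (102, "nonbonded")],
    "chainB": [(57, "hbond"), (57, "nonbonded")],
    "chainC": [(195, "hbond")],
}
totals = sum_cluster_interactions(cluster)
hbonds_only = sum_cluster_interactions(cluster, kinds={"hbond"})
```

The per-position totals are what a color-coded display of the representative sequence would be driven by, with higher counts mapped to stronger colors.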
| 96. SCOPEC: a Database of Protein Catalytic Domains |
URL: http://www.enzome.com/databases/scopec.php Categories: Enzyme and Enzyme Nomenclature Databases MOTIVATION: Domains are the units of protein structure, function and evolution. It is therefore essential to utilize knowledge of domains when studying the evolution of function, or when assigning function to genome sequence data. For this purpose, we have developed a database of catalytic domains, SCOPEC, by combining structural domain information from SCOP, full-length sequence information from Swiss-Prot, and verified functional information from the Enzyme Classification (EC) database. Two major problems need to be overcome to create a database of domain-function relationships: (1) for sequences, EC numbers are typically assigned to whole sequences rather than the functional unit, and (2) the Protein Data Bank (PDB) structures elucidated from a larger multi-domain protein will often have EC annotation although the relevant catalytic domain may lie elsewhere. RESULTS: SCOPEC entries have high quality enzyme assignments, having passed both computational and manual checks. SCOPEC currently contains entries for 75% of all EC annotations in the PDB. Overall, EC number is fairly well conserved within a superfamily, even when the proteins are distantly related. Initial analysis is encouraging, suggesting that there is a 50:50 chance of conserved function in distant homologues first detected by a third iteration PSI-BLAST search. Therefore, we envisage that a knowledge-based approach to function assignment using the domain-EC relationships in SCOPEC will gain a marked improvement over this base line. AVAILABILITY: The SCOPEC database is a valuable resource in the analysis and prediction of protein structure and function. It can be obtained or queried at our website http://www.enzome.com Citation for the above abstract: Richard A. George, Ruth V. Spriggs, Janet M. Thornton, Bissan Al-Lazikani, and Mark B. Swindells SCOPEC: a database of protein catalytic domains Bioinformatics 20: i130-i136. © 2004 Oxford University Press. The full text of the article can be found at: http://bioinformatics.oupjournals.org/cgi/reprint/20/suppl_1/i130 |
| 97. TECRDB: Thermodynamics of Enzyme-catalyzed Reactions Database |
URL: http://xpdb.nist.gov/enzyme_thermodynamics/ Categories: Enzyme and Enzyme Nomenclature Databases Summary: The Thermodynamics of Enzyme-catalyzed Reactions Database (TECRDB) is a comprehensive collection of thermodynamic data on enzyme-catalyzed reactions. The data, which consist of apparent equilibrium constants and calorimetrically determined molar enthalpies of reaction, are the primary experimental results obtained from thermodynamic studies of biochemical reactions. The results from 1000 published papers containing data on 400 different enzyme-catalyzed reactions constitute the essential information in the database. The information is managed using Oracle and is available on the Web. Citation for the above abstract: Robert N. Goldberg, Yadu B. Tewari, and Talapady N. Bhat Thermodynamics of enzyme-catalyzed reactions: a database for quantitative biochemistry Bioinformatics Advance Access published on November 1, 2004, DOI 10.1093/bioinformatics/bth314. Bioinformatics 20: 2874-2877. © 2004 Oxford University Press. The full abstract can be found at: http://bioinformatics.oupjournals.org/cgi/content/abstract/20/16/2874 |
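One common use of an apparent equilibrium constant is converting it to a standard transformed Gibbs energy of reaction through the textbook relation ΔG'° = -RT ln K'. The sketch below applies that relation; the function name and the default temperature of 298.15 K are assumptions for illustration, and nothing here is taken from TECRDB's own software.

```python
import math

GAS_CONSTANT = 8.314462618  # J / (mol K)


def standard_transformed_gibbs_energy(k_apparent, temperature_k=298.15):
    """Return the standard transformed Gibbs energy in kJ/mol from K'.

    Uses the relation dG = -R * T * ln(K'); K' must be positive.
    """
    if k_apparent <= 0:
        raise ValueError("apparent equilibrium constant must be positive")
    return -GAS_CONSTANT * temperature_k * math.log(k_apparent) / 1000.0
```

A constant above 1 gives a negative value (products favored under the stated conditions), and a constant of exactly 1 gives zero.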
| 98. BioCarta: Charting Pathways of Life |
URL: http://www.biocarta.com/genes/ Categories: Intermolecular Interactions and Signaling Pathways Databases, Metabolic Pathway Databases "Observe how genes interact in dynamic graphical models. Our online maps depict molecular relationships from areas of active research. In an "open source" approach, this community-fed forum constantly integrates emerging proteomic information from the scientific community. It also catalogs and summarizes important resources providing information for over 120,000 genes from multiple species. Find both classical pathways as well as current suggestions for new pathways." |
| 99. BioCyc |
URL: http://biocyc.org/ Categories: Metabolic Pathway Databases "The BioCyc collection of databases provides electronic reference sources on the pathways and genomes of different organisms. Currently, detailed organism-specific databases are available for 14 species. In addition, the MetaCyc metabolic pathway database contains literature-derived metabolic pathway data for 160 species. Scientists can use BioCyc databases to visualize the layout of genes within a chromosome, or of an individual biochemical reaction, or of a complete biochemical pathway. The structures of chemical compounds can be displayed in pathways and reactions. The navigation capabilities of the software allow a user to move from a display of an enzyme to a display of a reaction that the enzyme catalyzes, or to the gene that encodes the enzyme. The interface supports a variety of queries, such as generating a display of the map positions of all genes that code for enzymes within a given biochemical pathway. As well as being used as a reference source to look up individual facts, BioCyc databases support computational studies of the metabolism, such as design of novel biochemical pathways for biotechnology, studies of the evolution of metabolic pathways, and simulation of metabolic pathways. BioCyc is linked to other biological databases containing protein and nucleic-acid sequence data, bibliographic data, protein structures, and descriptions of different strains." |
| 100. BioSilico: An Integrated Metabolic Database System |
URL: http://biosilico.kaist.ac.kr/ Categories: Metabolic Pathway Databases BioSilico is a web-based database system that facilitates the search and analysis of metabolic pathways. Heterogeneous metabolic databases including LIGAND, ENZYME, EcoCyc and MetaCyc are integrated in a systematic way, thereby allowing users to efficiently retrieve the relevant information on enzymes, biochemical compounds and reactions. In addition, it provides well-designed view pages for more detailed summary information. BioSilico is developed as an extensible system with a robust systematic architecture. Citation for the above abstract: Bo Kyeng Hou, Jin Sik Kim, Ji Hoon Jun, Dong-Yup Lee, Yong Wook Kim, Sujin Chae, Mira Roh, Yong-Ho In, and Sang Yup Lee BioSilico: an integrated metabolic database system Bioinformatics Advance Access published on November 22, 2004, DOI 10.1093/bioinformatics/bth363. Bioinformatics 20: 3270-3272. © 2004 Oxford University Press. The full abstract can be found at: http://bioinformatics.oupjournals.org/cgi/content/abstract/20/17/3270 |
| 101. BRITE: Biomolecular Relations in Information Transmission and Expression |
URL: http://www.genome.jp/brite/ Categories: Metabolic Pathway Databases "BRITE is a database of binary relations for network computation and logical reasoning involving genes, proteins, and other biological molecules. It contains diverse sets of binary relations, including the generalized protein interactions that underlie the KEGG pathway diagrams, systematic experimental data on protein-protein interactions by yeast two-hybrid systems, expression similarity relations by microarray gene expression profiles, cross-reference links between database entries, and parent-child relations in the hierarchies of terminology (ontologies). The BRITE project is supported by the Institute for Bioinformatics Research and Development (BIRD) of the Japan Science and Technology Agency (JST) and also by a Grant-in-Aid for Scientific Research in Priority Areas from the Ministry of Education, Culture, Sports, Science and Technology (MEXT)." |
| 102. BSD: the Biodegradative Strain Database |
URL: http://bsd.cme.msu.edu/bsd/index.html Categories: Drug and Drug Design Databases, Metabolic Pathway Databases The Biodegradative Strain Database (BSD) is a freely-accessible, web-based database providing detailed information on degradative bacteria and the hazardous substances that they degrade, including corresponding literature citations, relevant patents and links to additional web-based biological and chemical data. The BSD (http://bsd.cme.msu.edu) is being developed within the phylogenetic framework of the Ribosomal Database Project II (RDPII: http://rdp.cme.msu.edu/html) to provide a biological complement to the chemical and degradative pathway data of the University of Minnesota Biocatalysis/Biodegradation Database (UM-BBD: http://umbbd.ahc.umn.edu). Data is accessible through a series of strain, chemical and reference lists or by keyword search. The web site also includes on-line data submission and user survey forms to solicit user contributions and suggestions. The current release contains information on over 250 degradative bacterial strains and 150 hazardous substances. The transformation of xenobiotics and other environmentally toxic compounds by microorganisms is central to strategies for biocatalysis and the bioremediation of contaminated environments. However, practical, comprehensive, strain-level information on biocatalytic/biodegradative microbes is not readily available and is often difficult to compile. Similarly, for any given environmental contaminant, there is no single resource that can provide comparative information on the array of identified microbes capable of degrading the chemical. A web site that consolidates and cross-references strain, chemical and reference data related to biocatalysis, biotransformation, biodegradation and bioremediation would be an invaluable tool for academic and industrial researchers and environmental engineers. 
Citation for the above abstract: Urbance, John W., Cole, James, Saxman, Paul, Tiedje, James M. BSD: the Biodegradative Strain Database Nucl. Acids Res. 2003 31: 152-155 © 2003 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/152 |
| 103. KEGG: Kyoto Encyclopedia of Genes and Genomes |
URL: http://www.genome.jp/kegg/ Categories: General Genomics Databases, Metabolic Pathway Databases The increasing amount of genomic and molecular information is the basis for understanding higher-order biological systems, such as the cell and the organism, and their interactions with the environment, as well as for medical, industrial and other practical applications. The KEGG resource (http://www.genome.jp/kegg/) provides a reference knowledge base for linking genomes to biological systems, categorized as building blocks in the genomic space (KEGG GENES) and the chemical space (KEGG LIGAND), and wiring diagrams of interaction networks and reaction networks (KEGG PATHWAY). A fourth component, KEGG BRITE, has been formally added to the KEGG suite of databases. This reflects our attempt to computerize functional interpretations as part of the pathway reconstruction process based on the hierarchically structured knowledge about the genomic, chemical and network spaces. In accordance with the new chemical genomics initiatives, the scope of KEGG LIGAND has been significantly expanded to cover both endogenous and exogenous molecules. Specifically, RPAIR contains curated chemical structure transformation patterns extracted from known enzymatic reactions, which would enable analysis of genome-environment interactions, such as the prediction of new reactions and new enzyme genes that would degrade new environmental compounds. Additionally, drug information is now stored separately and linked to new KEGG DRUG structure maps. Citation for the above abstract: Kanehisa, Minoru, Goto, Susumu, Hattori, Masahiro, Aoki-Kinoshita, Kiyoko F., Itoh, Masumi, Kawashima, Shuichi, Katayama, Toshiaki, Araki, Michihiro, Hirakawa, Mika From genomics to chemical genomics: new developments in KEGG Nucl. Acids Res. 2006 34: D354-357 © 2006 Oxford University Press. The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D354 |
| 104. Klotho: Biochemical Compounds Declarative Database |
URL: http://www.biocheminfo.org/klotho/ Categories: Metabolic Pathway Databases, Small Molecule Structure Databases "A sufficiently realistic large-scale model needs to include detailed information on both the structure of the molecular parts and their functions in biochemical reactions. We are developing representations for molecules and reaction biochemistry for use in databases of biochemical function. Our approach is to capture the 'natural language' of biochemistry in a layered graph grammar, Klotho, which permits interconversion among a family of equivalent representations for compounds, and then operate on these with rules which express chemical and mechanistic aspects of the biochemical reaction (Atropos)." |
| 105. KEGG LIGAND Database |
URL: http://www.genome.jp/ligand/ Categories: Metabolic Pathway Databases, Small Molecule Structure Databases LIGAND is a composite database comprising three sections: COMPOUND for the information about metabolites and other chemical compounds, REACTION for the collection of substrate-product relations representing metabolic and other reactions, and ENZYME for the information about enzyme molecules. The current release (as of September 7, 2001) includes 7298 compounds, 5166 reactions and 3829 enzymes. In addition to the keyword search provided by the DBGET/LinkDB system, a substructure search to the COMPOUND and REACTION sections is now available through the World Wide Web (http://www.genome.ad.jp/ligand/). LIGAND may be also downloaded by anonymous FTP (ftp://ftp.genome.ad.jp/pub/kegg/ligand/). Citation for the above abstract: Goto, Susumu, Okuno, Yasushi, Hattori, Masahiro, Nishioka, Takaaki, Kanehisa, Minoru LIGAND: database of chemical compounds and reactions in biological pathways Nucl. Acids Res. 2002 30: 402-404 © 2002 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/30/1/402 |
| 106. MetaCyc Encyclopedia of Metabolic Pathways |
URL: http://metacyc.org/ Categories: General Genomics Databases, Metabolic Pathway Databases MetaCyc is a database of metabolic pathways and enzymes located at http://MetaCyc.org/. Its goal is to serve as a metabolic encyclopedia, containing a collection of non-redundant pathways central to small molecule metabolism, which have been reported in the experimental literature. Most of the pathways in MetaCyc occur in microorganisms and plants, although animal pathways are also represented. MetaCyc contains metabolic pathways, enzymatic reactions, enzymes, chemical compounds, genes and review-level comments. Enzyme information includes substrate specificity, kinetic properties, activators, inhibitors, cofactor requirements and links to sequence and structure databases. Data are curated from the primary literature by curators with expertise in biochemistry and molecular biology. MetaCyc serves as a readily accessible comprehensive resource on microbial and plant pathways for genome analysis, basic research, education, metabolic engineering and systems biology. Querying, visualization and curation of the database is supported by SRI's Pathway Tools software. The PathoLogic component of Pathway Tools is used in conjunction with MetaCyc to predict the metabolic network of an organism from its annotated genome. SRI and the European Bioinformatics Institute employed this tool to create pathway/genome databases (PGDBs) for 165 organisms, available at the BioCyc.org website. These PGDBs also include predicted operons and pathway hole fillers. Citation for the above abstract: Caspi, Ron, Foerster, Hartmut, Fulcher, Carol A., Hopkinson, Rebecca, Ingraham, John, Kaipa, Pallavi, Krummenacker, Markus, Paley, Suzanne, Pick, John, Rhee, Seung Y., Tissier, Christophe, Zhang, Peifen, Karp, Peter D. MetaCyc: a multiorganism database of metabolic pathways and enzymes Nucl. Acids Res. 2006 34: D511-516 © 2006 Oxford University Press. 
The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D511 |
| 107. Metagrowth |
URL: http://igs-server.cnrs-mrs.fr/axenic/ Categories: Metabolic Pathway Databases, Prokaryote Databases Metagrowth is a new type of knowledge base developed to guide the experimental studies of culture conditions of obligate parasitic bacteria. We have gathered biological evidences giving possible clues to the development of the axenic (i.e. 'cell-free') growth of obligate parasites from various sources including published literature, genomic sequence information, metabolic databases and transporter databases. The database entries are composed of those evidences and specific hypotheses derived from them. Currently, 200 entries are available for Rickettsia prowazekii, Rickettsia conorii, Tropheryma whipplei, Treponema pallidum, Mycobacterium tuberculosis and Coxiella burnetii. The web interface of Metagrowth helps users to design new axenic culture media eventually suitable for those bacteria. Metagrowth is accessible at http://igs-server.cnrs-mrs.fr/axenic/. Citation for the above abstract: Ogata, Hiroyuki, Claverie, Jean-Michel Metagrowth: a new resource for the building of metabolic hypotheses in microbiology Nucl. Acids Res. 2005 33: D321-324 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D321 |
| 108. PathDB |
URL: http://www.ncgr.org/pathdb/ Categories: Metabolic Pathway Databases "PathDB is both a data repository and a system for building and visualizing cellular networks targeted for the gene expression, proteomics, and metabolic profiling communities. Uses include finding all pathways and phenotypes associated with genes in a cluster or validating computational predicted associations with known biological data. Innovations with the data model resulted in a progression from a concrete model of metabolism, primarily supporting curation from literature, to an abstract model for all kinds of cellular function, e.g. signal cascade, gene regulatory, protein-protein interaction, and protein-small molecule binding data, as well as metabolism. Leveraging off the new data model, concentration shifted to importing large data-sets which drove the development of our Import Framework and an elegant solution to 'publish and subscribe' for data warehousing. Researchers can now more easily focus and combine data of interest with our flexible data model and Import Framework innovations and file up-load capabilities. NCGR's current public pathways database houses curated Arabidopsis literature, Gene Ontology data, and data from currently published large-scale experiments in yeast with transcriptional binding factors from Richard Young's lab at MIT staged for addition." |
| 109. UM-BBD: the University of Minnesota Biocatalysis/Biodegradation Database |
URL: http://umbbd.ahc.umn.edu/ Categories: Metabolic Pathway Databases As the University of Minnesota Biocatalysis/Biodegradation Database (UM-BBD, http://umbbd.ahc.umn.edu/) starts its second decade, it includes information on over 900 compounds, over 600 enzymes, nearly 1000 reactions and about 350 microorganism entries. Its Biochemical Periodic Tables have grown to include biological information for almost all stable, non-noble-gas elements (http://umbbd.ahc.umn.edu/periodic/). Its Pathway Prediction System (PPS) (http://umbbd.ahc.umn.edu/predict/) is now an internationally recognized, open system for predicting microbial catabolism of organic compounds. Graphical display of PPS rules, a stand-alone version of the PPS and guidance for PPS users are being developed. The next decade should see the PPS, and the UM-BBD on which it is based, find increasing use by national and international government agencies, commercial organizations and educational institutions. Citation for the above abstract: Ellis, Lynda B. M., Roe, Dave, Wackett, Lawrence P. The University of Minnesota Biocatalysis/Biodegradation Database: the first decade Nucl. Acids Res. 2006 34: D517-521 © 2006 Oxford University Press. The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D517 |
| 110. AffinDB: Affinity database for protein-ligand complexes |
URL: http://www.agklebe.de/affinity Categories: Drug and Drug Design Databases, Intermolecular Interactions and Signaling Pathways Databases AffinDB is a database of affinity data for structurally resolved protein–ligand complexes from the Protein Data Bank (PDB). It is freely accessible at http://www.agklebe.de/affinity. Affinity data are collected from the scientific literature, both from primary sources describing the original experimental work of affinity determination and from secondary references which report affinity values determined by others. AffinDB currently contains over 730 affinity entries covering more than 450 different protein–ligand complexes. Besides the affinity value, PDB summary information and additional data are provided, including the experimental conditions of the affinity measurement (if available in the corresponding reference); 2D drawing, SMILES code and molecular weight of the ligand; links to other databases, and bibliographic information. AffinDB can be queried by PDB code or by any combination of affinity range, temperature and pH value of the measurement, ligand molecular weight, and publication data (author, journal and year). Search results can be saved as tabular reports in text files. The database is supposed to be a valuable resource for researchers interested in biomolecular recognition and the development of tools for correlating structural data with affinities, as needed, for example, in structure-based drug design. Citation for the above abstract: Block, Peter, Sotriffer, Christoph A., Dramburg, Ingo, Klebe, Gerhard AffinDB: a freely accessible database of affinities for protein-ligand complexes from the PDB Nucl. Acids Res. 2006 34: D522-526 © 2006 Oxford University Press. The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D522 |
| 111. WIT: What Is There |
URL: http://www-wit.mcs.anl.gov/wit3/ Categories: General Genomics Databases, Metabolic Pathway Databases The WIT (What Is There) (http://wit.mcs.anl.gov/WIT2/) system has been designed to support comparative analysis of sequenced genomes and to generate metabolic reconstructions based on chromosomal sequences and metabolic modules from the EMP/MPW family of databases. This system contains data derived from about 40 completed or nearly completed genomes. Sequence homologies, various ORF-clustering algorithms, relative gene positions on the chromosome and placement of gene products in metabolic pathways (metabolic reconstruction) can be used for the assignment of gene functions and for development of overviews of genomes within WIT. The integration of a large number of phylogenetically diverse genomes in WIT facilitates the understanding of the physiology of different organisms. Citation for the above abstract: Overbeek, Ross, Larsen, Niels, Pusch, Gordon D., D'Souza, Mark, Selkov, Evgeni, Jr, Kyrpides, Nikos, Fonstein, Michael, Maltsev, Natalia, Selkov, Evgeni WIT: integrated system for high-throughput genome sequence analysis and metabolic reconstruction Nucl. Acids Res. 2000 28: 123-125 © 2000 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/28/1/123 |
| 112. ACTIVITY |
URL: http://wwwmgs.bionet.nsc.ru/mgs/systems/activity/ Categories: Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases MOTIVATION: The goal of the work was to develop a WWW-oriented computer system providing a maximal integration of informational and software resources on the regulation of gene expression and navigation through them. Rapid growth of the variety and volume of information accumulated in the databases on regulation of gene expression necessarily requires the development of computer systems for automated discovery of the knowledge that can be further used for analysis of regulatory genomic sequences. RESULTS: The GeneExpress system developed includes the following major informational and software modules: (1) Transcription Regulation (TRRD) module, which contains the databases on transcription regulatory regions of eukaryotic genes and TRRD Viewer for data visualization; (2) Site Activity Prediction (ACTIVITY), the module for analysis of functional site activity and its prediction; (3) Site Recognition module, which comprises (a) B-DNA-VIDEO system for detecting the conformational and physicochemical properties of DNA sites significant for their recognition, (b) Consensus and Weight Matrices (ConsFrec) and (c) Transcription Factor Binding Sites Recognition (TFBSR) systems for detecting conservative contextual regions of functional sites and their recognition; (4) Gene Networks (GeneNet), which contains an object-oriented database accumulating the data on gene networks and signal transduction pathways, and the Java-based Viewer for exploration and visualization of the GeneNet information; (5) mRNA Translation (Leader mRNA), designed to analyze structural and contextual properties of mRNA 5'-untranslated regions (5'-UTRs) and predict their translation efficiency; (6) other program modules designed to study the structure-function organization of regulatory genomic sequences and regulatory proteins. 
AVAILABILITY: GeneExpress is available at http://wwwmgs.bionet.nsc.ru/systems/GeneExpress/ and the links to the mirror site(s) can be found at http://wwwmgs.bionet.nsc.ru/mgs/links/mirrors.html. Citation for the above abstract: NA Kolchanov, MP Ponomarenko, AS Frolov, EA Ananko, FA Kolpakov, EV Ignatieva, OA Podkolodnaya, TN Goryachkovskaya, IL Stepanenko, TI Merkulova, VV Babenko, YV Ponomarenko, AV Kochetov, NL Podkolodny, DV Vorobiev, SV Lavryushev, DA Grigorovich, YV Kondrakhin, L Milanesi, E Wingender, V Solovyev, and GC Overton Integrated databases and computer systems for studying eukaryotic gene expression Bioinformatics 15: 669-686. © 1999 Oxford University Press. The full text of the article can be found at: http://bioinformatics.oupjournals.org/cgi/reprint/15/7/669 |
| 113. AGRIS: Arabidopsis Gene Regulatory Information Server |
URL: http://arabidopsis.med.ohio-state.edu/ Categories: Arabidopsis thaliana Databases, Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases BACKGROUND: The gene regulatory information is hardwired in the promoter regions formed by cis-regulatory elements that bind specific transcription factors (TFs). Hence, establishing the architecture of plant promoters is fundamental to understanding gene expression. The determination of the regulatory circuits controlled by each TF and the identification of the cis-regulatory sequences for all genes have been identified as two of the goals of the Multinational Coordinated Arabidopsis thaliana Functional Genomics Project by the Multinational Arabidopsis Steering Committee (June 2002). RESULTS: AGRIS is an information resource of Arabidopsis promoter sequences, transcription factors and their target genes. AGRIS currently contains two databases, AtTFDB (Arabidopsis thaliana transcription factor database) and AtcisDB (Arabidopsis thaliana cis-regulatory database). AtTFDB contains information on approximately 1,400 transcription factors identified through motif searches and grouped into 34 families. AtTFDB links the sequence of the transcription factors with available mutants and, when known, with the possible genes they may regulate. AtcisDB consists of the 5' regulatory sequences of all 29,388 annotated genes with a description of the corresponding cis-regulatory elements. Users can search the databases for (i) promoter sequences, (ii) a transcription factor, (iii) direct target genes for a specific transcription factor, or (iv) a regulatory network that consists of transcription factors and their target genes. CONCLUSION: AGRIS provides the necessary software tools on Arabidopsis transcription factors and their putative binding sites on all genes to initiate the identification of transcriptional regulatory networks in the model dicotyledonous plant Arabidopsis thaliana.
AGRIS can be accessed from http://arabidopsis.med.ohio-state.edu. Citation for the above abstract: Ramana V Davuluri, Hao Sun, Saranyan K Palaniswamy, Nicole Matthews, Carlos Molina, Mike Kurtz, and Erich Grotewold AGRIS: Arabidopsis Gene Regulatory Information Server, an information resource of Arabidopsis cis-regulatory elements and transcription factors BMC Bioinformatics 2003, 4:25; doi:10.1186/1471-2105-4-25 © 2003 By the Authors. The full text of the article can be found at: http://www.biomedcentral.com/1471-2105/4/25 |
| 114. ASPD: Artificially Selected Proteins/Peptides Database |
URL: http://wwwmgs.bionet.nsc.ru/mgs/gnw/aspd/ Categories: Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases ASPD is a new curated database that incorporates data on full-length proteins, protein domains and peptides that were obtained through in vitro directed evolution processes (mainly by means of phage display). At present, the ASPD database contains data on 195 selection experiments, which were described in 112 original papers. For each experiment, the following information is given: (i) description of the target for binding, (ii) description of the protein or peptide which serves as the template for library construction and description of the native protein which binds the target, (iii) links to the major proteomic databases (SWISS-PROT, PDB, PROSITE and ENZYME), (iv) keywords referring to the biological significance of the experiment, (v) aligned sequences of proteins or peptides retrieved through in vitro evolution and relevant native or constructed sequences, (vi) the number of rounds of selection/amplification and (vii) the number of occurrences of clones with each sequence. The literature data include a full reference, a link to the MEDLINE database and the name of the corresponding author with his email address. ASPD has a user-friendly interface which allows for simple queries using the names of proteins and ligands, as well as keywords describing the biological role of the interaction studied, and also for queries based on authors' names. It is also possible to access the database by means of the SRS system, allowing complex queries. There is a BLAST search tool against the ASPD for looking directly for homologous sequences. Research tools of the ASPD allow the analysis of pairwise correlations in the sequences of proteins and peptides selected against one target. The URL for the ASPD database is http://www.sgi.sscc.ru/mgs/gnw/aspd/. 
Citation for the above abstract: Valuev, Vadim P., Afonnikov, Dmitry A., Ponomarenko, Mikhail P., Milanesi, Luciano, Kolchanov, Nikolay A. ASPD (Artificially Selected Proteins/Peptides Database): a database of proteins and peptides evolved in vitro Nucl. Acids Res. 2002 30: 200-202 © 2002 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/30/1/200 |
| 115. Cancer Chromosomes |
URL: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=cancerchromosomes Categories: Cancer Databases, Human Genome Databases, Maps, and Viewers, Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases "Three databases, the NCI/NCBI SKY/M-FISH & CGH Database, the NCI Mitelman Database of Chromosome Aberrations in Cancer, and the NCI Recurrent Aberrations in Cancer, are now integrated into NCBI's Entrez system as Cancer Chromosomes. Search for cytogenetic, clinical, and/or reference information. Queries are performed using the same approach as for other Entrez databases such as PubMed and Nucleotide." |
| 116. DBTBS |
URL: http://dbtbs.hgc.jp/ Categories: Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases, Prokaryote Databases DBTBS (http://dbtbs.hgc.jp) was originally released in 1999 as a reference database of published transcriptional regulation events in Bacillus subtilis, one of the best studied bacteria. It is essentially a compilation of transcription factors with their regulated genes as well as their recognition sequences, which were experimentally characterized and reported in the literature. Here we report its major update, which contains information on 114 transcription factors, including sigma factors, and 633 promoters of 525 genes. The number of references cited in the database has increased from 291 to 378. It also supports a function to find putative transcription factor binding sites within input sequences by using our collection of weight matrices and consensus patterns. Furthermore, though preliminarily, DBTBS now aims to contribute to comparative genomics by showing the presence or absence of potentially orthologous transcription factors and their corresponding cis-elements on the promoters of their potentially orthologously regulated genes in 50 eubacterial genomes. Citation for the above abstract: Makita, Yuko, Nakao, Mitsuteru, Ogasawara, Naotake, Nakai, Kenta DBTBS: database of transcriptional regulation in Bacillus subtilis and its contribution to comparative genomics Nucl. Acids Res. 2004 32: D75-77 © 2004 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D75 |
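The binding-site search mentioned in the DBTBS entry above relies on weight matrices. As a rough illustration of that general technique, and not of DBTBS's own implementation, the sketch below builds a log-odds matrix from a handful of aligned sites and slides it along a sequence. The toy sites, the uniform 0.25 background, the pseudocount and the score threshold are all assumptions chosen for the example, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def log_odds_matrix(sites: List[str],
                    pseudocount: float = 1.0,
                    background: float = 0.25) -> List[Dict[str, float]]:
    """Build a position weight matrix (log2 odds) from equal-length aligned sites."""
    total = len(sites) + 4 * pseudocount
    matrix = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        matrix.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return matrix


def scan(sequence: str,
         matrix: List[Dict[str, float]],
         threshold: float) -> List[Tuple[int, str, float]]:
    """Return (start, window, score) for every window scoring at or above threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Bases outside ACGT score -inf, so such windows are never reported.
        score = sum(matrix[i].get(base, -math.inf) for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


# Toy aligned sites (hypothetical, not taken from DBTBS).
toy_sites = ["TGTAA", "TGTCA", "TGTAA", "TGAAA"]
pwm = log_odds_matrix(toy_sites)
hits = scan("CCTGTAAGG", pwm, threshold=4.0)
```

In practice the threshold is the sensitive choice: too low and nearly every window is reported, too high and degenerate but real sites are missed, which is one reason such predictions are described as putative.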
| 117. DBTSS: Database of Transcriptional Start Sites |
URL: http://dbtss.hgc.jp/ Categories: Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases DBTSS was first constructed in 2002 based on precise, experimentally determined 5' end clones. Several major updates and additions have been made since the last report. First, the number of human clones has drastically increased, going from 190,964 to 1,359,000. Second, information about potential alternative promoters is presented because the number of 5' end clones is now sufficient to determine several promoters for one gene. Namely, we defined putative promoter groups by clustering transcription start sites (TSSs) separated by <500 bases. A total of 8308 human genes and 4276 mouse genes were found to have putative multiple promoters. Third, DBTSS provides detailed sequence comparisons of user-specified TSSs. Finally, we have added TSS information for zebrafish, malaria and schyzon (a red algae model organism). DBTSS is accessible at http://dbtss.hgc.jp. Citation for the above abstract: Yamashita, Riu, Suzuki, Yutaka, Wakaguri, Hiroyuki, Tsuritani, Katsuki, Nakai, Kenta, Sugano, Sumio DBTSS: DataBase of Human Transcription Start Sites, progress report 2006 Nucl. Acids Res. 2006 34: D86-89 © 2006 Oxford University Press. The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D86 |
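The promoter grouping described in the DBTSS entry above (clustering transcription start sites separated by fewer than 500 bases) can be sketched as a single pass over sorted coordinates. This is a minimal sketch, assuming that "separated by <500 bases" refers to the gap between neighbouring TSSs on one strand of one gene; the function name and the example coordinates are hypothetical.

```python
from typing import List


def cluster_tss(positions: List[int], max_gap: int = 500) -> List[List[int]]:
    """Group TSS coordinates so that neighbours closer than max_gap share a cluster."""
    clusters: List[List[int]] = []
    for pos in sorted(positions):
        # Extend the current cluster only if the gap to its last TSS is < max_gap.
        if clusters and pos - clusters[-1][-1] < max_gap:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return clusters


# Hypothetical TSS coordinates for one gene.
groups = cluster_tss([100, 350, 700, 1500, 1900, 5000])
```

Under this reading, a gene with more than one resulting group would be counted as having putative multiple promoters. Note that chained neighbours can produce a group spanning more than 500 bases in total.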
| 118. HTPSELEX |
URL: http://www.isrec.isb-sib.ch/htpselex/ Categories: Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases HTPSELEX is a public database providing access to primary and derived data from high-throughput SELEX experiments aimed at characterizing the binding specificity of transcription factors. The resource is primarily intended to serve computational biologists interested in building models of transcription factor binding sites from large sets of binding sequences. The guiding principle is to make available all information that is relevant for this purpose. For each experiment, we try to provide accurate information about the protein material used, details of the wet lab protocol, an archive of sequencing trace files, assembled clone sequences (concatemers) and complete sets of in vitro selected protein-binding tags. In addition, we offer in-house derived binding sites models. HTPSELEX also offers reasonably large SELEX libraries obtained with conventional low-throughput protocols. The FTP site contains the trace archives and database flatfiles. The web server offers user-friendly interfaces for viewing individual entries and quality-controlled download of SELEX sequence libraries according to a user-defined sequencing quality threshold. HTPSELEX is available from ftp://ftp.isrec.isb-sib.ch/pub/databases/htpselex/ and http://www.isrec.isb-sib.ch/htpselex. Citation for the above abstract: Jagannathan, Vidhya, Roulet, Emmanuelle, Delorenzi, Mauro, Bucher, Philipp HTPSELEX--a database of high-throughput SELEX libraries for transcription factor binding sites Nucl. Acids Res. 2006 34: D90-94 © 2006 Oxford University Press. The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D90 |
| 119. DoOP: Databases of Orthologous Promoters |
URL: http://doop.abc.hu/ Categories: Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases DoOP (http://doop.abc.hu/) is a database of eukaryotic promoter sequences (upstream regions) aiming to facilitate the recognition of regulatory sites conserved between species. The annotated first exons of human and Arabidopsis thaliana genes were used as queries in BLAST searches to collect the most closely related orthologous first exon sequences from Chordata and Viridiplantae species. Up to 3000 bp DNA segments upstream from these first exons constitute the clusters in the chordate and plant sections of the Database of Orthologous Promoters. Release 1.0 of DoOP contains 21,061 chordate clusters from 284 different species and 7548 plant clusters from 269 different species. The database can be used to find and retrieve promoter sequences of a given gene from various species and it is also suitable to see the most trivial conserved sequence blocks in the orthologous upstream regions. Users can search DoOP with either sequence or text (annotation) to find promoter clusters of various genes. In addition to the sequence data, the positions of the conserved sequence blocks derived from multiple alignments, the positions of repetitive elements and the positions of transcription start sites known from the Eukaryotic Promoter Database (EPD) can be viewed graphically. Citation for the above abstract: Barta, Endre, Sebestyen, Endre, Palfy, Tamas B., Toth, Gabor, Ortutay, Csaba P., Patthy, Laszlo DoOP: Databases of Orthologous Promoters, collections of clusters of orthologous upstream sequences from chordates and plants Nucl. Acids Res. 2005 33: D86-90 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D86 |
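The DoOP entry above describes collecting up to 3000 bp of DNA upstream of each first exon. A minimal sketch of that extraction step follows, assuming 0-based half-open exon coordinates on the forward strand and a chromosome held as a plain string; these conventions and the function names are assumptions for illustration, not DoOP's documented pipeline.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_region(chrom: str, exon_start: int, exon_end: int,
                    strand: str, max_len: int = 3000) -> str:
    """Return up to max_len bases upstream of a first exon, read 5' to 3'.

    exon_start and exon_end are 0-based, half-open, forward-strand coordinates.
    """
    if strand == "+":
        begin = max(0, exon_start - max_len)   # clip at the chromosome start
        return chrom[begin:exon_start]
    if strand == "-":
        end = min(len(chrom), exon_end + max_len)  # clip at the chromosome end
        return reverse_complement(chrom[exon_end:end])
    raise ValueError("strand must be '+' or '-'")


# Tiny made-up chromosome to show both strands.
chrom = "AAACCCGGGTTT"
plus_upstream = upstream_region(chrom, 6, 9, "+", max_len=4)
minus_upstream = upstream_region(chrom, 3, 6, "-", max_len=4)
```

The clipping is what makes the segments "up to" the requested length: a first exon near a sequence boundary simply yields a shorter upstream region.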
| 120. DPInteract |
URL: http://arep.med.harvard.edu/dpinteract/ Categories: Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases, Prokaryote Databases "This dataset is being collected with several purposes in mind 1. Cataloging demonstrated sites and non-sites for E.coli DNA-binding proteins 2. Aiding the annotation of such sites in other E.coli databases and sequence entries 3. Interpreting the results of whole-genome in vivo methylation protection experiments (Nature 360: 606-610; J Bacteriol 176: 3438-3441) 4. Developing better computational tools for recognizing DNA binding proteins in sequence data" |
| 121. EPD: The Eukaryotic Promoter Database |
URL: http://www.epd.isb-sib.ch/ Categories: Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases The Eukaryotic Promoter Database (EPD) is an annotated non-redundant collection of eukaryotic POL II promoters, experimentally defined by a transcription start site (TSS). Access to promoter sequences is provided by pointers to positions in the corresponding genomes. Promoter evidence comes from conventional TSS mapping experiments for individual genes, or, starting from release 73, from mass genome annotation projects. Subsets of promoter sequences with customized 5' and 3' extensions can be downloaded from the EPD website. The focus of current development efforts is to reach complete promoter coverage for important model organisms as soon as possible. To speed up this process, a new class of preliminary promoter entries has been introduced as of release 83, which requires less stringent admission criteria. As part of a continuous integration process, new web-based interfaces have been developed, which allow joint analysis of promoter sequences with other bioinformatics resources developed by our group, in particular programs offered by the Signal Search Analysis Server, and gene expression data stored in the CleanEx database. EPD can be accessed at http://www.epd.isb-sib.ch. Citation for the above abstract: Schmid, Christoph D., Perier, Rouaida, Praz, Viviane, Bucher, Philipp EPD in its twentieth year: towards complete promoter coverage of selected model organisms Nucl. Acids Res. 2006 34: D82-85 © 2006 Oxford University Press. The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D82 |
| 122. GeneNet |
URL: http://wwwmgs.bionet.nsc.ru/mgs/gnw/genenet/ Categories: Intermolecular Interactions and Signaling Pathways Databases, Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases The GeneNet system is designed for collection and analysis of the data on gene and metabolic networks, signal transduction pathways and kinetic characteristics of elementary processes. In the past 2 years, the GeneNet structure was considerably improved: (i) the current version of the database is now implemented using ORACLE9i; (ii) the capacities to describe the structure of the protein complexes and the interactions between the units are increased; (iii) two tables with kinetic constants and more detailed descriptions of certain reactions were added; and (iv) a module for kinetic modeling was supplemented. The current SRS release of the GeneNet database contains 37 graphical maps of gene networks, as well as descriptions of 1766 proteins, 1006 genes, 241 small molecules and 3254 relationships between gene network units, and 552 kinetic constants. Information distributed between 16 interlinked tables was obtained by annotating 1980 journal publications. SRS release of the GeneNet database, the graphical viewer and the modeling section are available at http://wwwmgs.bionet.nsc.ru/mgs/gnw/genenet/. Citation for the above abstract: Ananko, E. A., Podkolodny, N. L., Stepanenko, I. L., Podkolodnaya, O. A., Rasskazov, D. A., Miginsky, D. S., Likhoshvai, V. A., Ratushny, A. V., Podkolodnaya, N. N., Kolchanov, N. A. GeneNet in 2005 Nucl. Acids Res. 2005 33: D425-427 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D425 |
| 123. The JASPAR Database |
URL: http://jaspar.cgb.ki.se/ Categories: Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases JASPAR is the most complete open-access collection of transcription factor binding site (TFBS) matrices. In this new release, JASPAR grows into a meta-database of collections of TFBS models derived by diverse approaches. We present JASPAR CORE—an expanded version of the original, non-redundant collection of annotated, high-quality matrix-based transcription factor binding profiles, JASPAR FAM—a collection of familial TFBS models and JASPAR phyloFACTS—a set of matrices computationally derived from statistically overrepresented, evolutionarily conserved regulatory region motifs from mammalian genomes. JASPAR phyloFACTS serves as a non-redundant extension to JASPAR CORE, enhancing the overall breadth of JASPAR for promoter sequence analysis. The new release of JASPAR is available at http://jaspar.genereg.net. Citation for the above abstract: Vlieghe, Dominique, Sandelin, Albin, De Bleser, Pieter J., Vleminckx, Kris, Wasserman, Wyeth W., van Roy, Frans, Lenhard, Boris A new generation of JASPAR, the open-access repository for transcription factor binding site profiles Nucl. Acids Res. 2006 34: D95-97 © 2006 Oxford University Press. The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D95 |
| 124. MAPPER: Multi-genome Analysis of Positions and Patterns of Elements of Regulation |
URL: http://bio.chip.org/mapper Categories: Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases We describe a comprehensive map of putative transcription factor binding sites (TFBSs) across multiple genomes created using a search method that relies on hidden Markov models built from experimentally determined TFBSs. Using the information in the TRANSFAC and JASPAR databases, we built 1134 models for TFBSs and used them to scan regions 10 kb upstream of the start of the transcript for all known genes in the human, mouse and Drosophila melanogaster genomes. The results, together with homology information on clusters of ortholog genes across the three genomes, were used to create a multi-organism catalog of annotated TFBSs. The catalog can be queried through a web interface accessible at http://bio.chip.org/mapper that allows the identification, visualization and selection of TFBSs occurring in the promoter of a gene of interest and also the common factors predicted to bind across the cluster of orthologs that includes that gene. Alternatively, the interface allows the user to retrieve binding sites for a single transcription factor of interest in a single gene or in all genes of the human, mouse or fruit fly genomes. Citation for the above abstract: Marinescu, Voichita D., Kohane, Isaac S., Riva, Alberto The MAPPER database: a multi-genome catalog of putative transcription factor binding sites Nucl. Acids Res. 2005 33: D91-97 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D91 |
| 125. ooTFD: object-oriented Transcription Factors Database |
URL: http://www.ifti.org/ootfd/ Categories: Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases ooTFD (object-oriented Transcription Factors Database) is an object-oriented successor to TFD. This database is aimed at capturing information regarding the polypeptide interactions which comprise and define the properties of transcription factors. ooTFD contains information about transcription factor binding sites, as well as composite relationships within transcription factors, which frequently occur as multisubunit proteins that form a complex interface to cellular processes outside the transcription machinery through protein-protein interactions. In the past year, a few additions and changes were made to this database and associated tools, which are accessible through the IFTI-MIRAGE web site at http://www.ifti.org/ Citation for the above abstract: Ghosh, David Object-oriented Transcription Factors Database (ooTFD) Nucl. Acids Res. 2000 28: 308-310 © 2000 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/28/1/308 |
| 126. OPD: Osteo-Promoter Database |
URL: http://www.opd.tau.ac.il/ Categories: Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases "Osteo-Promoter Database (OPD) is a catalogic database of functional genes in osteogenic proliferation and differentiation. OPD analyzes promoters of genes which differentiates along with the osteogenic pathway. Uniqueness of OPD is the analysis of promoter matrix attachment regions (MARs) which allocates AT-rich sites in promoters. Interaction between AT-rich sites in the DNA to AT-hook motif of the protein is important component of production regulator proteins complex, which controls transcription of genes in the cell. Expanding the knowledge of AT-rich sites in the promoters of specific genes leads to construction of regulation system for transcription in bone tissue." |
| 127. PLACE: A Database of Plant Cis-acting Regulatory DNA Elements |
URL: http://www.dna.affrc.go.jp/PLACE/ Categories: General Plant Databases, Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases PLACE (http://www.dna.affrc.go.jp/htdocs/PLACE/) is a database of nucleotide sequence motifs found in plant cis-acting regulatory DNA elements. Motifs were extracted from previously published reports on genes in vascular plants. In addition to the motifs originally reported, their variations in other genes or in other plant species in later reports are also compiled. Documents for each motif in the PLACE database contains, in addition to a motif sequence, a brief definition and description of each motif, and relevant literature with PubMed ID numbers and GenBank accession numbers where available. Users can search their query sequences for cis-elements using the Signal Scan program at our web site. The results will be reported in one of the three forms. Clicking the PLACE accession numbers in the result report will open the pertinent motif document. Clicking the PubMed or GenBank accession number in the document will allow users to access to these databases, and to read the abstract of the literature or the annotation in the DNA database. This report summarizes the present status of this database and available tools. Citation for the above abstract: Higo, K, Ugawa, Y, Iwamoto, M, Korenaga, T Plant cis-acting regulatory DNA elements (PLACE) database: 1999 Nucl. Acids Res. 1999 27: 297-300 © 1999 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/27/1/297 |
| 128. Polygenic Signaling Pathways |
URL: http://www.polygenicpathways.co.uk Categories: Gene-, System-, or Disease- Specific Databases "This site contains lists of genes positively associated with Alzheimer's disease, Bipolar disorder or Schizophrenia. The protein products of these genes form consecutive elements of a signaling cascade or metabolic pathway. They may bind to each other, control each others transcription or form functional microcomplexes. These pathways, etched out by multiple association studies, may underpin the pathology of each disease." |
| 129. Relemed |
URL: http://www.relemed.com/ Categories: MEDLINE Interfaces BACKGROUND: Receiving extraneous articles in response to a query submitted to MEDLINE/PubMed is common. When submitting a multi-word query (which is the majority of queries submitted), the presence of all query words within each article may be a necessary condition for retrieving relevant articles, but not sufficient. Ideally a relationship between the query words in the article is also required. We propose that if two words occur within an article, the probability that a relation between them is explained is higher when the words occur within adjacent sentences versus remote sentences. Therefore, sentence-level concurrence can be used as a surrogate for existence of the relationship between the words. In order to avoid the irrelevant articles, one solution would be to increase the search specificity. Another solution is to estimate a relevance score to sort the retrieved articles. However among the >30 retrieval services available for MEDLINE, only a few estimate a relevance score, and none detects and incorporates the relation between the query words as part of the relevance score. RESULTS: We have developed "Relemed", a search engine for MEDLINE. Relemed increases specificity and precision of retrieval by searching for query words within sentences rather than the whole article. It uses sentence-level concurrence as a statistical surrogate for the existence of relationship between the words. It also estimates a relevance score and sorts the results on this basis, thus shifting irrelevant articles lower down the list. In two case studies, we demonstrate that the most relevant articles appear at the top of the Relemed results, while this is not necessarily the case with a PubMed search. We have also shown that a Relemed search includes not only all the articles retrieved by PubMed, but potentially additional relevant articles, due to the extended 'automatic term mapping' and text-word searching features implemented in Relemed. CONCLUSION: By using sentence-level matching, Relemed can deliver higher specificity, thus eliminating more false-positive articles. By introducing an appropriate relevance metric, the most relevant articles on which the user wishes to focus are listed first. Relemed also shrinks the displayed text, and hence the time spent scanning the articles. Citation for the above abstract: Siadaty MS, Shu J, Knaus WA. Relemed: sentence-level search engine with relevance score for the MEDLINE database of biomedical articles. BMC Med Inform Decis Mak. 2007 Jan 10;7:1. © 2007 By the authors The full text of the article can be found at: http://www.biomedcentral.com/1472-6947/7/1 |
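The sentence-level matching idea in the Relemed abstract can be illustrated with a small sketch. This is not Relemed's actual scoring; the three-level score, the naive sentence splitter, and the function names `split_sentences`, `sentence_level_score` and `rank` are all assumptions made for illustration. It only shows how an article whose query words share a sentence could be ranked above one where they merely co-occur somewhere in the text.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def sentence_level_score(text: str, query_words: List[str]) -> int:
    """Hypothetical relevance score:
    2 = all query words occur together in at least one sentence,
    1 = all query words occur in the text, but never in the same sentence,
    0 = at least one query word is missing from the text.
    """
    words = [w.lower() for w in query_words]
    # Represent each sentence as a set of lowercase alphanumeric tokens.
    sentences = [set(re.findall(r"[a-z0-9]+", s.lower()))
                 for s in split_sentences(text)]
    if any(all(w in sent for w in words) for sent in sentences):
        return 2
    all_tokens = set().union(*sentences)
    if all(w in all_tokens for w in words):
        return 1
    return 0


def rank(articles: List[str], query_words: List[str]) -> List[str]:
    """Sort articles so that sentence-level matches come first."""
    return sorted(articles,
                  key=lambda a: sentence_level_score(a, query_words),
                  reverse=True)


if __name__ == "__main__":
    a = "Aspirin reduces fever. Stroke risk was not studied."
    b = "Aspirin lowers stroke risk in adults. Fever was rare."
    for article in rank([a, b], ["aspirin", "stroke"]):
        print(sentence_level_score(article, ["aspirin", "stroke"]), article)
```

A fuller version would also distinguish adjacent from remote sentences, as the abstract proposes; the sketch stops at the same-sentence case to stay short.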
| 130. PlantProm |
URL: http://mendel.cs.rhul.ac.uk/mendel.php Categories: General Plant Databases, Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases PlantProm DB, a plant promoter database, is an annotated, non-redundant collection of proximal promoter sequences for RNA polymerase II with experimentally determined transcription start site(s), TSS, from various plant species. The first release (2002.01) of PlantProm DB contains 305 entries including 71, 220 and 14 promoters from monocot, dicot and other plants, respectively. It provides DNA sequence of the promoter regions (-200 : +51) with TSS on the fixed position +201, taxonomic/promoter type classification of promoters and Nucleotide Frequency Matrices (NFM) for promoter elements: TATA-box, CCAAT-box and TSS-motif (Inr). Analysis of TSS-motifs revealed that their composition is different in dicots and monocots, as well as for TATA and TATA-less promoters. The database serves as learning set in developing plant promoter prediction programs. One such program (TSSP) based on discriminant analysis has been created by Softberry Inc. and the application of a support vector machine approach for promoter identification is under development. PlantProm DB is available at http://mendel.cs.rhul.ac.uk/ and http://www.softberry.com/. Citation for the above abstract: Shahmuradov, Ilham A., Gammerman, Alex J., Hancock, John M., Bramley, Peter M., Solovyev, Victor V. PlantProm: a database of plant promoter sequences Nucl. Acids Res. 2003 31: 114-117 © 2003 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/114 |
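For readers unfamiliar with the nucleotide frequency matrices mentioned in the PlantProm abstract, the following sketch shows one common way such a matrix can be computed from a set of aligned, equal-length sites. The function name `nucleotide_frequency_matrix` and the example TATA-like sequences are invented for illustration and are not taken from PlantProm DB, whose own procedure may differ.

```python
from collections import Counter
from typing import Dict, List


def nucleotide_frequency_matrix(sites: List[str]) -> Dict[str, List[float]]:
    """Return, for each base, its relative frequency at every aligned position.

    Assumes the input sites are already aligned and have equal length.
    """
    if not sites:
        raise ValueError("at least one site is required")
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")

    matrix: Dict[str, List[float]] = {base: [] for base in "ACGT"}
    # zip(*sites) iterates over alignment columns rather than sequences.
    for column in zip(*sites):
        counts = Counter(base.upper() for base in column)
        for base in "ACGT":
            matrix[base].append(counts[base] / len(sites))
    return matrix


if __name__ == "__main__":
    # Made-up example sites, not real database entries.
    example_sites = ["TATAAA", "TATATA", "TATAAA", "TTTAAA"]
    nfm = nucleotide_frequency_matrix(example_sites)
    for base in "ACGT":
        print(base, [round(f, 2) for f in nfm[base]])
```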
| 131. PRODORIC: Prokaryotic Database of Gene Regulation |
URL: http://prodoric.tu-bs.de/ Categories: Intermolecular Interactions and Signaling Pathways Databases, Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases The database PRODORIC aims to systematically organize information on prokaryotic gene expression, and to integrate this information into regulatory networks. The present version focuses on pathogenic bacteria such as Pseudomonas aeruginosa. PRODORIC links data on environmental stimuli with trans-acting transcription factors, cis-acting promoter elements and regulon definition. Interactive graphical representations of operon, gene and promoter structures including regulator-binding sites, transcriptional and translational start sites, supplemented with information on regulatory proteins are available at varying levels of detail. The data collection provided is based on exhaustive analyses of scientific literature and computational sequence prediction. Included within PRODORIC are tools to define and predict regulator binding sites. It is accessible at http://prodoric.tu-bs.de. Citation for the above abstract: Munch, Richard, Hiller, Karsten, Barg, Heiko, Heldt, Dana, Linz, Simone, Wingender, Edgar, Jahn, Dieter PRODORIC: prokaryotic database of gene regulation Nucl. Acids Res. 2003 31: 266-269 © 2003 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/266 |
| 132. PromEC |
URL: http://bioinfo.md.huji.ac.il/marg/promec Categories: Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases, Prokaryote Databases PromEC is an updated compilation of Escherichia coli mRNA promoter sequences. It includes documentation on the location of experimentally identified mRNA transcriptional start sites on the E. coli chromosome, as well as the actual sequences in the promoter region. The database was updated as of July 2000 and includes 472 entries. PromEC is accessible at http://bioinfo.md.huji.ac.il/marg/promec Citation for the above abstract: Hershberg, Ruti, Bejerano, Gill, Santos-Zavaleta, Alberto, Margalit, Hanah PromEC: An updated database of Escherichia coli mRNA promoters with experimentally identified transcriptional start sites Nucl. Acids Res. 2001 29: 277-0 © 2001 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/29/1/277 |
| 133. RegulonDB |
URL: http://regulondb.ccg.unam.mx/index.html Categories: Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases, Prokaryote Databases RegulonDB is the internationally recognized reference database of Escherichia coli K-12 offering curated knowledge of the regulatory network and operon organization. It is currently the largest electronically-encoded database of the regulatory network of any free-living organism. We present here the recently launched RegulonDB version 5.0 radically different in content, interface design and capabilities. Continuous curation of original scientific literature provides the evidence behind every single object and feature. This knowledge is complemented with comprehensive computational predictions across the complete genome. Literature-based and predicted data are clearly distinguished in the database. Starting with this version, RegulonDB public releases are synchronized with those of EcoCyc since our curation supports both databases. The complex biology of regulation is simplified in a navigation scheme based on three major streams: genes, operons and regulons. Regulatory knowledge is directly available in every navigation step. Displays combine graphic and textual information and are organized allowing different levels of detail and biological context. This knowledge is the backbone of an integrated system for the graphic display of the network, graphic and tabular microarray comparisons with curated and predicted objects, as well as predictions across bacterial genomes, and predicted networks of functionally related gene products. Access RegulonDB at http://regulondb.ccg.unam.mx. Citation for the above abstract: Salgado, Heladia, Gama-Castro, Socorro, Peralta-Gil, Martin, Diaz-Peredo, Edgar, Sanchez-Solano, Fabiola, Santos-Zavaleta, Alberto, Martinez-Flores, Irma, Jimenez-Jacinto, Veronica, Bonavides-Martinez, Cesar, Segura-Salazar, Juan, Martinez-Antonio, Agustino, Collado-Vides, Julio RegulonDB (version 5.0): Escherichia coli K-12 transcriptional regulatory network, operon organization, and growth conditions Nucl. Acids Res. 2006 34: D394-397 © 2006 Oxford University Press. The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D394 |
| 134. rSNP_Guide |
URL: http://wwwmgs.bionet.nsc.ru/mgs/systems/rsnp/ Categories: Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases The analysis of gene regulatory networks has become one of the most challenging problems of the postgenomic era. Earlier we developed rSNP_Guide (http://util.bionet.nsc.ru/databases/rsnp.html), a computer system and database devoted to prediction of transcription factor (TF) binding sites (TF sites), which can be responsible for disease phenotypes. The prediction results were confirmed by 70 known relationships between TF sites and diseases, as well as by site-directed mutagenesis data. The rSNP_Guide is being investigated as a tool for TF site annotation. Previously analyzed and characterized cases of altered TF sites were used to annotate potential sites of the same type and at the same location in homologous genes. Based on 20 TF sites with known alterations in TF binding to DNA, we localized 245 potential TF sites in homologous genes. For these potential TF sites, rSNP_Guide estimates TF-DNA interaction according to three categories: 'present', 'weak', and 'absent'. The significance of each assignment is statistically measured. Citation for the above abstract: Ponomarenko, Julia V., Merkulova, Tatyana I., Orlova, Galina V., Fokin, Oleg N., Gorshkova, Elena V., Frolov, Anatoly S., Valuev, Vadim P., Ponomarenko, Mikhail P. rSNP_Guide, a database system for analysis of transcription factor binding to DNA with variations: application to genome annotation Nucl. Acids Res. 2003 31: 118-121 © 2003 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/118 |
| 135. SCPD: The Promoter Database of Saccharomyces cerevisiae |
URL: http://cgsigma.cshl.org/jian/ Categories: Fungal Genome Databases, Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases MOTIVATION: In order to facilitate a systematic study of the promoters and transcriptionally regulatory cis-elements of the yeast Saccharomyces cerevisiae on a genomic scale, we have developed a comprehensive yeast-specific promoter database, SCPD. RESULTS: Currently SCPD contains 580 experimentally mapped transcription factor (TF) binding sites and 425 transcriptional start sites (TSS) as its primary data entries. It also contains relevant binding affinity and expression data where available. In addition to mechanisms for promoter information (including sequence) retrieval and a data submission form, SCPD also provides some simple but useful tools for promoter sequence analysis. AVAILABILITY: SCPD can be accessed from the URL http://cgsigma.cshl.org/jian. The database is continually updated. Citation for the above abstract: J Zhu, and MQ Zhang SCPD: a promoter database of the yeast Saccharomyces cerevisiae Bioinformatics 15: 607-611. © 1999 Oxford University Press. The full text of the article can be found at: http://bioinformatics.oupjournals.org/cgi/reprint/15/7/607 |
| 136. SELEX_DB |
URL: http://wwwmgs.bionet.nsc.ru/mgs/systems/selex/ Categories: Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases, RNA Sequence Databases SELEX_DB is an online resource containing both the experimental data on in vitro selected DNA/RNA oligomers (aptamers) and the applets for recognition of these oligomers. Since in vitro experimental data are evidently system-dependent, the new release of the SELEX_DB has been supplemented by the database SYSTEM storing the experimental design. In addition, the recognition applet package, SELEX_TOOLS, applying in vitro selected data to annotation of the genome DNA, is accompanied by the cross-validation test database CROSS_TEST discriminating the sites (natural or other) related to in vitro selected sites out of random DNA. By cross-validation testing, we have unexpectedly observed that the recognition accuracy increases with the growth of homology between the training and test sets of protein binding sequences. For natural sites, the recognition accuracy was lower than that for the nearest protein homologs and higher than that for distant homologs and non-homologous proteins binding the common site. The current SELEX_DB release is available at http://wwwmgs.bionet.nsc.ru/mgs/systems/selex/. Citation for the above abstract: Ponomarenko, Julia V., Orlova, Galina V., Frolov, Anatoly S., Gelfand, Mikhail S., Ponomarenko, Mikhail P. SELEX_DB: a database on in vitro selected oligomers adapted for recognizing natural sites and for analyzing both SNPs and site-directed mutagenesis data Nucl. Acids Res. 2002 30: 195-199 © 2002 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/30/1/195 |
| 137. SKY/M-FISH and CGH Database |
URL: http://www.ncbi.nlm.nih.gov/sky/ Categories: Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases "The goal of the SKY/M-FISH and CGH database is to provide a public platform for investigators to share and compare their molecular cytogenetic data. The database is open to everyone and all users can view an individual investigator's public data or compare public cases from different investigators. Those wishing to contribute their own data must register and can choose to keep their data private for a period not to exceed two years. ... Spectral Karyotyping (SKY), Multiplex Fluorescence In Situ Hybridization (M-FISH) and Comparative Genomic Hybridization (CGH) are complementary fluorescent molecular cytogenetic techniques. SKY/M-FISH permits the simultaneous visualization of each human or mouse chromosome in a different color, facilitating the identification of chromosomal aberrations. CGH utilizes the hybridization of differentially labeled tumor and reference DNA to generate a map of DNA copy number changes in tumor genomes." |
| 138. TESS: Transcription Element Search System |
URL: http://www.cbil.upenn.edu/tess/ Categories: Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases "TESS is a web tool for predicting transcription factor binding sites in DNA sequences. It can identify binding sites using site or consensus strings and positional weight matrices from the TRANSFAC, IMD, and our CBIL-GibbsMat database. You may also include your own site or consensus strings and/or weight matrices in the search. TESS assigns a TESS job number to all sequence search jobs. The job results are stored on our server for a period of time specified in the search submit form. During this time you may recall the search results using the form on this page. TESS can also email results to you as a tab-delimited file suitable for loading into a spreadsheet program. TESS also has data browsing and querying capabilities to help you learn about the factors that were predicted to bind to your sequence." |
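Scanning a DNA sequence with a positional weight matrix, as the TESS description mentions, can be sketched as follows. This is a generic log-odds scan under stated assumptions (uniform background of 0.25, a small pseudocount, a fixed score threshold, forward strand only); it is not TESS's own scoring scheme, and the names `log_odds_matrix` and `scan` as well as the toy TATA matrix are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def log_odds_matrix(freqs: Dict[str, List[float]],
                    background: float = 0.25,
                    pseudo: float = 0.01) -> Dict[str, List[float]]:
    """Convert per-position base frequencies into log2 odds scores.

    A pseudocount (assumption) keeps zero frequencies from producing log(0).
    """
    return {
        base: [math.log2(((f + pseudo) / (1 + 4 * pseudo)) / background)
               for f in column_freqs]
        for base, column_freqs in freqs.items()
    }


def scan(sequence: str,
         pwm: Dict[str, List[float]],
         threshold: float) -> List[Tuple[int, float]]:
    """Slide the matrix along the sequence; report (0-based start, score)
    for every window scoring at or above the threshold."""
    width = len(next(iter(pwm.values())))
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width].upper()
        if any(base not in pwm for base in window):
            continue  # skip windows containing ambiguous bases such as N
        score = sum(pwm[base][i] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits


if __name__ == "__main__":
    # Toy frequency matrix for the motif TATA (made up for illustration).
    freqs = {
        "A": [0.0, 1.0, 0.0, 1.0],
        "C": [0.0, 0.0, 0.0, 0.0],
        "G": [0.0, 0.0, 0.0, 0.0],
        "T": [1.0, 0.0, 1.0, 0.0],
    }
    pwm = log_odds_matrix(freqs)
    for position, score in scan("GGTATAGG", pwm, threshold=5.0):
        print(position, round(score, 2))
```

A production scanner would normally also score the reverse complement and derive the threshold from the matrix rather than fixing it by hand.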
| 139. Tractor DB: Transcriptional Factor Database |
URL: http://www.tractor.lncc.br/ Categories: Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases Experimental data on the Escherichia coli transcriptional regulatory system has been used in the past years to predict new regulatory elements (promoters, transcription factors (TFs), TFs' binding sites and operons) within its genome. As more genomes of gamma-proteobacteria are being sequenced, the prediction of these elements in a growing number of organisms has become more feasible, as a step towards the study of how different bacteria respond to environmental changes at the level of transcriptional regulation. In this work, we present TRACTOR_DB (TRAnscription FaCTORs' predicted binding sites in prokaryotic genomes), a relational database that contains computational predictions of new members of 74 regulons in 17 gamma-proteobacterial genomes. For these predictions we used a comparative genomics approach regarding which several proof-of-principle articles for large regulons have been published. Citation for the above abstract: Gonzalez, Abel D., Espinosa, Vladimir, Vasconcelos, Ana T., Perez-Rueda, Ernesto, Collado-Vides, Julio TRACTOR_DB: a database of regulatory networks in gamma-proteobacterial genomes Nucl. Acids Res. 2005 33: D98-102 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D98 |
| 140. TRANSCompel |
URL: http://www.gene-regulation.com/pub/databases.html#transcompel Categories: Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases Originating from COMPEL, the TRANSCompel database emphasizes the key role of specific interactions between transcription factors binding to their target sites providing specific features of gene regulation in a particular cellular content. Composite regulatory elements contain two closely situated binding sites for distinct transcription factors and represent minimal functional units providing combinatorial transcriptional regulation. Both specific factor--DNA and factor--factor interactions contribute to the function of composite elements (CEs). Information about the structure of known CEs and specific gene regulation achieved through such CEs appears to be extremely useful for promoter prediction, for gene function prediction and for applied gene engineering as well. Each database entry corresponds to an individual CE within a particular gene and contains information about two binding sites, two corresponding transcription factors and experiments confirming cooperative action between transcription factors. The COMPEL database, equipped with the search and browse tools, is available at http://www.gene-regulation.com/pub/databases.html#transcompel. Moreover, we have developed the program CATCH for searching potential CEs in DNA sequences. It is freely available as CompelPatternSearch at http://compel.bionet.nsc.ru/FunSite/CompelPatternSearch.html. Citation for the above abstract: Kel-Margoulis, Olga V., Kel, Alexander E., Reuter, Ingmar, Deineko, Igor V., Wingender, Edgar TRANSCompel(R): a database on composite regulatory elements in eukaryotic genes Nucl. Acids Res. 2002 30: 332-334 © 2002 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/30/1/332 |
| 141. TRANSFAC |
URL: http://www.gene-regulation.com/pub/databases.html#transfac Categories: Microarray Data and other Gene Expression Databases, Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases The TRANSFAC® database on transcription factors, their binding sites, nucleotide distribution matrices and regulated genes as well as the complementing database TRANSCompel® on composite elements have been further enhanced on various levels. A new web interface with different search options and integrated versions of Match™ and Patch™ provides increased functionality for TRANSFAC®. The list of databases which are linked to the common GENE table of TRANSFAC® and TRANSCompel® has been extended by: Ensembl, UniGene, EntrezGene, HumanPSD™ and TRANSPRO™. Standard gene names from HGNC, MGI and RGD, are included for human, mouse and rat genes, respectively. With the help of InterProScan, Pfam, SMART and PROSITE domains are assigned automatically to the protein sequences of the transcription factors. TRANSCompel® contains now, in addition to the COMPEL table, a separate table for detailed information on the experimental EVIDENCE on which the composite elements are based. Finally, for TRANSFAC®, in respect of data growth, in particular the gain of Drosophila transcription factor binding sites (by courtesy of the Drosophila DNase I footprint database) and of Arabidopsis factors (by courtesy of DATF, Database of Arabidopsis Transcription Factors) has to be stressed. The here described public releases, TRANSFAC® 7.0 and TRANSCompel® 7.0, are accessible under http://www.gene-regulation.com/pub/databases.html. Citation for the above abstract: Matys, V., Kel-Margoulis, O. V., Fricke, E., Liebich, I., Land, S., Barre-Dirrie, A., Reuter, I., Chekmenev, D., Krull, M., Hornischer, K., Voss, N., Stegmaier, P., Lewicki-Potapov, B., Saxel, H., Kel, A. E., Wingender, E. TRANSFAC(R) and its module TRANSCompel(R): transcriptional gene regulation in eukaryotes Nucl. Acids Res. 2006 34: D108-110 © 2006 Oxford University Press. The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D108 |
| 142. TRANSPATH |
URL: http://www.gene-regulation.com/pub/databases.html#transpath Categories: Intermolecular Interactions and Signaling Pathways Databases, Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases TRANSPATH® is a database about signal transduction events. It provides information about signaling molecules, their reactions and the pathways these reactions constitute. The representation of signaling molecules is organized in a number of orthogonal hierarchies reflecting the classification of the molecules, their species-specific or generic features, and their post-translational modifications. Reactions are similarly hierarchically organized in a three-layer architecture, differentiating between reactions that are evidenced by individual publications, generalizations of these reactions to construct species-independent ‘reference pathways’ and the ‘semantic projections’ of these pathways. A number of search and browse options allow easy access to the database contents, which can be visualized with the tool PathwayBuilder™. The module PathoSign adds data about pathologically relevant mutations in signaling components, including their genotypes and phenotypes. TRANSPATH® and PathoSign can be used as encyclopaedia, in the educational process, for visualization and modeling of signal transduction networks and for the analysis of gene expression data. TRANSPATH® Public 6.0 is freely accessible for users from non-profit organizations under http://www.gene-regulation.com/pub/databases.html. Citation for the above abstract: Krull, Mathias, Pistor, Susanne, Voss, Nico, Kel, Alexander, Reuter, Ingmar, Kronenberg, Deborah, Michael, Holger, Schwarzer, Knut, Potapov, Anatolij, Choi, Claudia, Kel-Margoulis, Olga, Wingender, Edgar TRANSPATH(R): an information resource for storing and visualizing signaling pathways and their pathological aberrations Nucl. Acids Res. 2006 34: D546-551 © 2006 Oxford University Press. The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D546 |
| 143. TRED: Transcriptional Regulatory Element Database |
URL: http://rulai.cshl.edu/tred Categories: Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases In order to understand gene regulation, accurate and comprehensive knowledge of transcriptional regulatory elements is essential. Here, we report our efforts in building a mammalian Transcriptional Regulatory Element Database (TRED) with associated data analysis functions. It collects cis- and trans-regulatory elements and is dedicated to easy data access and analysis for both single-gene-based and genome-scale studies. Distinguishing features of TRED include: (i) relatively complete genome-wide promoter annotation for human, mouse and rat; (ii) availability of gene transcriptional regulation information including transcription factor binding sites and experimental evidence; (iii) data accuracy is ensured by hand curation; (iv) efficient user interface for easy and flexible data retrieval; and (v) implementation of on-the-fly sequence analysis tools. TRED can provide good training datasets for further genome-wide cis-regulatory element prediction and annotation, assist detailed functional studies and facilitate the decipher of gene regulatory networks (http://rulai.cshl.edu/TRED). Citation for the above abstract: Zhao, Fang, Xuan, Zhenyu, Liu, Lihua, Zhang, Michael Q. TRED: a Transcriptional Regulatory Element Database and a platform for in silico gene regulation studies Nucl. Acids Res. 2005 33: D103-107 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D103 |
| 144. TRRD: Transcription Regulatory Regions Database |
URL: http://wwwmgs.bionet.nsc.ru/mgs/gnw/trrd/ Categories: Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases Transcription Regulatory Regions Database (TRRD) is an informational resource containing an integrated description of the gene transcription regulation. An entry of the database corresponds to a gene and contains the data on localization and functions of the transcription regulatory regions as well as gene expression patterns. TRRD contains only experimental data that are inputted into the database through annotating scientific publication. TRRD release 6.0 comprises the information on 1167 genes, 5537 transcription factor binding sites, 1714 regulatory regions, 14 locus control regions and 5335 expression patterns obtained through annotating 3898 scientific papers. This information is arranged in seven databases: TRRDGENES (general gene description), TRRDLCR (locus control regions); TRRDUNITS (regulatory regions: promoters, enhancers, silencers, etc.), TRRDSITES (transcription factor binding sites), TRRDFACTORS (transcription factors), TRRDEXP (expression patterns) and TRRDBIB (experimental publications). Sequence Retrieval System (SRS) is used as a basic tool for navigating and searching TRRD and integrating it with external informational and software resources. The visualization tool, TRRD Viewer, provides the information representation in a form of maps of gene regulatory regions. The option allowing nucleotide sequences to be searched for according to their homology using BLAST is also included. TRRD is available at http://www.bionet.nsc.ru/trrd/. Citation for the above abstract: Kolchanov, N. A., Ignatieva, E. V., Ananko, E. A., Podkolodnaya, O. A., Stepanenko, I. L., Merkulova, T. I., Pozdnyakov, M. A., Podkolodny, N. L., Naumochkin, A. N., Romashchenko, A. G. Transcription Regulatory Regions Database (TRRD): its status in 2002 Nucl. Acids Res. 2002 30: 312-317 © 2002 Oxford University Press. 
The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/30/1/312 |
| 145. TrSDB: A Proteome Database of Transcription Factors |
URL: http://ibb.uab.es/trsdb Categories: Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases TrSDB-TranScout Database-(http://ibb.uab.es/trsdb) is a proteome database of eukaryotic transcription factors based upon predicted motifs by TranScout and data sources such as InterPro and Gene Ontology Annotation. Nine eukaryotic proteomes are included in the current version. Extensive and diverse information for each database entry, different analyses considering TranScout classification and similarity relationships are offered for research on transcription factors or gene expression. Citation for the above abstract: Hermoso, Antoni, Aguilar, Daniel, Aviles, Francesc X., Querol, Enrique TrSDB: a proteome database of transcription factors Nucl. Acids Res. 2004 32: D171-173 © 2004 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D171 |
| 146. 16S and 23S Ribosomal RNA Mutation Database |
URL: http://ribosome.fandm.edu/ Categories: RNA Sequence Databases Expanded versions of the Ribosomal RNA Mutation Databases provide lists of mutated positions in 16S and 16S-like ribosomal RNA (16SMDBexp) and 23S and 23S-like ribosomal RNA (23SMDBexp) and the identity of each alteration. Alterations from organisms other than Escherichia coli are reported at positions according to the E.coli numbering system. Information provided for each mutation includes: (i) a brief description of the phenotype(s) associated with each mutation, (ii) whether a mutant phenotype has been detected by in vivo or in vitro methods, and (iii) relevant literature citations. Citation for the above abstract: Triman, KL, Peister, A, Goel, RA Expanded versions of the 16S and 23S ribosomal RNA mutation databases (16SMDBexp and 23SMDBexp) Nucl. Acids Res. 1998 26: 280-284 © 1998 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/26/1/280 |
| 147. SfN Neuroscience Database Gateway |
URL: http://ndg.sfn.org/ Categories: Metadatabases and Directories "Databases are of growing importance in neuroscience, as in many other biomedical research fields. The Neuroscience Database Gateway is a new resource for SfN [Society for Neuroscience] members, aimed at promoting awareness and facilitating access to relevant neuroscience databases." |
| 148. 5S Ribosomal RNA Database |
URL: http://biobases.ibch.poznan.pl/5SData/ Categories: RNA Sequence Databases Ribosomal 5S RNA (5S rRNA) is an integral component of the large ribosomal subunit in all known organisms with the exception only of mitochondrial ribosomes of fungi and animals. It is thought to enhance protein synthesis by stabilization of a ribosome structure. This paper presents the updated database of 5S rRNA and their genes (5S rDNA). Its short characteristics are presented in the Introduction. The database contains 2280 primary structures of 5S rRNA and 5S rRNA genes. These include 536 eubacterial, 61 archaebacterial, 1611 eukaryotic and 72 organelle sequences. The database is available on line through the World Wide Web at http://biobases.ibch.poznan.pl/5SData/. Citation for the above abstract: Szymanski, Maciej, Barciszewska, Miroslawa Z., Erdmann, Volker A., Barciszewski, Jan 5S Ribosomal RNA Database Nucl. Acids Res. 2002 30: 176-178 © 2002 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/30/1/176 |
| 149. Aptamer Database |
URL: http://aptamer.icmb.utexas.edu/ Categories: RNA Sequence Databases The aptamer database is designed to contain comprehensive sequence information on aptamers and unnatural ribozymes that have been generated by in vitro selection methods. Such data are not normally collected in 'natural' sequence databases, such as GenBank. Besides serving as a storehouse of sequences that may have diagnostic or therapeutic utility, the database serves as a valuable resource for theoretical biologists who describe and explore fitness landscapes. The database is updated monthly and is publicly available at http://aptamer.icmb.utexas.edu/. Citation for the above abstract: Lee, Jennifer F., Hesselberth, Jay R., Meyers, Lauren Ancel, Ellington, Andrew D. Aptamer Database Nucl. Acids Res. 2004 32: D95-100 © 2004 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D95 |
| 150. ARED: Human AU-Rich Element-Containing mRNA Database |
URL: http://rc.kfshrc.edu.sa/ared/ Categories: RNA Sequence Databases A comprehensive search that utilized a large set of mRNA data from human genome databases and additionally, expressed sequence tag (EST) database characterized this latest update of AU-rich elements (AREs) containing mRNA database (ARED). A large number of ARE-mRNA, as much as 4000, were recovered and include many of ARE alternative forms. This number represents as much as 5–8% of the human genes depending on the entire number of genes. The new ARED does not contain only larger and diverse number of ARE-mRNAs but additional functionality and enhanced search capabilities are given in the database website http://rc.kfshrc.edu.sa/ared/. These include class and cluster of AREs, source mRNAs, EST evidence, buildup information, retrieval of lists of genes, and integration with current and new NCBI data, such as Entrez ID and Unigene. Gene Ontology analysis shows there are significant differences in functional diversity of ARED when compared with the overall genome. Many of ARE-genes mediate regulatory processes, reactions to outside stimuli, RNA metabolism, and developmental processes particularly those of early and transient responses. The wide interest in mRNA turnover and importance of AREs in health and disease signify the compilation of ARE-genes. Citation for the above abstract: Bakheet, Tala, Williams, Bryan R. G., Khabar, Khalid S. A. ARED 3.0: the large and diverse AU-rich transcriptome Nucl. Acids Res. 2006 34: D111-114 © 2006 Oxford University Press. The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D111 |
| 151. The European Ribosomal RNA Database |
URL: http://www.psb.ugent.be/rRNA/ Categories: RNA Sequence Databases The European ribosomal RNA database aims to compile all complete or nearly complete ribosomal RNA sequences from both the small (SSU) and large (LSU) ribosomal subunits. All sequences are available in aligned format. Sequence alignment is based on the secondary structure of the molecules, as determined by comparative sequence analysis. Additional information about the sequences, such as taxonomic classification of the organism from which they have been obtained, and literature references are also provided. In order to identify the closest relatives to newly determined sequences, BLAST searches can be performed, after which the best matching sequences are aligned and a phylogenetic tree is inferred. As of 2003, the European ribosomal RNA database is maintained at Ghent University (Belgium). The database can be consulted at http://www.psb.ugent.be/rRNA/. Citation for the above abstract: Wuyts, Jan, Perriere, Guy, Van de Peer, Yves The European ribosomal RNA database Nucl. Acids Res. 2004 32: D101-103 © 2004 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D101 |
| 152. GtRDB: The Genomic tRNA Database |
URL: http://rna.wustl.edu/GtRDB/ Categories: RNA Sequence Databases "This genomic tRNA database contains tRNA identifications made by the program tRNAscan-SE (Lowe & Eddy, Nucl Acids Res 25: 955-964, 1997) on complete or nearly complete genomes. Unless otherwise noted, all annotation is automated, and has not been inspected for agreement with published literature. Inevitably with automated sequence analysis, we find exceptions to general identification rules, isoacceptor type predictions (esp. due to variable post-transcriptional anticodon modification), and questionable tRNA identifications (due to pseudogenes, SINES, or other tRNA-derived elements). We attempt to document all cases we come across, and welcome feedback on new or unrecognized discrepancies." |
| 153. gpDB: A Database of G-proteins and Their Interaction with GPCRs |
URL: http://bioinformatics.biol.uoa.gr/gpDB Categories: Neuroscience Databases BACKGROUND: G protein-coupled receptors (GPCRs) transduce signals from extracellular space into the cell, through their interaction with G proteins, which act as switches forming hetero-trimers composed of different subunits (alpha, beta, gamma). The alpha subunit of the G protein is responsible for the recognition of a given GPCR. Whereas specialised resources for GPCRs, and other groups of receptors, are already available, currently, there is no publicly available database focusing on G proteins and containing information about their coupling specificity with their respective receptors. DESCRIPTION: gpDB is a publicly accessible G proteins/GPCRs relational database. Including species homologs, the database contains detailed information for 418 G protein monomers (272 Galpha, 87 Gbeta and 59 Ggamma) and 2782 GPCRs sequences belonging to families with known coupling to G proteins. The GPCRs and the G proteins are classified according to a hierarchy of different classes, families and sub-families, based on extensive literature search. The main innovation besides the classification of both G proteins and GPCRs is the relational model of the database, describing the known coupling specificity of the GPCRs to their respective alpha subunit of G proteins, a unique feature not available in any other database. There is full sequence information with cross-references to publicly available databases, references to the literature concerning the coupling specificity and the dimerization of GPCRs and the user may submit advanced queries for text search. Furthermore, we provide a pattern search tool, an interface for running BLAST against the database and interconnectivity with PRED-TMR, PRED-GPCR and TMRPres2D. CONCLUSIONS: The database will be very useful, for both experimentalists and bioinformaticians, for the study of G protein/GPCR interactions and for future development of predictive algorithms. It is available for academics, via a web browser at the URL: http://bioinformatics.biol.uoa.gr/gpDB. Citation for the above abstract: Antigoni L Elefsinioti, Pantelis G Bagos, Ioannis C Spyropoulos, and Stavros J Hamodrakas A database for G proteins and their interaction with GPCRs. BMC Bioinformatics 2004, 5:208; doi:10.1186/1471-2105-5-208 © 2004 By the Authors. The full text of the article can be found at: http://www.biomedcentral.com/1471-2105/5/208 |
| 154. gRNA: Guide RNA Database |
URL: http://biosun.bio.tu-darmstadt.de/goringer/gRNA/gRNA.html Categories: RNA Sequence Databases The RNA editing process within the mitochondria of kinetoplastid organisms is controlled by small, trans-acting RNA molecules referred to as guide RNAs. The guide RNA database is a compilation of published guide RNA sequences, currently containing 254 entries from 11 different organisms. Additional information includes RNA secondary and tertiary structure models, information on the gene localisation, literature citations and other relevant facts. Citation for the above abstract: Hinz, S, Goringer, HU The guide RNA database (3.0) Nucl. Acids Res. 1999 27: 168- © 1999 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/27/1/168 |
| 155. HIV Sequence Database |
URL: http://hiv-web.lanl.gov/content/hiv-db/mainpage.html Categories: HIV/AIDS Databases, RNA Sequence Databases, Viral Databases "The sequence database is based on HIV and SIV sequences downloaded from Genbank. We annotate these sequences with information from the literature, and sometimes from the authors. What information we add depends on what we can find, and ranges from sample information (sampling year, - country, - city), patient information (risk group, infection country and - year, sex, known epidemiological links to other patients); biological information about the virus (phenotype, tropism, coreceptor usage), technical information about the sample treatment and sequencing method, and (for a small number of important strains) extensive notes about their origin and derivation. In the future we hope to add information about treatment status of the patients and about HLA types. At least as important as the database itself is the search interface that provides access to it. In addition to straightforward searches on many fields in the database, this tool allows the user to download alignments of certain regions, either all sequences there are for that region or a selection based on user-defined criteria. This can be very important for comparing one's sequence to existing sequences in the database; one of the most time-consuming tasks in sequence analysis used to be locating the appropriate region in sequences from the database." |
| 156. HuSiDa: Human siRNA Database |
URL: http://itb1.biologie.hu-berlin.de/~nebulus/sirna/ Categories: RNA Sequence Databases Small interfering RNAs (siRNAs) have become a standard tool in functional genomics. Once incorporated into the RNA-induced silencing complex (RISC), siRNAs mediate the specific recognition of corresponding target mRNAs and their cleavage. However, only a small fraction of randomly chosen siRNA sequences is able to induce efficient gene silencing. In common laboratory practice, successful RNA interference experiments typically require both, the labour and cost-intensive identification of an active siRNA sequence and the optimization of target cell line-specific procedures for optimal siRNA delivery. To optimize the design and performance of siRNA experiments, we have established the human siRNA database (HuSiDa). The database provides sequences of published functional siRNA molecules targeting human genes and important technical details of the corresponding gene silencing experiments, including the mode of siRNA generation, recipient cell lines, transfection reagents and procedures and direct links to published references (PubMed). The database can be accessed at http://www.human-siRNA-database.net. We used the siRNA sequence information stored in the database for scrutinizing published sequence selection parameters for efficient gene silencing. Citation for the above abstract: Truss, Matthias, Swat, Maciej, Kielbasa, Szymon M., Schafer, Reinhold, Herzel, Hanspeter, Hagemeier, Christian HuSiDa--the human siRNA database: an open-access database for published functional siRNA sequences and technical details of efficient transfer into recipient cells Nucl. Acids Res. 2005 33: D108-111 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D108 |
| 157. HyPaLib |
URL: http://bibiserv.techfak.uni-bielefeld.de/HyPa/ Categories: RNA Sequence Databases The database, called HyPaLib (for Hybrid Pattern Library), contains annotated structural elements characteristic for certain classes of structural and/or functional RNAs. These elements are described in a language specifically designed for this purpose. The language allows convenient specification of hybrid patterns, i.e. motifs consisting of sequence features and structural elements together with sequence similarity and thermodynamic constraints. We are currently developing software tools that allow a user to search sequence databases for any pattern in HyPaLib, thus providing functionality which is similar to PROSITE, but dedicated to the more complex patterns in RNA sequences. HyPaLib is available at http://bibiserv.techfak.uni-bielefeld.de/HyPa/. Citation for the above abstract: Graf, Stefan, Strothmann, Dirk, Kurtz, Stefan, Steger, Gerhard HyPaLib: a database of RNAs and RNA structural elements defined by hybrid patterns Nucl. Acids Res. 2001 29: 196-198 © 2001 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/29/1/196 |
| 158. IRESite: The database of experimentally verified IRES structures |
URL: http://www.iresite.org/ Categories: RNA Sequence Databases IRESite is an exhaustive, manually annotated non-redundant relational database focused on the IRES elements (Internal Ribosome Entry Site) and containing information not available in the primary public databases. IRES elements were originally found in eukaryotic viruses hijacking initiation of translation of their host. Later on, they were also discovered in 5'-untranslated regions of some eukaryotic mRNA molecules. Currently, IRESite presents up to 92 biologically relevant aspects of every experiment, e.g. the nature of an IRES element, its functionality/defectivity, origin, size, sequence, structure, its relative position with respect to surrounding protein coding regions, positive/negative controls used in the experiment, the reporter genes used to monitor IRES activity, the measured reporter protein yields/activities, and references to original publications as well as cross-references to other databases, and also comments from submitters and our curators. Furthermore, the site presents the known similarities to rRNA sequences as well as RNA–protein interactions. Special care is given to the annotation of promoter-like regions. The annotated data in IRESite are bound to mostly complete, full-length mRNA, and whenever possible, accompanied by original plasmid vector sequences. New data can be submitted through the publicly available web-based interface at http://www.iresite.org and are curated by a team of lab-experienced biologists. Citation for the above abstract: Mokrejs, Martin, Vopalensky, Vaclav, Kolenaty, Ondrej, Masek, Tomas, Feketova, Zuzana, Sekyrova, Petra, Skaloudova, Barbora, Kriz, Vitezslav, Pospisek, Martin IRESite: the database of experimentally verified IRES structures (www.iresite.org) Nucl. Acids Res. 2006 34: D125-130 © 2006 Oxford University Press. The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D125 |
| 159. miRNA: the microRNA Registry |
URL: http://www.sanger.ac.uk/Software/Rfam/mirna/ Categories: RNA Sequence Databases The miRNA Registry provides a service for the assignment of miRNA gene names prior to publication. A comprehensive and searchable database of published miRNA sequences is accessible via a web interface (http://www.sanger.ac.uk/Software/Rfam/mirna/), and all sequence and annotation data are freely available for download. Release 2.0 of the database contains 506 miRNA entries from six organisms. Citation for the above abstract: Griffiths-Jones, Sam The microRNA Registry Nucl. Acids Res. 2004 32: D109-111 © 2004 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D109 |
| 160. Mobile group II introns database |
URL: http://www.fp.ucalgary.ca/group2introns/ Categories: RNA Sequence Databases Group II introns are self-splicing RNAs and retroelements found in bacteria and lower eukaryotic organelles. During the past several years, they have been uncovered in surprising numbers in bacteria due to the genome sequencing projects; however, most of the newly sequenced introns are not correctly identified. We have initiated an ongoing web site database for mobile group II introns in order to provide correct information on the introns, particularly in bacteria. Information in the web site includes: (1) introductory information on group II introns; (2) detailed information on subfamilies of intron RNA structures and intron-encoded proteins; (3) a listing of identified introns with correct boundaries, RNA secondary structures and other detailed information; and (4) phylogenetic and evolutionary information. The comparative data should facilitate study of the function, spread and evolution of group II introns. The database can be accessed at http://www.fp.ucalgary.ca/group2introns/. Citation for the above abstract: Dai, Lixin, Toor, Navtej, Olson, Robert, Keeping, Andrew, Zimmerly, Steven Database for mobile group II introns Nucl. Acids Res. 2003 31: 424-426 © 2003 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/424 |
| 161. Non-canonical Base Pair Database |
URL: http://prion.bchs.uh.edu/bp_type/ Categories: Nucleic Acid Structure Databases, RNA Sequence Databases The secondary and tertiary structure of an RNA molecule typically includes a number of non-canonical base-base interactions. The known occurrences of these interactions are tabulated in the NCIR database, which can be accessed from http://prion.bchs.uh.edu/bp_type/. The number of examples is now over 1400, which is an increase of >700% since the database was first published. This dramatic increase reflects the addition of data from the recently published crystal structures of the 50S (2.4 Å) and 30S (3.0 Å) ribosomal subunits. In addition, non-canonical interactions observed in published crystal and NMR structures of tRNAs, group I introns, ribozymes, RNA aptamers and synthetic oligonucleotides are included. Properties associated with these interactions, such as sequence context, sugar pucker conformation, glycosidic angle conformation, melting temperature, chemical shift and free energy, are also reported when available. Out of the 29 anticipated pairs with at least two hydrogen bonds, 28 have been observed to date. In addition, several novel examples, not generally predicted, have also been encountered, bringing the total of such pairs to 36. Added to this list are a variety of single, bifurcated, triple and quadruple interactions. The most common non-canonical pairs are the sheared GA, GA imino, AU reverse Hoogsteen, and the GU and AC wobble pairs. The most frequent triple interaction connects N3 of an A with the amino of a G that is also involved in a standard Watson-Crick pair. Citation for the above abstract: Nagaswamy, Uma, Larios-Sanz, Maia, Hury, James, Collins, Shakaala, Zhang, Zhengdong, Zhao, Qin, Fox, George E. NCIR: a database of non-canonical interactions in known RNA structures Nucl. Acids Res. 2002 30: 395-397 © 2002 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/30/1/395 |
| 162. Noncoding RNAs Database |
URL: http://biobases.ibch.poznan.pl/ncRNA/ Categories: RNA Sequence Databases The noncoding RNAs database is a collection of currently available sequence data on RNAs, which have no protein-coding capacity and have been implicated in regulation of cellular processes. The RNAs included in the database form very heterogenous group of molecules that act on different levels of information transmission in the cell. It includes RNAs acting on the level of chromatin structure, transcriptional and translational regulation of gene expression, modulation of protein function and regulation of subcellular distribution of RNAs and proteins. Those RNAs, with potential regulatory functions have been identified in prokaryotic, animal and plant cells. The database can be accessed at http://biobases.ibch.poznan.pl/ncRNA/. Citation for the above abstract: Szymanski, Maciej, Erdmann, Volker A., Barciszewski, Jan Noncoding regulatory RNAs database Nucl. Acids Res. 2003 31: 429-431 © 2003 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/429 |
| 163. Ensembl |
URL: http://www.ensembl.org/ Categories: Human Genome Databases, Maps, and Viewers, Model Organisms and Comparative Genomics Databases The Ensembl (http://www.ensembl.org/) project provides a comprehensive and integrated source of annotation of large genome sequences. Over the last year the number of genomes available from the Ensembl site has increased from 4 to 19, with the addition of the mammalian genomes of Rhesus macaque and Opossum, the chordate genome of Ciona intestinalis and the import and integration of the yeast genome. The year has also seen extensive improvements to both data analysis and presentation, with the introduction of a redesigned website, the addition of RNA gene and regulatory annotation and substantial improvements to the integration of human genome variation data. Citation for the above abstract: Birney, E., Andrews, D., Caccamo, M., Chen, Y., Clarke, L., Coates, G., Cox, T., Cunningham, F., Curwen, V., Cutts, T., Down, T., Durbin, R., Fernandez-Suarez, X. M., Flicek, P., Graf, S., Hammond, M., Herrero, J., Howe, K., Iyer, V., Jekosch, K., Kahari, A., Kasprzyk, A., Keefe, D., Kokocinski, F., Kulesha, E., London, D., Longden, I., Melsopp, C., Meidl, P., Overduin, B., Parker, A., Proctor, G., Prlic, A., Rae, M., Rios, D., Redmond, S., Schuster, M., Sealy, I., Searle, S., Severin, J., Slater, G., Smedley, D., Smith, J., Stabenau, A., Stalker, J., Trevanion, S., Ureta-Vidal, A., Vogel, J., White, S., Woodwark, C., Hubbard, T. J. P. Ensembl 2006 Nucl. Acids Res. 2006 34: D556-561 © 2006 Oxford University Press. The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D556 |
| 164. NONCODE |
URL: http://www.bioinfo.org.cn/NONCODE/index.htm Categories: RNA Sequence Databases NONCODE is an integrated knowledge database dedicated to non-coding RNAs (ncRNAs), that is to say, RNAs that function without being translated into proteins. All ncRNAs in NONCODE were filtered automatically from literature and GenBank, and were later manually curated. The distinctive features of NONCODE are as follows: (i) the ncRNAs in NONCODE include almost all the types of ncRNAs, except transfer RNAs and ribosomal RNAs. (ii) All ncRNA sequences and their related information (e.g. function, cellular role, cellular location, chromosomal information, etc.) in NONCODE have been confirmed manually by consulting relevant literature: more than 80% of the entries are based on experimental data. (iii) Based on the cellular process and function, which a given ncRNA is involved in, we introduced a novel classification system, labeled process function class, to integrate existing classification systems. (iv) In addition, some 1100 ncRNAs have been grouped into nine other classes according to whether they are specific to gender or tissue or associated with tumors and diseases, etc. (v) NONCODE provides a user-friendly interface, a visualization platform and a convenient search option, allowing efficient recovery of sequence, regulatory elements in the flanking sequences, secondary structure, related publications and other information. The first release of NONCODE (v1.0) contains 5339 non-redundant sequences from 861 organisms, including eukaryotes, eubacteria, archaebacteria, virus and viroids. Access is free for all users through a web interface at http://noncode.bioinfo.org.cn. Citation for the above abstract: Liu, Changning, Bai, Baoyan, Skogerbo, Geir, Cai, Lun, Deng, Wei, Zhang, Yong, Bu, Dongbo, Zhao, Yi, Chen, Runsheng NONCODE: an integrated knowledge database of non-coding RNAs Nucl. Acids Res. 2005 33: D112-115 © 2005 Oxford University Press. 
The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D112 |
| 165. Plant snoRNA Database |
URL: http://bioinf.scri.sari.ac.uk/cgi-bin/plant_snorna/home Categories: General Plant Databases, RNA Sequence Databases The Plant snoRNA database (http://www.scri.sari.ac.uk/plant_snoRNA/) provides information on small nucleolar RNAs from Arabidopsis and eighteen other plant species. Information includes sequences, expression data, methylation and pseudouridylation target modification sites, initial gene organization (polycistronic, single gene and intronic) and the number of gene variants. The Arabidopsis information is divided into box C/D and box H/ACA snoRNAs, and within each of these groups, by target sites in rRNA, snRNA or unknown. Alignments of orthologous genes and gene variants from different plant species are available for many snoRNA genes. Plant snoRNA genes have been given a standard nomenclature, designed wherever possible, to provide a consistent identity with yeast and human orthologues. Citation for the above abstract: Brown, John W. S., Echeverria, Manuel, Qu, Liang-Hu, Lowe, Todd M., Bachellerie, Jean-Pierre, Huttenhofer, Alexander, Kastenmayer, James P., Green, Pamela J., Shaw, Paul, Marshall, Dave F. Plant snoRNA database Nucl. Acids Res. 2003 31: 432-435 © 2003 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/432 |
| 166. PLANTncRNAs: Noncoding RNAs in Plants |
URL: http://www.prl.msu.edu/PLANTncRNAs/ Categories: General Plant Databases, RNA Sequence Databases "We have collected existing data on plant noncoding RNAs and expanded on this by examining about 20,000 Arabidopsis ESTs for characteristics of noncoding RNAs. About 15 putative Arabidopsis ncRNAs have been reported in the literature or have been annotated. Several have homologs in other plants, but all appear to be plant-specific with the exception of SRP RNA. Conversely, none of about 30 ncRNAs reported from yeast, bacteria or animal systems have homologs in Arabidopsis. To identify additional genes that appear to encode ncRNAs, we used computational tools to filter out the protein coding genes from those corresponding to 20,000 EST clones. What remained were 39 clones that either had the characteristics of ncRNAs (19), peptide coding RNAs (pepRNAs)(9) or could not be differentiated between the two categories(11). Again none of these clones had homologs outside the plant kingdom indicating that most ncRNAs of Arabidopsis are likely plant-specific." |
| 167. PLMItRNA: a Database for tRNA Molecules and Genes in Mitochondria of Photosynthetic Eukaryotes |
URL: http://bighost.area.ba.cnr.it/PLMItRNA/ Categories: Mitochondrial Genes and Proteins Databases, RNA Sequence Databases The updated version of PLMItRNA reports information and multialignments on 609 genes and 34 tRNA molecules active in the mitochondria of Viridiplantae (27 Embryophyta and 10 Chlorophyta), and photosynthetic algae (one Cryptophyta, four Rhodophyta and two Stramenopiles). Colour-code based tables reporting the different genetic origin of identified genes allow hyper-textual link to single entries. Promoter sequences identified for tRNA genes in the mitochondrial genomes of Angiospermae are also reported. The PLMItRNA database is accessible at http://bighost.area.ba.cnr.it/PLMItRNA/. Citation for the above abstract: Rainaldi, Guglielmo, Volpicella, Mariateresa, Licciulli, Flavio, Liuni, Sabino, Gallerani, Raffaele, Ceci, Luigi R. PLMItRNA, a database on the heterogeneous genetic origin of mitochondrial tRNA genes and tRNAs in photosynthetic eukaryotes Nucl. Acids Res. 2003 31: 436-438 © 2003 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/436 |
| 168. PolyA_DB: Polyadenylation Database |
URL: http://polya.umdnj.edu/polyadb/ Categories: RNA Sequence Databases Messenger RNA polyadenylation is one of the key post-transcriptional events in eukaryotic cells. A large number of genes in mammalian species can undergo alternative polyadenylation, which leads to mRNAs with variable 3' ends. As the 3' end of mRNAs often contains cis elements important for mRNA stability, mRNA localization and translation, the implications of the regulation of polyadenylation can be multifold. Alternative polyadenylation is controlled by cis elements and trans factors, and is believed to occur in a tissue- or disease-specific manner. Given the availability of many databases devoted to other aspects of mRNA metabolism, such as transcriptional initiation and splicing, systematic information on polyadenylation, including alternative polyadenylation and its regulation, is noticeably lacking. Here, we present a database named polyA_DB, through which we strive to provide several types of information regarding polyadenylation in mammalian species: (i) polyadenylation sites and their locations with respect to the genomic structure of genes; (ii) cis elements surrounding polyadenylation sites; (iii) comparison of polyadenylation configuration between orthologous genes; and (iv) tissue/organ information for alternative polyadenylation sites. Currently, polyA_DB contains 45,565 polyadenylation sites for 25,097 human and mouse genes, representing the most comprehensive polyadenylation database till date. The database is accessible via the website (http://polya.umdnj.edu/polyadb). Citation for the above abstract: Zhang, Haibo, Hu, Jun, Recce, Michael, Tian, Bin PolyA_DB: a database for mammalian mRNA polyadenylation Nucl. Acids Res. 2005 33: D116-120 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D116 |
| 169. PseudoBase |
URL: http://wwwbio.leidenuniv.nl/~Batenburg/PKB.html Categories: RNA Sequence Databases PseudoBase is a database containing structural, functional and sequence data related to RNA pseudoknots. It can be reached at http://wwwbio.LeidenUniv.nl/~Batenburg/PKB.html. For each pseudoknot, thirteen items are stored, for example the relevant sequence, the stem positions of the pseudoknot, the EMBL accession number of the sequence and the support that can be given regarding the reliability of the pseudoknot. Since the last publication, information on sizes of the stems and the loops in the pseudoknots has been added. Also added are alternative entries that produce surveys of where the pseudoknots are, sorted according to stem size or loop size. Citation for the above abstract: van Batenburg, F. H. D., Gultyaev, A. P., Pleij, C. W. A. PseudoBase: structural information on RNA pseudoknots Nucl. Acids Res. 2001 29: 194-195 © 2001 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/29/1/194 |
| 170. Rfam: RNA Families Database of Alignments and CMs |
URL: http://www.sanger.ac.uk/Software/Rfam/ Categories: Nucleic Acid Structure Databases, RNA Sequence Databases Rfam is a comprehensive collection of non-coding RNA (ncRNA) families, represented by multiple sequence alignments and profile stochastic context-free grammars. Rfam aims to facilitate the identification and classification of new members of known sequence families, and distributes annotation of ncRNAs in over 200 complete genome sequences. The data provide the first glimpses of conservation of multiple ncRNA families across a wide taxonomic range. A small number of large families are essential in all three kingdoms of life, with large numbers of smaller families specific to certain taxa. Recent improvements in the database are discussed, together with challenges for the future. Rfam is available on the Web at http://www.sanger.ac.uk/Software/Rfam/ and http://rfam.wustl.edu/. Citation for the above abstract: Griffiths-Jones, Sam, Moxon, Simon, Marshall, Mhairi, Khanna, Ajay, Eddy, Sean R., Bateman, Alex Rfam: annotating non-coding RNAs in complete genomes Nucl. Acids Res. 2005 33: D121-124 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D121 |
| 171. Ribosomal Database Project (RDP-II) |
URL: http://rdp.cme.msu.edu/ Categories: RNA Sequence Databases, Taxonomy and Identification Databases The Ribosomal Database Project-II (RDP-II) provides data, tools and services related to ribosomal RNA sequences to the research community. Through its website (http://rdp.cme.msu.edu), RDP-II offers aligned and annotated rRNA sequence data, analysis services, and phylogenetic inferences (trees) derived from these data. RDP-II release 8.1 contains 16 277 prokaryotic, 5201 eukaryotic, and 1503 mitochondrial small subunit rRNA sequences in aligned and annotated format. The current public beta release of 9.0 debuts a new regularly updated alignment of over 50 000 annotated (eu)bacterial sequences. New analysis services include a sequence search and selection tool (Hierarchy Browser) and a phylogenetic tree building and visualization tool (Phylip Interface). A new interactive tutorial guides users through the basics of rRNA sequence analysis. Other services include probe checking, phylogenetic placement of user sequences, screening of users' sequences for chimeric rRNA sequences, automated alignment, production of similarity matrices, and services to plan and analyze terminal restriction fragment polymorphism (T-RFLP) experiments. The RDP-II email address for questions or comments is rdpstaff@msu.edu. Citation for the above abstract: Cole, J. R., Chai, B., Marsh, T. L., Farris, R. J., Wang, Q., Kulam, S. A., Chandra, S., McGarrell, D. M., Schmidt, T. M., Garrity, G. M., Tiedje, J. M. The Ribosomal Database Project (RDP-II): previewing a new autoaligner that allows regular updates and the new prokaryotic taxonomy Nucl. Acids Res. 2003 31: 442-443 © 2003 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/442 |
| 172. RISSC: Ribosomal Intergenic Spacer Sequence Collection |
URL: http://miracle.umh.es/rissc/ Categories: RNA Sequence Databases, Taxonomy and Identification Databases A novel database, under the acronym RISSC (Ribosomal Intergenic Spacer Sequence Collection), has been created. It compiles more than 1600 entries of edited DNA sequence data from the 16S-23S ribosomal spacers present in most prokaryotes and organelles (e.g. mitochondria and chloroplasts) and is accessible through the Internet (http://ulises.umh.es/RISSC), where systematic searches for specific words can be conducted, as well as BLAST-type sequence searches. Additionally, a characteristic feature of this region, the presence/absence and nature of tRNA genes within the spacer, is included in all the entries, even when not previously indicated in the original database. All these combined features could provide a useful documentation tool for studies on evolution, identification, typing and strain characterization, among others. Citation for the above abstract: Garcia-Martinez, Jesus, Bescos, Ignacio, Rodriguez-Sala, Jesus Javier, Rodriguez-Valera, Francisco RISSC: a novel database for ribosomal 16S-23S RNA genes spacer regions Nucl. Acids Res. 2001 29: 178-180 © 2001 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/29/1/178 |
| 173. RNA Modification Database |
URL: http://medlib.med.utah.edu/RNAmods/ Categories: RNA Sequence Databases The RNA Modification Database (http://medlib.med.utah.edu/RNAmods/) provides a comprehensive listing of naturally modified nucleosides in RNA. Each file includes: chemical structure; common name and symbol; type(s) of RNA in which found and corresponding phylogenetic distribution; Chemical Abstracts registry number and index name; and initial literature citations for structure characterization and chemical synthesis. New features include capability to search database files by name or substructural features, modifications in tmRNA, and links to related data and sites. Citation for the above abstract: Rozenski, J, Crain, PF, McCloskey, JA The RNA Modification Database: 1999 update Nucl. Acids Res. 1999 27: 196-197 © 1999 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/27/1/196 |
| 174. RNAdb |
URL: http://research.imb.uq.edu.au/rnadb/ Categories: RNA Sequence Databases In recent years, there have been increasing numbers of transcripts identified that do not encode proteins, many of which are developmentally regulated and appear to have regulatory functions. Here, we describe the construction of a comprehensive mammalian noncoding RNA database (RNAdb) which contains over 800 unique experimentally studied non-coding RNAs (ncRNAs), including many associated with diseases and/or developmental processes. The database is available at http://research.imb.uq.edu.au/RNAdb and is searchable by many criteria. It includes microRNAs and snoRNAs, but not infrastructural RNAs, such as rRNAs and tRNAs, which are catalogued elsewhere. The database also includes over 1100 putative antisense ncRNAs and almost 20,000 putative ncRNAs identified in high-quality murine and human cDNA libraries, with more to be added in the near future. Many of these RNAs are large, and many are spliced, some alternatively. The database will be useful as a foundation for the emerging field of RNomics and the characterization of the roles of ncRNAs in mammalian gene expression and regulation. Citation for the above abstract: Pang, Ken C., Stephen, Stuart, Engstrom, Par G., Tajul-Arifin, Khairina, Chen, Weisan, Wahlestedt, Claes, Lenhard, Boris, Hayashizaki, Yoshihide, Mattick, John S. RNAdb--a comprehensive mammalian noncoding RNA database Nucl. Acids Res. 2005 33: D125-130 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D125 |
| 175. siRNAdb |
URL: http://sirna.cgb.ki.se/ Categories: RNA Sequence Databases Short interfering RNAs (siRNAs) are a popular method for gene-knockdown, acting by degrading the target mRNA. Before performing experiments it is invaluable to locate and evaluate previous knockdown experiments for the gene of interest. The siRNA database provides a gene-centric view of siRNA experimental data, including siRNAs of known efficacy and siRNAs predicted to be of high efficacy by a combination of methods. Linked to these sequences is information such as siRNA thermodynamic properties and the potential for sequence-specific off-target effects. The database enables the user to evaluate an siRNA's potential for inhibition and non-specific effects. The database is available at http://siRNA.cgb.ki.se. Citation for the above abstract: Chalk, Alistair M., Warfinge, Richard E., Georgii-Hemming, Patrick, Sonnhammer, Erik L. L. siRNAdb: a database of siRNA sequences Nucl. Acids Res. 2005 33: D131-134 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D131 |
| 176. Small RNA Database |
URL: http://condor.bcm.tmc.edu/smallRNA/smallrna.html Categories: RNA Sequence Databases The small RNA database is a compilation of all the small size RNA sequences available to date, including nuclear, nucleolar, cytoplasmic and mitochondria small RNAs from eukaryotic organisms and small RNAs from prokaryotic cells as well as viruses. Currently, approximately 600 small RNA sequences are in our database. It also gives the sources of individual RNAs and their GenBank accession numbers. The small RNA database can be accessed through the WWW (World Wide Web). Our WWW URL address is: http://mbcr.bcm.tmc.edu/smallRNA/smallrna.html. The new small RNA sequences published since our last compilation are listed in this paper (Table 1). Citation for the above abstract: Gu, J, Chen, Y, Reddy, R Small RNA database Nucl. Acids Res. 1998 26: 160-162 © 1998 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/26/1/160 |
| 177. SRPDB: Signal Recognition Particle Database |
URL: http://psyche.uthct.edu/dbs/SRPDB/SRPDB.html Categories: Individual Protein Family Databases, RNA Sequence Databases Maintained at the University of Texas Health Science Center at Tyler, Texas, the tmRNA database (tmRDB) is accessible at the URL http://psyche.uthct.edu/dbs/tmRDB/tmRDB.html with mirror sites located at Auburn University, Auburn, Alabama (http://www.ag.auburn.edu/mirror/tmRDB/) and the Royal Veterinary and Agricultural University, Denmark (http://tmrdb.kvl.dk/). The signal recognition particle database (SRPDB) at http://psyche.uthct.edu/dbs/SRPDB/SRPDB.html is mirrored at http://srpdb.kvl.dk/ and the University of Goteborg (http://bio.lundberg.gu.se/dbs/SRPDB/SRPDB.html). The databases assist in investigations of the tmRNP (a ribonucleoprotein complex which liberates stalled bacterial ribosomes) and the SRP (a particle which recognizes signal sequences and directs secretory proteins to cell membranes). The curated tmRNA and SRP RNA alignments consider base pairs supported by comparative sequence analysis. Also shown are alignments of the tmRNA-associated proteins SmpB, ribosomal protein S1, alanyl-tRNA synthetase and Elongation Factor Tu, as well as the SRP proteins SRP9, SRP14, SRP19, SRP21, SRP54 (Ffh), SRP68, SRP72, cpSRP43, Flhf, SRP receptor (alpha) and SRP receptor (beta). All alignments can be easily examined using a new exploratory browser. The databases provide links to high-resolution structures and serve as depositories for structures obtained by molecular modeling. Citation for the above abstract: Andersen, Ebbe Sloth, Rosenblad, Magnus Alm, Larsen, Niels, Westergaard, Jesper Cairo, Burks, Jody, Wower, Iwona K., Wower, Jacek, Gorodkin, Jan, Samuelsson, Tore, Zwieb, Christian The tmRDB and SRPDB resources Nucl. Acids Res. 2006 34: D163-168 © 2006 Oxford University Press. The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D163 |
| 178. Subviral RNA Database |
URL: http://subviral.med.uottawa.ca/ Categories: RNA Sequence Databases, Viral Databases We describe here the establishment of an online database containing a large number of sequences and related data on viroids, viroid-like RNAs and human hepatitis delta virus (vHDV) in a customizable and user-friendly format. Citation for the above abstract: Pelchat, Martin, Rocheleau, Lynda, Perreault, Jonathan, Perreault, Jean-Pierre SubViral RNA: a database of the smallest known auto-replicable RNA species Nucl. Acids Res. 2003 31: 444-445 © 2003 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/444 |
| 179. The Small Subunit rRNA Modification Database |
URL: http://medstat.med.utah.edu/SSUmods/ Categories: RNA Sequence Databases The Small Subunit rRNA Modification Database provides a listing of reported post-transcriptionally modified nucleosides and sequence sites in small subunit rRNAs from bacteria, archaea and eukarya. Data are compiled from reports of full or partial rRNA sequences, including RNase T1 oligonucleotide catalogs reported in earlier literature in studies of phylogenetic relatedness. Options for data presentation include full sequence maps, some of which have been assembled by database curators with the aid of contemporary gene sequence data, and tabular forms organized by source organism or chemical identity of the modification. A total of 32 rRNA sequence alignments are provided, annotated with sites of modification and chemical identities of modifications if known, with provision for scrolling full sequences or user-dictated subsequences for comparative viewing for organisms of interest. The database can be accessed through the World Wide Web at http://medlib.med.utah.edu/SSUmods. Citation for the above abstract: McCloskey, James A., Rozenski, Jef The Small Subunit rRNA Modification Database Nucl. Acids Res. 2005 33: D135-138 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D135 |
| 180. The tmRNA Website |
URL: http://www.indiana.edu/~tmrna/ Categories: RNA Sequence Databases tmRNA combines tRNA- and mRNA-like properties and ameliorates problems arising from stalled ribosomes. Research on the mechanism, structure and biology of tmRNA is served by the tmRNA website (http://www.indiana.edu/~tmrna), a collection of sequences, alignments, secondary structures and other information. Because many of these sequences are not in GenBank, a BLAST server has been added; another new feature is an abbreviated alignment for the tRNA-like domain only. Many tmRNA sequences from plastids have been added, five found in public sequence data and another 10 generated by direct sequencing; detection in early-branching members of the green plastid lineage brings coverage to all three primary plastid lineages. The new sequences include the shortest known tmRNA sequence. While bacterial tmRNAs usually have a lone pseudoknot upstream of the mRNA segment and a string of three or four pseudoknots downstream, plastid tmRNAs collectively show loss of pseudoknots at both positions. The pseudoknot-string region is also too short to contain the usual pseudoknot number in another new entry, the tmRNA sequence from a bacterial endosymbiont of insect cells, Tremblaya princeps. Pseudoknots may optimize tmRNA function in free-living bacteria, yet become dispensable when the endosymbiotic lifestyle relaxes selective pressure for fast growth. Citation for the above abstract: Gueneau de Novoa, Pulcherie, Williams, Kelly P. The tmRNA website: reductive evolution of tmRNA in plastids and other endosymbionts Nucl. Acids Res. 2004 32: D104-108 © 2004 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D104 |
| 181. tmRDB: tmRNA Database |
URL: http://psyche.uthct.edu/dbs/tmRDB/tmRDB.html Categories: RNA Sequence Databases Maintained at the University of Texas Health Science Center at Tyler, Texas, the tmRNA database (tmRDB) is accessible at the URL http://psyche.uthct.edu/dbs/tmRDB/tmRDB.html with mirror sites located at Auburn University, Auburn, Alabama (http://www.ag.auburn.edu/mirror/tmRDB/) and the Royal Veterinary and Agricultural University, Denmark (http://tmrdb.kvl.dk/). The signal recognition particle database (SRPDB) at http://psyche.uthct.edu/dbs/SRPDB/SRPDB.html is mirrored at http://srpdb.kvl.dk/ and the University of Goteborg (http://bio.lundberg.gu.se/dbs/SRPDB/SRPDB.html). The databases assist in investigations of the tmRNP (a ribonucleoprotein complex which liberates stalled bacterial ribosomes) and the SRP (a particle which recognizes signal sequences and directs secretory proteins to cell membranes). The curated tmRNA and SRP RNA alignments consider base pairs supported by comparative sequence analysis. Also shown are alignments of the tmRNA-associated proteins SmpB, ribosomal protein S1, alanyl-tRNA synthetase and Elongation Factor Tu, as well as the SRP proteins SRP9, SRP14, SRP19, SRP21, SRP54 (Ffh), SRP68, SRP72, cpSRP43, Flhf, SRP receptor (alpha) and SRP receptor (beta). All alignments can be easily examined using a new exploratory browser. The databases provide links to high-resolution structures and serve as depositories for structures obtained by molecular modeling. Citation for the above abstract: Andersen, Ebbe Sloth, Rosenblad, Magnus Alm, Larsen, Niels, Westergaard, Jesper Cairo, Burks, Jody, Wower, Iwona K., Wower, Jacek, Gorodkin, Jan, Samuelsson, Tore, Zwieb, Christian The tmRDB and SRPDB resources Nucl. Acids Res. 2006 34: D163-168 © 2006 Oxford University Press. The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D163 |
| 182. Compilation of tRNA Sequences and Sequences of tRNA Genes |
URL: http://www.staff.uni-bayreuth.de/~btc914/search/index.html Categories: RNA Sequence Databases Maintained at the Universitat Bayreuth, Bayreuth, Germany, the Compilation of tRNA Sequences and Sequences of tRNA Genes is accessible at the URL http://www.tRNA.uni-bayreuth.de with mirror site located at the Institute of Protein Research, Pushchino, Russia (http://alpha.protres.ru/trnadbase). The compilation is a searchable, periodically updated database of currently available tRNA sequences. The present version of the database contains a new Genomic tRNA Compilation including the sequences of tRNA genes from genomic sequences published up to July 2003. It consists of about 5800 tRNA gene sequences from 111 organisms covering archaea, bacteria, higher and lower eukarya. The former Compilation of tRNA Genes (up to the end of 1998) and the updated Compilation tRNA Sequences (561 entries) are also supported by the new software. The database can be explored by using multiple search criteria and sequence templates. The database provides a service that allows to obtain statistical information on the occurrences of certain bases at given positions of the tRNA sequences. This allows phylogenic studies and search for identity elements in respect to interactions of tRNAs with various enzymes. Citation for the above abstract: Sprinzl, Mathias, Vassilenko, Konstantin S. Compilation of tRNA sequences and sequences of tRNA genes Nucl. Acids Res. 2005 33: D139-140 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D139 |
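The per-position base statistics mentioned in the entry above can be illustrated with a minimal Python sketch. It is not the database's own software: the function name `base_counts_by_position` and the toy sequences are made up for illustration, and the sketch assumes the sequences are already aligned to equal length, with `-` marking a gap.

```python
from collections import Counter


def base_counts_by_position(aligned_seqs):
    """Count how often each symbol occurs in each column of aligned sequences."""
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("all sequences must have the same aligned length")
    # One Counter per alignment column, in column order.
    return [Counter(s[i].upper() for s in aligned_seqs) for i in range(length)]


# Toy, invented sequences (not taken from the compilation); '-' is a gap.
toy_alignment = ["GCGGA", "GCCGA", "GGGGA", "GC-GA"]
counts = base_counts_by_position(toy_alignment)

for position, column in enumerate(counts, start=1):
    print(position, dict(column))
```

Columns in which a single base dominates are the kind of conserved positions that a search for identity elements would start from.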
| 183. Yeast snoRNA Database |
URL: http://www.bio.umass.edu/biochem/rna-sequence/Yeast_snoRNA_Database/snoRNA_DataBase.html Categories: Fungal Genome Databases, RNA Sequence Databases Small nucleolar RNAs (snoRNAs) are involved in cleavage of rRNA, modification of rRNA nucleotides and, perhaps, other aspects of ribosome biogenesis in eukaryotic cells. Scores of snoRNAs have been discovered in recent years from various eukaryotes, and the total number is predicted to be up to 200 different snoRNA species per individual organism. We have created a comprehensive database for snoRNAs from the yeast Saccharomyces cerevisiae which allows easy access to detailed information about each species known (almost 70 snoRNAs are featured). The database consists of three major parts: (i) a utilities section; (ii) a master table; and (iii) a collection of tables for the individual snoRNAs. The utilities section provides an introduction to the database. The master table lists all known S. cerevisiae snoRNAs and their major properties. Information in the individual tables includes: alternate names, size, family classification, genomic organization, sequences (with major features identified), GenBank accession numbers, occurrence of homologues, gene disruption phenotypes, functional properties and associated RNAs and proteins. All information is accompanied with appropriate literature references. The database is available on the World Wide Web (http://www.bio.umass.edu/biochem/rna-sequence/Yeast_snoRNA_Database/snoRNA_DataBase.html), and should be useful for a wide range of snoRNA studies. Citation for the above abstract: Samarsky, DA, Fournier, MJ A comprehensive database for the small nucleolar RNAs from Saccharomyces cerevisiae Nucl. Acids Res. 1999 27: 161-164 © 1999 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/27/1/161 |
| 184. EXProt: database for EXPerimentally verified Protein functions |
URL: http://www.cmbi.kun.nl/EXProt/ Categories: General Protein Sequence Databases EXProt is a non-redundant protein database containing a selection of entries from genome annotation projects and public databases, aimed at including only proteins with an experimentally verified function. In EXProt release 2.0 we have collected entries from the Pseudomonas aeruginosa community annotation project (PseudoCAP), the Escherichia coli genome and proteome database (GenProtEC) and the translated coding sequences from the Prokaryotes division of EMBL nucleotide sequence database, which are described as having an experimentally verified function. Each entry in EXProt has a unique ID number and contains information about the species, amino acid sequence, functional annotation and, in most cases, links to references in MEDLINE/PubMed and to the entry in the original database. EXProt is indexed in SRS at CMBI (http://www.cmbi.kun.nl/srs/) and can be searched with BLAST and FASTA through the EXProt web page (http://www.cmbi.kun.nl/EXProt/). Citation for the above abstract: Ursing, Bjorn M., van Enckevort, Frank H. J., Leunissen, Jack A. M., Siezen, Roland J. EXProt: a database for proteins with an experimentally verified function Nucl. Acids Res. 2002 30: 50-51 © 2002 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/30/1/50 |
| 185. NCBI Protein database |
URL: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Protein Categories: General Protein Sequence Databases "The Protein database contains sequence data from the translated coding regions from DNA sequences in GenBank, EMBL, and DDBJ as well as protein sequences submitted to Protein Information Resource (PIR), SWISS-PROT, Protein Research Foundation (PRF), and Protein Data Bank (PDB) (sequences from solved structures)." |
| 186. PA-GOSUB |
URL: http://www.cs.ualberta.ca/~bioinfo/PA/GOSUB/ Categories: General Protein Sequence Databases PA-GOSUB (Proteome Analyst: Gene Ontology Molecular Function and Subcellular Localization) is a publicly available, web-based, searchable and downloadable database that contains the sequences, predicted GO molecular functions and predicted subcellular localizations of more than 107,000 proteins from 10 model organisms (and growing), covering the major kingdoms and phyla for which annotated proteomes exist (http://www.cs.ualberta.ca/~bioinfo/PA/GOSUB). The PA-GOSUB database effectively expands the coverage of subcellular localization and GO function annotations by a significant factor (already over five for subcellular localization, compared with Swiss-Prot v42.7), and more model organisms are being added to PA-GOSUB as their sequenced proteomes become available. PA-GOSUB can be used in three main ways. First, a researcher can browse the pre-computed PA-GOSUB annotations on a per-organism and per-protein basis using annotation-based and text-based filters. Second, a user can perform BLAST searches against the PA-GOSUB database and use the annotations from the homologs as simple predictors for the new sequences. Third, the whole of PA-GOSUB can be downloaded in either FASTA or comma-separated values (CSV) formats. Citation for the above abstract: Lu, Paul, Szafron, Duane, Greiner, Russell, Wishart, David S., Fyshe, Alona, Pearcy, Brandon, Poulin, Brett, Eisner, Roman, Ngo, Danny, Lamb, Nicholas PA-GOSUB: a searchable database of model organism protein sequences with their predicted Gene Ontology molecular function and subcellular localization Nucl. Acids Res. 2005 33: D147-153 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D147 |
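For readers who download the FASTA form mentioned in the entry above, a minimal Python sketch of reading such a file follows. It handles only the generic FASTA layout (a `>` header line followed by sequence lines). The record names and sequences are invented, and the actual header layout of the PA-GOSUB files is not described in the abstract, so treat the sample as an assumption.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Hypothetical records: IDs and header layout are invented for illustration.
sample = """>prot1 example protein
MKTAYIAK
QRQISFVK
>prot2 another example
MSLLTEVE
"""

records = dict(parse_fasta(sample))
print(records["prot1 example protein"])
```

Sequence lines are joined without separators, so a sequence wrapped over several lines comes back as one string.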
| 187. Polygenic Signaling Pathways |
URL: http://www.polygenicpathways.co.uk Categories: Gene-, System-, or Disease- Specific Databases "This site contains lists of genes positively associated with Alzheimer's disease, Bipolar disorder or Schizophrenia. The protein products of these genes form consecutive elements of a signaling cascade or metabolic pathway. They may bind to each other, control each other's transcription or form functional microcomplexes. These pathways, etched out by multiple association studies, may underpin the pathology of each disease." |
| 190. PIR-PSD: Protein Sequence Database |
URL: http://pir.georgetown.edu/ Categories: General Protein Sequence Databases The Protein Information Resource (PIR) is an integrated public resource of protein informatics that supports genomic and proteomic research and scientific discovery. PIR maintains the Protein Sequence Database (PSD), an annotated protein database containing over 283 000 sequences covering the entire taxonomic range. Family classification is used for sensitive identification, consistent annotation, and detection of annotation errors. The superfamily curation defines signature domain architecture and categorizes memberships to improve automated classification. To increase the amount of experimental annotation, the PIR has developed a bibliography system for literature searching, mapping, and user submission, and has conducted retrospective attribution of citations for experimental features. PIR also maintains NREF, a non-redundant reference database, and iProClass, an integrated database of protein family, function, and structure information. PIR-NREF provides a timely and comprehensive collection of protein sequences, currently consisting of more than 1 000 000 entries from PIR-PSD, SWISS-PROT, TrEMBL, RefSeq, GenPept, and PDB. The PIR web site (http://pir.georgetown.edu) connects data analysis tools to underlying databases for information retrieval and knowledge discovery, with functionalities for interactive queries, combinations of sequence and text searches, and sorting and visual exploration of search results. The FTP site provides free download for PSD and NREF biweekly releases and auxiliary databases and files. Citation for the above abstract: Wu, Cathy H., Yeh, Lai-Su L., Huang, Hongzhan, Arminski, Leslie, Castro-Alvear, Jorge, Chen, Yongxing, Hu, Zhangzhi, Kourtesis, Panagiotis, Ledley, Robert S., Suzek, Baris E., Vinayaka, C.R., Zhang, Jian, Barker, Winona C. The Protein Information Resource Nucl. Acids Res. 2003 31: 345-347 © 2003 Oxford University Press. 
The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/345 |
| 191. PRF: Peptide Research Foundation Databases |
URL: http://www4.prf.or.jp/en/ Categories: General Protein Sequence Databases "You can search Literature Database (PRF/LITDB) and Protein/Peptide Sequence Database (PRF/SEQDB) of PRF." |
| 192. Swiss-Prot |
URL: http://www.expasy.org/sprot/ Categories: General Protein Sequence Databases The SWISS-PROT protein knowledgebase (http://www.expasy.org/sprot/ and http://www.ebi.ac.uk/swissprot/) connects amino acid sequences with the current knowledge in the Life Sciences. Each protein entry provides an interdisciplinary overview of relevant information by bringing together experimental results, computed features and sometimes even contradictory conclusions. Detailed expertise that goes beyond the scope of SWISS-PROT is made available via direct links to specialised databases. SWISS-PROT provides annotated entries for all species, but concentrates on the annotation of entries from human (the HPI project) and other model organisms to ensure the presence of high quality annotation for representative members of all protein families. Part of the annotation can be transferred to other family members, as is already done for microbes by the High-quality Automated and Manual Annotation of microbial Proteomes (HAMAP) project. Protein families and groups of proteins are regularly reviewed to keep up with current scientific findings. Complementarily, TrEMBL strives to comprise all protein sequences that are not yet represented in SWISS-PROT, by incorporating a perpetually increasing level of mostly automated annotation. Researchers are welcome to contribute their knowledge to the scientific community by submitting relevant findings to SWISS-PROT at swiss-prot@expasy.org. Citation for the above abstract: Boeckmann, Brigitte, Bairoch, Amos, Apweiler, Rolf, Blatter, Marie-Claude, Estreicher, Anne, Gasteiger, Elisabeth, Martin, Maria J., Michoud, Karine, O'Donovan, Claire, Phan, Isabelle, Pilbout, Sandrine, Schneider, Michel The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003 Nucl. Acids Res. 2003 31: 365-370 © 2003 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/365 |
| 193. TrEMBL |
URL: http://www.expasy.org/sprot/ Categories: General Protein Sequence Databases The SWISS-PROT protein knowledgebase (http://www.expasy.org/sprot/ and http://www.ebi.ac.uk/swissprot/) connects amino acid sequences with the current knowledge in the Life Sciences. Each protein entry provides an interdisciplinary overview of relevant information by bringing together experimental results, computed features and sometimes even contradictory conclusions. Detailed expertise that goes beyond the scope of SWISS-PROT is made available via direct links to specialised databases. SWISS-PROT provides annotated entries for all species, but concentrates on the annotation of entries from human (the HPI project) and other model organisms to ensure the presence of high quality annotation for representative members of all protein families. Part of the annotation can be transferred to other family members, as is already done for microbes by the High-quality Automated and Manual Annotation of microbial Proteomes (HAMAP) project. Protein families and groups of proteins are regularly reviewed to keep up with current scientific findings. Complementarily, TrEMBL strives to comprise all protein sequences that are not yet represented in SWISS-PROT, by incorporating a perpetually increasing level of mostly automated annotation. Researchers are welcome to contribute their knowledge to the scientific community by submitting relevant findings to SWISS-PROT at swiss-prot@expasy.org. Citation for the above abstract: Boeckmann, Brigitte, Bairoch, Amos, Apweiler, Rolf, Blatter, Marie-Claude, Estreicher, Anne, Gasteiger, Elisabeth, Martin, Maria J., Michoud, Karine, O'Donovan, Claire, Phan, Isabelle, Pilbout, Sandrine, Schneider, Michel The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003 Nucl. Acids Res. 2003 31: 365-370 © 2003 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/365 |
| 194. Real Time PCR Primer Sets Database |
URL: http://www.realtimeprimers.org/ Categories: Molecular Probe and Primer Databases |
| 195. EPIMHC: A Curated Database of MHC Ligands |
URL: http://immunax.dfci.harvard.edu/bioinformatics/epimhc/ Categories: Immunological Databases SUMMARY: EPIMHC is a relational database of MHC-binding peptides and T cell epitopes that are observed in real proteins. Currently, the database contains 4867 distinct peptide sequences from various sources, including 84 tumor-associated antigens. The EPIMHC database is accessible through a web server that has been designed to facilitate research in computational vaccinology. Importantly, peptides resulting from a query can be selected to derive specific motif-matrices. Subsequently, these motif-matrices can be used in combination with a dynamic algorithm for predicting MHC-binding peptides from user-provided protein queries. AVAILABILITY: The EPIMHC database server is hosted by the Dana-Farber Cancer Institute at the site http://immunax.dfci.harvard.edu/bioinformatics/epimhc/ Citation for the above abstract: Reche PA, Zhang H, Glutting JP, Reinherz EL. EPIMHC: a curated database of MHC-binding peptides for customized computational vaccinology. Bioinformatics. 2005 May 1;21(9):2140-1. Epub 2005 Jan 18. © 2005 Oxford University Press. The full abstract can be found at: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=15657103 |
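The abstract above describes deriving motif-matrices from selected peptides and then using them to predict MHC-binding peptides in a query protein. The sketch below illustrates the general idea with a plain log-odds position-specific scoring matrix and a sliding window. It is not EPIMHC's own algorithm: the function names, the uniform background, the pseudocount, and the toy peptides are all assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_matrix(peptides: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Build a log-odds matrix from equal-length peptides (uniform background assumed)."""
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all peptides must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(peptides) + pseudocount * len(AMINO_ACIDS)
    matrix = []
    for pos in range(length):
        counts = Counter(p[pos] for p in peptides)
        matrix.append({
            aa: math.log(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return matrix


def best_window(matrix: List[Dict[str, float]], protein: str) -> Optional[Tuple[float, int]]:
    """Return (score, start) of the highest-scoring window, or None if the protein is too short."""
    width = len(matrix)
    scored = [
        # Residues outside the 20 standard amino acids contribute a neutral 0.0.
        (sum(matrix[i].get(protein[start + i], 0.0) for i in range(width)), start)
        for start in range(len(protein) - width + 1)
    ]
    return max(scored) if scored else None


# Toy peptides and protein, not taken from EPIMHC.
toy_matrix = build_matrix(["ACD", "ACE", "ACD"])
top = best_window(toy_matrix, "GGGACDGG")
```

In practice a background distribution estimated from real proteins and a score threshold would replace the uniform background and the single best window used here.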
| 196. GDB: The GDB Human Genome Database |
URL: http://www.gdb.org/ Categories: Human Genome Databases, Maps, and Viewers The Genome Database (GDB, http://www.gdb.org ) is a public repository of data on human genes, clones, STSs, polymorphisms and maps. GDB entries are highly cross-linked to each other, to literature citations and to entries in other databases, including the sequence databases, OMIM, and the Mouse Genome Database. Mapping data from large genome centers and smaller mapping efforts are added to GDB on an ongoing basis. The database can be searched by a variety of methods, ranging from keyword searches to complex queries. Major functionality extensions in the last year include the ongoing computation of integrated human genome maps, called Comprehensive Maps, and the use of those maps to support positional queries and graphic displays. The capabilities of the GDB map viewer (Mapview) have been extended to include map printing and the graphical display of ad hoc query results. The HUGO Nomenclature Committee continues to curate the proposed and official gene symbols and related data in collaboration with GDB. As genome research shifts its emphasis from mapping to sequencing and functional analysis, the scope of the GDB schema is being extended. We are in the process of adding representations of gene function and expression, and improving our representation of human polymorphism and mutation. Citation for the above abstract: Letovsky, SI, Cottingham, RW, Porter, CJ, Li, PWD GDB: the Human Genome Database Nucl. Acids Res. 1998 26: 94-99 © 1998 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/26/1/94 |
| 197. GRL: Gene Resource Locator |
URL: http://grl.gi.k.u-tokyo.ac.jp/ Categories: Human Genome Databases, Maps, and Viewers Since the advent of the draft human genome sequence there has been growing interest in transcriptome analysis based on genomic data. The Gene Resource Locator (GRL) assembles gene maps that include information on gene-expression patterns, cis-elements in regulatory regions and alternatively spliced transcripts. The database was constructed using customized software, and currently contains 2.2 million alignments (exon-intron structures). The alignments have been annotated and integrated into a system that encompasses approximately 90 000 EST loci sharing common exons, 8091 alternatively spliced transcript groups, 10 801 expression-profile groups, 8066 candidate regulatory regions in full-length cDNAs, and 1 million SNP loci. We have used Flash technology to build a dynamic web viewer that facilitates browsing through the millions of alignments. All of the information is available through the World Wide Web at the Gene Resource Locator web site (http://grl.gi.k.u-tokyo.ac.jp). Citation for the above abstract: Honkura, Toshihiko, Ogasawara, Jun, Yamada, Tomoyuki, Morishita, Shinichi The Gene Resource Locator: gene locus maps for transcriptome analysis Nucl. Acids Res. 2002 30: 221-225 © 2002 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/30/1/221 |
| 198. UniProt: Universal Protein Resource |
URL: http://www.pir.uniprot.org/ Categories: General Protein Sequence Databases The Universal Protein Resource (UniProt) provides a central resource on protein sequences and functional annotation with three database components, each addressing a key need in protein bioinformatics. The UniProt Knowledgebase (UniProtKB), comprising the manually annotated UniProtKB/Swiss-Prot section and the automatically annotated UniProtKB/TrEMBL section, is the preeminent storehouse of protein annotation. The extensive cross-references, functional and feature annotations and literature-based evidence attribution enable scientists to analyse proteins and query across databases. The UniProt Reference Clusters (UniRef) speed similarity searches via sequence space compression by merging sequences that are 100% (UniRef100), 90% (UniRef90) or 50% (UniRef50) identical. Finally, the UniProt Archive (UniParc) stores all publicly available protein sequences, containing the history of sequence data with links to the source databases. UniProt databases continue to grow in size and in availability of information. Recent and upcoming changes to database contents, formats, controlled vocabularies and services are described. New download availability includes all major releases of UniProtKB, sequence collections by taxonomic division and complete proteomes. A bibliography mapping service has been added, and an ID mapping service will be available soon. UniProt databases can be accessed online at http://www.uniprot.org or downloaded at ftp://ftp.uniprot.org/pub/databases/. Citation for the above abstract: Wu, Cathy H., Apweiler, Rolf, Bairoch, Amos, Natale, Darren A., Barker, Winona C., Boeckmann, Brigitte, Ferro, Serenella, Gasteiger, Elisabeth, Huang, Hongzhan, Lopez, Rodrigo, Magrane, Michele, Martin, Maria J., Mazumder, Raja, O'Donovan, Claire, Redaschi, Nicole, Suzek, Baris The Universal Protein Resource (UniProt): an expanding universe of protein information Nucl. Acids Res. 
2006 34: D187-191 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D187 |
| 199. CyBase: A Database of Cyclic Proteins |
URL: http://research.imb.uq.edu.au/cybase Categories: Protein Property Databases CyBase is a curated database and information source for backbone-cyclized proteins. The database incorporates naturally occurring cyclic proteins as well as synthetic derivatives, grafted analogues and acyclic permutants. The database provides a centralized repository of information on all aspects of cyclic protein biology and addresses issues pertaining to the management and searching of topologically circular sequences. The database is freely available at http://research.imb.uq.edu.au/cybase. Citation for the above abstract: Mulvenna, Jason P., Wang, Conan, Craik, David J. CyBase: a database of cyclic protein sequence and structure Nucl. Acids Res. 2006 34: D192-194 © 2006 Oxford University Press. The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D192 |
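The abstract above mentions the problem of searching topologically circular sequences. One common way to handle this, sketched here under the assumption that a cyclic protein is stored as a linear string with an arbitrary start point, is to extend the string by the first few residues so that matches spanning the join are found. The function name and the example sequence are hypothetical, and this is not necessarily how CyBase implements its search.

```python
from typing import List


def find_in_circular(sequence: str, motif: str) -> List[int]:
    """Return 0-based start positions of motif in a circular sequence."""
    n = len(sequence)
    if not motif or len(motif) > n:
        return []
    # Appending the first len(motif) - 1 residues exposes matches that wrap
    # around the end of the linear representation, without duplicating hits.
    extended = sequence + sequence[: len(motif) - 1]
    hits = []
    start = extended.find(motif)
    while start != -1:
        hits.append(start)
        start = extended.find(motif, start + 1)
    return hits


# Hypothetical sequence: the motif "TCGL" only exists across the join.
wrapped_hits = find_in_circular("GLPVCGETC", "TCGL")
```

Every reported position is a valid index into the original string, so a hit near the end simply means the match continues from the beginning of the sequence.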
| 200. UniParc |
URL: http://www.uniprot.org/database/archive.shtml/ Categories: General Protein Sequence Databases The Universal Protein Resource (UniProt) provides the scientific community with a single, centralized, authoritative resource for protein sequences and functional information. Formed by uniting the Swiss-Prot, TrEMBL and PIR protein database activities, the UniProt consortium produces three layers of protein sequence databases: the UniProt Archive (UniParc), the UniProt Knowledgebase (UniProt) and the UniProt Reference (UniRef) databases. The UniProt Knowledgebase is a comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase with extensive cross-references. This centrepiece consists of two sections: UniProt/Swiss-Prot, with fully, manually curated entries; and UniProt/TrEMBL, enriched with automated classification and annotation. During 2004, tens of thousands of Knowledgebase records got manually annotated or updated; we introduced a new comment line topic: TOXIC DOSE to store information on the acute toxicity of a toxin; the UniProt keyword list got augmented by additional keywords; we improved the documentation of the keywords and are continuously overhauling and standardizing the annotation of post-translational modifications. Furthermore, we introduced a new documentation file of the strains and their synonyms. Many new database cross-references were introduced and we started to make use of Digital Object Identifiers. We also achieved in collaboration with the Macromolecular Structure Database group at EBI an improved integration with structural databases by residue level mapping of sequences from the Protein Data Bank entries onto corresponding UniProt entries. For convenient sequence searches we provide the UniRef non-redundant sequence databases. The comprehensive UniParc database stores the complete body of publicly available protein sequence data. 
The UniProt databases can be accessed online (http://www.uniprot.org) or downloaded in several formats (ftp://ftp.uniprot.org/pub). New releases are published every two weeks. Citation for the above abstract: Bairoch, Amos, Apweiler, Rolf, Wu, Cathy H., Barker, Winona C., Boeckmann, Brigitte, Ferro, Serenella, Gasteiger, Elisabeth, Huang, Hongzhan, Lopez, Rodrigo, Magrane, Michele, Martin, Maria J., Natale, Darren A., O'Donovan, Claire, Redaschi, Nicole, Yeh, Lai-Su L. The Universal Protein Resource (UniProt) Nucl. Acids Res. 2005 33: D154-159 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D154 |
| 201. UniRef |
URL: http://www.pir.uniprot.org/database/nref.shtml Categories: General Protein Sequence Databases The Universal Protein Resource (UniProt) provides the scientific community with a single, centralized, authoritative resource for protein sequences and functional information. Formed by uniting the Swiss-Prot, TrEMBL and PIR protein database activities, the UniProt consortium produces three layers of protein sequence databases: the UniProt Archive (UniParc), the UniProt Knowledgebase (UniProt) and the UniProt Reference (UniRef) databases. The UniProt Knowledgebase is a comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase with extensive cross-references. This centrepiece consists of two sections: UniProt/Swiss-Prot, with fully, manually curated entries; and UniProt/TrEMBL, enriched with automated classification and annotation. During 2004, tens of thousands of Knowledgebase records got manually annotated or updated; we introduced a new comment line topic: TOXIC DOSE to store information on the acute toxicity of a toxin; the UniProt keyword list got augmented by additional keywords; we improved the documentation of the keywords and are continuously overhauling and standardizing the annotation of post-translational modifications. Furthermore, we introduced a new documentation file of the strains and their synonyms. Many new database cross-references were introduced and we started to make use of Digital Object Identifiers. We also achieved in collaboration with the Macromolecular Structure Database group at EBI an improved integration with structural databases by residue level mapping of sequences from the Protein Data Bank entries onto corresponding UniProt entries. For convenient sequence searches we provide the UniRef non-redundant sequence databases. The comprehensive UniParc database stores the complete body of publicly available protein sequence data. 
The UniProt databases can be accessed online (http://www.uniprot.org) or downloaded in several formats (ftp://ftp.uniprot.org/pub). New releases are published every two weeks. Citation for the above abstract: Bairoch, Amos, Apweiler, Rolf, Wu, Cathy H., Barker, Winona C., Boeckmann, Brigitte, Ferro, Serenella, Gasteiger, Elisabeth, Huang, Hongzhan, Lopez, Rodrigo, Magrane, Michele, Martin, Maria J., Natale, Darren A., O'Donovan, Claire, Redaschi, Nicole, Yeh, Lai-Su L. The Universal Protein Resource (UniProt) Nucl. Acids Res. 2005 33: D154-159 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D154 |
| 202. DBcat |
URL: http://www.infobiogen.fr/services/dbcat/ Categories: Genome Annotation Terms, Ontology, and Nomenclature Databases The DBcat (http://www.infobiogen.fr/services/dbcat ) is a comprehensive catalog of biological databases, maintained and curated at Infobiogen. It contains 500 databases classified by application domains. The DBcat is a structured flat-file library, that can be searched by means of an SRS server or a dedicated Web interface. The files are available for download from Infobiogen anonymous ftp server. Citation for the above abstract: Discala, Claude, Benigni, Xavier, Barillot, Emmanuel, Vaysseix, Guy DBcat: a catalog of 500 biological databases Nucl. Acids Res. 2000 28: 8-9 © 2000 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/28/1/8 |
| 203. HGNC: Human Gene Nomenclature Database |
URL: http://www.gene.ucl.ac.uk/cgi-bin/nomenclature/searchgenes.pl Categories: General Human Genetics Databases, Genome Annotation Terms, Ontology, and Nomenclature Databases The HUGO Gene Nomenclature Committee (HGNC) aims to give every human gene a unique and ideally meaningful name and symbol. The HGNC database, previously known as Genew, contains over 22 000 public records with approved human gene nomenclature and associated information. The database has undergone major improvements throughout the last year, is publicly available for online searching at http://www.gene.ucl.ac.uk/cgi-bin/nomenclature/searchgenes.pl and has a new custom downloads interface at http://www.gene.ucl.ac.uk/cgi-bin/nomenclature/gdlw.pl. Citation for the above abstract: Eyre, Tina A., Ducluzeau, Fabrice, Sneddon, Tam P., Povey, Sue, Bruford, Elspeth A., Lush, Michael J. The HUGO Gene Nomenclature Database, 2006 updates Nucl. Acids Res. 2006 34: D319-321 © 2006 Oxford University Press. The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D319 |
| 204. GO: Gene Ontology |
URL: http://www.geneontology.org/ Categories: Genome Annotation Terms, Ontology, and Nomenclature Databases The Gene Ontology (GO) project (http://www.geneontology.org) develops and uses a set of structured, controlled vocabularies for community use in annotating genes, gene products and sequences (also see http://song.sourceforge.net/). The GO Consortium continues to improve the vocabulary content, reflecting the impact of several novel mechanisms of incorporating community input. A growing number of model organism databases and genome annotation groups contribute annotation sets using GO terms to GO's public repository. Updates to the AmiGO browser have improved access to contributed genome annotations. As the GO project continues to grow, the use of the GO vocabularies is becoming more varied as well as more widespread. The GO project provides an ontological annotation system that enables biologists to infer knowledge from large amounts of data. Citation for the above abstract: Gene Ontology Consortium, The Gene Ontology (GO) project in 2006 Nucl. Acids Res. 2006 34: D322-326 © 2006 Oxford University Press. The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D322 |
| 205. GOA: Gene Ontology Annotation |
URL: http://www.ebi.ac.uk/GOA/ Categories: Genome Annotation Terms, Ontology, and Nomenclature Databases The Gene Ontology Annotation (GOA) database (http://www.ebi.ac.uk/GOA) aims to provide high-quality electronic and manual annotations to the UniProt Knowledgebase (Swiss-Prot, TrEMBL and PIR-PSD) using the standardized vocabulary of the Gene Ontology (GO). As a supplementary archive of GO annotation, GOA promotes a high level of integration of the knowledge represented in UniProt with other databases. This is achieved by converting UniProt annotation into a recognized computational format. GOA provides annotated entries for nearly 60,000 species (GOA-SPTr) and is the largest and most comprehensive open-source contributor of annotations to the GO Consortium annotation effort. By integrating GO annotations from other model organism groups, GOA consolidates specialized knowledge and expertise to ensure the data remain a key reference for up-to-date biological information. Furthermore, the GOA database fully endorses the Human Proteomics Initiative by prioritizing the annotation of proteins likely to benefit human health and disease. In addition to a non-redundant set of annotations to the human proteome (GOA-Human) and monthly releases of its GO annotation for all species (GOA-SPTr), a series of GO mapping files and specific cross-references in other databases are also regularly distributed. GOA can be queried through a simple user-friendly web interface or downloaded in a parsable format via the EBI and GO FTP websites. The GOA data set can be used to enhance the annotation of particular model organism or gene expression data sets, although increasingly it has been used to evaluate GO predictions generated from text mining or protein interaction experiments. 
In 2004, the GOA team will build on its success and will continue to supplement the functional annotation of UniProt and work towards enhancing the ability of scientists to access all available biological information. Researchers wishing to query or contribute to the GOA project are encouraged to email: goa@ebi.ac.uk. Citation for the above abstract: Camon, Evelyn, Magrane, Michele, Barrell, Daniel, Lee, Vivian, Dimmer, Emily, Maslen, John, Binns, David, Harte, Nicola, Lopez, Rodrigo, Apweiler, Rolf The Gene Ontology Annotation (GOA) Database: sharing knowledge in Uniprot with Gene Ontology Nucl. Acids Res. 2004 32: D262-266 © 2004 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D262 |
| 206. IUPAC Nomenclature database |
URL: http://www.chem.qmul.ac.uk/iupac/ Categories: Genome Annotation Terms, Ontology, and Nomenclature Databases "Recommendations on Organic & Biochemical Nomenclature, Symbols & Terminology etc." |
| 207. IUBMB Nomenclature database |
URL: http://www.chem.qmul.ac.uk/iubmb/ Categories: Genome Annotation Terms, Ontology, and Nomenclature Databases "Recommendations on Biochemical & Organic Nomenclature, Symbols & Terminology etc." |
| 208. IUPHAR-RD |
URL: http://www.iuphar-db.org/iuphar-rd/ Categories: Drug and Drug Design Databases, Genome Annotation Terms, Ontology, and Nomenclature Databases "... the official database of the IUPHAR [The International Union of Pharmacology] Committee on Receptor Nomenclature and Drug Classification." |
| 209. PANTHER: Protein ANalysis THrough Evolutionary Relationships |
URL: https://panther.appliedbiosystems.com/ Categories: Genome Annotation Terms, Ontology, and Nomenclature Databases PANTHER is a large collection of protein families that have been subdivided into functionally related subfamilies, using human expertise. These subfamilies model the divergence of specific functions within protein families, allowing more accurate association with function (ontology terms and pathways), as well as inference of amino acids important for functional specificity. Hidden Markov models (HMMs) are built for each family and subfamily for classifying additional protein sequences. The latest version, 5.0, contains 6683 protein families, divided into 31,705 subfamilies, covering approximately 90% of mammalian protein-coding genes. PANTHER 5.0 includes a number of significant improvements over previous versions, most notably (i) representation of pathways (primarily signaling pathways) and association with subfamilies and individual protein sequences; (ii) an improved methodology for defining the PANTHER families and subfamilies, and for building the HMMs; (iii) resources for scoring sequences against PANTHER HMMs both over the web and locally; and (iv) a number of new web resources to facilitate analysis of large gene lists, including data generated from high-throughput expression experiments. Efforts are underway to add PANTHER to the InterPro suite of databases, and to make PANTHER consistent with the PIRSF database. PANTHER is now publicly available without restriction at http://panther.appliedbiosystems.com. Citation for the above abstract: Mi, Huaiyu, Lazareva-Ulitsky, Betty, Loo, Rozina, Kejariwal, Anish, Vandergriff, Jody, Rabkin, Steven, Guo, Nan, Muruganujan, Anushya, Doremieux, Olivier, Campbell, Michael J., Kitano, Hiroaki, Thomas, Paul D. The PANTHER database of protein families, subfamilies, functions and pathways Nucl. Acids Res. 2005 33: D284-288 © 2005 Oxford University Press. 
The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D284 |
| 210. SOURCE |
URL: http://source.stanford.edu/ Categories: Genome Annotation Terms, Ontology, and Nomenclature Databases, Microarray Data and other Gene Expression Databases The explosion in the number of functional genomic datasets generated with tools such as DNA microarrays has created a critical need for resources that facilitate the interpretation of large-scale biological data. SOURCE is a web-based database that brings together information from a broad range of resources, and provides it in a manner particularly useful for genome-scale analyses. SOURCE's GeneReports include aliases, chromosomal location, functional descriptions, GeneOntology annotations, gene expression data, and links to external databases. We curate published microarray gene expression datasets and allow users to rapidly identify sets of co-regulated genes across a variety of tissues and a large number of conditions using a simple and intuitive interface. SOURCE provides content both in gene and cDNA clone-centric pages, and thus simplifies analysis of datasets generated using cDNA microarrays. SOURCE is continuously updated and contains the most recent and accurate information available for human, mouse, and rat genes. By allowing dynamic linking to individual gene or clone reports, SOURCE facilitates browsing of large genomic datasets. Finally, SOURCE's batch interface allows rapid extraction of data for thousands of genes or clones at once and thus facilitates statistical analyses such as assessing the enrichment of functional attributes within clusters of genes. SOURCE is available at http://source.stanford.edu. Citation for the above abstract: Diehn, Maximilian, Sherlock, Gavin, Binkley, Gail, Jin, Heng, Matese, John C., Hernandez-Boussard, Tina, Rees, Christian A., Cherry, J. Michael, Botstein, David, Brown, Patrick O., Alizadeh, Ash A. SOURCE: a unified genomic resource of functional annotations, ontologies, and gene expression data Nucl. Acids Res. 2003 31: 219-223 © 2003 Oxford University Press. 
The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/219 |
| 211. UMLS: Unified Medical Language System |
URL: http://umlsks.nlm.nih.gov/ Categories: Genome Annotation Terms, Ontology, and Nomenclature Databases The Unified Medical Language System (http://umlsks.nlm.nih.gov) is a repository of biomedical vocabularies developed by the US National Library of Medicine. The UMLS integrates over 2 million names for some 900,000 concepts from more than 60 families of biomedical vocabularies, as well as 12 million relations among these concepts. Vocabularies integrated in the UMLS Metathesaurus include the NCBI taxonomy, Gene Ontology, the Medical Subject Headings (MeSH), OMIM and the Digital Anatomist Symbolic Knowledge Base. UMLS concepts are not only inter-related, but may also be linked to external resources such as GenBank. In addition to data, the UMLS includes tools for customizing the Metathesaurus (MetamorphoSys), for generating lexical variants of concept names (lvg) and for extracting UMLS concepts from text (MetaMap). The UMLS knowledge sources are updated quarterly. All vocabularies are available at no fee for research purposes within an institution, but UMLS users are required to sign a license agreement. The UMLS knowledge sources are distributed on CD-ROM and by FTP. Citation for the above abstract: Bodenreider, Olivier The Unified Medical Language System (UMLS): integrating biomedical terminology Nucl. Acids Res. 2004 32: D267-270 © 2004 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D267 |
| 212. ICB: Identification and Classification of Bacteria Database |
URL: http://www.mbio.co.jp/icb Categories: Taxonomy and Identification Databases The Identification and Classification of Bacteria (ICB) database (http://www.mbio.co.jp/icb) contains currently available information about the DNA gyrase subunit B (gyrB) gene in bacteria. The database is designed to provide the scientific community with a reference point for using gyrB as an evolutionary and taxonomic marker. Nucleic and amino acid sequence data are currently available for over 850 strains, along with alignments at several different taxonomic levels and an exhaustive review of primer selection and background information. Citation for the above abstract: Watanabe, Kanako, Nelson, James, Harayama, Shigeaki, Kasai, Hiroaki ICB database: the gyrB database for identification and classification of bacteria Nucl. Acids Res. 2001 29: 344-345 © 2003 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/29/1/344 |
| 213. NCBI Taxonomy Browser |
URL: http://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html Categories: Taxonomy and Identification Databases The NCBI taxonomy database indexes over 165 000 named organisms that are represented in the databases with at least one nucleotide or protein sequence. The Taxonomy Browser can be used to view the taxonomic position or retrieve data from any of the principal Entrez databases for a particular organism or group. The Taxonomy Browser also displays links to the Map Viewer, Genomic BLAST services, the Trace Archive, and to model organism and taxonomic databases via LinkOut. Searches of the NCBI taxonomy may be made on the basis of whole, partial or phonetically spelled organism names, but links to organisms commonly used in biological research are provided. The Entrez Taxonomy system adds the ability to display custom taxonomic trees representing user-defined subsets of the full NCBI taxonomy. Citation for the above excerpt: Wheeler, David L., Barrett, Tanya, Benson, Dennis A., Bryant, Stephen H., Canese, Kathi, Church, Deanna M., DiCuccio, Michael, Edgar, Ron, Federhen, Scott, Helmberg, Wolfgang, Kenton, David L., Khovayko, Oleg, Lipman, David J., Madden, Thomas L., Maglott, Donna R., Ostell, James, Pontius, Joan U., Pruitt, Kim D., Schuler, Gregory D., Schriml, Lynn M., Sequeira, Edwin, Sherry, Steven T., Sirotkin, Karl, Starchenko, Grigory, Suzek, Tugba O., Tatusov, Roman, Tatusova, Tatiana A., Wagner, Lukas, Yaschenko, Eugene Database resources of the National Center for Biotechnology Information Nucl. Acids Res. 2005 33: D39-45 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D39 |
| 214. RIDOM: Ribosomal Differentiation of Medical Microorganisms |
URL: http://www.ridom.de/ Categories: Taxonomy and Identification Databases The ribosomal differentiation of medical micro-organisms (RIDOM) web server, first described by Harmsen et al. [Harmsden,D., Rothganger,J., Singer,C., Albert,J. and Frosch,M. (1999) Lancet, 353, 291], is an evolving electronic resource designed to provide micro-organism differentiation services for medical identification needs. The diagnostic procedure begins with a specimen partial small subunit ribosomal DNA (16S rDNA) sequence. Resulting from a similarity search, a species or genus name for the specimen in question will be returned. Where the first results are ambiguous or do not define to species level, hints for further molecular, i.e. internal transcribed spacer, and conventional phenotypic differentiation will be offered ('sequential and polyphasic approach'). Additionally, each entry in RIDOM contains detailed medical and taxonomic information linked, context-sensitive, to external World Wide Web services. Nearly all sequences are newly determined and the sequence chromatograms are available for intersubjective quality control. Similarity searches are now also possible by direct submission of trace files (ABI or SCF format). Based on the PHRED/PHRAP software, error probability measures are attached to each predicted nucleotide base and visualised with a new 'Trace Editor'. The RIDOM web site is directly accessible on the World Wide Web at http://www.ridom.de/. The email address for questions and comments is webmaster@ridom.de. Citation for the above abstract: Harmsen, Dag, Rothganger, Jorg, Frosch, Matthias, Albert, Jurgen RIDOM: Ribosomal Differentiation of Medical Micro-organisms Database Nucl. Acids Res. 2002 30: 416-417 © 2002 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/30/1/416 |
| 215. CroW 21: The Human Chromosome 21 Database at the Weizmann Institute |
URL: http://genecards.weizmann.ac.il/crow21/ Categories: Human Genome Databases, Maps, and Viewers Recent enhancements and current research in the GeneCards (GC) (http://bioinfo.weizmann.ac.il/cards/) project are described, including the addition of gene expression profiles and integrated gene locations. Also highlighted are the contributions of specialized associated human gene-centric databases developed at the Weizmann Institute. These include the Unified Database (UDB) (http://bioinfo.weizmann.ac.il/udb) for human genome mapping, the human Chromosome 21 database at the Weizmann Institute (CroW 21) (http://bioinfo.weizmann.ac.il/crow21), and the Human Olfactory Receptor Data Exploratorium (HORDE) (http://bioinfo.weizmann.ac.il/HORDE). The synergistic relationships amongst these efforts have positively impacted the quality, quantity and usefulness of the GeneCards gene compendium. Citation for the above abstract: Safran, Marilyn, Chalifa-Caspi, Vered, Shmueli, Orit, Olender, Tsviya, Lapidot, Michal, Rosen, Naomi, Shmoish, Michael, Peter, Yakov, Glusman, Gustavo, Feldmesser, Ester, Adato, Avital, Peter, Inga, Khen, Miriam, Atarot, Tal, Groner, Yoram, Lancet, Doron Human Gene-Centric Databases at the Weizmann Institute of Science: GeneCards, UDB, CroW 21 and HORDE Nucl. Acids Res. 2003 31: 142-146 © 2003 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/142 |
| 216. Tree of Life Web Project |
URL: http://tolweb.org/tree/phylogeny.html Categories: Taxonomy and Identification Databases "The Tree of Life Web Project (ToL) is a collaborative effort of biologists from around the world. On more than 3000 World Wide Web pages, the project provides information about the diversity of organisms on Earth, their evolutionary history (phylogeny), and characteristics. Each page contains information about a particular group of organisms (e.g., echinoderms, tyrannosaurs, phlox flowers, cephalopods, club fungi, or the salamanderfish of Western Australia). ToL pages are linked one to another hierarchically, in the form of the evolutionary tree of life. Starting with the root of all Life on Earth and moving out along diverging branches to individual species, the structure of the ToL project thus illustrates the genetic connections between all living things." |
| 217. BAMS: The Brain Architecture Management System |
URL: http://brancusi.usc.edu/bkms/ Categories: Neuroscience Databases The brain's structural organization is so complex that 2,500 years of analysis leaves pervasive uncertainty about (i) the identity of its basic parts (regions with their neuronal cell types and pathways interconnecting them), (ii) nomenclature, (iii) systematic classification of the parts with respect to topographic relationships and functional systems and (iv) the reliability of the connectional data itself. Here we present a prototype knowledge management system (http://brancusi.usc.edu/bkms/) for analyzing the architecture of brain networks in a systematic, interactive and extendable way. It supports alternative interpretations and models, is based on fully referenced and annotated data and can interact with genomic and functional knowledge management systems through web services protocols. Citation for the above abstract: Mihail Bota, Hong-Wei Dong & Larry W Swanson From gene networks to brain networks Nature Neuroscience 6, 795 - 799 (2003) © 2003 Nature Publishing Group. The full abstract can be found at: http://www.nature.com/cgi-taf/DynaPage.taf?file=/neuro/journal/v6/n8/abs/nn1096.html&dynoptions=doi1105518408 |
| 218. Atlas of Genetics and Cytogenetics in Oncology and Haematology |
URL: http://www.infobiogen.fr/services/chromcancer/ Categories: Cancer Databases, Gene-, System-, or Disease- Specific Databases, Metadatabases and Directories The 'Atlas of Genetics and Cytogenetics in Oncology and Haematology' (http://www.infobiogen.fr/services/chromcancer) contains concise and updated cards on genes involved in cancer, cytogenetics and clinical entities in oncology, and cancer-prone diseases, a portal towards genetics/cancer, and teaching materials in genetics. This database is made for and by researchers and clinicians, who are encouraged to contribute. The Atlas is part of the genome project and it participates in research on cancer epidemiology. Citation for the above abstract: Huret, Jean-Loup, Dessen, Philippe, Bernheim, Alain Atlas of Genetics and Cytogenetics in Oncology and Haematology, year 2003 Nucl. Acids Res. 2003 31: 272-274 © 2003 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/272 |
| 219. CGED: Cancer Gene Expression Database |
URL: http://cged.hgc.jp Categories: Cancer Databases, Microarray Data and other Gene Expression Databases Gene expression profiling of cancer tissues is expected to contribute to our understanding of cancer biology as well as developments of new methods of diagnosis and therapy. Our collaborative efforts in Japan have been mainly focused on solid tumors such as breast, colorectal and hepatocellular cancers. The expression data are obtained by a high-throughput RT-PCR technique, and patients are recruited mainly from a single hospital. In the cancer gene expression database (CGED), the expression and clinical data are presented in a way useful for scientists interested in specific genes or biological functions. The data can be retrieved either by gene identifiers or by functional categories defined by Gene Ontology terms or the Swiss-Prot annotation. Expression patterns of multiple genes, selected by names or similarity search of the patterns, can be compared. Visual presentation of the data with sorting function enables users to easily recognize relationships between gene expression and clinical parameters. Data for other cancers such as lung and thyroid cancers will be added in the near future. The URL of CGED is http://cged.hgc.jp. Citation for the above abstract: Kato, Kikuya, Yamashita, Riu, Matoba, Ryo, Monden, Morito, Noguchi, Shinzaburo, Takagi, Toshihisa, Nakai, Kenta Cancer gene expression database (CGED): a database for gene expression profiling with accompanying clinical information of human cancer tissues Nucl. Acids Res. 2005 33: D533-536 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D533 |
| 220. COSMIC: Catalogue Of Somatic Mutations In Cancer |
URL: http://www.sanger.ac.uk/genetics/CGP/cosmic/ Categories: Cancer Databases The discovery of mutations in cancer genes has advanced our understanding of cancer. These results are dispersed across the scientific literature and with the availability of the human genome sequence will continue to accrue. The COSMIC (Catalogue of Somatic Mutations in Cancer) database and website have been developed to store somatic mutation data in a single location and display the data and other information related to human cancer. To populate this resource, data has currently been extracted from reports in the scientific literature for somatic mutations in four genes, BRAF, HRAS, KRAS2 and NRAS. At present, the database holds information on 66 634 samples and reports a total of 10 647 mutations. Through the web pages, these data can be queried, displayed as figures or tables and exported in a number of formats. COSMIC is an ongoing project that will continue to curate somatic mutation data and release it through the website. Citation for the above abstract: Bamford S, Dawson E, Forbes S, Clements J, Pettett R, Dogan A, Flanagan A, Teague J, Futreal PA, Stratton MR, Wooster R. The COSMIC (Catalogue of Somatic Mutations in Cancer) database and website. Br J Cancer. 2004 Jul 19;91(2):355-8. © 2004 Nature Publishing Group. The full abstract can be found at: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=15188009 |
| 221. Database of Germline p53 Mutations |
URL: http://www.lf2.cuni.cz/win/projects/germline_mut_p53.htm Categories: Cancer Databases We created a comprehensive database covering all published cases of germline p53 mutations. The current version lists 580 tumours in 448 individuals belonging to 122 independent pedigrees. The database describes each p53 mutation (type of the mutation, exon and codon affected by the mutation, nucleotide and amino acid change), each family (family history of cancer, diagnosis of Li-Fraumeni syndrome), each affected individual (sex, generation, p53 status, from which parent the mutation was inherited) and each tumour (type, age of onset, p53 status-loss of heterozygosity, immunostaining). Each entry contains the original reference(s). The database is freely available and can be obtained from http://www.lf2.cuni.cz Citation for the above abstract: Sedlacek, Z, Kodet, R, Poustka, A, Goetz, P A database of germline p53 mutations in cancer-prone families Nucl. Acids Res. 1998 26: 214-215 © 1998 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/26/1/214 |
| 222. Human p53, Human hprt, Rodent lacI and Rodent lacZ Databases |
URL: http://www.ibiblio.org/dnam/mainpage.html Categories: Cancer Databases We have created databases and software applications for the analysis of DNA mutations at the human p53 gene, the human hprt gene and both the rodent transgenic lacI and lacZ loci. The databases themselves are stand-alone dBASE files and the software for analysis of the databases runs on IBM-compatible computers with Microsoft Windows. Each database has a separate software analysis program. The software created for these databases permit the filtering, ordering, report generation and display of information in the database. In addition, a significant number of routines have been developed for the analysis of single base substitutions. One method of obtaining the databases and software is via the World Wide Web. Open the following home page with a Web Browser: http://sunsite.unc.edu/dnam/mainpage.html . Alternatively, the databases and programs are available via public FTP from: ftp://anonymous@sunsite.unc.edu. There is no password required to enter the system. The databases and software are found beneath the subdirectory: pub/academic/biology/dna-mutations. Two other programs are available at the site, a program for comparison of mutational spectra and a program for entry of mutational data into a relational database. Citation for the above abstract: Cariello, NF, Douglas, GR, Gorelick, NJ, Hart, DW, Wilson, JD, Soussi, T Databases and software for the analysis of mutations in the human p53 gene, human hprt gene and both the lacI and lacZ gene in transgenic rodents Nucl. Acids Res. 1998 26: 198-199 © 1998 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/26/1/198 |
| 223. IARC TP53 Database |
URL: http://www-p53.iarc.fr/index.html Categories: Cancer Databases Since 1989, about 570 different p53 mutations have been identified in more than 8000 human cancers. A database of these mutations was initiated by M. Hollstein and C. C. Harris in 1990. This database originally consisted of a list of somatic point mutations in the p53 gene of human tumors and cell lines, compiled from the published literature and made available in a standard electronic form. The database is maintained at the International Agency for Research on Cancer (IARC) and updated versions are released twice a year (January and July). The current version (July 1997) contains records on 6800 published mutations and will surpass the 8000 mark in the January 1998 release. The database now contains information on somatic and germline mutations in a new format to facilitate data retrieval. In addition, new tools are constructed to improve data analysis, such as a Mutation Viewer Java applet developed at the European Bioinformatics Institute (EBI) to visualise the location and impact of mutations on p53 protein structure. The database is available in different electronic formats at IARC (http://www.iarc.fr/p53/homepage.htm) or from the EBI server (http://www.ebi.ac.uk). The IARC p53 website also provides reports on database analysis and links with other p53 sites as well as with related databases. In this report, we describe the criteria for inclusion of data, the revised format and the new visualisation tools. We also briefly discuss the relevance of p53 mutations to clinical and biological questions. Citation for the above abstract: Hainaut, P, Hernandez, T, Robinson, A, Rodriguez-Tome, P, Flores, T, Hollstein, M, Harris, CC, Montesano, R IARC Database of p53 gene mutations in human tumors and cell lines: updated compilation, revised formats and new visualisation tools Nucl. Acids Res. 1998 26: 205-213 © 1998 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/26/1/205 |
| 224. MTB: Mouse Tumor Biology Database |
URL: http://tumor.informatics.jax.org/ Categories: Cancer Databases, Model Organisms and Comparative Genomics Databases The Mouse Tumor Biology (MTB) Database serves as a curated, integrated resource for information about tumor genetics and pathology in genetically defined strains of mice (i.e., inbred, transgenic and targeted mutation strains). Sources of information for the database include the published scientific literature and direct data submissions by the scientific community. Researchers access MTB using Web-based query forms and can use the database to answer such questions as 'What tumors have been reported in transgenic mice created on a C57BL/6J background?', 'What tumors in mice are associated with mutations in the Trp53 gene?' and 'What pathology images are available for tumors of the mammary gland regardless of genetic background?'. MTB has been available on the Web since 1998 from the Mouse Genome Informatics web site (http://www.informatics.jax.org). We have recently implemented a number of enhancements to MTB including new query options, redesigned query forms and results pages for pathology and genetic data, and the addition of an electronic data submission and annotation tool for pathology data. Citation for the above abstract: Bult, Carol J., Krupke, Debra M., Naf, Dieter, Sundberg, John P., Eppig, Janan T. Web-based access to mouse models of human cancers: the Mouse Tumor Biology (MTB) Database Nucl. Acids Res. 2001 29: 95-97 © 2001 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/29/1/95 |
| 225. OrCGDB: Oral Cancer Gene Database |
URL: http://www.tumor-gene.org/Oral/oral.html Categories: Cancer Databases The Oral Cancer Gene Database (OrCGDB; http://www.tumor-gene.org/Oral/oral.html) was developed to provide the biomedical community with easy access to the latest information on the genes involved in oral cancer. The information is stored in a relational database and accessed through a WWW interface. The OrCGDB is organized by gene name, which is linked to information describing properties of the gene. This information is stored as a collection of findings ('facts') that are entered by the database curator in a semi-structured format from information in primary publications using a WWW interface. These facts include causes of oncogenic activation, chromosomal localization of the gene, mutations associated with the gene, the biochemical identity and activity of the gene product, synonyms for the gene name and a variety of clinical information. Each fact is associated with a MEDLINE citation. The user can search the OrCGDB by gene name or by entering a textword. The OrCGDB is part of a larger WWW-based tumor gene database and represents a new approach to catalog and display the research literature. Citation for the above abstract: Levine, Alan E., Steffen, David L. OrCGDB: a database of genes involved in oral cancer Nucl. Acids Res. 2001 29: 300-302 © 2001 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/29/1/300 |
| 226. RTCGD: Mouse Retroviral Tagged Cancer Gene Database |
URL: http://rtcgd.ncifcrf.gov/ Categories: Cancer Databases, Model Organisms and Comparative Genomics Databases Retroviral insertional mutagenesis in mouse hematopoietic tumors provides a potent cancer gene discovery tool in the post-genome-sequence era. To manage multiple high-throughput insertional mutagenesis screening projects, we developed the Retroviral Tagged Cancer Gene Database (RTCGD; http://RTCGD.ncifcrf.gov). A sequence analysis pipeline determines the genomic position of each retroviral integration site cloned from a mouse tumor, the distance between it and the nearest candidate disease gene(s) and its orientation with respect to the candidate gene(s). The pipeline also identifies genomic regions that are targets of retroviral integration in more than one tumor (common integration sites, CISs) and are thus likely to encode a disease gene. Users can search the database using a specified gene symbol, chromosome number or tumor model to identify both CIS genes and unique viral integration sites or compare the integration sites cloned by different laboratories using different models. As a default setting, users first review the CIS Lists and then Clone Lists. CIS Lists describe CISs and their candidate disease genes along with links to other public databases and clone lists. Clone Lists describe the viral integration site clones along with the tumor model and tumor type from which they were cloned, candidate disease gene(s), genomic position and orientation of the integrated provirus with respect to the candidate gene(s). It also provides a pictorial view of the genomic location of each integration site relative to neighboring genes and markers. Researchers can identify integrations of interest and compare their results with those for multiple tumor models and tumor types using RTCGD. Citation for the above abstract: Akagi, Keiko, Suzuki, Takeshi, Stephens, Robert M., Jenkins, Nancy A., Copeland, Neal G. RTCGD: retroviral tagged cancer gene database Nucl. Acids Res. 2004 32: D523-527 © 2004 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D523 |
| 227. SNP500Cancer |
URL: http://snp500cancer.nci.nih.gov/ Categories: Cancer Databases The SNP500Cancer database provides sequence and genotype assay information for candidate SNPs useful in mapping complex diseases, such as cancer. The database is an integral component of the NCI Cancer Genome Anatomy Project (http://cgap.nci.nih.gov). SNP500Cancer reports sequence analysis of anonymized control DNA samples (n = 102 Coriell samples representing four self-described ethnic groups: African/African-American, Caucasian, Hispanic and Pacific Rim). The website is searchable by gene, chromosome, gene ontology pathway, dbSNP ID and SNP500Cancer SNP ID. As of October 2005, the database contains >13 400 SNPs, 9124 of which have been sequenced in the SNP500Cancer population. For each analysed SNP, gene location and >200 bp of surrounding annotated sequence (including nearby SNPs) are provided, with frequency information in total and per subpopulation as well as calculation of Hardy–Weinberg equilibrium for each subpopulation. The website provides the conditions for validated sequencing and genotyping assays, as well as genotype results for the 102 samples, in both viewable and downloadable formats. A subset of sequence validated SNPs with minor allele frequency >5% are entered into a high-throughput pipeline for genotyping analysis to determine concordance for the same 102 samples. In addition, the results of genotype analysis for select validated SNP assays (defined as 100% concordance between sequence analysis and genotype results) are posted for an additional 280 samples drawn from the Human Diversity Panel (HDP). SNP500Cancer provides an invaluable resource for investigators to select SNPs for analysis, design genotyping assays using validated sequence data, choose selected assays already validated on one or more genotyping platforms, and select reference standards for genotyping assays. The SNP500Cancer database is freely accessible via the web page at http://snp500cancer.nci.nih.gov. Citation for the above abstract: Packer, Bernice R., Yeager, Meredith, Burdett, Laura, Welch, Robert, Beerman, Michael, Qi, Liqun, Sicotte, Hugues, Staats, Brian, Acharya, Mekhala, Crenshaw, Andrew, Eckert, Andrew, Puri, Vinita, Gerhard, Daniela S., Chanock, Stephen J. SNP500Cancer: a public resource for sequence validation, assay development, and frequency analysis for genetic variation in candidate genes Nucl. Acids Res. 2006 34: D617-621 © 2006 Oxford University Press. The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D617 |
| 228. SV40 Large T-Antigen Mutant Database |
URL: http://supernova.bio.pitt.edu/pipaslab/ Categories: Cancer Databases The SV40 T antigen database (http://www.pitt.edu/~pipslab/ ) lists viruses and plasmids expressing mutant forms of large T antigen. Each entry contains information regarding the mutant designation, mutant type, virus strain, nucleotide change, amino acid change and pertinent references. The database is now available as an internet searchable index. Citation for the above abstract: Robinson, CG, Pipas, JM SV40 large tumor antigen (T antigen): database of mutants Nucl. Acids Res. 1998 26: 295-296 © 1998 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/26/1/295 |
| 229. The Tumor Gene Family of Databases |
URL: http://www.tumor-gene.org/tgdf.html Categories: Cancer Databases "The Tumor Gene Family of Databases contains information about genes which are targets for cancer-causing mutations: proto-oncogenes and tumor suppressor genes. Its goal is to provide a standard set of facts (e.g. protein size, biochemical activity, chromosomal location, ...) about all known tumor genes. At present, the database contains over 2600 facts on over 300 genes. These databases are designed for biomedical researchers who work with tumor genes. Anyone is free to search it, but if you are not in this group, it may not be very useful to you. The Tumor Gene Database Family is a consortium of more specialized databases." |
| 230. UMD-p53 Database |
URL: http://p53.free.fr/ Categories: Cancer Databases The tumor suppressor gene TP53 (p53) is the most extensively studied gene involved in human cancers. More than 1,400 publications have reported mutations of this gene in 150 cancer types for a total of 14,971 mutations. To exploit this huge bulk of data, specific analytic tools were highly warranted. We therefore developed a locus-specific database software called UMD-p53. This database compiles all somatic and germline mutations as well as polymorphisms of the TP53 gene which have been reported in the published literature since 1989, or unpublished data submitted to the database curators. The database is available at www.umd.necker.fr or at http://p53.curie.fr/. In this paper, we describe recent developments of the UMD-p53 database. These developments include new fields and routines. For example, the analysis of putative acceptor or donor splice sites is now automated and gives new insight for the causal role of "silent mutations." Other routines have also been created such as the prescreening module, the UV module, and the cancer distribution module. These new improvements will help users not only for molecular epidemiology and pharmacogenetic studies but also for patient-based studies. To achieve these purposes we have designed a procedure to check and validate data in order to reach the highest quality data. Citation for the above abstract: Beroud C, Soussi T. The UMD-p53 database: new mutations and analysis tools. Hum Mutat. 2003 Mar;21(3):176-81. © 2003 Wiley-Liss, Inc. The full abstract can be found at: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=12619103 |
| 231. ALPSbase |
URL: http://research.nhgri.nih.gov/alps/ Categories: Gene-, System-, or Disease- Specific Databases, Immunological Databases "Autoimmune Lymphoproliferative Syndrome (ALPS) is a recently recognized disease in which a genetic defect in programmed cell death, or apoptosis, leads to breakdown of lymphocyte regulation. Patients with ALPS have chronic enlargement of the spleen and lymph nodes, various manifestations of autoimmunity, and elevation of a normally rare population of "double negative T cells" (DNTs), T lymphocytes expressing neither cluster differentiation CD4 nor CD8 surface antigens. When lymphocytes from patients with ALPS are cultured in vitro, they are resistant to apoptosis as compared to cells from healthy controls. Most patients with ALPS have mutations in a gene now named TNFRSF6 (tumor necrosis factor receptor gene superfamily member 6). This gene encodes the cell surface receptor for the major apoptosis pathway in mature lymphocytes. The gene and protein have had several names including Fas (used here), APO-1 and APT1. ALPS is subdivided into: 1) Type Ia, ALPS with mutant Fas; 2) Type Ib, lymphadenopathy and systemic lupus erythematosus with mutation in the ligand for Fas; 3) Type II, ALPS with mutant caspase-10 or caspase-8; and 4) Type III, ALPS as yet without a defined genetic cause." |
| 232. Androgen Receptor Gene Mutations Database |
URL: http://www.androgendb.mcgill.ca/ Categories: Gene-, System-, or Disease- Specific Databases The current version of the androgen receptor (AR) gene mutations database is described. The total number of reported mutations has risen from 374 to 605, and the number of AR-interacting proteins described has increased from 23 to 70, both over the past 3 years. A 3D model of the AR ligand-binding domain (AR LBD) has been added to give a better understanding of gene structure-function relationships. In addition, silent mutations have now been reported in both androgen insensitivity syndrome (AIS) and prostate cancer (CaP) cases. The database also now incorporates information on the exon 1 CAG repeat expansion disease, spinobulbar muscular atrophy (SBMA), as well as CAG repeat length variations associated with risk for female breast, uterine endometrial, colorectal, and prostate cancer, as well as for male infertility. The possible implications of somatic mutations, as opposed to germline mutations, in the development of future locus-specific mutation databases (LSDBs) is discussed. The database is available on the Internet (http://www.mcgill.ca/androgendb/). Citation for the above abstract: Gottlieb B, Beitel LK, Wu JH, Trifiro M. The androgen receptor gene mutations database (ARDB): 2004 update. Hum Mutat. 2004 Jun;23(6):527-33. © 2004 Wiley-Liss, Inc. The full abstract can be found at: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=15146455 |
| 233. AngioDB: Database of Angiogenesis and Angiogenesis-related Molecules |
URL: http://angiodb.snu.ac.kr/ Categories: Gene-, System-, or Disease- Specific Databases Angiogenesis is the formation of new capillaries sprouting from pre-existing vessels. Angiogenesis occurs in a variety of normal physiological and pathological conditions and is regulated by a balance of stimulatory and inhibitory angiogenic factors. The control of this balance may fail and result in the formation of a pathologic capillary network during the development of many diseases. Therefore, we developed the angiogenesis database (AngioDB), which can provide a signaling network of angiogenesis-related biomolecules in human. Each record of AngioDB consisted of 12 fields and was developed by using a relational database management system. For the retrieval of data, Active Server Page (ASP) technology was integrated in this system. Users can access the database by a query or imagemap browsing program. The retrieving system also provides a list of angiogenesis-related molecules classified by three categories, and the database has an external link to NCBI databases. AngioDB is available via the Internet at http://angiodb.snu.ac.kr/. Citation for the above abstract: Sohn, Tae-Kwon, Moon, Eun-Joung, Lee, Seok-Ki, Cho, Hwan-Gue, Kim, Kyu-Won AngioDB: database of angiogenesis and angiogenesis-related molecules Nucl. Acids Res. 2002 30: 369-371 © 2002 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/30/1/369 |
| 234. BayGenomics |
URL: http://baygenomics.ucsf.edu/ Categories: Gene-, System-, or Disease- Specific Databases The BayGenomics gene-trap resource (http://baygenomics.ucsf.edu) provides researchers with access to thousands of mouse embryonic stem (ES) cell lines harboring characterized insertional mutations in both known and novel genes. Each cell line contains an insertional mutation in a specific gene. The identity of the gene that has been interrupted can be determined from a DNA sequence tag. Approximately 75% of our cell lines contain insertional mutations in known mouse genes or genes that share strong sequence similarities with genes that have been identified in other organisms. These cell lines readily transmit the mutation to the germline of mice and many mutant lines of mice have already been generated from this resource. BayGenomics provides facile access to our entire database, including sequence tags for each mutant ES cell line, through the World Wide Web. Investigators can browse our resource, search for specific entries, download any portion of our database and BLAST sequences of interest against our entire set of cell line sequence tags. They can then obtain the mutant ES cell line for the purpose of generating knockout mice. Citation for the above abstract: Stryke, Doug, Kawamoto, Michiko, Huang, Conrad C., Johns, Susan J., King, Leslie A., Harper, Courtney A., Meng, Elaine C., Lee, Roy E., Yee, Alice, L'Italien, Larry, Chuang, Pao-Tien, Young, Stephen G., Skarnes, William C., Babbitt, Patricia C., Ferrin, Thomas E. BayGenomics: a resource of insertional mutations in mouse embryonic stem cells Nucl. Acids Res. 2003 31: 278-281 © 2003 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/278 |
| 235. BTKbase: Mutation Registry for X-linked Agammaglobulinemia |
URL: http://bioinf.uta.fi/BTKbase/ Categories: Gene-, System-, or Disease- Specific Databases X-linked agammaglobulinemia (XLA) is an immunodeficiency caused by mutations in the gene coding for Bruton's agammaglobulinemia tyrosine kinase (BTK). A database (BTKbase) of BTK mutations has been compiled and the recent update lists 463 mutation entries from 406 unrelated families showing 303 unique molecular events. In addition to mutations, the database also lists variants or polymorphisms. Each patient is given a unique patient identity number (PIN). Information is included regarding the phenotype including symptoms. Mutations in all the five domains of BTK have been noticed to cause the disease, the most common event being missense mutations. The mutations appear almost uniformly throughout the molecule and frequently affect CpG sites that code for arginine residues. The putative structural implications of all the missense mutations are given in the database. The improved version of the registry having a number of new features is available at http://www.helsinki.fi/science/signal/btkbase.html Citation for the above abstract: Vihinen, M, Brandau, O, Branden, LJ, Kwan, SP, Lappalainen, I, Lester, T, Noordzij, JG, Ochs, HD, Ollila, J, Pienaar, SM, Riikonen, P, Saha, BK, Smith, CIE BTKbase, mutation database for X-linked agammaglobulinemia (XLA) Nucl. Acids Res. 1998 26: 242-247 © 1998 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/26/1/242 |
| 236. CarpeDB: A Comprehensive Database on the Genetics of Epilepsy |
URL: http://www.carpedb.ua.edu/ Categories: Gene-, System-, or Disease- Specific Databases "CarpeDB, a dynamic epilepsy genetics database sponsored by a National Science Foundation CAREER Award and the Department of Biological Sciences at The University of Alabama, is now available to the public! Although information pertinent to the study of epilepsy genetics has been widely available online, researchers interested in the genetics of epilepsy were required to utilize various sources for data collection. CarpeDB serves as a novel source for epilepsy researchers by featuring scores of "epilepsy genes" and associated publications in one locus. Furthermore, multiple genes implicated in epilepsy are also implicated in other human disorders. Consequently, the use of CarpeDB need not be limited to epilepsy researchers." |
| 237. CASRdb: Calcium Sensing Receptor Databases |
URL: http://www.casrdb.mcgill.ca/ Categories: Gene-, System-, or Disease- Specific Databases Familial hypocalciuric hypercalcemia (FHH) is caused by heterozygous loss-of-function mutations in the calcium-sensing receptor (CASR), in which the lifelong hypercalcemia is generally asymptomatic. Homozygous loss-of-function CASR mutations manifest as neonatal severe hyperparathyroidism (NSHPT), a rare disorder characterized by extreme hypercalcemia and the bony changes of hyperparathyroidism, which occur in infancy. Activating mutations in the CASR gene have been identified in several families with autosomal dominant hypocalcemia (ADH), autosomal dominant hypoparathyroidism, or hypocalcemic hypercalciuria. Individuals with ADH may have mild hypocalcemia and relatively few symptoms. However, in some cases seizures can occur, especially in younger patients, and these often happen during febrile episodes due to intercurrent infection. Thus far, 112 naturally-occurring mutations in the human CASR gene have been reported, of which 80 are unique and 32 are recurrent. To better understand the mutations causing defects in the CASR gene and to define specific regions relevant for ligand-receptor interaction and other receptor functions, the data on mutations were collected and the information was centralized in the CASRdb (www.casrdb.mcgill.ca), which is easily and quickly accessible by search engines for retrieval of specific information. The information can be searched by mutation, genotype-phenotype, clinical data, in vitro analyses, and authors of publications describing the mutations. CASRdb is regularly updated for new mutations and it also provides a mutation submission form to ensure up-to-date information. The home page of this database provides links to different web pages that are relevant to the CASR, as well as disease clinical pages, sequence of the CASR gene exons, and position of mutations in the CASR. The CASRdb will help researchers to better understand and analyze the mutations, and aid in structure-function analyses. Citation for the above abstract: Pidasheva S, D'Souza-Li L, Canaff L, Cole DE, Hendy GN. CASRdb: calcium-sensing receptor locus-specific database for mutations causing familial (benign) hypocalciuric hypercalcemia, neonatal severe hyperparathyroidism, and autosomal dominant hypocalcemia. Hum Mutat. 2004 Aug;24(2):107-11. © 2004 Wiley-Liss, Inc. The full abstract can be found at: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=15241791 |
| 238. Database of Human Type I and Type III Collagen Mutations |
URL: http://www.le.ac.uk/genetics/collagen/ Categories: Gene-, System-, or Disease- Specific Databases The collagens are a large and diverse family of proteins which are found in the extracellular matrix. In common with one another, the 19 known collagen types have triple-helical domains of variable length but they differ with respect to their overall size and the nature and location of their globular domains. Collagen mutations lead to heritable defects of connective tissues and mutation data for collagen types I and III are presented here. The mutation data are accessible on the world wide web at http://www.le.ac.uk/genetics/collagen/ Citation for the above abstract: Dalgleish, R The Human Collagen Mutation Database 1998 Nucl. Acids Res. 1998 26: 253-255 © 1998 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/26/1/253 |
| 239. Cytokine Gene Polymorphism in Human Disease: On-line Databases |
URL: http://www.bris.ac.uk/pathandmicro/services/GAI/cytokine4.htm Categories: Gene-, System-, or Disease- Specific Databases, General Polymorphism Databases The pathologies of many infectious, autoimmune and malignant diseases are influenced by the profiles of cytokine production in pro-inflammatory (TH1) and anti-inflammatory (TH2) T cells. Interindividual differences in cytokine profiles appear to be due, at least in part, to allelic polymorphism within regulatory regions of cytokine genes. Many studies have examined the relationship between cytokine gene polymorphism, cytokine gene expression in vitro, and the susceptibility to and clinical severity of diseases. A review of the findings of these studies is presented. An on-line version featuring appropriate updates is accessible from the World Wide Web site, http://www.pam.bris.ac.uk/services/GAI/cytokine4.htm. Citation for the above abstract: Bidwell J, Keen L, Gallagher G, Kimberly R, Huizinga T, McDermott MF, Oksenberg J, McNicholl J, Pociot F, Hardt C, D'Alfonso S. Cytokine gene polymorphism in human disease: on-line databases. Genes Immun. 1999 Sep;1(1):3-19. © 1999 Nature Publishing Group. The full abstract can be found at: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=11197303 |
| 240. EICO DB: Expression-based Imprint Candidate Organiser |
URL: http://fantom2.gsc.riken.jp/EICODB/ Categories: Gene-, System-, or Disease- Specific Databases We have developed an integrated database that is specialized for the study of imprinted disease genes. The database contains novel candidate imprinted genes identified by the RIKEN full-length mouse cDNA microarray study, information on validated single nucleotide polymorphisms (SNPs) to confirm imprinting using reciprocal mouse crosses and the predicted physical position of imprinting-related disease loci in the mouse and human genomes. It has two user-friendly search interfaces: the SNP-central view (MuSCAT: MoUse SNP CATalog) and the candidate gene-central view (CITE: Candidate Imprinted Transcripts by Expression). The database, EICO (Expression-based Imprint Candidate Organizer), can be accessed via the World Wide Web (http://fantom2.gsc.riken.jp/EICODB/) and the DAS client software. These data and interfaces facilitate understanding of the mechanism of imprinting in mammalian inherited traits. Citation for the above abstract: Nikaido, Itoshi, Saito, Chika, Wakamoto, Akiko, Tomaru, Yasuhiro, Arakawa, Takahiro, Hayashizaki, Yoshihide, Okazaki, Yasushi EICO (Expression-based Imprint Candidate Organizer): finding disease-related imprinted genes Nucl. Acids Res. 2004 32: D548-551 © 2004 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D548 |
| 241. EpoDB: Erythropoiesis Database |
URL: http://www.cbil.upenn.edu/EpoDB/ Categories: Gene-, System-, or Disease- Specific Databases EpoDB is a database of genes expressed in vertebrate red blood cells. It is also a prototype for the creation of cell and tissue-specific databases from multiple external sources. The information in EpoDB obtained from GenBank, SWISS-PROT, Transfac, TRRD and GERD is curated to provide high quality data for sequence analysis aimed at understanding gene regulation during erythropoiesis. New protocols have been developed for data integration and updating entries. Using a BLAST-based algorithm, we have grouped GenBank entries representing the same gene together. This sequence similarity protocol was also used to identify new entries to be included in EpoDB. We have recently implemented our database in Sybase (relational tables) in addition to SICStus Prolog to provide us with greater flexibility in asking complex queries that utilize information from multiple sources. New additions to the public web site (http://www.cbil.upenn.edu/epodb) for accessing EpoDB are the ability to retrieve groups of entries representing different variants of the same gene and to retrieve gene expression data. The BLAST query has been enhanced by incorporating BLASTView, an interactive and graphical display of BLAST results. We have also enhanced the queries for retrieving sequence from specified genes by the addition of MEME, a motif discovery tool, to the integrated analysis tools which include CLUSTALW and TESS. Citation for the above abstract: Stoeckert, CJ, Jr, Salas, F, Brunk, B, Overton, GC EpoDB: a prototype database for the analysis of genes expressed during vertebrate erythropoiesis Nucl. Acids Res. 1999 27: 200-203 © 1999 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/27/1/200 |
| 242. ERGDB: Estrogen Responsive Genes Database |
URL: http://research.i2r.a-star.edu.sg/promoter/Ergdb-v11/ Categories: Gene-, System-, or Disease- Specific Databases ERGDB is an integrated knowledge database dedicated to genes responsive to estrogen. Genes included in ERGDB are those whose expression levels are experimentally proven to be either up-regulated or down-regulated by estrogen. Genes included are identified based on publications from the PubMed database and each record has been manually examined, evaluated and selected for inclusion by biologists. ERGDB aims to be a unified gateway to store, search, retrieve and update information about estrogen responsive genes. Each record contains links to relevant databases, such as GenBank, LocusLink, Refseq, PubMed and ATCC. The unique feature of ERGDB is that it contains information on the dependence of gene reactions on experimental conditions. In addition to basic information about the genes, information for each record includes gene functional description, experimental methods used, tissue or cell type, gene reaction, estrogen exposure time and the summary of putative estrogen response elements if the gene’s promoter sequence was available. Through a web interface at http://sdmc.i2r.a-star.edu.sg/ergdb/cgi-bin/explore.pl users can either browse or query ERGDB. Access is free for academic and non-profit users. Citation for the above abstract: Tang, Suisheng, Han, Hao, Bajic, Vladimir B. ERGDB: Estrogen Responsive Genes Database Nucl. Acids Res. 2004 32: D533-536 © 2004 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D533 |
| 243. EyeSite |
URL: http://eyesite.cryst.bbk.ac.uk/ Categories: Gene-, System-, or Disease- Specific Databases The EyeSite is a web-based database of protein families for proteins that function in the eye and their homologous sequences. The resource clusters proteins at different levels of homology in order to facilitate functional annotation of sequences and modelling of proteins from structural homologues. Eye proteins are organized into the tissue types in which they function and are clustered into homologous families using a novel protocol employing the TribeMCL algorithm. Homologous families are further subdivided into sequence clusters for which multiple sequence alignments are generated. Structural annotations from the CATH domain database are provided for nearly 90% of the sequences, and protein family annotations from the Pfam database for 86%. Homology models have also been generated where appropriate. The EyeSite is stored in a relational database and is extensively linked to other online bioinformatics resources to help relate allelic variants, annotations and clinical details to the derived data in the database. The EyeSite is available for online search, sequence information and model retrieval at http://eyesite.cryst.bbk.ac.uk/. Citation for the above abstract: Lee, David A., Fefeu, Sandrine, Edo-Ukeh, Adrian A., Orengo, Christine A., Slingsby, Christine EyeSite: a semi-automated database of protein families in the eye Nucl. Acids Res. 2004 32: D148-152 © 2004 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D148 |
| 244. FUNPEP: Information System for "Low Complexity Sequence Regions" |
URL: http://swift.cmbi.kun.nl/swift/FUNPEP/gergo/ Categories: Gene-, System-, or Disease- Specific Databases, Individual Protein Family Databases "This part of the FUNPEP project is a bit different from all the others. The peptides on these pages were not chosen because of some kind of sequence similarity, what is more, they hardly have any. Their common, and very strange property is the ability to form amyloid plaques (or fibrils). The exact structure and the formation of these supermolecular structures are still subject of research, but there are lots of promising results. As a part of the FUNPEP project, we made a small collection of peptides, which are known to form these amyloid plaques. Sequences, including respective animal analogues, were extracted from SWISSPROT, and aligned. These sequences and some words about the peptides can be found under the links in the table below. Some molecular modelling was also performed, to show some possible structures of amyloids." |
| 245. GOLD.db: Genomics Of Lipid-associated Disorders |
URL: http://gold.tugraz.at/ Categories: Gene-, System-, or Disease- Specific Databases BACKGROUND: The GOLD.db (Genomics of Lipid-Associated Disorders Database) was developed to address the need for integrating disparate information on the function and properties of genes and their products that are particularly relevant to the biology, diagnosis, management, treatment, and prevention of lipid-associated disorders. DESCRIPTION: The GOLD.db (http://gold.tugraz.at) provides a reference for pathways and information about the relevant genes and proteins in an efficiently organized way. The main focus was to provide biological pathways with image maps and visual pathway information for lipid metabolism and obesity-related research. This database provides also the possibility to map gene expression data individually to each pathway. Gene expression at different experimental conditions can be viewed sequentially in context of the pathway. Related large scale gene expression data sets were provided and can be searched for specific genes to integrate information regarding their expression levels in different studies and conditions. Analytic and data mining tools, reagents, protocols, references, and links to relevant genomic resources were included in the database. Finally, the usability of the database was demonstrated using an example about the regulation of Pten mRNA during adipocyte differentiation in the context of relevant pathways. CONCLUSIONS: The GOLD.db will be a valuable tool that allows researchers to efficiently analyze patterns of gene expression and to display them in a variety of useful and informative ways, allowing outside researchers to perform queries pertaining to gene expression results in the context of biological processes and pathways. Citation for the above abstract: Hubert Hackl, Michael Maurer, Bernhard Mlecnik, Jurgen Hartler, Gernot Stocker, Diego Miranda-Saavedra, and Zlatko Trajanoski GOLD.db: genomics of lipid-associated disorders database BMC Genomics 2004, 5:93; doi:10.1186/1471-2164-5-93 © 2003 By the Authors The full text of the article can be found at: http://www.biomedcentral.com/1471-2164/5/93 |
| 246. HaemB: Haemophilia B Mutation Database |
URL: http://www.kcl.ac.uk/ip/petergreen/haemBdatabase.html Categories: Gene-, System-, or Disease- Specific Databases The eighth edition of the haemophilia B database (http://www.umds.ac.uk/molgen/haemBdatabase.htm) lists in an easily accessible form all known factor IX mutations due to small changes (base substitutions and short additions and/or deletions of <30 bp) identified in haemophilia B patients. The 1713 patient entries are ordered by the nucleotide number of their mutation. Where known, details are given on: factor IX activity, factor IX antigen in circulation, presence of inhibitor and origin of mutation. References to published mutations are given and the laboratories generating the data are indicated. Citation for the above abstract: Giannelli, F, Green, PM, Sommer, SS, Poon, M, Ludwig, M, Schwaab, R, Reitsma, PH, Goossens, M, Yoshioka, A, Figueiredo, MS, Brownlee, GG Haemophilia B: database of point mutations and short additions and deletions--eighth edition Nucl. Acids Res. 1998 26: 265-268 © 1998 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/26/1/265 |
| 247. HbVar: A Database of Human Hemoglobin Variants and Thalassemias |
URL: http://globin.cse.psu.edu/globin/hbvar/ Categories: Gene-, System-, or Disease- Specific Databases HbVar (http://globin.cse.psu.edu/globin/hbvar/) is a relational database developed by a multi-center academic effort to provide up-to-date and high quality information on the genomic sequence changes leading to hemoglobin variants and all types of thalassemia and hemoglobinopathies. Extensive information is recorded for each variant and mutation, including sequence alterations, biochemical and hematological effects, associated pathology, ethnic occurrence and references. In addition to the regular updates to entries, we report two significant advances: (i) The frequencies for a large number of mutations causing β-thalassemia in at-risk populations have been extracted from the published literature and made available for the user to query upon. (ii) HbVar has been linked with the GALA (Genome Alignment and Annotation database, available at http://globin.cse.psu.edu/gala/) so that users can combine information on hemoglobin variants and thalassemia mutations with a wide spectrum of genomic data. It also expands the capacity to view and analyze the data, using tools within GALA and the University of California at Santa Cruz (UCSC) Genome Browser. Citation for the above abstract: Patrinos, George P., Giardine, Belinda, Riemer, Cathy, Miller, Webb, Chui, David H. K., Anagnou, Nicholas P., Wajcman, Henri, Hardison, Ross C. Improvements in the HbVar database of human hemoglobin variants and thalassemia mutations for population and sequence variation studies Nucl. Acids Res. 2004 32: D537-541 © 2004 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D537 |
| 248. HemBase |
URL: http://hembase.niddk.nih.gov/ Categories: Gene-, System-, or Disease- Specific Databases, Microarray Data and other Gene Expression Databases Hembase (http://hembase.niddk.nih.gov) is an integrated browser and genome portal designed for web-based examination of the human erythroid transcriptome. To date, Hembase contains 15,752 entries from erythroblast Expressed Sequenced Tags (ESTs) and 380 referenced genes relevant for erythropoiesis. The database is organized to provide a cytogenetic band position, a unique name as well as a concise annotation for each entry. Search queries may be performed by name, keyword or cytogenetic location. Search results are linked to primary sequence data and three major human genome browsers for access to information considered current at the time of each search. Hembase provides interested scientists and clinical hematologists with a genome-based approach toward the study of erythroid biology. Citation for the above abstract: Goh, Sung-Ho, Lee, Y. Terry, Bouffard, Gerard G., Miller, Jeffery L. Hembase: browser and genome portal for hematology and erythroid biology Nucl. Acids Res. 2004 32: D572-574 © 2004 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D572 |
| 249. HemoPDB: Hematopoietic Promoter Database |
URL: http://bioinformatics.med.ohio-state.edu/HemoPDB/ Categories: Gene-, System-, or Disease- Specific Databases, Microarray Data and other Gene Expression Databases Hematopoiesis describes the process of the normal formation and development of blood cells, involving both proliferation and differentiation from stem cells. Abnormalities in this developmental program yield blood cell diseases, such as leukemia. Although, in recent years, extensive molecular research in normal hematopoietic development has characterized transcription factors and their binding sites in the target gene promoters, the information generated is highly fragmented. In order to integrate this important regulatory information with the corresponding genomic sequences, we have developed a new database called Hematopoiesis Promoter Database (HemoPDB). HemoPDB is a comprehensive resource focused on transcriptional regulation during hematopoietic development and associated aberrances that result in malignancy. HemoPDB (version 1.0) contains 246 promoter sequences and 604 experimentally known cis-regulatory elements of 187 different transcription factors, with links to published references. Orthologous promoters from different species are linked with each other and displayed in the same database record, accompanied by a visual image of the promoters and corresponding annotations of cis-regulatory elements. HemoPDB may be searched for the promoter of a specific gene, transcription factors and target genes, and genes that are expressed in a certain cell type or lineage, through a user-friendly web interface at http://bioinformatics.med.ohio-state.edu/HemoPDB. Links to the documentation and other technical details are provided on this website. Citation for the above abstract: Pohar, Twyla T., Sun, Hao, Davuluri, Ramana V. HemoPDB: Hematopoiesis Promoter Database, an information resource of transcriptional regulation in blood cell development Nucl. Acids Res. 2004 32: D86-90 © 2004 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D86 |
| 250. HORDE: Human Olfactory Receptor Data Exploratorium |
URL: http://bioportal.weizmann.ac.il/HORDE/ Categories: Gene-, System-, or Disease- Specific Databases Recent enhancements and current research in the GeneCards (GC) (http://bioinfo.weizmann.ac.il/cards/) project are described, including the addition of gene expression profiles and integrated gene locations. Also highlighted are the contributions of specialized associated human gene-centric databases developed at the Weizmann Institute. These include the Unified Database (UDB) (http://bioinfo.weizmann.ac.il/udb) for human genome mapping, the human Chromosome 21 database at the Weizmann Institute (CroW 21) (http://bioinfo.weizmann.ac.il/crow21), and the Human Olfactory Receptor Data Exploratorium (HORDE) (http://bioinfo.weizmann.ac.il/HORDE). The synergistic relationships amongst these efforts have positively impacted the quality, quantity and usefulness of the GeneCards gene compendium. Citation for the above abstract: Safran, Marilyn, Chalifa-Caspi, Vered, Shmueli, Orit, Olender, Tsviya, Lapidot, Michal, Rosen, Naomi, Shmoish, Michael, Peter, Yakov, Glusman, Gustavo, Feldmesser, Ester, Adato, Avital, Peter, Inga, Khen, Miriam, Atarot, Tal, Groner, Yoram, Lancet, Doron Human Gene-Centric Databases at the Weizmann Institute of Science: GeneCards, UDB, CroW 21 and HORDE Nucl. Acids Res. 2003 31: 142-146 © 2003 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/142 |
| 251. HOX-Pro: Homeobox Genes DataBase |
URL: http://www.iephb.nw.ru/labs/lab38/spirov/hox_pro/hox-pro00.html Categories: Gene-, System-, or Disease- Specific Databases The HOX Pro database contains information about the organization, function and evolution of gene ensembles, notably the homeobox-containing genes. It is now clear that a subset of genes containing the homeobox motif play key roles in the orchestration of genes which control embryonic patterning, morphogenesis, cell differentiation and malignant transformation. The HOX Pro contains a broad spectrum of information including images, diagrams and animations. Currently this amounts to approximately 700 HTML pages together with 400 images which contain information on 200 groups of genes and 90 promoters, in turn linked to maps of 13 HOX clusters and nine genetic networks. There are about 700 sequences of individual hox-genes of animals classified in approximately 200 homologous or paralogous groups. Graphical representation of HOX clusters and Hox-based networks is accomplished by means of flow and 3D diagrams, JavaScript animations and Java applets. The HOX Pro now includes sections presenting data mining and data simulation issues. The DB is located at http://www.iephb.nw.ru/hoxpro. Citation for the above abstract: Spirov, Alexander V., Borovsky, Mikhail, Spirova, Olesya A. HOX Pro DB: the functional genomics of hox ensembles Nucl. Acids Res. 2002 30: 351-353 © 2002 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/30/1/351 |
| 252. HPMR: Human Plasma Membrane Receptome |
URL: http://receptome.stanford.edu/HPMR/ Categories: Gene-, System-, or Disease- Specific Databases, Microarray Data and other Gene Expression Databases Intercellular communication in multicellular organisms requires the relay of extracellular signals by cell surface proteins to the interiors of cells. The availability of genome sequences from humans and several model organisms has facilitated the identification of several human plasma membrane receptor families and allowed the analysis of their phylogeny. This review provides a global categorization of most known signal transduction-associated receptors as enzymes, recruiters, and latent transcription factors. The evolution of known families of human plasma membrane signaling receptors was traced in current literature and validated by sequence relatedness. This global analysis reveals themes that recur during receptor evolution and allows the formulation of hypotheses for the origins of receptors. The human receptor families involved in signaling (with the exception of channels) are presented in the Human Plasma Membrane Receptome database. Citation for the above abstract: Ben-Shlomo I, Yu Hsu S, Rauch R, Kowalski HW, Hsueh AJ. Signaling receptome: a genomic and evolutionary perspective of plasma membrane receptors involved in signal transduction. Science's STKE. 2003 Jun 17;2003(187):RE9. © 2003 Science's STKE. The full abstract can be found at: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=12815191 |
| 253. Human PAX2 Allelic Variant Database |
URL: http://pax2.hgu.mrc.ac.uk/ Categories: Gene-, System-, or Disease- Specific Databases Mutations in the PAX2 gene are associated with developmental eye, kidney and ear anomalies with the disease commonly known as renal-coloboma syndrome. The mutations found to date show marked differences in phenotype making it difficult to predict the clinical effects of a PAX2 mutation. The database was created to satisfy the need for a single source of information about PAX2 mutations for researchers and clinicians. It also fills the need for a database to which researchers can submit new mutation information with minimal difficulty. Neutral polymorphisms are also included in the database as this information is also important to researchers. It is hoped that this database will provide a valuable tool for research and clinical diagnosis of renal-coloboma syndrome. Information about each mutation in the database is stored in 59 fields which are designed to provide as much information about each mutation as possible. Citation for the above excerpt: Leslie McNoe, Alastair Brown, Mark McKie, and Michael Eccles The Human PAX2 Mutation Database Nucl. Acids Res. 1999 27: On-line © 1999 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/27/1/1/DC1/37 |
| 254. Human PAX6 Allelic Variant Database |
URL: http://pax6.hgu.mrc.ac.uk/ Categories: Gene-, System-, or Disease- Specific Databases The Human PAX6 Mutation Database contains details of 94 mutations of the PAX6 gene. A Microsoft Access program is used by the Curator to store, update and search the database entries. Mutations can be entered directly by the Curator, or imported from submissions made via the World Wide Web. The PAX6 Mutation Database web page at URL http://www.hgu.mrc.ac.uk/Softdata/PAX6/ provides information about PAX6, as well as a fill-in form through which new mutations can be submitted to the Curator. A search facility allows remote users to query the database. A plain text format file of the data can be downloaded via the World Wide Web. The Curation program contains prior knowledge of the genetic code and of the PAX6 gene including cDNA sequence, location of intron/exon boundaries, and protein domains, so that the minimum of information need be provided by the submitter or Curator. Citation for the above abstract: Brown, A, McKie, M, van Heyningen, V, Prosser, J The Human PAX6 Mutation Database Nucl. Acids Res. 1998 26: 259-264 © 1998 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/26/1/259 |
| 255. IL2Rgbase: X-linked SCID Mutation Database |
URL: http://research.nhgri.nih.gov/scid/ Categories: Gene-, System-, or Disease- Specific Databases, Immunological Databases "X-linked severe combined immunodeficiency (XSCID or X-SCID) is an immune disorder caused by mutations in the X-linked gene IL2RG, which encodes the common gamma chain (γc) of the lymphocyte receptors for interleukin-2 (IL-2) and many other cytokines. A database of human XSCID mutations (IL2RGbase) has been assembled. Information on new mutations may be submitted online." |
| 256. DOQCS: Database of Quantitative Cellular Signaling |
URL: http://doqcs.ncbs.res.in/ Categories: Intermolecular Interactions and Signaling Pathways Databases, Neuroscience Databases MOTIVATION: Analysis of cellular signaling interactions is expected to pose an enormous informatics challenge, perhaps even larger than analyzing the genome. The complex networks arising from signaling processes are traditionally represented as block diagrams. A key step in the evolution toward a more quantitative understanding of signaling is to explicitly specify the kinetics of all chemical reaction steps in a pathway. Technical advances in proteomics and high-throughput protein interaction assays promise a flood of such quantitative data. While annotations, molecular information and pathway connectivity have been compiled in several databases, and there are several proposals for general cell model description languages, there is currently little experience with databases of chemical kinetics and reaction level models of signaling networks. RESULTS: The Database of Quantitative Cellular Signaling is a repository of models of signaling pathways. It is intended both to serve the growing field of chemical-reaction level simulation of signaling networks, and to anticipate issues in large-scale data management for signaling chemistry. AVAILABILITY: The Database of Quantitative Cellular Signaling is available at http://doqcs.ncbs.res.in. Links to the signaling model simulator, GENESIS/Kinetikit are at http://www.ncbs.res.in/~bhalla/kkit/index.html and are also provided from within the database. The database source code is available under the GNU Public License. Citation for the above abstract: Sudhir Sivakumaran, Sridhar Hariharaputran, Jyoti Mishra, and Upinder S. Bhalla The Database of Quantitative Cellular Signaling: management and analysis of chemical kinetic models of signaling networks Bioinformatics 19: 408-415. © 2003 Oxford University Press. The full text of the article can be found at: http://bioinformatics.oupjournals.org/cgi/reprint/19/3/408 |
| 257. NTSA Workbench Rabbit Atlas |
URL: http://soma.npa.uiuc.edu/isnpa/atlas/rabbit/index.html Categories: Brain Atlases "The Rabbit Atlas was constructed from the digitized photographs of a set of wet slices, where each slice was 40 microns thick. The resulting atlas is high resolution, which is necessary due to the detail with which we wish to enter spatially-dependent data. This image set contains 779 images, where each image is 587x486 pixels." |
| 258. NTSA Workbench Electric Fish Atlas |
URL: http://soma.npa.uiuc.edu/isnpa/atlas/fish/index.html Categories: Brain Atlases "This image set contains 626 images, where each image is 450x452 pixels." Atlas data from: Maler L, Sas E, Johnston S, Ellis W (1991) An Atlas of the Brain of the Electric Fish Apteronotus leptorhynchus J. Chem. Neuroanat. 4, 1-38. |
| 259. NTSA Workbench Human fMRI Atlas |
URL: http://soma.npa.uiuc.edu/isnpa/atlas/human/index.html Categories: Brain Atlases "The Human fMRI images are from two Talairach datasets. This image set contains 91 images, where each image is 91x109 pixels." |
| 260. NTSA Workbench |
URL: http://soma.npa.uiuc.edu/isnpa/overview.html Categories: Neuroscience Databases "Much current knowledge about brain function is based on analysis of firing patterns of individual neurons. As in many other areas of science, this field is experiencing an explosion of data. Huge data sets are being amassed with new computer-based data acquisition systems and techniques for recording simultaneously from many neurons. Neural modeling generates massive simulated data sets that need to be processed, analyzed and compared with experimental data. The goal of this project is to develop an information handling system that will make routine the storage, organization, retrieval, analysis and sharing of experimental and simulated neuronal data. The ultimate aim is to develop a set of tools, techniques and standards that can be disseminated to help meet the needs of a large community of neuroscientists who work with neuronal data." |
| 261. Duke / Southampton Archive of Neuronal Morphology |
URL: http://neuron.duke.edu/cells/ Categories: Neuroscience Databases "This site is intended to facilitate the free exchange of data between groups studying neuronal morphology." |
| 262. L-Neuron Virtual Neuromorphology Electronic Database |
URL: http://krasnow.gmu.edu/L-Neuron/L-Neuron/home.htm Categories: Neuroscience Databases It is generally assumed that the variability of neuronal morphology has an important effect on the connectivity and response within the nervous system, but this effect has not been thoroughly investigated. Neuroanatomical archives represent a crucial tool to explore structure-function relationships in the brain. We are developing computational tools to describe, generate, store, and render large sets of three-dimensional neuronal structures in a format that is both compact, quantitative, accurate, and readily accessible to the neuroscientist. ... We are developing two programs, L-Neuron and ArborVitae, which implement several global and local algorithms, to investigate systematically the potential of the "computational neuroanatomy" approach for neuroscience databases. We virtually generated anatomically plausible neurons for several morphological classes, including cerebellar Purkinje cells, hippocampal pyramidal and granule cells, and spinal cord motoneurons. Citation for the above excerpts: Giorgio Ascoli, Jeffrey Krichmar, Slawomir Nasuto, and Steven Senft Local and global approaches in computational neuroanatomy Philosophical Transactions of The Royal Society: Biological Sciences. 2001. 356(1412) © 2001 The Royal Society. The full text of the article can be found at: http://www.hirn.uni-duesseldorf.de/rk/neurodat.htm#MORPHOLOGY |
| 263. Mouse Brain Atlases |
URL: http://www.mbl.org/mbl_main/atlas.html Categories: Brain Atlases Hosted by MBL: The Mouse Brain Library. |
| 264. Karolinska Institute: Databases in Medicine and Related Areas |
URL: http://kib.ki.se/databas/list_databases_en.asp Categories: Metadatabases and Directories "Some of the databases are only accessible for KI members." |
| 265. Internet Directory for Botany |
URL: http://www.botany.net/IDB/ Categories: Metadatabases and Directories "The Internet Directory of Botany is an index to botanical information available on the Internet, compiled by Anthony R. Brach [brach (at) oeb.harvard.edu] (Harvard University Herbarium, Cambridge / Missouri Botanical Garden, St. Louis, USA, www page), Raino Lampinen (Botanical Museum, Finnish Museum of Natural History, University of Helsinki, Finland; www page), Shunguo Liu [cathay (at) cathay.net] (SHL Systemhouse, Edmonton, Canada; www page) and Keith McCree (Oakridge, Oregon; www page). The alphabetical list (formerly the List of WWW Sites of Interest to Botanists) was compiled by Anthony R. Brach. It was originally posted on TAXACOM in March 1995. HTML format is created and maintained by Shunguo Liu. The subject category list (formerly A Collection of Botany Related URLs), has been discontinued. It was created by Raino Lampinen and maintained from Autumn 1993 to the year 2000. It started as a personal bookmark list of botanical gopher sites, then in March 1994 also included www sites, and was made available via WWW in December 1994." |
| 266. LNI Cortical Neuron Database |
URL: http://neurodatabase.org Categories: Neuroscience Databases "Our cortical neurophysiology database project uses a data model that is a subset of our Common Data Model for neuroscience and biophysical data archiving and exchange. Every component of the Common Data Model is a member of one of five superclasses that together span the complex domain of contemporary neurophysiology. The evolving model is designed to become as well an open extensible standard for describing and sharing data models, metadata, and dataset formats of a wide range of neuroscience data resources: a blueprint for neuroscience data exchange." |
| 267. DG CST: Disease Genes Conserved Sequence Tags Database |
URL: http://143.225.208.11/cst3/ Categories: General Human Genetics Databases The identification and study of evolutionarily conserved genomic sequences that surround disease-related genes is a valuable tool to gain insight into the functional role of these genes and to better elucidate the pathogenetic mechanisms of disease. We created the DG-CST (Disease Gene Conserved Sequence Tags) database for the identification and detailed annotation of human-mouse conserved genomic sequences that are localized within or in the vicinity of human disease-related genes. CSTs are defined as sequences that show at least 70% identity between human and mouse over a length of at least 100 bp. The database contains CST data relative to over 1088 genes responsible for monogenetic human genetic diseases or involved in the susceptibility to multifactorial/polygenic diseases. DG-CST is accessible via the internet at http://dgcst.ceinge.unina.it/ and may be searched using both simple and complex queries. A graphic browser allows direct visualization of the CSTs and related annotations within the context of the relative gene and its transcripts. Citation for the above abstract: Boccia, Angelo, Petrillo, Mauro, di Bernardo, Diego, Guffanti, Alessandro, Mignone, Flavio, Confalonieri, Stefano, Luzi, Lucilla, Pesole, Graziano, Paolella, Giovanni, Ballabio, Andrea, Banfi, Sandro DG-CST (Disease Gene Conserved Sequence Tags), a database of human-mouse conserved elements associated to disease genes Nucl. Acids Res. 2005 33: D505-510 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D505 |
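The CST definition quoted in the DG-CST entry (at least 70% human-mouse identity over at least 100 bp) can be illustrated with a minimal sketch. The function names, the gap handling, and the choice of alignment length as the denominator are assumptions made for illustration; the entry does not say how DG-CST itself computes identity.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of aligned columns where both sequences carry the same base.

    Assumes the two strings are already aligned to equal length; a gap
    character ('-') never counts as a match (an illustrative choice).
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, equal-length, and non-empty")
    matches = sum(a == b and a != "-" for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def is_cst(human: str, mouse: str,
           min_identity: float = 70.0, min_length: int = 100) -> bool:
    """Apply the two thresholds stated in the abstract to one aligned pair."""
    return len(human) >= min_length and percent_identity(human, mouse) >= min_identity


if __name__ == "__main__":
    human = "ACGT" * 30            # 120 bp toy sequence
    print(is_cst(human, human))    # identical and long enough
    print(is_cst(human[:80], human[:80]))  # identical but shorter than 100 bp
```

Both thresholds must hold at once, so a perfectly conserved 80 bp stretch does not qualify under this reading.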
| 268. GenAtlas |
URL: http://www.genatlas.org/ Categories: General Human Genetics Databases "Founded in 1986, GENATLAS compiles the information relevant to the mapping efforts of the Human Genome Project. At this day (07/10/2003), this information is collected from more than 48000 articles in the literature, collected in more than 870 reviews. The articles are daily analyzed by annotators to update the GENATLAS database. Only the objects with a known cytogenetic location are retained." |
| 269. GeneCards |
URL: http://bioinfo.weizmann.ac.il/cards/index.shtml Categories: General Human Genetics Databases, Human Genome Databases, Maps, and Viewers Recent enhancements and current research in the GeneCards (GC) (http://bioinfo.weizmann.ac.il/cards/) project are described, including the addition of gene expression profiles and integrated gene locations. Also highlighted are the contributions of specialized associated human gene-centric databases developed at the Weizmann Institute. These include the Unified Database (UDB) (http://bioinfo.weizmann.ac.il/udb) for human genome mapping, the human Chromosome 21 database at the Weizmann Institute (CroW 21) (http://bioinfo.weizmann.ac.il/crow21), and the Human Olfactory Receptor Data Exploratorium (HORDE) (http://bioinfo.weizmann.ac.il/HORDE). The synergistic relationships amongst these efforts have positively impacted the quality, quantity and usefulness of the GeneCards gene compendium. Citation for the above abstract: Safran, Marilyn, Chalifa-Caspi, Vered, Shmueli, Orit, Olender, Tsviya, Lapidot, Michal, Rosen, Naomi, Shmoish, Michael, Peter, Yakov, Glusman, Gustavo, Feldmesser, Ester, Adato, Avital, Peter, Inga, Khen, Miriam, Atarot, Tal, Groner, Yoram, Lancet, Doron Human Gene-Centric Databases at the Weizmann Institute of Science: GeneCards, UDB, CroW 21 and HORDE Nucl. Acids Res. 2003 31: 142-146 © 2003 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/142 |
| 270. Genetics Home Reference |
URL: http://ghr.nlm.nih.gov/ Categories: General Human Genetics Databases "Genetics Home Reference is the National Library of Medicine's web site for consumer information about genetic conditions and the genes or chromosomes responsible for those conditions." |
| 271. HAGR: Human Ageing Genomic Resources |
URL: http://genomics.senescence.info/ Categories: General Human Genetics Databases The Human Ageing Genomic Resources (HAGR) is a collection of online resources for studying the biology of human ageing. HAGR features two main databases: GenAge and AnAge. GenAge is a curated database of genes related to human ageing. Entries were primarily selected based on genetic perturbations in animal models and human diseases as well as an extensive literature review. Each entry includes a variety of automated and manually curated information, including, where available, protein-protein interactions, the relevant literature, and a description of the gene and how it relates to human ageing. The goal of GenAge is to provide the most complete and comprehensive database of genes related to human ageing on the Internet as well as render an overview of the genetics of human ageing. AnAge is an integrative database describing the ageing process in several organisms and featuring, if available, maximum life span, taxonomy, developmental schedules and metabolic rate, making AnAge a unique resource for the comparative biology of ageing. Associated with the databases are data-mining tools and software designed to investigate the role of genes and proteins in the human ageing process as well as analyse ageing across different taxa. HAGR is freely available to the academic community at http://genomics.senescence.info. Citation for the above abstract: de Magalhaes, Joao Pedro, Costa, Joana, Toussaint, Olivier HAGR: the Human Ageing Genomic Resources Nucl. Acids Res. 2005 33: D537-543 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D537 |
| 272. HCAD: Human Chromosome Aberration Database |
URL: http://www.pdg.cnb.uam.es/UniPub/HCAD/ Categories: General Human Genetics Databases Recurrent chromosome aberrations are an important resource when associating human pathologies to specific genes. However, for technical reasons a large number of chromosome breakpoints are defined only at the level of cytobands and many of the genes involved remain unidentified. We developed a web-based information system that mines the scientific literature and generates textual and comprehensive information on all human breakpoints. We show that the statistical analysis of this textual information and its combination with genomic data can identify genes directly involved in DNA rearrangements. The Human Chromosome Aberration Database (HCAD) is publicly accessible at http://www.pdg.cnb.uam.es/UniPub/HCAD/. Citation for the above abstract: Hoffmann, Robert, Dopazo, Joaquin, Cigudosa, Juan C., Valencia, Alfonso HCAD, closing the gap between breakpoints and genes Nucl. Acids Res. 2005 33: D511-513 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D511 |
| 273. HERVd: Human Endogenous Retrovirus database |
URL: http://herv.img.cas.cz/ Categories: General Human Genetics Databases, Viral Databases An elaboration of HERVd (http://herv.img.cas.cz) is being carried out in two directions. One of them is the integration and better classification of families that diverge considerably from typical retroviral genomes. This leads to a more precise identification of members with individual families. The second improvement is better accessibility of the database and connection with human genome annotation. Citation for the above abstract: Paces, Jan, Pavlicek, Adam, Zika, Radek, Kapitonov, Vladimir V., Jurka, Jerzy, Paces, Vaclav HERVd: the Human Endogenous RetroViruses Database: update Nucl. Acids Res. 2004 32: D50- © 2004 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D50 |
| 274. HGMD: Human Gene Mutation Database |
URL: http://www.hgmd.org/ Categories: General Human Genetics Databases, General Polymorphism Databases The Human Gene Mutation Database (HGMD) constitutes a comprehensive core collection of data on germ-line mutations in nuclear genes underlying or associated with human inherited disease (www.hgmd.org). Data catalogued includes: single base-pair substitutions in coding, regulatory and splicing-relevant regions; micro-deletions and micro-insertions; indels; triplet repeat expansions as well as gross deletions; insertions; duplications; and complex rearrangements. Each mutation is entered into HGMD only once in order to avoid confusion between recurrent and identical-by-descent lesions. By March 2003, the database contained in excess of 39,415 different lesions detected in 1,516 different nuclear genes, with new entries currently accumulating at a rate exceeding 5,000 per annum. Since its inception, HGMD has been expanded to include cDNA reference sequences for more than 87% of listed genes, splice junction sequences, disease-associated and functional polymorphisms, as well as links to data present in publicly available online locus-specific mutation databases. Although HGMD has recently entered into a licensing agreement with Celera Genomics (Rockville, MD), mutation data will continue to be made freely available via the Internet. Citation for the above abstract: Stenson PD, Ball EV, Mort M, Phillips AD, Shiel JA, Thomas NS, Abeysinghe S, Krawczak M, Cooper DN. Human Gene Mutation Database (HGMD): 2003 update. Hum Mutat. 2003 Jun;21(6):577-81. © 2003 Wiley-Liss, Inc. The full abstract can be found at: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=12754702 |
| 275. OMIM: Online Mendelian Inheritance in Man |
URL: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM Categories: General Human Genetics Databases, General Polymorphism Databases Online Mendelian Inheritance in Man (OMIM) is a comprehensive, authoritative and timely knowledgebase of human genes and genetic disorders compiled to support human genetics research and education and the practice of clinical genetics. Started by Dr Victor A. McKusick as the definitive reference Mendelian Inheritance in Man, OMIM (http://www.ncbi.nlm.nih.gov/omim/) is now distributed electronically by the National Center for Biotechnology Information, where it is integrated with the Entrez suite of databases. Derived from the biomedical literature, OMIM is written and edited at Johns Hopkins University with input from scientists and physicians around the world. Each OMIM entry has a full-text summary of a genetically determined phenotype and/or gene and has numerous links to other genetic databases such as DNA and protein sequence, PubMed references, general and locus-specific mutation databases, HUGO nomenclature, MapViewer, GeneTests, patient support groups and many others. OMIM is an easy and straightforward portal to the burgeoning information in human genetics. Citation for the above abstract: Hamosh, Ada, Scott, Alan F., Amberger, Joanna S., Bocchini, Carol A., McKusick, Victor A. Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders Nucl. Acids Res. 2005 33: D514-517 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D514 |
| 276. NDB: Nucleic Acid Database |
URL: http://ndbserver.rutgers.edu/ Categories: Nucleic Acid Structure Databases The Nucleic Acid Database was established in 1991 as a resource to assemble and distribute structural information about nucleic acids. Over the years, the NDB has developed generalized software for processing, archiving, querying and distributing structural data for nucleic acid-containing structures. The architecture and capabilities of the Nucleic Acid Database, as well as some of the research enabled by this resource, are presented in this article. Citation for the above abstract: Berman HM, Westbrook J, Feng Z, Iype L, Schneider B, Zardecki C. The Nucleic Acid Database. Acta Crystallogr D Biol Crystallogr. 2002 Jun;58(Pt 6 No 1):889-98. Epub 2002 May 29. © 2003 International Union of Crystallography The full abstract can be found at: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=12037326 |
| 277. NTDB: Thermodynamic Database for Nucleic Acids |
URL: http://ntdb.chem.cuhk.edu.hk/ Categories: Nucleic Acid Structure Databases The second release of Thermodynamic Database for Nucleic Acids, NTDB 2.0, includes more than 4600 entries (250% increase over release 1.0). It contains sequence types and details of several thermodynamic parameters (enthalpy, DeltaH; entropy, DeltaS; Gibbs free energy, DeltaG; melting temperature, T(m)), experimental models and methods for extracting thermodynamic parameters, buffer conditions as well as all relevant literature information. In addition, the database statistics and references related to NTDB are included. Information on normal and modified nucleobases and nucleosides are collected in a new section 'Nucleoside' whereby data collected thus far will be released in NTDB 2.0. The NTDB is freely available at http://ntdb.chem.cuhk.edu.hk. Citation for the above abstract: Chiu, Wing Lok Abe Kurtz, Sze, Chun Ngai, Ma, Nap Tak, Chiu, Lai Fan, Leung, Chung Wai, Au-Yeung, Steve Chik Fun NTDB: Thermodynamic Database for Nucleic Acids, Version 2.0 Nucl. Acids Res. 2003 31: 483-485 © 2003 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/483 |
| 278. RNABase: The RNA Structure Database |
URL: http://www.rnabase.org/ Categories: Nucleic Acid Structure Databases RNABase is a unified database of all three-dimensional structures containing RNA deposited in either the Protein Data Bank (PDB) or Nucleic Acid Data Base (NDB). For each structure, RNABase contains a brief summary as well as annotation of conformational parameters, identification of possible model errors, Ramachandran-style conformational maps and classification of ribonucleotides into conformers. These same analyses can also be performed on structures submitted by users. To facilitate access, structures are automatically placed into a variety of functional and structural categories, including: ribozymes, pseudoknots, etc. RNABase can be freely accessed on the web at http://www.rnabase.org. We are committed to maintaining this database indefinitely. Citation for the above abstract: Murthy, Venkatesh L., Rose, George D. RNABase: an annotated database of RNA structures Nucl. Acids Res. 2003 31: 502-504 © 2003 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/502 |
| 279. SCOR: Structural Classification of RNA |
URL: http://scor.lbl.gov/ Categories: Nucleic Acid Structure Databases SCOR, the Structural Classification of RNA (http://scor.lbl.gov), is a database designed to provide a comprehensive perspective and understanding of RNA motif three-dimensional structure, function, tertiary interactions and their relationships. SCOR 2.0 represents a major expansion and introduces a new classification organization. The new version represents the classification as a Directed Acyclic Graph (DAG), which allows a classification node to have multiple parents, in contrast to the strictly hierarchical classification used in SCOR 1.2. SCOR 2.0 supports three types of query terms in the updated search engine: PDB or NDB identifier, nucleotide sequence and keyword. We also provide parseable XML files for all information. This new release contains 511 RNA entries from the PDB as of 15 May 2003. A total of 5880 secondary structural elements are classified: 2104 hairpin loops and 3776 internal loops. RNA motifs reported in the literature, such as 'Kink turn' and 'GNRA loops', are now incorporated into the structural classification along with definitions and descriptions. Citation for the above abstract: Tamura, Makio, Hendrix, Donna K., Klosterman, Peter S., Schimmelman, Nancy R. B., Brenner, Steven E., Holbrook, Stephen R. SCOR: Structural Classification of RNA, version 2.0 Nucl. Acids Res. 2004 32: D182-184 © 2004 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D182 |
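The SCOR 2.0 abstract contrasts a strict hierarchy with a Directed Acyclic Graph in which a classification node may have several parents. A minimal sketch of that data structure follows; the class name, method names, and node labels are hypothetical and are not taken from SCOR's actual schema or XML files.

```python
from collections import defaultdict


class ClassificationDAG:
    """Toy classification graph: each node may have more than one parent."""

    def __init__(self) -> None:
        # Maps a child node to the set of its direct parents.
        self.parents = defaultdict(set)

    def ancestors(self, node: str) -> set:
        """Return every node reachable by following parent links upward."""
        seen = set()
        stack = [node]
        while stack:
            for parent in self.parents.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def add_edge(self, parent: str, child: str) -> None:
        """Link child under parent, refusing edges that would form a cycle."""
        if parent == child or child in self.ancestors(parent):
            raise ValueError("edge would create a cycle")
        self.parents[child].add(parent)


if __name__ == "__main__":
    dag = ClassificationDAG()
    # Hypothetical labels: one motif filed under two independent categories.
    dag.add_edge("structural class", "example motif")
    dag.add_edge("functional class", "example motif")
    print(sorted(dag.ancestors("example motif")))
```

In a strict tree the second `add_edge` call would have to replace the first parent; the DAG keeps both, which is the property the abstract highlights.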
| 280. GeneAnnot |
URL: http://genecards.weizmann.ac.il/geneannot/ Categories: Human Genome Databases, Maps, and Viewers, Human ORFs MOTIVATION: High density oligonucleotide arrays are usually annotated in a one-to-one fashion, with each probeset assigned to one gene. However, in reality, subsets of oligonucleotides in a probeset may match sequences within more than one gene, potentially leading to misinterpretations. Moreover, a gene is often represented by more than one probeset, and analyzing probe matches at the mRNA level can help one deduce whether these probesets are derived from the same or different splice variants. RESULTS: The GeneAnnot system comprehensively documents the many-to-many relationship between oligonucleotide array probesets and annotated genes in GeneCards. It performs pairwise alignments between the probe sequences and gene transcripts, and assigns sensitivity and specificity scores to each probeset/gene pair. AVAILABILITY: http://genecards.weizmann.ac.il/geneannot/ SUPPLEMENTARY INFORMATION: Program description and statistics http://genecards.weizmann.ac.il/geneannot/DOC/index.html Citation for the above abstract: Vered Chalifa-Caspi, Itai Yanai, Ron Ophir, Naomi Rosen, Michael Shmoish, Hila Benjamin-Rodrig, Maxim Shklar, Tsippi Iny Stein, Orit Shmueli, Marilyn Safran, and Doron Lancet GeneAnnot: comprehensive two-way linking between oligonucleotide array probesets and GeneCards genes Bioinformatics Advance Access published on June 12, 2004, DOI 10.1093/bioinformatics/bth081. Bioinformatics 20: 1457-1458. © 2004 Oxford University Press. The full abstract can be found at: http://bioinformatics.oupjournals.org/cgi/content/abstract/20/9/1457 |
| 281. GeneLoc |
URL: http://genecards.weizmann.ac.il/geneloc/ Categories: Human Genome Databases, Maps, and Viewers MOTIVATION: Despite the numerous available whole-genome mapping resources, no comprehensive, integrated map of the human genome yet exists. RESULTS: GeneLoc, software adjunct to GeneCards and UDB, integrates gene lists by comparing genomic coordinates at the exon level and assigns unique and meaningful identifiers to each gene. Citation for the above abstract: Naomi Rosen, Vered Chalifa-Caspi, Orit Shmueli, Avital Adato, Michal Lapidot, Julie Stampnitzky, Marilyn Safran, and Doron Lancet GeneLoc: exon-based integration of human genome maps Bioinformatics 19: i222-i224. © 2003 Oxford University Press. The full text of the article can be found at: http://bioinformatics.oupjournals.org/cgi/reprint/19/suppl_1/i222 |
| 282. GenMapDB: Human Bacterial Artificial Chromosome Map Database |
URL: http://genomics.med.upenn.edu/genmapdb/ Categories: Human Genome Databases, Maps, and Viewers GenMapDB (http://genomics.med.upenn.edu/genmapdb) is a repository of human bacterial artificial chromosome (BAC) clones mapped by our laboratory to sequence-tagged site markers. Currently, GenMapDB contains over 3000 mapped clones that span 19 chromosomes, chromosomes 2, 4, 5, 9-22, X and Y. This database provides positional information about human BAC clones from the RPCI-11 human male BAC library. It also contains restriction fragment analysis data and end sequences of the clones. GenMapDB is freely available to the public. The main purpose of GenMapDB is to organize the mapping data and to allow the research community to search for mapped BAC clones that can be used in gene mapping studies and chromosomal mutation analysis projects. Citation for the above abstract: Morley, Michael, Arcaro, Melissa, Burdick, Joshua, Yonescu, Raluca, Reid, Thomas, Kirsch, Ilan R., Cheung, Vivian G. GenMapDB: a database of mapped human BAC clones Nucl. Acids Res. 2001 29: 144-147 © 2001 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/29/1/144 |
| 283. HOWDY: Human Organized Whole Genome Database |
URL: http://www-alis.tokyo.jst.go.jp/HOWDY/ Categories: Human Genome Databases, Maps, and Viewers HOWDY is an integrated database system for accessing and analyzing human genomic information (http://www-alis.tokyo.jst.go.jp/HOWDY/). HOWDY stores information about relationships between genetic objects and the data extracted from a number of databases. HOWDY consists of an Internet accessible user interface that allows thorough searching of the human genomic databases using the gene symbols and their aliases. It also permits flexible editing of the sequence data. The database can be searched using simple words and the search can be restricted to a specific cytogenetic location. Linear maps displaying markers and genes on contig sequences are available, from which an object can be chosen. Any search starting point identifies all the information matching the query. HOWDY provides a convenient search environment of human genomic data for scientists unsure which database is most appropriate for their search. Citation for the above abstract: Hirakawa, Mika HOWDY: an integrated database system for human genome research Nucl. Acids Res. 2002 30: 152-157 © 2002 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/30/1/152 |
| 284. HuGeMap |
URL: http://www.infobiogen.fr/services/Hugemap/ Categories: Human Genome Databases, Maps, and Viewers The HuGeMap database stores the major genetic and physical maps of the human genome. HuGeMap is accessible on the Web at http://www.infobiogen.fr/services/Hugemap and through a CORBA server. A standard genome map data format for the interconnection of genome map databases was defined in collaboration with the EBI. The HuGeMap CORBA server provides this interconnection using the interface definition language IDL. Two graphical user interfaces were developed for the visualization of the HuGeMap data: ZoomMap (http://www.infobiogen.fr/services/zomit/ZoomMap.html) for navigation by zooming and data transformation via magic lenses, and MappetShow (http://www.infobiogen.fr/services/Mappet) for visualizing and comparing maps. Citation for the above abstract: Barillot, E, Pook, S, Guyon, F, Cussat-Blanc, C, Viara, E, Vaysseix, G The HuGeMap Database: interconnection and visualization of human genome maps Nucl. Acids Res. 1999 27: 119-122 © 1999 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/27/1/119 |
| 285. Human BAC Ends Database |
URL: http://www.tigr.org/tdb/humgen/bac_end_search/bac_end_intro.html Categories: Human Genome Databases, Maps, and Viewers The Human BAC Ends database includes all non-redundant human BAC end sequences (BESs) generated by The Institute for Genomic Research (TIGR), the University of Washington (UW) and California Institute of Technology (CalTech). It incorporates the available BAC mapping data from different genome centers and the annotation results of each end sequence for the contents of repeats, ESTs and STS markers. For each BAC end the database integrates the sequence, the phred quality scores, the map and the annotation, and provides links to sites of the library information, the reports of GenBank, dbGSS and GDB, and other relevant data. The database is freely accessible via the web and supports sequence or clone searches and anonymous FTP. The relevant sites and resources are described at http://www.tigr.org/tdb/humgen/bac_end_search/bac_end_intro.html Citation for the above abstract: Zhao, Shaying Human BAC Ends Nucl. Acids Res. 2000 28: 129-132 © 2000 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/28/1/129 |
| 286. Human Genome Segmental Duplication Database |
URL: http://projects.tcag.ca/humandup/ Categories: Human Genome Databases, Maps, and Viewers BACKGROUND: Previous studies have suggested that recent segmental duplications, which are often involved in chromosome rearrangements underlying genomic disease, account for some 5% of the human genome. We have developed rapid computational heuristics based on BLAST analysis to detect segmental duplications, as well as regions containing potential sequence misassignments in the human genome assemblies. RESULTS: Our analysis of the June 2002 public human genome assembly revealed that 107.4 of 3,043.1 megabases (Mb) (3.53%) of sequence contained segmental duplications, each with size equal or more than 5 kb and 90% identity. We have also detected that 38.9 Mb (1.28%) of sequence within this assembly is likely to be involved in sequence misassignment errors. Furthermore, we have identified a significant subset (199,965 of 2,327,473 or 8.6%) of single-nucleotide polymorphisms (SNPs) in the public databases that are not true SNPs but are potential paralogous sequence variants. CONCLUSION: Using two distinct computational approaches, we have identified most of the sequences in the human genome that have undergone recent segmental duplications. Near-identical segmental duplications present a major challenge to the completion of the human genome sequence. Potential sequence misassignments detected in this study would require additional efforts to resolve. Citation for the above abstract: Cheung J, Estivill X, Khaja R, MacDonald JR, Lau K, Tsui L-C, Scherer SW. Genome-wide detection of segmental duplications and potential assembly errors in the human genome sequence. Genome Biol. 2003;4(4):R25. Epub 2003 Mar 17. © 2003 Cheung et al.; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.
The full text of the article can be found at: http://genomebiology.com/2003/4/4/R25 |
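The percentages quoted in the abstract above can be recomputed from the raw figures it gives. The following is only a minimal sketch of that arithmetic; the helper `percent` and the variable names are illustrative and do not come from the article.

```python
def percent(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole

# Figures as quoted in the abstract (June 2002 public assembly).
assembly_mb = 3043.1       # total sequence analysed, in Mb
duplicated_mb = 107.4      # sequence in segmental duplications, in Mb
misassigned_mb = 38.9      # sequence with likely misassignment, in Mb
snps_total = 2_327_473     # SNPs examined in public databases
snps_psv = 199_965         # SNPs flagged as potential paralogous sequence variants

# Round to the same precision the abstract reports.
dup_pct = round(percent(duplicated_mb, assembly_mb), 2)
mis_pct = round(percent(misassigned_mb, assembly_mb), 2)
psv_pct = round(percent(snps_psv, snps_total), 1)

print(dup_pct, mis_pct, psv_pct)
```

Rounded this way, the three values agree with the 3.53%, 1.28% and 8.6% stated in the abstract.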
| 287. IXDB: The Integrated X Chromosome Database |
URL: http://ixdb.mpimg-berlin-dahlem.mpg.de/ Categories: Human Genome Databases, Maps, and Viewers Chromosome specific databases are an important research tool as they integrate data from different directions, such as genetic and physical mapping data, expression data, sequences etc. They supplement the genome-wide repositories in molecular biology, such as GenBank, Swiss-Prot or OMIM, which usually concentrate on one type of information. The Integrated X Chromosome Database (IXDB, http://ixdb.mpimg-berlin-dahlem.mpg.de/) is a repository for physical mapping data of the human X chromosome and aims at providing a global view of genomic data at a chromosomal level. We present here an update of IXDB which includes schema extensions for storing submaps and sequence information, additional links to external databases, and the integration of an increasing number of physical and transcript mapping data. The gene data was completely updated according to the approved gene symbols of the HUGO Nomenclature Committee. IXDB receives over 1000 queries per month, an indication that its content is valuable to researchers seeking mapping data of the human X chromosome. Citation for the above abstract: Leser, U, Roest Crollius, H, Lehrach, H, Sudbrak, R IXDB, an X chromosome integrated database (update) Nucl. Acids Res. 1999 27: 123-127 © 1999 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/27/1/123 |
| 288. RHdb: Radiation Hybrid Database |
URL: http://corba.ebi.ac.uk/RHdb/ Categories: Human Genome Databases, Maps, and Viewers Since July 1995, the European Bioinformatics Institute (EBI) has maintained RHdb (http://www.ebi.ac.uk/RHdb), a public database for radiation hybrid data. Radiation hybrid mapping is an important technique for determining high resolution maps. RHdb is also served by CORBA servers. The EBI is an Outstation of the European Molecular Biology Laboratory (EMBL). Citation for the above abstract: Rodriguez-Tome, Patricia, Lijnzaad, Philip RHdb: the Radiation Hybrid database Nucl. Acids Res. 2001 29: 165-166 © 2001 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/29/1/165 |
| 289. TCAG: The Chromosome 7 Annotation Project |
URL: http://www.chr7.org/ Categories: Human Genome Databases, Maps, and Viewers DNA sequence and annotation of the entire human chromosome 7, encompassing nearly 158 million nucleotides of DNA and 1917 gene structures, are presented. To generate a higher order description, additional structural features such as imprinted genes, fragile sites, and segmental duplications were integrated at the level of the DNA sequence with medical genetic data, including 440 chromosome rearrangement breakpoints associated with disease. This approach enabled the discovery of candidate genes for developmental diseases including autism. Citation for the above abstract: Scherer SW, Cheung J, MacDonald JR, Osborne LR, Nakabayashi K, et al. Human chromosome 7: DNA sequence and biology. Science. 2003 May 2;300(5620):767-72. Epub 2003 Apr 10. © 2003 The American Association for the Advancement of Science. The full abstract can be found at: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=12690205&dopt=Abstract |
| 290. TRbase: A Database of Tandem Repeats in the Human Genome |
URL: http://bioinfo.ex.ac.uk/trbase/ Categories: Human Genome Databases, Maps, and Viewers MOTIVATION: Tandem repeats are associated with disease genes, play an important role in evolution and are important in genomic organisation and function. Although much research has been done on short perfect patterns of repeats, there has been less focus on imperfect repeats. Thus there is an acute need for a tandem repeats database that provides reliable and up to date information on both perfect and imperfect tandem repeats in the human genome and relates these to disease genes. RESULTS: This paper presents a web-accessible relational tandem repeat database that relates tandem repeats to gene locations and disease genes of the human genome. In contrast to other available databases, this database identifies both perfect and imperfect repeats of 1 to 2000 bp unit lengths. The utility of this database has been illustrated by analysing these repeats for their distribution and frequencies across chromosomes and genomic locations and between protein coding and non-coding regions. The applicability of this database to identify diseases associated with previously uncharacterised tandem repeats is demonstrated. AVAILABILITY: TRbase is available at http://bioinfo.ex.ac.uk/trbase. Citation for the above abstract: T. Boby, A.-M. Patch, and S. J. Aves TRbase: a database relating tandem repeats to disease genes for the human genome Bioinformatics Advance Access published on October 12, 2004, DOI 10.1093/bioinformatics/bti059. © 2004 Oxford University Press. The full abstract can be found at: http://bioinformatics.oupjournals.org/cgi/content/abstract/bti059v1 |
| 291. UCSC Genome Browser |
URL: http://genome.ucsc.edu/ Categories: Human Genome Databases, Maps, and Viewers The University of California Santa Cruz Genome Browser Database (GBD) contains sequence and annotation data for the genomes of about a dozen vertebrate species and several major model organisms. Genome annotations typically include assembly data, sequence composition, genes and gene predictions, mRNA and expressed sequence tag evidence, comparative genomics, regulation, expression and variation data. The database is optimized to support fast interactive performance with web tools that provide powerful visualization and querying capabilities for mining the data. The Genome Browser displays a wide variety of annotations at all scales from single nucleotide level up to a full chromosome. The Table Browser provides direct access to the database tables and sequence data, enabling complex queries on genome-wide datasets. The Proteome Browser graphically displays protein properties. The Gene Sorter allows filtering and comparison of genes by several metrics including expression data and several gene properties. BLAT and In Silico PCR search for sequences in entire genomes in seconds. These tools are highly integrated and provide many hyperlinks to other databases and websites. The GBD, browsing tools, downloadable data files and links to documentation and other information can be found at http://genome.ucsc.edu/. Citation for the above abstract: Hinrichs, A. S., Karolchik, D., Baertsch, R., Barber, G. P., Bejerano, G., Clawson, H., Diekhans, M., Furey, T. S., Harte, R. A., Hsu, F., Hillman-Jackson, J., Kuhn, R. M., Pedersen, J. S., Pohl, A., Raney, B. J., Rosenbloom, K. R., Siepel, A., Smith, K. E., Sugnet, C. W., Sultan-Qurraie, A., Thomas, D. J., Trumbower, H., Weber, R. J., Weirauch, M., Zweig, A. S., Haussler, D., Kent, W. J. The UCSC Genome Browser Database: update 2006 Nucl. Acids Res. 2006 34: D590-598 © 2006 Oxford University Press. 
The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D590 |
| 292. UniSTS |
URL: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=unists Categories: Human Genome Databases, Maps, and Viewers "UniSTS is a comprehensive database of sequence tagged sites (STSs) derived from STS-based maps and other experiments. STSs are defined by PCR primer pairs and are associated with additional information such as genomic position, genes, and sequences." |
| 293. FANTOM: Functional Annotation of Mouse |
URL: http://fantom2.gsc.riken.go.jp/ Categories: Human ORFs Only a small proportion of the mouse genome is transcribed into mature messenger RNA transcripts. There is an international collaborative effort to identify all full-length mRNA transcripts from the mouse, and to ensure that each is represented in a physical collection of clones. Here we report the manual annotation of 60,770 full-length mouse complementary DNA sequences. These are clustered into 33,409 'transcriptional units', contributing 90.1% of a newly established mouse transcriptome database. Of these transcriptional units, 4,258 are new protein-coding and 11,665 are new non-coding messages, indicating that non-coding RNA is a major component of the transcriptome. 41% of all transcriptional units showed evidence of alternative splicing. In protein-coding transcripts, 79% of splice variations altered the protein product. Whole-transcriptome analyses resulted in the identification of 2,431 sense-antisense pairs. The present work, completely supported by physical clones, provides the most comprehensive survey of a mammalian transcriptome so far, and is a valuable resource for functional genomics. Citation for the above abstract: Okazaki Y, Furuno M, Kasukawa T, Adachi J, Bono H, et al.; FANTOM Consortium; RIKEN Genome Exploration Research Group Phase I & II Team. Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs. Nature. 2002 Dec 5;420(6915):563-73. © 2002 Nature Publishing Group. The full text of the article can be found at: http://www.nature.com/cgi-taf/DynaPage.taf?file=/nature/journal/v420/n6915/full/nature01266_fs.html |
| 294. Hoppsigen: Homologous Processed Pseudogenes Database |
URL: http://pbil.univ-lyon1.fr/databases/hoppsigen.html Categories: Human ORFs Processed pseudogenes result from reverse transcribed mRNAs. In general, because processed pseudogenes lack promoters, they are no longer functional from the moment they are inserted into the genome. Subsequently, they freely accumulate substitutions, insertions and deletions. Moreover, the ancestral structure of processed pseudogenes could be easily inferred using the sequence of their functional homologous genes. Owing to these characteristics, processed pseudogenes represent good neutral markers for studying genome evolution. Recently, there is an increasing interest for these markers, particularly to help gene prediction in the field of genome annotation, functional genomics and genome evolution analysis (patterns of substitution). For these reasons, we have developed a method to annotate processed pseudogenes in complete genomes. To make them useful to different fields of research, we stored them in a nucleic acid database after having annotated them. In this work, we screened both mouse and human complete genomes from ENSEMBL to find processed pseudogenes generated from functional genes with introns. We used a conservative method to detect processed pseudogenes in order to minimize the rate of false positive sequences. Within processed pseudogenes, some are still having a conserved open reading frame and some have overlapping gene locations. We designated as retroelements all reverse transcribed sequences and more strictly, we designated as processed pseudogenes, all retroelements not falling in the two former categories (having a conserved open reading or overlapping gene locations). We annotated 5823 retroelements (5206 processed pseudogenes) in the human genome and 3934 (3428 processed pseudogenes) in the mouse genome. 
Compared to previous estimations, the total number of processed pseudogenes was underestimated but the aim of this procedure was to generate a high-quality dataset. To facilitate the use of processed pseudogenes in studying genome structure and evolution, DNA sequences from processed pseudogenes, and their functional reverse transcribed homologs, are now stored in a nucleic acid database, HOPPSIGEN. HOPPSIGEN can be browsed on the PBIL (Pole Bioinformatique Lyonnais) World Wide Web server (http://pbil.univ-lyon1.fr/) or fully downloaded for local installation. Citation for the above abstract: Khelifi, Adel, Duret, Laurent, Mouchiroud, Dominique HOPPSIGEN: a database of human and mouse processed pseudogenes Nucl. Acids Res. 2005 33: D59-66 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D59 |
| 295. H-InvDB: H-Invitational Database |
URL: http://www.h-invitational.jp/ Categories: Human ORFs The human genome sequence defines our inherent biological potential; the realization of the biology encoded therein requires knowledge of the function of each gene. Currently, our knowledge in this area is still limited. Several lines of investigation have been used to elucidate the structure and function of the genes in the human genome. Even so, gene prediction remains a difficult task, as the varieties of transcripts of a gene may vary to a great extent. We thus performed an exhaustive integrative characterization of 41,118 full-length cDNAs that capture the gene transcripts as complete functional cassettes, providing an unequivocal report of structural and functional diversity at the gene level. Our international collaboration has validated 21,037 human gene candidates by analysis of high-quality full-length cDNA clones through curation using unified criteria. This led to the identification of 5,155 new gene candidates. It also manifested the most reliable way to control the quality of the cDNA clones. We have developed a human gene database, called the H-Invitational Database (H-InvDB; http://www.h-invitational.jp/). It provides the following: integrative annotation of human genes, description of gene structures, details of novel alternative splicing isoforms, non-protein-coding RNAs, functional domains, subcellular localizations, metabolic pathways, predictions of protein three-dimensional structure, mapping of known single nucleotide polymorphisms (SNPs), identification of polymorphic microsatellite repeats within human genes, and comparative results with mouse full-length cDNAs. The H-InvDB analysis has shown that up to 4% of the human genome sequence (National Center for Biotechnology Information build 34 assembly) may contain misassembled or missing regions. 
We found that 6.5% of the human gene candidates (1,377 loci) did not have a good protein-coding open reading frame, of which 296 loci are strong candidates for non-protein-coding RNA genes. In addition, among 72,027 uniquely mapped SNPs and insertions/deletions localized within human genes, 13,215 nonsynonymous SNPs, 315 nonsense SNPs, and 452 indels occurred in coding regions. Together with 25 polymorphic microsatellite repeats present in coding regions, they may alter protein structure, causing phenotypic effects or resulting in disease. The H-InvDB platform represents a substantial contribution to resources needed for the exploration of human biology and pathology. Citation for the above abstract: Imanishi T, Itoh T, Suzuki Y, O'Donovan C, Fukuchi S, et al. (2004) Integrative Annotation of 21,037 Human Genes Validated by Full-Length cDNA Clones. PLoS Biol 2(6): e162. © 2004 By Imanishi et al. The full text of the article can be found at: http://biology.plosjournals.org/plosonline/?request=get-document&doi=10.1371/journal.pbio.0020162 |
| 296. HPRD: Human Protein Reference Database |
URL: http://www.hprd.org/ Categories: Human ORFs, Protein Property Databases The rapid pace at which genomic and proteomic data is being generated necessitates the development of tools and resources for managing data that allow integration of information from disparate sources. The Human Protein Reference Database (http://www.hprd.org) is a web-based resource based on open source technologies for protein information about several aspects of human proteins including protein-protein interactions, post-translational modifications, enzyme-substrate relationships and disease associations. This information was derived manually by a critical reading of the published literature by expert biologists and through bioinformatics analyses of the protein sequence. This database will assist in biomedical discoveries by serving as a resource of genomic and proteomic information and providing an integrated view of sequence, structure, function and protein networks in health and disease. Citation for the above abstract: Mishra, Gopa R., Suresh, M., Kumaran, K., Kannabiran, N., Suresh, Shubha, Bala, P., Shivakumar, K., Anuradha, N., Reddy, Raghunath, Raghavan, T. Madhan, Menon, Shalini, Hanumanthu, G., Gupta, Malvika, Upendran, Sapna, Gupta, Shweta, Mahesh, M., Jacob, Bincy, Mathew, Pinky, Chatterjee, Pritam, Arun, K. S., Sharma, Salil, Chandrika, K. N., Deshpande, Nandan, Palvankar, Kshitish, Raghavnath, R., Krishnakanth, R., Karathia, Hiren, Rekha, B., Nayak, Rashmi, Vishnupriya, G., Kumar, H. G. Mohan, Nagini, M., Kumar, G. S. Sameer, Jose, Rojan, Deepthi, P., Mohan, S. Sujatha, Gandhi, T. K. B., Harsha, H. C., Deshpande, Krishna S., Sarker, Malabika, Prasad, T. S. Keshava, Pandey, Akhilesh Human protein reference database--2006 update Nucl. Acids Res. 2006 34: D411-414 © 2006 Oxford University Press. The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D411 |
| 297. HUGE: Human Unidentified Gene-Encoded Large Proteins |
URL: http://www.kazusa.or.jp/huge/ Categories: Human ORFs We have been developing a Human Unidentified Gene-Encoded (HUGE) protein database (http://www.kazusa.or.jp/huge) to summarize results from sequence analysis of human novel large (>4 kb) cDNAs identified in the Kazusa cDNA sequencing project. At present, HUGE contains 2031 cDNA entries (KIAA cDNAs), for each of which a gene/protein characteristic table has been prepared. Since we have been shifting our research attention from the identification and cloning of novel cDNAs to the functional analysis of the proteins encoded by these cDNAs (KIAA proteins), we have not substantially increased the number of cDNA entries in HUGE for some time. Instead, we have manually curated 451 KIAA cDNAs in order to prepare a set of genetic resources to facilitate the functional analysis of KIAA proteins. In addition, we have updated the contents of the corresponding gene/protein characteristic tables in HUGE and have constructed two subsidiary databases, HUGEppi (http://www.kazusa.or.jp/huge/ppi) and ROUGE (http://www.kazusa.or.jp/rouge), to make available the results from our study of KIAA protein function. HUGEppi shows detailed information on protein-protein interactions detected between 84 pairs of KIAA proteins by yeast two-hybrid screening. ROUGE summarizes the results of computer-assisted analyses of approximately 1000 mouse homologues of human large cDNAs that we identified. Citation for the above abstract: Kikuno, Reiko, Nagase, Takahiro, Nakayama, Manabu, Koga, Hisashi, Okazaki, Noriko, Nakajima, Daisuke, Ohara, Osamu HUGE: a database for human KIAA proteins, a 2004 update integrating HUGEppi and ROUGE Nucl. Acids Res. 2004 32: D502-504 © 2004 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D502 |
| 298. HUNT: Human Full Length cDNA Database |
URL: http://www.hri.co.jp/HUNT/ Categories: Human ORFs The Helix Research Institute (HRI) in Japan is releasing 4356 HUman Novel Transcripts and related information in the newly established HUNT database. The institute is a joint research project principally funded by the Japanese Ministry of International Trade and Industry, and the clones were sequenced in the governmental New Energy and Industrial Technology Development Organization (NEDO) Human cDNA Sequencing Project. The HUNT database contains an extensive amount of annotation from advanced analysis and represents an essential bioinformatics contribution towards understanding of the gene function. The HRI human cDNA clones were obtained from full-length enriched cDNA libraries constructed with the oligo-capping method and have resulted in novel full-length cDNA sequences. A large fraction has little similarity to any proteins of known function and to obtain clues about possible function we have developed original analysis procedures. Any putative function deduced here can be validated or refuted by complementary analysis results. The user can also extract information from specific categories like PROSITE patterns, PFAM domains, PSORT localization, transmembrane helices and clones with GENIUS structure assignments. The HUNT database can be accessed at http://www.hri.co.jp/HUNT. Citation for the above abstract: Yudate, Henrik T., Suwa, Makiko, Irie, Ryotaro, Matsui, Hiroshi, Nishikawa, Tetsuo, Nakamura, Yoshitaka, Yamaguchi, Daisuke, Peng, Zhang Zhi, Yamamoto, Tomoyuki, Nagai, Keiichi, Hayashi, Koji, Otsuki, Tetsuji, Sugiyama, Tomoyasu, Ota, Toshio, Suzuki, Yutaka, Sugano, Sumio, Isogai, Takao, Masuho, Yasuhiko HUNT: launch of a full-length cDNA database from the Helix Research Institute Nucl. Acids Res. 2001 29: 185-188 © 2001 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/29/1/185 |
| 299. IPI: International Protein Index |
URL: http://www.ebi.ac.uk/IPI/IPIhelp.html Categories: Proteomics Databases Despite the complete determination of the genome sequence of several higher eukaryotes, their proteomes remain relatively poorly defined. Information about proteins identified by different experimental and computational methods is stored in different databases, meaning that no single resource offers full coverage of known and predicted proteins. IPI (the International Protein Index) has been developed to address these issues and offers complete nonredundant data sets representing the human, mouse and rat proteomes, built from the Swiss-Prot, TrEMBL, Ensembl and RefSeq databases. Citation for the above abstract: Kersey PJ, Duarte J, Williams A, Karavidopoulou Y, Birney E, Apweiler R. The International Protein Index: an integrated database for proteomics experiments. Proteomics. 2004 Jul;4(7):1985-8. © 2004 Wiley-Liss, Inc. The full abstract can be found at: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=15221759 |
| 300. LIFEdb: Database for Localization, Interaction, Functional Assays, and Expression of Proteins |
URL: http://www.lifedb.de/ Categories: Human ORFs LIFEdb (http://www.LIFEdb.de) integrates data from large-scale functional genomics assays and manual cDNA annotation with bioinformatics gene expression and protein analysis. New features of LIFEdb include (i) an updated user interface with enhanced query capabilities, (ii) a configurable output table and the option to download search results in XML, (iii) the integration of data from cell-based screening assays addressing the influence of protein-overexpression on cell proliferation and (iv) the display of the relative expression (‘Electronic Northern’) of the genes under investigation using curated gene expression ontology information. LIFEdb enables researchers to systematically select and characterize genes and proteins of interest, and presents data and information via its user-friendly web-based interface. Citation for the above abstract: Mehrle, Alexander, Rosenfelder, Heiko, Schupp, Ingo, del Val, Coral, Arlt, Dorit, Hahne, Florian, Bechtel, Stephanie, Simpson, Jeremy, Hofmann, Oliver, Hide, Winston, Glatting, Karl-Heinz, Huber, Wolfgang, Pepperkok, Rainer, Poustka, Annemarie, Wiemann, Stefan The LIFEdb database in 2006 Nucl. Acids Res. 2006 34: D415-418 © 2006 Oxford University Press. The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D415 |
| 301. MGC: Mammalian Gene Collection |
URL: http://mgc.nci.nih.gov/ Categories: Human ORFs The National Institutes of Health Mammalian Gene Collection (MGC) Program is a multiinstitutional effort to identify and sequence a cDNA clone containing a complete ORF for each human and mouse gene. ESTs were generated from libraries enriched for full-length cDNAs and analyzed to identify candidate full-ORF clones, which then were sequenced to high accuracy. The MGC has currently sequenced and verified the full ORF for a nonredundant set of >9,000 human and >6,000 mouse genes. Candidate full-ORF clones for an additional 7,800 human and 3,500 mouse genes also have been identified. All MGC sequences and clones are available without restriction through public databases and clone distribution networks (see http://mgc.nci.nih.gov). Citation for the above abstract: Strausberg RL, Feingold EA, Grouse LH, Derge JG, Klausner RD, et al.; Mammalian Gene Collection Program Team. Generation and initial analysis of more than 15,000 full-length human and mouse cDNA sequences. Proc Natl Acad Sci U S A. 2002 Dec 24;99(26):16899-903. Epub 2002 Dec 11. © 2002 The National Academy of Sciences of the United States of America. The full text of the article can be found at: http://www.pnas.org/cgi/content/full/99/26/16899 |
| 302. NetAffx Analysis Center |
URL: http://www.affymetrix.com/analysis/index.affx Categories: Human ORFs, Microarray Data and other Gene Expression Databases NetAffx (http://www.affymetrix.com) details and annotates probesets on Affymetrix GeneChip microarrays. These annotations include (i) static information specific to the probeset composition; (ii) sequence annotations extracted from public databases; and (iii) protein sequence-level annotations derived from public domain programs, as well as libraries of hidden Markov models (HMMs) developed at Affymetrix. For each probeset, NetAffx lists the probe sequences, and the consensus sequence interrogated by the probes; for the larger chip sets, interactive maps display this sequence data in genomic context. Sequence annotations include Gene Ontology (GO) terms and depiction of GO graph relationships; predicted protein domains and motifs; orthologous sequences; links to relevant pathways; and links to public databases including UniGene, LocusLink, SWISS-PROT and OMIM. Citation for the above abstract: Liu, Guoying, Loraine, Ann E., Shigeta, Ron, Cline, Melissa, Cheng, Jill, Valmeekam, Venu, Sun, Shaw, Kulp, David, Siani-Rose, Michael A. NetAffx: Affymetrix probesets and annotations Nucl. Acids Res. 2003 31: 82-86 © 2003 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/82 |
| 303. ORFDB |
URL: http://orf.invitrogen.com Categories: Human ORFs The ORFDB (http://orf.invitrogen.com/) represents an ongoing effort at Invitrogen Corporation to integrate relevant scientific data with an evolving collection of human and mouse Open Reading Frame (ORF) clones (Ultimate ORF Clones). The ORFDB serves as a central data warehouse enabling researchers to search the ORF collection through its web portal ORFBrowser, allowing researchers to find the Ultimate ORF clones by blast, keyword, GenBank accession, gene symbol, clone ID, Unigene ID, LocusLink ID or through functional relationships by browsing the collection via the Gene Ontology (GO) Browser. As of October 2003, the ORFDB contains 6200 human and 2870 mouse Ultimate ORF clones. All Ultimate ORF clones have been fully sequenced with high quality, and are matched to public reference protein sequences. In addition, the cloned ORFs have been extensively annotated across six categories: Gene, ORF, Clone Format, Protein, SNP and Genomic links, with the information assembled in a format termed the ORFCard. The ORFCard represents an information repository that documents the sequence quality, alignment with respect to public protein sequences, and the latest publicly available information associated with each human and mouse gene represented in the collection. Citation for the above abstract: Liang, Feng, Matrubutham, Udayakumar, Parvizi, Babak, Yen, Jessica, Duan, Daniel, Mirchandani, Jyotika, Hashima, Sandra, Nguyen, Uyen, Ubil, Eric, Loewenheim, Jake, Yu, Xin, Sipes, Sara, Williams, Wendy, Wang, Ling, Bennett, Robert, Carrino, John ORFDB: an information resource linking scientific content to a high-quality Open Reading Frame (ORF) collection Nucl. Acids Res. 2004 32: D595-599 © 2004 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D595 |
| 304. ParaDB |
URL: http://abi.marseille.inserm.fr/paradb/ Categories: Human ORFs We present ParaDB (http://abi.marseille.inserm.fr/paradb/), a new database for large-scale paralogy studies in vertebrate genomes. We intended to collect all information (sequence, mapping and phylogenetic data) needed to map and detect new paralogous regions, previously defined as Paralogons. The AceDB database software was used to generate graphical objects and to organize data. General data were automatically collated from public sources (Ensembl, GadFly and RefSeq). ParaDB provides access to data derived from whole genome sequences (Homo sapiens, Mus musculus and Drosophila melanogaster): cDNA and protein sequences, positional information, bibliographical links. In addition, we provide BLAST results for each protein sequence, InParanoid orthologs and 'In-Paralogs' data, previously established paralogy data, and, to compare vertebrates and Drosophila, orthology data. Citation for the above abstract: Leveugle, Magalie, Prat, Karine, Perrier, Nadine, Birnbaum, Daniel, Coulier, Francois ParaDB: a tool for paralogy mapping in vertebrate genomes Nucl. Acids Res. 2003 31: 63-67 © 2003 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/63 |
| 305. STACKdb |
URL: http://www.sanbi.ac.za/Dbases.html Categories: Human ORFs STACK is a tool for detection and visualisation of expressed transcript variation in the context of developmental and pathological states. The datasystem organizes and reconstructs human transcripts from available public data in the context of expression state. The expression state of a transcript can include developmental state, pathological association, site of expression and isoform of expressed transcript. STACK consensus transcripts are reconstructed from clusters that capture and reflect the growing evidence of transcript diversity. The comprehensive capture of transcript variants is achieved by the use of a novel clustering approach that is tolerant of sub-sequence diversity and does not rely on pairwise alignment. This is in contrast with other gene indexing projects. STACK is generated at least four times a year and represents the exhaustive processing of all publicly available human EST data extracted from GenBank. This processed information can be explored through 15 tissue-specific categories, a disease-related category and a whole-body index and is accessible via WWW at http://www.sanbi.ac.za/Dbases.html. STACK represents a broadly applicable resource, as it is the only reconstructed transcript database for which the tools for its generation are also broadly available (http://www.sanbi.ac.za/CODES). Citation for the above abstract: Christoffels, Alan, Gelder, Antoine van, Greyling, Gary, Miller, Robert, Hide, Tania, Hide, Winston STACK: Sequence Tag Alignment and Consensus Knowledgebase Nucl. Acids Res. 2001 29: 234-238 © 2001 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/29/1/234 |
| 306. SYSTERS: Large-scale Protein Clustering and Protein Family Database |
URL: http://systers.molgen.mpg.de/ Categories: Human ORFs The SYSTERS project aims to provide a meaningful partitioning of the whole protein sequence space by a fully automatic procedure. A refined two-step algorithm assigns each protein to a family and a superfamily. The sequence data underlying SYSTERS release 4 now comprise several protein sequence databases derived from completely sequenced genomes (ENSEMBL, TAIR, SGD and GeneDB), in addition to the comprehensive Swiss-Prot/TrEMBL databases. The SYSTERS web server (http://systers.molgen.mpg.de) provides access to 158 153 SYSTERS protein families. To augment the automatically derived results, information from external databases like Pfam and Gene Ontology are added to the web server. Furthermore, users can retrieve pre-processed analyses of families like multiple alignments and phylogenetic trees. New query options comprise a batch retrieval tool for functional inference about families based on automatic keyword extraction from sequence annotations. A new access point, PhyloMatrix, allows the retrieval of phylogenetic profiles of SYSTERS families across organisms with completely sequenced genomes. Citation for the above abstract: Meinel, Thomas, Krause, Antje, Luz, Hannes, Vingron, Martin, Staub, Eike The SYSTERS Protein Family Database in 2005 Nucl. Acids Res. 2005 33: D226-229 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D226 |
| 307. trome, trEST, and trGEN |
URL: ftp://ftp.isrec.isb-sib.ch/pub/databases/ Categories: Human ORFs We previously introduced two new protein databases (trEST and trGEN) of hypothetical protein sequences predicted from EST and HTG sequences, respectively. Here, we present the updates made on these two databases plus a new database (trome), which uses alignments of EST data to HTG or full genomes to generate virtual transcripts and coding sequences. This new database is of higher quality and since it contains the information in a much denser format it is of much smaller size. These new databases are in a Swiss-Prot-like format and are updated on a weekly basis (trEST and trGEN) or every 3 months (trome). They can be downloaded by anonymous ftp from ftp://ftp.isrec.isb-sib.ch/pub/databases. Citation for the above abstract: Sperisen, Peter, Iseli, Christian, Pagni, Marco, Stevenson, Brian J., Bucher, Philipp, Jongeneel, C. Victor trome, trEST and trGEN: databases of predicted protein sequences Nucl. Acids Res. 2004 32: D509-511 © 2004 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D509 |
| 308. TIGR Gene Indices |
URL: http://www.tigr.org/tdb/tgi/ Categories: General Genomics Databases, Human ORFs Although the list of completed genome sequencing projects has expanded rapidly, sequencing and analysis of expressed sequence tags (ESTs) remain a primary tool for discovery of novel genes in many eukaryotes and a key element in genome annotation. The TIGR Gene Indices (http://www.tigr.org/tdb/tgi) are a collection of 77 species-specific databases that use a highly refined protocol to analyze gene and EST sequences in an attempt to identify and characterize expressed transcripts and to present them on the Web in a user-friendly, consistent fashion. A Gene Index database is constructed for each selected organism by first clustering, then assembling EST and annotated cDNA and gene sequences from GenBank. This process produces a set of unique, high-fidelity virtual transcripts, or tentative consensus (TC) sequences. The TC sequences can be used to provide putative genes with functional annotation, to link the transcripts to genetic and physical maps, to provide links to orthologous and paralogous genes, and as a resource for comparative and functional genomic analysis. Citation for the above abstract: Lee, Y., Tsai, J., Sunkara, S., Karamycheva, S., Pertea, G., Sultana, R., Antonescu, V., Chan, A., Cheung, F., Quackenbush, J. The TIGR Gene Indices: clustering and assembling EST and known genes and integration with eukaryotic genomes Nucl. Acids Res. 2005 33: D71-74 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D71 |
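The "first clustering, then assembling" step described in the abstract above can be illustrated with a toy sketch. This is not the TIGR protocol: the shared k-mer criterion, the function names `kmers` and `cluster_sequences`, and the parameters `k` and `min_shared` are all assumptions made for illustration. It covers only the clustering half, grouping sequences that share at least one k-mer by single linkage, and does not attempt assembly into tentative consensus sequences.

```python
from itertools import combinations


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_sequences(seqs, k=8, min_shared=1):
    """Single-linkage clustering of named sequences by shared k-mers.

    seqs maps a sequence id to its nucleotide string. Two sequences are
    linked when they share at least min_shared k-mers (a hypothetical
    stand-in for a real overlap criterion). Returns sorted lists of ids.
    """
    parent = {name: name for name in seqs}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    profiles = {name: kmers(s, k) for name, s in seqs.items()}
    for a, b in combinations(seqs, 2):
        if len(profiles[a] & profiles[b]) >= min_shared:
            parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for name in seqs:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Made-up example reads: est1 and est2 overlap, est3 is unrelated.
example = {
    "est1": "ACGTACGTTTGACCA",
    "est2": "CGTTTGACCAGGATC",
    "est3": "TTTTTTTTGGGGGGGG",
}
print(cluster_sequences(example, k=8))
```

The all-pairs comparison is quadratic in the number of sequences, so a sketch like this is only suitable for small inputs; production pipelines rely on indexed similarity searches instead.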
| 309. UniGene |
URL: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=unigene Categories: Human ORFs UniGene (16), is a system for automatically partitioning GenBank sequences, including ESTs, into a non-redundant set of gene-oriented clusters. UniGene clusters are created for all organisms for which there are 70 000 or more ESTs in GenBank and now includes ESTs from 16 animals and 13 plants. Each UniGene cluster contains sequences that represent a unique gene, and is linked to related information, such as the tissue types in which the gene is expressed, model organism protein similarities, the LocusLink report for the gene and its map location. In the human UniGene June 2003 release (build 161), over 5.5 million human ESTs in GenBank have been reduced 50-fold in number to 108 000 sequence clusters. The UniGene collection has been used as a source of unique sequences for the fabrication of microarrays for the large-scale study of gene expression (17). UniGene databases are updated weekly with new EST sequences, and bimonthly with newly characterized sequences. Citation for the above excerpt: Wheeler, David L., Church, Deanna M., Edgar, Ron, Federhen, Scott, Helmberg, Wolfgang, Madden, Thomas L., Pontius, Joan U., Schuler, Gregory D., Schriml, Lynn M., Sequeira, Edwin, Suzek, Tugba O., Tatusova, Tatiana A., Wagner, Lukas Database resources of the National Center for Biotechnology Information: update Nucl. Acids Res. 2004 32: D35-40 © 2004 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D35 |
| 310. VEGA: The Vertebrate Genome Annotation Database |
URL: http://vega.sanger.ac.uk/ Categories: Human ORFs, Model Organisms and Comparative Genomics Databases The Vertebrate Genome Annotation (Vega) database (http://vega.sanger.ac.uk) has been designed to be a community resource for browsing manual annotation of finished sequences from a variety of vertebrate genomes. Its core database is based on an Ensembl-style schema, extended to incorporate curation-specific metadata. In collaboration with the genome sequencing centres, Vega attempts to present consistent high-quality annotation of the published human chromosome sequences. In addition, it is also possible to view various finished regions from other vertebrates, including mouse and zebrafish. Vega displays only manually annotated gene structures built using transcriptional evidence, which can be examined in the browser. Attempts have been made to standardize the annotation procedure across each vertebrate genome, which should aid comparative analysis of orthologues across the different finished regions. Citation for the above abstract: Ashurst, J. L., Chen, C.-K., Gilbert, J. G. R., Jekosch, K., Keenan, S., Meidl, P., Searle, S. M., Stalker, J., Storey, R., Trevanion, S., Wilming, L., Hubbard, T. The Vertebrate Genome Annotation (Vega) database Nucl. Acids Res. 2005 33: D459-465 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D459 |
| 311. INFEVERS |
URL: http://fmf.igh.cnrs.fr/infevers/ Categories: Gene-, System-, or Disease- Specific Databases We have established the INFEVERS--INternet periodic FEVERS--website (which is freely accessible at http://fmf.igh.cnrs.fr/infevers/). Our objectives were to develop a specialist site to gather updated information on mutations responsible for hereditary inflammatory disorders: i.e. Familial Mediterranean Fever (FMF), TRAPS (TNF Receptor 1A Associated Syndrome), HIDS (HyperIgD Syndrome), MWS (Muckle-Wells Syndrome)/FCU (Familial Cold Urticaria)/CINCA (Chronic Infantile Neurological Cutaneous and Articular Syndrome). Contributors submit their novel mutations through a 3 step form. Depending on the disease concerned, a member of the editorial board is automatically solicited to overview and validate new submissions, via a special secured web interface. If accepted, the new mutation is available on the INFEVERS web site and the discoverer, who is informed by email, is credited by having his/her name and date of the discovery on the site. The INFEVERS gateway provides researchers and clinicians with a common access location for information on similar diseases, allowing a rapid overview of the corresponding genetic defects at a glance. Furthermore, it is interactive and extendable according to the latest genes discovered. Citation for the above abstract: Sarrauste de Menthiere, Cyril, Terriere, Stephane, Pugnere, Denis, Ruiz, Manuel, Demaille, Jacques, Touitou, Isabelle INFEVERS: the Registry for FMF and hereditary inflammatory disorders mutations Nucl. Acids Res. 2003 31: 282-285 © 2003 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/282 |
| 312. KinMutBase: A Registry of Disease-causing Mutations in Protein Kinase Domains |
URL: http://bioinf.uta.fi/KinMutBase/ Categories: Gene-, System-, or Disease- Specific Databases KinMutBase (http://www.uta.fi/imt/bioinfo/KinMutBase/) is a registry of mutations in human protein kinases related to disorders. Kinases are essential cellular signaling molecules, in which mutations can lead to diseases, including immunodeficiencies, cancers and endocrine disorders. The first release of KinMutBase contained information for protein tyrosine kinases. The current release includes also serine/threonine protein kinases, as well as an update of the tyrosine kinases. There are 251 entries altogether, representing 337 families and 621 patients. Mutations appear both in conserved hallmark residues of the kinases as well as in non-homologous sites. The KinMutBase WWW pages provide plenty of information, namely mutation statistics and display, clickable sequences with mutations and changes to restriction enzyme patterns. Citation for the above abstract: Stenberg, Kaj A. E., Riikonen, Pentti T., Vihinen, Mauno KinMutBase, a database of human disease-causing protein kinase mutations Nucl. Acids Res. 2000 28: 369-371 © 2000 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/28/1/369 |
| 313. Lowe Syndrome Mutation Database |
URL: http://research.nhgri.nih.gov/lowe/ Categories: Gene-, System-, or Disease- Specific Databases "Lowe oculocerebrorenal syndrome is an X-linked disorder caused by mutations in the OCRL1 gene, which encodes a 105-kDa Golgi protein with phosphatidylinositol (4,5) bisphosphate 5-phosphatase activity. A database of mutations causing Lowe syndrome has been established. Information on new mutations may be submitted online." |
| 314. NCL Mutation Database |
URL: http://www.ucl.ac.uk/ncl/ Categories: Gene-, System-, or Disease- Specific Databases The neuronal ceroid lipofuscinoses (NCL), also known as Batten disease, are a group of inherited severe neurodegenerative disorders primarily affecting children. They are characterised by the accumulation of autofluorescent storage material in many cells. Children suffer from visual failure, seizures, progressive physical and mental decline and premature death, associated with the loss of cortical neurones. Six genes have been identified that cause human NCL (CLN1, CLN2, CLN3, CLN5, CLN6, CLN8), and approximately 150 mutations have been described. The majority of mutations result in a characteristic disease course for each gene. However, mutations associated with later disease onset or a more protracted disease course have also been described. At least seven common mutations exist, either with a world-wide distribution or associated with families from specific countries. All mutations are described in the NCL Mutation Database (http://www.ucl.ac.uk/ncl). Citation for the above abstract: Mole SE. The genetic spectrum of human neuronal ceroid-lipofuscinoses. Brain Pathol. 2004 Jan;14(1):70-6. © 2003 International Society of Neuropathology The full abstract can be found at: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=14997939 |
| 315. PAHdb: Phenylalanine Hydroxylase Locus Knowledgebase |
URL: http://www.pahdb.mcgill.ca/ Categories: Gene-, System-, or Disease- Specific Databases PAHdb (http://www.mcgill.ca/pahdb) is a curated relational database (Fig. 1) of nucleotide variation in the human PAH cDNA (GenBank U49897). Among 328 different mutations by state (Fig. 2) the majority are rare mutations causing hyperphenylalaninemia (HPA) (OMIM 261600), the remainder are polymorphic variants without apparent effect on phenotype. PAHdb modules contain mutations, polymorphic haplotypes, genotype-phenotype correlations, expression analysis, sources of information and the reference sequence; the database also contains pages of clinical information and data on three ENU mouse orthologues of human HPA. Only six different mutations account for 60% of human HPA chromosomes worldwide, mutations stratify by population and geographic region, and the Oriental and Caucasian mutation sets are different (Fig. 3). PAHdb provides curated electronic publication and one third of its incoming reports are direct submissions. Each different mutation receives a systematic (nucleotide) name and a unique identifier (UID). Data are accessed both by a Newsletter and a search engine on the website; integrity of the database is ensured by keeping the curated template offline. There have been >6500 online interrogations of the website. Citation for the above abstract: Nowacki, PM, Byck, S, Prevost, L, Scriver, CR PAH Mutation Analysis Consortium Database: 1997. Prototype for relational locus-specific mutation databases Nucl. Acids Res. 1998 26: 220-225 © 1998 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/26/1/220 |
| 316. PEDB: Prostate Expression Databases |
URL: http://www.pedb.org/ Categories: Gene-, System-, or Disease- Specific Databases, Microarray Data and other Gene Expression Databases The Prostate Expression Databases (PEDB and mPEDB) are online resources designed to allow researchers to access and analyze gene expression information derived from the human and murine prostate, respectively. Human PEDB archives more than 84 000 Expressed Sequence Tags (ESTs) from 38 prostate cDNA libraries in a curated relational database that provides detailed library information including tissue source, library construction methods, sequence diversity and sequence abundance. The differential expression of each EST species can be viewed across all libraries using a Virtual Expression Analysis Tool (VEAT), a graphical user interface written in Java for intra- and inter-library sequence comparisons. Recent enhancements to PEDB include (i) the development of a murine prostate expression database, mPEDB, that complements the human gene expression information in PEDB, (ii) the assembly of a non-redundant sequence set or 'prostate unigene' that represents the diversity of gene expression in the prostate, and (iii) an expanded search tool that supports both text-based and BLAST queries. PEDB and mPEDB are accessible via the World Wide Web at http://www.pedb.org and http://www.mpedb.org. Citation for the above abstract: Nelson, Peter S., Pritchard, Colin, Abbott, Denise, Clegg, Nigel The human (PEDB) and mouse (mPEDB) Prostate Expression Databases Nucl. Acids Res. 2002 30: 218-220 © 2002 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/30/1/218 |
| 317. PGDB: Human Prostate Gene DataBase |
URL: http://www.ucsf.edu/pgdb/ Categories: Gene-, System-, or Disease- Specific Databases The Prostate Gene Database (PGDB: http://www.ucsf.edu/pgdb) is a curated and integrated database of genes or genomic loci related to the human prostate and prostatic diseases. Currently, PGDB covers genes involved in a number of molecular and genetic events of the prostate including gene amplification, mutation, gross deletion, methylation, polymorphism, linkage and over-expression, as published in the literature. Genes that are specifically expressed in prostate, as evidenced by analysis of data from expressed sequence tags (ESTs) and serial analysis of gene expression (SAGE), are also included. There are a total of 165 unique entries in the database. Users can either browse or query the PGDB through a web interface. For each gene, in addition to basic gene information and rich cross-references to other databases, inclusive and relevant literature references are provided to support the inclusion of the gene in the database. Detailed expression data calculated from the UniGene and SAGEmap databases are also presented. Citation for the above abstract: Li, Long-Cheng, Zhao, Hong, Shiina, Hiroaki, Kane, Christopher J., Dahiya, Rajvir PGDB: a curated and integrated database of genes related to the prostate Nucl. Acids Res. 2003 31: 291-293 © 2003 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/291 |
| 318. PHEXdb |
URL: http://www.phexdb.mcgill.ca/ Categories: Gene-, System-, or Disease- Specific Databases X-linked hypophosphatemia (XLH) is a dominant disorder of phosphate (Pi) homeostasis characterized by growth retardation, rachitic and osteomalacic bone disease, hypophosphatemia, and renal defects in Pi reabsorption and vitamin D metabolism. The gene responsible for XLH was identified by positional cloning and designated PHEX (formerly PEX) to depict a Phosphate regulating gene with homology to Endopeptidases on the X chromosome. To date, 131 mutations in the PHEX gene have been reported. We undertook to centralize information on mutations in the PHEX gene by establishing a database search tool, PHEXdb (http://data.mch.mcgill.ca/phexdb). This site is dedicated to the collection and distribution of information on PHEX mutations, and is accessible to the scientific community. PHEXdb provides a submission form to allow the addition of newly identified mutations in the PHEX gene. Users can search the database by mutation, phenotype, and authors who have published or submitted mutations. The PHEXdb home page includes links to information pages, which refer to recent publications on PHEX, XLH, and murine Hyp and Gy homologues, and to other web pages relevant to XLH. This resource will facilitate the identification of PHEX structure-function relationships and phenotype-genotype correlations. Citation for the above abstract: Sabbagh Y, Jones AO, Tenenhouse HS. PHEXdb, a locus-specific database for mutations causing X-linked hypophosphatemia. Hum Mutat. 2000;16(1):1-6. © 2000 Wiley-Liss, Inc. The full abstract can be found at: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=10874297 |
| 319. PTCH1 Mutation Database |
URL: http://www.cybergene.se/cgi-bin/w3-msql/ptchbase/index.html Categories: Gene-, System-, or Disease- Specific Databases "The aim of the present database is to provide an easily accessible resource to researchers and geneticists listing all available information on mutations and polymorphisms in the PTCH1 gene. The information is freely accessible and can be freely used provided the source of information is acknowledged." |
| 320. RB1 Gene Mutation Database |
URL: http://www.d-lohmann.de/Rb/ Categories: Gene-, System-, or Disease- Specific Databases Mutations in both alleles of the RB1 gene are causal for the development of retinoblastoma, a childhood tumor of the eye. The spectrum of somatic and germline mutations in this gene is dominated by small mutations. Data on small mutations are listed in a locus specific database available at http://www.d-lohmann.de/Rb/mutations.html. Analysis of 368 reported small mutations reveals considerable heterogeneity. A notable recurrence of transitions is observed at 13 CpG-dinucleotides that are part of CGA codons or splice donor sites. Most mutations create a premature termination codon. With few exceptions, patients heterozygous for mutations of this kind develop bilateral retinoblastoma. Missense mutations and inframe deletions are rare. Some of these mutations are associated with a distinct phenotype marked by incomplete penetrance and reduced expressivity. Citation for the above abstract: Lohmann DR. RB1 gene mutations in retinoblastoma. Hum Mutat. 1999;14(4):283-8. © 1999 Wiley-Liss, Inc. The full abstract can be found at: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=10502774 |
| 321. SCAdb: A Candidate Gene Database for Spinocerebellar Ataxias |
URL: http://ymbc.ym.edu.tw/cgi-bin/SCA/list.cgi?display=map Categories: Gene-, System-, or Disease- Specific Databases "The SCA database is a human spinocerebellar ataxia disease database designed to provide clinicians and researchers who study this disease the information of all possible short tandem repeats-containing genes within the disease-mapped chromosome regions, their genetic-links, priority selection tool, primer design tools and some of the experimental test results of these genes. The SCA database candidate genes are generated from the database containing various sequence phases of NCBI High-throughput Genomic Sequences (HTGS) located on the disease mapped chromosome region determined by several genetic and physical maps. These sequences are annotated by the NCBI UniGene database and identical sequences from different species from PIR NREF database. The web site provides direct text search (based on keywords or terms), conditional search (STR length, types or relationship between repeats and gene), priority selection (define the scoring matrix) and sequence search (BLAST, pattern search, aroundSTR and candidate genes match)." |
| 322. T1DBase: Type 1 Diabetes Database |
URL: http://t1dbase.org/ Categories: Gene-, System-, or Disease- Specific Databases T1DBase (http://T1DBase.org) is a public website and database that supports the type 1 diabetes (T1D) research community. The site is currently focused on the molecular genetics and biology of T1D susceptibility and pathogenesis. It includes the following datasets: annotated genome sequence for human, rat and mouse; information on genetically identified T1D susceptibility regions in human, rat and mouse, and genetic linkage and association studies pertaining to T1D; descriptions of NOD mouse congenic strains; the Beta Cell Gene Expression Bank, which reports expression levels of genes in beta cells under various conditions, and annotations of gene function in beta cells; data on gene expression in a variety of tissues and organs; and biological pathways from KEGG and BioCarta. Tools on the site include the GBrowse genome browser, site-wide context dependent search, Connect-the-Dots for connecting gene and other identifiers from multiple data sources, Cytoscape for visualizing and analyzing biological networks, and the GESTALT workbench for genome annotation. All data are open access and all software is open source. Citation for the above abstract: Smink, Luc J., Helton, Erin M., Healy, Barry C., Cavnor, Christopher C., Lam, Alex C., Flamez, Daisy, Burren, Oliver S., Wang, Yang, Dolman, Geoffrey E., Burdick, David B., Everett, Vincent H., Glusman, Gustavo, Laneri, Davide, Rowen, Lee, Schuilenburg, Helen, Walker, Neil M., Mychaleckyj, Josyf, Wicker, Linda S., Eizirik, Decio L., Todd, John A., Goodman, Nathan T1DBase, a community web-based resource for type 1 diabetes research Nucl. Acids Res. 2005 33: D544-549 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D544 |
| 323. The Autism Chromosome Rearrangement Database |
URL: http://projects.tcag.ca/autism/ Categories: Gene-, System-, or Disease- Specific Databases Autism is a neurodevelopmental disorder characterized by clinical, etiologic and genetic heterogeneity. It is often associated with other conditions, such as disorders of the CNS (tuberous sclerosis), developmental delay, attention deficit, epilepsy, and anxiety and mood disorders. Our survey found cytogenetically visible chromosomal anomalies in ~7.4% (129/1749) of autistic patients documented as well as several sub-microscopic variants. Almost every chromosome is affected by numeric or structural aberrations. Among the most consistent cytogenetics findings are fragile X and duplication of maternal 15q11-q13. Molecular cytogenetics, together with genome scans and linkage/association studies, point to ≥22 chromosome regions harbouring putative autism susceptibility genes, such as 2q32, 3q25-q27, 7q31-q35, 15q11-q13, 16p13, Xp22, and Xq13. We hypothesize that there might be at least three types of autism susceptibility genes/mutations that can be (i) specific to an individual patient or family, (ii) in a genetically isolated sub-population and (iii) a common factor shared amongst different populations. The genes/mutations could act alone or interact with other genetic and/or epigenetic or environmental factors, causing autism or related disorders. This review emphasizes the potential of analysing chromosomal rearrangements as a means to rapidly define candidate disease loci for further investigation. To facilitate ongoing research we have established a new database of autism-associated chromosomal anomalies (http://tcag.bioinfo.sickkids.on.ca/autism). Citation for the above abstract: Xu, J., Zwaigenbaum, L., Szatmari, P. and Scherer, S.W. Molecular Cytogenetics of Autism. Current Genomics 5(4), 347-364. 2004. © 2004 Bentham Science Publishers Ltd. The full text of the article can be found at: http://projects.tcag.ca/autism/XuJie-MS.pdf |
| 324. The Lafora Progressive Myoclonus Epilepsy Mutation and Polymorphism Database |
URL: http://projects.tcag.ca/lafora/ Categories: Gene-, System-, or Disease- Specific Databases "The data can be viewed using the XRT Table Browser, and where possible, links to external sources such as NCBI, Pubmed are provided." |
| 325. ANTIMIC |
URL: http://research.i2r.a-star.edu.sg/Templar/DB/ANTIMIC/ Categories: Drug and Drug Design Databases Antimicrobial peptides (AMPs) are important components of the innate immune system of many species. These peptides are found in eukaryotes, including mammals, amphibians, insects and plants, as well as in prokaryotes. Other than having pathogen-lytic properties, these peptides have other activities like antitumor activity, mitogen activity, or they may act as signaling molecules. Their short length, fast and efficient action against microbes and low toxicity to mammals have made them potential candidates as peptide drugs. In many cases they are effective against pathogens that are resistant to conventional antibiotics. They can serve as natural templates for the design of novel antimicrobial drugs. Although there are vast amounts of data on natural AMPs, they are not available through one central resource. We have developed a comprehensive database (ANTIMIC, http://research.i2r.a-star.edu.sg/Templar/DB/ANTIMIC/) of known and putative AMPs, which contains approximately 1700 of these peptides. The database is integrated with tools to facilitate efficient extraction of data and their analysis at molecular level, as well as search for new AMPs. These tools include BLAST, PDB structure viewer and the Antimic profile module. Citation for the above abstract: Brahmachary, M., Krishnan, S. P. T., Koh, J. L. Y., Khan, A. M., Seah, S. H., Tan, T. W., Brusic, V., Bajic, V. B. ANTIMIC: a database of antimicrobial sequences Nucl. Acids Res. 2004 32: D586-589 © 2004 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D586 |
| 326. APD: Antimicrobial Peptide Database |
URL: http://aps.unmc.edu/AP/main.php Categories: Drug and Drug Design Databases An antimicrobial peptide database (APD) has been established based on an extensive literature search. It contains detailed information for 525 peptides (498 antibacterial, 155 antifungal, 28 antiviral and 18 antitumor). APD provides interactive interfaces for peptide query, prediction and design. It also provides statistical data for a select group of or all the peptides in the database. Peptide information can be searched using keywords such as peptide name, ID, length, net charge, hydrophobic percentage, key residue, unique sequence motif, structure and activity. APD is a useful tool for studying the structure-function relationship of antimicrobial peptides. The database can be accessed via a web-based browser at the URL: http://aps.unmc.edu/AP/main.html. Citation for the above abstract: Wang, Zhe, Wang, Guangshun APD: the Antimicrobial Peptide Database Nucl. Acids Res. 2004 32: D590-592 © 2004 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D590 |
| 327. DART: Drug Adverse Reaction Targets |
URL: http://xin.cz3.nus.edu.sg/group/drt/dart.asp Categories: Drug and Drug Design Databases An adverse drug reaction (ADR) often results from interaction of a drug or its metabolites with specific protein targets important in normal cellular function. Knowledge about these targets is both important in facilitating the study of the mechanisms of ADRs and in new drug discovery. It is also useful in the development and testing of rational drug design and safety evaluation tools. The Drug Adverse Reaction Database (DART) is intended to provide comprehensive information about adverse effect targets of drugs described in the literature. Moreover, proteins involved in adverse effect targets of chemicals not yet confirmed as ADR targets are also included as potential targets. This database gives physiological function of each target, binding drugs/agonists/antagonists/activators/inhibitors, IC(50) values of the inhibitors, corresponding adverse effects, and type of ADR induced by drug binding to a target. Cross-links to other databases are also introduced to facilitate the access of information about the sequence, 3-dimensional structure, function, and nomenclature of each target along with drug/ligand binding properties, and related literature. The database currently contains entries for 147 ADR targets and 89 potential targets. A total of 187 adverse reaction conditions, 257 drugs, and 1080 ligands known to bind to each of these targets are also currently described. Each entry can be retrieved through multiple search methods including target name, target physiological function, adverse effect, ligand name, and biological pathways. A special page is provided for contribution of new or additional information. This database can be accessed at http://xin.cz3.nus.edu.sg/group/drt/dart.asp. Citation for the above abstract: Ji ZL, Han LY, Yap CW, Sun LZ, Chen X, Chen YZ. Drug Adverse Reaction Target Database (DART) : proteins related to adverse drug reactions. Drug Saf. 
2003;26(10):685-90. © 2003 Adis International. The full abstract can be found at: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=12862503 |
| 328. HIV Drug Resistance Database |
URL: http://resdb.lanl.gov/Resist_DB/default.htm Categories: Drug and Drug Design Databases, HIV/AIDS Databases, Viral Databases "The database is a compilation of mutations in HIV genes that confer resistance to anti-HIV drugs. To begin a search simply fill out the search form and press the "Start Search" button. The resulting list of records that match your search parameters is displayed as a table. The table does not show all the fields for each record (for example, most bibliographic information is not shown). To see a detailed view about a particular record click on the leftmost field (Gene) which is an active link to the detailed view in which all information is displayed." |
| 329. HIV Molecular Immunology Database |
URL: http://www.hiv.lanl.gov/content/immunology/index.html/ Categories: Drug and Drug Design Databases, HIV/AIDS Databases, Viral Databases "The HIV Molecular Immunology Database is an annotated, searchable collection of HIV-1 cytotoxic and helper T-cell epitopes and antibody binding sites. These data are also printed in the HIV Molecular Immunology compendium which is updated yearly and provided free of charge to scientific researchers ... The goal of this database is to provide a comprehensive listing of defined HIV epitopes." |
| 330. Peptaibol Database |
URL: http://www.cryst.bbk.ac.uk/peptaibol/ Categories: Drug and Drug Design Databases, Individual Protein Family Databases The Peptaibol Database is a sequence and structure resource for the unusual class of peptides known as peptaibols. These peptides exhibit antibiotic and membrane channel-forming activities. The database includes sequence, biological source and bibliographical data for the naturally occurring peptaibols. Information is also collated for the growing number of peptaibol 3D structures determined by either crystallography or NMR spectroscopy. The database can be obtained as a whole or can be queried by name, group, sequence motif, biological origin and/or literature reference. The Peptaibol Database can be freely accessed at http://www.cryst.bbk.ac.uk/peptaibol. Citation for the above abstract: Whitmore, Lee, Wallace, B. A. The Peptaibol Database: a database for sequences and structures of naturally occurring peptaibols Nucl. Acids Res. 2004 32: D593-594 © 2004 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D593 |
| 331. PharmGKB: The Pharmacogenomics and Pharmacogenetics Knowledge Base |
URL: http://www.pharmgkb.org/ Categories: Drug and Drug Design Databases The Pharmacogenetics Knowledge Base (PharmGKB; http://www.pharmgkb.org/) contains genomic, phenotype and clinical information collected from ongoing pharmacogenetic studies. Tools to browse, query, download, submit, edit and process the information are available to registered research network members. A subset of the tools is publicly available. PharmGKB currently contains over 150 genes under study, 14 Coriell populations and a large ontology of pharmacogenetics concepts. The pharmacogenetic concepts and the experimental data are interconnected by a set of relations to form a knowledge base of information for pharmacogenetic researchers. The information in PharmGKB, and its associated tools for processing that information, are tailored for leading-edge pharmacogenetics research. The PharmGKB project was initiated in April 2000 and the first version of the knowledge base went online in February 2001. Citation for the above abstract: Hewett, Micheal, Oliver, Diane E., Rubin, Daniel L., Easton, Katrina L., Stuart, Joshua M., Altman, Russ B., Klein, Teri E. PharmGKB: the Pharmacogenetics Knowledge Base Nucl. Acids Res. 2002 30: 163-165 © 2002 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/30/1/163 |
| 332. Scorpion |
URL: http://research.i2r.a-star.edu.sg:8080/scorpion/ Categories: Drug and Drug Design Databases Increasing interest in the studies of toxins and the requirements for better structural and functional annotations have created a need for improved data management in the field of toxins. The molecular database, SCORPION, contains more than 200 entries of fully referenced scorpion toxin data including primary sequences, three-dimensional structures, structural and functional annotations of scorpion toxins along with relevant literature references. SCORPION has a set of search tools that allow users to extract data and perform specific queries. These entries have been compiled from public databases and literature, cleaned of errors and enriched with additional structural and functional information. The grouping of scorpion toxins provides a basis for extending and clarifying the existing structural and functional classifications. The bioinformatics modules in SCORPION facilitate analyses aimed at classification of scorpion toxins and identification of sequence patterns associated with specific structural or functional properties of scorpion toxins. Citation for the above abstract: Srinivasan KN, Gopalakrishnakone P, Tan PT, Chew KC, Cheng B, Kini RM, Koh JL, Seah SH, Brusic V. SCORPION, a molecular database of scorpion toxins. Toxicon. 2002 Jan;40(1):23-31. © 2002 Elsevier B.V. The full abstract can be found at: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=11602275 |
| 333. TTD: Therapeutic Target Database |
URL: http://xin.cz3.nus.edu.sg/group/cjttd/ttd.asp Categories: Drug and Drug Design Databases "A database to provide information about the known and explored therapeutic protein and nucleic acid targets, the targeted disease, pathway information and the corresponding drugs/ligands directed at each of these targets. Also included in this database are links to relevant databases that contain information about the function, sequence, 3D structure, ligand binding properties, enzyme nomenclature and related literatures of each target." |
| 334. aMAZE |
URL: http://www.amaze.ulb.ac.be/ Categories: Intermolecular Interactions and Signaling Pathways Databases The aMAZE LightBench (http://www.amaze.ulb.ac.be/) is a web interface to the aMAZE relational database, which contains information on gene expression, catalysed chemical reactions, regulatory interactions, protein assembly, as well as metabolic and signal transduction pathways. It allows the user to browse the information in an intuitive way, which also reflects the underlying data model. Moreover links are provided to literature references, and whenever appropriate, to external databases. Citation for the above abstract: Lemer, Christian, Antezana, Erick, Couche, Fabian, Fays, Frederic, Santolaria, Xavier, Janky, Rekin's, Deville, Yves, Richelle, Jean, Wodak, Shoshana J. The aMAZE LightBench: a web interface to a relational database of cellular processes Nucl. Acids Res. 2004 32: D443-448 © 2004 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D443 |
| 335. BIND: Biomolecular Interaction Network Database |
URL: http://bind.ca/ Categories: Intermolecular Interactions and Signaling Pathways Databases The Biomolecular Interaction Network Database (BIND: http://bind.ca) archives biomolecular interaction, complex and pathway information. A web-based system is available to query, view and submit records. BIND continues to grow with the addition of individual submissions as well as interaction data from the PDB and a number of large-scale interaction and complex mapping experiments using yeast two hybrid, mass spectrometry, genetic interactions and phage display. We have developed a new graphical analysis tool that provides users with a view of the domain composition of proteins in interaction and complex records to help relate functional domains to protein interactions. An interaction network clustering tool has also been developed to help focus on regions of interest. Continued input from users has helped further mature the BIND data specification, which now includes the ability to store detailed information about genetic interactions. The BIND data specification is available as ASN.1 and XML DTD. Citation for the above abstract: Bader, Gary D., Betel, Doron, Hogue, Christopher W. V. BIND: the Biomolecular Interaction Network Database Nucl. Acids Res. 2003 31: 248-250 © 2003 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/248 |
| 336. DDIB: Database of Domain Interactions and Bindings |
URL: http://www.scbit.org/dd/index.html Categories: Intermolecular Interactions and Signaling Pathways Databases "DDIB collects information of domain-domain interactions and domain interactions with biological molecules such as RNA, DNA, peptides, inorganic ions, phospholipids and cholesterols. Most of the data were extracted automatically from publication abstracts in MEDLINE, and the rest were collected from other public databases, research laboratories and individual scientists. In addition, DDIB includes many putative domain-domain interactions inferred from documented protein-protein interactions. To provide comprehensive knowledge of a domain, DDIB also integrates relevant information from PFAM, InterPro, GO and KEGG databases." |
| 337. DIP: Database of Interacting Proteins |
URL: http://dip.doe-mbi.ucla.edu/ Categories: Intermolecular Interactions and Signaling Pathways Databases The Database of Interacting Proteins (http://dip.doe-mbi.ucla.edu) aims to integrate the diverse body of experimental evidence on protein-protein interactions into a single, easily accessible online database. Because the reliability of experimental evidence varies widely, methods of quality assessment have been developed and utilized to identify the most reliable subset of the interactions. This CORE set can be used as a reference when evaluating the reliability of high-throughput protein-protein interaction data sets, for development of prediction methods, as well as in the studies of the properties of protein interaction networks. Citation for the above abstract: Salwinski, Lukasz, Miller, Christopher S., Smith, Adam J., Pettit, Frank K., Bowie, James U., Eisenberg, David The Database of Interacting Proteins: 2004 update Nucl. Acids Res. 2004 32: D449-451 © 2004 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D449 |
| 338. DRC: Database of Ribosomal Crosslinks |
URL: http://www.mpimg-berlin-dahlem.mpg.de/~ag_ribo/ag_brimacombe/drc/ Categories: Intermolecular Interactions and Signaling Pathways Databases The Database of Ribosomal Cross-links (DRC) was created in 1997. Here we describe new data incorporated into this database and several new features of the DRC. The DRC is freely available via World Wide Web at http://visitweb.com/database/ or http://www.mpimg-berlin-dahlem.mpg.de/~ag_ribo/ag_brimacombe/drc/ Citation for the above abstract: Baranov, PV, Kubarenko, AV, Gurvich, OL, Shamolina, TA, Brimacombe, R The Database of Ribosomal Cross-links: an update Nucl. Acids Res. 1999 27: 184-185 © 1999 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/27/1/184 |
| 339. Het-PDB Navi |
URL: http://daisy.bio.nagoya-u.ac.jp/golab/hetpdbnavi.html Categories: Intermolecular Interactions and Signaling Pathways Databases, Small Molecule Structure Databases The genomes of more than 100 species have been sequenced, and the biological functions of encoded proteins are now actively being researched. Protein function is based on interactions between proteins and other molecules. One approach to assuming protein function based on genomic sequence is to predict interactions between an encoded protein and other molecules. As a data source for such predictions, knowledge regarding known protein-small molecule interactions needs to be compiled. We have, therefore, surveyed interactions between proteins and other molecules in Protein Data Bank (PDB), the protein three-dimensional (3D) structure database. Among 20,685 entries in PDB (April, 2003), 4,189 types of small molecules were found to interact with proteins. Biologically relevant small molecules most often found in PDB were metal ions, such as calcium, zinc, and magnesium. Sugars and nucleotides were the next most common. These molecules are known to act as cofactors for enzymes and/or stabilizers of proteins. In each case of interactions between a protein and small molecule, we found preferred amino acid residues at the interaction sites. These preferences can be the basis for predicting protein function from genomic sequence and protein 3D structures. The data pertaining to these small molecules were collected in a database named Het-PDB Navi., which is freely available at http://daisy.nagahama-i-bio.ac.jp/golab/hetpdbnavi.html and linked to the official PDB home page. Citation for the above abstract: Akihiro Yamaguchi, Kei Iida, Nobuaki Matsui, Shirou Tomoda, Kei Yura, and Mitiko Go Het-PDB Navi.: A Database for Protein–Small Molecule Interactions J Biochem (Tokyo) 135: 79-84. © 2004 The Japanese Biochemical Society. The full abstract can be found at: http://jb.oupjournals.org/cgi/content/abstract/135/1/79 |
| 340. hp-DPI: Helicobacter pylori Database of Protein Interactomes |
URL: http://dpi.nhri.org.tw/protein/hp/ORF/index.php Categories: Intermolecular Interactions and Signaling Pathways Databases SUMMARY: We implemented a statistical model into our protein interaction database for validation of two-hybrid assays of Helicobacter pylori, and prediction of putative protein interactions not yet discovered experimentally. To present the enormous amount of experimental and inferred protein interaction networking maps, the H. pylori Database of Protein Interactomes (hp-DPI) is developed with a succinct yet comprehensive visualization tool integrated with annotation from Genbank, GO, and KEGG. hp-DPI, is first built with, but not limited to, H. pylori protein interactions and is expected to naturally include other organisms' protein interacting relationships in the future. AVAILABILITY: hp-DPI can be accessed at http://dpi.nhri.org.tw/hp/. Citation for the above abstract: Chung-Yen Lin, Chia-Ling Chen, Chi-Shiang Cho, Li-Ming Wang, Chia-Ming Chang, Pao-Yang Chen, Chen-Zen Lo, and Chao A. Hsiung hp-DPI: Helicobacter pylori database of protein interactomes- embracing experimental and inferred interactions Bioinformatics Advance Access published on November 16, 2004, DOI 10.1093/bioinformatics/bti101. © 2004 Oxford University Press. The full abstract can be found at: http://bioinformatics.oupjournals.org/cgi/content/abstract/bti101v2 |
| 341. HPID: Human Protein Interaction database |
URL: http://www.hpid.org Categories: Intermolecular Interactions and Signaling Pathways Databases The Human Protein Interaction Database (http://www.hpid.org) was designed (1) to provide human protein interaction information pre-computed from existing structural and experimental data, (2) to predict potential interactions between proteins submitted by users and (3) to provide a depository for new human protein interaction data from users. Two types of interaction are available from the pre-computed data: (1) interactions at the protein superfamily level and (2) those transferred from the interactions of yeast proteins. Interactions at the superfamily level were obtained by locating known structural interactions of the PDB in the SCOP domains and identifying homologs of the domains in the human proteins. Interactions transferred from yeast proteins were obtained by identifying homologs of the yeast proteins in the human proteins. For each human protein in the database and each query submitted by users, the protein superfamilies and yeast proteins assigned to the protein are shown, along with their interacting partners. We have also developed a set of web-based programs so that users can visualize and analyze protein interaction networks in order to explore the networks further. AVAILABILITY: http://www.hpid.org. Citation for the above abstract: Kyungsook Han, Byungkyu Park, Hyongguen Kim, Jinsun Hong, and Jong Park HPID: The Human Protein Interaction Database Bioinformatics Advance Access published on October 12, 2004, DOI 10.1093/bioinformatics/bth253. Bioinformatics 20: 2466-2470. © 2004 Oxford University Press. The full abstract can be found at: http://bioinformatics.oupjournals.org/cgi/content/abstract/20/15/2466 |
| 342. OPHID: Online Predicted Human Interaction Database |
URL: http://ophid.utoronto.ca/ Categories: Intermolecular Interactions and Signaling Pathways Databases MOTIVATION: High-throughput experiments are being performed at an ever-increasing rate to systematically elucidate protein-protein interaction (PPI) networks for model organisms, while the complexities of higher eukaryotes have prevented these experiments for humans. RESULTS: The Online Predicted Human Interaction Database (OPHID) is a web-based database of predicted interactions between human proteins. It combines the literature-derived human PPI from BIND, HPRD and MINT, with predictions made from S. cerevisiae, C. elegans, D. melanogaster, and M. musculus. The 23,889 predicted interactions currently listed in OPHID are evaluated using protein domains, gene co-expression and Gene Ontology terms. OPHID can be queried using single or multiple IDs, and results can be visualized using our custom graph visualization program. AVAILABILITY: Freely available to academic users at http://ophid.utoronto.ca, both in tab-delimited and PSI-MI formats. Commercial users, please contact I.J. Citation for the above abstract: Kevin R. Brown and Igor Jurisica Online Predicted Human Interaction Database Bioinformatics Advance Access published on January 18, 2005, DOI 10.1093/bioinformatics/bti273. © 2005 Oxford University Press. The full abstract can be found at: http://bioinformatics.oupjournals.org/cgi/content/abstract/bti273v1 |
| 343. IntAct |
URL: http://www.ebi.ac.uk/intact Categories: Intermolecular Interactions and Signaling Pathways Databases IntAct provides an open source database and toolkit for the storage, presentation and analysis of protein interactions. The web interface provides both textual and graphical representations of protein interactions, and allows exploring interaction networks in the context of the GO annotations of the interacting proteins. A web service allows direct computational access to retrieve interaction networks in XML format. IntAct currently contains approximately 2200 binary and complex interactions imported from the literature and curated in collaboration with the Swiss-Prot team, making intensive use of controlled vocabularies to ensure data consistency. All IntAct software, data and controlled vocabularies are available at http://www.ebi.ac.uk/intact. Citation for the above abstract: Hermjakob, Henning, Montecchi-Palazzi, Luisa, Lewington, Chris, Mudali, Sugath, Kerrien, Samuel, Orchard, Sandra, Vingron, Martin, Roechert, Bernd, Roepstorff, Peter, Valencia, Alfonso, Margalit, Hanah, Armstrong, John, Bairoch, Amos, Cesareni, Gianni, Sherman, David, Apweiler, Rolf IntAct: an open source molecular interaction database Nucl. Acids Res. 2004 32: D452-455 © 2004 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D452 |
| 344. ICBS: Inter-Chain Beta-Sheets |
URL: http://www.igb.uci.edu/servers/icbs/ Categories: Intermolecular Interactions and Signaling Pathways Databases Motivation: Interchain β-sheet (ICBS) interactions occur widely in protein quaternary structures, interactions between proteins, and protein aggregation. These interactions play a central role in many biological processes and in diseases ranging from AIDS and cancer to anthrax and Alzheimer’s. Results: We create a comprehensive database of ICBS interactions that is updated on a weekly basis and allows entries to be sorted and searched by relevance and other criteria through a simple Web interface. We derive a simple ICBS index to quantify the relative contributions of the β-ladders in the overall interchain interaction and compute first- and second-order statistics regarding amino acid composition and pairing at different relative positions in the β-strands. Analysis of the database reveals a 15.8% prevalence of significant ICBS interactions, the majority of which involve the formation of antiparallel β-sheets and many of which involve the formation of dimers and oligomers. The frequencies of amino acids in ICBS interfaces are similar to those in intrachain β-sheet interfaces. A full range of noncovalent interactions between side chains complement the hydrogen-bonding interactions between the main chains. Polar amino acids pair preferentially with polar amino acids and nonpolar amino acids pair preferentially with nonpolar amino acids among antiparallel (i, j) pairs. We anticipate that the statistics and insights gained from the database will guide the development of agents that control interchain β-sheet interactions and that the database will help identify new protein interactions and targets for these agents. Citation for the above excerpt: Yimeng Dou, Pierre Francois Baisnée, Gianluca Pollastri, Yann Pécout, James Nowick, and Pierre Baldi. ICBS: A Database of Interactions Between Protein Chains Mediated by β Sheet Formation. Technical report (2004) © 2004 By Dou et al. The full text of the article can be found at: http://contact14.ics.uci.edu/publications/TR-IGB-04.pdf |
| 345. InterDom |
URL: http://interdom.lit.org.sg/ Categories: Intermolecular Interactions and Signaling Pathways Databases, Protein Domain and Protein Classification Databases Advances in proteomics technology have enabled new proteins to be discovered at an unprecedented speed, and high throughput experimental methods have been developed to detect protein interactions and complexes en masse. Such bottom-up, data-driven approach has resulted in data that may be uninformative or potentially errorful, requiring further validation and annotation. The InterDom database focuses on providing supporting evidence for the detected protein interactions based on putative protein domain interactions. Using an integrative approach, InterDom derives potential domain interactions by combining data from multiple sources, ranging from domain fusions, protein interactions and complexes, to scientific literature. The InterDom database is available at http://InterDom.lit.org.sg. Citation for the above abstract: Ng, See-Kiong, Zhang, Zhuo, Tan, Soon-Heng, Lin, Kui InterDom: a database of putative interacting protein domains for validating predicted protein interactions and complexes Nucl. Acids Res. 2003 31: 251-254 © 2003 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/251 |
| 346. MPID: MHC-Peptide Interaction Database |
URL: http://surya.bic.nus.edu.sg/mpid/ Categories: Immunological Databases, Intermolecular Interactions and Signaling Pathways Databases SUMMARY: Binding of short antigenic peptides to Major histocompatibility complex (MHC) proteins is the first step in T-cell mediated immune response. To understand the structural principles governing MHC-specific peptide recognition and binding, we have developed the MHC-Peptide Interaction Database (MPID), containing sequence-structure-function information. MPID (version 1.2) contains curated x-ray crystallographic data on 86 MHC peptide complexes, with precomputed interaction parameters (solvent accessibility, hydrogen bonds, gap volume and gap index). A user-friendly web interface and query tools will facilitate the development of predictive algorithms for MHC-peptide binding from a structural viewpoint. AVAILABILITY: Freely accessible from http://surya.bic.nus.edu.sg/mpid. Citation for the above abstract: Kunde Ramamoorthy Govindarajan, Pandjassarame Kangueane, Tin Wee Tan, and Shoba Ranganathan MPID: MHC-Peptide Interaction Database for sequence-structure-function information on peptides binding to MHC molecules Bioinformatics 19: 309-310. © 2003 Oxford University Press. The full text of the article can be found at: http://bioinformatics.oupjournals.org/cgi/reprint/19/2/309 |
| 347. PDZBase |
URL: http://icb.med.cornell.edu/services/pdz/start Categories: Intermolecular Interactions and Signaling Pathways Databases SUMMARY: PDZBase is a database that aims to contain all known PDZ-domain mediated protein-protein interactions. Currently, PDZBase contains approximately 300 such interactions, which have been manually extracted from >200 articles. The database can be queried through both sequence motif and keyword-based searches, and the sequences of interacting proteins can be visually inspected through alignments (for the comparison of several interactions), or as residue based diagrams including schematic secondary structure information (for individual complexes). AVAILABILITY: http://icb.med.cornell.edu/services/pdz/start. Citation for the above abstract: Thijs Beuming, Lucy Skrabanek, Masha Y. Niv, Piali Mukherjee, and Harel Weinstein PDZBase: a protein-protein interaction database for PDZ-domains Bioinformatics Advance Access published on October 28, 2004, DOI 10.1093/bioinformatics/bti098. © 2004 Oxford University Press. The full abstract can be found at: http://bioinformatics.oupjournals.org/cgi/content/abstract/bti098v1 |
| 348. PINdb: Proteins Interacting in the Nucleus database |
URL: http://pin.mskcc.org/ Categories: Intermolecular Interactions and Signaling Pathways Databases SUMMARY: Proteins Interacting in the Nucleus database (PINdb) is a database of protein complexes purified from the nucleus of human and yeast cells. It is compiled from the published literature and existing databases. Currently, PINdb contains mostly protein complexes that may be involved in gene transcription. To facilitate comparative analyses and identification of protein complexes, the compositional information is integrated with standardized gene nomenclature, annotation and protein sequences from public databases. The PINdb web interface provides a number of tools for (1) comparison of protein complexes, (2) search for a protein complex by its published name or by a partial list of its components and (3) browsing specific subsets or a functional classification of the complexes. AVAILABILITY: http://pin.mskcc.org Citation for the above abstract: Phuong-Van Luc and Paul Tempst PINdb: a database of nuclear protein complexes from human and yeast Bioinformatics Advance Access published on June 12, 2004, DOI 10.1093/bioinformatics/bth114. Bioinformatics 20: 1413-1415. © 2004 Oxford University Press. The full abstract can be found at: http://bioinformatics.oupjournals.org/cgi/content/abstract/20/9/1413 |
| 349. POINT: Prediction Of INTeractome |
URL: http://point.nchc.org.tw/ Categories: Intermolecular Interactions and Signaling Pathways Databases One possible path towards understanding the biological function of a target protein is through the discovery of how it interfaces within protein-protein interaction networks. The goal of this study was to create a virtual protein-protein interaction model using the concepts of orthologous conservation (or interologs) to elucidate the interacting networks of a particular target protein. POINT (the prediction of interactome database) is a functional database for the prediction of the human protein-protein interactome based on available orthologous interactome datasets. POINT integrates several publicly accessible databases, with emphasis placed on the extraction of a large quantity of mouse, fruit fly, worm and yeast protein-protein interactions datasets from the Database of Interacting Proteins (DIP), followed by conversion of them into a predicted human interactome. In addition, protein-protein interactions require both temporal synchronicity and precise spatial proximity. POINT therefore also incorporates correlated mRNA expression clusters obtained from cell cycle microarray databases and subcellular localization from Gene Ontology to further pinpoint the likelihood of biological relevance of each predicted interacting sets of protein partners. Citation for the above abstract: Tao-Wei Huang, An-Chi Tien, Wen-Shien Huang, Yuan-Chii G. Lee, Chin-Lin Peng, Huei-Hun Tseng, Cheng-Yan Kao, and Chi-Ying F. Huang POINT: a database for the prediction of protein–protein interactions based on the orthologous interactome Bioinformatics Advance Access published on November 22, 2004, DOI 10.1093/bioinformatics/bth366. Bioinformatics 20: 3273-3276. © 2004 Oxford University Press. The full abstract can be found at: http://bioinformatics.oupjournals.org/cgi/content/abstract/20/17/3273 |
| 350. ProNIT: Thermodynamic Database for Protein-Nucleic Acid Interactions |
URL: http://gibk26.bse.kyutech.ac.jp/jouhou/pronit/pronit.html Categories: Intermolecular Interactions and Signaling Pathways Databases ProTherm and ProNIT are two thermodynamic databases that contain experimentally determined thermodynamic parameters of protein stability and protein-nucleic acid interactions, respectively. The current versions of both the databases have considerably increased the total number of entries and enhanced search interface with added new fields, improved search, display and sorting options. As on September 2005, ProTherm release 5.0 contains 17,113 entries from 771 proteins, retrieved from 1497 scientific articles (approximately 20% increase in data from the previous version). ProNIT release 2.0 contains 4900 entries from 273 research articles, representing 158 proteins. Both databases can be queried using WWW interfaces. Both quick search and advanced search are provided on this web page to facilitate easy retrieval and display of the data from these databases. ProTherm is freely available online at http://gibk26.bse.kyutech.ac.jp/jouhou/Protherm/protherm.html and ProNIT at http://gibk26.bse.kyutech.ac.jp/jouhou/pronit/pronit.html. Citation for the above abstract: Kumar, M. D. Shaji, Bava, K. Abdulla, Gromiha, M. Michael, Prabakaran, Ponraj, Kitajima, Koji, Uedaira, Hatsuho, Sarai, Akinori ProTherm and ProNIT: thermodynamic databases for proteins and protein-nucleic acid interactions Nucl. Acids Res. 2006 34: D204-206 © 2006 Oxford University Press. The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D204 |
| 351. PKR: The Protein Kinase Resource |
URL: http://www.kinasenet.org/pkr/ Categories: Enzyme and Enzyme Nomenclature Databases, Intermolecular Interactions and Signaling Pathways Databases Protein kinases and phosphatases play crucial roles in all the major cellular processes, such as signal transduction, cell differentiation, cell proliferation and cell cycle progression. Protein phosphorylation or dephosphorylation can form the basis of many critical processes, including enzyme activation or inactivation, protein localization and protein degradation. Given the importance of protein kinases to cellular development and function, it is critical that there are effective ways of disseminating information on protein kinases to the research community. This review describes such a web resource, 'The Protein Kinase Resource' (http://pkr.sdsc.edu/html/index.shtml), which serves as a repository for cellular and molecular data on protein kinases. Citation for the above abstract: Petretti C, Prigent C. The Protein Kinase Resource: everything you always wanted to know about protein kinases but were afraid to ask. Biol Cell. 2005 Feb 1;97(2):113-118. © 2005 Portland Press Limited. The full abstract can be found at: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=15656777 |
| 352. PIBASE: A Comprehensive Database of Structurally Defined Interfaces in Proteins |
URL: http://alto.compbio.ucsf.edu/pibase/ Categories: Intermolecular Interactions and Signaling Pathways Databases MOTIVATION: In recent years, the Protein Data Bank (PDB) has experienced rapid growth. To maximize the utility of the high resolution protein-protein interaction data stored in the PDB, we have developed PIBASE, a comprehensive relational database of structurally defined interfaces between pairs of protein domains. It is composed of binary interfaces extracted from structures in the PDB and the Probable Quaternary Structure (PQS) server using domain assignments from the Structural Classification of Proteins (SCOP) and CATH fold classification systems. RESULTS: PIBASE currently contains 158,915 interacting domain pairs between 105,061 domains from 2,125 SCOP families. A diverse set of geometric, physicochemical, and topologic properties are calculated for each complex, its domains, interfaces, and binding sites. A subset of the interface properties are used to remove interface redundancy within PDB entries, resulting in 20,912 distinct domain-domain interfaces. The complexes are grouped into 989 topological classes based on their patterns of domain-domain contacts. The binary interfaces and their corresponding binding sites are categorized into 18,755 and 30,975 topological classes, respectively, based on the topology of secondary structure elements. The utility of the database is illustrated by outlining several current applications. AVAILABILITY: The database is accessible via the world wide web at http://salilab.org/pibase. Citation for the above abstract: Fred P. Davis and Andrej Sali PIBASE: a comprehensive database of structurally defined protein interfaces Bioinformatics Advance Access published on January 18, 2005, DOI 10.1093/bioinformatics/bti277. © 2005 Oxford University Press. The full abstract can be found at: http://bioinformatics.oupjournals.org/cgi/content/abstract/bti277v1 |
| 353. The Dataset of Protein-Protein Interfaces |
URL: http://home.ku.edu.tr/~okeskin/INTERFACE/INTERFACES.html Categories: Intermolecular Interactions and Signaling Pathways Databases Here, we present a diverse, structurally nonredundant data set of two-chain protein-protein interfaces derived from the PDB. Using a sequence order-independent structural comparison algorithm and hierarchical clustering, 3799 interface clusters are obtained. These yield 103 clusters with at least five nonhomologous members. We divide the clusters into three types. In Type I clusters, the global structures of the chains from which the interfaces are derived are also similar. This cluster type is expected because, in general, related proteins associate in similar ways. In Type II, the interfaces are similar; however, remarkably, the overall structures and functions of the chains are different. The functional spectrum is broad, from enzymes/inhibitors to immunoglobulins and toxins. The fact that structurally different monomers associate in similar ways, suggests "good" binding architectures. This observation extends a paradigm in protein science: It has been well known that proteins with similar structures may have different functions. Here, we show that it extends to interfaces. In Type III clusters, only one side of the interface is similar across the cluster. This structurally nonredundant data set provides rich data for studies of protein-protein interactions and recognition, cellular networks and drug design. In particular, it may be useful in addressing the difficult question of what are the favorable ways for proteins to interact. (The data set is available at http://protein3d.ncifcrf.gov/~keskino/ and http://home.ku.edu.tr/~okeskin/INTERFACE/INTERFACES.html.) Citation for the above abstract: Keskin, Ozlem, Tsai, Chung-Jung, Wolfson, Haim, Nussinov, Ruth A new, structurally nonredundant, diverse data set of protein-protein interfaces and its implications Protein Sci 2004 13: 1043-1055 © 2004 The Protein Society. 
The full abstract can be found at: http://www.proteinscience.org/cgi/content/abstract/13/4/1043 |
| 354. PSIbase |
URL: http://psibase.ngic.re.kr/ Categories: Intermolecular Interactions and Signaling Pathways Databases "PSIbase is a molecular interaction database. It focuses on structural interaction of proteins and their domains. It is based on PSIMAP that is a map of protein interactome. It covers the interaction of all known 3D protein structures. Presently, PSIMAP is based on PDB and SCOP databases. It is also the first protocol that has mapped large-scale structural interactions and that used protein families." |
| 355. Reactome |
URL: http://www.reactome.org/ Categories: Intermolecular Interactions and Signaling Pathways Databases Reactome, located at http://www.reactome.org is a curated, peer-reviewed resource of human biological processes. Given the genetic makeup of an organism, the complete set of possible reactions constitutes its reactome. The basic unit of the Reactome database is a reaction; reactions are then grouped into causal chains to form pathways. The Reactome data model allows us to represent many diverse processes in the human system, including the pathways of intermediary metabolism, regulatory pathways, and signal transduction, and high-level processes, such as the cell cycle. Reactome provides a qualitative framework, on which quantitative data can be superimposed. Tools have been developed to facilitate custom data entry and annotation by expert biologists, and to allow visualization and exploration of the finished dataset as an interactive process map. Although our primary curational domain is pathways from Homo sapiens, we regularly create electronic projections of human pathways onto other organisms via putative orthologs, thus making Reactome relevant to model organism research communities. The database is publicly available under open source terms, which allows both its content and its software infrastructure to be freely used and redistributed. Citation for the above abstract: Joshi-Tope, G., Gillespie, M., Vastrik, I., D'Eustachio, P., Schmidt, E., de Bono, B., Jassal, B., Gopinath, G.R., Wu, G.R., Matthews, L., Lewis, S., Birney, E., Stein, L. Reactome: a knowledgebase of biological pathways Nucl. Acids Res. 2005 33: D428-432 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D428 |
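The abstract above describes a data model in which the reaction is the basic unit and reactions are grouped into causal chains to form pathways. A minimal sketch of that idea follows, assuming a simplified reading of "causal": each reaction consumes at least one product of the reaction before it. The class names `Reaction` and `Pathway`, the method `is_causal_chain`, and the placeholder molecules A to D are hypothetical and are not taken from the Reactome schema or software.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List


@dataclass(frozen=True)
class Reaction:
    """Hypothetical basic unit: named inputs are converted into named outputs."""
    name: str
    inputs: FrozenSet[str]
    outputs: FrozenSet[str]


@dataclass
class Pathway:
    """Hypothetical grouping of reactions in the order they occur."""
    name: str
    reactions: List[Reaction] = field(default_factory=list)

    def is_causal_chain(self) -> bool:
        # Assumption: "causal" means every reaction uses at least one
        # output of the reaction immediately before it.
        pairs = zip(self.reactions, self.reactions[1:])
        return all(bool(prev.outputs & nxt.inputs) for prev, nxt in pairs)


# Placeholder molecules, not real pathway data.
r1 = Reaction("step 1", frozenset({"A"}), frozenset({"B"}))
r2 = Reaction("step 2", frozenset({"B"}), frozenset({"C"}))
r3 = Reaction("unrelated step", frozenset({"D"}), frozenset({"A"}))

linked = Pathway("linked example", [r1, r2])
broken = Pathway("broken example", [r1, r3])

print(linked.is_causal_chain())
print(broken.is_causal_chain())
```

Keeping the reaction immutable and letting the pathway hold only an ordered list mirrors the abstract's emphasis on the reaction as the unit of curation; quantitative data could be attached to either object without changing the structure.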
| 356. ROSPath: Reactive Oxygen Species-mediated Signaling Pathway Database |
URL: http://rospath.ewha.ac.kr/ Categories: Intermolecular Interactions and Signaling Pathways Databases To understand complex signaling pathways and networks, it is necessary to develop a formal and structured representation of the available information in a format suitable for analysis by software tools. Due to the complexity and incompleteness of the current biological knowledge about cell signaling, such a device must be able to represent cellular pathways at differing levels of details, one level of information abstract enough to convey an essential signaling flow while hiding its details and another level of information detailed enough to explain the underlying mechanisms that account for the signaling flow described at a more abstract level. We have defined a formal ontology for cell-signaling events that allows us to describe these cellular pathways at various levels of abstraction. Using this formal representation, ROSPath (reactive oxygen species-mediated signaling pathway) database system has been implemented and made available on the web (rospath.ewha.ac.kr). ROSPath is a database system for reactive oxygen species (ROS)-mediated cell signaling pathways and signaling processes in molecular detail, which facilitates a comprehensive understanding of the regulatory mechanisms in signaling pathways. ROSPath includes growth factor-, stress-, and cytokine-induced signaling pathways containing about 500 unique proteins (mostly mammalian) and their related protein states, protein complexes, protein complex states, signaling interactions, signaling steps, and pathways. It is a web-based structured repository of information on the signaling pathways of interest and provides a means for managing data produced by large-scale and high-throughput techniques such as proteomics. 
Also, software tools are provided for querying, displaying, and analyzing pathways, thus furnishing an integrated web environment for visualizing and manipulating ROS-mediated cell-signaling events. Citation for the above abstract: Paek, Eunok, Park, Jisook, Lee, Kong-Joo Multi-layered Representation for Cell Signaling Pathways Mol Cell Proteomics 2004 3: 1009-1022 © 2004 The American Society for Biochemistry and Molecular Biology, Inc. The full text of the article can be found at: http://www.mcponline.org/cgi/content/full/3/10/1009 |
| 357. SENTRA |
URL: http://www-wit.mcs.anl.gov/sentra/ Categories: Individual Protein Family Databases, Intermolecular Interactions and Signaling Pathways Databases Sentra (http://www-wit.mcs.anl.gov/sentra) is a database of signal transduction proteins with the emphasis on microbial signal transduction. The database was updated to include classes of signal transduction systems modulated by either phosphorylation or methylation reactions such as PAS proteins and serine/threonine kinases, as well as the classical two-component histidine kinases and methyl-accepting chemotaxis proteins. Currently, Sentra contains signal transduction proteins from 43 completely sequenced prokaryotic genomes as well as sequences from SWISS-PROT and TrEMBL. Signal transduction proteins are annotated with information describing conserved domains, paralogous and orthologous sequences, and conserved chromosomal gene clusters. The newly developed user interface supports flexible search capabilities and extensive visualization of the data. Citation for the above abstract: Maltsev, Natalia, Marland, E., Yu, G. X., Bhatnagar, S., Lusk, R. Sentra, a database of signal transduction proteins Nucl. Acids Res. 2002 30: 349-350 © 2002 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/30/1/349 |
| 358. SMART |
URL: http://smart.embl-heidelberg.de/ Categories: Intermolecular Interactions and Signaling Pathways Databases, Protein Domain and Protein Classification Databases The Simple Modular Architecture Research Tool (SMART) is an online resource (http://smart.embl.de/) used for protein domain identification and the analysis of protein domain architectures. Many new features were implemented to make SMART more accessible to scientists from different fields. The new 'Genomic' mode in SMART makes it easy to analyze domain architectures in completely sequenced genomes. Domain annotation has been updated with a detailed taxonomic breakdown and a prediction of the catalytic activity for 50 SMART domains is now available, based on the presence of essential amino acids. Furthermore, intrinsically disordered protein regions can be identified and displayed. The network context is now displayed in the results page for more than 350 000 proteins, enabling easy analyses of domain interactions. Citation for the above abstract: Letunic, Ivica, Copley, Richard R., Pils, Birgit, Pinkert, Stefan, Schultz, Jorg, Bork, Peer SMART 5: domains in the context of genomes and networks Nucl. Acids Res. 2006 34: D257-260 © 2006 Oxford University Press. The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D257 |
| 359. EROP-Moscow: Endogenous Regulatory OligoPeptide knowledgebase |
URL: http://erop.inbi.ras.ru/ Categories: Human ORFs, Individual Protein Family Databases Natural oligopeptides may regulate nearly all vital processes. To date, the chemical structures of nearly 6000 oligopeptides have been identified from >1000 organisms representing all the biological kingdoms. We have compiled the known physical, chemical and biological properties of these oligopeptides—whether synthesized on ribosomes or by non-ribosomal enzymes—and have constructed an internet-accessible database, EROP-Moscow (Endogenous Regulatory OligoPeptides), which resides at http://erop.inbi.ras.ru. This database enables users to perform rapid searches via many key features of the oligopeptides, and to carry out statistical analysis of all the available information. The database lists only those oligopeptides whose chemical structures have been completely determined (directly or by translation from nucleotide sequences). It provides extensive links with the Swiss-Prot-TrEMBL peptide-protein database, as well as with the PubMed biomedical bibliographic database. EROP-Moscow also contains data on many oligopeptides that are absent from other convenient databases, and is designed for extended use in classifying new natural oligopeptides and for production of novel peptide pharmaceuticals. Citation for the above abstract: Zamyatnin, Alexander A., Borchikov, Alexander S., Vladimirov, Michail G., Voronina, Olga L. The EROP-Moscow oligopeptide database Nucl. Acids Res. 2006 34: D261-266 © 2006 Oxford University Press. The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D261 |
| 360. STCDB: Signal Transduction Classification Database |
URL: http://bibiserv.techfak.uni-bielefeld.de/stcdb/ Categories: Intermolecular Interactions and Signaling Pathways Databases The Signal Transduction Classification Database (STCDB) is a database of information relative to the classification of signal transduction. It is based primarily on a proposed classification of signal transduction and it describes each type of characterized signal transduction for which a unique ST number has been provided. This document presents, in its first version, the classification of signal transduction in eukaryotic cells. Approved classifications are available for web browsing at http://www.techfak.uni-bielefeld.de/~mchen/STCDB. Citation for the above abstract: Chen, Ming, Lin, Susana, Hofestaedt, Ralf STCDB: Signal Transduction Classification Database Nucl. Acids Res. 2004 32: D456-458 © 2004 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D456 |
| 361. Wnt Database |
URL: http://www.stanford.edu/~rnusse/wntwindow.html Categories: Individual Protein Family Databases, Intermolecular Interactions and Signaling Pathways Databases MOTIVATION: Wnt signaling is a very active area of research with highly relevant publications appearing at a rate of more than one per day. Building and maintaining databases describing signal transduction networks is a time consuming and demanding task that requires careful literature analysis and extensive domain specific knowledge. For instance, more than 50 factors involved in Wnt signal transduction have been identified as of late 2003. In this work we describe a natural language processing (NLP) system that is able to identify references to biological interaction networks in free text and automatically assembles a protein association and interaction map. RESULTS: A "gold standard" set of names and assertions was derived by manual scanning of the Wnt genes website (Nusse, 2004) (http://www.stanford.edu/~rnusse/wntwindow.html) including 53 interactions involved in Wnt signaling. This system was used to analyze a corpus of peer reviewed articles related to Wnt signaling including 3,369 Pubmed and 1,230 full text papers. Names for key Wnt-pathway associated proteins and biological entities are identified using a chi-squared analysis of noun-phrases over-represented in the Wnt literature as compared to the general signal transduction literature. Interestingly, we identified several instances where generic terms were used on the website when more specific terms occur in the literature, and one typographic error on the Wnt canonical pathway. Using the named entity list and performing an exhaustive assertion extraction of the corpus, 34 of the 53 interactions in the "gold standard" Wnt signaling set were successfully identified (64% recall). 
In addition, the automated extraction found several interactions involving key Wnt-related molecules which were missing or different from those in the canonical diagram, and these were confirmed by manual review of the text. These results suggest that a combination of NLP techniques for information extraction can form a useful first-pass tool for assisting human annotation and maintenance of signal-pathway databases. AVAILABILITY: The pipeline software components are freely available by request to the authors. Citation for the above abstract: Carlos Santos, Daniela Eggle, and David J. States Wnt pathway curation using automated natural language processing: combining statistical methods with partial and full parse for knowledge extraction Bioinformatics Advance Access published on November 25, 2004, DOI 10.1093/bioinformatics/bti165. © 2004 Oxford University Press. The full abstract can be found at: http://bioinformatics.oupjournals.org/cgi/content/abstract/bti165v1 |
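The recall figure quoted in the abstract above (34 of the 53 gold-standard interactions recovered, reported as 64%) can be checked with a short calculation. The sketch below only reproduces that arithmetic; the function name `recall` is an illustrative choice and is not part of the authors' pipeline.

```python
def recall(found: int, gold_total: int) -> float:
    """Fraction of gold-standard items that the extraction recovered."""
    if gold_total <= 0:
        raise ValueError("gold_total must be positive")
    if not 0 <= found <= gold_total:
        raise ValueError("found must lie between 0 and gold_total")
    return found / gold_total


# Figures taken from the abstract: 34 of 53 gold-standard interactions.
wnt_recall = recall(34, 53)
print(f"recall = {wnt_recall:.1%}")
```

Recall alone says nothing about false positives; the abstract reports additional extracted interactions separately, as confirmed by manual review, rather than as a precision figure.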
| 362. Bcipep: A Database of B Cell Epitopes |
URL: http://bioinformatics.uams.edu/mirror/bcipep/ Categories: Immunological Databases "Bcipep is a database of B cell epitopes of varying immunogenicity." |
| 363. dbMHC |
URL: http://www.ncbi.nlm.nih.gov/mhc/ Categories: Immunological Databases "dbMHC supports clinical applications and research related to the major histocompatibility complex (MHC) and includes Reagent Database and Clinical sections. The Reagent database provides an open platform for the submission, evaluation and editing of individual DNA typing reagents as well as typing kit information. All reagents are characterized for allele specificity using an updated allele database based on IMGT/HLA. The dbMHC offers several resources for the analysis and display of the MHC and KIR region, e.g. an interactive formatting sequence retrieval tool, and a Sequencing-based typing tool, capable of aligning and interpreting heterozygote sequences. Also featured is dbMHCms, a tool to search descriptive information for known short tandem repeats within the MHC. The Clinical section contains data generated by the 13th international HLA workshop and international HLA working group and includes sections presenting two major IHWG datasets. The first is derived from the IHWG ‘Diversity/Anthropology’ project to determine global HLA allele frequencies in an attempt to shed light on the evolution of HLA polymorphisms. dbMHC can display project data, such as allelic frequencies found in individuals from certain regions of the world, or frequencies for specific loci. The second IHWG dataset is the Hematopoietic Cell Transplantation (HCT) database, containing anonymous data for selected unrelated donor transplants performed worldwide for the treatment of both malignant and non-malignant blood disorders. Online analysis tools available for the HCT data include a query interface and the ability to compute Kaplan–Meier survival plots."
Citation for the above excerpt: Wheeler, David L., Barrett, Tanya, Benson, Dennis A., Bryant, Stephen H., Canese, Kathi, Church, Deanna M., DiCuccio, Michael, Edgar, Ron, Federhen, Scott, Helmberg, Wolfgang, Kenton, David L., Khovayko, Oleg, Lipman, David J., Madden, Thomas L., Maglott, Donna R., Ostell, James, Pontius, Joan U., Pruitt, Kim D., Schuler, Gregory D., Schriml, Lynn M., Sequeira, Edwin, Sherry, Steven T., Sirotkin, Karl, Starchenko, Grigory, Suzek, Tugba O., Tatusov, Roman, Tatusova, Tatiana A., Wagner, Lukas, Yaschenko, Eugene Database resources of the National Center for Biotechnology Information Nucl. Acids Res. 2005 33: D39-45 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D39 |
| 364. FIMM: A Database of Functional Molecular Immunology |
URL: http://research.i2r.a-star.edu.sg/fimm/ Categories: Immunological Databases FIMM database (http://sdmc.krdl.org.sg:8080/fimm) contains data relevant to functional molecular immunology, focusing on cellular immunology. It contains fully referenced data on protein antigens, major histocompatibility complex (MHC) molecules, MHC-associated peptides and relevant disease associations. FIMM has a set of search tools for extraction of information and results are presented as lists or as reports. Citation for the above abstract: Schonbach, Christian, Koh, Judice L. Y., Flower, Darren R., Wong, Limsoon, Brusic, Vladimir FIMM, a database of functional molecular immunology: update 2002 Nucl. Acids Res. 2002 30: 226-229 © 2002 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/30/1/226 |
| 365. HaptenDB |
URL: http://www.imtech.res.in/raghava/haptendb/ Categories: Immunological Databases "This database of hapten molecules called Haptendb is a curated database where information is collected and compiled from published literature and web resources. Presently database have more than 2000 entries where each entry provides comprehensive detail about a hapten molecule that include: i) nature of the hapten; ii) methods of anti-hapten antibody production; iii) information about carrier protein; iv) coupling method; v) assay method (used for characterization) and vi) specificities of antibodies. The Haptendb covers wide array of haptens ranging from antibiotics of biomedical importance to pesticides. It provides internal and external links to various databases/resources to obtain the further information about hapten, their carriers and antibodies. This database may be very useful for studying the serological reactions and production of antibodies, as haptens are the small molecules with known structures, which can be altered to generate antibody of desired specificity." |
| 366. HLA Ligand/Motif DATABASE |
URL: http://hlaligand.ouhsc.edu/ Categories: Immunological Databases We have established an HLA ligand database to provide scientists and clinicians with access to Major Histocompatibility Complex (MHC) class I and II motif and ligand data. The HLA Ligand Database is available on the world wide web at http://hlaligand.ouhsc.edu and contains ligands that have been published in peer-reviewed journals. HLA peptide datasets prove useful in several areas: ligands are important as targets for various immune responses while algorithms built upon ligand datasets allow identification of new peptides without time-consuming experimental procedures. A review of the HLA class I ligands in the database identifies strengths and deficiencies in the database and, therefore, the utility of the dataset for identifying new peptides. For instance, 212 HLA-A phenotypes exist of which 23 have a motif determined and 43 have peptides characterized. In terms of number of ligands, HLA-A*0201 has 258 characterized ligands, A*1101 has 25 peptides, while the remaining two-thirds of the HLA-A phenotypes have less than 10 associated peptide sequences. Characterization of ligands and motifs remains roughly the same at the HLA-B locus while the peptides of the HLA-C locus tend to be less characterized. These data show that 74% of HLA class I molecules do not have ligands represented in the database and thus algorithms based on the dataset could not predict ligands for a majority of the US population. Building upon this dataset and knowledge of HLA allelic frequencies, it is possible to plan a systematic expansion of the HLA class I ligand database to better identify ligands useful throughout the population. Citation for the above abstract: Sathiamurthy M, Hickman HD, Cavett JW, Zahoor A, Prilliman K, Metcalf S, Fernandez Vina M, Hildebrand WH. Population of the HLA ligand database. Tissue Antigens. 2003 Jan;61(1):12-9. © 2003 Blackwell Publishing, Inc. 
The full abstract can be found at: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=12622773 |
| 367. IMGT: The International ImMunoGeneTics Information System |
URL: http://imgt.cines.fr/ Categories: Immunological Databases The international ImMunoGeneTics information system (IMGT) (http://imgt.cines.fr), created in 1989, by the Laboratoire d'ImmunoGenetique Moleculaire LIGM (Universite Montpellier II and CNRS) at Montpellier, France, is a high-quality integrated knowledge resource specializing in the immunoglobulins (IGs), T cell receptors (TRs), major histocompatibility complex (MHC) of human and other vertebrates, and related proteins of the immune systems (RPI) that belong to the immunoglobulin superfamily (IgSF) and to the MHC superfamily (MhcSF). IMGT includes several sequence databases (IMGT/LIGM-DB, IMGT/PRIMER-DB, IMGT/PROTEIN-DB and IMGT/MHC-DB), one genome database (IMGT/GENE-DB) and one three-dimensional (3D) structure database (IMGT/3Dstructure-DB), Web resources comprising 8000 HTML pages (IMGT Marie-Paule page), and interactive tools. IMGT data are expertly annotated according to the rules of the IMGT Scientific chart, based on the IMGT-ONTOLOGY concepts. IMGT tools are particularly useful for the analysis of the IG and TR repertoires in normal physiological and pathological situations. IMGT is used in medical research (autoimmune diseases, infectious diseases, AIDS, leukemias, lymphomas, myelomas), veterinary research, biotechnology related to antibody engineering (phage displays, combinatorial libraries, chimeric, humanized and human antibodies), diagnostics (clonalities, detection and follow up of residual diseases) and therapeutical approaches (graft, immunotherapy and vaccinology). IMGT is freely available at http://imgt.cines.fr. Citation for the above abstract: Lefranc, Marie-Paule, Giudicelli, Veronique, Kaas, Quentin, Duprat, Elodie, Jabado-Michaloud, Joumana, Scaviner, Dominique, Ginestoux, Chantal, Clement, Oliver, Chaume, Denys, Lefranc, Gerard IMGT, the international ImMunoGeneTics information system(R) Nucl. Acids Res. 2005 33: D593-597 © 2005 Oxford University Press. 
The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D593 |
| 368. IMGT/GENE-DB |
URL: http://imgt.cines.fr/cgi-bin/GENElect.jv Categories: Immunological Databases IMGT/GENE-DB is the comprehensive IMGT genome database for immunoglobulin (IG) and T cell receptor (TR) genes from human and mouse, and, in development, from other vertebrates. IMGT/GENE-DB is the international reference for the IG and TR gene nomenclature and works in close collaboration with the HUGO Nomenclature Committee, Mouse Genome Database and genome committees for other species. IMGT/GENE-DB allows a search of IG and TR genes by locus, group and subgroup, which are CLASSIFICATION concepts of IMGT-ONTOLOGY. Short cuts allow the retrieval gene information by gene name or clone name. Direct links with configurable URL give access to information usable by humans or programs. An IMGT/GENE-DB entry displays accurate gene data related to genome (gene localization), allelic polymorphisms (number of alleles, IMGT reference sequences, functionality, etc.) gene expression (known cDNAs), proteins and structures (Protein displays, IMGT Colliers de Perles). It provides internal links to the IMGT sequence databases and to the IMGT Repertoire Web resources, and external links to genome and generalist sequence databases. IMGT/GENE-DB manages the IMGT reference directory used by the IMGT tools for IG and TR gene and allele comparison and assignment, and by the IMGT databases for gene data annotation. IMGT/GENE-DB is freely available at http://imgt.cines.fr. Citation for the above abstract: Giudicelli, Veronique, Chaume, Denys, Lefranc, Marie-Paule IMGT/GENE-DB: a comprehensive database for human and mouse immunoglobulin and T cell receptor genes Nucl. Acids Res. 2005 33: D256-261 © 2005 Oxford University Press. The full text of the article can be found at: IMGT/GENE-DB |
| 369. IMGT/HLA Sequence Database |
URL: http://www.ebi.ac.uk/imgt/hla/ Categories: Immunological Databases The IMGT/HLA database (http://www.ebi.ac.uk/imgt/hla) has provided a centralized repository for the sequences of the alleles named by the WHO Nomenclature Committee for Factors of the HLA System for the past four years. Since its initial release the database has grown and is the primary source of information for the study of sequences of the human major histocompatibility complex. The initial release of the database contained a limited number of tools. As a result of feedback from our users and developments in HLA we have been able to provide new tools and facilities. The HLA sequences have also been extended to include intron sequences and the 3' and 5' untranslated regions in the alignments and also the inclusion of new genes such as MICA. The IMGT/MHC database (http://www.ebi.ac.uk/imgt/mhc) was released in March 2002 to provide a similar resource for other species. The first release of IMGT/MHC contains the sequences of non-human primates (apes, new and old world monkeys), canines and feline sequences. Further species will be added shortly and the database aims to become the primary source of MHC data for non-human sequences. Citation for the above abstract: Robinson, James, Waller, Matthew J., Parham, Peter, Groot, Natasja de, Bontrop, Ronald, Kennedy, Lorna J., Stoehr, Peter, Marsh, Steven G. E. IMGT/HLA and IMGT/MHC: sequence databases for the study of the major histocompatibility complex Nucl. Acids Res. 2003 31: 311-314 © 2003 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/311 |
| 370. IMGT/LIGM-DB |
URL: http://imgt.cines.fr Categories: Immunological Databases IMGT/LIGM-DB is the IMGT® comprehensive database of immunoglobulin (IG) and T cell receptor (TR) nucleotide sequences from human and other vertebrate species. It was created in 1989 by LIGM, Montpellier, France and is the oldest and the largest database of IMGT®. IMGT/LIGM-DB includes all germline (non-rearranged) and rearranged IG and TR genomic DNA (gDNA) and complementary DNA (cDNA) sequences published in generalist databases. IMGT/LIGM-DB allows searches from the Web interface according to biological and immunogenetic criteria through five distinct modules depending on the user interest. For a given entry, nine types of display are available including the IMGT flat file, the translation of the coding regions and the analysis by the IMGT/V-QUEST tool. IMGT/LIGM-DB distributes expertly annotated sequences. The annotations hugely enhance the quality and the accuracy of the distributed detailed information. They include the sequence identification, the gene and allele classification, the constitutive and specific motif description, the codon and amino acid numbering, and the sequence obtaining information, according to the main concepts of IMGT-ONTOLOGY. They represent the main source of IG and TR gene and allele knowledge stored in IMGT/GENE-DB and in the IMGT reference directory. IMGT/LIGM-DB is freely available at http://imgt.cines.fr. Citation for the above abstract: Giudicelli, Veronique, Duroux, Patrice, Ginestoux, Chantal, Folch, Geraldine, Jabado-Michaloud, Joumana, Chaume, Denys, Lefranc, Marie-Paule IMGT/LIGM-DB, the IMGT(R) comprehensive database of immunoglobulin and T cell receptor nucleotide sequences Nucl. Acids Res. 2006 34: D781-784 © 2006 Oxford University Press. The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D781 |
| 371. ABIM Internet biology |
URL: http://www.up.univ-mrs.fr/~wabim/english/biology.html Categories: Metadatabases and Directories "These reference pages are intended to provide some help to biologists looking for informations through the Internet and particularly for information available from W3 servers." |
| 372. Interferon Stimulated Gene Database |
URL: http://www.lerner.ccf.org/labs/williams/xchip-html.cgi Categories: Immunological Databases, Microarray Data and other Gene Expression Databases "Interferons (IFN) are a family of multifunctional cytokines that activate transcription of a subset of genes. The gene products induced by IFN are responsible for the antiviral, antiproliferative and immunomodulatory properties of this cytokine. In order to obtain a more comprehensive understanding of the genes regulated by IFNs we have used different microarray formats to identify over 400 interferon stimulated genes (ISG). To facilitate the dissemination of this data we have compiled a database comprising the ISGs assigned into functional categories. The database is fully searchable and contains links to sequence and Unigene information. The database and the array data is accessible via the world wide web at (http://www.lerner.ccf.org/labs/williams/ ). We intend to add published ISG-sequences and those discovered by further transcript profiling to the database to eventually compile a complete list of ISGs." |
| 373. IPD - ESTDAB: European Searchable Tumour Line Database and Cell Bank |
URL: http://www.ebi.ac.uk/ipd/estdab/ Categories: Immunological Databases IPD-ESTDAB is a database of immunologically characterized melanoma cell lines. The database works in conjunction with the European Searchable Tumour Cell Line Database (ESTDAB) cell bank, which is housed in Tübingen, Germany and provides immunologically characterized tumour cells. The IPD-ESTDAB section of the website provides an online search facility for cells stored in this cell bank. This enables investigators to identify cells possessing specific parameters important for studies of immunity, immunogenetics, gene expression, metastasis, response to chemotherapy and other tumour biological experimentation. The search tool allows for searches based on a single parameter, or clusters of parameters on over 250 different markers for each cell. The detailed reports produced can then be used to identify cells of interest, which in turn can then be obtained from the cell bank. Citation for the above excerpt: Robinson, James, Waller, Matthew J., Stoehr, Peter, Marsh, Steven G. E. IPD--the Immuno Polymorphism Database Nucl. Acids Res. 2005 33: D523-526 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D523 |
| 374. IPD - HPA Sequence Database |
URL: http://www.ebi.ac.uk/ipd/hpa/ Categories: Immunological Databases Human platelet antigens (HPAs) are alloantigens expressed only on platelets, specifically on platelet membrane glycoproteins. These platelet-specific antigens are immunogenic and can result in pathological reactions to transfusion therapy. The HPA nomenclature system was adopted in 1990 (17,18) to overcome problems with the previous nomenclature. Since then more antigens have been described and the molecular basis of many has been resolved. As a result the nomenclature was revised in 2003 (19) and included in the IPD project. The IPD-HPA section contains nomenclature information and additional background material. The different genes in the HPA system have not been sequenced to the same level as some of the other projects and so currently only single nucleotide polymorphisms (SNPs) are used to determine alleles. This information is presented in a grid of SNP for each gene. The IPD and HPA nomenclature committee hope to expand this to provide full sequence alignments when possible. Citation for the above excerpt: Robinson, James, Waller, Matthew J., Stoehr, Peter, Marsh, Steven G. E. IPD--the Immuno Polymorphism Database Nucl. Acids Res. 2005 33: D523-526 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D523 |
| 375. IPD - KIR Sequence Database |
URL: http://www.ebi.ac.uk/ipd/kir/ Categories: Immunological Databases The KIRs are members of the immunoglobulin super family (IgSF) formerly called Killer-cell Inhibitory Receptors. KIRs have been shown to be highly polymorphic both at the allelic and haplotypic levels (1). They are composed of two or three Ig-domains, a transmembrane region and cytoplasmic tail, which can in turn be short (activatory) or long (inhibitory). The Leukocyte Receptor Complex (LRC), which encodes KIR genes, has been shown to be polymorphic, polygenic and complex in a manner similar to the MHC. This complexity in sequences has led to the formation of the KIR nomenclature committee. The nomenclature committee is responsible for the naming of new allele sequences, and produced its first report in 2002 (2); this was complemented by the inclusion of the KIR data into IPD. The IPD-KIR Sequence Database contains the most up-to-date nomenclature and sequence alignments. Also available is an online submission tool that allows the submission of new and confirmatory KIR sequences directly to the KIR nomenclature committee. Sequences submitted to IPD as part of the work of the individual nomenclature committees are based on sequences currently found in the EMBL Nucleotide Sequence Database (EMBL) (3), GenBank (4) and the DDBJ (5). Indeed a requirement of all submissions to IPD is that they have been submitted to these more general databases. Future developments of the IPD-KIR section will involve working with the KIR nomenclature committee to provide nomenclature and tools for study of the complex haplotypes and genotypes currently seen in KIR research. Citation for the above excerpt: Robinson, James, Waller, Matthew J., Stoehr, Peter, Marsh, Steven G. E. IPD--the Immuno Polymorphism Database Nucl. Acids Res. 2005 33: D523-526 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D523 |
| 376. IPD - MHC Sequence Database |
URL: http://www.ebi.ac.uk/ipd/mhc/ Categories: Immunological Databases The MHC sequences of many different species have been reported previously (6–9), along with different nomenclature systems used in the naming and identification of new genes and alleles in each species. The sequences of the MHC from a number of different species are highly conserved between species (10). By bringing the work of different nomenclature committees and the sequences of different species together it is hoped to provide a central resource that will facilitate further research on the MHC of each species and on their comparison. The first release of the IPD-MHC database involved the work of groups specializing in non-human primates, canines (DLA) and felines (FLA) and incorporated all data previously available in the IMGT/MHC database (11). This release included data from 5 species of ape, 16 species of new world monkey, 17 species of old world monkey, as well as data on different canines and felines. Since the first release, we have been able to add sequences from cattle (BoLA), and are now working to include the MHC sequences from swine (SLA), chickens, horses (ELA) and rats (RT1). For each species, there are some differences in the spectrum of data covered but all sections provide the core nomenclature pages and sequence alignments. The nomenclature and alignments follow a similar structure to that of the IPD-KIR section, and the same basic tools are used in both sections. Currently, the IPD-MHC sequence alignments are limited to species-specific alignments; however, we are working to allow cross-species alignments and the inclusion of human sequences from the IMGT/HLA database (12) for comparative purposes. The IPD-MHC Sequence Databases will also contain a submission tool for online submission of new and confirmatory sequences to the appropriate nomenclature committee. Citation for the above excerpt: Robinson, James, Waller, Matthew J., Stoehr, Peter, Marsh, Steven G. E. IPD--the Immuno Polymorphism Database Nucl. Acids Res. 2005 33: D523-526 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D523 |
| 377. JenPep |
URL: http://www.jenner.ac.uk/Jenpep/ Categories: Immunological Databases MOTIVATION: The compilation of quantitative binding data underlies attempts to derive tools for the accurate prediction of epitopes in cellular immunology and is part of our concerted goal to develop practical computational vaccinology. RESULTS: JenPep is a family of relational databases supporting the growing community of immunoinformaticians. It contains quantitative data on peptide binding to Major Histocompatibility Complexes (MHCs) and to Transmembrane Peptide Transporter (TAP), as well as an annotated list of T-cell epitopes. AVAILABILITY: The database is available via the Internet. An HTML interface allowing searching of the database can be found at the following address: http://www.jenner.ac.uk/JenPep. Citation for the above abstract: Martin J. Blythe, Irini A. Doytchinova, and Darren R. Flower JenPep: a database of quantitative functional peptide data for immunology Bioinformatics 18: 434-439. © 2002 Oxford University Press. The full text of the article can be found at: http://bioinformatics.oupjournals.org/cgi/reprint/18/3/434 |
| 378. MHCBN: A Comprehensive Database of MHC Binding and Non-binding Peptides |
URL: http://www.imtech.res.in/raghava/mhcbn/ Categories: Immunological Databases MHCBN is a comprehensive database of Major Histocompatibility Complex (MHC) binding and non-binding peptides compiled from published literature and existing databases. The latest version of the database has 19 777 entries including 17 129 MHC binders and 2648 MHC non-binders for more than 400 MHC molecules. The database has sequence and structure data of (a) source proteins of peptides and (b) MHC molecules. MHCBN has a number of web tools that include: (i) mapping of peptide on query sequence; (ii) search on any field; (iii) creation of data sets; and (iv) online data submission. The database also provides hypertext links to major databases like SWISS-PROT, PDB, IMGT/HLA-DB, GenBank and PUBMED. Citation for the above abstract: Manoj Bhasin, Harpreet Singh, and G. P. S. Raghava MHCBN: a comprehensive database of MHC binding and non-binding peptides Bioinformatics 19: 665-666. © 2003 Oxford University Press. The full text of the article can be found at: http://bioinformatics.oupjournals.org/cgi/reprint/19/5/665 |
| 379. MHCPEP: A Database of MHC Binding Peptides |
URL: http://wehih.wehi.edu.au/mhcpep/ Categories: Immunological Databases MHCPEP (http://wehih.wehi.edu.au/mhcpep/) is a curated database comprising over 13 000 peptide sequences known to bind MHC molecules. Entries are compiled from published reports as well as from direct submissions of experimental data. Each entry contains the peptide sequence, its MHC specificity and where available, experimental method, observed activity, binding affinity, source protein and anchor positions, as well as publication references. The present format of the database allows text string matching searches but can easily be converted for use in conjunction with sequence analysis packages. The database can be accessed via Internet using WWW or FTP. Citation for the above abstract: Brusic, V, Rudy, G, Harrison, LC MHCPEP, a database of MHC-binding peptides: update 1997 Nucl. Acids Res. 1998 26: 368-371 © 1998 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/26/1/368 |
| 380. VBASE2: The Integrative Germ-line V Gene Database |
URL: http://www.dnaplot.de/vbase2/ Categories: Immunological Databases The database VBASE2 provides germ-line sequences of human and mouse immunoglobulin variable (V) genes. It acts as an interconnecting platform between several existing self-contained data systems: VBASE2 integrates genome sequence data and links to the V genes in the Ensembl Genome Browser. For a single V gene sequence, all references to the EMBL nucleotide sequence database are provided, including references for V(D)J rearrangements. Furthermore, cross-references to the VBASE database, the IMGT database and the Kabat database are available. A DAS server allows the display of VBASE2 V genes within the Ensembl Genome Browser. VBASE2 can be accessed either by a web-based text query or by a sequence similarity search with the DNAPLOT software. VBASE2 is available at http://www.vbase2.org, and the DAS server is located at http://www.dnaplot.com/das. Citation for the above abstract: Retter, Ida, Althaus, Hans Helmar, Munch, Richard, Muller, Werner VBASE2, an integrative V gene database Nucl. Acids Res. 2005 33: D671-674 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D671 |
| 381. DBSubLoc: Database of Protein Subcellular Localization |
URL: http://www.bioinfo.tsinghua.edu.cn/dbsubloc.html Categories: Protein Localization and Targeting Databases We have built a protein subcellular localization annotation database, the DBSubLoc database, which is available at http://www.bioinfo.tsinghua.edu.cn/dbsubloc.html. Annotations were taken from primary protein databases, model organism genome projects and literature texts, and then were analyzed to dig out the subcellular localization features of the proteins. The proteins are also classified into different categories. Based on sequence alignment, non-redundant subsets of the database have been built, which may provide useful information for subcellular localization prediction. The database now contains >60,000 protein sequences including approximately 30,000 protein sequences in the non-redundant data sets. Online download, search and Blast tools are also available. Citation for the above abstract: Guo, Tao, Hua, Sujun, Ji, Xinglai, Sun, Zhirong DBSubLoc: database of protein subcellular localization Nucl. Acids Res. 2004 32: D122-124 © 2004 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D122 |
| 382. MitoNuc |
URL: http://www2.ba.itb.cnr.it/MitoNuc/ Categories: Mitochondrial Genes and Proteins Databases, Protein Localization and Targeting Databases Mitochondria, besides their central role in energy metabolism, have recently been found to be involved in a number of basic processes of cell life and to contribute to the pathogenesis of many degenerative diseases. All functions of mitochondria depend on the interaction of nuclear and organelle genomes. Mitochondrial genomes have been extensively sequenced and analysed and data have been collected in several specialised databases. In order to collect information on nuclear coded mitochondrial proteins we developed MitoNuc, a database containing detailed information on sequenced nuclear genes coding for mitochondrial proteins in Metazoa. The MitoNuc database can be retrieved through SRS and is available via the web site http://bighost.area.ba.cnr.it/mitochondriome where other mitochondrial databases developed by our group, the complete list of the sequenced mitochondrial genomes, links to other mitochondrial sites and related information, are available. The MitoAln database, related to MitoNuc in the previous release, reporting the multiple alignments of the relevant homologous protein coding regions, is no longer supported in the present release. In order to keep the links among entries in MitoNuc from homologous proteins, a new field in the database has been defined: the cluster identifier, an alpha numeric code used to identify each cluster of homologous proteins. A comment field derived from the corresponding SWISS-PROT entry has been introduced; this reports clinical data related to dysfunction of the protein. The logic scheme of MitoNuc database has been implemented in the ORACLE DBMS. This will allow the end-users to retrieve data through a friendly interface that will be soon implemented. Citation for the above abstract: Attimonelli, Marcella, Catalano, Domenico, Gissi, Carmela, Grillo, Giorgio, Licciulli, Flavio, Liuni, Sabino, Santamaria, Monica, Pesole, Graziano, Saccone, Cecilia MitoNuc: a database of nuclear genes coding for mitochondrial proteins. Update 2002 Nucl. Acids Res. 2002 30: 172-173 © 2002 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/30/1/172 |
| 383. NESbase |
URL: http://www.cbs.dtu.dk/databases/NESbase/ Categories: Protein Localization and Targeting Databases Protein export from the nucleus is often mediated by a Leucine-rich Nuclear Export Signal (NES). NESbase is a database of experimentally validated Leucine-rich NESs curated from literature. These signals are not annotated in databases such as SWISS-PROT, PIR or PROSITE. Each NESbase entry contains information of whether NES was shown to be necessary and/or sufficient for export, and whether the export was shown to be mediated by the export receptor CRM1. The compiled information was used to make a sequence logo of the Leucine-rich NESs, displaying the conservation of amino acids within a window of 25 residues. Surprisingly, only 36% of the sequences used for the logo fit the widely accepted NES consensus L-x(2,3)-[LIVFM]-x(2,3)-L-x-[LI]. The database is available online at http://www.cbs.dtu.dk/databases/NESbase/. Citation for the above abstract: la Cour, Tanja, Gupta, Ramneek, Rapacki, Kristoffer, Skriver, Karen, Poulsen, Flemming M., Brunak, Soren NESbase version 1.0: a database of nuclear export signals Nucl. Acids Res. 2003 31: 393-396 © 2003 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/393 |
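The NES consensus quoted in the NESbase abstract above, L-x(2,3)-[LIVFM]-x(2,3)-L-x-[LI], is written in PROSITE-style notation and can be turned into a regular expression for scanning protein sequences. The following is a minimal sketch, not part of NESbase itself: the function names and the test peptide are hypothetical, chosen only for illustration, and the converter handles just the small subset of the notation that this one pattern uses.

```python
import re

# The consensus exactly as quoted in the NESbase abstract.
NES_CONSENSUS = "L-x(2,3)-[LIVFM]-x(2,3)-L-x-[LI]"


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE-style pattern into a Python regex string.

    Supported elements (an assumption of this sketch): a single residue
    letter, 'x' for any residue, a bracketed residue class, and an
    optional repeat count written as (n) or (n,m).
    """
    parts = []
    for element in pattern.split("-"):
        m = re.fullmatch(
            r"(x|[A-Z]|\[[A-Z]+\])(?:\((\d+)(?:,(\d+))?\))?", element
        )
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        token = "." if core == "x" else core
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(token)
    return "".join(parts)


NES_REGEX = re.compile(prosite_to_regex(NES_CONSENSUS))


def find_nes_candidates(sequence: str):
    """Return (start_index, matched_text) for non-overlapping consensus hits."""
    return [(m.start(), m.group()) for m in NES_REGEX.finditer(sequence.upper())]


if __name__ == "__main__":
    print(prosite_to_regex(NES_CONSENSUS))
    # Illustrative test peptide only; not taken from a NESbase entry.
    print(find_nes_candidates("NSNELALKLAGLDINK"))
```

As the abstract notes, only 36% of the sequences used for the logo fit this consensus, so a scan of this kind would miss most experimentally validated signals; it is a first-pass filter at best.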
| 384. NLSdb: A Database of Nuclear Localization Signals |
URL: http://cubic.bioc.columbia.edu/db/NLSdb/ Categories: Protein Localization and Targeting Databases NLSdb is a database of nuclear localization signals (NLSs) and of nuclear proteins. NLSs are short stretches of residues mediating transport of nuclear proteins into the nucleus. The database contains 114 experimentally determined NLSs that were obtained through an extensive literature search. Using 'in silico mutagenesis' this set was extended to 308 experimental and potential NLSs. This final set matched over 43% of all known nuclear proteins and matches no currently known non-nuclear protein. NLSdb contains over 6000 predicted nuclear proteins and their targeting signals from the PDB and SWISS-PROT/TrEMBL databases. The database also contains over 12 500 predicted nuclear proteins from six entirely sequenced eukaryotic proteomes (Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans, Arabidopsis thaliana and Saccharomyces cerevisiae). NLS motifs often co-localize with DNA-binding regions. This observation was used to also annotate over 1500 DNA-binding proteins. NLSdb can be accessed via the web site: http://cubic.bioc.columbia.edu/db/NLSdb/. Citation for the above abstract: Nair, Rajesh, Carter, Phil, Rost, Burkhard NLSdb: database of nuclear localization signals Nucl. Acids Res. 2003 31: 397-399 © 2003 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/397 |
| 385. NMPdb: Nuclear Matrix Associated Proteins database |
URL: http://www.rostlab.org/db/NMPdb/ Categories: Protein Localization and Targeting Databases The nuclear matrix (NM) is a structure resulting from the aggregation of proteins and RNA in the nucleus of eukaryotic cells; it is the 'sticky bit' that remains after aggressive DNAse digestion and salt extraction protocols. Owing to the important role of the NM in DNA replication, DNA transcription and RNA splicing, the expression pattern of NM proteins has become an important early indicator for numerous cancers/tumors. Recent descriptions of the NM structure distinguish between a network-like 'internal nuclear matrix' (INM) and a 'nuclear shell' that connects the INM to the inner and outer nuclear membranes. A cautious NM preparation protocol reveals a coat of proteins on top of the INM; these proteins are usually referred to as the 'nuclear matrix-associated proteins'. Here, we describe a new database (NMPdb at http://www.rostlab.org/db/NMPdb/) that currently contains details of 398 NM proteins. We collected these data through a semi-automated analysis of over 3000 scientific articles in PubMed. We could match these 398 proteins to 302 protein sequences in UniProt or GenBank. Our NMPdb repository annotates these links along with the following annotations: organism, cell type, PubMed identifier, sequence-based predictions of structural and functional features and for some entries the explicit sequence segment that is responsible for localization (nuclear matrix targeting signal). Citation for the above abstract: Mika, Sven, Rost, Burkhard NMPdb: Database of Nuclear Matrix Proteins Nucl. Acids Res. 2005 33: D160-163 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D160 |
| 386. NOPdb: Nucleolar Proteome Database |
URL: http://www.lamondlab.com/NOPdb/ Categories: Protein Localization and Targeting Databases The Nucleolar Proteome Database (NOPdb) archives data on >700 proteins that were identified by multiple mass spectrometry (MS) analyses from highly purified preparations of human nucleoli, the most prominent nuclear organelle. Each protein entry is annotated with information about its corresponding gene, its domain structures and relevant protein homologues across species, as well as documenting its MS identification history including all the peptides sequenced by tandem MS/MS. Moreover, data showing the quantitative changes in the relative levels of 500 nucleolar proteins are compared at different timepoints upon transcriptional inhibition. Correlating changes in protein abundance at multiple timepoints, highlighted by visualization means in the NOPdb, provides clues regarding the potential interactions and relationships between nucleolar proteins and thereby suggests putative functions for factors within the 30% of the proteome which comprises novel/uncharacterized proteins. The NOPdb (http://www.lamondlab.com/NOPdb) is searchable by either gene names, nucleotide or protein sequences, Gene Ontology terms or motifs, or by limiting the range for isoelectric points and/or molecular weights and links to other databases (e.g. LocusLink, OMIM and PubMed). Citation for the above abstract: Leung, Anthony Kar Lun, Trinkle-Mulcahy, Laura, Lam, Yun Wah, Andersen, Jens S., Mann, Matthias, Lamond, Angus I. NOPdb: Nucleolar Proteome Database Nucl. Acids Res. 2006 34: D218-220 © 2006 Oxford University Press. The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D218 |
| 387. NPD: Nuclear Protein Database |
URL: http://npd.hgu.mrc.ac.uk/ Categories: Protein Localization and Targeting Databases The Nuclear Protein Database (NPD) is a curated database that contains information on more than 1300 vertebrate proteins that are thought, or are known, to localise to the cell nucleus. Each entry is annotated with information on predicted protein size and isoelectric point, as well as any repeats, motifs or domains within the protein sequence. In addition, information on the sub-nuclear localisation of each protein is provided and the biological and molecular functions are described using Gene Ontology (GO) terms. The database is searchable by keyword, protein name, sub-nuclear compartment and protein domain/motif. Links to other databases are provided (e.g. Entrez, SWISS-PROT, OMIM, PubMed, PubMed Central). Thus, NPD provides a gateway through which the nuclear proteome may be explored. The database can be accessed at http://npd.hgu.mrc.ac.uk and is updated monthly. Citation for the above abstract: Dellaire, G., Farrall, R., Bickmore, W.A. The Nuclear Protein Database (NPD): sub-nuclear localisation and functional annotation of the nuclear proteome Nucl. Acids Res. 2003 31: 328-330 © 2003 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/328 |
| 388. Nuclear Receptor Resource |
URL: http://nrr.georgetown.edu/NRR/nrrhome.htm Categories: Individual Protein Family Databases, Protein Localization and Targeting Databases Last year, the original Glucocorticoid Receptor Resource was expanded into a comprehensive project: the Nuclear Receptor Resource (NRR, http://nrr.georgetown.edu/nrr/nrr.html). The NRR has since been offering comprehensive information on nuclear receptor structure and function, as well as general facts of interest to the scientific community on meetings, funding and employment opportunities. The project now includes individual resources as part of a network which integrates information on glucocorticoid, androgen, mineralocorticoid, thyroid hormone, Vitamin D and peroxisome-proliferator activated receptors. Many investigators have joined the NRR network by filling the Who is who? form available in the NRR home page. This has facilitated communication among scientists in the field and dissemination of data not otherwise published. Because several investigators have contacted NRR authors over the past few months asking for advice and materials for educational purposes, we have recently decided to include in our project an educational resource on nuclear receptors termed the 'Graphics Library'. The input and suggestions of NRR users do shape the future direction of the project, so we encourage users to give us feedback. Citation for the above abstract: Martinez, E, Moore, DD, Keller, E, Pearce, D, Vanden Heuvel, JP, Robinson, V, Gottlieb, B, MacDonald, P, Simons, S, Jr, Sanchez, E, Danielsen, M The Nuclear Receptor Resource: a growing family Nucl. Acids Res. 1998 26: 239-241 © 1998 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/26/1/239 |
| 389. NUREBASE: NUclear REceptor dataBASE |
URL: http://www.ens-lyon.fr/LBMC/laudet/nurebase/nurebase.html Categories: Individual Protein Family Databases, Protein Localization and Targeting Databases Nuclear hormone receptors are an abundant class of ligand-activated transcriptional regulators, found in varying numbers in all animals. Based on our experience of managing the official nomenclature of nuclear receptors, we have developed NUREBASE, a database containing protein and DNA sequences, reviewed protein alignments and phylogenies, taxonomy and annotations for all nuclear receptors. New developments in NUREBASE include explicit declaration of alternative transcripts of each gene, and expression data for human and mouse nuclear receptors. The core of NUREBASE is reviewed, and it is completed by NUREBASE_DAILY, automatically updated every 24 h. All information on accessing and installing NUREBASE may be found at http://www.ens-lyon.fr/LBMC/laudet/nurebase/nurebase.html. Citation for the above abstract: Ruau, David, Duarte, Jorge, Ourjdal, Tarik, Perriere, Guy, Laudet, Vincent, Robinson-Rechavi, Marc Update of NUREBASE: nuclear hormone receptor functional genomics Nucl. Acids Res. 2004 32: D165-167 © 2004 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D165 |
| 390. OGRe: Organellar Genome Retrieval |
URL: http://drake.physics.mcmaster.ca/ogre/ Categories: Mitochondrial Genes and Proteins Databases, Protein Localization and Targeting Databases Organellar Genome Retrieval (OGRe) is a relational database of complete mitochondrial genome sequences for over 250 Metazoan species. OGRe provides a resource for the comparative analysis of mitochondrial genomes at several levels. At the sequence level, OGRe allows the retrieval of any selected set of mitochondrial genes from any selected set of species. Species are classified using a taxonomic system that allows easy selection of related groups of species. Sequence alignments are also available for some species. At the level of individual nucleotides, the system contains information on base frequencies and codon usage frequencies that can be compared between organisms. At the level of whole genomes, OGRe provides several ways of visualizing information on gene order. Diagrams illustrating the genome arrangement can be generated for any selected set of species automatically from the information in the database. Searches can be done based on gene arrangement to find sets of species that have the same order as one another. Diagrams for pairwise comparison of species can be produced that show the positions of break-points in the gene order and use colour to highlight the sections of the genome that have moved. OGRe is available from http://www.bioinf.man.ac.uk/ogre. Citation for the above abstract: Jameson, Daniel, Gibson, Andrew P., Hudelot, Cendrine, Higgs, Paul G. OGRe: a relational database for comparative analysis of mitochondrial genomes Nucl. Acids Res. 2003 31: 202-206 © 2003 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/202 |
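The OGRe abstract above mentions base frequencies and codon usage frequencies that can be compared between organisms. As a rough illustration of what those two quantities are, here is a minimal sketch; it is not OGRe's own code, the function names are hypothetical, and it assumes a plain DNA string whose reading frame starts at the first base.

```python
from collections import Counter


def base_frequencies(sequence: str) -> dict:
    """Return the fraction of A, C, G and T among the unambiguous bases."""
    counts = Counter(base for base in sequence.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        return {}
    return {base: counts[base] / total for base in "ACGT"}


def codon_usage(coding_sequence: str) -> dict:
    """Return the relative frequency of each codon in a coding sequence.

    The sequence is read in frame from its first base; trailing bases that
    do not fill a complete codon are ignored.
    """
    seq = coding_sequence.upper()
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(codons)
    return {codon: n / len(codons) for codon, n in counts.items()}


if __name__ == "__main__":
    # Toy sequence for illustration only.
    toy = "ATGAAAATG"
    print(base_frequencies(toy))
    print(codon_usage(toy))
```

Comparing two species then amounts to computing these dictionaries for the same gene in each and looking at the differences per base or per codon.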
| 391. PSORTdb: A Database of Protein Subcellular Localizations for Bacteria |
URL: http://db.psort.org/ Categories: Protein Localization and Targeting Databases Information about bacterial subcellular localization (SCL) is important for protein function prediction and identification of suitable drug/vaccine/diagnostic targets. PSORTdb (http://db.psort.org/) is a web-accessible database of SCL for bacteria that contains both information determined through laboratory experimentation and computational predictions. The dataset of experimentally verified information (approximately 2000 proteins) was manually curated by us and represents the largest dataset of its kind. Earlier versions have been used for training SCL predictors, and its incorporation now into this new PSORTdb resource, with its associated additional annotation information and dataset version control, should aid researchers in future development of improved SCL predictors. The second component of this database contains computational analyses of proteins deduced from the most recent NCBI dataset of completely sequenced genomes. Analyses are currently calculated using PSORTb, the most precise automated SCL predictor for bacterial proteins. Both datasets can be accessed through the web using a very flexible text search engine, a data browser, or using BLAST, and the entire database or search results may be downloaded in various formats. Features such as GO ontologies and multiple accession numbers are incorporated to facilitate integration with other bioinformatics resources. PSORTdb is freely available under GNU General Public License. Citation for the above abstract: Rey, Sebastien, Acab, Michael, Gardy, Jennifer L., Laird, Matthew R., deFays, Katalin, Lambert, Christophe, Brinkman, Fiona S. L. PSORTdb: a protein subcellular localization database for bacteria Nucl. Acids Res. 2005 33: D164-168 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D164 |
| 392. SPD: Secreted Protein Database |
URL: http://spd.cbi.pku.edu.cn/ Categories: Protein Localization and Targeting Databases With the improved secreted protein prediction approach and comprehensive data sources, including Swiss-Prot, TrEMBL, RefSeq, Ensembl and CBI-Gene, we have constructed secretomes of human, mouse and rat, with a total of 18 152 secreted proteins. All the entries are ranked according to the prediction confidence. They were further annotated via a proteome annotation pipeline that we developed. We also set up a secreted protein classification pipeline and classified our predicted secreted proteins into different functional categories. To make the dataset more convincing and comprehensive, nine reference datasets are also integrated, such as the secreted proteins from the Gene Ontology Annotation (GOA) system at the European Bioinformatics Institute, and the vertebrate secreted proteins from Swiss-Prot. All these entries were grouped via a TribeMCL based clustering pipeline. We have constructed a web-based secreted protein database, which has been publicly available at http://spd.cbi.pku.edu.cn. Users can browse the database via a GO assignment or chromosomal-location-based interface. Moreover, text query and sequence similarity search are also provided, and the sequence and annotation data can be downloaded freely from the SPD website. Citation for the above abstract: Chen, Yunjia, Zhang, Yong, Yin, Yanbin, Gao, Ge, Li, Songgang, Jiang, Ying, Gu, Xiaocheng, Luo, Jingchu SPD--a web-based secreted protein database Nucl. Acids Res. 2005 33: D169-173 © 2005 Oxford University Press. The full abstract can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D169 |
| 393. THGS: Transmembrane Helices in Genome Sequences |
URL: http://pranag.physics.iisc.ernet.in/thgs/ Categories: Protein Localization and Targeting Databases Transmembrane Helices in Genome Sequences (THGS) is an interactive web-based database, developed to search the transmembrane helices in the user-interested gene sequences available in the Genome Database (GDB). The proposed database has provision to search sequence motifs in transmembrane and globular proteins. In addition, the motif can be searched in the other sequence databases (Swiss-Prot and PIR) or in the macromolecular structure database, Protein Data Bank (PDB). Further, the 3D structure of the corresponding queried motif, if it is available in the solved protein structures deposited in the Protein Data Bank, can also be visualized using the widely used graphics package RASMOL. All the sequence databases used in the present work are updated frequently and hence the results produced are up to date. The database THGS is freely available via the world wide web and can be accessed at http://pranag.physics.iisc.ernet.in/thgs/ or http://144.16.71.10/thgs/. Citation for the above abstract: Fernando, S. A., Selvarani, P., Das, Soma, Kumar, Ch. Kiran, Mondal, Sukanta, Ramakumar, S., Sekar, K. THGS: a web-based database of Transmembrane Helices in Genome Sequences Nucl. Acids Res. 2004 32: D125-128 © 2004 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D125 |
| 394. ASC: Active Sequence Collection |
URL: http://bioinformatica.isa.cnr.it/ASC/ Categories: Protein Sequence Motifs and Active Sites Databases Active Sequences Collection (ASC) is a collection of amino acid sequences, with a unique feature: only short sequences are collected, with a demonstrated biological activity. The current version of ASC consists of three sections: DORRS, a collection of active RGD-containing peptides; TRANSIT, a collection of protein regions active as substrates of transglutaminase enzyme (TGase), and BAC, a collection of short peptides with demonstrated biological activity. Literature references for each entry are reported, as well as cross references to other databases, when available. The current version of ASC includes more than 800 different entries. The main scope of this collection is to offer a new tool to investigate the structural features of protein active sites, additionally to similarity searches against large protein databases or searching for known functional patterns. ASC database is available at the web address http://crisceb.unina2.it/ASC/ which also offers a dedicated query interface to compare user-defined protein sequences with the database, as well as an updating interface to allow contribution of new referenced active sequences. Citation for the above abstract: Facchiano, Angelo M., Facchiano, Antonio, Facchiano, Francesco Active Sequences Collection (ASC) database: a new tool to assign functions to protein sequences Nucl. Acids Res. 2003 31: 379-382 © 2003 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/379 |
| 395. Blocks |
URL: http://blocks.fhcrc.org/ Categories: Protein Sequence Motifs and Active Sites Databases The Blocks Database WWW (http://blocks.fhcrc.org) and Email (blocks@blocks.fhcrc.org) servers provide tools to search DNA and protein queries against the Blocks+ Database of multiple alignments, which represent conserved protein regions. Blocks+ nearly doubles the number of protein families included in the database by adding families from the Pfam-A, ProDom and Domo databases to those from PROSITE and PRINTS. Other new features include improved Block Searcher statistics, searching with NCBI's IMPALA program and 3D display of blocks on PDB structures. Citation for the above abstract: Henikoff, Jorja G., Greene, Elizabeth A., Pietrokovski, Shmuel, Henikoff, Steven Increased coverage of protein families with the Blocks Database servers Nucl. Acids Res. 2000 28: 228-230 © 2000 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/28/1/228 |
| 396. COMe: The Bioinorganic Motif Database |
URL: http://www.ebi.ac.uk/come/index.html Categories: Protein Sequence Motifs and Active Sites Databases BACKGROUND: Many characterised proteins contain metal ions, small organic molecules or modified residues. In contrast, the huge amount of data generated by genome projects consists exclusively of sequences with almost no annotation. One of the goals of the structural genomics initiative is to provide representative three-dimensional (3-D) structures for as many protein/domain folds as possible to allow successful homology modelling. However, important functional features such as metal co-ordination or a type of prosthetic group are not always conserved in homologous proteins. So far, the problem of correct annotation of bioinorganic proteins has been largely ignored by the bioinformatics community and information on bioinorganic centres obtained by methods other than crystallography or NMR is only available in literature databases. RESULTS: COMe (Co-Ordination of Metals) represents the ontology for bioinorganic and other small molecule centres in complex proteins. COMe consists of three types of entities: 'bioinorganic motif' (BIM), 'molecule' (MOL), and 'complex proteins' (PRX), with each entity being assigned a unique identifier. A BIM consists of at least one centre (metal atom, inorganic cluster, organic molecule) and two or more endogenous and/or exogenous ligands. BIMs are represented as one-dimensional (1-D) strings and 2-D diagrams. A MOL entity represents a 'small molecule' which, when in complex with one or more polypeptides, forms a functional protein. The PRX entities refer to the functional proteins as well as to separate protein domains and subunits. The complex proteins in COMe are subdivided into three categories: (i) metalloproteins, (ii) organic prosthetic group proteins and (iii) modified amino acid proteins. The data are currently stored in both XML format and a relational database and are available at http://www.ebi.ac.uk/come/. 
CONCLUSION: COMe provides the classification of proteins according to their 'bioinorganic' features and thus is orthogonal to other classification schemes, such as those based on sequence similarity, 3-D fold, enzyme activity, or biological process. The hierarchical organisation of the controlled vocabulary allows both for annotation and querying at different levels of granularity. Citation for the above abstract: Degtyarenko K, Contrino S. COMe: the ontology of bioinorganic proteins. BMC Struct Biol. 2004 Feb 27;4(1):3. © 2004 By Degtyarenko and Contrino. The full text of the article can be found at: http://www.biomedcentral.com/1472-6807/4/3/ |
| 397. CoPS: Comprehensive Peptide Signature Database |
URL: http://203.90.127.70/copsv2/index.asp Categories: Protein Sequence Motifs and Active Sites Databases We present the development of a Comprehensive database of 12 076 invariant Peptide Signatures (CoPS) derived from 52 bacterial genomes with a minimum occurrence in at least seven organisms. These peptides were observed in functionally similar proteins and are distributed over nearly 1250 different functional proteins. The database provides function, structure and occurrence in biochemical pathways of the proteins containing these signature peptides. It houses additional information on the signature peptides, such as identical match in other motif/pattern (e.g. PROSITE, BLOCKS, PRINTS and Pfam) databases and the database of interacting proteins, human proteome and mutation effect on these signature peptides. There is a wide applicability of this database in the identification of critical functional residues in proteins. The database also facilitates the identification of folding nucleus/structural determinants in proteins and functional assignment to yet unknown proteins. We demonstrate functional assignment to 2605 hypothetical proteins in bacterial genomes and 112 unknown proteins in human using this database. AVAILABILITY: The database can be freely accessed through the following URL: http://203.195.151.46/copsv2/index.html or http://203.90.127.70/copsv2/index.html Citation for the above abstract: Tulika Prakash, Mamta Khandelwal, Dipayan Dasgupta, Debasis Dash, and Samir K. Brahmachari CoPS: Comprehensive Peptide Signature Database Bioinformatics Advance Access published on November 1, 2004, DOI 10.1093/bioinformatics/bth325. Bioinformatics 20: 2886-2888. © 2004 Oxford University Press. The full abstract can be found at: http://bioinformatics.oupjournals.org/cgi/content/abstract/20/16/2886 |
| 398. CSA: Catalytic Site Atlas |
URL: http://www.ebi.ac.uk/thornton-srv/databases/CSA/ Categories: Protein Sequence Motifs and Active Sites Databases, Protein Structure Databases The Catalytic Site Atlas (CSA) provides catalytic residue annotation for enzymes in the Protein Data Bank. It is available online at http://www.ebi.ac.uk/thornton-srv/databases/CSA. The database consists of two types of annotated site: an original hand-annotated set containing information extracted from the primary literature, using defined criteria to assign catalytic residues, and an additional homologous set, containing annotations inferred by PSI-BLAST and sequence alignment to one of the original set. The CSA can be queried via Swiss-Prot identifier and EC number, as well as by PDB code. CSA Version 1.0 contains 177 original hand-annotated entries and 2608 homologous entries, and covers approximately 30% of all EC numbers found in PDB. The CSA will be updated on a monthly basis to include homologous sites found in new PDBs, and new hand-annotated enzymes as and when their annotation is completed. Citation for the above abstract: Porter, Craig T., Bartlett, Gail J., Thornton, Janet M. The Catalytic Site Atlas: a resource of catalytic sites and residues identified in enzymes using structural data Nucl. Acids Res. 2004 32: D129-133 © 2004 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D129 |
| 399. eBLOCKS |
URL: http://fold.stanford.edu/eblocks/acsearch.html Categories: Protein Sequence Motifs and Active Sites Databases Classifying proteins into families and superfamilies allows identification of functionally important conserved domains. The motifs and scoring matrices derived from such conserved regions provide computational tools that recognize similar patterns in novel sequences, and thus enable the prediction of protein function for genomes. The eBLOCKs database enumerates a cascade of protein blocks with varied conservation levels for each functional domain. A biologically important region is most stringently conserved among a smaller family of highly similar proteins. The same region is often found in a larger group of more remotely related proteins with a reduced stringency. Through enumeration, highly specific signatures can be generated from blocks with more columns and fewer family members, while highly sensitive signatures can be derived from blocks with fewer columns and more members as in a superfamily. By applying PSI-BLAST and a modified K-means clustering algorithm, eBLOCKs automatically groups protein sequences according to different levels of similarity. Multiple sequence alignments are made and trimmed into a series of ungapped blocks. Motifs and position-specific scoring matrices were derived from eBLOCKs and made available for sequence search and annotation. The eBLOCKs database provides a tool for high-throughput genome annotation with maximal specificity and sensitivity. The eBLOCKs database is freely available on the World Wide Web at http://motif.stanford.edu/eblocks/ to all users for online usage. Academic and not-for-profit institutions wishing copies of the program may contact Douglas L. Brutlag (brutlag@stanford.edu). Commercial firms wishing copies of the program for internal installation may contact Jacqueline Tay at the Stanford Office of Technology Licensing (jacqueline.tay@stanford.edu; http://otl.stanford.edu/). 
Citation for the above abstract: Su, Qiaojuan Jane, Lu, Lin, Saxonov, Serge, Brutlag, Douglas L. eBLOCKs: enumerating conserved protein blocks to achieve maximal sensitivity and specificity Nucl. Acids Res. 2005 33: D178-182 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D178 |
| 400. eF-site: Electrostatic surface of Functional site |
URL: http://ef-site.protein.osaka-u.ac.jp/eF-site/ Categories: Protein Sequence Motifs and Active Sites Databases, Protein Structure Databases The electrostatic-surface of functional site (eF-site) is a database for the molecular surfaces of protein functional sites. To enable browsing of each molecular surface along with the atomic model, we have developed a new three-dimensional interactive viewer, PDBjViewer, that can be used both as an applet and as a stand-alone program. AVAILABILITY: The eF-site database and PDBjViewer are freely available from http://www.pdbj.org/eF-site/ Citation for the above abstract: Kengo Kinoshita, and Haruki Nakamura eF-site and PDBjViewer: database and viewer for protein functional sites Bioinformatics Advance Access published on May 22, 2004, DOI 10.1093/bioinformatics/bth073. Bioinformatics 20: 1329-1330. © 2004 Oxford University Press. The full abstract can be found at: http://bioinformatics.oupjournals.org/cgi/content/abstract/20/8/1329 |
| 401. eMOTIF |
URL: http://dlb4.stanford.edu/emotif/ Categories: Protein Sequence Motifs and Active Sites Databases The EMOTIF database is a collection of more than 170 000 highly specific and sensitive protein sequence motifs representing conserved biochemical properties and biological functions. These protein motifs are derived from 7697 sequence alignments in the BLOCKS+ database (released on June 23, 2000) and all 8244 protein sequence alignments in the PRINTS database (version 27.0) using the emotif-maker algorithm developed by Nevill-Manning et al. (Nevill-Manning,C.G., Wu,T.D. and Brutlag,D.L. (1998) Proc. Natl Acad. Sci. USA, 95, 5865-5871; Nevill-Manning,C.G., Sethi,K.S., Wu,T.D. and Brutlag,D.L. (1997) ISMB-97, 5, 202-209). Since the amino acids and the groups of amino acids in these sequence motifs represent critical positions conserved in evolution, search algorithms employing the EMOTIF patterns can identify and classify more widely divergent sequences than methods based on global sequence similarity. The emotif protein pattern database is available at http://motif.stanford.edu/emotif/. Citation for the above abstract: Huang, Jimmy Y., Brutlag, Douglas L. The EMOTIF database Nucl. Acids Res. 2001 29: 202-204 © 2001 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/29/1/202 |
| 402. InterPro |
URL: http://www.ebi.ac.uk/interpro/ Categories: Protein Domain and Protein Classification Databases, Protein Sequence Motifs and Active Sites Databases InterPro, an integrated documentation resource of protein families, domains and functional sites, was created to integrate the major protein signature databases. Currently, it includes PROSITE, Pfam, PRINTS, ProDom, SMART, TIGRFAMs, PIRSF and SUPERFAMILY. Signatures are manually integrated into InterPro entries that are curated to provide biological and functional information. Annotation is provided in an abstract, Gene Ontology mapping and links to specialized databases. New features of InterPro include extended protein match views, taxonomic range information and protein 3D structure data. One of the new match views is the InterPro Domain Architecture view, which shows the domain composition of protein matches. Two new entry types were introduced to better describe InterPro entries: these are active site and binding site. PIRSF and the structure-based SUPERFAMILY are the latest member databases to join InterPro, and CATH and PANTHER are soon to be integrated. InterPro release 8.0 contains 11 007 entries, representing 2573 domains, 8166 families, 201 repeats, 26 active sites, 21 binding sites and 20 post-translational modification sites. InterPro covers over 78% of all proteins in the Swiss-Prot and TrEMBL components of UniProt. The database is available for text- and sequence-based searches via a webserver (http://www.ebi.ac.uk/interpro), and for download by anonymous FTP (ftp://ftp.ebi.ac.uk/pub/databases/interpro). Citation for the above abstract: Mulder, Nicola J., Apweiler, Rolf, Attwood, Teresa K., Bairoch, Amos, Bateman, Alex, et al. InterPro, progress and status in 2005 Nucl. Acids Res. 2005 33: D201-205 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D201 |
| 403. MDB: Metalloprotein Database and Browser |
URL: http://metallo.scripps.edu/ Categories: Protein Sequence Motifs and Active Sites Databases The Metalloprotein Database and Browser (MDB; http://metallo.scripps.edu) at The Scripps Research Institute is a web-accessible resource for metalloprotein research. It offers the scientific community quantitative information on geometrical parameters of metal-binding sites in protein structures available from the Protein Data Bank (PDB). The MDB also offers analytical tools for the examination of trends or patterns in the indexed metal-binding sites. A user can perform interactive searches, metal-site structure visualization (via a Java applet), and analysis of the quantitative data by accessing the MDB through a web browser without requiring an external application or platform-dependent plugin. The MDB also has a non-interactive interface with which other web sites and network-aware applications can seamlessly incorporate data or statistical analysis results from metal-binding sites. The information contained in the MDB is periodically updated with automated algorithms that find and index metal sites from new protein structures released by the PDB. Citation for the above abstract: Castagnetto, Jesus M., Hennessy, Sean W., Roberts, Victoria A., Getzoff, Elizabeth D., Tainer, John A., Pique, Michael E. MDB: the Metalloprotein Database and Browser at The Scripps Research Institute Nucl. Acids Res. 2002 30: 379-382 © 2002 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/30/1/379 |
| 404. O-GLYCBASE |
URL: http://www.cbs.dtu.dk/databases/OGLYCBASE/ Categories: Protein Sequence Motifs and Active Sites Databases O-GLYCBASE is a database of glycoproteins with O-linked glycosylation sites. Entries with at least one experimentally verified O-glycosylation site have been compiled from protein sequence databases and literature. Each entry contains information about the glycan involved, the species, sequence, a literature reference and http-linked cross-references to other databases. Version 4.0 contains 179 protein entries, an approximate 15% increase over the last version. Sequence logos representing the acceptor specificity patterns for GalNAc, GlcNAc, mannosyl and xylosyl transferases are shown. The O-GLYCBASE database is available through the WWW at http://www.cbs.dtu.dk/databases/OGLYCBASE/ Citation for the above abstract: Gupta, R, Birch, H, Rapacki, K, Brunak, S, Hansen, JE O-GLYCBASE version 4.0: a revised database of O-glycosylated proteins Nucl. Acids Res. 1999 27: 370-372 © 1999 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/27/1/370 |
| 405. PDBSITE |
URL: http://wwwmgs.bionet.nsc.ru/mgs/gnw/pdbsite/ Categories: Protein Sequence Motifs and Active Sites Databases The PDBSite database provides comprehensive structural and functional information on various protein sites (post-translational modification, catalytic active, organic and inorganic ligand binding, protein-protein, protein-DNA and protein-RNA interactions) in the Protein Data Bank (PDB). The PDBSite is available online at http://wwwmgs.bionet.nsc.ru/mgs/gnw/pdbsite/. It consists of functional sites extracted from PDB using the SITE records and of an additional set containing the protein interaction sites inferred from the contact residues in heterocomplexes. The PDBSite was set up by automated processing of the PDB. The PDBSite database can be queried through the functional description and the structural characteristics of the site and its environment. The PDBSite is integrated with the PDBSiteScan tool allowing structural comparisons of a protein against the functional sites. The PDBSite enables the recognition of functional sites in protein tertiary structures, providing annotation of function through structure. The PDBSite is updated after each new PDB release. Citation for the above abstract: Ivanisenko, Vladimir A., Pintus, Sergey S., Grigorovich, Dmitry A., Kolchanov, Nickolay A. PDBSite: a database of the 3D structure of protein functional sites Nucl. Acids Res. 2005 33: D183-187 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D183 |
| 406. Phospho.ELM: The Protein Phosphorylation Database |
URL: http://phospho.elm.eu.org/ Categories: Protein Sequence Motifs and Active Sites Databases BACKGROUND: Post-translational phosphorylation is one of the most common protein modifications. Phosphoserine, threonine and tyrosine residues play critical roles in the regulation of many cellular processes. The fast growing number of research reports on protein phosphorylation points to a general need for an accurate database dedicated to phosphorylation to provide easily retrievable information on phosphoproteins. DESCRIPTION: Phospho.ELM http://phospho.elm.eu.org is a new resource containing experimentally verified phosphorylation sites manually curated from the literature and is developed as part of the ELM (Eukaryotic Linear Motif) resource. Phospho.ELM constitutes the largest searchable collection of phosphorylation sites available to the research community. The Phospho.ELM entries store information about substrate proteins with the exact positions of residues known to be phosphorylated by cellular kinases. Additional annotation includes literature references, subcellular compartment, tissue distribution, and information about the signaling pathways involved as well as links to the molecular interaction database MINT. Phospho.ELM version 2.0 contains 1703 phosphorylation site instances for 556 phosphorylated proteins. CONCLUSION: Phospho.ELM will be a valuable tool both for molecular biologists working on protein phosphorylation sites and for bioinformaticians developing computational predictions on the specificity of phosphorylation reactions. Citation for the above abstract: Diella F, Cameron S, Gemund C, Linding R, Via A, Kuster B, Sicheritz-Ponten T, Blom N, Gibson TJ. Phospho.ELM: a database of experimentally verified phosphorylation sites in eukaryotic proteins. BMC Bioinformatics. 2004 Jun 22;5(1):79. © 2004 By Diella et al. The full text of the article can be found at: http://www.biomedcentral.com/1471-2105/5/79 |
| 407. PRINTS |
URL: http://umber.sbs.man.ac.uk/dbbrowser/PRINTS/ Categories: Protein Sequence Motifs and Active Sites Databases The PRINTS database houses a collection of protein fingerprints. These may be used to assign uncharacterised sequences to known families and hence to infer tentative functions. The September 2002 release (version 36.0) includes 1800 fingerprints, encoding approximately 11 000 motifs, covering a range of globular and membrane proteins, modular polypeptides and so on. In addition to its continued steady growth, we report here the development of an automatic supplement, prePRINTS, designed to increase the coverage of the resource and reduce some of the manual burdens inherent in its maintenance. The databases are accessible for interrogation and searching at http://www.bioinf.man.ac.uk/dbbrowser/PRINTS/. Citation for the above abstract: Attwood, T. K., Bradley, P., Flower, D. R., Gaulton, A., Maudling, N., Mitchell, A. L., Moulton, G., Nordle, A., Paine, K., Taylor, P., Uddin, A., Zygouri, C. PRINTS and its automatic supplement, prePRINTS Nucl. Acids Res. 2003 31: 400-402 © 2003 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/400 |
| 408. PROMISE |
URL: http://metallo.scripps.edu/PROMISE/ Categories: Protein Sequence Motifs and Active Sites Databases The PROMISE (prosthetic centres and metal ions in protein active sites) database aims to present comprehensive sequence, structural, functional and bibliographic information on metalloproteins and other complex proteins, with an emphasis on active site structure and function. The database is available on the World Wide Web at http://bioinf.leeds.ac.uk/promise/ Citation for the above abstract: Degtyarenko, KN, North, AC, Findlay, JB PROMISE: a database of bioinorganic motifs Nucl. Acids Res. 1999 27: 233-236 © 1999 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/27/1/233 |
| 409. PROSITE |
URL: http://www.expasy.org/prosite/ Categories: Protein Sequence Motifs and Active Sites Databases The PROSITE database consists of a large collection of biologically meaningful signatures that are described as patterns or profiles. Each signature is linked to a documentation that provides useful biological information on the protein family, domain or functional site identified by the signature. The PROSITE database is now complemented by a series of rules that can give more precise information about specific residues. During the last 2 years, the documentation and the ScanProsite web pages were redesigned to add more functionalities. The latest version of PROSITE (release 19.11 of September 27, 2005) contains 1329 patterns and 552 profile entries. Over the past 2 years more than 200 domains have been added, and now 52% of UniProtKB/Swiss-Prot entries (release 48.1 of September 27, 2005) have a cross-reference to a PROSITE entry. The database is accessible at http://www.expasy.org/prosite/. Citation for the above abstract: Hulo, Nicolas, Bairoch, Amos, Bulliard, Virginie, Cerutti, Lorenzo, De Castro, Edouard, Langendijk-Genevaux, Petra S., Pagni, Marco, Sigrist, Christian J. A. The PROSITE database Nucl. Acids Res. 2006 34: D227-230 © 2006 Oxford University Press. The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D227 |
| 410. ProTeus: PROtein TErminUS |
URL: http://www.proteus.cs.huji.ac.il/ Categories: Protein Sequence Motifs and Active Sites Databases "At the two ends of each protein are the amino (N-) and carboxyl (C-) termini. They each have specific biochemical properties that affect the rich repertoire of biological processes in which they may be involved. Thus, signal sequences at the N-terminal assign a protein to the secretory pathway. On the other hand, a PDZ recognition sequence at the C-terminal of synaptic proteins is required for the assembly of functional complexes, channel clustering and sub-membrane cytoskeletal mesh organization. Most search programs do not treat the termini in any special way, and therefore fail to detect short signatures associated with them. We address this problem by using a search algorithm that is based on positional statistics. This is implemented as a software tool called ProTeus (PROtein TErminUS) with which we compiled a list of highly significant protein groups (called SIGs). This search has revealed many known as well as some novel signatures in protein termini. What the proteins in each SIG have in common is a very short significant signature at one of the protein termini, though their overall sequence similarity is low. The entire archive of SIGs is ranked according to functional relevance as reflected by several external annotation sources. We claim that several hundreds of these new SIGs represent previously overlooked signatures that should be experimentally tested." |
| 411. EDGE: Environment, Drugs and Gene Expression |
URL: http://edge.oncology.wisc.edu/edge.php Categories: Microarray Data and other Gene Expression Databases, Toxicology Databases The application of microarray technology to characterize changes in global gene expression in response to chemical exposure holds great promise for toxicant classification and hazard prediction. Unfortunately, the use of different array platforms, different hybridization protocols and different annotation schemes hinders the meaningful comparison of transcriptional profiling data across laboratories. Our solution to this problem is to centralize microarray data generation and to develop easily accessible and uniform informatic tools for the efficient analysis and sharing of toxicogenomic data. In an effort to create such a resource, we developed a standardized set of microarray reagents and reproducible protocols for the analysis of liver gene expression in the mouse model. Moreover, we have generated an initial training set of 117 toxicogenomic profiles. This web-accessible database and informatics suite, known as Environment, Drugs, Genes and Expression, or EDGE, is now available at http://edge.oncology.wisc.edu/edge.php. Citation for the above abstract: Hayes, Kevin R, Vollrath, Aaron L, Zastrow, Gina M, McMillan, Brian J, Craven, Mark W, Jovanovich, Stevan B, Walisser, Jacqueline A, Rank, Dave R, Penn, Sharon G, Reddy, Janardan K., Thomas, Russell S, Bradfield, Christopher A. EDGE: A Centralized Resource for the Comparison, Analysis and Distribution of Toxicogenomic Information Mol Pharmacol 2005 0: mol.104.009175 © 2005 The American Society for Pharmacology and Experimental Therapeutics. The full abstract can be found at: http://molpharm.aspetjournals.org/cgi/content/abstract/mol.104.009175v1 |
| 412. MILANO: Microarray Literature-based Annotation |
URL: http://milano.md.huji.ac.il/ Categories: Microarray Data and other Gene Expression Databases BACKGROUND: High-throughput genomic research tools are becoming standard in the biologist's toolbox. After processing the genomic data with one of the many available statistical algorithms to identify statistically significant genes, these genes need to be further analyzed for biological significance in light of all the existing knowledge. Literature mining - the process of representing literature data in a fashion that is easy to relate to genomic data - is one solution to this problem. RESULTS: We present a web-based tool, MILANO (Microarray Literature-based Annotation), that allows annotation of lists of genes derived from microarray results by user defined terms. Our annotation strategy is based on counting the number of literature co-occurrences of each gene on the list with a user defined term. This strategy allows the customization of the annotation procedure and thus overcomes one of the major limitations of the functional annotations usually provided with microarray results. MILANO expands the gene names to include all their informative synonyms while filtering out gene symbols that are likely to be less informative as literature searching terms. MILANO supports searching two literature databases: GeneRIF and Medline (through PubMed), allowing retrieval of both quick and comprehensive results. We demonstrate MILANO's ability to improve microarray analysis by analyzing a list of 150 genes that were affected by p53 overproduction. This analysis reveals that MILANO enables immediate identification of known p53 target genes on this list and assists in sorting the list into genes known to be involved in p53 related pathways, apoptosis and cell cycle arrest. CONCLUSIONS: MILANO provides a useful tool for the automatic custom annotation of microarray results which is based on all the available literature. 
MILANO has two major advances over similar tools: the ability to expand gene names to include all their informative synonyms while removing synonyms that are not informative and access to the GeneRIF database which provides short summaries of curated articles relevant to known genes. MILANO is available at http://milano.md.huji.ac.il. Citation for the above abstract: Rubinstein R, Simon I. MILANO - custom annotation of microarray results using automatic literature searches. BMC Bioinformatics. 2005 Jan 20;6(1):12 [Epub ahead of print] © 2005 By Rubinstein and Simon. The full text of the article can be found at: http://www.biomedcentral.com/1471-2105/6/12 |
| 413. GOBASE |
URL: http://gobase.bcm.umontreal.ca/ Categories: Organelle Databases The organelle genome database GOBASE is now in its twelfth release, and includes 350 000 mitochondrial sequences and 118 000 chloroplast sequences, roughly a 3-fold expansion since previously documented. GOBASE also includes a fully reannotated genome sequence of Rickettsia prowazekii, one of the closest bacterial relatives of mitochondria, and will shortly expand to contain more data from bacteria from which organelles originated. All these sequences are now accessible through a single unified interface. Enhancements to the functionality of GOBASE include addition of pages for RNA structures and a page compiling data about the taxonomic distribution of organelle-encoded genes; incorporation of Gene Ontology terms; addition of features deduced from incomplete annotations to sequences in GenBank; marking of type examples in cases where single genes in single species are oversampled within GenBank; and addition of graphics illustrating gene structure and the position of neighbouring genes on a sequence. The database has been reimplemented in PostgreSQL to facilitate development and maintenance, and structural modifications have been made to speed up queries, particularly those related to taxonomy. The GOBASE database can be queried at http://gobase.bcm.umontreal.ca/ and inquiries should be directed to gobase@bch.umontreal.ca. Citation for the above abstract: O'Brien, Emmet A., Zhang, Yue, Yang, LiuSong, Wang, Eric, Marie, Veronique, Lang, B. Franz, Burger, Gertraud GOBASE--a database of organelle and bacterial genome information Nucl. Acids Res. 2006 34: D697-699 © 2006 Oxford University Press. The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D697 |
| 414. Organelle DB |
URL: http://organelledb.lsi.umich.edu/ Categories: Organelle Databases To efficiently utilize the growing body of available protein localization data, we have developed Organelle DB, a web-accessible database cataloging more than 25,000 proteins from nearly 60 organelles, subcellular structures and protein complexes in 154 organisms spanning the eukaryotic kingdom. Organelle DB is the first on-line resource devoted to the identification and presentation of eukaryotic proteins localized to organelles and subcellular structures. As such, Organelle DB is a strong resource of data from the human proteome as well as from the major model organisms Saccharomyces cerevisiae, Arabidopsis thaliana, Drosophila melanogaster, Caenorhabditis elegans and Mus musculus. In particular, Organelle DB is a central repository of yeast data, incorporating results--and actual fluorescent images--from ongoing large-scale studies of protein localization in S.cerevisiae. Each protein in Organelle DB is presented with its sequence and, as available, a detailed description of its function; functions were extracted from relevant model organism databases, and links to these databases are provided within Organelle DB. To facilitate data interoperability, we have annotated all protein localizations using vocabulary from the Gene Ontology consortium. We also welcome new data for inclusion in Organelle DB, which may be freely accessed at http://organelledb.lsi.umich.edu. Citation for the above abstract: Wiwatwattana, Nuwee, Kumar, Anuj Organelle DB: a cross-species database of protein localization and function Nucl. Acids Res. 2005 33: D598-604 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D598 |
| 415. NCBI Organelle Genome Resources |
URL: http://www.ncbi.nlm.nih.gov/genomes/ORGANELLES/organelles.html Categories: Organelle Databases "This collection of eukaryotic organelle complete genome sequences is a part of the NCBI Reference Sequence (RefSeq) project that provides curated sequence data and related information for the community to use as a standard. At present, only the animal (metazoan) mitochondrial sequences are considered "reviewed", that is, they have been manually curated by the NCBI staff. Other mitochondrial and chloroplast sequences are "provisional" and are presented as found in the source GenBank records. In addition to providing a list of complete mitochondrial and plastid genomes, this site also presents tools that can be used to analyze these sequences. The organisms from which the organelles derive are presented in a taxonomic hierarchy built from the NCBI Taxonomy database. The following resources are available for the "reviewed" reference sequences. We will integrate other reference genomes into these resources once they have been reviewed." |
| 416. Arabidopsis thaliana Chloroplast Protein Database |
URL: http://www.pb.ipw.biol.ethz.ch/index.php?toc=91 Categories: Arabidopsis thaliana Databases, Organelle Databases BACKGROUND: Chloroplasts are plant cell organelles of cyanobacterial origin. They perform essential metabolic and biosynthetic functions of global significance, including photosynthesis and amino acid biosynthesis. Most of the proteins that constitute the functional chloroplast are encoded in the nuclear genome and imported into the chloroplast after translation in the cytosol. Since protein targeting is difficult to predict, many nuclear-encoded plastid proteins are still to be discovered. RESULTS: By tandem mass spectrometry, we identified 690 different proteins from purified Arabidopsis chloroplasts. Most proteins could be assigned to known protein complexes and metabolic pathways, but more than 30% of the proteins have unknown functions, and many are not predicted to localize to the chloroplast. Novel structure and function prediction methods provided more informative annotations for proteins of unknown functions. While near-complete protein coverage was accomplished for key chloroplast pathways such as carbon fixation and photosynthesis, fewer proteins were identified from pathways that are downregulated in the light. Parallel RNA profiling revealed a pathway-dependent correlation between transcript and relative protein abundance, suggesting gene regulation at different levels. CONCLUSIONS: The chloroplast proteome contains many proteins that are of unknown function and not predicted to localize to the chloroplast. Expression of nuclear-encoded chloroplast genes is regulated at multiple levels in a pathway-dependent context. The combined shotgun proteomics and RNA profiling approach is of high potential value to predict metabolic pathway prevalence and to define regulatory levels of gene expression on a pathway scale. 
Citation for the above abstract: Kleffmann T, Russenberger D, von Zychlinski A, Christopher W, Sjolander K, Gruissem W, Baginsky S. The Arabidopsis thaliana chloroplast proteome reveals pathway abundance and novel protein functions. Curr Biol. 2004 Mar 9;14(5):354-62. © 2004 Elsevier B.V. The full abstract can be found at: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=15028209 |
| 417. HCV Sequence Database |
URL: http://hcv.lanl.gov/content/hcv-db/index Categories: Viral Databases MOTIVATION: The hepatitis C virus (HCV) is a significant threat to public health worldwide. The virus is highly variable and evolves rapidly, making it an elusive target for the immune system and for vaccine and drug design. At present, some 30 000 HCV sequences have been published. A central website that provides annotated sequences and analysis tools will be helpful to HCV scientists worldwide. RESULTS: The HCV sequence database collects and annotates sequence data and provides them to the public via a website that contains a user-friendly search interface and a large number of sequence analysis tools, based on the model of the highly regarded Los Alamos HIV database. The HCV sequence database was officially launched in September 2003. Since then, its usage has steadily increased and is now at an average of approximately 280 visits per day from distinct IP addresses. AVAILABILITY: The HCV website can be accessed via http://hcv.lanl.gov and http://hcv-db.org CONTACT: hcv-info@lanl.gov. Citation for the above abstract: Carla Kuiken, Karina Yusim, Laura Boykin, and Russell Richardson The Los Alamos hepatitis C sequence database Bioinformatics Advance Access published on February 1, 2005, DOI 10.1093/bioinformatics/bth485. Bioinformatics 21: 379-384. © 2005 Oxford University Press. The full abstract can be found at: http://bioinformatics.oupjournals.org/cgi/content/abstract/21/3/379 |
| 418. HCVDB: Hepatitis C Virus Database |
URL: http://hepatitis.ibcp.fr/ Categories: Viral Databases "The aim of HCVDB is to establish correlations between virus sequences and pathology." |
| 419. HIVdb: Stanford HIV Drug Resistance Database |
URL: http://hivdb.stanford.edu/ Categories: HIV/AIDS Databases, Individual Protein Family Databases, Viral Databases The HIV reverse transcriptase and protease sequence database is an on-line relational database that catalogues evolutionary and drug-related sequence variation in the human immunodeficiency virus (HIV) reverse transcriptase (RT) and protease enzymes, the molecular targets of antiretroviral therapy (http://hivdb.stanford.edu). The database contains a compilation of nearly all published HIV RT and protease sequences, including submissions to GenBank, sequences published in journal articles and sequences of HIV isolates from persons participating in clinical trials. Sequences are linked to data about the source of the sequence, the antiretroviral drug treatment history of the person from whom the sequence was obtained and the results of in vitro drug susceptibility testing. Sequence data on two new molecular targets of HIV drug therapy--gp41 (cell fusion) and integrase--will be added to the database in 2003. Citation for the above abstract: Rhee, Soo-Yon, Gonzales, Matthew J., Kantor, Rami, Betts, Bradley J., Ravela, Jaideep, Shafer, Robert W. Human immunodeficiency virus reverse transcriptase and protease sequence database Nucl. Acids Res. 2003 31: 298-303 © 2003 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/298 |
| 420. NCBI Viral Genomes Resource |
URL: http://www.ncbi.nlm.nih.gov/genomes/VIRUSES/viruses.html Categories: Viral Databases The Viral Genomes Project aims to provide molecular standards for viral genomic research. The project has produced over 1,600 records for more than 1,200 different species. The National Center for Biotechnology Information (NCBI) provides access to this data through the Entrez search and retrieval engine and offers visualization of the sequence information at various levels of detail. Taxonomically organized displays, precomputed sequence comparison data, and direct access to analytical tools provide researchers with the ability to analyze and compare viral genomes and proteomes in a fast and convenient manner. The Viral Genomes Project is a collaborative effort between NCBI staff and many dedicated scientists worldwide. The URL for the database is http://www.ncbi.nlm.nih.gov/genomes/VIRUSES/viruses.html. Citation for the above excerpt: Bao, Yiming, Federhen, Scott, Leipe, Detlef, Pham, Vyvy, Resenchuk, Sergei, Rozanov, Mikhail, Tatusov, Roman, Tatusova, Tatiana National Center for Biotechnology Information Viral Genomes Project J. Virol. 2004 78: 7291-7298 © 2004 American Society for Microbiology. The full text of the article can be found at: http://jvi.asm.org/cgi/content/full/78/14/7291?view=full&pmid=15220402 |
| 421. Poxvirus Bioinformatics Resource Center |
URL: http://www.poxvirus.org/ Categories: Viral Databases The Poxvirus Bioinformatics Resource Center (PBRC) has been established to provide informational and analytical resources to the scientific community to aid research directed at providing a better understanding of the Poxviridae family of viruses. The PBRC was specifically established as the result of the concern that variola virus, the causative agent of smallpox, as well as related viruses, might be utilized as biological weapons. In addition, the PBRC supports research on poxviruses that might be considered new and emerging infectious agents such as monkeypox virus. The PBRC consists of a relational database and web application that supports the data storage, annotation, analysis and information exchange goals of the project. The current release consists of over 35 complete genomic sequences of various genera, species and strains of viruses from the Poxviridae family. Sequence and annotation information for these viruses has been obtained from sequences publicly available from GenBank as well as sequences not yet deposited in GenBank that have been obtained from ongoing sequencing projects. In addition to sequence data, the PBRC provides comprehensive annotation and curation of virus genes; analytical tools to aid in the understanding of the available sequence data, including tools for the comparative analysis of different virus isolates; and visualization tools to help better display the results of various analyses. The PBRC represents the initial development of what will become a more comprehensive Viral Bioinformatics Resource Center for Biodefense that will be one of the National Institute of Allergy and Infectious Diseases' 'Bioinformatics Resource Centers for Biodefense and Emerging or Re-Emerging Infectious Diseases'. The PBRC website is available at http://www.poxvirus.org. 
Citation for the above abstract: Lefkowitz, Elliot J., Upton, Chris, Changayil, Shankar S., Buck, Charles, Traktman, Paula, Buller, R. Mark L. Poxvirus Bioinformatics Resource Center: a comprehensive Poxviridae informational and analytical resource Nucl. Acids Res. 2005 33: D311-316 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D311 |
| 422. T4-like Genome Database |
URL: http://phage.bioc.tulane.edu/ Categories: Viral Databases "Interactive browsing of completed phage genomes is available using Lincoln Stein's Generic Genome Browser program. The browser allows users to scan the genome for particular features and to download sequence information plus analyses of those features. Views of the genome are generated showing named genes, BLAST similarities to other phages, predicted tRNAs, terminators and introns. Links are provided from the browser to gene-specific pages, with DNA sequence, protein sequence, protein statistics, sequence alignment to T4 orthologs, hydropathy plots and Pfam protein domain matches." |
| 423. VIDA |
URL: http://www.biochem.ucl.ac.uk/bsm/virus_database/VIDA.html Categories: Viral Databases VIDA is a new virus database that organizes open reading frames (ORFs) from partial and complete genomic sequences from animal viruses. Currently VIDA includes all sequences from GenBank for Herpesviridae, Coronaviridae and Arteriviridae. The ORFs are organized into homologous protein families, which are identified on the basis of sequence similarity relationships. Conserved sequence regions of potential functional importance are identified and can be retrieved as sequence alignments. We use a controlled taxonomical and functional classification for all the proteins and protein families in the database. When available, protein structures that are related to the families have also been included. The database is available for online search and sequence information retrieval at http://www.biochem.ucl.ac.uk/bsm/virus_database/VIDA.html. Citation for the above abstract: Alba, M. Mar, Lee, David, Pearl, Frances M. G., Shepherd, Adrian J., Martin, Nigel, Orengo, Christine A., Kellam, Paul VIDA: a virus database system for the organization of animal virus genome open reading frames Nucl. Acids Res. 2001 29: 133-136 © 2001 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/29/1/133 |
| 424. VIPERdb: Virus Particle Explorer |
URL: http://viperdb.scripps.edu/ Categories: Viral Databases VIPERdb (http://viperdb.scripps.edu) is a database for icosahedral virus capsid structures. Our aim is to provide a comprehensive resource specific to the needs of the structural virology community, with an emphasis on the description and comparison of derived data from structural and energetic analyses of capsids. A relational database implementation based on a schema for macromolecular structure makes the data highly accessible to the user, allowing detailed queries at the atomic level. Together with curation practices that maintain data uniformity, this will facilitate structural bioinformatics studies of virus capsids. User friendly search, visualization and educational tools on the website allow both structural and derived data to be examined easily and extensively. Links to relevant literature, sequence and taxonomy databases are provided for each entry. Citation for the above abstract: Shepherd, Craig M., Borelli, Ian A., Lander, Gabriel, Natarajan, Padmaja, Siddavanahalli, Vinay, Bajaj, Chandrajit, Johnson, John E., Brooks, Charles L., III, Reddy, Vijay S. VIPERdb: a relational database for structural virology Nucl. Acids Res. 2006 34: D386-389 © 2006 Oxford University Press. The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D386 |
| 425. Megx.net: database resources for Marine Ecological Genomix |
URL: http://www.megx.net/ Categories: General Genomics Databases Marine microbial genomics and metagenomics is an emerging field in environmental research. Since the completion of the first marine bacterial genome in 2003, the number of fully sequenced marine bacteria has grown rapidly. Concurrently, marine metagenomics studies are performed on a regular basis, and the resulting number of sequences is growing exponentially. To address environmentally relevant questions like organismal adaptations to oceanic provinces and regional differences in the microbial cycling of nutrients, it is necessary to couple sequence data with geographical information and supplement them with contextual information like physical, chemical and biological data. Therefore, new specialized databases are needed to organize and standardize data storage as well as centralize data access and interpretation. We introduce Megx.net, a set of databases and tools that handle genomic and metagenomic sequences in their environmental contexts. Megx.net includes (i) a geographic information system to systematically store and analyse marine genomic and metagenomic data in conjunction with contextual information; (ii) an environmental genome browser with fast search functionalities; (iii) a database with precomputed analyses for selected complete genomes; and (iv) a database and tool to classify metagenomic fragments based on oligonucleotide signatures. These integrative databases and webserver will help researchers to generate a better understanding of the functioning of marine ecosystems. All resources are freely accessible at http://www.megx.net. Citation for the above abstract: Lombardot, Thierry, Kottmann, Renzo, Pfeffer, Hauke, Richter, Michael, Teeling, Hanno, Quast, Christian, Glockner, Frank Oliver Megx.net--database resources for marine ecological genomics Nucl. Acids Res. 2006 34: D390-393 © 2006 Oxford University Press. 
The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D390 |
| 426. VirGen |
URL: http://202.41.70.51/virgen/virgen.html Categories: Viral Databases VirGen is a comprehensive viral genome resource that organizes the 'sequence space' of viral genomes in a structured fashion. It has been developed with the objective of serving as an annotated and curated database comprising complete genome sequences of viruses, value-added derived data and data mining tools. The current release (v1.1) contains 559 complete genomes in addition to 287 putative genomes of viruses belonging to eight viral families for which the host range includes animals and plants. Viral genomes in VirGen are annotated using sequence-based Bioinformatics approaches. The genomic data is also curated to identify 'alternate names' of viral proteins, where available. VirGen archives the results of comparisons of genomes, proteomes and individual proteins within and between viral species. It is the first resource to provide phylogenetic trees of viral species computed using whole-genome sequence data. The module of predicted B-cell antigenic determinants in VirGen is an attempt to link the genome to its vaccinome. Comparative genome analysis data facilitate the study of genome organization and evolution of viruses, which would have implications in applied research to identify candidates for the design of vaccines and antiviral drugs. VirGen is a relational database and is available at http://bioinfo.ernet.in/virgen/virgen.html. Citation for the above abstract: Kulkarni-Kale, Urmila, Bhosle, Shriram, Manjari, G. Sunitha, Kolaskar, A. S. VirGen: a comprehensive viral genome resource Nucl. Acids Res. 2004 32: D289-292 © 2004 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D289 |
| 427. PDSP: NIMH Psychoactive Drug Screening Program |
URL: http://pdsp.cwru.edu/ Categories: Drug and Drug Design Databases Because psychoactive plants exert profound effects on human perception, emotion, and cognition, discovering the molecular mechanisms responsible for psychoactive plant actions will likely yield insights into the molecular underpinnings of human consciousness. Additionally, it is likely that elucidation of the molecular targets responsible for psychoactive drug actions will yield validated targets for CNS drug discovery. This review article focuses on an unbiased, discovery-based approach aimed at uncovering the molecular targets responsible for psychoactive drug actions wherein the main active ingredients of psychoactive plants are screened at the "receptorome" (that portion of the proteome encoding receptors). An overview of the receptorome is given and various in silico, public-domain resources are described. Newly developed tools for the in silico mining of data derived from the National Institute of Mental Health Psychoactive Drug Screening Program's (NIMH-PDSP) K(i) Database (K(i) DB) are described in detail. Additionally, three case studies aimed at discovering the molecular targets responsible for Hypericum perforatum, Salvia divinorum, and Ephedra sinica actions are presented. Finally, recommendations are made for future studies. Citation for the above abstract: Roth BL, Lopez E, Beischel S, Westkaemper RB, Evans JM. Screening the receptorome to discover the molecular targets for plant-derived psychoactive compounds: a novel approach for CNS drug discovery. Pharmacol Ther. 2004 May;102(2):99-110. © 2004 Elsevier B.V. The full abstract can be found at: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=15163592 |
| 428. Allen Brain Atlas |
URL: http://www.brain-map.org Categories: Neuroscience Databases Automated in situ hybridization data will be generated for the entire mouse transcriptome—the full complement of activated genes in a particular tissue at a particular time—on a genome-wide scale. The mouse is a young adult at 56 days old, free from the confounding factors of development. After it is sacrificed, the mouse brain is immediately frozen, then sliced very thinly—to get forty sections from each millimeter of thickness—so that the probes for hybridization can expose gene expression in individual cells. ... The in situ data will be matched in a three-dimensional framework to the reference atlas developed by Allen's team. The resulting images will be turned into a virtual microscope, allowing users to focus down on genes expressed in regions of interest. While the Allen Brain Atlas is somewhat like other genomics projects in scale, it is unique. “No one's gone into a 3-D structure like a tissue and examined it in a systematic way,” says Allan Jones, senior director of Allen Brain Atlas Operations. In that way, he adds, it's a much richer dataset than the Human Genome Project. “As we're ramping up—fully by spring of next year—we'll be generating about 1,000 microscope slides a day with four mouse brain sections on each slide,” says Jones. Each day, those sections are scanned and stitched together electronically into 300-megabyte batches. “Scaling up a lab process is currently the biggest challenge,” says Jones. “In effect, we're turning an art form into something that gives high-quality data day in and day out,” says Jones. “If you are off slightly when cutting a 2-D brain slice, it becomes very difficult to map back into a 3-D context.” Citation for the above excerpt: Gewin V (2005) A Golden Age of Brain Exploration. PLoS Biol 3(1): e24. 
© 2005 Virginia Gewin The full text of the article can be found at: http://www.plosbiology.org/plosonline/?request=get-document&doi=10.1371/journal.pbio.0030024 |
| 429. National Cancer Institute 3D Structure Database |
URL: http://dtp.nci.nih.gov/docs/3d_database/dis3d.html Categories: 3D Molecular Structures, Drug and Drug Design Databases A searchable database of three-dimensional structures has been developed from the chemistry database of the NCI Drug Information System (DIS), a file of about 450,000 primarily organic compounds which have been tested by NCI for anticancer activity. The DIS database is very similar in size and content to the proprietary databases used in the pharmaceutical industry; its development began in the 1950s; and this history led to a number of problems in the generation of 3D structures. Citation for the above abstract: Milne GW, Nicklaus MC, Driscoll JS, Wang S, Zaharevitz D. National Cancer Institute Drug Information System 3D database. J Chem Inf Comput Sci. 1994 Sep-Oct;34(5):1219-24. © 1994 American Chemical Society. The full abstract can be found at: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=7962217 |
| 430. Drug ADME Associated Protein Database |
URL: http://xin.cz3.nus.edu.sg/group/admeap/admeap.asp Categories: Drug and Drug Design Databases, Individual Protein Family Databases Drug absorption, distribution, metabolism and excretion (ADME) often involve interaction of a drug with specific proteins. Knowledge about these ADME-associated proteins is important in facilitating the study of the molecular mechanism of disposition and individual response as well as therapeutic action of drugs. It is also useful in the development and testing of pharmacokinetics prediction tools. Several databases describing specific classes of ADME-associated proteins have appeared. A new database, ADME-associated proteins (ADME-AP), is introduced to provide comprehensive information about all classes of ADME-associated proteins described in the literature including physiological function of each protein, pharmacokinetic effect, ADME classification, direction and driving force of disposition, location and tissue distribution, substrates, synonyms, gene name and protein availability in other species. Cross-links to other databases are also provided to facilitate the access of information about the sequence, 3D structure, function, polymorphisms, genetic disorders, nomenclature, ligand binding properties and related literatures of each protein. ADME-AP currently contains entries for 321 proteins and 964 substrates. Citation for the above abstract: L. Z. Sun, Z. L. Ji, X. Chen, J. F. Wang, and Y. Z. Chen ADME-AP: a database of ADME associated proteins Bioinformatics 18: 1699-1700. © 2002 Oxford University Press. The full text of the article can be found at: http://bioinformatics.oupjournals.org/cgi/reprint/18/12/1699 |
| 431. 3DPSD: 3D Pharmaceutical Structure Database |
URL: http://www.ps.toyaku.ac.jp/dobashi/3dpsd/index.htm Categories: 3D Molecular Structures, Drug and Drug Design Databases "Here you will find a comprehensive listing of the 3D pharmaceutical structures compiled according to the Drug in Japan ethical drugs 2001, 24th edition (Edited by Japan pharmaceutical information center, Tokyo, Japan). The Japanese ethical drugs contained in the Japanese Pharmacopeia, 13th edition are noted with an * (cf. the Physicians' Desk Reference (PDR) and the U.S. Pharmacopeia, National Formulary). The structures have been collected annually according to the pharmaceutical inserts presented by Iyakuhin Jyouhou Teikyou homepage (http://www.pharmasys.gr.jp). These pages have also been posted in Japanese. English pages are still under construction and many of our pages are still in Japanese." |
| 432. Chemicals with Pharmaceutical Activity: A 3D Structural Database |
URL: http://www.chem.ox.ac.uk/mom/chemical-database/ Categories: 3D Molecular Structures The Department of Chemistry at Oxford University hosts this database of 400 3D molecular structures. Also available is a feature called Molecules of the Month, which includes both 3D molecular structures and drug descriptions. |
| 433. CSD: Cambridge Structural Database |
URL: http://www.ccdc.cam.ac.uk/products/csd/ Categories: Small Molecule Structure Databases The Cambridge Structural Database (CSD) now contains data for more than a quarter of a million small-molecule crystal structures. The information content of the CSD, together with methods for data acquisition, processing and validation, are summarized, with particular emphasis on the chemical information added by CSD editors. Nearly 80% of new structural data arrives electronically, mostly in CIF format, and the CCDC acts as the official crystal structure data depository for 51 major journals. The CCDC now maintains both a CIF archive (more than 73,000 CIFs dating from 1996), as well as the distributed binary CSD archive; the availability of data in both archives is discussed. A statistical survey of the CSD is also presented and projections concerning future accession rates indicate that the CSD will contain at least 500,000 crystal structures by the year 2010. Citation for the above abstract: Allen FH. The Cambridge Structural Database: a quarter of a million crystal structures and rising. Acta Crystallogr B. 2002 Jun;58(Pt 3 Pt 1):380-8. Epub 2002 May 29. © 2002 International Union of Crystallography The full abstract can be found at: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=12037359 |
| 434. Protein Coil Library |
URL: http://www.roselab.jhu.edu/coil/ Categories: Protein Structure Databases Approximately half the structure of folded proteins is either alpha-helix or beta-strand. We have developed a convenient repository of all remaining structure after these two regular secondary structure elements are removed. The Protein Coil Library (http://roselab.jhu.edu/coil/) allows rapid and comprehensive access to non-alpha-helix and non-beta-strand fragments contained in the Protein Data Bank (PDB). The library contains both sequence and structure information together with calculated torsion angles for both the backbone and side chains. Several search options are implemented, including a query function that uses output from popular PDB-culling servers directly. Additionally, several popular searches are stored and updated for immediate access. The library is a useful tool for exploring conformational propensities, turn motifs, and a recent model of the unfolded state. Citation for the above abstract: Fitzkee NC, Fleming PJ, Rose GD. The Protein Coil Library: A structural database of nonhelix, nonstrand fragments derived from the PDB. Proteins. 2005 Jan 18; [Epub ahead of print] © 2005 Wiley-Liss, Inc. The full abstract can be found at: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=15657933 |
| 435. Pharmabase: A Database of Cellular Physiology & Pharmacology |
URL: http://zeus.mbl.edu/public/BRC/subj.php?func=explode&myID=181 Categories: Drug and Drug Design Databases "This NIH funded database has been developed as a research tool, a resource for students, and an ongoing interactive forum on the use of pharmacological compounds in cellular research. Registered Pharmabase members will have access to detailed compound records with interactive features that include a secure personal notepad, a form to send comments to the editor, and an interactive forum screen shared by all Pharmabase members. Membership is free. Non-members will have access to detailed compound records without the interactive features. This is an evolving project designed for investigators to informally share their insight and experiences, which will ultimately enhance the database." |
| 436. IconBAZAAR - Molecules |
URL: http://www.iconbazaar.com/molecules/ Categories: 3D Molecular Structures Animated 3D molecular models are available for amino acids, carbohydrates, and drugs. |
| 437. Chemistry Molecular Models |
URL: http://www.uwsp.edu/chemistry/pdbs/ Categories: 3D Molecular Structures "This page will link you to an awesome collection of chemical structure files in PDB (Protein Data Bank) format that will be displayed in 3D by a plug-in developed by MDL Information Systems Inc. called Chemscape Chime." |
| 438. HIC-Up: Hetero-Compound Information Centre |
URL: http://alpha2.bmc.uu.se/hicup/index.html Categories: 3D Molecular Structures, Small Molecule Structure Databases "... a freely accessible resource for structural biologists who are dealing with hetero-compounds ('small molecules'). This service is provided and maintained by Gerard Kleywegt at the Department of Cell and Molecular Biology, Uppsala University. This site contains information about hetero-compounds encountered in files from the Protein Data Bank (PDB). It is updated a few times a year." |
| 439. IMB Jena Image Library of Biological Macromolecules |
URL: http://www.imb-jena.de/IMAGE.html Categories: 3D Molecular Structures, Protein Structure Databases The IMB Jena Image Library of Biological Macromolecules (http://www.imb-jena.de/IMAGE.html) is aimed at a better dissemination of information on three-dimensional biopolymer structures with an emphasis on visualization and analysis. It provides access to all structure entries deposited at the Protein Data Bank (PDB) and Nucleic Acid Database (NDB). In addition, basic information on the architecture of biological macromolecules is offered. Recent developments include a site database and an analysis tool that identifies all residues surrounding hetero components or sites according to geometrical criteria. This enables one to search for all structures with a certain pattern of amino acids/nucleotides/water adjacent to hetero components or sites. A new PDB/SWISS-PROT cross-reference database combines information from both PDB and SWISS-PROT, thus providing significantly more cross-references than either PDB or SWISS-PROT. The existing brief descriptions of X-ray, NMR and FTIR methods for structure determination are supplemented by information on circular dichroism. Citation for the above abstract: Reichert, Jan, Suhnel, Jurgen The IMB Jena Image Library of Biological Macromolecules: 2002 update Nucl. Acids Res. 2002 30: 253-254 © 2002 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/30/1/253 |
| 440. Molecular Expressions |
URL: http://micro.magnet.fsu.edu/index.html Categories: Microscopy Our knowledge of the structure, dynamics and physiology of a cell has increased significantly in the last ten years through the emergence of new optical imaging modalities such as optical sectioning microscopy, computer-enhanced video microscopy and laser-scanning microscopy. These techniques together with the use of genetically engineered fluorophores have helped scientists visualize the 3-dimensional dynamic processes of living cells. However as powerful as these imaging tools are, they can often be difficult to understand and fully utilize. Below I will discuss my favorite website: The Molecular Expressions Web Site that endeavors to present the power of microscopy to its visitors. The Molecular Expressions group does a remarkable job of not only clearly presenting the principles behind these techniques in a manner approachable by lay and scientific audiences alike but also provides representative data from each as well. Citation for the above abstract: Eliceiri KW. Molecular expressions: exploring the world of optics and microscopy. http://microscopy.fsu.edu. Biol Cell. 2004 Aug;96(6):403-5. © 2004 Elsevier B.V. The full abstract can be found at: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=15325069 |
| 441. Molecular Models for Biochemistry at CMU |
URL: http://www.bio.cmu.edu/Courses/BiochemMols/BCMolecules.html Categories: 3D Molecular Structures "The linked pages described here have tutorials and quizzes that are based on Chime and RasMol images of the molecules and macromolecules found in biochemistry. They are intended to complement standard biochemistry texts where more explanation is provided, but where interactive 3-D images of the molecules are not available." |
| 442. OMM: The Online Macromolecular Museum |
URL: http://www.clunet.edu/BioDev/omm/gallery.htm Categories: 3D Molecular Structures "The Online Macromolecular Museum (OMM) is a site for the display and study of macromolecules. Macromolecular structures, as discovered by crystallographic or NMR methods, are scientific objects in much the same sense as fossil bones or dried specimens: they can be archived, studied, and displayed in aesthetically pleasing, educational exhibits. Hence, a museum seems an appropriate designation for the collection of displays that we are assembling. The OMM's exhibits are interactive tutorials on individual molecules in which hypertextual explanations of important biochemical features are linked to illustrative renderings of the molecule at hand." |
| 443. Reciprocal Net |
URL: http://www.reciprocalnet.org/index.html Categories: 3D Molecular Structures "The Reciprocal Net project will construct and deploy a distributed, open, extensible digital collection of molecular structures. Associated with the collection will be software tools for visualizing, interacting with, and rendering printable images of the contents; software for the automated conversion of local database representations into standard formats which can be globally shared; tools and components for constructing educational modules based on the collection; and examples of such modules as the beginning of a public repository for educational materials based on the collection. The contents of this collection will come principally from structures contributed by participating crystallography laboratories, thus providing a means for teachers, students, and the general public to connect better with current chemistry research. The Reciprocal Net's emphasis is on obtaining structures of general interest and usefulness to those several classes of digital library users. The collection will be fully integrated into the emerging NSDL framework, constituting a resource of outstanding value for education at every level." |
| 444. Smells Database |
URL: http://mc2.cchem.berkeley.edu/Smells/index.html Categories: 3D Molecular Structures A database of odoriferous chemicals hosted by the University of California, Berkeley. |
| 445. ChemFinder.Com |
URL: http://chemfinder.cambridgesoft.com/ Categories: 3D Molecular Structures "ChemFinder has been providing free chemical searching to hundreds of thousands of scientists since 1995. This free database includes:
|
| 446. Human Mitochondrial Protein Database |
URL: http://bioinfo.nist.gov:8080/examples/servlets/index.html Categories: Mitochondrial Genes and Proteins Databases "The Human Mitochondrial Protein Database (HMPDb) provides comprehensive data on mitochondrial and human nuclear encoded proteins involved in mitochondrial biogenesis and function. This database consolidates information from SwissProt, LocusLink, Protein Data Bank (PDB), GenBank, Genome Database (GDB), Online Mendelian Inheritance in Man (OMIM), Human Mitochondrial Genome Database (mtDB), MITOMAP, Neuromuscular Disease Center and Human 2-D PAGE Databases. The mitochondrion plays a central role in cellular metabolism, and evidence of mitochondrial involvement in a number of different human diseases is increasing. This database is intended as a tool not only to aid in studying the mitochondrion but in studying the associated diseases." |
| 447. PDB: Protein Data Bank |
URL: http://www.rcsb.org/pdb/ Categories: Protein Structure Databases The RCSB Protein Data Bank (PDB) offers online tools, summary reports and target information related to the worldwide structural genomics initiatives from its portal at http://sg.pdb.org. There are currently three components to this site: Structural Genomics Initiatives contains information and links on each structural genomics site, including progress reports, target lists, target status, targets in the PDB and level of sequence redundancy; Targets provides combined target information, protocols and other data associated with protein structure determination; and Structures offers an assessment of the progress of structural genomics based on the functional coverage of the human genome by PDB structures, structural genomics targets and homology models. Functional coverage can be examined according to enzyme classification, gene ontology (biological process, cell component and molecular function) and disease. Citation for the above abstract: Kouranov, Andrei, Xie, Lei, de la Cruz, Joanna, Chen, Li, Westbrook, John, Bourne, Philip E., Berman, Helen M. The RCSB PDB information portal for structural genomics Nucl. Acids Res. 2006 34: D302-305 © 2006 Oxford University Press. The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D302 |
| 448. PDB-REPRDB: Representative protein chains from PDB |
URL: http://mbs.cbrc.jp/pdbreprdb-cgi/reprdb_menu.pl Categories: Protein Structure Databases PDB-REPRDB is a database of representative protein chains from the Protein Data Bank (PDB). Started at the Real World Computing Partnership (RWCP) in August 1997, it developed to the present system of PDB-REPRDB. In April 2001, the system was moved to the Computational Biology Research Center (CBRC), National Institute of Advanced Industrial Science and Technology (AIST) (http://www.cbrc.jp/); it is available at http://www.cbrc.jp/pdbreprdb/. The current database includes 33 368 protein chains from 16 682 PDB entries (1 September, 2002), from which are excluded (a) DNA and RNA data, (b) theoretically modeled data, (c) short chains (l < 40 residues), or (d) data with non-standard amino acid residues at all residues. The number of entries including membrane protein structures in the PDB has increased rapidly with determination of numbers of membrane protein structures because of improved X-ray crystallography, NMR, and electron microscopic experimental techniques. Since many protein structure studies must address globular and membrane proteins separately, this new elimination factor, which excludes membrane protein chains, is introduced in the PDB-REPRDB system. Moreover, the PDB-REPRDB system for membrane protein chains begins at the same URL. The current membrane database includes 551 protein chains, including membrane domains in the SCOP database of release 1.59 (15 May, 2002). Citation for the above abstract: Noguchi, Tamotsu, Akiyama, Yutaka PDB-REPRDB: a database of representative protein chains from the Protein Data Bank (PDB) in 2003 Nucl. Acids Res. 2003 31: 492-493 © 2003 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/492 |
| 449. Neuromuscular Disease Center Home Page |
URL: http://www.neuro.wustl.edu/neuromuscular/index.html Categories: Neuroscience Databases The Washington University School of Medicine at St. Louis, Missouri offers this substantial collection of information about neuromuscular diseases. Information about neuromuscular evaluations, antibody testing, and molecular and cellular biology is also provided. |
| 450. TCDB: Transporter Classification Database |
URL: http://www.tcdb.org/ Categories: General Protein Sequence Databases, Protein Domain and Protein Classification Databases The Transporter Classification Database (TCDB) is a web accessible, curated, relational database containing sequence, classification, structural, functional and evolutionary information about transport systems from a variety of living organisms. TCDB is a curated repository for factual information compiled from >10,000 references, encompassing approximately 3000 representative transporters and putative transporters, classified into >400 families. The transporter classification (TC) system is an International Union of Biochemistry and Molecular Biology approved system of nomenclature for transport protein classification. TCDB is freely accessible at http://www.tcdb.org. The web interface provides several different methods for accessing the data, including step-by-step access to hierarchical classification, direct search by sequence or TC number and full-text searching. The functional ontology that underlies the database structure facilitates powerful query searches that yield valuable data in a quick and easy way. The TCDB website also offers several tools specifically designed for analyzing the unique characteristics of transport proteins. TCDB not only provides curated information and a tool for classifying newly identified membrane proteins, but also serves as a genome transporter-annotation tool. Citation for the above abstract: Saier, Milton H., Jr, Tran, Can V., Barabote, Ravi D. TCDB: the Transporter Classification Database for membrane transport protein analyses and information Nucl. Acids Res. 2006 34: D181-186 © 2006 Oxford University Press. The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D181 |
| 451. Visual Elements Periodic Table |
URL: http://www.chemsoc.org/viselements/pages/pertable_j.htm Categories: General Chemistry Databases "Visual Elements is an arts and science collaborative project supported by the Royal Society of Chemistry which aims to explore and reflect upon the diversity of elements that comprise matter in as unique and innovative manner as possible. Visual Elements aims to produce a new and vibrant visual assessment of the startling diversity of material that constitutes the world in which we live, not simply by rendering images of the respective elements but also by investigating the manner in which they affect our daily lives in largely unseen and often unexpected ways." |
| 452. PubMed |
URL: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi Categories: MEDLINE Interfaces "PubMed, available via the NCBI Entrez retrieval system, was developed by the National Center for Biotechnology Information (NCBI) at the National Library of Medicine (NLM), located at the National Institutes of Health (NIH). Entrez is the text-based search and retrieval system used at NCBI for services including PubMed, Nucleotide and Protein Sequences, Protein Structures, Complete Genomes, Taxonomy, OMIM, and many others. PubMed was designed to provide access to citations from biomedical literature. LinkOut provides access to full-text articles at journal Web sites and other related Web resources. PubMed also provides access and links to the other Entrez molecular biology resources. Publishers participating in PubMed electronically submit their citations to NCBI prior to or at the time of publication. If the publisher has a web site that offers full-text of its journals, PubMed provides links to that site as well as biological resources, consumer health information, research tools, and more. There may be a charge to access the text or information.
In addition, PubMed provides a Batch Citation Matcher, which allows users to match their citations to PubMed citations using bibliographic information such as journal, volume, issue, page number, and year." |
| 453. PubCrawler |
URL: http://pubcrawler.gen.tcd.ie/ Categories: MEDLINE Interfaces The free PubCrawler web service (http://www.pubcrawler.ie) has been operating for five years and so far has brought literature and sequence updates to over 22 000 users. It provides information on a personalized web page whenever new articles appear in PubMed or when new sequences are found in GenBank that are specific to customized queries. The server also acts as an automatic alerting system by sending out short notifications or emails with the latest updates as soon as they become available. A new output format and more flexibility for the email formatting help PubCrawler cope with increasing challenges arising from browser incompatibilities and mail filters, therefore making it suitable for a wide range of users. Citation for the above abstract: Hokamp, Karsten, Wolfe, Kenneth H. PubCrawler: keeping up comfortably with PubMed and GenBank Nucl. Acids Res. 2004 32: W16-19 © 2004 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_2/W16 |
| 454. BioMail |
URL: http://www.biomail.org/ Categories: MEDLINE Interfaces "BioMail regularly (weekly by default) searches for articles, which have recently appeared in the PubMed® MEDLINE® database, using customized search terms. Then it emails lists of the found articles to the user." |
| 455. AMEDEO, The Medical Literature Guide |
URL: http://amedeo.com/ Categories: Scientific Literature Databases and Services "AMEDEO has been created to serve the needs of healthcare professionals, including physicians, nurses, pharmacists, administrators, other members of the health professions, and patients and their friends. They can easily access timely, relevant information within their respective fields. AMEDEO’s core components include weekly emails with bibliographic lists about new scientific publications, personal Web pages for one-time download of available abstracts (see example), and an overview of the medical literature published in relevant journals over the past 12 to 24 months. All these new information resources are free of charge." |
| 456. DailyUpdates: Breaking PubMed entries for the Drug Discovery Community |
URL: http://www.leaddiscovery.co.uk/PubMed-dailyupdates.html Categories: MEDLINE Interfaces "DailyUpdates was launched in 2002 by the pharmaceutical analysts LeadDiscovery and represents the first "newsfeed" that identifies breaking scientific publications with drug development potential. Each week PubMed lists over 100,000 new scientific articles. Using proprietary data retrieval strategies LeadDiscovery selects 2,500 of these PubMed articles to be scanned by an industrial research panel for publications judged to be of greatest benefit for the drug development sector. Selected publications are delivered through the DailyUpdates newsfeeds and DailyUpdates e-mail alerts. Our basic service is available free of charge; we are also happy for third party websites to link through to this page (on request our applet can also be incorporated onto third party websites). A Premium Package is also provided that focuses on specific therapeutic areas." |
| 457. PubMatrix |
URL: http://pubmatrix.grc.nia.nih.gov/ Categories: MEDLINE Interfaces BACKGROUND: Molecular experiments using multiplex strategies such as cDNA microarrays or proteomic approaches generate large datasets requiring biological interpretation. Text based data mining tools have recently been developed to query large biological datasets of this type of data. PubMatrix is a web-based tool that allows simple text based mining of the NCBI literature search service PubMed using any two lists of keywords terms, resulting in a frequency matrix of term co-occurrence. RESULTS: For example, a simple term selection procedure allows automatic pair-wise comparisons of approximately 1-100 search terms versus approximately 1-10 modifier terms, resulting in up to 1,000 pair wise comparisons. The matrix table of pair-wise comparisons can then be surveyed, queried individually, and archived. Lists of keywords can include any terms currently capable of being searched in PubMed. In the context of cDNA microarray studies, this may be used for the annotation of gene lists from clusters of genes that are expressed coordinately. An associated PubMatrix public archive provides previous searches using common useful lists of keyword terms. CONCLUSIONS: In this way, lists of terms, such as gene names, or functional assignments can be assigned genetic, biological, or clinical relevance in a rapid flexible systematic fashion. http://pubmatrix.grc.nia.nih.gov/ Citation for the above abstract: Becker KG, Hosack DA, Dennis G Jr, Lempicki RA, Bright TJ, Cheadle C, Engel J. PubMatrix: a tool for multiplex literature mining. BMC Bioinformatics. 2003 Dec 10;4(1):61. © 2003 Becker et al. The full text of the article can be found at: http://www.biomedcentral.com/1471-2105/4/61 |
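The pair-wise comparison described in the PubMatrix abstract can be illustrated with a small sketch. This is not PubMatrix's own code: the function names `pair_queries`, `cooccurrence_matrix` and `toy_counter` are hypothetical, the query syntax is only an assumed `"term" AND "modifier"` form, and the hit counter is a stand-in that scans a tiny in-memory list of titles rather than querying PubMed.

```python
from itertools import product


def pair_queries(search_terms, modifier_terms):
    """Return one query string per (search term, modifier term) pairing.

    The quoted "A" AND "B" form is an assumption made for illustration.
    """
    return [f'"{s}" AND "{m}"' for s, m in product(search_terms, modifier_terms)]


def cooccurrence_matrix(search_terms, modifier_terms, count_hits):
    """Build a frequency matrix: one row per search term, one column per modifier.

    count_hits(search, modifier) is supplied by the caller and returns the
    number of records in which both terms occur.
    """
    return [[count_hits(s, m) for m in modifier_terms] for s in search_terms]


def toy_counter(corpus):
    """Hypothetical stand-in for a literature search: counts texts containing both terms."""
    lowered = [text.lower() for text in corpus]

    def count(search, modifier):
        s, m = search.lower(), modifier.lower()
        return sum(1 for text in lowered if s in text and m in text)

    return count


# Toy data, invented for this example only.
corpus = [
    "TP53 mutation in breast cancer",
    "BRCA1 and breast cancer risk",
    "TP53 and apoptosis",
]
matrix = cooccurrence_matrix(["TP53", "BRCA1"], ["cancer", "apoptosis"], toy_counter(corpus))
for term, row in zip(["TP53", "BRCA1"], matrix):
    print(term, row)
```

With 100 search terms and 10 modifier terms, `pair_queries` yields the 1,000 pairings that the abstract gives as the upper bound.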
| 458. XplorMed: eXploring Medline abstracts |
URL: http://www.ogic.ca/projects/xplormed/ Categories: MEDLINE Interfaces As scientific literature databases like MEDLINE increase in size, so does the time required to search them. Scientists must frequently inspect long lists of references manually, often just reading the titles. XplorMed is a web tool that aids MEDLINE searching by summarizing the subjects contained in the results, thus allowing users to focus on subjects of interest. Here we describe new features added to XplorMed during the last 2 years (http://www.bork.embl-heidelberg.de/xplormed/). Citation for the above abstract: Perez-Iratxeta, Carolina, Perez, Antonio, Bork, Peer, Andrade, Miguel Update on XplorMed: a web server for exploring scientific literature Nucl. Acids Res. 2003 31: 3866-3868 © 2003 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/13/3866 |
| 459. HighWire Press |
URL: http://highwire.stanford.edu/ Categories: Scientific Literature Databases and Services "Content - Full-text of 780 leading journals -- including 42 of the 100 most-frequently cited journals in the world -- available from the journals' own sites for complete and accurate representation of the research at its source including articles published online ahead of print; plus all of Medline -- with links to full text -- for the broadest coverage of biomedical science research." |
| 460. CANCERMondial Statistical Information System |
URL: http://www-dep.iarc.fr/ Categories: Cancer Databases, Public Health Databases "This website provides access to information on the occurrence of cancer world-wide held by the Descriptive Epidemiology Group (DEP) of IARC. ... Most of the information provided at this web site is based on original data collected by population-based cancer registries. The International Agency for Research on Cancer gratefully acknowledges the contribution of registries to cancer epidemiology and cancer control, through their efforts to ensure high standards of completeness and accuracy, and their willingness to share data in the interests of collaborative cancer research." |
| 461. BioRAT: A Search Engine and Information Extraction Tool for Biological Research |
URL: http://bioinf.cs.ucl.ac.uk/biorat/ Categories: Scientific Literature Databases and Services MOTIVATION: Converting the vast quantity of free-format text found in journals into a concise, structured format makes the researcher's quest for information easier. Recently, several information extraction systems have been developed that attempt to simplify the retrieval and analysis of biological and medical data. Most of this work has used the abstract alone, owing to the convenience of access and the quality of data. Abstracts are generally available through central collections with easy direct access (e.g. PubMed). The full-text papers contain more information, but are distributed across many locations (e.g. publishers' web sites, journal web sites and local repositories), making access more difficult. In this paper, we present BioRAT, a new information extraction (IE) tool, specifically designed to perform biomedical IE, and which is able to locate and analyse both abstracts and full-length papers. BioRAT is a Biological Research Assistant for Text mining, and incorporates a document search ability with domain-specific IE. RESULTS: We show first, that BioRAT performs as well as existing systems, when applied to abstracts; and second, that significantly more information is available to BioRAT through the full-length papers than via the abstracts alone. Typically, less than half of the available information is extracted from the abstract, with the majority coming from the body of each paper. Overall, BioRAT recalled 20.31% of the target facts from the abstracts with 55.07% precision, and achieved 43.6% recall with 51.25% precision on full-length papers. Citation for the above abstract: David P. A. Corney, Bernard F. Buxton, William B. Langdon, and David T. Jones BioRAT: extracting biological information from full-length papers Bioinformatics Advance Access published on November 22, 2004, DOI 10.1093/bioinformatics/bth386. Bioinformatics 20: 3206-3213. © 2004 Oxford University Press. The full abstract can be found at: http://bioinformatics.oupjournals.org/cgi/content/abstract/20/17/3206 |
| 462. HubMed |
URL: http://www.hubmed.org/ Categories: MEDLINE Interfaces "An alternative interface to the PubMed medical literature database" |
| 463. Vivisimo Document Clustering |
URL: http://vivisimo.com/ Categories: MEDLINE Interfaces Vivisimo's document clustering feature can be applied to MEDLINE searches. |
| 464. Scirus |
URL: http://www.scirus.com/srsapp/ Categories: Scientific Literature Databases and Services "Scirus is the most comprehensive science-specific search engine on the Internet. Driven by the latest search engine technology, Scirus searches over 167 million science-specific Web pages ..." |
| 465. BacMap: Bacterial Genome Atlas |
URL: http://wishart.biology.ualberta.ca/BacMap/ Categories: General Genomics Databases BacMap is an interactive visual database containing fully labeled, zoomable and searchable chromosome maps from more than 170 bacterial (archaebacterial and eubacterial) species. It uses a recently developed visualization tool (CGView) to generate high-resolution circular genome maps from sequence feature information. Each map includes an interface that allows the image to be expanded and rotated. In the default view, identified genes are drawn to scale and colored according to coding directions. When a region of interest is expanded, gene labels are displayed. Each label is hyperlinked to a custom 'gene card' which provides several fields of information concerning the corresponding DNA and protein sequences. Each genome map is searchable via a local BLAST search and a gene name/synonym search. BacMap is freely available at http://wishart.biology.ualberta.ca/BacMap/. Citation for the above abstract: Stothard, Paul, Van Domselaar, Gary, Shrivastava, Savita, Guo, Anchi, O'Neill, Brian, Cruz, Joseph, Ellison, Michael, Wishart, David S. BacMap: an interactive picture atlas of annotated bacterial genomes Nucl. Acids Res. 2005 33: D317-320 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D317 |
| 466. COG: Clusters of Orthologous Groups of proteins |
URL: http://www.ncbi.nlm.nih.gov/COG/ Categories: General Genomics Databases, Protein Domain and Protein Classification Databases BACKGROUND: The availability of multiple, essentially complete genome sequences of prokaryotes and eukaryotes spurred both the demand and the opportunity for the construction of an evolutionary classification of genes from these genomes. Such a classification system based on orthologous relationships between genes appears to be a natural framework for comparative genomics and should facilitate both functional annotation of genomes and large-scale evolutionary studies. RESULTS: We describe here a major update of the previously developed system for delineation of Clusters of Orthologous Groups of proteins (COGs) from the sequenced genomes of prokaryotes and unicellular eukaryotes and the construction of clusters of predicted orthologs for 7 eukaryotic genomes, which we named KOGs after eukaryotic orthologous groups. The COG collection currently consists of 138,458 proteins, which form 4873 COGs and comprise 75% of the 185,505 (predicted) proteins encoded in 66 genomes of unicellular organisms. The eukaryotic orthologous groups (KOGs) include proteins from 7 eukaryotic genomes: three animals (the nematode Caenorhabditis elegans, the fruit fly Drosophila melanogaster and Homo sapiens), one plant, Arabidopsis thaliana, two fungi (Saccharomyces cerevisiae and Schizosaccharomyces pombe), and the intracellular microsporidian parasite Encephalitozoon cuniculi. The current KOG set consists of 4852 clusters of orthologs, which include 59,838 proteins, or approximately 54% of the analyzed eukaryotic 110,655 gene products. Compared to the coverage of the prokaryotic genomes with COGs, a considerably smaller fraction of eukaryotic genes could be included into the KOGs; addition of new eukaryotic genomes is expected to result in substantial increase in the coverage of eukaryotic genomes with KOGs. 
Examination of the phyletic patterns of KOGs reveals a conserved core represented in all analyzed species and consisting of approximately 20% of the KOG set. This conserved portion of the KOG set is much greater than the ubiquitous portion of the COG set (approximately 1% of the COGs). In part, this difference is probably due to the small number of included eukaryotic genomes, but it could also reflect the relative compactness of eukaryotes as a clade and the greater evolutionary stability of eukaryotic genomes. CONCLUSION: The updated collection of orthologous protein sets for prokaryotes and eukaryotes is expected to be a useful platform for functional annotation of newly sequenced genomes, including those of complex eukaryotes, and genome-wide evolutionary studies. Citation for the above abstract: Tatusov RL, Fedorova ND, Jackson JD, Jacobs AR, Kiryutin B, Koonin EV, Krylov DM, Mazumder R, Mekhedov SL, Nikolskaya AN, Rao BS, Smirnov S, Sverdlov AV, Vasudevan S, Wolf YI, Yin JJ, Natale DA. The COG database: an updated version includes eukaryotes. BMC Bioinformatics. 2003 Sep 11;4(1):41. © 2003 Tatusov et al. The full text of the article can be found at: http://www.biomedcentral.com/1471-2105/4/41 |
| 467. PCOGR |
URL: http://www.uni-wh.de/pcogr Categories: General Genomics Databases BACKGROUND: The rapidly increasing number of completely sequenced genomes led to the establishment of the COG-database which, based on sequence homologies, assigns similar proteins from different organisms to clusters of orthologous groups (COGs). There are several bioinformatic studies that made use of this database to determine (hyper)thermophile-specific proteins by searching for COGs containing (almost) exclusively proteins from (hyper)thermophilic genomes. However, public software to perform individually definable group-specific searches is not available. RESULTS: The tool described here exactly fills this gap. The software is accessible at http://www.uni-wh.de/pcogr and is linked to the COG-database. The user can freely define two groups of organisms by selecting for each of the (current) 66 organisms to belong either to groupA, to the reference groupB or to be ignored by the algorithm. Then, for all COGs a specificity index is calculated with respect to the specificity to groupA, i. e. high scoring COGs contain proteins from the most of groupA organisms while proteins from the most organisms assigned to groupB are absent. In addition to ranking all COGs according to the user defined specificity criteria, a graphical visualization shows the distribution of all COGs by displaying their abundance as a function of their specificity indexes. CONCLUSIONS: This software allows detecting COGs specific to a predefined group of organisms. All COGs are ranked in the order of their specificity and a graphical visualization allows recognizing (i) the presence and abundance of such COGs and (ii) the phylogenetic relationship between groupA- and groupB-organisms. The software also allows detecting putative protein-protein interactions, novel enzymes involved in only partially known biochemical pathways, and alternate enzymes originated by convergent evolution. 
Citation for the above abstract: Meereis F, Kaufmann M. PCOGR: phylogenetic COG ranking as an online tool to judge the specificity of COGs with respect to freely definable groups of organisms. BMC Bioinformatics. 2004 Oct 15;5(1):150. © 2004 Meereis and Kaufmann. The full text of the article can be found at: http://www.biomedcentral.com/1471-2105/5/150 |
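The ranking idea in the PCOGR abstract above can be illustrated with a small sketch. The abstract does not give the formula for the specificity index, so the one used here (fraction of groupA organisms carrying the COG minus the fraction of groupB organisms carrying it) is an assumption made for illustration, not PCOGR's actual method. The organism names, COG identifiers, and function names are likewise hypothetical.

```python
from typing import Dict, List, Set, Tuple


def specificity_index(present: Set[str], group_a: Set[str], group_b: Set[str]) -> float:
    """Assumed index: share of groupA organisms with the COG minus share of groupB.

    Ranges from -1.0 (only in groupB) to 1.0 (in all of groupA, none of groupB).
    Organisms in neither group are ignored, as in the abstract's third option.
    """
    if not group_a or not group_b:
        raise ValueError("both groups must contain at least one organism")
    frac_a = len(present & group_a) / len(group_a)
    frac_b = len(present & group_b) / len(group_b)
    return frac_a - frac_b


def rank_cogs(
    cogs: Dict[str, Set[str]], group_a: Set[str], group_b: Set[str]
) -> List[Tuple[str, float]]:
    """Return (COG id, index) pairs sorted from most to least groupA-specific."""
    scored = [(cog, specificity_index(orgs, group_a, group_b)) for cog, orgs in cogs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data (hypothetical): which organisms have a protein in each COG.
group_a = {"thermo1", "thermo2", "thermo3"}
group_b = {"meso1", "meso2"}
cogs = {
    "COG_X": {"thermo1", "thermo2", "thermo3"},
    "COG_Y": {"thermo1", "thermo2", "thermo3", "meso1", "meso2"},
    "COG_Z": {"meso1"},
}
ranking = rank_cogs(cogs, group_a, group_b)
```

With this toy input, the COG found in every groupA organism and no groupB organism ranks first, and the COG found only in groupB ranks last.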
| 468. COGENT: Complete Genome Tracking Database |
URL: http://maine.ebi.ac.uk:8000/services/cogent/ Categories: General Genomics Databases SUMMARY: We present a database of fully sequenced and published genomes to facilitate the re-distribution of data and ensure reproducibility of results in the field of computational genomics. For its design we have implemented an extremely simple yet powerful schema to allow linking of genome sequence data to other resources. AVAILABILITY: http://maine.ebi.ac.uk:8000/services/cogent/ Citation for the above abstract: Paul Janssen, Anton J. Enright, Benjamin Audit, Ildefonso Cases, Leon Goldovsky, Nicola Harte, Victor Kunin, and Christos A. Ouzounis. COmplete GENome Tracking (COGENT): a flexible data environment for computational genomics. Bioinformatics 19: 1451-1452. © 2003 Oxford University Press. The full text of the article can be found at: http://bioinformatics.oupjournals.org/cgi/reprint/19/11/1451 |
| 469. DEG: Database of Essential Genes |
URL: http://tubic.tju.edu.cn/deg/ Categories: General Genomics Databases Essential genes are genes that are indispensable to support cellular life. These genes constitute a minimal gene set required for a living cell. We have constructed a Database of Essential Genes (DEG), which contains all the essential genes that are currently available. The functions encoded by essential genes are considered a foundation of life and therefore are likely to be common to all cells. Users can BLAST the query sequences against DEG. If homologous genes are found, it is possible that the queried genes are also essential. Users can search for essential genes by their function or name. Users can also browse and extract all the records in DEG. Essential gene products comprise excellent targets for antibacterial drugs. Analysis of essential genes could help to answer the question of what are the basic functions necessary to support cellular life. DEG is freely accessible from the website http://tubic.tju.edu.cn/deg/. Citation for the above abstract: Zhang, Ren, Ou, Hong-Yu, Zhang, Chun-Ting DEG: a database of essential genes Nucl. Acids Res. 2004 32: D271-272 © 2004 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D271 |
| 470. Genomes at the EBI |
URL: http://www.ebi.ac.uk/genomes/ Categories: General Genomics Databases "The first completed genomes from viruses, phages and organelles were deposited into the EMBL Database in the early 1980's. Since then, molecular biology's shift to obtain the complete sequences of as many genomes as possible combined with major developments in sequencing technology resulted in hundreds of complete genome sequences being added to the database, including Archaea, Bacteria and Eukaryota. These web pages give access to a large number of complete genomes, help is available to describe the layout." |
| 471. Entrez Gene |
URL: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene Categories: General Genomics Databases Entrez Gene (www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene) is NCBI's database for gene-specific information. It does not include all known or predicted genes; instead Entrez Gene focuses on the genomes that have been completely sequenced, that have an active research community to contribute gene-specific information, or that are scheduled for intense sequence analysis. The content of Entrez Gene represents the result of curation and automated integration of data from NCBI's Reference Sequence project (RefSeq), from collaborating model organism databases, and from many other databases available from NCBI. Records are assigned unique, stable and tracked integers as identifiers. The content (nomenclature, map location, gene products and their attributes, markers, phenotypes, and links to citations, sequences, variation details, maps, expression, homologs, protein domains and external databases) is updated as new information becomes available. Entrez Gene is a step forward from NCBI's LocusLink, with both a major increase in taxonomic scope and improved access through the many tools associated with NCBI Entrez. Citation for the above abstract: Maglott, Donna, Ostell, Jim, Pruitt, Kim D., Tatusova, Tatiana Entrez Gene: gene-centered information at NCBI Nucl. Acids Res. 2005 33: D54-58 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D54 |
| 472. Entrez Genome |
URL: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Genome Categories: General Genomics Databases Entrez Genomes (20) provides access to genomic data contributed by the scientific community for species whose sequencing and mapping is complete or in progress. Entrez Genomes now includes over 140 complete microbial genomes, more than 1500 viruses, and over 425 reference sequences for eukaryotic organelles. Higher eukaryotic genomes are also included within Entrez Genomes such as the recent arrival, Ciona intestinalis. The Plant Genomes Central web page serves as a focal point for access to completed plant genomes, to information on plant genome sequencing projects or to plant-related resources at NCBI such as plant Genomic BLAST pages or Map Viewer. Complete genomes can be accessed hierarchically starting from either an alphabetical listing or a phylogenetic tree for each of six principle taxonomic groups. One can follow the hierarchy to a graphical overview for the genome of a single organism, on to the level of a single chromosome and, finally, down to the level of a single gene. At the level of a genome or a chromosome, a Coding Regions view displays the location of each coding region, length of the product, GenBank identification number for the protein sequence and name of the protein product. An RNA Genes view lists the location and gene names for ribosomal and transfer RNA genes. At the level of a single gene, links are provided to pre-computed sequence neighbors for the implied protein with links to the COGs database if possible. A summary of COG functional groups is presented in both tabular and graphical formats at the genome level. For complete microbial genomes, pre-computed BLAST neighbors for protein sequences, including their taxonomic distribution and links to 3D structures, are given in TaxTables and PDBTables, respectively. 
Pairwise sequence alignments are presented graphically and linked to the Cn3D macromolecular viewer (19), which provides interactive display of 3D structures and sequence alignments. The TaxPlot tool plots similarities in the proteomes of two organisms to that of a third, reference organism, and is available for both prokaryotic and eukaryotic genomes. Resources for the genomes of higher eukaryotes are discussed below. Citation for the above excerpt: Wheeler, David L., Church, Deanna M., Edgar, Ron, Federhen, Scott, Helmberg, Wolfgang, Madden, Thomas L., Pontius, Joan U., Schuler, Gregory D., Schriml, Lynn M., Sequeira, Edwin, Suzek, Tugba O., Tatusova, Tatiana A., Wagner, Lukas Database resources of the National Center for Biotechnology Information: update Nucl. Acids Res. 2004 32: D35-40 © 2004 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D35 |
| 473. ERGO-Light |
URL: http://www.ergo-light.com/ Categories: General Genomics Databases The ERGO (http://ergo.integratedgenomics.com/ERGO/) genome analysis and discovery suite is an integration of biological data from genomics, biochemistry, high-throughput expression profiling, genetics and peer-reviewed journals to achieve a comprehensive analysis of genes and genomes. Far beyond any conventional systems that facilitate functional assignments, ERGO combines pattern-based analysis with comparative genomics by visualizing genes within the context of regulation, expression profiling, phylogenetic clusters, fusion events, networked cellular pathways and chromosomal neighborhoods of other functionally related genes. The result of this multifaceted approach is to provide an extensively curated database of the largest available integration of genomes, with a vast collection of reconstructed cellular pathways spanning all domains of life. Although access to ERGO is provided only under subscription, it is already widely used by the academic community. The current version of the system integrates 500 genomes from all domains of life in various levels of completion, 403 of which are available for subscription. Citation for the above abstract: Overbeek, Ross, Larsen, Niels, Walunas, Theresa, D'Souza, Mark, Pusch, Gordon, Selkov, Eugene, Jr, Liolios, Konstantinos, Joukov, Viktor, Kaznadzey, Denis, Anderson, Iain, Bhattacharyya, Anamitra, Burd, Henry, Gardner, Warren, Hanke, Paul, Kapatral, Vinayak, Mikhailova, Natalia, Vasieva, Olga, Osterman, Andrei, Vonstein, Veronika, Fonstein, Michael, Ivanova, Natalia, Kyrpides, Nikos The ERGOTM genome analysis and discovery system Nucl. Acids Res. 2003 31: 164-171 © 2003 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/164 |
| 474. GenDiS: Genome Distribution of protein structural domain Superfamilies |
URL: http://caps.ncbs.res.in/gendis/home.html Categories: General Genomics Databases Several proteins that have substantially diverged during evolution retain similar three-dimensional structures and biological function in spite of poor sequence identity. The database on Genomic Distribution of protein structural domain Superfamilies (GenDiS) provides record for the distribution of 4001 protein domains organized as 1194 structural superfamilies across 18,997 genomes at various levels of hierarchy in taxonomy. GenDiS database provides a survey of protein domains enlisted in sequence databases employing a 3-fold sequence search approach. Lineage-specific literature is obtained from the taxonomy database for individual protein members to provide a platform for performing genomic and phyletic studies across organisms. The database documents residual properties and provides alignments for the various superfamily members in genomes, offering insights into the rational design of experiments and for the better understanding of a superfamily. GenDiS database can be accessed at http://www.ncbs.res.in/~faculty/mini/gendis/home.html. Citation for the above abstract: Pugalenthi, Ganesan, Bhaduri, Anirban, Sowdhamini, Ramanathan GenDiS: Genomic Distribution of protein structural domain Superfamilies Nucl. Acids Res. 2005 33: D252-255 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D252 |
| 475. GeneNest |
URL: http://genenest.molgen.mpg.de/ Categories: General Genomics Databases, Model Organisms and Comparative Genomics Databases GeneNest (9) is a database and software package for the generation and visualization of gene indices based on EST and mRNA sequences. Currently, the database comprises gene indices of man (based on UniGene), mouse, Arabidopsis thaliana and zebrafish. All cDNA/mRNA sequences related to an organism are extracted either directly from the EMBL (10) database or from an already clustered UniGene (11) database. A preprocessing step includes vector clipping, repeat annotation and marking of regions of low sequence quality in order to restrict processing to data of high quality. In further steps, these sequences are clustered and all members of each cluster are assembled into one or more contigs. Roughly speaking, each cluster represents a single gene, whereas contigs of a cluster reflect different transcripts of that gene. A schematic view of the assembled clusters is presented on the GeneNest web site. Detailed information about sequences and their preprocessing results, as well as information about open reading frames, similarities between clusters or protein homologies, can be accessed interactively. GeneNest can be queried using BLAST against the consensus sequences or by keyword search. GeneNest is tightly linked to SYSTERS and SpliceNest as well as to external resources like EMBL. Citation for the above excerpt: Krause, Antje, Haas, Stefan A., Coward, Eivind, Vingron, Martin SYSTERS, GeneNest, SpliceNest: exploring sequence space from genome to protein Nucl. Acids Res. 2002 30: 299-300 © 2002 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/30/1/299 |
| 476. GIB: Genome Information Broker |
URL: http://gib.genes.nig.ac.jp/ Categories: General Genomics Databases Genome Information Broker (GIB) is a powerful tool for the study of comparative genomics. GIB allows users to retrieve and display partial and/or whole genome sequences together with the relevant biological annotation. GIB has accumulated all the completed microbial genome and has recently been expanded to include Arabidopsis thaliana genome data from DDBJ/EMBL/GenBank. In the near future, hundreds of genome sequences will be determined. In order to handle such huge data, we have enhanced the GIB architecture by using XML, CORBA and distributed RDBs. We introduce the new GIB here. GIB is freely accessible at http://gib.genes.nig.ac.jp/. Citation for the above abstract: Fumoto, Masaki, Miyazaki, Satoru, Sugawara, Hideaki Genome Information Broker (GIB): data retrieval and comparative analysis system for completed microbial genomes and more Nucl. Acids Res. 2002 30: 66-68 © 2002 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/30/1/66 |
| 477. Genome Reviews |
URL: http://www.ebi.ac.uk/GenomeReviews/ Categories: General Genomics Databases Integr8 is a new web portal for exploring the biology of organisms with completely deciphered genomes. For over 190 species, Integr8 provides access to general information, recent publications, and a detailed statistical overview of the genome and proteome of the organism. The preparation of this analysis is supported through Genome Reviews, a new database of bacterial and archaeal DNA sequences in which annotation has been upgraded (compared to the original submission) through the integration of data from many sources, including the EMBL Nucleotide Sequence Database, the UniProt Knowledgebase, InterPro, CluSTr, GOA and HOGENOM. Integr8 also allows the users to customize their own interactive analysis, and to download both customized and prepared datasets for their own use. Integr8 is available at http://www.ebi.ac.uk/integr8. Citation for the above abstract: Kersey, Paul, Bower, Lawrence, Morris, Lorna, Horne, Alan, Petryszak, Robert, Kanz, Carola, Kanapin, Alexander, Das, Ujjwal, Michoud, Karine, Phan, Isabelle, Gattiker, Alexandre, Kulikova, Tamara, Faruque, Nadeem, Duggan, Karyn, Mclaren, Peter, Reimholz, Britt, Duret, Laurent, Penel, Simon, Reuter, Ingmar, Apweiler, Rolf Integr8 and Genome Reviews: integrated views of complete genomes and proteomes Nucl. Acids Res. 2005 33: D297-302 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D297 |
| 478. Integr8 |
URL: http://www.ebi.ac.uk/integr8/ Categories: General Genomics Databases Integr8 is a new web portal for exploring the biology of organisms with completely deciphered genomes. For over 190 species, Integr8 provides access to general information, recent publications, and a detailed statistical overview of the genome and proteome of the organism. The preparation of this analysis is supported through Genome Reviews, a new database of bacterial and archaeal DNA sequences in which annotation has been upgraded (compared to the original submission) through the integration of data from many sources, including the EMBL Nucleotide Sequence Database, the UniProt Knowledgebase, InterPro, CluSTr, GOA and HOGENOM. Integr8 also allows the users to customize their own interactive analysis, and to download both customized and prepared datasets for their own use. Integr8 is available at http://www.ebi.ac.uk/integr8. Citation for the above abstract: Kersey, Paul, Bower, Lawrence, Morris, Lorna, Horne, Alan, Petryszak, Robert, Kanz, Carola, Kanapin, Alexander, Das, Ujjwal, Michoud, Karine, Phan, Isabelle, Gattiker, Alexandre, Kulikova, Tamara, Faruque, Nadeem, Duggan, Karyn, Mclaren, Peter, Reimholz, Britt, Duret, Laurent, Penel, Simon, Reuter, Ingmar, Apweiler, Rolf Integr8 and Genome Reviews: integrated views of complete genomes and proteomes Nucl. Acids Res. 2005 33: D297-302 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D297 |
| 479. GOLD: Genomes On Line Database |
URL: http://www.genomesonline.org/ Categories: General Genomics Databases The Genomes On Line Database (GOLD) is a web resource for comprehensive access to information regarding complete and ongoing genome sequencing projects worldwide. The database currently incorporates information on over 1500 sequencing projects, of which 294 have been completed and the data deposited in the public databases. GOLD v.2 has been expanded to provide information related to organism properties such as phenotype, ecotype and disease. Furthermore, project relevance and availability information is now included. GOLD is available at http://www.genomesonline.org. It is also mirrored at the Institute of Molecular Biology and Biotechnology, Crete, Greece at http://gold.imbb.forth.gr/ Citation for the above abstract: Liolios, Konstantinos, Tavernarakis, Nektarios, Hugenholtz, Philip, Kyrpides, Nikos C. The Genomes On Line Database (GOLD) v.2: a monitor of genome projects worldwide Nucl. Acids Res. 2006 34: D332-334 © 2006 Oxford University Press. The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D332 |
| 480. GtRNAdb: The Genomic tRNA Database |
URL: http://lowelab.ucsc.edu/GtRNAdb/ Categories: General Genomics Databases, RNA Sequence Databases We describe a program, tRNAscan-SE, which identifies 99-100% of transfer RNA genes in DNA sequence while giving less than one false positive per 15 gigabases. Two previously described tRNA detection programs are used as fast, first-pass prefilters to identify candidate tRNAs, which are then analyzed by a highly selective tRNA covariance model. This work represents a practical application of RNA covariance models, which are general, probabilistic secondary structure profiles based on stochastic context-free grammars. tRNAscan-SE searches at approximately 30 000 bp/s. Additional extensions to tRNAscan-SE detect unusual tRNA homologues such as selenocysteine tRNAs, tRNA-derived repetitive elements and tRNA pseudogenes. Citation for the above abstract: Lowe, TM, Eddy, SR tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence Nucl. Acids Res. 1997 25: 955-964 © 1997 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/25/5/955 |
| 481. TIGR Genome Properties |
URL: http://www.tigr.org/tigr-scripts/CMR2/genome_properties Categories: General Genomics Databases MOTIVATION: The presence or absence of metabolic pathways and structures provide a context that makes protein annotation far more reliable. Compiling such information across microbial genomes improves the functional classification of proteins and provides a valuable resource for comparative genomics. RESULTS: We have created a Genome Properties system to present key aspects of prokaryotic biology using standardized computational methods and controlled vocabularies. Properties reflect gene content, phenotype, phylogeny and computational analyses. The results of searches using hidden Markov models allow many properties to be deduced automatically, especially for families of proteins (equivalogs) conserved in function since their last common ancestor. Additional properties are derived from curation, published reports and other forms of evidence. Genome Properties system was applied to 156 complete prokaryotic genomes, and is easily mined to find differences between species, correlations between metabolic features and families of uncharacterized proteins, or relationships among properties. AVAILABILITY: Genome Properties can be found at http://www.tigr.org/Genome_Properties CONTACT: selengut@tigr.org SUPPLEMENTARY INFORMATION: http://www.tigr.org/tigr-scripts/CMR2/genome_properties_references.spl. Citation for the above abstract: Daniel H. Haft, Jeremy D. Selengut, Lauren M. Brinkac, Nikhat Zafar, and Owen White. Genome Properties: a system for the investigation of prokaryotic genetic content for microbiology, genome annotation and comparative genomics. Bioinformatics Advance Access published on February 1, 2005, DOI 10.1093/bioinformatics/bti015. Bioinformatics 21: 293-306. © 2005 Oxford University Press. The full abstract can be found at: http://bioinformatics.oupjournals.org/cgi/content/abstract/21/3/293 |
| 482. InParanoid: Eukaryotic Ortholog Groups |
URL: http://inparanoid.cgb.ki.se/ Categories: General Genomics Databases The Inparanoid eukaryotic ortholog database (http://inparanoid.cgb.ki.se/) is a collection of pairwise ortholog groups between 17 whole genomes; Anopheles gambiae, Caenorhabditis briggsae, Caenorhabditis elegans, Drosophila melanogaster, Danio rerio, Takifugu rubripes, Gallus gallus, Homo sapiens, Mus musculus, Pan troglodytes, Rattus norvegicus, Oryza sativa, Plasmodium falciparum, Arabidopsis thaliana, Escherichia coli, Saccharomyces cerevisiae and Schizosaccharomyces pombe. Complete proteomes for these genomes were derived from Ensembl and UniProt and compared pairwise using Blast, followed by a clustering step using the Inparanoid program. An Inparanoid cluster is seeded by a reciprocally best-matching ortholog pair, around which inparalogs (should they exist) are gathered independently, while outparalogs are excluded. The ortholog clusters can be searched on the website using Ensembl gene/protein or UniProt identifiers, annotation text or by Blast alignment against our protein datasets. The entire dataset can be downloaded, as can the Inparanoid program itself. Citation for the above abstract: O'Brien, Kevin P., Remm, Maido, Sonnhammer, Erik L. L. Inparanoid: a comprehensive database of eukaryotic orthologs Nucl. Acids Res. 2005 33: D476-480 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D476 |
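The seeding step described in the Inparanoid abstract above (a reciprocally best-matching pair between two proteomes) can be sketched as follows. This is a minimal illustration and not the Inparanoid program: it covers only the reciprocal-best-hit seeding, leaves out the gathering of inparalogs and the exclusion of outparalogs, and assumes the pairwise similarity scores (for example Blast bit scores) are already available as dictionaries. The protein identifiers, scores, and function names are hypothetical.

```python
from typing import Dict, List, Tuple

Scores = Dict[Tuple[str, str], float]  # (query protein, target protein) -> similarity score


def best_hits(scores: Scores) -> Dict[str, str]:
    """Map each query protein to its highest-scoring target.

    On an exact tie the first target encountered is kept (a simplification).
    """
    best: Dict[str, Tuple[str, float]] = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _score) in best.items()}


def reciprocal_best_pairs(a_vs_b: Scores, b_vs_a: Scores) -> List[Tuple[str, str]]:
    """Return pairs (a, b) where b is a's best hit and a is b's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Toy scores (hypothetical) for two proteomes A and B.
a_vs_b = {("a1", "b1"): 200.0, ("a1", "b2"): 50.0, ("a2", "b1"): 120.0, ("a2", "b2"): 90.0}
b_vs_a = {("b1", "a1"): 200.0, ("b1", "a2"): 120.0, ("b2", "a1"): 50.0, ("b2", "a2"): 90.0}
seeds = reciprocal_best_pairs(a_vs_b, b_vs_a)
```

In this toy case only a1 and b1 choose each other as best match, so they form the single seed pair; a2 prefers b1, but b1 prefers a1, so a2 seeds no cluster.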
| 483. KaryotypeDB |
URL: http://www.nenno.it/karyotypedb/ Categories: General Genomics Databases "The Karyotype database (KaryotypeDB) contains karyotype and chromosome information like chromosome number, length, karyotype features, idiograms, physical localizations of DNA sequences by fluorescence in situ hybridization (FISH), and cell material for metaphase chromosomes and polytene chromosomes from different animal and plant species together with literature references and links." |
| 484. MBGD: Microbial Genome Database for Comparative Analysis |
URL: http://mbgd.genome.ad.jp/ Categories: General Genomics Databases MBGD is a workbench system for comparative analysis of completely sequenced microbial genomes. The central function of MBGD is to create an orthologous gene classification table using precomputed all-against-all similarity relationships among genes in multiple genomes. In MBGD, an automated classification algorithm has been implemented so that users can create their own classification table by specifying a set of organisms and parameters. This feature is especially useful when the user's interest is focused on some taxonomically related organisms. The created classification table is stored into the database and can be explored combining with the data of individual genomes as well as similarity relationships among genomes. Using these data, users can carry out comparative analyses from various points of view, such as phylogenetic pattern analysis, gene order comparison and detailed gene structure comparison. MBGD is accessible at http://mbgd.genome.ad.jp/. Citation for the above abstract: Uchiyama, Ikuo MBGD: microbial genome database for comparative analysis Nucl. Acids Res. 2003 31: 58-62 © 2003 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/58 |
| 485. NegProt: Negative Proteome database |
URL: http://superfly.ucsd.edu/negprot/ Categories: Proteomics Databases A proteome comparison database. |
| 486. PartiGeneDB: A Database of Partial Genomes |
URL: http://www.partigenedb.org/ Categories: General Genomics Databases Owing to the high costs involved, only 28 eukaryotic genomes have been fully sequenced to date. On the other hand, an increasing number of projects have been initiated to generate survey sequence data for a large number of other eukaryotic organisms. For the most part, these data are poorly organized and difficult to analyse. Here, we present PartiGeneDB (http://www.partigenedb.org), a publicly available database resource, which collates and processes these sequence datasets on a species-specific basis to form non-redundant sets of gene objects, which we term partial genomes. Users may query the database to identify particular genes of interest either on the basis of sequence similarity or via the use of simple text searches for specific patterns of BLAST annotation. Alternatively, users can examine entire partial genome datasets on the basis of relative expression of gene objects or by the use of an interactive Java-based tool (SimiTri), which displays sequence similarity relationships for a large number of sequence objects in a single graphic. PartiGeneDB facilitates regular incremental updates of new sequence datasets associated with both new and existing species. PartiGeneDB currently contains the assembled partial genomes derived from 1.83 million sequences associated with 247 different eukaryotes. Citation for the above abstract: Peregrin-Alvarez, Jose M., Yam, Andrew, Sivakumar, Gaya, Parkinson, John PartiGeneDB--collating partial genomes Nucl. Acids Res. 2005 33: D303-307 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D303 |
| 487. PEDANT: Protein Extraction, Description and ANalysis Tool |
URL: http://pedant.gsf.de/ Categories: General Genomics Databases The PEDANT genome database (http://pedant.gsf.de) contains pre-computed bioinformatics analyses of publicly available genomes. Its main mission is to provide robust automatic annotation of the vast majority of amino acid sequences, which have not been subjected to in-depth manual curation by human experts in high-quality protein sequence databases. By design PEDANT annotation is genome-oriented, making it possible to explore genomic context of gene products, and evaluate functional and structural content of genomes using a category-based query mechanism. At present, the PEDANT database contains exhaustive annotation of over 1,240,000 proteins from 270 eubacterial, 23 archeal and 41 eukaryotic genomes. Citation for the above abstract: Riley, M. Louise, Schmidt, Thorsten, Wagner, Christian, Mewes, Hans-Werner, Frishman, Dmitrij The PEDANT genome database in 2005 Nucl. Acids Res. 2005 33: D308-310 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D308 |
| 488. STRING: Search Tool for the Retrieval of Interacting Genes/Proteins |
URL: http://string.embl.de/ Categories: General Genomics Databases A full description of a protein's function requires knowledge of all partner proteins with which it specifically associates. From a functional perspective, 'association' can mean direct physical binding, but can also mean indirect interaction such as participation in the same metabolic pathway or cellular process. Currently, information about protein association is scattered over a wide variety of resources and model organisms. STRING aims to simplify access to this information by providing a comprehensive, yet quality-controlled collection of protein-protein associations for a large number of organisms. The associations are derived from high-throughput experimental data, from the mining of databases and literature, and from predictions based on genomic context analysis. STRING integrates and ranks these associations by benchmarking them against a common reference set, and presents evidence in a consistent and intuitive web interface. Importantly, the associations are extended beyond the organism in which they were originally described, by automatic transfer to orthologous protein pairs in other organisms, where applicable. STRING currently holds 730,000 proteins in 180 fully sequenced organisms, and is available at http://string.embl.de/. Citation for the above abstract: von Mering, Christian, Jensen, Lars J., Snel, Berend, Hooper, Sean D., Krupp, Markus, Foglierini, Mathilde, Jouffre, Nelly, Huynen, Martijn A., Bork, Peer STRING: known and predicted protein-protein associations, integrated and transferred across organisms Nucl. Acids Res. 2005 33: D433-437 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D433 |
| 489. TIGR Comprehensive Microbial Resource |
URL: http://www.tigr.org/tigr-scripts/CMR2/CMRHomePage.spl Categories: General Genomics Databases One challenge presented by large-scale genome sequencing efforts is effective display of uniform information to the scientific community. The Comprehensive Microbial Resource (CMR) contains robust annotation of all complete microbial genomes and allows for a wide variety of data retrievals. The bacterial information has been placed on the Web at http://www.tigr.org/CMR for retrieval using standard web browsing technology. Retrievals can be based on protein properties such as molecular weight or hydrophobicity, GC-content, functional role assignments and taxonomy. The CMR also has special web-based tools to allow data mining using pre-run homology searches, whole genome dot-plots, batch downloading and traversal across genomes using a variety of datatypes. Citation for the above abstract: Peterson, Jeremy D., Umayam, Lowell A., Dickinson, Tanja, Hickey, Erin K., White, Owen The Comprehensive Microbial Resource Nucl. Acids Res. 2001 29: 123-125 © 2001 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/29/1/123 |
| 490. TIGR Microbial Database |
URL: http://www.tigr.org/tdb/mdb/mdbcomplete.html Categories: General Genomics Databases "Alphabetical listing of published TIGR Microbial genomes" |
| 491. TransportDB: Transporter Protein Analysis Database |
URL: http://www.membranetransport.org/ Categories: General Genomics Databases, Individual Protein Family Databases TransportDB (http://www.membranetransport.org) is a relational database designed for describing the predicted cellular membrane transport proteins in organisms whose complete genome sequences are available. For each organism, the complete set of membrane transport systems was identified and classified into different types and families according to putative membrane topology, protein family, bioenergetics and substrate specificities. Web pages were created to provide user-friendly interfaces to easily access, query and download the data. Additional features, such as a BLAST search tool against known transporter protein sequences, comparison of transport systems from different organisms and phylogenetic trees of individual transporter families are also provided. TransportDB will be regularly updated with data obtained from newly sequenced genomes. Citation for the above abstract: Ren, Qinghu, Kang, Katherine H., Paulsen, Ian T. TransportDB: a relational database of cellular membrane transport systems Nucl. Acids Res. 2004 32: D284-288 © 2004 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D284 |
| 492. Open NCI Database |
URL: http://cactus.nci.nih.gov/ncidb2/ Categories: 3D Molecular Structures, Drug and Drug Design Databases A Web-based, graphical user interface has been developed to conduct rapid searches by numerous criteria in the more than 250,000 structures of the Open NCI Database. It is based on the chemistry information toolkit CACTVS. Nearly all structures and anticancer and anti-HIV screening data provided by NCI's Developmental Therapeutics Program have been included. This data set has been augmented by a large amount of additional, mostly computed, data, such as calculated log P values, predicted biological activities, systematically determined names, and others. Complex boolean searches are possible. Flexible substructure searches have been implemented. The user can conduct 3D pharmacophore queries in up to 25 conformations precalculated for each compound. Numerous output formats as well as 2D and 3D visualization options are provided. It is possible to export search results in various forms and with choices for data contents in the exported files, for structure sets ranging in size from a single compound to the entire database. Only a Web browser is needed to use this service, with a few plug-ins being useful but optional. Citation for the above abstract: Ihlenfeldt WD, Voigt JH, Bienfait B, Oellien F, Nicklaus MC. Enhanced CACTVS browser of the Open NCI Database. J Chem Inf Comput Sci. 2002 Jan-Feb;42(1):46-57. © 2002 American Chemical Society. The full abstract can be found at: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=11855965 |
| 493. ALFRED |
URL: http://alfred.med.yale.edu/ Categories: General Polymorphism Databases Elaboration of ALFRED (http://alfred.med.yale.edu) is being continued in two directions. One of which is developing tools for efficiently annotating the entries and checking the integrity of the data already in the database while the other is to increase the quantity and accessibility of data. Information contained in ALFRED such as, polymorphic sites, number of populations and frequency tables (one sample typed for one site) has significantly increased. Citation for the above abstract: Rajeevan, H., Osier, M. V., Cheung, K.-H., Deng, H., Druskin, L., Heinzen, R., Kidd, J. R., Stein, S., Pakstis, A. J., Tosches, N. P., Yeh, C.-C., Miller, P. L., Kidd, K. K. ALFRED: the ALelle FREquency Database. Update Nucl. Acids Res. 2003 31: 270-271 © 2003 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/270 |
| 494. Cypriot National Mutation Database |
URL: http://www.goldenhelix.org/cypriot/ Categories: General Polymorphism Databases "This is an online repository of information about the different mutations leading to the various inherited disorders in the Cypriot population. The Cypriot National Mutation Database results from the fruitful collaboration among several investigators from Erasmus Medical Center (The Netherlands) and the Cyprus Institute of Neurology and Genetics (Cyprus), encouraged by the Human Genome Variation Society. The initial data came from previously published reports as well as from unpublished information contributed from individual researchers prior to publication. This information was converted to a database, and now new entries are added and old entries are corrected by our expert advisors and collaborators. Visit the summary page to see the types of information available for every inherited disorder reported for the Cypriot population. Also, by visiting our query page, you can query upon the data stored in the database, concerning the frequencies of the different mutations." |
| 495. Database of Genomic Variants |
URL: http://projects.tcag.ca/variation/ Categories: General Polymorphism Databases We identified 255 loci across the human genome that contain genomic imbalances among unrelated individuals. Twenty-four variants are present in > 10% of the individuals that we examined. Half of these regions overlap with genes, and many coincide with segmental duplications or gaps in the human genome assembly. This previously unappreciated heterogeneity may underlie certain human phenotypic variation and susceptibility to disease and argues for a more dynamic human genome structure. Citation for the above abstract: Iafrate AJ, Feuk L, Rivera MN, Listewnik ML, Donahoe PK, Qi Y, Scherer SW, Lee C. Detection of large-scale variation in the human genome. Nat Genet. 2004 Sep;36(9):949-51. Epub 2004 Aug 01. © 2004 Nature Publishing Group. The full abstract can be found at: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=15286789 |
| 496. dbQSNP |
URL: http://qsnp.gen.kyushu-u.ac.jp/ Categories: General Polymorphism Databases "Single nucleotide polymorphisms (SNPs) are bi-allelic genetic markers abundantly distributed throughout human genome. Several million SNPs have been collected and deposited in public databases (e.g., dbSNP, http://www.ncbi.nlm.nih.gov/SNP/), and they are expected to be useful markers in genetic analysis of polygenic traits. However, further characterization on their allele frequency in various population is needed before they are actually used for e.g., association study of diseases. We have established a streamlined and cost-efficient SNP discovery/ quantification method that is based on SSCP analysis using capillary electrophoresis (Orita et al., 1989; Inazuka et al., 1997; Sasaki et al., 2001; Hayashi et al., 2001; Tahira et al., 2002; Kukita et al., 2002; Baba et al., 2003). In this method, alleles are separated into peaks, and their frequencies can be reliably and accurately quantified from their peak heights of pooled DNA. The raw data of SSCP analysis obtained from various capillary-array apparatuses are interpreted by a newly developed fragment analysis software, 'QUISCA' (Higasa et al., 2002). To manage SSCP and sequencing analyses for discovering SNPs and determining their allele frequency at a large scale, we developed a relational database, 'dbQSNP Conductor', that runs on postgreSQL, and supports designing experiments, analyzing results of SSCP/sequencing from various capillary-array DNA sequencers, and verifying these results to minimize error (Baba et al., in preparation). This site, 'dbQSNP Public', is a repository of STS/SNP information obtained by 'dbQSNP conductor'. SSCP and sequence trace data are just a few clicks away, and thus, integrity of the data can be confirmed." |
| 497. Arabidopsis MPSS |
URL: http://mpss.udel.edu/at/ Categories: Arabidopsis thaliana Databases Microarrays and tag-based transcriptional profiling technologies represent diverse but complementary data types. We are currently conducting a comparison of high-density in situ synthesized microarrays and massively-parallel signature sequencing (MPSS) data in the model plant, Arabidopsis thaliana. The MPSS data (available at http://mpss.udel.edu/at) and the microarray data have been compiled using the same RNA source material. In this review, we outline the experimental strategy that we are using, and present preliminary data and interpretations from the transcriptional profiles of Arabidopsis leaves and roots. The preliminary data indicate that the log ratio differences of transcripts between leaves and roots measured by microarray data are in better agreement with the MPSS data than the absolute intensities measured for individual microarrays hybridized to only one of the cRNA populations. The correlation was substantially improved by focusing on a subset of genes excluding those with very low expression levels; this selection may have removed noisy data. Future reports will incorporate more than 10 tissues that have been sampled by MPSS. Citation for the above abstract: Sean J. Coughlan, Vikas Agrawal, Blake Meyers A comparison of global gene expression measurement technologies in Arabidopsis thaliana Comparative and Functional Genomics. Volume 5, Issue 3, 2004. Pages 245-252 © 2004 John Wiley & Sons, Ltd. The full abstract can be found at: http://www3.interscience.wiley.com/cgi-bin/abstract/108061140/ABSTRACT |
| 498. AMPDB: Arabidopsis Mitochondrial Protein Database |
URL: http://www.ampdb.bcs.uwa.edu.au/ Categories: Arabidopsis thaliana Databases, Mitochondrial Genes and Proteins Databases The Arabidopsis Mitochondrial Protein Database is an Internet-accessible relational database containing information on the predicted and experimentally confirmed protein complement of mitochondria from the model plant Arabidopsis thaliana (http://www.ampdb.bcs.uwa.edu.au/). The database was formed using the total non-redundant nuclear and organelle encoded sets of protein sequences and allows relational searching of published proteomic analyses of Arabidopsis mitochondrial samples, a set of predictions from six independent subcellular-targeting prediction programs, and orthology predictions based on pairwise comparison of the Arabidopsis protein set with known yeast and human mitochondrial proteins and with the proteome of Rickettsia. A variety of precomputed physical-biochemical parameters are also searchable as well as a more detailed breakdown of mass spectral data produced from our proteomic analysis of Arabidopsis mitochondria. It contains hyperlinks to other Arabidopsis genomic resources (MIPS, TIGR and TAIR), which provide rapid access to changing gene models as well as hyperlinks to T-DNA insertion resources, Massively Parallel Signature Sequencing (MPSS) and Genome Tiling Array data and a variety of other Arabidopsis online resources. It also incorporates basic analysis tools built into the query structure such as a BLAST facility and tools for protein sequence alignments for convenient analysis of queried results. Citation for the above abstract: Heazlewood, Joshua L., Millar, A. Harvey AMPDB: the Arabidopsis Mitochondrial Protein Database Nucl. Acids Res. 2005 33: D605-610 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D605 |
| 499. Brassica ASTRA |
URL: http://hornbill.cspp.latrobe.edu.au/astra.html Categories: Other Plant Databases Brassica ASTRA is a public database for genomic information on Brassica species. The database incorporates expressed sequences with Swiss-Prot and GenBank comparative sequence annotation as well as secondary Gene Ontology (GO) annotation derived from the comparison with Arabidopsis TAIR GO annotations. Simple sequence repeat molecular markers are identified within resident sequences and mapped onto the closely related Arabidopsis genome sequence. Bacterial artificial chromosome (BAC) end sequences derived from the Multinational Brassica Genome Project are also mapped onto the Arabidopsis genome sequence enabling users to identify candidate Brassica BACs corresponding to syntenic regions of Arabidopsis. This information is maintained in a MySQL database with a web interface providing the primary means of interrogation. The database is accessible at http://hornbill.cspp.latrobe.edu.au. Citation for the above abstract: Love, Christopher G., Robinson, Andrew J., Lim, Geraldine A. C., Hopkins, Clare J., Batley, Jacqueline, Barker, Gary, Spangenberg, German C., Edwards, David Brassica ASTRA: an integrated database for Brassica genomic research Nucl. Acids Res. 2005 33: D656-659 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D656 |
| 500. Diatom EST Database |
URL: http://avesthagen.sznbowler.com/ Categories: Other Plant Databases, Unicellular Eukaryote Genome Databases The Diatom EST database provides integrated access to expressed sequence tag (EST) data from two eukaryotic microalgae of the class Bacillariophyceae, Phaeodactylum tricornutum and Thalassiosira pseudonana. The database currently contains sequences of close to 30,000 ESTs organized into PtDB, the P.tricornutum EST database, and TpDB, the T.pseudonana EST database. The EST sequences were clustered and assembled into a non-redundant set for each organism, and these non-redundant sequences were then subjected to automated annotation using similarity searches against protein and domain databases. EST sequences, clusters of contiguous sequences, their annotation and analysis with reference to the publicly available databases, and a codon usage table derived from a subset of sequences from PtDB and TpDB can all be accessed in the Diatom EST Database. The underlying RDBMS enables queries over the raw and annotated EST data and retrieval of information through a user-friendly web interface, with options to perform keyword and BLAST searches. The EST data can also be retrieved based on Pfam domains, Cluster of Orthologous Groups (COG) and Gene Ontologies (GO) assigned to them by similarity searches. The Database is available at http://avesthagen.sznbowler.com. Citation for the above abstract: Maheswari, Uma, Montsant, Anton, Goll, Johannes, Krishnasamy, S., Rajyashri, K. R., Patell, Villoo Morawala, Bowler, Chris The Diatom EST Database Nucl. Acids Res. 2005 33: D344-347 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D344 |
| 501. Legume Information System |
URL: http://www.comparative-legumes.org/ Categories: Other Plant Databases The Legume Information System (LIS) (http://www.comparative-legumes.org), developed by the National Center for Genome Resources in cooperation with the USDA Agricultural Research Service (ARS), is a comparative legume resource that integrates genetic and molecular data from multiple legume species enabling cross-species genomic and transcript comparisons. The LIS virtual plant interface allows simplified and intuitive navigation of transcript data from Medicago truncatula, Lotus japonicus, Glycine max and Arabidopsis thaliana. Transcript libraries are represented as images of plant organs in different developmental stages, which are selected to query the analyzed and annotated data. Complex queries can be accomplished by adding modifiers, keywords and sequence names. The LIS also contains annotated genomic data featuring transcript alignments to validate gene predictions as well as motif and similarity analyses. The genomic browser supports comparative analysis via novel dynamic functional annotation comparisons. CMap, developed as part of the GMOD project (http://www.gmod.org/cmap/index.shtml), has been incorporated to support comparative analyses of community linkage and physical map data. LIS is being expanded to incorporate gene expression and biochemical pathways which will be seamlessly integrated forming a knowledge discovery framework. Citation for the above abstract: Gonzales, Michael D., Archuleta, Eric, Farmer, Andrew, Gajendran, Kamal, Grant, David, Shoemaker, Randy, Beavis, William D., Waugh, Mark E. The Legume Information System (LIS): an integrated information resource for comparative legume biology Nucl. Acids Res. 2005 33: D660-665 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D660 |
| 502. MaizeGDB |
URL: http://www.maizegdb.org/ Categories: Other Plant Databases The Maize Genetics and Genomics Database (MaizeGDB) is a central repository for maize sequence, stock, phenotype, genotypic and karyotypic variation, and chromosomal mapping data. In addition, MaizeGDB provides contact information for over 2400 maize cooperative researchers, facilitating interactions between members of the rapidly expanding maize community. MaizeGDB represents the synthesis of all data available previously from ZmDB and from MaizeDB, databases that have been superseded by MaizeGDB. MaizeGDB provides web-based tools for ordering maize stocks from several organizations including the Maize Genetics Cooperation Stock Center and the North Central Regional Plant Introduction Station (NCRPIS). Sequence searches yield records displayed with embedded links to facilitate ordering cloned sequences from various groups including the Maize Gene Discovery Project and the Clemson University Genomics Institute. An intuitive web interface is implemented to facilitate navigation between related data, and analytical tools are embedded within data displays. Web-based curation tools for both designated experts and general researchers are currently under development. MaizeGDB can be accessed at http://www.maizegdb.org/. Citation for the above abstract: Lawrence, Carolyn J., Dong, Qunfeng, Polacco, Mary L., Seigfried, Trent E., Brendel, Volker MaizeGDB, the community database for maize genetics and genomics Nucl. Acids Res. 2004 32: D393-397 © 2004 Oxford University Press. The full abstract can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D393 |
| 503. MtDB: Medicago truncatula Database |
URL: http://www.medicago.org/MtDB/ Categories: Other Plant Databases In order to identify the genes and gene functions that underlie key aspects of legume biology, researchers have selected the cool season legume Medicago truncatula (Mt) as a model system for legume research. A set of >170 000 Mt ESTs has been assembled based on in-depth sampling from various developmental stages and pathogen-challenged tissues. MtDB is a relational database that integrates Mt transcriptome data and provides a wide range of user-defined data mining options. The database is interrogated through a series of interfaces with 58 options grouped into two filters. In addition, the user can select and compare unigene sets generated by different assemblers: Phrap, Cap3 and Cap4. Sequence identifiers from all public Mt sites (e.g. IDs from GenBank, CCGB, TIGR, NCGR, INRA) are fully cross-referenced to facilitate comparisons between different sites, and hypertext links to the appropriate database records are provided for all queries' results. MtDB's goal is to provide researchers with the means to quickly and independently identify sequences that match specific research interests based on user-defined criteria. The underlying database and query software have been designed for ease of updates and portability to other model organisms. Public access to the database is at http://www.medicago.org/MtDB. Citation for the above abstract: Lamblin, Anne-Francoise J., Crow, John A., Johnson, James E., Silverstein, Kevin A. T., Kunau, Timothy M., Kilian, Alan, Benz, Diane, Stromvik, Martina, Endre, Gabriella, VandenBosch, Kathryn A., Cook, Douglas R., Young, Nevin D., Retzel, Ernest F. MtDB: a database for personalized data mining of the model legume Medicago truncatula transcriptome Nucl. Acids Res. 2003 31: 196-201 © 2003 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/196 |
| 504. PoMaMo: Potato Maps and More |
URL: https://gabi.rzpd.de/PoMaMo.html Categories: Other Plant Databases A database for potato genome data (PoMaMo, Potato Maps and More) was established. The database contains molecular maps of all twelve potato chromosomes with about 1000 mapped elements, sequence data, putative gene functions, results from BLAST analysis, SNP and InDel information from different diploid and tetraploid potato genotypes, publication references, links to other public databases like GenBank (http://www.ncbi.nlm.nih.gov/) or SGN (Solanaceae Genomics Network, http://www.sgn.cornell.edu/), etc. Flexible search and data visualization interfaces enable easy access to the data via internet (https://gabi.rzpd.de/PoMaMo.html). The Java servlet tool YAMB (Yet Another Map Browser) was designed to interactively display chromosomal maps. Maps can be zoomed in and out, and detailed information about mapped elements can be obtained by clicking on an element of interest. The GreenCards interface allows a text-based data search by marker-, sequence- or genotype name, by sequence accession number, gene function, BLAST Hit or publication reference. The PoMaMo database is a comprehensive database for different potato genome data, and to date the only database containing SNP and InDel data from diploid and tetraploid potato genotypes. Citation for the above abstract: Meyer, Svenja, Nagel, Axel, Gebhardt, Christiane PoMaMo--a comprehensive database for potato genome data Nucl. Acids Res. 2005 33: D666-670 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D666 |
| 505. SGMD: the Soybean Genomics and Microarray Database |
URL: http://psi081.ba.ars.usda.gov/SGMD/default.htm Categories: Other Plant Databases The Soybean Genomics and Microarray Database (SGMD) attempts to provide an integrated view of the interaction of soybean with the soybean cyst nematode and contains genomic, EST and microarray data with embedded analytical tools allowing correlation of soybean ESTs with their gene expression profiles. SGMD provides analytical tools to mine the microarray data quickly by integrating many analysis methods within the database itself. The expression profiles of genes at time intervals during the first 8 days of nematode invasion is searchable by gene name or GenBank accession number. Recent developments include the addition of a searchable database for soybean cyst nematode ESTs and photographs of the invasion process at time points examined using microarrays. SGMD is completely accessible from the web at: http://psi081.ba.ars.usda.gov/SGMD/default.htm. Citation for the above abstract: Alkharouf, Nadim W., Matthews, Benjamin F. SGMD: the Soybean Genomics and Microarray Database Nucl. Acids Res. 2004 32: D398-400 © 2004 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D398 |
| 506. CCSD: Complex Carbohydrate Structure Database |
URL: http://bssv01.lancs.ac.uk/gig/pages/gag/carbbank.htm Categories: Carbohydrate Structure Databases "The CCSD is a database containing complex carbohydrate structures and associated text. The information is mainly derived from scientific publications and submissions by authors. The database has a flat file format, i.e., one record contains a single structure with its associated text and citation. Thus a paper which has several structures will appear several times, and, a structure may appear in more than one record. Structural abbreviations and nomenclature are similar to those found in the journal Carbohydrate Research." |
| 507. CSS: Carbohydrate Structure Suite |
URL: http://www.dkfz.de/spec/css/ Categories: 3D Molecular Structures, Carbohydrate Structure Databases Knowledge of the 3D structure of glycoproteins and protein-carbohydrate complexes is indispensable to fully understand the biological processes they are involved in. Carbohydrate Structure Suite is an attempt to automatically analyse carbohydrate structures contained in the PDB and make the results publicly available on the internet. Characteristic torsion angles, glycoprotein sequences and carbohydrate-protein interactions are analysed. Furthermore, tools to crosslink the PDB and carbohydrate databases and to check the integrity of carbohydrate 3D structures are included. The service is available at (www.dkfz.de/spec/css/). Citation for the above abstract: Lutteke, Thomas, Frank, Martin, von der Lieth, Claus-W. Carbohydrate Structure Suite (CSS): analysis of carbohydrate 3D structures derived from the PDB Nucl. Acids Res. 2005 33: D242-246 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D242 |
| 508. KEGG Glycan Structure Search using KCaM |
URL: http://glycan.genome.jp/ Categories: Carbohydrate Structure Databases KCaM (KEGG Carbohydrate Matcher) is a tool for the analysis of carbohydrate sugar chains, or glycans. It consists of a web-based graphical user interface that allows users to enter glycans easily with the mouse. The glycan structure is then transformed into our KCF (KEGG Chemical Function) file format and sent to our program which implements an efficient tree-structure alignment algorithm, similar to sequence alignment algorithms but for branched tree structures. Users can also retrieve glycan tree structures in KCF format from their local computers for visualization over the web. The tree-matching algorithm provides several options for performing different types of tree-matching procedures on glycans. These options consist of whether to incorporate gaps in a match, whether to take the linkage information into consideration and local versus global alignment. The results of this program are returned as a list of glycan structures in order of similarity based on these options. The actual alignment can be viewed graphically, and the annotation information can also be viewed easily since all this information is linked with KEGG's comprehensive suite of genomic data. Analogously to BLAST, users are thus able to compare glycan structures of interest with glycans from different glycan databases using a variety of tree-alignment options. KCaM is currently available at http://glycan.genome.ad.jp. Citation for the above abstract: Aoki, Kiyoko F., Yamaguchi, Atsuko, Ueda, Nobuhisa, Akutsu, Tatsuya, Mamitsuka, Hiroshi, Goto, Susumu, Kanehisa, Minoru KCaM (KEGG Carbohydrate Matcher): a software tool for analyzing the structures of carbohydrate sugar chains Nucl. Acids Res. 2004 32: W267-272 © 2004 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_2/W267 |
| 509. GlycoSuiteDB |
URL: https://tmat.proteomesystems.com/glycosuite/ Categories: Carbohydrate Structure Databases GlycoSuiteDB is an annotated and curated relational database of glycan structures reported in the literature. It contains information on the glycan type, core type, linkages and anomeric configurations, mass, composition and the analytical methods used by the researchers to determine the glycan structure. Native and recombinant sources are detailed, including species, tissue and/or cell type, cell line, strain, life stage, disease, and if known the protein to which the glycan structures are attached. There are links to SWISS-PROT/TrEMBL and PubMed where applicable. Recent developments include the implementation of searching by 2D structure and substructure, disease and reference. The database is updated twice a year, and now contains over 7650 entries. Access to GlycoSuiteDB is available at http://www.glycosuite.com. Citation for the above abstract: Cooper, Catherine A., Joshi, Hiren J., Harrison, Mathew J., Wilkins, Marc R., Packer, Nicolle H. GlycoSuiteDB: a curated relational database of glycoprotein glycan structures and their biological sources. 2003 update Nucl. Acids Res. 2003 31: 511-513 © 2003 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/511 |
| 510. Monosaccharide Browser |
URL: http://www.jonmaber.demon.co.uk/monosaccharide/ Categories: Carbohydrate Structure Databases "The monosaccharide browser allows you to view space filling Fischer projections of monosaccharides. You can edit the structure and discover the correct name or you can select names from the classified index to discover the structure. The structure can be edited by choosing between aldose/ketose, number of carbon atoms between 3 and 6 and by clicking on carbon atoms to alter chirality." |
| 511. SWEET-DB |
URL: http://www.dkfz-heidelberg.de/spec2/sweetdb/ Categories: Carbohydrate Structure Databases Complex carbohydrates are known as mediators of complex cellular events. Concerning their structural diversity, their potential of information content is several orders of magnitude higher in a short sequence than any other biological macromolecule. SWEET-DB (http://www.dkfz.de/spec2/sweetdb/) is an attempt to use modern web techniques to annotate and/or cross-reference carbohydrate-related data collections which allow glycoscientists to find important data for compounds of interest in a compact and well-structured representation. Currently, reference data taken from three data sources can be retrieved for a given carbohydrate (sub)structure. The sources are CarbBank structures and literature references (linked to NCBI PubMed service), NMR data taken from SugaBase and 3D co-ordinates generated with SWEET-II. The main purpose of SWEET-DB is to enable an easy access to all data stored for one carbohydrate structure entering a complete sequence or parts thereof. Access to SWEET-DB contents is provided with the help of separate input spreadsheets for (sub)structures, bibliographic data, general structural data like molecular weight, NMR spectra and biological data. A detailed online tutorial is available at http://www.dkfz.de/spec2/sweetdb/nar/. Citation for the above abstract: Loß, Alexander, Bunsmann, Peter, Bohne, Andreas, Loß, Annika, Schwarzer, Eberhard, Lang, Elke, von der Lieth, Claus-W. SWEET-DB: an attempt to create annotated data collections for carbohydrates Nucl. Acids Res. 2002 30: 405-408 © 2002 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/30/1/405 |
| 512. Plant Genome Central |
URL: http://www.ncbi.nlm.nih.gov/genomes/PLANTS/PlantList.html Categories: General Plant Databases "The plant genomic effort has one technical hurdle relative to other genomic efforts. The range of plant genome size is very large extending from approximately the same size as the genome of many small animals to more than five times as large as the human genome. For each organism, the number of chromosomes is indicated by the integer within brackets preceding the Linnaean binomial name of the organism. For the Large-Scale Sequencing Projects and the Genetic Maps groups, the organism name links to the Map View overview page for that organism. For the Large-Scale EST Sequencing Projects group, the organism name links to the taxonomy page at NCBI. The Map View of a particular chromosome or organelle is accessed by following the link for that chromosome or organelle." |
| 513. AANT: Amino Acid-Nucleotide Interaction Database |
URL: http://aant.icmb.utexas.edu/ Categories: Small Molecule Structure Databases We have created an Amino Acid-Nucleotide Interaction Database (AANT; http://aant.icmb.utexas.edu/) that categorizes all amino acid-nucleotide interactions from experimentally determined protein-nucleic acid structures, and provides users with a graphic interface for visualizing these interactions in aggregate. AANT accomplishes this by extracting individual amino acid-nucleotide interactions from structures in the Protein Data Bank, combining and superimposing these interactions into multiple structure files (e.g. 20 amino acids x 5 nucleotides) and grouping structurally similar interactions into more readily identifiable clusters. Using the Chime web browser plug-in, users can view 3D representations of the superimpositions and clusters. The unique collection and representation of data on amino acid-nucleotide interactions facilitates understanding the specificity of protein-nucleic acid interactions at a more fundamental level, and allows comparison of otherwise extremely disparate sets of structures. Moreover, by modularly representing the fundamental interactions that govern binding specificity it may prove possible to better engineer nucleic acid binding proteins. Citation for the above abstract: Hoffman, Michael M., Khrapov, Maksim A., Cox, J. Colin, Yao, Jianchao, Tong, Lingnan, Ellington, Andrew D. AANT: the Amino Acid-Nucleotide Interaction Database Nucl. Acids Res. 2004 32: D174-181 © 2004 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D174 |
| 514. ChEBI: Chemical Entities of Biological Interest |
URL: http://www.ebi.ac.uk/chebi/ Categories: Small Molecule Structure Databases Molecular biologists tend to focus on genes and proteins, but small molecules are equally important to life. The EBI's most recently launched database bridges the gap between the world of proteins and that of small molecules. Called ChEBI (Chemical Entities of Biological Interest, www.ebi.ac.uk/chebi), it catalogues small molecules, atoms, ions, ion pairs, radicals and other small chemical entities. ChEBI combines information on small molecular entities from three main sources to create a non-redundant resource: small molecules from the EBI's IntEnz database of enzymes (35), the COMPOUND database from the Kyoto Encyclopedia of Genes and Genomes (KEGG) (36) and the Chemical Ontology (http://cvs.sourceforge.net/viewcvs.py/obo/obo/ontology/biochemical/). The Chemical Ontology makes ChEBI uniquely powerful because it allows relationships between molecular entities or classes of entities to be recorded in a defined way. Each entity in the database is described in terms of its chemistry and, where known, its broad biological function. For example, FAD is described as a flavin adenine dinucleotide (chemistry) and as a cofactor (function). Synonyms for each entity are listed and searchable. ChEBI also defines the relationships between macromolecules and small molecular entities: there are cross-links to every protein in the UniProt protein knowledgebase that is documented to interact with each entity. Citation for the above excerpt: Brooksbank, Catherine, Cameron, Graham, Thornton, Janet The European Bioinformatics Institute's data resources: towards systems biology Nucl. Acids Res. 2005 33: D46-53 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D46 |
| 515. PDB-Ligand |
URL: http://www.idrtech.com/PDB-Ligand/ Categories: Small Molecule Structure Databases PDB-Ligand (http://www.idrtech.com/PDB-Ligand/) is a three-dimensional structure database of small molecular ligands that are bound to larger biomolecules deposited in the Protein Data Bank (PDB). It is also a database tool that allows one to browse, classify, superimpose and visualize these structures. As of May 2004, there are about 4870 types of small molecular ligands, experimentally determined as a complex with protein or DNA in the PDB. The proteins that a given ligand binds are often homologous and present the same binding structure to the ligand. However, there are also many instances wherein a given ligand binds to two or more unrelated proteins, or to the same or homologous protein in different binding environments. PDB-Ligand serves as an interactive structural analysis and clustering tool for all the ligand-binding structures in the PDB. PDB-Ligand also provides an easier way to obtain a number of different structure alignments of many related ligand-binding structures based on a simple and flexible ligand clustering method. PDB-Ligand will be a good resource for both a better interpretation of ligand-binding structures and the development of better scoring functions to be used in many drug discovery applications. Citation for the above abstract: Shin, Jae-Min, Cho, Doo-Ho PDB-Ligand: a ligand database based on PDB for the automated and customized classification of ligand-binding structures Nucl. Acids Res. 2005 33: D238-241 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D238 |
| 516. PubChem |
URL: http://pubchem.ncbi.nlm.nih.gov/ Categories: Small Molecule Structure Databases "PubChem contains the chemical structures of small organic molecules and information on their biological activities. PubChem Substance: Search PubChem/Substance using text, e.g. substance name, keyword, synonym, external ID, formula, SID, etc. PubChem Compound: Search PubChem/Compound using text terms including name, synonym, keyword, external ID, CID, formula, etc. PubChem BioAssay: Search PubChem/BioActivity database using text terms such as cell name, protocol keyword, etc. PubChem Structure Search: Search PubChem/Compound using chemical structure. Structure may be specified using SMILES, MOL file, molecular formula, etc. PubChem is intended to support the Molecular Libraries and Imaging component of the NIH Roadmap Initiative. PubChem's chemical structure database may be searched on the basis of descriptive terms, chemical properties, and structural similarity. When possible, PubChem's chemical structure records are linked to other NCBI databases. These include the PubMed scientific literature database, for example, and NCBI's protein 3D structure database. PubChem also contains the results of high-throughput biological screening experiments. PubChem is organized as three linked databases within the Entrez/PubMed information retrieval system. These are PubChem Substance, PubChem Compound, and PubChem BioAssay. PubChem also provides PubChem Structure Search, a fast chemical structure similarity search tool that links to the PubChem Compound and PubChem Substance databases. More information about using each component database may be found by links above. A PubChem FTP site is also available." |
| 517. ToxoDB: The Toxoplasma gondii Genome Database |
URL: http://toxodb.org/ Categories: Unicellular Eukaryote Genome Databases ToxoDB (http://ToxoDB.org) provides a genome resource for the protozoan parasite Toxoplasma gondii. Several sequencing projects devoted to T. gondii have been completed or are in progress: an EST project (http://genome.wustl.edu/est/index.php?toxoplasma=1), a BAC clone end-sequencing project (http://www.sanger.ac.uk/Projects/T_gondii/) and an 8X random shotgun genomic sequencing project (http://www.tigr.org/tdb/e2k1/tga1/). ToxoDB was designed to provide a central point of access for all available T. gondii data, and a variety of data mining tools useful for the analysis of unfinished, un-annotated draft sequence during the early phases of the genome project. In later stages, as more and different types of data become available (microarray, proteomic, SNP, QTL, etc.) the database will provide an integrated data analysis platform facilitating user-defined queries across the different data types. Citation for the above abstract: Kissinger, Jessica C., Gajria, Bindu, Li, Li, Paulsen, Ian T., Roos, David S. ToxoDB: accessing the Toxoplasma gondii genome Nucl. Acids Res. 2003 31: 234-236 © 2003 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/234 |
| 518. ApiDots |
URL: http://www.cbil.upenn.edu/apidots/ Categories: Unicellular Eukaryote Genome Databases ApiEST-DB (http://www.cbil.upenn.edu/paradbs-servlet/) provides integrated access to publicly available EST data from protozoan parasites in the phylum Apicomplexa. The database currently incorporates a total of nearly 100,000 ESTs from several parasite species of clinical and/or veterinary interest, including Eimeria tenella, Neospora caninum, Plasmodium falciparum, Sarcocystis neurona and Toxoplasma gondii. To facilitate analysis of these data, EST sequences were clustered and assembled to form consensus sequences for each organism, and these assemblies were then subjected to automated annotation via similarity searches against protein and domain databases. The underlying relational database infrastructure, Genomics Unified Schema (GUS), enables complex biologically based queries, facilitating validation of gene models, identification of alternative splicing, detection of single nucleotide polymorphisms, identification of stage-specific genes and recognition of phylogenetically conserved and phylogenetically restricted sequences. Citation for the above abstract: Li, Li, Crabtree, Jonathan, Fischer, Steve, Pinney, Deborah, Stoeckert, Christian J., Jr, Sibley, L. David, Roos, David S. ApiEST-DB: analyzing clustered EST data of the apicomplexan parasites Nucl. Acids Res. 2004 32: D326-328 © 2004 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D326 |
| 519. DictyBase |
URL: http://dictybase.org/ Categories: Unicellular Eukaryote Genome Databases dictyBase (http://dictybase.org) is the model organism database (MOD) for the social amoeba Dictyostelium discoideum. The unique biology and phylogenetic position of Dictyostelium offer a great opportunity to gain knowledge of processes not characterized in other organisms. The recent completion of the 34 MB genome sequence, together with the sizable scientific literature using Dictyostelium as a research organism, provided the necessary tools to create a well-annotated genome. dictyBase has leveraged software developed by the Saccharomyces Genome Database and the Generic Model Organism Database project. This has reduced the time required to develop a full-featured MOD and greatly facilitated our ability to focus on annotation and providing new functionality. We hope that manual curation of the Dictyostelium genome will facilitate the annotation of other genomes. Citation for the above abstract: Chisholm, Rex L., Gaudet, Pascale, Just, Eric M., Pilcher, Karen E., Fey, Petra, Merchant, Sohel N., Kibbe, Warren A. dictyBase, the model organism database for Dictyostelium discoideum Nucl. Acids Res. 2006 34: D423-427 © 2006 Oxford University Press. The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D423 |
| 520. CryptoDB: The Cryptosporidium Genome Resource |
URL: http://cryptodb.org/ Categories: Unicellular Eukaryote Genome Databases The database, CryptoDB (http://CryptoDB.org), is a community bioinformatics resource for the AIDS-related apicomplexan-parasite, Cryptosporidium. CryptoDB integrates whole genome sequence and annotation with expressed sequence tag and genome survey sequence data and provides supplemental bioinformatics analyses and data-mining tools. A simple, yet comprehensive web interface is available for mining and visualizing the data. CryptoDB is allied with the databases PlasmoDB and ToxoDB via ApiDB, an NIH/NIAID-funded Bioinformatics Resource Center. Recent updates to CryptoDB include the deposition of annotated genome sequences for Cryptosporidium parvum and Cryptosporidium hominis, migration to a relational database (GUS), a new query and visualization interface and the introduction of Web services. Citation for the above abstract: Heiges, Mark, Wang, Haiming, Robinson, Edward, Aurrecoechea, Cristina, Gao, Xin, Kaluskar, Nivedita, Rhodes, Philippa, Wang, Sammy, He, Cong-Zhou, Su, Yanqi, Miller, John, Kraemer, Eileen, Kissinger, Jessica C. CryptoDB: a Cryptosporidium bioinformatics resource update Nucl. Acids Res. 2006 34: D419-422 © 2006 Oxford University Press. The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D419 |
| 521. Full-Malaria: Malaria Full-Length cDNA Database |
URL: http://fullmal.ims.u-tokyo.ac.jp/ Categories: Unicellular Eukaryote Genome Databases Full-malaria (http://fullmal.ims.u-tokyo.ac.jp), a database for full-length cDNAs from the human malaria parasite, Plasmodium falciparum has been updated in at least three points. (i) We added 8934 sequences generated from the addition of new libraries, so that our collection of 11,424 full-length cDNAs covers 1375 (25%) of the estimated number of the entire 5409 parasite genes. (ii) All of our full-length cDNAs and GenBank EST sequences were mapped to genomic sequences together with publicly available annotated genes and other predictions. This precisely determined the gene structures and positions of the transcriptional start sites, which are indispensable for the identification of the promoter regions. (iii) A total of 4257 cDNA sequences were newly generated from murine malaria parasites, Plasmodium yoelii yoelii. The genome/cDNA sequences were compared at both nucleotide and amino acid levels, with those of P.falciparum, and the sequence alignment for each gene is presented graphically. This part of the database serves as a versatile platform to elucidate the function(s) of malaria genes by a comparative genomic approach. It should also be noted that all of the cDNAs represented in this database are supported by physical cDNA clones, which are publicly and freely available, and should serve as indispensable resources to explore functional analyses of malaria genomes. Citation for the above abstract: Watanabe, Junichi, Suzuki, Yutaka, Sasaki, Masahide, Sugano, Sumio Full-malaria 2004: an enlarged database for comparative studies of full-length cDNAs of malaria parasites, Plasmodium species Nucl. Acids Res. 2004 32: D334-338 © 2004 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D334 |
| 522. GeneDB |
URL: http://www.genedb.org/ Categories: Prokaryote Databases, Unicellular Eukaryote Genome Databases GeneDB (http://www.genedb.org/) is a genome database for prokaryotic and eukaryotic organisms. The resource provides a portal through which data generated by the Pathogen Sequencing Unit at the Wellcome Trust Sanger Institute and other collaborating sequencing centres can be made publicly available. It combines data from finished and ongoing genome and expressed sequence tag (EST) projects with curated annotation, that can be searched, sorted and downloaded, using a single web based resource. The current release stores 11 datasets of which six are curated and maintained by biologists, who review and incorporate information from the scientific literature, public databases and the respective research communities. Citation for the above abstract: Hertz-Fowler, Christiane, Peacock, Chris S., Wood, Valerie, Aslett, Martin, Kerhornou, Arnaud, Mooney, Paul, Tivey, Adrian, Berriman, Matthew, Hall, Neil, Rutherford, Kim, Parkhill, Julian, Ivens, Alasdair C., Rajandream, Marie-Adele, Barrell, Bart GeneDB: a resource for prokaryotic and eukaryotic organisms Nucl. Acids Res. 2004 32: D339-343 © 2004 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D339 |
| 523. PlasmoDB: The Plasmodium Genome Resource |
URL: http://plasmodb.org/ Categories: Unicellular Eukaryote Genome Databases PlasmoDB (http://PlasmoDB.org) is the official database of the Plasmodium falciparum genome sequencing consortium. This resource incorporates the recently completed P. falciparum genome sequence and annotation, as well as draft sequence and annotation emerging from other Plasmodium sequencing projects. PlasmoDB currently houses information from five parasite species and provides tools for intra- and inter-species comparisons. Sequence information is integrated with other genomic-scale data emerging from the Plasmodium research community, including gene expression analysis from EST, SAGE and microarray projects and proteomics studies. The relational schema used to build PlasmoDB, GUS (Genomics Unified Schema) employs a highly structured format to accommodate the diverse data types generated by sequence and expression projects. A variety of tools allow researchers to formulate complex, biologically-based, queries of the database. A stand-alone version of the database is also available on CD-ROM (P. falciparum GenePlot), facilitating access to the data in situations where internet access is difficult (e.g. by malaria researchers working in the field). The goal of PlasmoDB is to facilitate utilization of the vast quantities of genomic-scale data produced by the global malaria research community. The software used to develop PlasmoDB has been used to create a second Apicomplexan parasite genome database, ToxoDB (http://ToxoDB.org). Citation for the above abstract: Bahl, Amit, Brunk, Brian, Crabtree, Jonathan, Fraunholz, Martin J., Gajria, Bindu, Grant, Gregory R., Ginsburg, Hagai, Gupta, Dinesh, Kissinger, Jessica C., Labo, Philip, Li, Li, Mailman, Matthew D., Milgram, Arthur J., Pearson, David S., Roos, David S., Schug, Jonathan, Stoeckert, Christian J., Jr, Whetzel, Patricia PlasmoDB: the Plasmodium genome resource. A database integrating experimental and computational data Nucl. Acids Res. 2003 31: 212-215 © 2003 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/212 |
| 524. TcruziDB: The Trypanosoma cruzi Genome Resource |
URL: http://tcruzidb.org/ Categories: Unicellular Eukaryote Genome Databases TcruziDB (http://TcruziDB.org) is an integrated post-genomics database for the parasitic organism, Trypanosoma cruzi, the causative agent of Chagas' disease. TcruziDB was established in 2003 as a flat-file database with tools for mining the unannotated sequence reads and preliminary contig assemblies emerging from the Tri-Tryp genome consortium (TIGR/SBRI/Karolinska). Today, TcruziDB houses the recently published assembled genomic contigs and annotation provided by the genome consortium in a relational database supported by the Genomics Unified Schema (GUS) architecture. The combination of an annotated genome and a relational architecture has facilitated the integration of genomic data with expression data (proteomic and EST) and permitted the construction of automated analysis pipelines. TcruziDB has accepted, and will continue to accept the deposition of genomic and functional genomic datasets contributed by the research community. Citation for the above abstract: Aguero, Fernan, Zheng, Wenlong, Weatherly, D. Brent, Mendes, Pablo, Kissinger, Jessica C. TcruziDB: an integrated, post-genomics community resource for Trypanosoma cruzi Nucl. Acids Res. 2006 34: D428-431 © 2006 Oxford University Press. The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D428 |
| 525. ACeDB |
URL: http://www.acedb.org/ Categories: Invertebrate Databases, Model Organisms and Comparative Genomics Databases Acedb is one of the more venerable pieces of Genomics software. Acedb was originally created in 1992 by Richard Durbin and Jean Thierry-Mieg to manage the data from the Caenorhabditis elegans mapping project and subsequently the C. elegans sequencing project. From beginnings as a C. elegans-specific tool, it has been continuously developed into a flexible suite of data management, display and scripting tools providing facilities for managing and annotation mapping information and DNA and peptide sequences. This paper gives a basic overview of the Acedb suite, and step-by-step guidance on how to download and install Acedb. It is intended to take an Acedb novice to stage where they can begin to experiment and explore the facilities that are available. Citation for the above abstract: Kelley S. Getting started with Acedb. Brief Bioinform. 2000 May;1(2):131-7. © 2000 Henry Stewart Publications. The full abstract can be found at: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=11465024 |
| 526. DoTS: Database Of Transcribed Sequences |
URL: http://www.allgenes.org/ Categories: Model Organisms and Comparative Genomics Databases "DoTS (Database Of Transcribed Sequences) is a human and mouse transcript index created from all publicly available transcript sequences. The input sequences are clustered and assembled to form the DoTS Consensus Transcripts that comprise the index. These transcripts are assigned stable identifiers of the form DT.123456 (and are often referred to as "dots"). The transcripts are in turn clustered to form putative DoTS Genes. These are assigned stable identifiers of the form DG.1234356. The DoTS Transcripts and DoTS Genes are extensively annotated and a significant number have been manually curated. As of September 1, 2004, the DoTS annotation team has manually annotated 43,164 human and 78,054 mouse DoTS Transcripts (DTs), corresponding to 3,939 human and 7,752 mouse DoTS Genes (DGs). Use the manually annotated gene query to see the DoTS Transcripts that have been manually annotated. The focus of the DoTS project is integrating the various types of data (e.g., EST sequences, genomic sequence, expression data, functional annotation) in a structured manner which facilitates sophisticated queries that are otherwise not easy to perform. DoTS is built on the GUS Platform which includes a relational database that uses controlled vocabularies and ontologies to ensure that biologically meaningful queries can be posed in a uniform fashion." |
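The stable identifier scheme described above can be checked mechanically. The following is a minimal sketch, not DoTS code: the function name `parse_dots_id` and the pattern are assumptions made for illustration, and the number of digits is left open because the two examples quoted in this entry differ in length.

```python
import re

# Assumed pattern: "DT" (transcript) or "DG" (gene), a dot, then one or more digits.
# The digit count is not fixed because the listing's two examples have different lengths.
DOTS_ID = re.compile(r"^(DT|DG)\.(\d+)$")


def parse_dots_id(identifier):
    """Return (kind, number) for a DoTS-style identifier, or None if it does not match."""
    match = DOTS_ID.match(identifier.strip())
    if match is None:
        return None
    kind = "transcript" if match.group(1) == "DT" else "gene"
    return kind, int(match.group(2))
```

A caller might use this to separate transcript identifiers from gene identifiers in a mixed list before querying each type.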
| 527. ArkDB |
URL: http://www.thearkdb.org Categories: Model Organisms and Comparative Genomics Databases The ARKdb genome databases provide comprehensive public repositories for genome mapping data from farmed species and other animals (http://www.thearkdb.org) providing a resource similar in function to that offered by GDB or MGD for human or mouse genome mapping data, respectively. Because we have attempted to build a generic mapping database, the system has wide utility, particularly for those species for which development of a specific resource would be prohibitive. The ARKdb genome database model has been implemented for 10 species to date. These are pig, chicken, sheep, cattle, horse, deer, tilapia, cat, turkey and salmon. Access to the ARKdb databases is effected via the World Wide Web using the ARKdb browser and Anubis map viewer. The information stored includes details of loci, maps, experimental methods and the source references. Links to other information sources such as PubMed and EMBL/GenBank are provided. Responsibility for data entry and curation is shared amongst scientists active in genome research in the species of interest. Mirror sites in the United States are maintained in addition to the central genome server at Roslin. Citation for the above abstract: Hu, Jian, Mungall, Chris, Law, Andy, Papworth, Richard, Nelson, J. Paul, Brown, Alison, Simpson, Irene, Leckie, Shirley, Burt, David W., Hillyard, Alan L., Archibald, Alan L. The ARKdb: genome databases for farmed and other animals Nucl. Acids Res. 2001 29: 106-110 © 2001 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/29/1/106 |
| 528. BodyMap |
URL: http://bodymap.ims.u-tokyo.ac.jp/ Categories: Microarray Data and other Gene Expression Databases, Model Organisms and Comparative Genomics Databases BodyMap is a human and mouse gene expression database that is based on site-directed 3'-expressed sequence tags generated at Osaka University. To date, it contains more than 300 000 tag sequences from 64 human and 39 mouse tissues. For the recent release, the precise anatomical expression patterns for more than half of the human gene entries were generated by introduced amplified fragment length polymorphism (iAFLP), which is a PCR-based high-throughput expression profiling method. The iAFLP data incorporated into BodyMap describe the relative contents of more than 12 000 transcripts across 30 tissue RNAs. In addition, a newly developed gene ranking system helps users obtain lists of genes that have desired expression patterns according to their significance. BodyMap supports complete transfer of unique data sets and provides analysis that is accessible through the WWW at http://bodymap.ims.u-tokyo.ac.jp. Citation for the above abstract: Sese, Jun, Nikaidou, Hitoshi, Kawamoto, Shoko, Minesaki, Yuichi, Morishita, Shinichi, Okubo, Kousaku BodyMap incorporated PCR-based expression profiling data and a gene ranking system Nucl. Acids Res. 2001 29: 156-158 © 2001 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/29/1/156 |
| 529. ChickVD: Chicken Variation Database |
URL: http://chicken.genomics.org.cn/ Categories: Model Organisms and Comparative Genomics Databases Working in parallel with the efforts to sequence the chicken (Gallus gallus) genome, the Beijing Genomics Institute led an international team of scientists from China, USA, UK, Sweden, The Netherlands and Germany to map extensive DNA sequence variation throughout the chicken genome by sampling DNA from domestic breeds. Using the Red Jungle Fowl genome sequence as a reference, we identified 3.1 million non-redundant DNA sequence variants. To facilitate the application of our data to avian genetics and to provide a foundation for functional and evolutionary studies, we created the 'Chicken Variation Database' (ChickVD). A graphical MapView shows variants mapped onto the chicken genome in the context of gene annotations and other features, including genetic markers, trait loci, cDNAs, chicken orthologs of human disease genes and raw sequence traces. ChickVD also stores information on quantitative trait loci using data from collaborating institutions and public resources. Our data can be queried by search engine and homology-based BLAST searches. ChickVD is publicly accessible at http://chicken.genomics.org.cn. Citation for the above abstract: Wang, Jing, He, Ximiao, Ruan, Jue, Dai, Mingtao, Chen, Jie, Zhang, Yong, Hu, Yafeng, Ye, Chen, Li, Shengting, Cong, Lijuan, Fang, Lin, Liu, Bin, Li, Songgang, Wang, Jian, Burt, David W., Wong, Gane Ka-Shu, Yu, Jun, Yang, Huanming, Wang, Jun ChickVD: a sequence variation database for the chicken genome Nucl. Acids Res. 2005 33: D438-441 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D438 |
| 530. Cre Transgenic Database |
URL: http://www.mshri.on.ca/nagy/ Categories: Model Organisms and Comparative Genomics Databases A database for Cre transgenic mouse lines. |
| 531. DED: Database of Evolutionary Distances |
URL: http://warta.bio.psu.edu/DED/ Categories: Model Organisms and Comparative Genomics Databases A large database of homologous sequence alignments with good estimates of evolutionary distances can be a valuable resource for molecular evolutionary studies and phylogenetic research in particular. We recently created a database containing 159,921 transcripts from human, mouse, rat, zebrafish and fugu species. Approximately 1,000 homology groups were identified with the help of Ensembl homology evidence. At the macro-level, the database allows us to answer queries of the form: 1. What is the average k-distance between 5' untranslated regions of human and mouse? 2. List the 10 groups with the highest K(a)/K(s) ratio between mouse and rat. 3. List all identical proteins between human and rat. Researchers interested in specific proteins can use a simple web interface to retrieve the homology groups of interest, examine all pairwise distances between members of the group and study the conservation of exon-intron gene structures using a graphical interface. The database is available at http://warta.bio.psu.edu/DED/. Citation for the above abstract: Veeramachaneni, Vamsi, Makalowski, Wojciech DED: Database of Evolutionary Distances Nucl. Acids Res. 2005 33: D442-446 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D442 |
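The second example query in the abstract above (the 10 groups with the highest K(a)/K(s) ratio between mouse and rat) can be illustrated as follows. This is only a sketch over an in-memory list: the `PairDistance` record, its field names, and `top_ka_ks` are hypothetical and do not reflect the actual DED schema or interface.

```python
from dataclasses import dataclass


@dataclass
class PairDistance:
    """Hypothetical record: one pairwise distance within a homology group."""
    group_id: str
    species_a: str
    species_b: str
    ka: float  # nonsynonymous substitution rate
    ks: float  # synonymous substitution rate


def top_ka_ks(records, species_a, species_b, n=10):
    """Return up to n (group_id, Ka/Ks) tuples for one species pair, highest ratio first."""
    wanted = {species_a, species_b}
    scored = [
        (r.ka / r.ks, r.group_id)
        for r in records
        # Match the pair in either order and skip records where the ratio is undefined.
        if {r.species_a, r.species_b} == wanted and r.ks > 0
    ]
    scored.sort(reverse=True)
    return [(group_id, ratio) for ratio, group_id in scored[:n]]
```

Records with a K(s) of zero are skipped rather than reported, which is a design choice of this sketch and not something the abstract specifies.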
| 532. EGO: Eukaryotic Gene Orthologs |
URL: http://www.tigr.org/tdb/tgi/ego/ Categories: Model Organisms and Comparative Genomics Databases "The Eukaryotic Gene Orthologs (EGO), is a database for orthologous genes in eukaryotes. EGO is generated by pair-wise comparison between the Tentative Consensus (TC) sequences that comprise the TIGR Gene Indices from individual organisms. The reciprocal pairs of the best match were clustered into individual groups and multiple sequence alignments were displayed for each group. The EGO database can be accessed through the SEARCH function. The release notes for the current EGO can also be referenced." |
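The reciprocal-best-match clustering mentioned in the description above can be sketched in a few lines. This is an illustration under stated assumptions, not the EGO pipeline: the input format (tuples of query, subject, score), the `organism_of` lookup, and both function names are hypothetical, and grouping pairs into clusters by connected components is one plausible reading of "clustered into individual groups".

```python
from collections import defaultdict


def reciprocal_best_pairs(hits, organism_of):
    """Find pairs of sequences that are each other's best cross-organism match.

    hits: iterable of (query_id, subject_id, score), higher score is better.
    organism_of: dict mapping each sequence id to its organism name.
    """
    best = {}  # (query, subject organism) -> (score, subject)
    for query, subject, score in hits:
        if organism_of[query] == organism_of[subject]:
            continue  # only comparisons between different organisms count
        key = (query, organism_of[subject])
        if key not in best or score > best[key][0]:
            best[key] = (score, subject)

    pairs = set()
    for (query, _), (_, subject) in best.items():
        back = best.get((subject, organism_of[query]))
        if back is not None and back[1] == query:
            pairs.add(frozenset((query, subject)))
    return pairs


def cluster_pairs(pairs):
    """Group reciprocal pairs into connected components with a small union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for pair in pairs:
        a, b = tuple(pair)
        parent[find(a)] = find(b)

    groups = defaultdict(set)
    for member in list(parent):
        groups[find(member)].add(member)
    return sorted(sorted(group) for group in groups.values())
```

A sequence whose best match does not point back to it (a one-way hit) is left out of every group in this sketch.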
| 533. euGenes: Genomic Information for Eukaryotic Organisms |
URL: http://eugenes.org/ Categories: Model Organisms and Comparative Genomics Databases euGenes is a genome information system and database that provides a common summary of eukaryote genes and genomes, at http://iubio.bio.indiana.edu/eugenes/. Seven popular genomes are included: human, mouse, fruitfly, Caenorhabditis elegans worm, Saccharomyces yeast, Arabidopsis mustard weed and zebrafish, with more planned. This information, automatically extracted and updated from several source databases, offers features not readily available through other genome databases to bioscientists looking for gene relationships across organisms. The database describes 150 000 known, predicted and orphan genes, using consistent gene names along with their homologies and associations with a standard vocabulary of molecular functions, cell locations and biological processes. Usable whole-genome maps including features, chromosome locations and molecular data integration are available, as are options to retrieve sequences from these genomes. Search and retrieval methods for these data are easy to use and efficient, allowing one to ask combined questions of sequence features, protein functions and other gene attributes, and fetch results in reports, computable tabular outputs or bulk database forms. These summarized data are useful for integration in other projects, such as gene expression databases. euGenes provides an extensible, flexible genome information system for many organisms. Citation for the above abstract: Gilbert, Donald G. euGenes: a eukaryote genome information system Nucl. Acids Res. 2002 30: 145-148 © 2002 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/30/1/145 |
| 534. GALA: Genome Alignment and Annotation Database |
URL: http://gala.cse.psu.edu/ Categories: Model Organisms and Comparative Genomics Databases We describe improvements to two databases that give access to information on genomic sequence similarities, functional elements in DNA and experimental results that demonstrate those functions. GALA, the database of Genome ALignments and Annotations, is now a set of interlinked relational databases for five vertebrate species, human, chimpanzee, mouse, rat and chicken. For each species, GALA records pairwise and multiple sequence alignments, scores derived from those alignments that reflect the likelihood of being under purifying selection or being a regulatory element, and extensive annotations such as genes, gene expression patterns and transcription factor binding sites. The user interface supports simple and complex queries, including operations such as subtraction and intersections as well as clustering and finding elements in proximity to features. dbERGE II, the database of Experimental Results on Gene Expression, contains experimental data from a variety of functional assays. Both databases are now run on the DB2 database management system. Improved hardware and tuning has reduced response times and increased querying capacity, while simplified query interfaces will help direct new users through the querying process. Links are available at http://www.bx.psu.edu/. Citation for the above abstract: Elnitski, Laura, Giardine, Belinda, Shah, Prachi, Zhang, Yi, Riemer, Cathy, Weirauch, Matthew, Burhans, Richard, Miller, Webb, Hardison, Ross C. Improvements to GALA and dbERGE II: databases featuring genomic sequence alignment, annotation and experimental results Nucl. Acids Res. 2005 33: D466-470 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D466 |
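The subtraction and intersection operations mentioned in the abstract above can be illustrated on plain coordinate ranges. The sketch below is not GALA code: it assumes half-open (start, end) intervals on a single chromosome, and the function names are chosen for illustration only.

```python
def merge(intervals):
    """Sort intervals and merge any that overlap or touch; empty intervals are dropped."""
    merged = []
    for start, end in sorted(intervals):
        if start >= end:
            continue
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intersect(a, b):
    """Return the regions covered by both interval sets."""
    a, b = merge(a), merge(b)
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def subtract(a, b):
    """Return the regions covered by a but not by b."""
    b = merge(b)
    out = []
    for start, end in merge(a):
        cursor = start
        for b_start, b_end in b:
            if b_end <= cursor:
                continue
            if b_start >= end:
                break
            if b_start > cursor:
                out.append((cursor, b_start))
            cursor = max(cursor, b_end)
            if cursor >= end:
                break
        if cursor < end:
            out.append((cursor, end))
    return out
```

For example, subtracting exon ranges from highly conserved ranges leaves conserved non-coding regions, which is the kind of combined query the abstract describes.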
| 535. dbERGE II: Database of Experimental Results on Gene Expression |
URL: http://dberge.cse.psu.edu/menu.html Categories: Microarray Data and other Gene Expression Databases We describe improvements to two databases that give access to information on genomic sequence similarities, functional elements in DNA and experimental results that demonstrate those functions. GALA, the database of Genome ALignments and Annotations, is now a set of interlinked relational databases for five vertebrate species, human, chimpanzee, mouse, rat and chicken. For each species, GALA records pairwise and multiple sequence alignments, scores derived from those alignments that reflect the likelihood of being under purifying selection or being a regulatory element, and extensive annotations such as genes, gene expression patterns and transcription factor binding sites. The user interface supports simple and complex queries, including operations such as subtraction and intersections as well as clustering and finding elements in proximity to features. dbERGE II, the database of Experimental Results on Gene Expression, contains experimental data from a variety of functional assays. Both databases are now run on the DB2 database management system. Improved hardware and tuning has reduced response times and increased querying capacity, while simplified query interfaces will help direct new users through the querying process. Links are available at http://www.bx.psu.edu/. Citation for the above abstract: Elnitski, Laura, Giardine, Belinda, Shah, Prachi, Zhang, Yi, Riemer, Cathy, Weirauch, Matthew, Burhans, Richard, Miller, Webb, Hardison, Ross C. Improvements to GALA and dbERGE II: databases featuring genomic sequence alignment, annotation and experimental results Nucl. Acids Res. 2005 33: D466-470 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D466 |
| 536. Genetpig |
URL: http://www.infobiogen.fr/services/Genetpig Categories: Model Organisms and Comparative Genomics Databases The GENETPIG database has been established for storing and disseminating the results of the European project: 'GENETPIG: identification of genes controlling economic traits in pig'. The partners of this project have mapped about 630 porcine and human ESTs onto the pig genome. The database collects the mapping results and links them to other sources of mapping data; this includes pig maps as well as available comparative mapping information. Functional annotation of the mapped ESTs is also given when a significant similarity to cognate genes was established. The database is accessible for consultation via the Internet at http://www.infobiogen.fr/services/Genetpig/. Citation for the above abstract: Karsenty, Emmanuelle, Barillot, Emmanuel, Tosser-Klopp, Gwenola, Lahbib-Mansais, Yvette, Milan, Denis, Hatey, Francois, Cirera, Susanna, Sawera, Milena, Jorgensen, Claus B., Chowdhary, Bhanu, Fredholm, Merete, Wimmers, Klaus, Ponsuksili, Siriluck, Davoli, Roberta, Fontanesi, Luca, Braglia, Silvia, Zambonelli, Paolo, Bigi, Daniele, Neuenschwander, Stefan, Gellin, Joel The GENETPIG database: a tool for comparative mapping in pig (Sus scrofa) Nucl. Acids Res. 2003 31: 138-141 © 2003 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/138 |
| 537. GXD: Mouse Gene Expression Database |
URL: http://www.informatics.jax.org/menus/expression_menu.shtml Categories: Microarray Data and other Gene Expression Databases, Model Organisms and Comparative Genomics Databases The Gene Expression Database (GXD) is a community resource for gene expression information in the laboratory mouse. By collecting and integrating different types of expression data, GXD provides information about expression profiles in different mouse strains and mutants. Participation in the Gene Ontology (GO) project classifies genes and gene products with regard to molecular functions, biological processes, and cellular components. Integration with other Mouse Genome Informatics (MGI) databases places the gene expression information in the context of mouse genetic, genomic and phenotypic information. The integration of these types of information enables valuable insights into the molecular biology that underlies development and disease. The utility of GXD has been improved by the daily addition of new data and through the implementation of new query and display features. These improvements make it easier for users to interrogate and visualize expression data in the context of their specific needs. GXD is accessible through the MGI website at http://www.informatics.jax.org/ or directly at http://www.informatics.jax.org/menus/expression_menu.shtml. Citation for the above abstract: Hill, David P., Begley, Dale A., Finger, Jacqueline H., Hayamizu, Terry F., McCright, Ingeborg J., Smith, Constance M., Beal, Jon S., Corbani, Lori E., Blake, Judith A., Eppig, Janan T., Kadin, James A., Richardson, Joel E., Ringwald, Martin The mouse Gene Expression Database (GXD): updates and enhancements Nucl. Acids Res. 2004 32: D568-571 © 2004 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D568 |
| 538. HomoloGene |
URL: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=homologene Categories: Model Organisms and Comparative Genomics Databases HomoloGene is a system for automated detection of homologs among the annotated genes of several completely sequenced eukaryotic genomes. The genomes represented in the recent Build 37 of HomoloGene include Homo sapiens, Mus musculus, Rattus norvegicus, Drosophila melanogaster, Anopheles gambiae, Caenorhabditis elegans, Schizosaccharomyces pombe, Saccharomyces cerevisiae, Neurospora crassa, Magnaporthe grisea, Arabidopsis thaliana and Plasmodium falciparum. NCBI has adopted a new HomoloGene build procedure which is guided by the taxonomic tree, and relies on conserved gene order and measures of DNA similarity among closely related species, while making use of protein similarity for more distantly related organisms. The new computational procedure greatly increases the reliability of the computed homologous gene sets and the resulting HomoloGene entries now include paralogs in addition to orthologs. HomoloGene can be queried using Entrez (www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=homologene). Among the Entrez fields unique to HomoloGene is the ‘Ancestor’ field, which refers to the taxonomic group of the last common ancestor of the species represented in a HomoloGene entry. Using the ‘Ancestor’ field it is possible to limit a search to genes conserved in one of 22 ancestral groups. HomoloGene reports include homology and phenotype information drawn from Online Mendelian Inheritance in Man (OMIM), Mouse Genome Informatics (MGI), Zebrafish Information Network (ZFIN), Saccharomyces Genome Database (SGD), Clusters of Orthologous Groups (COG) and FlyBase.
Citation for the above excerpt: Wheeler, David L., Barrett, Tanya, Benson, Dennis A., Bryant, Stephen H., Canese, Kathi, Church, Deanna M., DiCuccio, Michael, Edgar, Ron, Federhen, Scott, Helmberg, Wolfgang, Kenton, David L., Khovayko, Oleg, Lipman, David J., Madden, Thomas L., Maglott, Donna R., Ostell, James, Pontius, Joan U., Pruitt, Kim D., Schuler, Gregory D., Schriml, Lynn M., Sequeira, Edwin, Sherry, Steven T., Sirotkin, Karl, Starchenko, Grigory, Suzek, Tugba O., Tatusov, Roman, Tatusova, Tatiana A., Wagner, Lukas, Yaschenko, Eugene Database resources of the National Center for Biotechnology Information Nucl. Acids Res. 2005 33: D39-45 © 2005 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D39 |
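The HomoloGene excerpt above describes restricting an Entrez search with the 'Ancestor' field. A minimal sketch of how such a query string might be assembled is shown below. It only builds a URL for the NCBI E-utilities `esearch` endpoint and does not contact the server. The `[Ancestor]` qualifier spelling, the helper name `homologene_search_url`, and the example values `TP53` and `Eukaryota` are assumptions chosen for illustration, not details confirmed by the excerpt.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint. Only the URL is built here; nothing is fetched.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def homologene_search_url(gene: str, ancestor: str) -> str:
    """Build a HomoloGene search URL limited to one ancestral group.

    ASSUMPTION: the 'Ancestor' field is addressed with the usual Entrez
    field-qualifier syntax, i.e. value[Ancestor]. Check the current
    Entrez field list before relying on this spelling.
    """
    term = f"{gene} AND {ancestor}[Ancestor]"
    # urlencode percent-escapes the brackets and turns spaces into '+'.
    return ESEARCH + "?" + urlencode({"db": "homologene", "term": term})


# Hypothetical example values, chosen only to show the shape of the query.
url = homologene_search_url("TP53", "Eukaryota")
print(url)
```

Keeping the query construction separate from any network call makes it easy to inspect the term before sending it, which matters here because the field spelling is unverified.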
| 539. Homophila |
URL: http://superfly.ucsd.edu/homophila/ Categories: Invertebrate Databases, Model Organisms and Comparative Genomics Databases Although many human genes have been associated with genetic diseases, knowing which mutations result in disease phenotypes often does not explain the etiology of a specific disease. Drosophila melanogaster provides a powerful system in which to use genetic and molecular approaches to investigate human genetic diseases. Homophila is an intergenomic resource linking the human and fly genomes in order to stimulate functional genomic investigations in Drosophila that address questions about genetic disease in humans. Homophila provides a comprehensive linkage between the disease genes compiled in Online Mendelian Inheritance in Man (OMIM) and the complete Drosophila genomic sequence. Homophila is a relational database that allows searching based on human disease descriptions, OMIM number, human or fly gene names, and sequence similarity, and can be accessed at http://homophila.sdsc.edu. Citation for the above abstract: Chien, Samson, Reiter, Lawrence T., Bier, Ethan, Gribskov, Michael Homophila: human disease gene cognates in Drosophila Nucl. Acids Res. 2002 30: 149-151 © 2002 Oxford University Press. The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/30/1/149 |
| 540. COG: Clusters of Orthologous Groups of proteins |
URL: http://www.ncbi.nlm.nih.gov/COG/ Categories: Model Organisms and Comparative Genomics Databases BACKGROUND: The availability of multiple, essentially complete genome sequences of prokaryotes and eukaryotes spurred both the demand and the opportunity for the construction of an evolutionary classification of genes from these genomes. Such a classification system based on orthologous relationships between genes appears to be a natural framework for comparative genomics and should facilitate both functional annotation of genomes and large-scale evolutionary studies. RESULTS: We describe here a major update of the previously developed system for delineation of Clusters of Orthologous Groups of proteins (COGs) from the sequenced genomes of prokaryotes and unicellular eukaryotes and the construction of clusters of predicted orthologs for 7 eukaryotic genomes, which we named KOGs after eukaryotic orthologous groups. The COG collection currently consists of 138,458 proteins, which form 4873 COGs and comprise 75% of the 185,505 (predicted) proteins encoded in 66 genomes of unicellular organisms. The eukaryotic orthologous groups (KOGs) include proteins from 7 eukaryotic genomes: three animals (the nematode Caenorhabditis elegans, the fruit fly Drosophila melanogaster and Homo sapiens), one plant, Arabidopsis thaliana, two fungi (Saccharomyces cerevisiae and Schizosaccharomyces pombe), and the intracellular microsporidian parasite Encephalitozoon cuniculi. The current KOG set consists of 4852 clusters of orthologs, which include 59,838 proteins, or approximately 54% of the analyzed eukaryotic 110,655 gene products. Compared to the coverage of the prokaryotic genomes with COGs, a considerably smaller fraction of eukaryotic genes could be included into the KOGs; addition of new eukaryotic genomes is expected to result in substantial increase in the coverage of eukaryotic genomes with KOGs. 
Examination of the phyletic patterns of KOGs reveals a conserved core represented in all analyzed species and consisting of approximately 20% of the KOG set. This conserved portion of the KOG set is much greater than the ubiquitous portion of the COG set (approximately 1% of the COGs). In part, this difference is probably due to the small number of included eukaryotic genomes, but it could also reflect the relative compactness of eukaryotes as a clade and the greater evolutionary stability of eukaryotic genomes. CONCLUSION: The updated collection of orthologous protein sets for prokaryotes and eukaryotes is expected to be a useful platform for functional annotation of newly sequenced genomes, including those of complex eukaryotes, and genome-wide evolutionary studies. Citation for the above abstract: Tatusov RL, Fedorova ND, Jackson JD, Jacobs AR, Kiryutin B, Koonin EV, Krylov DM, Mazumder R, Mekhedov SL, Nikolskaya AN, Rao BS, Smirnov S, Sverdlov AV, Vasudevan S, Wolf YI, Yin JJ, Natale DA. The COG database: an updated version includes eukaryotes. BMC Bioinformatics. 2003 Sep 11;4(1):41. © 2003 Tatusov et al. The full text of the artic |