MetaDB: A Metadatabase for the Biological Sciences

brought to you by Neurotransmitter.net

E-mail Suggestions to Shawn Thomas


Results in order of relevance:

1. Glomerular Activity Response Archive for the Rat Olfactory Bulb

URL: http://leonlab.bio.uci.edu/
Categories: Neuroscience Databases

"This archive contains the averaged activity maps that we have generated from the glomerular response to selected odorants in rat olfactory bulbs, as assessed by [14C]2-deoxyglucose uptake. These response profiles may be searched either by the Chemical Abstracts Service Registry Number (CAS number) of the odorants, odorant name, chemical formula, or chemical features. A detailed description of the procedure used to generate the response maps is provided, along with templates for duplication of the technique. Finally, a profile of identified olfactory bulb glomerular response modules is provided. This template may be printed on a transparent page to allow individualized comparisons in response patterns among odorants to be made."



2. Brain Biodiversity Bank

URL: http://brancusi.usc.edu/bkms/
Categories: Neuroscience Databases

"The Brain Biodiversity Bank refers to the repository of images of and information about brain specimens contained in the collections associated with the National Museum of Health and Medicine at the Armed Forces Institute of Pathology in Washington, DC. These collections include, besides the Michigan State University Collection, the Welker Collection from the University of Wisconsin, the Yakovlev-Haleem Collection from Harvard University, the Meyer Collection from the Johns Hopkins University, and the Huber-Crosby and Crosby-Lauer Collections from the University of Michigan. Our purpose here is to provide some examples of ways in which images and information from the Collections, in digital electronic format, can be used in educational, research and commercial enterprises."



3. BrainInfo: A Primate Brain Information System

URL: http://braininfo.rprc.washington.edu/
Categories: Neuroscience Databases

"BrainInfo is a website that helps one identify structures in the brain and provides many different kinds of information about each structure. It consists of three basic knowledge bases: NeuroNames, which provides the index to brain structures and narrative information about them; the Template Atlas, which shows the structures that are found in the primate brain; and NeuroMaps, a set of several hundred overlays that will show the location of different kinds of information that have been mapped to the standard background maps (templates) of the Atlas. Information about brain structures in other species, particularly the human, is provided by links to other websites."



4. BrainWeb: Simulated Brain Database

URL: http://www.bic.mni.mcgill.ca/brainweb/
Categories: Neuroscience Databases

"As the interest in the computer-aided, quantitative analysis of medical image data is growing, the need for the validation of such techniques is also increasing. Unfortunately, there exists no `ground truth' or gold standard for the analysis of in vivo acquired data. These pages provide a solution to the validation problem, in the form of a Simulated Brain Database (SBD). The SBD contains a set of realistic MRI data volumes produced by an MRI simulator. These data can be used by the neuroimaging community to evaluate the performance of various image analysis methods in a setting where the truth is known.

Currently, the SBD contains simulated brain MRI data based on two anatomical models: normal and multiple sclerosis (MS). For both of these, full 3-dimensional data volumes have been simulated using three sequences (T1-, T2-, and proton-density- (PD-) weighted) and a variety of slice thicknesses, noise levels, and levels of intensity non-uniformity. These data are available for viewing in three orthogonal views (transversal, sagittal, and coronal), and for downloading."



5. CoCoDat: Collation of Cortical [single neuron + neuronal microcircuitry] Data

URL: http://www.cocomac.org/cocodat/
Categories: Neuroscience Databases

"CoCoDat is a microcircuitry database that contains not only bibliographic references, but also data and parameter values from published experimental reports. The data characterize the experimental procedures, the brain structure (region, layer, neuron type and cellular compartment), as well as the experimental results obtained in the six categories: Morphology, Firing properties, Ionic currents, Ionic conductances, Synaptic currents, and Connectivity."



6. CoCoMac

URL: http://cocomac.org/
Categories: Neuroscience Databases

"CoCoMac (Collations of Connectivity data on the Macaque brain) is our approach to produce a systematic record of the known wiring of the primate brain. The main database contains details of hundreds of tracing studies in their original descriptions. Further data are continuously added.

To overcome the problem of divergent brain maps we developed ORT (Objective Relational Transformation), an algorithmic method to convert data in a coordinate-independent way based on logical relations between areas in different brain maps.

We use CoCoMac data to analyse the organisation of the cerebral cortex, and to establish its structure-function relationships. This includes multi-variate statistics and computer simulation of models that take into account the real anatomy of the primate cerebral cortex."



7. The Talairach Daemon

URL: http://ric.uthscsa.edu/RIC_WWW.data/Components/talairach/talairachdaemon.html
Categories: Neuroscience Databases

"The Talairach Daemon (TD) is a high-speed database server for querying and retrieving data about human brain structure over the internet. The core components of this server are a unique memory-resident application and memory-resident databases. The memory-resident design of the TD server provides high-speed access to its data. This is supported by using TCP/IP sockets for communications and by minimizing the amount of data transferred during transactions. By keeping most transactions to a low number of bytes (less than 50 generally), even slow throughput network transfers (1 Kbyte/sec) should have reasonable response times.

The TD server data is searched using x-y-z coordinates resolved to 1x1x1 mm volume elements within a standardized stereotaxic space. An array, indexed by x-y-z coordinates, that spans 170 mm (x), 210 mm (y) and 200 mm (z), provides high-speed access to data. Array dimensions were selected to be approximately 25% larger than those of the Co-planar Stereotaxic Atlas of the Human Brain (Talairach and Tournoux, 1988). Coordinates tracked by the TD server are spatially consistent with the Talairach Atlas. Each array location stores a pointer to a relation record that holds data describing what is present at the corresponding coordinate. Presently, the data in relation records are either Structure Probability Maps (SP Maps) or Talairach Atlas Labels, though others can be easily added. The relation records are implemented as linked lists to names and values for brain structures."
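The lookup structure described above can be sketched roughly as follows. This is a minimal illustration, not the TD server's actual code: the grid extents (170 x 210 x 200 voxels at 1 mm) come from the description, but the origin offsets, the class name VoxelIndex, and the use of Python lists in place of linked lists are all assumptions made for the example.

```python
from array import array

# Grid extents in mm, taken from the description above (1 mm voxels).
NX, NY, NZ = 170, 210, 200

# ASSUMPTION: the coordinate of the grid's first voxel is not given in the
# text; these offsets are hypothetical placeholders.
X0, Y0, Z0 = -85, -125, -80


class VoxelIndex:
    """Dense x-y-z array whose cells point to shared relation records."""

    def __init__(self):
        # -1 means "no record"; any other value indexes into self.records.
        self.cells = array("i", [-1]) * (NX * NY * NZ)
        # Each record is a list of (name, value) pairs, standing in for the
        # linked lists mentioned in the description.
        self.records = []

    def _offset(self, x, y, z):
        # Resolve a coordinate to its 1 mm voxel, then to a flat array offset.
        i, j, k = round(x) - X0, round(y) - Y0, round(z) - Z0
        if not (0 <= i < NX and 0 <= j < NY and 0 <= k < NZ):
            raise IndexError("coordinate outside the indexed volume")
        return (i * NY + j) * NZ + k

    def add_record(self, pairs):
        """Store a relation record and return its id."""
        self.records.append(list(pairs))
        return len(self.records) - 1

    def assign(self, x, y, z, record_id):
        """Point the voxel at (x, y, z) to an existing record."""
        self.cells[self._offset(x, y, z)] = record_id

    def lookup(self, x, y, z):
        """Return the record for a coordinate, or None if nothing is stored."""
        rid = self.cells[self._offset(x, y, z)]
        return None if rid < 0 else self.records[rid]
```

Because every lookup is a single array access followed by one record fetch, the cost does not depend on how many structures are stored, which is consistent with the high-speed access the description emphasizes. Many voxels can share one record id, so a label is stored once however large the structure is.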



8. International Consortium for Brain Mapping (ICBM) Subject Database

URL: https://services.loni.ucla.edu/ida/login.jsp?project=ICBM&search=true
Categories: Neuroscience Databases

"The ICBM Subject Database has been constructed to provide an effective means for archival and protection of collaborator collected image data. The goal of this software is to provide a convenient mechanism for searching the existence of particular image data while protecting its usage at the same time. We have built the appropriate database query mechanisms to ensure that no image data or identifying patient information is accessible to the outside world or to any others without the appropriate authorization and the expressed permission to release data from the collaborator that acquired and provided the data.

The ICBM Subject Database may be queried using a combination of demographic and image-related attributes. Authorized investigations may form collections of images to download."



9. The GENESIS Neural Database and Modeler's Workspace ChannelDB

URL: http://www.genesis-sim.org/hbp/
Categories: Neuroscience Databases

"A realistic neuronal model represents a modeler's understanding of the structure and function of a part of the nervous system. As the number of neurobiologists constructing realistic models continues to grow, and as the models become ever more sophisticated, they collectively represent a significant accumulation of knowledge about the structural and functional organization of nervous systems. But at the same time, locating appropriate models and interpreting them becomes increasingly more difficult as the number of online model and experimental databases grows. The central motivation for the Modeler's Workspace project is to address these problems.

With support from The Human Brain Project, we began by exploring the construction of a brain database based on our existing neural simulation system, GENESIS. This was a feasibility study for a novel approach to neural database construction, organization, and interaction.

The Modeler's Workspace was originally conceived as the user interface to this system. As the design has evolved, the creation of a next-generation interface for collaborative neural simulations has become our goal. Although the initial version uses GENESIS as the simulator, the design permits the use of multiple simulation systems, with or without the use of a database. This allows modeling at multiple levels of scale from the molecular level, through the subcellular (e.g. ion channel), single cell, and network levels, to the systems level (e.g. relating models to fMRI studies).

The Modeler's Workspace is a collection of software tools that enable users to interact over the WWW with databases of models and data. It provides facilities for: searching multiple remote databases for model components based on various criteria; visualizing the characteristics of the components retrieved; creating new components, either from scratch or derived from existing models; combining components into new models; linking models to experimental data as well as online publications; and interacting with simulation packages such as GENESIS to simulate the new constructs.
...
We are now in the phase of implementing the core components of the Modeler's Workspace (MWS). The first of these is ChannelDB, an implementation of a database of ionic conductance models stored in simulator-independent NeuroML format.

At present, ChannelDB is implemented as a stand-alone module, with its own graphical user interface to the database, which is implemented with MySQL. After further development, the ChannelDB GUI will be merged into the MWS."



10. Identified Neuron Database Project

URL: http://n002bsel.bios.uic.edu/
Categories: Neuroscience Databases

"NEUROPAD is a database of identified insect neurons. It was developed during research studies of insect nervous systems. It is available on this web site as Version 3.0.1, which supersedes the earlier releases 1.1 and 2.0 (Version 2.0 also is available free to anyone interested). NEUROPAD allows structural information about nerve cells to be mapped into an idealized plan of the central nervous system [CNS] and it stores that information along with 1) relevant physiological and behavioral observations, and 2) reproductions of the original anatomical description of each cell, and other relevant data from peer reviewed publications.

NEUROPAD was designed with orthopteroid insects (such as crickets and cockroaches) in mind. However, it contains information on cells from a number of different insect species, with slightly different CNS organizational schemes, such as differing numbers of ganglia."



11. SumsDB: Surface Management Systems DataBase

URL: http://brainmap.wustl.edu:8081/sums/index.jsp
Categories: Neuroscience Databases

"SumsDB (Surface Management Systems DataBase) is a repository of brain-mapping data developed in the Van Essen laboratory. It emphasizes cortical surface-based representations, but also contains whole-brain volume data.

SumsDB includes surface-based atlases of cerebral and cerebellar cortex in primates (human, macaque) and rodents (mouse, rat). Many types of experimental data pertaining to cortical structure and function can be viewed on these atlases. SumsDB also contains extensive data from individual experimental hemispheres."



12. IBSR: Internet Brain Segmentation Repository

URL: http://www.cma.mgh.harvard.edu/ibsr/
Categories: Neuroscience Databases

"Its purpose is to encourage the development and evaluation of segmentation methods by providing raw test and image data, human expert segmentation results, and methods for comparing segmentation results.
...
This repository is meant to contain standard test image data sets which will permit a standardized mechanism for evaluation of the sensitivity of a given analysis method to signal to noise ratio, contrast to noise ratio, shape complexity, degree of partial volume effect, etc. This capability is felt to be essential to further development in the field since many published algorithms tend to only operate successfully under a narrow range of conditions which may not extend to those experienced under the typical clinical imaging setting. This repository is also meant to describe and discuss methods for the comparison of results."
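As a rough illustration of what comparing an automatic segmentation against an expert segmentation can look like, the sketch below computes the Dice overlap coefficient. The repository's own comparison methods are not described in the text above; Dice is a commonly used overlap measure substituted here for illustration, and the function name and voxel-set representation are assumptions.

```python
def dice(a, b):
    """Dice overlap 2|A & B| / (|A| + |B|) for two sets of voxel coordinates."""
    a, b = set(a), set(b)
    if not a and not b:
        # Two empty segmentations agree trivially.
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical example: two voxels each, one voxel in common.
expert = {(0, 0, 0), (0, 0, 1)}
automatic = {(0, 0, 1), (0, 0, 2)}
score = dice(expert, automatic)
```

A score of 1.0 means the two segmentations label exactly the same voxels and 0.0 means they share none.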



13. IBVD: Internet Brain Volume Database

URL: http://www.cma.mgh.harvard.edu/ibvd/
Categories: Neuroscience Databases

"The goal of IBVD is to provide a web-based searchable database of brain neuroanatomic volumetric observations. This is designed to access both group volumetric results and volume observations in individual cases. A major thrust of the effort is to enable electronic access to the results that exist in the published literature. Currently, there are quite limited electronic or searchable methods for the data observations that are contained in publications. This effort will facilitate the dissemination of volumetric observations by making a more complete corpus of volumetric observations findable to the neuroscience researcher. This also enhances the ability to perform comparative and integrative studies, as well as meta-analysis. Extensions that permit pre-published, non-published and other representation are planned, again to facilitate comparative analyses."



14. CellPropDB: Cellular Properties Database

URL: http://senselab.med.yale.edu/senselab/CellPropDB/
Categories: Neuroscience Databases

"Cellular Properties Database (CellPropDB) provides a simple repository for data regarding membrane channels, receptors and neurotransmitters that are expressed in specific types of cells. The database is presently focused on neurons but will eventually include other cell types, such as glia, muscle, and gland cells."



15. NeuronDB: Neuron Database

URL: http://senselab.med.yale.edu/senselab/NeuronDB/
Categories: Neuroscience Databases

"NeuronDB provides a dynamically searchable database of three types of neuronal properties: voltage gated conductances, neurotransmitter receptors, and neurotransmitter substances. It contains tools that provide for integration of these properties in a given type of neuron and compartment, and for comparison of properties across different types of neurons and compartments."



16. ModelDB: Model Database

URL: http://senselab.med.yale.edu/senselab/ModelDB/
Categories: Neuroscience Databases

"ModelDB provides an accessible location for storing and efficiently retrieving compartmental neuron models. ModelDB is tightly coupled with NeuronDB. Models can be coded in any language for any environment, though ModelDB has been initially constructed for use with NEURON and GENESIS."



17. OdorDB: Odor Molecule Database

URL: http://senselab.med.yale.edu/senselab/OdorDB/
Categories: Microarray Data and other Gene Expression Databases

"Odor molecule Database (OdorDB) contains data on the odor molecules that have been shown to interact with different olfactory receptors. It is aimed at helping to solve the unprecedented problem of identifying the preferred odor ligands among thousands of potential molecules for the hundreds of different olfactory receptors."



18. OdorMapDB: Olfactory Bulb Odor Map DataBase

URL: http://senselab.med.yale.edu/senselab/OdorMapDB/default.asp
Categories: Neuroscience Databases

"OdorMapDB is designed to be a database to support the experimental analysis of the molecular and functional organization of the olfactory bulb and its basis for the perception of smell. It is primarily concerned with archiving, searching and analysing maps of the olfactory bulb generated by different methods. The first aim is to facilitate comparison of activity patterns elicited by odor stimulation in the glomerular layer obtained by different methods in different species. It is further aimed at facilitating comparison of these maps with molecular maps of the projections of olfactory receptor neuron subsets to different glomeruli, especially for gene targeted animals and for antibody staining."



19. GENSAT Bacterial Artificial Chromosome Transgenics Project

URL: http://www.gensat.org/index.html
Categories: Neuroscience Databases

"The Gensat database contains a gene expression atlas of the central nervous system of the mouse based on bacterial artificial chromosomes (BACs). In each of the BAC transgenic vectors, endogenous protein coding sequences have been replaced by sequences encoding the EGFP reporter gene. As in any gene replacement experiment, the stability of the reporter gene can vary somewhat from the endogenous gene. Thus these results measure the relative rates of transcription for each gene; they are not a direct measure of mRNA accumulation or of protein abundance for the endogenous gene products. Furthermore, the enhanced sensitivity of reporter gene assays, particularly in BAC lines carrying multiple copies of the BAC transgene, may allow detection of sites of expression that are not evident in in situ hybridization experiments. This database contains histological data from given BAC transgenic mouse lines at three developmental stages – embryonic day 15.5 (E15.5), postnatal day 7 (P7) and adult; in all cases the data represent results of multiple transgenic lines. EGFP is visualized by staining with an anti-EGFP antibody using the DAB method, or by confocal microscopy of unstained tissue sections. Protocols for the modification of BACs, BAC transgenesis production and histology are provided."



20. National Brain Databank: Brain Tissue Gene Expression Repository

URL: http://132.183.217.124/brainbank/index.jsp
Categories: Neuroscience Databases

"The idea of creating a National Brain Databank has been on the agenda for the Harvard Brain Tissue Resource Center (HBTRC) for several years, and the National Institute of Neurological Disorders and Stroke (NINDS) and the National Institute of Mental Health (NIMH) have funded the implementation of this proposal. In July 2003, the HBTRC initiated development of the National Brain Databank in conjunction with Akaza Research, a biomedical informatics consulting firm based in Cambridge, MA. The system was developed using the Java J2EE application platform and the PostgreSQL database. It is designed to incorporate MIAME and MAGE-ML based microarray data sharing standards in the future. The initial version of the National Brain Databank was publicly released in April 2004 and continues to be further developed, based on ongoing usage and feedback from users.

All of the data that is derived from studies of the HBTRC collection is being incorporated into the National Brain Databank. This data is available to the general public, although strict precautions are undertaken to maintain the confidentiality of the brain donors and their family members. These precautions include the use of anonymized numbers and restricted access to demographic information. For professional scientists who will require access to confidential information to complete their studies, a username and access code will be made available, after they have reviewed the HIPAA requirements and have agreed to abide by them.

Data from various types of studies conducted on brain tissue in the HBTRC collection will be available from studies using different technologies, such as gene expression profiling, quantitative RT-PCR, in situ hybridization, and immunocytochemistry, and will have the potential for providing powerful insights into the subregional and cellular distribution of genes and/or proteins in different brain regions and eventually in specific subregions and cellular subtypes. All qualified investigators who would like to gain access to more detailed information regarding the subjects (including diagnostic reports on postmortem brain tissue) must demonstrate that they are aware of the HIPAA requirements for confidentiality by reviewing information that appears in the privacy policy on this website."



21. BrainMap

URL: http://brainmap.org/
Categories: Neuroscience Databases

"BrainMap is an online database of published functional neuroimaging experiments with coordinate-based (Talairach) activation locations. The goal of BrainMap is to provide a vehicle to share methods and results of brain functional imaging studies. It is a tool to rapidly retrieve and understand studies in specific research domains, such as language, memory, attention, reasoning, emotion, and perception, and to perform meta-analyses of like studies."



22. BrainML

URL: http://brainml.org
Categories: Neuroscience Databases

"This site was created by the Laboratory of Neuroinformatics to describe BrainML and to serve as a repository for BrainML models. (A BrainML model is an XML Schema and optional vocabulary files describing a data model for electronic representation of neuroscience data, including data types, formats, and controlled vocabulary.)"



23. Conexus

URL: http://mallorn.ucdavis.edu/conexus/
Categories: Neuroscience Databases

"The goal of the Conexus project is to build a graphical database of neuroanatomical connections, focussing on thalamo-cortical and cortico-cortical connections in the macaque monkey. Conexus will allow users to enter data, and also to search for and analyze patterns of data gathered from different experiments, by different investigators, reported in a variety of journals, together into one unified macaque atlas. Equipped with the proper search and 3D visualization tools, it will allow students, modelers, and experimentalists to learn about the available data on neuroanatomical connections, or to compare their own findings to existing data.

Currently, the focus of the project is on the alignment tools that are necessary to visualize large numbers of immunohistologically stained sections together in 3D."



24. FME: Foundational Model Explorer

URL: http://sig.biostr.washington.edu/projects/fm/FME/index.html
Categories: Anatomy Databases

"The Foundational Model Explorer (FME) is an Internet based software application developed for viewing the content and organization of the Digital Anatomist Foundational Model (FMA). It was developed by the Structural Informatics Group at the University of Washington. The initial purpose of the FME was to provide a simple and intuitive interface to the FMA for domain experts, in the field of anatomy, participating in the evaluation of the FMA. The FME also provides an easily available method of exploring the FMA to individuals or groups considering the adoption of the Foundational Model of Anatomy knowledge base."



25. Language Map Experiment Management System

URL: http://tela.biostr.washington.edu/cgi-bin/repos/bmap_repo/main-menu.pl
Categories: Neuroscience Databases

"Bmap_repo is a web-based experiment management system for human brain mapping data. It is currently designed to manage language map data acquired during neurosurgery for tumors or intractable epilepsy, and during MR functional imaging studies. We are working to generalize these methods so that they are applicable to other brain mapping applications.
...
Bmap_repo permits web-based collaborative experimental data management, currently among investigators in different departments at the University of Washington. The data are primarily obtained from patients of George Ojemann in Neurosurgery, as a result of cortical stimulation language mapping (CSM), which is performed to plan the operation for intractable epilepsy or tumors. The imaging protocols were designed by Ken Maravilla in Radiology, and David Corina in Psychology. The data are processed and managed under the direction of Jim Brinkley in the UW Structural Informatics Group in Biological Structure. Additional data analysis is done by members of David Corina's lab in Psychology."



26. LONI (UCLA Laboratory of Neuro Imaging) Image Database

URL: https://services.loni.ucla.edu/ida/login.jsp?search=true
Categories: Neuroscience Databases

"The LONI Image Database has been constructed to provide an effective means for archival and protection of collaborator collected image data. The goal of this software is to provide a convenient mechanism for searching the existence of particular image data while protecting its usage at the same time. We have built the appropriate database query mechanisms to ensure that no image data or identifying patient information is accessible to the outside world or to any others without the appropriate authorization and the expressed permission to release data from the collaborator that acquired and provided the data."



27. Nervenet.org: The Informatics Center for Mouse Neurogenetics

URL: http://www.nervenet.org
Categories: Brain Atlases, Neuroscience Databases

"This server hosts the Mouse Brain Library, an expanding collection of high-resolution histological images, atlases, MRIs, and databases on brain structure of more than 120 different lines of mice. Nervenet also includes several useful genetics and gene mapping databases to download (SNP databases, Map Manager databases, and the Portable Dictionary of the Mouse Genome). The publications section includes revised, expanded, and annotated papers, tutorials, and reviews on neurogenetics, gene mapping, complex trait analysis, stereology, and the control of neuron number."



28. The NeSys Database on Brain Map Transformations in Cerebellar Systems

URL: http://www.nesys.uio.no/
Categories: Neuroscience Databases

"The aim of this database is to provide structure and structure-function data about brain map transformations in cerebellar systems.

The present version is a web based archive based on data from 4 original publications and represents Project Phase 2 (reached in 2004). It includes data on the organization of projections to the pontine nuclei from three cortical areas: primary and secondary somatosensory areas (SI and SII), and the primary motor cortex (MI). Axonal tracer substances are injected into electrophysiologically defined locations in these areas, and distributions of terminal fields of labeling in the pontine nuclei are computer reconstructed in 3-D and transferred to a common, standardized coordinate system."



29. AGNS (Arabidopsis GeneNet supplementary) Database

URL: http://emj-pc.ics.uci.edu/mgs/dbases/agns/
Categories: Arabidopsis thaliana Databases

"The aim of AGNS is to create an Internet available resource accumulating the data on detailed description of the experimental results and observed expression of the Arabidopsis genes at the levels of mRNA, protein, cell, tissue and ultimately at the levels of the organ and organism and in different genotypes from annotations of published papers.

AGNS consists now of two databases, the Expression Database (ED) and the Phenotype Database (PD), and two controlled vocabularies. The ED describes gene expression in wild type, mutant and transgenic plants. The PD contains information on phenotypic abnormalities in mutant and transgenic plants. The RD contains references to the papers together with a description of plant growth conditions with an indication of the ecotypes used as control in the experiments. Both PD and ED have links to the PubMed and items in controlled vocabularies. Controlled vocabularies contain information on description of organs, tissues and cells both in the mature plant and at different developmental stages and description of developmental stages of the plant itself and of its separate organs. The most frequently used names of the stages, organs are highlighted and their synonyms are given. Every description of stages and organs is accompanied by detailed commentaries.

All AGNS data have references to the papers from which they were annotated. Thus, AGNS accumulates information on the available Arabidopsis morphology and development and gene expression patterns in the wild type and in different mutants and transgenic lines, which is systematized and compared. AGNS makes possible search for genes expressed in particular organs, at particular stages, for genes whose expression is altered in particular mutants, and for mutants having similar phenotypic abnormalities."



30. NTSA Workbench Database

URL: http://soma.npa.uiuc.edu/ntsa/
Categories: Neuroscience Databases

"The core of the database system is the 'card catalog' of the NTSA Workbench which describes the characteristics of the time series neuronal data that can be searched. This descriptive information is referred to as metadata, whereas the neuronal and behavioral time series data records themselves are referred to as raw data. The database Table Schema determines how the metadata are organized in the database. A hierarchical organization was adopted for the NTSA Workbench database Table Schema.

This design provides a natural fit to the structure of neuroscience experiments, which can in most instances be described with reference to the following hierarchy: laboratory, experiment, subject, session, series and trial. Here, laboratory refers to a specific research group (e.g., the research of Co-PI David Clayton), experiment refers to a specific study in that laboratory (e.g., of songbird auditory thalamic neuron responses to natural and scrambled species-typical songs), subject refers to the experimental animal (e.g., zebra finch1255), session denotes the specific time and setting of neuronal data collection, series refers to sets of consecutive data collection episodes within a session having similar characteristics and trial refers to the individual data collection episodes, often defined by presentation of a single stimulus. Unlimited metadata descriptors can be used at each level to specify the experimental treatments."
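The six-level hierarchy described above can be sketched as a small tree of metadata nodes. This is a minimal illustration and not the NTSA Workbench schema itself: the level names come from the description, while the Node class, its methods, and the metadata keys used in the example are hypothetical.

```python
from dataclasses import dataclass, field

# Level names in the order given in the description above.
LEVELS = ("laboratory", "experiment", "subject", "session", "series", "trial")


@dataclass
class Node:
    level: str
    name: str
    metadata: dict = field(default_factory=dict)   # free-form descriptors
    children: list = field(default_factory=list)

    def add(self, name, **metadata):
        """Create a child one level below this node and return it."""
        depth = LEVELS.index(self.level)
        if depth == len(LEVELS) - 1:
            raise ValueError("a trial is the lowest level and has no children")
        child = Node(LEVELS[depth + 1], name, dict(metadata))
        self.children.append(child)
        return child

    def find(self, **criteria):
        """Yield every node at or below this one whose metadata matches."""
        if all(self.metadata.get(k) == v for k, v in criteria.items()):
            yield self
        for child in self.children:
            yield from child.find(**criteria)


# Hypothetical example records; only the level structure is from the text.
lab = Node("laboratory", "example lab")
experiment = lab.add("song responses", species="zebra finch")
subject = experiment.add("zebra finch1255")
session = subject.add("session 1")
series = session.add("series 1")
trial = series.add("trial 1", stimulus="natural song")
matches = [node.name for node in lab.find(stimulus="natural song")]
```

Searching the metadata tree in this way corresponds to consulting the 'card catalog': the descriptors are matched first, and only then would the raw time series records for the matching trials be retrieved.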



31. PPID: Protein-Protein Interaction Database

URL: http://www.ppid.org/
Categories: Intermolecular Interactions and Signaling Pathways Databases

"The Protein-Protein Interaction Database (PPID) was constructed to integrate a gamut of biological/bibliographical/molecular data and build a framework which might help understanding how cells orchestrate their protein content in order to become what they are: machines with a purpose. This is based on the simple paradigm that functionality like signal transduction cascades are held together in a close space, thereby allowing specific events to occur without the necessity of passive diffusion and random events."




32. DDBJ: DNA Data Bank of Japan

URL: http://www.ddbj.nig.ac.jp
Categories: Nucleotide Sequences: International Nucleotide Sequence Database Collaboration Databases

In the past year, we at DDBJ (DNA Data Bank of Japan; http://www.ddbj.nig.ac.jp) collected and released 1 066 084 entries or 718 072 425 bases including the whole chromosome 22 of chimpanzee, the whole-genome shotgun sequences of silkworm and various others. On the other hand, we hosted workshops for human full-length cDNA annotation and participated in jamborees of mouse full-length cDNA annotation. The annotated data are made public at DDBJ. We are also in collaboration with a RIKEN team to accept and release the CAGE (Cap Analysis Gene Expression) data under a new category, MGA (Mass Sequences for Genome Annotation). The data will be useful for studying gene expression control in many aspects.

Citation for the above abstract:
Tateno, Y., Saitou, N., Okubo, K., Sugawara, H., Gojobori, T.
DDBJ in collaboration with mass-sequencing teams on annotation
Nucl. Acids Res. 2005 33: D25-28
© 2005 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D25



33. AluGene

URL: http://alugene.tau.ac.il/
Categories: Human Genome Databases, Maps, and Viewers, Nucleotide Sequences: Coding and Non-coding DNA Databases

Alu elements are short interspersed elements (SINEs) 300 nucleotides in length. More than 1 million Alus are found in the human genome. Despite their being genetically functionless, recent findings suggest that Alu elements may have a broad evolutionary impact by affecting gene structures, protein sequences, splicing motifs and expression patterns. Because of these effects, compiling a genomic database of Alu sequences that reside within protein-coding genes seemed a useful enterprise. Presently, such data are limited since the structural and positional information on genes and Alu sequences are scattered throughout incompatible and unconnected databases. AluGene (http://Alugene.tau.ac.il/) provides easy access to a complete Alu map of the human genome, as well as Alu-associated information. The Alu elements are annotated with respect to coding region and exon/intron location. This design facilitates queries on Alu sequences, locations, as well as motifs and compositional properties via a one-stop search page.

Citation for the above abstract:
Dagan, Tal, Sorek, Rotem, Sharon, Eilon, Ast, Gil, Graur, Dan
AluGene: a database of Alu elements incorporated within protein-coding genes
Nucl. Acids Res. 2004 32: D489-492
© 2004 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D489



34. EMBL Nucleotide Sequence Database

URL: http://www.ebi.ac.uk/embl/
Categories: Nucleotide Sequences: International Nucleotide Sequence Database Collaboration Databases

The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl), maintained at the European Bioinformatics Institute (EBI) near Cambridge, UK, is a comprehensive collection of nucleotide sequences and annotation from available public sources. The database is part of an international collaboration with DDBJ (Japan) and GenBank (USA). Data are exchanged daily between the collaborating institutes to achieve swift synchrony. Webin is the preferred tool for individual submissions of nucleotide sequences, including Third Party Annotation (TPA) and alignments. Automated procedures are provided for submissions from large-scale sequencing projects and data from the European Patent Office. New and updated data records are distributed daily and the whole EMBL Nucleotide Sequence Database is released four times a year. Access to the sequence data is provided via ftp and several WWW interfaces. With the web-based Sequence Retrieval System (SRS) it is also possible to link nucleotide data to other specialist molecular biology databases maintained at the EBI. Other tools are available for sequence similarity searching (e.g. FASTA and BLAST). Changes over the past year include the removal of the sequence length limit, the launch of the EMBLCDSs dataset, extension of the Sequence Version Archive functionality and the revision of quality rules for TPA data.

Citation for the above abstract:
Kanz, Carola, Aldebert, Philippe, Althorpe, Nicola, Baker, Wendy, Baldwin, Alastair, Bates, Kirsty, Browne, Paul, van den Broek, Alexandra, Castro, Matias, Cochrane, Guy, Duggan, Karyn, Eberhardt, Ruth, Faruque, Nadeem, Gamble, John, Diez, Federico Garcia, Harte, Nicola, Kulikova, Tamara, Lin, Quan, Lombard, Vincent, Lopez, Rodrigo, Mancuso, Renato, McHale, Michelle, Nardone, Francesco, Silventoinen, Ville, Sobhany, Siamak, Stoehr, Peter, Tuli, Mary Ann, Tzouvara, Katerina, Vaughan, Robert, Wu, Dan, Zhu, Weimin, Apweiler, Rolf
The EMBL Nucleotide Sequence Database
Nucl. Acids Res. 2005 33: D29-33
© 2005 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/abstract/33/suppl_1/D29



35. GenBank

URL: http://www.ncbi.nlm.nih.gov/Genbank/index.html
Categories: Nucleotide Sequences: International Nucleotide Sequence Database Collaboration Databases

GenBank® is a comprehensive database that contains publicly available DNA sequences for more than 205 000 named organisms, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects. Most submissions are made using the Web-based BankIt or standalone Sequin programs and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the EMBL Data Library in Europe and the DNA Data Bank of Japan ensures worldwide coverage. GenBank is accessible through NCBI's retrieval system, Entrez, which integrates data from the major DNA and protein sequence databases along with taxonomy, genome, mapping, protein structure and domain information, and the biomedical journal literature via PubMed. BLAST provides sequence similarity searches of GenBank and other sequence databases. Complete bimonthly releases and daily updates of the GenBank database are available by FTP. To access GenBank and its related retrieval and analysis services, go to the NCBI Homepage at www.ncbi.nlm.nih.gov.

Citation for the above abstract:
Benson, Dennis A., Karsch-Mizrachi, Ilene, Lipman, David J., Ostell, James, Wheeler, David L.
GenBank
Nucl. Acids Res. 2006 34: D16-20
© 2006 Oxford University Press.


The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D16



36. ACLAME: A CLAssification of Mobile genetic Elements

URL: http://aclame.ulb.ac.be/
Categories: Nucleotide Sequences: Coding and Non-coding DNA Databases

The ACLAME database (http://aclame.ulb.ac.be) is a collection and classification of prokaryotic mobile genetic elements (MGEs) from various sources, comprising all known phage genomes, plasmids and transposons. In addition to providing information on the full genomes and genetic entities, it aims to build a comprehensive classification of the functional modules of MGEs at the protein, gene and higher levels. This first version contains a comprehensive classification of 5069 proteins from 119 DNA bacteriophages into over 400 functional families. This classification was produced automatically using TRIBE-MCL, a graph-theory-based Markov clustering algorithm that uses sequence measures as input, and then manually curated. Manual curation was aided by consulting annotations available in public databases retrieved through additional sequence similarity searches using Psi-Blast and Hidden Markov Models. The database is publicly accessible and open to expert volunteers willing to participate in its curation. Its web interface allows browsing as well as querying the classification. The main objectives are to collect and organize in a rational way the complexity inherent to MGEs, to extend and improve the inadequate annotation currently associated with MGEs and to screen known genomes for the validation and discovery of new MGEs.

Citation for the above abstract:
Leplae, Raphael, Hebrant, Aline, Wodak, Shoshana J., Toussaint, Ariane
ACLAME: A CLAssification of Mobile genetic Elements
Nucl. Acids Res. 2004 32: D45-49
© 2004 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D45



37. Ciliate MDS/IES Database

URL: http://oxytricha.princeton.edu/dimorphism/database.htm
Categories: Nucleotide Sequences: Coding and Non-coding DNA Databases

Ciliated protozoa have two kinds of nuclei: Macronuclei (MAC) and Micronuclei (MIC). In some ciliate classes, such as spirotrichs, most genes undergo several layers of DNA rearrangement during macronuclear development. Because of such processes, these organisms provide ideal systems for studying mechanisms of recombination and gene rearrangement. Here, we describe a database that contains all spirotrich genes for which both MAC and MIC versions are sequenced, with consistent annotation and easy access to all the features. An interface to query the database is available at http://oxytricha.princeton.edu/dimorphism/database.htm.

Citation for the above abstract:
Cavalcanti, Andre R. O., Clarke, Thomas H., Landweber, Laura F.
MDS_IES_DB: a database of macronuclear and micronuclear genes in spirotrichous ciliates
Nucl. Acids Res. 2005 33: D396-398
© 2005 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D396



38. CORG: a database for COmparative Regulatory Genomics

URL: http://corg.molgen.mpg.de/
Categories: Model Organisms and Comparative Genomics Databases, Nucleotide Sequences: Coding and Non-coding DNA Databases, RNA Sequence Databases

Sequence conservation in non-coding, upstream regions of orthologous genes from man and mouse is likely to reflect common regulatory DNA sites. Motivated by this assumption we have delineated a catalogue of conserved non-coding sequence blocks and provide the CORG—‘COmparative Regulatory Genomics’—database. The data were computed based on statistically significant local suboptimal alignments of 15 kb regions upstream of the translation start sites of, currently, 10 793 pairs of orthologous genes. The resulting conserved non-coding blocks were annotated with EST matches for easier detection of non-coding mRNA and with hits to known transcription factor binding sites. CORG data are accessible from the ENSEMBL web site via a DAS service as well as a specially developed web service (http://corg.molgen.mpg.de) for query and interactive visualization of the conserved blocks and their annotation.

Citation for the above abstract:
Dieterich, C., Wang, H., Rateitschak, K., Luz, H., Vingron, M.
CORG: a database for COmparative Regulatory Genomics
Nucl. Acids Res. 2003 31: 55-57
© 2003 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/55



39. CUTG: Codon Usage Tabulated from GenBank

URL: http://www.kazusa.or.jp/codon/
Categories: Nucleotide Sequences: Coding and Non-coding DNA Databases

The frequencies of each of the 257 468 complete protein coding sequences (CDSs) have been compiled from the taxonomical divisions of the GenBank DNA sequence database. The sum of the codons used by 8792 organisms has also been calculated. The data files can be obtained from the anonymous ftp sites of DDBJ, Kazusa and EBI. A list of the codon usage of genes and the sum of the codons used by each organism can be obtained through the web site http://www.kazusa.or.jp/codon/ . The present study also reports recent developments on the WWW site. The new web interface provides data in the CodonFrequency-compatible format as well as in the traditional table format. The use of the database is facilitated by keyword based search analysis and the availability of codon usage tables for selected genes from each species. These new tools will provide users with the ability to further analyze for variations in codon usage among different genomes.

Citation for the above abstract:
Nakamura, Yasukazu, Gojobori, Takashi, Ikemura, Toshimichi
Codon usage tabulated from international DNA sequence databases: status for the year 2000
Nucl. Acids Res. 2000 28: 292-
© 2000 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/28/1/292
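
The tabulation described in the CUTG entry above (codon counts per gene, then summed per organism) can be sketched as follows. This is a minimal illustration, not the pipeline used to build CUTG: the function names are hypothetical, the input is assumed to be in-frame CDS strings, and the per-thousand normalisation is just one common way of reporting codon frequencies.

```python
from collections import Counter
from typing import Dict, Iterable


def codon_counts(cds: str) -> Counter:
    """Count the codons of a single in-frame coding sequence."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    return Counter(cds[i:i + 3] for i in range(0, len(cds), 3))


def organism_usage(cds_list: Iterable[str]) -> Counter:
    """Sum the codon counts over all CDSs of one organism."""
    total = Counter()
    for cds in cds_list:
        total.update(codon_counts(cds))
    return total


def per_thousand(counts: Counter) -> Dict[str, float]:
    """Express counts as frequency per 1000 codons."""
    n = sum(counts.values())
    return {codon: 1000.0 * c / n for codon, c in counts.items()}


# Toy input: two very short made-up CDSs.
gene_counts = codon_counts("ATGGCTGCTTAA")
usage = organism_usage(["ATGGCTGCTTAA", "ATGAAATAA"])
freqs = per_thousand(gene_counts)
```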



40. Entrez Gene

URL: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene
Categories: Nucleotide Sequences: Coding and Non-coding DNA Databases

Entrez Gene (www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene) is NCBI's database for gene-specific information. It does not include all known or predicted genes; instead Entrez Gene focuses on the genomes that have been completely sequenced, that have an active research community to contribute gene-specific information, or that are scheduled for intense sequence analysis. The content of Entrez Gene represents the result of curation and automated integration of data from NCBI's Reference Sequence project (RefSeq), from collaborating model organism databases, and from many other databases available from NCBI. Records are assigned unique, stable and tracked integers as identifiers. The content (nomenclature, map location, gene products and their attributes, markers, phenotypes, and links to citations, sequences, variation details, maps, expression, homologs, protein domains and external databases) is updated as new information becomes available. Entrez Gene is a step forward from NCBI's LocusLink, with both a major increase in taxonomic scope and improved access through the many tools associated with NCBI Entrez.

Citation for the above abstract:
Maglott, Donna, Ostell, Jim, Pruitt, Kim D., Tatusova, Tatiana
Entrez Gene: gene-centered information at NCBI
Nucl. Acids Res. 2005 33: D54-58
© 2005 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D54



41. FREP: Functional Repeats in Mouse cDNAs

URL: http://facts.gsc.riken.go.jp/FREP/
Categories: Model Organisms and Comparative Genomics Databases, Nucleotide Sequences: Coding and Non-coding DNA Databases

The FREP database (http://facts.gsc.riken.go.jp/FREP/) contains 31 396 RepeatMasker-identified non-redundant variant repeat sequences derived from 16 527 mouse cDNAs with protein-coding potential. The repeats were computationally associated with potential effects on transcriptional variation, translation, protein function or involvement in disease to identify Functional REPeats (FREPs). FREPs are defined by the (i) occurrence of exon–exon boundaries in repeats, (ii) presence of polyadenylation sites in 3'UTR-located repeats, (iii) effect on translation, (iv) position in the protein-coding region or protein domains or (v) conditional association with disease MeSH terms. Currently the database contains 9261 (29.5%) inferred FREPs derived from 6861 (41.5%) mouse cDNAs. Integrated evidence of the functional assignments and dynamically generated sequence similarity search results support the exploration and annotation of functional, ancestral or taxon-specific repeats. Keyword and pre-selected feature searches (e.g. coding sequence–repeat or splice site–repeat relations) support intuitive database querying as well as the retrieval of repeat sequences. Integrated sequence search and alignment tools allow the analysis of known or identification of new functional repeat candidates. FREP is a unique resource for illuminating the role of transposons and repetitive sequences in shaping the coding part of the mouse transcriptome and for selecting the appropriate experimental model to study diseases with suspected repeat etiology contributions.

Citation for the above abstract:
Nagashima, Takeshi, Matsuda, Hideo, Silva, Diego G., Petrovsky, Nikolai, RIKEN GER Group, GSL Members, Konagaya, Akihiko, Schonbach, Christian
FREP: a database of functional repeats in mouse cDNAs
Nucl. Acids Res. 2004 32: D471-475
© 2004 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D471



42. Genetic Codes

URL: http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi
Categories: Nucleotide Sequences: Coding and Non-coding DNA Databases

"NCBI takes great care to ensure that the translation for each coding sequence (CDS) present in GenBank records is correct. Central to this effort is careful checking on the taxonomy of each record and assignment of the correct genetic code (shown as a /transl_table qualifier on the CDS in the flat files) for each organism and record. This page summarizes and references this work."

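To illustrate why the /transl_table qualifier mentioned in the Genetic Codes entry above matters, the sketch below translates the same CDS under two genetic codes. Only table 1 (standard) and table 2 (vertebrate mitochondrial) are included, with table 2 expressed as its four differences from the standard code. The function and variable names are hypothetical, and this is not NCBI's implementation.

```python
from itertools import product

# Standard genetic code (transl_table=1), with codons enumerated in TCAG order.
BASES = "TCAG"
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def build_table(overrides=None):
    """Return a codon -> amino acid map, optionally overriding some codons."""
    codons = ["".join(c) for c in product(BASES, repeat=3)]
    table = dict(zip(codons, STANDARD_AAS))
    table.update(overrides or {})
    return table


TABLES = {
    1: build_table(),
    # Vertebrate mitochondrial code: AGA/AGG are stops, ATA is Met, TGA is Trp.
    2: build_table({"AGA": "*", "AGG": "*", "ATA": "M", "TGA": "W"}),
}


def translate(cds, transl_table=1):
    """Translate an in-frame CDS; a trailing partial codon is ignored."""
    table = TABLES[transl_table]
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return "".join(table[cds[i:i + 3]] for i in range(0, usable, 3))


standard = translate("ATGTGAATA", transl_table=1)
mitochondrial = translate("ATGTGAATA", transl_table=2)
```

Under table 1 the second codon (TGA) is read as a stop, whereas under table 2 it encodes tryptophan, so assigning the wrong table would truncate or alter the predicted protein.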


43. Islander: Database of Genomic Islands

URL: http://129.79.232.60/cgi-bin/islander/islander.cgi
Categories: Nucleotide Sequences: Coding and Non-coding DNA Databases, Prokaryote Databases

Prokaryotic chromosomes often contain islands, such as temperate phages or pathogenicity islands, delivered by site-specific integrases. Integration usually occurs within a tRNA or tmRNA gene, splitting the gene, yet sequences within the island restore the disrupted gene. The regenerated RNA gene and the displaced fragment of that gene thus mark the endpoints of the island. We applied this principle to search for islands in genomic DNA sequences. Our algorithm generates a list of tRNA and tmRNA genes, uses each as the query for a BLAST search of the starting DNA and removes unlikely hits through a series of filters. A search for islands in 106 whole bacterial genomes produced 143 candidates, with the search itself providing an estimate of three false candidates among these. Preliminary phylogenetic analysis of the associated integrases reduced this set to 89 cases of independently evolved site specificity, which showed strong bias for the tmRNA gene. The website Islander (http://www.indiana.edu/islander) presents the candidate islands in GenBank-style files and correlates integrase phylogeny with site specificity.

Citation for the above abstract:
Mantri, Yogita, Williams, Kelly P.
Islander: a database of integrative islands in prokaryotic genomes, the associated integrases and their DNA site specificities
Nucl. Acids Res. 2004 32: D55-58
© 2004 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D55



44. L1Base

URL: http://l1base.molgen.mpg.de/
Categories: Nucleotide Sequences: Coding and Non-coding DNA Databases

L1Base is a dedicated database containing putatively active LINE-1 (L1) insertions residing in human and rodent genomes that are as follows: (i) intact in the two open reading frames (ORFs), full-length L1s (FLI-L1s) and (ii) intact ORF2 but disrupted ORF1 (ORF2-L1s). In addition, due to their regulatory potential, the full-length (>6000 bp) non-intact L1s (FLnI-L1s) were also included in the database. Application of a novel annotation methodology, L1Xplorer, allowed in-depth annotation of functional sequence features important for L1 activity, such as transcription factor binding sites and amino acid residues. The L1Base is available online at http://l1base.molgen.mpg.de. In addition, the data stored in the database can be accessed from the Ensembl web browser via a DAS service (http://l1das.molgen.mpg.de:8080/das).

Citation for the above abstract:
Penzkofer, Tobias, Dandekar, Thomas, Zemojtel, Tomasz
L1Base: from functional annotation to prediction of active LINE-1 elements
Nucl. Acids Res. 2005 33: D498-500
© 2005 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D498



45. MethDB DNA Methylation Database

URL: http://www.methdb.de/
Categories: Nucleotide Sequences: Coding and Non-coding DNA Databases

Methylation of cytosine in the 5 position of the pyrimidine ring is a major modification of the DNA in most organisms. In eukaryotes, the distribution and number of 5-methylcytosines (5mC) along the DNA is heritable but can also change with the developmental state of the cell and as a response to modifications of the environment. While DNA methylation probably has a number of functions, scientific interest has recently focused on the gene silencing effect methylation can have in eukaryotic cells. In particular, the discovery of changes in the methylation level during cancer development has increased the interest in this field. In the past, a vast amount of data has been generated with different levels of resolution ranging from 5mC content of total DNA to the methylation status of single nucleotides. We present here a database for DNA methylation data that attempts to unify these results in a common resource. The database is accessible via WWW (http://www.methdb.de). It stores information about the origin of the investigated sample and the experimental procedure, and contains the DNA methylation data. Query masks allow for searching for 5mC content, species, tissue, gene, sex, phenotype, sequence ID and DNA type. The output lists all available information including the relative gene expression level. DNA methylation patterns and methylation profiles are shown both as a graphical representation and as G/A/T/C/5mC-sequences or tables with sequence positions and methylation levels, respectively.

Citation for the above abstract:
Grunau, Christoph, Renault, Eric, Rosenthal, Andre, Roizes, Gerard
MethDB--a public database for DNA methylation data
Nucl. Acids Res. 2001 29: 270-274
© 2001 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/29/1/270



46. MICdb: Database of Prokaryotic Microsatellites

URL: http://210.212.212.7/MIC/index.html
Categories: Nucleotide Sequences: Coding and Non-coding DNA Databases, Taxonomy and Identification Databases

The MICdb (Microsatellites Database) (http://www.cdfd.org.in/micas) is a comprehensive relational database of non-redundant microsatellites extracted from fully sequenced prokaryotic genomes. The current version (1.0) of the database has been compiled from 83 genomes belonging to different phylogenetic groups. This database has been linked to MICAS, the web-based Microsatellite Analysis Server. MICAS provides a user-friendly front-end to systematically extract data on microsatellite tracts from genomes. The database contains the following information pertaining to the microsatellites: the regions (coding/non-coding, if coding, their GenBank annotations) containing microsatellite tracts; the frequencies of their occurrences, the size and the number of repeating motifs; and the sequences of the tracts. MICAS also provides an interface to Autoprimer, a primer design program to automatically design primers for selected microsatellite loci.

Citation for the above abstract:
Sreenu, Vattipally B., Alevoor, Vishwanath, Nagaraju, Javaregowda, Nagarajaram, Hampapathalu A.
MICdb: database of prokaryotic microsatellites
Nucl. Acids Res. 2003 31: 106-108
© 2003 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/106
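
The kind of record described in the MICdb entry above (motif, number of repeats, tract sequence) can be extracted from a sequence with a back-referencing regular expression. The sketch below is an assumption-laden illustration, not the method used by MICdb or MICAS: the function name, the default thresholds, and the restriction to perfect repeats of 1 to 6 bp motifs are all choices made here.

```python
import re


def find_microsatellites(seq, min_repeats=3, max_motif=6):
    """Find perfect tandem repeats of short motifs.

    The lazy quantifier makes the regex try the shortest motif first, so
    "ACACACAC" is reported as 4 x "AC" rather than 2 x "ACAC".
    min_repeats must be at least 2.
    """
    if min_repeats < 2:
        raise ValueError("min_repeats must be at least 2")
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif, min_repeats - 1))
    tracts = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        tracts.append({
            "start": match.start(),  # 0-based position of the tract
            "motif": motif,
            "repeats": len(match.group(0)) // len(motif),
            "sequence": match.group(0),
        })
    return tracts


# Made-up example sequence containing an (AC)4 tract and an (A)5 tract.
tracts = find_microsatellites("GGTACACACACTTAAAAAG")
```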



47. NPRD: Nucleosome Positioning Region Database

URL: http://srs6.bionet.nsc.ru/srs6/
Categories: Nucleotide Sequences: Coding and Non-coding DNA Databases

Nucleosome Positioning Region Database (NPRD), which is compiling the available experimental data on locations and characteristics of nucleosome formation sites (NFSs), is the first curated NFS-oriented database. The object of the database is a single NFS described in an individual entry. When annotating results of NFS experimental mapping, we pay special attention to several important functional characteristics, such as the relationship between type of gene activity and nucleosome positioning, the influence of non-histone proteins on nucleosome formation, type of the variant of nucleosome positioning (translational or rotational), indication of tissue types and states of cell activity, description of experimental methods used and accuracy of nucleosome position determination, and the results of applying theoretical and computer methods to the analysis of contextual and conformational DNA properties. At present, the NPRD database contains 438 entries and integrates the data described in 124 original papers. The database URL: http://srs6.bionet.nsc.ru/srs6/. Then click the button ‘Databank’ and open the link NUCLEOSOME.

Citation for the above abstract:
Levitsky, Victor G., Katokhin, Aleksey V., Podkolodnaya, Olga A., Furman, Dagmara P., Kolchanov, Nikolay A.
NPRD: Nucleosome Positioning Region Database
Nucl. Acids Res. 2005 33: D67-70
© 2005 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D67



48. PACRAT

URL: http://www.biosci.ohio-state.edu/~pacrat/
Categories: Nucleotide Sequences: Coding and Non-coding DNA Databases

Analysis of intergenic sequences for purposes such as the investigation of transcriptional signals or the identification of small RNA genes is frequently complicated by traditional biological database structures. Genome data is commonly treated as chromosome-length sequence records, detailed by gene calls demarcating subsequences of the chromosomes. Given this model, the determination of non-called subsequences between any gene and its nearest neighbors requires an exhaustive search of all gene calls associated with the chromosome. Further compounding the issue, the location of intergenic regions for many called genes cannot be resolved unambiguously due to uncertainties in gene boundaries, as well as the presence of other conflicting gene calls. To address these difficulties we have constructed the PACRAT (http://www.biosci.ohio-state.edu/~pacrat/) database system. PACRAT preprocesses GenBank genome submissions, evaluates for every gene the character of its relationship to those genes nearest to it, and produces a relationally linked model of the gene ordering for the genome. Using this information, the interface allows the researcher to query gene data as well as intergenic sequence data based on a number of criteria. These include the ability to filter searches based on the status of start and stop positions, or upstream/downstream sequences as conflicting with called genes and automated extension of upstream or downstream searches to find probable operon promoters or terminators. The database is also indexed by KEGG classification, allowing, for example, functionally-related groups of high-quality promoter-containing regions to be easily retrieved as a group.

Citation for the above abstract:
Ray, William C., Daniels, Charles J.
PACRAT: a database and analysis system for archaeal and bacterial intergenic sequence features
Nucl. Acids Res. 2003 31: 109-113
© 2003 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/109
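
The PACRAT entry above describes precomputing, for every gene, its relationship to its nearest neighbours so that intergenic sequence can be queried directly. A minimal sketch of that idea follows: sort the gene calls once, then sweep through them to record gaps and overlapping (conflicting) calls. This is not PACRAT's actual algorithm or schema. The function name, the tuple layout, and the 1-based inclusive coordinates are assumptions, and strand is ignored.

```python
def intergenic_regions(genes):
    """Return (regions, conflicts) for gene calls on one chromosome.

    genes: iterable of (name, start, end), 1-based inclusive coordinates.
    regions: (left_gene, right_gene, gap_start, gap_end) for each uncalled gap.
    conflicts: (gene, gene) pairs whose calls overlap.
    """
    ordered = sorted(genes, key=lambda g: (g[1], g[2]))
    regions, conflicts = [], []
    if not ordered:
        return regions, conflicts

    # Track the gene that currently extends furthest to the right, so that
    # genes nested inside a longer call do not create spurious gaps.
    left_name, _, covered_end = ordered[0]
    for name, start, end in ordered[1:]:
        if start > covered_end + 1:
            regions.append((left_name, name, covered_end + 1, start - 1))
        elif start <= covered_end:
            conflicts.append((left_name, name))
        if end > covered_end:
            left_name, covered_end = name, end
    return regions, conflicts


# Made-up gene calls; geneB and geneC overlap.
calls = [("geneB", 300, 500), ("geneA", 1, 200), ("geneC", 450, 700), ("geneD", 900, 1000)]
regions, conflicts = intergenic_regions(calls)
```

Sorting once and storing the resulting neighbour links avoids the exhaustive search over all gene calls that the abstract identifies as the cost of the chromosome-length record model.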



49. PANDIT: Protein and Associated Nucleotide Domains with Inferred Trees

URL: http://www.ebi.ac.uk/goldman-srv/pandit
Categories: Nucleotide Sequences: Coding and Non-coding DNA Databases, Protein Domain and Protein Classification Databases

PANDIT is a database of homologous sequence alignments accompanied by estimates of their corresponding phylogenetic trees. It provides a valuable resource to those studying phylogenetic methodology and the evolution of coding-DNA and protein sequences. Currently in version 17.0, PANDIT comprises 7738 families of homologous protein domains; for each family, DNA and corresponding amino acid sequence multiple alignments are available together with high quality phylogenetic tree estimates. Recent improvements include expanded methods for phylogenetic tree inference, assessment of alignment quality and a redesigned web interface, available at the URL http://www.ebi.ac.uk/goldman-srv/pandit.

Citation for the above abstract:
Whelan, Simon, de Bakker, Paul I. W., Quevillon, Emmanuel, Rodriguez, Nicolas, Goldman, Nick
PANDIT: an evolution-centric database of protein and associated nucleotide domains with inferred trees
Nucl. Acids Res. 2006 34: D327-331
© 2006 Oxford University Press.


The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D327



50. xBASE

URL: http://xbase.bham.ac.uk/
Categories: Prokaryote Databases

The schema of the previously described Escherichia coli database coliBASE has been applied to a number of other bacterial taxa, under the collective name xBASE. The new databases include CampyDB for Campylobacter, Helicobacter and Wolinella; PseudoDB for pseudomonads; ClostriDB for clostridia; RhizoDB for Rhizobium and Sinorhizobium; and MycoDB, for Mycobacterium, Streptomyces and related organisms. The databases provide user friendly access to annotation and genome comparisons through a web-based graphical interface. Newly developed features include whole genome displays, ‘painting’ of genes according to properties such as GC content, a pattern search system to identify conserved motifs and batch BLAST searching of every protein encoded by a region. Examples of how the databases have been, and continue to be, used to generate hypotheses for subsequent laboratory investigation are presented. xBASE is available online at http://xbase.bham.ac.uk.

Citation for the above abstract:
Chaudhuri, Roy R., Pallen, Mark J.
xBASE, a collection of online databases for bacterial comparative genomics
Nucl. Acids Res. 2006 34: D335-337
© 2006 Oxford University Press.


The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D335



51. RECODE: The Database of the Translational Recoding Events

URL: http://recode.genetics.utah.edu/
Categories: Nucleotide Sequences: Coding and Non-coding DNA Databases

The RECODE database is a compilation of translational recoding events (programmed ribosomal frameshifting, codon redefinition and translational bypass). The database provides information about the genes utilizing these events for their expression, recoding sites, stimulatory sequences and other relevant information. The Database is freely available at http://recode.genetics.utah.edu/.

Citation for the above abstract:
Baranov, Pavel V., Gurvich, Olga L., Hammer, Andrew W., Gesteland, Raymond F., Atkins, John F.
RECODE 2003
Nucl. Acids Res. 2003 31: 87-89
© 2003 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/87



52. RefSeq: NCBI Reference Sequence

URL: http://www.ncbi.nlm.nih.gov/RefSeq/
Categories: General Protein Sequence Databases, Nucleotide Sequences: Coding and Non-coding DNA Databases

The National Center for Biotechnology Information (NCBI) Reference Sequence (RefSeq) database (http://www.ncbi.nlm.nih.gov/RefSeq/) provides a non-redundant collection of sequences representing genomic data, transcripts and proteins. Although the goal is to provide a comprehensive dataset representing the complete sequence information for any given species, the database pragmatically includes sequence data that are currently publicly available in the archival databases. The database incorporates data from over 2400 organisms and includes over one million proteins representing significant taxonomic diversity spanning prokaryotes, eukaryotes and viruses. Nucleotide and protein sequences are explicitly linked, and the sequences are linked to other resources including the NCBI Map Viewer and Gene. Sequences are annotated to include coding regions, conserved domains, variation, references, names, database cross-references, and other features using a combined approach of collaboration and other input from the scientific community, automated annotation, propagation from GenBank and curation by NCBI staff.

Citation for the above abstract:
Pruitt, Kim D., Tatusova, Tatiana, Maglott, Donna R.
NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins
Nucl. Acids Res. 2005 33: D501-504
© 2004 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D501



53. S/MARt DB: The S/MAR transaction DataBase

URL: http://smartdb.bioinf.med.uni-goettingen.de/
Categories: Nucleotide Sequences: Coding and Non-coding DNA Databases

S/MARt DB, the S/MAR transaction database, is a relational database covering scaffold/matrix attached regions (S/MARs) and nuclear matrix proteins that are involved in the chromosomal attachment to the nuclear scaffold. The data are mainly extracted from original publications, but a World Wide Web interface for direct submissions is also available. S/MARt DB is closely linked to the TRANSFAC database on transcription factors and their binding sites. It is freely accessible through the World Wide Web (http://transfac.gbf.de/SMARtDB/) for non-profit research.

Citation for the above abstract:
Liebich, Ines, Bode, Jurgen, Frisch, Matthias, Wingender, Edgar
S/MARt DB: a database on scaffold/matrix attached regions
Nucl. Acids Res. 2002 30: 372-374
© 2002 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/30/1/372



54. STRBase: Short Tandem Repeat DNA Internet DataBase

URL: http://www.cstl.nist.gov/div831/strbase/
Categories: Nucleotide Sequences: Coding and Non-coding DNA Databases

The National Institute of Standards and Technology (NIST) has compiled and maintained a Short Tandem Repeat DNA Internet Database (http://www.cstl.nist.gov/biotech/strbase/) since 1997 commonly referred to as STRBase. This database is an information resource for the forensic DNA typing community with details on commonly used short tandem repeat (STR) DNA markers. STRBase consolidates and organizes the abundant literature on this subject to facilitate on-going efforts in DNA typing. Observed alleles and annotated sequence for each STR locus are described along with a review of STR analysis technologies. Additionally, commercially available STR multiplex kits are described, published polymerase chain reaction (PCR) primer sequences are reported, and validation studies conducted by a number of forensic laboratories are listed. To supplement the technical information, addresses for scientists and hyperlinks to organizations working in this area are available, along with the comprehensive reference list of over 1300 publications on STRs used for DNA typing purposes.

Citation for the above abstract:
Ruitberg, Christian M., Reeder, Dennis J., Butler, John M.
STRBase: a short tandem repeat DNA database for the human identity testing community
Nucl. Acids Res. 2001 29: 320-322
© 2001 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/29/1/320



55. The TIGR Plant Repeat Databases

URL: http://www.tigr.org/tdb/e2k1/plant.repeats/
Categories: General Plant Databases, Nucleotide Sequences: Coding and Non-coding DNA Databases

In a number of higher plants, a substantial portion of the genome is composed of repetitive sequences that can hinder genome annotation and sequencing efforts. To better understand the nature of repetitive sequences in plants and provide a resource for identifying such sequences, we constructed databases of repetitive sequences for 12 plant genera: Arabidopsis, Brassica, Glycine, Hordeum, Lotus, Lycopersicon, Medicago, Oryza, Solanum, Sorghum, Triticum and Zea (www.tigr.org/tdb/e2k1/plant.repeats/index.shtml). The repetitive sequences within each database have been coded into super-classes, classes and sub-classes based on sequence and structure similarity. These databases are available for sequence similarity searches as well as downloadable files either as entire databases or subsets of each database. To further the utility for comparative studies and to provide a resource for searching for repetitive sequences in other genera within these families, repetitive sequences have been combined into four databases to represent the Brassicaceae, Fabaceae, Gramineae and Solanaceae families. Collectively, these databases provide a resource for the identification, classification and analysis of repetitive sequences in plants.

Citation for the above abstract:
Ouyang, Shu, Buell, C. Robin
The TIGR Plant Repeat Databases: a collective resource for the identification of repetitive sequences in plants
Nucl. Acids Res. 2004 32: D360-363
© 2004 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D360



56. UNIVEC

URL: http://www.ncbi.nlm.nih.gov/VecScreen/UniVec.html
Categories: Nucleotide Sequences: Coding and Non-coding DNA Databases

"UniVec is a database that can be used to quickly identify segments within nucleic acid sequences which may be of vector origin (vector contamination). Screening using UniVec is efficient because a large number of redundant subsequences have been eliminated to create a database that contains only one copy of every unique sequence segment from a large number of vectors.

In addition to vector sequences, UniVec also contains sequences for those adapters, linkers, and primers commonly used in the process of cloning cDNA or genomic DNA. This enables contamination with these oligonucleotide sequences to be found during the vector screen.

UniVec can be obtained from the NCBI FTP directory: ftp://ftp.ncbi.nih.gov/pub/UniVec/."



57. UTRdb/UTRsite

URL: http://www.ba.itb.cnr.it/UTR/
Categories: Nucleotide Sequences: Coding and Non-coding DNA Databases, RNA Sequence Databases

The 5' and 3' untranslated regions of eukaryotic mRNAs play crucial roles in the post-transcriptional regulation of gene expression through the modulation of nucleo-cytoplasmic mRNA transport, translation efficiency, subcellular localization and message stability. UTRdb is a curated database of 5' and 3' untranslated sequences of eukaryotic mRNAs, derived from several sources of primary data. Experimentally validated functional motifs are annotated (and also collated as the UTRsite database) and cross-links to genomic and protein data are provided. The integration of UTRdb with genomic and protein data has allowed the implementation of a powerful retrieval resource for the selection and extraction of UTR subsets based on their genomic coordinates and/or features of the protein encoded by the relevant mRNA (e.g. GO term, PFAM domain, etc.). All internet resources implemented for retrieval and functional analysis of 5' and 3' untranslated regions of eukaryotic mRNAs are accessible at http://www.ba.itb.cnr.it/UTR/.

Citation for the above abstract:
Mignone, Flavio, Grillo, Giorgio, Licciulli, Flavio, Iacono, Michele, Liuni, Sabino, Kersey, Paul J., Duarte, Jorge, Saccone, Cecilia, Pesole, Graziano
UTRdb and UTRsite: a collection of sequences and regulatory motifs of the untranslated regions of eukaryotic mRNAs
Nucl. Acids Res. 2005 33: D141-146
© 2005 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D141



58. VectorDB: Molecular Biology Vector Sequence Database

URL: http://seq.yeastgenome.org/vectordb/
Categories: Nucleotide Sequences: Coding and Non-coding DNA Databases

"Space for VectorDB was provided by the Saccharomyces Genome Database (SGD) project. VectorDB contains annotations and sequence information for many vectors commonly used in molecular biology. Information for more than 2600 vectors is available with search facilities. Vectors which are also in GenBank have direct links to that database via NCBI's Entrez browser!"



59. ASAP: the Alternative Splicing Annotation Project

URL: http://www.bioinformatics.ucla.edu/ASAP/
Categories: Nucleotide Sequences: Gene Structure, Introns and Exons, & Splice Sites Databases

Recently, genomics analyses have demonstrated that alternative splicing is widespread in mammalian genomes (30–60% of genes reported to have multiple isoforms), and may be one of their most important mechanisms of functional regulation. However, by comparison with other genomics data such as genome annotation, SNPs, or gene expression, there exists relatively little database infrastructure for the study of alternative splicing. We have constructed an online database ASAP (the Alternative Splicing Annotation Project) for biologists to access and mine the enormous wealth of alternative splicing information coming from genomics and proteomics. ASAP is based on genome-wide analyses of alternative splicing in human (30 793 alternative splice relationships found) from detailed alignment of expressed sequences onto the genomic sequence. ASAP provides precise gene exon–intron structure, alternative splicing, tissue specificity of alternative splice forms, and protein isoform sequences resulting from alternative splicing. Moreover, it can help biologists design probe sequences for distinguishing specific mRNA isoforms. ASAP is intended to be a community resource for collaborative annotation of alternative splice forms, their regulation, and biological functions. The URL for ASAP is http://www.bioinformatics.ucla.edu/ASAP.

Citation for the above abstract:
Lee, Christopher, Atanelov, Levan, Modrek, Barmak, Xing, Yi
ASAP: the Alternative Splicing Annotation Project
Nucl. Acids Res. 2003 31: 101-105
© 2003 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/101



60. ASD: Alternative Splicing Database

URL: http://www.ebi.ac.uk/asd/
Categories: Nucleotide Sequences: Gene Structure, Introns and Exons, & Splice Sites Databases

Alternative splicing is an important regulatory mechanism of mammalian gene expression. The alternative splicing database (ASD) consortium is systematically collecting and annotating data on alternative splicing. We present the continuation and upgrade of the ASD [T. A. Thanaraj, S. Stamm, F. Clark, J. J. Riethoven, V. Le Texier, J. Muilu (2004) Nucleic Acids Res. 32, D64–D69] that consists of computationally and manually generated data. Its largest part is AltSplice, a value-added database of computationally delineated alternative splicing events. Its data include alternatively spliced introns/exons, events, isoform splicing patterns and isoform peptide sequences. AltSplice data are generated by examining gene-transcript alignments. The data are annotated for various biological features including splicing signals, expression states, (SNP)-mediated splicing and cross-species conservation. AEdb forms the manually curated component of ASD. It is a literature-based data set containing sequence and properties of alternatively spliced exons, functional enumeration of observed splicing events, characterization of observed splicing regulatory elements, and a collection of experimentally clarified minigene constructs. ASD includes a workbench, which is an analysis tool that enables users to carry out splicing related analysis such as characterization of introns for various splicing signals, identification of splicing regulatory elements on a given RNA sequence, prediction of putative exons and prediction of putative translation start codons. The different ASD modules are integrated and can be accessed through user-friendly interfaces and visualization tools. ASD data has been integrated with Ensembl genome annotation project as a Distributed Annotation System (DAS) resource and can be viewed on Ensembl genome browser. The ASD resource is presented at (http://www.ebi.ac.uk/asd).

Citation for the above abstract:
Stamm, Stefan, Riethoven, Jean-Jack, Le Texier, Vincent, Gopalakrishnan, Chellappa, Kumanduri, Vasudev, Tang, Yesheng, Barbosa-Morais, Nuno L., Thanaraj, Thangavel Alphonse
ASD: a bioinformatics resource on alternative splicing
Nucl. Acids Res. 2006 34: D46-55
© 2006 Oxford University Press.


The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D46



61. ASDB: Alternative Splicing Database

URL: http://hazelton.lbl.gov/~teplitski/alt/
Categories: Human ORFs, Nucleotide Sequences: Gene Structure, Introns and Exons, & Splice Sites Databases

Version 2.1 of ASDB (Alternative Splicing Data Base) contains 1922 protein and 2486 DNA sequences. The protein entries from SWISS-PROT are joined into clusters corresponding to alternatively spliced variants of one gene. The DNA division consists of complete genes with alternative splicing mentioned or annotated in GenBank. The search engine allows one to search over SWISS-PROT and GenBank fields and then follow the links to all variants.

Citation for the above abstract:
Dralyuk, I., Brudno, M., Gelfand, M. S., Zorn, M., Dubchak, I.
ASDB: database of alternatively spliced genes
Nucl. Acids Res. 2000 28: 296-297
© 2000 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/28/1/296



62. ASHESdb: Alternatively Spliced Human genes by Exon Skipping - A Database

URL: http://sege.ntu.edu.sg/wester/ashes/
Categories: Nucleotide Sequences: Gene Structure, Introns and Exons, & Splice Sites Databases

"Alternative splicing is the major contributor to protein diversity in human. Some genes can generate as many as thousand protein isoforms by alternative splicing. The mechanism of alternative splicing in normal and diseased states is perplexing. Differential joining of exons during alternative splicing is important in detecting genetic disorders. Alternative splicing is reported to regulate the sub-cellular localization of divalent metal transporter 1 isoforms and the NMDA R1 receptor gene. Therefore, a comprehensive knowledge on alternative splicing (mechanism and combinatorial protein diversity) is critical in efficient gene discovery and target validation. Alternative splicing can change the mRNA product in several ways. At its simplest level, an exon can be removed (exon skip), lengthened or shortened (alternative 5' or 3' splicing).

However, identification of splice variants remains tricky and arduous mainly due to large intervening sequences and lack of tissue specific cDNA sequence data. As can be seen majority of currently known splice variants are identified using EST and EST coverage in the protein coding sequence of many genes is still inadequate to predict splicing to a large extent. Moreover, there are limitations in accuracy resulting from the single-pass sequencing that has been used to identify ESTs. In this database, we describe alternatively spliced (exon skipping) human genes identified strictly using full-length cDNA sequences (MGC). This novel approach makes the detection of splice variants more reliable and accurate. This circumvents the greatest challenges in using EST databases to understand alternative splicing and thereby facilitates the task of comprehending the relationships of these short EST sequences to each other and to other genes.

The database integrates a variety of data for each gene ranging from gene map, gene structure, splice variants and tissue information. Information on mouse orthologs showing exon-skipping patterns for these genes is also provided. This database can be used to study the impact of alternative splicing on protein function and could be a useful resource to researchers who have found a new cDNA or human gene and wish to find additional information."



63. EASED: Extended Alternatively Spliced EST Database

URL: http://www.bioinf.mdc-berlin.de/splice/db/
Categories: Nucleotide Sequences: Gene Structure, Introns and Exons, & Splice Sites Databases

We established a database of alternative splice forms (ASforms) for nine eukaryotic organisms. ASforms are defined by comparing high-scoring ESTs with mRNA sequences using BLAST, taking known exon–intron information (from the Ensembl database). Filtering programs compare the ends of each aligned sequence pair for deletions or insertions in the EST sequence, which indicate the existence of alternative splice forms with respect to the exon–intron boundaries. Moreover, we defined the alternative splice profile of each human sequence. It indicates the number of alternatively spliced ESTs (NAE), the number of constitutively spliced ESTs (NCE) as well as the number of alternative splice sites (NSS) per mRNA. NAE and NCE correspond to the EST coverage and can be used as a quality indicator for the predicted alternative splice variants. The NSS value specifies the splice propensity of a gene. Additionally, the tissue type information of all ESTs was included. This allows (i) restriction of the search to certain tissues and (ii) calculation of the tissue-NAEs, tissue-NCEs and tissue-NSS. These scores are suitable for the estimation of tissue specificity of certain ASforms. Furthermore, the developmental stage and disease information of the ESTs is available. EASED is accessible at http://eased.bioinf.mdc-berlin.de/.

Citation for the above abstract:
Pospisil, Heike, Herrmann, Alexander, Bortfeldt, Ralf H., Reich, Jens G.
EASED: Extended Alternatively Spliced EST Database
Nucl. Acids Res. 2004 32: D70-74
© 2004 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D70
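
The splice profile described in the EASED abstract (NAE, NCE and NSS per mRNA, optionally restricted to one tissue) can be illustrated with a small Python sketch. This is not EASED's own code: the EstAlignment record, its field names and the example positions are hypothetical, and the sketch assumes that the filtering step has already reduced each EST alignment to the set of mRNA positions where it deviates at an exon-intron boundary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EstAlignment:
    """Hypothetical record for one EST aligned to one mRNA."""
    est_id: str
    tissue: str
    # mRNA positions where the EST shows an insertion or deletion at an
    # exon-intron boundary; empty for a constitutively spliced EST.
    alt_sites: frozenset


def splice_profile(alignments, tissue=None):
    """Count NAE, NCE and NSS for one mRNA, optionally for a single tissue."""
    selected = [a for a in alignments if tissue is None or a.tissue == tissue]
    nae = sum(1 for a in selected if a.alt_sites)      # alternatively spliced ESTs
    nce = sum(1 for a in selected if not a.alt_sites)  # constitutively spliced ESTs
    # Distinct alternative splice sites seen across the selected ESTs.
    nss = len(set().union(*(a.alt_sites for a in selected)))
    return {"NAE": nae, "NCE": nce, "NSS": nss}


# Made-up alignments for a single mRNA.
alignments = [
    EstAlignment("est1", "brain", frozenset({120})),
    EstAlignment("est2", "brain", frozenset()),
    EstAlignment("est3", "liver", frozenset({120, 340})),
    EstAlignment("est4", "liver", frozenset()),
]

overall = splice_profile(alignments)
brain_only = splice_profile(alignments, tissue="brain")
```

Passing a tissue name gives the tissue-NAE, tissue-NCE and tissue-NSS counterparts mentioned in the abstract, under the same assumptions.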



64. ECgene: Gene Modeling with Alternative Splicing

URL: http://genome.ewha.ac.kr/ECgene/
Categories: Nucleotide Sequences: Gene Structure, Introns and Exons, & Splice Sites Databases

ECgene provides annotation for gene structure, function and expression, taking alternative splicing events into consideration. The gene-modeling algorithm combines the genome-based expressed sequence tag (EST) clustering and graph-theoretic transcript assembly procedures. The website provides several viewers and applications that have many unique features useful for the analysis of the transcript structure and gene expression. The summary viewer shows the gene summary and the essence of other annotation programs. The genome browser and the transcript viewer are available for comparing the gene structure of splice variants. Changes in the functional domains by alternative splicing can be seen at a glance in the transcript viewer. We also provide two unique ways of analyzing gene expression. The SAGE tags deduced from the assembled transcripts are used to delineate quantitative expression patterns from SAGE libraries available publicly. Furthermore, the cDNA libraries of EST sequences in each cluster are used to infer qualitative expression patterns. It should be noted that the ECgene website provides annotation for the whole transcriptome, not just the alternatively spliced genes. Currently, ECgene supports the human, mouse and rat genomes. The ECgene suite of tools and programs is available at http://genome.ewha.ac.kr/ECgene/.

Citation for the above abstract:
© 2004 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D75



65. EDAS: EST-Derived Alternative Splicing Database

URL: http://www.genebee.msu.su/edas/
Categories: Nucleotide Sequences: Gene Structure, Introns and Exons, & Splice Sites Databases





66. ExInt: an Exon Intron Database

URL: http://sege.ntu.edu.sg/wester/exint/
Categories: Nucleotide Sequences: Gene Structure, Introns and Exons, & Splice Sites Databases

The Exon/Intron Database (ExInt) stores information of all GenBank eukaryotic entries containing an annotated intron sequence. Data are available through a retrieval system, as flat-files and as a MySQL dump file. In this report we discuss several implementations added to ExInt, which is accessible at http://intron.bic.nus.edu.sg/exint/newexint/exint.html.

Citation for the above abstract:
Sakharkar, M., Passetti, F., de Souza, J. E., Long, M., de Souza, S. J.
ExInt: an Exon Intron Database
Nucl. Acids Res. 2002 30: 191-194
© 2002 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/30/1/191



67. FESD: a Functional Element SNPs Database

URL: http://combio.kribb.re.kr/FESD/
Categories: Nucleotide Sequences: Gene Structure, Introns and Exons, & Splice Sites Databases

We have created the Functional Element SNPs Database (FESD) that categorizes functional elements in human genic regions and provides a set of single nucleotide polymorphisms (SNPs) located within each area. In the FESD, the human genic regions were divided into 10 different functional elements, such as promoter regions, CpG islands, 5'-untranslated regions (5'-UTRs), translation start sites, splice sites, coding exons, introns, translation stop sites, polyadenylation signals and 3'-UTRs, and subsequently, all the known SNPs were assigned to each functional element at their respective position. With the FESD web interface, users can select a set of SNPs in the specific functional elements and get their flanking sequences for genotyping experiments, which will help in finding mutations that contribute to the common and polygenic diseases. A web interface for the FESD is freely available at http://combio.kribb.re.kr/ksnp/resd/.

Citation for the above abstract:
Kang, Hyo Jin, Choi, Kyoung Oak, Kim, Byung-Dong, Kim, Sangsoo, Kim, Young Joo
FESD: a Functional Element SNPs Database in human
Nucl. Acids Res. 2005 33: D518-522
© 2005 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D518



68. FUGOID: Functional Genomics of Organellar Introns Database

URL: http://web.austin.utexas.edu/fugoid/introndata/main.htm
Categories: Nucleotide Sequences: Gene Structure, Introns and Exons, & Splice Sites Databases, Organelle Databases

FUGOID is a web-based, taxonomically broad organelle intron database that collects and integrates various functional and structural data on organellar (mitochondrial and chloroplast) introns. The main information provided by FUGOID includes intron sequence, subclass, resident ORF, self-splicing capability, host gene, protein factor(s) involved in splicing, mobility, insertion site, twintron, seminal references and taxonomic position of host organism. It is implemented in a relational database management system, allowing sophisticated, user-friendly searching, data entry and revision. Users can access the database by any common web browser using a variety of operating systems. The main page of the database is available at http://wnt.cc.utexas.edu/~ifmr530/introndata/main.htm.

Citation for the above abstract:
Li, Fei, Herrin, David L.
FUGOID: functional genomics of organellar introns database
Nucl. Acids Res. 2002 30: 385-386
© 2002 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/30/1/385



69. HS3D: Homo Sapiens Splice Sites Dataset

URL: http://www.sci.unisannio.it/docenti/rampone/
Categories: Human ORFs, Nucleotide Sequences: Gene Structure, Introns and Exons, & Splice Sites Databases

"HS3D (Homo Sapiens Splice Sites Dataset) is a data set of Homo Sapiens Exon, Intron and Splice regions extracted from GenBank Rel.123.

The aim of this data set is to give standardized material to train and to assess the prediction accuracy of computational approaches for gene identification and characterization.

From the complete GenBank (Primate Sequences Division) Rel.123 (162,557 entries), entries of Human Nuclear DNA including Complete CDS and more than one Exon have been selected, and 4523 exons and 3802 introns have been extracted from these entries.

Details about extracted exons and introns are reported (Locus, number, Start and End position in the entry, sequence, length, G+C content, presence of not AGCT data (nucleotide scan check)).

Statistics are also reported (overall nucleotides, average G+C content, nucleotide scan check results, number of not GT starting / AG ending introns, minimum / maximum / average length, length standard deviation)."
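
The per-sequence details and summary statistics listed in the HS3D description (G+C content, the scan for non-AGCT symbols, and the count of introns that do not start with GT and end with AG) can be sketched in Python as follows. The function names and example sequences are illustrative assumptions, not part of HS3D, and the sketch assumes G+C content is computed over the unambiguous A, C, G and T bases only, which the description does not specify.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous (A, C, G, T) bases."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def has_non_acgt(seq: str) -> bool:
    """Nucleotide scan check: True if any symbol is not A, C, G or T."""
    return any(base not in "ACGT" for base in seq.upper())


def is_gt_ag(intron: str) -> bool:
    """True if the intron starts with GT and ends with AG."""
    intron = intron.upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


# Made-up intron sequences, for illustration only.
introns = ["GTAAGTCCCTTTCAG", "GCAAGTNNTTTCAG", "ATATCCTTTTAC"]

summary = {
    "mean_gc": sum(gc_content(s) for s in introns) / len(introns),
    "non_gt_ag": sum(1 for s in introns if not is_gt_ag(s)),
    "with_non_acgt": sum(1 for s in introns if has_non_acgt(s)),
    "min_length": min(len(s) for s in introns),
    "max_length": max(len(s) for s in introns),
}
```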



70. The Intronerator

URL: http://www.cse.ucsc.edu/~kent/intronerator/
Categories: Invertebrate Databases, Nucleotide Sequences: Gene Structure, Introns and Exons, & Splice Sites Databases, RNA Sequence Databases

The Intronerator (http://www.cse.ucsc.edu/~kent/intronerator/) is a set of web-based tools for exploring RNA splicing and gene structure in Caenorhabditis elegans. It includes a display of cDNA alignments with the genomic sequence, a catalog of alternatively spliced genes and a database of introns. The cDNA alignments include >100 000 ESTs and almost 1000 full-length cDNAs. ESTs from embryos and mixed stage animals as well as full-length cDNAs can be compared in the alignment display with each other and with predicted genes. The alt-splicing catalog includes 844 open reading frames for which there is evidence of alternative splicing of pre-mRNA. The intron database includes 28 478 introns, and can be searched for patterns near the splice junctions.

Citation for the above abstract:
Kent, W. James, Zahler, Alan M.
The Intronerator: exploring introns and alternative splicing in Caenorhabditis elegans
Nucl. Acids Res. 2000 28: 91-93
© 2000 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/28/1/91



71. SpliceDB

URL: http://www.softberry.com/berry.phtml?topic=splicedb&group=data&subgroup=spldb
Categories: Nucleotide Sequences: Gene Structure, Introns and Exons, & Splice Sites Databases

A database (SpliceDB) of known mammalian splice site sequences has been developed. We extracted 43 337 splice pairs from mammalian divisions of the gene-centered Infogene database, including sites from incomplete or alternatively spliced genes. Known EST sequences supported 22 815 of them. After discarding sequences with putative errors and ambiguous location of splice junctions the verified dataset includes 22 489 entries. Of these, 98.71% contain canonical GT–AG junctions (22 199 entries) and 0.56% have non-canonical GC–AG splice site pairs. The remainder (0.73%) occurs in a lot of small groups (with a maximum size of 0.05%). We especially studied non-canonical splice sites, which comprise 3.73% of GenBank annotated splice pairs. EST alignments allowed us to verify only the exonic part of splice sites. To check the conservative dinucleotides we compared sequences of human non-canonical splice sites with sequences from the high throughput genome sequencing project (HTG). Out of 171 human non-canonical and EST-supported splice pairs, 156 (91.23%) had a clear match in the human HTG. They can be classified after sequence analysis as: 79 GC–AG pairs (of which one was an error that corrected to GC–AG), 61 errors corrected to GT–AG canonical pairs, six AT–AC pairs (of which two were errors corrected to AT–AC), one case was produced from a non-existent intron, seven cases were found in HTG that were deposited to GenBank and finally there were only two other cases left of supported non-canonical splice pairs. The information about verified splice site sequences for canonical and non-canonical sites is presented in SpliceDB with the supporting evidence. We also built weight matrices for the major splice groups, which can be incorporated into gene prediction programs. SpliceDB is available at the computational genomic Web server of the Sanger Centre: http://genomic.sanger.ac.uk/spldb/SpliceDB.html and at http://www.softberry.com/spldb/SpliceDB.html.

Citation for the above abstract:
Burset, M., Seledtsov, I. A., Solovyev, V. V.
SpliceDB: database of canonical and non-canonical mammalian splice sites
Nucl. Acids Res. 2001 29: 255-259
© 2001 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/29/1/255
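
The figures quoted in the SpliceDB abstract can be cross-checked with a few lines of Python, and the splice-pair groups it names (GT-AG, GC-AG, AT-AC) can be told apart from the first and last two bases of an intron. The function below is an illustrative sketch, not SpliceDB's own classification code; only the numbers are taken from the abstract.

```python
def classify_splice_pair(intron: str) -> str:
    """Label an intron by its donor/acceptor dinucleotides."""
    intron = intron.upper()
    if len(intron) < 4:
        raise ValueError("intron too short to hold both dinucleotides")
    pair = f"{intron[:2]}-{intron[-2:]}"
    if pair == "GT-AG":
        return "canonical GT-AG"
    if pair in ("GC-AG", "AT-AC"):
        return f"non-canonical {pair}"
    return f"other ({pair})"


# Figures quoted in the abstract.
verified_entries = 22489
gt_ag_entries = 22199
gt_ag_percent = round(100 * gt_ag_entries / verified_entries, 2)

htg_checked = 171
htg_matched = 156
htg_percent = round(100 * htg_matched / htg_checked, 2)

# Breakdown of the HTG-matched pairs: GC-AG, errors corrected to GT-AG,
# AT-AC, non-existent intron, found in HTG deposited to GenBank, other.
breakdown = [79, 61, 6, 1, 7, 2]
breakdown_total = sum(breakdown)
```

The two computed percentages round to 98.71 and 91.23, and the six categories add up to the 156 matched pairs, so the figures in the abstract are internally consistent.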



72. SpliceInfo: An Information Repository for mRNA Alternative Splicing in Human Genome

URL: http://spliceinfo.mbc.nctu.edu.tw/
Categories: Nucleotide Sequences: Gene Structure, Introns and Exons, & Splice Sites Databases

We have developed an information repository named SpliceInfo to collect the occurrences of the four major alternative-splicing (AS) modes in human genome; these include exon skipping, 5'-alternative splicing, 3'-alternative splicing and intron retention. The dataset is derived by comparing the nucleotide and protein sequences available for a given gene for evidence of AS. Additional features such as the tissue specificity of the mRNA, the protein domain contained by exons, the GC-ratio of exons, the repeats contained within the exons, and the Gene Ontology are annotated computationally for each exonic region that is alternatively spliced. Motivated by a previous investigation of AS-related motifs such as exonic splicing enhancer and exonic splicing silencer, this resource also provides a means of identifying motifs candidates and this should help to identify potential regulatory mechanisms within a particular exonic sequence set and its two flanking intronic sequence sets. This is carried out using motif discovery tools to identify motif candidates related to alternative splicing regulation and together with a secondary structure prediction tool, will help in the identification of the structural properties of such regulatory motifs. The integrated resource is now available on http://SpliceInfo.mbc.NCTU.edu.tw/.

Citation for the above abstract:
Huang, Hsien-Da, Horng, Jorng-Tzong, Lin, Feng-Mao, Chang, Yu-Chung, Huang, Chen-Chia
SpliceInfo: an information repository for mRNA alternative splicing in human genome
Nucl. Acids Res. 2005 33: D80-85
© 2005 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D80



73. SpliceNest

URL: http://splicenest.molgen.mpg.de/
Categories: Nucleotide Sequences: Gene Structure, Introns and Exons, & Splice Sites Databases

We have integrated the protein families from SYSTERS and the expressed sequence tag (EST) clusters from our database GeneNest with SpliceNest, a new database mapping EST contigs into genomic DNA. The SYSTERS protein sequence cluster set provides an automatically generated classification of all sequences of the SWISS-PROT, TrEMBL and PIR databases into disjoint protein family and superfamily clusters. GeneNest is a database and software package for producing and visualizing gene indices from ESTs and mRNAs. Currently, the database comprises gene indices of human, mouse, Arabidopsis thaliana and zebrafish. SpliceNest is a web-based graphical tool to explore gene structure, including alternative splicing, based on a mapping of the EST consensus sequences from GeneNest to the complete human genome. The integration of SYSTERS, GeneNest and SpliceNest into one framework now permits an overall exploration of the whole sequence space covering protein, mRNA and EST sequences, as well as genomic DNA. The databases are available for querying and browsing at http://cmb.molgen.mpg.de.

Citation for the above abstract:
Krause, Antje, Haas, Stefan A., Coward, Eivind, Vingron, Martin
SYSTERS, GeneNest, SpliceNest: exploring sequence space from genome to protein
Nucl. Acids Res. 2002 30: 299-300
© 2002 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/30/1/299



74. Xpro: Database of Eukaryotic Protein Encoding Genes

URL: http://origin.bic.nus.edu.sg/xpro/
Categories: Model Organisms and Comparative Genomics Databases, Nucleotide Sequences: Gene Structure, Introns and Exons, & Splice Sites Databases

Xpro is a relational database that contains all the eukaryotic protein-encoding DNA sequences contained in GenBank with associated data required for the analysis of eukaryotic gene architecture. In addition to the information found in the GenBank records, which includes properties such as sequence, position, length and description about introns, exons and protein-coding regions, Xpro provides annotations on the splice sites and intron phases. Furthermore, Xpro validates intron positions using alignment information between the record’s sequence and EST sequences found in dbEST. In the process of validation, alternative splicing information is also obtained and can be found in the database. The intron-containing genes in the Xpro are also classified as experimental or predicted based on the intron position validation and specific keywords in the GenBank records that are present in predicted genes. An Entrez-like query system, which is familiar to most biologists, is provided for accessing the information present in the database system. A non-redundant set of Xpro database contents is also obtained by cross-referencing to the Swiss-Prot/TrEMBL and Pfam databases. The database currently contains information for 493 983 genes—351 918 intron-containing genes and 142 065 intron-less genes. Xpro is updated for each new GenBank release and is freely available via the internet at http://origin.bic.nus.edu.sg/xpro.

Citation for the above abstract:
Gopalan, Vivek, Tan, Tin Wee, Lee, Bernett T. K., Ranganathan, Shoba
Xpro: database of eukaryotic protein-encoding genes
Nucl. Acids Res. 2004 32: D59-63
© 2004 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D59
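
The intron phase annotation mentioned in the Xpro abstract can be illustrated with a short sketch. It uses the standard definition of phase (the number of coding bases preceding the intron, modulo 3) and is not taken from Xpro itself; the function name and the example exon lengths are assumptions made for illustration.

```python
def intron_phases(coding_exon_lengths):
    """Return the phase (0, 1 or 2) of each intron in a gene.

    coding_exon_lengths lists the number of protein-coding bases in each
    exon, in 5' to 3' order. An intron of phase 0 lies between two codons,
    phase 1 after the first base of a codon, phase 2 after the second.
    """
    phases = []
    coding_bases_so_far = 0
    # There is one intron after every exon except the last.
    for length in coding_exon_lengths[:-1]:
        coding_bases_so_far += length
        phases.append(coding_bases_so_far % 3)
    return phases


# Hypothetical gene with three coding exons of 100, 50 and 30 bases.
print(intron_phases([100, 50, 30]))  # [1, 0]
```

A gene with a single coding exon has no introns, so the function returns an empty list for it.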



75. Ares lab Yeast Intron Database

URL: http://www.cse.ucsc.edu/research/compbio/yeast_introns.html
Categories: Fungal Genome Databases, Nucleotide Sequences: Gene Structure, Introns and Exons, & Splice Sites Databases

"This site contains information about the spliceosomal introns of the yeast Saccharomyces cerevisiae. Introns present special problems for the annotation of eukaryotic genomes. Splice sites are information-poor, and their recognition by the splicing apparatus is highly context-dependent and regulated, making identification by computational gene prediction programs a challenge. At present we do not understand splice site context well enough to predict which potential splice sites will be used, and thus how the genomic sequences will be expressed.

Understanding the how and why of introns will require genome level information about splicing. One element of this will involve understanding splicing patterns and how they are regulated globally. Another element will involve understanding how splicing patterns change during evolution. To begin we study yeast, since it has the simplest known eukaryotic genome. In these pages we have listed known spliceosomal introns in the yeast genome and documented the splice sites actually used. Through the use of microarrays designed to monitor splicing, we are beginning to identify and analyze splice site context in terms of the nature and activities of the trans-acting factors that mediate splice site recognition.

In this edition (version 3.0), we include expression data that relates to the efficiency of splicing relative to other processes in strains of yeast lacking nonessential splicing factors. These data are displayed on each intron page for browsing and can be downloaded for other types of analysis."



76. AAindex: Amino Acid Index Database

URL: http://www.genome.jp/aaindex/
Categories: Protein Property Databases

AAindex is a database of amino acid indices and amino acid mutation matrices. An amino acid index is a set of 20 numerical values representing various physico-chemical and biochemical properties of amino acids. An amino acid mutation matrix is generally 20 x 20 numerical values representing similarity of amino acids. AAindex consists of two sections: AAindex1 for the collection of published amino acid indices and AAindex2 for the collection of published amino acid mutation matrices. Each entry of either AAindex1 or AAindex2 consists of the definition, the reference information, a list of related entries in terms of the correlation coefficient and the actual data. The database may be accessed through the DBGET/LinkDB system at GenomeNet (http://www.genome.ad.jp/aaindex/) or may be downloaded by anonymous FTP (ftp://ftp.genome.ad.jp/db/genomenet/aaindex/).

Citation for the above abstract:
Kawashima, Shuichi, Kanehisa, Minoru
AAindex: Amino Acid index database
Nucl. Acids Res. 2000 28: 374-
© 2000 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/28/1/374
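
As a rough illustration of what the AAindex abstract describes, the sketch below stores one amino acid index as a mapping of 20 values and computes a Pearson correlation coefficient between two indices, which is one plausible reading of "related entries in terms of the correlation coefficient". The values are the Kyte-Doolittle hydropathy scale; the second index is a toy constructed here (the same scale with the sign flipped), and the function name is an assumption, not part of AAindex.

```python
import math

# Kyte-Doolittle hydropathy scale: one value for each of the 20 amino acids.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Toy second index for illustration only: hydropathy with the sign flipped.
HYDROPHILICITY_TOY = {residue: -value for residue, value in HYDROPATHY.items()}


def pearson(index_a, index_b):
    """Pearson correlation coefficient between two amino acid indices."""
    residues = sorted(index_a)
    if residues != sorted(index_b) or len(residues) != 20:
        raise ValueError("both indices must cover the same 20 amino acids")
    xs = [index_a[r] for r in residues]
    ys = [index_b[r] for r in residues]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(var_x * var_y)


print(round(pearson(HYDROPATHY, HYDROPATHY), 6))          # 1.0
print(round(pearson(HYDROPATHY, HYDROPHILICITY_TOY), 6))  # -1.0
```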



77. PFD: Protein Folding Database

URL: http://pfd.med.monash.edu.au/
Categories: Protein Property Databases, Protein Structure Databases

We have developed a new database that collects all protein folding data into a single, easily accessible public resource. The Protein Folding Database (PFD) contains annotated structural, methodological, kinetic and thermodynamic data for more than 50 proteins, from 39 families. A user-friendly web interface has been developed that allows powerful searching, browsing and information retrieval, whilst providing links to other protein databases. The database structure allows visualization of folding data in a useful and novel way, with a long-term aim of facilitating data mining and bioinformatics approaches. PFD can be accessed freely at http://pfd.med.monash.edu.au.

Citation for the above abstract:
Fulton, Kate F., Devlin, Glyn L., Jodun, Rachel A., Silvestri, Linda, Bottomley, Stephen P., Fersht, Alan R., Buckle, Ashley M.
PFD: a database for the investigation of protein folding kinetics and stability
Nucl. Acids Res. 2005 33: D279-283
© 2005 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D279



78. ProTherm Thermodynamic Database for Proteins and Mutants

URL: http://gibk26.bse.kyutech.ac.jp/jouhou/protherm/protherm.html
Categories: Protein Property Databases

ProTherm and ProNIT are two thermodynamic databases that contain experimentally determined thermodynamic parameters of protein stability and protein-nucleic acid interactions, respectively. The current versions of both the databases have considerably increased the total number of entries and enhanced search interface with added new fields, improved search, display and sorting options. As of September 2005, ProTherm release 5.0 contains 17,113 entries from 771 proteins, retrieved from 1497 scientific articles (approximately 20% increase in data from the previous version). ProNIT release 2.0 contains 4900 entries from 273 research articles, representing 158 proteins. Both databases can be queried using WWW interfaces. Both quick search and advanced search are provided on this web page to facilitate easy retrieval and display of the data from these databases. ProTherm is freely available online at http://gibk26.bse.kyutech.ac.jp/jouhou/Protherm/protherm.html and ProNIT at http://gibk26.bse.kyutech.ac.jp/jouhou/pronit/pronit.html.

Citation for the above abstract:
Kumar, M. D. Shaji, Bava, K. Abdulla, Gromiha, M. Michael, Prabakaran, Ponraj, Kitajima, Koji, Uedaira, Hatsuho, Sarai, Akinori
ProTherm and ProNIT: thermodynamic databases for proteins and protein-nucleic acid interactions
Nucl. Acids Res. 2006 34: D204-206
© 2006 Oxford University Press.


The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D204



79. REFOLD: a Database for Protein Renaturation

URL: http://refold.med.monash.edu.au/
Categories: Protein Property Databases

A large proportion of proteins expressed in Escherichia coli form inclusion bodies and thus require renaturation to attain a functional conformation for analysis. In this process, identifying and optimizing the refolding conditions and methodology is often rate limiting. In order to address this problem, we have developed REFOLD, a web-accessible relational database containing the published methods employed in the refolding of recombinant proteins. Currently, REFOLD contains >300 entries, which are heavily annotated such that the database can be searched via multiple parameters. We anticipate that REFOLD will continue to grow and eventually become a powerful tool for the optimization of protein renaturation. REFOLD is freely available at http://refold.med.monash.edu.au.

Citation for the above abstract:
Chow, Michelle K. M., Amin, Abdullah A., Fulton, Kate F., Fernando, Thushan, Kamau, Lawrence, Batty, Chris, Louca, Michael, Ho, Storm, Whisstock, James C., Bottomley, Stephen P., Buckle, Ashley M.
The REFOLD database: a tool for the optimization of protein expression and refolding
Nucl. Acids Res. 2006 34: D207-212
© 2006 Oxford University Press.


The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D207



80. LOCATE: a mouse protein subcellular localization database

URL: http://locate.imb.uq.edu.au/
Categories: Protein Localization and Targeting Databases

We present here LOCATE, a curated, web-accessible database that houses data describing the membrane organization and subcellular localization of proteins from the FANTOM3 Isoform Protein Sequence set. Membrane organization is predicted by the high-throughput, computational pipeline MemO. The subcellular locations of selected proteins from this set were determined by a high-throughput, immunofluorescence-based assay and by manually reviewing >1700 peer-reviewed publications. LOCATE represents the first effort to catalogue the experimentally verified subcellular location and membrane organization of mammalian proteins using a high-throughput approach and provides localization data for 40% of the mouse proteome. It is available at http://locate.imb.uq.edu.au.

Citation for the above abstract:
Fink, J. Lynn, Aturaliya, Rajith N., Davis, Melissa J., Zhang, Fasheng, Hanson, Kelly, Teasdale, Melvena S., Kai, Chikatoshi, Kawai, Jun, Carninci, Piero, Hayashizaki, Yoshihide, Teasdale, Rohan D.
LOCATE: a mouse protein subcellular localization database
Nucl. Acids Res. 2006 34: D213-217
© 2006 Oxford University Press.


The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D213



81. Proteome 2D-PAGE Database

URL: http://www.mpiib-berlin.mpg.de/2D-PAGE/
Categories: Proteomics Databases

"The Proteome 2D-PAGE Database is a curated database for storing and investigating proteomics data. The database currently contains about 2.500 identified spots and about 300 mass peaklists in 18 reference maps representing experiments from 13 different organisms."



82. Biozon

URL: http://biozon.org/
Categories: Protein Domain and Protein Classification Databases, Proteomics Databases

Biological entities are strongly related and mutually dependent on each other. Therefore, there is a growing need to corroborate and integrate data from different resources and aspects of biological systems in order to analyze them effectively. Biozon is a unified biological database that integrates heterogeneous data types such as proteins, structures, domain families, protein-protein interactions and cellular pathways, and establishes the relationships between them. All data are integrated on to a single graph schema centered around the non-redundant set of biological objects that are shared by each source. This integration results in a highly connected graph structure that provides a more complete picture of the known context of a given object that cannot be determined from any one source. Currently, Biozon integrates roughly 2 million protein sequences, 42 million DNA or RNA sequences, 32,000 protein structures, 150,000 interactions and more from sources such as GenBank, UniProt, Protein Data Bank (PDB) and BIND. Biozon augments source data with locally derived data such as 5 billion pairwise protein alignments and 8 million structural alignments. The user may form complex cross-type queries on the graph structure, add similarity relations to form fuzzy queries and rank the results based on analysis of the edge structure similar to Google PageRank, online at Biozon.org.

Citation for the above abstract:
Birkland, Aaron, Yona, Golan
BIOZON: a hub of heterogeneous biological data
Nucl. Acids Res. 2006 34: D235-242
© 2006 Oxford University Press.


The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D235
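
The Biozon abstract mentions ranking results by analysis of the edge structure "similar to Google PageRank". The sketch below is a textbook power-iteration PageRank on a tiny typed graph, offered only to make that idea concrete. It is not Biozon's ranking algorithm, and the node names, damping factor and iteration count are all assumptions.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Plain power-iteration PageRank over a list of (source, target) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # Every node receives the same small "teleport" share.
        new_rank = {node: (1.0 - damping) / len(nodes) for node in nodes}
        for source in nodes:
            # A node without outgoing edges spreads its rank over all nodes.
            targets = out_links[source] or nodes
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical objects of different types linked in one graph.
edges = [
    ("seq:A", "struct:X"),
    ("seq:B", "struct:X"),
    ("struct:X", "family:F"),
    ("family:F", "seq:A"),
]
ranks = pagerank(edges)
for node in sorted(ranks, key=ranks.get, reverse=True):
    print(node, round(ranks[node], 3))
```

In this toy graph the structure that two sequences point to ends up ranked highest, and the sequence with no incoming edges ranks lowest.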



83. DynaProt 2D: Proteome Database of Lactococcus lactis

URL: http://www.wzw.tum.de/proteomik/lactis/
Categories: Proteomics Databases

DynaProt 2D presents an advanced online database for dynamic access to proteomes and two-dimensional (2D) gels. The database was designed to administer complete in silico proteomes and links them with experimental proteomic data in the manner of 2D electrophoresis gels (IPG-Dalt). The 2D gels serve as reference maps in 2D gel analysis as well as tools for navigation of the database to switch between experimental and predicted data. Therefore, all identified spots in the gels are clickable and linked with summarized protein information. The protein information tables contain calculated characteristics, which are often used in proteomics, such as the molecular weight, isoelectric point, codon adaptation index, grand average of hydropathicity, etc. The design of the database permits online extension of gel data and protein attributes without knowledge of any software language. Besides navigation via 2D gels, the clear graphical user interface permits quick and intuitive searching throughout complete proteomes and supports, e.g. the search for proteins with isoelectric points within pH ranges of interest or protein classes (e.g. ribosomal proteins or transporters). The first organism implemented in the database is Lactococcus lactis. The database is available at www.wzw.tum.de/proteomik/lactis.

Citation for the above abstract:
Drews, Oliver, Gorg, Angelika
DynaProt 2D: an advanced proteomic database for dynamic online access to proteomes and two-dimensional electrophoresis gels
Nucl. Acids Res. 2005 33: D583-587
© 2005 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D583



84. GELBANK

URL: http://gelbank.anl.gov/
Categories: Proteomics Databases

GELBANK is a publicly available database of two-dimensional gel electrophoresis (2DE) gel patterns of proteomes from organisms with known genome information (available at http://gelbank.anl.gov and ftp://bioinformatics.anl.gov/gelbank/). Currently it includes 131 completed, mostly microbial proteomes available from the National Center for Biotechnology Information. A web interface allows the upload of 2D gel patterns and their annotation for registered users. The images are organized by species, tissue type, separation method, sample type and staining method. The database can be queried based on protein or 2DE-pattern attributes. A web interface allows registered users to assign molecular weight and pH gradient profiles to their own 2D gel patterns as well as to link protein identifications to a given spot on the pattern. The website presents all of the submitted 2D gel patterns where the end-user can dynamically display the images or parts of images along with molecular weight, pH profile information and linked protein identification. A collection of images can be selected for the creation of animations from which the user can select sub-regions of interest and unlimited 2D gel patterns for visualization. The website currently presents 233 identifications for 81 gel patterns for Homo sapiens, Methanococcus jannaschii, Pyrococcus furiosus, Shewanella oneidensis, Escherichia coli and Deinococcus radiodurans.

Citation for the above abstract:
Babnigg, Gyorgy, Giometti, Carol S.
GELBANK: a database of annotated two-dimensional gel electrophoresis patterns of biological systems with completed genomes
Nucl. Acids Res. 2004 32: D582-585
© 2004 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D582



85. OPD: Open Proteomics Database

URL: http://bioinformatics.icmb.utexas.edu/OPD/
Categories: Proteomics Databases

"OPD is a public database for storing and disseminating mass spectrometry based proteomics data. The database currently contains roughly 1,200,000 spectra representing experiments from 4 different organisms."



86. PEP: Predictions for Entire Proteomes

URL: http://cubic.bioc.columbia.edu/pep/
Categories: General Genomics Databases, Proteomics Databases

PEP is a database of Predictions for Entire Proteomes. The database contains summaries of analyses of protein sequences from a range of organisms representing all three major kingdoms of life: eukaryotes, prokaryotes and archaea. All proteins publicly available for organisms were aligned against SWISS-PROT, TrEMBL and PDB. Additionally, the following annotations are provided: secondary structure, transmembrane helices, coiled coils, regions of low complexity, signal peptides, PROSITE motifs, nuclear localization signals and classes of cellular function. Proteins that contain long regions without regular secondary structure are also identified. We have produced a related database of structural domain-like fragments derived from PEP and clusters based on homology between all fragments. The PEP database, fragments and clusters are distributed freely as a set of flat files and have been integrated into SRS. The PEP group of databases can be accessed from: http://cubic.bioc.columbia.edu/pep.

Citation for the above abstract:
Carter, Phil, Liu, Jinfeng, Rost, Burkhard
PEP: Predictions for Entire Proteomes
Nucl. Acids Res. 2003 31: 410-413
© 2003 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/410



87. plantMarkers: a Database of Predicted Molecular Markers

URL: http://markers.btk.fi/
Categories: Proteomics Databases

Molecular markers are required in a broad spectrum of gene screening approaches, ranging from gene-mapping within traditional ‘forward’-genetics approaches through QTL identification studies to genotyping and haplotyping studies. As we enter the post-genomics era, the need for genetic markers does not diminish, even in the species with fully sequenced genomes. PlantMarkers is a genetic marker database that contains a comprehensive pool of predicted molecular markers. We have adopted contemporary techniques to identify putative single nucleotide polymorphism (SNP), simple sequence repeat (SSR) and conserved orthologue set markers. A systematic approach to identify as broad a range of putative markers has been undertaken by screening the available openSputnik unigene consensus sequences from over 50 plant species. A web presence at http://markers.btk.fi provides functionality so that a user may search for species-specific markers on the basis of many specific criteria not limited to non-synonymous SNPs segregating between different varieties or measured polymorphic SSRs. Feedback forms are provided with all sequence entries to enable inclusion of, for example, map location for markers validated by the research community.

Citation for the above abstract:
Rudd, Stephen, Schoof, Heiko, Mayer, Klaus
PlantMarkers--a database of predicted molecular markers from plants
Nucl. Acids Res. 2005 33: D628-632
© 2005 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D628



88. RESID Database

URL: http://www-nbrf.georgetown.edu/pirwww/dbinfo/resid.html
Categories: Protein Structure Databases, Proteomics Databases

The RESID Database is a comprehensive collection of annotations and structures for protein pre-, co- and post-translational modifications including amino-terminal, carboxyl-terminal and peptide chain cross-link modifications. The RESID Database includes: systematic and alternate names, atomic formulas and masses, enzyme activities generating the modifications, keywords, literature citations, Gene Ontology cross-references, Protein Information Resource (PIR) and SWISS-PROT protein sequence database feature table annotations, structure diagrams and molecular models. This database is freely accessible on the Internet through the European Bioinformatics Institute at http://srs.ebi.ac.uk/srs6bin/cgi-bin/wgetz?-page+LibInfo+-lib+RESID, through the National Cancer Institute — Frederick Advanced Biomedical Computing Center at http://www.ncifcrf.gov/RESID, or through the Protein Information Resource at http://pir.georgetown.edu/pirwww/dbinfo/resid.html.

Citation for the above abstract:
Garavelli, John S.
The RESID Database of Protein Modifications: 2003 developments
Nucl. Acids Res. 2003 31: 499-501
© 2003 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/499



89. SWISS-2DPAGE: Two-dimensional Polyacrylamide Gel Electrophoresis Database

URL: http://www.expasy.org/ch2d/
Categories: Proteomics Databases

SWISS-2DPAGE (http://www.expasy.ch/ch2d/) is an annotated two-dimensional polyacrylamide gel electrophoresis (2-DE) database established in 1993. The current release contains 24 reference maps from human and mouse biological samples, as well as from Saccharomyces cerevisiae, Escherichia coli and Dictyostelium discoideum origin. These reference maps have now 2824 identified spots, corresponding to 614 separate protein entries in the database, in addition to virtual entries for each SWISS-PROT sequence or any user-entered amino acid sequence. Last year's improvements in the SWISS-2DPAGE database are as follows: three new maps have been created and several others have been updated; cross-references to newly built federated 2-DE databases have been added; new functions to access the data have been provided through the ExPASy proteomics server.

Citation for the above abstract:
Hoogland, Christine, Sanchez, Jean-Charles, Tonella, Luisa, Binz, Pierre-Alain, Bairoch, Amos, Hochstrasser, Denis F., Appel, Ron D.
The 1999 SWISS-2DPAGE database update
Nucl. Acids Res. 2000 28: 286-288
© 1999 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/28/1/286



90. BRENDA: BRaunschweig ENzyme DAtabase

URL: http://www.brenda.uni-koeln.de/
Categories: Enzyme and Enzyme Nomenclature Databases

BRENDA (BRaunschweig ENzyme DAtabase) represents a comprehensive collection of enzyme and metabolic information, based on primary literature. The database contains data from at least 83,000 different enzymes from 9800 different organisms, classified in approximately 4200 EC numbers. BRENDA includes biochemical and molecular information on classification and nomenclature, reaction and specificity, functional parameters, occurrence, enzyme structure, application, engineering, stability, disease, isolation and preparation, links and literature references. The data are extracted and evaluated from approximately 46,000 references, which are linked to PubMed as long as the reference is cited in PubMed. In the past year BRENDA has undergone major changes including a large increase in updating speed with >50% of all data updated in 2002 or in the first half of 2003, the development of a new EC-tree browser, a taxonomy-tree browser, a chemical substructure search engine for ligand structure, the development of controlled vocabulary, an ontology for some information fields and a thesaurus for ligand names. The database is accessible free of charge to the academic community at http://www.brenda.uni-koeln.de.

Citation for the above abstract:
Schomburg, Ida, Chang, Antje, Ebeling, Christian, Gremse, Marion, Heldt, Christian, Huhn, Gregor, Schomburg, Dietmar
BRENDA, the enzyme database: updates and major new developments
Nucl. Acids Res. 2004 32: D431-433
© 2004 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D431



91. ENZYME: Enzyme Nomenclature Database

URL: http://www.expasy.org/enzyme/
Categories: Enzyme and Enzyme Nomenclature Databases

The ENZYME database is a repository of information related to the nomenclature of enzymes. In recent years it has become an indispensable resource for the development of metabolic databases. The current version contains information on 3705 enzymes. It is available through the ExPASy WWW server (http://www.expasy.ch/enzyme/).

Citation for the above abstract:
Bairoch, Amos
The ENZYME database in 2000
Nucl. Acids Res. 2000 28: 304-305
© 2004 Oxford University Press.



The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/28/1/304



92. Enzyme Nomenclature

URL: http://www.chem.qmul.ac.uk/iubmb/enzyme/
Categories: Enzyme and Enzyme Nomenclature Databases

"The complete contents of Enzyme Nomenclature, 1992 (plus subsequent supplements and other changes) are listed below in enzyme number order giving just the recommended name. Each entry provides a link to details of that enzyme. Alternatively if looking for a specific reaction used in the classification of enzymes the broad outline defined by the first two numbers are given below. Each of these subclass entries is linked to a location where the category is subdivided to sub-subclasses. These in turn are linked to a list of recommended names for each enzyme in the sub-subclass."



93. IntEnz

URL: http://www.ebi.ac.uk/intenz/index.html
Categories: Enzyme and Enzyme Nomenclature Databases

IntEnz is the name for the Integrated relational Enzyme database and is the official version of the Enzyme Nomenclature. The Enzyme Nomenclature comprises recommendations of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (NC-IUBMB) on the nomenclature and classification of enzyme-catalysed reactions. IntEnz is supported by NC-IUBMB and contains enzyme data curated and approved by this committee. The database IntEnz is available at http://www.ebi.ac.uk/intenz.

Citation for the above abstract:
Fleischmann, Astrid, Darsow, Michael, Degtyarenko, Kirill, Fleischmann, Wolfgang, Boyce, Sinead, Axelsen, Kristian B., Bairoch, Amos, Schomburg, Dietmar, Tipton, Keith F., Apweiler, Rolf
IntEnz, the integrated relational enzyme database
Nucl. Acids Res. 2004 32: D434-437
© 2004 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D434



94. PDBrtf

URL: http://cgl.imim.es/pdbrtf/
Categories: Enzyme and Enzyme Nomenclature Databases

"Representativity of Target Families in the Protein Data Bank."



95. PRECISE: Predicted and Consensus Interaction Sites in Enzymes

URL: http://precise.bu.edu/precisedb/
Categories: Enzyme and Enzyme Nomenclature Databases

PRECISE (Predicted and Consensus Interaction Sites in Enzymes) is a database of interactions between the amino acid residues of an enzyme and its ligands (substrate and transition state analogs, cofactors, inhibitors and products). It is available online at http://precise.bu.edu/. In the current version, all information on interactions is extracted from the enzyme–ligand complexes in the Protein Data Bank (PDB) by performing the following steps: (i) clustering homologous enzyme chains such that, in each cluster, the proteins have the same EC number and all sequences are similar; (ii) selecting a representative chain for each cluster; (iii) selecting ligand types; (iv) finding non-bonded interactions and hydrogen bonds; and (v) summing the interactions for all chains within the cluster. The output of the search is the color-coded sequence of the representative. The colors indicate the total number of interactions found at each amino acid position in all chains of the cluster. Clicking on a residue displays a detailed list of interactions for that residue. Optional filters allow restricting the output to selected chains in the cluster, to non-bonded or hydrogen bonding interactions, and to selected ligand types. The binding site information is essential for understanding and altering substrate specificity and for the design of enzyme inhibitors.

Citation for the above abstract:
Sheu, Shu-Hsien, Lancia, David R., Jr, Clodfelter, Karl H., Landon, Melissa R., Vajda, Sandor
PRECISE: a Database of Predicted and Consensus Interaction Sites in Enzymes
Nucl. Acids Res. 2005 33: D206-211
© 2005 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D206
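
Step (v) of the PRECISE abstract, summing interactions over all chains in a cluster, together with the optional filters it describes, could be sketched roughly as follows. The record layout, the function name and the sample data are hypothetical and are not taken from the PRECISE implementation; positions are assumed to be already mapped onto the representative chain.

```python
from collections import Counter


def sum_interactions(records, chains=None, kinds=None, ligand_types=None):
    """Count interactions per residue position across the chains of a cluster.

    Each record is (chain, position, ligand_type, kind), where kind is
    "hbond" or "nonbonded". A filter set to None means "no restriction".
    """
    totals = Counter()
    for chain, position, ligand_type, kind in records:
        if chains is not None and chain not in chains:
            continue
        if kinds is not None and kind not in kinds:
            continue
        if ligand_types is not None and ligand_type not in ligand_types:
            continue
        totals[position] += 1
    return totals


# Hypothetical interaction records for a two-chain cluster.
records = [
    ("A", 57, "inhibitor", "hbond"),
    ("A", 57, "cofactor", "nonbonded"),
    ("B", 57, "inhibitor", "nonbonded"),
    ("B", 102, "substrate analog", "hbond"),
]

print(dict(sum_interactions(records)))                   # {57: 3, 102: 1}
print(dict(sum_interactions(records, kinds={"hbond"})))  # {57: 1, 102: 1}
```

The per-position totals are the kind of number that the color coding of the representative sequence would reflect.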



96. SCOPEC: a Database of Protein Catalytic Domains

URL: http://www.enzome.com/databases/scopec.php
Categories: Enzyme and Enzyme Nomenclature Databases

MOTIVATION: Domains are the units of protein structure, function and evolution. It is therefore essential to utilize knowledge of domains when studying the evolution of function, or when assigning function to genome sequence data. For this purpose, we have developed a database of catalytic domains, SCOPEC, by combining structural domain information from SCOP, full-length sequence information from Swiss-Prot, and verified functional information from the Enzyme Classification (EC) database. Two major problems need to be overcome to create a database of domain-function relationships; (1) for sequences, EC numbers are typically assigned to whole sequences rather than the functional unit, and (2) The Protein Data Bank (PDB) structures elucidated from a larger multi-domain protein will often have EC annotation although the relevant catalytic domain may lie elsewhere. RESULTS: SCOPEC entries have high quality enzyme assignments; having passed both computational and manual checks. SCOPEC currently contains entries for 75% of all EC annotations in the PDB. Overall, EC number is fairly well conserved within a superfamily, even when the proteins are distantly related. Initial analysis is encouraging; suggesting that there is a 50:50 chance of conserved function in distant homologues first detected by a third iteration PSI-BLAST search. Therefore, we envisage that a knowledge-based approach to function assignment using the domain-EC relationships in SCOPEC will gain a marked improvement over this base line. AVAILABILITY: The SCOPEC database is a valuable resource in the analysis and prediction of protein structure and function. It can be obtained or queried at our website http://www.enzome.com

Citation for the above abstract:
Richard A. George, Ruth V. Spriggs, Janet M. Thornton, Bissan Al-Lazikani, and Mark B. Swindells
SCOPEC: a database of protein catalytic domains
Bioinformatics 20: i130-i136.
© 2004 Oxford University Press.


The full text of the article can be found at: http://bioinformatics.oupjournals.org/cgi/reprint/20/suppl_1/i130



97. TECRDB: Thermodynamics of Enzyme-catalyzed Reactions Database

URL: http://xpdb.nist.gov/enzyme_thermodynamics/
Categories: Enzyme and Enzyme Nomenclature Databases

Summary: The Thermodynamics of Enzyme-catalyzed Reactions Database (TECRDB) is a comprehensive collection of thermodynamic data on enzyme-catalyzed reactions. The data, which consist of apparent equilibrium constants and calorimetrically determined molar enthalpies of reaction, are the primary experimental results obtained from thermodynamic studies of biochemical reactions. The results from 1000 published papers containing data on 400 different enzyme-catalyzed reactions constitute the essential information in the database. The information is managed using Oracle and is available on the Web.

Citation for the above abstract:
Robert N. Goldberg, Yadu B. Tewari, and Talapady N. Bhat
Thermodynamics of enzyme-catalyzed reactions—a database for quantitative biochemistry
Bioinformatics Advance Access published on November 1, 2004, DOI 10.1093/bioinformatics/bth314.
Bioinformatics 20: 2874-2877.
© 2004 Oxford University Press.


The full abstract can be found at: http://bioinformatics.oupjournals.org/cgi/content/abstract/20/16/2874
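
To show how an apparent equilibrium constant of the kind collected in TECRDB relates to an energy, the sketch below applies the standard relation between the standard transformed Gibbs energy of reaction and K' (the negative of R T ln K'). The function name, the default temperature and the example value of K' are assumptions for illustration, not data from the database.

```python
import math

GAS_CONSTANT = 8.314462618  # molar gas constant, J / (mol K)


def gibbs_energy_from_k(apparent_k, temperature_kelvin=298.15):
    """Standard transformed Gibbs energy of reaction in J/mol: -R * T * ln(K')."""
    if apparent_k <= 0:
        raise ValueError("an equilibrium constant must be positive")
    return -GAS_CONSTANT * temperature_kelvin * math.log(apparent_k)


# Hypothetical apparent equilibrium constant of 10 at 298.15 K.
print(round(gibbs_energy_from_k(10.0) / 1000.0, 2), "kJ/mol")  # -5.71 kJ/mol
```

An apparent equilibrium constant greater than 1 gives a negative value, meaning the reaction favors products under the stated conditions.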



98. BioCarta: Charting Pathways of Life

URL: http://www.biocarta.com/genes/
Categories: Intermolecular Interactions and Signaling Pathways Databases, Metabolic Pathway Databases

"Observe how genes interact in dynamic graphical models. Our online maps depict molecular relationships from areas of active research. In an "open source" approach, this community-fed forum constantly integrates emerging proteomic information from the scientific community. It also catalogs and summarizes important resources providing information for over 120,000 genes from multiple species. Find both classical pathways as well as current suggestions for new pathways."



99. BioCyc

URL: http://biocyc.org/
Categories: Metabolic Pathway Databases

"The BioCyc collection of databases provides electronic reference sources on the pathways and genomes of different organisms. Currently, detailed organism-specific databases are available for 14 species. In addition, the MetaCyc metabolic pathway database contains literature-derived metabolic pathway data for 160 species.

Scientists can use BioCyc databases to visualize the layout of genes within a chromosome, or of an individual biochemical reaction, or of a complete biochemical pathway. The structures of chemical compounds can be displayed in pathways and reactions. The navigation capabilities of the software allow a user to move from a display of an enzyme to a display of a reaction that the enzyme catalyzes, or to the gene that encodes the enzyme. The interface supports a variety of queries, such as generating a display of the map positions of all genes that code for enzymes within a given biochemical pathway. As well as being used as a reference source to look up individual facts, BioCyc databases support computational studies of the metabolism, such as design of novel biochemical pathways for biotechnology, studies of the evolution of metabolic pathways, and simulation of metabolic pathways.

BioCyc is linked to other biological databases containing protein and nucleic-acid sequence data, bibliographic data, protein structures, and descriptions of different strains."



100. BioSilico: An Integrated Metabolic Database System

URL: http://biosilico.kaist.ac.kr/
Categories: Metabolic Pathway Databases

BioSilico is a web-based database system that facilitates the search and analysis of metabolic pathways. Heterogeneous metabolic databases including LIGAND, ENZYME, EcoCyc and MetaCyc are integrated in a systematic way, thereby allowing users to efficiently retrieve the relevant information on enzymes, biochemical compounds and reactions. In addition, it provides well-designed view pages for more detailed summary information. BioSilico is developed as an extensible system with a robust systematic architecture.

Citation for the above abstract:
Bo Kyeng Hou, Jin Sik Kim, Ji Hoon Jun, Dong-Yup Lee, Yong Wook Kim, Sujin Chae, Mira Roh, Yong-Ho In, and Sang Yup Lee
BioSilico: an integrated metabolic database system
Bioinformatics Advance Access published on November 22, 2004, DOI 10.1093/bioinformatics/bth363.
Bioinformatics 20: 3270-3272.
© 2004 Oxford University Press.


The full abstract can be found at: http://bioinformatics.oupjournals.org/cgi/content/abstract/20/17/3270



101. BRITE: Biomolecular Relations in Information Transmission and Expression

URL: http://www.genome.jp/brite/
Categories: Metabolic Pathway Databases

"BRITE is a database of binary relations for network computation and logical reasoning involving genes, proteins, and other biological molecules. It contains diverse sets of binary relations, including the generalized protein interactions that underlie the KEGG pathway diagrams, systematic experimental data on protein-protein interactions by yeast two-hybrid systems, expression similarity relations by microarray gene expression profiles, cross-reference links between database entries, and parent-child relations in the hierarchies of terminology (ontologies). The BRITE project is supported by the Institute for Bioinformatics Research and Development (BIRD) of the Japan Science and Technology Agency (JST) and also by a Grant-in-Aid for Scientific Research in Priority Areas from the Ministry of Education, Culture, Sports, Science and Technology (MEXT)."



102. BSD: the Biodegradative Strain Database

URL: http://bsd.cme.msu.edu/bsd/index.html
Categories: Drug and Drug Design Databases, Metabolic Pathway Databases

The Biodegradative Strain Database (BSD) is a freely-accessible, web-based database providing detailed information on degradative bacteria and the hazardous substances that they degrade, including corresponding literature citations, relevant patents and links to additional web-based biological and chemical data. The BSD (http://bsd.cme.msu.edu) is being developed within the phylogenetic framework of the Ribosomal Database Project II (RDPII: http://rdp.cme.msu.edu/html) to provide a biological complement to the chemical and degradative pathway data of the University of Minnesota Biocatalysis/Biodegradation Database (UM-BBD: http://umbbd.ahc.umn.edu). Data is accessible through a series of strain, chemical and reference lists or by keyword search. The web site also includes on-line data submission and user survey forms to solicit user contributions and suggestions. The current release contains information on over 250 degradative bacterial strains and 150 hazardous substances. The transformation of xenobiotics and other environmentally toxic compounds by microorganisms is central to strategies for biocatalysis and the bioremediation of contaminated environments. However, practical, comprehensive, strain-level information on biocatalytic/biodegradative microbes is not readily available and is often difficult to compile. Similarly, for any given environmental contaminant, there is no single resource that can provide comparative information on the array of identified microbes capable of degrading the chemical. A web site that consolidates and cross-references strain, chemical and reference data related to biocatalysis, biotransformation, biodegradation and bioremediation would be an invaluable tool for academic and industrial researchers and environmental engineers.

Citation for the above abstract:
Urbance, John W., Cole, James, Saxman, Paul, Tiedje, James M.
BSD: the Biodegradative Strain Database
Nucl. Acids Res. 2003 31: 152-155
© 2003 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/152



103. KEGG: Kyoto Encyclopedia of Genes and Genomes

URL: http://www.genome.jp/kegg/
Categories: General Genomics Databases, Metabolic Pathway Databases

The increasing amount of genomic and molecular information is the basis for understanding higher-order biological systems, such as the cell and the organism, and their interactions with the environment, as well as for medical, industrial and other practical applications. The KEGG resource (http://www.genome.jp/kegg/) provides a reference knowledge base for linking genomes to biological systems, categorized as building blocks in the genomic space (KEGG GENES) and the chemical space (KEGG LIGAND), and wiring diagrams of interaction networks and reaction networks (KEGG PATHWAY). A fourth component, KEGG BRITE, has been formally added to the KEGG suite of databases. This reflects our attempt to computerize functional interpretations as part of the pathway reconstruction process based on the hierarchically structured knowledge about the genomic, chemical and network spaces. In accordance with the new chemical genomics initiatives, the scope of KEGG LIGAND has been significantly expanded to cover both endogenous and exogenous molecules. Specifically, RPAIR contains curated chemical structure transformation patterns extracted from known enzymatic reactions, which would enable analysis of genome-environment interactions, such as the prediction of new reactions and new enzyme genes that would degrade new environmental compounds. Additionally, drug information is now stored separately and linked to new KEGG DRUG structure maps.

Citation for the above abstract:
Kanehisa, Minoru, Goto, Susumu, Hattori, Masahiro, Aoki-Kinoshita, Kiyoko F., Itoh, Masumi, Kawashima, Shuichi, Katayama, Toshiaki, Araki, Michihiro, Hirakawa, Mika
From genomics to chemical genomics: new developments in KEGG
Nucl. Acids Res. 2006 34: D354-357
© 2006 Oxford University Press.


The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D354



104. Klotho: Biochemical Compounds Declarative Database

URL: http://www.biocheminfo.org/klotho/
Categories: Metabolic Pathway Databases, Small Molecule Structure Databases

"A sufficiently realistic large-scale model needs to include detailed information on both the structure of the molecular parts and their functions in biochemical reactions. We are developing representations for molecules and reaction biochemistry for use in databases of biochemical function. Our approach is to capture the 'natural language' of biochemistry in a layered graph grammar, Klotho, which permits interconversion among a family of equivalent representations for compounds, and then operate on these with rules which express chemical and mechanistic aspects of the biochemical reaction (Atropos)."



105. KEGG LIGAND Database

URL: http://www.genome.jp/ligand/
Categories: Metabolic Pathway Databases, Small Molecule Structure Databases

LIGAND is a composite database comprising three sections: COMPOUND for the information about metabolites and other chemical compounds, REACTION for the collection of substrate-product relations representing metabolic and other reactions, and ENZYME for the information about enzyme molecules. The current release (as of September 7, 2001) includes 7298 compounds, 5166 reactions and 3829 enzymes. In addition to the keyword search provided by the DBGET/LinkDB system, a substructure search to the COMPOUND and REACTION sections is now available through the World Wide Web (http://www.genome.ad.jp/ligand/). LIGAND may be also downloaded by anonymous FTP (ftp://ftp.genome.ad.jp/pub/kegg/ligand/).

Citation for the above abstract:
Goto, Susumu, Okuno, Yasushi, Hattori, Masahiro, Nishioka, Takaaki, Kanehisa, Minoru
LIGAND: database of chemical compounds and reactions in biological pathways
Nucl. Acids Res. 2002 30: 402-404
© 2002 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/30/1/402



106. MetaCyc Encyclopedia of Metabolic Pathways

URL: http://metacyc.org/
Categories: General Genomics Databases, Metabolic Pathway Databases

MetaCyc is a database of metabolic pathways and enzymes located at http://MetaCyc.org/. Its goal is to serve as a metabolic encyclopedia, containing a collection of non-redundant pathways central to small molecule metabolism, which have been reported in the experimental literature. Most of the pathways in MetaCyc occur in microorganisms and plants, although animal pathways are also represented. MetaCyc contains metabolic pathways, enzymatic reactions, enzymes, chemical compounds, genes and review-level comments. Enzyme information includes substrate specificity, kinetic properties, activators, inhibitors, cofactor requirements and links to sequence and structure databases. Data are curated from the primary literature by curators with expertise in biochemistry and molecular biology. MetaCyc serves as a readily accessible comprehensive resource on microbial and plant pathways for genome analysis, basic research, education, metabolic engineering and systems biology. Querying, visualization and curation of the database is supported by SRI's Pathway Tools software. The PathoLogic component of Pathway Tools is used in conjunction with MetaCyc to predict the metabolic network of an organism from its annotated genome. SRI and the European Bioinformatics Institute employed this tool to create pathway/genome databases (PGDBs) for 165 organisms, available at the BioCyc.org website. These PGDBs also include predicted operons and pathway hole fillers.

Citation for the above abstract:
Caspi, Ron, Foerster, Hartmut, Fulcher, Carol A., Hopkinson, Rebecca, Ingraham, John, Kaipa, Pallavi, Krummenacker, Markus, Paley, Suzanne, Pick, John, Rhee, Seung Y., Tissier, Christophe, Zhang, Peifen, Karp, Peter D.
MetaCyc: a multiorganism database of metabolic pathways and enzymes
Nucl. Acids Res. 2006 34: D511-516
© 2006 Oxford University Press.


The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D511



107. Metagrowth

URL: http://igs-server.cnrs-mrs.fr/axenic/
Categories: Metabolic Pathway Databases, Prokaryote Databases

Metagrowth is a new type of knowledge base developed to guide the experimental studies of culture conditions of obligate parasitic bacteria. We have gathered biological evidences giving possible clues to the development of the axenic (i.e. 'cell-free') growth of obligate parasites from various sources including published literature, genomic sequence information, metabolic databases and transporter databases. The database entries are composed of those evidences and specific hypotheses derived from them. Currently, 200 entries are available for Rickettsia prowazekii, Rickettsia conorii, Tropheryma whipplei, Treponema pallidum, Mycobacterium tuberculosis and Coxiella burnetii. The web interface of Metagrowth helps users to design new axenic culture media eventually suitable for those bacteria. Metagrowth is accessible at http://igs-server.cnrs-mrs.fr/axenic/.

Citation for the above abstract:
Ogata, Hiroyuki, Claverie, Jean-Michel
Metagrowth: a new resource for the building of metabolic hypotheses in microbiology
Nucl. Acids Res. 2005 33: D321-324
© 2005 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D321



108. PathDB

URL: http://www.ncgr.org/pathdb/
Categories: Metabolic Pathway Databases

"PathDB is both a data repository and a system for building and visualizing cellular networks targeted for the gene expression, proteomics, and metabolic profiling communities. Uses include finding all pathways and phenotypes associated with genes in a cluster or validating computational predicted associations with known biological data.

Innovations with the data model resulted in a progression from a concrete model of metabolism, primarily supporting curation from literature, to an abstract model for all kinds of cellular function, e.g. signal cascade, gene regulatory, protein-protein interaction, and protein-small molecule binding data, as well as metabolism. Leveraging off the new data model, concentration shifted to importing large data-sets which drove the development of our Import Framework and an elegant solution to 'publish and subscribe' for data warehousing. Researchers can now more easily focus and combine data of interest with our flexible data model and Import Framework innovations and file up-load capabilities.

NCGR's current public pathways database houses curated Arabidopsis literature, Gene Ontology data, and data from currently published large-scale experiments in yeast with transcriptional binding factors from Richard Young's lab at MIT staged for addition."



109. UM-BBD: the University of Minnesota Biocatalysis/Biodegradation Database

URL: http://umbbd.ahc.umn.edu/
Categories: Metabolic Pathway Databases

As the University of Minnesota Biocatalysis/Biodegradation Database (UM-BBD, http://umbbd.ahc.umn.edu/) starts its second decade, it includes information on over 900 compounds, over 600 enzymes, nearly 1000 reactions and about 350 microorganism entries. Its Biochemical Periodic Tables have grown to include biological information for almost all stable, non-noble-gas elements (http://umbbd.ahc.umn.edu/periodic/). Its Pathway Prediction System (PPS) (http://umbbd.ahc.umn.edu/predict/) is now an internationally recognized, open system for predicting microbial catabolism of organic compounds. Graphical display of PPS rules, a stand-alone version of the PPS and guidance for PPS users are being developed. The next decade should see the PPS, and the UM-BBD on which it is based, find increasing use by national and international government agencies, commercial organizations and educational institutions.

Citation for the above abstract:
Ellis, Lynda B. M., Roe, Dave, Wackett, Lawrence P.
The University of Minnesota Biocatalysis/Biodegradation Database: the first decade
Nucl. Acids Res. 2006 34: D517-521
© 2006 Oxford University Press.


The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D517



110. AffinDB: Affinity database for protein-ligand complexes

URL: http://www.agklebe.de/affinity
Categories: Drug and Drug Design Databases, Intermolecular Interactions and Signaling Pathways Databases

AffinDB is a database of affinity data for structurally resolved protein–ligand complexes from the Protein Data Bank (PDB). It is freely accessible at http://www.agklebe.de/affinity. Affinity data are collected from the scientific literature, both from primary sources describing the original experimental work of affinity determination and from secondary references which report affinity values determined by others. AffinDB currently contains over 730 affinity entries covering more than 450 different protein–ligand complexes. Besides the affinity value, PDB summary information and additional data are provided, including the experimental conditions of the affinity measurement (if available in the corresponding reference); 2D drawing, SMILES code and molecular weight of the ligand; links to other databases, and bibliographic information. AffinDB can be queried by PDB code or by any combination of affinity range, temperature and pH value of the measurement, ligand molecular weight, and publication data (author, journal and year). Search results can be saved as tabular reports in text files. The database is supposed to be a valuable resource for researchers interested in biomolecular recognition and the development of tools for correlating structural data with affinities, as needed, for example, in structure-based drug design.

Citation for the above abstract:
Block, Peter, Sotriffer, Christoph A., Dramburg, Ingo, Klebe, Gerhard
AffinDB: a freely accessible database of affinities for protein-ligand complexes from the PDB
Nucl. Acids Res. 2006 34: D522-526
© 2006 Oxford University Press.


The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D522



111. WIT: What Is There

URL: http://www-wit.mcs.anl.gov/wit3/
Categories: General Genomics Databases, Metabolic Pathway Databases

The WIT (What Is There) (http://wit.mcs.anl.gov/WIT2/) system has been designed to support comparative analysis of sequenced genomes and to generate metabolic reconstructions based on chromosomal sequences and metabolic modules from the EMP/MPW family of databases. This system contains data derived from about 40 completed or nearly completed genomes. Sequence homologies, various ORF-clustering algorithms, relative gene positions on the chromosome and placement of gene products in metabolic pathways (metabolic reconstruction) can be used for the assignment of gene functions and for development of overviews of genomes within WIT. The integration of a large number of phylogenetically diverse genomes in WIT facilitates the understanding of the physiology of different organisms.

Citation for the above abstract:
Overbeek, Ross, Larsen, Niels, Pusch, Gordon D., D'Souza, Mark, Selkov, Evgeni, Jr, Kyrpides, Nikos, Fonstein, Michael, Maltsev, Natalia, Selkov, Evgeni
WIT: integrated system for high-throughput genome sequence analysis and metabolic reconstruction
Nucl. Acids Res. 2000 28: 123-125
© 2000 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/28/1/123



112. ACTIVITY

URL: http://wwwmgs.bionet.nsc.ru/mgs/systems/activity/
Categories: Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases

MOTIVATION: The goal of the work was to develop a WWW-oriented computer system providing a maximal integration of informational and software resources on the regulation of gene expression and navigation through them. Rapid growth of the variety and volume of information accumulated in the databases on regulation of gene expression necessarily requires the development of computer systems for automated discovery of the knowledge that can be further used for analysis of regulatory genomic sequences. RESULTS: The GeneExpress system developed includes the following major informational and software modules: (1) Transcription Regulation (TRRD) module, which contains the databases on transcription regulatory regions of eukaryotic genes and TRRD Viewer for data visualization; (2) Site Activity Prediction (ACTIVITY), the module for analysis of functional site activity and its prediction; (3) Site Recognition module, which comprises (a) B-DNA-VIDEO system for detecting the conformational and physicochemical properties of DNA sites significant for their recognition, (b) Consensus and Weight Matrices (ConsFrec) and (c) Transcription Factor Binding Sites Recognition (TFBSR) systems for detecting conservative contextual regions of functional sites and their recognition; (4) Gene Networks (GeneNet), which contains an object-oriented database accumulating the data on gene networks and signal transduction pathways, and the Java-based Viewer for exploration and visualization of the GeneNet information; (5) mRNA Translation (Leader mRNA), designed to analyze structural and contextual properties of mRNA 5'-untranslated regions (5'-UTRs) and predict their translation efficiency; (6) other program modules designed to study the structure-function organization of regulatory genomic sequences and regulatory proteins. AVAILABILITY: GeneExpress is available at http://wwwmgs.bionet.nsc.ru/systems/GeneExpress/ and the links to the mirror site(s) can be found at http://wwwmgs.bionet.nsc.ru/mgs/links/mirrors.html.

Citation for the above abstract:
NA Kolchanov, MP Ponomarenko, AS Frolov, EA Ananko, FA Kolpakov, EV Ignatieva, OA Podkolodnaya, TN Goryachkovskaya, IL Stepanenko, TI Merkulova, VV Babenko, YV Ponomarenko, AV Kochetov, NL Podkolodny, DV Vorobiev, SV Lavryushev, DA Grigorovich, YV Kondrakhin, L Milanesi, E Wingender, V Solovyev, and GC Overton
Integrated databases and computer systems for studying eukaryotic gene expression
Bioinformatics 15: 669-686.
© 1999 Oxford University Press.


The full text of the article can be found at: http://bioinformatics.oupjournals.org/cgi/reprint/15/7/669



113. AGRIS: Arabidopsis Gene Regulatory Information Server

URL: http://arabidopsis.med.ohio-state.edu/
Categories: Arabidopsis thaliana Databases, Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases

BACKGROUND: The gene regulatory information is hardwired in the promoter regions formed by cis-regulatory elements that bind specific transcription factors (TFs). Hence, establishing the architecture of plant promoters is fundamental to understanding gene expression. The determination of the regulatory circuits controlled by each TF and the identification of the cis-regulatory sequences for all genes have been identified as two of the goals of the Multinational Coordinated Arabidopsis thaliana Functional Genomics Project by the Multinational Arabidopsis Steering Committee (June 2002). RESULTS: AGRIS is an information resource of Arabidopsis promoter sequences, transcription factors and their target genes. AGRIS currently contains two databases, AtTFDB (Arabidopsis thaliana transcription factor database) and AtcisDB (Arabidopsis thaliana cis-regulatory database). AtTFDB contains information on approximately 1,400 transcription factors identified through motif searches and grouped into 34 families. AtTFDB links the sequence of the transcription factors with available mutants and, when known, with the possible genes they may regulate. AtcisDB consists of the 5' regulatory sequences of all 29,388 annotated genes with a description of the corresponding cis-regulatory elements. Users can search the databases for (i) promoter sequences, (ii) a transcription factor, (iii) direct target genes for a specific transcription factor, or (iv) a regulatory network that consists of transcription factors and their target genes. CONCLUSION: AGRIS provides the necessary software tools on Arabidopsis transcription factors and their putative binding sites on all genes to initiate the identification of transcriptional regulatory networks in the model dicotyledonous plant Arabidopsis thaliana. AGRIS can be accessed from http://arabidopsis.med.ohio-state.edu.

Citation for the above abstract:
Ramana V Davuluri, Hao Sun, Saranyan K Palaniswamy, Nicole Matthews, Carlos Molina, Mike Kurtz, and Erich Grotewold
AGRIS: Arabidopsis Gene Regulatory Information Server, an information resource of Arabidopsis cis-regulatory elements and transcription factors
BMC Bioinformatics 2003, 4:25; doi:10.1186/1471-2105-4-25
© 2003 By the Authors.


The full text of the article can be found at: http://www.biomedcentral.com/1471-2105/4/25



114. ASPD: Artificial Selected Proteins/Peptides Database

URL: http://wwwmgs.bionet.nsc.ru/mgs/gnw/aspd/
Categories: Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases

ASPD is a new curated database that incorporates data on full-length proteins, protein domains and peptides that were obtained through in vitro directed evolution processes (mainly by means of phage display). At present, the ASPD database contains data on 195 selection experiments, which were described in 112 original papers. For each experiment, the following information is given: (i) description of the target for binding, (ii) description of the protein or peptide which serves as the template for library construction and description of the native protein which binds the target, (iii) links to the major proteomic databases (SWISS-PROT, PDB, PROSITE and ENZYME), (iv) keywords referring to the biological significance of the experiment, (v) aligned sequences of proteins or peptides retrieved through in vitro evolution and relevant native or constructed sequences, (vi) the number of rounds of selection/amplification and (vii) the number of occurrences of clones with each sequence. The literature data include a full reference, a link to the MEDLINE database and the name of the corresponding author with his email address. ASPD has a user-friendly interface which allows for simple queries using the names of proteins and ligands, as well as keywords describing the biological role of the interaction studied, and also for queries based on authors' names. It is also possible to access the database by means of the SRS system, allowing complex queries. There is a BLAST search tool against the ASPD for looking directly for homologous sequences. Research tools of the ASPD allow the analysis of pairwise correlations in the sequences of proteins and peptides selected against one target. The URL for the ASPD database is http://www.sgi.sscc.ru/mgs/gnw/aspd/.

Citation for the above abstract:
Valuev, Vadim P., Afonnikov, Dmitry A., Ponomarenko, Mikhail P., Milanesi, Luciano, Kolchanov, Nikolay A.
ASPD (Artificially Selected Proteins/Peptides Database): a database of proteins and peptides evolved in vitro
Nucl. Acids Res. 2002 30: 200-202
© 2002 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/30/1/200



115. Cancer Chromosomes

URL: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=cancerchromosomes
Categories: Cancer Databases, Human Genome Databases, Maps, and Viewers, Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases

"Three databases, the NCI/NCBI SKY/M-FISH & CGH Database, the NCI Mitelman Database of Chromosome Aberrations in Cancer, and the NCI Recurrent Aberrations in Cancer, are now integrated into NCBI's Entrez system as Cancer Chromosomes.

Search for cytogenetic, clinical, and/or reference information. Queries are performed using the same approach as for other Entrez databases such as PubMed and Nucleotide."




116. DBTBS

URL: http://dbtbs.hgc.jp/
Categories: Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases, Prokaryote Databases

DBTBS (http://dbtbs.hgc.jp) was originally released in 1999 as a reference database of published transcriptional regulation events in Bacillus subtilis, one of the best studied bacteria. It is essentially a compilation of transcription factors with their regulated genes as well as their recognition sequences, which were experimentally characterized and reported in the literature. Here we report its major update, which contains information on 114 transcription factors, including sigma factors, and 633 promoters of 525 genes. The number of references cited in the database has increased from 291 to 378. It also supports a function to find putative transcription factor binding sites within input sequences by using our collection of weight matrices and consensus patterns. Furthermore, though preliminarily, DBTBS now aims to contribute to comparative genomics by showing the presence or absence of potentially orthologous transcription factors and their corresponding cis-elements on the promoters of their potentially orthologously regulated genes in 50 eubacterial genomes.

Citation for the above abstract:
Makita, Yuko, Nakao, Mitsuteru, Ogasawara, Naotake, Nakai, Kenta
DBTBS: database of transcriptional regulation in Bacillus subtilis and its contribution to comparative genomics
Nucl. Acids Res. 2004 32: D75-77
© 2004 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D75



117. DBTSS: Database of Transcriptional Start Sites

URL: http://dbtss.hgc.jp/
Categories: Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases

DBTSS was first constructed in 2002 based on precise, experimentally determined 5' end clones. Several major updates and additions have been made since the last report. First, the number of human clones has drastically increased, going from 190,964 to 1,359,000. Second, information about potential alternative promoters is presented because the number of 5' end clones is now sufficient to determine several promoters for one gene. Namely, we defined putative promoter groups by clustering transcription start sites (TSSs) separated by <500 bases. A total of 8308 human genes and 4276 mouse genes were found to have putative multiple promoters. Third, DBTSS provides detailed sequence comparisons of user-specified TSSs. Finally, we have added TSS information for zebrafish, malaria and schyzon (a red algae model organism). DBTSS is accessible at http://dbtss.hgc.jp.

Citation for the above abstract:
Yamashita, Riu, Suzuki, Yutaka, Wakaguri, Hiroyuki, Tsuritani, Katsuki, Nakai, Kenta, Sugano, Sumio
DBTSS: DataBase of Human Transcription Start Sites, progress report 2006
Nucl. Acids Res. 2006 34: D86-89
© 2006 Oxford University Press.


The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D86
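
Illustrative note on the DBTSS entry above: the abstract states that putative promoter groups were defined by clustering transcription start sites (TSSs) separated by <500 bases. The sketch below shows one plausible reading of that rule, in which consecutive sorted TSS positions on the same chromosome and strand are merged whenever the gap between neighbours is under 500 bases. The function name `group_tss`, the `max_gap` parameter and the example coordinates are hypothetical and are not taken from DBTSS itself.

```python
def group_tss(positions, max_gap=500):
    """Group TSS coordinates into putative promoter groups.

    Assumption (not confirmed by the abstract): all positions lie on one
    chromosome and strand, and a site joins the current group when it is
    fewer than `max_gap` bases from the previous site in sorted order.
    """
    groups = []
    for pos in sorted(positions):
        if groups and pos - groups[-1][-1] < max_gap:
            # Close enough to the previous site: same promoter group.
            groups[-1].append(pos)
        else:
            # Gap of max_gap or more: start a new promoter group.
            groups.append([pos])
    return groups


# Hypothetical coordinates for a single gene.
example = [2100, 1000, 1499, 9000, 1120, 2650]
for group in group_tss(example):
    print(group)
```

Under this reading, a gene with more than one resulting group would count as having putative multiple promoters.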



118. HTPSELEX

URL: http://www.isrec.isb-sib.ch/htpselex/
Categories: Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases

HTPSELEX is a public database providing access to primary and derived data from high-throughput SELEX experiments aimed at characterizing the binding specificity of transcription factors. The resource is primarily intended to serve computational biologists interested in building models of transcription factor binding sites from large sets of binding sequences. The guiding principle is to make available all information that is relevant for this purpose. For each experiment, we try to provide accurate information about the protein material used, details of the wet lab protocol, an archive of sequencing trace files, assembled clone sequences (concatemers) and complete sets of in vitro selected protein-binding tags. In addition, we offer in-house derived binding sites models. HTPSELEX also offers reasonably large SELEX libraries obtained with conventional low-throughput protocols. The FTP site contains the trace archives and database flatfiles. The web server offers user-friendly interfaces for viewing individual entries and quality-controlled download of SELEX sequence libraries according to a user-defined sequencing quality threshold. HTPSELEX is available from ftp://ftp.isrec.isb-sib.ch/pub/databases/htpselex/ and http://www.isrec.isb-sib.ch/htpselex.

Citation for the above abstract:
Jagannathan, Vidhya, Roulet, Emmanuelle, Delorenzi, Mauro, Bucher, Philipp
HTPSELEX--a database of high-throughput SELEX libraries for transcription factor binding sites
Nucl. Acids Res. 2006 34: D90-94
© 2006 Oxford University Press.


The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D90



119. DoOP: Databases of Orthologous Promoters

URL: http://doop.abc.hu/
Categories: Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases

DoOP (http://doop.abc.hu/) is a database of eukaryotic promoter sequences (upstream regions) aiming to facilitate the recognition of regulatory sites conserved between species. The annotated first exons of human and Arabidopsis thaliana genes were used as queries in BLAST searches to collect the most closely related orthologous first exon sequences from Chordata and Viridiplantae species. Up to 3000 bp DNA segments upstream from these first exons constitute the clusters in the chordate and plant sections of the Database of Orthologous Promoters. Release 1.0 of DoOP contains 21,061 chordate clusters from 284 different species and 7548 plant clusters from 269 different species. The database can be used to find and retrieve promoter sequences of a given gene from various species and it is also suitable to see the most trivial conserved sequence blocks in the orthologous upstream regions. Users can search DoOP with either sequence or text (annotation) to find promoter clusters of various genes. In addition to the sequence data, the positions of the conserved sequence blocks derived from multiple alignments, the positions of repetitive elements and the positions of transcription start sites known from the Eukaryotic Promoter Database (EPD) can be viewed graphically.

Citation for the above abstract:
Barta, Endre, Sebestyen, Endre, Palfy, Tamas B., Toth, Gabor, Ortutay, Csaba P., Patthy, Laszlo
DoOP: Databases of Orthologous Promoters, collections of clusters of orthologous upstream sequences from chordates and plants
Nucl. Acids Res. 2005 33: D86-90
© 2005 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D86
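
The DoOP abstract describes clusters built from DNA segments of up to 3000 bp upstream of a first exon. A minimal sketch of that kind of extraction is shown here; the function names, the 0-based half-open coordinates and the strand handling are assumptions made for illustration and are not taken from the DoOP pipeline.

```python
def revcomp(seq):
    """Reverse-complement an uppercase or lowercase DNA string."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def upstream_region(chrom_seq, exon_start, exon_end, strand, max_len=3000):
    """Return up to max_len bases upstream of a first exon.

    exon_start and exon_end are 0-based, half-open coordinates on the
    forward strand (an assumed convention). The result is always given
    5'->3' relative to the gene, so minus-strand genes are
    reverse-complemented. The region is shorter than max_len when the
    exon lies near the end of the sequence.
    """
    if strand == "+":
        begin = max(0, exon_start - max_len)
        return chrom_seq[begin:exon_start]
    if strand == "-":
        end = min(len(chrom_seq), exon_end + max_len)
        return revcomp(chrom_seq[exon_end:end])
    raise ValueError("strand must be '+' or '-'")


if __name__ == "__main__":
    toy = "AAACCCGGGTTT"
    print(upstream_region(toy, 6, 9, "+", max_len=4))
    print(upstream_region(toy, 3, 6, "-", max_len=4))
```

The `max_len` default mirrors the 3000 bp limit stated in the abstract; everything else is a placeholder that would need to match the annotation format actually in use.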



120. DPInteract

URL: http://arep.med.harvard.edu/dpinteract/
Categories: Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases, Prokaryote Databases

"This dataset is being collected with several purposes in mind
1. Cataloging demonstrated sites and non-sites for E.coli DNA-binding proteins
2. Aiding the annotation of such sites in other E.coli databases and sequence entries
3. Interpreting the results of whole-genome in vivo methylation protection experiments (Nature 360: 606-610; J Bacteriol 176: 3438-3441)
4. Developing better computational tools for recognizing DNA binding proteins in sequence data"



121. EPD: The Eukaryotic Promoter Database

URL: http://www.epd.isb-sib.ch/
Categories: Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases

The Eukaryotic Promoter Database (EPD) is an annotated non-redundant collection of eukaryotic POL II promoters, experimentally defined by a transcription start site (TSS). Access to promoter sequences is provided by pointers to positions in the corresponding genomes. Promoter evidence comes from conventional TSS mapping experiments for individual genes, or, starting from release 73, from mass genome annotation projects. Subsets of promoter sequences with customized 5' and 3' extensions can be downloaded from the EPD website. The focus of current development efforts is to reach complete promoter coverage for important model organisms as soon as possible. To speed up this process, a new class of preliminary promoter entries has been introduced as of release 83, which requires less stringent admission criteria. As part of a continuous integration process, new web-based interfaces have been developed, which allow joint analysis of promoter sequences with other bioinformatics resources developed by our group, in particular programs offered by the Signal Search Analysis Server, and gene expression data stored in the CleanEx database. EPD can be accessed at http://www.epd.isb-sib.ch.

Citation for the above abstract:
Schmid, Christoph D., Perier, Rouaida, Praz, Viviane, Bucher, Philipp
EPD in its twentieth year: towards complete promoter coverage of selected model organisms
Nucl. Acids Res. 2006 34: D82-85
© 2006 Oxford University Press.


The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D82



122. GeneNet

URL: http://wwwmgs.bionet.nsc.ru/mgs/gnw/genenet/
Categories: Intermolecular Interactions and Signaling Pathways Databases, Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases

The GeneNet system is designed for collection and analysis of the data on gene and metabolic networks, signal transduction pathways and kinetic characteristics of elementary processes. In the past 2 years, the GeneNet structure was considerably improved: (i) the current version of the database is now implemented using ORACLE9i; (ii) the capacities to describe the structure of the protein complexes and the interactions between the units are increased; (iii) two tables with kinetic constants and more detailed descriptions of certain reactions were added; and (iv) a module for kinetic modeling was supplemented. The current SRS release of the GeneNet database contains 37 graphical maps of gene networks, as well as descriptions of 1766 proteins, 1006 genes, 241 small molecules and 3254 relationships between gene network units, and 552 kinetic constants. Information distributed between 16 interlinked tables was obtained by annotating 1980 journal publications. SRS release of the GeneNet database, the graphical viewer and the modeling section are available at http://wwwmgs.bionet.nsc.ru/mgs/gnw/genenet/.

Citation for the above abstract:
Ananko, E. A., Podkolodny, N. L., Stepanenko, I. L., Podkolodnaya, O. A., Rasskazov, D. A., Miginsky, D. S., Likhoshvai, V. A., Ratushny, A. V., Podkolodnaya, N. N., Kolchanov, N. A.
GeneNet in 2005
Nucl. Acids Res. 2005 33: D425-427
© 2005 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D425



123. The JASPAR Database

URL: http://jaspar.cgb.ki.se/
Categories: Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases

JASPAR is the most complete open-access collection of transcription factor binding site (TFBS) matrices. In this new release, JASPAR grows into a meta-database of collections of TFBS models derived by diverse approaches. We present JASPAR CORE—an expanded version of the original, non-redundant collection of annotated, high-quality matrix-based transcription factor binding profiles, JASPAR FAM—a collection of familial TFBS models and JASPAR phyloFACTS—a set of matrices computationally derived from statistically overrepresented, evolutionarily conserved regulatory region motifs from mammalian genomes. JASPAR phyloFACTS serves as a non-redundant extension to JASPAR CORE, enhancing the overall breadth of JASPAR for promoter sequence analysis. The new release of JASPAR is available at http://jaspar.genereg.net.

Citation for the above abstract:
Vlieghe, Dominique, Sandelin, Albin, De Bleser, Pieter J., Vleminckx, Kris, Wasserman, Wyeth W., van Roy, Frans, Lenhard, Boris
A new generation of JASPAR, the open-access repository for transcription factor binding site profiles
Nucl. Acids Res. 2006 34: D95-97
© 2006 Oxford University Press.


The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D95
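
Matrix-based binding profiles of the kind JASPAR collects are commonly used by converting a count matrix into log-odds scores and sliding it along a sequence. The sketch below illustrates that general technique; the toy count matrix, the pseudocount and the uniform background are assumptions for illustration and do not correspond to any JASPAR profile or JASPAR tool.

```python
import math

# Toy count matrix (hypothetical, not a JASPAR entry): 4 columns, 8 sites each.
COUNTS = {
    "A": [8, 0, 0, 8],
    "C": [0, 8, 0, 0],
    "G": [0, 0, 8, 0],
    "T": [0, 0, 0, 0],
}


def pwm_from_counts(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2 odds scores."""
    length = len(counts["A"])
    pwm = []
    for i in range(length):
        total = sum(counts[b][i] for b in "ACGT") + 4 * pseudocount
        pwm.append({
            b: math.log2(((counts[b][i] + pseudocount) / total) / background)
            for b in "ACGT"
        })
    return pwm


def score(pwm, site):
    """Sum the log-odds of each base of `site` (same length as the matrix)."""
    return sum(column[base] for column, base in zip(pwm, site))


def best_hit(pwm, seq):
    """Return (score, 0-based start) of the best-scoring window in seq.

    Assumes seq is uppercase ACGT and at least as long as the matrix.
    """
    width = len(pwm)
    return max((score(pwm, seq[i:i + width]), i)
               for i in range(len(seq) - width + 1))


if __name__ == "__main__":
    pwm = pwm_from_counts(COUNTS)
    print(best_hit(pwm, "TTACGATT"))
```

A uniform background of 0.25 is the simplest choice; genome-specific base frequencies would change the scores but not the overall approach.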



124. MAPPER: Multi-genome Analysis of Positions and Patterns of Elements of Regulation

URL: http://bio.chip.org/mapper
Categories: Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases

We describe a comprehensive map of putative transcription factor binding sites (TFBSs) across multiple genomes created using a search method that relies on hidden Markov models built from experimentally determined TFBSs. Using the information in the TRANSFAC and JASPAR databases, we built 1134 models for TFBSs and used them to scan regions 10 kb upstream of the start of the transcript for all known genes in the human, mouse and Drosophila melanogaster genomes. The results, together with homology information on clusters of ortholog genes across the three genomes, were used to create a multi-organism catalog of annotated TFBSs. The catalog can be queried through a web interface accessible at http://bio.chip.org/mapper that allows the identification, visualization and selection of TFBSs occurring in the promoter of a gene of interest and also the common factors predicted to bind across the cluster of orthologs that includes that gene. Alternatively, the interface allows the user to retrieve binding sites for a single transcription factor of interest in a single gene or in all genes of the human, mouse or fruit fly genomes.

Citation for the above abstract:
Marinescu, Voichita D., Kohane, Isaac S., Riva, Alberto
The MAPPER database: a multi-genome catalog of putative transcription factor binding sites
Nucl. Acids Res. 2005 33: D91-97
© 2005 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D91



125. ooTFD: object-oriented Transcription Factors Database

URL: http://www.ifti.org/ootfd/
Categories: Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases

ooTFD (object-oriented Transcription Factors Database) is an object-oriented successor to TFD. This database is aimed at capturing information regarding the polypeptide interactions which comprise and define the properties of transcription factors. ooTFD contains information about transcription factor binding sites, as well as composite relationships within transcription factors, which frequently occur as multisubunit proteins that form a complex interface to cellular processes outside the transcription machinery through protein-protein interactions. In the past year, a few additions and changes were made to this database and associated tools, which are accessible through the IFTI-MIRAGE web site at http://www.ifti.org/

Citation for the above abstract:
Ghosh, David
Object-oriented Transcription Factors Database (ooTFD)
Nucl. Acids Res. 2000 28: 308-310
© 2000 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/28/1/308



126. OPD: Osteo-Promoter Database

URL: http://www.opd.tau.ac.il/
Categories: Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases

"Osteo-Promoter Database (OPD) is a catalogic database of functional genes in osteogenic proliferation and differentiation. OPD analyzes promoters of genes which differentiates along with the osteogenic pathway. Uniqueness of OPD is the analysis of promoter matrix attachment regions (MARs) which allocates AT-rich sites in promoters. Interaction between AT-rich sites in the DNA to AT-hook motif of the protein is important component of production regulator proteins complex, which controls transcription of genes in the cell. Expanding the knowledge of AT-rich sites in the promoters of specific genes leads to construction of regulation system for transcription in bone tissue."



127. PLACE: A Database of Plant Cis-acting Regulatory DNA Elements

URL: http://www.dna.affrc.go.jp/PLACE/
Categories: General Plant Databases, Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases

PLACE (http://www.dna.affrc.go.jp/htdocs/PLACE/) is a database of nucleotide sequence motifs found in plant cis-acting regulatory DNA elements. Motifs were extracted from previously published reports on genes in vascular plants. In addition to the motifs originally reported, their variations in other genes or in other plant species in later reports are also compiled. Documents for each motif in the PLACE database contain, in addition to a motif sequence, a brief definition and description of each motif, and relevant literature with PubMed ID numbers and GenBank accession numbers where available. Users can search their query sequences for cis-elements using the Signal Scan program at our web site. The results will be reported in one of the three forms. Clicking the PLACE accession numbers in the result report will open the pertinent motif document. Clicking the PubMed or GenBank accession number in the document will allow users to access these databases, and to read the abstract of the literature or the annotation in the DNA database. This report summarizes the present status of this database and available tools.

Citation for the above abstract:
Higo, K, Ugawa, Y, Iwamoto, M, Korenaga, T
Plant cis-acting regulatory DNA elements (PLACE) database: 1999
Nucl. Acids Res. 1999 27: 297-300
© 1999 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/27/1/297
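
The PLACE abstract mentions searching a query sequence for cis-element motifs. As a rough illustration of that kind of search, the sketch below expands IUPAC ambiguity codes into a regular expression and reports hits on both strands. It is not the Signal Scan program, and the motif name and pattern in the usage example are placeholders, not PLACE entries.

```python
import re

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_to_regex(motif):
    """Compile an IUPAC motif; the lookahead allows overlapping matches."""
    body = "".join(IUPAC[c] for c in motif.upper())
    return re.compile("(?=(" + body + "))")


def revcomp(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def scan(seq, motifs):
    """Return sorted (name, strand, start, matched_text) tuples.

    `start` is 0-based on the forward strand for both strands;
    `matched_text` is read 5'->3' on the strand where the motif matched.
    """
    seq = seq.upper()
    hits = []
    for name, motif in motifs.items():
        pattern = motif_to_regex(motif)
        for strand, text in (("+", seq), ("-", revcomp(seq))):
            for match in pattern.finditer(text):
                start = match.start()
                if strand == "-":
                    start = len(seq) - start - len(motif)
                hits.append((name, strand, start, match.group(1)))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical motif table used only for illustration.
    print(scan("GGCACGTGAA", {"example_box": "CANNTG"}))
```

Palindromic motifs are reported once per strand at the same position; a caller that wants unique sites can deduplicate on `(name, start)`.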



128. Polygenic Signaling Pathways

URL: http://www.polygenicpathways.co.uk
Categories: Gene-, System-, or Disease- Specific Databases

"This site contains lists of genes positively associated with Alzheimer's disease, Bipolar disorder or Schizophrenia. The protein products of these genes form consecutive elements of a signaling cascade or metabolic pathway. They may bind to each other, control each others transcription or form functional microcomplexes. These pathways, etched out by multiple association studies, may underpin the pathology of each disease."



129. Relemed

URL: http://www.relemed.com/
Categories: MEDLINE Interfaces

BACKGROUND: Receiving extraneous articles in response to a query submitted to MEDLINE/PubMed is common. When submitting a multi-word query (which is the majority of queries submitted), the presence of all query words within each article may be a necessary condition for retrieving relevant articles, but not sufficient. Ideally a relationship between the query words in the article is also required. We propose that if two words occur within an article, the probability that a relation between them is explained is higher when the words occur within adjacent sentences versus remote sentences. Therefore, sentence-level concurrence can be used as a surrogate for existence of the relationship between the words. In order to avoid the irrelevant articles, one solution would be to increase the search specificity. Another solution is to estimate a relevance score to sort the retrieved articles. However among the >30 retrieval services available for MEDLINE, only a few estimate a relevance score, and none detects and incorporates the relation between the query words as part of the relevance score.
RESULTS: We have developed "Relemed", a search engine for MEDLINE. Relemed increases specificity and precision of retrieval by searching for query words within sentences rather than the whole article. It uses sentence-level concurrence as a statistical surrogate for the existence of relationship between the words. It also estimates a relevance score and sorts the results on this basis, thus shifting irrelevant articles lower down the list. In two case studies, we demonstrate that the most relevant articles appear at the top of the Relemed results, while this is not necessarily the case with a PubMed search. We have also shown that a Relemed search includes not only all the articles retrieved by PubMed, but potentially additional relevant articles, due to the extended 'automatic term mapping' and text-word searching features implemented in Relemed.
CONCLUSION: By using sentence-level matching, Relemed can deliver higher specificity, thus eliminating more false-positive articles. By introducing an appropriate relevance metric, the most relevant articles on which the user wishes to focus are listed first. Relemed also shrinks the displayed text, and hence the time spent scanning the articles.

Citation for the above abstract:
Siadaty MS, Shu J, Knaus WA.
Relemed: sentence-level search engine with relevance score for the MEDLINE database of biomedical articles.
BMC Med Inform Decis Mak. 2007 Jan 10;7:1.
© 2007 By the authors


The full text of the article can be found at: http://www.biomedcentral.com/1472-6947/7/1
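
The idea in the Relemed abstract, that query words appearing in the same or adjacent sentences are more likely to be related than words scattered across an article, can be sketched as a simple tiered score. The tiers (3, 2, 1, 0), the naive sentence splitter and the function names are assumptions made for illustration; the abstract does not give Relemed's actual relevance formula.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def sentence_level_score(text, query_words):
    """Score how closely the query words co-occur.

    3: all words in one sentence
    2: all words within two adjacent sentences
    1: all words somewhere in the text
    0: at least one word missing
    """
    words = [w.lower() for w in query_words]
    sentences = [set(re.findall(r"[a-z0-9]+", s.lower()))
                 for s in split_sentences(text)]
    if any(all(w in s for w in words) for s in sentences):
        return 3
    for first, second in zip(sentences, sentences[1:]):
        if all(w in (first | second) for w in words):
            return 2
    everything = set().union(*sentences) if sentences else set()
    return 1 if all(w in everything for w in words) else 0


def rank(articles, query_words):
    """Sort article texts so the closest co-occurrences come first."""
    return sorted(articles,
                  key=lambda text: sentence_level_score(text, query_words),
                  reverse=True)


if __name__ == "__main__":
    docs = [
        "Aspirin was given. Patients were monitored. Fever was unrelated.",
        "Aspirin reduces fever. The weather was nice.",
    ]
    print(rank(docs, ["aspirin", "fever"])[0])
```

Sorting by such a score pushes articles where the words are far apart lower down the list, which is the behaviour the abstract describes in general terms.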



130. PlantProm

URL: http://mendel.cs.rhul.ac.uk/mendel.php
Categories: General Plant Databases, Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases

PlantProm DB, a plant promoter database, is an annotated, non-redundant collection of proximal promoter sequences for RNA polymerase II with experimentally determined transcription start site(s), TSS, from various plant species. The first release (2002.01) of PlantProm DB contains 305 entries including 71, 220 and 14 promoters from monocot, dicot and other plants, respectively. It provides DNA sequence of the promoter regions (-200 : +51) with TSS on the fixed position +201, taxonomic/promoter type classification of promoters and Nucleotide Frequency Matrices (NFM) for promoter elements: TATA-box, CCAAT-box and TSS-motif (Inr). Analysis of TSS-motifs revealed that their composition is different in dicots and monocots, as well as for TATA and TATA-less promoters. The database serves as learning set in developing plant promoter prediction programs. One such program (TSSP) based on discriminant analysis has been created by Softberry Inc. and the application of a support vector machine approach for promoter identification is under development. PlantProm DB is available at http://mendel.cs.rhul.ac.uk/ and http://www.softberry.com/.

Citation for the above abstract:
Shahmuradov, Ilham A., Gammerman, Alex J., Hancock, John M., Bramley, Peter M., Solovyev, Victor V.
PlantProm: a database of plant promoter sequences
Nucl. Acids Res. 2003 31: 114-117
© 2003 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/114
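
The PlantProm abstract gives promoter regions spanning -200 to +51 with the TSS fixed at position 201 of each sequence. The arithmetic behind that layout can be sketched as follows, assuming the common convention that the TSS is +1 and that there is no position 0; the abstract does not state the convention, so treat it as an assumption. The function names are illustrative.

```python
UPSTREAM = 200      # bases before the TSS (-200 .. -1)
DOWNSTREAM = 51     # bases from the TSS onward (+1 .. +51)
TSS_INDEX = 201     # 1-based index of the TSS in the stored sequence
REGION_LENGTH = UPSTREAM + DOWNSTREAM  # 251 bases in total


def rel_to_index(rel):
    """Map a TSS-relative position to a 1-based index in the sequence.

    Assumes TSS = +1 and no position 0.
    """
    if rel == 0 or rel < -UPSTREAM or rel > DOWNSTREAM:
        raise ValueError("position outside -200..-1 / +1..+51")
    return rel + TSS_INDEX if rel < 0 else rel + TSS_INDEX - 1


def index_to_rel(index):
    """Inverse of rel_to_index."""
    if not 1 <= index <= REGION_LENGTH:
        raise ValueError("index outside 1..251")
    return index - TSS_INDEX if index < TSS_INDEX else index - TSS_INDEX + 1


if __name__ == "__main__":
    for rel in (-200, -1, 1, 51):
        print(rel, "->", rel_to_index(rel))
```

Under this convention the 200 upstream bases occupy indices 1 to 200, which is what places the TSS at index 201.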



131. PRODORIC: Prokaryotic Database of Gene Regulation

URL: http://prodoric.tu-bs.de/
Categories: Intermolecular Interactions and Signaling Pathways Databases, Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases

The database PRODORIC aims to systematically organize information on prokaryotic gene expression, and to integrate this information into regulatory networks. The present version focuses on pathogenic bacteria such as Pseudomonas aeruginosa. PRODORIC links data on environmental stimuli with trans-acting transcription factors, cis-acting promoter elements and regulon definition. Interactive graphical representations of operon, gene and promoter structures including regulator-binding sites, transcriptional and translational start sites, supplemented with information on regulatory proteins are available at varying levels of detail. The data collection provided is based on exhaustive analyses of scientific literature and computational sequence prediction. Included within PRODORIC are tools to define and predict regulator binding sites. It is accessible at http://prodoric.tu-bs.de.

Citation for the above abstract:
Munch, Richard, Hiller, Karsten, Barg, Heiko, Heldt, Dana, Linz, Simone, Wingender, Edgar, Jahn, Dieter
PRODORIC: prokaryotic database of gene regulation
Nucl. Acids Res. 2003 31: 266-269
© 2003 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/266



132. PromEC

URL: http://bioinfo.md.huji.ac.il/marg/promec
Categories: Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases, Prokaryote Databases

PromEC is an updated compilation of Escherichia coli mRNA promoter sequences. It includes documentation on the location of experimentally identified mRNA transcriptional start sites on the E. coli chromosome, as well as the actual sequences in the promoter region. The database was updated as of July 2000 and includes 472 entries. PromEC is accessible at http://bioinfo.md.huji.ac.il/marg/promec

Citation for the above abstract:
Hershberg, Ruti, Bejerano, Gill, Santos-Zavaleta, Alberto, Margalit, Hanah
PromEC: An updated database of Escherichia coli mRNA promoters with experimentally identified transcriptional start sites
Nucl. Acids Res. 2001 29: 277
© 2001 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/29/1/277



133. RegulonDB

URL: http://regulondb.ccg.unam.mx/index.html
Categories: Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases, Prokaryote Databases

RegulonDB is the internationally recognized reference database of Escherichia coli K-12 offering curated knowledge of the regulatory network and operon organization. It is currently the largest electronically-encoded database of the regulatory network of any free-living organism. We present here the recently launched RegulonDB version 5.0 radically different in content, interface design and capabilities. Continuous curation of original scientific literature provides the evidence behind every single object and feature. This knowledge is complemented with comprehensive computational predictions across the complete genome. Literature-based and predicted data are clearly distinguished in the database. Starting with this version, RegulonDB public releases are synchronized with those of EcoCyc since our curation supports both databases. The complex biology of regulation is simplified in a navigation scheme based on three major streams: genes, operons and regulons. Regulatory knowledge is directly available in every navigation step. Displays combine graphic and textual information and are organized allowing different levels of detail and biological context. This knowledge is the backbone of an integrated system for the graphic display of the network, graphic and tabular microarray comparisons with curated and predicted objects, as well as predictions across bacterial genomes, and predicted networks of functionally related gene products. Access RegulonDB at http://regulondb.ccg.unam.mx.

Citation for the above abstract:
Salgado, Heladia, Gama-Castro, Socorro, Peralta-Gil, Martin, Diaz-Peredo, Edgar, Sanchez-Solano, Fabiola, Santos-Zavaleta, Alberto, Martinez-Flores, Irma, Jimenez-Jacinto, Veronica, Bonavides-Martinez, Cesar, Segura-Salazar, Juan, Martinez-Antonio, Agustino, Collado-Vides, Julio
RegulonDB (version 5.0): Escherichia coli K-12 transcriptional regulatory network, operon organization, and growth conditions
Nucl. Acids Res. 2006 34: D394-397
© 2006 Oxford University Press.


The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D394



134. rSNP_Guide

URL: http://wwwmgs.bionet.nsc.ru/mgs/systems/rsnp/
Categories: Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases

The analysis of gene regulatory networks has become one of the most challenging problems of the postgenomic era. Earlier we developed rSNP_Guide (http://util.bionet.nsc.ru/databases/rsnp.html), a computer system and database devoted to prediction of transcription factor (TF) binding sites (TF sites), which can be responsible for disease phenotypes. The prediction results were confirmed by 70 known relationships between TF sites and diseases, as well as by site-directed mutagenesis data. The rSNP_Guide is being investigated as a tool for TF site annotation. Previously analyzed and characterized cases of altered TF sites were used to annotate potential sites of the same type and at the same location in homologous genes. Based on 20 TF sites with known alterations in TF binding to DNA, we localized 245 potential TF sites in homologous genes. For these potential TF sites, rSNP_Guide estimates TF-DNA interaction according to three categories: 'present', 'weak', and 'absent'. The significance of each assignment is statistically measured.

Citation for the above abstract:
Ponomarenko, Julia V., Merkulova, Tatyana I., Orlova, Galina V., Fokin, Oleg N., Gorshkova, Elena V., Frolov, Anatoly S., Valuev, Vadim P., Ponomarenko, Mikhail P.
rSNP_Guide, a database system for analysis of transcription factor binding to DNA with variations: application to genome annotation
Nucl. Acids Res. 2003 31: 118-121
© 2003 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/118



135. SCPD: The Promoter Database of Saccharomyces cerevisiae

URL: http://cgsigma.cshl.org/jian/
Categories: Fungal Genome Databases, Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases

MOTIVATION: In order to facilitate a systematic study of the promoters and transcriptionally regulatory cis-elements of the yeast Saccharomyces cerevisiae on a genomic scale, we have developed a comprehensive yeast-specific promoter database, SCPD. RESULTS: Currently SCPD contains 580 experimentally mapped transcription factor (TF) binding sites and 425 transcriptional start sites (TSS) as its primary data entries. It also contains relevant binding affinity and expression data where available. In addition to mechanisms for promoter information (including sequence) retrieval and a data submission form, SCPD also provides some simple but useful tools for promoter sequence analysis. AVAILABILITY: SCPD can be accessed from the URL http://cgsigma.cshl.org/jian. The database is continually updated.

Citation for the above abstract:
Zhu, J, Zhang, MQ
SCPD: a promoter database of the yeast Saccharomyces cerevisiae
Bioinformatics 1999 15: 607-611
© 1999 Oxford University Press.


The full text of the article can be found at: http://bioinformatics.oupjournals.org/cgi/reprint/15/7/607



136. SELEX_DB

URL: http://wwwmgs.bionet.nsc.ru/mgs/systems/selex/
Categories: Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases, RNA Sequence Databases

SELEX_DB is an online resource containing both the experimental data on in vitro selected DNA/RNA oligomers (aptamers) and the applets for recognition of these oligomers. Since in vitro experimental data are evidently system-dependent, the new release of the SELEX_DB has been supplemented by the database SYSTEM storing the experimental design. In addition, the recognition applet package, SELEX_TOOLS, applying in vitro selected data to annotation of the genome DNA, is accompanied by the cross-validation test database CROSS_TEST discriminating the sites (natural or other) related to in vitro selected sites out of random DNA. By cross-validation testing, we have unexpectedly observed that the recognition accuracy increases with the growth of homology between the training and test sets of protein binding sequences. For natural sites, the recognition accuracy was lower than that for the nearest protein homologs and higher than that for distant homologs and non-homologous proteins binding the common site. The current SELEX_DB release is available at http://wwwmgs.bionet.nsc.ru/mgs/systems/selex/.

Citation for the above abstract:
Ponomarenko, Julia V., Orlova, Galina V., Frolov, Anatoly S., Gelfand, Mikhail S., Ponomarenko, Mikhail P.
SELEX_DB: a database on in vitro selected oligomers adapted for recognizing natural sites and for analyzing both SNPs and site-directed mutagenesis data
Nucl. Acids Res. 2002 30: 195-199
© 2002 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/30/1/195



137. SKY/M-FISH and CGH Database

URL: http://www.ncbi.nlm.nih.gov/sky/
Categories: Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases

"The goal of the SKY/M-FISH and CGH database is to provide a public platform for investigators to share and compare their molecular cytogenetic data. The database is open to everyone and all users can view an individual investigator's public data or compare public cases from different investigators. Those wishing to contribute their own data must register and can choose to keep their data private for a period not to exceed two years.
...
Spectral Karyotyping (SKY), Multiplex Fluorescence In Situ Hybridization (M-FISH) and Comparative Genomic Hybridization (CGH) are complementary fluorescent molecular cytogenetic techniques. SKY/M-FISH permits the simultaneous visualization of each human or mouse chromosome in a different color, facilitating the identification of chromosomal aberrations. CGH utilizes the hybridization of differentially labeled tumor and reference DNA to generate a map of DNA copy number changes in tumor genomes."



138. TESS: Transcription Element Search System

URL: http://www.cbil.upenn.edu/tess/
Categories: Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases

"TESS is a web tool for predicting transcription factor binding sites in DNA sequences. It can identify binding sites using site or consensus strings and positional weight matrices from the TRANSFAC, IMD, and our CBIL-GibbsMat database. You may also include your own site or consensus strings and/or weight matrices in the search.

TESS assigns a TESS job number to all sequence search jobs. The job results are stored on our server for a period of time specified in the search submit form. During this time you may recall the search results using the form on this page. TESS can also email results to you as a tab-delimited file suitable for loading into a spreadsheet program.

TESS also has data browsing and querying capabilities to help you learn about the factors that were predicted to bind to your sequence."



139. Tractor DB: Transcriptional Factor Database

URL: http://www.tractor.lncc.br/
Categories: Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases

Experimental data on the Escherichia coli transcriptional regulatory system has been used in the past years to predict new regulatory elements (promoters, transcription factors (TFs), TFs' binding sites and operons) within its genome. As more genomes of gamma-proteobacteria are being sequenced, the prediction of these elements in a growing number of organisms has become more feasible, as a step towards the study of how different bacteria respond to environmental changes at the level of transcriptional regulation. In this work, we present TRACTOR_DB (TRAnscription FaCTORs' predicted binding sites in prokaryotic genomes), a relational database that contains computational predictions of new members of 74 regulons in 17 gamma-proteobacterial genomes. For these predictions we used a comparative genomics approach regarding which several proof-of-principle articles for large regulons have been published.

Citation for the above abstract:
Gonzalez, Abel D., Espinosa, Vladimir, Vasconcelos, Ana T., Perez-Rueda, Ernesto, Collado-Vides, Julio
TRACTOR_DB: a database of regulatory networks in gamma-proteobacterial genomes
Nucl. Acids Res. 2005 33: D98-102
© 2005 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D98



140. TRANSCompel

URL: http://www.gene-regulation.com/pub/databases.html#transcompel
Categories: Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases

Originating from COMPEL, the TRANSCompel database emphasizes the key role of specific interactions between transcription factors binding to their target sites providing specific features of gene regulation in a particular cellular context. Composite regulatory elements contain two closely situated binding sites for distinct transcription factors and represent minimal functional units providing combinatorial transcriptional regulation. Both specific factor-DNA and factor-factor interactions contribute to the function of composite elements (CEs). Information about the structure of known CEs and specific gene regulation achieved through such CEs appears to be extremely useful for promoter prediction, for gene function prediction and for applied gene engineering as well. Each database entry corresponds to an individual CE within a particular gene and contains information about two binding sites, two corresponding transcription factors and experiments confirming cooperative action between transcription factors. The COMPEL database, equipped with the search and browse tools, is available at http://www.gene-regulation.com/pub/databases.html#transcompel. Moreover, we have developed the program CATCH for searching potential CEs in DNA sequences. It is freely available as CompelPatternSearch at http://compel.bionet.nsc.ru/FunSite/CompelPatternSearch.html.

Citation for the above abstract:
Kel-Margoulis, Olga V., Kel, Alexander E., Reuter, Ingmar, Deineko, Igor V., Wingender, Edgar
TRANSCompel(R): a database on composite regulatory elements in eukaryotic genes
Nucl. Acids Res. 2002 30: 332-334
© 2002 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/30/1/332
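
The TRANSCompel abstract defines a composite element as two closely situated binding sites for distinct transcription factors. A minimal sketch of how candidate pairs might be picked out of a list of predicted sites is given below. It is not the CATCH program; the 20 bp gap threshold, the coordinate convention and the factor names are assumptions chosen only for illustration.

```python
from itertools import combinations


def composite_candidates(sites, max_gap=20):
    """Pair up nearby binding sites that belong to different factors.

    sites: iterable of (factor, start, end) with 0-based, half-open
           coordinates (an assumed convention).
    max_gap: largest distance in bp between the end of the first site and
             the start of the second (hypothetical threshold).
    Returns a list of (first_site, second_site, gap), where overlapping
    sites are reported with a gap of 0.
    """
    ordered = sorted(sites, key=lambda site: site[1])
    pairs = []
    for first, second in combinations(ordered, 2):
        if first[0] == second[0]:
            continue  # a composite element needs two distinct factors
        gap = second[1] - first[2]
        if gap <= max_gap:
            pairs.append((first, second, max(gap, 0)))
    return pairs


if __name__ == "__main__":
    predicted = [
        ("factorA", 10, 18),
        ("factorB", 25, 33),
        ("factorA", 100, 108),
    ]
    for pair in composite_candidates(predicted):
        print(pair)
```

Proximity alone only yields candidates; the database entries described in the abstract additionally rest on experiments confirming cooperative action between the two factors.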



141. TRANSFAC

URL: http://www.gene-regulation.com/pub/databases.html#transfac
Categories: Microarray Data and other Gene Expression Databases, Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases

The TRANSFAC® database on transcription factors, their binding sites, nucleotide distribution matrices and regulated genes as well as the complementing database TRANSCompel® on composite elements have been further enhanced on various levels. A new web interface with different search options and integrated versions of Match™ and Patch™ provides increased functionality for TRANSFAC®. The list of databases which are linked to the common GENE table of TRANSFAC® and TRANSCompel® has been extended by: Ensembl, UniGene, EntrezGene, HumanPSD™ and TRANSPRO™. Standard gene names from HGNC, MGI and RGD, are included for human, mouse and rat genes, respectively. With the help of InterProScan, Pfam, SMART and PROSITE domains are assigned automatically to the protein sequences of the transcription factors. TRANSCompel® contains now, in addition to the COMPEL table, a separate table for detailed information on the experimental EVIDENCE on which the composite elements are based. Finally, for TRANSFAC®, in respect of data growth, in particular the gain of Drosophila transcription factor binding sites (by courtesy of the Drosophila DNase I footprint database) and of Arabidopsis factors (by courtesy of DATF, Database of Arabidopsis Transcription Factors) has to be stressed. The here described public releases, TRANSFAC® 7.0 and TRANSCompel® 7.0, are accessible under http://www.gene-regulation.com/pub/databases.html.

Citation for the above abstract:
Matys, V., Kel-Margoulis, O. V., Fricke, E., Liebich, I., Land, S., Barre-Dirrie, A., Reuter, I., Chekmenev, D., Krull, M., Hornischer, K., Voss, N., Stegmaier, P., Lewicki-Potapov, B., Saxel, H., Kel, A. E., Wingender, E.
TRANSFAC(R) and its module TRANSCompel(R): transcriptional gene regulation in eukaryotes
Nucl. Acids Res. 2006 34: D108-110
© 2006 Oxford University Press.


The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D108



142. TRANSPATH

URL: http://www.gene-regulation.com/pub/databases.html#transpath
Categories: Intermolecular Interactions and Signaling Pathways Databases, Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases

TRANSPATH® is a database about signal transduction events. It provides information about signaling molecules, their reactions and the pathways these reactions constitute. The representation of signaling molecules is organized in a number of orthogonal hierarchies reflecting the classification of the molecules, their species-specific or generic features, and their post-translational modifications. Reactions are similarly hierarchically organized in a three-layer architecture, differentiating between reactions that are evidenced by individual publications, generalizations of these reactions to construct species-independent ‘reference pathways’ and the ‘semantic projections’ of these pathways. A number of search and browse options allow easy access to the database contents, which can be visualized with the tool PathwayBuilder™. The module PathoSign adds data about pathologically relevant mutations in signaling components, including their genotypes and phenotypes. TRANSPATH® and PathoSign can be used as encyclopaedia, in the educational process, for visualization and modeling of signal transduction networks and for the analysis of gene expression data. TRANSPATH® Public 6.0 is freely accessible for users from non-profit organizations under http://www.gene-regulation.com/pub/databases.html.

Citation for the above abstract:
Krull, Mathias, Pistor, Susanne, Voss, Nico, Kel, Alexander, Reuter, Ingmar, Kronenberg, Deborah, Michael, Holger, Schwarzer, Knut, Potapov, Anatolij, Choi, Claudia, Kel-Margoulis, Olga, Wingender, Edgar
TRANSPATH(R): an information resource for storing and visualizing signaling pathways and their pathological aberrations
Nucl. Acids Res. 2006 34: D546-551
© 2006 Oxford University Press.


The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D546



143. TRED: Transcriptional Regulatory Element Database

URL: http://rulai.cshl.edu/tred
Categories: Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases

In order to understand gene regulation, accurate and comprehensive knowledge of transcriptional regulatory elements is essential. Here, we report our efforts in building a mammalian Transcriptional Regulatory Element Database (TRED) with associated data analysis functions. It collects cis- and trans-regulatory elements and is dedicated to easy data access and analysis for both single-gene-based and genome-scale studies. Distinguishing features of TRED include: (i) relatively complete genome-wide promoter annotation for human, mouse and rat; (ii) availability of gene transcriptional regulation information including transcription factor binding sites and experimental evidence; (iii) data accuracy is ensured by hand curation; (iv) efficient user interface for easy and flexible data retrieval; and (v) implementation of on-the-fly sequence analysis tools. TRED can provide good training datasets for further genome-wide cis-regulatory element prediction and annotation, assist detailed functional studies and facilitate the decipher of gene regulatory networks (http://rulai.cshl.edu/TRED).

Citation for the above abstract:
Zhao, Fang, Xuan, Zhenyu, Liu, Lihua, Zhang, Michael Q.
TRED: a Transcriptional Regulatory Element Database and a platform for in silico gene regulation studies
Nucl. Acids Res. 2005 33: D103-107
© 2005 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D103



144. TRRD: Transcription Regulatory Regions Database

URL: http://wwwmgs.bionet.nsc.ru/mgs/gnw/trrd/
Categories: Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases

Transcription Regulatory Regions Database (TRRD) is an informational resource containing an integrated description of the gene transcription regulation. An entry of the database corresponds to a gene and contains the data on localization and functions of the transcription regulatory regions as well as gene expression patterns. TRRD contains only experimental data that are inputted into the database through annotating scientific publication. TRRD release 6.0 comprises the information on 1167 genes, 5537 transcription factor binding sites, 1714 regulatory regions, 14 locus control regions and 5335 expression patterns obtained through annotating 3898 scientific papers. This information is arranged in seven databases: TRRDGENES (general gene description), TRRDLCR (locus control regions); TRRDUNITS (regulatory regions: promoters, enhancers, silencers, etc.), TRRDSITES (transcription factor binding sites), TRRDFACTORS (transcription factors), TRRDEXP (expression patterns) and TRRDBIB (experimental publications). Sequence Retrieval System (SRS) is used as a basic tool for navigating and searching TRRD and integrating it with external informational and software resources. The visualization tool, TRRD Viewer, provides the information representation in a form of maps of gene regulatory regions. The option allowing nucleotide sequences to be searched for according to their homology using BLAST is also included. TRRD is available at http://www.bionet.nsc.ru/trrd/.

Citation for the above abstract:
Kolchanov, N. A., Ignatieva, E. V., Ananko, E. A., Podkolodnaya, O. A., Stepanenko, I. L., Merkulova, T. I., Pozdnyakov, M. A., Podkolodny, N. L., Naumochkin, A. N., Romashchenko, A. G.
Transcription Regulatory Regions Database (TRRD): its status in 2002
Nucl. Acids Res. 2002 30: 312-317
© 2002 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/30/1/312



145. TrSDB: A Proteome Database of Transcription Factors

URL: http://ibb.uab.es/trsdb
Categories: Nucleotide Sequences: Transcriptional Regulator Sites and Transcription Factors Databases

TrSDB-TranScout Database-(http://ibb.uab.es/trsdb) is a proteome database of eukaryotic transcription factors based upon predicted motifs by TranScout and data sources such as InterPro and Gene Ontology Annotation. Nine eukaryotic proteomes are included in the current version. Extensive and diverse information for each database entry, different analyses considering TranScout classification and similarity relationships are offered for research on transcription factors or gene expression.

Citation for the above abstract:
Hermoso, Antoni, Aguilar, Daniel, Aviles, Francesc X., Querol, Enrique
TrSDB: a proteome database of transcription factors
Nucl. Acids Res. 2004 32: D171-173
© 2004 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D171



146. 16S and 23S Ribosomal RNA Mutation Database

URL: http://ribosome.fandm.edu/
Categories: RNA Sequence Databases

Expanded versions of the Ribosomal RNA Mutation Databases provide lists of mutated positions in 16S and 16S-like ribosomal RNA (16SMDBexp) and 23S and 23S-like ribosomal RNA (23SMDBexp) and the identity of each alteration. Alterations from organisms other than Escherichia coli are reported at positions according to the E.coli numbering system. Information provided for each mutation includes: (i) a brief description of the phenotype(s) associated with each mutation, (ii) whether a mutant phenotype has been detected by in vivo or in vitro methods, and (iii) relevant literature citations.

Citation for the above abstract:
Triman, KL, Peister, A, Goel, RA
Expanded versions of the 16S and 23S ribosomal RNA mutation databases (16SMDBexp and 23SMDBexp)
Nucl. Acids Res. 1998 26: 280-284
© 1998 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/26/1/280
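
The abstract above notes that alterations from other organisms are reported at positions in the E. coli numbering system. One way to picture that renumbering is a position map derived from a pairwise alignment. The sketch below is only an illustration under that assumption: the function name and the toy alignment rows are hypothetical and are not taken from the database.

```python
def map_to_reference_numbering(ref_aligned, query_aligned):
    """Map 1-based query positions to 1-based reference positions.

    Both arguments are rows of the same alignment, with '-' marking gaps.
    A query residue aligned to a reference gap has no reference number
    and maps to None.
    """
    if len(ref_aligned) != len(query_aligned):
        raise ValueError("aligned rows must have equal length")
    mapping = {}
    ref_pos = 0
    query_pos = 0
    for ref_char, query_char in zip(ref_aligned, query_aligned):
        if ref_char != "-":
            ref_pos += 1
        if query_char != "-":
            query_pos += 1
            mapping[query_pos] = ref_pos if ref_char != "-" else None
    return mapping


# Toy alignment invented for illustration (not real rRNA sequence).
ref_row = "GAUC-GGA"    # stands in for the E. coli reference
query_row = "GA-CUGGA"  # stands in for another organism
mapping = map_to_reference_numbering(ref_row, query_row)
```

With such a map, a mutation observed at query position 3 would be reported at reference position 4, while query position 4 falls in an insertion relative to the reference and has no reference number.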



147. SfN Neuroscience Database Gateway

URL: http://ndg.sfn.org/
Categories: Metadatabases and Directories

"Databases are of growing importance in neuroscience, as in many other biomedical research fields. The Neuroscience Database Gateway is a new resource for SfN [Society for Neuroscience] members, aimed at promoting awareness and facilitating access to relevant neuroscience databases."



148. 5S Ribosomal RNA Database

URL: http://biobases.ibch.poznan.pl/5SData/
Categories: RNA Sequence Databases

Ribosomal 5S RNA (5S rRNA) is an integral component of the large ribosomal subunit in all known organisms with the exception only of mitochondrial ribosomes of fungi and animals. It is thought to enhance protein synthesis by stabilization of a ribosome structure. This paper presents the updated database of 5S rRNA and their genes (5S rDNA). Its short characteristics are presented in the Introduction. The database contains 2280 primary structures of 5S rRNA and 5S rRNA genes. These include 536 eubacterial, 61 archaebacterial, 1611 eukaryotic and 72 organelle sequences. The database is available on line through the World Wide Web at http://biobases.ibch.poznan.pl/5SData/.

Citation for the above abstract:
Szymanski, Maciej, Barciszewska, Miroslawa Z., Erdmann, Volker A., Barciszewski, Jan
5S Ribosomal RNA Database
Nucl. Acids Res. 2002 30: 176-178
© 2002 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/30/1/176
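
As a quick consistency check on the figures quoted in the abstract above, the four category counts can be summed and compared with the stated total. The dictionary keys are illustrative labels chosen here, not identifiers used by the database.

```python
# Sequence counts as quoted in the abstract above.
counts = {
    "eubacterial": 536,
    "archaebacterial": 61,
    "eukaryotic": 1611,
    "organelle": 72,
}
stated_total = 2280

computed_total = sum(counts.values())
# Fraction of the collection contributed by each category.
shares = {name: n / computed_total for name, n in counts.items()}

print("totals agree:", computed_total == stated_total)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```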



149. Aptamer Database

URL: http://aptamer.icmb.utexas.edu/
Categories: RNA Sequence Databases

The aptamer database is designed to contain comprehensive sequence information on aptamers and unnatural ribozymes that have been generated by in vitro selection methods. Such data are not normally collected in 'natural' sequence databases, such as GenBank. Besides serving as a storehouse of sequences that may have diagnostic or therapeutic utility, the database serves as a valuable resource for theoretical biologists who describe and explore fitness landscapes. The database is updated monthly and is publicly available at http://aptamer.icmb.utexas.edu/.

Citation for the above abstract:
Lee, Jennifer F., Hesselberth, Jay R., Meyers, Lauren Ancel, Ellington, Andrew D.
Aptamer Database
Nucl. Acids Res. 2004 32: D95-100
© 2004 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D95



150. ARED: Human AU-Rich Element-Containing mRNA Database

URL: http://rc.kfshrc.edu.sa/ared/
Categories: RNA Sequence Databases

A comprehensive search that utilized a large set of mRNA data from human genome databases and additionally, expressed sequence tag (EST) database characterized this latest update of AU-rich elements (AREs) containing mRNA database (ARED). A large number of ARE-mRNA, as much as 4000, were recovered and include many of ARE alternative forms. This number represents as much as 5–8% of the human genes depending on the entire number of genes. The new ARED does not contain only larger and diverse number of ARE-mRNAs but additional functionality and enhanced search capabilities are given in the database website http://rc.kfshrc.edu.sa/ared/. These include class and cluster of AREs, source mRNAs, EST evidence, buildup information, retrieval of lists of genes, and integration with current and new NCBI data, such as Entrez ID and Unigene. Gene Ontology analysis shows there are significant differences in functional diversity of ARED when compared with the overall genome. Many of ARE-genes mediate regulatory processes, reactions to outside stimuli, RNA metabolism, and developmental processes particularly those of early and transient responses. The wide interest in mRNA turnover and importance of AREs in health and disease signify the compilation of ARE-genes.

Citation for the above abstract:
Bakheet, Tala, Williams, Bryan R. G., Khabar, Khalid S. A.
ARED 3.0: the large and diverse AU-rich transcriptome
Nucl. Acids Res. 2006 34: D111-114
© 2006 Oxford University Press.


The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D111



151. The European Ribosomal RNA Database

URL: http://www.psb.ugent.be/rRNA/
Categories: RNA Sequence Databases

The European ribosomal RNA database aims to compile all complete or nearly complete ribosomal RNA sequences from both the small (SSU) and large (LSU) ribosomal subunits. All sequences are available in aligned format. Sequence alignment is based on the secondary structure of the molecules, as determined by comparative sequence analysis. Additional information about the sequences, such as taxonomic classification of the organism from which they have been obtained, and literature references are also provided. In order to identify the closest relatives to newly determined sequences, BLAST searches can be performed, after which the best matching sequences are aligned and a phylogenetic tree is inferred. As of 2003, the European ribosomal RNA database is maintained at Ghent University (Belgium). The database can be consulted at http://www.psb.ugent.be/rRNA/.

Citation for the above abstract:
Wuyts, Jan, Perriere, Guy, Van de Peer, Yves
The European ribosomal RNA database
Nucl. Acids Res. 2004 32: D101-103
© 2004 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D101



152. GtRDB: The Genomic tRNA Database

URL: http://rna.wustl.edu/GtRDB/
Categories: RNA Sequence Databases

"This genomic tRNA database contains tRNA identifications made by the program tRNAscan-SE (Lowe & Eddy, Nucl Acids Res 25: 955-964, 1997) on complete or nearly complete genomes. Unless otherwise noted, all annotation is automated, and has not been inspected for agreement with published literature.

Inevitably with automated sequence analysis, we find exceptions to general identification rules, isoacceptor type predictions (esp. due to variable post-transcriptional anticodon modification), and questionable tRNA identifications (due to pseudogenes, SINES, or other tRNA-derived elements). We attempt to document all cases we come across, and welcome feedback on new or unrecognized discrepancies."



153. gpDB: A Database of G-proteins and Their Interaction with GPCRs

URL: http://bioinformatics.biol.uoa.gr/gpDB
Categories: Neuroscience Databases

BACKGROUND: G protein-coupled receptors (GPCRs) transduce signals from extracellular space into the cell, through their interaction with G proteins, which act as switches forming hetero-trimers composed of different subunits (alpha, beta, gamma). The alpha subunit of the G protein is responsible for the recognition of a given GPCR. Whereas specialised resources for GPCRs, and other groups of receptors, are already available, currently, there is no publicly available database focusing on G proteins and containing information about their coupling specificity with their respective receptors. DESCRIPTION: gpDB is a publicly accessible G proteins/GPCRs relational database. Including species homologs, the database contains detailed information for 418 G protein monomers (272 Galpha, 87 Gbeta and 59 Ggamma) and 2782 GPCRs sequences belonging to families with known coupling to G proteins. The GPCRs and the G proteins are classified according to a hierarchy of different classes, families and sub-families, based on extensive literature search. The main innovation besides the classification of both G proteins and GPCRs is the relational model of the database, describing the known coupling specificity of the GPCRs to their respective alpha subunit of G proteins, a unique feature not available in any other database. There is full sequence information with cross-references to publicly available databases, references to the literature concerning the coupling specificity and the dimerization of GPCRs and the user may submit advanced queries for text search. Furthermore, we provide a pattern search tool, an interface for running BLAST against the database and interconnectivity with PRED-TMR, PRED-GPCR and TMRPres2D. CONCLUSIONS: The database will be very useful, for both experimentalists and bioinformaticians, for the study of G protein/GPCR interactions and for future development of predictive algorithms. It is available for academics, via a web browser at the URL: http://bioinformatics.biol.uoa.gr/gpDB.

Citation for the above abstract:
Antigoni L Elefsinioti, Pantelis G Bagos, Ioannis C Spyropoulos, and Stavros J Hamodrakas
A database for G proteins and their interaction with GPCRs.
BMC Bioinformatics 2004, 5:208; doi:10.1186/1471-2105-5-208
© 2004 By the Authors.


The full text of the article can be found at: http://www.biomedcentral.com/1471-2105/5/208
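
The abstract above describes a relational model that links each GPCR to the G protein alpha subunits it couples to. A minimal sketch of what such a many-to-many relation could look like is given below, using Python's built-in sqlite3 module. The table names, column names and the helper function are hypothetical and do not reflect gpDB's actual schema; the three example rows are textbook couplings used only for illustration and were not retrieved from gpDB.

```python
import sqlite3

# In-memory database with a hypothetical three-table layout:
# receptors, G alpha families, and a link table for known couplings.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE g_alpha (id INTEGER PRIMARY KEY, family TEXT NOT NULL UNIQUE);
    CREATE TABLE gpcr (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE coupling (
        gpcr_id INTEGER NOT NULL REFERENCES gpcr(id),
        g_alpha_id INTEGER NOT NULL REFERENCES g_alpha(id),
        PRIMARY KEY (gpcr_id, g_alpha_id)
    );
    """
)

# Illustrative textbook couplings (not data taken from gpDB).
pairs = [
    ("beta-2 adrenergic receptor", "Gs"),
    ("muscarinic M2 receptor", "Gi/o"),
    ("muscarinic M1 receptor", "Gq/11"),
]
conn.executemany(
    "INSERT INTO g_alpha (family) VALUES (?)", [("Gs",), ("Gi/o",), ("Gq/11",)]
)
conn.executemany("INSERT INTO gpcr (name) VALUES (?)", [(r,) for r, _ in pairs])
conn.executemany(
    "INSERT INTO coupling (gpcr_id, g_alpha_id) "
    "SELECT gpcr.id, g_alpha.id FROM gpcr, g_alpha "
    "WHERE gpcr.name = ? AND g_alpha.family = ?",
    pairs,
)


def receptors_coupled_to(connection, family):
    """Return receptor names linked to the given G alpha family, sorted by name."""
    rows = connection.execute(
        "SELECT gpcr.name FROM coupling "
        "JOIN gpcr ON gpcr.id = coupling.gpcr_id "
        "JOIN g_alpha ON g_alpha.id = coupling.g_alpha_id "
        "WHERE g_alpha.family = ? ORDER BY gpcr.name",
        (family,),
    ).fetchall()
    return [name for (name,) in rows]


print(receptors_coupled_to(conn, "Gs"))
```

Keeping the couplings in a separate link table lets one receptor couple to several G alpha families, and one family to many receptors, without duplicating either record.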



154. gRNA: Guide RNA Database

URL: http://biosun.bio.tu-darmstadt.de/goringer/gRNA/gRNA.html
Categories: RNA Sequence Databases

The RNA editing process within the mitochondria of kinetoplastid organisms is controlled by small, trans-acting RNA molecules referred to as guide RNAs. The guide RNA database is a compilation of published guide RNA sequences, currently containing 254 entries from 11 different organisms. Additional information includes RNA secondary and tertiary structure models, information on the gene localisation, literature citations and other relevant facts.

Citation for the above abstract:
Hinz, S, Goringer, HU
The guide RNA database (3.0)
Nucl. Acids Res. 1999 27: 168-
© 1999 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/27/1/168



155. HIV Sequence Database

URL: http://hiv-web.lanl.gov/content/hiv-db/mainpage.html
Categories: HIV/AIDS Databases, RNA Sequence Databases, Viral Databases

"The sequence database is based on HIV and SIV sequences downloaded from Genbank. We annotate these sequences with information from the literature, and sometimes from the authors. What information we add depends on what we can find, and ranges from sample information (sampling year, - country, - city), patient information (risk group, infection country and - year, sex, known epidemiological links to other patients); biological information about the virus (phenotype, tropism, coreceptor usage), technical information about the sample treatment and sequencing method, and (for a small number of important strains) extensive notes about their origin and derivation. In the future we hope to add information about treatment status of the patients and about HLA types.

At least as important as the database itself is the search interface that provides access to it. In addition to straightforward searches on many fields in the database, this tool allows the user to download alignments of certain regions, either all sequences there are for that region or a selection based on user-defined criteria. This can be very important for comparing one's sequence to existing sequences in the database; one of the most time-consuming tasks in sequence analysis used to be locating the appropriate region in sequences from the database."



156. HuSiDa: Human siRNA Database

URL: http://itb1.biologie.hu-berlin.de/~nebulus/sirna/
Categories: RNA Sequence Databases

Small interfering RNAs (siRNAs) have become a standard tool in functional genomics. Once incorporated into the RNA-induced silencing complex (RISC), siRNAs mediate the specific recognition of corresponding target mRNAs and their cleavage. However, only a small fraction of randomly chosen siRNA sequences is able to induce efficient gene silencing. In common laboratory practice, successful RNA interference experiments typically require both, the labour and cost-intensive identification of an active siRNA sequence and the optimization of target cell line-specific procedures for optimal siRNA delivery. To optimize the design and performance of siRNA experiments, we have established the human siRNA database (HuSiDa). The database provides sequences of published functional siRNA molecules targeting human genes and important technical details of the corresponding gene silencing experiments, including the mode of siRNA generation, recipient cell lines, transfection reagents and procedures and direct links to published references (PubMed). The database can be accessed at http://www.human-siRNA-database.net. We used the siRNA sequence information stored in the database for scrutinizing published sequence selection parameters for efficient gene silencing.

Citation for the above abstract:
Truss, Matthias, Swat, Maciej, Kielbasa, Szymon M., Schafer, Reinhold, Herzel, Hanspeter, Hagemeier, Christian
HuSiDa--the human siRNA database: an open-access database for published functional siRNA sequences and technical details of efficient transfer into recipient cells
Nucl. Acids Res. 2005 33: D108-111
© 2005 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D108



157. HyPaLib

URL: http://bibiserv.techfak.uni-bielefeld.de/HyPa/
Categories: RNA Sequence Databases

The database, called HyPaLib (for Hybrid Pattern Library), contains annotated structural elements characteristic for certain classes of structural and/or functional RNAs. These elements are described in a language specifically designed for this purpose. The language allows convenient specification of hybrid patterns, i.e. motifs consisting of sequence features and structural elements together with sequence similarity and thermodynamic constraints. We are currently developing software tools that allow a user to search sequence databases for any pattern in HyPaLib, thus providing functionality which is similar to PROSITE, but dedicated to the more complex patterns in RNA sequences. HyPaLib is available at http://bibiserv.techfak.uni-bielefeld.de/HyPa/.

Citation for the above abstract:
Graf, Stefan, Strothmann, Dirk, Kurtz, Stefan, Steger, Gerhard
HyPaLib: a database of RNAs and RNA structural elements defined by hybrid patterns
Nucl. Acids Res. 2001 29: 196-198
© 2001 Oxford University Press.



The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/29/1/196
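
The abstract above defines a hybrid pattern as a motif that combines sequence features with structural elements. The sketch below illustrates that idea in Python and does not use HyPaLib's own pattern language: it looks for a loop matching a sequence motif that is flanked by a stem of complementary bases. The function name, the choice of a GNRA tetraloop as the loop motif and the toy sequence are assumptions made for illustration, and the thermodynamic constraints mentioned in the abstract are not modelled.

```python
import re

# Watson-Crick pairs plus G-U wobble pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def find_hairpins(seq, loop_regex, stem_len):
    """Return (start, end) spans of hairpins in an RNA sequence.

    A hit is a loop matching loop_regex (the sequence feature) flanked on
    both sides by stem_len bases that pair with each other (the structural
    element). Spans are 0-based and end-exclusive.
    """
    hits = []
    # The lookahead lets overlapping loop candidates be examined too.
    for match in re.finditer(f"(?=({loop_regex}))", seq):
        loop_start, loop_end = match.start(1), match.end(1)
        if loop_start < stem_len:
            continue
        left = seq[loop_start - stem_len:loop_start]
        right = seq[loop_end:loop_end + stem_len]
        if len(right) < stem_len:
            continue
        # The outermost base on the 5' side pairs with the outermost on the 3' side.
        if all((a, b) in PAIRS for a, b in zip(left, reversed(right))):
            hits.append((loop_start - stem_len, loop_end + stem_len))
    return hits


GNRA_LOOP = "G[ACGU][AG]A"  # G, any base, purine, A
toy_seq = "AAGGCCGAAAGGCCUU"  # invented sequence, for illustration only
hairpins = find_hairpins(toy_seq, GNRA_LOOP, stem_len=4)
```

Here the loop GAAA at positions 6 to 9 is closed by the stem GGCC/GGCC, so the sketch reports the single span covering GGCCGAAAGGCC.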



158. IRESite: The database of experimentally verified IRES structures

URL: http://www.iresite.org/
Categories: RNA Sequence Databases

IRESite is an exhaustive, manually annotated non-redundant relational database focused on the IRES elements (Internal Ribosome Entry Site) and containing information not available in the primary public databases. IRES elements were originally found in eukaryotic viruses hijacking initiation of translation of their host. Later on, they were also discovered in 5'-untranslated regions of some eukaryotic mRNA molecules. Currently, IRESite presents up to 92 biologically relevant aspects of every experiment, e.g. the nature of an IRES element, its functionality/defectivity, origin, size, sequence, structure, its relative position with respect to surrounding protein coding regions, positive/negative controls used in the experiment, the reporter genes used to monitor IRES activity, the measured reporter protein yields/activities, and references to original publications as well as cross-references to other databases, and also comments from submitters and our curators. Furthermore, the site presents the known similarities to rRNA sequences as well as RNA–protein interactions. Special care is given to the annotation of promoter-like regions. The annotated data in IRESite are bound to mostly complete, full-length mRNA, and whenever possible, accompanied by original plasmid vector sequences. New data can be submitted through the publicly available web-based interface at http://www.iresite.org and are curated by a team of lab-experienced biologists.

Citation for the above abstract:
Mokrejs, Martin, Vopalensky, Vaclav, Kolenaty, Ondrej, Masek, Tomas, Feketova, Zuzana, Sekyrova, Petra, Skaloudova, Barbora, Kriz, Vitezslav, Pospisek, Martin
IRESite: the database of experimentally verified IRES structures (www.iresite.org)
Nucl. Acids Res. 2006 34: D125-130
© 2006 Oxford University Press.


The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D125



159. miRNA: the microRNA Registry

URL: http://www.sanger.ac.uk/Software/Rfam/mirna/
Categories: RNA Sequence Databases

The miRNA Registry provides a service for the assignment of miRNA gene names prior to publication. A comprehensive and searchable database of published miRNA sequences is accessible via a web interface (http://www.sanger.ac.uk/Software/Rfam/mirna/), and all sequence and annotation data are freely available for download. Release 2.0 of the database contains 506 miRNA entries from six organisms.

Citation for the above abstract:
Griffiths-Jones, Sam
The microRNA Registry
Nucl. Acids Res. 2004 32: D109-111
© 2004 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D109



160. Mobile group II introns database

URL: http://www.fp.ucalgary.ca/group2introns/
Categories: RNA Sequence Databases

Group II introns are self-splicing RNAs and retroelements found in bacteria and lower eukaryotic organelles. During the past several years, they have been uncovered in surprising numbers in bacteria due to the genome sequencing projects; however, most of the newly sequenced introns are not correctly identified. We have initiated an ongoing web site database for mobile group II introns in order to provide correct information on the introns, particularly in bacteria. Information in the web site includes: (1) introductory information on group II introns; (2) detailed information on subfamilies of intron RNA structures and intron-encoded proteins; (3) a listing of identified introns with correct boundaries, RNA secondary structures and other detailed information; and (4) phylogenetic and evolutionary information. The comparative data should facilitate study of the function, spread and evolution of group II introns. The database can be accessed at http://www.fp.ucalgary.ca/group2introns/.

Citation for the above abstract:
Dai, Lixin, Toor, Navtej, Olson, Robert, Keeping, Andrew, Zimmerly, Steven
Database for mobile group II introns
Nucl. Acids Res. 2003 31: 424-426
© 2003 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/424



161. Non-canonical Base Pair Database

URL: http://prion.bchs.uh.edu/bp_type/
Categories: Nucleic Acid Structure Databases, RNA Sequence Databases

The secondary and tertiary structure of an RNA molecule typically includes a number of non-canonical base-base interactions. The known occurrences of these interactions are tabulated in the NCIR database, which can be accessed from http://prion.bchs.uh.edu/bp_type/. The number of examples is now over 1400, which is an increase of >700% since the database was first published. This dramatic increase reflects the addition of data from the recently published crystal structures of the 50S (2.4 Å) and 30S (3.0 Å) ribosomal subunits. In addition, non-canonical interactions observed in published crystal and NMR structures of tRNAs, group I introns, ribozymes, RNA aptamers and synthetic oligonucleotides are included. Properties associated with these interactions, such as sequence context, sugar pucker conformation, glycosidic angle conformation, melting temperature, chemical shift and free energy, are also reported when available. Out of the 29 anticipated pairs with at least two hydrogen bonds, 28 have been observed to date. In addition, several novel examples, not generally predicted, have also been encountered, bringing the total of such pairs to 36. Added to this list are a variety of single, bifurcated, triple and quadruple interactions. The most common non-canonical pairs are the sheared GA, GA imino, AU reverse Hoogsteen, and the GU and AC wobble pairs. The most frequent triple interaction connects N3 of an A with the amino of a G that is also involved in a standard Watson-Crick pair.

Citation for the above abstract:
Nagaswamy, Uma, Larios-Sanz, Maia, Hury, James, Collins, Shakaala, Zhang, Zhengdong, Zhao, Qin, Fox, George E.
NCIR: a database of non-canonical interactions in known RNA structures
Nucl. Acids Res. 2002 30: 395-397
© 2002 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/30/1/395



162. Noncoding RNAs Database

URL: http://biobases.ibch.poznan.pl/ncRNA/
Categories: RNA Sequence Databases

The noncoding RNAs database is a collection of currently available sequence data on RNAs, which have no protein-coding capacity and have been implicated in regulation of cellular processes. The RNAs included in the database form a very heterogeneous group of molecules that act on different levels of information transmission in the cell. It includes RNAs acting on the level of chromatin structure, transcriptional and translational regulation of gene expression, modulation of protein function and regulation of subcellular distribution of RNAs and proteins. Those RNAs, with potential regulatory functions, have been identified in prokaryotic, animal and plant cells. The database can be accessed at http://biobases.ibch.poznan.pl/ncRNA/.

Citation for the above abstract:
Szymanski, Maciej, Erdmann, Volker A., Barciszewski, Jan
Noncoding regulatory RNAs database
Nucl. Acids Res. 2003 31: 429-431
© 2003 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/429



163. Ensembl

URL: http://www.ensembl.org/
Categories: Human Genome Databases, Maps, and Viewers, Model Organisms and Comparative Genomics Databases

The Ensembl (http://www.ensembl.org/) project provides a comprehensive and integrated source of annotation of large genome sequences. Over the last year the number of genomes available from the Ensembl site has increased from 4 to 19, with the addition of the mammalian genomes of Rhesus macaque and Opossum, the chordate genome of Ciona intestinalis and the import and integration of the yeast genome. The year has also seen extensive improvements to both data analysis and presentation, with the introduction of a redesigned website, the addition of RNA gene and regulatory annotation and substantial improvements to the integration of human genome variation data.

Citation for the above abstract:
Birney, E., Andrews, D., Caccamo, M., Chen, Y., Clarke, L., Coates, G., Cox, T., Cunningham, F., Curwen, V., Cutts, T., Down, T., Durbin, R., Fernandez-Suarez, X. M., Flicek, P., Graf, S., Hammond, M., Herrero, J., Howe, K., Iyer, V., Jekosch, K., Kahari, A., Kasprzyk, A., Keefe, D., Kokocinski, F., Kulesha, E., London, D., Longden, I., Melsopp, C., Meidl, P., Overduin, B., Parker, A., Proctor, G., Prlic, A., Rae, M., Rios, D., Redmond, S., Schuster, M., Sealy, I., Searle, S., Severin, J., Slater, G., Smedley, D., Smith, J., Stabenau, A., Stalker, J., Trevanion, S., Ureta-Vidal, A., Vogel, J., White, S., Woodwark, C., Hubbard, T. J. P.
Ensembl 2006
Nucl. Acids Res. 2006 34: D556-561
© 2006 Oxford University Press.


The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D556



164. NONCODE

URL: http://www.bioinfo.org.cn/NONCODE/index.htm
Categories: RNA Sequence Databases

NONCODE is an integrated knowledge database dedicated to non-coding RNAs (ncRNAs), that is to say, RNAs that function without being translated into proteins. All ncRNAs in NONCODE were filtered automatically from literature and GenBank, and were later manually curated. The distinctive features of NONCODE are as follows: (i) the ncRNAs in NONCODE include almost all the types of ncRNAs, except transfer RNAs and ribosomal RNAs. (ii) All ncRNA sequences and their related information (e.g. function, cellular role, cellular location, chromosomal information, etc.) in NONCODE have been confirmed manually by consulting relevant literature: more than 80% of the entries are based on experimental data. (iii) Based on the cellular process and function, which a given ncRNA is involved in, we introduced a novel classification system, labeled process function class, to integrate existing classification systems. (iv) In addition, some 1100 ncRNAs have been grouped into nine other classes according to whether they are specific to gender or tissue or associated with tumors and diseases, etc. (v) NONCODE provides a user-friendly interface, a visualization platform and a convenient search option, allowing efficient recovery of sequence, regulatory elements in the flanking sequences, secondary structure, related publications and other information. The first release of NONCODE (v1.0) contains 5339 non-redundant sequences from 861 organisms, including eukaryotes, eubacteria, archaebacteria, virus and viroids. Access is free for all users through a web interface at http://noncode.bioinfo.org.cn.

Citation for the above abstract:
Liu, Changning, Bai, Baoyan, Skogerbo, Geir, Cai, Lun, Deng, Wei, Zhang, Yong, Bu, Dongbo, Zhao, Yi, Chen, Runsheng
NONCODE: an integrated knowledge database of non-coding RNAs
Nucl. Acids Res. 2005 33: D112-115
© 2005 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D112



165. Plant snoRNA Database

URL: http://bioinf.scri.sari.ac.uk/cgi-bin/plant_snorna/home
Categories: General Plant Databases, RNA Sequence Databases

The Plant snoRNA database (http://www.scri.sari.ac.uk/plant_snoRNA/) provides information on small nucleolar RNAs from Arabidopsis and eighteen other plant species. Information includes sequences, expression data, methylation and pseudouridylation target modification sites, initial gene organization (polycistronic, single gene and intronic) and the number of gene variants. The Arabidopsis information is divided into box C/D and box H/ACA snoRNAs, and within each of these groups, by target sites in rRNA, snRNA or unknown. Alignments of orthologous genes and gene variants from different plant species are available for many snoRNA genes. Plant snoRNA genes have been given a standard nomenclature, designed wherever possible, to provide a consistent identity with yeast and human orthologues.

Citation for the above abstract:
Brown, John W. S., Echeverria, Manuel, Qu, Liang-Hu, Lowe, Todd M., Bachellerie, Jean-Pierre, Huttenhofer, Alexander, Kastenmayer, James P., Green, Pamela J., Shaw, Paul, Marshall, Dave F.
Plant snoRNA database
Nucl. Acids Res. 2003 31: 432-435
© 2003 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/432



166. PLANTncRNAs: Noncoding RNAs in Plants

URL: http://www.prl.msu.edu/PLANTncRNAs/
Categories: General Plant Databases, RNA Sequence Databases

"We have collected existing data on plant noncoding RNAs and expanded on this by examining about 20,000 Arabidopsis ESTs for characteristics of noncoding RNAs. About 15 putative Arabidopsis ncRNAs have been reported in the literature or have been annotated. Several have homologs in other plants, but all appear to be plant-specific with the exception of SRP RNA. Conversely, none of about 30 ncRNAs reported from yeast, bacteria or animal systems have homologs in Arabidopsis. To identify additional genes that appear to encode ncRNAs, we used computational tools to filter out the protein coding genes from those corresponding to 20,000 EST clones. What remained were 39 clones that either had the characteristics of ncRNAs (19), peptide coding RNAs (pepRNAs)(9) or could not be differentiated between the two categories(11). Again none of these clones had homologs outside the plant kingdom indicating that most ncRNAs of Arabidopsis are likely plant-specific."



167. PLMItRNA: a Database for tRNA Molecules and Genes in Mitochondria of Photosynthetic Eukaryotes

URL: http://bighost.area.ba.cnr.it/PLMItRNA/
Categories: Mitochondrial Genes and Proteins Databases, RNA Sequence Databases

The updated version of PLMItRNA reports information and multialignments on 609 genes and 34 tRNA molecules active in the mitochondria of Viridiplantae (27 Embryophyta and 10 Chlorophyta), and photosynthetic algae (one Cryptophyta, four Rhodophyta and two Stramenopiles). Colour-code based tables reporting the different genetic origin of identified genes allow hyper-textual link to single entries. Promoter sequences identified for tRNA genes in the mitochondrial genomes of Angiospermae are also reported. The PLMItRNA database is accessible at http://bighost.area.ba.cnr.it/PLMItRNA/.

Citation for the above abstract:
Rainaldi, Guglielmo, Volpicella, Mariateresa, Licciulli, Flavio, Liuni, Sabino, Gallerani, Raffaele, Ceci, Luigi R.
PLMItRNA, a database on the heterogeneous genetic origin of mitochondrial tRNA genes and tRNAs in photosynthetic eukaryotes
Nucl. Acids Res. 2003 31: 436-438
© 2003 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/436



168. PolyA_DB: Polyadenylation Database

URL: http://polya.umdnj.edu/polyadb/
Categories: RNA Sequence Databases

Messenger RNA polyadenylation is one of the key post-transcriptional events in eukaryotic cells. A large number of genes in mammalian species can undergo alternative polyadenylation, which leads to mRNAs with variable 3' ends. As the 3' end of mRNAs often contains cis elements important for mRNA stability, mRNA localization and translation, the implications of the regulation of polyadenylation can be multifold. Alternative polyadenylation is controlled by cis elements and trans factors, and is believed to occur in a tissue- or disease-specific manner. Given the availability of many databases devoted to other aspects of mRNA metabolism, such as transcriptional initiation and splicing, systematic information on polyadenylation, including alternative polyadenylation and its regulation, is noticeably lacking. Here, we present a database named polyA_DB, through which we strive to provide several types of information regarding polyadenylation in mammalian species: (i) polyadenylation sites and their locations with respect to the genomic structure of genes; (ii) cis elements surrounding polyadenylation sites; (iii) comparison of polyadenylation configuration between orthologous genes; and (iv) tissue/organ information for alternative polyadenylation sites. Currently, polyA_DB contains 45,565 polyadenylation sites for 25,097 human and mouse genes, representing the most comprehensive polyadenylation database till date. The database is accessible via the website (http://polya.umdnj.edu/polyadb).

Citation for the above abstract:
Zhang, Haibo, Hu, Jun, Recce, Michael, Tian, Bin
PolyA_DB: a database for mammalian mRNA polyadenylation
Nucl. Acids Res. 2005 33: D116-120
© 2005 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D116



169. PseudoBase

URL: http://wwwbio.leidenuniv.nl/~Batenburg/PKB.html
Categories: RNA Sequence Databases

PseudoBase is a database containing structural, functional and sequence data related to RNA pseudoknots. It can be reached at http://wwwbio.LeidenUniv.nl/~Batenburg/PKB.html. For each pseudoknot, thirteen items are stored, for example the relevant sequence, the stem positions of the pseudoknot, the EMBL accession number of the sequence and the support that can be given regarding the reliability of the pseudoknot. Since the last publication, information on sizes of the stems and the loops in the pseudoknots has been added. Also added are alternative entries that produce surveys of where the pseudoknots are, sorted according to stem size or loop size.

Citation for the above abstract:
van Batenburg, F. H. D., Gultyaev, A. P., Pleij, C. W. A.
PseudoBase: structural information on RNA pseudoknots
Nucl. Acids Res. 2001 29: 194-195
© 2001 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/29/1/194



170. Rfam: RNA Families Database of Alignments and CMs

URL: http://www.sanger.ac.uk/Software/Rfam/
Categories: Nucleic Acid Structure Databases, RNA Sequence Databases

Rfam is a comprehensive collection of non-coding RNA (ncRNA) families, represented by multiple sequence alignments and profile stochastic context-free grammars. Rfam aims to facilitate the identification and classification of new members of known sequence families, and distributes annotation of ncRNAs in over 200 complete genome sequences. The data provide the first glimpses of conservation of multiple ncRNA families across a wide taxonomic range. A small number of large families are essential in all three kingdoms of life, with large numbers of smaller families specific to certain taxa. Recent improvements in the database are discussed, together with challenges for the future. Rfam is available on the Web at http://www.sanger.ac.uk/Software/Rfam/ and http://rfam.wustl.edu/.

Citation for the above abstract:
Griffiths-Jones, Sam, Moxon, Simon, Marshall, Mhairi, Khanna, Ajay, Eddy, Sean R., Bateman, Alex
Rfam: annotating non-coding RNAs in complete genomes
Nucl. Acids Res. 2005 33: D121-124
© 2005 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D121



171. Ribosomal Database Project (RDP-II)

URL: http://rdp.cme.msu.edu/
Categories: RNA Sequence Databases, Taxonomy and Identification Databases

The Ribosomal Database Project-II (RDP-II) provides data, tools and services related to ribosomal RNA sequences to the research community. Through its website (http://rdp.cme.msu.edu), RDP-II offers aligned and annotated rRNA sequence data, analysis services, and phylogenetic inferences (trees) derived from these data. RDP-II release 8.1 contains 16 277 prokaryotic, 5201 eukaryotic, and 1503 mitochondrial small subunit rRNA sequences in aligned and annotated format. The current public beta release of 9.0 debuts a new regularly updated alignment of over 50 000 annotated (eu)bacterial sequences. New analysis services include a sequence search and selection tool (Hierarchy Browser) and a phylogenetic tree building and visualization tool (Phylip Interface). A new interactive tutorial guides users through the basics of rRNA sequence analysis. Other services include probe checking, phylogenetic placement of user sequences, screening of users' sequences for chimeric rRNA sequences, automated alignment, production of similarity matrices, and services to plan and analyze terminal restriction fragment polymorphism (T-RFLP) experiments. The RDP-II email address for questions or comments is rdpstaff@msu.edu.

Citation for the above abstract:
Cole, J. R., Chai, B., Marsh, T. L., Farris, R. J., Wang, Q., Kulam, S. A., Chandra, S., McGarrell, D. M., Schmidt, T. M., Garrity, G. M., Tiedje, J. M.
The Ribosomal Database Project (RDP-II): previewing a new autoaligner that allows regular updates and the new prokaryotic taxonomy
Nucl. Acids Res. 2003 31: 442-443
© 2003 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/442



172. RISSC: Ribosomal Internal Spacer Sequence Collection

URL: http://miracle.umh.es/rissc/
Categories: RNA Sequence Databases, Taxonomy and Identification Databases

A novel database, under the acronym RISSC (Ribosomal Intergenic Spacer Sequence Collection), has been created. It compiles more than 1600 entries of edited DNA sequence data from the 16S-23S ribosomal spacers present in most prokaryotes and organelles (e.g. mitochondria and chloroplasts) and is accessible through the Internet (http://ulises.umh.es/RISSC), where systematic searches for specific words can be conducted, as well as BLAST-type sequence searches. Additionally, a characteristic feature of this region, the presence/absence and nature of tRNA genes within the spacer, is included in all the entries, even when not previously indicated in the original database. All these combined features could provide a useful documentation tool for studies on evolution, identification, typing and strain characterization, among others.

Citation for the above abstract:
Garcia-Martinez, Jesus, Bescos, Ignacio, Rodriguez-Sala, Jesus Javier, Rodriguez-Valera, Francisco
RISSC: a novel database for ribosomal 16S-23S RNA genes spacer regions
Nucl. Acids Res. 2001 29: 178-180
© 2001 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/29/1/178



173. RNA Modification Database

URL: http://medlib.med.utah.edu/RNAmods/
Categories: RNA Sequence Databases

The RNA Modification Database (http://medlib.med.utah.edu/RNAmods/) provides a comprehensive listing of naturally modified nucleosides in RNA. Each file includes: chemical structure; common name and symbol; type(s) of RNA in which found and corresponding phylogenetic distribution; Chemical Abstracts registry number and index name; and initial literature citations for structure characterization and chemical synthesis. New features include capability to search database files by name or substructural features, modifications in tmRNA, and links to related data and sites.

Citation for the above abstract:
Rozenski, J, Crain, PF, McCloskey, JA
The RNA Modification Database: 1999 update
Nucl. Acids Res. 1999 27: 196-197
© 1999 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/27/1/196



174. RNAdb

URL: http://research.imb.uq.edu.au/rnadb/
Categories: RNA Sequence Databases

In recent years, there have been increasing numbers of transcripts identified that do not encode proteins, many of which are developmentally regulated and appear to have regulatory functions. Here, we describe the construction of a comprehensive mammalian noncoding RNA database (RNAdb) which contains over 800 unique experimentally studied non-coding RNAs (ncRNAs), including many associated with diseases and/or developmental processes. The database is available at http://research.imb.uq.edu.au/RNAdb and is searchable by many criteria. It includes microRNAs and snoRNAs, but not infrastructural RNAs, such as rRNAs and tRNAs, which are catalogued elsewhere. The database also includes over 1100 putative antisense ncRNAs and almost 20,000 putative ncRNAs identified in high-quality murine and human cDNA libraries, with more to be added in the near future. Many of these RNAs are large, and many are spliced, some alternatively. The database will be useful as a foundation for the emerging field of RNomics and the characterization of the roles of ncRNAs in mammalian gene expression and regulation.

Citation for the above abstract:
Pang, Ken C., Stephen, Stuart, Engstrom, Par G., Tajul-Arifin, Khairina, Chen, Weisan, Wahlestedt, Claes, Lenhard, Boris, Hayashizaki, Yoshihide, Mattick, John S.
RNAdb--a comprehensive mammalian noncoding RNA database
Nucl. Acids Res. 2005 33: D125-130
© 2005 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D125



175. siRNAdb

URL: http://sirna.cgb.ki.se/
Categories: RNA Sequence Databases

Short interfering RNAs (siRNAs) are a popular method for gene-knockdown, acting by degrading the target mRNA. Before performing experiments it is invaluable to locate and evaluate previous knockdown experiments for the gene of interest. The siRNA database provides a gene-centric view of siRNA experimental data, including siRNAs of known efficacy and siRNAs predicted to be of high efficacy by a combination of methods. Linked to these sequences is information such as siRNA thermodynamic properties and the potential for sequence-specific off-target effects. The database enables the user to evaluate an siRNA's potential for inhibition and non-specific effects. The database is available at http://siRNA.cgb.ki.se.

Citation for the above abstract:
Chalk, Alistair M., Warfinge, Richard E., Georgii-Hemming, Patrick, Sonnhammer, Erik L. L.
siRNAdb: a database of siRNA sequences
Nucl. Acids Res. 2005 33: D131-134
© 2005 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D131



176. Small RNA Database

URL: http://condor.bcm.tmc.edu/smallRNA/smallrna.html
Categories: RNA Sequence Databases

The small RNA database is a compilation of all the small size RNA sequences available to date, including nuclear, nucleolar, cytoplasmic and mitochondria small RNAs from eukaryotic organisms and small RNAs from prokaryotic cells as well as viruses. Currently, approximately 600 small RNA sequences are in our database. It also gives the sources of individual RNAs and their GenBank accession numbers. The small RNA database can be accessed through the WWW (World Wide Web). Our WWW URL address is: http://mbcr.bcm.tmc.edu/smallRNA/smallrna.html. The new small RNA sequences published since our last compilation are listed in this paper (Table 1).

Citation for the above abstract:
Gu, J, Chen, Y, Reddy, R
Small RNA database
Nucl. Acids Res. 1998 26: 160-162
© 1998 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/26/1/160



177. SRPDB: Signal Recognition Particle Database

URL: http://psyche.uthct.edu/dbs/SRPDB/SRPDB.html
Categories: Individual Protein Family Databases, RNA Sequence Databases

Maintained at the University of Texas Health Science Center at Tyler, Texas, the tmRNA database (tmRDB) is accessible at the URL http://psyche.uthct.edu/dbs/tmRDB/tmRDB.html with mirror sites located at Auburn University, Auburn, Alabama (http://www.ag.auburn.edu/mirror/tmRDB/) and the Royal Veterinary and Agricultural University, Denmark (http://tmrdb.kvl.dk/). The signal recognition particle database (SRPDB) at http://psyche.uthct.edu/dbs/SRPDB/SRPDB.html is mirrored at http://srpdb.kvl.dk/ and the University of Goteborg (http://bio.lundberg.gu.se/dbs/SRPDB/SRPDB.html). The databases assist in investigations of the tmRNP (a ribonucleoprotein complex which liberates stalled bacterial ribosomes) and the SRP (a particle which recognizes signal sequences and directs secretory proteins to cell membranes). The curated tmRNA and SRP RNA alignments consider base pairs supported by comparative sequence analysis. Also shown are alignments of the tmRNA-associated proteins SmpB, ribosomal protein S1, alanyl-tRNA synthetase and Elongation Factor Tu, as well as the SRP proteins SRP9, SRP14, SRP19, SRP21, SRP54 (Ffh), SRP68, SRP72, cpSRP43, Flhf, SRP receptor (alpha) and SRP receptor (beta). All alignments can be easily examined using a new exploratory browser. The databases provide links to high-resolution structures and serve as depositories for structures obtained by molecular modeling.

Citation for the above abstract:
Andersen, Ebbe Sloth, Rosenblad, Magnus Alm, Larsen, Niels, Westergaard, Jesper Cairo, Burks, Jody, Wower, Iwona K., Wower, Jacek, Gorodkin, Jan, Samuelsson, Tore, Zwieb, Christian
The tmRDB and SRPDB resources
Nucl. Acids Res. 2006 34: D163-168
© 2006 Oxford University Press.


The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D163



178. Subviral RNA Database

URL: http://subviral.med.uottawa.ca/
Categories: RNA Sequence Databases, Viral Databases

We describe here the establishment of an online database containing a large number of sequences and related data on viroids, viroid-like RNAs and human hepatitis delta virus (vHDV) in a customizable and user-friendly format.

Citation for the above abstract:
Pelchat, Martin, Rocheleau, Lynda, Perreault, Jonathan, Perreault, Jean-Pierre
SubViral RNA: a database of the smallest known auto-replicable RNA species
Nucl. Acids Res. 2003 31: 444-445
© 2003 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/444



179. The Small Subunit rRNA Modification Database

URL: http://medstat.med.utah.edu/SSUmods/
Categories: RNA Sequence Databases

The Small Subunit rRNA Modification Database provides a listing of reported post-transcriptionally modified nucleosides and sequence sites in small subunit rRNAs from bacteria, archaea and eukarya. Data are compiled from reports of full or partial rRNA sequences, including RNase T1 oligonucleotide catalogs reported in earlier literature in studies of phylogenetic relatedness. Options for data presentation include full sequence maps, some of which have been assembled by database curators with the aid of contemporary gene sequence data, and tabular forms organized by source organism or chemical identity of the modification. A total of 32 rRNA sequence alignments are provided, annotated with sites of modification and chemical identities of modifications if known, with provision for scrolling full sequences or user-dictated subsequences for comparative viewing for organisms of interest. The database can be accessed through the World Wide Web at http://medlib.med.utah.edu/SSUmods.

Citation for the above abstract:
McCloskey, James A., Rozenski, Jef
The Small Subunit rRNA Modification Database
Nucl. Acids Res. 2005 33: D135-138
© 2005 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D135



180. The tmRNA Website

URL: http://www.indiana.edu/~tmrna/
Categories: RNA Sequence Databases

tmRNA combines tRNA- and mRNA-like properties and ameliorates problems arising from stalled ribosomes. Research on the mechanism, structure and biology of tmRNA is served by the tmRNA website (http://www.indiana.edu/~tmrna), a collection of sequences, alignments, secondary structures and other information. Because many of these sequences are not in GenBank, a BLAST server has been added; another new feature is an abbreviated alignment for the tRNA-like domain only. Many tmRNA sequences from plastids have been added, five found in public sequence data and another 10 generated by direct sequencing; detection in early-branching members of the green plastid lineage brings coverage to all three primary plastid lineages. The new sequences include the shortest known tmRNA sequence. While bacterial tmRNAs usually have a lone pseudoknot upstream of the mRNA segment and a string of three or four pseudoknots downstream, plastid tmRNAs collectively show loss of pseudoknots at both positions. The pseudoknot-string region is also too short to contain the usual pseudoknot number in another new entry, the tmRNA sequence from a bacterial endosymbiont of insect cells, Tremblaya princeps. Pseudoknots may optimize tmRNA function in free-living bacteria, yet become dispensable when the endosymbiotic lifestyle relaxes selective pressure for fast growth.

Citation for the above abstract:
Gueneau de Novoa, Pulcherie, Williams, Kelly P.
The tmRNA website: reductive evolution of tmRNA in plastids and other endosymbionts
Nucl. Acids Res. 2004 32: D104-108
© 2004 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D104



181. tmRDB: tmRNA Database

URL: http://psyche.uthct.edu/dbs/tmRDB/tmRDB.html
Categories: RNA Sequence Databases

Maintained at the University of Texas Health Science Center at Tyler, Texas, the tmRNA database (tmRDB) is accessible at the URL http://psyche.uthct.edu/dbs/tmRDB/tmRDB.html with mirror sites located at Auburn University, Auburn, Alabama (http://www.ag.auburn.edu/mirror/tmRDB/) and the Royal Veterinary and Agricultural University, Denmark (http://tmrdb.kvl.dk/). The signal recognition particle database (SRPDB) at http://psyche.uthct.edu/dbs/SRPDB/SRPDB.html is mirrored at http://srpdb.kvl.dk/ and the University of Goteborg (http://bio.lundberg.gu.se/dbs/SRPDB/SRPDB.html). The databases assist in investigations of the tmRNP (a ribonucleoprotein complex which liberates stalled bacterial ribosomes) and the SRP (a particle which recognizes signal sequences and directs secretory proteins to cell membranes). The curated tmRNA and SRP RNA alignments consider base pairs supported by comparative sequence analysis. Also shown are alignments of the tmRNA-associated proteins SmpB, ribosomal protein S1, alanyl-tRNA synthetase and Elongation Factor Tu, as well as the SRP proteins SRP9, SRP14, SRP19, SRP21, SRP54 (Ffh), SRP68, SRP72, cpSRP43, Flhf, SRP receptor (alpha) and SRP receptor (beta). All alignments can be easily examined using a new exploratory browser. The databases provide links to high-resolution structures and serve as depositories for structures obtained by molecular modeling.

Citation for the above abstract:
Andersen, Ebbe Sloth, Rosenblad, Magnus Alm, Larsen, Niels, Westergaard, Jesper Cairo, Burks, Jody, Wower, Iwona K., Wower, Jacek, Gorodkin, Jan, Samuelsson, Tore, Zwieb, Christian
The tmRDB and SRPDB resources
Nucl. Acids Res. 2006 34: D163-168
© 2006 Oxford University Press.


The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D163



182. Compilation of tRNA Sequences and Sequences of tRNA Genes

URL: http://www.staff.uni-bayreuth.de/~btc914/search/index.html
Categories: RNA Sequence Databases

Maintained at the Universitat Bayreuth, Bayreuth, Germany, the Compilation of tRNA Sequences and Sequences of tRNA Genes is accessible at the URL http://www.tRNA.uni-bayreuth.de with a mirror site located at the Institute of Protein Research, Pushchino, Russia (http://alpha.protres.ru/trnadbase). The compilation is a searchable, periodically updated database of currently available tRNA sequences. The present version of the database contains a new Genomic tRNA Compilation including the sequences of tRNA genes from genomic sequences published up to July 2003. It consists of about 5800 tRNA gene sequences from 111 organisms covering archaea, bacteria, higher and lower eukarya. The former Compilation of tRNA Genes (up to the end of 1998) and the updated Compilation tRNA Sequences (561 entries) are also supported by the new software. The database can be explored by using multiple search criteria and sequence templates. The database provides a service that allows users to obtain statistical information on the occurrences of certain bases at given positions of the tRNA sequences. This allows phylogenetic studies and searches for identity elements with respect to interactions of tRNAs with various enzymes.

Citation for the above abstract:
Sprinzl, Mathias, Vassilenko, Konstantin S.
Compilation of tRNA sequences and sequences of tRNA genes
Nucl. Acids Res. 2005 33: D139-140
© 2005 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D139
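
The per-position base statistics mentioned in the abstract above can be illustrated with a minimal Python sketch. It is not the database's own software: the function name base_counts_by_position and the toy sequences are assumptions made only for illustration, and the input is assumed to be a set of already aligned sequences of equal length.

```python
from collections import Counter


def base_counts_by_position(aligned_seqs):
    """Count how often each base occurs at each column of an alignment.

    Hypothetical helper for illustration; assumes every sequence has
    already been aligned to the same length.
    """
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    # One Counter per alignment column.
    return [Counter(seq[i] for seq in aligned_seqs) for i in range(length)]


# Toy, made-up aligned fragments (not real tRNA data).
toy = ["GCGGA", "GCCGA", "GGGGA"]
counts = base_counts_by_position(toy)
```

Here counts[1] holds the tally for the second column of the toy alignment; dividing each tally by the number of sequences would give per-position frequencies.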



183. Yeast snoRNA Database

URL: http://www.bio.umass.edu/biochem/rna-sequence/Yeast_snoRNA_Database/snoRNA_DataBase.html
Categories: Fungal Genome Databases, RNA Sequence Databases

Small nucleolar RNAs (snoRNAs) are involved in cleavage of rRNA, modification of rRNA nucleotides and, perhaps, other aspects of ribosome biogenesis in eukaryotic cells. Scores of snoRNAs have been discovered in recent years from various eukaryotes, and the total number is predicted to be up to 200 different snoRNA species per individual organism. We have created a comprehensive database for snoRNAs from the yeast Saccharomyces cerevisiae which allows easy access to detailed information about each species known (almost 70 snoRNAs are featured). The database consists of three major parts: (i) a utilities section; (ii) a master table; and (iii) a collection of tables for the individual snoRNAs. The utilities section provides an introduction to the database. The master table lists all known S. cerevisiae snoRNAs and their major properties. Information in the individual tables includes: alternate names, size, family classification, genomic organization, sequences (with major features identified), GenBank accession numbers, occurrence of homologues, gene disruption phenotypes, functional properties and associated RNAs and proteins. All information is accompanied with appropriate literature references. The database is available on the World Wide Web (http://www.bio.umass.edu/biochem/rna-sequence/Yeast_snoRNA_Database/snoRNA_DataBase.html), and should be useful for a wide range of snoRNA studies.

Citation for the above abstract:
Samarsky, DA, Fournier, MJ
A comprehensive database for the small nucleolar RNAs from Saccharomyces cerevisiae
Nucl. Acids Res. 1999 27: 161-164
© 1999 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/27/1/161



184. EXProt: database for EXPerimentally verified Protein functions

URL: http://www.cmbi.kun.nl/EXProt/
Categories: General Protein Sequence Databases

EXProt is a non-redundant protein database containing a selection of entries from genome annotation projects and public databases, aimed at including only proteins with an experimentally verified function. In EXProt release 2.0 we have collected entries from the Pseudomonas aeruginosa community annotation project (PseudoCAP), the Escherichia coli genome and proteome database (GenProtEC) and the translated coding sequences from the Prokaryotes division of EMBL nucleotide sequence database, which are described as having an experimentally verified function. Each entry in EXProt has a unique ID number and contains information about the species, amino acid sequence, functional annotation and, in most cases, links to references in MEDLINE/PubMed and to the entry in the original database. EXProt is indexed in SRS at CMBI (http://www.cmbi.kun.nl/srs/) and can be searched with BLAST and FASTA through the EXProt web page (http://www.cmbi.kun.nl/EXProt/).

Citation for the above abstract:
Ursing, Bjorn M., van Enckevort, Frank H. J., Leunissen, Jack A. M., Siezen, Roland J.
EXProt: a database for proteins with an experimentally verified function
Nucl. Acids Res. 2002 30: 50-51
© 2002 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/30/1/50



185. NCBI Protein database

URL: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Protein
Categories: General Protein Sequence Databases

"The Protein database contains sequence data from the translated coding regions from DNA sequences in GenBank, EMBL, and DDBJ as well as protein sequences submitted to Protein Information Resource (PIR), SWISS-PROT, Protein Research Foundation (PRF), and Protein Data Bank (PDB) (sequences from solved structures)."



186. PA-GOSUB

URL: http://www.cs.ualberta.ca/~bioinfo/PA/GOSUB/
Categories: General Protein Sequence Databases

PA-GOSUB (Proteome Analyst: Gene Ontology Molecular Function and Subcellular Localization) is a publicly available, web-based, searchable and downloadable database that contains the sequences, predicted GO molecular functions and predicted subcellular localizations of more than 107,000 proteins from 10 model organisms (and growing), covering the major kingdoms and phyla for which annotated proteomes exist (http://www.cs.ualberta.ca/~bioinfo/PA/GOSUB). The PA-GOSUB database effectively expands the coverage of subcellular localization and GO function annotations by a significant factor (already over five for subcellular localization, compared with Swiss-Prot v42.7), and more model organisms are being added to PA-GOSUB as their sequenced proteomes become available. PA-GOSUB can be used in three main ways. First, a researcher can browse the pre-computed PA-GOSUB annotations on a per-organism and per-protein basis using annotation-based and text-based filters. Second, a user can perform BLAST searches against the PA-GOSUB database and use the annotations from the homologs as simple predictors for the new sequences. Third, the whole of PA-GOSUB can be downloaded in either FASTA or comma-separated values (CSV) formats.

Citation for the above abstract:
Lu, Paul, Szafron, Duane, Greiner, Russell, Wishart, David S., Fyshe, Alona, Pearcy, Brandon, Poulin, Brett, Eisner, Roman, Ngo, Danny, Lamb, Nicholas
PA-GOSUB: a searchable database of model organism protein sequences with their predicted Gene Ontology molecular function and subcellular localization
Nucl. Acids Res. 2005 33: D147-153
© 2005 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D147
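
Since the abstract above notes that the database can be downloaded in FASTA format, a minimal Python sketch of reading FASTA-formatted text may be useful. It assumes only the general FASTA convention (a header line starting with ">" followed by one or more sequence lines); the function name read_fasta and the example records are hypothetical and do not reflect the actual layout of PA-GOSUB headers.

```python
def read_fasta(lines):
    """Parse FASTA-formatted lines into a {header: sequence} dictionary.

    Hypothetical helper for illustration; headers are kept verbatim
    (without the leading ">") and wrapped sequence lines are joined.
    """
    records = {}
    header = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


# Made-up records, not taken from any real download.
example = """>protein_A hypothetical example
MKT
AYIA
>protein_B hypothetical example
MSDE
""".splitlines()
records = read_fasta(example)
```

For a real download, the same function could be given an open file object instead of the example list, since it only iterates over lines.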



187. Polygenic Signaling Pathways

URL: http://www.polygenicpathways.co.uk
Categories: Gene-, System-, or Disease- Specific Databases

"This site contains lists of genes positively associated with Alzheimer's disease, Bipolar disorder or Schizophrenia. The protein products of these genes form consecutive elements of a signaling cascade or metabolic pathway. They may bind to each other, control each others transcription or form functional microcomplexes. These pathways, etched out by multiple association studies, may underpin the pathology of each disease."



190. PIR-PSD: Protein Sequence Database

URL: http://pir.georgetown.edu/
Categories: General Protein Sequence Databases

The Protein Information Resource (PIR) is an integrated public resource of protein informatics that supports genomic and proteomic research and scientific discovery. PIR maintains the Protein Sequence Database (PSD), an annotated protein database containing over 283 000 sequences covering the entire taxonomic range. Family classification is used for sensitive identification, consistent annotation, and detection of annotation errors. The superfamily curation defines signature domain architecture and categorizes memberships to improve automated classification. To increase the amount of experimental annotation, the PIR has developed a bibliography system for literature searching, mapping, and user submission, and has conducted retrospective attribution of citations for experimental features. PIR also maintains NREF, a non-redundant reference database, and iProClass, an integrated database of protein family, function, and structure information. PIR-NREF provides a timely and comprehensive collection of protein sequences, currently consisting of more than 1 000 000 entries from PIR-PSD, SWISS-PROT, TrEMBL, RefSeq, GenPept, and PDB. The PIR web site (http://pir.georgetown.edu) connects data analysis tools to underlying databases for information retrieval and knowledge discovery, with functionalities for interactive queries, combinations of sequence and text searches, and sorting and visual exploration of search results. The FTP site provides free download for PSD and NREF biweekly releases and auxiliary databases and files.

Citation for the above abstract:
Wu, Cathy H., Yeh, Lai-Su L., Huang, Hongzhan, Arminski, Leslie, Castro-Alvear, Jorge, Chen, Yongxing, Hu, Zhangzhi, Kourtesis, Panagiotis, Ledley, Robert S., Suzek, Baris E., Vinayaka, C.R., Zhang, Jian, Barker, Winona C.
The Protein Information Resource
Nucl. Acids Res. 2003 31: 345-347
© 2003 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/345



191. PRF: Peptide Research Foundation Databases

URL: http://www4.prf.or.jp/en/
Categories: General Protein Sequence Databases

"You can search Literature Database (PRF/LITDB) and Protein/Peptide Sequence Database (PRF/SEQDB) of PRF."



192. Swiss-Prot

URL: http://www.expasy.org/sprot/
Categories: General Protein Sequence Databases

The SWISS-PROT protein knowledgebase (http://www.expasy.org/sprot/ and http://www.ebi.ac.uk/swissprot/) connects amino acid sequences with the current knowledge in the Life Sciences. Each protein entry provides an interdisciplinary overview of relevant information by bringing together experimental results, computed features and sometimes even contradictory conclusions. Detailed expertise that goes beyond the scope of SWISS-PROT is made available via direct links to specialised databases. SWISS-PROT provides annotated entries for all species, but concentrates on the annotation of entries from human (the HPI project) and other model organisms to ensure the presence of high quality annotation for representative members of all protein families. Part of the annotation can be transferred to other family members, as is already done for microbes by the High-quality Automated and Manual Annotation of microbial Proteomes (HAMAP) project. Protein families and groups of proteins are regularly reviewed to keep up with current scientific findings. Complementarily, TrEMBL strives to comprise all protein sequences that are not yet represented in SWISS-PROT, by incorporating a perpetually increasing level of mostly automated annotation. Researchers are welcome to contribute their knowledge to the scientific community by submitting relevant findings to SWISS-PROT at swiss-prot@expasy.org.

Citation for the above abstract:
Boeckmann, Brigitte, Bairoch, Amos, Apweiler, Rolf, Blatter, Marie-Claude, Estreicher, Anne, Gasteiger, Elisabeth, Martin, Maria J., Michoud, Karine, O'Donovan, Claire, Phan, Isabelle, Pilbout, Sandrine, Schneider, Michel
The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003
Nucl. Acids Res. 2003 31: 365-370
© 2003 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/365



193. TrEMBL

URL: http://www.expasy.org/sprot/
Categories: General Protein Sequence Databases

The SWISS-PROT protein knowledgebase (http://www.expasy.org/sprot/ and http://www.ebi.ac.uk/swissprot/) connects amino acid sequences with the current knowledge in the Life Sciences. Each protein entry provides an interdisciplinary overview of relevant information by bringing together experimental results, computed features and sometimes even contradictory conclusions. Detailed expertise that goes beyond the scope of SWISS-PROT is made available via direct links to specialised databases. SWISS-PROT provides annotated entries for all species, but concentrates on the annotation of entries from human (the HPI project) and other model organisms to ensure the presence of high quality annotation for representative members of all protein families. Part of the annotation can be transferred to other family members, as is already done for microbes by the High-quality Automated and Manual Annotation of microbial Proteomes (HAMAP) project. Protein families and groups of proteins are regularly reviewed to keep up with current scientific findings. Complementarily, TrEMBL strives to comprise all protein sequences that are not yet represented in SWISS-PROT, by incorporating a perpetually increasing level of mostly automated annotation. Researchers are welcome to contribute their knowledge to the scientific community by submitting relevant findings to SWISS-PROT at swiss-prot@expasy.org.

Citation for the above abstract:
Boeckmann, Brigitte, Bairoch, Amos, Apweiler, Rolf, Blatter, Marie-Claude, Estreicher, Anne, Gasteiger, Elisabeth, Martin, Maria J., Michoud, Karine, O'Donovan, Claire, Phan, Isabelle, Pilbout, Sandrine, Schneider, Michel
The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003
Nucl. Acids Res. 2003 31: 365-370
© 2003 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/365



194. Real Time PCR Primer Sets Database

URL: http://www.realtimeprimers.org/
Categories: Molecular Probe and Primer Databases





195. EPIMHC: A Curated Database of MHC Ligands

URL: http://immunax.dfci.harvard.edu/bioinformatics/epimhc/
Categories: Immunological Databases

SUMMARY: EPIMHC is a relational database of MHC-binding peptides and T cell epitopes that are observed in real proteins. Currently, the database contains 4867 distinct peptide sequences from various sources, including 84 tumor-associated antigens. The EPIMHC database is accessible through a web server that has been designed to facilitate research in computational vaccinology. Importantly, peptides resulting from a query can be selected to derive specific motif-matrices. Subsequently, these motif-matrices can be used in combination with a dynamic algorithm for predicting MHC-binding peptides from user-provided protein queries. AVAILABILITY: The EPIMHC database server is hosted by the Dana-Farber Cancer Institute at the site http://immunax.dfci.harvard.edu/bioinformatics/epimhc/

Citation for the above abstract:
Reche PA, Zhang H, Glutting JP, Reinherz EL. EPIMHC: a curated database of MHC-binding peptides for customized computational vaccinology.
Bioinformatics. 2005 May 1;21(9):2140-1. Epub 2005 Jan 18.
© 2005 Oxford University Press.


The full abstract can be found at: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=15657103
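
The motif-matrix idea mentioned in the abstract above can be illustrated with a minimal sketch. This is not the EPIMHC server's own algorithm: it assumes equal-length peptides, a uniform amino acid background, and a simple sliding-window scan, and the function names and the pseudocount are hypothetical choices made for illustration only.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_motif_matrix(peptides, pseudocount=1.0):
    """Build one dict of log-odds scores per peptide position.

    Assumption: a uniform background frequency (1/20) for every residue.
    """
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all peptides must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(peptides) + pseudocount * len(AMINO_ACIDS)
    matrix = []
    for pos in range(length):
        counts = Counter(p[pos] for p in peptides)
        matrix.append({
            aa: math.log((counts[aa] + pseudocount) / total / background)
            for aa in AMINO_ACIDS
        })
    return matrix


def scan_protein(protein, matrix):
    """Score every window of the protein; return (start, window, score), best first."""
    width = len(matrix)
    hits = []
    for start in range(len(protein) - width + 1):
        window = protein[start:start + width]
        # Residues outside the 20 standard amino acids contribute a neutral 0.0.
        score = sum(matrix[i].get(aa, 0.0) for i, aa in enumerate(window))
        hits.append((start, window, score))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)
```

Calling build_motif_matrix on a set of peptides selected from a query and then scan_protein on a user-provided sequence ranks each candidate window by how well it matches the motif.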



196. GDB: The GDB Human Genome Database

URL: http://www.gdb.org/
Categories: Human Genome Databases, Maps, and Viewers

The Genome Database (GDB, http://www.gdb.org) is a public repository of data on human genes, clones, STSs, polymorphisms and maps. GDB entries are highly cross-linked to each other, to literature citations and to entries in other databases, including the sequence databases, OMIM, and the Mouse Genome Database. Mapping data from large genome centers and smaller mapping efforts are added to GDB on an ongoing basis. The database can be searched by a variety of methods, ranging from keyword searches to complex queries. Major functionality extensions in the last year include the ongoing computation of integrated human genome maps, called Comprehensive Maps, and the use of those maps to support positional queries and graphic displays. The capabilities of the GDB map viewer (Mapview) have been extended to include map printing and the graphical display of ad hoc query results. The HUGO Nomenclature Committee continues to curate the proposed and official gene symbols and related data in collaboration with GDB. As genome research shifts its emphasis from mapping to sequencing and functional analysis, the scope of the GDB schema is being extended. We are in the process of adding representations of gene function and expression, and improving our representation of human polymorphism and mutation.

Citation for the above abstract:
Letovsky, SI, Cottingham, RW, Porter, CJ, Li, PWD
GDB: the Human Genome Database
Nucl. Acids Res. 1998 26: 94-99
© 1998 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/26/1/94



197. GRL: Gene Resource Locator

URL: http://grl.gi.k.u-tokyo.ac.jp/
Categories: Human Genome Databases, Maps, and Viewers

Since the advent of the draft human genome sequence there has been growing interest in transcriptome analysis based on genomic data. The Gene Resource Locator (GRL) assembles gene maps that include information on gene-expression patterns, cis-elements in regulatory regions and alternatively spliced transcripts. The database was constructed using customized software, and currently contains 2.2 million alignments (exon-intron structures). The alignments have been annotated and integrated into a system that encompasses approximately 90 000 EST loci sharing common exons, 8091 alternatively spliced transcript groups, 10 801 expression-profile groups, 8066 candidate regulatory regions in full-length cDNAs, and 1 million SNP loci. We have used Flash technology to build a dynamic web viewer that facilitates browsing through the millions of alignments. All of the information is available through the World Wide Web at the Gene Resource Locator web site (http://grl.gi.k.u-tokyo.ac.jp).

Citation for the above abstract:
Honkura, Toshihiko, Ogasawara, Jun, Yamada, Tomoyuki, Morishita, Shinichi
The Gene Resource Locator: gene locus maps for transcriptome analysis
Nucl. Acids Res. 2002 30: 221-225
© 2002 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/30/1/221



198. UniProt: Universal Protein Resource

URL: http://www.pir.uniprot.org/
Categories: General Protein Sequence Databases

The Universal Protein Resource (UniProt) provides a central resource on protein sequences and functional annotation with three database components, each addressing a key need in protein bioinformatics. The UniProt Knowledgebase (UniProtKB), comprising the manually annotated UniProtKB/Swiss-Prot section and the automatically annotated UniProtKB/TrEMBL section, is the preeminent storehouse of protein annotation. The extensive cross-references, functional and feature annotations and literature-based evidence attribution enable scientists to analyse proteins and query across databases. The UniProt Reference Clusters (UniRef) speed similarity searches via sequence space compression by merging sequences that are 100% (UniRef100), 90% (UniRef90) or 50% (UniRef50) identical. Finally, the UniProt Archive (UniParc) stores all publicly available protein sequences, containing the history of sequence data with links to the source databases. UniProt databases continue to grow in size and in availability of information. Recent and upcoming changes to database contents, formats, controlled vocabularies and services are described. New download availability includes all major releases of UniProtKB, sequence collections by taxonomic division and complete proteomes. A bibliography mapping service has been added, and an ID mapping service will be available soon. UniProt databases can be accessed online at http://www.uniprot.org or downloaded at ftp://ftp.uniprot.org/pub/databases/.

Citation for the above abstract:
Wu, Cathy H., Apweiler, Rolf, Bairoch, Amos, Natale, Darren A., Barker, Winona C., Boeckmann, Brigitte, Ferro, Serenella, Gasteiger, Elisabeth, Huang, Hongzhan, Lopez, Rodrigo, Magrane, Michele, Martin, Maria J., Mazumder, Raja, O'Donovan, Claire, Redaschi, Nicole, Suzek, Baris
The Universal Protein Resource (UniProt): an expanding universe of protein information
Nucl. Acids Res. 2006 34: D187-191
© 2005 Oxford University Press.


The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D187
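
The sequence space compression described for UniRef in the abstract above can be sketched as greedy clustering at an identity threshold. This is only an illustration under loud assumptions: naive_identity is a hypothetical stand-in that compares equal-length sequences position by position, whereas a real pipeline would rely on alignment-based identity, and nothing here reflects how UniRef is actually built.

```python
def naive_identity(a, b):
    """Fraction of matching positions; only meaningful for equal-length sequences.

    Assumption: sequences of different length are treated as 0.0 identical.
    """
    if len(a) != len(b) or not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(sequences, threshold, identity=naive_identity):
    """Assign each sequence to the first representative it matches at >= threshold."""
    clusters = []  # list of (representative, members) pairs
    # Longest sequences are considered first so they become representatives.
    for seq in sorted(sequences, key=len, reverse=True):
        for representative, members in clusters:
            if identity(representative, seq) >= threshold:
                members.append(seq)
                break
        else:
            # No existing representative was close enough: start a new cluster.
            clusters.append((seq, [seq]))
    return clusters
```

Running greedy_cluster with thresholds of 1.0, 0.9 and 0.5 on the same input gives progressively fewer representatives, which is the sense in which a similarity search against representatives only can be faster.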



199. CyBase: A Database of Cyclic Proteins

URL: http://research.imb.uq.edu.au/cybase
Categories: Protein Property Databases

CyBase is a curated database and information source for backbone-cyclized proteins. The database incorporates naturally occurring cyclic proteins as well as synthetic derivatives, grafted analogues and acyclic permutants. The database provides a centralized repository of information on all aspects of cyclic protein biology and addresses issues pertaining to the management and searching of topologically circular sequences. The database is freely available at http://research.imb.uq.edu.au/cybase.

Citation for the above abstract:
Mulvenna, Jason P., Wang, Conan, Craik, David J.
CyBase: a database of cyclic protein sequence and structure
Nucl. Acids Res. 2006 34: D192-194
© 2006 Oxford University Press.


The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D192
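
Searching topologically circular sequences, as mentioned in the abstract above, differs from ordinary substring search because a motif may span the arbitrary point at which the cycle was written down. A minimal sketch of one common workaround follows; the function names are hypothetical and this is not a description of how CyBase itself is implemented.

```python
def find_in_circular(circular_seq, motif):
    """Return 0-based start positions where motif occurs in a circular sequence."""
    n = len(circular_seq)
    if not motif or len(motif) > n:
        return []
    # Append just enough of the start so that matches may wrap around the end.
    extended = circular_seq + circular_seq[:len(motif) - 1]
    positions = []
    start = extended.find(motif)
    while start != -1:
        positions.append(start)
        start = extended.find(motif, start + 1)
    return positions


def canonical_rotation(circular_seq):
    """Return the lexicographically smallest rotation, so equal cycles compare equal."""
    if not circular_seq:
        return ""
    return min(
        circular_seq[i:] + circular_seq[:i] for i in range(len(circular_seq))
    )
```

Storing the canonical rotation alongside each entry would let two records of the same cyclic backbone, opened at different residues, be recognized as identical.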



200. UniParc

URL: http://www.uniprot.org/database/archive.shtml/
Categories: General Protein Sequence Databases

The Universal Protein Resource (UniProt) provides the scientific community with a single, centralized, authoritative resource for protein sequences and functional information. Formed by uniting the Swiss-Prot, TrEMBL and PIR protein database activities, the UniProt consortium produces three layers of protein sequence databases: the UniProt Archive (UniParc), the UniProt Knowledgebase (UniProt) and the UniProt Reference (UniRef) databases. The UniProt Knowledgebase is a comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase with extensive cross-references. This centrepiece consists of two sections: UniProt/Swiss-Prot, with fully, manually curated entries; and UniProt/TrEMBL, enriched with automated classification and annotation. During 2004, tens of thousands of Knowledgebase records got manually annotated or updated; we introduced a new comment line topic: TOXIC DOSE to store information on the acute toxicity of a toxin; the UniProt keyword list got augmented by additional keywords; we improved the documentation of the keywords and are continuously overhauling and standardizing the annotation of post-translational modifications. Furthermore, we introduced a new documentation file of the strains and their synonyms. Many new database cross-references were introduced and we started to make use of Digital Object Identifiers. We also achieved in collaboration with the Macromolecular Structure Database group at EBI an improved integration with structural databases by residue level mapping of sequences from the Protein Data Bank entries onto corresponding UniProt entries. For convenient sequence searches we provide the UniRef non-redundant sequence databases. The comprehensive UniParc database stores the complete body of publicly available protein sequence data. The UniProt databases can be accessed online (http://www.uniprot.org) or downloaded in several formats (ftp://ftp.uniprot.org/pub). New releases are published every two weeks.

Citation for the above abstract:
Bairoch, Amos, Apweiler, Rolf, Wu, Cathy H., Barker, Winona C., Boeckmann, Brigitte, Ferro, Serenella, Gasteiger, Elisabeth, Huang, Hongzhan, Lopez, Rodrigo, Magrane, Michele, Martin, Maria J., Natale, Darren A., O'Donovan, Claire, Redaschi, Nicole, Yeh, Lai-Su L.
The Universal Protein Resource (UniProt)
Nucl. Acids Res. 2005 33: D154-159
© 2005 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D154



201. UniRef

URL: http://www.pir.uniprot.org/database/nref.shtml
Categories: General Protein Sequence Databases

The Universal Protein Resource (UniProt) provides the scientific community with a single, centralized, authoritative resource for protein sequences and functional information. Formed by uniting the Swiss-Prot, TrEMBL and PIR protein database activities, the UniProt consortium produces three layers of protein sequence databases: the UniProt Archive (UniParc), the UniProt Knowledgebase (UniProt) and the UniProt Reference (UniRef) databases. The UniProt Knowledgebase is a comprehensive, fully classified, richly and accurately annotated protein sequence knowledgebase with extensive cross-references. This centrepiece consists of two sections: UniProt/Swiss-Prot, with fully, manually curated entries; and UniProt/TrEMBL, enriched with automated classification and annotation. During 2004, tens of thousands of Knowledgebase records got manually annotated or updated; we introduced a new comment line topic: TOXIC DOSE to store information on the acute toxicity of a toxin; the UniProt keyword list got augmented by additional keywords; we improved the documentation of the keywords and are continuously overhauling and standardizing the annotation of post-translational modifications. Furthermore, we introduced a new documentation file of the strains and their synonyms. Many new database cross-references were introduced and we started to make use of Digital Object Identifiers. We also achieved in collaboration with the Macromolecular Structure Database group at EBI an improved integration with structural databases by residue level mapping of sequences from the Protein Data Bank entries onto corresponding UniProt entries. For convenient sequence searches we provide the UniRef non-redundant sequence databases. The comprehensive UniParc database stores the complete body of publicly available protein sequence data. The UniProt databases can be accessed online (http://www.uniprot.org) or downloaded in several formats (ftp://ftp.uniprot.org/pub). New releases are published every two weeks.

Citation for the above abstract:
Bairoch, Amos, Apweiler, Rolf, Wu, Cathy H., Barker, Winona C., Boeckmann, Brigitte, Ferro, Serenella, Gasteiger, Elisabeth, Huang, Hongzhan, Lopez, Rodrigo, Magrane, Michele, Martin, Maria J., Natale, Darren A., O'Donovan, Claire, Redaschi, Nicole, Yeh, Lai-Su L.
The Universal Protein Resource (UniProt)
Nucl. Acids Res. 2005 33: D154-159
© 2005 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D154



202. DBcat

URL: http://www.infobiogen.fr/services/dbcat/
Categories: Genome Annotation Terms, Ontology, and Nomenclature Databases

The DBcat (http://www.infobiogen.fr/services/dbcat) is a comprehensive catalog of biological databases, maintained and curated at Infobiogen. It contains 500 databases classified by application domains. The DBcat is a structured flat-file library that can be searched by means of an SRS server or a dedicated Web interface. The files are available for download from the Infobiogen anonymous ftp server.

Citation for the above abstract:
Discala, Claude, Benigni, Xavier, Barillot, Emmanuel, Vaysseix, Guy
DBcat: a catalog of 500 biological databases
Nucl. Acids Res. 2000 28: 8-9
© 2000 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/28/1/8



203. HGNC: Human Gene Nomenclature Database

URL: http://www.gene.ucl.ac.uk/cgi-bin/nomenclature/searchgenes.pl
Categories: General Human Genetics Databases, Genome Annotation Terms, Ontology, and Nomenclature Databases

The HUGO Gene Nomenclature Committee (HGNC) aims to give every human gene a unique and ideally meaningful name and symbol. The HGNC database, previously known as Genew, contains over 22 000 public records with approved human gene nomenclature and associated information. The database has undergone major improvements throughout the last year, is publicly available for online searching at http://www.gene.ucl.ac.uk/cgi-bin/nomenclature/searchgenes.pl and has a new custom downloads interface at http://www.gene.ucl.ac.uk/cgi-bin/nomenclature/gdlw.pl.

Citation for the above abstract:
Eyre, Tina A., Ducluzeau, Fabrice, Sneddon, Tam P., Povey, Sue, Bruford, Elspeth A., Lush, Michael J.
The HUGO Gene Nomenclature Database, 2006 updates
Nucl. Acids Res. 2006 34: D319-321
© 2006 Oxford University Press.


The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D319



204. GO: Gene Ontology

URL: http://www.geneontology.org/
Categories: Genome Annotation Terms, Ontology, and Nomenclature Databases

The Gene Ontology (GO) project (http://www.geneontology.org) develops and uses a set of structured, controlled vocabularies for community use in annotating genes, gene products and sequences (also see http://song.sourceforge.net/). The GO Consortium continues to improve the vocabulary content, reflecting the impact of several novel mechanisms of incorporating community input. A growing number of model organism databases and genome annotation groups contribute annotation sets using GO terms to GO's public repository. Updates to the AmiGO browser have improved access to contributed genome annotations. As the GO project continues to grow, the use of the GO vocabularies is becoming more varied as well as more widespread. The GO project provides an ontological annotation system that enables biologists to infer knowledge from large amounts of data.

Citation for the above abstract:
Gene Ontology Consortium
The Gene Ontology (GO) project in 2006
Nucl. Acids Res. 2006 34: D322-326
© 2006 Oxford University Press.


The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D322



205. GOA: Gene Ontology Annotation

URL: http://www.ebi.ac.uk/GOA/
Categories: Genome Annotation Terms, Ontology, and Nomenclature Databases

The Gene Ontology Annotation (GOA) database (http://www.ebi.ac.uk/GOA) aims to provide high-quality electronic and manual annotations to the UniProt Knowledgebase (Swiss-Prot, TrEMBL and PIR-PSD) using the standardized vocabulary of the Gene Ontology (GO). As a supplementary archive of GO annotation, GOA promotes a high level of integration of the knowledge represented in UniProt with other databases. This is achieved by converting UniProt annotation into a recognized computational format. GOA provides annotated entries for nearly 60,000 species (GOA-SPTr) and is the largest and most comprehensive open-source contributor of annotations to the GO Consortium annotation effort. By integrating GO annotations from other model organism groups, GOA consolidates specialized knowledge and expertise to ensure the data remain a key reference for up-to-date biological information. Furthermore, the GOA database fully endorses the Human Proteomics Initiative by prioritizing the annotation of proteins likely to benefit human health and disease. In addition to a non-redundant set of annotations to the human proteome (GOA-Human) and monthly releases of its GO annotation for all species (GOA-SPTr), a series of GO mapping files and specific cross-references in other databases are also regularly distributed. GOA can be queried through a simple user-friendly web interface or downloaded in a parsable format via the EBI and GO FTP websites. The GOA data set can be used to enhance the annotation of particular model organism or gene expression data sets, although increasingly it has been used to evaluate GO predictions generated from text mining or protein interaction experiments. In 2004, the GOA team will build on its success and will continue to supplement the functional annotation of UniProt and work towards enhancing the ability of scientists to access all available biological information. Researchers wishing to query or contribute to the GOA project are encouraged to email: goa@ebi.ac.uk.

Citation for the above abstract:
Camon, Evelyn, Magrane, Michele, Barrell, Daniel, Lee, Vivian, Dimmer, Emily, Maslen, John, Binns, David, Harte, Nicola, Lopez, Rodrigo, Apweiler, Rolf
The Gene Ontology Annotation (GOA) Database: sharing knowledge in Uniprot with Gene Ontology
Nucl. Acids Res. 2004 32: D262-266
© 2004 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D262



206. IUPAC Nomenclature database

URL: http://www.chem.qmul.ac.uk/iupac/
Categories: Genome Annotation Terms, Ontology, and Nomenclature Databases

"Recommendations on Organic & Biochemical Nomenclature, Symbols & Terminology etc."



207. IUBMB Nomenclature database

URL: http://www.chem.qmul.ac.uk/iubmb/
Categories: Genome Annotation Terms, Ontology, and Nomenclature Databases

"Recommendations on Biochemical & Organic Nomenclature, Symbols & Terminology etc."



208. IUPHAR-RD

URL: http://www.iuphar-db.org/iuphar-rd/
Categories: Drug and Drug Design Databases, Genome Annotation Terms, Ontology, and Nomenclature Databases

"... the official database of the IUPHAR [The International Union of Pharmacology] Committee on Receptor Nomenclature and Drug Classification."



209. PANTHER: Protein ANalysis THrough Evolutionary Relationships

URL: https://panther.appliedbiosystems.com/
Categories: Genome Annotation Terms, Ontology, and Nomenclature Databases

PANTHER is a large collection of protein families that have been subdivided into functionally related subfamilies, using human expertise. These subfamilies model the divergence of specific functions within protein families, allowing more accurate association with function (ontology terms and pathways), as well as inference of amino acids important for functional specificity. Hidden Markov models (HMMs) are built for each family and subfamily for classifying additional protein sequences. The latest version, 5.0, contains 6683 protein families, divided into 31,705 subfamilies, covering approximately 90% of mammalian protein-coding genes. PANTHER 5.0 includes a number of significant improvements over previous versions, most notably (i) representation of pathways (primarily signaling pathways) and association with subfamilies and individual protein sequences; (ii) an improved methodology for defining the PANTHER families and subfamilies, and for building the HMMs; (iii) resources for scoring sequences against PANTHER HMMs both over the web and locally; and (iv) a number of new web resources to facilitate analysis of large gene lists, including data generated from high-throughput expression experiments. Efforts are underway to add PANTHER to the InterPro suite of databases, and to make PANTHER consistent with the PIRSF database. PANTHER is now publicly available without restriction at http://panther.appliedbiosystems.com.

Citation for the above abstract:
Mi, Huaiyu, Lazareva-Ulitsky, Betty, Loo, Rozina, Kejariwal, Anish, Vandergriff, Jody, Rabkin, Steven, Guo, Nan, Muruganujan, Anushya, Doremieux, Olivier, Campbell, Michael J., Kitano, Hiroaki, Thomas, Paul D.
The PANTHER database of protein families, subfamilies, functions and pathways
Nucl. Acids Res. 2005 33: D284-288
© 2005 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D284



210. SOURCE

URL: http://source.stanford.edu/
Categories: Genome Annotation Terms, Ontology, and Nomenclature Databases, Microarray Data and other Gene Expression Databases

The explosion in the number of functional genomic datasets generated with tools such as DNA microarrays has created a critical need for resources that facilitate the interpretation of large-scale biological data. SOURCE is a web-based database that brings together information from a broad range of resources, and provides it in a manner particularly useful for genome-scale analyses. SOURCE's GeneReports include aliases, chromosomal location, functional descriptions, GeneOntology annotations, gene expression data, and links to external databases. We curate published microarray gene expression datasets and allow users to rapidly identify sets of co-regulated genes across a variety of tissues and a large number of conditions using a simple and intuitive interface. SOURCE provides content both in gene and cDNA clone-centric pages, and thus simplifies analysis of datasets generated using cDNA microarrays. SOURCE is continuously updated and contains the most recent and accurate information available for human, mouse, and rat genes. By allowing dynamic linking to individual gene or clone reports, SOURCE facilitates browsing of large genomic datasets. Finally, SOURCE's batch interface allows rapid extraction of data for thousands of genes or clones at once and thus facilitates statistical analyses such as assessing the enrichment of functional attributes within clusters of genes. SOURCE is available at http://source.stanford.edu.

Citation for the above abstract:
Diehn, Maximilian, Sherlock, Gavin, Binkley, Gail, Jin, Heng, Matese, John C., Hernandez-Boussard, Tina, Rees, Christian A., Cherry, J. Michael, Botstein, David, Brown, Patrick O., Alizadeh, Ash A.
SOURCE: a unified genomic resource of functional annotations, ontologies, and gene expression data
Nucl. Acids Res. 2003 31: 219-223
© 2003 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/219



211. UMLS: Unified Medical Language System

URL: http://umlsks.nlm.nih.gov/
Categories: Genome Annotation Terms, Ontology, and Nomenclature Databases

The Unified Medical Language System (http://umlsks.nlm.nih.gov) is a repository of biomedical vocabularies developed by the US National Library of Medicine. The UMLS integrates over 2 million names for some 900,000 concepts from more than 60 families of biomedical vocabularies, as well as 12 million relations among these concepts. Vocabularies integrated in the UMLS Metathesaurus include the NCBI taxonomy, Gene Ontology, the Medical Subject Headings (MeSH), OMIM and the Digital Anatomist Symbolic Knowledge Base. UMLS concepts are not only inter-related, but may also be linked to external resources such as GenBank. In addition to data, the UMLS includes tools for customizing the Metathesaurus (MetamorphoSys), for generating lexical variants of concept names (lvg) and for extracting UMLS concepts from text (MetaMap). The UMLS knowledge sources are updated quarterly. All vocabularies are available at no fee for research purposes within an institution, but UMLS users are required to sign a license agreement. The UMLS knowledge sources are distributed on CD-ROM and by FTP.

Citation for the above abstract:
Bodenreider, Olivier
The Unified Medical Language System (UMLS): integrating biomedical terminology
Nucl. Acids Res. 2004 32: D267-270
© 2004 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D267



212. ICB: Identification and Classification of Bacteria Database

URL: http://www.mbio.co.jp/icb
Categories: Taxonomy and Identification Databases

The Identification and Classification of Bacteria (ICB) database (http://www.mbio.co.jp/icb) contains currently available information about the DNA gyrase subunit B (gyrB) gene in bacteria. The database is designed to provide the scientific community with a reference point for using gyrB as an evolutionary and taxonomic marker. Nucleic and amino acid sequence data are currently available for over 850 strains, along with alignments at several different taxonomic levels and an exhaustive review of primer selection and background information.

Citation for the above abstract:
Watanabe, Kanako, Nelson, James, Harayama, Shigeaki, Kasai, Hiroaki
ICB database: the gyrB database for identification and classification of bacteria
Nucl. Acids Res. 2001 29: 344-345
© 2003 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/29/1/344



213. NCBI Taxonomy Browser

URL: http://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html
Categories: Taxonomy and Identification Databases

The NCBI taxonomy database indexes over 165 000 named organisms that are represented in the databases with at least one nucleotide or protein sequence. The Taxonomy Browser can be used to view the taxonomic position or retrieve data from any of the principal Entrez databases for a particular organism or group. The Taxonomy Browser also displays links to the Map Viewer, Genomic BLAST services, the Trace Archive, and to model organism and taxonomic databases via LinkOut.

Searches of the NCBI taxonomy may be made on the basis of whole, partial or phonetically spelled organism names, and links to organisms commonly used in biological research are also provided. The Entrez Taxonomy system adds the ability to display custom taxonomic trees representing user-defined subsets of the full NCBI taxonomy.

Citation for the above excerpt:
Wheeler, David L., Barrett, Tanya, Benson, Dennis A., Bryant, Stephen H., Canese, Kathi, Church, Deanna M., DiCuccio, Michael, Edgar, Ron, Federhen, Scott, Helmberg, Wolfgang, Kenton, David L., Khovayko, Oleg, Lipman, David J., Madden, Thomas L., Maglott, Donna R., Ostell, James, Pontius, Joan U., Pruitt, Kim D., Schuler, Gregory D., Schriml, Lynn M., Sequeira, Edwin, Sherry, Steven T., Sirotkin, Karl, Starchenko, Grigory, Suzek, Tugba O., Tatusov, Roman, Tatusova, Tatiana A., Wagner, Lukas, Yaschenko, Eugene
Database resources of the National Center for Biotechnology Information
Nucl. Acids Res. 2005 33: D39-45
© 2005 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D39



214. RIDOM: Ribosomal Differentiation of Medical Microorganisms

URL: http://www.ridom.de/
Categories: Taxonomy and Identification Databases

The ribosomal differentiation of medical micro-organisms (RIDOM) web server, first described by Harmsen et al. [Harmsen,D., Rothganger,J., Singer,C., Albert,J. and Frosch,M. (1999) Lancet, 353, 291], is an evolving electronic resource designed to provide micro-organism differentiation services for medical identification needs. The diagnostic procedure begins with a specimen partial small subunit ribosomal DNA (16S rDNA) sequence. Resulting from a similarity search, a species or genus name for the specimen in question will be returned. Where the first results are ambiguous or do not define to species level, hints for further molecular, i.e. internal transcribed spacer, and conventional phenotypic differentiation will be offered ('sequential and polyphasic approach'). Additionally, each entry in RIDOM contains detailed medical and taxonomic information linked, context-sensitive, to external World Wide Web services. Nearly all sequences are newly determined and the sequence chromatograms are available for intersubjective quality control. Similarity searches are now also possible by direct submission of trace files (ABI or SCF format). Based on the PHRED/PHRAP software, error probability measures are attached to each predicted nucleotide base and visualised with a new 'Trace Editor'. The RIDOM web site is directly accessible on the World Wide Web at http://www.ridom.de/. The email address for questions and comments is webmaster@ridom.de.

Citation for the above abstract:
Harmsen, Dag, Rothganger, Jorg, Frosch, Matthias, Albert, Jurgen
RIDOM: Ribosomal Differentiation of Medical Micro-organisms Database
Nucl. Acids Res. 2002 30: 416-417
© 2002 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/30/1/416
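
The RIDOM abstract above mentions PHRED-based error probabilities attached to each base call. As a minimal sketch of what such a measure means, the snippet below converts Phred quality scores to error probabilities using the standard Phred definition Q = -10 * log10(P). The function names and the sample quality values are hypothetical and chosen for illustration; this is not RIDOM's own code.

```python
from typing import Iterable


def phred_to_error_probability(quality: float) -> float:
    """Convert a Phred quality score Q to an error probability P = 10^(-Q/10)."""
    return 10 ** (-quality / 10)


def expected_errors(qualities: Iterable[float]) -> float:
    """Sum the per-base error probabilities to estimate the number of wrong base calls."""
    return sum(phred_to_error_probability(q) for q in qualities)


if __name__ == "__main__":
    # Hypothetical per-base quality scores for a short read (illustration only).
    sample_qualities = [40, 40, 30, 20, 10]
    for q in sample_qualities:
        print(f"Q{q}: error probability {phred_to_error_probability(q):.4g}")
    print(f"Expected miscalled bases: {expected_errors(sample_qualities):.4g}")
```

A score of Q20 corresponds to one expected error in 100 base calls, and every additional 10 points lowers the error probability tenfold.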



215. CroW 21: The Human Chromosome 21 Database at the Weizmann Institute

URL: http://genecards.weizmann.ac.il/crow21/
Categories: Human Genome Databases, Maps, and Viewers

Recent enhancements and current research in the GeneCards (GC) (http://bioinfo.weizmann.ac.il/cards/) project are described, including the addition of gene expression profiles and integrated gene locations. Also highlighted are the contributions of specialized associated human gene-centric databases developed at the Weizmann Institute. These include the Unified Database (UDB) (http://bioinfo.weizmann.ac.il/udb) for human genome mapping, the human Chromosome 21 database at the Weizmann Institute (CroW 21) (http://bioinfo.weizmann.ac.il/crow21), and the Human Olfactory Receptor Data Exploratorium (HORDE) (http://bioinfo.weizmann.ac.il/HORDE). The synergistic relationships amongst these efforts have positively impacted the quality, quantity and usefulness of the GeneCards gene compendium.

Citation for the above abstract:
Safran, Marilyn, Chalifa-Caspi, Vered, Shmueli, Orit, Olender, Tsviya, Lapidot, Michal, Rosen, Naomi, Shmoish, Michael, Peter, Yakov, Glusman, Gustavo, Feldmesser, Ester, Adato, Avital, Peter, Inga, Khen, Miriam, Atarot, Tal, Groner, Yoram, Lancet, Doron
Human Gene-Centric Databases at the Weizmann Institute of Science: GeneCards, UDB, CroW 21 and HORDE
Nucl. Acids Res. 2003 31: 142-146
© 2003 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/142



216. Tree of Life Web Project

URL: http://tolweb.org/tree/phylogeny.html
Categories: Taxonomy and Identification Databases

"The Tree of Life Web Project (ToL) is a collaborative effort of biologists from around the world. On more than 3000 World Wide Web pages, the project provides information about the diversity of organisms on Earth, their evolutionary history (phylogeny), and characteristics.

Each page contains information about a particular group of organisms (e.g., echinoderms, tyrannosaurs, phlox flowers, cephalopods, club fungi, or the salamanderfish of Western Australia). ToL pages are linked one to another hierarchically, in the form of the evolutionary tree of life. Starting with the root of all Life on Earth and moving out along diverging branches to individual species, the structure of the ToL project thus illustrates the genetic connections between all living things."



217. BAMS: The Brain Architecture Management System

URL: http://brancusi.usc.edu/bkms/
Categories: Neuroscience Databases

The brain's structural organization is so complex that 2,500 years of analysis leaves pervasive uncertainty about (i) the identity of its basic parts (regions with their neuronal cell types and pathways interconnecting them), (ii) nomenclature, (iii) systematic classification of the parts with respect to topographic relationships and functional systems and (iv) the reliability of the connectional data itself. Here we present a prototype knowledge management system (http://brancusi.usc.edu/bkms/) for analyzing the architecture of brain networks in a systematic, interactive and extendable way. It supports alternative interpretations and models, is based on fully referenced and annotated data and can interact with genomic and functional knowledge management systems through web services protocols.

Citation for the above abstract:
Mihail Bota, Hong-Wei Dong & Larry W Swanson
From gene networks to brain networks
Nature Neuroscience 6, 795 - 799 (2003)
© 2003 Nature Publishing Group.


The full abstract can be found at: http://www.nature.com/cgi-taf/DynaPage.taf?file=/neuro/journal/v6/n8/abs/nn1096.html&dynoptions=doi1105518408



218. Atlas of Genetics and Cytogenetics in Oncology and Haematology

URL: http://www.infobiogen.fr/services/chromcancer/
Categories: Cancer Databases, Gene-, System-, or Disease- Specific Databases, Metadatabases and Directories

The 'Atlas of Genetics and Cytogenetics in Oncology and Haematology' (http://www.infobiogen.fr/services/chromcancer) contains concise and updated cards on genes involved in cancer, cytogenetics and clinical entities in oncology, and cancer-prone diseases, a portal towards genetics/cancer, and teaching materials in genetics. This database is made for and by researchers and clinicians, who are encouraged to contribute. The Atlas is part of the genome project and it participates in research on cancer epidemiology.

Citation for the above abstract:
Huret, Jean-Loup, Dessen, Philippe, Bernheim, Alain
Atlas of Genetics and Cytogenetics in Oncology and Haematology, year 2003
Nucl. Acids Res. 2003 31: 272-274
© 2003 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/272



219. CGED: Cancer Gene Expression Database

URL: http://cged.hgc.jp
Categories: Cancer Databases, Microarray Data and other Gene Expression Databases

Gene expression profiling of cancer tissues is expected to contribute to our understanding of cancer biology as well as developments of new methods of diagnosis and therapy. Our collaborative efforts in Japan have been mainly focused on solid tumors such as breast, colorectal and hepatocellular cancers. The expression data are obtained by a high-throughput RT-PCR technique, and patients are recruited mainly from a single hospital. In the cancer gene expression database (CGED), the expression and clinical data are presented in a way useful for scientists interested in specific genes or biological functions. The data can be retrieved either by gene identifiers or by functional categories defined by Gene Ontology terms or the Swiss-Prot annotation. Expression patterns of multiple genes, selected by names or similarity search of the patterns, can be compared. Visual presentation of the data with sorting function enables users to easily recognize relationships between gene expression and clinical parameters. Data for other cancers such as lung and thyroid cancers will be added in the near future. The URL of CGED is http://cged.hgc.jp.

Citation for the above abstract:
Kato, Kikuya, Yamashita, Riu, Matoba, Ryo, Monden, Morito, Noguchi, Shinzaburo, Takagi, Toshihisa, Nakai, Kenta
Cancer gene expression database (CGED): a database for gene expression profiling with accompanying clinical information of human cancer tissues
Nucl. Acids Res. 2005 33: D533-536
© 2005 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/33/suppl_1/D533



220. COSMIC: Catalogue Of Somatic Mutations In Cancer

URL: http://www.sanger.ac.uk/genetics/CGP/cosmic/
Categories: Cancer Databases

The discovery of mutations in cancer genes has advanced our understanding of cancer. These results are dispersed across the scientific literature and with the availability of the human genome sequence will continue to accrue. The COSMIC (Catalogue of Somatic Mutations in Cancer) database and website have been developed to store somatic mutation data in a single location and display the data and other information related to human cancer. To populate this resource, data has currently been extracted from reports in the scientific literature for somatic mutations in four genes, BRAF, HRAS, KRAS2 and NRAS. At present, the database holds information on 66 634 samples and reports a total of 10 647 mutations. Through the web pages, these data can be queried, displayed as figures or tables and exported in a number of formats. COSMIC is an ongoing project that will continue to curate somatic mutation data and release it through the website.

Citation for the above abstract:
Bamford S, Dawson E, Forbes S, Clements J, Pettett R, Dogan A, Flanagan A, Teague J, Futreal PA, Stratton MR, Wooster R.
The COSMIC (Catalogue of Somatic Mutations in Cancer) database and website.
Br J Cancer. 2004 Jul 19;91(2):355-8.
© 2004 Nature Publishing Group.


The full abstract can be found at: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=15188009



221. Database of Germline p53 Mutations

URL: http://www.lf2.cuni.cz/win/projects/germline_mut_p53.htm
Categories: Cancer Databases

We created a comprehensive database covering all published cases of germline p53 mutations. The current version lists 580 tumours in 448 individuals belonging to 122 independent pedigrees. The database describes each p53 mutation (type of the mutation, exon and codon affected by the mutation, nucleotide and amino acid change), each family (family history of cancer, diagnosis of Li-Fraumeni syndrome), each affected individual (sex, generation, p53 status, from which parent the mutation was inherited) and each tumour (type, age of onset, p53 status-loss of heterozygosity, immunostaining). Each entry contains the original reference(s). The database is freely available and can be obtained from http://www.lf2.cuni.cz

Citation for the above abstract:
Sedlacek, Z, Kodet, R, Poustka, A, Goetz, P
A database of germline p53 mutations in cancer-prone families
Nucl. Acids Res. 1998 26: 214-215
© 1998 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/26/1/214



222. Human p53, Human hprt, Rodent lacI and Rodent lacZ Databases

URL: http://www.ibiblio.org/dnam/mainpage.html
Categories: Cancer Databases

We have created databases and software applications for the analysis of DNA mutations at the human p53 gene, the human hprt gene and both the rodent transgenic lacI and lacZ loci. The databases themselves are stand-alone dBASE files and the software for analysis of the databases runs on IBM-compatible computers with Microsoft Windows. Each database has a separate software analysis program. The software created for these databases permits the filtering, ordering, report generation and display of information in the database. In addition, a significant number of routines have been developed for the analysis of single base substitutions. One method of obtaining the databases and software is via the World Wide Web. Open the following home page with a Web Browser: http://sunsite.unc.edu/dnam/mainpage.html. Alternatively, the databases and programs are available via public FTP from: ftp://anonymous@sunsite.unc.edu. There is no password required to enter the system. The databases and software are found beneath the subdirectory: pub/academic/biology/dna-mutations. Two other programs are available at the site, a program for comparison of mutational spectra and a program for entry of mutational data into a relational database.

Citation for the above abstract:
Cariello, NF, Douglas, GR, Gorelick, NJ, Hart, DW, Wilson, JD, Soussi, T
Databases and software for the analysis of mutations in the human p53 gene, human hprt gene and both the lacI and lacZ gene in transgenic rodents
Nucl. Acids Res. 1998 26: 198-199
© 1998 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/26/1/198



223. IARC TP53 Database

URL: http://www-p53.iarc.fr/index.html
Categories: Cancer Databases

Since 1989, about 570 different p53 mutations have been identified in more than 8000 human cancers. A database of these mutations was initiated by M. Hollstein and C. C. Harris in 1990. This database originally consisted of a list of somatic point mutations in the p53 gene of human tumors and cell lines, compiled from the published literature and made available in a standard electronic form. The database is maintained at the International Agency for Research on Cancer (IARC) and updated versions are released twice a year (January and July). The current version (July 1997) contains records on 6800 published mutations and will surpass the 8000 mark in the January 1998 release. The database now contains information on somatic and germline mutations in a new format to facilitate data retrieval. In addition, new tools are constructed to improve data analysis, such as a Mutation Viewer Java applet developed at the European Bioinformatics Institute (EBI) to visualise the location and impact of mutations on p53 protein structure. The database is available in different electronic formats at IARC (http://www.iarc.fr/p53/homepage.htm) or from the EBI server (http://www.ebi.ac.uk). The IARC p53 website also provides reports on database analysis and links with other p53 sites as well as with related databases. In this report, we describe the criteria for inclusion of data, the revised format and the new visualisation tools. We also briefly discuss the relevance of p53 mutations to clinical and biological questions.

Citation for the above abstract:
Hainaut, P, Hernandez, T, Robinson, A, Rodriguez-Tome, P, Flores, T, Hollstein, M, Harris, CC, Montesano, R
IARC Database of p53 gene mutations in human tumors and cell lines: updated compilation, revised formats and new visualisation tools
Nucl. Acids Res. 1998 26: 205-213
© 1998 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/26/1/205



224. MTB: Mouse Tumor Biology Database

URL: http://tumor.informatics.jax.org/
Categories: Cancer Databases, Model Organisms and Comparative Genomics Databases

The Mouse Tumor Biology (MTB) Database serves as a curated, integrated resource for information about tumor genetics and pathology in genetically defined strains of mice (i.e., inbred, transgenic and targeted mutation strains). Sources of information for the database include the published scientific literature and direct data submissions by the scientific community. Researchers access MTB using Web-based query forms and can use the database to answer such questions as 'What tumors have been reported in transgenic mice created on a C57BL/6J background?', 'What tumors in mice are associated with mutations in the Trp53 gene?' and 'What pathology images are available for tumors of the mammary gland regardless of genetic background?'. MTB has been available on the Web since 1998 from the Mouse Genome Informatics web site (http://www.informatics.jax.org). We have recently implemented a number of enhancements to MTB including new query options, redesigned query forms and results pages for pathology and genetic data, and the addition of an electronic data submission and annotation tool for pathology data.

Citation for the above abstract:
Bult, Carol J., Krupke, Debra M., Naf, Dieter, Sundberg, John P., Eppig, Janan T.
Web-based access to mouse models of human cancers: the Mouse Tumor Biology (MTB) Database
Nucl. Acids Res. 2001 29: 95-97
© 2001 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/29/1/95



225. OrCGDB: Oral Cancer Gene Database

URL: http://www.tumor-gene.org/Oral/oral.html
Categories: Cancer Databases

The Oral Cancer Gene Database (OrCGDB; http://www.tumor-gene.org/Oral/oral.html) was developed to provide the biomedical community with easy access to the latest information on the genes involved in oral cancer. The information is stored in a relational database and accessed through a WWW interface. The OrCGDB is organized by gene name, which is linked to information describing properties of the gene. This information is stored as a collection of findings ('facts') that are entered by the database curator in a semi-structured format from information in primary publications using a WWW interface. These facts include causes of oncogenic activation, chromosomal localization of the gene, mutations associated with the gene, the biochemical identity and activity of the gene product, synonyms for the gene name and a variety of clinical information. Each fact is associated with a MEDLINE citation. The user can search the OrCGDB by gene name or by entering a textword. The OrCGDB is part of a larger WWW-based tumor gene database and represents a new approach to catalog and display the research literature.

Citation for the above abstract:
Levine, Alan E., Steffen, David L.
OrCGDB: a database of genes involved in oral cancer
Nucl. Acids Res. 2001 29: 300-302
© 2001 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/29/1/300



226. RTCGD: Mouse Retroviral Tagged Cancer Gene Database

URL: http://rtcgd.ncifcrf.gov/
Categories: Cancer Databases, Model Organisms and Comparative Genomics Databases

Retroviral insertional mutagenesis in mouse hematopoietic tumors provides a potent cancer gene discovery tool in the post-genome-sequence era. To manage multiple high-throughput insertional mutagenesis screening projects, we developed the Retroviral Tagged Cancer Gene Database (RTCGD; http://RTCGD.ncifcrf.gov). A sequence analysis pipeline determines the genomic position of each retroviral integration site cloned from a mouse tumor, the distance between it and the nearest candidate disease gene(s) and its orientation with respect to the candidate gene(s). The pipeline also identifies genomic regions that are targets of retroviral integration in more than one tumor (common integration sites, CISs) and are thus likely to encode a disease gene. Users can search the database using a specified gene symbol, chromosome number or tumor model to identify both CIS genes and unique viral integration sites or compare the integration sites cloned by different laboratories using different models. As a default setting, users first review the CIS Lists and then Clone Lists. CIS Lists describe CISs and their candidate disease genes along with links to other public databases and clone lists. Clone Lists describe the viral integration site clones along with the tumor model and tumor type from which they were cloned, candidate disease gene(s), genomic position and orientation of the integrated provirus with respect to the candidate gene(s). It also provides a pictorial view of the genomic location of each integration site relative to neighboring genes and markers. Researchers can identify integrations of interest and compare their results with those for multiple tumor models and tumor types using RTCGD.

Citation for the above abstract:
Akagi, Keiko, Suzuki, Takeshi, Stephens, Robert M., Jenkins, Nancy A., Copeland, Neal G.
RTCGD: retroviral tagged cancer gene database
Nucl. Acids Res. 2004 32: D523-527
© 2004 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D523
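
The RTCGD abstract above describes a pipeline that finds the nearest candidate gene for each retroviral integration site, reports distance and relative orientation, and flags regions hit in more than one tumor (common integration sites, CISs). The following is a minimal sketch of that idea under stated assumptions: the `Gene` and `Insertion` records, the gene symbols, the coordinates, and the 30 kb clustering window are all hypothetical, and the actual RTCGD pipeline and its CIS criteria are not specified in the abstract.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Gene:
    symbol: str
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


@dataclass(frozen=True)
class Insertion:
    tumor_id: str
    chrom: str
    pos: int
    strand: str  # "+" or "-"


def distance_to_gene(ins: Insertion, gene: Gene) -> int:
    """Distance in bases from the insertion to the gene body (0 if inside it)."""
    if ins.pos < gene.start:
        return gene.start - ins.pos
    if ins.pos > gene.end:
        return ins.pos - gene.end
    return 0


def nearest_gene(ins: Insertion, genes: List[Gene]) -> Optional[Tuple[str, int, str]]:
    """Return (symbol, distance, orientation) for the closest gene on the same chromosome."""
    candidates = [g for g in genes if g.chrom == ins.chrom]
    if not candidates:
        return None
    gene = min(candidates, key=lambda g: distance_to_gene(ins, g))
    orientation = "same" if gene.strand == ins.strand else "opposite"
    return gene.symbol, distance_to_gene(ins, gene), orientation


def common_integration_sites(insertions: List[Insertion],
                             window: int = 30_000) -> List[List[Insertion]]:
    """Cluster insertions lying within `window` bases of their neighbor (assumed rule)
    and keep only clusters that contain insertions from more than one tumor."""
    clusters: List[List[Insertion]] = []
    current: List[Insertion] = []
    for ins in sorted(insertions, key=lambda i: (i.chrom, i.pos)):
        if current and (ins.chrom != current[-1].chrom
                        or ins.pos - current[-1].pos > window):
            clusters.append(current)
            current = []
        current.append(ins)
    if current:
        clusters.append(current)
    return [c for c in clusters if len({i.tumor_id for i in c}) > 1]


if __name__ == "__main__":
    # Hypothetical annotation and insertion data, for illustration only.
    genes = [Gene("GeneA", "chr1", 1000, 2000, "+"),
             Gene("GeneB", "chr1", 50_000, 60_000, "-")]
    insertions = [Insertion("T1", "chr1", 2500, "+"),
                  Insertion("T2", "chr1", 3000, "-"),
                  Insertion("T3", "chr2", 3000, "+")]
    for ins in insertions:
        print(ins.tumor_id, nearest_gene(ins, genes))
    print(len(common_integration_sites(insertions)), "candidate CIS")
```

Requiring more than one distinct tumor per cluster mirrors the abstract's definition of a CIS; the window size is a tunable guess and would need to be matched to the criteria used by the database.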



227. SNP500Cancer

URL: http://snp500cancer.nci.nih.gov/
Categories: Cancer Databases

The SNP500Cancer database provides sequence and genotype assay information for candidate SNPs useful in mapping complex diseases, such as cancer. The database is an integral component of the NCI Cancer Genome Anatomy Project (http://cgap.nci.nih.gov). SNP500Cancer reports sequence analysis of anonymized control DNA samples (n = 102 Coriell samples representing four self-described ethnic groups: African/African-American, Caucasian, Hispanic and Pacific Rim). The website is searchable by gene, chromosome, gene ontology pathway, dbSNP ID and SNP500Cancer SNP ID. As of October 2005, the database contains >13 400 SNPs, 9124 of which have been sequenced in the SNP500Cancer population. For each analysed SNP, gene location and >200 bp of surrounding annotated sequence (including nearby SNPs) are provided, with frequency information in total and per subpopulation as well as calculation of Hardy–Weinberg equilibrium for each subpopulation. The website provides the conditions for validated sequencing and genotyping assays, as well as genotype results for the 102 samples, in both viewable and downloadable formats. A subset of sequence validated SNPs with minor allele frequency >5% are entered into a high-throughput pipeline for genotyping analysis to determine concordance for the same 102 samples. In addition, the results of genotype analysis for select validated SNP assays (defined as 100% concordance between sequence analysis and genotype results) are posted for an additional 280 samples drawn from the Human Diversity Panel (HDP). SNP500Cancer provides an invaluable resource for investigators to select SNPs for analysis, design genotyping assays using validated sequence data, choose selected assays already validated on one or more genotyping platforms, and select reference standards for genotyping assays. The SNP500Cancer database is freely accessible via the web page at http://snp500cancer.nci.nih.gov.

Citation for the above abstract:
Packer, Bernice R., Yeager, Meredith, Burdett, Laura, Welch, Robert, Beerman, Michael, Qi, Liqun, Sicotte, Hugues, Staats, Brian, Acharya, Mekhala, Crenshaw, Andrew, Eckert, Andrew, Puri, Vinita, Gerhard, Daniela S., Chanock, Stephen J.
SNP500Cancer: a public resource for sequence validation, assay development, and frequency analysis for genetic variation in candidate genes
Nucl. Acids Res. 2006 34: D617-621
© 2006 Oxford University Press.


The full text of the article can be found at: http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D617
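
The SNP500Cancer abstract above refers to allele frequencies and a Hardy–Weinberg equilibrium calculation for each subpopulation. As a rough illustration, the sketch below computes allele frequencies, the minor allele frequency, and a chi-square goodness-of-fit statistic from genotype counts. The abstract does not say which test the database uses, so the chi-square approach is an assumption, and the function name and genotype counts are hypothetical.

```python
from typing import Dict


def hardy_weinberg(n_hom_ref: int, n_het: int, n_hom_alt: int) -> Dict[str, float]:
    """Compare observed genotype counts with Hardy-Weinberg expectations.

    Returns the allele frequencies p and q, the minor allele frequency,
    and a chi-square statistic (1 degree of freedom for a biallelic SNP).
    """
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        raise ValueError("at least one genotyped sample is required")
    p = (2 * n_hom_ref + n_het) / (2 * n)  # frequency of the reference allele
    q = 1 - p                              # frequency of the alternate allele
    observed = (n_hom_ref, n_het, n_hom_alt)
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    # Skip classes with zero expectation to avoid division by zero.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return {"p": p, "q": q, "maf": min(p, q), "chi2": chi2}


if __name__ == "__main__":
    # Hypothetical genotype counts for one subpopulation (illustration only).
    result = hardy_weinberg(n_hom_ref=49, n_het=42, n_hom_alt=9)
    print(result)
    # The abstract's >5% minor allele frequency cut-off applied to this example.
    print("passes 5% MAF filter:", result["maf"] > 0.05)
```

With one degree of freedom, a chi-square statistic above about 3.84 corresponds to p < 0.05; for small samples an exact test is often preferred over the chi-square approximation.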



228. SV40 Large T-Antigen Mutant Database

URL: http://supernova.bio.pitt.edu/pipaslab/
Categories: Cancer Databases

The SV40 T antigen database (http://www.pitt.edu/~pipslab/) lists viruses and plasmids expressing mutant forms of large T antigen. Each entry contains information regarding the mutant designation, mutant type, virus strain, nucleotide change, amino acid change and pertinent references. The database is now available as an internet searchable index.

Citation for the above abstract:
Robinson, CG, Pipas, JM
SV40 large tumor antigen (T antigen): database of mutants
Nucl. Acids Res. 1998 26: 295-296
© 1998 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/26/1/295



229. The Tumor Gene Family of Databases

URL: http://www.tumor-gene.org/tgdf.html
Categories: Cancer Databases

"The Tumor Gene Family of Databases contains information about genes which are targets for cancer-causing mutations; proto-oncogenes and tumor supressor genes. Its goal is to provide a standard set of facts (e.g. protein size, biochemical activity, chromosomal location, ...) about all known tumor genes. At present, the database contains over 2600 facts on over 300 genes.

These databases are designed for biomedical researchers who work with tumor genes. Anyone is free to search it, but if you are not in this group, it may not be very useful to you.

The Tumor Gene Database Family is a consortium of more specialized databases.
  • The Tumor Gene Database is the least selective and most general. Anything in any of the other databases, plus some information in none of the other databases, is in the Tumor Gene Database. However, because of this breadth of information and because of the lack of selectivity of this database, it can be more efficient to use one of the more specialized and/or selective databases.
  • The Breast Cancer Gene Database is a reviewed, high quality database of genes involved in Breast Cancer.
  • The Oral Cancer Gene Database is a brand-new, high quality database of genes involved in cancers of the mouth. It is also reviewed."




230. UMD-p53 Database

URL: http://p53.free.fr/
Categories: Cancer Databases

The tumor suppressor gene TP53 (p53) is the most extensively studied gene involved in human cancers. More than 1,400 publications have reported mutations of this gene in 150 cancer types for a total of 14,971 mutations. To exploit this huge bulk of data, specific analytic tools were highly warranted. We therefore developed a locus-specific database software called UMD-p53. This database compiles all somatic and germline mutations as well as polymorphisms of the TP53 gene which have been reported in the published literature since 1989, or unpublished data submitted to the database curators. The database is available at www.umd.necker.fr or at http://p53.curie.fr/. In this paper, we describe recent developments of the UMD-p53 database. These developments include new fields and routines. For example, the analysis of putative acceptor or donor splice sites is now automated and gives new insight for the causal role of "silent mutations." Other routines have also been created such as the prescreening module, the UV module, and the cancer distribution module. These new improvements will help users not only for molecular epidemiology and pharmacogenetic studies but also for patient-based studies. To achieve these purposes we have designed a procedure to check and validate data in order to reach the highest quality data.

Citation for the above abstract:
Beroud C, Soussi T.
The UMD-p53 database: new mutations and analysis tools.
Hum Mutat. 2003 Mar;21(3):176-81.
© 2003 Wiley-Liss, Inc.


The full abstract can be found at: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=12619103



231. ALPSbase

URL: http://research.nhgri.nih.gov/alps/
Categories: Gene-, System-, or Disease- Specific Databases, Immunological Databases

"Autoimmune Lymphoproliferative Syndrome (ALPS) is a recently recognized disease in which a genetic defect in programmed cell death, or apoptosis, leads to breakdown of lymphocyte regulation. Patients with ALPS have chronic enlargement of the spleen and lymph nodes, various manifestations of autoimmunity, and elevation of a normally rare population of "double negative T cells" (DNTs), T lymphocytes expressing neither cluster differentiation CD4 nor CD8 surface antigens. When lymphocytes from patients with ALPS are cultured in vitro, they are resistant to apoptosis as compared to cells from healthy controls. Most patients with ALPS have mutations in a gene now named TNFRSF6 (tumor necrosis factor receptor gene superfamily member 6). This gene encodes the cell surface receptor for the major apoptosis pathway in mature lymphocytes. The gene and protein have had several names including Fas (used here), APO-1 and APT1. ALPS is subdivided into: 1) Type Ia, ALPS with mutant Fas; 2) Type Ib, lymphadenopathy and systemic lupus erythematosus with mutation in the ligand for Fas; 3) Type II, ALPS with mutant caspase-10 or caspase-8; and 4) Type III, ALPS as yet without a defined genetic cause."



232. Androgen Receptor Gene Mutations Database

URL: http://www.androgendb.mcgill.ca/
Categories: Gene-, System-, or Disease- Specific Databases

The current version of the androgen receptor (AR) gene mutations database is described. The total number of reported mutations has risen from 374 to 605, and the number of AR-interacting proteins described has increased from 23 to 70, both over the past 3 years. A 3D model of the AR ligand-binding domain (AR LBD) has been added to give a better understanding of gene structure-function relationships. In addition, silent mutations have now been reported in both androgen insensitivity syndrome (AIS) and prostate cancer (CaP) cases. The database also now incorporates information on the exon 1 CAG repeat expansion disease, spinobulbar muscular atrophy (SBMA), as well as CAG repeat length variations associated with risk for female breast, uterine endometrial, colorectal, and prostate cancer, as well as for male infertility. The possible implications of somatic mutations, as opposed to germline mutations, in the development of future locus-specific mutation databases (LSDBs) are discussed. The database is available on the Internet (http://www.mcgill.ca/androgendb/).

Citation for the above abstract:
Gottlieb B, Beitel LK, Wu JH, Trifiro M.
The androgen receptor gene mutations database (ARDB): 2004 update.
Hum Mutat. 2004 Jun;23(6):527-33.
© 2004 Wiley-Liss, Inc.


The full abstract can be found at: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=15146455



233. AngioDB: Database of Angiogenesis and Angiogenesis-related Molecules

URL: http://angiodb.snu.ac.kr/
Categories: Gene-, System-, or Disease- Specific Databases

Angiogenesis is the formation of new capillaries sprouting from pre-existing vessels. Angiogenesis occurs in a variety of normal physiological and pathological conditions and is regulated by a balance of stimulatory and inhibitory angiogenic factors. The control of this balance may fail and result in the formation of a pathologic capillary network during the development of many diseases. Therefore, we developed the angiogenesis database (AngioDB), which can provide a signaling network of angiogenesis-related biomolecules in human. Each record of AngioDB consisted of 12 fields and was developed by using a relational database management system. For the retrieval of data, Active Server Page (ASP) technology was integrated in this system. Users can access the database by a query or imagemap browsing program. The retrieving system also provides a list of angiogenesis-related molecules classified by three categories, and the database has an external link to NCBI databases. AngioDB is available via the Internet at http://angiodb.snu.ac.kr/.

Citation for the above abstract:
Sohn, Tae-Kwon, Moon, Eun-Joung, Lee, Seok-Ki, Cho, Hwan-Gue, Kim, Kyu-Won
AngioDB: database of angiogenesis and angiogenesis-related molecules
Nucl. Acids Res. 2002 30: 369-371
© 2002 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/30/1/369



234. BayGenomics

URL: http://baygenomics.ucsf.edu/
Categories: Gene-, System-, or Disease- Specific Databases

The BayGenomics gene-trap resource (http://baygenomics.ucsf.edu) provides researchers with access to thousands of mouse embryonic stem (ES) cell lines harboring characterized insertional mutations in both known and novel genes. Each cell line contains an insertional mutation in a specific gene. The identity of the gene that has been interrupted can be determined from a DNA sequence tag. Approximately 75% of our cell lines contain insertional mutations in known mouse genes or genes that share strong sequence similarities with genes that have been identified in other organisms. These cell lines readily transmit the mutation to the germline of mice and many mutant lines of mice have already been generated from this resource. BayGenomics provides facile access to our entire database, including sequence tags for each mutant ES cell line, through the World Wide Web. Investigators can browse our resource, search for specific entries, download any portion of our database and BLAST sequences of interest against our entire set of cell line sequence tags. They can then obtain the mutant ES cell line for the purpose of generating knockout mice.

Citation for the above abstract:
Stryke, Doug, Kawamoto, Michiko, Huang, Conrad C., Johns, Susan J., King, Leslie A., Harper, Courtney A., Meng, Elaine C., Lee, Roy E., Yee, Alice, L'Italien, Larry, Chuang, Pao-Tien, Young, Stephen G., Skarnes, William C., Babbitt, Patricia C., Ferrin, Thomas E.
BayGenomics: a resource of insertional mutations in mouse embryonic stem cells
Nucl. Acids Res. 2003 31: 278-281
© 2003 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/278



235. BTKbase: Mutation Registry for X-linked Agammaglobulinemia

URL: http://bioinf.uta.fi/BTKbase/
Categories: Gene-, System-, or Disease- Specific Databases

X-linked agammaglobulinemia (XLA) is an immunodeficiency caused by mutations in the gene coding for Bruton's agammaglobulinemia tyrosine kinase (BTK). A database (BTKbase) of BTK mutations has been compiled and the recent update lists 463 mutation entries from 406 unrelated families showing 303 unique molecular events. In addition to mutations, the database also lists variants or polymorphisms. Each patient is given a unique patient identity number (PIN). Information is included regarding the phenotype including symptoms. Mutations in all the five domains of BTK have been noticed to cause the disease, the most common event being missense mutations. The mutations appear almost uniformly throughout the molecule and frequently affect CpG sites that code for arginine residues. The putative structural implications of all the missense mutations are given in the database. The improved version of the registry having a number of new features is available at http://www.helsinki.fi/science/signal/btkbase.html

Citation for the above abstract:
Vihinen, M, Brandau, O, Branden, LJ, Kwan, SP, Lappalainen, I, Lester, T, Noordzij, JG, Ochs, HD, Ollila, J, Pienaar, SM, Riikonen, P, Saha, BK, Smith, CIE
BTKbase, mutation database for X-linked agammaglobulinemia (XLA)
Nucl. Acids Res. 1998 26: 242-247
© 1998 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/26/1/242



236. CarpeDB: A Comprehensive Database on the Genetics of Epilepsy

URL: http://www.carpedb.ua.edu/
Categories: Gene-, System-, or Disease- Specific Databases

"CarpeDB, a dynamic epilepsy genetics database sponsored by a National Science Foundation CAREER Award and the Department of Biological Sciences at The University of Alabama, is now available to the public! Although information pertinent to the study of epilepsy genetics has been widely available online, researchers interested in the genetics of epilepsy were required to utilize various sources for data collection. CarpeDB serves as a novel source for epilepsy researchers by featuring scores of "epilepsy genes" and associated publications in one locus. Furthermore, multiple genes implicated in epilepsy are also implicated in other human disorders. Consequently, the use of CarpeDB need not be limited to epilepsy researchers."



237. CASRdb: Calcium Sensing Receptor Databases

URL: http://www.casrdb.mcgill.ca/
Categories: Gene-, System-, or Disease- Specific Databases

Familial hypocalciuric hypercalcemia (FHH) is caused by heterozygous loss-of-function mutations in the calcium-sensing receptor (CASR), in which the lifelong hypercalcemia is generally asymptomatic. Homozygous loss-of-function CASR mutations manifest as neonatal severe hyperparathyroidism (NSHPT), a rare disorder characterized by extreme hypercalcemia and the bony changes of hyperparathyroidism, which occur in infancy. Activating mutations in the CASR gene have been identified in several families with autosomal dominant hypocalcemia (ADH), autosomal dominant hypoparathyroidism, or hypocalcemic hypercalciuria. Individuals with ADH may have mild hypocalcemia and relatively few symptoms. However, in some cases seizures can occur, especially in younger patients, and these often happen during febrile episodes due to intercurrent infection. Thus far, 112 naturally-occurring mutations in the human CASR gene have been reported, of which 80 are unique and 32 are recurrent. To better understand the mutations causing defects in the CASR gene and to define specific regions relevant for ligand-receptor interaction and other receptor functions, the data on mutations were collected and the information was centralized in the CASRdb (www.casrdb.mcgill.ca), which is easily and quickly accessible by search engines for retrieval of specific information. The information can be searched by mutation, genotype-phenotype, clinical data, in vitro analyses, and authors of publications describing the mutations. CASRdb is regularly updated for new mutations and it also provides a mutation submission form to ensure up-to-date information. The home page of this database provides links to different web pages that are relevant to the CASR, as well as disease clinical pages, sequence of the CASR gene exons, and position of mutations in the CASR. The CASRdb will help researchers to better understand and analyze the mutations, and aid in structure-function analyses.

Citation for the above abstract:
Pidasheva S, D'Souza-Li L, Canaff L, Cole DE, Hendy GN.
CASRdb: calcium-sensing receptor locus-specific database for mutations causing familial (benign) hypocalciuric hypercalcemia, neonatal severe hyperparathyroidism, and autosomal dominant hypocalcemia.
Hum Mutat. 2004 Aug;24(2):107-11.
© 2004 Wiley-Liss, Inc.


The full abstract can be found at: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=15241791



238. Database of Human Type I and Type III Collagen Mutations

URL: http://www.le.ac.uk/genetics/collagen/
Categories: Gene-, System-, or Disease- Specific Databases

The collagens are a large and diverse family of proteins which are found in the extracellular matrix. In common with one another, the 19 known collagen types have triple-helical domains of variable length but they differ with respect to their overall size and the nature and location of their globular domains. Collagen mutations lead to heritable defects of connective tissues and mutation data for collagen types I and III are presented here. The mutation data are accessible on the world wide web at http://www.le.ac.uk/genetics/collagen/

Citation for the above abstract:
Dalgleish, R
The Human Collagen Mutation Database 1998
Nucl. Acids Res. 1998 26: 253-255
© 1998 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/26/1/253



239. Cytokine Gene Polymorphism in Human Disease: On-line Databases

URL: http://www.bris.ac.uk/pathandmicro/services/GAI/cytokine4.htm
Categories: Gene-, System-, or Disease- Specific Databases, General Polymorphism Databases

The pathologies of many infectious, autoimmune and malignant diseases are influenced by the profiles of cytokine production in pro-inflammatory (TH1) and anti-inflammatory (TH2) T cells. Interindividual differences in cytokine profiles appear to be due, at least in part, to allelic polymorphism within regulatory regions of cytokine genes. Many studies have examined the relationship between cytokine gene polymorphism, cytokine gene expression in vitro, and the susceptibility to and clinical severity of diseases. A review of the findings of these studies is presented. An on-line version featuring appropriate updates is accessible from the World Wide Web site, http://www.pam.bris.ac.uk/services/GAI/cytokine4.htm.

Citation for the above abstract:
Bidwell J, Keen L, Gallagher G, Kimberly R, Huizinga T, McDermott MF, Oksenberg J, McNicholl J, Pociot F, Hardt C, D'Alfonso S.
Cytokine gene polymorphism in human disease: on-line databases.
Genes Immun. 1999 Sep;1(1):3-19.
© 1999 Nature Publishing Group.


The full abstract can be found at: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=11197303



240. EICO DB: Expression-based Imprint Candidate Organiser

URL: http://fantom2.gsc.riken.jp/EICODB/
Categories: Gene-, System-, or Disease- Specific Databases

We have developed an integrated database that is specialized for the study of imprinted disease genes. The database contains novel candidate imprinted genes identified by the RIKEN full-length mouse cDNA microarray study, information on validated single nucleotide polymorphisms (SNPs) to confirm imprinting using reciprocal mouse crosses and the predicted physical position of imprinting-related disease loci in the mouse and human genomes. It has two user-friendly search interfaces: the SNP-central view (MuSCAT: MoUse SNP CATalog) and the candidate gene-central view (CITE: Candidate Imprinted Transcripts by Expression). The database, EICO (Expression-based Imprint Candidate Organizer), can be accessed via the World Wide Web (http://fantom2.gsc.riken.jp/EICODB/) and the DAS client software. These data and interfaces facilitate understanding of the mechanism of imprinting in mammalian inherited traits.

Citation for the above abstract:
Nikaido, Itoshi, Saito, Chika, Wakamoto, Akiko, Tomaru, Yasuhiro, Arakawa, Takahiro, Hayashizaki, Yoshihide, Okazaki, Yasushi
EICO (Expression-based Imprint Candidate Organizer): finding disease-related imprinted genes
Nucl. Acids Res. 2004 32: D548-551
© 2004 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D548



241. EpoDB: Erythropoiesis Database

URL: http://www.cbil.upenn.edu/EpoDB/
Categories: Gene-, System-, or Disease- Specific Databases

EpoDB is a database of genes expressed in vertebrate red blood cells. It is also a prototype for the creation of cell and tissue-specific databases from multiple external sources. The information in EpoDB obtained from GenBank, SWISS-PROT, Transfac, TRRD and GERD is curated to provide high quality data for sequence analysis aimed at understanding gene regulation during erythropoiesis. New protocols have been developed for data integration and updating entries. Using a BLAST-based algorithm, we have grouped GenBank entries representing the same gene together. This sequence similarity protocol was also used to identify new entries to be included in EpoDB. We have recently implemented our database in Sybase (relational tables) in addition to SICStus Prolog to provide us with greater flexibility in asking complex queries that utilize information from multiple sources. New additions to the public web site (http://www.cbil.upenn.edu/epodb) for accessing EpoDB are the ability to retrieve groups of entries representing different variants of the same gene and to retrieve gene expression data. The BLAST query has been enhanced by incorporating BLASTView, an interactive and graphical display of BLAST results. We have also enhanced the queries for retrieving sequence from specified genes by the addition of MEME, a motif discovery tool, to the integrated analysis tools which include CLUSTALW and TESS.

Citation for the above abstract:
Stoeckert, CJ, Jr, Salas, F, Brunk, B, Overton, GC
EpoDB: a prototype database for the analysis of genes expressed during vertebrate erythropoiesis
Nucl. Acids Res. 1999 27: 200-203
© 1999 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/27/1/200



242. ERGDB: Estrogen Responsive Genes Database

URL: http://research.i2r.a-star.edu.sg/promoter/Ergdb-v11/
Categories: Gene-, System-, or Disease- Specific Databases

ERGDB is an integrated knowledge database dedicated to genes responsive to estrogen. Genes included in ERGDB are those whose expression levels are experimentally proven to be either up-regulated or down-regulated by estrogen. Genes included are identified based on publications from the PubMed database and each record has been manually examined, evaluated and selected for inclusion by biologists. ERGDB aims to be a unified gateway to store, search, retrieve and update information about estrogen responsive genes. Each record contains links to relevant databases, such as GenBank, LocusLink, Refseq, PubMed and ATCC. The unique feature of ERGDB is that it contains information on the dependence of gene reactions on experimental conditions. In addition to basic information about the genes, information for each record includes gene functional description, experimental methods used, tissue or cell type, gene reaction, estrogen exposure time and the summary of putative estrogen response elements if the gene’s promoter sequence was available. Through a web interface at http://sdmc.i2r.a-star.edu.sg/ergdb/cgi-bin/explore.pl users can either browse or query ERGDB. Access is free for academic and non-profit users.

Citation for the above abstract:
Tang, Suisheng, Han, Hao, Bajic, Vladimir B.
ERGDB: Estrogen Responsive Genes Database
Nucl. Acids Res. 2004 32: D533-536
© 2004 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D533



243. EyeSite

URL: http://eyesite.cryst.bbk.ac.uk/
Categories: Gene-, System-, or Disease- Specific Databases

The EyeSite is a web-based database of protein families for proteins that function in the eye and their homologous sequences. The resource clusters proteins at different levels of homology in order to facilitate functional annotation of sequences and modelling of proteins from structural homologues. Eye proteins are organized into the tissue types in which they function and are clustered into homologous families using a novel protocol employing the TribeMCL algorithm. Homologous families are further subdivided into sequence clusters for which multiple sequence alignments are generated. Structural annotations from the CATH domain database are provided for nearly 90% of the sequences, and protein family annotations from the Pfam database for 86%. Homology models have also been generated where appropriate. The EyeSite is stored in a relational database and is extensively linked to other online bioinformatics resources to help relate allelic variants, annotations and clinical details to the derived data in the database. The EyeSite is available for online search, sequence information and model retrieval at http://eyesite.cryst.bbk.ac.uk/.

Citation for the above abstract:
Lee, David A., Fefeu, Sandrine, Edo-Ukeh, Adrian A., Orengo, Christine A., Slingsby, Christine
EyeSite: a semi-automated database of protein families in the eye
Nucl. Acids Res. 2004 32: D148-152
© 2004 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D148



244. FUNPEP: Information System for "Low Complexity Sequence Regions"

URL: http://swift.cmbi.kun.nl/swift/FUNPEP/gergo/
Categories: Gene-, System-, or Disease- Specific Databases, Individual Protein Family Databases

"This part of the FUNPEP project is a bit different from all the others. The peptides on these pages were not chosen because of some kind of sequence similarity, what is more, they hardly have any. Their common, and very starnge property is the ability to form amyloid plaques (or fibrils). The exact structure and the formation of these supermolacular structures are still subject of research, but there are lots of promising results.
As a part of the FUNPEP project, we made a small collection of peptides, which are known to form these amyloid plaques. Sequences, including respective animal analogues, were extracted from SWISSPROT, and aligned. These sequences and some words about the peptides can be found under the links in the table below. Some molecular modelling was also performed, to show some possible structures of amyloids."



245. GOLD.db: Genomics Of Lipid-associated Disorders

URL: http://gold.tugraz.at/
Categories: Gene-, System-, or Disease- Specific Databases

BACKGROUND: The GOLD.db (Genomics of Lipid-Associated Disorders Database) was developed to address the need for integrating disparate information on the function and properties of genes and their products that are particularly relevant to the biology, diagnosis, management, treatment, and prevention of lipid-associated disorders. DESCRIPTION: The GOLD.db (http://gold.tugraz.at) provides a reference for pathways and information about the relevant genes and proteins in an efficiently organized way. The main focus was to provide biological pathways with image maps and visual pathway information for lipid metabolism and obesity-related research. This database provides also the possibility to map gene expression data individually to each pathway. Gene expression at different experimental conditions can be viewed sequentially in context of the pathway. Related large scale gene expression data sets were provided and can be searched for specific genes to integrate information regarding their expression levels in different studies and conditions. Analytic and data mining tools, reagents, protocols, references, and links to relevant genomic resources were included in the database. Finally, the usability of the database was demonstrated using an example about the regulation of Pten mRNA during adipocyte differentiation in the context of relevant pathways. CONCLUSIONS: The GOLD.db will be a valuable tool that allows researchers to efficiently analyze patterns of gene expression and to display them in a variety of useful and informative ways, allowing outside researchers to perform queries pertaining to gene expression results in the context of biological processes and pathways.

Citation for the above abstract:
Hubert Hackl, Michael Maurer, Bernhard Mlecnik, Jurgen Hartler, Gernot Stocker, Diego Miranda-Saavedra, and Zlatko Trajanoski
GOLD.db: genomics of lipid-associated disorders database
BMC Genomics 2004, 5:93; doi:10.1186/1471-2164-5-93
© 2003 By the Authors


The full text of the article can be found at: http://www.biomedcentral.com/1471-2164/5/93



246. HaemB: Haemophilia B Mutation Database

URL: http://www.kcl.ac.uk/ip/petergreen/haemBdatabase.html
Categories: Gene-, System-, or Disease- Specific Databases

The eighth edition of the haemophilia B database (http://www.umds.ac.uk/molgen/haemBdatabase.htm) lists in an easily accessible form all known factor IX mutations due to small changes (base substitutions and short additions and/or deletions of <30 bp) identified in haemophilia B patients. The 1713 patient entries are ordered by the nucleotide number of their mutation. Where known, details are given on: factor IX activity, factor IX antigen in circulation, presence of inhibitor and origin of mutation. References to published mutations are given and the laboratories generating the data are indicated.

Citation for the above abstract:
Giannelli, F, Green, PM, Sommer, SS, Poon, M, Ludwig, M, Schwaab, R, Reitsma, PH, Goossens, M, Yoshioka, A, Figueiredo, MS, Brownlee, GG
Haemophilia B: database of point mutations and short additions and deletions--eighth edition
Nucl. Acids Res. 1998 26: 265-268
© 1998 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/26/1/265



247. HbVar: A Database of Human Hemoglobin Variants and Thalassemias

URL: http://globin.cse.psu.edu/globin/hbvar/
Categories: Gene-, System-, or Disease- Specific Databases

HbVar (http://globin.cse.psu.edu/globin/hbvar/) is a relational database developed by a multi-center academic effort to provide up-to-date and high quality information on the genomic sequence changes leading to hemoglobin variants and all types of thalassemia and hemoglobinopathies. Extensive information is recorded for each variant and mutation, including sequence alterations, biochemical and hematological effects, associated pathology, ethnic occurrence and references. In addition to the regular updates to entries, we report two significant advances: (i) The frequencies for a large number of mutations causing β-thalassemia in at-risk populations have been extracted from the published literature and made available for the user to query upon. (ii) HbVar has been linked with the GALA (Genome Alignment and Annotation database, available at http://globin.cse.psu.edu/gala/) so that users can combine information on hemoglobin variants and thalassemia mutations with a wide spectrum of genomic data. It also expands the capacity to view and analyze the data, using tools within GALA and the University of California at Santa Cruz (UCSC) Genome Browser.

Citation for the above abstract:
Patrinos, George P., Giardine, Belinda, Riemer, Cathy, Miller, Webb, Chui, David H. K., Anagnou, Nicholas P., Wajcman, Henri, Hardison, Ross C.
Improvements in the HbVar database of human hemoglobin variants and thalassemia mutations for population and sequence variation studies
Nucl. Acids Res. 2004 32: D537-541
© 2004 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D537



248. HemBase

URL: http://hembase.niddk.nih.gov/
Categories: Gene-, System-, or Disease- Specific Databases, Microarray Data and other Gene Expression Databases

Hembase (http://hembase.niddk.nih.gov) is an integrated browser and genome portal designed for web-based examination of the human erythroid transcriptome. To date, Hembase contains 15,752 entries from erythroblast Expressed Sequence Tags (ESTs) and 380 referenced genes relevant for erythropoiesis. The database is organized to provide a cytogenetic band position, a unique name as well as a concise annotation for each entry. Search queries may be performed by name, keyword or cytogenetic location. Search results are linked to primary sequence data and three major human genome browsers for access to information considered current at the time of each search. Hembase provides interested scientists and clinical hematologists with a genome-based approach toward the study of erythroid biology.

Citation for the above abstract:
Goh, Sung-Ho, Lee, Y. Terry, Bouffard, Gerard G., Miller, Jeffery L.
Hembase: browser and genome portal for hematology and erythroid biology
Nucl. Acids Res. 2004 32: D572-574
© 2004 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D572



249. HemoPDB: Hematopoietic Promoter Database

URL: http://bioinformatics.med.ohio-state.edu/HemoPDB/
Categories: Gene-, System-, or Disease- Specific Databases, Microarray Data and other Gene Expression Databases

Hematopoiesis describes the process of the normal formation and development of blood cells, involving both proliferation and differentiation from stem cells. Abnormalities in this developmental program yield blood cell diseases, such as leukemia. Although, in recent years, extensive molecular research in normal hematopoietic development has characterized transcription factors and their binding sites in the target gene promoters, the information generated is highly fragmented. In order to integrate this important regulatory information with the corresponding genomic sequences, we have developed a new database called Hematopoiesis Promoter Database (HemoPDB). HemoPDB is a comprehensive resource focused on transcriptional regulation during hematopoietic development and associated aberrances that result in malignancy. HemoPDB (version 1.0) contains 246 promoter sequences and 604 experimentally known cis-regulatory elements of 187 different transcription factors, with links to published references. Orthologous promoters from different species are linked with each other and displayed in the same database record, accompanied by a visual image of the promoters and corresponding annotations of cis-regulatory elements. HemoPDB may be searched for the promoter of a specific gene, transcription factors and target genes, and genes that are expressed in a certain cell type or lineage, through a user-friendly web interface at http://bioinformatics.med.ohio-state.edu/HemoPDB. Links to the documentation and other technical details are provided on this website.

Citation for the above abstract:
Pohar, Twyla T., Sun, Hao, Davuluri, Ramana V.
HemoPDB: Hematopoiesis Promoter Database, an information resource of transcriptional regulation in blood cell development
Nucl. Acids Res. 2004 32: D86-90
© 2004 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/32/suppl_1/D86



250. HORDE: Human Olfactory Receptor Data Exploratorium

URL: http://bioportal.weizmann.ac.il/HORDE/
Categories: Gene-, System-, or Disease- Specific Databases

Recent enhancements and current research in the GeneCards (GC) (http://bioinfo.weizmann.ac.il/cards/) project are described, including the addition of gene expression profiles and integrated gene locations. Also highlighted are the contributions of specialized associated human gene-centric databases developed at the Weizmann Institute. These include the Unified Database (UDB) (http://bioinfo.weizmann.ac.il/udb) for human genome mapping, the human Chromosome 21 database at the Weizmann Institute (CroW 21) (http://bioinfo.weizmann.ac.il/crow21), and the Human Olfactory Receptor Data Exploratorium (HORDE) (http://bioinfo.weizmann.ac.il/HORDE). The synergistic relationships amongst these efforts have positively impacted the quality, quantity and usefulness of the GeneCards gene compendium.

Citation for the above abstract:
Safran, Marilyn, Chalifa-Caspi, Vered, Shmueli, Orit, Olender, Tsviya, Lapidot, Michal, Rosen, Naomi, Shmoish, Michael, Peter, Yakov, Glusman, Gustavo, Feldmesser, Ester, Adato, Avital, Peter, Inga, Khen, Miriam, Atarot, Tal, Groner, Yoram, Lancet, Doron
Human Gene-Centric Databases at the Weizmann Institute of Science: GeneCards, UDB, CroW 21 and HORDE
Nucl. Acids Res. 2003 31: 142-146
© 2003 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/31/1/142



251. HOX-Pro: Homeobox Genes DataBase

URL: http://www.iephb.nw.ru/labs/lab38/spirov/hox_pro/hox-pro00.html
Categories: Gene-, System-, or Disease- Specific Databases

The HOX Pro database contains information about the organization, function and evolution of gene ensembles, notably the homeobox-containing genes. It is now clear that a subset of genes containing the homeobox motif play key roles in the orchestration of genes which control embryonic patterning, morphogenesis, cell differentiation and malignant transformation. The HOX Pro contains a broad spectrum of information including images, diagrams and animations. Currently this amounts to approximately 700 HTML pages together with 400 images which contain information on 200 groups of genes and 90 promoters, in turn linked to maps of 13 HOX clusters and nine genetic networks. There are about 700 sequences of individual hox-genes of animals classified in approximately 200 homologous or paralogous groups. Graphical representation of HOX clusters and Hox-based networks is accomplished by means of flow and 3D diagrams, JavaScript animations and Java applets. The HOX Pro now includes sections presenting data mining and data simulation issues. The DB is located at http://www.iephb.nw.ru/hoxpro.

Citation for the above abstract:
Spirov, Alexander V., Borovsky, Mikhail, Spirova, Olesya A.
HOX Pro DB: the functional genomics of hox ensembles
Nucl. Acids Res. 2002 30: 351-353
© 2002 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/30/1/351



252. HPMR: Human Plasma Membrane Receptome

URL: http://receptome.stanford.edu/HPMR/
Categories: Gene-, System-, or Disease- Specific Databases, Microarray Data and other Gene Expression Databases

Intercellular communication in multicellular organisms requires the relay of extracellular signals by cell surface proteins to the interiors of cells. The availability of genome sequences from humans and several model organisms has facilitated the identification of several human plasma membrane receptor families and allowed the analysis of their phylogeny. This review provides a global categorization of most known signal transduction-associated receptors as enzymes, recruiters, and latent transcription factors. The evolution of known families of human plasma membrane signaling receptors was traced in current literature and validated by sequence relatedness. This global analysis reveals themes that recur during receptor evolution and allows the formulation of hypotheses for the origins of receptors. The human receptor families involved in signaling (with the exception of channels) are presented in the Human Plasma Membrane Receptome database.

Citation for the above abstract:
Ben-Shlomo I, Yu Hsu S, Rauch R, Kowalski HW, Hsueh AJ.
Signaling receptome: a genomic and evolutionary perspective of plasma membrane receptors involved in signal transduction.
Science's STKE. 2003 Jun 17;2003(187):RE9.
© 2003 Science's STKE.


The full abstract can be found at: http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=12815191



253. Human PAX2 Allelic Variant Database

URL: http://pax2.hgu.mrc.ac.uk/
Categories: Gene-, System-, or Disease- Specific Databases

Mutations in the PAX2 gene are associated with developmental eye, kidney and ear anomalies with the disease commonly known as renal-coloboma syndrome. The mutations found to date show marked differences in phenotype making it difficult to predict the clinical effects of a PAX2 mutation. The database was created to satisfy the need for a single source of information about PAX2 mutations for researchers and clinicians. It also fills the need for a database to which researchers can submit new mutation information with minimal difficulty. Neutral polymorphisms are also included in the database as this information is also important to researchers. It is hoped that this database will provide a valuable tool for research and clinical diagnosis of renal-coloboma syndrome. Information about each mutation in the database is stored in 59 fields which are designed to provide as much information about each mutation as possible.

Citation for the above excerpt:
Leslie McNoe, Alastair Brown, Mark McKie, and Michael Eccles
The Human PAX2 Mutation Database
Nucl. Acids Res. 1999 27: On-line
© 1999 Oxford University Press.


The full text of the article can be found at: http://nar.oupjournals.org/cgi/content/full/27/1/1/DC1/37



254. Human PAX6 Allelic Variant Database

URL: http://pax6.hgu.mrc.ac.uk/
Categories: Gene-, System-, or Disease- Specific Databases

The Human PAX6 Mutation Database contains details of 94 mutations of the PAX6 gene. A Microsoft Access program is used by the Curator to store, update and search the database entries. Mutations can be entered directly by the Curator, or imported from submissions made via the World Wide Web. The PAX6 Mutation Database web page at URL http://www.hgu.mrc.ac.uk/Softdata/PAX6/ provides information about PAX6, as well as a fill-in form through which new mutations can be submitted to the Curator. A search facility allows remote users to query the database. A plain text format file of the data can be downloaded via the World Wide Web. The Curation program contains prior knowledge of the genetic code and of the PAX6 gene including cDNA sequence, location of intron/exon boundaries, and protein domains, so that the minimum of information need be provided by the submitter or Curator.

Citation for the above abstract:
Brown, A, McKie, M, van Heyningen, V, Prosser, J
The Human PAX6 Mutation Database
Nucl. Acids Res. 1998 26: 259-264
© 1998 Oxford University Press.


The full text of the article can be found at: http://nar.oupj