Recent Articles in Nucleic Acids Research

Bourdeau V, Deschênes J, Laperrière D, Aid M, White JH, Mader S
Mechanisms of primary and secondary estrogen target gene regulation in breast cancer cells.
Nucleic Acids Res. 2007 Nov 5;
Estrogen receptors (ERs), which mediate the proliferative action of estrogens in breast cancer cells, are ligand-dependent transcription factors that regulate expression of their primary target genes through several mechanisms. In addition to direct binding to cognate DNA sequences, ERs can be recruited to DNA through other transcription factors (tethering), or affect gene transcription through modulation of signaling cascades by non-genomic mechanisms of action. To better characterize the mechanisms of gene regulation by estrogens, we have identified more than 700 putative primary and about 1300 putative secondary target genes of estradiol in MCF-7 cells through microarray analysis performed in the presence or absence of the translation inhibitor cycloheximide. Although siRNA-mediated inhibition of ERalpha expression antagonized the effects of estradiol on up- and down-regulated primary target genes, estrogen response elements (EREs) were enriched only in the vicinity of up-regulated genes. Binding sites for several other transcription factors, including proteins known to tether ERalpha, were enriched in up- and/or down-regulated primary targets. Secondary estrogen targets were particularly enriched in sites for E2F family members, several of which were transcriptionally regulated by estradiol, consistent with a major role of these factors in mediating the effects of estrogens on gene expression and cellular growth. [Abstract/Link to Full Text]

Mangone M, Macmenamin P, Zegar C, Piano F, Gunsalus KC
a platform for 3'UTR biology in C. elegans.
Nucleic Acids Res. 2007 Nov 22;
Three-prime untranslated regions (3'UTRs) are widely recognized as important post-transcriptional regulatory regions of mRNAs. RNA-binding proteins and small non-coding RNAs such as microRNAs (miRNAs) bind to functional elements within 3'UTRs to influence mRNA stability, translation and localization. These interactions play many important roles in development, metabolism and disease. However, even in the most well-annotated metazoan genomes, 3'UTRs and their functional elements are not well defined. Comprehensive and accurate genome-wide annotation of 3'UTRs and their functional elements is thus critical. We have developed an open-access database to provide a rich and comprehensive resource for 3'UTR biology in the well-characterized, experimentally tractable model system Caenorhabditis elegans. The database combines data from public repositories and a large-scale effort we are undertaking to characterize 3'UTRs and their functional elements in C. elegans, including 3'UTR sequences, graphical displays, predicted and validated functional elements, secondary structure predictions and detailed data from our cloning pipeline. It will grow substantially over time to encompass individual 3'UTR isoforms for the majority of genes, new and revised functional elements, and in vivo data on 3'UTR function as they become available. The UTRome database thus represents a powerful tool to better understand the biology of 3'UTRs. [Abstract/Link to Full Text]

Okuno Y, Tamon A, Yabuuchi H, Niijima S, Minowa Y, Tonomura K, Kunimoto R, Feng C
GLIDA: GPCR ligand database for chemical genomics drug discovery database and tools update.
Nucleic Acids Res. 2007 Nov 5;
G-protein coupled receptors (GPCRs) represent one of the most important families of drug targets in pharmaceutical development. GLIDA is a public GPCR-related Chemical Genomics database that is primarily focused on the integration of information between GPCRs and their ligands. It provides interaction data between GPCRs and their ligands, along with chemical information on the ligands, as well as biological information regarding GPCRs. These data are connected with each other in a relational database, allowing users in the field of Chemical Genomics research to easily retrieve such information from either biological or chemical starting points. GLIDA includes a variety of similarity search functions for the GPCRs and for their ligands. Thus, GLIDA can provide correlation maps linking the searched homologous GPCRs (or ligands) with their ligands (or GPCRs). By analyzing the correlation patterns between GPCRs and ligands, we can gain more detailed knowledge about their conserved molecular recognition patterns and improve drug design efforts by focusing on inferred candidates for GPCR-specific drugs. This article provides a summary of the GLIDA database and user facilities, and describes recent improvements to database design, data contents, ligand classification programs, similarity search options and graphical interfaces. GLIDA is publicly available. We hope that it will prove very useful for Chemical Genomics research and GPCR-related drug discovery. [Abstract/Link to Full Text]

Levy A, Sela N, Ast G
TranspoGene and microTranspoGene: transposed elements influence on the transcriptome of seven vertebrates and invertebrates.
Nucleic Acids Res. 2007 Nov 5;
Transposed elements (TEs) are mobile genetic sequences. During the evolution of eukaryotes TEs were inserted into active protein-coding genes, affecting gene structure, expression and splicing patterns, and protein sequences. Genomic insertions of TEs also led to creation and expression of new functional non-coding RNAs such as microRNAs. We have constructed the TranspoGene database, which covers TEs located inside protein-coding genes of seven species: human, mouse, chicken, zebrafish, fruit fly, nematode and sea squirt. TEs were classified according to location within the gene: proximal promoter TEs, exonized TEs (insertion within an intron that led to exon creation), exonic TEs (insertion into an existing exon) or intronic TEs. TranspoGene contains information regarding specific type and family of the TEs, genomic and mRNA location, sequence, supporting transcript accession and alignment to the TE consensus sequence. The database also contains host gene specific data: gene name, genomic location, Swiss-Prot and RefSeq accessions, diseases associated with the gene and splicing pattern. In addition, we created microTranspoGene: a database of human, mouse, zebrafish and nematode TE-derived microRNAs. The TranspoGene and microTranspoGene databases can be used by researchers interested in the effect of TE insertion on the eukaryotic transcriptome. Query interfaces to TranspoGene and microTranspoGene are publicly available. The entire database can be downloaded as flat files. [Abstract/Link to Full Text]

Sprenger J, Fink JL, Karunaratne S, Hanson K, Hamilton NA, Teasdale RD
LOCATE: a mammalian protein subcellular localization database.
Nucleic Acids Res. 2007 Nov 5;
LOCATE is a curated, web-accessible database that houses data describing the membrane organization and subcellular localization of mouse and human proteins. Over the past 2 years, the data in LOCATE have grown substantially. The database now contains high-quality localization data for 20% of the mouse proteome and general localization annotation for nearly 36% of the mouse proteome. The proteome annotated in LOCATE is from the RIKEN FANTOM Consortium Isoform Protein Sequence sets, which contain 58 128 mouse and 64 637 human protein isoforms. Other additions include computational subcellular localization predictions, automated computational classification of experimental localization image data, prediction of protein sorting signals and third party submission of literature data. Collectively, this database provides a localization proteome for individual subcellular compartments that will underpin future systematic investigations of these regions. It is available online. [Abstract/Link to Full Text]

Wang CK, Kaas Q, Chiche L, Craik DJ
CyBase: a database of cyclic protein sequences and structures, with applications in protein discovery and engineering.
Nucleic Acids Res. 2007 Nov 5;
CyBase was originally developed as a database for backbone-cyclized proteins, providing search and display capabilities for sequence, structure and function data. Cyclic proteins are interesting because, compared to conventional proteins, they have increased stability and enhanced binding affinity and therefore can potentially be developed as protein drugs. The new CyBase release features a redesigned interface and internal architecture to improve user-interactivity, collates double the amount of data compared to the initial release, and hosts a novel suite of tools that are useful for the visualization, characterization and engineering of cyclic proteins. These tools comprise sequence/structure 2D representations, a summary of grafting and mutation studies of synthetic analogues, a study of N- to C-terminal distances in known protein structures and a structural modelling tool to predict the best linker length to cyclize a protein. These updates are useful because they have the potential to help accelerate the discovery of naturally occurring cyclic proteins and the engineering of cyclic protein drugs. The new release of CyBase is available online. [Abstract/Link to Full Text]

Swarbreck D, Wilks C, Lamesch P, Berardini TZ, Garcia-Hernandez M, Foerster H, Li D, Meyer T, Muller R, Ploetz L, Radenbaugh A, Singh S, Swing V, Tissier C, Zhang P, Huala E
The Arabidopsis Information Resource (TAIR): gene structure and function annotation.
Nucleic Acids Res. 2007 Nov 5;
The Arabidopsis Information Resource (TAIR) is the model organism database for the fully sequenced and intensively studied model plant Arabidopsis thaliana. Data in TAIR is derived in large part from manual curation of the Arabidopsis research literature and direct submissions from the research community. New developments at TAIR include the addition of the GBrowse genome viewer to the TAIR site, a redesigned home page, navigation structure and portal pages to make the site more intuitive and easier to use, the launch of several TAIR web services and a new genome annotation release (TAIR7) in April 2007. A combination of manual and computational methods was used to generate this release, which contains 27 029 protein-coding genes, 3889 pseudogenes or transposable elements and 1123 ncRNAs (32 041 genes in all, 37 019 gene models). A total of 681 new genes and 1002 new splice variants were added. Overall, 10 098 loci (one-third of all loci from the previous TAIR6 release) were updated for the TAIR7 release. [Abstract/Link to Full Text]

Laun P, Bruschi CV, Richard Dickinson J, Rinnerthaler M, Heeren G, Schwimbersky R, Rid R, Breitenbach M
Yeast mother cell-specific ageing, genetic (in)stability, and the somatic mutation theory of ageing.
Nucleic Acids Res. 2007 Dec 11;
Yeast mother cell-specific ageing is characterized by a limited capacity to produce daughter cells. The replicative lifespan is determined by the number of cell cycles a mother cell has undergone, not by calendar time, and in a population of cells its distribution follows the Gompertz law. Daughter cells reset their clock to zero and enjoy the full lifespan characteristic for the strain. This kind of replicative ageing of a cell population based on asymmetric cell divisions is investigated as a model for the ageing of a stem cell population in higher organisms. The simple fact that the daughter cells can reset their clock to zero precludes the accumulation of chromosomal mutations as the cause of ageing, because semiconservative replication would lead to the same mutations in the daughters. However, nature is more complicated than that because (i) the very last daughters of old mothers do not reset the clock; and (ii) mutations in mitochondrial DNA could play a role in ageing due to the large copy number in the cell and a possible asymmetric distribution of damaged mitochondrial DNA between mother and daughter cell. Investigation of the loss of heterozygosity in diploid cells at the end of their mother cell-specific lifespan has shown that genomic rearrangements do occur in old mother cells. However, it is not clear if this kind of genomic instability is causative for the ageing process. Damaged material other than DNA, for instance misfolded, oxidized or otherwise damaged proteins, seems to play a major role in ageing, depending on the balance between production and removal through various repair processes, for instance several kinds of proteolysis and autophagy. We review here the evidence for genetic change and its causality in the mother cell-specific ageing process of yeast. [Abstract/Link to Full Text]

Heazlewood JL, Durek P, Hummel J, Selbig J, Weckwerth W, Walther D, Schulze WX
PhosPhAt: a database of phosphorylation sites in Arabidopsis thaliana and a plant-specific phosphorylation site predictor.
Nucleic Acids Res. 2007 Nov 4;
The PhosPhAt database provides a resource consolidating our current knowledge of mass spectrometry-based identified phosphorylation sites in Arabidopsis and combines it with phosphorylation site prediction specifically trained on experimentally identified Arabidopsis phosphorylation motifs. The database currently contains 1187 unique tryptic peptide sequences encompassing 1053 Arabidopsis proteins. Among the characterized phosphorylation sites, there are over 1000 with unambiguous site assignments, and nearly 500 for which the precise phosphorylation site could not be determined. The database is searchable by protein accession number, physical peptide characteristics, as well as by experimental conditions (tissue sampled, phosphopeptide enrichment method). For each protein, a phosphorylation site overview is presented in tabular form with detailed information on each identified phosphopeptide. We have utilized a set of 802 experimentally validated serine phosphorylation sites to develop a method for prediction of serine phosphorylation (pSer) in Arabidopsis. An analysis of the current annotated Arabidopsis proteome yielded 27 782 predicted phosphoserine sites distributed across 17 035 proteins. These prediction results are summarized graphically in the database together with the experimental phosphorylation sites in a whole sequence context. The Arabidopsis Protein Phosphorylation Site Database (PhosPhAt) provides a valuable resource to the plant science community and is accessible online. [Abstract/Link to Full Text]

Bowes JB, Snyder KA, Segerdell E, Gibb R, Jarabek C, Noumen E, Pollet N, Vize PD
Xenbase: a Xenopus biology and genomics resource.
Nucleic Acids Res. 2007 Nov 4;
Xenbase is a model organism database integrating a diverse array of biological and genomic data on the frogs, Xenopus laevis and Xenopus (Silurana) tropicalis. Data is collected from other databases, high-throughput screens and the scientific literature and integrated into a number of database modules covering subjects such as community, literature, gene and genomic analysis. Gene pages are automatically assembled from data piped from the Entrez Gene, Gurdon Institute, JGI, Metazome, MGI, OMIM, PubMed, Unigene, Zfin, commercial suppliers and others. These data are then supplemented with in-house annotation. Xenbase has implemented the Gbrowse genome browser and also provides a BLAST service that allows users to specifically search either laevis or tropicalis DNA or protein targets. A table of Xenopus gene synonyms has been implemented and allows the genome, genes, publications and high-throughput gene expression data to be seamlessly integrated with other Xenopus data and with external database resources, making the wealth of developmental and functional data from the frog available to the broader research community. [Abstract/Link to Full Text]

Bruford EA, Lush MJ, Wright MW, Sneddon TP, Povey S, Birney E
The HGNC Database in 2008: a resource for the human genome.
Nucleic Acids Res. 2007 Nov 4;
The HUGO Gene Nomenclature Committee (HGNC) aims to assign a unique and ideally meaningful name and symbol to every human gene. The HGNC database currently comprises over 24 000 public records containing approved human gene nomenclature and associated gene information. Following our recent relocation to the European Bioinformatics Institute, our homepage has moved and provides direct links to the searchable HGNC database and other related database resources, such as the HCOP orthology search tool and manually curated gene family webpages. [Abstract/Link to Full Text]

The Gene Ontology project in 2008.
Nucleic Acids Res. 2007 Nov 4;
The Gene Ontology (GO) project provides a set of structured, controlled vocabularies for community use in annotating genes, gene products and sequences. The ontologies have been extended and refined for several biological areas, and improvements to the structure of the ontologies have been implemented. To improve the quantity and quality of gene product annotations available from its public repository, the GO Consortium has launched a focused effort to provide comprehensive and detailed annotation of orthologous genes across a number of 'reference' genomes, including human and several key model organisms. Software developments include two releases of the ontology-editing tool OBO-Edit, and improvements to the AmiGO browser interface. [Abstract/Link to Full Text]

Zhang C, Crasta O, Cammer S, Will R, Kenyon R, Sullivan D, Yu Q, Sun W, Jha R, Liu D, Xue T, Zhang Y, Moore M, McGarvey P, Huang H, Chen Y, Zhang J, Mazumder R, Wu C, Sobral B
An emerging cyberinfrastructure for biodefense pathogen and pathogen host data.
Nucleic Acids Res. 2007 Nov 4;
The NIAID-funded Biodefense Proteomics Resource Center (RC) provides storage, dissemination, visualization and analysis capabilities for the experimental data deposited by seven Proteomics Research Centers (PRCs). The data and their publication are intended to support researchers working to discover candidates for the next generation of vaccines, therapeutics and diagnostics against NIAID's Category A, B and C priority pathogens. The data include transcriptional profiles, protein profiles, protein structural data and host-pathogen protein interactions, in the context of the pathogen life cycle in vivo and in vitro. The database has stored and supported host or pathogen data derived from Bacillus, Brucella, Cryptosporidium, Salmonella, SARS, Toxoplasma, Vibrio and Yersinia, human tissue libraries, and mouse macrophages. These publicly available data cover diverse data types such as mass spectrometry, yeast two-hybrid (Y2H), gene expression profiles, X-ray and NMR determined protein structures and protein expression clones. The growing database covers over 23 000 unique genes/proteins from different experiments and organisms. All of the genes/proteins are annotated and integrated across experiments using UniProt Knowledgebase (UniProtKB) accession numbers. The web-interface for the database enables searching, querying and downloading at the level of experiment, group and individual gene(s)/protein(s) via UniProtKB accession numbers or protein function keywords. The system is accessible online. [Abstract/Link to Full Text]

Luke B, Azzalin CM, Hug N, Deplazes A, Peter M, Lingner J
Saccharomyces cerevisiae Ebs1p is a putative ortholog of human Smg7 and promotes nonsense-mediated mRNA decay.
Nucleic Acids Res. 2007 Nov 4;
The Smg proteins Smg5, Smg6 and Smg7 are involved in nonsense-mediated RNA decay (NMD) in metazoans, but no orthologs have been found in the budding yeast Saccharomyces cerevisiae. Sequence alignments reveal that yeast Ebs1p is similar in structure to the human Smg5-7, with highest homology to Smg7. We demonstrate here that Ebs1p is involved in NMD and behaves similarly to human Smg proteins. Indeed, both loss and overexpression of Ebs1p result in stabilization of NMD targets. However, Ebs1-loss in yeast or Smg7-depletion in human cells only partially disrupts NMD and, in the latter, Smg7-depletion is partially compensated for by Smg6. Ebs1p physically interacts with the NMD helicase Upf1p and overexpressed Ebs1p leads to recruitment of Upf1p into cytoplasmic P-bodies. Furthermore, Ebs1p localizes to P-bodies upon glucose starvation along with Upf1p. Overall our findings suggest that NMD is more conserved in evolution than previously thought, and that at least one of the Smg5-7 proteins is conserved in budding yeast. [Abstract/Link to Full Text]

Yang J, Chen L, Sun L, Yu J, Jin Q
VFDB 2008 release: an enhanced web-based resource for comparative pathogenomics.
Nucleic Acids Res. 2007 Nov 4;
The virulence factor database (VFDB) was set up in 2004 to provide current knowledge of virulence factors (VFs) from various medically significant bacterial pathogens and so facilitate pathogenomic research. Complete genome sequences of almost all the major pathogenic microbes have now been determined, which makes comparative genomics a powerful approach for uncovering novel virulence determinants and hidden aspects of pathogenesis. VFDB was therefore upgraded to present the enormous diversity of bacterial genomes in terms of virulence genes and their organization. The VFDB 2008 release includes the following new features: (i) detailed tabular comparison of virulence composition of a given genome with other genomes of the same genus, (ii) multiple alignments and statistical analysis of homologous VFs and (iii) graphical comparison of genomic organizations of virulence genes. Comparative analysis of the numerous VFs will improve our understanding of the nature and evolution of virulence, as well as the development of new therapeutic and preventive strategies. The VFDB 2008 release offers more user-friendly tools for comparative pathogenomics and is publicly accessible. [Abstract/Link to Full Text]

Ulrich EL, Akutsu H, Doreleijers JF, Harano Y, Ioannidis YE, Lin J, Livny M, Mading S, Maziuk D, Miller Z, Nakatani E, Schulte CF, Tolmie DE, Kent Wenger R, Yao H, Markley JL
Nucleic Acids Res. 2007 Nov 4;
The BioMagResBank (BMRB) is a repository for experimental and derived data gathered from nuclear magnetic resonance (NMR) spectroscopic studies of biological molecules. BMRB is a partner in the Worldwide Protein Data Bank (wwPDB). The BMRB archive consists of four main data depositories: (i) quantitative NMR spectral parameters for proteins, peptides, nucleic acids, carbohydrates and ligands or cofactors (assigned chemical shifts, coupling constants and peak lists) and derived data (relaxation parameters, residual dipolar couplings, hydrogen exchange rates, pKa values, etc.), (ii) databases for NMR restraints processed from original author depositions available from the Protein Data Bank, (iii) time-domain (raw) spectral data from NMR experiments used to assign spectral resonances and determine the structures of biological macromolecules and (iv) a database of one- and two-dimensional ¹H and ¹³C NMR spectra for over 250 metabolites. The BMRB website provides free access to all of these data. BMRB has tools for querying the archive and retrieving information and an ftp site where data in the archive can be downloaded in bulk. Two BMRB mirror sites exist: one at the PDBj, Protein Research Institute, Osaka University, Osaka, Japan and the other at CERM, University of Florence, Florence, Italy. The site at Osaka also accepts and processes data depositions. [Abstract/Link to Full Text]

Halees AS, El-Badrawi R, Khabar KS
ARED Organism: expansion of ARED reveals AU-rich element cluster variations between human and mouse.
Nucleic Acids Res. 2007 Nov 4;
ARED Organism represents the expansion of the adenylate uridylate (AU)-rich element (ARE)-containing human mRNA database into the transcriptomes of mouse and rat. As a result, we performed quantitative assessment of ARE conservation in human, mouse and rat transcripts. We found that a significant proportion (approximately 25%) of human genes differ in their ARE patterns from mouse and rat transcripts. ARED-Integrated, another updated and expanded version of ARED, is a compilation of ARED versions 1.0 to 3.0 and updated version 4.0 that is devoted to human mRNAs. Thus, the ARED-Integrated and ARED-Organism databases, both publicly available, offer scientists a comprehensive view of AREs in the human transcriptome and the ability to study the comparative genomics of AREs in model organisms. This ultimately will help in inferring the biological consequences of ARE variation in these key animal models as opposed to humans, particularly in relation to the role of RNA stability in disease. [Abstract/Link to Full Text]

Liang C, Jaiswal P, Hebbard C, Avraham S, Buckler ES, Casstevens T, Hurwitz B, McCouch S, Ni J, Pujar A, Ravenscroft D, Ren L, Spooner W, Tecle I, Thomason J, Tung CW, Wei X, Yap I, Youens-Clark K, Ware D, Stein L
Gramene: a growing plant comparative genomics resource.
Nucleic Acids Res. 2007 Nov 4;
Gramene is a curated resource for genetic, genomic and comparative genomics data for the major crop species, including rice, maize, wheat and many other plant (mainly grass) species. Gramene is an open-source project. All data and software are freely downloadable through the ftp site and are available for use without restriction. Gramene's core data types include genome assembly and annotations, other DNA/mRNA sequences, genetic and physical maps/markers, genes, quantitative trait loci (QTLs), proteins, ontologies, literature and comparative mappings. Since our last NAR publication 2 years ago, we have updated these data types to include new datasets and new connections among them. Completely new features include rice pathways for functional annotation of rice genes; genetic diversity data from rice, maize and wheat to show genetic variations among different germplasms; large-scale genome comparisons among Oryza sativa and its wild relatives for evolutionary studies; and the creation of orthologous gene sets and phylogenetic trees among rice, Arabidopsis thaliana, maize, poplar and several animal species (for reference purposes). We have significantly improved the web interface in order to provide a more user-friendly browsing experience, including a dropdown navigation menu system, a unified web page for markers, genes, QTLs and proteins, and enhanced quick search functions. [Abstract/Link to Full Text]

Quyen DV, Ha SC, Lowenhaupt K, Rich A, Kim KK, Kim YG
Characterization of DNA-binding activity of Zalpha domains from poxviruses and the importance of the beta-wing regions in converting B-DNA to Z-DNA.
Nucleic Acids Res. 2007 Nov 5;
The E3L gene is essential for pathogenesis in vaccinia virus. The E3L gene product consists of an N-terminal Zalpha domain and a C-terminal double-stranded RNA (dsRNA) binding domain; the left-handed Z-DNA-binding activity of the Zalpha domain of E3L is required for viral pathogenicity in mice. E3L is highly conserved among poxviruses, including the smallpox virus, and it is likely that the orthologous Zalpha domains play similar roles. To better understand the biological function of E3L proteins, we have investigated the Z-DNA-binding behavior of five representative Zalpha domains from poxviruses. Using surface plasmon resonance (SPR), we have demonstrated that these viral Zalpha domains bind Z-DNA tightly. The ability of Zalpha(E3L) to convert B-DNA to Z-DNA was measured by circular dichroism (CD). The extents to which these Zalphas can stabilize Z-DNA vary considerably. Mutational studies demonstrate that residues in the loop of the beta-wing play an important role in this stabilization. Notably, the Zalpha domain of vaccinia E3L acquires the ability to convert B-DNA to Z-DNA when amino acid residues in this region are mutated. Differences in the host cells of the various poxviruses may require different abilities to stabilize Z-DNA; this may be reflected in the observed differences in behavior of these Zalpha proteins. [Abstract/Link to Full Text]

Harris SA, Laughton CA, Liverpool TB
Mapping the phase diagram of the writhe of DNA nanocircles using atomistic molecular dynamics simulations.
Nucleic Acids Res. 2007 Nov 5;
We have investigated the effects of duplex length, sequence, salt concentration and superhelical density on the conformation of DNA nanocircles containing up to 178 base pairs using atomistic molecular dynamics simulation. These calculations reveal that the partitioning of twist and writhe is governed by a delicate balance of competing energetic terms. We have identified conditions which favour circular, positively or negatively writhed and denatured DNA conformations. Our simulations show that AT-rich DNA is more prone to denaturation when subjected to torsional stress than the corresponding GC containing circles. In contrast to the behaviour expected for a simple elastic rod, there is a distinct asymmetry in the behaviour of over and under-wound DNA nanocircles. The most biologically relevant negatively writhed state is more elusive than the corresponding positively writhed conformation, and is only observed for larger circles under conditions of high electrostatic screening. The simulation results have been summarised by plotting a phase diagram describing the various conformational states of nanocircles over the range of circle sizes and experimental conditions explored during the study. The changes in DNA structure that accompany supercoiling suggest a number of mechanisms whereby changes in DNA topology in vivo might be used to influence gene expression. [Abstract/Link to Full Text]

Gendron K, Charbonneau J, Dulude D, Heveker N, Ferbeyre G, Brakier-Gingras L
The presence of the TAR RNA structure alters the programmed -1 ribosomal frameshift efficiency of the human immunodeficiency virus type 1 (HIV-1) by modifying the rate of translation initiation.
Nucleic Acids Res. 2007 Nov 5;
HIV-1 uses a programmed -1 ribosomal frameshift to synthesize the precursor of its enzymes, Gag-Pol. The frameshift efficiency, which is critical for virus replication, is controlled by an interaction between the ribosome and a specific structure on the viral mRNA, the frameshift stimulatory signal. The rate of cap-dependent translation initiation is known to be altered by the TAR RNA structure, present at the 5' and 3' end of all HIV-1 mRNAs. Depending upon its concentration, TAR activates or inhibits the double-stranded RNA-dependent protein kinase (PKR). We investigated here whether changes in translation initiation caused by TAR affect HIV-1 frameshift efficiency. CD4+ T cells and 293T cells were transfected with a dual-luciferase construct where the firefly luciferase expression depends upon the HIV-1 frameshift. Translation initiation was altered by adding TAR in cis or trans of the reporter mRNA. We show that HIV-1 frameshift efficiency correlates negatively with changes in the rate of translation initiation caused by TAR and mediated by PKR. A model is presented where changes in the rate of initiation affect the probability of frameshifting by altering the distance between elongating ribosomes on the mRNA, which influences the frequency of encounter between these ribosomes and the frameshift stimulatory signal. [Abstract/Link to Full Text]

Ding G, Sun Y, Li H, Wang Z, Fan H, Wang C, Yang D, Li Y
EPGD: a comprehensive web resource for integrating and displaying eukaryotic paralog/paralogon information.
Nucleic Acids Res. 2007 Nov 5;
Gene duplication is common in all three domains of life, especially in eukaryotic genomes. The duplicates provide new material for the action of evolutionary forces such as selection or genetic drift. Here we describe a sophisticated procedure to extract duplicated genes (paralogs) from 26 available eukaryotic genomes, to pre-calculate several evolutionary indexes (evolutionary rate, synonymous distance/clock, transition redundant exchange clock, etc.) based on the paralog family, and to identify block or segmental duplications (paralogons). We also constructed an internet-accessible Eukaryotic Paralog Group Database (EPGD). The database is gene-centered and organized by paralog family. It focuses on paralogs and evolutionary duplication events. The paralog families and paralogons can be searched by text or sequence, and are downloadable from the website as plain text files. The database will be very useful for both experimentalists and bioinformaticians interested in the study of duplication events or paralog families. [Abstract/Link to Full Text]

Chaudhuri RR, Loman NJ, Snyder LA, Bailey CM, Stekel DJ, Pallen MJ
xBASE2: a comprehensive resource for comparative bacterial genomics.
Nucleic Acids Res. 2007 Nov 5;
xBASE is a genome database aimed at helping laboratory-based bacteriologists make best use of bacterial genome sequence data, with a particular emphasis on comparative genomics. The latest version, xBASE 2.0, now provides comprehensive coverage of all bacterial genomes and features an updated modularized backend and an improved user interface, which includes a taxonomy browser and a powerful full-text search facility. [Abstract/Link to Full Text]

Chen PH, Tsao YP, Wang CC, Chen SL
Nuclear receptor interaction protein, a coactivator of androgen receptors (AR), is regulated by AR and Sp1 to feed forward and activate its own gene expression through AR protein stability.
Nucleic Acids Res. 2007 Nov 5;
Previously, we found a novel gene, nuclear receptor interaction protein (NRIP), a transcription cofactor that can enhance an AR-driven PSA promoter activity in a ligand-dependent manner in prostate cancer cells. Here, we investigated NRIP regulation. We cloned a 413-bp fragment from the transcription initiation site of the NRIP gene that had strong promoter activity, was TATA-less and GC-rich, and, based on DNA sequences, contained one androgen response element (ARE) and three Sp1-binding sites (Sp1-1, Sp1-2, Sp1-3). Transient promoter luciferase assays, chromatin immunoprecipitation and small RNA interference analyses mapped ARE and Sp1-2-binding sites involved in NRIP promoter activation, implying that NRIP is a target gene for AR or Sp1. AR associates with the NRIP promoter through ARE and indirectly through Sp1-binding site via AR-Sp1 complex formation. Thus both ARE and Sp1-binding site within the NRIP promoter can respond to androgen induction. More intriguingly, NRIP plays a feed-forward role enhancing AR-driven NRIP promoter activity via NRIP forming a complex with AR to protect AR protein from proteasome degradation. This is the first demonstration that NRIP is a novel AR-target gene and that NRIP expression feeds forward and activates its own expression through AR protein stability. [Abstract/Link to Full Text]

Masih PJ, Kunnev D, Melendy T
Mismatch Repair proteins are recruited to replicating DNA through interaction with Proliferating Cell Nuclear Antigen (PCNA).
Nucleic Acids Res. 2007 Nov 5;
Mismatch Repair (MMR) is closely linked to DNA replication; however, other than the role of the replicative sliding clamp (PCNA) in various MMR functions, the linkage between DNA replication and MMR has been difficult to investigate. Here we use an in vitro DNA replication system based on simian virus 40 to investigate MMR recruitment to replicating DNA. Both DNA replication and MMR proteins are recruited to replicating DNA in an origin-dependent fashion. Primer synthesis is required for recruitment of both PCNA and MMR proteins, but not for recruitment of the single-stranded DNA-binding protein (RPA). Blocking PCNA recruitment to replicating DNA with a p21-based polypeptide blocks PCNA and MMR, but not RPA, recruitment. Once PCNA and subsequent proteins required for replication are loaded onto DNA, addition of p21 leaves PCNA on the replicating DNA, but actively displaces MMR proteins. These findings indicate that the MMR machinery is recruited to replicating DNA through its interaction with PCNA, and suggest that this occurs via binding of the MMR proteins to the multi-protein interaction sites on PCNA. These studies demonstrate the utility of this system for further investigation of the role of DNA replication in MMR. [Abstract/Link to Full Text]

Qin Y, Rezler EM, Gokhale V, Sun D, Hurley LH
Characterization of the G-quadruplexes in the duplex nuclease hypersensitive element of the PDGF-A promoter and modulation of PDGF-A promoter activity by TMPyP4.
Nucleic Acids Res. 2007 Nov 5;
The proximal 5'-flanking region of the human platelet-derived growth factor A (PDGF-A) promoter contains one nuclease hypersensitive element (NHE) that is critical for PDGF-A gene transcription. On the basis of circular dichroism (CD) and electrophoretic mobility shift assay (EMSA), we have shown that the guanine-rich (G-rich) strand of the DNA in this region can form stable intramolecular parallel G-quadruplexes under physiological conditions. A Taq polymerase stop assay has shown that the G-rich strand of the NHE can form two major G-quadruplex structures, which are in dynamic equilibrium and differentially stabilized by three G-quadruplex-interactive drugs. One major parallel G-quadruplex structure of the G-rich strand DNA of NHE was identified by CD and dimethyl sulfate (DMS) footprinting. Surprisingly, CD spectroscopy shows a stable parallel G-quadruplex structure formed within the duplex DNA of the NHE at temperatures up to 100 °C. This structure has been characterized by DMS footprinting in the double-stranded DNA of the NHE. In transfection experiments, 10 μM TMPyP4 reduced the activity of the basal promoter of PDGF-A by approximately 40%, relative to the control. On the basis of these results, we have established that ligand-mediated stabilization of G-quadruplex structures within the PDGF-A NHE can silence PDGF-A expression. [Abstract/Link to Full Text]

Tagawa M, Shohda KI, Fujimoto K, Sugawara T, Suyama A
Heat-resistant DNA tile arrays constructed by template-directed photoligation through 5-carboxyvinyl-2'-deoxyuridine.
Nucleic Acids Res. 2007 Nov 3;
Template-directed DNA photoligation has been applied to a method to construct heat-resistant two-dimensional (2D) DNA arrays that can work as scaffolds in bottom-up assembly of functional biomolecules and nano-electronic components. DNA double-crossover AB-staggered (DXAB) tiles were covalently connected by enzyme-free template-directed photoligation, which enables a specific ligation reaction in an extremely tight space and under buffer conditions where no enzymes work efficiently. DNA nanostructures created by self-assembly of the DXAB tiles before and after photoligation have been visualized by high-resolution, tapping mode atomic force microscopy in buffer. The improvement of the heat tolerance of 2D DNA arrays was confirmed by heating and visualizing the DNA nanostructures. The heat-resistant DNA arrays may expand the potential of DNA as functional materials in biotechnology and nanotechnology. [Abstract/Link to Full Text]

Tourasse NJ, Kolstø AB
SuperCAT: a supertree database for combined and integrative multilocus sequence typing analysis of the Bacillus cereus group of bacteria (including B. cereus, B. anthracis and B. thuringiensis).
Nucleic Acids Res. 2007 Nov 3;
The Bacillus cereus group of bacteria is an important group including mammalian and insect pathogens, such as B. anthracis, the anthrax bacterium; B. thuringiensis, used as a biological pesticide; and B. cereus, often involved in food poisoning incidents. To characterize the population structure and epidemiology of these bacteria, five separate multilocus sequence typing (MLST) schemes have been developed, which makes results difficult to compare. Therefore, we have developed a database, accessible online, that compiles and integrates MLST data from all five schemes for the B. cereus group. Supertree techniques were used to combine the phylogenetic information from analysis of all schemes and datasets, in order to produce an integrated view of the B. cereus group population. The database currently contains strain information and sequence data for 1029 isolates and 26 housekeeping gene fragments, which can be searched by keywords, MLST scheme, or sequence similarity. Supertrees can be browsed according to various criteria such as species, isolate source, or genetic distance, and subtrees containing strains of interest can be extracted. Besides analyzing the available data, users can enter their own sequences and compare them to the database and/or include them in the supertree reconstructions. [Abstract/Link to Full Text]
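For readers unfamiliar with MLST, the following minimal sketch shows the basic bookkeeping behind a single typing scheme: each distinct sequence at a locus receives an allele number, and each distinct combination of allele numbers (the allelic profile) receives a sequence type. It is not the SuperCAT implementation; the function name, locus names, isolate names and sequences are hypothetical, and real schemes look up numbers in a curated allele table rather than assigning them in order of appearance.

```python
def assign_sequence_types(isolates, loci):
    """Map each isolate to (allelic profile, sequence type).

    isolates: dict of isolate name -> dict of locus name -> sequence.
    loci: ordered list of locus names that define the scheme.
    Allele numbers and sequence types are assigned in order of first
    appearance, which is an illustrative simplification.
    """
    allele_numbers = {locus: {} for locus in loci}
    st_numbers = {}
    result = {}
    for name, sequences in isolates.items():
        profile = []
        for locus in loci:
            table = allele_numbers[locus]
            seq = sequences[locus]
            if seq not in table:
                table[seq] = len(table) + 1  # new allele at this locus
            profile.append(table[seq])
        profile = tuple(profile)
        if profile not in st_numbers:
            st_numbers[profile] = len(st_numbers) + 1  # new sequence type
        result[name] = (profile, st_numbers[profile])
    return result


# Hypothetical two-locus scheme with three isolates.
example = {
    "iso1": {"locusA": "ACGT", "locusB": "GGCC"},
    "iso2": {"locusA": "ACGT", "locusB": "GGCA"},
    "iso3": {"locusA": "ACGT", "locusB": "GGCC"},
}
for isolate, (profile, st) in assign_sequence_types(example, ["locusA", "locusB"]).items():
    print(isolate, profile, f"ST{st}")
```

Because every scheme uses its own loci and its own numbering, profiles from different schemes cannot be compared directly, which is the difficulty that combining the phylogenetic information in a supertree is meant to address.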

Matsuya A, Sakate R, Kawahara Y, Koyanagi KO, Sato Y, Fujii Y, Yamasaki C, Habara T, Nakaoka H, Todokoro F, Yamaguchi K, Endo T, Oota S, Makalowski W, Ikeo K, Suzuki Y, Hanada K, Hashimoto K, Hirai M, Iwama H, Saitou N, Hiraki AT, Jin L, Kaneko Y, Kanno M, Murakami K, Noda AO, Saichi N, Sanbonmatsu R, Suzuki M, Takeda JI, Tanaka M, Gojobori T, Imanishi T, Itoh T
Evola: Ortholog database of all human genes in H-InvDB with manual curation of phylogenetic trees.
Nucleic Acids Res. 2007 Nov 3;
Orthologs are genes in different species that evolved from a common ancestral gene by speciation. Currently, with the rapid growth of transcriptome data of various species, more reliable orthology information is a prerequisite for further studies. However, detection of orthologs can be erroneous if pairwise distance-based methods, such as reciprocal BLAST searches, are used. Thus, as a sub-database of H-InvDB, an integrated database of annotated human genes, we constructed a fully curated database of evolutionary features of human genes, called 'Evola'. In the process of ortholog detection, computational analysis based on conserved genome synteny and transcript sequence similarity was followed by manual curation by researchers examining phylogenetic trees. In total, 18 968 human genes have orthologs among 11 vertebrates (chimpanzee, mouse, cow, chicken, zebrafish, etc.), either computationally detected or manually curated. Evola provides amino acid sequence alignments and phylogenetic trees of orthologs and homologs. In 'd(N)/d(S) view', natural selection on genes can be analyzed between human and other species. In 'Locus maps', all transcript variants and their exon/intron structures can be compared among orthologous gene loci. We expect Evola to serve as a comprehensive and reliable database to be utilized in comparative analyses for obtaining new knowledge about human genes. Evola is available online. [Abstract/Link to Full Text]

Hong EL, Balakrishnan R, Dong Q, Christie KR, Park J, Binkley G, Costanzo MC, Dwight SS, Engel SR, Fisk DG, Hirschman JE, Hitz BC, Krieger CJ, Livstone MS, Miyasato SR, Nash RS, Oughtred R, Skrzypek MS, Weng S, Wong ED, Zhu KK, Dolinski K, Botstein D, Cherry JM
Gene Ontology annotations at SGD: new data sources and annotation methods.
Nucleic Acids Res. 2007 Nov 3;
The Saccharomyces Genome Database (SGD) collects and organizes biological information about the chromosomal features and gene products of the budding yeast Saccharomyces cerevisiae. Although published data from traditional experimental methods are the primary sources of evidence supporting Gene Ontology (GO) annotations for a gene product, high-throughput experiments and computational predictions can also provide valuable insights in the absence of an extensive body of literature. Therefore, GO annotations available at SGD now include high-throughput data as well as computational predictions provided by the GO Annotation Project (GOA UniProt). Because the annotation method used to assign GO annotations varies by data source, GO resources at SGD have been modified to distinguish data sources and annotation methods. In addition to providing information for genes that have not been experimentally characterized, GO annotations from independent sources can be compared to those made by SGD to help keep the literature-based GO annotations current. [Abstract/Link to Full Text]

Recent Articles in Genome Research

Ichiyanagi K, Nakajima R, Kajikawa M, Okada N
Novel retrotransposon analysis reveals multiple mobility pathways dictated by hosts.
Genome Res. 2007 Jan;17(1):33-41.
Autonomous non-long-terminal-repeat retrotransposons (NLRs) proliferate by retrotransposition via coordinated reactions of target DNA cleavage and reverse transcription by a mechanism called target-primed reverse transcription (TPRT). Whereas this mechanism guarantees the covalent attachment of the NLR and its target site at the 3' junction, mechanisms for the joining at the 5' junction have been conjectural. To better understand the retrotransposition pathways, we analyzed target-NLR junctions of zebrafish NLRs with a new method of identifying genomic copies that reside within other transposons, termed "target analysis of nested transposons" (TANT). Application of the TANT method revealed various features of the zebrafish NLR integrants; for example, half of the integrants carry extra nucleotides at the 5' junction, which is in stark contrast to the major human NLR, LINE-1. Interestingly, in a cell culture assay, retrotransposition of the zebrafish NLR in heterologous human cells did not bear extra 5' nucleotides, indicating that the choice of the 5' joining pathway is affected by the host. Our results suggest that several pathways exist for NLR retrotransposition and argue in favor of host protein involvement. With genomic sequence information accumulating exponentially, our data demonstrate the general applicability of the TANT method for the analysis of a wide variety of retrotransposons. [Abstract/Link to Full Text]

Paschou P, Mahoney MW, Javed A, Kidd JR, Pakstis AJ, Gu S, Kidd KK, Drineas P
Intra- and interpopulation genotype reconstruction from tagging SNPs.
Genome Res. 2007 Jan;17(1):96-107.
The optimal method to be used for tSNP selection, the applicability of a reference LD map to unassayed populations, and the scalability of these methods to genome-wide analysis, all remain subjects of debate. We propose novel, scalable matrix algorithms that address these issues and we evaluate them on genotypic data from 38 populations and four genomic regions (248 SNPs typed for approximately 2000 individuals). We also evaluate these algorithms on a second data set consisting of genotypes available from the HapMap database (1336 SNPs for four populations) over the same genomic regions. Furthermore, we test these methods in the setting of a real association study using a publicly available family data set. The algorithms we use for tSNP selection and unassayed SNP reconstruction do not require haplotype inference and they are, in principle, scalable even to genome-wide analysis. Moreover, they are greedy variants of recently developed matrix algorithms with provable performance guarantees. Using a small set of carefully selected tSNPs, we achieve very good reconstruction accuracy of "untyped" genotypes for most of the populations studied. Additionally, we demonstrate in a quantitative manner that the chosen tSNPs exhibit substantial transferability, both within and across different geographic regions. Finally, we show that reconstruction can be applied to retrieve significant SNP associations with disease, with important genotyping savings. [Abstract/Link to Full Text]

Forster AC, Church GM
Synthetic biology projects in vitro.
Genome Res. 2007 Jan;17(1):1-6.
Advances in the in vitro synthesis and evolution of DNA, RNA, and polypeptides are accelerating the construction of biopolymers, pathways, and organisms with novel functions. Known functions are being integrated and debugged with the aim of synthesizing life-like systems. The goals are knowledge, tools, smart materials, and therapies. [Abstract/Link to Full Text]

Normand P, Lapierre P, Tisa LS, Gogarten JP, Alloisio N, Bagnarol E, Bassi CA, Berry AM, Bickhart DM, Choisne N, Couloux A, Cournoyer B, Cruveiller S, Daubin V, Demange N, Francino MP, Goltsman E, Huang Y, Kopp OR, Labarre L, Lapidus A, Lavire C, Marechal J, Martinez M, Mastronunzio JE, Mullin BC, Niemann J, Pujic P, Rawnsley T, Rouy Z, Schenowitz C, Sellstedt A, Tavares F, Tomkins JP, Vallenet D, Valverde C, Wall LG, Wang Y, Medigue C, Benson DR
Genome characteristics of facultatively symbiotic Frankia sp. strains reflect host range and host plant biogeography.
Genome Res. 2007 Jan;17(1):7-15.
Soil bacteria that also form mutualistic symbioses in plants encounter two major levels of selection. One occurs during adaptation to and survival in soil, and the other occurs in concert with host plant speciation and adaptation. Actinobacteria from the genus Frankia are facultative symbionts that form N2-fixing root nodules on diverse and globally distributed angiosperms in the "actinorhizal" symbioses. Three closely related clades of Frankia sp. strains are recognized; members of each clade infect a subset of plants from among eight angiosperm families. We sequenced the genomes from three strains; their sizes varied from 5.43 Mbp for a narrow host range strain (Frankia sp. strain HFPCcI3) to 7.50 Mbp for a medium host range strain (Frankia alni strain ACN14a) to 9.04 Mbp for a broad host range strain (Frankia sp. strain EAN1pec). This size divergence is the largest yet reported for such closely related soil bacteria (97.8%-98.9% identity of 16S rRNA genes). The extent of gene deletion, duplication, and acquisition is in concert with the biogeographic history of the symbioses and host plant speciation. Host plant isolation favored genome contraction, whereas host plant diversification favored genome expansion. The results support the idea that major genome expansions as well as reductions can occur in facultative symbiotic soil bacteria as they respond to new environments in the context of their symbioses. [Abstract/Link to Full Text]

Freyhult EK, Bollback JP, Gardner PP
Exploring genomic dark matter: a critical assessment of the performance of homology search methods on noncoding RNA.
Genome Res. 2007 Jan;17(1):117-25.
Homology search is one of the most ubiquitous bioinformatic tasks, yet it is unknown how effective the currently available tools are for identifying noncoding RNAs (ncRNAs). In this work, we use reliable ncRNA data sets to assess the effectiveness of methods such as BLAST, FASTA, HMMer, and Infernal. Surprisingly, the most popular homology search methods are often the least accurate. As a result, many studies have used inappropriate tools for their analyses. On the basis of our results, we suggest homology search strategies using the currently available tools and some directions for future development. [Abstract/Link to Full Text]

Hicks J, Krasnitz A, Lakshmi B, Navin NE, Riggs M, Leibu E, Esposito D, Alexander J, Troge J, Grubor V, Yoon S, Wigler M, Ye K, Børresen-Dale AL, Naume B, Schlicting E, Norton L, Hägerström T, Skoog L, Auer G, Månér S, Lundin P, Zetterberg A
Novel patterns of genome rearrangement and their association with survival in breast cancer.
Genome Res. 2006 Dec;16(12):1465-79.
Representational Oligonucleotide Microarray Analysis (ROMA) detects genomic amplifications and deletions with boundaries defined at a resolution of approximately 50 kb. We have used this technique to examine 243 breast tumors from two separate studies for which detailed clinical data were available. The very high resolution of this technology has enabled us to identify three characteristic patterns of genomic copy number variation in diploid tumors and to measure correlations with patient survival. One of these patterns is characterized by multiple closely spaced amplicons, or "firestorms," limited to single chromosome arms. These multiple amplifications are highly correlated with aggressive disease and poor survival even when the rest of the genome is relatively quiet. Analysis of a selected subset of clinical material suggests that a simple genomic calculation, based on the number and proximity of genomic alterations, correlates with life-table estimates of the probability of overall survival in patients with primary breast cancer. Based on this sample, we generate the working hypothesis that copy number profiling might provide information useful in making clinical decisions, especially regarding the use or not of systemic therapies (hormonal therapy, chemotherapy), in the management of operable primary breast cancer with ostensibly good prognosis, for example, small, node-negative, hormone-receptor-positive diploid cases. [Abstract/Link to Full Text]

Khattra J, Delaney AD, Zhao Y, Siddiqui A, Asano J, McDonald H, Pandoh P, Dhalla N, Prabhu AL, Ma K, Lee S, Ally A, Tam A, Sa D, Rogers S, Charest D, Stott J, Zuyderduyn S, Varhol R, Eaves C, Jones S, Holt R, Hirst M, Hoodless PA, Marra MA
Large-scale production of SAGE libraries from microdissected tissues, flow-sorted cells, and cell lines.
Genome Res. 2007 Jan;17(1):108-16.
We describe the details of a serial analysis of gene expression (SAGE) library construction and analysis platform that has enabled the generation of >298 high-quality SAGE libraries and >30 million SAGE tags primarily from sub-microgram amounts of total RNA purified from samples acquired by microdissection. Several RNA isolation methods were used to handle the diversity of samples processed, and various measures were applied to minimize ditag PCR carryover contamination. Modifications in the SAGE protocol resulted in improved cloning and DNA sequencing efficiencies. Bioinformatic measures to automatically assess DNA sequencing results were implemented to analyze the integrity of ditag structure, linker or cross-species ditag contamination, and yield of high-quality tags per sequence read. Our analysis of singleton tag errors resulted in a method for correcting such errors to statistically determine tag accuracy. From the libraries generated, we produced an essentially complete mapping of reliable 21-base-pair tags to the mouse reference genome sequence for a meta-library of approximately 5 million tags. Our analyses led us to reject the commonly held notion that duplicate ditags are artifacts. Rather than the usual practice of discarding such tags, we conclude that they should be retained to avoid introducing bias into the results and thereby maintain the quantitative nature of the data, which is a major theoretical advantage of SAGE as a tool for global transcriptional profiling. [Abstract/Link to Full Text]
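To make the notion of a 21-base-pair tag concrete, here is a minimal sketch of how such a tag can be pulled from a transcript sequence in silico. It assumes, beyond what the abstract states, that the tag starts at the 3'-most CATG anchoring site (the NlaIII recognition sequence) and includes those four bases; the function name and the example sequence are hypothetical, and this is not the authors' pipeline.

```python
def extract_sage_tag(transcript, anchor="CATG", tag_length=21):
    """Return the tag starting at the 3'-most anchor site, or None.

    transcript: sense-strand sequence read 5' to 3'.
    The returned tag includes the anchor itself, so with the defaults it is
    the four anchor bases plus the 17 bases that follow.
    """
    transcript = transcript.upper()
    start = transcript.rfind(anchor)  # last (3'-most) occurrence
    if start == -1:
        return None                   # no anchoring site, so no tag
    tag = transcript[start:start + tag_length]
    if len(tag) < tag_length:
        return None                   # site too close to the 3' end
    return tag


# Made-up transcript with two CATG sites; only the last one yields the tag.
example = "AAACATGTTTTCATG" + "ACGTACGTACGTACGTA" + "AAAA"
print(extract_sage_tag(example))
```

Mapping observed tags back to a reference then amounts to comparing them with tags predicted this way from annotated transcripts or from the genome sequence.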

Forton JT, Udalova IA, Campino S, Rockett KA, Hull J, Kwiatkowski DP
Localization of a long-range cis-regulatory element of IL13 by allelic transcript ratio mapping.
Genome Res. 2007 Jan;17(1):82-7.
It appears that, for many genes, the two alleles possessed by an individual may produce different amounts of transcript. When such allelic differences in transcription are observed for some individuals but not others, a plausible explanation is genetic variation in the cis-acting elements that regulate the gene in question. Here we describe a novel analytical approach that uses such observations, combined with genotyping data from the HapMap project, to define the genomic location of cis-acting regulatory elements. When applied to the human 5q31 chromosomal region, where complex regulatory mechanisms are known to exist, we demonstrate the sensitivity of this approach by locating a highly significant cis-regulatory element operating on IL13 at long range from a position 250 kb upstream from the gene (P = 2 × 10^-6). As this method is unaffected by other sources of variation, such as environmental and trans-acting genetic factors, it provides a tractable approach for dissecting the complexities of genetic variation in gene regulation. [Abstract/Link to Full Text]

Roh TY, Wei G, Farrell CM, Zhao K
Genome-wide prediction of conserved and nonconserved enhancers by histone acetylation patterns.
Genome Res. 2007 Jan;17(1):74-81.
Comparative genomic studies have been useful in identifying transcriptional regulatory elements in higher eukaryotic genomes, but many important regulatory elements cannot be detected by such analyses due to evolutionary variations and alignment tool limitations. Therefore, in this study we exploit the highly conserved nature of epigenetic modifications to identify potential transcriptional enhancers. By using a high-resolution genome-wide mapping technique, which combines the chromatin immunoprecipitation and serial analysis of gene expression assays, we have recently determined the distribution of lysine 9/14-diacetylated histone H3 in human T cells. We showed the existence of 46,813 regions with clusters of histone acetylation, termed histone acetylation islands, some of which correspond to known transcriptional regulatory elements. In the present study, we find that 4679 sequences conserved between human and pufferfish coincide with histone acetylation islands, and random sampling shows that 33% (13/39) of these can function as transcriptional enhancers in human Jurkat T cells. In addition, by comparing the human histone acetylation island sequences with mouse genome sequences, we find that despite the conservation of many of these regions between these species, 21,855 of these sequences are not conserved. Furthermore, we demonstrate that about 50% (26/51) of these nonconserved sequences have enhancer activity in Jurkat cells, and that many of the orthologous mouse sequences also have enhancer activity in addition to conserved epigenetic modification patterns in mouse T-cell chromatin. Therefore, by combining epigenetic modification and sequence data, we have established a novel genome-wide method for identifying regulatory elements not discernable by comparative genomics alone. [Abstract/Link to Full Text]

Fiegler H, Redon R, Andrews D, Scott C, Andrews R, Carder C, Clark R, Dovey O, Ellis P, Feuk L, French L, Hunt P, Kalaitzopoulos D, Larkin J, Montgomery L, Perry GH, Plumb BW, Porter K, Rigby RE, Rigler D, Valsesia A, Langford C, Humphray SJ, Scherer SW, Lee C, Hurles ME, Carter NP
Accurate and reliable high-throughput detection of copy number variation in the human genome.
Genome Res. 2006 Dec;16(12):1566-74.
This study describes a new tool for accurate and reliable high-throughput detection of copy number variation in the human genome. We have constructed a large-insert clone DNA microarray covering the entire human genome in tiling path resolution that we have used to identify copy number variation in human populations. Crucial to this study has been the development of a robust array platform and analytic process for the automated identification of copy number variants (CNVs). The array consists of 26,574 clones covering 93.7% of euchromatic regions. Clones were selected primarily from the published "Golden Path," and mapping was confirmed by fingerprinting and BAC-end sequencing. Array performance was extensively tested by a series of validation assays. These included determining the hybridization characteristics of each individual clone on the array by chromosome-specific add-in experiments. Estimation of data reproducibility and false-positive/negative rates was carried out using self-self hybridizations, replicate experiments, and independent validations of CNVs. Based on these studies, we developed a variance-based automatic copy number detection analysis process (CNVfinder) and have demonstrated its robustness by comparison with the SW-ARRAY method. [Abstract/Link to Full Text]
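The general idea of variance-based calling can be sketched as follows: estimate the spread of log2 intensity ratios across clones and flag clones that deviate from the centre by more than a chosen multiple of that spread. This is a generic illustration, not the CNVfinder algorithm; the robust spread estimate (median absolute deviation), the threshold multiplier `k`, the function name and the data are all assumptions.

```python
from statistics import median


def call_copy_number_changes(log2_ratios, k=4.0):
    """Flag clones whose log2 ratio lies more than k robust SDs from the median.

    Returns a list of (clone index, "gain" or "loss") tuples.
    """
    centre = median(log2_ratios)
    deviations = [abs(r - centre) for r in log2_ratios]
    # 1.4826 * MAD approximates the standard deviation for normal data
    robust_sd = 1.4826 * median(deviations)
    if robust_sd == 0:
        return []  # no measurable spread, so nothing can be called
    threshold = k * robust_sd
    return [
        (index, "gain" if ratio > centre else "loss")
        for index, ratio in enumerate(log2_ratios)
        if abs(ratio - centre) > threshold
    ]


# Made-up ratios for eight clones: one apparent gain and one apparent loss.
ratios = [0.0, 0.02, -0.01, 0.01, -0.02, 0.6, 0.0, -0.55]
print(call_copy_number_changes(ratios))
```

Using a median-based spread rather than the ordinary standard deviation keeps the variant clones themselves from inflating the threshold; self-self hybridizations of the kind described above are one way to check that such a threshold yields an acceptable false-positive rate.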

Komura D, Shen F, Ishikawa S, Fitch KR, Chen W, Zhang J, Liu G, Ihara S, Nakamura H, Hurles ME, Lee C, Scherer SW, Jones KW, Shapero MH, Huang J, Aburatani H
Genome-wide detection of human copy number variations using high-density DNA oligonucleotide arrays.
Genome Res. 2006 Dec;16(12):1575-84.
Recent reports indicate that copy number variations (CNVs) within the human genome contribute to nucleotide diversity to a larger extent than single nucleotide polymorphisms (SNPs). In addition, the contribution of CNVs to human disease susceptibility may be greater than previously expected, although the phenotypic consequences of CNVs are not yet fully understood. We have recently reported a comprehensive view of CNVs among 270 HapMap samples using high-density SNP genotyping arrays and BAC array CGH. In this report, we describe a novel algorithm using Affymetrix GeneChip Human Mapping 500K Early Access (500K EA) arrays that identified 1203 CNVs ranging in size from 960 bp to 3.4 Mb. The algorithm consists of three steps: (1) intensity pre-processing to improve the resolution between pairwise comparisons by directly estimating the allele-specific affinity as well as to reduce signal noise by incorporating probe and target sequence characteristics via an improved version of the Genomic Imbalance Map (GIM) algorithm; (2) CNV extraction using an adapted SW-ARRAY procedure to automatically and robustly detect candidate CNV regions; and (3) copy number inference in which all pairwise comparisons are summarized to more precisely define CNV boundaries and accurately estimate CNV copy number. Independent testing of a subset of CNVs by quantitative PCR and mass spectrometry demonstrated a >90% verification rate. The use of high-resolution oligonucleotide arrays relative to other methods may allow more precise boundary information to be extracted, thereby enabling a more accurate analysis of the relationship between CNVs and other genomic features. [Abstract/Link to Full Text]

Emanuelsson O, Nagalakshmi U, Zheng D, Rozowsky JS, Urban AE, Du J, Lian Z, Stolc V, Weissman S, Snyder M, Gerstein MB
Assessing the performance of different high-density tiling microarray strategies for mapping transcribed regions of the human genome.
Genome Res. 2007 Jun;17(6):886-97.
Genomic tiling microarrays have become a popular tool for interrogating the transcriptional activity of large regions of the genome in an unbiased fashion. There are several key parameters associated with each tiling experiment (e.g., experimental protocols and genomic tiling density). Here, we assess the role of these parameters as they are manifest in different tiling-array platforms used for transcription mapping. First, we analyze how a number of published tiling-array experiments agree with established gene annotation on human chromosome 22. We observe that the transcription detected from high-density arrays correlates substantially better with annotation than that from other array types. Next, we analyze the transcription-mapping performance of the two main high-density oligonucleotide array platforms in the ENCODE regions of the human genome. We hybridize identical biological samples and develop several ways of scoring the arrays and segmenting the genome into transcribed and nontranscribed regions, with the aim of making the platforms most comparable to each other. Finally, we develop a platform comparison approach based on agreement with known annotation. Overall, we find that the performance improves with more data points per locus, coupled with statistical scoring approaches that properly take advantage of this, where this larger number of data points arises from higher genomic tiling density and the use of replicate arrays and mismatches. While we do find significant differences in the performance of the two high-density platforms, we also find that they complement each other to some extent. Finally, our experiments reveal a significant amount of novel transcription outside of known genes, and an appreciable sample of this was validated by independent experiments. [Abstract/Link to Full Text]

Coulombe-Huntington J, Majewski J
Characterization of intron loss events in mammals.
Genome Res. 2007 Jan;17(1):23-32.
The exon/intron structure of eukaryotic genes differs extensively across species, but the mechanisms and relative rates of intron loss and gain are still poorly understood. Here, we used whole-genome sequence alignments of human, mouse, rat, and dog to perform a genome-wide analysis of intron loss and gain events in >17,000 mammalian genes. We found no evidence for intron gain and 122 cases of intron loss, most of which occurred within the rodent lineage. The majority (68%) of the deleted introns were extremely small (<150 bp), significantly smaller than average. The intron losses occurred almost exclusively within highly expressed, housekeeping genes, supporting the hypothesis that intron loss is mediated via germline recombination of genomic DNA with intronless cDNA. This study constitutes the largest scale analysis for intron dynamics in vertebrates to date and allows us to confirm and extend several hypotheses previously based on much smaller samples. Our results in mammals show that intron gain has not been a factor in the evolution of gene structure during the past 95 Myr and has likely been restricted to more ancient history. [Abstract/Link to Full Text]

Emrich SJ, Barbazuk WB, Li L, Schnable PS
Gene discovery and annotation using LCM-454 transcriptome sequencing.
Genome Res. 2007 Jan;17(1):69-73.
454 DNA sequencing technology achieves significant throughput relative to traditional approaches. More than 261,000 ESTs were generated by 454 Life Sciences from cDNA isolated using laser capture microdissection (LCM) from the developmentally important shoot apical meristem (SAM) of maize (Zea mays L.). This single sequencing run annotated >25,000 maize genomic sequences and also captured approximately 400 expressed transcripts for which homologous sequences have not yet been identified in other species. Approximately 70% of the ESTs generated in this study had not been captured during a previous EST project conducted using a cDNA library constructed from hand-dissected apex tissue that is highly enriched for SAMs. In addition, at least 30% of the 454-ESTs do not align to any of the approximately 648,000 extant maize ESTs using conservative alignment criteria. These results indicate that the combination of LCM and the deep sequencing possible with 454 technology enriches for SAM transcripts not present in current EST collections. RT-PCR was used to validate the expression of 27 genes whose expression had been detected in the SAM via LCM-454 technology, but that lacked orthologs in GenBank. Significantly, transcripts from approximately 74% (20/27) of these validated SAM-expressed "orphans" were not detected in meristem-rich immature ears. We conclude that the coupling of LCM and 454 sequencing technologies facilitates the discovery of rare, possibly cell-type-specific transcripts. [Abstract/Link to Full Text]

Jehan Z, Vallinayagam S, Tiwari S, Pradhan S, Singh L, Suresh A, Reddy HM, Ahuja YR, Jesudasan RA
Novel noncoding RNA from human Y distal heterochromatic block (Yq12) generates testis-specific chimeric CDC2L2.
Genome Res. 2007 Apr;17(4):433-40.
The human Y chromosome, because it is enriched in repetitive DNA, has been very intractable to genetic and molecular analyses. There is no previous evidence for developmental stage- and testis-specific transcription from the male-specific region of the Y (MSY). Here, we present evidence for the first time for a developmental stage- and testis-specific transcription from MSY distal heterochromatic block. We isolated two novel RNAs, which localize to Yq12 in multiple copies, show testis-specific expression, and lack active X-homologs. Experimental evidence shows that one of the above Yq12 noncoding RNAs (ncRNAs) trans-splices with CDC2L2 mRNA from chromosome 1p36.3 locus to generate a testis-specific chimeric beta sv13 isoform. This 67-nt 5'UTR provided by the Yq12 transcript contains within it a Y box protein-binding CCAAT motif, indicating translational regulation of the beta sv13 isoform in testis. This is also the first report of trans-splicing between a Y chromosomal and an autosomal transcript. [Abstract/Link to Full Text]

Chen FC, Chen CJ, Li WH, Chuang TJ
Human-specific insertions and deletions inferred from mammalian genome sequences.
Genome Res. 2007 Jan;17(1):16-22.
It has been suggested that insertions and deletions (indels) have contributed to the sequence divergence between the human and chimpanzee genomes more than do nucleotide changes (3% vs. 1.2%). However, although there have been studies of large indels between the two genomes, no systematic analysis of small indels (i.e., indels ≤100 bp) has been published. In this study, we first estimated that the false-positive rate of small indels inferred from human-chimpanzee pairwise sequence alignments is quite high, suggesting that the chimpanzee genome draft is not sufficiently accurate for our purpose. We have therefore inferred only human-specific indels using multiple sequence alignments of mammalian genomes. We identified >840,000 "small" indels, which affect >7000 UCSC-annotated human genes (>11,000 transcripts). These indels, however, amount to only approximately 0.21% sequence change in the human lineage for the regions compared, whereas in pseudogenes indels contribute to a sequence divergence of 1.40%, suggesting that most of the indels that occurred in genic regions have been eliminated. Functional analysis reveals that the genes whose coding exons have been affected by human-specific indels are enriched in transcription and translation regulatory activities but are underrepresented in catalytic and transporter activities, cellular and physiological processes, and extracellular region/matrix. This functional bias suggests that human-specific indels might have contributed to human unique traits by causing changes at the RNA and protein level. [Abstract/Link to Full Text]

Eyheramendy S, Marchini J, McVean G, Myers S, Donnelly P
A model-based approach to capture genetic variation for future association studies.
Genome Res. 2007 Jan;17(1):88-95.
Genome-wide association studies are still constrained by the cost of genotyping. For this reason, the selection of a reduced set of markers or tags able to capture a significant proportion of the genetic variation is an important aspect of these studies. Most tagging SNP selection methods have been successful in capturing the genetic variation of the data from which the tags have been chosen. However, when these tags are used in an independent data set, a significant proportion of the remaining SNPs (non-tags) are not captured and, in most cases, there is no information on which SNPs are captured. We propose to use a probabilistic model to predict the non-tags based on a set of tags, as a way to capture genetic variation. An important advantage of this method is that it directly predicts the genotype of the non-tags with which we can test for association with the phenotype and which could help to elucidate the location of genes responsible for increasing disease susceptibility. Additionally, this method provides an estimate of the probabilities with which the predictions are made, which reflects the confidence of the probabilistic model. We also propose new methods to select the tagging SNPs. We empirically show by using HapMap data that our approach is able to capture significantly more genetic variation than methods based solely on a pairwise LD measure. [Abstract/Link to Full Text]

Didelot X, Achtman M, Parkhill J, Thomson NR, Falush D
A bimodal pattern of relatedness between the Salmonella Paratyphi A and Typhi genomes: convergence or divergence by homologous recombination?
Genome Res. 2007 Jan;17(1):61-8.
All Salmonella can cause disease but severe systemic infections are primarily caused by a few lineages. Paratyphi A and Typhi are the deadliest human restricted serovars, responsible for approximately 600,000 deaths per annum. We developed a Bayesian changepoint model that uses variation in the degree of nucleotide divergence along two genomes to detect homologous recombination between these strains, and with other lineages of Salmonella enterica. Paratyphi A and Typhi showed an atypical and surprising pattern. For three quarters of their genomes, they appear to be distantly related members of the species S. enterica, both in their gene content and nucleotide divergence. However, the remaining quarter is much more similar in both aspects, with average nucleotide divergence of 0.18% instead of 1.2%. We describe two different scenarios that could have led to this pattern, convergence and divergence, and conclude that the former is more likely based on a variety of criteria. The convergence scenario implies that, although Paratyphi A and Typhi were not especially close relatives within S. enterica, they have gone through a burst of recombination involving more than 100 recombination events. Several of the recombination events transferred novel genes in addition to homologous sequences, resulting in similar gene content in the two lineages. We propose that recombination between Typhi and Paratyphi A has allowed the exchange of gene variants that are important for their adaptation to their common ecological niche, the human host. [Abstract/Link to Full Text]

Gomes JP, Bruno WJ, Nunes A, Santos N, Florindo C, Borrego MJ, Dean D
Evolution of Chlamydia trachomatis diversity occurs by widespread interstrain recombination involving hotspots.
Genome Res. 2007 Jan;17(1):50-60.
Chlamydia trachomatis is an obligate intracellular bacterium of major public health significance, infecting over one-tenth of the world's population and causing blindness and infertility in millions. Mounting evidence supports recombination as a key source of genetic diversity among free-living bacteria. Previous research shows that intracellular bacteria such as Chlamydiaceae may also undergo recombination but whether this plays a significant evolutionary role has not been determined. Here, we examine multiple loci dispersed throughout the chromosome to determine the extent and significance of recombination among 19 laboratory reference strains and 10 present-day ocular and urogenital clinical isolates using phylogenetic reconstructions, compatibility matrices, and statistically based recombination programs. Recombination is widespread; all clinical isolates are recombinant at multiple loci with no two belonging to the same clonal lineage. Several reference strains show nonconcordant phylogenies across loci; one strain is unambiguously identified as recombinantly derived from other reference strain lineages. Frequent recombination contrasts with a low level of point substitution; novel substitutions relative to reference strains occur less than one per kilobase. Hotspots for recombination are identified downstream from ompA, which encodes the major outer membrane protein. This widespread recombination, unexpected for an intracellular bacterium, explains why strain-typing using one or two genes, such as ompA, does not correlate with clinical phenotypes. Our results do not point to specific events that are responsible for different pathogenicities but, instead, suggest a new approach to dissect the genetic basis for clinical strain pathology with implications for evolution, host cell adaptation, and emergence of new chlamydial diseases. [Abstract/Link to Full Text]

Dewannieux M, Harper F, Richaud A, Letzelter C, Ribet D, Pierron G, Heidmann T
Identification of an infectious progenitor for the multiple-copy HERV-K human endogenous retroelements.
Genome Res. 2006 Dec;16(12):1548-56.
Human Endogenous Retroviruses are expected to be the remnants of ancestral infections of primates by active retroviruses that have thereafter been transmitted in a Mendelian fashion. Here, we derived in silico the sequence of the putative ancestral "progenitor" element of one of the most recently amplified families, the HERV-K family, and constructed it. This element, Phoenix, produces viral particles that disclose all of the structural and functional properties of a bona-fide retrovirus, can infect mammalian, including human, cells, and integrate with the exact signature of the presently found endogenous HERV-K progeny. We also show that this element amplifies via an extracellular pathway involving reinfection, at variance with the non-LTR-retrotransposons (LINEs, SINEs) or LTR-retrotransposons, thus recapitulating ex vivo the molecular events responsible for its dissemination in the host genomes. We also show that in vitro recombinations among present-day human HERV-K (also known as ERVK) loci can similarly generate functional HERV-K elements, indicating that human cells still have the potential to produce infectious retroviruses. [Abstract/Link to Full Text]

Rocha EP, Touchon M, Feil EJ
Similar compositional biases are caused by very different mutational effects.
Genome Res. 2006 Dec;16(12):1537-47.
Compositional replication strand bias, commonly referred to as GC skew, is present in many genomes of prokaryotes, eukaryotes, and viruses. Although cytosine deamination in ssDNA (resulting in C-->T changes on the leading strand) is often invoked as its major cause, the precise contributions of this and other substitution types are currently unknown. It is also unclear if the underlying mutational asymmetries are the same among taxa, are stable over time, or how closely the observed biases are to mutational equilibrium. We analyzed nearly neutral sites of seven taxa each with between three and six complete bacterial genomes, and inferred the substitution spectra of fourfold degenerate positions in nonhighly expressed genes. Using a bootstrap procedure, we extracted compositional biases associated with replication and identified the significant asymmetries. Although all taxa showed an overrepresentation of G relative to C on the leading strand (and imbalances between A and T), widely variable substitution asymmetries are noted. Surprisingly, all substitution types show significant asymmetry in at least one taxon, but none were universally biased in all taxa. Notably, in the two most biased genomes, A-->G, rather than C-->T, shapes the compositional bias. Given the variability in these biases, we propose that the process is multifactorial. Finally, we also find that most genomes are not at compositional equilibrium, and suggest that mutational-based heterotachy is deeply imprinted in the history of biological macromolecules. This shows that similar compositional biases associated with the same essential well-conserved process, replication, do not reflect similar mutational processes in different genomes, and that caution is required in inferring the roles of specific mutational biases on the basis of contemporary patterns of sequence composition. [Abstract/Link to Full Text]

Jones AK, Raymond-Delpech V, Thany SH, Gauthier M, Sattelle DB
The nicotinic acetylcholine receptor gene family of the honey bee, Apis mellifera.
Genome Res. 2006 Nov;16(11):1422-30.
Nicotinic acetylcholine receptors (nAChRs) mediate fast cholinergic synaptic transmission and play roles in many cognitive processes. They are under intense research as potential targets of drugs used to treat neurodegenerative diseases and neurological disorders such as Alzheimer's disease and schizophrenia. Invertebrate nAChRs are targets of anthelmintics as well as a major group of insecticides, the neonicotinoids. The honey bee, Apis mellifera, is one of the most beneficial insects worldwide, playing an important role in crop pollination, and is also a valuable model system for studies on social interaction, sensory processing, learning, and memory. We have used the A. mellifera genome information to characterize the complete honey bee nAChR gene family. Comparison with the fruit fly Drosophila melanogaster and the malaria mosquito Anopheles gambiae shows that the honey bee possesses the largest family of insect nAChR subunits to date (11 members). As with Drosophila and Anopheles, alternative splicing of conserved exons increases receptor diversity. Also, we show that in one honey bee nAChR subunit, six adenosine residues are targeted for RNA A-to-I editing, two of which are evolutionarily conserved in Drosophila melanogaster and Heliothis virescens orthologs, and that the extent of editing increases as the honey bee lifecycle progresses, serving to maximize receptor diversity at the adult stage. These findings on Apis mellifera enhance our understanding of nAChR functional genomics and provide a useful basis for the development of improved insecticides that spare a major beneficial insect species. [Abstract/Link to Full Text]

Cho S, Huang ZY, Green DR, Smith DR, Zhang J
Evolution of the complementary sex-determination gene of honey bees: balancing selection and trans-species polymorphisms.
Genome Res. 2006 Nov;16(11):1366-75.
The mechanism of sex determination varies substantively among evolutionary lineages. One important mode of genetic sex determination is haplodiploidy, which is used by approximately 20% of all animal species, including >200,000 species of the entire insect order Hymenoptera. In the honey bee Apis mellifera, a hymenopteran model organism, females are heterozygous at the csd (complementary sex determination) locus, whereas males are hemizygous (from unfertilized eggs). Fertilized homozygotes develop into sterile males that are eaten before maturity. Because homozygotes have zero fitness and because common alleles are more likely than rare ones to form homozygotes, csd should be subject to strong overdominant selection and negative frequency-dependent selection. Under these selective forces, together known as balancing selection, csd is expected to exhibit a high degree of intraspecific polymorphism, with long-lived alleles that may be even older than the species. Here we sequence the csd genes as well as randomly selected neutral genomic regions from individuals of three closely related species, A. mellifera, Apis cerana, and Apis dorsata. The polymorphic level is approximately seven times higher in csd than in the neutral regions. Gene genealogies reveal trans-species polymorphisms at csd but not at any neutral regions. Consistent with the prediction of rare-allele advantage, nonsynonymous mutations are found to be positively selected in csd only in early stages after their appearances. Surprisingly, three different hypervariable repetitive regions in csd are present in the three species, suggesting variable mechanisms underlying allelic specificities. Our results provide a definitive demonstration of balancing selection acting at the honey bee csd gene, offer insights into the molecular determinants of csd allelic specificities, and help avoid homozygosity in bee breeding. [Abstract/Link to Full Text]

Kaplan N, Linial M
ProtoBee: hierarchical classification and annotation of the honey bee proteome.
Genome Res. 2006 Nov;16(11):1431-8.
The recently sequenced genome of the honey bee (Apis mellifera) has produced 10,157 predicted protein sequences, calling for a computational effort to extract biological insights from them. We have applied an unsupervised hierarchical protein-clustering method, which was previously used in the ProtoNet system, to nearly 200,000 proteins consisting of the predicted honey bee proteins, the SWISS-PROT protein database, and the complete set of proteins of the mouse (Mus musculus) and the fruit fly (Drosophila melanogaster). The hierarchy produced by this method has been entitled ProtoBee. In ProtoBee, the proteins are hierarchically organized into 18,936 separate tree hierarchies, each representing a protein functional family. By using the mouse and Drosophila complete proteomes as reference, we are able to highlight functional groups of putative gene-loss events, putative novel proteins of unique functionality, and bee-specific paralogs. We have studied some of the ProtoBee findings and suggest their biological relevance. Examples include novel opsin genes and intriguing nuclear matches of mitochondrial genes. The organization of bee sequences into functional clusters suggests a natural way of automatically inferring functional annotation. Following this notion, we were able to assign functional annotation to about 70% of the sequences. ProtoBee is available at [Abstract/Link to Full Text]

Drapeau MD, Albert S, Kucharski R, Prusko C, Maleszka R
Evolution of the Yellow/Major Royal Jelly Protein family and the emergence of social behavior in honey bees.
Genome Res. 2006 Nov;16(11):1385-94.
The genomic architecture underlying the evolution of insect social behavior is largely a mystery. Eusociality, defined by overlapping generations, parental brood care, and reproductive division of labor, has most commonly evolved in the Hymenopteran insects, including the honey bee Apis mellifera. In this species, the Major Royal Jelly Protein (MRJP) family is required for all major aspects of eusocial behavior. Here, using data obtained from the A. mellifera genome sequencing project, we demonstrate that the MRJP family is encoded by nine genes arranged in an approximately 60-kb tandem array. Furthermore, the MRJP protein family appears to have evolved from a single progenitor gene that encodes a member of the ancient Yellow protein family. Five genes encoding Yellow-family proteins flank the genomic region containing the genes encoding MRJPs. We describe the molecular evolution of these protein families. We then characterize developmental-stage-specific, sex-specific, and caste-specific expression patterns of the mrjp and yellow genes in the honey bee. We review empirical evidence concerning the functions of Yellow proteins in fruit flies and social ants, in order to shed light on the roles of both Yellow and MRJP proteins in A. mellifera. In total, the available evidence suggests that Yellows and MRJPs are multifunctional proteins with diverse, context-dependent physiological and developmental roles. However, many members of the Yellow/MRJP family act as facilitators of reproductive maturation. Finally, it appears that MRJP protein subfamily evolution from the Yellow protein family may have coincided with the evolution of honey bee eusociality. [Abstract/Link to Full Text]

Sutherland TD, Campbell PM, Weisman S, Trueman HE, Sriskantha A, Wanjura WJ, Haritos VS
A highly divergent gene cluster in honey bees encodes a novel silk family.
Genome Res. 2006 Nov;16(11):1414-21.
The pupal cocoon of the domesticated silk moth Bombyx mori is the best known and most extensively studied insect silk. It is not widely known that Apis mellifera larvae also produce silk. We have used a combination of genomic and proteomic techniques to identify four honey bee fiber genes (AmelFibroin1-4) and two silk-associated genes (AmelSA1 and 2). The four fiber genes are small, comprise a single exon each, and are clustered on a short genomic region where the open reading frames are GC-rich amid low GC intergenic regions. The genes encode similar proteins that are highly helical and predicted to form unusually tight coiled coils. Despite the similarity in size, structure, and composition of the encoded proteins, the genes have low primary sequence identity. We propose that the four fiber genes have arisen from gene duplication events but have subsequently diverged significantly. The silk-associated genes encode proteins likely to act as a glue (AmelSA1) and involved in silk processing (AmelSA2). Although the silks of honey bees and silkmoths both originate in larval labial glands, the silk proteins are completely different in their primary, secondary, and tertiary structures as well as the genomic arrangement of the genes encoding them. This implies independent evolutionary origins for these functionally related proteins. [Abstract/Link to Full Text]

Robertson HM, Wanner KW
The chemoreceptor superfamily in the honey bee, Apis mellifera: expansion of the odorant, but not gustatory, receptor family.
Genome Res. 2006 Nov;16(11):1395-403.
The honey bee genome sequence reveals a remarkable expansion of the insect odorant receptor (Or) family relative to the repertoires of the flies Drosophila melanogaster and Anopheles gambiae, which have 62 and 79 Ors respectively. A total of 170 Or genes were annotated in the bee, of which seven are pseudogenes. These constitute five bee-specific subfamilies in an insect Or family tree, one of which has expanded to a total of 157 genes encoding proteins with 15%-99% amino acid identity. Most of the Or genes are in tandem arrays, including one with 60 genes. This bee-specific expansion of the Or repertoire presumably underlies their remarkable olfactory abilities, including perception of several pheromone blends, kin recognition signals, and diverse floral odors. The number of Apis mellifera Ors is approximately equal to the number of glomeruli in the bee antennal lobe (160-170), consistent with a general one-receptor/one-neuron/one-glomerulus relationship. The bee genome encodes just 10 gustatory receptors (Grs) compared with the D. melanogaster and A. gambiae repertoires of 68 and 76 Grs, respectively. A lack of Gr gene family expansion primarily accounts for this difference. A nurturing hive environment and a mutualistic relationship with plants may explain the lack of Gr family expansion. The Or family is the most dramatic example of gene family expansion in the bee genome, and characterizing their caste- and sex-specific gene expression may provide clues to their specific roles in detection of pheromone, kin, and floral odors. [Abstract/Link to Full Text]

Forêt S, Maleszka R
Function and evolution of a gene family encoding odorant binding-like proteins in a social insect, the honey bee (Apis mellifera).
Genome Res. 2006 Nov;16(11):1404-13.
The remarkable olfactory power of insect species is thought to be generated by a combinatorial action of two large protein families, G protein-coupled olfactory receptors (ORs) and odorant binding proteins (OBPs). In olfactory sensilla, OBPs deliver hydrophobic airborne molecules to ORs, but their expression in nonolfactory tissues suggests that they also may function as general carriers in other developmental and physiological processes. Here we used bioinformatic and experimental approaches to characterize the OBP-like gene family in a highly social insect, the Western honey bee. Comparison with other insects shows that the honey bee has the smallest set of these genes, consisting of only 21 OBPs. This number stands in stark contrast to the more than 70 OBPs in Anopheles gambiae and 51 in Drosophila melanogaster. In the honey bee as in the two dipterans, these genes are organized in clusters. We show that the evolution of their structure involved frequent intron losses. We describe a monophyletic subfamily of OBPs where the diversification of some amino acids appears to have been accelerated by positive selection. Expression profiling under a wide range of conditions shows that in the honey bee only nine OBPs are antenna-specific. The remaining genes are expressed either ubiquitously or are tightly regulated in specialized tissues or during development. These findings support the view that OBPs are not restricted to olfaction and are likely to be involved in broader physiological functions. [Abstract/Link to Full Text]

Robertson HM, Gordon KH
Canonical TTAGG-repeat telomeres and telomerase in the honey bee, Apis mellifera.
Genome Res. 2006 Nov;16(11):1345-51.
The draft assembly of the honey bee Apis mellifera genome sequence reveals that the 17 centromeric-distal telomeres are of a simple, shared, and canonical structure, with 3-4 kb of a unique subtelomeric sequence, followed by several kilobases of TTAGG or variant telomeric repeats. This simple subtelomeric structure differs from the centromeric-proximal telomeres on the short arms of the 15 acrocentric chromosomes, which are apparently composed primarily of the 176-bp AluI tandem repeat. This dichotomy between the distal and proximal telomeres may involve differential participation of the telomeres of the 15 acrocentric chromosomes in the Rabl configuration after mitosis and the chromosome bouquet in meiotic prophase I. As expected from the presence of canonical TTAGG telomeric repeats, we identified a candidate telomerase gene in the bee, as well as the silkmoth Bombyx mori and the flour beetle Tribolium castaneum. [Abstract/Link to Full Text]

Rubin EB, Shemesh Y, Cohen M, Elgavish S, Robertson HM, Bloch G
Molecular and phylogenetic analyses reveal mammalian-like clockwork in the honey bee (Apis mellifera) and shed new light on the molecular evolution of the circadian clock.
Genome Res. 2006 Nov;16(11):1352-65.
The circadian clock of the honey bee is implicated in ecologically relevant complex behaviors. These include time sensing, time-compensated sun-compass navigation, and social behaviors such as coordination of activity, dance language communication, and division of labor. The molecular underpinnings of the bee circadian clock are largely unknown. We show that clock gene structure and expression pattern in the honey bee are more similar to the mouse than to Drosophila. The honey bee genome does not encode an ortholog of Drosophila Timeless (Tim1), has only the mammalian type Cryptochrome (Cry-m), and has a single ortholog for each of the other canonical "clock genes." In foragers that typically have strong circadian rhythms, brain mRNA levels of amCry, but not amTim as in Drosophila, consistently oscillate with strong amplitude and a phase similar to amPeriod (amPer) under both light-dark and constant darkness illumination regimes. In contrast to Drosophila, the honey bee amCYC protein contains a transactivation domain and its brain transcript levels oscillate at virtually an anti-phase to amPer, as it does in the mouse. Phylogenetic analyses indicate that the basal insect lineage had both the mammalian and Drosophila types of Cry and Tim. Our results suggest that during evolution, Drosophila diverged from the ancestral insect clock and specialized in using a set of clock gene orthologs that was lost by both mammals and bees, which in turn converged and specialized in the other set. These findings illustrate a previously unappreciated diversity of insect clockwork and raise critical questions concerning the evolution and functional significance of species-specific variation in molecular clockwork. [Abstract/Link to Full Text]

Dearden PK, Wilson MJ, Sablan L, Osborne PW, Havler M, McNaughton E, Kimura K, Milshina NV, Hasselmann M, Gempe T, Schioett M, Brown SJ, Elsik CG, Holland PW, Kadowaki T, Beye M
Patterns of conservation and change in honey bee developmental genes.
Genome Res. 2006 Nov;16(11):1376-84.
The current insect genome sequencing projects provide an opportunity to extend studies of the evolution of developmental genes and pathways in insects. In this paper we examine the conservation and divergence of genes and developmental processes between Drosophila and the honey bee, two holometabolous insects whose lineages separated approximately 300 million years ago, by comparing the presence or absence of 308 Drosophila developmental genes in the honey bee. Through examination of the presence or absence of genes involved in conserved pathways (cell signaling, axis formation, segmentation and homeobox transcription factors), we find that the vast majority of genes are conserved. Some genes involved in these processes are, however, missing in the honey bee. We have also examined the orthology of Drosophila genes involved in processes that differ between the honey bee and Drosophila. Many of these genes are preserved in the honey bee despite the process in which they act in Drosophila being different or absent in the honey bee. Many of the missing genes in both situations appear to have arisen recently in the Drosophila lineage, have single known functions in Drosophila, and act early in developmental pathways, while those that are preserved have pleiotropic functions. An evolutionary interpretation of these data is that either genes with multiple functions in a common ancestor are more likely to be preserved in both insect lineages, or genes that are preserved throughout evolution are more likely to co-opt additional functions. [Abstract/Link to Full Text]

Savard J, Tautz D, Richards S, Weinstock GM, Gibbs RA, Werren JH, Tettelin H, Lercher MJ
Phylogenomic analysis reveals bees and wasps (Hymenoptera) at the base of the radiation of Holometabolous insects.
Genome Res. 2006 Nov;16(11):1334-8.
Comparative studies require knowledge of the evolutionary relationships between taxa. However, neither morphological nor paleontological data have been able to unequivocally resolve the major groups of holometabolous insects so far. Here, we utilize emerging genome projects to assemble and analyze a data set of 185 nuclear genes, resulting in a fully resolved phylogeny of the major insect model species. Contrary to the most widely accepted phylogenetic hypothesis, bees and wasps (Hymenoptera) are basal to the other major holometabolous orders, beetles (Coleoptera), moths (Lepidoptera), and flies (Diptera). We validate our results by meticulous examination of potential confounding factors. Phylogenomic approaches are thus able to resolve long-standing questions about the phylogeny of insects. [Abstract/Link to Full Text]

Recent Articles in Journal of Applied Genetics

Wyszyńska-Koko J, Kurył J
A novel polymorphism in exon 1 of the porcine myogenin gene.
J Appl Genet. 2005;46(4):399-402.
Myogenin, a gene belonging to the MyoD family, codes for a bHLH transcription factor that plays a key role in myogenesis. It affects the processes of differentiation and maturation of myotubes during embryogenesis. Fragments of the porcine myogenin coding sequence and promoter region were amplified and subjected to MSSCP analysis. A T→C transition in exon 1, recognised by the MaeIII restriction enzyme, was revealed; it appeared to be a silent mutation in the region of the transactivation domain. No other polymorphism was found either in the remaining coding sequence or in the promoter region. [Abstract/Link to Full Text]

Kaminski S, Grzybowski G, Prusak B, Ruść A
No incidence of DUMPS carriers in Polish dairy cattle.
J Appl Genet. 2005;46(4):395-7.
DUMPS (Deficiency of Uridine Monophosphate Synthase) is a hereditary recessive disorder in Holstein cattle causing early embryo mortality during implantation in the uterus. The only way to avoid the economic losses is early detection of DUMPS carriers. Because American Holstein semen has been intensively imported to Poland since 1970, there was a risk that DUMPS could have spread in Polish dairy cattle. In our study, 2209 dairy cattle of the Polish Holstein breed were screened by the DNA test. The dominant groups were young bulls entering the testing program (1171) and proven bulls (781). They represented all sires entering Polish breeding programs between 1999 and 2003. Also, 257 sire dams were included in the screening program. No DUMPS carrier was found. Our results thus indicate that the population of dairy cattle reared in Poland is free from DUMPS. Because of the economic significance of the DUMPS mutation and its recessive mode of inheritance, attention has to be paid to any bull with a known DUMPS carrier in his pedigree. Such a bull should be tested and, if positive, eliminated from the active population. Also, young bulls (testing bulls) should be screened for DUMPS if a high incidence of embryo mortality is observed in their progeny and their genealogy cannot exclude relatedness to any DUMPS carriers. [Abstract/Link to Full Text]

Chanvijit K, Duangjinda M, Pattarajinda V, Reodecha C
Model comparison for genetic evaluation of milk yield in crossbred Holsteins in the tropics.
J Appl Genet. 2005;46(4):387-93.
The objective of this study was to compare models for appropriate genetic parameter estimation for milk yield (305-day) in crossbred Holsteins in the tropics, where only records from crossbred cows were available. Eleven models with different effects of contemporary group (CG) at calving (herd-year-season or herd-year-month as fixed, and herd-year-month as random), age at calving (as linear or quadratic covariates, age-class, and age-class x lactation), and dominance were considered. On-farm records from small herds (n < 50) were included or excluded to validate the parameter estimates. Average Information Restricted Maximum Likelihood (AIREML) and Best Linear Unbiased Prediction (BLUP) were used to estimate variance components and breeding values. R-square (R2) and standard error of heritability (h2) were used to determine the appropriate model. The estimates of heritability from most models ranged from 0.18 to 0.22. CG formation of herd-year-month as a random effect slightly lowered the additive genetic variance but considerably decreased the permanent environmental variance. The model with age-class x lactation gave better R2 than other age adjustments. The models including records from smallholders gave similar estimates of heritability and a lower standard error than the models excluding them. The estimate of dominance variance as a proportion of total variance was close to zero. The low ratio of dominance to additive genetic variance suggested that the inclusion of dominance effects in the model was unjustified. In conclusion, the model including the effects of herd-year-month, age-class x lactation, as well as additive genetic, permanent environmental and residual effects, was the most appropriate for genetic evaluation in crossbred Holsteins, where records from smallholders could be included. [Abstract/Link to Full Text]

Chakraborty A, Aranishi F, Iwatsuki Y
Molecular identification of hairtail species (Pisces: Trichiuridae) based on PCR-RFLP analysis of the mitochondrial 16S rRNA gene.
J Appl Genet. 2005;46(4):381-5.
A rapid PCR-RFLP analysis was designed to identify 3 closely related species of hairtails: Trichiurus lepturus, T. japonicus, and Trichiurus sp. 2, based on partial sequence data (600 bp) of the mitochondrial DNA encoding the 16S ribosomal RNA (16S rRNA) gene. Restriction digestion analysis of the unpurified PCR products of these 3 species, using EcoRI and VspI endonucleases, generated reproducible species-specific restriction patterns showing 2 fragments (250 bp and 350 bp) for T. lepturus in EcoRI digestion and 2 fragments (196 bp and 404 bp) for T. japonicus in VspI digestion, whereas no cleavage was observed for Trichiurus sp. 2 in either the EcoRI or the VspI digestion. The PCR-RFLP technique developed in this study proved to be a rapid, reliable and simple method that enables easy and accurate identification of these 3 closely related species of the genus Trichiurus. [Abstract/Link to Full Text]
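The identification rule described in this abstract can be sketched as a small lookup. This is only an illustration, not the authors' protocol: the function name `identify_hairtail` is hypothetical, and the sketch assumes that T. lepturus is left uncut by VspI and T. japonicus by EcoRI (the abstract calls the patterns species-specific but does not state this explicitly). Only the fragment sizes come from the abstract.

```python
from typing import Sequence

AMPLICON_BP = 600  # partial 16S rRNA amplicon length given in the abstract

# Diagnostic fragment patterns reported in the abstract (sizes in bp).
ECORI_LEPTURUS = (250, 350)
VSPI_JAPONICUS = (196, 404)
UNCUT = (AMPLICON_BP,)


def identify_hairtail(ecori: Sequence[int], vspi: Sequence[int]) -> str:
    """Map observed EcoRI and VspI fragment sizes to a species label.

    Assumption: each diagnostic enzyme cuts only one species, so any
    other combination is reported as "undetermined".
    """
    e = tuple(sorted(ecori))
    v = tuple(sorted(vspi))
    if e == ECORI_LEPTURUS and v == UNCUT:
        return "Trichiurus lepturus"
    if e == UNCUT and v == VSPI_JAPONICUS:
        return "Trichiurus japonicus"
    if e == UNCUT and v == UNCUT:
        return "Trichiurus sp. 2"
    return "undetermined"
```

In practice, band sizes read from a gel are approximate, so a real implementation would compare sizes within a tolerance rather than exactly.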

Sud S, Bains NS, Nanda GS
Genetic relationships among wheat genotypes, as revealed by microsatellite markers and pedigree analysis.
J Appl Genet. 2005;46(4):375-9.
Genetic relationships among 20 elite wheat genotypes were studied using microsatellite markers and pedigree analysis. A total of 93 polymorphic bands were obtained with 25 microsatellite primer pairs. Coefficient of parentage (COP) values were calculated using parentage information at the expansion level of 5. The pedigree-based similarity (mean 0.115, range 0.00-0.53) was lower than the similarity assessed using microsatellite markers (mean 0.70, range 0.47-0.91). Similarity estimates were used to construct dendrograms by using the unweighted pair-group method with arithmetic averages (UPGMA). Clustering of genotypes in respect of marker-based similarity revealed two groups. Genotype PBW442 diverged and appeared as distinct from all other genotypes in both marker-based and pedigree-based analysis. The correlation of COP values with genetic similarity values based on microsatellite markers is low (r = 0.285, p < 0.05). The results indicate a need to develop wheat varieties with a diverse genetic background and to incorporate new variability into the existing wheat gene pool. [Abstract/Link to Full Text]

Stojałowski S, Jaciubek M, Masojć P
Rye SCAR markers for male fertility restoration in the P cytoplasm are also applicable to marker-assisted selection in the C cytoplasm.
J Appl Genet. 2005;46(4):371-3.
The study aimed at testing the usefulness of recently developed SCAR markers on rye (Secale cereale L.) chromosome 4R in hybrid breeding based on the C source of male sterility-inducing cytoplasm. Of 10 markers studied, 4 revealed polymorphisms between 2 inbred lines (544cms-C and Ot0-20) crossed to develop F2 and BC1 mapping populations. Analyses performed on 94 F2 and 93 BC1 plants allowed us to extend a formerly constructed genetic map of chromosome arm 4RL. Three SCAR markers (SCP14M55, SCP15M55 and SCP16M58) were mapped in the vicinity of gene Rfc1, which restores male fertility in the C cytoplasm. The 3 tested SCAR markers proved to be effective in marker-assisted selection (MAS) for male fertility/sterility. [Abstract/Link to Full Text]

Wang HY, Liu DC, Yan ZH, Wei YM, Zheng YL
Cytological characteristics of F2 hybrids between Triticum aestivum L. and T. durum Desf. with reference to wheat breeding.
J Appl Genet. 2005;46(4):365-9.
Cytological and agronomic characteristics of an F2 population from Triticum aestivum L. x T. durum Desf. hybrids were analyzed plant by plant. Means of morphologic traits in the F2 population were similar to those of the low-value parent. On average, F2 hybrids had 36.54 chromosomes per plant, indicating that each gamete lost 2.73 chromosomes at meiosis of the F1 generation. More than half of the plants had 36-39 chromosomes, so male gametes with 19-21 chromosomes seemed to be superior to the others. The distribution frequency of chromosomes in this study differed from that in a previous report, where a different tetraploid wheat was used. This shows that a different breeding strategy may need to be taken when exploiting a different tetraploid wheat. According to our results, some plants with 42 chromosomes, having all the wheat A, B and D chromosomes, would appear in the F3 population, which provides a chance to obtain stable bread wheat lines from the self-pollinated progenies. Alternatively, the desirable individuals of the F2 population were backcrossed to bread wheat, which is very useful and efficient for the improvement of bread wheat by exploiting desirable genes in durum wheat. [Abstract/Link to Full Text]
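The per-gamete loss quoted in this abstract is consistent with measuring the F2 mean against the full 42-chromosome bread wheat complement and splitting the shortfall between the two gametes. The sketch below reproduces that arithmetic; that the authors computed the figure this way is an assumption, and the function name is hypothetical.

```python
BREAD_WHEAT_2N = 42          # full A, B and D complement, as in the abstract
MEAN_F2_CHROMOSOMES = 36.54  # mean chromosome number per F2 plant


def mean_loss_per_gamete(full_complement: float, observed_mean: float) -> float:
    """Average number of chromosomes missing from each of the two gametes."""
    return (full_complement - observed_mean) / 2


loss = mean_loss_per_gamete(BREAD_WHEAT_2N, MEAN_F2_CHROMOSOMES)
print(round(loss, 2))  # 2.73
```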

Blaszczyk L, Tyrka M, Chełkowski J
PstIAFLP based markers for leaf rust resistance genes in common wheat.
J Appl Genet. 2005;46(4):357-64.
The aim of the present study was to detect candidate DNA markers for selected leaf rust resistance genes. A total number of 286 loci in the 'Thatcher' near-isogenic lines carrying resistance gene Lr1, Lr9, Lr10, Lr13, Lr19, Lr21, Lr24, Lr26, Lr28, Lr35, and Lr37 were screened for DNA polymorphism by the PstIAFLP method. A survey with 33 selective primers yielded 16 candidate markers. Further validation studies on cultivars characterized for the presence and absence of selected resistance genes confirmed specificity of markers for Lr24, Lr26 and Lr37. The AFLP-based marker P42-530 was successfully converted into an STS marker. The new marker was linked with the Lr37-specific marker (CslVrga13) at the distance of 1.7 cM. The PstIAFLP method was found to be effective in the identification of DNA changes induced in hexaploid wheat by translocations from Agropyron elongatum, Secale cereale and Aegilops ventricosa. [Abstract/Link to Full Text]

Yue YW, Long H, Liu Q, Wei YM, Yan ZH, Zheng YL
Isolation of low-molecular-weight glutenin subunit genes from wild emmer wheat (Triticum dicoccoides).
J Appl Genet. 2005;46(4):349-55.
Three low-molecular-weight glutenin subunit (LMW-GS) genes, designated LMW-Td1, LMW-Td2 and LMW-Td3, were isolated from wild emmer wheat (Triticum dicoccoides), which is the tetraploid progenitor of common wheat (T. aestivum). The complete nucleotide sequence lengths of LMW-Td1, LMW-Td2 and LMW-Td3 are 858, 900 and 1062 bp, respectively. LMW-Td1 and LMW-Td3 can encode proteins with 284 and 352 amino acid residues, respectively, whereas LMW-Td2 is a putative pseudogene due to the presence of 3 inframe stop codons in its C-terminal domain. The deduced protein sequences of the 3 genes share the same typical polypeptide structures with known LMW-GS genes containing 8 cysteines in the mature protein domains. LMW-Td1 was clearly distinguished from all known LMW-GS genes, and considered as a novel LMW-GS gene. Two hydrophobic motifs (i.e. PIIIL and PVIIL) were observed in the repetitive domain of LMW-Td3. Sequence comparison indicates that sequences of the 3 LMW-GS genes from this study are strongly similar to known LMW-GS genes. Our phylogenetic analysis suggests that LMW-Td1 and LMW-Td2 are homologous with genes on chromosome 1A, and LMW-Td3 is closely related to genes on chromosome 1B. [Abstract/Link to Full Text]

Latos-Bieleńska A, Materna-Kiryluk A
Polish Registry of Congenital Malformations - aims and organization of the registry monitoring 300 000 births a year.
J Appl Genet. 2005;46(4):341-8.
In 1997, the Polish Registry of Congenital Malformations (PRCM) was established to fulfil epidemiological, prophylactic, socioeconomic and scientific functions. The PRCM is a population-based registry currently monitoring about 300 000 births a year in 13 provinces. Such a large area and population require a special organizational structure of the Registry. The PRCM Central Working Group and the computer database are located in the Department of Medical Genetics, University of Medical Sciences, Poznań. Here the data are collected, validated, encoded according to the ICD-10, and analysed. Provincial Working Groups are responsible for supervision of data collection in the given province. The PRCM staff has grown from about 250 members in 1997 to more than 400 members today. The PRCM collects information on structural defects diagnosed before the end of the second year of life. Minor anomalies are excluded from the registry. The main source of information is a registration form filled in by the physician diagnosing the anomaly. Since 2004 electronic reporting has also been possible. On 28 September 2005 there were 54 020 entries in the database concerning 33 729 children with at least one congenital malformation and 1261 control entries concerning children without malformations. The PRCM is also an important source of identification of families at genetic risk. Education of physicians and the community in the field of genetic counselling is also an important aim of the PRCM. Since 2001, the PRCM has been a member of the Eurocat. Detailed information on PRCM organization, electronic reporting, and results is available at the PRCM website. [Abstract/Link to Full Text]

Ługowska A, Szymańska K, Kmiec T, Tarczyńska I, Czartoryska B, Tylki-Szymańska A, Jurkiewicz E
Homozygote for mutation c.1204 + 1G > A of the ARSA gene presents with a late-infantile form of metachromatic leukodystrophy and a rare MRI white matter lesion type.
J Appl Genet. 2005;46(3):337-9.
The metachromatic leukodystrophy (MLD)-causing mutation c.1204 + 1G > A damages an intron-exon splice site recognition sequence. This results in a complete loss of enzymatic activity of arylsulfatase A (ARSA) protein molecules. We have found a late-infantile type MLD patient to be homozygous for this mutation, which was not reported earlier but is consistent with previous suggestions. Interestingly, the cerebral magnetic resonance imaging (MRI) in this patient displayed linear or punctate structures radiating in the demyelinated white matter, which resembled the patterns described in Pelizaeus-Merzbacher disease. It should be emphasised that whenever a cerebral MRI demonstrates the "tigroid" or "leopard-skin" demyelination pattern, not only Pelizaeus-Merzbacher disease but also metachromatic leukodystrophy should be considered; this suggests the necessity of ARSA activity estimations in patients with such specific MRI patterns. [Abstract/Link to Full Text]

Srebniak M, Popowska L, Wawrzkiewicz-Witkowska A, Tomaszewska A, Kazmierczak W
Subfertile couple with t(4;22)(q23;q11.2).
J Appl Genet. 2005;46(3):333-6.
A couple was referred for cytogenetic examination due to idiopathic miscarriages. The proband proved to be a carrier of a chromosomal translocation, and her partner's karyotype was found to be normal. The karyotype of the proband is 46,XX,t(4;22)(q23;q11.2) and can be regarded as a cause of the fertility problems in the investigated couple. The risk of further miscarriages is high, but the risk of progeny with an abnormal karyotype is rather low, as such progeny would probably have lethal imbalances. [Abstract/Link to Full Text]

Adler G, Widecka K, Peczkowska M, Dobrucki T, Placha G, Drozd R, Parczewski M, Januszewicz A, Gaciong Z, Ciechanowicz A
Genetic screening for glucocorticoid-remediable aldosteronism (GRA): experience of three clinical centres in Poland.
J Appl Genet. 2005;46(3):329-32.
Glucocorticoid-remediable aldosteronism (GRA), also known as familial hyperaldosteronism type I (FH-I, OMIM 103900), is a monogenic form of inherited hypertension caused by the presence of a chimaeric gene originating from an unequal cross-over between the CYP11B1 (11beta-hydroxylase) and CYP11B2 (aldosterone synthase) genes. The hybrid gene has the CYP11B1 sequence at the 5' end, including the promoter, and the CYP11B2 sequence at the 3' end. The aim of our study was to evaluate the prevalence of GRA in a Polish population of 129 patients with primary hyperaldosteronism (PHA) and 132 patients with essential hypertension (EH), through the use of a PCR-based test revealing the chimaeric gene. None of our PHA or EH patients was positive for the CYP11B1/CYP11B2 chimaeric gene. These data suggest that GRA is unlikely to be a common cause of hypertension in Polish subjects. However, the real prevalence of GRA in Poland, both in the high-risk group of individuals with primary hyperaldosteronism and in the general population, remains to be established. [Abstract/Link to Full Text]

Bauer PO, Matoska V, Zumrova A, Boday A, Doi H, Marikova T, Goetz P
Genotype/phenotype correlation in a SCA1 family: anticipation without CAG expansion.
J Appl Genet. 2005;46(3):325-8.
We report on a family with spinocerebellar ataxia type 1 (SCA1), in which the age at onset and the severity of the disease do not correlate with the number of CAG repeat units. Although a marked anticipation was observed in the proband, it was not a consequence of an expansion of the CAG tract. None of the expanded alleles contained CAT interruptions. The pathologic expansion in this family was stable during the paternal but not maternal transmission, where it expanded by one trinucleotide and unexpectedly did not lead to anticipation. Our observations suggest that factors other than the length of the CAG repeat play a considerable role in determination of the disease course. [Abstract/Link to Full Text]

Karpiński TM, Kostrzewska-Poczekaj M, Stachecki I, Mikstacki A, Szyfter K
Genotoxicity of the volatile anaesthetic desflurane in human lymphocytes in vitro, established by comet assay.
J Appl Genet. 2005;46(3):319-24.
The aim of the present study was to estimate the genotoxicity of desflurane, applied as a volatile anaesthetic. The potential genotoxicity was determined by the comet assay as the extent of DNA fragmentation in human peripheral blood lymphocytes in vitro. The comet assay detects DNA strand breaks induced directly by genotoxic agents as well as DNA fragmentation due to cell death. Another anaesthetic, halothane, already proved to be a genotoxic agent, was used as a positive control. Both analysed drugs were capable of increasing DNA migration in a dose-dependent manner under experimental conditions applied. The results of the study demonstrated that the genotoxicity of desflurane was comparable with that of halothane. However, considering the pharmacodynamics of both drugs, the genotoxic activity of desflurane may be connected with a less harmful effect on the exposed patients or medical staff. [Abstract/Link to Full Text]

Dybus A, Knapik K
A new PCR-RFLP within the domestic pigeon (Columba livia var. domestica) cytochrome b (MTCYB) gene.
J Appl Genet. 2005;46(3):315-7.
A total of 244 domestic pigeons (Columba livia var. domestica) were genotyped using the PCR-RFLP method. A 999 bp fragment of the MTCYB gene was amplified. The amplification products were digested with restriction enzymes. A PCR-RFLP for the MvaI restriction enzyme was observed. Frequencies of alleles were as follows: MTCYB(C): 0.926, MTCYB(G): 0.074. The frequencies of MTCYB/MvaI alleles found in this study for non-homing pigeons deviate considerably from the values found for homing/racing pigeons (allele MTCYB(G) occurred only in the non-homing breeds). [Abstract/Link to Full Text]

Gruszczyńska J, Brokowska K, Charon KM, Swiderek WP
Restriction fragment length polymorphism of exon 2 Ovar-DRB1 gene in Polish Heath Sheep and Polish Lowland Sheep.
J Appl Genet. 2005;46(3):311-4.
Exon 2 of the Ovar-DR gene is known to encode the MHC outer domain (alpha or beta chain) that forms the binding area for presented antigens. The study was aimed at analysing exon 2 Ovar-DRB1 gene polymorphism in Polish Heath Sheep and Polish Lowland Sheep (Zelazna variety). A total of 101 and 99 ewes of the respective breeds were included in this study. We identified 65 different haplotypes in Polish Heath Sheep and 68 in Polish Lowland Sheep. The PCR-RFLP method and sequencing of PCR products made it possible to identify two new sequences of exon 2 of the Ovar-DRB1 gene (AY230000 and AY248695). A distinct polymorphism in the exon 2 sequence presents possibilities for immune response toward a great variety of pathogens. [Abstract/Link to Full Text]

Yilmaz A, Davis ME, Hines HCh, Chung H
Detection of two nucleotide substitutions and putative promoters in the 5' flanking region of the ovine IGF-I gene.
J Appl Genet. 2005;46(3):307-9.
The objective of this study was to search for polymorphisms and gene regulatory sequences in the 5' flanking region of the sheep insulin-like growth factor I (IGF-I) gene. PCR-SSCP analysis of the 5' flanking region revealed three banding patterns. Family study indicated that these patterns in mixed breed sheep corresponded with three genotypes (with their frequencies in parentheses) AA (0.70), AB (0.25), and BB (0.05), which arose from a one-locus, two allele (A, B) polymorphism. Genotypic frequencies in 22 purebred Polypay sheep were AA (0.77) and AB (0.23). Calculated frequency of the A allele in Polypays was 0.89. No deviation from Hardy-Weinberg equilibrium was detected in this study. Fragments amplified using DNA from homozygous individuals were sequenced and aligned next to each other. A T to C transition and a G to C transversion were found at positions 179 and 181, respectively, of the amplified PCR product, resulting in recognition sites for Bsp143II and HaeI. Analysis of a fragment of 2,162 base pairs upstream of Exon 1, assembled from sheep ESTs and sequence of our amplified PCR products, revealed a promoter sequence approximately 100 bp downstream of the polymorphic sites. The assembled DNA fragment shared 70% sequence homology between sheep and human. These results suggest that sequence of the 5' flanking region of IGF-I gene and location of the IGF-I promoters are similar in human and sheep. [Abstract/Link to Full Text]
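The allele frequency reported for the Polypay sheep follows directly from the genotype frequencies, as the short sketch below illustrates. The function names are hypothetical, and the Hardy-Weinberg helper is included only to show what the expected genotype proportions would be; the abstract does not report the sample counts needed to repeat the authors' test.

```python
def allele_a_frequency(f_aa: float, f_ab: float, f_bb: float = 0.0) -> float:
    """Frequency of allele A from genotype frequencies at a two-allele locus."""
    if abs(f_aa + f_ab + f_bb - 1.0) > 1e-6:
        raise ValueError("genotype frequencies must sum to 1")
    # Each AA individual carries two A alleles, each AB individual carries one.
    return f_aa + f_ab / 2


def hardy_weinberg_expected(p: float) -> tuple:
    """Expected (AA, AB, BB) genotype frequencies for allele frequency p."""
    q = 1.0 - p
    return (p * p, 2 * p * q, q * q)


# Polypay sheep: AA = 0.77, AB = 0.23 (no BB animals reported).
p_polypay = allele_a_frequency(0.77, 0.23)   # about 0.885, reported as 0.89

# Mixed breed sheep: AA = 0.70, AB = 0.25, BB = 0.05.
p_mixed = allele_a_frequency(0.70, 0.25, 0.05)   # about 0.825
expected_mixed = hardy_weinberg_expected(p_mixed)
```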

Zabek T, Nogaj A, Radko A, Nogaj J, Słota E
Genetic variation of Polish endangered Biłgoraj horses and two common horse breeds in microsatellite loci.
J Appl Genet. 2005;46(3):299-305.
Genetic variation of endangered Biłgoraj horses and two common Polish horse breeds was compared with the use of 12 microsatellite loci (AHT4, AHT5, ASB2, HMS2, HMS3, HMS6, HMS7, HTG4, HTG6, HTG7, HTG10, VHL20). Lower allelic diversity was detected in all investigated populations in comparison to other studies. Large differences in the frequencies of microsatellite alleles between Biłgoraj horses and two other horse breeds were discovered. In all polymorphic loci all investigated breeds were in the Hardy-Weinberg equilibrium. Mean Fis values and the results of a test for the presence of a recent bottleneck were non-significant in all studied populations. Comparable values of observed and expected gene diversity indicate no substantial loss of genetic variation in the Biłgoraj population and two other breeds. The lowest variability observed in the investigated group of Thoroughbred horses was confirmed. About 10% of genetic variation is explained by differences between breeds. Values of pairwise Fst and two measures of genetic distance demonstrated that Biłgoraj horses are distantly related to both common horse breeds. [Abstract/Link to Full Text]

Pradeep AR, Chatterjee SN, Nair CV
Genetic differentiation induced by selection in an inbred population of the silkworm Bombyx mori, revealed by RAPD and ISSR marker systems.
J Appl Genet. 2005;46(3):291-8.
Artificial selection has been widely utilized in breeding programmes concerning the commercially important silk-producing insect Bombyx mori. Selection increases the frequency of homozygotes and makes homozygous effects stronger. Molecular variation induced by selection in the inbred population of B. mori strain Nistari was assessed in terms of genic differentiation by using a polymorphic profile generated by RAPD and ISSR marker systems. Artificial selection for longer larval duration (LLD) for 4 generations resulted in a significant prolongation of larval duration (F = 89.28; P = 5.14 × 10^-7). The lines selected for shorter larval duration (SLD) were not significantly different from the control group. RAPD and ISSR primers generated polymorphic profiles when amplified with genomic DNA of individuals of LLD and SLD lines. Distinct markers specific to LLD individuals were observed from the 3rd generation and indicated selection-induced differentiation of allelic variants for longer larval duration. Both SLD and LLD were characterized by high gene diversity (h ≈ 0.197) and total heterozygosity (Ht ≥ 0.26), low homogeneity (chi-square test, p < 0.005) as well as a large coefficient of gene differentiation (Gst ≥ 0.42) but low gene flow (Nm ≤ 0.42). Genetic distance was the highest (0.824) between 3rd generations of SLD and LLD. High heterozygosity and prolonged larval duration substituted for shorter larval duration (the traditional trait of fitness) in the Nistari LLD larvae. [Abstract/Link to Full Text]

Song C, Gao B, Teng Y, Wang X, Wang Z, Li Q, Mi H, Jing R, Mao J
MspI polymorphisms in the 3rd intron of the swine POU1F1 gene and their associations with growth performance.
J Appl Genet. 2005;46(3):285-9.
The study aimed to compare MspI polymorphisms in the 3rd intron of the porcine gene encoding the pituitary-1 transcription factor (Pit-1, renamed as POU1F1) among 5 breeds and to determine the associations between its genotypes and growth performance in a commercial pig population by using the PCR-RFLP technique. Significant differences in genotypic and allelic frequencies were found between the meat-type and fat-type breeds (P < 0.05), and between miniature pigs and others (P < 0.05). No breed deviated from the Hardy-Weinberg equilibrium (verified by chi-square test). The general linear model analysis revealed that higher body weight on day 180 (BW180) and average daily gain (ADG) were significantly associated with POU1F1 DD genotype (P < 0.05). The differences in BW180 and ADG between DD pigs and both CD and CC pigs were significant (P < 0.05), and the DD pigs had a significantly higher body weight on day 45 (BW45) and on day 70 (BW70) than CC pigs (P < 0.05). All measured growth traits, except for body weight at birth (BWB), showed higher values in DD pigs. The D allele had a favorable positive effect on growth traits. Thus POU1F1 is a potential major gene or marker for growth traits. [Abstract/Link to Full Text]

Trivedi M, Dhawan OP, Tiwari RK, Sattar A
Genetic studies on collar rot resistance in opium poppy (Papaver somniferum L.).
J Appl Genet. 2005;46(3):279-84.
The collar rot disease has been reported recently and occurs at the 10-12-leaf stage of plants of opium poppy. Infected plants topple down and dry prematurely due to fast rotting at the collar region. The inoculum for this study was multiplied on the cornmeal-sand culture. Genetic ratios were calculated by the chi-square test. Inheritance studies on this disease show a monogenic pattern of segregation with the ratio of 3 : 1 at F2, 1 : 2 : 1 at F3 and 1 : 1 at the backcross. Such genetic ratios clearly indicate that a single recessive gene (rs-1) is responsible for disease resistance in opium poppy. The inference drawn on the basis of the present study will be a great help in the future breeding programme of opium poppy for collar rot resistance. [Abstract/Link to Full Text]
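A chi-square goodness-of-fit test of the kind mentioned in this abstract can be sketched as follows. The plant counts are hypothetical, since the abstract reports only the ratios; the function name is likewise an illustration. The critical value 3.841 is the standard 5% threshold for one degree of freedom, which applies to two-class ratios such as 3 : 1 and 1 : 1.

```python
from typing import Sequence

CHI2_CRITICAL_DF1_P05 = 3.841  # 5% critical value, 1 degree of freedom


def chi_square_gof(observed: Sequence[float], ratio: Sequence[float]) -> float:
    """Chi-square statistic for observed class counts against a genetic ratio."""
    total = sum(observed)
    ratio_sum = sum(ratio)
    expected = [total * r / ratio_sum for r in ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical F2 counts (susceptible, resistant); not data from the study.
hypothetical_f2 = [148, 52]
chi2 = chi_square_gof(hypothetical_f2, [3, 1])

# A statistic below the critical value means the 3 : 1 ratio is not rejected.
fits_3_to_1 = chi2 < CHI2_CRITICAL_DF1_P05
```

With a single recessive resistance gene, the F2 is expected to segregate 3 susceptible : 1 resistant, which is the ratio tested here.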

Krzakowa M, Matras J
Genetic variability among beech (Fagus sylvatica L.) populations from the Sudety Mountains, in respect of peroxidase and malate dehydrogenase loci.
J Appl Genet. 2005;46(3):271-7.
Individual trees growing in five populations of European beech (Fagus sylvatica L.) in the Sudety Mountains were investigated in respect of variability of peroxidases (2 loci) and malate dehydrogenase (1 locus). Differences between populations were illustrated by a dendrogram constructed on the basis of Hedrick's (1974) genetic distances. The mean GST coefficient (=0.0333) value demonstrated the higher level of intra-population variability, as compared to the inter-population (DST = 0.0149) variability. [Abstract/Link to Full Text]

Górecka K, Krzyzanowska D, Górecki R
The influence of several factors on the efficiency of androgenesis in carrot.
J Appl Genet. 2005;46(3):265-9.
The influence of cultivar, donor plant and culture procedure on the efficiency of androgenesis was studied in carrot anther culture. Experiments were carried out on five carrot cultivars: CxC 9900 F1, Lucky B F1, HCM, Beta III and Perfekcja, which were chosen because of their high carotene contents. Two procedures of anther culture were compared: (1) incubation in darkness for two weeks, followed by exposure to continuous light and transfer onto a fresh medium of the same composition; and (2) incubation in darkness until embryos appeared, without transfer onto a fresh medium. The temperature was kept at +27 °C throughout. Genotype played an important role in the process of androgenesis in carrot anther culture. The efficiency was the highest in cv. HCM, at 5.6 embryos per 100 anthers. Considerable differences in the capacity for androgenesis were observed between individual donor plants. The number of embryos obtained per 100 anthers for cv. HCM varied from 0.0 to 48.9. The second procedure of anther culture proved to be more efficient, cheaper and less complicated. [Abstract/Link to Full Text]

Khanna R, Bansal UK, Saini RG
Genetics of durable resistance to leaf rust and stripe rust of an Indian wheat cultivar HD2009.
J Appl Genet. 2005;46(3):259-63.
The Indian bread wheat cultivar HD2009 has maintained its partial resistance to leaf rust and stripe rust in India since its release in 1976. To examine the nature, number and mode of inheritance of its genes for partial leaf rust and stripe rust resistance, this cultivar was crossed with cultivar WL711, which is susceptible to leaf rust and stripe rust. The F1, F2, F3 and F5 generations from this cross were assessed separately for adult plant disease severity under artificial epidemic of race 77-5 of leaf rust and race 46S119 of stripe rust. Segregation for rust reaction in the F2, F3 and F5 generations indicated that resistance to each of these rust diseases is based on 2 genes, each with additive effects. Although the leaf rust resistance of HD2009 is similar in expression to that conferred by the gene Lr34, cultivar HD2009, unlike the wheats carrying this gene, did not show leaf tip necrosis, a morphological marker believed to be tightly linked to the leaf rust resistance gene Lr34. Thus, the non-hypersensitive resistance of HD2009 was ascribed to genes other than Lr34. [Abstract/Link to Full Text]

Radłowski M
Proteolytic enzymes from generative organs of flowering plants (Angiospermae).
J Appl Genet. 2005;46(3):247-57.
Pollen proteases were discovered over 100 years ago, whereas the enzymes from female tissues have been used since the Roman era in simple biotechnological processes. In the last decade great progress has been made in studies on plant proteases, including those from the generative organs. This paper reviews reports published in the last decade concerning purification, properties and localization of proteases from generative parts of flowering plants against the background of the general proteolytic machinery of the plant. Special attention is paid to differences in protease structure and properties in comparison to other enzymes from the same catalytic classes. Participation of the proteases in all steps of pollen-pistil interaction as well as in pollen tube growth is discussed. Further intensive studies using native substrates are necessary to understand the role of proteases in pollination. [Abstract/Link to Full Text]

Podgórska B, Chec E, Ulanowska K, Wegrzyn G
Optimisation of the microbiological mutagenicity assay based on genetically modified Vibrio harveyi strains.
J Appl Genet. 2005;46(2):241-6.
Recently, we have developed a novel assay designed for detection of mutagenic pollution of the marine environment. This assay is based on the use of a series of genetically modified strains (named BB7, BB7M, BB7X and BB7XM) of a marine bacterium Vibrio harveyi. Sensitivity of the V. harveyi mutagenicity assay was found to be similar to, or even somewhat higher than, that of the commonly used Ames test. Subsequent studies indicated that this assay may be useful in assessment of mutagenic contamination of the marine environment. Nevertheless, we assumed that improvement of this assay is still possible, and thus we aimed to optimise its procedures. Here we present our research on the optimisation of the V. harveyi mutagenicity assay, which indicated that different tester strains used in this assay give the best results depending upon the experimental conditions employed. Incubation of bacteria in a buffer, rather than in a nutrient broth, containing a mutagen, increased the efficiency of the assay with BB7 and BB7M strains, but had a deleterious effect in the case of BB7X and BB7XM. The latter couple of strains revealed higher mutagenicity in the plate assay, as compared to the liquid medium assay. However, the opposite effect was observed for BB7 and BB7M. Low-dose (1 J m(-2)) UV irradiation, as well as 30 min incubation in 0.1 M CaCl2, had no significant effect on the efficiency of the assay when using BB7 and BB7M, whereas the number of mutagen-induced mutants of BB7X and BB7XM strains increased about two times under these conditions. Our previous experiments indicated that various tester strains revealed different sensitivity to particular mutagens. Thus, a series of strains should be used in the assay. Results presented in this report show that different conditions should be used for two pairs of the tester strains: BB7 and BB7M, and BB7X and BB7XM. [Abstract/Link to Full Text]

Sułek A, Hoffman-Zacharska D, Krysa W, Szirkowiec W, Fidziańska E, Zaremba J
CAG repeat polymorphism in the androgen receptor (AR) gene of SBMA patients and a control group.
J Appl Genet. 2005;46(2):237-9.
Spinobulbar muscular atrophy (SBMA) is an X-linked form of motor neuron disease characterized by progressive atrophy of the muscles, dysphagia, dysarthria and mild androgen insensitivity. SBMA is caused by CAG repeat expansion in the androgen receptor gene. CAG repeat polymorphism was analysed in a Polish control group (n = 150) and patients suspected of SBMA (n = 60). Normal and abnormal ranges of CAG repeats were established in the control group and in 21 patients whose clinical diagnosis of SBMA was molecularly confirmed. The ranges are similar to those reported for other populations. [Abstract/Link to Full Text]

Gedrange T, Büttner C, Schneider M, Oppitz R, Harzer W
Myosin heavy chain protein and gene expression in the masseter muscle of adult patients with distal or mesial malocclusion.
J Appl Genet. 2005;46(2):227-36.
The aim of this study was to determine the amount of myosin heavy chain (MyHC) proteins and MyHC mRNA in muscles of patients with different positions of the mandible. Ten adult patients for orthognathic surgery were divided into two groups: distal and mesial malocclusion. The mRNA expression of two MyHC isoforms of the anterior and posterior part of the right and left side of the human masseter muscle was analysed with a competitive RT-PCR assay. An exogenous template that includes oligonucleotide sequences specific for sarcomeric MyHC isoforms (1 and 2x) was constructed and utilized as competitor. Different isoforms of the MyHC protein were identified by Western blot analysis. In the total mRNA pool of the masseter muscle, the MyHC 1 mRNA level was 25.5 +/- 7.6% and the MyHC 2x mRNA was 2.5 +/- 1.2%. The anterior part of the masseter muscle from patients with distal occlusion contained more type 1 and 2x MyHC mRNA, as compared to patients with mesial occlusion (P < 0.05). No difference in the protein distribution was observed. The differences in mRNA expression may be caused by the enforced stress of the masticatory muscle in distal occlusion because of the disadvantageous pivot. [Abstract/Link to Full Text]

Cao W, Hunter R, Strnatka D, McQueen CA, Erickson RP
DNA constructs designed to produce short hairpin, interfering RNAs in transgenic mice sometimes show early lethality and an interferon response.
J Appl Genet. 2005;46(2):217-25.
Arylamine N-acetyltransferase (NAT) genes were targeted for inhibition with short hairpin RNA (shRNA) expressed from two different RNA polymerase III promoters. Constructs were developed for NAT1 and NAT2, the endogenous mouse genes, and for human NAT1. There were fetal and neonatal deaths with these constructs, perhaps due in part to an interferon response as reflected in increases in oligoadenylate synthetase I mRNA levels. Seven out of 8 founders with the U6 promoter generated offspring but only 2 gave positive offspring. Out of 15 founders for H1-promoted constructs, only 4 had positive offspring. When transgenic lines were successfully established, the expression of the targeted genes was variable between animals and was not generally inhibitory. [Abstract/Link to Full Text]

Recent Articles in Genetics and Molecular Research

Okamoto HT, Soares CM, Pereira M
Comparative analyses of the structure of the 1,3-beta-glucan synthase gene in Paracoccidioides brasiliensis isolates.
Genet Mol Res. 2006;5(2):407-18.
The evolutionary origin and significance of spliceosomal introns have been the subject of many investigations. Two theories, the "introns-early" theory and the "introns-late" theory, have been proposed to explain the evolution of introns in eukaryotic genes. Intron position is generally conserved in paralogue and orthologue genes. Some introns occur at similar but not necessarily identical positions in homologous genes that are separated by great evolutionary distances. This event can be explained by insertion, loss or movement of the intron over short distances. Intron loss and gain events are unique in evolution and can be useful as markers for phylogenetic analyses. The insertion of introns at an identical position suggests a common ancestor gene. Here we analyzed, using PCR and RT-PCR, the structure of the 1,3-beta-glucan synthase gene (FKS) in several clinical isolates of Paracoccidioides brasiliensis (Pb): isolates Pb 01, Pb 4940, Pb 8515, Pb 8311, Pb 8334, Pb 4268, Pb 1668, and Pb E. Our results showed that seven of the isolates examined had identical structures concerning the position of introns in PbFKS1. PbFKS4940 had the intron described at the 3' end but had lost the one at the 5' end. The presence of the PbFKS4940 transcript suggests that it could be a functional gene. These data suggest a divergent evolution for introns with regard to the 1,3-beta-glucan synthase gene in P. brasiliensis isolates. [Abstract/Link to Full Text]

Fernandez R, Pasaro E
Molecular analysis of an idic(Y)(qter -->p11.32::p11.32-->qter) chromosome from a female patient with a complex karyotype.
Genet Mol Res. 2006;5(2):399-406.
A female patient with a structurally abnormal idic(Y) (p11.32) chromosome was studied using fluorescence in situ hybridization and PCR to define the precise position of the breakpoint. The patient had a complex mosaic karyotype with eight cell lines and at least two morphologically distinct derivatives from the Y chromosome. The rearrangement was a result of a meiosis I exchange between sister chromatids at the pseudoautosomal region, followed by centromere misdivision at meiosis II. Due to instability of the dicentric Y chromosome, new cell lines later arose because of mitotic errors occurring during embryonic development. Physical examination revealed a normal female phenotype without genital ambiguity, a normal uterus and rudimentary gonads which were surgically removed. [Abstract/Link to Full Text]

Aráoz HV, Torrado M, Barreiro C, Chertkoff L
A combination of five short tandem repeats of chromosome 15 significantly improves the identification of Prader-Willi syndrome etiology in the Argentinean population.
Genet Mol Res. 2006;5(2):390-8.
Prader-Willi syndrome (PWS) is a multisystemic disorder caused by the loss of expression of paternally transcribed genes in the PWS critical region of chromosome 15. Various molecular mechanisms are known to lead to PWS: deletion 15q11-q13 (75% of cases), maternal uniparental disomy (matUPD15) (23%) and imprinting defects (2%). FISH and microsatellite analysis are required to establish the molecular etiology, which is essential for appropriate genetic counseling and care management. We characterized an Argentinean population, using five microsatellite markers (D15S1035, D15S11, D15S113, GABRB3, D15S211) chosen to develop an appropriate cost-effective method to establish the parental origin of chromosome 15 in nondeleted PWS patients. The range of heterozygosity for these five microsatellites was 0.59 to 0.94. The average heterozygosity obtained for joint loci was 0.81. The parental origin of chromosome 15 was established by microsatellite analysis in 19 of 21 nondeleted PWS children. We also examined the origin of the matUPD15; as expected, most of the disomies were due to a maternal meiosis I error. The molecular characterization of this set of five microsatellites with high heterozygosity and polymorphism information content improves the diagnostic algorithm of Argentinean PWS children, contributing significantly to adequate genetic counseling of such families. [Abstract/Link to Full Text]
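
The heterozygosity values quoted in this abstract can be illustrated with a minimal sketch. It assumes the simple expected-heterozygosity formula H = 1 - sum(p_i^2) over allele frequencies p_i at one locus; the abstract does not state which estimator the authors used, and the function name and allele labels below are hypothetical.

```python
from collections import Counter


def expected_heterozygosity(alleles):
    """Return 1 - sum(p_i^2) for the alleles observed at one locus.

    `alleles` is a list with one entry per sampled chromosome
    (hypothetical input format, not taken from the abstract).
    """
    counts = Counter(alleles)
    total = sum(counts.values())
    return 1.0 - sum((count / total) ** 2 for count in counts.values())


# Illustrative data only: two alleles at equal frequency.
example_locus = ["A", "A", "B", "B"]
print(expected_heterozygosity(example_locus))  # 0.5
```

A marker with many alleles at similar frequencies scores close to 1, which is why highly polymorphic microsatellites are more likely to be informative about parental origin.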

Guo X, Xu G, Zhang Y, Wen X, Hu W, Fan L
Incongruent evolution of chromosomal size in rice.
Genet Mol Res. 2006;5(2):373-89.
To investigate genome size evolution, it is usually informative to compare closely related species that vary dramatically in genome size. A whole genome duplication (polyploidy) that occurred in rice (Oryza sativa) about 70 million years ago has been well documented based on current genome sequencing. The presence of three distinct duplicate blocks from the polyploidy, of which one duplicated segment in a block is intact (no sequencing gap) and less than half the length of its syntenic duplicate segment, provided an excellent opportunity for elucidating the causes of their size variation during the post-polyploid time. The results indicated that incongruent patterns (shrunken, balanced and inflated) of chromosomal size evolution occurred in the three duplicate blocks, spanning over 30 Mb among chromosomes 2, 3, 6, 7, and 10, with an average of 20.3% for each. DNA sequences of chromosomes 2 and 3 appeared to have become as short as about half of their initial sequence lengths, chromosomes 6 and 7 had remained basically balanced, and chromosome 10 had become dramatically enlarged (approximately 70%). The size difference between duplicate segments of rice was mainly caused by variations in non-repetitive DNA loss. Amplification of long terminal repeat retrotransposons also played an important role. Moreover, a relationship seems to exist between the chromosomal size differences and the nonhomologous combination in corresponding regions in the rice genome. These findings help shed light on the evolutionary mechanism of genomic sequence variation after polyploidy and genome size evolution. [Abstract/Link to Full Text]

Brancaleoni GH, Lourenzoni MR, Degrève L
Study of the influence of ethanol on basic fibroblast growth factor structure.
Genet Mol Res. 2006;5(2):350-72.
The growth of cells is controlled by stimulatory or inhibitory factors. More than twenty different families of polypeptide growth factors have been structurally and functionally characterized. Basic fibroblast growth factor (bFGF) of the fibroblast growth factor family was characterized in 1974 as having proliferative activity for fibroblastic cells. The inhibitory effects of ethanol on cell proliferation result from interference with mitogenic growth factors (e.g., bFGF, EGF and PDGF). In order to better understand the mode of action of bFGF, particularly regarding the influence of ethanol on the biological activity of bFGF, three recombinant bFGF mutants were produced (M6B-bFGF, M1-bFGF and M1Q-bFGF). In the present study, wild bFGF and these mutants were examined by molecular dynamics simulations in systems consisting of a solute molecule in ethanol solution at 298 K and physiological pH over 4.0 ns. The hydrogen bonds, the root mean square deviations and specific radial distribution functions were employed to identify changes in the hydrogen bond structures, in the stability and in the approximation of groups in the different peptides to get some insight into the biological role of specific bFGF regions. The detailed description of the intramolecular hydrogen bonds, hydration, and intermolecular hydrogen bonds taking place in bFGF and its mutants in the presence of ethanol established that the residues belonging to the beta5 and beta9 strands, especially SER-73(beta5), TYR-112(beta9), THR-114(beta9), TYR-115(beta9), and SER-117(beta9), are the regions most affected by the presence of ethanol molecules in solution. [Abstract/Link to Full Text]

Onrat ST, Aşçi F, Ozkan M
A cytogenetics study of Hydrodroma despiciens (Müller, 1776) (Acari: Hydrachnellae: Hydrodromidae).
Genet Mol Res. 2006;5(2):342-9.
The karyotypes of water mites (Acari: Hydrachnellae: Hydrodromidae) are largely unknown. The present investigation is the first report of a study designed to characterize the chromosomes of water mites. The study was carried out with specimens of Hydrodroma despiciens collected from Eber Lake in Afyon, Turkey. Several different methods were tried to obtain chromosomes of this species. However, somatic cell culture proved to be the most effective for the preparation of chromosomes. In the present study, we determined the diploid chromosome number of Hydrodroma despiciens to be 2n = 16. However, a large metacentric chromosome was found in each metaphase, which we believed to be the X chromosome. We could not determine the sex chromosomes of this species. This study is the first approach to the cytogenetic characterization of this water mite group. Furthermore, these cytogenetic data will contribute to the understanding of the phylogenetic relationship among water mites. To our knowledge, this is the first report on the cytogenetics of water mites. [Abstract/Link to Full Text]

Fileto R, Kuser PR, Yamagishi ME, Ribeiro AA, Quinalia TG, Franco EH, Mancini AL, Higa RH, Oliveira SR, Santos EH, Vieira FD, Mazoni I, Cruz SA, Neshich G
PDB-Metrics: a web tool for exploring the PDB contents.
Genet Mol Res. 2006;5(2):333-41.
PDB-Metrics is a component of the Diamond STING suite of programs for the analysis of protein sequence, structure and function. It summarizes the characteristics of the collection of protein structure descriptions deposited in the Protein Data Bank (PDB) and provides a Web interface to search and browse the PDB, using a variety of alternative criteria. PDB-Metrics is a powerful tool for bioinformaticians to examine the data span in the PDB from several perspectives. Although other Web sites offer some similar resources to explore the PDB contents, PDB-Metrics is among those with the most complete set of such facilities, integrated into a single Web site. This program has been developed using SQLite, a C library that provides all the query facilities of a database management system. [Abstract/Link to Full Text]

Mukhopadhyaya PN, Jha M, Muraleedharan P, Gupta RR, Rathod RN, Mehta HH, Khoda VK
Simulation of normal, carrier and affected controls for large-scale genotyping of cattle for factor XI deficiency.
Genet Mol Res. 2006;5(2):323-32.
An insertion mutation within exon 12 of the factor XI gene has been described in Holstein cattle. This has opened the prospect for large-scale screening of cattle using the polymerase chain reaction (PCR) technique for the rapid identification of heterozygous animals. To facilitate such a screening process, the normal and mutant alleles of the factor XI gene, represented by 244- and 320-bp PCR-amplified fragments, were individually cloned in Escherichia coli using a multicopy plasmid cloning vehicle to generate pFXI-N and pFXI-M, respectively. The authenticity of the inserts was confirmed by nucleotide sequencing. A nested PCR method was developed, by which PCR amplicons generated from primers with annealing sites on the recombinant plasmids and flanking the insert were used as templates for amplification of the diagnostic products using factor XI gene-specific primers. An equimolar mixture of both PCR amplicons, originating from pFXI-N and pFXI-M, constituted the carrier control while the individual amplicons were the affected and normal controls. The controls were used as references for in-gel comparison to screen a population of 307 cattle and 259 water buffaloes; the frequency of the mutant allele was found to be 0. No DNA size standards were required in this study. The simulated control DNA samples representing normal, carrier and affected cattle have the potential to help in large-scale screening of a cattle population for individuals that are carriers or affected by factor XI deficiency. [Abstract/Link to Full Text]

Clarizia AD, Bastos-Rodrigues L, Pena HB, Anacleto C, Rossi B, Soares FA, Lopes A, Rocha JC, Caballero O, Camargo A, Simpson AJ, Pena SD
Relationship of the methylenetetrahydrofolate reductase C677T polymorphism with microsatellite instability and promoter hypermethylation in sporadic colorectal cancer.
Genet Mol Res. 2006;5(2):315-22.
The methylenetetrahydrofolate reductase (MTHFR) C677T polymorphism is associated with the expression of a thermolabile enzyme with decreased activity that influences the pool of methyl-donor molecules. Several studies have reported an association between C677T polymorphism and susceptibility to colorectal cancer (CRC). Considering that methylation abnormalities appear to be important for the pathogenesis of CRC, we examined the correlation between the genotype of the MTHFR C677T polymorphism, hypermethylation of the promoter region of five relevant genes (DAPK, MGMT, hMLH1, p16(INK4a), and p14(ARF)), and microsatellite instability, in 106 patients with primary CRCs in Brazil. We did not find significant differences in the genotypic frequencies of the MTHFR C677T polymorphism when one or more loci were hypermethylated. However, we did find a significant excess of 677TT individuals among patients with CRC who had microsatellite instability. This strong association was independent of the methylation status of hMLH1 and of the biogeographical genomic ancestry of the patients. Although the mechanism responsible for the link between the C677T polymorphism and microsatellite instability was not apparent, this finding may provide a clue towards a better understanding of the pathogenesis of microsatellite instability in human colorectal cancer. [Abstract/Link to Full Text]

Dario C, Carnicella D, Dario M, Bufano G
Morphological evolution and heritability estimates for some biometric traits in the Murgese horse breed.
Genet Mol Res. 2006;5(2):309-14.
A data set concerning 1,816 subjects entered in the Italian Horse Registry from 1925 to 2002 was analyzed to investigate the morphological evolution of the Murgese horse and to obtain useful elements to enhance breeding practices. Three basic body measurements (height at withers, chest girth, and cannon bone circumference) were considered for each subject. Heritabilities were calculated for each parameter to infer the growth and development traits of this breed. Over the past 20 years the Murgese horse has undergone considerable changes, passing from a typical mesomorphic structure (height at withers: 156.30 and 151.04 cm; chest girth: 185.80 and 176.11 cm; cannon bone: 21.10 and 19.82 cm for males and females, respectively) to a mesodolichomorphic structure (height at withers: 160.31 and 156.44 cm; chest girth: 187.89 and 182.48 cm; cannon bone: 21.07 and 20.37 cm, for males and females, respectively). Due to these changes and to its characteristic strength and power, the Murgese, which was once used in agriculture and for meat production (at the end of its life), is now involved in sports, mainly in trekking and equestrian tourism. The heritability estimates for the three body measurements were found to be 0.24, 0.39 and 0.44. [Abstract/Link to Full Text]

de Melo RC, Lopes CE, Fernandes FA, da Silveira CH, Santoro MM, Carceroni RL, Meira W, Araújo Ade A
A contact map matching approach to protein structure similarity analysis.
Genet Mol Res. 2006;5(2):284-308.
We modeled the problem of identifying how close two proteins are structurally by measuring the dissimilarity of their contact maps. These contact maps are colored images, in which the chromatic information encodes the chemical nature of the contacts. We studied two conceptually distinct image-processing algorithms to measure the dissimilarity between these contact maps; one was a content-based image retrieval method, and the other was based on image registration. In experiments with contact maps constructed from the Protein Data Bank, our approach was able to identify, with greater than 80% precision, instances of monomers of apolipoproteins, globins, plastocyanins, retinol binding proteins and thioredoxins, among the monomers of Protein Data Bank Select. The image registration approach was only slightly more accurate than the content-based image retrieval approach. [Abstract/Link to Full Text]

Nassar NM
Are genetically modified crops compatible with sustainable agriculture?
Genet Mol Res. 2006;5(1):91-2. [Abstract/Link to Full Text]

Quitzau JA, Meidanis J
A fully resolved consensus between fully resolved phylogenetic trees.
Genet Mol Res. 2006;5(1):269-83.
Nowadays, there are many phylogeny reconstruction methods, each with advantages and disadvantages. We explored the advantages of each method, putting together the common parts of trees constructed by several methods, by means of a consensus computation. A number of phylogenetic consensus methods are already known. Unfortunately, there is also a taboo concerning consensus methods, because most biologists see them mainly as comparators and not as phylogenetic tree constructors. We challenged this taboo by defining a consensus method that builds a fully resolved phylogenetic tree based on the most common parts of fully resolved trees in a given collection. We also generated results showing that this consensus is in a way a kind of "median" of the input trees; as such it can be closer to the correct tree in many situations. [Abstract/Link to Full Text]

Veiga DF, Vicente FF, Bastos G
Gene networks as a tool to understand transcriptional regulation.
Genet Mol Res. 2006;5(1):254-68.
Gene regulatory networks, or simply gene networks (GNs), have been shown to be a promising approach that the bioinformatics community has been developing for studying regulatory mechanisms in biological systems. GNs are built from the genome-wide high-throughput gene expression data that are often available from DNA microarray experiments. Conceptually, GNs are (un)directed graphs, where the nodes correspond to the genes and a link between a pair of genes denotes a regulatory interaction that occurs at the transcriptional level. In the present study, we had two objectives: 1) to develop a framework for GN reconstruction based on a Bayesian network model that captures direct interactions between genes through nonparametric regression with B-splines, and 2) to demonstrate the potential of GNs in the analysis of expression data of a real biological system, the yeast pheromone response pathway. Our framework also included a number of search schemes to learn the network. We present an intuitive notion of GN theory as well as the detailed mathematical foundations of the model. A comprehensive analysis of the consistency of the model when tested with biological data was done through the analysis of the GNs inferred for the yeast pheromone pathway. Our results agree fairly well with what was expected based on the literature, and we developed some hypotheses about this system. Using this analysis, we intended to provide a guide on how GNs can be effectively used to study transcriptional regulation. We also discussed the limitations of GNs and the future direction of network analysis for genomic data. The software is available upon request. [Abstract/Link to Full Text]

Mudado Mde A, Ortega JM
A picture of gene sampling/expression in model organisms using ESTs and KOG proteins.
Genet Mol Res. 2006;5(1):242-53.
The expressed sequence tag (EST) is an instrument of gene discovery. When available in large numbers, ESTs may be used to estimate gene expression. We analyzed gene expression by EST sampling, using the KOG database, which includes 24,154 proteins from Arabidopsis thaliana (Ath), 17,101 from Caenorhabditis elegans (Cel), 10,517 from Drosophila melanogaster (Dme), and 26,324 from Homo sapiens (Hsa), and 178,538 ESTs for Ath, 215,200 for Cel, 261,404 for Dme, and 1,941,556 for Hsa. BLAST similarity searches were performed to assign KOG annotation to all ESTs. We determined the amount of gene sampling or expression dedicated to each KOG functional category by each model organism. We found that the 25% most-expressed genes are frequently shared among these organisms. The KOG protein classification allowed the EST sampling calculation throughout the glycolysis pathway. We calculated the KOG cluster coverage and inferred that 50 to 80 K ESTs would efficiently cover 80-85% of the KOG database clusters in a transcriptome project. Since KOG is a database biased towards housekeeping genes, this is probably the number of ESTs needed to include the more commonly expressed genes in these organisms. We also examined a still unaddressed question: what is the minimum number of ESTs that should be produced in a transcriptome project? [Abstract/Link to Full Text]

Schrago CG
An empirical examination of the standard errors of maximum likelihood phylogenetic parameters under the molecular clock via bootstrapping.
Genet Mol Res. 2006;5(1):233-41.
The molecular clock theory has greatly enlightened our understanding of macroevolutionary events. Maximum likelihood (ML) estimation of divergence times involves the adoption of fixed calibration points, and the confidence intervals associated with the estimates are generally very narrow. The credibility intervals are inferred assuming that the estimates are normally distributed, which may not be the case. Moreover, calculation of standard errors is usually carried out by the curvature method and is complicated by the difficulty in approximating second derivatives of the likelihood function. In this study, a standard primate phylogeny was used to examine the standard errors of ML estimates via the bootstrap method. Confidence intervals were also assessed from the posterior distribution of divergence times inferred via Bayesian Markov Chain Monte Carlo. For the primate topology under evaluation, no significant differences were found between the bootstrap and the curvature methods. Also, Bayesian confidence intervals were always wider than those obtained by ML. [Abstract/Link to Full Text]
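
As a rough illustration of the bootstrap idea mentioned in this abstract, the sketch below estimates the standard error of an arbitrary estimator by resampling the data with replacement. It is a generic nonparametric bootstrap, not the authors' phylogenetic pipeline (where alignment sites would be resampled and divergence times re-estimated for each replicate); the function name, parameters and data are assumptions made for the example.

```python
import random
import statistics


def bootstrap_standard_error(sample, estimator, n_replicates=1000, seed=0):
    """Estimate the standard error of `estimator` by the bootstrap.

    Each replicate draws len(sample) items with replacement, applies
    the estimator, and the standard deviation of the replicate
    estimates is returned as the standard error.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(sample)
    replicates = []
    for _ in range(n_replicates):
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        replicates.append(estimator(resample))
    return statistics.stdev(replicates)


# Illustrative data only: standard error of the mean of a small sample.
data = [2.1, 2.5, 1.9, 3.0, 2.7, 2.2, 2.8, 2.4]
print(bootstrap_standard_error(data, statistics.mean))
```

Unlike the curvature method, this approach needs no second derivatives of the likelihood function, at the cost of re-running the estimator once per replicate.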

Saha S, Heber S
In silico prediction of yeast deletion phenotypes.
Genet Mol Res. 2006;5(1):224-32.
Analysis of gene deletions is a fundamental approach for investigating gene function. We evaluated an algorithm that uses classification techniques to predict the phenotypic effects of gene deletions in yeast. We used a modified simulated annealing algorithm for feature selection and weighting. The selected features with high weights were phylogenetic conservation scores for bacteria, fungi (excluding Ascomycota), Ascomycota (excluding Saccharomyces cerevisiae), plants, and mammals, degree of paralogy, and number of protein-protein interactions. Classification was performed by weighted k-nearest neighbor and with support vector machine algorithms. To demonstrate how this approach might complement existing experimental procedures, we applied our algorithm to predict essential genes and genes causing morphological alterations in yeast. [Abstract/Link to Full Text]
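
The weighted k-nearest neighbor step named in this abstract can be sketched as follows. This is a minimal, assumed formulation: feature weights scale a Euclidean distance and neighbors vote with inverse-distance weights. The abstract does not specify the authors' distance or voting scheme, and the feature values, weights and labels below are hypothetical.

```python
import math
from collections import defaultdict


def weighted_knn_predict(train, query, feature_weights, k=3):
    """Classify `query` from labelled `train` items.

    `train` is a list of (feature_tuple, label) pairs. Feature weights
    (for example, from a feature-selection step) scale each squared
    difference; closer neighbors receive larger votes.
    """
    def distance(a, b):
        return math.sqrt(
            sum(w * (x - y) ** 2 for w, x, y in zip(feature_weights, a, b))
        )

    neighbors = sorted(train, key=lambda item: distance(item[0], query))[:k]
    votes = defaultdict(float)
    for features, label in neighbors:
        # Small constant avoids division by zero for exact matches.
        votes[label] += 1.0 / (distance(features, query) + 1e-9)
    return max(votes, key=votes.get)


# Illustrative data only: (conservation score, interaction score) per gene.
training_set = [
    ((0.0, 0.0), "nonessential"),
    ((0.1, 0.0), "nonessential"),
    ((1.0, 1.0), "essential"),
    ((0.9, 1.0), "essential"),
]
print(weighted_knn_predict(training_set, (0.95, 0.9), (1.0, 1.0)))  # essential
```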

Bezerra WM, Carvalho CP, Moreira Rde A, Grangeiro TB
Establishment of a heterologous system for the expression of Canavalia brasiliensis lectin: a model for the study of protein splicing.
Genet Mol Res. 2006;5(1):216-23.
During its biosynthesis in developing Canavalia brasiliensis seeds, the lectin ConBr undergoes a form of protein splicing in which the order of the N- and C-domains of the protein is reversed. To investigate whether these events can occur in other eukaryotic organisms, an expression system based on Pichia pastoris cells was established. A DNA fragment encoding prepro-ConBr was cloned into the vector pPICZB, and the recombinant plasmid was transformed into P. pastoris strain GS115. Ten clones were screened for effective recombinant protein production. Based on Western blot analysis of the two clones with the highest level of protein expression: 1) diffuse high-molecular-mass immunoreactive bands were produced as early as 24 h after induction; 2) a single high-molecular-mass protein was secreted into the medium, and 3) a significant fraction of the recombinant polypeptides that cross-reacted with anti-ConBr antibodies comprised a band of approximately 34.5 kDa. Diffuse protein bands with high molecular masses are attributed to hyperglycosylation at the single potential N-glycosylation site located in the linker peptide of prepro-ConBr. In contrast, native ConBr is made up of three polypeptides, the intact alpha chain (aa 1-237) and the fragments beta (aa 1-118) and gamma (aa 119-237), which have apparent molecular masses of 30, 16 and 12 kDa, respectively. Apparently, the yeast P. pastoris is not able to carry out all the complex post-translational proteolytic processing necessary for the biosynthesis of ConBr. [Abstract/Link to Full Text]

Araújo LV, Soares MA, Oliveira SM, Chequer P, Tanuri A, Sabino EC, Ferreira JE
DBCollHIV: a database system for collaborative HIV analysis in Brazil.
Genet Mol Res. 2006;5(1):203-15.
We developed a database system for collaborative HIV analysis (DBCollHIV) in Brazil. The main purpose of our DBCollHIV project was to develop an HIV-integrated database system with analytical bioinformatics tools that would support the needs of Brazilian research groups for data storage and sequence analysis. Whenever authorized by the principal investigator, this system also allows the integration of data from different studies and/or the release of the data to the general public. The development of a database that combines sequences associated with clinical/epidemiological data is difficult without the active support of interdisciplinary investigators. A functional database that securely stores data and helps the investigator to manipulate their sequences before publication would be an attractive tool for investigators depositing their data and collaborating with other groups. DBCollHIV allows investigators to manipulate their own datasets, as well as integrating molecular and clinical HIV data, in an innovative fashion. [Abstract/Link to Full Text]

Borro LC, Oliveira SR, Yamagishi ME, Mancini AL, Jardine JG, Mazoni I, Santos EH, Higa RH, Kuser PR, Neshich G
Predicting enzyme class from protein structure using Bayesian classification.
Genet Mol Res. 2006;5(1):193-202.
Predicting enzyme class from protein structure parameters is a challenging problem in protein analysis. We developed a method to predict enzyme class that combines the strengths of statistical and data-mining methods. This method has a strong mathematical foundation and is simple to implement, achieving an accuracy of 45%. A comparison with the methods found in the literature designed to predict enzyme class showed that our method outperforms the existing methods. [Abstract/Link to Full Text]

Silva JP, Lemke N, Mombach JC, Souza JG, Sinigaglia M, Vieira R
Exploring molecular networks using MONET ontology.
Genet Mol Res. 2006;5(1):182-92.
The description of the complex molecular network responsible for cell behavior requires new tools to integrate large quantities of experimental data in the design of biological information systems. These tools could be used in the characterization of these networks and in the formulation of relevant biological hypotheses. The building of an ontology is a crucial step because it integrates in a coherent framework the concepts necessary to accomplish such a task. We present MONET (molecular network), an extensible ontology and an architecture designed to facilitate the integration of data originating from different public databases in a single- and well-documented relational database, that is compatible with MONET formal definition. We also present an example of an application that can easily be implemented using these tools. [Abstract/Link to Full Text]

Baudet C, Dias Z
Analysis of slipped sequences in EST projects.
Genet Mol Res. 2006;5(1):169-81.
Slippage is an important sequencing problem that can occur in EST projects. However, very few studies have addressed this. We propose three new methods to detect slippage artifacts: arithmetic mean method, geometric mean method, and echo coverage method. Each method is simple and has two different strategies for processing sequences: suffix and subsequence. Using the 291,689 EST sequences produced in the SUCEST project, we performed comparative tests between our proposed methods and the SUCEST method. The subsequence strategy is better than the suffix strategy, because it is not anchored at the end of the sequence, so it is more flexible to find slippage at the beginning of the EST. In a comparison with the SUCEST method, the advantage of our methods is that they do not discard the majority of the sequences marked as slippage, but instead only remove the slipped artifact from the sequence. Based on our tests the echo coverage method with subsequence strategy shows the best compromise between slippage detection and ease of calibration. [Abstract/Link to Full Text]

Cristino AS, Nascimento AM, Costa Lda F, Simões ZL
A comparative analysis of highly conserved sex-determining genes between Apis mellifera and Drosophila melanogaster.
Genet Mol Res. 2006;5(1):154-68.
A comparison of the most conserved sex-determining genes between the fruit fly, Drosophila melanogaster, and the honey bee, Apis mellifera, was performed with bioinformatics tools developed for computational molecular biology. An initial set of protein sequences already described in the fruit fly as participants of the sex-determining cascade was retrieved from the Gene Ontology database ( and aligned against a database of protein sequences predicted from the honey bee genome. The doublesex (dsx) gene is considered one of the most conserved sex-determining genes among metazoans, and a male-specific partial cDNA of putative A. mellifera dsx gene (Amdsx) was identified experimentally. The theoretical predictions were developed in the context of sequence similarity. Experimental evidence indicates that dsx is present in embryos and larvae, and that it encodes a transcription factor widely conserved in metazoans, containing a DM DNA-binding domain implicated in the regulation of the expression of genes involved in sexual phenotype formation. [Abstract/Link to Full Text]

Galves M, Quitzau JA, Dias Z
New strategy to detect single nucleotide polymorphisms.
Genet Mol Res. 2006;5(1):143-53.
A great effort has been made to identify and map a large set of single nucleotide polymorphisms. The goal is to determine human DNA variants that contribute most significantly to population variation in each trait. Different algorithms and software packages, such as PolyBayes and PolyPhred, have been developed to address this problem. We present strategies to detect single nucleotide polymorphisms, using chromatogram analysis and consensi of multiple aligned sequences. The algorithms were tested using HIV datasets, and the results were compared with those produced by PolyBayes and PolyPhred using the same dataset. Our algorithms produced significantly better results than these two software packages. [Abstract/Link to Full Text]

Vêncio RZ, Patrão DF, Baptista CS, Pereira CA, Zingales B
BayBoots: a model-free Bayesian tool to identify class markers from gene expression data.
Genet Mol Res. 2006;5(1):138-42.
One of the goals of gene expression experiments is the identification of differentially expressed genes among populations that could be used as markers. For this purpose, we implemented a model-free Bayesian approach in a user-friendly and freely available web-based tool called BayBoots. In spite of a common misunderstanding that Bayesian and model-free approaches are incompatible, we merged them in the BayBoots implementation using the kernel density estimator and Rubin's Bayesian Bootstrap. We used the Bayes error rate (BER) instead of the usual P values as an alternative statistical index to rank a class marker's discriminative potential, since it can be visualized by a simple graphical representation and has an intuitive interpretation. Subsequently, Bayesian Bootstrap was used to assess BER's credibility. We tested BayBoots on microarray data to look for markers for Trypanosoma cruzi strains isolated from cardiac and asymptomatic patients. We found that the three most frequently used methods in microarray analysis (t-test, non-parametric Wilcoxon test and correlation methods) yielded several markers that were discarded by a time-consuming visual check. On the other hand, the BayBoots graphical output and ranking were able to automatically identify markers for which classification performance was consistent. BayBoots is available at: [Abstract/Link to Full Text]
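The combination of a kernel density estimate, a Bayes error rate and a Bayesian bootstrap described above can be illustrated with a minimal sketch. This is not the BayBoots implementation: the fixed Gaussian bandwidth, the equal class priors, the midpoint integration, the function names and the expression values are all assumptions made for illustration only.

```python
import math
import random


def weighted_kde(points, weights, bandwidth):
    """Return a Gaussian kernel density estimate with one weight per data point."""
    norm = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return sum(w * norm * math.exp(-0.5 * ((x - p) / bandwidth) ** 2)
                   for p, w in zip(points, weights))

    return density


def bayes_error_rate(dens_a, dens_b, lo, hi, steps=400):
    """Midpoint-rule integral of 0.5 * min(f_a, f_b), assuming equal class priors."""
    width = (hi - lo) / steps
    overlap = sum(min(dens_a(lo + (i + 0.5) * width), dens_b(lo + (i + 0.5) * width))
                  for i in range(steps))
    return 0.5 * overlap * width


def dirichlet_weights(n, rng):
    """Bayesian-bootstrap weights: a flat Dirichlet draw via normalized Exp(1) values."""
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


def ber_with_credibility(class_a, class_b, bandwidth=0.3, replicates=100, seed=0):
    """Point estimate of the BER plus a 95% interval from Bayesian-bootstrap replicates."""
    rng = random.Random(seed)
    lo = min(class_a + class_b) - 5 * bandwidth
    hi = max(class_a + class_b) + 5 * bandwidth
    flat_a = [1.0 / len(class_a)] * len(class_a)
    flat_b = [1.0 / len(class_b)] * len(class_b)
    point = bayes_error_rate(weighted_kde(class_a, flat_a, bandwidth),
                             weighted_kde(class_b, flat_b, bandwidth), lo, hi)
    samples = []
    for _ in range(replicates):
        dens_a = weighted_kde(class_a, dirichlet_weights(len(class_a), rng), bandwidth)
        dens_b = weighted_kde(class_b, dirichlet_weights(len(class_b), rng), bandwidth)
        samples.append(bayes_error_rate(dens_a, dens_b, lo, hi))
    samples.sort()
    lower = samples[int(0.025 * replicates)]
    upper = samples[int(0.975 * replicates) - 1]
    return point, (lower, upper)


# Hypothetical log-expression values for one gene in two groups of samples.
group_1 = [1.0, 1.2, 0.9, 1.1, 1.3]
group_2 = [4.0, 4.2, 3.8, 4.1, 3.9]
point, interval = ber_with_credibility(group_1, group_2)
```

A BER close to 0 means the two estimated densities barely overlap, so the gene separates the classes well; a BER close to 0.5 means the gene carries almost no class information. The width of the bootstrap interval indicates how much that ranking depends on the particular samples observed.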

Higa RH, Cruz SA, Kuser PR, Yamagishi ME, Fileto R, Oliveira SR, Mazoni I, Santos EH, Mancini AL, Neshich G
Building multiple sequence alignments with a flavor of HSSP alignments.
Genet Mol Res. 2006;5(1):127-37.
Homology-derived secondary structure of proteins (HSSP) is a well-known database of multiple sequence alignments (MSAs) which merges information on protein sequences and their three-dimensional structures. It is available for all proteins whose structure is deposited in the PDB. It is also used by STING and (Java)Protein Dossier to calculate and present relative entropy as a measure of the degree of conservation for each residue of proteins whose structure has been solved and deposited in the PDB. However, if STING and (Java)Protein Dossier are to provide support for analysis of protein structures modeled in computers or being experimentally solved but not yet deposited in the PDB, then we need a new method for building alignments having a flavor of HSSP alignments (myMSAr). The present study describes a new method and its corresponding databank (SH2QS--database of sequences homologue to the query [structure-having] sequence). Our main interest in making myMSAr was to measure the degree of residue conservation for a given query sequence, regardless of whether it has a corresponding structure deposited in the PDB. In this study, we compare the measurement of residue conservation provided by corresponding alignments produced by HSSP and SH2QS. As a case study, we also present two biologically relevant examples, the first one highlighting the equivalence of analysis of the degree of residue conservation by using HSSP or SH2QS alignments, and the second one presenting the degree of residue conservation for a structure modeled in a computer, which, as a consequence, does not have an alignment reported by HSSP. [Abstract/Link to Full Text]
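Relative entropy as a per-residue conservation measure can be sketched as follows. The abstract does not give the exact formula used by STING or (Java)Protein Dossier, so this example assumes the standard Kullback-Leibler form against a uniform background over the 20 amino acids; the function name and the alignment columns are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumed background: every amino acid equally likely (an illustration, not a fitted model).
UNIFORM_BACKGROUND = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


def column_relative_entropy(column, background=UNIFORM_BACKGROUND):
    """Relative entropy (bits) of one alignment column versus the background.

    Gap characters ('-') are ignored; an all-gap column scores 0.0.
    """
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    score = 0.0
    for residue, count in Counter(residues).items():
        observed = count / total
        score += observed * math.log2(observed / background[residue])
    return score


# Hypothetical columns taken from a multiple sequence alignment.
conserved = column_relative_entropy("GGGGGGGG")   # one residue only
variable = column_relative_entropy("AVLIMFWY")    # eight different residues
```

Higher values indicate stronger conservation: with the uniform background assumed here, a column containing a single residue type reaches the maximum of log2(20) bits, and a column that matches the background exactly scores 0.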

Catanho M, Mascarenhas D, Degrave W, Miranda AB
GenoMycDB: a database for comparative analysis of mycobacterial genes and genomes.
Genet Mol Res. 2006;5(1):115-26.
Several databases and computational tools have been created with the aim of organizing, integrating and analyzing the wealth of information generated by large-scale sequencing projects of mycobacterial genomes and those of other organisms. However, with very few exceptions, these databases and tools do not allow for massive and/or dynamic comparison of these data. GenoMycDB ( is a relational database built for large-scale comparative analyses of completely sequenced mycobacterial genomes, based on their predicted protein content. Its central structure is composed of the results obtained after pair-wise sequence alignments among all the predicted proteins coded by the genomes of six mycobacteria: Mycobacterium tuberculosis (strains H37Rv and CDC1551), M. bovis AF2122/97, M. avium subsp. paratuberculosis K10, M. leprae TN, and M. smegmatis MC2 155. The database stores the computed similarity parameters of every aligned pair, providing for each protein sequence the predicted subcellular localization, the assigned cluster of orthologous groups, the features of the corresponding gene, and links to several important databases. Tables containing pairs or groups of potential homologs between selected species/strains can be produced dynamically by user-defined criteria, based on one or multiple sequence similarity parameters. In addition, searches can be restricted according to the predicted subcellular localization of the protein, the DNA strand of the corresponding gene and/or the description of the protein. Massive data search and/or retrieval are available, and different ways of exporting the result are offered. GenoMycDB provides an on-line resource for the functional classification of mycobacterial proteins as well as for the analysis of genome structure, organization, and evolution. [Abstract/Link to Full Text]

Pereira GS, Brandão RM, Giuliatti S, Zago MA, Silva WA
Gene Class expression: analysis tool of Gene Ontology terms with gene expression data.
Genet Mol Res. 2006;5(1):108-14.
Serial analysis of gene expression (SAGE) technology produces large sets of interesting genes that are difficult to analyze directly. Bioinformatics tools are needed to interpret the functional information in these gene sets. We present an interactive web-based tool, called Gene Class, which allows functional annotation of SAGE data using the Gene Ontology (GO) database. This tool performs searches in the GO database for each SAGE tag, making associations in the selected GO category for a level selected in the hierarchy. This system provides user-friendly data navigation and visualization for mapping SAGE data onto the gene ontology structure. This tool also provides graphical visualization of the percentage of SAGE tags in each GO category, along with confidence intervals and hypothesis testing. [Abstract/Link to Full Text]

Koide T, Salem-Izacc SM, Gomes SL, Vêncio RZ
SpotWhatR: a user-friendly microarray data analysis system.
Genet Mol Res. 2006;5(1):93-107.
SpotWhatR is a user-friendly microarray data analysis tool that runs under a widely and freely available R statistical language ( for Windows and Linux operational systems. The aim of SpotWhatR is to help the researcher to analyze microarray data by providing basic tools for data visualization, normalization, determination of differentially expressed genes, summarization by Gene Ontology terms, and clustering analysis. SpotWhatR allows researchers who are not familiar with computational programming to choose the most suitable analysis for their microarray dataset. Along with well-known procedures used in microarray data analysis, we have introduced a stand-alone implementation of the HTself method, especially designed to find differentially expressed genes in low-replication contexts. This approach is more compatible with our local reality than the usual statistical methods. We provide several examples derived from the Blastocladiella emersonii and Xylella fastidiosa Microarray Projects. SpotWhatR is freely available at, in English and Portuguese versions. In addition, the user can choose between "single experiment" and "batch processing" versions. [Abstract/Link to Full Text]

Teixeira DI, Melo LM, Gadelha CA, Cunha RM, Bloch C, Rádis-Baptista G, Cavada BS, Freitas VJ
Ion-exchange chromatography used to isolate a spermadhesin-related protein from domestic goat (Capra hircus) seminal plasma.
Genet Mol Res. 2006;5(1):79-87.
Mammalian seminal plasma contains, among others, proteins called spermadhesins, which are the major proteins of boar and stallion seminal plasma. These proteins appear to be involved in capacitation and sperm-egg interaction. Previously, we reported the presence of a protein related to spermadhesins in goat seminal plasma. In the present study, we have further characterized this protein, and we propose ion-exchange chromatography to isolate this seminal protein. Semen was obtained from four adult Saanen bucks. Seminal plasma was pooled, dialyzed against distilled water and freeze-dried. Lyophilized proteins were loaded onto an ion-exchange chromatography column. Dialyzed-lyophilized proteins from the main peak of DEAE-Sephacel were applied to a C2/C18 column coupled to an RP-HPLC system, and the eluted proteins were lyophilized for electrophoresis. The N-terminus was sequenced and amino acid sequence similarity was determined using CLUSTAL W. Additionally, proteins from the DEAE-Sephacel chromatography step were dialyzed and subjected to heparin-Sepharose high-performance liquid chromatography. Goat seminal plasma after ion-exchange chromatography yielded 6.47 +/- 0.63 mg (mean +/- SEM) of the major retained fraction. The protein was designated BSFP (buck seminal fluid protein). BSFP exhibited N-terminal sequence homology to boar, stallion and bull spermadhesins. BSFP showed no heparin-binding capabilities. These results together with our previous data indicate that goat seminal plasma contains a protein that is structurally related to proteins of the spermadhesin family. Finally, this protein can be efficiently isolated by ion-exchange and reverse-phase chromatography. [Abstract/Link to Full Text]

Mashimo T, Voigt B, Tsurumi T, Naoi K, Nakanishi S, Yamasaki K, Kuramoto T, Serikawa T
A set of highly informative rat simple sequence length polymorphism (SSLP) markers and genetically defined rat strains.
BMC Genet. 2006;7:19.
BACKGROUND: The National Bio Resource Project for the Rat in Japan (NBRP-Rat) is focusing on collecting, preserving and distributing various rat strains, including spontaneous mutant, transgenic, congenic, and recombinant inbred (RI) strains. To evaluate their value as models of human diseases, we are characterizing them using 109 phenotypic parameters, such as clinical measurements, internal anatomy, metabolic parameters, and behavioral tests, as part of the Rat Phenome Project. Here, we report on a set of 357 simple sequence length polymorphism (SSLP) markers and 122 rat strains, which were genotyped by the marker set. RESULTS: The SSLP markers were selected according to their distribution patterns throughout the whole rat genome with an average spacing of 7.59 Mb. The average number of informative markers between all possible pairs of strains was 259 (72.5% of 357 markers), showing their high degree of polymorphism. From the genetic profile of these rat inbred strains, we constructed a rat family tree to clarify their genetic background. CONCLUSION: These highly informative SSLP markers as well as genetically and phenotypically defined rat strains are useful for designing experiments for quantitative trait loci (QTL) analysis and to choose strategies for developing new genetic resources. The data and resources are freely available at the NBRP-Rat web site 1. [Abstract/Link to Full Text]
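The "number of informative markers between a pair of strains" reported above can be read as the count of markers at which two inbred strains carry different allele sizes. A minimal sketch under that reading follows; the strain labels, allele sizes and function names are hypothetical and are not taken from the NBRP-Rat data.

```python
from itertools import combinations


def informative_marker_count(genotype_a, genotype_b):
    """Count markers whose allele sizes differ between two inbred strains.

    Markers with a missing call (None) in either strain are skipped.
    """
    return sum(1 for a, b in zip(genotype_a, genotype_b)
               if a is not None and b is not None and a != b)


def mean_informative_markers(genotypes):
    """Average the informative-marker count over all possible strain pairs."""
    pairs = list(combinations(sorted(genotypes), 2))
    total = sum(informative_marker_count(genotypes[x], genotypes[y]) for x, y in pairs)
    return total / len(pairs)


# Hypothetical SSLP allele sizes (bp) at four markers for three strains.
genotypes = {
    "strain_A": [150, 200, 310, 120],
    "strain_B": [150, 204, 310, 126],
    "strain_C": [154, 200, 318, 120],
}
average = mean_informative_markers(genotypes)
```

Dividing the average by the number of markers gives the proportion quoted in the abstract (259 of 357 markers, about 72.5%).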

Charizopoulou N, Wilke M, Dorsch M, Bot A, Jorna H, Jansen S, Stanke F, Hedrich HJ, de Jonge HR, Tümmler B
Spontaneous rescue from cystic fibrosis in a mouse model.
BMC Genet. 2006;7:18.
BACKGROUND: From the original CftrTgH(neoim)Hgu mutant mouse model with a divergent genetic background (129P2, C57BL/6, MF1) we have generated two inbred CftrTgH(neoim)Hgu mutant strains named CF/1-CftrTgH(neoim)Hgu and CF/3-CftrTgH(neoim)Hgu, which are fertile and show normal growth and lifespan. Initial genome wide scan analysis with microsatellite markers indicated that the two inbred strains differed on the genetic level. In order to further investigate whether these genetic differences have an impact on the disease phenotype of cystic fibrosis we characterised the phenotype of the two inbred strains. RESULTS: Reduced amounts, compared to wild type control animals, of correctly spliced Cftr mRNA were detected in the nasal epithelia, lungs and the intestine of both inbred CftrTgH(neoim)Hgu strains, with higher residual amount observed for CF/1-CftrTgH(neoim)Hgu than CF/3-CftrTgH(neoim)Hgu for every investigated tissue. Accordingly the amounts of wild type Cftr protein in the intestine were 9% for CF/1-CftrTgH(neoim)Hgu and 4% for CF/3-CftrTgH(neoim)Hgu. Unlike the apparent strain and/or tissue specific regulation of Cftr mRNA splicing, short circuit current measurements in the respiratory and intestinal epithelium revealed that both strains have ameliorated the basic defect of cystic fibrosis with a presentation of a normal electrophysiology in both tissues. CONCLUSION: Unlike the outbred CftrTgH(neoim)Hgu insertional mouse model, which displayed the electrophysiological defect in the gastrointestinal and respiratory tracts characteristic of cystic fibrosis, both inbred CftrTgH(neoim)Hgu strains have ameliorated the electrophysiological defect. On the basis of these findings both CF/1-CftrTgH(neoim)Hgu and CF/3-CftrTgH(neoim)Hgu offer an excellent model whereby determination of the minimal levels of protein required for the restoration of the basic defect of cystic fibrosis can be studied, along with the modulating factors which may affect this outcome. 
[Abstract/Link to Full Text]

Berg F, Stern S, Andersson K, Andersson L, Moller M
Refined localization of the FAT1 quantitative trait locus on pig chromosome 4 by marker-assisted backcrossing.
BMC Genet. 2006;7:17.
BACKGROUND: A major QTL for fatness and growth, denoted FAT1, has previously been detected on pig chromosome 4q (SSC4q) using a Large White - wild boar intercross. Progeny that carried the wild boar allele at this locus had higher fat deposition, shorter length of carcass, and reduced growth. The position and the estimated effects of the FAT1 QTL for growth and fatness have been confirmed in a previous study. In order to narrow down the QTL interval we have traced the inheritance of the wild boar allele associated with high fat deposition through six additional backcross generations. RESULTS: Progeny-testing was used to determine the QTL genotype for 10 backcross sires being heterozygous for different parts of the broad FAT1 region. The statistical analysis revealed that five of the sires were segregating at the QTL, two were negative while the data for three sires were inconclusive. We could confirm the QTL effects on fatness/meat content traits but not for the growth traits implying that growth and fatness are controlled by distinct QTLs on chromosome 4. Two of the segregating sires showed highly significant QTL effects that were as large as previously observed in the F2 generation. The estimates for the remaining three sires, which were all heterozygous for smaller fragments of the actual region, were markedly smaller. With the sample sizes used in the present study we cannot with great confidence determine whether these smaller effects in some sires are due to chance deviations, epistatic interactions or whether FAT1 is composed of two or more QTLs, each one with a smaller phenotypic effect. Under the assumption of a single locus, the critical region for FAT1 has been reduced to a 3.3 cM interval between the RXRG and SDHC loci. CONCLUSION: We have further characterized the FAT1 QTL on pig chromosome 4 and refined its map position considerably, from a QTL interval of 70 cM to a maximum region of 20 cM and a probable region as small as 3.3 cM. 
The flanking markers for the small region are RXRG and SDHC and the orthologous region of FAT1 in the human genome is located on HSA1q23.3 and harbors approximately 20 genes. Our strategy to further refine the map position of this major QTL will be i) to type new markers in our pigs that are recombinant in the QTL interval and ii) to perform Identity-By-Descent (IBD) mapping across breeds that have been strongly selected for lean growth. [Abstract/Link to Full Text]

Marjoram P, Wall JD
Fast "coalescent" simulation.
BMC Genet. 2006;7:16.
BACKGROUND: The amount of genome-wide molecular data is increasing rapidly, as is interest in developing methods appropriate for such data. There is a consequent increasing need for methods that are able to efficiently simulate such data. In this paper we implement the sequentially Markovian coalescent algorithm described by McVean and Cardin and present a further modification to that algorithm which slightly improves the closeness of the approximation to the full coalescent model. The algorithm ignores a class of recombination events known to affect the behavior of the genealogy of the sample, but which do not appear to affect the behavior of generated samples to any substantial degree. RESULTS: We show that our software is able to simulate large chromosomal regions, such as those appropriate in a consideration of genome-wide data, in a way that is several orders of magnitude faster than existing coalescent algorithms. CONCLUSION: This algorithm provides a useful resource for those needing to simulate large quantities of data for chromosomal-length regions using an approach that is much more efficient than traditional coalescent models. [Abstract/Link to Full Text]

Wimmers K, Fiedler I, Hardge T, Murani E, Schellander K, Ponsuksili S
QTL for microstructural and biophysical muscle properties and body composition in pigs.
BMC Genet. 2006;7:15.
BACKGROUND: The proportion of muscle fibre types and their size affect muscularity as well as functional properties of the musculature and meat quality. We aimed to identify QTL for microstructural muscle properties including muscle fibre size, their numbers and fibre type proportions as well as biophysical parameters of meat quality and traits related to body composition, i.e. pH, conductivity, area of M. longissimus dorsi and lean meat content. A QTL scan was conducted in a porcine experimental population that is based on Duroc and Berlin Miniature Pig. RESULTS: Least square regression interval mapping revealed five significant and 42 suggestive QTL for traits related to muscle fibre composition under the line-cross model as well as eight significant and 40 suggestive QTL under the half-sib model. For traits related to body composition and biophysical parameters of meat quality five and twelve significant plus nine and 22 suggestive QTL were found under the line-cross and half-sib model, respectively. Regions with either significant QTL for muscle fibre traits or significant QTL for meat quality and muscularity or both were detected on SSC1, 2, 3, 4, 5, 13, 14, 15, and 16. QTL for microstructural properties explained a larger proportion of variance than did QTL for meat quality and body composition. CONCLUSION: Microstructural properties of pig muscle and meat quality are governed by genetic variation at many loci distributed throughout the genome. QTL analysis under both the line-cross and half-sib models allows detection of QTL in case of fixation or segregation of the QTL alleles among the founder populations and thus provides comprehensive insight into the genetic variation of the traits under investigation. Genomic regions affecting complex traits of muscularity and meat quality as well as microstructural properties might point to QTL that in the first instance affect muscle fibre traits and thereby, in the second instance, meat quality.
Disentangling complex traits in their constituent phenotypes might facilitate the identification of QTL and the elucidation of the pleiotropic nature of QTL effects. [Abstract/Link to Full Text]

Sentinelli F, Romeo S, Barbetti F, Berni A, Filippi E, Fanelli M, Fallarino M, Baroni MG
Search for genetic variants in the p66Shc longevity gene by PCR-single strand conformational polymorphism in patients with early-onset cardiovascular disease.
BMC Genet. 2006;7:14.
BACKGROUND: Among the possible candidate genes for atherosclerosis, experimental data point towards the longevity gene p66Shc. The p66Shc gene determines an increase of intracellular reactive oxygen species (ROS), affecting the rate of oxidative damage to nucleic acids. Knock-out p66Shc-/- mice show reduction of systemic oxidative stress, as well as of plasma LDL oxidation, and reduced atherogenic lesions. Thus, p66Shc may play a pivotal role in controlling oxidative stress and vascular dysfunction in vivo. METHODS: We searched for sequence variations in the p66Shc specific region of the Shc gene and its upstream promoter by PCR-SSCP in a selected group of early onset coronary artery disease (CAD) subjects (n = 78, mean age 48.5 +/- 6 years) and in 93 long-living control subjects (mean age 89 +/- 6 years). RESULTS: The analysis revealed two variant bands. Sequencing of these variants showed two SNPs: -354T>C in the regulatory region of the p66Shc locus and 92C>T in the p66 specific region (CH2). Neither of these variants has been described before. The first substitution partially modifies the binding consensus sequence of the Sp1 transcription factor, and was detected only in two heterozygous carriers (1 CAD subject and 1 control subject). The 92C>T substitution in the CH2 region results in an amino acid substitution at codon 31 (proline to leucine, P31L), and was detected in heterozygous status only in one CAD subject. No subjects homozygous for the two newly described SNPs were found. CONCLUSION: Only two sequence variations in the p66Shc gene were observed in a total of 171 subjects, and only in heterozygotes. Our observations, in accordance with other studies, suggest that important variations in the p66Shc gene may be extremely rare and that this gene is probably not involved in the genetic susceptibility to CAD. [Abstract/Link to Full Text]

Wohlke A, Distl O, Drogemuller C
Characterization of the canine CLCN3 gene and evaluation as candidate for late-onset NCL.
BMC Genet. 2006;7:13.
BACKGROUND: The neuronal ceroid lipofuscinoses (NCL) are a heterogeneous group of inherited progressive neurodegenerative diseases in different mammalian species. Tibetan Terrier and Polish Owczarek Nizinny (PON) dogs show rare late-onset NCL variants with autosomal recessive inheritance, which cannot be explained by mutations of known human NCL genes. These dog breeds represent animal models for human late-onset NCL. In mice the chloride channel 3 gene (Clcn3) encoding an intracellular chloride channel was described to cause a phenotype similar to NCL. RESULTS: Two full-length cDNA splice variants of the canine CLCN3 gene are reported. The current canine whole genome sequence assembly was used for gene structure analyses and revealed 13 coding CLCN3 exons in 52 kb of genomic sequence. Sequence analysis of the coding exons and flanking intron regions of CLCN3 using six NCL-affected Tibetan terrier dogs and an NCL-affected Polish Owczarek Nizinny (PON) dog, as well as eight healthy Tibetan terrier dogs revealed 13 SNPs. No consistent CLCN3 haplotype was associated with NCL. CONCLUSION: For the examined animals we excluded the complete coding region and adjacent intronic regions of canine CLCN3 as harboring disease-causing mutations. Therefore it seems unlikely that a mutation in this gene is responsible for the late-onset NCL phenotype in these two dog breeds. [Abstract/Link to Full Text]

Garrick RC, Sunnucks P
Development and application of three-tiered nuclear genetic markers for basal Hexapods using single-stranded conformation polymorphism coupled with targeted DNA sequencing.
BMC Genet. 2006;7:11.
BACKGROUND: Molecular genetic approaches have much to offer population biology. Despite recent advances, convenient techniques to develop and screen highly-resolving markers can be limiting for some applications and taxa. We describe an improved PCR-based, cloning-free, nuclear marker development procedure, in which single-stranded conformation polymorphism (SSCP) plays a central role. Sequence-variable alleles at putative nuclear loci are simultaneously identified and isolated from diploid tissues. Based on a multiple allele alignment, locus-specific primers are designed in conserved regions, minimizing 'null' alleles. Using two undescribed endemic Australian Collembola as exemplars, we outline a comprehensive approach to generating and validating suites of codominant, sequence-yielding nuclear loci for previously unstudied invertebrates. RESULTS: Six markers per species were developed without any baseline genetic information. After evaluating the characteristics of each new locus via SSCP pre-screening, population samples were genotyped on the basis of either DNA sequence, restriction site, or insertion/deletion variation, depending on which assay was deemed most appropriate. Polymorphism was generally high (mean of nine alleles per locus), and the markers were capable of resolving population structuring over very fine spatial scales (<100 km). SSCP coupled with targeted DNA sequencing was used to obtain genotypic, genic and genealogical information from six loci (three per species). Phylogeographic analysis identified introns as being most informative. CONCLUSION: The comprehensive approach presented here feasibly overcomes technical hurdles of (i) developing suitably polymorphic nuclear loci for non-model organisms, (ii) physically isolating nuclear allele haplotypes from diploid tissues without cloning, and (iii) genotyping population samples on the basis of nuclear DNA sequence variation. [Abstract/Link to Full Text]

Morris GA, Lowe CE, Cooper JD, Payne F, Vella A, Godfrey L, Hulme JS, Walker NM, Healy BC, Lam AC, Lyons PA, Todd JA
Polymorphism discovery and association analyses of the interferon genes in type 1 diabetes.
BMC Genet. 2006;7:12.
BACKGROUND: The aetiology of the autoimmune disease type 1 diabetes (T1D) involves many genetic and environmental factors. Evidence suggests that innate immune responses, including the action of interferons, may also play a role in the initiation and/or pathogenic process of autoimmunity. In the present report, we have adopted a linkage disequilibrium (LD) mapping approach to test for an association between T1D and three regions encompassing 13 interferon alpha (IFNA) genes, interferon omega-1 (IFNW1), interferon beta-1 (IFNB1), interferon gamma (IFNG) and the interferon consensus-sequence binding protein 1 (ICSBP1). RESULTS: We identified 238 variants, most of them single nucleotide polymorphisms (SNPs), by sequencing IFNA, IFNB1, IFNW1 and ICSBP1, 98 of which were novel when compared to dbSNP build 124. We used polymorphisms identified in the SeattleSNP database for IFNG. A set of tag SNPs was selected for each of the interferon and interferon-related genes to test for an association between T1D and this complex gene family. A total of 45 tag SNPs were selected and genotyped in a collection of 472 multiplex families. CONCLUSION: We have developed informative sets of SNPs for the interferon and interferon-related genes. No statistical evidence of a major association between T1D and any of the interferon and interferon-related genes tested was found. [Abstract/Link to Full Text]

Laurentin HE, Karlovsky P
Genetic relationship and diversity in a sesame (Sesamum indicum L.) germplasm collection using amplified fragment length polymorphism (AFLP).
BMC Genet. 2006;7:10.
BACKGROUND: Sesame is an important oil crop in tropical and subtropical areas. Despite its nutritional value and historic and cultural importance, research on sesame has been scarce, particularly as far as its genetic diversity is concerned. The aims of the present study were to clarify genetic relationships among 32 sesame accessions from the Venezuelan Germplasm Collection, which represents genotypes from five diversity centres (India, Africa, China-Korea-Japan, Central Asia and Western Asia), and to determine the association between geographical origin and genetic diversity using amplified fragment length polymorphism (AFLP). RESULTS: Large genetic variability was found within the germplasm collection. A total of 457 AFLP markers were recorded, 93% of them being polymorphic. The Jaccard similarity coefficient ranged from 0.38 to 0.85 between pairs of accessions. The UPGMA dendrogram grouped 25 of 32 accessions in two robust clusters, but it did not reveal any association between genotype and geographical origin. Indian, African and Chinese-Korean-Japanese accessions were distributed throughout the dendrogram. A similar pattern was obtained using principal coordinates analysis. Genetic diversity studies considering five groups of accessions according to geographic origin detected that only 20% of the total diversity was due to diversity among groups using Nei's coefficient of population differentiation. Similarly, only 5% of the total diversity was attributed to differences among groups by the analysis of molecular variance (AMOVA). This small but significant difference was explained by the fact that the Central Asia group had a lower genetic variation than the other diversity centres studied. CONCLUSION: We found that our sesame collection was genetically very variable and did not show an association between geographical origin and AFLP patterns. This result suggests that there was considerable gene flow among diversity centres. Future germplasm collection strategies should focus on sampling a large number of plants. Covering many diversity centres is less important because each centre represents a major part of the total diversity in sesame, the Central Asia centre being the only exception. The same recommendation holds for the choice of parents for segregant populations used in breeding projects. The traditional assumption that selecting genotypes of different geographical origin will maximize the diversity available to a breeding project does not hold in sesame. [Abstract/Link to Full Text]
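
The Jaccard similarity coefficient mentioned in the sesame abstract can be illustrated with a minimal Python sketch. It assumes each accession is scored as a 0/1 band profile over the same AFLP markers; the function name and the two example profiles are hypothetical and are not taken from the study.

```python
def jaccard_similarity(profile_a, profile_b):
    """Jaccard similarity between two binary (0/1) marker profiles.

    Bands shared by both accessions are divided by bands present in
    at least one of them; joint absences are ignored.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must score the same markers")
    pairs = list(zip(profile_a, profile_b))
    shared = sum(1 for a, b in pairs if a == 1 and b == 1)
    either = sum(1 for a, b in pairs if a == 1 or b == 1)
    if either == 0:
        raise ValueError("no bands present in either profile")
    return shared / either


# Hypothetical band profiles for two accessions (illustration only).
accession_1 = [1, 1, 0, 1, 0, 1, 0, 0]
accession_2 = [1, 0, 0, 1, 1, 1, 0, 0]
similarity = jaccard_similarity(accession_1, accession_2)
```

Here three bands are shared and five bands are present in at least one profile, so the coefficient is 3/5. A matrix of such pairwise values is the usual input to UPGMA clustering.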

Buyske S, Williams TA, Mars AE, Stenroos ES, Ming SX, Wang R, Sreenath M, Factura MF, Reddy C, Lambert GH, Johnson WG
Analysis of case-parent trios at a locus with a deletion allele: association of GSTM1 with autism.
BMC Genet. 2006 Feb 10;7(1):8.
ABSTRACT: BACKGROUND: Certain loci on the human genome, such as glutathione S-transferase M1 (GSTM1), do not permit heterozygotes to be reliably determined by commonly used methods. Association of such a locus with a disease is therefore generally tested with a case-control design. When subjects have already been ascertained in a case-parent design however, the question arises as to whether the data can still be used to test disease association at such a locus. RESULTS: A likelihood ratio test was constructed that can be used with a case-parents design but has somewhat less power than a Pearson's chi-squared test that uses a case-control design. The test is illustrated on a novel dataset showing a genotype relative risk near 2 for the homozygous GSTM1 deletion genotype and autism. CONCLUSIONS: Although the case-control design will remain the mainstay for a locus with a deletion, the likelihood ratio test will be useful for such a locus analyzed as part of a larger case-parent study design. The likelihood ratio test has the advantage that it can incorporate complete and incomplete case-parent trios as well as independent cases and controls. Both analyses support (p = 0.046 for the proposed test, p = 0.028 for the case-control analysis) an association of the homozygous GSTM1 deletion genotype with autism. [Abstract/Link to Full Text]

Wang T, Zeng ZB
Models and partition of variance for quantitative trait loci with epistasis and linkage disequilibrium.
BMC Genet. 2006;7:9.
BACKGROUND: A genetic model for quantitative trait loci (QTL) provides a basis for interpreting the genetic basis of quantitative traits in a study population, such as additive, dominance and epistatic effects of QTL and the partition of genetic variance. The standard quantitative genetics model is based on the least squares partition of genetic effects and also genetic variance in an equilibrium population. However, over the years many specialized QTL models have also been proposed for applications in some specific populations. How are these models related? How should a QTL model and genetic variance be analyzed and partitioned when both epistasis and linkage disequilibrium are considered? RESULTS: Starting from the classical description of the Cockerham genetic model, we first represent the model in a multiple regression setting by using indicator variables to describe the segregation of QTL alleles. In this setting, the definition of additive, dominance and epistatic effects of QTL and the basis for the partition of genetic variance are elaborated. We then build the connection between this general genetic model and a few specialized models (a haploid model, a diploid F2 model and a general two-allele model), and derive the genetic effects and partition of genetic variance for multiple QTL with epistasis and linkage disequilibrium for these specialized models. CONCLUSION: In this paper, we study extensively the composition and properties of the genetic model parameters, such as genetic effects and partition of genetic variance, when both epistasis and linkage disequilibrium are considered. This is the first time that both epistasis and linkage disequilibrium are considered in modeling multiple QTL. This analysis should help us to understand the structure of genetic parameters and the relationship of various genetic quantities, such as allelic frequencies and linkage disequilibrium, to the definition of genetic effects, and will also help us to understand and properly interpret estimates of the genetic effects and variance components in a QTL mapping experiment. [Abstract/Link to Full Text]

Duan T, Finch SJ, Ye KQ, Chase GA, Mendell NR
Using mixture models to characterize disease-related traits.
BMC Genet. 2005 Dec 30;6 Suppl 1:S99.
ABSTRACT : We consider 12 event-related potentials and one electroencephalogram measure as disease-related traits to compare alcohol-dependent individuals (cases) to unaffected individuals (controls). We use two approaches: 1) two-way analysis of variance (with sex and alcohol dependency as the factors), and 2) likelihood ratio tests comparing sex adjusted values of cases to controls assuming that within each group the trait has a 2 (or 3) component normal mixture distribution. In the second approach, we test the null hypothesis that the parameters of the mixtures are equal for the cases and controls. Based on the two-way analysis of variance, we find 1) males have significantly (p < 0.05) lower mean response values than females for 7 of these traits. 2) Alcohol-dependent cases have significantly lower mean response than controls for 3 traits. The mixture analysis of sex-adjusted values of 1 of these traits, the event-related potential obtained at the parietal midline channel (ttth4), found the appearance of a 3-component normal mixture in cases and controls. The mixtures differed in that the cases had significantly lower mean values than controls and significantly different mixing proportions in 2 of the 3 components. Implications of this study are: 1) Sex needs to be taken into account when studying risk factors for alcohol dependency to prevent finding a spurious association between alcohol dependency and the risk factor. 2) Mixture analysis indicates that for the event-related potential "ttth4", the difference observed reflects strong evidence of heterogeneity of response in both the cases and controls. [Abstract/Link to Full Text]

Bourgain C
Comparing strategies for association mapping in samples with related individuals.
BMC Genet. 2005 Dec 30;6 Suppl 1:S98.
ABSTRACT : In this paper, different strategies to test for association in samples with related individuals designed for linkage studies are compared. Because no independent controls are available, a family-based association test and case-control tests corrected for the presence of related individuals in which unaffected relatives are used as controls were tested. When unrelated controls are available, additional strategies including selection of a single case per family considering either all families or a subset of linked families, are also considered. Analyses are performed on the simulated dataset, blind to the answers. The case-control test corrected for the presence of related individuals is the most powerful strategy to detect three loci associated with the disease under study. Using a correction factor for the case-control test performed conditional on the marker information rather than unconditional does not impact the power significantly. [Abstract/Link to Full Text]

Wu X, Kan D, Cooper RS, Zhu X
Identifying genetic variation affecting a complex trait in simulated data: a comparison of meta-analysis with pooled data analysis.
BMC Genet. 2005 Dec 30;6 Suppl 1:S97.
ABSTRACT : We explored the power and consistency to detect linkage and association with meta-analysis and pooled data analysis using Genetic Analysis Workshop 14 simulated data. The first 10 replicates from Aipotu population were used. Significant linkage and association was found at all 4 regions containing the major loci for Kofendrerd Personality Disorder (KPD) using both combined analyses although no significant linkage and association was found at all these regions in a single replicate. The linkage results from both analyses are consistent in terms of the significance level of linkage test and the estimate of locus location. After correction for multiple-testing, significant associations were detected for the same 8 single-nucleotide polymorphisms (SNP) in both analyses. There were another 2 SNPs for which significant associations with KPD were found only by pooled data analysis. Our study showed that, under homogeneous condition, the results from meta-analysis and pooled data analysis are similar in both linkage and association studies and the loss of power is limited using meta-analysis. Thus, meta-analysis can provide an overall evaluation of linkage and association when the original raw data is not available for combining. [Abstract/Link to Full Text]

McQueen MB, Murphy A, Kraft P, Su J, Lazarus R, Laird NM, Lange C, Van Steen K
Comparison of linkage and association strategies for quantitative traits using the COGA dataset.
BMC Genet. 2005 Dec 30;6 Suppl 1:S96.
ABSTRACT : Genome scans using dense single-nucleotide polymorphism (SNP) data have recently become a reality. It is thought that the increase in information content for linkage analysis as a result of the denser scans will help refine previously identified linkage regions and possibly identify new regions not identifiable using the sparser, microsatellite scans. In the context of the dense SNP scans, it is also possible to consider association strategies to provide even more information about potential regions of interest. To circumvent the multiple-testing issues inherent in association analysis, we use a recently developed strategy, implemented in PBAT, which screens the data to identify the optimal SNPs for testing, without biasing the nominal significance level. We compare the results from the PBAT analysis to that of quantitative linkage analysis on chromosome 4 using the Collaborative Study on the Genetics of Alcoholism data, as released through Genetic Analysis Workshop 14. [Abstract/Link to Full Text]

Larkin EK, Cartier KC, Gray-McGuire C
A regression based transmission/disequilibrium test for binary traits: the power of joint tests for linkage and association.
BMC Genet. 2005 Dec 30;6 Suppl 1:S95.
ABSTRACT: BACKGROUND: In this analysis we applied a regression based transmission disequilibrium test to the binary trait presence or absence of Kofendrerd Personality Disorder in the Genetic Analysis Workshop 14 (GAW14) simulated dataset and determined the power and type I error rate of the method at varying map densities and sample sizes. To conduct this transmission disequilibrium test, the logit transformation was applied to a binary outcome and regressed on an indicator variable for the transmitted allele from informative matings. All 100 replicates from chromosomes 1, 3, 5, and 9 for the Aipotu and the combined Aipotu, Karangar, and Danacaa populations were used at densities of 3, 1, and 0.3 cM. Power and type I error were determined by the number of replicates significant at the 0.05 level. RESULTS: The maximum power to detect linkage and association with the Aipotu population was 93% for chromosome 3 using a 0.3-cM map. For chromosomes 1, 5, and 9 the power was less than 10% at the 3-cM scan and less than 22% for the 0.3-cM map. With the larger sample size, power increased to 38% for chromosome 1, 100% for chromosome 3, 31% for chromosome 5, and 23% for chromosome 9. Type I error was approximately 7%. CONCLUSION: The power of this method is highly dependent on the amount of information in a region. This study suggests that single-point methods are not particularly effective in narrowing a fine-mapping region, particularly when using single-nucleotide polymorphism data and when linkage disequilibrium in the region is variable. [Abstract/Link to Full Text]
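
The abstract above reports power and type I error as the number of replicates significant at the 0.05 level. A minimal Python sketch of that bookkeeping follows. The function name and the p-values are hypothetical, and treating "significant" as p < alpha is an assumption.

```python
def rejection_rate(p_values, alpha=0.05):
    """Fraction of replicates whose test is significant at level alpha.

    Applied to replicates simulated with a true effect, this estimates
    power; applied to replicates simulated under the null hypothesis,
    it estimates the type I error rate.
    """
    if not p_values:
        raise ValueError("at least one replicate is required")
    significant = sum(1 for p in p_values if p < alpha)
    return significant / len(p_values)


# Hypothetical p-values from five replicates (illustration only).
replicate_p_values = [0.001, 0.20, 0.03, 0.049, 0.51]
estimated_rate = rejection_rate(replicate_p_values)
```

Three of the five hypothetical replicates fall below 0.05, so the estimated rate is 0.6.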

Kraja AT, Borecki IB, Province MA
Microsatellite linkage analysis, single-nucleotide polymorphisms, and haplotype associations with ECB21 in the COGA data.
BMC Genet. 2005 Dec 30;6 Suppl 1:S94.
ABSTRACT : This study, part of the Genetic Analysis Workshop 14 (GAW14), explored real Collaborative Study on the Genetics of Alcoholism data for linkage and association mapping between genetic polymorphisms (microsatellite and single-nucleotide polymorphisms (SNPs)) and beta (16.5-20 Hz) oscillations of the brain rhythms (ecb21). The ecb21 phenotype underwent the statistical adjustments for the age of participants, and for attaining a normal distribution. A total of 1,000 subjects' available phenotypes were included in linkage analysis with microsatellite markers. Linkage analysis was performed only for chromosome 4 where a quantitative trait locus with 5.01 LOD score had been previously reported. Previous findings related this location with the gamma-aminobutyric acid type A (GABAA) receptor. At the same location, our analysis showed a LOD score of 2.2. This decrease in the LOD score is the result of a drastic reduction (one-third) of the available GAW14 phenotypic data. We performed SNP and haplotype association analyses with the same phenotypic data under the linkage peak region on chromosome 4. Seven Affymetrix and two Illumina SNPs showed significant associations with ecb21 phenotype. A haplotype, a combination of SNPs TSC0044171 and TSC0551006 (the latter almost under the region of GABAA genes), showed a significant association with ecb21 (p = 0.015) and a relatively high frequency in the sample studied. Our results affirmed that the GABA region has potential of harboring genes that contribute quantitatively to the beta oscillation of the brain rhythms. The inclusion of the remaining 614 subjects, which in the GAW14 had missing data for the ecb21, can improve the strength of the associations as they have already shown that they contribute quite important information in the linkage analysis. [Abstract/Link to Full Text]

Joo J, Tian X, Zheng G, Lin JP, Geller NL
Selection of single-nucleotide polymorphisms in disease association data.
BMC Genet. 2005 Dec 30;6 Suppl 1:S93.
ABSTRACT : We studied several methods for selecting single-nucleotide polymorphisms (SNPs) in a disease association study. Two major categories for analytical strategy are the univariate and the set selection approaches. The univariate approach evaluates each SNP marker one at a time, while the set selection approach tests disease association of a set of SNP markers simultaneously. We examined various test statistics that can be utilized in testing disease association and also reviewed several multiple testing procedures that can properly control the family-wise error rates when the univariate approach is applied to multiple markers. The set association methods were then briefly reviewed. Finally, we applied these methods to the data from Collaborative Study on the Genetics of Alcoholism (COGA). [Abstract/Link to Full Text]

Jonasdottir G, Palmgren J, Humphreys K
Analysis of binary traits: testing association in the presence of linkage.
BMC Genet. 2005 Dec 30;6 Suppl 1:S92.
ABSTRACT : Most methods for testing association in the presence of linkage, using family-based studies, have been developed for continuous traits. FBAT (family-based association tests) is one of few methods appropriate for discrete outcomes. In this article we describe a new test of association in the presence of linkage for binary traits. We use a gamma random effects model in which association and linkage are modelled as fixed effects and random effects, respectively. We have compared the gamma random effects model to an FBAT and a generalized estimating equation-based alternative, using two regions in the Genetic Analysis Workshop 14 simulated data. One of these regions contained haplotypes associated with disease, and the other did not. [Abstract/Link to Full Text]

Havill LM, Dyer TD, Richardson DK, Mahaney MC, Blangero J
The quantitative trait linkage disequilibrium test: a more powerful alternative to the quantitative transmission disequilibrium test for use in the absence of population stratification.
BMC Genet. 2005 Dec 30;6 Suppl 1:S91.
ABSTRACT : Linkage analysis based on identity-by-descent allele-sharing can be used to identify a chromosomal region harboring a quantitative trait locus (QTL), but lacks the resolution required for gene identification. Consequently, linkage disequilibrium (association) analysis is often employed for fine-mapping. Variance-components based combined linkage and association analysis for quantitative traits in sib pairs, in which association is modeled as a mean effect and linkage is modeled in the covariance structure has been extended to general pedigrees (quantitative transmission disequilibrium test, QTDT). The QTDT approach accommodates data not only from parents and siblings, but also from all available relatives. QTDT is also robust to population stratification. However, when population stratification is absent, it is possible to utilize even more information, namely the additional information contained in the founder genotypes. In this paper, we introduce a simple modification of the allelic transmission scoring method used in the QTDT that results in a more powerful test of linkage disequilibrium, but is only applicable in the absence of population stratification. This test, the quantitative trait linkage disequilibrium (QTLD) test, has been incorporated into a new procedure in the statistical genetics computer package SOLAR. We apply this procedure in a linkage/association analysis of an electrophysiological measurement previously shown to be related to alcoholism. We also demonstrate by simulation the increase in power obtained with the QTLD test, relative to the QTDT, when a true association exists between a marker and a QTL. [Abstract/Link to Full Text]

Guo CY, Cui J, Cupples LA
Impact of non-ignorable missingness on genetic tests of linkage and/or association using case-parent trios.
BMC Genet. 2005 Dec 30;6 Suppl 1:S90.
ABSTRACT : The transmission/disequilibrium test was introduced to test for linkage disequilibrium between a marker and a putative disease locus using case-parent trios. However, parental genotypes may be incomplete in such a study. When parental information is non-randomly missing, due, for example, to death from the disease under study, the impact on type I error and power under dominant and recessive disease models has been reported. In this paper, we examine non-ignorable missingness by assigning missing values to the genotypes of affected parents. We used unrelated case-parent trios in the Genetic Analysis Workshop 14 simulated data for the Danacaa population. Our computer simulations revealed that the type I error of these tests using incomplete trios was not inflated over the nominal level under either recessive or dominant disease models. However, the power of these tests appears to be inflated over the complete information case due to an excess of heterozygous parents in dyads. [Abstract/Link to Full Text]
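
For readers unfamiliar with the transmission/disequilibrium test, the standard statistic for a biallelic marker with complete trios can be sketched in Python as below. This is only the textbook McNemar-type form and not the incomplete-trio analysis examined in the abstract. The function name and the counts are hypothetical.

```python
import math


def tdt_statistic(transmitted, untransmitted):
    """Standard TDT for one allele at a biallelic marker.

    transmitted / untransmitted count how often heterozygous parents
    did or did not pass the allele to an affected child. Returns the
    chi-square statistic (1 df) and its p-value.
    """
    informative = transmitted + untransmitted
    if informative == 0:
        raise ValueError("no heterozygous (informative) parents")
    chi_square = (transmitted - untransmitted) ** 2 / informative
    # Upper tail of a 1-df chi-square equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value


# Hypothetical counts from heterozygous parents (illustration only).
chi_square, p_value = tdt_statistic(transmitted=60, untransmitted=40)
```

With 60 transmissions and 40 non-transmissions the statistic is (60 - 40)^2 / 100 = 4.0. Only heterozygous parents contribute, which is why missing parental genotypes change the set of informative transmissions.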

Namkung J, Kim Y, Park T
Whole-genome association studies of alcoholism with loci linked to schizophrenia susceptibility.
BMC Genet. 2005 Dec 30;6 Suppl 1:S9.
ABSTRACT: BACKGROUND: Alcoholism is a complex disease. There have been many reports of significant comorbidity between alcoholism and schizophrenia. For the genetic study of complex diseases, association analysis has been recommended because it has higher power than linkage analysis for detecting genes with modest effects on disease. RESULTS: To identify alcoholism susceptibility loci, we performed genome-wide single-nucleotide polymorphism (SNP) association tests, which yielded 489 significant SNPs at the 1% significance level. The association tests showed that tsc0593964 (P-value 0.000013) on chromosome 7 was most significantly associated with alcoholism. From the 489 SNPs, 74 genes were identified. Among these genes, GABRA1 is a member of the same gene family as GABRA2, which was recently reported as an alcoholism susceptibility gene. CONCLUSION: By comparing the 74 genes to the published results of various linkage studies of schizophrenia, we identified 13 alcoholism-associated genes that were located in the regions reported to be linked to schizophrenia. These 13 identified genes can be important candidate genes to study the genetic mechanism of co-occurrence of both diseases. [Abstract/Link to Full Text]

Chiu YF, Liu SY, Tsai YY
A comparison in association and linkage genome-wide scans for alcoholism susceptibility genes using single-nucleotide polymorphisms.
BMC Genet. 2005 Dec 30;6 Suppl 1:S89.
ABSTRACT: We conducted genome-wide linkage scans using both microsatellite and single-nucleotide polymorphism (SNP) markers. Regions showing the strongest evidence of linkage to alcoholism susceptibility genes were identified. Haplotype analyses using a sliding-window approach for SNPs in these regions were performed. In addition, we performed a genome-wide association scan using SNP data. SNPs in these regions with evidence of association (P ≤ 0.0001) were identified. We found that the general patterns for nonparametric linkage (NPL) scores from SNP and microsatellite genome scans are fairly consistent; however, the peaks of the NPL scores are mostly higher in the SNP-based scan than those using microsatellite markers, which might be located at different regions. Furthermore, SNPs identified from linkage screens were not so strongly associated with alcoholism (the most significant SNP had a p-value of 0.030) as those identified from association genomic screening (the most significant SNP had a p-value of 2.0 × 10^-8). [Abstract/Link to Full Text]

Chen MH, Van Eerdewegh P, Dupuis J
Identification of polymorphisms explaining a linkage signal: application to the GAW14 simulated data.
BMC Genet. 2005 Dec 30;6 Suppl 1:S88.
ABSTRACT : We applied three approaches for the identification of polymorphisms explaining the linkage evidence to the Genetic Analysis Workshop 14 simulated data: 1) the genotype-IBD sharing test (GIST); 2) an approach suggested by Horikawa and colleagues; and 3) the homozygote sharing test (HST). These tests were compared with a family-based association test. Two linked regions with highest nonparametric linkage scores were selected to apply these methods. In the first region, Horikawa's method identified the most SNPs within the region containing the disease susceptibility locus, while HST performed best in the second region. However, Horikawa's method also had the most type I errors. These methods show potential as additional tools to complement family-based association tests for the identification of disease susceptibility variants. [Abstract/Link to Full Text]

Bourgey M, Leutenegger AL, Cousin E, Bourgain C, Babron MC, Clerget-Darpoux F
Modeling the effect of a genetic factor for a complex trait in a simulated population.
BMC Genet. 2005 Dec 30;6 Suppl 1:S87.
ABSTRACT: Genetic Analysis Workshop 14 simulated data have been analyzed with MASC (marker association segregation chi-squares) in which we implemented a bootstrap procedure to provide the variation intervals of parameter estimates. We model here the effect of a genetic factor, S, for Kofendrerd Personality Disorder in the region of the marker C03R0281 for the Aipotu population. The goodness of fit of several genetic models with two alleles for one locus has been tested. The data are not compatible with a direct effect of a single-nucleotide polymorphism (SNP) (SNP 16, 17, 18, 19 of pack 153) in the region. Therefore, we can conclude that the functional polymorphism has not been typed and is in linkage disequilibrium with the four studied SNPs. We obtained very large variation intervals both of the disease allele frequency and the degree of dominance. The uncertainty of the model parameters can be explained first, by the method used, which models marginal effects when the disease is due to complex interactions, second, by the presence of different sub-criteria used for the diagnosis that are not determined by S in the same way, and third, by the fact that the segregation of the disease in the families was not taken into account. However, we could not find any model that could explain the familial segregation of the trait, namely the higher proportion of affected parents than affected sibs. [Abstract/Link to Full Text]

Peralta JM, Dyer TD, Warren DM, Blangero J, Almasy L
Linkage disequilibrium across two different single-nucleotide polymorphism genome scans.
BMC Genet. 2005 Dec 30;6 Suppl 1:S86.
ABSTRACT : Linkage disequilibrium (LD) content was calculated for the Genetic Analysis Workshop 14 Affymetrix and Illumina single-nucleotide polymorphism (SNP) genome scans of the Collaborative Study on the Genetics of Alcoholism samples. Pair-wise LD was measured as both D' and r2 on 505 pedigree founder individuals. The r2 estimates were then used to correct the multipoint identity by descent matrix (MIBD) calculation to account for LD and LOD scores on chromosomes 3 and 18 were calculated for COGA's ttdt3 electrophysiological trait using those MIBDs. Extensive LD was observed throughout both marker sets, and it was higher in Affymetrix's more dense SNP map. However, SNP density did not solely account for Affymetrix's higher LD. MIBD estimation procedures assume linkage equilibrium to construct genotypes of non-genotyped pedigree founder individuals, and dense SNP genotyping maps are likely to contain moderate to high LD between markers. LOD score plots calculated after correction for LD followed the same general pattern as uncorrected ones. Since in our study almost half of the pedigree founders were genotyped, it is possible that LD had a minor impact on the LOD scores. Caution should probably be taken when using high density SNP maps when many non-genotyped founders are present in the study pedigrees. [Abstract/Link to Full Text]

Murray SS
Evaluation of linkage disequilibrium and its effect on non-parametric multipoint linkage analysis using two high density single-nucleotide polymorphism mapping panels.
BMC Genet. 2005 Dec 30;6 Suppl 1:S85.
ABSTRACT : Genotype data from the Illumina Linkage III SNP panel (n = 4,720 SNPs) and the Affymetrix 10 k mapping array (n = 11,120 SNPs) were used to test the effects of linkage disequilibrium (LD) between SNPs in a linkage analysis in the Collaborative Study on the Genetics of Alcoholism pedigree collection (143 pedigrees; 1,614 individuals). The average r2 between adjacent markers across the genetic map was 0.099 +/- 0.003 in the Illumina III panel and 0.17 +/- 0.003 in the Affymetrix 10 k array. In order to determine the effect of LD between marker loci in a nonparametric multipoint linkage analysis, markers in strong LD with another marker (r2 > 0.40) were removed (n = 471 loci in the Illumina panel; n = 1,804 loci in the Affymetrix panel) and the linkage analysis results were compared to the results using the entire marker sets. In all analyses using the ALDX1 phenotype, 8 linkage regions on 5 chromosomes (2, 7, 10, 11, X) were detected (peak markers p < 0.01), and the Illumina panel detected an additional region on chromosome 6. Analysis of the same pedigree set and ALDX1 phenotype using short tandem repeat markers (STRs) resulted in 3 linkage regions on 3 chromosomes (peak markers p < 0.01). These results suggest that in this pedigree set, LD between loci with spacing similar to the SNP panels tested may not significantly affect the overall detection of linkage regions in a genome scan. Moreover, since the data quality and information content are greatly improved in the SNP panels over STR genotyping methods, new linkage regions may be identified due to higher information content and data quality in a dense SNP linkage panel. [Abstract/Link to Full Text]
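
The r2 measure and the r2 > 0.40 removal threshold described above can be illustrated with a short Python sketch. It assumes phased haplotype counts for two biallelic markers are available. The function name, argument names and counts are hypothetical, and the study's own estimation procedure is not specified in the abstract.

```python
def ld_r_squared(n_11, n_12, n_21, n_22):
    """r2 between two biallelic markers from haplotype counts.

    n_ij is the number of haplotypes carrying allele i at the first
    marker and allele j at the second marker.
    """
    total = n_11 + n_12 + n_21 + n_22
    if total == 0:
        raise ValueError("no haplotypes supplied")
    p_11 = n_11 / total                 # frequency of the 1-1 haplotype
    p_first = (n_11 + n_12) / total     # allele 1 at the first marker
    p_second = (n_11 + n_21) / total    # allele 1 at the second marker
    denominator = p_first * (1 - p_first) * p_second * (1 - p_second)
    if denominator == 0:
        raise ValueError("a marker is monomorphic; r2 is undefined")
    d = p_11 - p_first * p_second       # disequilibrium coefficient D
    return d * d / denominator


# Hypothetical haplotype counts for one marker pair (illustration only).
r_squared = ld_r_squared(40, 10, 10, 40)
exceeds_threshold = r_squared > 0.40    # removal threshold quoted above
```

For these counts D = 0.4 - 0.5 × 0.5 = 0.15, so r2 = 0.0225 / 0.0625 = 0.36 and the pair would be retained under the 0.40 cut-off.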

Kauwe JS, Bertelsen S, Bierut LJ, Dunn G, Hinrichs AL, Jin CH, Suarez BK
The efficacy of short tandem repeat polymorphisms versus single-nucleotide polymorphisms for resolving population structure.
BMC Genet. 2005 Dec 30;6 Suppl 1:S84.
ABSTRACT : Accurately resolving population structure in a sample is important for both linkage and association studies. In this study we investigated the power of single-nucleotide polymorphisms (SNPs) in detecting population structure in a sample of 286 unrelated individuals. We varied the number of SNPs to determine how many are required to approach the degree of resolution obtained with the Collaborative Study on the Genetics of Alcoholism (COGA) short tandem repeat polymorphisms (STRPs). In addition, we selected SNPs with varying minor allele frequencies (MAFs) to determine whether low or high frequency SNPs are more efficient in resolving population structure. We conclude that a set of at least 100 evenly spaced SNPs with MAFs of 40-50% is required to resolve population structure in this dataset. If SNPs with lower MAFs are used, then more than 250 SNPs may be required to obtain reliable results. [Abstract/Link to Full Text]

Huang Q, Shete S, Swartz M, Amos CI
Examining the effect of linkage disequilibrium on multipoint linkage analysis.
BMC Genet. 2005 Dec 30;6 Suppl 1:S83.
ABSTRACT : Most linkage programs assume linkage equilibrium among multiple linked markers. This assumption may lead to bias for tightly linked markers where strong linkage disequilibrium (LD) exists. We used simulated data from Genetic Analysis Workshop 14 to examine the possible effect of LD on multipoint linkage analysis. Single-nucleotide polymorphism packets from a non-disease-related region that was generated with LD were used for both model-free and parametric linkage analyses. Results showed that high LD among markers can induce false-positive evidence of linkage for affected sib-pair analysis when parental data are missing. Bias can be eliminated with parental data and can be reduced when additional markers not in LD are included in the analyses. [Abstract/Link to Full Text]

Li JZ, Meng F, Tsavaler L, Evans SJ, Choudary PV, Tomita H, Vawter MP, Walsh D, Shokoohi V, Chung T, Bunney WE, Jones EG, Akil H, Watson SJ, Myers RM
Sample matching by inferred agonal stress in gene expression analyses of the brain.
BMC Genomics. 2007 Sep 24;8(1):336.
ABSTRACT: BACKGROUND: Gene expression patterns in the brain are strongly influenced by the severity and duration of physiological stress at the time of death. This agonal effect, if not well controlled, can lead to spurious findings and diminished statistical power in case-control comparisons. While some recent studies match samples by tissue pH and clinically recorded agonal conditions, we found that these indicators were sometimes at odds with observed stress-related gene expression patterns, and that matching by these criteria still sometimes results in identifying case-control differences that are primarily driven by residual agonal effects. This problem is analogous to the one encountered in genetic association studies, where self-reported race and ethnicity are often imprecise proxies for an individual's actual genetic ancestry. RESULTS: We developed an Agonal Stress Rating (ASR) system that evaluates each sample's degree of stress based on gene expression data, and used ASRs in post hoc sample matching or covariate analysis. While gene expression patterns are generally correlated across different brain regions, we found strong region-region differences in empirical ASRs in many subjects that likely reflect inter-individual variabilities in local structure or function, resulting in region-specific vulnerability to agonal stress. CONCLUSION: Variation of agonal stress across different brain regions differs between individuals, revealing a new level of complexity for gene expression studies of brain tissues. The Agonal Stress Ratings quantitatively assess each sample's extent of regulatory response to agonal stress, and allow a strong control of this important confounder. [Abstract/Link to Full Text]

Kuehn C, Weikard R
Multiple splice variants within the bovine silver homologue (SILV) gene affecting coat color in cattle indicate a function additional to fibril formation in melanophores.
BMC Genomics. 2007;8:335.
BACKGROUND: The silver homologue (SILV) gene plays a major role in melanosome development. SILV is a target for studies concerning melanoma diagnostics and therapy in humans as well as on skin and coat color pigmentation in many species ranging from zebrafish to mammals. However, the precise cellular mechanisms in which SILV is involved are still not completely understood. While there are many studies addressing SILV function upon a eumelaneic pigment background, there is a substantial lack of information regarding the further relevance of SILV, e.g. for phaeomelanosome development. RESULTS: In contrast to previous results in other species reporting SILV expression exclusively in pigmented tissues, our experiments provide evidence that the bovine SILV gene is expressed in a variety of tissues independent of pigmentation. Our data show that the bovine SILV gene generates an unexpectedly large number of different transcripts occurring in skin as well as in non-pigmented tissues, e.g. liver or mammary gland. The alternative splice sites are generated by internal splicing and primarily remove complete exons. Alternative splicing predominantly affects the repeat domain of the protein, which has a functional key role in fibril formation during eumelanosome development. CONCLUSION: The expression of the bovine SILV gene independent of pigmentation suggests SILV functions exceeding melanosome development in cattle. This hypothesis is further supported by transcript variants lacking functional key elements of the SILV protein relevant for eumelanosome development. Thus, the bovine SILV gene can serve as a model for the investigation of the putative additional functions of SILV. Furthermore, the splice variants of the bovine SILV gene represent a comprehensive natural model to refine the knowledge about functional domains in the SILV protein.
Our study exemplifies that the extent of alternative splicing is presumably much higher than previously estimated and that alternatively spliced transcripts presumably can generate molecules of deviating function compared to their constitutive counterpart. [Abstract/Link to Full Text]

Ho EC, Cahill MJ, Saville BJ
Gene discovery and transcript analyses in the corn smut pathogen Ustilago maydis: expressed sequence tag and genome sequence comparison.
BMC Genomics. 2007 Sep 24;8(1):334.
ABSTRACT: BACKGROUND: Ustilago maydis is the basidiomycete fungus responsible for common smut of corn and is a model organism for the study of fungal phytopathogenesis. To aid in the annotation of the genome sequence of this organism, several expressed sequence tag (EST) libraries were generated from a variety of U. maydis cell types. In addition to utility in the context of gene identification and structure annotation, the ESTs were analyzed to identify differentially abundant transcripts and to detect evidence of alternative splicing and anti-sense transcription. RESULTS: Four cDNA libraries were constructed using RNA isolated from U. maydis diploid teliospores (U. maydis strains 518x521) and haploid cells of strain 521 grown under nutrient rich, carbon starved, and nitrogen starved conditions. Using the genome sequence as a scaffold, the 15,901 ESTs were assembled into 6,101 contiguous expressed sequences (contigs); among these, 5,482 corresponded to predicted genes in the MUMDB (MIPS Ustilago maydis database), while 619 aligned to regions of the genome not yet designated as genes in MUMDB. A comparison of EST abundance identified numerous genes that may be regulated in a cell type or starvation-specific manner. The transcriptional response to nitrogen starvation was assessed using RT-qPCR. The results suggest that there may be cross-talk between the nitrogen and carbon signalling pathways in U. maydis. Bioinformatic analysis identified numerous examples of alternative splicing and anti-sense transcription. While intron retention was the predominant form of alternative splicing in U. maydis, other varieties were also evident (e.g. exon skipping). Selected instances of both alternative splicing and anti-sense transcription were independently confirmed using RT-PCR. CONCLUSIONS: Through this work: 1) substantial sequence information has been provided for U. maydis genome annotation; 2) new genes were identified through the discovery of 619 contigs that had previously escaped annotation; 3) evidence is provided that suggests the regulation of nitrogen metabolism in U. maydis differs from that of other model fungi; and 4) alternative splicing and anti-sense transcription were identified in U. maydis and, amid similar observations in other basidiomycetes, this suggests these phenomena may be widespread in this group of fungi. These advances emphasize the importance of EST analysis in genome annotation. [Abstract/Link to Full Text]

Hene L, Sreenu VB, Vuong MT, Abidi SH, Sutton JK, Rowland-Jones SL, Davis SJ, Evans EJ
Deep analysis of cellular transcriptomes - LongSAGE versus classic MPSS.
BMC Genomics. 2007;8:333.
BACKGROUND: Deep transcriptome analysis will underpin a large fraction of post-genomic biology. 'Closed' technologies, such as microarray analysis, only detect the set of transcripts chosen for analysis, whereas 'open' e.g. tag-based technologies are capable of identifying all possible transcripts, including those that were previously uncharacterized. Although new technologies are now emerging, at present the major resources for open-type analysis are the many publicly available SAGE (serial analysis of gene expression) and MPSS (massively parallel signature sequencing) libraries. These technologies have never been compared for their utility in the context of deep transcriptome mining. RESULTS: We used a single LongSAGE library of 503,431 tags and a "classic" MPSS library of 1,744,173 tags, both prepared from the same T cell-derived RNA sample, to compare the ability of each method to probe, at considerable depth, a human cellular transcriptome. We show that even though LongSAGE is more error-prone than MPSS, our LongSAGE library nevertheless generated 6.3-fold more genome-matching (and therefore likely error-free) tags than the MPSS library. An analysis of a set of 8,132 known genes detectable by both methods, and for which there is no ambiguity about tag matching, shows that MPSS detects only half (54%) the number of transcripts identified by SAGE (3,617 versus 1,955). Analysis of two additional MPSS libraries shows that each library samples a different subset of transcripts, and that in combination the three MPSS libraries (4,274,992 tags in total) still only detect 73% of the genes identified in our test set using SAGE. The fraction of transcripts detected by MPSS is likely to be even lower for uncharacterized transcripts, which tend to be more weakly expressed. The source of the loss of complexity in MPSS libraries compared to SAGE is unclear, but its effects become more severe with each sequencing cycle (i.e. as MPSS tag length increases). 
CONCLUSION: We show that MPSS libraries are significantly less complex than much smaller SAGE libraries, revealing a serious bias in the generation of MPSS data unlikely to have been circumvented by later technological improvements. Our results emphasize the need for the rigorous testing of new expression profiling technologies. [Abstract/Link to Full Text]
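The headline comparison can be checked with a few lines of arithmetic. The sketch below uses only the counts quoted in the abstract; the variable names are illustrative.

```python
# Counts quoted in the abstract.
sage_tags = 503_431        # LongSAGE library size
mpss_tags = 1_744_173      # "classic" MPSS library size
test_set_size = 8_132      # known genes detectable by both methods
sage_detected = 3_617      # transcripts detected by SAGE
mpss_detected = 1_955      # transcripts detected by MPSS

# MPSS detections as a fraction of SAGE detections (the "54%" figure).
mpss_vs_sage = mpss_detected / sage_detected

# How much larger the MPSS library is, despite detecting fewer transcripts.
library_size_ratio = mpss_tags / sage_tags

# Share of the test set detected by each method.
sage_coverage = sage_detected / test_set_size
mpss_coverage = mpss_detected / test_set_size

print(f"MPSS/SAGE detections: {mpss_vs_sage:.0%}")
print(f"MPSS library is {library_size_ratio:.1f}x the size of the SAGE library")
```

The MPSS library is roughly 3.5 times larger yet detects about half as many of the test-set transcripts, which is the loss of complexity the abstract refers to.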

Zhang W, Li L, Li X, Jiang W, Huo J, Wang Y, Lin M, Rao S
Unravelling the hidden heterogeneities of diffuse large B-cell lymphoma based on coupled two-way clustering.
BMC Genomics. 2007;8:332.
BACKGROUND: It is becoming increasingly clear that our current taxonomy of clinical phenotypes is mixed with molecular heterogeneity. Of vital importance for refined clinical practice and improved intervention strategies is to define the hidden, molecularly distinct diseases using modern large-scale genomic approaches. Microarray omics technology has provided a powerful way to dissect hidden genetic heterogeneity of complex diseases. The aim of this study was thus to develop a bioinformatics approach to seek the transcriptional features leading to the hidden subtyping of a complex clinical phenotype. The basic strategy of the proposed method was to iteratively partition the sample and feature spaces in two ways with the super-paramagnetic clustering technique and to seek hard and robust gene clusters that lead to a natural partition of disease samples and that have the highest functionally conceptual consensus evaluated with Gene Ontology. RESULTS: We applied the proposed method to two publicly available microarray datasets of diffuse large B-cell lymphoma (DLBCL), a notoriously heterogeneous phenotype. A feature subset of 30 genes (38 probes) derived from analysis of the first dataset consisting of 4026 genes and 42 DLBCL samples identified three categories of patients with very different five-year overall survival rates (70.59%, 44.44% and 14.29% respectively; p = 0.0017). Analysis of the second dataset consisting of 7129 genes and 58 DLBCL samples revealed a feature subset of 13 genes (16 probes) that not only replicated the findings of the important DLBCL genes (e.g. JAW1 and BCL7A), but also identified three clinically similar subtypes (with 5-year overall survival rates of 63.13%, 34.92% and 15.38% respectively; p = 0.0009) to those identified in the first dataset.
Finally, we built a multivariate Cox proportional-hazards prediction model for each feature subset and defined JAW1 as one of the most significant predictors (p = 0.005 and 0.014; hazard ratios = 0.02 and 0.03, respectively, for the two datasets) for both DLBCL cohorts under study. CONCLUSION: Our results showed that the proposed algorithm is a promising computational strategy for peeling off the hidden genetic heterogeneity based on transcriptional profiling of disease samples, which may lead to an improved diagnosis and treatment of cancers. [Abstract/Link to Full Text]

Zhang Z, Chen D, Fenstermacher DA
Integrated analysis of independent gene expression microarray datasets improves the predictability of breast cancer outcome.
BMC Genomics. 2007;8:331.
BACKGROUND: Gene expression profiles based on microarray data have been suggested by many studies as potential molecular prognostic indexes of breast cancer. However, due to the confounding effect of clinical background, independent studies often obtained inconsistent results. The current study investigated the possibility of improving the quality and generality of expression profiles by integrated analysis of multiple datasets. Profiles of recurrence outcome were derived from two independent datasets and validated by a third dataset. RESULTS: The clinical background of patients significantly influenced the content and performance of expression profiles when the training samples were unbalanced. The integrated profiling of two independent datasets led to higher classification accuracy (71.11% vs. 70.59%) and larger ROC curve area (0.789 vs. 0.767) of the testing samples. Cell cycle, especially M phase mitosis, was significantly overrepresented by the 60-gene profile obtained from integrated analysis (p < 0.0001). This profile significantly differentiated poor and good prognosis in a third patient cohort (p = 0.003). Simulation procedures demonstrated that the change of profile specificity had a more immediate influence on the performance of expression profiles than the change of profile sensitivity. CONCLUSION: The current study confirmed that the gene expression profile generated by integrated analysis of multiple datasets achieved better prediction of breast cancer recurrence. However, the content and performance of profiles were confounded by the clinical background of training patients. In future studies, prognostic profiles applicable to the general population should be derived from more diversified and balanced patient cohorts on a larger scale. [Abstract/Link to Full Text]
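The two performance measures quoted above, classification accuracy and ROC curve area, can be illustrated with a small sketch. The abstract does not say how either was computed; the rank-based AUC, the 0.5 threshold and the scores below are assumptions for illustration, not the study's data or method.

```python
def roc_auc(scores_pos, scores_neg):
    """Area under the ROC curve, computed as the probability that a
    positive sample scores higher than a negative one (ties count half)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


def accuracy(scores_pos, scores_neg, threshold):
    """Fraction of samples classified correctly at a fixed score threshold."""
    correct = sum(s >= threshold for s in scores_pos)
    correct += sum(s < threshold for s in scores_neg)
    return correct / (len(scores_pos) + len(scores_neg))


# Hypothetical profile scores for patients with and without recurrence.
recurrence = [0.9, 0.8, 0.55, 0.4]
no_recurrence = [0.7, 0.3, 0.2, 0.1]

auc = roc_auc(recurrence, no_recurrence)
acc = accuracy(recurrence, no_recurrence, threshold=0.5)
print(auc, acc)
```

Accuracy depends on the chosen threshold, whereas the ROC area summarizes performance over all thresholds, which is one reason studies often report both.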

Schlueter JA, Lin JY, Schlueter SD, Vasylenko-Sanders IF, Deshpande S, Yi J, O'Bleness M, Roe BA, Nelson RT, Scheffler BE, Jackson SA, Shoemaker RC
Gene duplication and paleopolyploidy in soybean and the implications for whole genome sequencing.
BMC Genomics. 2007;8:330.
BACKGROUND: Soybean, Glycine max (L.) Merr., is a well documented paleopolyploid. What remains relatively undercharacterized is the level of sequence identity in retained homeologous regions of the genome. Recently, the Department of Energy Joint Genome Institute and United States Department of Agriculture jointly announced the sequencing of the soybean genome. One of the initial concerns is what effect sequence identity in homeologous regions would have on whole genome shotgun sequence assembly. RESULTS: Seventeen BACs representing approximately 2.03 Mb were sequenced as representative potential homeologous regions from the soybean genome. Genetic mapping of each BAC shows that 11 of the 20 chromosomes are represented. Sequence comparisons between homeologous BACs show that the soybean genome is a mosaic of retained paleopolyploid regions. Some regions appear to be highly conserved while other regions have diverged significantly. Large-scale "batch" reassembly of all 17 BACs combined showed that even the most homeologous BACs with upwards of 95% sequence identity resolve into their respective homeologous sequences. Potential assembly errors were generated by tandemly duplicated pentatricopeptide repeat containing genes and long simple sequence repeats. Analysis of a whole-genome shotgun assembly of 80,000 randomly chosen JGI-DOE sequence traces reveals some new soybean-specific repeat sequences. CONCLUSION: This analysis investigated both the structure of the paleopolyploid soybean genome and the potential effects retained homeology will have on assembling the whole genome shotgun sequence. Based upon these results, homeologous regions similar to those characterized here will not cause major assembly issues. [Abstract/Link to Full Text]

Wang X, Jia S, Meyer L, Yassai MB, Naumov YN, Gorski J, Hessner MJ
Quantitative measurement of pathogen-specific human memory T cell repertoire diversity using a CDR3 beta-specific microarray.
BMC Genomics. 2007;8:329.
BACKGROUND: Quantitative microarray data that are sensitive to very small differences in target sequence would be useful in any number of settings where a sample can consist of multiple related sequences present in various abundances. Examples of such applications would include measurement of pseudo species in viral infections and the measurement of species of antibodies or T cell receptors that constitute immune repertoires. Such a method must account for cross-hybridization and for differences in hybridization efficiencies between the arrayed probes and their corresponding targets. We have used the memory T cell repertoire to an influenza-derived peptide as a test case for developing such a method. RESULTS: The arrayed probes corresponded to a 17 nucleotide TCR-specific region that distinguished sequences differing by as little as a single nucleotide. Hybridization efficiency between highly related Cy5-labeled subject sequences was normalized by including an equimolar mixture of Cy3-labeled synthetic targets representing all 108 arrayed probes. The same synthetic targets were used to measure the degree of cross hybridization between probes. Reconstitution studies found the system sensitive to input ratios as low as 0.5% and accurate in measuring known input percentages (R2 = 0.81, R = 0.90, p < 0.0001). A data handling protocol was developed to incorporate the differences in hybridization efficiency. To validate the array in T cell repertoire analysis, it was used to analyze human recall responses to influenza in three human subjects and compared to traditional cloning and sequencing. When evaluating the rank order of clonotype abundance determined by each method, the approaches were not found significantly different (Wilcoxon rank-sum test, p > 0.05).
CONCLUSION: This novel strategy appears to be robust and can be adapted to any situation where complex mixtures of highly similar sequences need to be quantitatively resolved. [Abstract/Link to Full Text]
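One way to picture the normalization against an equimolar Cy3 reference is a per-probe ratio followed by rescaling to fractions. This is only a sketch under that assumption: the abstract does not give the actual data handling protocol, cross-hybridization is ignored here, and the probe names and intensities are hypothetical.

```python
import math


def relative_abundance(cy5, cy3):
    """Divide each probe's Cy5 (sample) signal by its Cy3 (equimolar
    reference) signal, then rescale the ratios so they sum to 1."""
    ratios = {probe: cy5[probe] / cy3[probe] for probe in cy5}
    total = sum(ratios.values())
    return {probe: r / total for probe, r in ratios.items()}


# Hypothetical intensities for three probes.
cy5 = {"probeA": 800.0, "probeB": 300.0, "probeC": 100.0}
cy3 = {"probeA": 2000.0, "probeB": 1000.0, "probeC": 500.0}
fractions = relative_abundance(cy5, cy3)
print(fractions)

# The reported R2 = 0.81 and R = 0.90 are consistent with each other.
r = math.sqrt(0.81)
```

Because every reference target is present at the same concentration, differences in Cy3 signal between probes reflect hybridization efficiency rather than abundance, which is why dividing by it is a plausible correction.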

Salem M, Silverstein J, Rexroad CE, Yao J
Effect of starvation on global gene expression and proteolysis in rainbow trout (Oncorhynchus mykiss).
BMC Genomics. 2007;8:328.
BACKGROUND: Fast, efficiently growing animals have increased protein synthesis and/or reduced protein degradation relative to slow, inefficiently growing animals. Consequently, minimizing the energetic cost of protein turnover is a strategic goal for enhancing animal growth. Characterization of gene expression profiles associated with protein turnover would allow us to identify genes that could potentially be used as molecular biomarkers to select for germplasm with improved protein accretion. RESULTS: We evaluated changes in hepatic global gene expression in response to 3-week starvation in rainbow trout (Oncorhynchus mykiss). Microarray analysis revealed a coordinated, down-regulated expression of protein biosynthesis genes in starved fish. In addition, the expression of genes involved in lipid metabolism/transport, aerobic respiration, blood functions and immune response was decreased in response to starvation. However, the microarray approach did not show a significant increase of gene expression in protein catabolic pathways. Further studies, using real-time PCR and enzyme activity assays, were performed to investigate the expression of genes involved in the major proteolytic pathways including calpains, the multi-catalytic proteasome and cathepsins. Starvation reduced mRNA expression of the calpain inhibitor, calpastatin long isoform (CAST-L), with a subsequent increase in the calpain catalytic activity. In addition, starvation caused a slight but significant increase in 20S proteasome activity without affecting mRNA levels of the proteasome genes. Neither the mRNA levels nor the activities of cathepsin D and L were affected by starvation. CONCLUSION: These results suggest a significant role of calpain and 20S proteasome pathways in protein mobilization as a source of energy during fasting and a potential association of the CAST-L gene with fish protein accretion. [Abstract/Link to Full Text]

Jones AK, Sattelle DB
The cys-loop ligand-gated ion channel gene superfamily of the red flour beetle, Tribolium castaneum.
BMC Genomics. 2007;8:327.
BACKGROUND: Members of the cys-loop ligand-gated ion channel (cys-loop LGIC) superfamily mediate chemical neurotransmission and are studied extensively as potential targets of drugs used to treat neurological disorders such as Alzheimer's disease. Insect cys-loop LGICs are also of interest as they are targets of highly successful insecticides. The red flour beetle, Tribolium castaneum, is a major pest of stored agricultural products and is also an important model organism for studying development. RESULTS: As part of the T. castaneum genome sequencing effort, we have characterized the beetle cys-loop LGIC superfamily, which is the third insect superfamily to be described after those of Drosophila melanogaster and Apis mellifera, and also the largest, consisting of 24 genes. As with Drosophila and Apis, Tribolium possesses ion channels gated by acetylcholine, gamma-amino butyric acid (GABA), glutamate and histamine as well as orthologs of the Drosophila pH-sensitive chloride channel subunit (pHCl), CG8916 and CG12344. Similar to Drosophila and Apis, Tribolium cys-loop LGIC diversity is broadened by alternative splicing, although the beetle orthologs of RDL and GluCl possess more variants of exon 3. Also, RNA A-to-I editing was observed in two Tribolium nicotinic acetylcholine receptor subunits, Tcasalpha6 and Tcasbeta1. Editing in Tcasalpha6 is evolutionarily conserved with D. melanogaster, A. mellifera and Heliothis virescens, whereas Tcasbeta1 is edited at a site so far only observed in the beetle. CONCLUSION: Our findings reveal that in diverse insect species the cys-loop LGIC superfamily has remained compact with only minor changes in gene numbers. However, alternative splicing, RNA editing and the presence of divergent subunits broaden the cys-loop LGIC proteome and generate species-specific receptor isoforms.
These findings on Tribolium castaneum enhance our understanding of cys-loop LGIC functional genomics and provide a useful basis for the development of improved insecticides that target an important agricultural pest. [Abstract/Link to Full Text]

Altincicek B, Vilcinskas A
Analysis of the immune-inducible transcriptome from microbial stress resistant, rat-tailed maggots of the drone fly Eristalis tenax.
BMC Genomics. 2007;8:326.
BACKGROUND: The saprophagous and coprophagous maggots of the drone fly Eristalis tenax (Insecta, Diptera) have evolved the unique ability to survive in aquatic habitats with extreme microbial stress such as drains, sewage pools, and farmyard liquid manure storage pits. Therefore, they represent suitable models for the investigation of trade-offs between the benefits resulting from colonization of habitats lacking predators, parasitoids, or competitors and the investment in immunity against microbial stress. In this study, we screened for genes in E. tenax that are induced upon septic injury. Suppression subtractive hybridization was performed to selectively amplify and identify cDNAs that are differentially expressed in response to injected crude bacterial endotoxin (LPS). RESULTS: Untreated E. tenax maggots exhibit significant antibacterial activity in the hemolymph which strongly increases upon challenge with LPS. In order to identify effector molecules contributing to this microbial defense we constructed a subtractive cDNA library using RNA samples from untreated and LPS injected maggots. Analysis of 288 cDNAs revealed induced expression of 117 cDNAs corresponding to 30 novel gene clusters in E. tenax. Among these immune-inducible transcripts we found homologues of known genes from other Diptera such as Drosophila and Anopheles that mediate pathogen recognition (e.g. peptidoglycan recognition protein) or immune-related signaling (e.g. relish). As predicted, we determined a high diversity of novel putative antimicrobial peptides including one E. tenax defensin. CONCLUSION: We identified 30 novel genes of E. tenax that were induced in response to septic injury including novel putative antimicrobial peptides. Further analysis of these immune-related effector molecules from Eristalis may help to elucidate the interdependency of ecological adaptation and molecular evolution of the innate immunity in Diptera. [Abstract/Link to Full Text]

Arvas M, Kivioja T, Mitchell A, Saloheimo M, Ussery D, Penttila M, Oliver S
Comparison of protein coding gene contents of the fungal phyla Pezizomycotina and Saccharomycotina.
BMC Genomics. 2007;8:325.
BACKGROUND: Several dozen fungi encompassing traditional model organisms, industrial production organisms and human and plant pathogens have been sequenced recently and their particular genomic features analysed in detail. In addition, comparative genomics has been used to analyse specific subgroups of fungi. Notably, analysis of the phylum Saccharomycotina has revealed major events of evolution such as the recent genome duplication and subsequent gene loss. However, little has been done to gain a comprehensive comparative view of the fungal kingdom. We have carried out a computational genome-wide comparison of the protein coding gene content of Saccharomycotina and Pezizomycotina, which include industrially important yeasts and filamentous fungi, respectively. RESULTS: Our analysis shows that, based on genome redundancy, the traditional model organisms Saccharomyces cerevisiae and Neurospora crassa are exceptional among fungi. This can be explained by the recent genome duplication in S. cerevisiae and the repeat induced point mutation mechanism in N. crassa. Interestingly, in Pezizomycotina a subset of protein families related to plant biomass degradation and secondary metabolism are the only ones showing signs of recent expansion. In addition, Pezizomycotina have a wealth of phylum-specific, poorly characterised genes with a wide variety of predicted functions. These genes are well conserved in Pezizomycotina, but show no signs of recent expansion. The genes found in all fungi except Saccharomycotina are slightly better characterised and predicted to encode mainly enzymes. The genes specific to Saccharomycotina are enriched in transcription and mitochondrion related functions. Mitochondrial ribosomal proteins in particular seem to have diverged from those of Pezizomycotina. In addition, we highlight several individual gene families with interesting phylogenetic distributions.
CONCLUSION: Our analysis predicts that all Pezizomycotina unlike Saccharomycotina can potentially produce a wide variety of secondary metabolites and secreted enzymes and that the responsible gene families are likely to evolve fast. Both types of fungal products can be of commercial value, or on the other hand cause harm to humans. In addition, a great number of novel predicted and known enzymes are found from all fungi except Saccharomycotina. Therefore further studies and exploitation of fungal metabolism appears very promising. [Abstract/Link to Full Text]

Shao YM, Dong K, Zhang CX
The nicotinic acetylcholine receptor gene family of the silkworm, Bombyx mori.
BMC Genomics. 2007;8:324.
BACKGROUND: Nicotinic acetylcholine receptors (nAChRs) mediate fast synaptic cholinergic transmission in the insect central nervous system. The insect nAChR is the molecular target of a class of insecticides, neonicotinoids. Like mammalian nAChRs, insect nAChRs are considered to be made up of five subunits, coded by homologous genes belonging to the same family. The nAChR subunit genes of Drosophila melanogaster, Apis mellifera and Anopheles gambiae have been cloned previously based on their genome sequences. The silkworm Bombyx mori is a model insect of Lepidoptera, among which are many agricultural pests. Identification and characterization of B. mori nAChR genes could provide valuable basic information for this important family of receptor genes and for the study of the molecular mechanisms of neonicotinoid action and resistance. RESULTS: We searched the genome sequence database of B. mori with the fruit fly and honeybee nAChRs by tBlastn and cloned all putative silkworm nAChR cDNAs by reverse transcriptase-polymerase chain reaction (RT-PCR) and rapid amplification of cDNA ends (RACE) methods. B. mori appears to have the largest known insect nAChR gene family to date, including nine alpha-type subunits and three beta-type subunits. The silkworm possesses three genes having low identity with others, including one alpha and two beta subunits: alpha 9, beta 2 and beta 3. Like the fruit fly and honeybee counterparts, silkworm nAChR gene alpha 6 has RNA-editing sites, and alpha 4, alpha 6 and alpha 8 undergo alternative splicing. In particular, alternative exon 7 of Bm alpha 8 may have arisen from a recent duplication event. Truncated transcripts were found for Bm alpha 4 and Bm alpha 5. CONCLUSION: B. mori possesses the largest known insect nAChR gene family characterized to date, including nine alpha-type subunits and three beta-type subunits.
RNA-editing, alternative splicing and truncated transcripts were found in several subunit genes, which might enhance the diversity of the gene family. [Abstract/Link to Full Text]

Chu Z, Li J, Eshaghi M, Karuturi RK, Lin K, Liu J
Adaptive expression responses in the Pol-gamma null strain of S. pombe depleted of mitochondrial genome.
BMC Genomics. 2007;8:323.
BACKGROUND: DNA polymerase gamma (Pol-gamma) has been shown to be essential for maintenance of the mitochondrial genome (mtDNA) in the petite-positive budding yeast Saccharomyces cerevisiae. Budding yeast cells lacking mitochondria exhibit a slow-growing or petite-colony phenotype. Petite strains fail to grow on non-fermentable carbon sources. However, it is not clear whether Pol-gamma is required for mtDNA maintenance in the petite-negative fission yeast Schizosaccharomyces pombe. RESULTS: We show that disruption of the nuclear gene pog1+ that encodes Pol-gamma is sufficient to deplete mtDNA in S. pombe. Cells bearing the pog1Delta allele require substantial growth periods to form petite colonies. Mitotracker assays indicate that pog1Delta cells are defective in mitochondrial function and EM analyses suggest that pog1Delta cells lack normal mitochondrial structures. Depletion of mtDNA in pog1Delta cells is evident from quantitative real-time PCR assays. Genome-wide expression profiles of pog1Delta and other mtDNA-less cells reveal that many genes involved in response to stimulus, energy derivation by oxidation of organic compounds, cellular carbohydrate metabolism, and energy reserve metabolism are induced. Conversely, many genes encoding proteins involved in amino acid metabolism and oxidative phosphorylation are repressed. CONCLUSION: By showing that Pol-gamma is essential for mtDNA maintenance and that disruption of pog1+ alters the genome-wide expression profiles, we demonstrated that cells lacking mtDNA exhibit adaptive nuclear gene expression responses in the petite-negative S. pombe. [Abstract/Link to Full Text]

McCann JA, Muro EM, Palmer C, Palidwor G, Porter CJ, Andrade-Navarro MA, Rudnicki MA
ChIP on SNP-chip for genome-wide analysis of human histone H4 hyperacetylation.
BMC Genomics. 2007 Sep 14;8(1):322.
ABSTRACT: BACKGROUND: SNP microarrays are designed to genotype Single Nucleotide Polymorphisms (SNPs). These microarrays report hybridization of DNA fragments and therefore can be used for the purpose of detecting genomic fragments. RESULTS: Here, we demonstrate that a SNP microarray can be effectively used in this way to perform chromatin immunoprecipitation (ChIP) on chip as an alternative to tiling microarrays. We illustrate this novel application by mapping whole genome histone H4 hyperacetylation in human myoblasts and myotubes. We detect clusters of hyperacetylated histone H4, often spanning across up to 300 kilobases of genomic sequence. Using complementary genome-wide analyses of gene expression by DNA microarray we demonstrate that these clusters of hyperacetylated histone H4 tend to be associated with expressed genes. CONCLUSIONS: The use of a SNP array for a ChIP-on-chip application (ChIP on SNP-chip) will be of great value to laboratories whose interest is the determination of general rules regarding the relationship of specific chromatin modifications to transcriptional status throughout the genome and to examine the asymmetric modification of chromatin at heterozygous loci. [Abstract/Link to Full Text]

Latreille P, Norton S, Goldman BS, Henkhaus J, Miller N, Barbazuk B, Bode HB, Darby C, Du Z, Forst S, Gaudriault S, Goodner B, Goodrich-Blair H, Slater S
Optical mapping as a routine tool for bacterial genome sequence finishing.
BMC Genomics. 2007;8:321.
BACKGROUND: In sequencing the genomes of two Xenorhabdus species, we encountered a large number of sequence repeats and assembly anomalies that stalled finishing efforts. This included a stretch of about 12 Kb that is over 99.9% identical between the plasmid and chromosome of X. nematophila. RESULTS: Whole genome restriction maps of the sequenced strains were produced through optical mapping technology. These maps allowed rapid resolution of sequence assembly problems, permitted closing of the genome, and allowed correction of a large inversion in a genome assembly that we had considered finished. CONCLUSION: Our experience suggests that routine use of optical mapping in bacterial genome sequence finishing is warranted. When combined with data produced through 454 sequencing, an optical map can rapidly and inexpensively generate an ordered and oriented set of contigs to produce a nearly complete genome sequence assembly. [Abstract/Link to Full Text]

Dolan J, Walshe K, Alsbury S, Hokamp K, O'keeffe S, Okafuji T, Miller SF, Tear G, Mitchell KJ
The extracellular Leucine-Rich Repeat superfamily; a comparative survey and analysis of evolutionary relationships and expression patterns.
BMC Genomics. 2007 Sep 14;8(1):320.
ABSTRACT: BACKGROUND: Leucine-rich repeats (LRRs) are highly versatile and evolvable protein-ligand interaction motifs found in a large number of proteins with diverse functions, including innate immunity and nervous system development. Here we catalogue all of the extracellular LRR (eLRR) proteins in worms, flies, mice and humans. We use convergent evidence from several transmembrane-prediction and motif-detection programs, including a customised algorithm, LRRscan, to identify eLRR proteins, and a hierarchical clustering method based on TribeMCL to establish their evolutionary relationships. RESULTS: This yields a total of 369 proteins (29 in worm, 66 in fly, 135 in mouse and 139 in human), many of them of unknown function. We group eLRR proteins into several classes: those with only LRRs, those that cluster with Toll-like receptors (Tlrs), those with immunoglobulin or fibronectin-type 3 (FN3) domains and those with some other domain. These groups show differential patterns of expansion and diversification across species. Our analyses reveal several clusters of novel genes, including two Elfn genes, encoding transmembrane proteins with eLRRs and an FN3 domain, and six genes encoding transmembrane proteins with eLRRs only (the Elron cluster). Many of these are expressed in discrete patterns in the developing mouse brain, notably in the thalamus and cortex. We have also identified a number of novel fly eLRR proteins with discrete expression in the embryonic nervous system. CONCLUSIONS: This study provides the necessary foundation for a systematic analysis of the functions of this class of genes, which are likely to include prominently innate immunity, inflammation and neural development, especially the specification of neuronal connectivity. [Abstract/Link to Full Text]

Yao Z, Jaeger JC, Ruzzo WL, Morales CZ, Emond M, Francke U, Milewicz DM, Schwartz SM, Mulvihill ER
A Marfan syndrome gene expression phenotype in cultured skin fibroblasts.
BMC Genomics. 2007 Sep 12;8(1):319.
ABSTRACT: BACKGROUND: Marfan Syndrome (MFS) is a heritable connective tissue disorder caused by mutations in the fibrillin-1 gene. This syndrome constitutes a significant identifiable subtype of aortic aneurysmal disease, accounting for over 5% of non-atherosclerotic thoracic aortic aneurysms. RESULTS: We used DNA microarrays to identify genes whose altered expression levels may contribute to the phenotype of the disease. Our analysis of 4132 genes identified a subset with significant expression differences between skin fibroblast cultures from unaffected controls versus cultures from affected individuals with known fibrillin-1 mutations. Subsequently, 10 genes were chosen for validation by quantitative RT-PCR (qRT-PCR). CONCLUSIONS: Differential expression of many of the validated genes was associated with the MFS samples when an additional group of unaffected and MFS-affected subjects was analyzed (p-value < 3 x 10^-6 under the null hypothesis that expression levels in cultured fibroblasts are unaffected by MFS status). An unexpected observation was the range of individual gene expression. In unaffected control subjects, expression ranges exceeding 10-fold were seen in many of the genes selected for qRT-PCR validation. The variation in expression in the MFS-affected subjects was even greater. [Abstract/Link to Full Text]

Konrad L, Scheiber JA, Völck-Badouin E, Keilani MM, Laible L, Brandt H, Schmidt A, Aumüller G, Hofmann R
Alternative splicing of TGF-betas and their high-affinity receptors T beta RI, T beta RII and T beta RIII (betaglycan) reveal new variants in human prostatic cells.
BMC Genomics. 2007;8:318.
BACKGROUND: The transforming growth factors (TGF)-beta, TGF-beta1, TGF-beta2 and TGF-beta3, and their receptors [T beta RI, T beta RII, T beta RIII (betaglycan)] elicit pleiotropic functions in the prostate. Although expression of the ligands and receptors has been investigated, the splice variants have never been analyzed. We have therefore analyzed all ligands, the receptors and the splice variants T beta RIB, T beta RIIB and TGF-beta 2B in human prostatic cells. RESULTS: Interestingly, a novel human receptor transcript, T beta RIIC, was identified, encoding an additional 36 amino acids in the extracellular domain, that is expressed in the prostatic cancer cells PC-3, stromal hPCPs, and other human tissues. Furthermore, the receptor variant T beta RIB with four additional amino acids was also identified in human. Expression of the variant T beta RIIB was found in all prostate cell lines studied with a preferential localization in epithelial cells in some human prostatic glands. Similarly, we observed localization of T beta RIIC and TGF-beta 2B mainly in the epithelial cells with a preferential localization of TGF-beta 2B in the apical cell compartment. Whereas in the androgen-independent hPCPs and PC-3 cells all TGF-beta ligands and receptors are expressed, the androgen-dependent LNCaP cells failed to express all ligands. Additionally, stimulation of PC-3 cells with TGF-beta2 resulted in a significant and strong increase in secretion of plasminogen activator inhibitor-1 (PAI-1) with a major participation of T beta RII. CONCLUSION: In general, expression of the splice variants was more heterogeneous in contrast to the well-known isoforms. The identification of the splice variants T beta RIB and the novel isoform T beta RIIC in man clearly contributes to the growing complexity of the TGF-beta family. [Abstract/Link to Full Text]

Hoegg S, Boore JL, Kuehl JV, Meyer A
Comparative phylogenomic analyses of teleost fish Hox gene clusters: lessons from the cichlid fish Astatotilapia burtoni.
BMC Genomics. 2007;8:317.
BACKGROUND: Teleost fish have seven paralogous clusters of Hox genes stemming from two complete genome duplications early in vertebrate evolution, and an additional genome duplication during the evolution of ray-finned fish, followed by the secondary loss of one cluster. Gene duplications on the one hand, and the evolution of regulatory sequences on the other, are thought to be among the most important mechanisms for the evolution of new gene functions. Cichlid fish, the largest family of vertebrates with about 2500 species, are famous examples of speciation and morphological diversity. Since this diversity could be based on regulatory changes, we chose to study the coding as well as putative regulatory regions of their Hox clusters within a comparative genomic framework. RESULTS: We sequenced and characterized all seven Hox clusters of Astatotilapia burtoni, a haplochromine cichlid fish. Comparative analyses with data from other teleost fish such as zebrafish, two species of pufferfish, stickleback and medaka were performed. We traced losses of genes and microRNAs of Hox clusters, the medaka lineage seems to have lost more microRNAs than the other fish lineages. We found that each teleost genome studied so far has a unique set of Hox genes. The hoxb7a gene was lost independently several times during teleost evolution, the most recent event being within the radiation of East African cichlid fish. The conserved non-coding sequences (CNS) encompass a surprisingly large part of the clusters, especially in the HoxAa, HoxCa, and HoxDa clusters. Across all clusters, we observe a trend towards an increased content of CNS towards the anterior end. CONCLUSION: The gene content of Hox clusters in teleost fishes is more variable than expected, with each species studied so far having a different set. 
Although the highest loss rate of Hox genes occurred immediately after whole genome duplications, our analyses showed that gene loss continued and is still ongoing in all teleost lineages. Along with the gene content, the CNS content also varies across clusters. The excess of CNS at the anterior end of clusters could imply a stronger conservation of anterior expression patterns than those towards more posterior areas of the embryo. [Abstract/Link to Full Text]

Seemann SE, Gilchrist MJ, Hofacker IL, Stadler PF, Gorodkin J
Detection of RNA structures in porcine EST data and related mammals.
BMC Genomics. 2007;8:316.
BACKGROUND: Non-coding RNAs (ncRNAs) are involved in a wide spectrum of regulatory functions. Within recent years, there have been increasing reports of observed polyadenylated ncRNAs and mRNA like ncRNAs in eukaryotes. To investigate this further, we examined the large data set in the Sino-Danish PigEST resource which also contains expression information distributed on 97 non-normalized cDNA libraries. RESULTS: We constructed a pipeline, EST2ncRNA, to search for known and novel ncRNAs. The pipeline utilises sequence similarity to ncRNA databases (blast), structure similarity to Rfam (RaveNnA) as well as multiple alignments to predict conserved novel putative RNA structures (RNAz). EST2ncRNA was fed with 48,000 contigs and 73,000 singletons available from the PigEST resource. Using the pipeline we identified known RNA structures in 137 contigs and single reads (conreads), and predicted high confidence RNA structures in non-protein coding regions of additional 1,262 conreads. Of these, structures in 270 conreads overlap with existing predictions in human. To sum up, the PigEST resource comprises trans-acting elements (ncRNAs) in 715 contigs and 340 singletons as well as cis-acting elements (inside UTRs) in 311 contigs and 51 singletons, of which 18 conreads contain both predictions of trans- and cis-acting elements. The predicted RNAz candidates were compared with the PigEST expression information and we identify 114 contigs with an RNAz prediction and expression in at least ten of the non-normalised cDNA libraries. We conclude that the contigs with RNAz and known predictions are in general expressed at a much lower level than protein coding transcripts. In addition, we also observe that our ncRNA candidates constitute about one to two percent of the genes expressed in the cDNA libraries. Intriguingly, the cDNA libraries from developmental (brain) tissues contain the highest amount of ncRNA candidates, about two percent. 
These observations are related to existing knowledge and hypotheses about the role of ncRNAs in higher organisms. Furthermore, about 80% of porcine coding transcripts (of 18,600 identified) as well as less than one-third of ORF-free transcripts are conserved at least in the closely related bovine genome. Approximately one percent of the coding and 10% of the remaining matches are unique between the PigEST data and cow genome. Based on the pig-cow alignments, we searched for similarities to 16 other organisms using alignments available from UCSC, which resulted, for instance, in an 87% coverage by the human genome. CONCLUSION: Besides recovering several of the already annotated functional RNA structures, we predicted a large number of high confidence conserved secondary structures in polyadenylated porcine transcripts. Our observations of relatively low expression levels of predicted ncRNA candidates, together with the observations of a higher relative amount in cDNA libraries from developmental stages, are in agreement with the current paradigm of ncRNA roles in higher organisms and support the idea of polyadenylated ncRNAs. [Abstract/Link to Full Text]

Zeng CJ, Pan HJ, Gong SB, Yu JQ, Wan QH, Fang SG
Giant panda BAC library construction and assembly of a 650-kb contig spanning major histocompatibility complex class II region.
BMC Genomics. 2007;8:315.
BACKGROUND: The giant panda is a rare and endangered species endemic to China. The low rates of reproductive success and infectious disease resistance have severely hampered the development of captive and wild populations of the giant panda. The major histocompatibility complex (MHC) plays important roles in the immune response and in reproduction, such as mate choice and mother-fetus bio-compatibility. It is thus essential to understand genetic details of the giant panda MHC. Construction of a bacterial artificial chromosome (BAC) library will provide a new tool for panda genome physical mapping and thus facilitate understanding of panda MHC genes. RESULTS: A giant panda BAC library consisting of 205,800 clones has been constructed. The average insert size was calculated to be 97 kb based on the examination of 174 randomly selected clones, indicating that the giant panda library contained 6.8-fold genome equivalents. Screening of the library with 16 giant panda PCR primer pairs revealed 6.4 positive clones per locus, in good agreement with an expected 6.8-fold genomic coverage of the library. Based on this BAC library, we constructed a contig map of the giant panda MHC class II region from BTNL2 to DAXX spanning about 650 kb by a three-step method: (1) PCR-based screening of the BAC library with primers from homologous MHC class II gene loci, end sequences and BAC clone shotgun sequences, (2) DNA sequencing validation of positive clones, and (3) restriction digest fingerprinting verification of inter-clone overlapping. CONCLUSION: The identification of genes and genomic regions of interest is greatly facilitated by the availability of this giant panda BAC library. The giant panda BAC library thus provides a useful platform for physical mapping, genome sequencing or complex analysis of targeted genomic regions. 
The 650 kb sequence-ready BAC contig map of the giant panda MHC class II region from BTNL2 to DAXX, verified by the three-step method, offers a powerful tool for further studies on the giant panda MHC class II genes. [Abstract/Link to Full Text]
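
The coverage figures quoted in the abstract above can be cross-checked with a little arithmetic. The sketch below is only an illustration: the variable names are made up here, and the genome size it prints is back-calculated from the reported numbers (it is not stated in the abstract). It also assumes that, for a single-copy locus, the expected number of positive clones roughly equals the fold coverage.

```python
# Figures reported in the abstract.
n_clones = 205_800          # clones in the BAC library
mean_insert_kb = 97         # average insert size, from 174 sampled clones
reported_coverage = 6.8     # genome equivalents stated by the authors
observed_hits_per_locus = 6.4  # mean positive clones over 16 primer pairs

# Total cloned DNA held in the library, in kb.
total_insert_kb = n_clones * mean_insert_kb

# Genome size implied by the reported coverage (an inference, not a reported value).
implied_genome_gb = total_insert_kb / reported_coverage / 1e6

# Assumption: expected hits per single-copy locus is about the fold coverage.
hit_ratio = observed_hits_per_locus / reported_coverage

print(f"Total insert DNA: {total_insert_kb / 1e6:.2f} Gb")
print(f"Implied genome size: {implied_genome_gb:.2f} Gb")
print(f"Observed / expected hits per locus: {hit_ratio:.2f}")
```

The same three lines of arithmetic apply to any clone library: fold coverage is clone count times mean insert size divided by genome size.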

Suetsugu Y, Minami H, Shimomura M, Sasanuma S, Narukawa J, Mita K, Yamamoto K
End-sequencing and characterization of silkworm (Bombyx mori) bacterial artificial chromosome libraries.
BMC Genomics. 2007;8:314.
BACKGROUND: We performed large-scale bacterial artificial chromosome (BAC) end-sequencing of two BAC libraries (an EcoRI- and a BamHI-digested library) and conducted an in silico analysis to characterize the obtained sequence data, to make them a useful resource for genomic research on the silkworm (Bombyx mori). RESULTS: More than 94000 BAC end sequences (BESs), comprising more than 55 Mbp and covering about 10.4% of the silkworm genome, were sequenced. Repeat-sequence analysis with known repeat sequences indicated that the long interspersed nuclear elements (LINEs) were abundant in BamHI BESs, whereas DNA-type elements were abundant in EcoRI BESs. Repeat-sequence analysis revealed that the abundance of LINEs might be due to a GC bias of the restriction sites and that the GC content of silkworm LINEs was higher than that of mammalian LINEs. In a BLAST-based sequence analysis of the BESs against two available whole-genome shotgun sequence data sets, more than 70% of the BESs had a BLAST hit with an identity of > or = 99%. About 14% of EcoRI BESs and about 8% of BamHI BESs were paired-end clones with unique sequences at both ends. Cluster analysis of the BESs clarified the proportion of BESs containing protein-coding regions. CONCLUSION: As a result of this characterization, the identified BESs will be a valuable resource for genomic research on Bombyx mori, for example, as a base for construction of a BAC-based physical map. The use of multiple complementary BAC libraries constructed with different restriction enzymes also makes the BESs a more valuable genomic resource. The GenBank accession numbers of the obtained end sequences are DE283657-DE378560. [Abstract/Link to Full Text]

Rios JJ, Perelygin AA, Long MT, Lear TL, Zharkikh AA, Brinton MA, Adelson DL
Characterization of the equine 2'-5' oligoadenylate synthetase 1 (OAS1) and ribonuclease L (RNASEL) innate immunity genes.
BMC Genomics. 2007;8:313.
BACKGROUND: The mammalian OAS/RNASEL pathway plays an important role in antiviral host defense. A premature stop-codon within the murine Oas1b gene results in the increased susceptibility of mice to a number of flaviviruses, including West Nile virus (WNV). Mutations in either the OAS1 or RNASEL genes may also modulate the outcome of WNV-induced disease or other viral infections in horses. Polymorphisms in the human OAS gene cluster have been previously utilized for case-control analysis of virus-induced disease in humans. No polymorphisms have yet been identified in either the equine OAS1 or RNASEL genes for use in similar case-control studies. RESULTS: Genomic sequence for equine OAS1 was obtained from a contig assembly generated from a shotgun subclone library of CHORI-241 BAC 100I10. Specific amplification of regions of the OAS1 gene from 13 horses of various breeds identified 33 single nucleotide polymorphisms (SNP) and two microsatellites. RNASEL cDNA sequences were determined for 8 mammals and utilized in a phylogenetic analysis. The chromosomal location of the RNASEL gene was assigned by FISH to ECA5p17-p16 using two selected CHORI-241 BAC clones. The horse genomic RNASEL sequence was assembled. Specific amplification of regions of the RNASEL gene from 13 horses identified 31 SNPs. CONCLUSION: In this report, two dinucleotide microsatellites and 64 single nucleotide polymorphisms within the equine OAS1 and RNASEL genes were identified. These polymorphisms are the first to be reported for these genes and will facilitate future case-control studies of horse susceptibility to infectious diseases. [Abstract/Link to Full Text]

Siegel N, Hoegg S, Salzburger W, Braasch I, Meyer A
Comparative genomics of ParaHox clusters of teleost fishes: gene cluster breakup and the retention of gene sets following whole genome duplications.
BMC Genomics. 2007;8:312.
BACKGROUND: The evolutionary lineage leading to the teleost fish underwent a whole genome duplication termed FSGD or 3R in addition to two prior genome duplications that took place earlier during vertebrate evolution (termed 1R and 2R). Resulting from the FSGD, additional copies of genes are present in fish, compared to tetrapods whose lineage did not experience the 3R genome duplication. Interestingly, we find that, despite their additional genome duplication, ParaHox genes in extant teleost fishes do not differ in number from those in mammals, but they are distributed over twice as many paralogous regions in fish genomes. RESULTS: We determined the DNA sequence of the entire ParaHox C1 paralogon in the East African cichlid fish Astatotilapia burtoni, and compared it to orthologous regions in other vertebrate genomes as well as to the paralogous vertebrate ParaHox D paralogons. Evolutionary relationships among genes from these four chromosomal regions were studied with several phylogenetic algorithms. We provide evidence that the genes of the ParaHox C paralogous cluster are duplicated in teleosts, just as had been shown previously for the D paralogon genes. Overall, however, synteny and cluster integrity seem to be less conserved in ParaHox gene clusters than in Hox gene clusters. Comparative analyses of non-coding sequences uncovered conserved, possibly co-regulatory elements, which are likely to contain promoter motifs of the genes belonging to the ParaHox paralogons. CONCLUSION: There seems to be strong stabilizing selection for gene order as well as gene orientation in the ParaHox C paralogon, since with a few exceptions, only the lengths of the introns and intergenic regions differ between the distantly related species examined. 
The high degree of evolutionary conservation of this gene cluster's architecture in particular - but possibly clusters of genes more generally - might be linked to the presence of promoter, enhancer or inhibitor motifs that serve to regulate more than just one gene. Therefore, deletions, inversions or relocations of individual genes could destroy the regulation of the clustered genes in this region. The existence of such a regulation network might explain the evolutionary conservation of gene order and orientation over the course of hundreds of millions of years of vertebrate evolution. Another possible explanation for the highly conserved gene order might be the existence of a regulator not located immediately next to its corresponding gene but further away since a relocation or inversion would possibly interrupt this interaction. Different ParaHox clusters were found to have experienced differential gene loss in teleosts. Yet the complete set of these homeobox genes was maintained, albeit distributed over almost twice the number of chromosomes. Selection due to dosage effects and/or stoichiometric disturbance might act more strongly to maintain a modal number of homeobox genes (and possibly transcription factors more generally) per genome, yet permit the accumulation of other (non regulatory) genes associated with these homeobox gene clusters. [Abstract/Link to Full Text]

Ranganathan S, Nagaraj SH, Hu M, Strube C, Schnieder T, Gasser RB
A transcriptomic analysis of the adult stage of the bovine lungworm, Dictyocaulus viviparus.
BMC Genomics. 2007;8:311.
BACKGROUND: Lungworms of the genus Dictyocaulus (family Dictyocaulidae) are parasitic nematodes of major economic importance. They cause pathological effects and clinical disease in various ruminant hosts, particularly in young animals. Dictyocaulus viviparus, called the bovine lungworm, is a major pathogen of cattle, with severe infections being fatal. In this study, we provide first insights into the transcriptome of the adult stage of D. viviparus through the analysis of expressed sequence tags (ESTs). RESULTS: Using our EST analysis pipeline, we estimate that the present dataset of 4436 ESTs is derived from 2258 genes based on cluster and comparative genomic analyses of the ESTs. Of the 2258 representative ESTs, 1159 (51.3%) had homologues in the free-living nematode C. elegans, 1174 (51.9%) in parasitic nematodes, 827 (36.6%) in organisms other than nematodes, and 863 (38%) had no significant match to any sequence in the current databases. Of the C. elegans homologues, 569 had observed 'non-wildtype' RNAi phenotypes, including embryonic lethality, maternal sterility, sterility in progeny, larval arrest and slow growth. We could functionally classify 776 (35%) sequences using the Gene Ontologies (GO) and established pathway associations to 696 (31%) sequences in Kyoto Encyclopedia of Genes and Genomes (KEGG). In addition, we predicted 85 secreted proteins which could represent potential candidates for developing novel anthelmintics or vaccines. CONCLUSION: The bioinformatic analysis of EST data for D. viviparus has elucidated sets of relatively conserved and potentially novel genes. The genes discovered in this study should assist research toward a better understanding of the basic molecular biology of D. viviparus, which could lead, in the longer term, to novel intervention strategies. The characterization of the D. viviparus transcriptome also provides a foundation for whole genome sequence analysis and future comparative transcriptomic analyses. [Abstract/Link to Full Text]

Prasad A, Schiex T, McKay S, Murdoch B, Wang Z, Womack JE, Stothard P, Moore SS
High resolution radiation hybrid maps of bovine chromosomes 19 and 29: comparison with the bovine genome sequence assembly.
BMC Genomics. 2007;8:310.
BACKGROUND: High resolution radiation hybrid (RH) maps can facilitate genome sequence assembly by correctly ordering genes and genetic markers along chromosomes. The objective of the present study was to generate high resolution RH maps of bovine chromosomes 19 (BTA19) and 29 (BTA29), and compare them with the current 7.1X bovine genome sequence assembly (bovine build 3.1). We have chosen BTA19 and 29 as candidate chromosomes for mapping, since many Quantitative Trait Loci (QTL) for the traits of carcass merit and residual feed intake have been identified on these chromosomes. RESULTS: We have constructed high resolution maps of BTA19 and BTA29 consisting of 555 and 253 Single Nucleotide Polymorphism (SNP) markers respectively using a 12,000 rad whole genome RH panel. With these markers, the RH map of BTA19 and BTA29 extended to 4591.4 cR and 2884.1 cR in length respectively. When aligned with the current bovine build 3.1, the order of markers on the RH map for BTA19 and 29 showed inconsistencies with respect to the genome assembly. Maps of both the chromosomes show that there is a significant internal rearrangement of the markers involving displacement, inversion and flips within the scaffolds with some scaffolds being misplaced in the genome assembly. We also constructed cattle-human comparative maps of these chromosomes which showed an overall agreement with the comparative maps published previously. However, minor discrepancies in the orientation of few homologous synteny blocks were observed. CONCLUSION: The high resolution maps of BTA19 (average 1 locus/139 kb) and BTA29 (average 1 locus/208 kb) presented in this study suggest that by the incorporation of RH mapping information, the current bovine genome sequence assembly can be significantly improved. Furthermore, these maps can serve as a potential resource for fine mapping QTL and identification of causative mutations underlying QTL for economically important traits. [Abstract/Link to Full Text]
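
As a rough consistency check on the map statistics quoted above, the marker counts, map lengths and average spacings can be combined as follows. This is a hedged sketch, not the authors' analysis: the dictionary layout and names are illustrative, and it assumes each marker corresponds to one locus, which may overstate the physical span if some markers co-localize.

```python
# Values reported in the abstract, per chromosome.
rh_maps = {
    "BTA19": {"markers": 555, "length_cR": 4591.4, "kb_per_locus": 139},
    "BTA29": {"markers": 253, "length_cR": 2884.1, "kb_per_locus": 208},
}

summary = {}
for chrom, m in rh_maps.items():
    # Assumption: one locus per marker, so span is markers times average spacing.
    span_mb = m["markers"] * m["kb_per_locus"] / 1000
    # Average map distance between consecutive markers, in centiRays.
    cr_per_marker = m["length_cR"] / m["markers"]
    summary[chrom] = (span_mb, cr_per_marker)
    print(f"{chrom}: ~{span_mb:.1f} Mb spanned, {cr_per_marker:.2f} cR per marker")
```

Under that one-locus-per-marker assumption, the quoted spacings imply spans on the order of tens of megabases for each chromosome.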

Miranda-Saavedra D, Stark MJ, Packer JC, Vivares CP, Doerig C, Barton GJ
The complement of protein kinases of the microsporidium Encephalitozoon cuniculi in relation to those of Saccharomyces cerevisiae and Schizosaccharomyces pombe.
BMC Genomics. 2007;8:309.
BACKGROUND: Microsporidia, parasitic fungi-related eukaryotes infecting many cell types in a wide range of animals (including humans), represent a serious health threat in immunocompromised patients. The 2.9 Mb genome of the microsporidium Encephalitozoon cuniculi is the smallest known of any eukaryote. Eukaryotic protein kinases are a large superfamily of enzymes with crucial roles in most cellular processes, and therefore represent potential drug targets. We report here an exhaustive analysis of the E. cuniculi genomic database aimed at identifying and classifying all protein kinases of this organism with reference to the kinomes of two highly-divergent yeast species, Saccharomyces cerevisiae and Schizosaccharomyces pombe. RESULTS: A database search with a multi-level protein kinase family hidden Markov model library led to the identification of 29 conventional protein kinase sequences in the E. cuniculi genome, as well as 3 genes encoding atypical protein kinases. The microsporidian kinome presents striking differences from those of other eukaryotes, and this minimal kinome underscores the importance of conserved protein kinases involved in essential cellular processes. Approximately 30% of its kinases are predicted to regulate cell cycle progression while another approximately 28% have no identifiable homologues in model eukaryotes and are likely to reflect parasitic adaptations. E. cuniculi lacks MAP kinase cascades and almost all protein kinases that are involved in stress responses, ion homeostasis and nutrient signalling in the model fungi S. cerevisiae and S. pombe, including AMP-activated protein kinase (Snf1), previously thought to be ubiquitous in eukaryotes. A detailed database search and phylogenetic analysis of the kinomes of the two model fungi showed that the degree of homology between their kinomes of approximately 85% is much higher than that previously reported. CONCLUSION: The E. cuniculi kinome is by far the smallest eukaryotic kinome characterised to date. The difficulty in assigning clear homology relationships for nine out of the twenty-nine microsporidian conventional protein kinases despite its compact genome reflects the phylogenetic distance between microsporidia and other eukaryotes. Indeed, the E. cuniculi genome presents a high proportion of genes in which evolution has been accelerated by up to four-fold. There are no orthologues of the protein kinases that constitute MAP kinase pathways and many other protein kinases with roles in nutrient signalling are absent from the E. cuniculi kinome. However, orthologous kinases can nonetheless be identified that correspond to members of the yeast kinomes with roles in some of the most fundamental cellular processes. For example, E. cuniculi has clear orthologues of virtually all the major conserved protein kinases that regulate the core cell cycle machinery (Aurora, Polo, DDK, CDK and Chk1). A comprehensive comparison of the homology relationships between the budding and fission yeast kinomes indicates that, despite an estimated 800 million years of independent evolution, the two model fungi share approximately 85% of their protein kinases. This will facilitate the annotation of many of the as yet uncharacterised fission yeast kinases, and also those of novel fungal genomes. [Abstract/Link to Full Text]

Menard A, Estrada de Los Santos P, Graindorge A, Cournoyer B
Architecture of Burkholderia cepacia complex sigma 70 gene family: evidence of alternative primary and clade-specific factors, and genomic instability.
BMC Genomics. 2007 Sep 4;8(1):308.
ABSTRACT: BACKGROUND: The Burkholderia cepacia complex (Bcc) groups bacterial species with beneficial properties that can improve crop yields or remediate polluted sites but can also lead to dramatic human clinical outcomes among cystic fibrosis (CF) or immuno-compromised individuals. Genome-wide regulatory processes of gene expression could explain parts of this bacterial duality. Transcriptional sigma 70 factors are components of these processes. They allow the reversible binding of the DNA-dependent RNA polymerase to form the holoenzyme that will lead to mRNA synthesis from a DNA promoter region. Bcc genome-wide analyses were performed to investigate the major evolutionary trends taking place in the sigma 70 family of these bacteria. RESULTS: Twenty sigma 70 paralogous genes were detected in the Burkholderia cenocepacia strain J2315 (Bcen-J2315) genome, of which 14 were of the ECF (extracytoplasmic function) group. Non-ECF paralogs were related to primary (rpoD), alternative primary, stationary phase (rpoS), flagellin biosynthesis (fliA), and heat shock (rpoH) factors. The number of sigma 70 genetic determinants in this genome was 2.86 per Mb. This number is lower than that of Pseudomonas aeruginosa, a species found in similar habitats including CF lungs. These two bacterial groups showed strikingly different sigma 70 family architectures, with only three ECF paralogs in common (fecI-like, pvdS and algU). Bcen-J2315 sigma 70 paralogs showed clade-specific distributions. Some paralogs appeared limited to the ET12 epidemic clone (ecfA2), particular Bcc species (sigI), the Burkholderia genus (ecfJ, ecfF, and sigJ), certain proteobacterial groups (ecfA1, ecfC, ecfD, ecfE, ecfG, ecfL, ecfM and rpoS), or were broadly distributed in the eubacteria (ecfI, ecfK, ecfH, ecfB, and rpoD-, rpoH-, fliA-like genes). 
Genomic instability of this gene family was driven by chromosomal inversion (ecfA2), recent duplication events (ecfA and RpoD), localized (ecfG) and large scale deletions (sigI, sigJ, ecfC, ecfH, and ecfK), and a phage integration event (ecfE). CONCLUSIONS: The Bcc sigma 70 gene family was found to be under strong selective pressures that could lead to acquisition/deletion, and duplication events modifying its architecture. Comparative analysis of Bcc and Pseudomonas aeruginosa sigma 70 gene families revealed distinct evolutionary strategies, with the Bcc having selected several alternative primary factors, something not recorded among P. aeruginosa and only previously reported to occur among the actinobacteria. [Abstract/Link to Full Text]

Hübscher J, Jansen A, Kotte O, Schäfer J, Majcherczyk PA, Harris LG, Bierbaum G, Heinemann M, Berger-Bächi B
Living with an imperfect cell wall: compensation of femAB inactivation in Staphylococcus aureus.
BMC Genomics. 2007;8307.
BACKGROUND: Synthesis of the Staphylococcus aureus peptidoglycan pentaglycine interpeptide bridge is catalyzed by the nonribosomal peptidyl transferases FemX, FemA and FemB. Inactivation of the femAB operon reduces the interpeptide to a monoglycine, leading to a poorly crosslinked peptidoglycan. femAB mutants show a reduced growth rate and are hypersusceptible to virtually all antibiotics, including methicillin, making FemAB a potential target to restore beta-lactam susceptibility in methicillin-resistant S. aureus (MRSA). Cis-complementation with wild type femAB only restores synthesis of the pentaglycine interpeptide and methicillin resistance, but the growth rate remains low. This study characterizes the adaptations that ensured survival of the cells after femAB inactivation. RESULTS: In addition to slow growth, the cis-complemented femAB mutant showed temperature sensitivity and a higher methicillin resistance than the wild type. Transcriptional profiling paired with reporter metabolite analysis revealed multiple changes in the global transcriptome. A number of transporters for sugars, glycerol, and glycine betaine, some of which could serve as osmoprotectants, were upregulated. Striking differences were found in the transcription of several genes involved in nitrogen metabolism and the arginine-deiminase pathway, an alternative for ATP production. In addition, microarray data indicated enhanced expression of virulence factors that correlated with premature expression of the global regulators sae, sarA, and agr. CONCLUSION: Survival under conditions preventing normal cell wall formation triggered complex adaptations that incurred a fitness cost, showing the remarkable flexibility of S. aureus to circumvent cell wall damage. Potential FemAB inhibitors would have to be used in combination with other antibiotics to prevent selection of resistant survivors. [Abstract/Link to Full Text]

Maranda B, Lemieux N, Lemyre E
Familial deletion 18p syndrome: case report.
BMC Med Genet. 2006;760.
BACKGROUND: Deletion 18p is a frequent deletion syndrome characterized by dysmorphic features, growth deficiencies, and mental retardation with a poorer verbal performance. Until now, five families have been described with limited clinical description. We report transmission of deletion 18p from a mother to her two daughters and review the previous cases. CASE PRESENTATION: The proband is 12 years old and has short stature, dysmorphic features and moderate mental retardation. Her sister is 9 years old and also has short stature and similar dysmorphic features. Her cognitive performance is within the borderline to mild mental retardation range. The mother also presents short stature. Psychological evaluation showed moderate mental retardation. Chromosome analysis from the sisters and their mother revealed the same chromosomal deletion: 46, XX, del(18)(p11.2). Previous familial cases were consistent regarding the transmission of mental retardation. Our family differs in this regard with variable cognitive impairment and does not display poorer verbal than non-verbal abilities. An exclusive maternal transmission is observed throughout those families. Women with del(18p) are fertile and seem to have a normal miscarriage rate. CONCLUSION: Genetic counseling for these patients should take into account a greater range of cognitive outcome than previously reported. [Abstract/Link to Full Text]

Maciolek NL, Alward WL, Murray JC, Semina EV, McNally MT
Analysis of RNA splicing defects in PITX2 mutants supports a gene dosage model of Axenfeld-Rieger syndrome.
BMC Med Genet. 2006;759.
BACKGROUND: Axenfeld-Rieger syndrome (ARS) is associated with mutations in the PITX2 gene that encodes a homeobox transcription factor. Several intronic PITX2 mutations have been reported in Axenfeld-Rieger patients but their effects on gene expression have not been tested. METHODS: We present two new families with recurrent PITX2 intronic mutations and use PITX2c minigenes and transfected cells to address the hypothesis that intronic mutations affect RNA splicing. Three PITX2 mutations have been analyzed: a G>T mutation within the AG 3' splice site (ss) junction associated with exon 4 (IVS4-1G>T), a G>C mutation at position +5 of the 5'ss of exon 4 (IVS4+5G>C), and a previously reported A>G substitution at position -11 of the 3'ss of exon 5 (IVS5-11A>G). RESULTS: Mutation IVS4+5G>C showed 71% retention of the intron between exons 4 and 5, and poorly expressed protein. Wild-type protein levels were proportionally expressed from correctly spliced mRNA. The G>T mutation within the exon 4 AG 3'ss junction shifted splicing exclusively to a new AG and resulted in a severely truncated, poorly expressed protein. Finally, the A>G substitution at position -11 of the 3'ss of exon 5 shifted splicing exclusively to a newly created upstream AG and resulted in generation of a protein with a truncated homeodomain. CONCLUSION: This is the first direct evidence to support aberrant RNA splicing as the mechanism underlying the disorder in some patients and suggests that the magnitude of the splicing defect may contribute to the variability of ARS phenotypes, in support of a gene dosage model of Axenfeld-Rieger syndrome. [Abstract/Link to Full Text]

Borlak J, Reamon-Buettner SM
N-acetyltransferase 2 (NAT2) gene polymorphisms in colon and lung cancer patients.
BMC Med Genet. 2006;758.
BACKGROUND: N-acetyltransferase 2 (NAT2) metabolizes arylamine and hydrazine moieties found in many therapeutic drugs, chemicals and carcinogens. The gene encoding NAT2 is polymorphic, thus resulting in rapid or slow acetylator phenotypes. The acetylator status may, therefore, predispose to drug-induced toxicities and cancer risks, such as bladder, colon and lung cancer. Indeed, some studies demonstrate a positive association between NAT2 rapid acetylator phenotype and colon cancer, but results are inconsistent. The role of NAT2 acetylation status in lung cancer is likewise unclear, as both the rapid and slow acetylator genotypes have been associated with disease. METHODS: We investigated three genetic variations, c.481C>T, c.590G>A (p.R197Q) and c.857G>A (p.G286E), of the NAT2 gene, which are known to result in a slow acetylator phenotype. Using validated PCR-RFLP assays, we genotyped 243 healthy unrelated Caucasian control subjects, 92 colon and 67 lung cancer patients for these genetic variations. As there is a recent meta-analysis of NAT2 studies on colon cancer (unlike in lung cancer), we have also undertaken a systematic review of NAT2 studies on lung cancer, and we incorporated our results in a meta-analysis consisting of 16 studies, 3,865 lung cancer patients and 6,077 control subjects. RESULTS: We did not obtain statistically significant differences in NAT2 allele and genotype frequencies between colon cancer patients and the control group. Certain genotypes, however, such as [c.590AA+c.857GA] and [c.590GA+c.857GA] were absent among the colon cancer patients. Similarly, allele frequencies in lung cancer patients and controls did not differ significantly. Nevertheless, there was a significant increase of genotypes [c.590GA] and [c.481CT+c.590GA], but absence of homozygous c.590AA and [c.590AA+c.857GA] in the lung cancer group. Meta-analysis of 16 NAT2 studies on lung cancer did not show an overall association of rapid or slow acetylator status with lung cancer. Similarly, the summary odds ratios obtained with stratified meta-analysis based on ethnicity and smoking status were not significant. CONCLUSION: Our study failed to show an overall association of NAT2 genotypes with either colon or lung cancer risk. [Abstract/Link to Full Text]

Wang CY, Nguyen ND, Morrison NA, Eisman JA, Center JR, Nguyen TV
Beta3-adrenergic receptor gene, body mass index, bone mineral density and fracture risk in elderly men and women: the Dubbo Osteoporosis Epidemiology Study (DOES).
BMC Med Genet. 2006;757.
BACKGROUND: Recent studies have suggested that the Arg allele of the beta3-adrenergic receptor (ADRB3) gene is associated with body mass index (BMI), which is an important predictor of bone mineral density (BMD) and fracture risk. However, whether the ADRB3 gene polymorphism is associated with fracture risk has not been investigated. The aim of the study was to examine the inter-relationships between ADRB3 gene polymorphisms, BMI, BMD and fracture risk in elderly Caucasians. METHODS: Genotypes of the ADRB3 gene were determined in 265 men and 446 women aged 60+ in 1989 at entry into the study, whose BMD was measured by DXA (GE Lunar, WI USA) at baseline. During the follow-up period (between 1989 and 2004), fractures were ascertained by reviewing radiography reports and personal interviews. RESULTS: The allelic frequencies of the Trp and the Arg alleles were 0.925 and 0.075 respectively, and the relative frequencies of genotypes Trp/Trp, Trp/Arg and Arg/Arg were 0.857, 0.138 and 0.006 respectively. There was no significant association between BMI and ADRB3 genotypes (p = 0.10 in women and p = 0.68 in men). There was also no significant association between ADRB3 genotypes and lumbar spine or femoral neck BMD in either men or women. Furthermore, there was no significant association between ADRB3 genotypes and fracture risk in either women or men, either before or after adjusting for BMD and BMI. CONCLUSION: The present data suggest that in the Caucasian population the contribution of ADRB3 genotypes to the prediction of BMI, BMD and fracture risk is limited. [Abstract/Link to Full Text]
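
The allele frequencies quoted in this abstract follow from the genotype frequencies by simple counting, and the genotype frequencies can in turn be compared with Hardy-Weinberg expectations. The following is a minimal sketch in Python that uses only the figures given in the abstract; the function names are illustrative and are not taken from the study, and the small discrepancies come from rounding in the published values.

```python
def allele_frequencies(f_hom_major, f_het, f_hom_minor):
    """Return (major, minor) allele frequencies from genotype frequencies."""
    # Normalise, because published genotype frequencies are rounded
    # and may not sum to exactly 1 (here 0.857 + 0.138 + 0.006 = 1.001).
    total = f_hom_major + f_het + f_hom_minor
    # Each heterozygote carries one copy of the minor allele,
    # each minor homozygote carries two.
    minor = (f_hom_minor + f_het / 2) / total
    return 1 - minor, minor


def hardy_weinberg_expected(p, q):
    """Expected genotype frequencies (p^2, 2pq, q^2) under Hardy-Weinberg."""
    return p * p, 2 * p * q, q * q


# Genotype frequencies reported for Trp/Trp, Trp/Arg and Arg/Arg.
trp, arg = allele_frequencies(0.857, 0.138, 0.006)
expected = hardy_weinberg_expected(trp, arg)

print(f"Trp = {trp:.3f}, Arg = {arg:.3f}")
print("Expected Trp/Trp, Trp/Arg, Arg/Arg:",
      ", ".join(f"{x:.3f}" for x in expected))
```

Under these assumptions the computed allele frequencies agree with the reported 0.925 and 0.075, and the Hardy-Weinberg expectations lie close to the observed genotype frequencies.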

Ortiz J, Fernández-Arquero M, Urcelay E, López-Mejías R, Ferreira A, Fontán G, de la Concha EG, Martínez A
Interleukin-10 polymorphisms in Spanish IgA deficiency patients: a case-control and family study.
BMC Med Genet. 2006;756.
BACKGROUND: IgA deficiency (IgAD) is the most common primary immunodeficiency in Caucasians. Genetic and environmental factors are suspected to be involved in the development of the disease. Interleukin-10 (IL-10) is a cytokine with stimulatory activity on immunoglobulin production and it may be an important regulator in IgAD pathogenesis. The IL-10 gene contains several single nucleotide polymorphisms (SNPs) and two polymorphic microsatellites located in the 5'-flanking region. Our aim was to ascertain if any of these polymorphic markers are associated with or linked to IgAD in Spanish patients. METHODS: We genotyped 278 patients with IgAD and 573 ethnically matched controls for the microsatellites IL-10R and IL-10G and for three single nucleotide polymorphisms at positions -1082, -819 and -592 in the proximal promoter of the gene. We also included in this study the parents of 194 patients in order to study the IL-10 haplotypes transmitted and not transmitted to the affected offspring. RESULTS: The only allele where a significant difference was observed in the comparison between IgA deficiency patients and controls was the IL-10G12 allele (OR = 1.58 and p = 0.021). However, this p value could not withstand a Bonferroni correction. None of the IL-10R or promoter SNP alleles was found at a different frequency when patients were compared with controls. CONCLUSION: Our data do not show any significant difference in IL-10 polymorphism frequencies between control and IgAD patient samples. Their haplotype distribution among patients and controls was also equivalent and therefore these microsatellites and SNPs do not seem to influence IgAD susceptibility. [Abstract/Link to Full Text]
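
The remark that p = 0.021 does not withstand a Bonferroni correction can be made concrete. The abstract does not state how many alleles were compared, so the sketch below does not assume a number; it only asks how many tests the reported p value could tolerate at the conventional alpha of 0.05 (itself an assumption). Function names are illustrative.

```python
def bonferroni_significant(p_value, n_tests, alpha=0.05):
    """True if p_value passes the Bonferroni-adjusted threshold alpha / n_tests."""
    return p_value <= alpha / n_tests


def max_tests_surviving(p_value, alpha=0.05):
    """Largest number of tests for which p_value is still significant."""
    n = 0
    while bonferroni_significant(p_value, n + 1, alpha):
        n += 1
    return n


p_il10g12 = 0.021  # p value reported for the IL-10G12 allele
print(max_tests_surviving(p_il10g12))
```

With these inputs the p value stays below the adjusted threshold for at most two tests (0.05 / 2 = 0.025), so any comparison across three or more alleles would render it non-significant.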

Nissen PH, Damgaard D, Stenderup A, Nielsen GG, Larsen ML, Faergeman O
Genomic characterization of five deletions in the LDL receptor gene in Danish Familial Hypercholesterolemic subjects.
BMC Med Genet. 2006;755.
BACKGROUND: Familial Hypercholesterolemia is a common autosomal dominantly inherited disease that is most frequently caused by mutations in the gene encoding the receptor for low density lipoproteins (LDLR). Deletions and other major structural rearrangements of the LDLR gene account for approximately 5% of the mutations in many populations. METHODS: Five genomic deletions in the LDLR gene were characterized by amplification of mutated alleles and sequencing to identify genomic breakpoints. A diagnostic assay based on duplex PCR for the exon 7-8 deletion was developed to discriminate between heterozygotes and normals, and bioinformatic analyses were used to identify interspersed repeats flanking the deletions. RESULTS: In one case 15 bp had been inserted at the site of the deleted DNA, and, in all five cases, Alu elements flanked the sites where deletions had occurred. An assay developed to discriminate the wildtype and the deletion allele in a simple duplex PCR detected three FH patients as heterozygotes, and two individuals with normal lipid values were detected as normal homozygotes. CONCLUSION: The identification of the breakpoints should make it possible to develop specific tests for these mutations, and the data provide further evidence for the role of Alu repeats in intragenic deletions. [Abstract/Link to Full Text]

Santiago JL, Martínez A, de la Calle H, Fernández-Arquero M, Figueredo MA, de la Concha EG, Urcelay E
Evidence for the association of the SLC22A4 and SLC22A5 genes with type 1 diabetes: a case control study.
BMC Med Genet. 2006;754.
BACKGROUND: Type 1 diabetes (T1D) is a chronic, autoimmune and multifactorial disease characterized by abnormal metabolism of carbohydrate and fat. Diminished carnitine plasma levels have been previously reported in T1D patients and carnitine increases the sensitivity of the cells to insulin. Polymorphisms in the carnitine transporters, encoded by the SLC22A4 and SLC22A5 genes, have been implicated in susceptibility to two other autoimmune diseases, rheumatoid arthritis and Crohn's disease. For these reasons, we investigated for the first time the association with T1D of six single nucleotide polymorphisms (SNPs) mapping to these candidate genes: slc2F2, slc2F11, T306I, L503F, OCTN2-promoter and OCTN2-intron. METHODS: A case-control study was performed in the Spanish population with 295 T1D patients and 508 healthy control subjects. Maximum-likelihood haplotype frequencies were estimated by applying the Expectation-Maximization (EM) algorithm implemented by the Arlequin software. RESULTS: When independently analyzed, one of the tested polymorphisms in the SLC22A4 gene at 1672 showed significant association with T1D in our Spanish cohort. The overall distribution of the inferred haplotypes differed significantly between patients and controls (chi2 = 10.43; p = 0.034) with one of the haplotypes showing a protective effect for T1D (rs3792876/rs1050152/rs2631367/rs274559, CCGA: OR = 0.62 (0.41-0.93); p = 0.02). CONCLUSION: The haplotype distribution in the carnitine transporter locus seems to be significantly different between T1D patients and controls; however, additional studies in independent populations are needed to confirm the role of these genes in T1D risk. [Abstract/Link to Full Text]
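
To illustrate the idea behind EM-based haplotype frequency estimation mentioned in the methods, here is a deliberately reduced sketch for two biallelic SNPs. It is not Arlequin's implementation and does not use the study's data: the genotype counts are hypothetical, and the function name and the 0/1 allele coding are assumptions made for the example. With two SNPs, only double heterozygotes have ambiguous phase, so the E-step splits them between the two possible haplotype pairs in proportion to the current frequency estimates.

```python
def em_haplotype_frequencies(genotype_counts, n_iter=200):
    """Estimate two-SNP haplotype frequencies by expectation-maximization.

    genotype_counts maps (g1, g2) to a number of individuals, where g1 and g2
    are the counts (0, 1 or 2) of allele '1' at SNP 1 and SNP 2.
    """
    alleles = {0: (0, 0), 1: (0, 1), 2: (1, 1)}
    haplotypes = [(0, 0), (0, 1), (1, 0), (1, 1)]
    freq = {h: 0.25 for h in haplotypes}          # uniform starting point
    total = 2 * sum(genotype_counts.values())     # two haplotypes per person

    for _ in range(n_iter):
        counts = {h: 0.0 for h in haplotypes}
        for (g1, g2), n in genotype_counts.items():
            if (g1, g2) == (1, 1):
                # E-step: phase is ambiguous, 00/11 versus 01/10.
                cis = freq[(0, 0)] * freq[(1, 1)]
                trans = freq[(0, 1)] * freq[(1, 0)]
                w = cis / (cis + trans) if cis + trans > 0 else 0.5
                counts[(0, 0)] += n * w
                counts[(1, 1)] += n * w
                counts[(0, 1)] += n * (1 - w)
                counts[(1, 0)] += n * (1 - w)
            else:
                # Phase is unambiguous when at most one SNP is heterozygous.
                a1, a2 = alleles[g1], alleles[g2]
                counts[(a1[0], a2[0])] += n
                counts[(a1[1], a2[1])] += n
        # M-step: new frequencies are the expected haplotype proportions.
        freq = {h: c / total for h, c in counts.items()}
    return freq


# Hypothetical genotype counts, for illustration only.
example_counts = {(0, 0): 4, (2, 2): 4, (1, 1): 2}
estimated = em_haplotype_frequencies(example_counts)
for hap, f in sorted(estimated.items()):
    print(hap, round(f, 4))
```

In this toy data set the unambiguous homozygotes carry only the 00 and 11 haplotypes, so the iterations assign the double heterozygotes to the 00/11 phase and the estimates settle at 0.5 each. Real analyses span more loci and require a dedicated package.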

Engelfried K, Vorgerd M, Hagedorn M, Haas G, Gilles J, Epplen JT, Meins M
Charcot-Marie-Tooth neuropathy type 2A: novel mutations in the mitofusin 2 gene (MFN2).
BMC Med Genet. 2006;753.
BACKGROUND: Charcot-Marie-Tooth neuropathies are a group of genetically heterogeneous diseases of the peripheral nervous system. Mutations in the MFN2 gene have been reported as the primary cause of Charcot-Marie-Tooth disease type 2A. METHODS: Patients with the clinical diagnosis of Charcot-Marie-Tooth type 2 were screened using single strand conformation polymorphism (SSCP). All DNA samples showing band shifts in the SSCP analysis were amplified from genomic DNA and cycle sequenced. RESULTS: We analyzed a total of 73 unrelated patients with a clinical diagnosis of CMT 2. Overall, novel mutations were detected in 6 patients: c.380G>T (G127V), c.1128G>A (M376I), c.1040A>T (E347V), c.1403G>A (R468H), c.2113G>A (V705I), and c.2258_2259insT (L753fs). CONCLUSION: We confirmed a significant role of mutations in MFN2 in the pathogenesis of Charcot-Marie-Tooth disease type 2. [Abstract/Link to Full Text]

Abu-Amero KK, Al-Boudari OM, Mohamed GH, Dzimiri N
E-selectin S128R polymorphism and severe coronary artery disease in Arabs.
BMC Med Genet. 2006;752.
BACKGROUND: The E-selectin p.S128R (g.A561C) polymorphism has been associated with the presence of angiographic coronary artery disease (CAD) in some populations, but no data are currently available on its association with CAD in Arabs. METHODS: In the present study, we determined the potential relevance of the E-selectin S128R polymorphism for severe CAD and its associated risk factors among Arabs. We genotyped Saudi Arabs for this polymorphism by PCR, followed by restriction enzyme digestion. RESULTS: The polymorphism was determined in 556 angiographically confirmed severe CAD patients and 237 control subjects with no CAD as established angiographically (CON). Frequencies of the S/S, S/R and R/R genotypes were 81.1%, 16.6% and 2.3% in CAD patients and 87.8%, 11.8%, and 0.4% in CON subjects, respectively. The frequency of the mutant 128R allele was higher among CAD patients than in the CON group (11% vs. 6%; odds ratio = 1.76; 95% CI 1.14 - 2.72; p = 0.007), thus indicating a significant association of the 128R allele with CAD in our population. However, the stepwise logistic regression for the 128R allele and different CAD risk factors showed no significant association. CONCLUSION: Among the Saudi population, the E-selectin p.S128R (g.A561C) polymorphism was associated with angiographic CAD in univariate analysis, but lost its association in multivariate analysis. [Abstract/Link to Full Text]
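
The reported allele frequencies and odds ratio can be reproduced, to rounding, from the genotype percentages in this abstract. The sketch below is a worked check in Python, not the authors' analysis: it works from the published percentages rather than raw counts, so it does not attempt the confidence interval or p value, and the function names are illustrative.

```python
def minor_allele_frequency(f_het, f_hom_minor):
    """Frequency of the minor allele from genotype frequencies."""
    # Heterozygotes contribute one copy out of two; homozygotes contribute both.
    return f_hom_minor + f_het / 2


def odds_ratio(freq_cases, freq_controls):
    """Allelic odds ratio from the allele frequency in cases and controls."""
    odds_cases = freq_cases / (1 - freq_cases)
    odds_controls = freq_controls / (1 - freq_controls)
    return odds_cases / odds_controls


# S/R and R/R genotype frequencies reported for CAD patients and controls.
r_cad = minor_allele_frequency(0.166, 0.023)
r_con = minor_allele_frequency(0.118, 0.004)
allelic_or = odds_ratio(r_cad, r_con)

print(f"128R frequency: CAD = {r_cad:.3f}, CON = {r_con:.3f}")
print(f"Allelic odds ratio = {allelic_or:.2f}")
```

The 128R frequencies come out at about 0.106 and 0.063, consistent with the rounded 11% and 6%, and the resulting odds ratio agrees with the reported 1.76.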

Freathy RM, Weedon MN, Melzer D, Shields B, Hitman GA, Walker M, McCarthy MI, Hattersley AT, Frayling TM
The functional "KL-VS" variant of KLOTHO is not associated with type 2 diabetes in 5028 UK Caucasians.
BMC Med Genet. 2006;751.
BACKGROUND: Klotho has an important role in insulin signalling and the development of ageing-like phenotypes in mice. The common functional "KL-VS" variant in the KLOTHO (KL) gene is associated with longevity in humans but its role in type 2 diabetes is not known. We performed a large case-control and family-based study to test the hypothesis that KL-VS is associated with type 2 diabetes in a UK Caucasian population. METHODS: We genotyped 1793 cases, 1619 controls and 1616 subjects from 509 families for the single nucleotide polymorphism (SNP) F352V (rs9536314) that defines the KL-VS variant. Allele and genotype frequencies were compared between cases and controls. Family-based analysis was used to test for over- or under-transmission of V352 to affected offspring. RESULTS: Despite good power to detect odds ratios of 1.2, there were no significant associations between alleles or genotypes and type 2 diabetes (V352 allele: odds ratio = 0.96 (0.84-1.09)). Additional analysis of quantitative trait data in 1177 healthy control subjects showed no association of the variant with fasting insulin, glucose, triglycerides, HDL- or LDL-cholesterol (all P > 0.05). However, the HDL-cholesterol levels observed across the genotype groups showed a similar, but non-significant, pattern to previously reported data. CONCLUSION: This is the first large-scale study to examine the association between common functional variation in KL and type 2 diabetes risk. We have found no evidence that the functional KL-VS variant is a risk factor for type 2 diabetes in a large UK Caucasian case-control and family-based study. [Abstract/Link to Full Text]

Marti A, Ochoa MC, Sánchez-Villegas A, Martínez JA, Martínez-González MA, Hebebrand J, Hinney A, Vedder H
Meta-analysis on the effect of the N363S polymorphism of the glucocorticoid receptor gene (GRL) on human obesity.
BMC Med Genet. 2006;750.
BACKGROUND: Since both excess glucocorticoid secretion and central obesity are clinical features of some obese patients, it is worthwhile to study a possible association of glucocorticoid receptor gene (GRL) variants with obesity. Previous studies have linked the N363S variant of the GRL gene to increased glucocorticoid effects such as higher body fat, a lower lean-body mass and a larger insulin response to dexamethasone. However, contradictory findings have also been reported about the association between this variant and obesity phenotypes. Individual studies may lack statistical power, which may result in disparate results. This limitation can be overcome using meta-analytic techniques. METHODS: We conducted a meta-analysis to assess the association between the N363S polymorphism of the GRL gene and obesity risk. In addition to published research, we also included our own unpublished data (three novel case-control studies) in the meta-analysis. The new case-control studies were conducted in German and Spanish children, adolescents and adults (total number of subjects: 1,117). Genotype was assessed by PCR-RFLP (Tsp509I). The final formal meta-analysis included a total of 5,909 individuals. RESULTS: The meta-analysis revealed a higher body mass index (BMI) with an overall estimation of +0.18 kg/m2 (95% CI: +0.004 to +0.35) for homo-/heterozygous carriers of the 363S allele of the GRL gene in comparison to non-carriers. Moreover, differences in pooled BMI were statistically significant and positive when considering one-group studies from the literature in which participants had a BMI below 27 kg/m2 (+0.41 kg/m2 [95% CI +0.17 to +0.66]), but the differences in BMI were negative when only our novel data from younger (aged under 45) and normal weight subjects were pooled together (-0.50 kg/m2 [95% CI -0.84 to -0.17]). The overall risk for obesity for homo-/heterozygous carriers of the 363S allele was not statistically significant in the meta-analysis (pooled OR = 1.02; 95% CI: 0.56-1.87). CONCLUSION: Although certain genotypic effects could be population-specific, we conclude that there is no compelling evidence that the N363S polymorphism of the GRL gene is associated with either average BMI or obesity risk. [Abstract/Link to Full Text]

Chowdhury MA, Kuivaniemi H, Romero R, Edwin S, Chaiworapongsa T, Tromp G
Identification of novel functional sequence variants in the gene for peptidase inhibitor 3.
BMC Med Genet. 2006;749.
BACKGROUND: Peptidase inhibitor 3 (PI3) inhibits neutrophil elastase and proteinase-3, and has a potential role in skin and lung diseases as well as in cancer. Genome-wide expression profiling of chorioamniotic membranes revealed decreased expression of PI3 in women with preterm premature rupture of membranes. To elucidate the molecular mechanisms contributing to the decreased expression in amniotic membranes, the PI3 gene was searched for sequence variations and the functional significance of the identified promoter variants was studied. METHODS: Single nucleotide polymorphisms (SNPs) were identified by direct sequencing of PCR products spanning a region from 1,173 bp upstream to 1,266 bp downstream of the translation start site. Fourteen SNPs were genotyped from 112 and nine SNPs from 24 unrelated individuals. Putative transcription factor binding sites detected by in silico search were verified by electrophoretic mobility shift assay (EMSA) using nuclear extracts from HeLa and amnion cells. Deviation from Hardy-Weinberg equilibrium (HWE) was tested by chi2 goodness-of-fit test. Haplotypes were estimated using the expectation maximization (EM) algorithm. RESULTS: Twenty-three sequence variations were identified by direct sequencing of polymerase chain reaction (PCR) products covering 2,439 nt of the PI3 gene (-1,173 nt of promoter sequences and all three exons). Analysis of 112 unrelated individuals showed that 20 variants had minor allele frequencies (MAF) ranging from 0.02 to 0.46 representing "true polymorphisms", while three had MAF ≤ 0.01. Eleven variants were in the promoter region; several putative transcription factor binding sites were found at these sites by database searches. Differential binding of transcription factors was demonstrated at two polymorphic sites by electrophoretic mobility shift assays, both in amniotic and HeLa cell nuclear extracts. Differential binding of the transcription factor GATA1 at the -689C>G site was confirmed by a supershift. CONCLUSION: The promoter sequences of PI3 have a high degree of variability. Functional promoter variants provide a possible mechanism for explaining the differences in PI3 mRNA expression levels in the chorioamniotic membranes, and are also likely to be useful in elucidating the role of PI3 in other diseases. [Abstract/Link to Full Text]

Sánchez E, Sabio JM, Callejas JL, de Ramón E, Garcia-Portales R, García-Hernández FJ, Jiménez-Alonso J, González-Escribano MF, Martín J, Koeleman BP
Association study of genetic variants of pro-inflammatory chemokine and cytokine genes in systemic lupus erythematosus.
BMC Med Genet. 2006;748.
BACKGROUND: Several lines of evidence suggest that chemokines and cytokines play an important role in the inflammatory development and progression of systemic lupus erythematosus (SLE). The aim of this study was to evaluate the relevance of functional genetic variations of RANTES, IL-8, IL-1alpha, and MCP-1 for systemic lupus erythematosus. METHODS: The study was conducted on 500 SLE patients and 481 ethnically matched healthy controls. Genotyping of polymorphisms in the RANTES, IL-8, IL-1alpha, and MCP-1 genes was performed using a real-time polymerase chain reaction (PCR) system with pre-developed TaqMan allelic discrimination assays. RESULTS: No significant differences between SLE patients and healthy controls were observed when comparing genotype, allele or haplotype frequencies of the RANTES, IL-8, IL-1alpha, and MCP-1 polymorphisms. In addition, no evidence for association with clinical sub-features of SLE was found. CONCLUSION: These results suggest that the tested functional variants of the RANTES, IL-8, IL-1alpha, and MCP-1 genes do not play a relevant role in the susceptibility to or severity of SLE in the Spanish population. [Abstract/Link to Full Text]

Soler JM, Pereira AC, Tôrres CH, Krieger JE
Gene by environment QTL mapping through multiple trait analyses in blood pressure salt-sensitivity: identification of a novel QTL in rat chromosome 5.
BMC Med Genet. 2006;747.
BACKGROUND: The genetic mechanisms underlying interindividual blood pressure variation reflect the complex interplay of both genetic and environmental variables. The current standard statistical methods for detecting genes involved in the regulation mechanisms of complex traits are based on univariate analysis. Few studies have focused on the search for and understanding of quantitative trait loci responsible for gene x environmental interactions or multiple trait analysis. Composite interval mapping has been extended to multiple traits and may be an interesting approach to such a problem. METHODS: We used multiple-trait analysis for quantitative trait locus mapping of loci having different effects on systolic blood pressure with NaCl exposure. Animals studied were 188 rats, the progeny of an F2 rat intercross between the hypertensive and normotensive strains, genotyped at 179 polymorphic markers across the rat genome. To accommodate the correlational structure from measurements taken in the same animals, we applied univariate and multivariate strategies for analyzing the data. RESULTS: We detected a new quantitative trait locus in a region close to marker R589 on chromosome 5 of the rat genome, not previously identified through serial analysis of individual traits. In addition, we were able to justify analytically the parametric restrictions in terms of regression coefficients responsible for the gain in precision with the adopted analytical approach. CONCLUSION: Future work should focus on fine mapping and the identification of the causative variant responsible for this quantitative trait locus signal. The multivariable strategy might be valuable in the study of genetic determinants of interindividual variation of antihypertensive drug effectiveness. [Abstract/Link to Full Text]

Kimberley KW, Morris CA, Hobart HH
BAC-FISH refutes report of an 8p22-8p23.1 inversion or duplication in 8 patients with Kabuki syndrome.
BMC Med Genet. 2006;746.
BACKGROUND: Kabuki syndrome is a multiple congenital anomaly/mental retardation syndrome. The syndrome is characterized by varying degrees of mental retardation, postnatal growth retardation, distinct facial characteristics resembling the Kabuki actor's make-up, cleft or high-arched palate, brachydactyly, scoliosis, and persistence of finger pads. The multiple organ involvement suggests that this is a contiguous gene syndrome but no chromosomal anomalies have been isolated as an etiology. Recent studies have focused on possible duplications in the 8p22-8p23.1 region but no consensus has been reached. METHODS: We used bacterial artificial chromosome-fluorescent in-situ hybridization (BAC-FISH) and G-band analysis to study eight patients with Kabuki syndrome. RESULTS: Metaphase analysis revealed no deletions or duplications with any of the BAC probes. Interphase studies of the Kabuki patients yielded no evidence of inversions when using three-color FISH across the region. These results agree with other research groups' findings but disagree with the findings of Milunsky and Huang. CONCLUSION: It seems likely that Kabuki syndrome is not a contiguous gene syndrome of the 8p region studied. [Abstract/Link to Full Text]

Vidal-Taboada JM, Cucala M, Mas Herrero S, Lafuente A, Cobos A
Satisfaction survey with DNA cards method to collect genetic samples for pharmacogenetics studies.
BMC Med Genet. 2006;745.
BACKGROUND: Pharmacogenetic studies are essential in understanding the interindividual variability of drug responses. DNA sample collection for genotyping is a critical step in genetic studies. A method using dried blood samples from finger-puncture, collected on DNA-cards, has been described as an alternative to the usual venepuncture technique. The purpose of this study is to evaluate the implementation of the DNA cards method in a multicentre clinical trial, and to assess the degree of investigators' satisfaction and the acceptance of the patients perceived by the investigators. METHODS: Blood samples were collected on DNA-cards. The quality and quantity of DNA recovered were analyzed. Investigators were questioned regarding their general interest, previous experience, safety issues, preferences and perceived patient satisfaction. RESULTS: 151 patients' blood samples were collected. Genotyping of GST polymorphisms was achieved in all samples (100%). 28 investigators completed the survey. Investigators perceived patient satisfaction as very good (60.7%) or good (39.3%), without reluctance to finger puncture. Investigators preferred this method, which was considered safer and better than the usual methods. All investigators would recommend using it in future genetic studies. CONCLUSION: Within the clinical trial setting, the DNA-cards method was very well accepted by investigators and patients (in perception of investigators), and was preferred to conventional methods due to its ease of use and safety. [Abstract/Link to Full Text]

Cheyssac C, Lecoeur C, Dechaume A, Bibi A, Charpentier G, Balkau B, Marre M, Froguel P, Gibson F, Vaxillaire M
Analysis of common PTPN1 gene variants in type 2 diabetes, obesity and associated phenotypes in the French population.
BMC Med Genet. 2006;744.
BACKGROUND: The protein tyrosine phosphatase-1B (PTP1B), a negative regulator of insulin and leptin signalling, potentially modulates glucose and energy homeostasis. PTP1B is encoded by the PTPN1 gene located on chromosome 20q13, a region showing linkage with type 2 diabetes (T2D) in several populations. PTPN1 gene variants have been inconsistently associated with T2D, and the aim of our study was to investigate the effect of PTPN1 genetic variations on the risk of T2D, obesity and on the variability of metabolic phenotypes in the French population. METHODS: Fourteen single nucleotide polymorphisms (SNPs) spanning the PTPN1 locus were selected from previous association reports and from HapMap linkage disequilibrium data. SNPs were evaluated for association with T2D in two case-control groups with 1227 cases and 1047 controls. Association with moderate and severe obesity was also tested in a case-control study design. Association with metabolic traits was evaluated in 736 normoglycaemic, non-obese subjects from a general population. Five SNPs showing a trend towards association with T2D, obesity or metabolic parameters were investigated for familial association. RESULTS: Of the 14 SNPs investigated, only SNP rs914458, located 10 kb downstream of the PTPN1 gene, was significantly associated with T2D (p = 0.02 under a dominant model; OR = 1.43 [1.06-1.94]) in the combined sample set. SNP rs914458 also showed association with moderate obesity (allelic p = 0.04; OR = 1.2 [1.01-1.43]). When testing for association with metabolic traits, two strongly correlated SNPs, rs941798 and rs2426159, presented multiple consistent associations. SNP rs2426159 exhibited evidence of association under a dominant model with glucose homeostasis related traits (p = 0.04 for fasting insulin and HOMA-B) and with lipid markers (0.02 ≤ p ≤ 0.04). Moreover, risk allele homozygotes for this SNP had an increased systolic blood pressure (p = 0.03). No preferential transmission of alleles was observed for the SNPs tested in the family sample. CONCLUSION: In our study, PTPN1 variants showed moderate association with T2D and obesity. However, consistent associations with metabolic variables reflecting insulin resistance and dyslipidemia were found for two intronic SNPs, as previously reported. Thus, our data indicate that PTPN1 variants may modulate the lipid profile, thereby influencing susceptibility to metabolic disease. [Abstract/Link to Full Text]

Alsmadi OA, Al-Kayal F, Al-Hamed M, Meyer BF
Frequency of common HFE variants in the Saudi population: a high throughput molecular beacon-based study.
BMC Med Genet. 2006;7:43.
BACKGROUND: Hereditary Hemochromatosis (HH) is an autosomal recessive disorder characterized by iron overload. Two common mutations in HFE, p.C282Y and p.H63D, have been discovered and found to associate with HH in different ethnic backgrounds. p.C282Y and p.H63D diagnosis is usually made by restriction enzyme analysis. However, the use of this technique is largely limited to research laboratories because it is relatively expensive, time-consuming, and difficult to transform into a high throughput format. METHODS: Single nucleotide variations in target DNA sequences can be readily identified using molecular beacon fluorescent probes. These are quenched probes with a loop and hairpin structure, and they become fluorescent upon specific target recognition. We developed high throughput homogeneous real-time PCR assays using molecular beacon technology to genotype the p.C282Y and p.H63D variants. Representative samples of different genotypes for these variants were assayed by restriction enzyme analysis and direct sequencing as benchmark methods for comparison with the newly developed molecular beacon-based real-time PCR assay. RESULTS: Complete concordance was achieved by all three assay formats. Homozygotes (mutant and wildtype) and heterozygotes were readily differentiated by the allele specific molecular beacons as reported by the associated fluorophore in the real-time assay developed in this study. Additionally, these assays were used in a high throughput format to establish the allele frequency of C282Y and H63D in Saudis for the first time. CONCLUSION: These assays may be reliably applied as a diagnostic test or large scale method for population screening. [Abstract/Link to Full Text]

Prasad P, Tiwari AK, Kumar KM, Ammini AC, Gupta A, Gupta R, Sharma AK, Rao AR, Nagendra R, Chandra TS, Tiwari SC, Rastogi P, Gupta BL, Thelma BK
Chronic renal insufficiency among Asian Indians with type 2 diabetes: I. Role of RAAS gene polymorphisms.
BMC Med Genet. 2006;7:42.
BACKGROUND: Renal failure in diabetes is mediated by multiple pathways. Experimental and clinical evidence suggests that the renin-angiotensin-aldosterone system (RAAS) has a crucial role in diabetic kidney disease. A relationship between the RAAS genotypes and chronic renal insufficiency (CRI) among type 2 diabetes subjects has therefore been speculated. We investigated the contribution of selected RAAS gene polymorphisms to CRI among type 2 diabetic Asian Indian subjects. METHODS: Twelve single nucleotide polymorphisms (SNPs) from six genes, namely renin (REN), angiotensinogen (AGT), angiotensin converting enzyme I (ACE), angiotensin II type 1 receptor (AT1) and aldosterone synthase (CYP11B2) from the RAAS pathway and one from the chymase pathway, were genotyped using the polymerase chain reaction-restriction fragment length polymorphism (PCR-RFLP) method and tested for their association with diabetic CRI using a case-control approach. Successive cases presenting to study centres with type 2 diabetes of ≥2 years duration and moderate CRI diagnosed by serum creatinine ≥3 mg/dl after exclusion of non-diabetic causes of CRI (n = 196) were compared with diabetes subjects with no evidence of renal disease (n = 225). Logistic regression analysis was carried out to correlate various clinical parameters with genotypes, and to study pairwise interactions between SNPs of different genes. RESULTS: Of the 12 SNPs genotyped, Glu53Stop in the AGT gene and A>T (-777) in the AT1 gene were monomorphic and not included for further analysis. We observed a highly significant association of the Met235Thr SNP in the angiotensinogen gene with CRI (O.R. 2.68, 95%CI: 2.01-3.57 for Thr allele, O.R. 2.94, 95%CI: 1.88-4.59 for Thr/Thr genotype and O.R. 2.68, 95%CI: 1.97-3.64 for ACC haplotype). A significant allelic and genotypic association of the T>C (-344) SNP in the aldosterone synthase gene (O.R. 1.57, 95%CI: 1.16-2.14 and O.R. 1.81, 95%CI: 1.21-2.71 respectively), and a genotypic association of the GA genotype of G>A (-1903) in the chymase gene (O.R. 2.06, 95%CI: 1.34-3.17) were also observed. CONCLUSION: SNPs Met235Thr in angiotensinogen, T>C (-344) in aldosterone synthase, and G>A (-1903) in chymase genes are significantly associated with diabetic chronic renal insufficiency in Indian patients and warrant replication in larger sample sets. Use of such markers for prediction of susceptibility to diabetes specific renal disease in the ethnically Indian population appears promising. [Abstract/Link to Full Text]

Mayeur H, Roche O, Vêtu C, Jaliffa C, Marchant D, Dollfus H, Bonneau D, Munier FL, Schorderet DF, Levin AV, Héon E, Sutherland J, Lacombe D, Said E, Mezer E, Kaplan J, Dufier JL, Marsac C, Menasche M, Abitbol M
Eight previously unidentified mutations found in the OA1 ocular albinism gene.
BMC Med Genet. 2006;7:41.
BACKGROUND: Ocular albinism type 1 (OA1) is an X-linked ocular disorder characterized by a severe reduction in visual acuity, nystagmus, hypopigmentation of the retinal pigmented epithelium, foveal hypoplasia, macromelanosomes in pigmented skin and eye cells, and misrouting of the optical tracts. This disease is primarily caused by mutations in the OA1 gene. METHODS: The ophthalmologic phenotype of the patients and their family members was characterized. We screened for mutations in the OA1 gene by direct sequencing of the nine PCR-amplified exons, and for genomic deletions by PCR-amplification of large DNA fragments. RESULTS: We sequenced the nine exons of the OA1 gene in 72 individuals and found ten different mutations in seven unrelated families and three sporadic cases. The ten mutations include an amino acid substitution and a premature stop codon previously reported by our team, and eight previously unidentified mutations: three amino acid substitutions, a duplication, a deletion, an insertion and two splice-site mutations. The use of a novel Taq polymerase enabled us to amplify large genomic fragments covering the OA1 gene and to detect what are very likely six distinct large deletions. Furthermore, we were able to confirm that there was no deletion in twenty-one patients in whom no mutation had been found. CONCLUSION: The identified mutations affect highly conserved amino acids or cause frameshifts or alternative splicing, thus affecting folding of the OA1 G protein coupled receptor, interactions of OA1 with its G protein and/or binding with its ligand. [Abstract/Link to Full Text]

Natividad A, Cooke G, Holland MJ, Burton MJ, Joof HM, Rockett K, Kwiatkowski DP, Mabey DC, Bailey RL
A coding polymorphism in matrix metalloproteinase 9 reduces risk of scarring sequelae of ocular Chlamydia trachomatis infection.
BMC Med Genet. 2006;7:40.
BACKGROUND: Trachoma, an infectious disease of the conjunctiva caused by Chlamydia trachomatis, is an important global cause of blindness. Dysregulated extracellular matrix (ECM) proteolysis during the processes of tissue repair following infection and inflammation is thought to play a key role in the development of fibrotic sequelae of infection, which ultimately lead to blindness. Expression and activity of matrix metalloproteinase 9 (MMP-9), a major effector of ECM turnover, are up-regulated in the inflamed conjunctiva of trachoma subjects. Genetic variation within the MMP9 gene affects in vitro MMP9 expression levels, enzymatic activity and susceptibility to various inflammatory and fibrotic conditions. METHODS: We genotyped 651 case-control pairs from trachoma endemic villages in The Gambia for coding single nucleotide polymorphisms (SNPs) in the MMP9 gene using the high-throughput Sequenom system. Single marker and haplotype conditional logistic regression (CLR) analysis for disease association was performed. RESULTS: The Q279R mutation located in exon 6 of MMP9 was found to be associated with lower risk for severe disease sequelae of ocular Chlamydia trachomatis infection. This mutation, which leads to a nonsynonymous amino-acid change within the active site of the enzyme, may reduce MMP-9-induced degradation of the structural components of the ECM during inflammatory episodes in trachoma and its associated fibrosis. CONCLUSION: This work supports the hypothesis that MMP-9 has a role in the pathogenesis of blinding trachoma. [Abstract/Link to Full Text]

Asselbergs FW, Moore JH, van den Berg MP, Rimm EB, de Boer RA, Dullaart RP, Navis G, van Gilst WH
A role for CETP TaqIB polymorphism in determining susceptibility to atrial fibrillation: a nested case control study.
BMC Med Genet. 2006;7:39.
BACKGROUND: Studies investigating the genetic and environmental characteristics of atrial fibrillation (AF) may provide new insights into the complex development of AF. We aimed to investigate the association between several environmental factors and loci of candidate genes, which might be related to the presence of AF. METHODS: A nested case-control study within the PREVEND cohort was conducted. Standard 12 lead electrocardiograms were recorded and AF was defined according to Minnesota codes. For every case, an age and gender matched control was selected from the same population (n = 194). In addition to logistic regression analyses, the multifactor-dimensionality reduction (MDR) method and interaction entropy graphs were used for the evaluation of gene-gene and gene-environment interactions. Polymorphisms in genes from the Renin-angiotensin, Bradykinin and CETP systems were included. RESULTS: Subjects with AF had a higher prevalence of electrocardiographic left ventricular hypertrophy, ischemic heart disease, hypertension, renal dysfunction, elevated levels of C-reactive protein (CRP) and increased urinary albumin excretion as compared to controls. The polymorphisms of the Renin-angiotensin system and Bradykinin gene did not show a significant association with AF (p > 0.05). The TaqIB polymorphism of the CETP gene was significantly associated with the presence of AF (p < 0.05). Using the MDR method, the best genotype-phenotype models included the combination of micro- or macroalbuminuria and CETP TaqIB polymorphism, CRP >3 mg/L and CETP TaqIB polymorphism, renal dysfunction and the CETP TaqIB polymorphism, and ischemic heart disease and CETP TaqIB polymorphism (1000 fold permutation testing, P < 0.05). The interaction entropy graph showed that the combination of albuminuria and CETP TaqIB polymorphism removed the most entropy. CONCLUSION: CETP TaqIB polymorphism is significantly associated with the presence of AF in the context of micro- or macroalbuminuria, elevated C-reactive protein, renal dysfunction, and ischemic heart disease. [Abstract/Link to Full Text]

Abu-Amero KK, Al-Boudari OM, Mohamed GH, Dzimiri N
T null and M null genotypes of the glutathione S-transferase gene are risk factor for CAD independent of smoking.
BMC Med Genet. 2006;7:38.
BACKGROUND: The association of the deletion in GSTT1 and GSTM1 genes with coronary artery disease (CAD) among smokers is controversial. In addition, no such investigation has previously been conducted among Arabs. METHODS: We genotyped 1054 CAD patients and 762 controls for GSTT1 and GSTM1 deletion by multiplex polymerase chain reaction. Both CAD patients and controls were Saudi Arabs. RESULTS: In the control group (n = 762), 82.3% had the Twild M wild genotype, 9% had the Twild M null, 2.4% had the Tnull M wild and 6.3% had the Tnull M null genotype. Among the CAD group (n = 1054), 29.5% had the Twild M wild genotype, 26.6% (p < .001) had the Twild M null, 8.3% (p < .001) had the Tnull M wild and 35.6% (p < .001) had the Tnull M null genotype, indicating a significant association of the Twild M null, Tnull M wild and Tnull M null genotypes with CAD. Univariate analysis also showed that smoking, age, hypercholesterolemia and hypertriglyceridemia, diabetes mellitus, family history of CAD, hypertension and obesity are all associated with CAD, whereas gender and myocardial infarction are not. Binary logistic regression for smoking and genotypes indicated that only M null and Tnull are interacting with smoking. However, further subgroup analysis stratifying the data by smoking status suggested that genotype-smoking interactions have no effect on the development of CAD. CONCLUSION: GSTT1 and GSTM1 null-genotypes are risk factors for CAD independent of genotype-smoking interaction. [Abstract/Link to Full Text]

Kashyap VK, Sahoo S, Sitalaximi T, Trivedi R
Deletions in the Y-derived amelogenin gene fragment in the Indian population.
BMC Med Genet. 2006;7:37.
BACKGROUND: Rare failures in amelogenin-based gender typing of individuals have been observed globally. In this study, we report the deletion of a large fragment of the amelogenin gene in 10 individuals out of 4,257 male samples analyzed from 104 different endogamous populations of India. METHODS: Samples were analyzed using commercial genetic profiling kits. Those that exhibited failures in amelogenin-based gender identification were further analyzed with published as well as newly designed primers to ascertain the nature and extent of mutation. RESULTS: The failure rate among Indian males was 0.23%. Though the exact size and nature of the deletion (single point mutations at a number of positions or a single large deletion) could not be determined in the present study, it is inferred that the deletion spans a region downstream of the reverse primer-binding site of commercially available amelogenin primer sets. Deletions were conspicuously absent among the Mongoloid tribes of Northeast India, while both caste and tribal groups harbored these mutations, which were found predominantly among Y-chromosomes belonging to the J2 lineage. CONCLUSION: Our study indicates that the different amelogenin primer sets currently included in genetic profiling multiplex kits may result in erroneous interpretations due to mutations undetectable during routine testing. Further, there are indications that these mutations could possibly be lineage-specific, inherited deletions. [Abstract/Link to Full Text]

Lin TC, Yen JM, Gong KB, Kuo TC, Ku DC, Liang SF, Wu MJ
Abnormal glucose tolerance and insulin resistance in polycystic ovary syndrome amongst the Taiwanese population- not correlated with insulin receptor substrate-1 Gly972Arg/Ala513Pro polymorphism.
BMC Med Genet. 2006;7:36.
BACKGROUND: Insulin resistance and glucose dysmetabolism in polycystic ovary syndrome (PCOS) are related to polymorphisms in the genes encoding the insulin receptor substrate (IRS) proteins; in particular, the Gly972Arg/Ala513Pro polymorphism has been reported to be associated with type 2 diabetes and PCOS. We intended to assess the prevalence of abnormal glucose tolerance (AGT) and insulin resistance in Taiwanese PCOS women. We also tried to assess whether the Gly972Arg/Ala513Pro polymorphic alleles of the IRS-1 gene can be used as an appropriate diagnostic indicator for PCOS. METHODS: We designed a prospective clinical study. Forty-seven Taiwanese Hoklo and Hakka women diagnosed with PCOS were enrolled in this study, as were forty-five healthy Hoklo and Hakka women as the control group. Insulin resistance was evaluated with fasting insulin, fasting glucose/insulin ratio, and homeostasis model assessment index for insulin resistance (HOMAIR). The genomic DNA of the subjects was amplified by PCR and analyzed by restriction fragment length polymorphism (RFLP), with Bst N1 used for codon 972 and Dra III for codon 513. RESULTS: AGT was found in 46.8% of these PCOS patients and was significantly related to high insulin resistance rather than low insulin resistance. Those patients with either insulin resistance or AGT comprised the majority of PCOS affected patients (AGT + fasting insulin ≥17: 83%, AGT + glucose/insulin ratio ≥6.5: 85.1%, AGT + HOMAIR ≥2: 87.2%, and AGT + HOMAIR ≥3.8: 72.3%). None of the tested samples revealed any polymorphism, owing to the absence of any Dra III or Bst N1 recognition site in the amplified PCR fragments subjected to RFLP digestion. CONCLUSION: There is a significantly high prevalence of AGT and insulin resistance in PCOS women, but Gly972Arg and Ala513Pro polymorphic alleles of IRS-1 are rare and are not associated with an elevated risk of PCOS amongst Taiwanese subjects. This is quite different from a similar study in phylogenetically diverged Caucasian subjects. [Abstract/Link to Full Text]
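
The abstract above reports HOMAIR cut-offs (≥2 and ≥3.8) without stating the formula used. A minimal sketch, assuming the widely used homeostasis model assessment formula (fasting insulin in µU/mL times fasting glucose in mmol/L, divided by 22.5); the function names and the example values are illustrative and are not taken from the study:

```python
def homa_ir(fasting_insulin_uU_per_mL: float, fasting_glucose_mmol_per_L: float) -> float:
    """HOMA-IR under the assumed standard formula: insulin * glucose / 22.5."""
    return fasting_insulin_uU_per_mL * fasting_glucose_mmol_per_L / 22.5


def exceeds_cutoff(homa_value: float, cutoff: float) -> bool:
    """True when the index meets or exceeds a chosen cut-off (e.g. 2 or 3.8)."""
    return homa_value >= cutoff


# Hypothetical subject: insulin 17 µU/mL, glucose 5.0 mmol/L
example = homa_ir(17.0, 5.0)
print(round(example, 2), exceeds_cutoff(example, 2.0), exceeds_cutoff(example, 3.8))
```

Whether the authors used exactly this form or a variant with glucose in mg/dL is not stated in the abstract, so the cut-offs should be read against the original paper.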

Gamundi MJ, Hernan I, Martínez-Gimeno M, Maseras M, García-Sandoval B, Ayuso C, Antiñolo G, Baiget M, Carballo M
Three novel and the common Arg677Ter RP1 protein truncating mutations causing autosomal dominant retinitis pigmentosa in a Spanish population.
BMC Med Genet. 2006;7:35.
BACKGROUND: Retinitis pigmentosa (RP), a clinically and genetically heterogeneous group of retinal degeneration disorders affecting the photoreceptor cells, is one of the leading causes of genetic blindness. Mutations in the photoreceptor-specific gene RP1 account for 3-10% of cases of autosomal dominant RP (adRP). Most of these mutations are clustered in a 500 bp region of exon 4 of RP1. METHODS: Denaturing gradient gel electrophoresis (DGGE) analysis and direct genomic sequencing were used to evaluate the 5' coding region of exon 4 of the RP1 gene for mutations in 150 unrelated index adRP patients. Ophthalmic and electrophysiological examination of RP patients and relatives according to pre-existing protocols were carried out. RESULTS: Three novel disease-causing mutations in RP1 were detected: Q686X, K705fsX712 and K722fsX737, predicting truncated proteins. One novel missense mutation, Thr752Met, was detected in one family but the mutation does not co-segregate in the family, thereby excluding this amino acid variation in the protein as a cause of the disease. We found the Arg677Ter mutation, previously reported in other populations, in two independent families, confirming that this mutation is also present in a Spanish population. CONCLUSION: Most of the mutations reported in the RP1 gene associated with adRP are expected to encode mutant truncated proteins that are approximately one third or half of the size of wild type protein. Patients with mutations in RP1 showed mild RP with variability in phenotype severity. We also observed several cases of non-penetrant mutations. [Abstract/Link to Full Text]

Macgregor S, Khan IA
GAIA: an easy-to-use web-based application for interaction analysis of case-control data.
BMC Med Genet. 2006;7:34.
BACKGROUND: The advent of cheap, large scale genotyping has led to widespread adoption of genetic association mapping as the tool of choice in the search for loci underlying susceptibility to common complex disease. Whilst simple single locus analysis is relatively trivial to conduct, this is not true of more complex analysis such as those involving interactions between loci. The importance of testing for interactions between loci in association analysis has been highlighted in a number of recent high profile publications. RESULTS: Genetic Association Interaction Analysis (GAIA) is a web-based application for testing for statistical interactions between loci. It is based upon the widely used case-control study design for genetic association analysis and is designed so that non-specialists may routinely apply tests for interaction. GAIA allows simple testing of both additive and additive plus dominance interaction models and includes permutation testing to appropriately correct for multiple testing. The application will find use both in candidate gene based studies and in genome-wide association studies. For large scale studies GAIA includes a screening approach which prioritizes loci (based on the significance of main effects at one or both loci) for further interaction analysis. CONCLUSION: GAIA is available at [Abstract/Link to Full Text]

Homanics GE, Skvorak K, Ferguson C, Watkins S, Paul HS
Production and characterization of murine models of classic and intermediate maple syrup urine disease.
BMC Med Genet. 2006;7:33.
BACKGROUND: Maple Syrup Urine Disease (MSUD) is an inborn error of metabolism caused by a deficiency of branched-chain keto acid dehydrogenase. MSUD has several clinical phenotypes depending on the degree of enzyme deficiency. Current treatments are not satisfactory, and new approaches are required to combat this disease. A major hurdle in developing new treatments has been the lack of a suitable animal model. METHODS: To create a murine model of classic MSUD, we used gene targeting and embryonic stem cell technologies to create a mouse line that lacked a functional E2 subunit gene of branched-chain keto acid dehydrogenase. To create a murine model of intermediate MSUD, we used transgenic technology to express a human E2 cDNA on the knockout background. Mice of both models were characterized at the molecular, biochemical, and whole animal levels. RESULTS: By disrupting the E2 subunit gene of branched-chain keto acid dehydrogenase, we created a gene knockout mouse model of classic MSUD. The homozygous knockout mice lacked branched-chain keto acid dehydrogenase activity and E2 immunoreactivity, and had a 3-fold increase in circulating branched-chain amino acids. These metabolic derangements resulted in neonatal lethality. Transgenic expression of a human E2 cDNA in the liver of the E2 knockout animals produced a model of intermediate MSUD. Branched-chain keto acid dehydrogenase activity was 5-6% of normal and was sufficient to allow survival, but was insufficient to normalize circulating branched-chain amino acid levels, which were intermediate between wildtype and the classic MSUD mouse model. CONCLUSION: These mice represent important animal models that closely approximate the phenotype of humans with the classic and intermediate forms of MSUD. These animals provide useful models to further characterize the pathogenesis of MSUD, as well as models to test novel therapeutic strategies, such as gene and cellular therapies, to treat this devastating metabolic disease. [Abstract/Link to Full Text]

Núñez C, Alecsandru D, Varadé J, Polanco I, Maluenda C, Fernández-Arquero M, de la Concha EG, Urcelay E, Martínez A
Interleukin-10 haplotypes in Celiac Disease in the Spanish population.
BMC Med Genet. 2006;7:32.
BACKGROUND: Celiac disease (CD) is a chronic disorder characterized by a pathological inflammatory response after exposure to gluten in genetically susceptible individuals. The HLA complex accounts for less than half of the genetic component of the disease, and additional genes must be implicated. Interleukin-10 (IL-10) is an important regulator of mucosal immunity, and several reports have described alterations of IL-10 levels in celiac patients. The IL-10 gene is located on chromosome 1, and its promoter carries several single nucleotide polymorphisms (SNPs) and microsatellites which have been associated with production levels. Our aim was to study the role of those polymorphisms in susceptibility to CD in our population. METHODS: A case-control and a familial study were performed. Positions -1082, -819 and -592 of the IL-10 promoter were typed by TaqMan and allele specific PCR. IL10R and IL10G microsatellites were amplified with labelled primers, and they were subsequently run on an automatic sequencer. In this study 446 patients and 573 controls were included, all of them white Spaniards. Extended haplotypes encompassing microsatellites and SNPs were obtained in families and estimated in controls by the Expectation-Maximization algorithm. RESULTS: No significant associations after Bonferroni correction were observed in the SNPs or any of the microsatellites. Stratification by HLA-DQ2 (DQA1*0501-DQB1*02) status did not alter the results. When extended haplotypes were analyzed, no differences were apparent either. CONCLUSION: The IL-10 polymorphisms studied are not associated with celiac disease. Our data suggest that the IL-10 alteration seen in patients may be more a consequence than a cause of the disease. [Abstract/Link to Full Text]
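
The Bonferroni correction mentioned in the abstract above can be illustrated briefly. This is a generic sketch of the procedure, not the authors' analysis: each p-value is compared against the family-wise significance level divided by the number of tests. The p-values below are hypothetical placeholders, not results from the study.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag each test as significant only if p is below alpha / number_of_tests."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]


# Hypothetical p-values for five markers (illustration only)
hypothetical_p = [0.004, 0.03, 0.2, 0.6, 0.9]
print(bonferroni_significant(hypothetical_p))
```

With five tests the per-test threshold is 0.01, so a nominal p of 0.03 that would pass an uncorrected 0.05 cut-off no longer counts as significant.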

Abu-Amero KK, Al-Boudari OM, Mohamed GH, Dzimiri N
The Glu27 genotypes of the beta2-adrenergic receptor are predictors for severe coronary artery disease.
BMC Med Genet. 2006;7:31.
BACKGROUND: The role of the Beta2-adrenoceptor (beta2-AR) Gln27Glu polymorphism in the manifestation of cardiovascular diseases is still unclear. METHODS: In the present study, we evaluated the potential relevance of the c.79 C>G (p.Gln27Glu) polymorphism of this receptor gene for coronary artery disease (CAD) and its associated risk factors in Saudi Arabs. Genotyping was performed by PCR using the confronting two-pair primer (PCR-CTPP) method. RESULTS: In the general population group (BD) (n = 895), 68.5% were homozygous wild-type C/C, 28.3% were heterozygous C/G and 3.2% were homozygous mutant G/G. Among the CAD patients (n = 773), 50.6% were homozygous wild-type C/C, 43.6% were heterozygous C/G and 5.8% were homozygous mutant G/G, while in the angiographed control group (CON) (n = 528), 71.8% were C/C, 24.4% C/G and 3.8% G/G genotypes. These results indicate that both the C/G (p ≤ .001) and G/G (p = .005) genotypes are significantly associated with CAD when compared to the CON group. In addition, C/G (p ≤ .001) and G/G (p ≤ .001) were significantly associated with CAD when compared to the BD group. Furthermore, stepwise logistic regression showed that the C/G (p < .001) and G/G (p < .001) genotypes increase the risk of CAD. CONCLUSION: These results show that the Gln27Glu genotypes (homo- or heterozygous) of the beta2-AR may be independent predictors of severe CAD. [Abstract/Link to Full Text]
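
The genotype percentages reported in the abstract above can be converted into G-allele frequencies by simple allele counting: each G/G homozygote contributes two G alleles and each C/G heterozygote contributes one. The sketch below applies this to the three groups; the derived frequencies are a reader's calculation and are not values reported in the abstract, and the function name is illustrative.

```python
def g_allele_frequency(het_pct: float, hom_g_pct: float) -> float:
    """G-allele frequency from genotype percentages: f(G) = f(G/G) + f(C/G) / 2."""
    return (hom_g_pct + het_pct / 2.0) / 100.0


# Genotype percentages (C/G, G/G) as given in the abstract
groups = {
    "BD (general population)": (28.3, 3.2),
    "CAD patients": (43.6, 5.8),
    "CON (angiographed controls)": (24.4, 3.8),
}

for name, (het, hom) in groups.items():
    print(f"{name}: f(G) = {g_allele_frequency(het, hom):.4f}")
```

Under this calculation the G allele is noticeably more frequent in the CAD group than in either comparison group, which matches the direction of the association the abstract describes.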

Recent Articles in American Journal of Human Genetics

Agrawal PB, Greenleaf RS, Tomczak KK, Lehtokari VL, Wallgren-Pettersson C, Wallefeld W, Laing NG, Darras BT, Maciver SK, Dormitzer PR, Beggs AH
Nemaline myopathy with minicores caused by mutation of the CFL2 gene encoding the skeletal muscle actin-binding protein, cofilin-2.
Am J Hum Genet. 2007 Jan;80(1):162-7.
Nemaline myopathy (NM) is a congenital myopathy characterized by muscle weakness and nemaline bodies in affected myofibers. Five NM genes, all encoding components of the sarcomeric thin filament, are known. We report identification of a sixth gene, CFL2, encoding the actin-binding protein muscle cofilin-2, which is mutated in two siblings with congenital myopathy. The proband's muscle contained characteristic nemaline bodies, as well as occasional fibers with minicores, concentric laminated bodies, and areas of F-actin accumulation. Her affected sister's muscle was reported to exhibit nonspecific myopathic changes. Cofilin-2 levels were significantly lower in the proband's muscle, and the mutant protein was less soluble when expressed in Escherichia coli, suggesting that deficiency of cofilin-2 may result in reduced depolymerization of actin filaments, causing their accumulation in nemaline bodies, minicores, and, possibly, concentric laminated bodies. [Abstract/Link to Full Text]

Valdmanis PN, Meijer IA, Reynolds A, Lei A, MacLeod P, Schlesinger D, Zatz M, Reid E, Dion PA, Drapeau P, Rouleau GA
Mutations in the KIAA0196 gene at the SPG8 locus cause hereditary spastic paraplegia.
Am J Hum Genet. 2007 Jan;80(1):152-61.
Hereditary spastic paraplegia (HSP) is a progressive upper-motor neurodegenerative disease. The eighth HSP locus, SPG8, is on chromosome 8p24.13. The three families previously linked to the SPG8 locus present with relatively severe, pure spastic paraplegia. We have identified three mutations in the KIAA0196 gene in six families that map to the SPG8 locus. One mutation, V626F, segregated in three large North American families with European ancestry and in one British family. An L619F mutation was found in a Brazilian family. The third mutation, N471D, was identified in a smaller family of European origin and lies in a spectrin domain. None of these mutations were identified in 500 control individuals. Both the L619 and V626 residues are strictly conserved across species and likely have a notable effect on the structure of the protein product strumpellin. Rescue studies with human mRNA injected in zebrafish treated with morpholino oligonucleotides to knock down the endogenous protein showed that mutations at these two residues impaired the normal function of the KIAA0196 gene. However, the function of the 1,159-aa strumpellin protein is relatively unknown. The identification and characterization of the KIAA0196 gene will enable further insight into the pathogenesis of HSP. [Abstract/Link to Full Text]

Upadhyaya M, Huson SM, Davies M, Thomas N, Chuzhanova N, Giovannini S, Evans DG, Howard E, Kerr B, Griffiths S, Consoli C, Side L, Adams D, Pierpont M, Hachen R, Barnicoat A, Li H, Wallace P, Van Biervliet JP, Stevenson D, Viskochil D, Baralle D, Haan E, Riccardi V, Turnpenny P, Lazaro C, Messiaen L
An absence of cutaneous neurofibromas associated with a 3-bp inframe deletion in exon 17 of the NF1 gene (c.2970-2972 delAAT): evidence of a clinically significant NF1 genotype-phenotype correlation.
Am J Hum Genet. 2007 Jan;80(1):140-51.
Neurofibromatosis type 1 (NF1) is characterized by café-au-lait spots, skinfold freckling, and cutaneous neurofibromas. No obvious relationships between small mutations (<20 bp) of the NF1 gene and a specific phenotype have previously been demonstrated, which suggests that interaction with either unlinked modifying genes and/or the normal NF1 allele may be involved in the development of the particular clinical features associated with NF1. We identified 21 unrelated probands with NF1 (14 familial and 7 sporadic cases) who were all found to have the same c.2970-2972 delAAT (p.990delM) mutation but no cutaneous neurofibromas or clinically obvious plexiform neurofibromas. Molecular analysis identified the same 3-bp inframe deletion (c.2970-2972 delAAT) in exon 17 of the NF1 gene in all affected subjects. The ΔAAT mutation is predicted to result in the loss of one of two adjacent methionines (codon 991 or 992) (ΔMet991), in conjunction with a silent ACA→ACG change of codon 990. These two methionine residues are located in a highly conserved region of neurofibromin and are expected, therefore, to have a functional role in the protein. Our data represent results from the first study to correlate a specific small mutation of the NF1 gene to the expression of a particular clinical phenotype. The biological mechanism that relates this specific mutation to the suppression of cutaneous neurofibroma development is unknown. [Abstract/Link to Full Text]

Pearson JV, Huentelman MJ, Halperin RF, Tembe WD, Melquist S, Homer N, Brun M, Szelinger S, Coon KD, Zismann VL, Webster JA, Beach T, Sando SB, Aasly JO, Heun R, Jessen F, Kolsch H, Tsolaki M, Daniilidou M, Reiman EM, Papassotiropoulos A, Hutton ML, Stephan DA, Craig DW
Identification of the genetic basis for complex disorders by use of pooling-based genomewide single-nucleotide-polymorphism association studies.
Am J Hum Genet. 2007 Jan;80(1):126-39.
We report the development and validation of experimental methods, study designs, and analysis software for pooling-based genomewide association (GWA) studies that use high-throughput single-nucleotide-polymorphism (SNP) genotyping microarrays. We first describe a theoretical framework for establishing the effectiveness of pooling genomic DNA as a low-cost alternative to individually genotyping thousands of samples on high-density SNP microarrays. Next, we describe software called "GenePool," which directly analyzes SNP microarray probe intensity data and ranks SNPs by increased likelihood of being genetically associated with a trait or disorder. Finally, we apply these methods to experimental case-control data and demonstrate successful identification of published genetic susceptibility loci for a rare monogenic disease (sudden infant death with dysgenesis of the testes syndrome), a rare complex disease (progressive supranuclear palsy), and a common complex disease (Alzheimer disease) across multiple SNP genotyping platforms. On the basis of these theoretical calculations and their experimental validation, our results suggest that pooling-based GWA studies are a logical first step for determining whether major genetic associations exist in diseases with high heritability. [Abstract/Link to Full Text]

Zheng M, McPeek MS
Multipoint linkage-disequilibrium mapping with haplotype-block structure.
Am J Hum Genet. 2007 Jan;80(1):112-25.
The HapMap Project is providing a great deal of new information on high-resolution haplotype structure in various human populations. This information has the potential to greatly increase the power of association mapping for a fixed amount of genotyping. A number of methods have been proposed for the identification of haplotype blocks, common haplotypes, and tagging single-nucleotide polymorphisms. Here, we build on this work by developing novel methods for case-control multipoint linkage-disequilibrium (LD) mapping that gain power and speed by making explicit use of the inferred block structure. Specifically, we developed a virtual-variant approach that uses the haplotype-block information to greatly increase power for detection of untyped common variants associated with a trait. Because full multipoint LD mapping can be slow, we exploited the haplotype-block information to develop a fast single-block multipoint mapping method. Our methods are appropriate for genotype data and take into account the uncertainty in phase. We describe the methods in the context of case-parents trios, although they are also applicable to unrelated cases and controls. Our simulations indicate that the most important gains from taking into account the haplotype-block structure at the analysis stage of multipoint LD mapping come from (1) greatly increased power to detect association with untyped variants and (2) greatly improved localization of untyped variants associated with the trait. More-modest gains are obtained in improving power to detect association with a variant that is typed with a moderate amount of missing data. The methods are applied to a Crohn disease data set. [Abstract/Link to Full Text]

Naveed M, Nath SK, Gaines M, Al-Ali MT, Al-Khaja N, Hutchings D, Golla J, Deutsch S, Bottani A, Antonarakis SE, Ratnamala U, Radhakrishna U
Genomewide linkage scan for split-hand/foot malformation with long-bone deficiency in a large Arab family identifies two novel susceptibility loci on chromosomes 1q42.2-q43 and 6q14.1.
Am J Hum Genet. 2007 Jan;80(1):105-11.
Split-hand/foot malformation with long-bone deficiency (SHFLD) is a rare, severe limb deformity characterized by tibia aplasia with or without split-hand/split-foot deformity. Identification of genetic susceptibility loci for SHFLD has been unsuccessful because of its rare incidence, variable phenotypic expression and associated anomalies, and uncertain inheritance pattern. SHFLD is usually inherited as an autosomal dominant trait with reduced penetrance, although recessive inheritance has also been postulated. We conducted a genomewide linkage analysis, using a 10K SNP array in a large consanguineous family (UR078) from the United Arab Emirates (UAE) who had disease transmission consistent with an autosomal dominant inheritance pattern. The study identified two novel SHFLD susceptibility loci at 1q42.2-q43 (nonparametric linkage [NPL] 9.8, P=.000065) and 6q14.1 (NPL 7.12, P=.000897). These results were also supported by multipoint parametric linkage analysis. Maximum multipoint LOD scores of 3.20 and 3.78 were detected for genomic locations 1q42.2-q43 and 6q14.1, respectively, with the use of an autosomal dominant mode of inheritance with reduced penetrance. Haplotype analysis with informative crossovers enabled mapping of the SHFLD loci to a region of approximately 18.38 cM (8.4 Mb) between single-nucleotide polymorphisms rs1124110 and rs535043 on 1q42.2-q43 and to a region of approximately 1.96 cM (4.1 Mb) between rs623155 and rs1547251 on 6q14.1. The study identified two novel loci for the SHFLD phenotype in this UAE family. [Abstract/Link to Full Text]
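For readers unfamiliar with LOD scores, the quantity can be sketched in its simplest form. The study above used multipoint parametric analysis with reduced penetrance, which is considerably more involved; the sketch below assumes fully informative, phase-known meioses and full penetrance, so it only illustrates the definition LOD(theta) = log10[L(theta) / L(0.5)]. The function name is hypothetical.

```python
from math import log10


def lod_score(n_meioses, n_recombinants, theta):
    """Two-point LOD score for phase-known, fully informative meioses.

    theta is the recombination fraction being tested (0 <= theta <= 0.5).
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    non_recombinants = n_meioses - n_recombinants
    # Likelihood under linkage at recombination fraction theta.
    linked = (theta ** n_recombinants) * ((1.0 - theta) ** non_recombinants)
    # Likelihood under free recombination (theta = 0.5).
    unlinked = 0.5 ** n_meioses
    if linked == 0.0:
        return float("-inf")  # e.g. any recombinant observed while theta == 0
    return log10(linked / unlinked)


# Ten non-recombinant meioses at theta = 0 give 10 * log10(2), about 3.01.
example = lod_score(10, 0, 0.0)
```

By convention a LOD score of 3 or more is taken as significant evidence of linkage, which is why each additional informative non-recombinant meiosis (worth about 0.3) matters in a single large family.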

Wong KK, deLeeuw RJ, Dosanjh NS, Kimm LR, Cheng Z, Horsman DE, MacAulay C, Ng RT, Brown CJ, Eichler EE, Lam WL
A comprehensive analysis of common copy-number variations in the human genome.
Am J Hum Genet. 2007 Jan;80(1):91-104.
Segmental copy-number variations (CNVs) in the human genome are associated with developmental disorders and susceptibility to diseases. More importantly, CNVs may represent a major genetic component of our phenotypic diversity. In this study, using a whole-genome array comparative genomic hybridization assay, we identified 3,654 autosomal segmental CNVs, 800 of which appeared at a frequency of at least 3%. Of these frequent CNVs, 77% are novel. In the 95 individuals analyzed, the two most diverse genomes differed by at least 9 Mb in size or varied by at least 266 loci in content. Approximately 68% of the 800 polymorphic regions overlap with genes, which may reflect human diversity in senses (smell, hearing, taste, and sight), rhesus phenotype, metabolism, and disease susceptibility. Intriguingly, 14 polymorphic regions harbor 21 of the known human microRNAs, raising the possibility of the contribution of microRNAs to phenotypic diversity in humans. This in-depth survey of CNVs across the human genome provides a valuable baseline for studies involving human genetics. [Abstract/Link to Full Text]

Hegele RA
Copy-number variations and human disease.
Am J Hum Genet. 2007 Aug;81(2):414-5; author reply 415. [Abstract/Link to Full Text]

Lynch AG, Marioni JC, Tavaré S
Numbers of copy-number variations and false-negative rates will be underestimated if we do not account for the dependence between repeated experiments.
Am J Hum Genet. 2007 Aug;81(2):418-20; author reply 420-1. [Abstract/Link to Full Text]

Jakobsdottir J, Weeks DE
Estimating prevalence, false-positive rate, and false-negative rate with use of repeated testing when true responses are unknown.
Am J Hum Genet. 2007 Nov;81(5):1111-3. [Abstract/Link to Full Text]

Shi M, Christensen K, Weinberg CR, Romitti P, Bathum L, Lozada A, Morris RW, Lovett M, Murray JC
Orofacial cleft risk is increased with maternal smoking and specific detoxification-gene variants.
Am J Hum Genet. 2007 Jan;80(1):76-90.
Maternal smoking is a recognized risk factor for orofacial clefts. Maternal or fetal pharmacogenetic variants are plausible modulators of this risk. In this work, we studied 5,427 DNA samples, including 1,244 from subjects in Denmark and Iowa with facial clefting and 4,183 from parents, siblings, or unrelated population controls. We examined 25 single-nucleotide polymorphisms in 16 genes in pathways for detoxification of components of cigarette smoke, to look for evidence of gene-environment interactions. For genes identified as related to oral clefting, we studied gene-expression profiles in fetal development in the relevant tissues and time intervals. Maternal smoking was a significant risk factor for clefting and showed dosage effects, in both the Danish and Iowan data. Suggestive effects of variants in the fetal NAT2 and CYP1A1 genes were observed in both the Iowan and the Danish participants. In an expanded case set, NAT2 continued to show significant overtransmission of an allele to the fetus, with a final P value of .00003. There was an interaction between maternal smoking and fetal inheritance of a GSTT1-null deletion, seen in both the Danish (P=.03) and Iowan (P=.002) studies, with a Fisher's combined P value of <.001, which remained significant after correction for multiple comparisons. Gene-expression analysis demonstrated expression of GSTT1 in human embryonic craniofacial tissues during the relevant developmental interval. This study benefited from two large samples, involving independent populations, that provided substantial power and a framework for future studies that could identify a susceptible population for preventive health care. [Abstract/Link to Full Text]
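The combined P value reported above can be reproduced with Fisher's method, sketched here under the assumption that the two study P values (.03 and .002) are independent and were combined exactly as stated. The statistic X = -2 * sum(ln p_i) follows a chi-square distribution with 2k degrees of freedom, whose upper tail has a closed form for even degrees of freedom. The function name is hypothetical.

```python
from math import exp, factorial, log


def fisher_combined_p(p_values):
    """Combine independent P values with Fisher's method.

    Uses the closed-form chi-square survival function for 2k degrees
    of freedom: exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    """
    k = len(p_values)
    x = -2.0 * sum(log(p) for p in p_values)
    half = x / 2.0
    return exp(-half) * sum(half ** i / factorial(i) for i in range(k))


# Danish (P=.03) and Iowan (P=.002) interaction tests from the abstract.
combined = fisher_combined_p([0.03, 0.002])  # about 6.4e-4, i.e. < .001
```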

Liquori CL, Berg MJ, Squitieri F, Leedom TP, Ptacek L, Johnson EW, Marchuk DA
Deletions in CCM2 are a common cause of cerebral cavernous malformations.
Am J Hum Genet. 2007 Jan;80(1):69-75.
Cerebral cavernous malformations (CCMs) are vascular abnormalities of the brain that can result in a variety of neurological disabilities, including hemorrhagic stroke and seizures. Mutations in the gene KRIT1 are responsible for CCM1, mutations in the gene MGC4607 are responsible for CCM2, and mutations in the gene PDCD10 are responsible for CCM3. DNA sequence analysis of the known CCM genes in a cohort of 63 CCM-affected families showed that a high proportion (40%) of these lacked any identifiable mutation. We used multiplex ligation-dependent probe analysis to screen 25 CCM1, -2, and -3 mutation-negative probands for potential deletions or duplications within all three CCM genes. We identified a total of 15 deletions: 1 in the CCM1 gene, 0 in the CCM3 gene, and 14 in the CCM2 gene. In our cohort, mutation screening that included sequence and deletion analyses gave disease-gene frequencies of 40% for CCM1, 38% for CCM2, 6% for CCM3, and 16% with no mutation detected. These data indicate that the prevalence of CCM2 is much higher than previously predicted, nearly equal to CCM1, and that large genomic deletions in the CCM2 gene represent a major component of this disease. A common 77.6-kb deletion spanning CCM2 exons 2-10 was identified, which is present in 13% of our entire CCM cohort. Eight probands exhibit an apparently identical recombination event in the CCM2 gene, involving an AluSx in intron 1 and an AluSg distal to exon 10. Haplotype analysis revealed that this CCM2 deletion occurred independently at least twice in our families. We hypothesize that these deletions occur in a hypermutable region because of surrounding repetitive sequence elements that may catalyze the formation of intragenic deletions. [Abstract/Link to Full Text]
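The percentages quoted above are consistent with the stated counts, as the following arithmetic check shows. It assumes the 25 screened probands correspond to the 40% of the 63 families without a sequence mutation, and that the eight probands sharing the recombination event are the carriers of the common 77.6-kb deletion; neither link is stated explicitly in the abstract.

```python
cohort_families = 63      # CCM-affected families in the cohort
screened_probands = 25    # mutation-negative probands screened for deletions
deletions_found = 15      # 1 in CCM1, 0 in CCM3, 14 in CCM2
common_deletion = 8       # probands with the apparently identical event

# Share of the cohort with no sequence mutation before deletion screening.
pct_negative_before = round(100 * screened_probands / cohort_families)

# Probands still unexplained after deletion screening.
unresolved = screened_probands - deletions_found
pct_unresolved = round(100 * unresolved / cohort_families)

# Share of the whole cohort carrying the common deletion (assumed link).
pct_common = round(100 * common_deletion / cohort_families)

print(pct_negative_before, pct_unresolved, pct_common)
```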

Chung RH, Morris RW, Zhang L, Li YJ, Martin ER
X-APL: an improved family-based test of association in the presence of linkage for the X chromosome.
Am J Hum Genet. 2007 Jan;80(1):59-68.
Family-based association methods have been developed primarily for autosomal markers. The X-linked sibling transmission/disequilibrium test (XS-TDT) and the reconstruction-combined TDT for X-chromosome markers (XRC-TDT) are the first association-based methods for testing markers on the X chromosome in family data sets. These are valid tests of association in family triads or discordant sib pairs but are not theoretically valid in multiplex families when linkage is present. Recently, XPDT and XMCPDT, modified versions of the pedigree disequilibrium test (PDT), were proposed. Like the PDT, XPDT compares genotype transmissions from parents to affected offspring or genotypes of discordant siblings; however, the XPDT can have low power if there are many missing parental genotypes. XMCPDT uses a Monte Carlo sampling approach to infer missing parental genotypes on the basis of true or estimated population allele frequencies. Although the XMCPDT was shown to be more powerful than the XPDT, variability in the statistic due to the use of an estimate of allele frequency is not properly accounted for. Here, we present a novel family-based test of association, X-APL, a modification of the association in the presence of linkage (APL) test. Like the APL, X-APL can use singleton or multiplex families and properly infers missing parental genotypes in linkage regions by considering identity-by-descent parameters for affected siblings. Sampling variability of parameter estimates is accounted for through a bootstrap procedure. X-APL can test individual marker loci or X-chromosome haplotypes. To allow for different penetrances in males and females, separate sex-specific tests are provided. Using simulated data, we demonstrated validity and showed that the X-APL is more powerful than alternative tests. To show its utility and to discuss interpretation in real-data analysis, we also applied the X-APL to candidate-gene data in a sample of families with Parkinson disease. [Abstract/Link to Full Text]
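As background to the family-based tests named above, the classical autosomal TDT can be sketched in a few lines. This is not X-APL, which additionally infers missing parental genotypes from identity-by-descent parameters and uses a bootstrap variance; the sketch only shows the transmission-counting idea that these tests build on. It assumes complete triads and counts transmissions from heterozygous parents only. The function name and the example counts are hypothetical.

```python
from math import erfc, sqrt


def tdt(transmitted, untransmitted):
    """Classical TDT for one allele.

    transmitted / untransmitted: number of times heterozygous parents
    did / did not pass the allele to an affected child.
    Returns (chi-square statistic with 1 df, P value).
    """
    b, c = transmitted, untransmitted
    if b + c == 0:
        raise ValueError("no informative transmissions")
    chi2 = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-square with 1 df equals erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value


# Invented counts: 60 transmissions versus 40 non-transmissions.
chi2, p_value = tdt(60, 40)  # chi2 = 4.0, P about 0.046
```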

Valente L, Tiranti V, Marsano RM, Malfatti E, Fernandez-Vizarra E, Donnini C, Mereghetti P, De Gioia L, Burlina A, Castellan C, Comi GP, Savasta S, Ferrero I, Zeviani M
Infantile encephalopathy and defective mitochondrial DNA translation in patients with mutations of mitochondrial elongation factors EFG1 and EFTu.
Am J Hum Genet. 2007 Jan;80(1):44-58.
Mitochondrial protein translation is a complex process performed within mitochondria by an apparatus composed of mitochondrial DNA (mtDNA)-encoded RNAs and nuclear DNA-encoded proteins. Although the latter by far outnumber the former, the vast majority of mitochondrial translation defects in humans have been associated with mutations in RNA-encoding mtDNA genes, whereas mutations in protein-encoding nuclear genes have been identified in a handful of cases. Genetic investigation involving patients with defective mitochondrial translation led us to the discovery of novel mutations in the mitochondrial elongation factor G1 (EFG1) in one affected baby and, for the first time, in the mitochondrial elongation factor Tu (EFTu) in another one. Both patients were affected by severe lactic acidosis and rapidly progressive, fatal encephalopathy. The EFG1-mutant patient had early-onset Leigh syndrome, whereas the EFTu-mutant patient had severe infantile macrocystic leukodystrophy with micropolygyria. Structural modeling enabled us to make predictions about the effects of the mutations at the molecular level. Yeast and mammalian cell systems proved the pathogenic role of the mutant alleles by functional complementation in vivo. Nuclear-gene abnormalities causing mitochondrial translation defects represent a new, potentially broad field of mitochondrial medicine. Investigation of these defects is important to expand the molecular characterization of mitochondrial disorders and also may contribute to the elucidation of the complex control mechanisms, which regulate this fundamental pathway of mtDNA homeostasis. [Abstract/Link to Full Text]

Hill C, Soares P, Mormina M, Macaulay V, Clarke D, Blumbach PB, Vizuete-Forster M, Forster P, Bulbeck D, Oppenheimer S, Richards M
A mitochondrial stratigraphy for island southeast Asia.
Am J Hum Genet. 2007 Jan;80(1):29-43.
Island Southeast Asia (ISEA) was first colonized by modern humans at least 45,000 years ago, but the extent to which the modern inhabitants trace their ancestry to the first settlers is a matter of debate. It is widely held, in both archaeology and linguistics, that they are largely descended from a second wave of dispersal, proto-Austronesian-speaking agriculturalists who originated in China and spread to Taiwan approximately 5,500 years ago. From there, they are thought to have dispersed into ISEA approximately 4,000 years ago, assimilating the indigenous populations. Here, we demonstrate that mitochondrial DNA diversity in the region is extremely high and includes a large number of indigenous clades. Only a fraction of these date back to the time of first settlement, and the majority appear to mark dispersals in the late-Pleistocene or early-Holocene epoch most likely triggered by postglacial flooding. There are much closer genetic links to Taiwan than to the mainland, but most of these probably predated the mid-Holocene "Out of Taiwan" event as traditionally envisioned. Only approximately 20% at most of modern mitochondrial DNAs in ISEA could be linked to such an event, suggesting that, if an agriculturalist migration did take place, it was demographically minor, at least with regard to the involvement of women. [Abstract/Link to Full Text]

Morton N, Maniatis N, Zhang W, Ennis S, Collins A
Genome scanning by composite likelihood.
Am J Hum Genet. 2007 Jan;80(1):19-28.
Ambitious programs have recently been advocated or launched to create genomewide databases for meta-analysis of association between DNA markers and phenotypes of medical and/or social concern. A necessary but not sufficient condition for success in association mapping is that the data give accurate estimates of both genomic location and its standard error, which are provided for multifactorial phenotypes by composite likelihood. That class includes the Malecot model, which we here apply with an illustrative example. This preliminary analysis leads to five inferences: permutation of cases and controls provides a test of association free of autocorrelation; two hypotheses give similar estimates, but one is consistently more accurate; estimation of the false-discovery rate is extended to causal genes in a small proportion of regions; the minimal data for successful meta-analysis are inferred; and power is robust for all genomic factors except minor-allele frequency. An extension to meta-analysis is proposed. Other approaches to genome scanning and meta-analysis should, if possible, be similarly extended so that their operating characteristics can be compared. [Abstract/Link to Full Text]

Zhao X, Tang R, Gao B, Shi Y, Zhou J, Guo S, Zhang J, Wang Y, Tang W, Meng J, Li S, Wang H, Ma G, Lin C, Xiao Y, Feng G, Lin Z, Zhu S, Xing Y, Sang H, St Clair D, He L
Functional variants in the promoter region of Chitinase 3-like 1 (CHI3L1) and susceptibility to schizophrenia.
Am J Hum Genet. 2007 Jan;80(1):12-8.
The chitinase 3-like 1 gene (CHI3L1) is abnormally expressed in the hippocampus of subjects with schizophrenia and may be involved in the cellular response to various environmental events that are reported to increase the risk of schizophrenia. Here, we provide evidence that the functional variants at the CHI3L1 locus influence the genetic risk of schizophrenia. First, using case-control and transmission/disequilibrium-test (TDT) methodologies, we detected a significant association between schizophrenia and haplotypes within the promoter region of CHI3L1 in two independent cohorts of Chinese individuals. Second, the at-risk CCC haplotype (P=.00058 and .0018 in case-control and TDT studies, respectively) revealed lower transcriptional activity (P=2.2 × 10^-7) and was associated with lower expression (P=3.1 × 10^-5) compared with neutral and protective haplotypes. Third, we found that an allele of SNP4 (rs4950928), the tagging SNP of CCC, impaired the MYC/MAX-regulated transcriptional activation of CHI3L1 by altering the transcriptional-factor consensus sequences, and this may be responsible for the decreased expression of the CCC haplotype. In contrast, the protective TTG haplotype was associated with a high level of CHI3L1 expression. Our findings identify CHI3L1 as a potential schizophrenia-susceptibility gene and suggest that the genes involved in the biological response to adverse environmental conditions are likely to play roles in the predisposition to schizophrenia. [Abstract/Link to Full Text]

Stoetzel C, Muller J, Laurier V, Davis EE, Zaghloul NA, Vicaire S, Jacquelin C, Plewniak F, Leitch CC, Sarda P, Hamel C, de Ravel TJ, Lewis RA, Friederich E, Thibault C, Danse JM, Verloes A, Bonneau D, Katsanis N, Poch O, Mandel JL, Dollfus H
Identification of a novel BBS gene (BBS12) highlights the major role of a vertebrate-specific branch of chaperonin-related proteins in Bardet-Biedl syndrome.
Am J Hum Genet. 2007 Jan;80(1):1-11.
Bardet-Biedl syndrome (BBS) is primarily an autosomal recessive ciliopathy characterized by progressive retinal degeneration, obesity, cognitive impairment, polydactyly, and kidney anomalies. The disorder is genetically heterogeneous, with 11 BBS genes identified to date, which account for ~70% of affected families. We have combined single-nucleotide-polymorphism array homozygosity mapping with in silico analysis to identify a new BBS gene, BBS12. Patients from two Gypsy families were homozygous and haploidentical in a 6-Mb region of chromosome 4q27. FLJ35630 was selected as a candidate gene, because it was predicted to encode a protein with similarity to members of the type II chaperonin superfamily, which includes BBS6 and BBS10. We found pathogenic mutations in both Gypsy families, as well as in 14 other families of various ethnic backgrounds, indicating that BBS12 accounts for approximately 5% of all BBS cases. BBS12 is vertebrate specific and, together with BBS6 and BBS10, defines a novel branch of the type II chaperonin superfamily. These three genes are characterized by unusually rapid evolution and are likely to perform ciliary functions specific to vertebrates that are important in the pathophysiology of the syndrome, and together they account for about one-third of the total BBS mutational load. Consistent with this notion, suppression of each family member in zebrafish yielded gastrulation-movement defects characteristic of other BBS morphants, whereas simultaneous suppression of all three members resulted in severely affected embryos, possibly hinting at partial functional redundancy within this protein family. [Abstract/Link to Full Text]

Syrris P, Ward D, Evans A, Asimaki A, Gandjbakhch E, Sen-Chowdhry S, McKenna WJ
Arrhythmogenic right ventricular dysplasia/cardiomyopathy associated with mutations in the desmosomal gene desmocollin-2.
Am J Hum Genet. 2006 Nov;79(5):978-84.
Arrhythmogenic right ventricular dysplasia/cardiomyopathy (ARVD/C) is an inherited myocardial disorder associated with arrhythmias, heart failure, and sudden death. To date, mutations in four genes encoding major desmosomal proteins (plakoglobin, desmoplakin, plakophilin-2, and desmoglein-2) have been implicated in the pathogenesis of ARVD/C. We screened 77 probands with ARVD/C for mutations in desmocollin-2 (DSC2), a gene coding for a desmosomal cadherin. Two heterozygous mutations--a deletion and an insertion--were identified in four probands. Both mutations result in frameshifts and premature truncation of the desmocollin-2 protein. For the first time, we have identified mutations in desmocollin-2 in patients with ARVD/C, a finding that is consistent with the hypothesis that ARVD/C is a disease of the desmosome. [Abstract/Link to Full Text]

Wycisk KA, Zeitz C, Feil S, Wittmer M, Forster U, Neidhardt J, Wissinger B, Zrenner E, Wilke R, Kohl S, Berger W
Mutation in the auxiliary calcium-channel subunit CACNA2D4 causes autosomal recessive cone dystrophy.
Am J Hum Genet. 2006 Nov;79(5):973-7.
Retinal signal transmission depends on the activity of high voltage-gated L-type calcium channels in photoreceptor ribbon synapses. We recently identified a truncating frameshift mutation in the Cacna2d4 gene in a spontaneous mouse mutant with profound loss of retinal signaling and an abnormal morphology of ribbon synapses in rods and cones. The Cacna2d4 gene encodes an L-type calcium-channel auxiliary subunit of the α2δ type. Mutations in its human orthologue, CACNA2D4, were not yet known to be associated with a disease. We performed mutation analyses of 34 patients who received an initial diagnosis of night blindness, and, in two affected siblings, we detected a homozygous nucleotide substitution (c.2406C→A) in CACNA2D4. The mutation introduces a premature stop codon that truncates one-third of the corresponding open reading frame. Both patients share symptoms of slowly progressing cone dystrophy. These findings represent the first report of a mutation in the human CACNA2D4 gene and define a novel gene defect that causes autosomal recessive cone dystrophy. [Abstract/Link to Full Text]

Feuk L, Kalervo A, Lipsanen-Nyman M, Skaug J, Nakabayashi K, Finucane B, Hartung D, Innes M, Kerem B, Nowaczyk MJ, Rivlin J, Roberts W, Senman L, Summers A, Szatmari P, Wong V, Vincent JB, Zeesman S, Osborne LR, Cardy JO, Kere J, Scherer SW, Hannula-Jouppi K
Absence of a paternally inherited FOXP2 gene in developmental verbal dyspraxia.
Am J Hum Genet. 2006 Nov;79(5):965-72.
Mutations in FOXP2 cause developmental verbal dyspraxia (DVD), but only a few cases have been described. We characterize 13 patients with DVD--5 with hemizygous paternal deletions spanning the FOXP2 gene, 1 with a translocation interrupting FOXP2, and the remaining 7 with maternal uniparental disomy of chromosome 7 (UPD7), who were also given a diagnosis of Silver-Russell Syndrome (SRS). Of these individuals with DVD, all 12 for whom parental DNA was available showed absence of a paternal copy of FOXP2. Five other individuals with deletions of paternally inherited FOXP2 but with incomplete clinical information or phenotypes too complex to properly assess are also described. Four of the patients with DVD also meet criteria for autism spectrum disorder. Individuals with paternal UPD7 or with partial maternal UPD7 or deletion starting downstream of FOXP2 do not have DVD. Using quantitative real-time polymerase chain reaction, we show the maternally inherited FOXP2 to be comparatively underexpressed. Our results indicate that absence of paternal FOXP2 is the cause of DVD in patients with SRS with maternal UPD7. The data also point to a role for differential parent-of-origin expression of FOXP2 in human speech development. [Abstract/Link to Full Text]

Spencer DH, Bubb KL, Olson MV
Detecting disease-causing mutations in the human genome by haplotype matching.
Am J Hum Genet. 2006 Nov;79(5):958-64.
Comparisons between haplotypes from affected patients and the human reference genome are frequently used to identify candidates for disease-causing mutations, even though these alignments are expected to reveal a high level of background neutral polymorphism. This limits the scope of genetic studies to relatively small genomic intervals, because current methods for distinguishing potential causal mutations from neutral variation are inefficient. Here we describe a new strategy for detecting mutations that is based on comparing affected haplotypes with closely matched control sequences from healthy individuals, rather than with the human reference genome. We use theory, simulation, and a real data set to show that this approach is expected to reduce the number of sequence variants that must be subjected to follow-up analysis by at least a factor of 20 when closely matched control sequences are selected from a reference panel with as few as 100 control genomes. We also define a reference data resource that would allow efficient application of this strategy to large critical intervals across the genome. [Abstract/Link to Full Text]
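The matching strategy described above can be illustrated with a toy sketch. It assumes phased haplotypes encoded as equal-length strings over the same sites, and it uses plain Hamming distance to pick the closest control; the abstract does not state the authors' actual matching criterion, so both choices, along with the function names and sequences, are illustrative assumptions.

```python
def differing_sites(hap_a, hap_b):
    """Indices at which two equal-length haplotypes carry different alleles."""
    return {i for i, (x, y) in enumerate(zip(hap_a, hap_b)) if x != y}


def candidates_vs_best_match(affected, control_panel):
    """Variants left after comparing with the closest control haplotype."""
    best = min(control_panel,
               key=lambda control: len(differing_sites(affected, control)))
    return differing_sites(affected, best)


# Invented sequences: the affected haplotype differs from the reference
# at four sites, three of which are shared with a healthy control.
reference = "AAAAAAAAAA"
affected = "ACAGATAAAC"
panel = ["ACAGAAAAAC", "AAAAATAAAA", "ACAAAAAAAA"]

vs_reference = differing_sites(affected, reference)
vs_match = candidates_vs_best_match(affected, panel)
```

In this toy case the reference comparison leaves four candidate sites while the closest control leaves one, because variants shared with a healthy matched haplotype are treated as background polymorphism.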

Konrad M, Schaller A, Seelow D, Pandey AV, Waldegger S, Lesslauer A, Vitzthum H, Suzuki Y, Luk JM, Becker C, Schlingmann KP, Schmid M, Rodriguez-Soriano J, Ariceta G, Cano F, Enriquez R, Juppner H, Bakkaloglu SA, Hediger MA, Gallati S, Neuhauss SC, Nurnberg P, Weber S
Mutations in the tight-junction gene claudin 19 (CLDN19) are associated with renal magnesium wasting, renal failure, and severe ocular involvement.
Am J Hum Genet. 2006 Nov;79(5):949-57.
Claudins are major components of tight junctions and contribute to the epithelial-barrier function by restricting free diffusion of solutes through the paracellular pathway. We have mapped a new locus for recessive renal magnesium loss on chromosome 1p34.2 and have identified mutations in CLDN19, a member of the claudin multigene family, in patients affected by hypomagnesemia, renal failure, and severe ocular abnormalities. CLDN19 encodes the tight-junction protein claudin-19, and we demonstrate high expression of CLDN19 in renal tubules and the retina. The identified mutations interfere severely with either cell-membrane trafficking or the assembly of the claudin-19 protein. The identification of CLDN19 mutations in patients with chronic renal failure and severe visual impairment supports the fundamental role of claudin-19 for normal renal tubular function and undisturbed organization and development of the retina. [Abstract/Link to Full Text]

Khateeb S, Flusser H, Ofir R, Shelef I, Narkis G, Vardi G, Shorer Z, Levy R, Galil A, Elbedour K, Birk OS
PLA2G6 mutation underlies infantile neuroaxonal dystrophy.
Am J Hum Genet. 2006 Nov;79(5):942-8.
Infantile neuroaxonal dystrophy (INAD) is an autosomal recessive progressive neurodegenerative disease that presents within the first 2 years of life and culminates in death by age 10 years. Affected individuals from two unrelated Bedouin Israeli kindreds were studied. Brain imaging demonstrated diffuse cerebellar atrophy and abnormal iron deposition in the medial and lateral globus pallidus. Progressive white-matter disease and reduction of the N-acetyl aspartate : creatine ratio were evident on magnetic resonance spectroscopy, suggesting loss of myelination. The clinical and radiological diagnosis of INAD was verified by sural nerve biopsy. The disease gene was mapped to a 1.17-Mb locus on chromosome 22q13.1 (LOD score 4.7 at recombination fraction 0 for SNP rs139897), and an underlying mutation common to both affected families was identified in PLA2G6, the gene encoding phospholipase A2 group VI (cytosolic, calcium-independent). These findings highlight a role of phospholipase in neurodegenerative disorders. [Abstract/Link to Full Text]

Toydemir RM, Brassington AE, Bayrak-Toydemir P, Krakowiak PA, Jorde LB, Whitby FG, Longo N, Viskochil DH, Carey JC, Bamshad MJ
A novel mutation in FGFR3 causes camptodactyly, tall stature, and hearing loss (CATSHL) syndrome.
Activating mutations of FGFR3, a negative regulator of bone growth, are well known to cause a variety of short-limbed bone dysplasias and craniosynostosis syndromes. We mapped the locus causing a novel disorder characterized by camptodactyly, tall stature, scoliosis, and hearing loss (CATSHL syndrome) to chromosome 4p. Because this syndrome recapitulated the phenotype of the Fgfr3 knockout mouse, we screened FGFR3 and subsequently identified a heterozygous missense mutation that is predicted to cause a p.R621H substitution in the tyrosine kinase domain and partial loss of FGFR3 function. These findings indicate that abnormal FGFR3 signaling can cause human anomalies by promoting as well as inhibiting endochondral bone growth. [Abstract/Link to Full Text]

Mutation-positive and mutation-negative patients with Cowden and Bannayan-Riley-Ruvalcaba syndromes associated with distinct 10q haplotypes.
Phosphatase and tensin homolog deleted on chromosome 10 (PTEN) encodes a tumor-suppressor phosphatase frequently mutated in both sporadic and heritable forms of human cancer. Germline mutations are associated with a number of heritable cancer syndromes that are jointly referred to as the "PTEN hamartoma tumor syndrome" (PHTS) and include Cowden syndrome, Bannayan-Riley-Ruvalcaba syndrome, Proteus syndrome, and Proteus-like syndrome. Germline PTEN mutations have been identified in a significant proportion of patients with PHTS; however, there are still many individuals with classic diagnostic features for whom mutations have yet to be identified. To address this, we took a haplotype-based approach and investigated the association of specific genomic regions of the PTEN locus with PHTS. We found this locus to be characterized by three distinct haplotype blocks 33 kb, 65 kb, and 43 kb in length. The haplotype distributions for all three blocks differed significantly between patients with PHTS and controls (P=.0098, P<.0001, and P<.0001 for blocks 1, 2, and 3, respectively). "Rare" haplotype blocks and extended haplotypes account for two-to-threefold more PHTS chromosomes than control chromosomes. PTEN mutation-negative patients are strongly associated with a haplotype block spanning a region upstream of PTEN and the gene's first intron (P=.0027). Furthermore, allelic combinations contribute to the phenotypic complexity of this syndrome. Taken together, these data suggest that specific haplotypes and rare alleles underlie the disease etiology in these sample populations; constitute low-penetrance, modifying loci; and, specifically in the case of patients with PHTS for whom traditional mutations have yet to be identified, may harbor pathogenic variant(s) that have escaped detection by standard PTEN mutation-scanning methodologies. [Abstract/Link to Full Text]

Minichiello MJ, Durbin R
Mapping trait loci by use of inferred ancestral recombination graphs.
Large-scale association studies are being undertaken with the hope of uncovering the genetic determinants of complex disease. We describe a computationally efficient method for inferring genealogies from population genotype data and show how these genealogies can be used to fine map disease loci and interpret association signals. These genealogies take the form of the ancestral recombination graph (ARG). The ARG defines a genealogical tree for each locus, and, as one moves along the chromosome, the topologies of consecutive trees shift according to the impact of historical recombination events. There are two stages to our analysis. First, we infer plausible ARGs, using a heuristic algorithm, which can handle unphased and missing data and is fast enough to be applied to large-scale studies. Second, we test the genealogical tree at each locus for a clustering of the disease cases beneath a branch, suggesting that a causative mutation occurred on that branch. Since the true ARG is unknown, we average this analysis over an ensemble of inferred ARGs. We have characterized the performance of our method across a wide range of simulated disease models. Compared with simpler tests, our method gives increased accuracy in positioning untyped causative loci and can also be used to estimate the frequencies of untyped causative alleles. We have applied our method to Ueda et al.'s association study of CTLA4 and Graves disease, showing how it can be used to dissect the association signal, giving potentially interesting results of allelic heterogeneity and interaction. Similar approaches analyzing an ensemble of ARGs inferred using our method may be applicable to many other problems of inference from population genotype data. [Abstract/Link to Full Text]
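The second stage described above, testing one genealogical tree for clustering of cases beneath a branch, can be sketched as follows. The abstract does not specify the test statistic, so a 2x2 chi-square on case/control counts beneath each branch is used here as a stand-in. The sketch covers a single tree only; inferring the ARG and averaging over an ensemble are not shown. Trees are nested tuples of sample labels, and all names and data are hypothetical.

```python
def leaf_names(node):
    """All sample labels beneath a node of a nested-tuple tree."""
    if isinstance(node, str):
        return [node]
    return [name for child in node for name in leaf_names(child)]


def clades(node, is_root=True):
    """Yield every internal node except the root."""
    if isinstance(node, str):
        return
    if not is_root:
        yield node
    for child in node:
        yield from clades(child, is_root=False)


def clustering_score(clade, cases, controls):
    """2x2 chi-square: case/control status versus beneath/not beneath the branch."""
    under = set(leaf_names(clade))
    a = len(under & cases)       # cases beneath the branch
    b = len(under & controls)    # controls beneath the branch
    c = len(cases) - a           # cases elsewhere in the tree
    d = len(controls) - b        # controls elsewhere in the tree
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Invented tree at one locus: three of the four cases share a clade.
cases = {"c1", "c2", "c3", "c4"}
controls = {"k1", "k2", "k3", "k4"}
tree = ((("c1", "c2"), "c3"), (("k1", "k2"), ("c4", ("k3", "k4"))))

best_score = max(clustering_score(clade, cases, controls)
                 for clade in clades(tree))
```

A high score on some branch suggests that a causative mutation may have arisen on it; repeating the scan across loci and across an ensemble of inferred trees is what turns this per-tree score into a mapping signal.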

Mutsuddi M, Morris DW, Waggoner SG, Daly MJ, Scolnick EM, Sklar P
Analysis of high-resolution HapMap of DTNBP1 (Dysbindin) suggests no consistency between reported common variant associations and schizophrenia.
DTNBP1 was first identified as a putative schizophrenia-susceptibility gene in Irish pedigrees, with a report of association to common genetic variation. Several replication studies have reported confirmation of an association to DTNBP1 in independent European samples; however, reported risk alleles and haplotypes appear to differ between studies, and comparison among studies has been confounded because different marker sets were employed by each group. To facilitate evaluation of existing evidence of association and further work, we supplemented the extensive genotype data, available through the International HapMap Project (HapMap), about DTNBP1 by specifically typing all associated single-nucleotide polymorphisms reported in each of the studies of the Centre d'Etude du Polymorphisme Humain (CEPH)-derived HapMap sample (CEU). Using this high-density reference map, we compared the putative disease-associated haplotype from each study and found that the association studies are inconsistent with regard to the identity of the disease-associated haplotype at DTNBP1. Specifically, all five "replication" studies define a positively associated haplotype that is different from the association originally reported. We further demonstrate that, in all six studies, the European-derived populations studied have haplotype patterns and frequencies that are consistent with HapMap CEU samples (and each other). Thus, it is unlikely that population differences are creating the inconsistency of the association studies. Evidence of association is, at present, equivocal and unsatisfactory. The new dense map of the region may be valuable in more-comprehensive follow-up studies. [Abstract/Link to Full Text]

Lindsay SJ, Khajavi M, Lupski JR, Hurles ME
A chromosomal rearrangement hotspot can be identified from population genetic variation and is coincident with a hotspot for allelic recombination.
Insights into the origins of structural variation and the mutational mechanisms underlying genomic disorders would be greatly improved by a genomewide map of hotspots of nonallelic homologous recombination (NAHR). Moreover, our understanding of sequence variation within the duplicated sequences that are substrates for NAHR lags far behind that of sequence variation within the single-copy portion of the genome. Perhaps the best-characterized NAHR hotspot lies within the 24-kb-long Charcot-Marie-Tooth disease type 1A (CMT1A)-repeats (REPs) that sponsor deletions and duplications that cause peripheral neuropathies. We investigated structural and sequence diversity within the CMT1A-REPs, both within and between species. We discovered a high frequency of retroelement insertions, accelerated sequence evolution after duplication, extensive paralogous gene conversion, and a greater than twofold enrichment of SNPs in humans relative to the genome average. We identified an allelic recombination hotspot underlying the known NAHR hotspot, which suggests that the two processes are intimately related. Finally, we used our data to develop a novel method for inferring the location of an NAHR hotspot from sequence variation within segmental duplications and applied it to identify a putative NAHR hotspot within the LCR22 repeats that sponsor velocardiofacial syndrome deletions. We propose that a large-scale project to map sequence variation within segmental duplications would reveal a wealth of novel chromosomal-rearrangement hotspots. [Abstract/Link to Full Text]

Wimplinger I, Morleo M, Rosenberger G, Iaconis D, Orth U, Meinecke P, Lerer I, Ballabio A, Gal A, Franco B, Kutsche K
Mutations of the mitochondrial holocytochrome c-type synthase in X-linked dominant microphthalmia with linear skin defects syndrome.
The microphthalmia with linear skin defects syndrome (MLS, or MIDAS) is an X-linked dominant male-lethal disorder almost invariably associated with segmental monosomy of the Xp22 region. In two female patients, from two families, with MLS and a normal karyotype, we identified heterozygous de novo point mutations--a missense mutation (p.R217C) and a nonsense mutation (p.R197X)--in the HCCS gene. HCCS encodes the mitochondrial holocytochrome c-type synthase that functions as heme lyase by covalently adding the prosthetic heme group to both apocytochrome c and c(1). We investigated a third family, displaying phenotypic variability, in which the mother and two of her daughters carry an 8.6-kb submicroscopic deletion encompassing part of the HCCS gene. Functional analysis demonstrates that both mutant proteins (R217C and Delta 197-268) were unable to complement a Saccharomyces cerevisiae mutant deficient for the HCCS orthologue Cyc3p, in contrast to wild-type HCCS. Moreover, ectopically expressed HCCS wild-type and the R217C mutant protein are targeted to mitochondria in CHO-K1 cells, whereas the C-terminal-truncated Delta 197-268 mutant failed to be sorted to mitochondria. Cytochrome c, the final product of holocytochrome c-type synthase activity, is implicated in both oxidative phosphorylation (OXPHOS) and apoptosis. We hypothesize that the inability of HCCS-deficient cells to undergo cytochrome c-mediated apoptosis may push cell death toward necrosis that gives rise to severe deterioration of the affected tissues. In summary, we suggest that disturbance of both OXPHOS and the balance between apoptosis and necrosis, as well as the X-inactivation pattern, may contribute to the variable phenotype observed in patients with MLS. [Abstract/Link to Full Text]

Smeitink JA, Elpeleg O, Antonicka H, Diepstra H, Saada A, Smits P, Sasarman F, Vriend G, Jacob-Hirsch J, Shaag A, Rechavi G, Welling B, Horst J, Rodenburg RJ, van den Heuvel B, Shoubridge EA
Distinct clinical phenotypes associated with a mutation in the mitochondrial translation elongation factor EFTs.
The 13 polypeptides encoded in mitochondrial DNA (mtDNA) are synthesized in the mitochondrial matrix on a dedicated protein-translation apparatus that resembles that found in prokaryotes. Here, we have investigated the genetic basis for a mitochondrial protein-synthesis defect associated with a combined oxidative phosphorylation enzyme deficiency in two patients, one of whom presented with encephalomyopathy and the other with hypertrophic cardiomyopathy. Sequencing of candidate genes revealed the same homozygous mutation (C997T) in both patients in TSFM, a gene coding for the mitochondrial translation elongation factor EFTs. EFTs functions as a guanine nucleotide exchange factor for EFTu, another translation elongation factor that brings aminoacylated transfer RNAs to the ribosomal A site as a ternary complex with guanosine triphosphate. The mutation predicts an Arg333Trp substitution at an evolutionarily conserved site in a subdomain of EFTs that interacts with EFTu. Molecular modeling showed that the substitution disrupts local subdomain structure and the dimerization interface. The steady-state levels of EFTs and EFTu in patient fibroblasts were reduced by 75% and 60%, respectively, and the amounts of assembled complexes I, IV, and V were reduced by 35%-91% compared with the amounts in controls. These phenotypes and the translation defect were rescued by retroviral expression of either EFTs or EFTu. These data clearly establish mutant EFTs as the cause of disease in these patients. The fact that the same mutation is associated with distinct clinical phenotypes suggests the presence of genetic modifiers of the mitochondrial translation apparatus. [Abstract/Link to Full Text]

Zhou H, Brockington M, Jungbluth H, Monk D, Stanier P, Sewry CA, Moore GE, Muntoni F
Epigenetic allele silencing unveils recessive RYR1 mutations in core myopathies.
Epigenetic regulation of gene expression is a source of genetic variation, which can mimic recessive mutations by creating transcriptional haploinsufficiency. Germline epimutations and genomic imprinting are typical examples, although their existence can be difficult to reveal. Genomic imprinting can be tissue specific, with biallelic expression in some tissues and monoallelic expression in others or with polymorphic expression in the general population. Mutations in the skeletal-muscle ryanodine-receptor gene (RYR1) are associated with malignant hyperthermia susceptibility and the congenital myopathies central core disease and multiminicore disease. RYR1 has never been thought to be affected by epigenetic regulation. However, during the RYR1-mutation analysis of a cohort of patients with recessive core myopathies, we discovered that 6 (55%) of 11 patients had monoallelic RYR1 transcription in skeletal muscle, despite being heterozygous at the genomic level. In families for which parental DNA was available, segregation studies showed that the nonexpressed allele was maternally inherited. Transcription analysis in patients' fibroblasts and lymphoblastoid cell lines indicated biallelic expression, which suggests tissue-specific silencing. Transcription analysis of normal human fetal tissues showed that RYR1 was monoallelically expressed in skeletal and smooth muscles, brain, and eye in 10% of cases. In contrast, 25 normal adult human skeletal-muscle samples displayed only biallelic expression. Finally, the administration of the DNA methyltransferase inhibitor 5-aza-deoxycytidine to cultured patient skeletal-muscle myoblasts reactivated the transcription of the silenced allele, which suggests hypermethylation as a mechanism for RYR1 silencing. Our data indicate that RYR1 undergoes polymorphic, tissue-specific, and developmentally regulated allele silencing and that this unveils recessive mutations in patients with core myopathies. Furthermore, our data suggest that imprinting is a likely mechanism for this phenomenon and that similar mechanisms could play a role in human phenotypic heterogeneity. [Abstract/Link to Full Text]

Wijsman EM, Rothstein JH, Thompson EA
Multipoint linkage analysis with many multiallelic or dense diallelic markers: Markov chain-Monte Carlo provides practical approaches for genome scans on general pedigrees.
Computations for genome scans need to adapt to the increasing use of dense diallelic markers as well as of full-chromosome multipoint linkage analysis with either diallelic or multiallelic markers. Whereas suitable exact-computation tools are available for use with small pedigrees, equivalent exact computation for larger pedigrees remains infeasible. Markov chain-Monte Carlo (MCMC)-based methods currently provide the only computationally practical option. To date, no systematic comparison of the performance of MCMC-based programs is available, nor have these programs been systematically evaluated for use with dense diallelic markers. Using simulated data, we evaluate the performance of two MCMC-based linkage-analysis programs--lm_markers from the MORGAN package and SimWalk2--under a variety of analysis conditions. Pedigrees consisted of 14, 52, or 98 individuals in 3, 5, or 6 generations, respectively, with increasing amounts of missing data in larger pedigrees. One hundred replicates of markers and trait data were simulated on a 100-cM chromosome, with up to 10 multiallelic and up to 200 diallelic markers used simultaneously for computation of multipoint LOD scores. Exact computation was available for comparison in most situations, and comparison with a perfectly informative marker or interprogram comparison was available in the remaining situations. Our results confirm the accuracy of both programs in multipoint analysis with multiallelic markers on pedigrees of varied sizes and missing-data patterns, but there are some computational differences. In contrast, for large numbers of dense diallelic markers, only the lm_markers program was able to provide accurate results within a computationally practical time. Thus, programs in the MORGAN package are the first available to provide a computationally practical option for accurate linkage analyses in genome scans with both large numbers of diallelic markers and large pedigrees. [Abstract/Link to Full Text]

Zhao J, Jin L, Xiong M
Test for interaction between two unlinked loci.
Despite the growing consensus on the importance of testing gene-gene interactions in genetic studies of complex diseases, the effect of gene-gene interactions has often been defined as a deviance from genetic additive effects, which is essentially treated as a residual term in genetic analysis and leads to low power in detecting the presence of interacting effects. To what extent the definition of gene-gene interaction at population level reflects the genes' biochemical or physiological interaction remains a mystery. In this article, we introduce a novel definition and a new measure of gene-gene interaction between two unlinked loci (or genes). We developed a general theory for studying linkage disequilibrium (LD) patterns in disease population under two-locus disease models. The properties of using the LD measure in a disease population as a function of the measure of gene-gene interaction between two unlinked loci were also investigated. We examined how interaction between two loci creates LD in a disease population and showed that the mathematical formulation of the new definition for gene-gene interaction between two loci was similar to that of the LD between two loci. This finding motivated us to develop an LD-based statistic to detect gene-gene interaction between two unlinked loci. The null distribution and type I error rates of the LD-based statistic for testing gene-gene interaction were validated using extensive simulation studies. We found that the new test statistic was more powerful than the traditional logistic regression under three two-locus disease models and demonstrated that the power of the test statistic depends on the measure of gene-gene interaction. We also investigated the impact of using tagging SNPs for testing interaction on the power to detect interaction between two unlinked loci. Finally, to evaluate the performance of our new method, we applied the LD-based statistic to two published data sets. Our results showed that the P values of the LD-based statistic were smaller than those obtained by other approaches, including logistic regression models. [Abstract/Link to Full Text]
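The idea that interaction between two unlinked loci shows up as LD among affected individuals can be illustrated with the standard pairwise LD coefficient. The sketch below is not the authors' statistic. It computes the textbook quantities D and r^2 from two-locus haplotype counts in a case sample, plus the usual n*r^2 chi-square approximation. The function name and the example counts are hypothetical.

```python
def pairwise_ld(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, r2, chi2) from counts of the four two-locus haplotypes.

    D    = P(AB) - P(A) * P(B)
    r2   = D^2 / (P(A) * P(a) * P(B) * P(b))
    chi2 = n * r2, a standard 1-df approximation (an assumption of this sketch,
           not the LD-based statistic from the abstract).
    """
    n = n_AB + n_Ab + n_aB + n_ab
    if n == 0:
        raise ValueError("no haplotypes supplied")
    p_AB = n_AB / n
    p_A = (n_AB + n_Ab) / n
    p_B = (n_AB + n_aB) / n
    d = p_AB - p_A * p_B
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, r2, n * r2


# Hypothetical haplotype counts observed among cases at two unlinked loci.
d, r2, chi2 = pairwise_ld(n_AB=40, n_Ab=10, n_aB=10, n_ab=40)
print(round(d, 4), round(r2, 4), round(chi2, 2))
```

For unlinked loci in the general population D is expected to be near zero, so a clearly nonzero value among cases is the kind of signal an LD-based interaction test looks for.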

Gasper J, Swanson WJ
Molecular population genetics of the gene encoding the human fertilization protein zonadhesin reveals rapid adaptive evolution.
A hallmark of positive selection (adaptive evolution) in protein-coding regions is a d(N)/d(S) ratio >1, where d(N) is the number of nonsynonymous substitutions/nonsynonymous sites and d(S) is the number of synonymous substitutions/synonymous sites. Zonadhesin is a male reproductive protein localized on the sperm head, comprising many domains known to be involved in cell-cell interaction or cell adhesion. Previous studies have shown that VWD domains (homologous to the D domains of the von Willebrand factor) are involved directly in binding to the female zona pellucida (ZP) in a species-specific manner. In this study, we sequenced 47 coding exons in 12 primate species and, by using maximum-likelihood methods to determine sites under positive selection, we show that VWD2, membrane/A5 antigen mu receptor, and mucin-like domains in zonadhesin are rapidly evolving and, thus, may be involved in binding to the ZP in a species-specific manner in primates. In addition, polymorphism data from 48 human individuals revealed significant polymorphism-to-divergence heterogeneity and a significant departure from equilibrium-neutral expectations in the frequency spectrum, suggesting balancing selection and positive selection occurring in zonadhesin (ZAN) within human populations. Finally, we observe adaptive evolution in haplotypes segregating for a frameshift mutation that was previously thought to indicate that ZAN was a potential pseudogene. [Abstract/Link to Full Text]
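The d(N)/d(S) definition given at the start of this abstract can be written out directly. This is a minimal sketch using raw proportions only. It applies no correction for multiple substitutions at a site and is not the maximum-likelihood site analysis used in the study. The function name and the example counts are hypothetical.

```python
def dn_ds_ratio(nonsyn_subs, nonsyn_sites, syn_subs, syn_sites):
    """Compute d(N)/d(S) from substitution and site counts.

    d(N) = nonsynonymous substitutions / nonsynonymous sites
    d(S) = synonymous substitutions / synonymous sites
    """
    if nonsyn_sites <= 0 or syn_sites <= 0:
        raise ValueError("site counts must be positive")
    d_n = nonsyn_subs / nonsyn_sites
    d_s = syn_subs / syn_sites
    if d_s == 0:
        raise ValueError("d(S) is zero, so the ratio is undefined")
    return d_n / d_s


# Hypothetical counts: 12 nonsynonymous changes over 300 sites,
# 2 synonymous changes over 100 sites.
ratio = dn_ds_ratio(12, 300, 2, 100)
print(ratio > 1)  # a ratio above 1 is the hallmark of positive selection
```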

Hrebícek M, Mrázová L, Seyrantepe V, Durand S, Roslin NM, Nosková L, Hartmannová H, Ivánek R, Cízkova A, Poupetová H, Sikora J, Urinovská J, Stranecký V, Zeman J, Lepage P, Roquis D, Verner A, Ausseil J, Beesley CE, Maire I, Poorthuis BJ, van de Kamp J, van Diggelen OP, Wevers RA, Hudson TJ, Fujiwara TM, Majewski J, Morgan K, Kmoch S, Pshezhetsky AV
Mutations in TMEM76* cause mucopolysaccharidosis IIIC (Sanfilippo C syndrome).
Mucopolysaccharidosis IIIC (MPS IIIC, or Sanfilippo C syndrome) is a lysosomal storage disorder caused by the inherited deficiency of the lysosomal membrane enzyme acetyl-coenzyme A:alpha-glucosaminide N-acetyltransferase (N-acetyltransferase), which leads to impaired degradation of heparan sulfate. We report the narrowing of the candidate region to a 2.6-cM interval between D8S1051 and D8S1831 and the identification of the transmembrane protein 76 gene (TMEM76), which encodes a 73-kDa protein with predicted multiple transmembrane domains and glycosylation sites, as the gene that causes MPS IIIC when it is mutated. Four nonsense mutations, 3 frameshift mutations due to deletions or a duplication, 6 splice-site mutations, and 14 missense mutations were identified among 30 probands with MPS IIIC. Functional expression of human TMEM76 and the mouse ortholog demonstrates that it is the gene that encodes the lysosomal N-acetyltransferase and suggests that this enzyme belongs to a new structural class of proteins that transport the activated acetyl residues across the cell membrane. [Abstract/Link to Full Text]

Wessel J, Schork NJ
Generalized genomic distance-based regression methodology for multilocus association analysis.
Large-scale, multilocus genetic association studies require powerful and appropriate statistical-analysis tools that are designed to relate genotype and haplotype information to phenotypes of interest. Many analysis approaches consider relating allelic, haplotypic, or genotypic information to a trait through use of extensions of traditional analysis techniques, such as contingency-table analysis, regression methods, and analysis-of-variance techniques. In this work, we consider a complementary approach that involves the characterization and measurement of the similarity and dissimilarity of the allelic composition of a set of individuals' diploid genomes at multiple loci in the regions of interest. We describe a regression method that can be used to relate variation in the measure of genomic dissimilarity (or "distance") among a set of individuals to variation in their trait values. Weighting factors associated with functional or evolutionary conservation information of the loci can be used in the assessment of similarity. The proposed method is very flexible and is easily extended to complex multilocus-analysis settings involving covariates. In addition, the proposed method actually encompasses both single-locus and haplotype-phylogeny analysis methods, which are two of the most widely used approaches in genetic association analysis. We showcase the method with data described in the literature. Ultimately, our method is appropriate for high-dimensional genomic data and anticipates an era when cost-effective exhaustive DNA sequence data can be obtained for a large number of individuals, over and above genotype information focused on a few well-chosen loci. [Abstract/Link to Full Text]

Mirault ME, Boucher P, Tremblay A
Nucleotide-resolution mapping of topoisomerase-mediated and apoptotic DNA strand scissions at or near an MLL translocation hotspot.
The emergence of therapy-related acute myeloid leukemia (t-AML) has been associated with DNA topoisomerase II (TOP2)-targeted drug treatments and chromosomal translocations frequently involving the MLL, or ALL-1, gene. Two distinct mechanisms have been implicated as potential triggers of t-AML translocations: TOP2-mediated DNA cleavage and apoptotic higher-order chromatin fragmentation. Assessment of the role of TOP2 in this process has been hampered by a lack of techniques allowing in vivo mapping of TOP2-mediated DNA cleavage at nucleotide resolution in single-copy genes. A novel method, extension ligation-mediated polymerase chain reaction (ELMPCR), was used here for mapping topoisomerase-mediated DNA strand breaks and apoptotic DNA cleavage across a translocation-prone region of MLL in human cells. We report the first genomic map integrating translocation breakpoints and topoisomerase I, TOP2, and apoptotic DNA cleavage sites at nucleotide resolution across an MLL region harboring a t-AML translocation hotspot. This hotspot is flanked by a TOP2 cleavage site and is localized at one extremity of a minor apoptotic cleavage region, where multiple single- and double-strand breaks were induced by caspase-activated apoptotic nucleases. This cleavage pattern was in sharp contrast to that observed approximately 200 bp downstream in the exon 12 region, which displayed much stronger apoptotic cleavage but where no double-strand breaks were detected and no t-AML-associated breakpoints were reported. The localization and remarkable clustering of the t-AML breakpoints cannot be explained simply by the DNA cleavage patterns but might result from potential interactions between TOP2 poisoning, apoptotic DNA cleavage, and DNA repair attempts at specific sites of higher-order chromatin structure in apoptosis-evading cells. ELMPCR provides a new tool for investigating the role of DNA topoisomerases in fundamental genetic processes and translocations associated with cancer treatments involving topoisomerase-targeted drugs. [Abstract/Link to Full Text]

Meuwissen TH, Goddard ME
Multipoint identity-by-descent prediction using dense markers to map quantitative trait loci and estimate effective population size.
A novel multipoint method, based on an approximate coalescence approach, to analyze multiple linked markers is presented. Unlike other approximate coalescence methods, it considers all markers simultaneously but only two haplotypes at a time. We demonstrate the use of this method for linkage disequilibrium (LD) mapping of QTL and estimation of effective population size. The method estimates identity-by-descent (IBD) probabilities between pairs of marker haplotypes. Both LD and combined linkage and LD mapping rely on such IBD probabilities. The method is approximate in that it considers only the information on a pair of haplotypes, whereas a full modeling of the coalescence process would simultaneously consider all haplotypes. However, full coalescence modeling is computationally feasible only for a few linked markers. Using simulations of the coalescence process, the method is shown to give almost unbiased estimates of the effective population size. Compared to direct marker and haplotype association analyses, IBD-based QTL mapping clearly showed higher power to detect a QTL and a more realistic confidence interval for its position. The modeling of LD could be extended to estimate other LD-related parameters such as recombination rates. [Abstract/Link to Full Text]

Song Y, Fee L, Lee TH, Wharton RP
The molecular chaperone Hsp90 is required for mRNA localization in Drosophila melanogaster embryos.
Localization of maternal nanos mRNA to the posterior pole is essential for development of both the abdominal segments and primordial germ cells in the Drosophila embryo. Unlike maternal mRNAs such as bicoid and oskar that are localized by directed transport along microtubules, nanos is thought to be trapped as it swirls past the posterior pole during cytoplasmic streaming. Anchoring of nanos depends on integrity of the actin cytoskeleton and the pole plasm; other factors involved specifically in its localization have not been described to date. Here we use genetic approaches to show that the Hsp90 chaperone (encoded by Hsp83 in Drosophila) is a localization factor for two mRNAs, nanos and pgc. Other components of the pole plasm are localized normally when Hsp90 function is partially compromised, suggesting a specific role for the chaperone in localization of nanos and pgc mRNAs. Although the mechanism by which Hsp90 acts is unclear, we find that levels of the LKB1 kinase are reduced in Hsp83 mutant egg chambers and that localization of pgc (but not nos) is rescued upon overexpression of LKB1 in such mutants. These observations suggest that LKB1 is a primary Hsp90 target for pgc localization and that other Hsp90 partners mediate localization of nos. [Abstract/Link to Full Text]

Grote MN
A covariance structure model for the admixture of binary genetic variation.
I derive a covariance structure model for pairwise linkage disequilibrium (LD) between binary markers in a recently admixed population and use a generalized least-squares method to fit the model to two different data sets. Both linked and unlinked marker pairs are incorporated in the model. Under the model, a pairwise LD matrix is decomposed into two component matrices, one containing LD attributable to admixture, and another containing, in an aggregate form, LD specific to the populations forming the mixture. I use population genetics theory to show that the latter matrix has block-diagonal structure. For the data sets considered here, I show that the number of source populations can be determined by statistical inference on the canonical correlations of the sample LD matrix. [Abstract/Link to Full Text]

Meligkotsidou L, Fearnhead P
Postprocessing of genealogical trees.
We consider inference for demographic models and parameters based upon postprocessing the output of an MCMC method that generates samples of genealogical trees (from the posterior distribution for a specific prior distribution of the genealogy). This approach has the advantage of taking account of the uncertainty in the inference for the tree when making inferences about the demographic model and can be computationally efficient in terms of reanalyzing data under a wide variety of models. We consider a (simulation-consistent) estimate of the likelihood for variable population size models, which uses importance sampling, and propose two new approximate likelihoods, one for migration models and one for continuous spatial models. [Abstract/Link to Full Text]

Kamau E, Charlesworth B, Charlesworth D
Linkage disequilibrium and recombination rate estimates in the self-incompatibility region of Arabidopsis lyrata.
Genetic diversity is unusually high at loci in the S-locus region of the self-incompatible species of the flowering plant, Arabidopsis lyrata, not just in the S loci themselves, but also at two nearby loci. In a previous study of a single natural population from Iceland, we attributed this elevated polymorphism to linkage disequilibrium (LD) between variants at loci close to the S locus and the S alleles, which are maintained in the population by balancing selection. With the four S-flanking loci whose diversity we previously studied, we could not determine the extent of the region linked to the S loci in which neutral sites are affected. We also could not exclude the possibility of a population bottleneck, or of admixture, as causes of the LD. We have now studied four more distant loci flanking the S-locus region, and more populations, and we analyze the results using a theoretical model of the effect of balancing selection on diversity at linked neutral sites within and between different functional S-allelic classes. In the model, diversity is a function of the number of selectively maintained alleles and the recombination distances from the selectively maintained sites. We use the model to estimate the number of different functional S alleles, their turnover rate, and recombination rates between the S-locus region and other loci. Our estimates suggest that there is a small region of very low recombination surrounding the S-locus region. [Abstract/Link to Full Text]

Smolikov S, Eizinger A, Schild-Prufert K, Hurlburt A, McDonald K, Engebrecht J, Villeneuve AM, Colaiácovo MP
SYP-3 restricts synaptonemal complex assembly to bridge paired chromosome axes during meiosis in Caenorhabditis elegans.
Synaptonemal complex (SC) formation must be regulated to occur only between aligned pairs of homologous chromosomes, ultimately ensuring proper chromosome segregation in meiosis. Here we identify SYP-3, a coiled-coil protein that is required for assembly of the central region of the SC and for restricting its loading to occur only in an appropriate context, forming structures that bridge the axes of paired meiotic chromosomes in Caenorhabditis elegans. We find that inappropriate loading of central region proteins interferes with homolog pairing, likely by triggering a premature change in chromosome configuration during early prophase that terminates the search for homologs. As a result, syp-3 mutants lack chiasmata and exhibit increased chromosome mis-segregation. Altogether, our studies lead us to propose that SYP-3 regulates synapsis along chromosomes, contributing to meiotic progression in early prophase. [Abstract/Link to Full Text]

Tsumura Y, Kado T, Takahashi T, Tani N, Ujino-Ihara T, Iwata H
Genome scan to detect genetic structure and adaptive genes of natural populations of Cryptomeria japonica.
We investigated 29 natural populations of Cryptomeria japonica using 148 cleaved amplified polymorphic sequence markers to elucidate their genetic structure and identify candidate adaptive genes of this species. In accordance with the inferred evolutionary history of the species during and after the last glacial episode, the genetic diversity was higher in western populations than in northern populations. The results of phylogenetic and genetic structure analyses suggest that populations of the two main varieties of the species have clearly diverged from each other and that two of the examined loci are strongly associated with the differentiation between the two varieties. Using a coalescent simulation based on F(ST) and H(e) values, we detected five genes that had higher, and two that had lower, values than the respective 99% confidence intervals (C.I.s) that are theoretically expected intervals under a neutral infinite-island model. We also detected 13 outlier loci using a coalescent simulation based on the assumption that the 2 varieties originated from the splitting of an ancestral population. Four of these loci were detected by both methods, two of which were detected in a genetic structure analysis as loci associated with differentiation between the two varieties of the species, and are strong candidates for genes that have been subject to selection. [Abstract/Link to Full Text]

Zurita-Martinez SA, Puria R, Pan X, Boeke JD, Cardenas ME
Efficient Tor signaling requires a functional class C Vps protein complex in Saccharomyces cerevisiae.
The Tor kinases regulate responses to nutrients and control cell growth. Unlike most organisms that only contain one Tor protein, Saccharomyces cerevisiae expresses two, Tor1 and Tor2, which are thought to share all of the rapamycin-sensitive functions attributable to Tor signaling. Here we conducted a genetic screen that defined the global TOR1 synthetic fitness or lethal interaction gene network. This screen identified mutations in distinctive functional categories that impaired vacuolar function, including components of the EGO/Gse and PAS complexes that reduce fitness. In addition, tor1 is lethal in combination with mutations in class C Vps complex components. We find that Tor1 does not regulate the known function of the class C Vps complex in protein sorting. Instead class C vps mutants fail to recover from rapamycin-induced growth arrest or to survive nitrogen starvation and have low levels of amino acids. Remarkably, addition of glutamate or glutamine restores viability to a tor1 pep3 mutant strain. We conclude that Tor1 is more effective than Tor2 at providing rapamycin-sensitive Tor signaling under conditions of amino acid limitation, and that an intact class C Vps complex is required to mediate intracellular amino acid homeostasis for efficient Tor signaling. [Abstract/Link to Full Text]

Ren N, Charlton J, Adler PN
The flare gene, which encodes the AIP1 protein of Drosophila, functions to regulate F-actin disassembly in pupal epidermal cells.
Adult Drosophila are decorated with several types of polarized cuticular structures, such as hairs and bristles. The morphogenesis of these takes place in pupal cells and is mediated by the actin and microtubule cytoskeletons. Mutations in flare (flr) result in grossly abnormal epidermal hairs. We report here that flr encodes the Drosophila actin interacting protein 1 (AIP1). In other systems this protein has been found to promote cofilin-mediated F-actin disassembly. In Drosophila cofilin is encoded by twinstar (tsr). We show that flr mutations result in increased levels of F-actin accumulation and increased F-actin stability in vivo. Further, flr is essential for cell proliferation and viability and for the function of the frizzled planar cell polarity system. All of these phenotypes are similar to those seen for tsr mutations. This differs from the situation in yeast where cofilin is essential while aip1 mutations result in only subtle defects in the actin cytoskeleton. Surprisingly, we found that mutations in flr and tsr also result in greatly increased tubulin staining, suggesting a tight linkage between the actin and microtubule cytoskeleton in these cells. [Abstract/Link to Full Text]

Chabot A, Shrit RA, Blekhman R, Gilad Y
Using reporter gene assays to identify cis regulatory differences between humans and chimpanzees.
Most phenotypic differences between human and chimpanzee are likely to result from differences in gene regulation, rather than changes to protein-coding regions. To date, however, only a handful of human-chimpanzee nucleotide differences leading to changes in gene regulation have been identified. To home in on differences in regulatory elements between human and chimpanzee, we focused on 10 genes that were previously found to be differentially expressed between the two species. We then designed reporter gene assays for the putative human and chimpanzee promoters of the 10 genes. Of seven promoters that we found to be active in human liver cell lines, human and chimpanzee promoters had significantly different activity in four cases, three of which recapitulated the gene expression difference seen in the microarray experiment. For these three genes, we were therefore able to demonstrate that a change in cis influences expression differences between humans and chimpanzees. Moreover, using site-directed mutagenesis on one construct, the promoter for the DDA3 gene, we were able to identify three nucleotides that together lead to a cis regulatory difference between the species. High-throughput application of this approach can provide a map of regulatory element differences between humans and our close evolutionary relatives. [Abstract/Link to Full Text]

Smith CA, Woloshuk CP, Robertson D, Payne GA
Silencing of the aflatoxin gene cluster in a diploid strain of Aspergillus flavus is suppressed by ectopic aflR expression.
Aflatoxins are toxic secondary metabolites produced by a 70-kb cluster of genes in Aspergillus flavus. The cluster genes are coordinately regulated and reside as a single copy within the genome. Diploids between a wild-type strain and a mutant (649) lacking the aflatoxin gene cluster fail to produce aflatoxin or transcripts of the aflatoxin pathway genes. This dominant phenotype is rescued in diploids between a wild-type strain and a transformant of the mutant containing an ectopic copy of aflR, the transcriptional regulator of the aflatoxin biosynthetic gene cluster. Further characterization of the mutant showed that it is missing 317 kb of chromosome III, including the known genes for aflatoxin biosynthesis. In addition, 939 kb of chromosome II is present as a duplication on chromosome III in the region previously containing the aflatoxin gene cluster. The lack of aflatoxin production in the diploid was not due to a unique or a mis-expressed repressor of aflR. Instead a form of reversible silencing based on the position of aflR is likely preventing the aflatoxin genes from being expressed in 649 x wild-type diploids. Gene expression analysis revealed the silencing effect is specific to the aflatoxin gene cluster. [Abstract/Link to Full Text]

Lake CM, Teeter K, Page SL, Nielsen R, Hawley RS
A genetic analysis of the Drosophila mcm5 gene defines a domain specifically required for meiotic recombination.
Members of the minichromosome maintenance (MCM) family have pivotal roles in many biological processes. Although originally studied for their role in DNA replication, it is becoming increasingly apparent that certain members of this family are multifunctional and also play roles in transcription, cohesion, condensation, and recombination. Here we provide a genetic dissection of the mcm5 gene in Drosophila that demonstrates an unexpected function for this protein. First, we show that homozygotes for a null allele of mcm5 die as third instar larvae, apparently as a result of blocking those replication events that lead to mitotic divisions without impairing endo-reduplication. However, we have also recovered a viable and fertile allele of mcm5 (denoted mcm5(A7)) that specifically impairs the meiotic recombination process. We demonstrate that the decrease in recombination observed in females homozygous for mcm5(A7) is not due to a failure to create or repair meiotically induced double strand breaks (DSBs), but rather to a failure to resolve those DSBs into meiotic crossovers. Consistent with their ability to repair meiotically induced DSBs, flies homozygous for mcm5(A7) are fully proficient in somatic DNA repair. These results strengthen the observation that members of the prereplicative complex have multiple functions and provide evidence that mcm5 plays a critical role in the meiotic recombination pathway. [Abstract/Link to Full Text]

Bentolila S, Elliott LE, Hanson MR
Genetic Architecture of Mitochondrial Editing in Arabidopsis thaliana.
We have analyzed the mitochondrial editing behavior of two Arabidopsis thaliana accessions, Landsberg erecta (Ler) and Columbia (Col). A survey of 362 C-to-U editing sites in 33 mitochondrial genes was conducted on RNA extracted from rosette leaves. We detected 67 new editing events in A. thaliana rosette leaves that had not been observed in a prior study of mitochondrial editing in suspension cultures. Furthermore, 37 of the 441 C-to-U editing events reported in A. thaliana suspension cultures were not observed in rosette leaves. Forty editing sites that are polymorphic in extent of editing were detected between Col and Ler. Silent editing sites, which do not change the encoded amino acid, were found in a large excess compared to non-silent sites among the editing events that differed between accessions and between tissue types. Dominance relationships were assessed for 15 of the most polymorphic sites by evaluating the editing values of the reciprocal hybrids. Dominance is more common in non-silent sites than in silent sites, while additivity was only observed in silent sites. A maternal effect was detected for 8 sites. QTL mapping with recombinant inbred lines detected 12 major QTL for 11 of the 13 editing traits analyzed, demonstrating that efficiency of editing of individual mitochondrial C targets is generally governed by a major factor. [Abstract/Link to Full Text]

Chang SB, Anderson LK, Sherman JD, Royer SM, Stack SM
Predicting and testing physical locations of genetically mapped loci on tomato pachytene chromosome 1.
Predicting the chromosomal location of mapped markers has been difficult because linkage maps do not reveal differences in crossover frequencies along the physical structure of chromosomes. Here we combine a physical crossover map based on the distribution of recombination nodules (RNs) on Solanum lycopersicum (tomato) synaptonemal complex 1 with a molecular genetic linkage map from the interspecific hybrid S. lycopersicum x S. pennellii to predict the physical locations of 17 mapped loci on tomato pachytene chromosome 1. Except for one marker located in heterochromatin, the predicted locations agree well with the observed locations determined by fluorescence in situ hybridization. One advantage of this approach is that once the RN distribution has been determined, the chromosomal location of any mapped locus (current or future) can be predicted with a high level of confidence. [Abstract/Link to Full Text]

Stupar RM, Bhaskar PB, Yandell BS, Rensink WA, Hart AL, Ouyang S, Veilleux RE, Busse JS, Erhardt RJ, Buell CR, Jiang J
Phenotypic and transcriptomic changes associated with potato autopolyploidization.
Polyploidy is remarkably common in the plant kingdom and polyploidization is a major driving force for plant genome evolution. Polyploids may contain genomes from different parental species (allopolyploidy) or include multiple sets of the same genome (autopolyploidy). Genetic and epigenetic changes associated with allopolyploidization have been a major research subject in recent years. However, we know little about the genetic impact imposed by autopolyploidization. We developed a synthetic autopolyploid series in potato (Solanum phureja) that includes one monoploid (1x) clone, two diploid (2x) clones, and one tetraploid (4x) clone. Cell size and organ thickness were positively correlated with the ploidy level. However, the 2x plants were generally the most vigorous and the 1x plants exhibited less vigor compared to the 2x and 4x individuals. We analyzed the transcriptomic variation associated with this autopolyploid series using a potato cDNA microarray containing approximately 9000 genes. Statistically significant expression changes were observed among the ploidies for approximately 10% of the genes in both leaflet and root tip tissues. However, most changes were associated with the monoploid and were within the twofold level. Thus, alteration of ploidy caused subtle expression changes of a substantial percentage of genes in the potato genome. We demonstrated that there are few genes, if any, whose expression is linearly correlated with the ploidy and can be dramatically changed because of ploidy alteration. [Abstract/Link to Full Text]

Dinka SJ, Campbell MA, Demers T, Raizada MN
Predicting the size of the progeny mapping population required to positionally clone a gene.
A key frustration during positional gene cloning (map-based cloning) is that the size of the progeny mapping population is difficult to predict, because the meiotic recombination frequency varies along chromosomes. We describe a detailed methodology to improve this prediction using rice (Oryza sativa L.) as a model system. We derived and/or validated, then fine-tuned, equations that estimate the mapping population size by comparing these theoretical estimates to 41 successful positional cloning attempts. We then used each validated equation to test whether neighborhood meiotic recombination frequencies extracted from a reference RFLP map can help researchers predict the mapping population size. We developed a meiotic recombination frequency map (MRFM) for approximately 1400 marker intervals in rice and anchored each published allele onto an interval on this map. We show that neighborhood recombination frequencies (R-map, >280-kb segments) extracted from the MRFM, in conjunction with the validated formulas, better predicted the mapping population size than the genome-wide average recombination frequency (R-avg), with improved results whether the recombination frequency was calculated as genes/cM or kb/cM. Our results offer a detailed road map for better predicting mapping population size in diverse eukaryotes, but useful predictions will require robust recombination frequency maps based on sampling more progeny. [Abstract/Link to Full Text]

Rong J, Feltus FA, Waghmare VN, Pierce GJ, Chee PW, Draye X, Saranga Y, Wright RJ, Wilkins TA, May OL, Smith CW, Gannaway JR, Wendel JF, Paterson AH
Meta-analysis of polyploid cotton QTL shows unequal contributions of subgenomes to a complex network of genes and gene clusters implicated in lint fiber development.
QTL mapping experiments yield heterogeneous results due to the use of different genotypes, environments, and sampling variation. Compilation of QTL mapping results yields a more complete picture of the genetic control of a trait and reveals patterns in organization of trait variation. A total of 432 QTL mapped in one diploid and 10 tetraploid interspecific cotton populations were aligned using a reference map and depicted in a CMap resource. Early demonstrations that genes from the non-fiber-producing diploid ancestor contribute to tetraploid lint fiber genetics gain further support from multiple populations and environments and advanced-generation studies detecting QTL of small phenotypic effect. Both tetraploid subgenomes contribute QTL at largely non-homeologous locations, suggesting divergent selection acting on many corresponding genes before and/or after polyploid formation. QTL correspondence across studies was only modest, suggesting that additional QTL for the target traits remain to be discovered. Crosses between closely-related genotypes differing by single-gene mutants yield profoundly different QTL landscapes, suggesting that fiber variation involves a complex network of interacting genes. Members of the lint fiber development network appear clustered, with cluster members showing heterogeneous phenotypic effects. Meta-analysis linked to synteny-based and expression-based information provides clues about specific genes and families involved in QTL networks. [Abstract/Link to Full Text]

Gomez-Raya L, Okomo-Adhiambo M, Beattie C, Osborne K, Rink A, Rauw WM
Modeling inheritance of malignant melanoma with DNA markers in Sinclair swine.
Cutaneous malignant melanoma in Sinclair swine is a hereditary disease that develops in utero or during the first 6 weeks of life. In many cases, the tumors regress and piglets survive the disease. Two different sets of gene(s) might be involved in the disease: tumor initiator (suppressor) locus or loci and loci affecting the aggressiveness of the disease (number and stage of tumors). We develop maximum-likelihood methods for interval mapping for both types of loci. The experimental design consisted of a boar mated to tumor-bearing sows with recording of tumor status and number of tumors in the first 6 weeks of life of the offspring. The model to search for the tumor initiator locus (with alleles T and t) was tested by computer simulation. Estimates of penetrances (Psi(TT) and Psi(Tt) for genotypes TT and Tt, respectively) were accurate even for small family sizes. Statistical power was >99% for a family size of 70 with Psi(TT) = 1 and Psi(Tt) = 0. The models to test for number of tumors incorporated genotype information for the tumor initiator locus. All models were tested with data from a single boar family of 72 piglets over swine chromosomes 6 and 8 (SSC6 and SSC8). No evidence for tumor initiator loci was found associated with these chromosomes. However, association of a QTL affecting number of tumors at birth near microsatellite SW1953 on SSC8 was chromosomewise significant (P<0.0124). [Abstract/Link to Full Text]

Kirchner J, Gross S, Bennett D, Alphey L
Essential, overlapping and redundant roles of the Drosophila protein phosphatase 1 alpha and 1 beta genes.
Protein serine/threonine phosphatase type 1 (PP1) has been found in all eukaryotes examined to date and is involved in the regulation of many cellular functions, including glycogen metabolism, muscle contraction, and mitosis. In Drosophila, four genes code for the catalytic subunit of PP1 (PP1c), three of which belong to the PP1 alpha subtype. PP1 beta 9C (flapwing) encodes the fourth PP1c gene and has a specific and nonredundant function as a nonmuscle myosin phosphatase. PP1 alpha 87B is the major form and contributes approximately 80% of the total PP1 activity. We describe the first mutant alleles of PP1 alpha 96A and show that PP1 alpha 96A is not an essential gene, but seems to have a function in the regulation of nonmuscle myosin. We show that overexpression of the PP1 alpha isozymes does not rescue semilethal PP1 beta 9C mutants, whereas overexpression of either PP1 alpha 96A or PP1 beta 9C does rescue a lethal PP1 alpha 87B mutant combination, showing that the lethality is due to a quantitative reduction in the level of PP1c. Overexpression of PP1 beta 9C does not rescue a PP1 alpha 87B, PP1 alpha 96A double mutant, suggesting an essential PP1 alpha-specific function in Drosophila. [Abstract/Link to Full Text]

Mulder KW, Inagaki A, Cameroni E, Mousson F, Winkler GS, De Virgilio C, Collart MA, Timmers HT
Modulation of Ubc4p/Ubc5p-mediated stress responses by the RING-finger-dependent ubiquitin-protein ligase Not4p in Saccharomyces cerevisiae.
The Ccr4-Not complex consists of nine subunits and acts as a regulator of mRNA biogenesis in Saccharomyces cerevisiae. The human ortholog of yeast NOT4, CNOT4, displays UbcH5B-dependent ubiquitin-protein ligase (E3 ligase) activity in a reconstituted in vitro system. However, an in vivo role for this enzymatic activity has not been identified. Site-directed mutagenesis of the RING finger of yeast Not4p identified residues required for interaction with Ubc4p and Ubc5p, the yeast orthologs of UbcH5B. Subsequent in vitro assays with purified Ccr4-Not complexes showed Not4p-mediated E3 ligase activity, which was dependent on the interaction with Ubc4p. To investigate the in vivo relevance of this activity, we performed synthetic genetic array (SGA) analyses using not4Delta and not4L35A alleles. This indicates involvement of the RING finger of Not4p in transcription, ubiquitylation, and DNA damage responses. In addition, we found a phenotypic overlap between deletions of UBC4 and mutants encoding single-amino-acid substitutions of the RING finger of Not4p. Together, our results show that Not4p functions as an E3 ligase by modulating Ubc4p/Ubc5p-mediated stress responses in vivo. [Abstract/Link to Full Text]

Takahata N
Molecular clock: an anti-neo-Darwinian legacy.

Li J, Harper LC, Golubovskaya I, Wang CR, Weber D, Meeley RB, McElver J, Bowen B, Cande WZ, Schnable PS
Functional analysis of maize RAD51 in meiosis and double-strand break repair.
In Saccharomyces cerevisiae, Rad51p plays a central role in homologous recombination and the repair of double-strand breaks (DSBs). Double mutants of the two Zea mays L. (maize) rad51 homologs are viable and develop well under normal conditions, but are male sterile and have substantially reduced seed set. Light microscopic analyses of male meiosis in these plants reveal reduced homologous pairing, synapsis of nonhomologous chromosomes, reduced bivalents at diakinesis, numerous chromosome breaks at anaphase I, and that >33% of quartets carry cells that either lack an organized nucleolus or have two nucleoli. This indicates that RAD51 is required for efficient chromosome pairing and its absence results in nonhomologous pairing and synapsis. These phenotypes differ from those of an Arabidopsis rad51 mutant that exhibits completely disrupted chromosome pairing and synapsis during meiosis. Unexpectedly, surviving female gametes produced by maize rad51 double mutants are euploid and exhibit near-normal rates of meiotic crossovers. The finding that maize rad51 double mutant embryos are extremely susceptible to radiation-induced DSBs demonstrates a conserved role for RAD51 in the repair of mitotic DSBs in plants, vertebrates, and yeast. [Abstract/Link to Full Text]

Nakao F, Hudson ML, Suzuki M, Peckler Z, Kurokawa R, Liu Z, Gengyo-Ando K, Nukazuka A, Fujii T, Suto F, Shibata Y, Shioi G, Fujisawa H, Mitani S, Chisholm AD, Takagi S
The PLEXIN PLX-2 and the ephrin EFN-4 have distinct roles in MAB-20/Semaphorin 2A signaling in Caenorhabditis elegans morphogenesis.
Semaphorins are extracellular proteins that regulate axon guidance and morphogenesis by interacting with a variety of cell surface receptors. Most semaphorins interact with plexin-containing receptor complexes, although some interact with non-plexin receptors. Class 2 semaphorins are secreted molecules that control axon guidance and epidermal morphogenesis in Drosophila and Caenorhabditis elegans. We show that the C. elegans class 2 semaphorin MAB-20 binds the plexin PLX-2. plx-2 mutations enhance the phenotypes of hypomorphic mab-20 alleles but not those of mab-20 null alleles, indicating that plx-2 and mab-20 act in a common pathway. Both mab-20 and plx-2 mutations affect epidermal morphogenesis during embryonic and postembryonic development. In both contexts, plx-2 null mutant phenotypes are much less severe than mab-20 null phenotypes, indicating that PLX-2 is not essential for MAB-20 signaling. Mutations in the ephrin efn-4 do not synergize with mab-20, indicating that EFN-4 may act in MAB-20 signaling. EFN-4 and PLX-2 are coexpressed in the late embryonic epidermis where they play redundant roles in MAB-20-dependent cell sorting. [Abstract/Link to Full Text]

Zak M, Baierl A, Bogdan M, Futschik A
Locating multiple interacting quantitative trait loci using rank-based model selection.
In previous work, a modified version of the Bayesian information criterion (mBIC) was proposed to locate multiple interacting quantitative trait loci (QTL). Simulation studies and real data analysis demonstrate good properties of the mBIC in situations where the error distribution is approximately normal. However, as with other standard techniques of QTL mapping, the performance of the mBIC strongly deteriorates when the trait distribution is heavy tailed or when the data contain a significant proportion of outliers. In the present article, we propose a suitable robust version of the mBIC that is based on ranks. We investigate the properties of the resulting method on the basis of theoretical calculations, computer simulations, and a real data analysis. Our simulation results show that for the sample sizes typically used in QTL mapping, the methods based on ranks are almost as efficient as standard techniques when the data are normal and are much better when the data come from some heavy-tailed distribution or include a proportion of outliers. [Abstract/Link to Full Text]
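The rank-based idea in the abstract above can be illustrated with a small sketch. It only shows the rank transformation step (replacing trait values with midranks before model fitting); it does not reproduce the authors' mBIC penalty or their software. The function name `average_ranks` and the toy trait values are hypothetical, chosen for illustration.

```python
def average_ranks(values):
    """Return 1-based ranks of `values`, giving tied values their average rank."""
    # Indices of the observations, sorted by trait value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend `end` over the run of observations tied with position `start`.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Every member of the tied run receives the mean of its positions.
        midrank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = midrank
        start = end + 1
    return ranks


# Hypothetical trait values; the last observation is an extreme outlier.
trait = [10.0, 20.0, 20.0, 1000.0]
ranked_trait = average_ranks(trait)
```

After the transformation the outlier contributes only the largest rank, so its influence on any subsequent regression or model selection criterion is bounded, which is the intuition behind robustness to heavy tails and outliers.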

Barbosa V, Kimm N, Lehmann R
A maternal screen for genes regulating Drosophila oocyte polarity uncovers new steps in meiotic progression.
Meiotic checkpoints monitor chromosome status to ensure correct homologous recombination, genomic integrity, and chromosome segregation. In Drosophila, the persistent presence of double-strand DNA breaks (DSB) activates the ATR/Mei-41 checkpoint, delays progression through meiosis, and causes defects in DNA condensation of the oocyte nucleus, the karyosome. Checkpoint activation has also been linked to decreased levels of the TGFalpha-like molecule Gurken, which controls normal eggshell patterning. We used this easy-to-score eggshell phenotype in a germ-line mosaic screen in Drosophila to identify new genes affecting meiotic progression, DNA condensation, and Gurken signaling. One hundred eighteen new ventralizing mutants on the second chromosome fell into 17 complementation groups. Here we describe the analysis of 8 complementation groups, including Kinesin heavy chain, the SR protein kinase cuaba, the cohesin-related gene dPds5/cohiba, and the Tudor-domain gene montecristo. Our findings challenge the hypothesis that checkpoint activation upon persistent DSBs is exclusively mediated by ATR/Mei-41 kinase and instead reveal a more complex network of interactions that link DSB formation, checkpoint activation, meiotic delay, DNA condensation, and Gurken protein synthesis. [Abstract/Link to Full Text]

McVey M, Andersen SL, Broze Y, Sekelsky J
Multiple functions of Drosophila BLM helicase in maintenance of genome stability.
Bloom Syndrome, a rare human disorder characterized by genomic instability and predisposition to cancer, is caused by mutation of BLM, which encodes a RecQ-family DNA helicase. The Drosophila melanogaster ortholog of BLM, DmBlm, is encoded by mus309. Mutations in mus309 cause hypersensitivity to DNA-damaging agents, female sterility, and defects in repairing double-strand breaks (DSBs). To better understand these phenotypes, we isolated novel mus309 alleles. Mutations that delete the N terminus of DmBlm, but not the helicase domain, have DSB repair defects as severe as those caused by null mutations. We found that female sterility is due to a requirement for DmBlm in early embryonic cell cycles; embryos lacking maternally derived DmBlm have anaphase bridges and other mitotic defects. These defects were less severe for the N-terminal deletion alleles, so we used one of these mutations to assay meiotic recombination. Crossovers were decreased to about half the normal rate, and the remaining crossovers were evenly distributed along the chromosome. We also found that spontaneous mitotic crossovers are increased by several orders of magnitude in mus309 mutants. These results demonstrate that DmBlm functions in multiple cellular contexts to promote genome stability. [Abstract/Link to Full Text]

Bowles EJ, Lee JH, Alberio R, Lloyd RE, Stekel D, Campbell KH, St John JC
Contrasting effects of in vitro fertilization and nuclear transfer on the expression of mtDNA replication factors.
Mitochondrial DNA (mtDNA) is normally only inherited through the oocyte. However, nuclear transfer (NT), the fusion of a donor cell with an enucleated oocyte, can transmit both donor cell and recipient oocyte mtDNA. mtDNA replication is under the control of nuclear-encoded replication factors, such as polymerase gamma (POLG) and mitochondrial transcription factor A (TFAM). These are first expressed during late preimplantation embryo development. To account for the persistence of donor cell mtDNA, even when introduced at residual levels (mtDNA(R)), we hypothesized that POLG and TFAM would be upregulated in intra- and interspecific (ovine-ovine) and intergeneric (caprine-ovine) NT embryos when compared to in vitro fertilized (IVF) embryos. For the intra- and interspecific crosses, PolGA (catalytic subunit), PolGB (accessory subunit), and TFAM mRNA were expressed at the 2-cell stage in both nondepleted (mtDNA(+)) and mtDNA(R) embryos with protein being expressed up to the 16-cell stage for POLGA and TFAM. However, at the 16-cell stage, there was significantly more PolGA expression in the mtDNA(R) embryos compared to their mtDNA(+) counterparts. Expression for all three genes first matched IVF embryos at the blastocyst stage. In the intergeneric model, POLG was upregulated during preimplantation development. Although these embryos did not persist further than the 16+-cell stage, significantly more mtDNA(R) embryos reached this stage. However, the vast majority of these embryos were homoplasmic for recipient oocyte mtDNA. The upregulation in mtDNA replication factors was most likely due to the donor cells still expressing these factors prior to NT. [Abstract/Link to Full Text]

Brandström M, Ellegren H
The genomic landscape of short insertion and deletion polymorphisms in the chicken (Gallus gallus) genome: a high frequency of deletions in tandem duplicates.
It is increasingly recognized that insertions and deletions (indels) are an important source of genetic as well as phenotypic divergence and diversity. We analyzed length polymorphisms identified through partial (0.25x) shotgun sequencing of three breeds of domestic chicken made by the International Chicken Polymorphism Map Consortium. A data set of 140,484 short indel polymorphisms in unique DNA was identified after filtering for microsatellite structures. There was a significant excess of tandem duplicates at indel sites, with deletions of a duplicate motif outnumbering the generation of duplicates through insertion. Indel density was lower in microchromosomes than in macrochromosomes, in the Z chromosome than in autosomes, and in 100 bp of upstream sequence, 5'-UTR, and first introns than in intergenic DNA and in other introns. Indel density was highly correlated with single nucleotide polymorphism (SNP) density. The mean density of indels in pairwise sequence comparisons was 1.9 x 10(-4) indel events/bp, approximately 5% the density of SNPs segregating in the chicken genome. The great majority of indels involved a limited number of nucleotides (median 1 bp), with A-rich motifs being overrepresented at indel sites. The overrepresentation of deletions at tandem duplicates indicates that replication slippage in duplicate sequences is a common mechanism behind indel mutation. The correlation between indel and SNP density indicates common effects of mutation and/or selection on the occurrence of indels and point mutations. [Abstract/Link to Full Text]
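As a quick arithmetic check on the densities quoted in the abstract above, the two reported figures (1.9 x 10(-4) indel events/bp, and indels at approximately 5% of SNP density) imply a SNP density and an average spacing between indels. The derived values below are back-of-the-envelope numbers, not figures stated by the authors, and the variable names are illustrative.

```python
# Figures taken from the abstract.
indel_density = 1.9e-4        # indel events per bp in pairwise comparisons
indel_to_snp_ratio = 0.05     # indel density is approximately 5% of SNP density

# Derived quantities (approximate, since the 5% figure is itself approximate).
implied_snp_density = indel_density / indel_to_snp_ratio  # SNPs per bp
bp_per_indel = 1.0 / indel_density                        # average spacing in bp

print(f"implied SNP density: {implied_snp_density:.1e} per bp")
print(f"roughly one indel every {bp_per_indel:.0f} bp")
```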

Yi N, Banerjee S, Pomp D, Yandell BS
Bayesian mapping of genomewide interacting quantitative trait loci for ordinal traits.
Development of statistical methods and software for mapping interacting QTL has been the focus of much recent research. We previously developed a Bayesian model selection framework, based on the composite model space approach, for mapping multiple epistatic QTL affecting continuous traits. In this study we extend the composite model space approach to complex ordinal traits in experimental crosses. We jointly model main and epistatic effects of QTL and environmental factors on the basis of the ordinal probit model (also called threshold model) that assumes a latent continuous trait underlies the generation of the ordinal phenotypes through a set of unknown thresholds. A data augmentation approach is developed to jointly generate the latent data and the thresholds. The proposed ordinal probit model, combined with the composite model space framework for continuous traits, offers a convenient way for genomewide interacting QTL analysis of ordinal traits. We illustrate the proposed method by detecting new QTL and epistatic effects for an ordinal trait, dead fetuses, in an F(2) intercross of mice. Utility and flexibility of the method are also demonstrated using a simulated data set. Our method has been implemented in the freely available package R/qtlbim, which greatly facilitates the general usage of the Bayesian methodology for genomewide interacting QTL analysis for continuous, binary, and ordinal traits in experimental crosses. [Abstract/Link to Full Text]
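The threshold (ordinal probit) model described in the abstract above can be sketched in a few lines. This is a generic textbook illustration, not the R/qtlbim implementation: a latent value with unit-variance normal noise around a linear predictor `mu` is cut into ordinal categories by a sorted list of thresholds. The function names, the threshold values, and the convention that a latent value exactly on a threshold falls into the higher category are all assumptions made for this example.

```python
import math
from bisect import bisect_right


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ordinal_category(latent, thresholds):
    """Map a latent continuous value to a category 0..len(thresholds)."""
    # Counts how many sorted thresholds lie at or below the latent value.
    return bisect_right(thresholds, latent)


def category_probabilities(mu, thresholds):
    """P(category k) when latent ~ Normal(mu, 1) and `thresholds` is sorted."""
    # CDF evaluated at each cut point, padded with 0 and 1 for the open ends.
    cdf = [0.0] + [normal_cdf(t - mu) for t in thresholds] + [1.0]
    return [cdf[k + 1] - cdf[k] for k in range(len(cdf) - 1)]


# Hypothetical example: two thresholds define three ordered categories.
thresholds = [0.0, 1.0]
probs = category_probabilities(mu=0.0, thresholds=thresholds)
```

In a Bayesian fit of the kind the abstract describes, both the latent values and the thresholds are unknown and are sampled jointly by data augmentation; the sketch only shows the forward direction from latent value to observed category.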

Maside X, Charlesworth B
Patterns of molecular variation and evolution in Drosophila americana and its relatives.
We present the results of a survey of DNA sequence variability at X-linked and autosomal loci in Drosophila americana and of patterns of DNA sequence evolution among D. americana and four other related species in the virilis group of Drosophila. D. americana shows a typical level of silent polymorphism for a Drosophila species, but has an unusually low ratio of nonsynonymous to silent variation. Both D. virilis and D. americana also show a low ratio of nonsynonymous to synonymous substitutions along their respective lineages since the split from their common ancestor. The proportion of amino acid substitutions between D. americana and its relatives that are caused by positive selection, as estimated by extensions of the McDonald-Kreitman test, appears to be unusually high. We cannot, however, exclude the possibility that this reflects a recent increase in the intensity of selection on nonsynonymous mutations in D. americana and D. virilis. We also find that base composition at neutral sites appears to be in overall equilibrium among these species, but there is evidence for departure from equilibrium for codon usage in some lineages. [Abstract/Link to Full Text]

Mariño-Ramírez L, Jordan IK, Landsman D
Multiple independent evolutionary solutions to core histone gene regulation.
BACKGROUND: Core histone genes are periodically expressed along the cell cycle and peak during S phase. Core histone gene expression is deeply evolutionarily conserved from the yeast Saccharomyces cerevisiae to human. RESULTS: We evaluated the evolutionary dynamics of the specific regulatory mechanisms that give rise to the conserved histone regulatory phenotype. In contrast to the conservation of core histone gene expression patterns, the core histone regulatory machinery is highly divergent between species. There has been substantial evolutionary turnover of cis-regulatory sequence motifs along with the transcription factors that bind them. The regulatory mechanisms employed by members of the four core histone families are more similar within species than within gene families. The presence of species-specific histone regulatory mechanisms is opposite to what is seen at the protein sequence level. Core histone proteins are more similar within families, irrespective of their species of origin, than between families, which is consistent with the shared common ancestry of the members of individual histone families. Structure and sequence comparisons between histone families reveal that H2A and H2B form one related group whereas H3 and H4 form a distinct group, which is consistent with the nucleosome assembly dynamics. CONCLUSION: The dissonance between the evolutionary conservation of the core histone gene regulatory phenotypes and the divergence of their regulatory mechanisms indicates a highly dynamic mode of regulatory evolution. This distinct mode of regulatory evolution is probably facilitated by a solution space for promoter sequences, in terms of functionally viable cis-regulatory sites, that is substantially greater than that of protein sequences. [Abstract/Link to Full Text]

Moon H, Ahn H, Kodell RL, Lin CJ, Baek S, Chen JJ
Classification methods for the development of genomic signatures from high-dimensional data.
Personalized medicine is defined by the use of genomic signatures of patients to assign effective therapies. We present Classification by Ensembles from Random Partitions (CERP) for class prediction and apply CERP to genomic data on leukemia patients and to genomic data with several clinical variables on breast cancer patients. CERP performs consistently well compared to the other classification algorithms. The predictive accuracy can be improved by adding some relevant clinical/histopathological measurements to the genomic data. [Abstract/Link to Full Text]

Sironi M, Menozzi G, Comi GP, Cereda M, Cagliani R, Bresolin N, Pozzoli U
Gene function and expression level influence the insertion/fixation dynamics of distinct transposon families in mammalian introns.
BACKGROUND: Transposable elements (TEs) represent more than 45% of the human and mouse genomes. Both parasitic and mutualistic features have been shown to apply to the host-TE relationship but a comprehensive scenario of the forces driving TE fixation within mammalian genes is still missing. RESULTS: We show that intronic multispecies conserved sequences (MCSs) have been affecting TE integration frequency over time. We verify that a selective economizing pressure has been acting on TEs to decrease their frequency in highly expressed genes. After correcting for GC content, MCS density and intron size, we identified TE-enriched and TE-depleted gene categories. In addition to developmental regulators and transcription factors, TE-depleted regions encompass loci that might require subtle regulation of transcript levels or precise activation timing, such as growth factors, cytokines, hormones, and genes involved in the immune response. The latter, despite having reduced frequencies of most TE types, are significantly enriched in mammalian-wide interspersed repeats (MIRs). Analysis of orthologous genes indicated that MIR over-representation also occurs in dog and opossum immune response genes, suggesting, given the partially independent origin of MIR sequences in eutheria and metatheria, the evolutionary conservation of a specific function for MIRs located in these loci. Consistently, the core MIR sequence is over-represented in defense response genes compared to the background intronic frequency. CONCLUSION: Our data indicate that gene function, expression level, and sequence conservation influence TE insertion/fixation in mammalian introns. Moreover, we provide the first report showing that a specific TE family is evolutionarily associated with a gene function category. [Abstract/Link to Full Text]

Mar JC, Rubio R, Quackenbush J
Inferring steady state single-cell gene expression distributions from analysis of mesoscopic samples.
BACKGROUND: A great deal of interest has been generated by systems biology approaches that attempt to develop quantitative, predictive models of cellular processes. However, the starting point for all cellular gene expression, the transcription of RNA, has not been described and measured in a population of living cells. RESULTS: Here we present a simple model for transcript levels based on Poisson statistics and provide supporting experimental evidence for genes known to be expressed at high, moderate, and low levels. CONCLUSION: Although the model describes a microscopic process occurring at the level of an individual cell, the supporting data we provide uses a small number of cells where the echoes of the underlying stochastic processes can be seen. Not only do these data confirm our model, but this general strategy opens up a potential new approach, Mesoscopic Biology, that can be used to assess the natural variability of processes occurring at the cellular level in biological systems. [Abstract/Link to Full Text]
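The abstract does not give the details of its Poisson model. As a minimal sketch under the assumption that each cell carries an independent Poisson-distributed transcript count with the same mean, the code below shows why small pools of cells still reveal the underlying noise: the coefficient of variation of the pooled count falls as one over the square root of the expected total. The function names and the mean of 4 transcripts per cell are illustrative assumptions.

```python
import math


def poisson_pmf(k, mean):
    """Probability of observing exactly k transcripts given a Poisson mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def pooled_cv(mean_per_cell, n_cells):
    """Coefficient of variation of the total count from n independent Poisson cells.

    The sum of independent Poisson variables is Poisson with mean n * mean_per_cell,
    so the variance equals the mean and CV = 1 / sqrt(n * mean_per_cell).
    """
    return 1.0 / math.sqrt(mean_per_cell * n_cells)


# Relative noise shrinks as more cells are pooled (assumed mean: 4 transcripts per cell).
for n_cells in (1, 10, 100):
    print(n_cells, round(pooled_cv(4.0, n_cells), 3))
```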

Kawaji H, Frith MC, Katayama S, Sandelin A, Kai C, Kawai J, Carninci P, Hayashizaki Y
Dynamic usage of transcription start sites within core promoters.
BACKGROUND: Mammalian promoters do not initiate transcription at single, well defined base pairs, but rather at multiple, alternative start sites spread across a region. We previously characterized the static structures of transcription start site usage within promoters at the base pair level, based on large-scale sequencing of transcript 5' ends. RESULTS: In the present study we begin to explore the internal dynamics of mammalian promoters, and demonstrate that start site selection within many mouse core promoters varies among tissues. We also show that this dynamic usage of start sites is associated with CpG islands, broad and multimodal promoter structures, and imprinting. CONCLUSION: Our results reveal a new level of biologic complexity within promoters--fine-scale regulation of transcription starting events at the base pair level. These events are likely to be related to epigenetic transcriptional regulation. [Abstract/Link to Full Text]

Roepman P, de Koning E, van Leenen D, de Weger RA, Kummer JA, Slootweg PJ, Holstege FC
Dissection of a metastatic gene expression signature into distinct components.
BACKGROUND: Metastasis, the process whereby cancer cells spread, is in part caused by an incompletely understood interplay between cancer cells and the surrounding stroma. Gene expression studies typically analyze samples containing tumor cells and stroma. Samples with less than 50% tumor cells are generally excluded, thereby reducing the number of patients that can benefit from clinically relevant signatures. RESULTS: For a head-neck squamous cell carcinoma (HNSCC) primary tumor expression signature that predicts the presence of lymph node metastasis, we first show that reduced proportions of tumor cells result in decreased predictive accuracy. To determine the influence of stroma on the predictive signature and to investigate the interaction between tumor cells and the surrounding microenvironment, we used laser capture microdissection to divide the metastatic signature into six distinct components based on tumor versus stroma expression and on association with the metastatic phenotype. A strikingly skewed distribution of metastasis-associated genes is revealed. CONCLUSION: Dissection of predictive signatures into different components has implications for design of expression signatures and for our understanding of the metastatic process. Compared to primary tumors that have not formed metastases, primary HNSCC tumors that have metastasized are characterized by predominant down-regulation of tumor cell specific genes and exclusive up-regulation of stromal cell specific genes. The skewed distribution agrees with poor signature performance on samples that contain less than 50% tumor cells. Methods for reducing tumor composition bias that lead to greater predictive accuracy and an increase in the types of samples that can be included are presented. [Abstract/Link to Full Text]

Taniguchi Y, Takeda S, Furutani-Seiki M, Kamei Y, Todo T, Sasado T, Deguchi T, Kondoh H, Mudde J, Yamazoe M, Hidaka M, Mitani H, Toyoda A, Sakaki Y, Plasterk RH, Cuppen E
Generation of medaka gene knockout models by target-selected mutagenesis.
We have established a reverse genetics approach for the routine generation of medaka (Oryzias latipes) gene knockouts. A cryopreserved library of N-ethyl-N-nitrosourea (ENU) mutagenized fish was screened by high-throughput resequencing for induced point mutations. Nonsense and splice site mutations were retrieved for the Blm, Sirt1, Parkin and p53 genes and functional characterization of p53 mutants indicated a complete knockout of p53 function. The current cryopreserved resource is expected to contain knockouts for most medaka genes. [Abstract/Link to Full Text]

Ruiz-Herrera A, Castresana J, Robinson TJ
Is mammalian chromosomal evolution driven by regions of genome fragility?
BACKGROUND: A fundamental question in comparative genomics concerns the identification of mechanisms that underpin chromosomal change. In an attempt to shed light on the dynamics of mammalian genome evolution, we analyzed the distribution of syntenic blocks, evolutionary breakpoint regions, and evolutionary breakpoints taken from public databases available for seven eutherian species (mouse, rat, cattle, dog, pig, cat, and horse) and the chicken, and examined these for correspondence with human fragile sites and tandem repeats. RESULTS: Our results confirm previous investigations that showed the presence of chromosomal regions in the human genome that have been repeatedly used as illustrated by a high breakpoint accumulation in certain chromosomes and chromosomal bands. We show, however, that there is a striking correspondence between fragile site location, the positions of evolutionary breakpoints, and the distribution of tandem repeats throughout the human genome, which similarly reflect a non-uniform pattern of occurrence. CONCLUSION: These observations provide further evidence that certain chromosomal regions in the human genome have been repeatedly used in the evolutionary process. As a consequence, the genome is a composite of fragile regions prone to reorganization that have been conserved in different lineages, and genomic tracts that do not exhibit the same levels of evolutionary plasticity. [Abstract/Link to Full Text]

Willenbrock H, Friis C, Juncker AS, Ussery DW
An environmental signature for 323 microbial genomes based on codon adaptation indices.
BACKGROUND: Codon adaptation indices (CAIs) represent an evolutionary strategy to modulate gene expression and have widely been used to predict potentially highly expressed genes within microbial genomes. Here, we evaluate and compare two very different methods for estimating CAI values, one corresponding to translational codon usage bias and the second obtained mathematically by searching for the most dominant codon bias. RESULTS: The level of correlation between these two CAI methods is a simple and intuitive measure of the degree of translational bias in an organism, and from this we confirm that fast replicating bacteria are more likely to have a dominant translational codon usage bias than are slow replicating bacteria, and that this translational codon usage bias may be used for prediction of highly expressed genes. By analyzing more than 300 bacterial genomes, as well as five fungal genomes, we show that codon usage preference provides an environmental signature by which it is possible to group bacteria according to their lifestyle, for instance soil bacteria and soil symbionts, spore formers, enteric bacteria, aquatic bacteria, and intercellular and extracellular pathogens. CONCLUSION: The results and the approach described here may be used to acquire new knowledge regarding species lifestyle and to elucidate relationships between organisms that are far apart evolutionarily. [Abstract/Link to Full Text]
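For readers unfamiliar with the index, the sketch below computes a codon adaptation index in the classical Sharp and Li style: each codon gets a relative adaptiveness w (its count in a reference gene set divided by the count of the most used synonymous codon), and the CAI of a gene is the geometric mean of w over its codons. This is only an illustration and may not match either of the two estimation methods compared in the study. The function names and toy sequences are hypothetical, and codons absent from the reference set are simply skipped rather than given a pseudocount.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codons(seq):
    """Split an in-frame coding sequence into complete codons."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def relative_adaptiveness(reference_seqs):
    """Compute w for every codon observed in the reference (highly expressed) genes."""
    counts = Counter(c for seq in reference_seqs for c in codons(seq))
    weights = {}
    for codon, aa in CODON_TABLE.items():
        family = [c for c, a in CODON_TABLE.items() if a == aa]
        best = max(counts[c] for c in family)
        if counts[codon] > 0:
            weights[codon] = counts[codon] / best
    return weights


def cai(seq, weights):
    """Geometric mean of w over codons, ignoring Met, Trp, stops and unseen codons."""
    logs = [math.log(weights[c]) for c in codons(seq)
            if c in weights and CODON_TABLE[c] not in "MW*"]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))


# Toy reference set: alanine is encoded by GCT three times and GCC once.
w = relative_adaptiveness(["GCTGCTGCTGCC"])
print(round(cai("GCTGCC", w), 3))
```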

Gama-Carvalho M, Barbosa-Morais NL, Brodsky AS, Silver PA, Carmo-Fonseca M
Genome-wide identification of functionally distinct subsets of cellular mRNAs associated with two nucleocytoplasmic-shuttling mammalian splicing factors.
BACKGROUND: Pre-mRNA splicing is an essential step in gene expression that occurs co-transcriptionally in the cell nucleus, involving a large number of RNA binding protein splicing factors, in addition to core spliceosome components. Several of these proteins are required for the recognition of intronic sequence elements, transiently associating with the primary transcript during splicing. Some protein splicing factors, such as the U2 small nuclear RNP auxiliary factor (U2AF), are known to be exported to the cytoplasm, despite being implicated solely in nuclear functions. This observation raises the question of whether U2AF associates with mature mRNA-ribonucleoprotein particles in transit to the cytoplasm, participating in additional cellular functions. RESULTS: Here we report the identification of RNAs immunoprecipitated by a monoclonal antibody specific for the U2AF 65 kDa subunit (U2AF65) and demonstrate its association with spliced mRNAs. For comparison, we analyzed mRNAs associated with the polypyrimidine tract binding protein (PTB), a splicing factor that also binds to intronic pyrimidine-rich sequences but additionally participates in mRNA localization, stability, and translation. Our results show that 10% of cellular mRNAs expressed in HeLa cells associate differentially with U2AF65 and PTB. Among U2AF65-associated mRNAs there is a predominance of transcription factors and cell cycle regulators, whereas PTB-associated transcripts are enriched in mRNA species that encode proteins implicated in intracellular transport, vesicle trafficking, and apoptosis. CONCLUSION: Our results show that U2AF65 associates with specific subsets of spliced mRNAs, strongly suggesting that it is involved in novel cellular functions in addition to splicing. [Abstract/Link to Full Text]

Bergman CM, Quesneville H, Anxolabéhère D, Ashburner M
Recurrent insertion and duplication generate networks of transposable element sequences in the Drosophila melanogaster genome.
BACKGROUND: The recent availability of genome sequences has provided unparalleled insights into the broad-scale patterns of transposable element (TE) sequences in eukaryotic genomes. Nevertheless, the difficulties that TEs pose for genome assembly and annotation have prevented detailed, quantitative inferences about the contribution of TEs to genome sequences. RESULTS: Using a high-resolution annotation of TEs in Release 4 genome sequence, we revise estimates of TE abundance in Drosophila melanogaster. We show that TEs are non-randomly distributed within regions of high and low TE abundance, and that pericentromeric regions with high TE abundance are mosaics of distinct regions of extreme and normal TE density. Comparative analysis revealed that this punctate pattern evolves jointly by transposition and duplication, but not by inversion of TE-rich regions from unsequenced heterochromatin. Analysis of genome-wide patterns of TE nesting revealed a 'nesting network' that includes virtually all of the known TE families in the genome. Numerous directed cycles exist among TE families in the nesting network, implying concurrent or overlapping periods of transpositional activity. CONCLUSION: Rapid restructuring of the genomic landscape by transposition and duplication has recently added hundreds of kilobases of TE sequence to pericentromeric regions in D. melanogaster. These events create ragged transitions between unique and repetitive sequences in the zone between euchromatic and beta-heterochromatic regions. Complex relationships of TE nesting in beta-heterochromatic regions raise the possibility of a co-suppression network that may act as a global surveillance system against the majority of TE families in D. melanogaster. [Abstract/Link to Full Text]

Schlueter SD, Wilkerson MD, Dong Q, Brendel V
xGDB: open-source computational infrastructure for the integrated evaluation and analysis of genome features.
The eXtensible Genome Data Broker (xGDB) provides a software infrastructure consisting of integrated tools for the storage, display, and analysis of genome features in their genomic context. Common features include gene structure annotations, spliced alignments, mapping of repetitive sequence, and microarray probes, but the software supports inclusion of any property that can be associated with a genomic location. The xGDB distribution and user support utilities are available online at the xGDB project website. [Abstract/Link to Full Text]

Zhu X, Gerstein M, Snyder M
ProCAT: a data analysis approach for protein microarrays.
Protein microarrays provide a versatile method for the analysis of many protein biochemical activities. Existing DNA microarray analytical methods do not translate to protein microarrays due to differences between the technologies. Here we report a new approach, ProCAT, which corrects for background bias and spatial artifacts, identifies significant signals, filters nonspecific spots, and normalizes the resulting signal to protein abundance. ProCAT provides a powerful and flexible new approach for analyzing many types of protein microarrays. [Abstract/Link to Full Text]

Freimoser FM, Hürlimann HC, Jakob CA, Werner TP, Amrhein N
Systematic screening of polyphosphate (poly P) levels in yeast mutant cells reveals strong interdependence with primary metabolism.
BACKGROUND: Inorganic polyphosphate (poly P) occurs universally in all organisms from bacteria to man. It functions, for example, as a phosphate and energy store, and is involved in the activation and regulation of proteins. Despite its ubiquitous occurrence and important functions, it is unclear how poly P is synthesized or how poly P metabolism is regulated in higher eukaryotes. This work describes a systematic analysis of poly P levels in yeast knockout strains mutated in almost every non-essential gene. RESULTS: After three consecutive screens, 255 genes (almost 4% of the yeast genome) were found to be involved in the maintenance of normal poly P content. Many of these genes encoded proteins functioning in the cytoplasm, the vacuole or in transport and transcription. Besides reduced poly P content, many strains also exhibited reduced total phosphate content, showed altered ATP and glycogen levels and were disturbed in the secretion of acid phosphatase. CONCLUSION: Cellular energy and phosphate homeostasis is suggested to result from the equilibrium between poly P, ATP and free phosphate within the cell. Poly P serves as a buffer for both ATP and free phosphate levels and is, therefore, the least essential and consequently most variable component in this network. However, strains with reduced poly P levels are not only affected in their ATP and phosphate content, but also in other components that depend on ATP or free phosphate content, such as glycogen or secreted phosphatase activity. [Abstract/Link to Full Text]

David H, Hofmann G, Oliveira AP, Jarmer H, Nielsen J
Metabolic network driven analysis of genome-wide transcription data from Aspergillus nidulans.
BACKGROUND: Aspergillus nidulans (the asexual form of Emericella nidulans) is a model organism for aspergilli, which are an important group of filamentous fungi that encompasses human and plant pathogens as well as industrial cell factories. Aspergilli have a highly diversified metabolism and, because of their medical, agricultural and biotechnological importance, it would be valuable to have an understanding of how their metabolism is regulated. We therefore conducted a genome-wide transcription analysis of A. nidulans grown on three different carbon sources (glucose, glycerol, and ethanol) with the objective of identifying global regulatory structures. Furthermore, we reconstructed the complete metabolic network of this organism, which resulted in linking 666 genes to metabolic functions, as well as assigning metabolic roles to 472 genes that were previously uncharacterized. RESULTS: Through combination of the reconstructed metabolic network and the transcription data, we identified subnetwork structures that pointed to coordinated regulation of genes that are involved in many different parts of the metabolism. Thus, for a shift from glucose to ethanol, we identified coordinated regulation of the complete pathway for oxidation of ethanol, as well as upregulation of gluconeogenesis and downregulation of glycolysis and the pentose phosphate pathway. Furthermore, on change in carbon source from glucose to ethanol, the cells shift from using the pentose phosphate pathway as the major source of NADPH (nicotinamide adenine dinucleotide phosphate, reduced form) for biosynthesis to use of the malic enzyme. CONCLUSION: Our analysis indicates that some of the genes are regulated by common transcription factors, making it possible to establish new putative links between known transcription factors and genes through clustering. [Abstract/Link to Full Text]

Regenberg B, Grotkjaer T, Winther O, Fausbøll A, Akesson M, Bro C, Hansen LK, Brunak S, Nielsen J
Growth-rate regulated genes have profound impact on interpretation of transcriptome profiling in Saccharomyces cerevisiae.
BACKGROUND: Growth rate is central to the development of cells in all organisms. However, little is known about the impact of changing growth rates. We used continuous cultures to control growth rate and studied the transcriptional program of the model eukaryote Saccharomyces cerevisiae, with generation times varying between 2 and 35 hours. RESULTS: A total of 5930 transcripts were identified at the different growth rates studied. Consensus clustering of these revealed that half of all yeast genes are affected by the specific growth rate, and that the changes are similar to those found when cells are exposed to different types of stress (>80% overlap). Genes with decreased transcript levels in response to faster growth are largely of unknown function (>50%) whereas genes with increased transcript levels are involved in macromolecular biosynthesis such as those that encode ribosomal proteins. This group also covers most targets of the transcriptional activator RAP1, which is also known to be involved in replication. A positive correlation between the location of replication origins and the location of growth-regulated genes suggests a role for replication in growth rate regulation. CONCLUSION: Our data show that the cellular growth rate has great influence on transcriptional regulation. This, in turn, implies that one should be cautious when comparing mutants with different growth rates. Our findings also indicate that much of the regulation is coordinated via the chromosomal location of the affected genes, which may be valuable information for the control of heterologous gene expression in metabolic engineering. [Abstract/Link to Full Text]

King NL, Deutsch EW, Ranish JA, Nesvizhskii AI, Eddes JS, Mallick P, Eng J, Desiere F, Flory M, Martin DB, Kim B, Lee H, Raught B, Aebersold R
Analysis of the Saccharomyces cerevisiae proteome with PeptideAtlas.
We present the Saccharomyces cerevisiae PeptideAtlas composed from 47 diverse experiments and 4.9 million tandem mass spectra. The observed peptides align to 61% of Saccharomyces Genome Database (SGD) open reading frames (ORFs), 49% of the uncharacterized SGD ORFs, 54% of S. cerevisiae ORFs with a Gene Ontology annotation of 'molecular function unknown', and 76% of ORFs with Gene names. We highlight the use of this resource for data mining, construction of high quality lists for targeted proteomics, validation of proteins, and software development. [Abstract/Link to Full Text]

Hadley D, Murphy T, Valladares O, Hannenhalli S, Ungar L, Kim J, Bućan M
Patterns of sequence conservation in presynaptic neural genes.
BACKGROUND: The neuronal synapse is a fundamental functional unit in the central nervous system of animals. Because synaptic function is evolutionarily conserved, we reasoned that functional sequences of genes and related genomic elements known to play important roles in neurotransmitter release would also be conserved. RESULTS: Evolutionary rate analysis revealed that presynaptic proteins evolve slowly, although some members of large gene families exhibit accelerated evolutionary rates relative to other family members. Comparative sequence analysis of 46 megabases spanning 150 presynaptic genes identified more than 26,000 elements that are highly conserved in eight vertebrate species, as well as a small subset of sequences (6%) that are shared among unrelated presynaptic genes. Analysis of large gene families revealed that upstream and intronic regions of closely related family members are extremely divergent. We also identified 504 exceptionally long conserved elements (≥360 base pairs, ≥80% pair-wise identity between human and other mammals) in intergenic and intronic regions of presynaptic genes. Many of these elements form a highly stable stem-loop RNA structure and consequently are candidates for novel regulatory elements, whereas some conserved noncoding elements are shown to correlate with specific gene expression profiles. The SynapseDB online database integrates these findings and other functional genomic resources for synaptic genes. CONCLUSION: Highly conserved elements in nonprotein coding regions of 150 presynaptic genes represent sequences that may be involved in the transcriptional or post-transcriptional regulation of these genes. Furthermore, comparative sequence analysis will facilitate selection of genes and noncoding sequences for future functional studies and analysis of variation studies in neurodevelopmental and psychiatric disorders. [Abstract/Link to Full Text]

Guimarães KS, Jothi R, Zotenko E, Przytycka TM
Predicting domain-domain interactions using a parsimony approach.
We propose a novel approach to predict domain-domain interactions from a protein-protein interaction network. In our method we apply a parsimony-driven explanation of the network, where the domain interactions are inferred using linear programming optimization, and false positives in the protein network are handled by a probabilistic construction. This method outperforms previous approaches by a considerable margin. The results indicate that the parsimony principle provides a correct approach for detecting domain-domain contacts. [Abstract/Link to Full Text]

Vandepoele K, Casneuf T, Van de Peer Y
Identification of novel regulatory modules in dicotyledonous plants using expression data and comparative genomics.
BACKGROUND: Transcriptional regulation plays an important role in the control of many biological processes. Transcription factor binding sites (TFBSs) are the functional elements that determine transcriptional activity and are organized into separable cis-regulatory modules, each defining the cooperation of several transcription factors required for a specific spatio-temporal expression pattern. Consequently, the discovery of novel TFBSs in promoter sequences is an important step to improve our understanding of gene regulation. RESULTS: Here, we applied a detection strategy that combines features of classic motif overrepresentation approaches in co-regulated genes with general comparative footprinting principles for the identification of biologically relevant regulatory elements and modules in Arabidopsis thaliana, a model system for plant biology. In total, we identified 80 TFBSs and 139 regulatory modules, most of which are novel, and primarily consist of two or three regulatory elements that could be linked to different important biological processes, such as protein biosynthesis, cell cycle control, photosynthesis and embryonic development. Moreover, studying the physical properties of some specific regulatory modules revealed that Arabidopsis promoters have a compact nature, with cooperative TFBSs located in close proximity of each other. CONCLUSION: These results create a starting point to unravel regulatory networks in plants and to study the regulation of biological processes from a systems biology point of view. [Abstract/Link to Full Text]

Wang LY, Snyder M, Gerstein M
BoCaTFBS: a boosted cascade learner to refine the binding sites suggested by ChIP-chip experiments.
Comprehensive mapping of transcription factor binding sites is essential in postgenomic biology. For this, we propose a mining approach combining noisy data from ChIP (chromatin immunoprecipitation)-chip experiments with known binding site patterns. Our method (BoCaTFBS) uses boosted cascades of classifiers for optimum efficiency, in which components are alternating decision trees; it exploits interpositional correlations; and it explicitly integrates massive negative information from ChIP-chip experiments. We applied BoCaTFBS within the ENCODE project and showed that it outperforms many traditional binding site identification methods (for instance, profiles). [Abstract/Link to Full Text]

Teschendorff AE, Naderi A, Barbosa-Morais NL, Pinder SE, Ellis IO, Aparicio S, Brenton JD, Caldas C
A consensus prognostic gene expression classifier for ER positive breast cancer.
BACKGROUND: A consensus prognostic gene expression classifier is still elusive in heterogeneous diseases such as breast cancer. RESULTS: Here we perform a combined analysis of three major breast cancer microarray data sets to home in on a universally valid prognostic molecular classifier in estrogen receptor (ER) positive tumors. Using a recently developed robust measure of prognostic separation, we further validate the prognostic classifier in three external independent cohorts, confirming the validity of our molecular classifier in a total of 877 ER positive samples. Furthermore, we find that molecular classifiers may not outperform classical prognostic indices but that they can be used in hybrid molecular-pathological classification schemes to improve prognostic separation. CONCLUSION: The prognostic molecular classifier presented here is the first to be valid in over 877 ER positive breast cancer samples and across three different microarray platforms. Larger multi-institutional studies will be needed to fully determine the added prognostic value of molecular classifiers when combined with standard prognostic factors. [Abstract/Link to Full Text]

Carpenter AE, Jones TR, Lamprecht MR, Clarke C, Kang IH, Friman O, Guertin DA, Chang JH, Lindquist RA, Moffat J, Golland P, Sabatini DM
CellProfiler: image analysis software for identifying and quantifying cell phenotypes.
Biologists can now prepare and image thousands of samples per day using automation, enabling chemical screens and functional genomics (for example, using RNA interference). Here we describe the first free, open-source system designed for flexible, high-throughput cell image analysis, CellProfiler. CellProfiler can address a variety of biological questions quantitatively, including standard assays (for example, cell count, size, per-cell protein levels) and complex morphological assays (for example, cell/organelle shape or subcellular patterns of DNA or protein staining). [Abstract/Link to Full Text]

Andersson AF, Lundgren M, Eriksson S, Rosenlund M, Bernander R, Nilsson P
Global analysis of mRNA stability in the archaeon Sulfolobus.
BACKGROUND: Transcript half-lives differ between organisms, and between groups of genes within the same organism. The mechanisms underlying these differences are not clear, nor are the biochemical properties that determine the stability of a transcript. To address these issues, genome-wide mRNA decay studies have been conducted in eukaryotes and bacteria. In contrast, relatively little is known about RNA stability in the third domain of life, Archaea. Here, we present a microarray-based analysis of mRNA half-lives in the hyperthermophilic crenarchaea Sulfolobus solfataricus and Sulfolobus acidocaldarius, constituting the first genome-wide study of RNA decay in archaea. RESULTS: The two transcriptomes displayed similar half-life distributions, with medians of about five minutes. Growth-related genes, such as those involved in transcription, translation and energy production, were over-represented among unstable transcripts, whereas uncharacterized genes were over-represented among the most stable. Half-life was negatively correlated with transcript abundance and, unlike the situation in other organisms, also negatively correlated with transcript length. CONCLUSION: The mRNA half-life distribution of Sulfolobus species is similar to those of much faster growing bacteria, contrasting with the earlier observation that median mRNA half-life is proportional to the minimal length of the cell cycle. Instead, short half-lives may be a general feature of prokaryotic transcriptomes, possibly related to the absence of a nucleus and/or more limited post-transcriptional regulatory mechanisms. The pattern of growth-related transcripts being among the least stable in Sulfolobus may also indicate that the short half-lives reflect a necessity to rapidly reprogram gene expression upon sudden changes in environmental conditions. [Abstract/Link to Full Text]
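The abstract does not describe how half-lives were derived from the microarray time courses. A common approach, sketched here purely as an assumption, is to treat decay after transcription arrest as first-order, fit a straight line to log abundance against time, and convert the decay constant k to a half-life via ln(2) / k. The function name and the sample time course are hypothetical.

```python
import math


def half_life(times, abundances):
    """Estimate mRNA half-life from a decay time course.

    Fits log(abundance) = intercept - k * time by ordinary least squares
    and returns ln(2) / k, in the same units as `times`.
    """
    if len(times) != len(abundances) or len(times) < 2:
        raise ValueError("need at least two paired measurements")
    logs = [math.log(a) for a in abundances]
    mean_t = sum(times) / len(times)
    mean_l = sum(logs) / len(logs)
    slope = (sum((t - mean_t) * (l - mean_l) for t, l in zip(times, logs))
             / sum((t - mean_t) ** 2 for t in times))
    if slope >= 0:
        raise ValueError("abundance does not decay over the time course")
    return -math.log(2) / slope


# Hypothetical time course (minutes) in which the signal halves every 5 minutes.
print(round(half_life([0, 5, 10, 15], [100.0, 50.0, 25.0, 12.5]), 2))
```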

Staub E, Mackowiak S, Vingron M
An inventory of yeast proteins associated with nucleolar and ribosomal components.
BACKGROUND: Although baker's yeast is a primary model organism for research on eukaryotic ribosome assembly and nucleoli, the list of its proteins that are functionally associated with nucleoli or ribosomes is still incomplete. We trained a naïve Bayesian classifier to predict novel proteins that are associated with yeast nucleoli or ribosomes based on parts lists of nucleoli in model organisms and large-scale protein interaction data sets. Phylogenetic profiling and gene expression analysis were carried out to shed light on evolutionary and regulatory aspects of nucleoli and ribosome assembly. RESULTS: We predict that, in addition to 439 known proteins, a further 62 yeast proteins are associated with components of the nucleolus or the ribosome. The complete set comprises a large core of archaeal-type proteins, several bacterial-type proteins, but mostly eukaryote-specific inventions. Expression of nucleolar and ribosomal genes tends to be strongly co-regulated compared to other yeast genes. CONCLUSION: The number of proteins associated with nucleolar or ribosomal components in yeast is at least 14% higher than known before. The nucleolus probably evolved from an archaeal-type ribosome maturation machinery by recruitment of several bacterial-type and mostly eukaryote-specific factors. Not only expression of ribosomal protein genes, but also expression of genes encoding the 90S processosome, are strongly co-regulated and both regulatory programs are distinct from each other. [Abstract/Link to Full Text]

Cheung TH, Kwan YL, Hamady M, Liu X
Unraveling transcriptional control and cis-regulatory codes using the software suite GeneACT.
Deciphering gene regulatory networks requires the systematic identification of functional cis-acting regulatory elements. We present a suite of web-based bioinformatics tools, called GeneACT, that can rapidly detect evolutionarily conserved transcription factor binding sites or microRNA target sites that are either unique or over-represented in differentially expressed genes from DNA microarray data. GeneACT provides graphic visualization and extraction of common regulatory sequence elements in the promoters and 3'-untranslated regions that are conserved across multiple mammalian species. [Abstract/Link to Full Text]

Moriyama EN, Strope PK, Opiyo SO, Chen Z, Jones AM
Mining the Arabidopsis thaliana genome for highly-divergent seven transmembrane receptors.
To identify divergent seven-transmembrane receptor (7TMR) candidates from the Arabidopsis thaliana genome, multiple protein classification methods were combined, including both alignment-based and alignment-free classifiers. This resolved problems in optimally training individual classifiers using limited and divergent samples, and increased stringency for candidate proteins. We identified 394 proteins as 7TMR candidates and highlighted 54 with corresponding expression patterns for further investigation. [Abstract/Link to Full Text]

Morozova TV, Anholt RR, Mackay TF
Transcriptional response to alcohol exposure in Drosophila melanogaster.
BACKGROUND: Alcoholism presents widespread social and human health problems. Alcohol sensitivity, the development of tolerance to alcohol and susceptibility to addiction vary in the population. Genetic factors that predispose to alcoholism remain largely unknown due to extensive genetic and environmental variation in human populations. Drosophila, however, allows studies on genetically identical individuals in controlled environments. Although addiction to alcohol has not been demonstrated in Drosophila, flies show responses to alcohol exposure that resemble human intoxication, including hyperactivity, loss of postural control, sedation, and exposure-dependent development of tolerance. RESULTS: We assessed whole-genome transcriptional responses following alcohol exposure and demonstrate immediate down-regulation of genes affecting olfaction, rapid upregulation of biotransformation enzymes and, concomitant with development of tolerance, altered transcription of transcriptional regulators, proteases and metabolic enzymes, including biotransformation enzymes and enzymes associated with fatty acid biosynthesis. Functional tests of P-element disrupted alleles corresponding to genes with altered transcription implicated 75% of these in the response to alcohol, two-thirds of which have human orthologues. CONCLUSION: Expression microarray analysis is an efficient method for identifying candidate genes affecting complex behavioral and physiological traits, including alcohol abuse. Drosophila provides a valuable genetic model for comparative genomic analysis, which can inform subsequent studies in human populations. Transcriptional analyses following alcohol exposure in Drosophila implicate biotransformation pathways, transcriptional regulators, proteolysis and enzymes that act as metabolic switches in the regulation of fatty acid metabolism as important targets for future studies of the physiological consequences of human alcohol abuse. [Abstract/Link to Full Text]

Zhang Y, Romero H, Salinas G, Gladyshev VN
Dynamic evolution of selenocysteine utilization in bacteria: a balance between selenoprotein loss and evolution of selenocysteine from redox active cysteine residues.
BACKGROUND: Selenocysteine (Sec) is co-translationally inserted into protein in response to UGA codons. It occurs in oxidoreductase active sites and often is catalytically superior to cysteine (Cys). However, Sec is used very selectively in proteins and organisms. The wide distribution of Sec and its restricted use have not been explained. RESULTS: We conducted comparative genomics and phylogenetic analyses to examine dynamics of Sec decoding in bacteria at both selenium utilization trait and selenoproteome levels. These searches revealed that 21.5% of sequenced bacteria utilize Sec, their selenoproteomes have 1 to 31 selenoproteins, and selenoprotein-rich organisms are mostly Deltaproteobacteria or Firmicutes/Clostridia. Evolutionary histories of selenoproteins suggest that Cys-to-Sec replacement is a general trend for most selenoproteins. In contrast, only a small number of Sec-to-Cys replacements were detected, and these were mostly restricted to formate dehydrogenase and selenophosphate synthetase families. In addition, specific selenoprotein gene losses were observed in many sister genomes. Thus, the Sec/Cys replacements were mostly unidirectional, and increased utilization of Sec by existing protein families was counterbalanced by loss of selenoprotein genes or entire selenoproteomes. Lateral transfers of the Sec trait were an additional factor, and we describe the first example of selenoprotein gene transfer between archaea and bacteria. Finally, oxygen requirement and optimal growth temperature were identified as environmental factors that correlate with changes in Sec utilization. CONCLUSION: Our data reveal a dynamic balance between selenoprotein origin and loss, and may account for the discrepancy between catalytic advantages provided by Sec and the observed low number of selenoprotein families and Sec-utilizing organisms. [Abstract/Link to Full Text]

Levine DM, Haynor DR, Castle JC, Stepaniants SB, Pellegrini M, Mao M, Johnson JM
Pathway and gene-set activation measurement from mRNA expression data: the tissue distribution of human pathways.
BACKGROUND: Interpretation of lists of genes or proteins with altered expression is a critical and time-consuming part of microarray and proteomics research, but relatively little attention has been paid to methods for extracting biological meaning from these output lists. One powerful approach is to examine the expression of predefined biological pathways and gene sets, such as metabolic and signaling pathways and macromolecular complexes. Although many methods for measuring pathway expression have been proposed, a systematic analysis of the performance of multiple methods over multiple independent data sets has not previously been reported. RESULTS: Five different measures of pathway expression were compared in an analysis of nine publicly available mRNA expression data sets. The relative sensitivity of the metrics varied greatly across data sets, and the biological pathways identified for each data set are also dependent on the choice of pathway activation metric. In addition, we show that removing incoherent pathways prior to analysis improves specificity. Finally, we create and analyze a public map of pathway expression in human tissues by gene-set analysis of a large compendium of human expression data. CONCLUSION: We show that both the detection sensitivity and identity of pathways significantly perturbed in a microarray experiment are highly dependent on the analysis methods used and how incoherent pathways are treated. Analysts should thus consider using multiple approaches to test the robustness of their biological interpretations. We also provide a comprehensive picture of the tissue distribution of human gene pathways and a useful public archive of human pathway expression data. [Abstract/Link to Full Text]
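The abstract does not name the five pathway expression measures that were compared. Purely as an illustration of what such a measure can look like, the sketch below scores a gene set per sample as the mean of per-gene z-scores. The function name `pathway_mean_z`, the data layout, and the example values are all assumptions and are not taken from the paper.

```python
import statistics


def pathway_mean_z(expression, pathway_genes):
    """Score one pathway per sample as the mean z-score of its member genes.

    expression: dict mapping gene name -> list of values, one per sample.
    pathway_genes: iterable of gene names belonging to the pathway.
    Genes missing from the data or with zero variance are skipped.
    This is one generic metric, not necessarily one used in the study.
    """
    members = [g for g in pathway_genes if g in expression]
    if not members:
        raise ValueError("no pathway genes found in the expression data")
    n_samples = len(expression[members[0]])
    totals = [0.0] * n_samples
    used = 0
    for gene in members:
        values = expression[gene]
        mu = statistics.mean(values)
        sd = statistics.pstdev(values)
        if sd == 0:
            continue  # a flat gene carries no information about activation
        used += 1
        for i, value in enumerate(values):
            totals[i] += (value - mu) / sd
    if used == 0:
        raise ValueError("all pathway genes have zero variance")
    return [total / used for total in totals]


# Hypothetical data: three samples, two informative genes, one flat gene.
example_expression = {"A": [1, 2, 3], "B": [2, 4, 6], "C": [5, 5, 5]}
example_scores = pathway_mean_z(example_expression, ["A", "B", "C", "D"])
```

Standardizing each gene before averaging keeps highly expressed genes from dominating the score, which is one reason different metrics can rank pathways differently.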

Kusnezow W, Syagailo YV, Rüffer S, Baudenstiel N, Gauer C, Hoheisel JD, Wild D, Goychuk I
Optimal design of microarray immunoassays to compensate for kinetic limitations: theory and experiment.
In this report we examine the limitations of existing microarray immunoassays and investigate how best to optimize them using theoretical and experimental approaches. Derived from DNA technology, microarray immunoassays present a major technological challenge with much greater physicochemical complexity. A key physicochemical limitation of the current generation of microarray immunoassays is a strong dependence of antibody microspot kinetics on the mass flux to the spot, as we reported previously. In this report we analyze, theoretically and experimentally, the effects of microarray design parameters (incubation vessel geometry, incubation time, stirring, spot size, antibody-binding site density, etc.) on microspot reaction kinetics and sensitivity. Using a two-compartment model, the quantitative descriptors of the microspot reaction were determined for different incubation and microarray design conditions. This analysis revealed profound mass transport limitations in the observed kinetics, which may be slowed down as much as hundreds of times compared with the solution kinetics. The data obtained were considered with relevance to microspot assay diffusional and adsorptive processes, enabling us to validate some of the underlying principles of the antibody microspot reaction mechanism and provide guidelines for optimal microspot immunoassay design. For an assay optimized to maximize the reaction velocity on a spot, we demonstrate sensitivities in the aM and low fM ranges for a system containing a representative sample of antigen-antibody pairs. In addition, a separate panel of low abundance cytokines in blood plasma was detected with remarkably high signal-to-noise ratios. [Abstract/Link to Full Text]

Madoz-Gúrpide J, López-Serra P, Martínez-Torrecuadrada JL, Sánchez L, Lombardía L, Casal JI
Proteomics-based validation of genomic data: applications in colorectal cancer diagnosis.
Multiple factors are involved in the translation of functional genomic results into proteins for proteome research and target validation on tumoral tissues. In this report, genes were selected by using DNA microarrays on a panel of colorectal cancer (CRC) paired samples. A large number of up-regulated genes in colorectal cancer patients were investigated for cellular location, and those corresponding to membrane or extracellular proteins were used for a non-biased expression in Escherichia coli. We investigated different sources of cDNA clones for protein expression as well as the influence of the protein size and the different tags with respect to protein expression levels and solubility in E. coli. From 29 selected genes, 21 distinct proteins were finally expressed as soluble proteins with, at least, one different fusion protein. In addition, seven of these potential markers (ANXA3, BMP4, LCN2, SPARC, SPP1, MMP7, and MMP11) were tested for antibody production and/or validation. Six of the seven proteins (all except SPP1) were confirmed to be overexpressed in colorectal tumoral tissues by using immunoblotting and tissue microarray analysis. Although none of them could be associated with early stages of the tumor, two of them (LCN2 and MMP11) were clearly overexpressed in late Dukes' stages (B and C). This proteomic study reveals novel clues for the assembly of a robust and highly efficient high throughput system for the validation of genomic data. Moreover, it illustrates the difficulties and bottlenecks encountered in quickly converting genomic results into clinically useful proteins. [Abstract/Link to Full Text]

Mayr M, Zhang J, Greene AS, Gutterman D, Perloff J, Ping P
Proteomics-based development of biomarkers in cardiovascular disease: mechanistic, clinical, and therapeutic insights.

Casiano CA, Mediavilla-Varela M, Tan EM
Tumor-associated antigen arrays for the serological diagnosis of cancer.
The recognition that human tumors stimulate the production of autoantibodies against autologous cellular proteins called tumor-associated antigens (TAAs) has opened the door to the possibility that autoantibodies could be exploited as serological tools for the early diagnosis and management of cancer. Cancer-associated autoantibodies are often driven by intracellular proteins that are mutated, modified, or aberrantly expressed in tumor cells and hence are regarded as immunological reporters that could help uncover molecular events underlying tumorigenesis. Emerging evidence suggests that each type of cancer might trigger unique autoantibody signatures that reflect the nature of the malignant process in the affected organ. The advent of novel genomic, proteomic, and high throughput approaches has accelerated interest in the serum autoantibody repertoire in human cancers for the discovery of candidate TAAs. The use of individual anti-TAA autoantibodies as diagnostic or prognostic tools has been tempered by their low frequency and heterogeneity in most human cancers. However, TAA arrays comprising several antigens significantly increase this frequency and hold great promise for the early detection of cancer, monitoring cancer progression, guiding individualized therapeutic interventions, and identification of novel therapeutic targets. Our recent studies suggest that the implementation of TAA arrays in screening programs for the diagnosis of prostate cancer and other cancers should be preceded by the optimization of their sensitivity and specificity through the careful selection of the most favorable combinations of TAAs. [Abstract/Link to Full Text]

Bertucci F, Birnbaum D, Goncalves A
Proteomics of breast cancer: principles and potential clinical applications.
Progress in screening, early diagnosis, prediction of aggressiveness and of therapeutic response or toxicity, and identification of new therapeutic targets will improve survival in breast cancer. This progress will likely be accelerated by new proteomic techniques. In this review, we describe the different techniques currently applied to clinical samples of breast cancer and the most important results obtained with the two most popular proteomic approaches in translational research (tissue microarrays and SELDI-TOF). [Abstract/Link to Full Text]

van den Bemd GJ, Krijgsveld J, Luider TM, van Rijswijk AL, Demmers JA, Jenster G
Mass spectrometric identification of human prostate cancer-derived proteins in serum of xenograft-bearing mice.
Lack of sensitivity and specificity of current tumor markers has intensified research efforts to find new biomarkers. The identification of potential tumor markers in human body fluids is hampered by large variability and complexity of both control and patient samples, laborious biochemical analyses, and the fact that the identified proteins are unlikely to be produced by the diseased cells but are due to secondary body defense mechanisms. In a new approach presented here, we eliminate these problems by performing proteomic analysis in a prostate cancer xenograft model in which human prostate cancer cells form a tumor in an immune-incompetent nude mouse. Using this concept, proteins present in mouse serum that can be identified as human will, by definition, originate from the human prostate cancer xenograft and might have potential diagnostic and prognostic value. Using one-dimensional gel electrophoresis, liquid chromatography, and mass spectrometry, we identified tumor-derived human nm23/nucleoside-diphosphate kinase (NME) in the serum of a nude mouse bearing the androgen-independent human prostate cancer xenograft PC339. NME is known to be involved in the metastatic potential of several tumor cells, including prostate cancer cells. Furthermore, we identified six human enzymes involved in glycolysis (fructose-bisphosphate aldolase A, triose-phosphate isomerase, glyceraldehyde-3-phosphate dehydrogenase, alpha enolase, and lactate dehydrogenases A and B) in the serum of the tumor-bearing mice. The presence of human NME and glyceraldehyde-3-phosphate dehydrogenase in the serum of PC339-bearing mice was confirmed by Western blotting. Although the putative usefulness of these proteins in predicting prognosis of prostate cancer remains to be determined, the present data illustrate that our approach is a promising tool for the focused discovery of new prostate cancer biomarkers. [Abstract/Link to Full Text]

Sleat DE, Wang Y, Sohar I, Lackland H, Li Y, Li H, Zheng H, Lobel P
Identification and validation of mannose 6-phosphate glycoproteins in human plasma reveal a wide range of lysosomal and non-lysosomal proteins.
Acid hydrolase activities are normally confined within the cell to the lysosome, a membrane-delimited cytoplasmic organelle primarily responsible for the degradation of macromolecules. However, lysosomal proteins are also present in human plasma, and a proportion of these retain mannose 6-phosphate (Man-6-P), a modification on N-linked glycans that is recognized by Man-6-P receptors (MPRs) that normally direct the targeting of these proteins to the lysosome. In this study, we purified the Man-6-P glycoforms of proteins from human plasma by affinity chromatography on immobilized MPRs and characterized this subproteome by two-dimensional gel electrophoresis and by tandem mass spectrometry. As expected, we identified many known and potential candidate lysosomal proteins. In addition, we also identified a number of abundant classical plasma proteins that were retained even after two consecutive rounds of affinity purification. Given their abundance in plasma, we initially considered these proteins to be likely contaminants, but a mass spectrometric study of Man-6-phosphorylation sites using MPR-purified glycopeptides revealed that some proportion of these classical plasma proteins contained the Man-6-P modification. We propose that these glycoproteins are phosphorylated at low levels by the lysosomal enzyme phosphotransferase, but their high abundance results in detection of Man-6-P glycoforms in plasma. These results may provide useful insights into the molecular processes underlying Man-6-phosphorylation and highlight circumstances under which the presence of Man-6-P may not be indicative of lysosomal function. In addition, characterization of the plasma Man-6-P glycoproteome should facilitate development of mass spectrometry-based tools for the diagnosis of lysosomal storage diseases and for investigating the involvement of Man-6-P-containing glycoproteins in more widespread human diseases and their potential utility as biomarkers. [Abstract/Link to Full Text]

McAfee KJ, Duncan DT, Assink M, Link AJ
Analyzing proteomes and protein function using graphical comparative analysis of tandem mass spectrometry results.
Although generating large amounts of proteomic data using tandem mass spectrometry has become routine, there is currently no single set of comprehensive tools for the rigorous analysis of tandem mass spectrometry results given the large variety of possible experimental aims. Currently available applications are typically designed for displaying proteins and posttranslational modifications from the point of view of the mass spectrometrist and are not versatile enough to allow investigators to develop biological models of protein function, protein structure, or cell state. In addition, storage and dissemination of mass spectrometry-based proteomic data are problems facing the scientific community. To address these issues, we have developed a relational database model that efficiently stores and manages large amounts of tandem mass spectrometry results. We have developed an integrated suite of multifunctional analysis software for interpreting, comparing, and displaying these results. Our system, Bioinformatic Graphical Comparative Analysis Tools (BIGCAT), allows sophisticated analysis of tandem mass spectrometry results in a biologically intuitive format and provides a solution to many data storage and dissemination issues. [Abstract/Link to Full Text]

Meistermann H, Norris JL, Aerni HR, Cornett DS, Friedlein A, Erskine AR, Augustin A, De Vera Mudry MC, Ruepp S, Suter L, Langen H, Caprioli RM, Ducret A
Biomarker discovery by imaging mass spectrometry: transthyretin is a biomarker for gentamicin-induced nephrotoxicity in rat.
Adverse drug effects are often associated with pathological changes in tissue. An accurate depiction of the undesired affected area, possibly supported by mechanistic data, is important to classify the effects with regard to relevance for human patients. MALDI imaging MS represents a new analytical tool to directly provide the spatial distribution and the relative abundance of proteins in tissue. Here we evaluate this technique to investigate potential toxicity biomarkers in kidneys of rats that were administered gentamicin, a well known nephrotoxicant. Differential analysis of the mass spectrum profiles revealed a spectral feature at 12,959 Da that strongly correlates with histopathology alterations of the kidney. We unambiguously identified this spectral feature as transthyretin (Ser(28)-Gln(146)) using an innovative combination of tissue microextraction and fractionation by reverse-phase liquid chromatography followed by a top-down tandem mass spectrometric approach. Our findings clearly demonstrate the emerging role of imaging MS in the discovery of toxicity biomarkers and in obtaining mechanistic insights concerning toxicity mechanisms. [Abstract/Link to Full Text]

Deeg CA, Pompetzki D, Raith AJ, Hauck SM, Amann B, Suppmann S, Goebel TW, Olazabal U, Gerhards H, Reese S, Stangassinger M, Kaspers B, Ueffing M
Identification and functional validation of novel autoantigens in equine uveitis.
The development, progression, and recurrence of autoimmune diseases are frequently driven by a group of participatory autoantigens. We identified and characterized novel autoantigens by analyzing the autoantibody binding pattern from horses affected by spontaneous equine recurrent uveitis to the retinal proteome. Cellular retinaldehyde-binding protein (cRALBP) had not been described previously as autoantigen, but subsequent characterization in equine recurrent uveitis horses revealed B and T cell autoreactivity to this protein and established a link to epitope spreading. We further immunized healthy rats and horses with cRALBP and observed uveitis in both species with typical tissue lesions at cRALBP expression sites. The autoantibody profiling outlined here could be used in various autoimmune diseases to detect autoantigens involved in the dynamic spreading cascade or serve as predictive markers. [Abstract/Link to Full Text]

Liu T, Qian WJ, Gritsenko MA, Xiao W, Moldawer LL, Kaushal A, Monroe ME, Varnum SM, Moore RJ, Purvine SO, Maier RV, Davis RW, Tompkins RG, Camp DG, Smith RD
High dynamic range characterization of the trauma patient plasma proteome.
Although human plasma represents an attractive sample for disease biomarker discovery, the extreme complexity and large dynamic range in protein concentrations present significant challenges for characterization, candidate biomarker discovery, and validation. Herein we describe a strategy that combines immunoaffinity subtraction and subsequent chemical fractionation based on cysteinyl peptide and N-glycopeptide captures with two-dimensional LC-MS/MS to increase the dynamic range of analysis for plasma. Application of this "divide-and-conquer" strategy to trauma patient plasma significantly improved the overall dynamic range of detection and resulted in confident identification of 22,267 unique peptides from four different peptide populations (cysteinyl peptides, non-cysteinyl peptides, N-glycopeptides, and non-glycopeptides) that covered 3,654 different proteins with 1,494 proteins identified by multiple peptides. Numerous low abundance proteins were identified, exemplified by 78 "classic" cytokines and cytokine receptors and by 136 human cell differentiation molecules. Additionally a total of 2,910 different N-glycopeptides that correspond to 662 N-glycoproteins and 1,553 N-glycosylation sites were identified. A panel of the proteins identified in this study is known to be involved in inflammation and immune responses. This study established an extensive reference protein database for trauma patients that provides a foundation for future high throughput quantitative plasma proteomic studies designed to elucidate the mechanisms that underlie systemic inflammatory responses. [Abstract/Link to Full Text]

Onder O, Yoon H, Naumann B, Hippler M, Dancis A, Daldal F
Modifications of the lipoamide-containing mitochondrial subproteome in a yeast mutant defective in cysteine desulfurase.
Comparison and identification of mitochondrial matrix proteins from wild-type and cysteine desulfurase-defective (nfs1-14, carrying a hypomorphic allele of NFS1) yeast strains, using two-dimensional gel electrophoresis coupled to mass spectrometry analyses, revealed large changes in the amounts of various proteins. Protein spots that were specifically increased in the nfs1-14 mutant included subunits of lipoamide-containing enzyme complexes: Kgd2, Lat1, and Gcv3, subunits of the mitochondrial alpha-ketoglutarate dehydrogenase, pyruvate dehydrogenase, and glycine cleavage system complexes, respectively. Moreover the increased protein spots corresponded to lipoamide-deficient forms in the nfs1-14 mutant. The increased proteins migrated as separate, cathode-shifted spots, consistent with gain of a lysine charge due to lack of lipoamide addition. Lack of lipoylation of these proteins was further validated using an antibody specific for lipoamide-containing proteins. In addition, this antibody revealed a fourth lipoamide-containing protein, probably corresponding to the E2 component of the branched-chain keto acid dehydrogenase complex. Like the lipoamide-containing forms of Kgd2, Lat1, and Gcv3, this protein also showed decreased lipoic acid reactivity in the nfs1-14 mutant. Cysteine desulfurases, such as yeast NFS1, are required for sulfur addition to iron-sulfur clusters and other sulfur-requiring processes. The results demonstrate that Nfs1 protein is required for the proper post-translational modification of the lipoamide-containing mitochondrial subproteome in yeast and pave the road toward a thorough understanding of its precise role in lipoic acid synthesis. [Abstract/Link to Full Text]

Adkins JN, Mottaz HM, Norbeck AD, Gustin JK, Rue J, Clauss TR, Purvine SO, Rodland KD, Heffron F, Smith RD
Analysis of the Salmonella typhimurium proteome through environmental response toward infectious conditions.
Salmonella enterica serovar Typhimurium (also known as Salmonella typhimurium) is a facultative intracellular pathogen that causes approximately 8,000 reported cases of acute gastroenteritis and diarrhea each year in the United States. Although many successful physiological, biochemical, and genetic approaches have been taken to determine the key virulence determinants encoded by this organism, the sheer number of uncharacterized reading frames observed within the S. enterica genome suggests that many more virulence factors remain to be discovered. We used a liquid chromatography-mass spectrometry-based "bottom-up" proteomic approach to generate a more complete picture of the gene products that S. typhimurium synthesizes under typical laboratory conditions as well as in culture media that are known to induce expression of virulence genes. When grown to logarithmic phase in rich medium, S. typhimurium is known to express many genes that are required for invasion of epithelial cells. Conversely, stationary phase cultures of S. typhimurium express genes that are needed for both systemic infection and growth within infected macrophages. Lastly, bacteria grown in an acidic, magnesium-depleted minimal medium (MgM) designed to mimic the phagocytic vacuole have been shown to up-regulate virulence gene expression. Initial comparisons of protein abundances from bacteria grown under each of these conditions indicated that the majority of proteins do not change significantly. However, we observed subsets of proteins whose expression was largely restricted to one of the three culture conditions. For example, cells grown in MgM had a higher abundance of Mg(2+) transport proteins than found in other growth conditions. A second more virulent S. typhimurium strain (14028) was also cultured under these same growth conditions, and the results were directly compared with those obtained for strain LT2. This comparison offered a unique opportunity to contrast protein populations in these closely related bacteria. Among a number of proteins displaying a higher abundance in strain 14028 were the products of the pdu operon, which encodes enzymes required for propanediol utilization. These pdu operon proteins were validated in culture and during macrophage infection. Our work provides further support for earlier observations that suggest pdu gene expression contributes to S. typhimurium pathogenesis. [Abstract/Link to Full Text]

Vincourt JB, Lionneton F, Kratassiouk G, Guillemin F, Netter P, Mainard D, Magdalou J
Establishment of a reliable method for direct proteome characterization of human articular cartilage.
Articular cartilage consists mainly of extracellular matrix, mostly made of collagens and proteoglycans. These macromolecules have so far impaired the detailed two-dimensional electrophoresis-based proteomic analysis of articular cartilage. Here we describe a method for selective protein extraction from cartilage, which excludes proteoglycans and collagen species, thus allowing direct profiling of the protein content of cartilage by two-dimensional electrophoresis. Consistent electrophoretic patterns of more than 600 protein states were reproducibly obtained after silver staining from 500 mg of human articular cartilage from joints with diverse pathologies. The extraction yield increased when the method was applied to a chondrosarcoma sample, consistent with selective extraction of cellular components. Nearly 200 of the most intensely stained protein spots were analyzed by MALDI-TOF mass spectrometry after trypsin digestion. They represented 127 different proteins with diverse functions. Our method provides a rapid, efficient, and pertinent alternative to previously proposed approaches for proteomic characterization of cartilage phenotypes. It will be useful for detecting protein expression patterns that relate pathophysiological processes of cartilaginous tissues such as osteoarthritis and chondrosarcoma. [Abstract/Link to Full Text]

Bradshaw RA, Burlingame AL, Carr S, Aebersold R
Reporting protein identification data: the next generation of guidelines.

Turkina MV, Kargul J, Blanco-Rivero A, Villarejo A, Barber J, Vener AV
Environmentally modulated phosphoproteome of photosynthetic membranes in the green alga Chlamydomonas reinhardtii.
Mapping of in vivo protein phosphorylation sites in photosynthetic membranes of the green alga Chlamydomonas reinhardtii revealed that the major environmentally dependent changes in phosphorylation are clustered at the interface between the photosystem II (PSII) core and its light-harvesting antennae (LHCII). The photosynthetic membranes were isolated from the algal cells exposed to four distinct environmental conditions affecting photosynthesis: (i) dark aerobic, corresponding to photosynthetic State 1; (ii) dark under nitrogen atmosphere, corresponding to photosynthetic State 2; (iii) moderate light; and (iv) high light. The surface-exposed phosphorylated peptides were cleaved from the membrane by trypsin, methyl-esterified, enriched by immobilized metal affinity chromatography, and sequenced by nanospray-quadrupole time-of-flight mass spectrometry. A total of 19 in vivo phosphorylation sites were mapped in the proteins corresponding to 15 genes in C. reinhardtii. Amino-terminal acetylation of seven proteins was concomitantly determined. Sequenced amino termini of six mature LHCII proteins differed from the predicted ones. The State 1-to-State 2 transition induced phosphorylation of the PSII core components D2 and PsbR and quadruple phosphorylation of a minor LHCII antennae subunit, CP29, as well as phosphorylation of constituents of a major LHCII complex, Lhcbm1 and Lhcbm10. Exposure of the algal cells to either moderate or high light caused additional phosphorylation of the D1 and CP43 proteins of the PSII core. The high light treatment led to specific hyperphosphorylation of CP29 at seven distinct residues, phosphorylation of another minor LHCII constituent, CP26, at a single threonine, and double phosphorylation of additional subunits of a major LHCII complex including Lhcbm4, Lhcbm6, Lhcbm9, and Lhcbm11. Environmentally induced protein phosphorylation at the interface of PSII core and the associated antenna proteins, particularly multiple differential phosphorylations of CP29 linker protein, suggests mechanisms for control of photosynthetic state transitions and for LHCII uncoupling from PSII under high light stress to allow thermal energy dissipation. [Abstract/Link to Full Text]

Morel J, Claverol S, Mongrand S, Furt F, Fromentin J, Bessoule JJ, Blein JP, Simon-Plas F
Proteomics of plant detergent-resistant membranes.
A large body of evidence from the past decade supports the existence, in membrane from animal and yeast cells, of functional microdomains that play important roles in protein sorting, signal transduction, or infection by pathogens. Recent reports demonstrated the presence, in plants, of detergent-resistant fractions isolated from plasma membrane. Analysis of the lipidic composition of this fraction revealed its enrichment in sphingolipids and sterols and depletion in phospho- and glycerolipids as previously observed for animal microdomains. One-dimensional gel electrophoresis experiments indicated that these detergent-resistant fractions are able to recruit a specific set of plasma membrane proteins and exclude others. In the present study, we used mass spectrometry to give an extensive description of a tobacco plasma membrane fraction resistant to solubilization with Triton X-100. This led to the identification of 145 proteins whose functional and physicochemical characteristics were analyzed in silico. Parameters such as isoelectric point, molecular weight, number and length of transmembrane segments, or global hydrophobicity were analyzed and compared with the data available concerning plant plasma membrane proteins. Post-translational modifications, such as myristoylation, palmitoylation, or presence of a glycosylphosphatidylinositol anchor, were examined in relation to the presence of the corresponding proteins in these microdomains. From a functional point of view, this analysis indicated that if a primary function of the plasma membrane, such as transport, seems under-represented in the detergent-resistant fraction, others undergo a significant increase of their relative importance. Among these are signaling and response to biotic and abiotic stress, cellular trafficking, and cell wall metabolism. This suggests that these domains are likely to constitute, as in animal cells, signaling platforms involved in these physiological functions. [Abstract/Link to Full Text]

Kabuyama Y, Langer SJ, Polvinen K, Homma Y, Resing KA, Ahn NG
Functional proteomics identifies protein-tyrosine phosphatase 1B as a target of RhoA signaling.
Rho GTPases are signal transduction effectors that control cell motility, cell attachment, and cell shape by the control of actin polymerization and tyrosine phosphorylation. To identify cellular targets regulated by Rho GTPases, we screened global protein responses to Rac1, Cdc42, and RhoA activation by two-dimensional gel electrophoresis and mass spectrometry. A total of 22 targets were identified of which 19 had never been previously linked to Rho GTPase pathways, providing novel insight into pathway function. One novel target of RhoA was protein-tyrosine phosphatase 1B (PTP1B), which catalyzes dephosphorylation of key signaling molecules in response to activation of diverse pathways. Subsequent analysis demonstrated that RhoA enhances post-translational modification of PTP1B, inactivates phosphotyrosine phosphatase activity, and up-regulates tyrosine phosphorylation of p130Cas, a key mediator of focal adhesion turnover and cell migration. Thus, protein profiling reveals a novel role for PTP1B as a mediator of RhoA-dependent phosphorylation of p130Cas. [Abstract/Link to Full Text]

Haas W, Faherty BK, Gerber SA, Elias JE, Beausoleil SA, Bakalarski CE, Li X, Villén J, Gygi SP
Optimization and use of peptide mass measurement accuracy in shotgun proteomics.
Mass spectrometers that provide high mass accuracy such as FT-ICR instruments are increasingly used in proteomic studies. Although the importance of accurately determined molecular masses for the identification of biomolecules is generally accepted, its role in the analysis of shotgun proteomic data has not been thoroughly studied. To gain insight into this role, we used a hybrid linear quadrupole ion trap/FT-ICR (LTQ FT) mass spectrometer for LC-MS/MS analysis of a highly complex peptide mixture derived from a fraction of the yeast proteome. We applied three data-dependent MS/MS acquisition methods. The FT-ICR part of the hybrid mass spectrometer was either not exploited, used only for survey MS scans, or also used for acquiring selected ion monitoring scans to optimize mass accuracy. MS/MS data were assigned with the SEQUEST algorithm, and peptide identifications were validated by estimating the number of incorrect assignments using the composite target/decoy database search strategy. We developed a simple mass calibration strategy exploiting polydimethylcyclosiloxane background ions as calibrant ions. This strategy allowed us to substantially improve mass accuracy without reducing the number of MS/MS spectra acquired in an LC-MS/MS run. The benefits of high mass accuracy were greatest for assigning MS/MS spectra with low signal-to-noise ratios and for assigning phosphopeptides. Confident peptide identification rates from these data sets could be doubled by the use of mass accuracy information. It was also shown that improving mass accuracy at a cost to the MS/MS acquisition rate substantially lowered the sensitivity of LC-MS/MS analyses. The use of FT-ICR selected ion monitoring scans to maximize mass accuracy reduced the number of protein identifications by 40%. [Abstract/Link to Full Text]
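
The two quantities this abstract relies on, the error rate estimated from a composite target/decoy search and the peptide mass measurement error, can be sketched in a few lines. This is a minimal illustration and not the authors' code. The function names and the dictionary keys are hypothetical, and the estimate uses one common convention for concatenated target/decoy searches: incorrect matches are assumed to hit target and decoy sequences equally often, so the number of incorrect assignments is taken as twice the number of decoy hits.

```python
def estimate_fdr(n_target_hits, n_decoy_hits):
    """Estimate the fraction of incorrect assignments in a composite search.

    Assumption (not from the abstract): wrong matches land on target and
    decoy sequences with equal probability, so incorrect ~= 2 * decoy hits.
    """
    total = n_target_hits + n_decoy_hits
    if total == 0:
        return 0.0
    return min(1.0, 2.0 * n_decoy_hits / total)


def ppm_error(observed_mz, theoretical_mz):
    """Mass measurement error in parts per million (ppm)."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def filter_by_mass_accuracy(psms, tolerance_ppm):
    """Keep peptide-spectrum matches whose precursor error is within tolerance.

    Each match is a dict with hypothetical keys 'observed_mz' and
    'theoretical_mz'.
    """
    return [
        p for p in psms
        if abs(ppm_error(p["observed_mz"], p["theoretical_mz"])) <= tolerance_ppm
    ]
```

Applying a tight ppm filter before estimating the error rate is one way that mass accuracy information can raise the number of confident identifications at a fixed error rate: decoy hits, which have essentially random precursor mass errors, are removed preferentially.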

Song E, Gao S, Tian R, Ma S, Huang H, Guo J, Li Y, Zhang L, Gao Y
A high efficiency strategy for binding property characterization of peptide-binding domains.
A large proportion of protein-protein interactions is mediated by families of peptide-binding domains. Comprehensive characterization of each of these domains is critical for understanding the mechanisms and networks of protein interaction at the domain level. However, existing methods are all based on large scale screenings for each domain and are too inefficient to deal with the hundreds of members in major domain families. We developed a systematic strategy for efficient binding property characterization of peptide-binding domains based on high throughput validation screening of a specialized candidate ligand library using a yeast two-hybrid mating array. Its outstanding feature is that the overall efficiency is dramatically improved compared with that of traditional screening, and it will be higher as the system cycles. The PDZ domain family was first used to test the strategy. Five PDZ domains were rapidly characterized. Broader binding properties were identified compared with other methods, including novel recognition specificities that provided the basis for major revision of conventional PDZ classification. Several novel interactions were discovered, serving as significant clues for further functional investigation. This strategy can be easily extended to a variety of peptide-binding domains as a powerful tool for comprehensive analysis of domain binding properties at proteomic scale. [Abstract/Link to Full Text]

Nelson CJ, Hegeman AD, Harms AC, Sussman MR
A quantitative analysis of Arabidopsis plasma membrane using trypsin-catalyzed (18)O labeling.
Typical mass spectrometry-based protein lists from purified fractions are confounded by the absence of tools for evaluating contaminants. In this report, we compare the results of a standard survey experiment using an ion trap mass spectrometer with those obtained using dual isotope labeling and a Q-TOF mass spectrometer to quantify the degree of enrichment of proteins in purified subcellular fractions of Arabidopsis plasma membrane. Incorporation of a stable isotope, either H(2)(18)O or H(2)(16)O, during trypsinization allowed relative quantification of the degree of enrichment of proteins within membranes after phase partitioning with polyethylene glycol/dextran mixtures. The ratios allowed the quantification of 174 membrane-associated proteins with 70 showing plasma membrane enrichment equal to or greater than ATP-dependent proton pumps, canonical plasma membrane proteins. Enriched proteins included several hallmark plasma membrane proteins, such as H(+)-ATPases, aquaporins, receptor-like kinases, and various transporters, as well as a number of proteins with unknown functions. Most importantly, a comparison of the datasets from a sequencing "survey" analysis using the ion trap mass spectrometer with that from the quantitative dual isotope labeling ratio method indicates that as many as one-fourth of the putative survey identifications are biological contaminants rather than bona fide plasma membrane proteins. [Abstract/Link to Full Text]
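
The ratio-based reasoning in this abstract can be illustrated with a short sketch. It is not the authors' pipeline. The function names, the protein names, and the numbers in the usage note are hypothetical, and the sketch simply assumes that one isotope label marks the purified fraction and the other marks the comparison fraction, so that the intensity ratio reflects enrichment. Proteins are then compared against a canonical plasma membrane marker, as the abstract does with the proton pumps.

```python
def enrichment_ratio(purified_intensity, reference_intensity):
    """Relative enrichment of a protein in the purified fraction.

    Assumption: the two intensities come from the two isotope labels of the
    same peptide measured in one spectrum.
    """
    if reference_intensity <= 0:
        raise ValueError("reference intensity must be positive")
    return purified_intensity / reference_intensity


def enriched_like_marker(ratios, marker):
    """Return proteins whose ratio is equal to or greater than the marker's."""
    threshold = ratios[marker]
    return sorted(name for name, ratio in ratios.items() if ratio >= threshold)
```

For example, with the made-up ratios `{"H+-ATPase": 4.0, "aquaporin": 5.0, "rubisco": 0.5}` and `"H+-ATPase"` as the marker, the function returns the marker and the aquaporin, while the low-ratio protein is treated as a likely contaminant.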

Beck HC, Nielsen EC, Matthiesen R, Jensen LH, Sehested M, Finn P, Grauslund M, Hansen AM, Jensen ON
Quantitative proteomic analysis of post-translational modifications of human histones.
Histone proteins are subject to a range of post-translational modifications in living cells. The combinatorial nature of these modifications constitutes the "histone code" that dictates chromatin structure and function during development, growth, differentiation, and homeostasis of cells. Deciphering of the histone code is hampered by the lack of analytical methods for monitoring the combinatorial complexity of reversible multisite modifications of histones, including acetylation and methylation. To address this problem, we used LC-MSMS technology and Virtual Expert Mass Spectrometrist software for qualitative and quantitative proteomic analysis of histones extracted from human small cell lung cancer cells. A total of 32 acetylations, methylations, and ubiquitinations were located in the human histones H2A, H2B, H3, and H4, including seven novel modifications. An LC-MSMS-based method was applied in a quantitative proteomic study of the dose-response effect of the histone deacetylase inhibitor (HDACi) PXD101 on histone acetylation in human cell cultures. Triplicate LC-MSMS runs at six different HDACi concentrations demonstrated that PXD101 affects acetylation of histones H2A, H2B, H3, and H4 in a site-specific and dose-dependent manner. This unbiased analysis revealed that a relative increase in acetylated peptide from the histone variants H2A, H2B, and H4 was accompanied by a relative decrease of dimethylated Lys(57) from histone H2B. The dose-response results obtained by quantitative proteomics of histones from HDACi-treated cells were consistent with Western blot analysis of histone acetylation, cytotoxicity, and dose-dependent expression profiles of p21 and cyclin A2. This demonstrates that mass spectrometry-based quantitative proteomic analysis of post-translational modifications is a viable approach for functional analysis of candidate drugs, such as HDAC inhibitors. [Abstract/Link to Full Text]

Emadali A, Muscatelli-Groux B, Delom F, Jenna S, Boismenu D, Sacks DB, Metrakos PP, Chevet E
Proteomic analysis of ischemia-reperfusion injury upon human liver transplantation reveals the protective role of IQGAP1.
Ischemia-reperfusion injury (IRI) represents a major determinant of liver transplantation. IRI-induced graft dysfunction is related to biliary damage, partly due to a loss of bile canaliculi (BC) integrity associated with a dramatic remodeling of actin cytoskeleton. However, the molecular mechanisms associated with these events remain poorly characterized. Using liver biopsies collected during the early phases of organ procurement (ischemia) and transplantation (reperfusion), we characterized the global patterns of expression and phosphorylation of cytoskeleton-related proteins during hepatic IRI. This targeted functional proteomic approach, which combined protein expression pattern profiling and phosphoprotein enrichment followed by mass spectrometry analysis, allowed us to identify IQGAP1, a Cdc42/Rac1 effector, as a potential regulator of actin cytoskeleton remodeling and maintenance of BC integrity. Cell fractionation and immunohistochemistry revealed that IQGAP1 expression and localization were affected upon IRI and related to actin reorganization. Furthermore using an IRI model in human hepatoma cells, we demonstrated that IQGAP1 silencing decreased the basal level of actin polymerization at BC periphery, reflecting a defect in BC structure coincident with reduced cellular resistance to IRI. In summary, this study uncovered new mechanistic insights into the global regulation of IRI-induced cytoskeleton remodeling and led to the identification of IQGAP1 as a regulator of BC structure. IQGAP1 therefore represents a potential target for the design of new organ preservation strategies to improve transplantation outcome. [Abstract/Link to Full Text]

Apraiz I, Mi J, Cristobal S
Identification of proteomic signatures of exposure to marine pollutants in mussels (Mytilus edulis).
Bivalves and especially mussels are very good indicators of marine and estuarine pollution, and so they have been widely used in biomonitoring programs all around the world. However, traditional single parameter biomarkers face the problem of high sensitivity to biotic and abiotic factors. In our study, digestive gland peroxisome-enriched fractions of Mytilus edulis (L., 1758) were analyzed by DIGE and MS. We identified several proteomic signatures associated with the exposure to several marine pollutants (diallyl phthalate, PBDE-47, and bisphenol-A). Animals collected from North Atlantic Sea were exposed to the contaminants independently under controlled laboratory conditions. One hundred and eleven spots showed a significant increase or decrease in protein abundance in the two-dimensional electrophoresis maps from the groups exposed to pollutants. We obtained a unique protein expression signature of exposure to each of those chemical compounds. Moreover a set of proteins composed a proteomic signature in common to the three independent exposures. It is remarkable that the principal component analysis of these spots showed a discernible separation between groups, and so did the hierarchical clustering into four classes. The 14 proteins identified by MS participate in alpha- and beta-oxidation pathways, xenobiotic and amino acid metabolism, cell signaling, oxyradical metabolism, peroxisomal assembly, respiration, and the cytoskeleton. Our results suggest that proteomic signatures could become a valuable tool to monitor the presence of pollutants in field experiments where a mixture of pollutants is often present. Further studies on the identified proteins could provide crucial information to understand possible mechanisms of toxicity of single xenobiotics or mixtures of them in marine ecosystems. [Abstract/Link to Full Text]

Gilson PR, Nebl T, Vukcevic D, Moritz RL, Sargeant T, Speed TP, Schofield L, Crabb BS
Identification and stoichiometry of glycosylphosphatidylinositol-anchored membrane proteins of the human malaria parasite Plasmodium falciparum.
Most proteins that coat the surface of the extracellular forms of the human malaria parasite Plasmodium falciparum are attached to the plasma membrane via glycosylphosphatidylinositol (GPI) anchors. These proteins are exposed to neutralizing antibodies, and several are advanced vaccine candidates. To identify the GPI-anchored proteome of P. falciparum we used a combination of proteomic and computational approaches. Focusing on the clinically relevant blood stage of the life cycle, proteomic analysis of proteins labeled with radioactive glucosamine identified GPI anchoring on 11 proteins (merozoite surface protein (MSP)-1, -2, -4, -5, -10, rhoptry-associated membrane antigen, apical sushi protein, Pf92, Pf38, Pf12, and Pf34). These proteins represent approximately 94% of the GPI-anchored schizont/merozoite proteome and constitute by far the largest validated set of GPI-anchored proteins in this organism. Moreover MSP-1 and MSP-2 were present in similar copy number, and we estimated that together these proteins comprise approximately two-thirds of the total membrane-associated surface coat. This is the first time the stoichiometry of MSPs has been examined. We observed that available software performed poorly in predicting GPI anchoring on P. falciparum proteins where such modification had been validated by proteomics. Therefore, we developed a hidden Markov model (GPI-HMM) trained on P. falciparum sequences and used this to rank all proteins encoded in the completed P. falciparum genome according to their likelihood of being GPI-anchored. GPI-HMM predicted GPI modification on all validated proteins, on several known membrane proteins, and on a number of novel, presumably surface, proteins expressed in the blood, insect, and/or pre-erythrocytic stages of the life cycle. Together this work identified 11 and predicted a further 19 GPI-anchored proteins in P. falciparum. [Abstract/Link to Full Text]

Van Hoof D, Passier R, Ward-Van Oostwaard D, Pinkse MW, Heck AJ, Mummery CL, Krijgsveld J
A quest for human and mouse embryonic stem cell-specific proteins.
Embryonic stem cells (ESCs) are of immense interest as they can proliferate indefinitely in vitro and give rise to any adult cell type, serving as a potentially unlimited source for tissue replacement in regenerative medicine. Extensive analyses of numerous human and mouse ESC lines have shown generic similarities and differences at both the transcriptional and functional level. However, comprehensive proteome analyses are missing or are restricted to mouse ESCs. Here we have used an extensive proteomic approach to search for ESC-specific proteins by analyzing the differential protein expression profiles of human and mouse ESCs and their differentiated derivatives. The data sets comprise 1,775 non-redundant proteins identified in human ESCs, 1,532 in differentiated human ESCs, 1,871 in mouse ESCs, and 1,552 in differentiated mouse ESCs with a false positive rate of <0.2%. Comparison of the data sets distinguished 191 proteins exclusively identified in both human and mouse ESCs but not in their differentiated derivatives. Besides well known ESC benchmarks, this subset included many uncharacterized proteins, some of which may be novel ESC-specific markers. To complement the mass spectrometric approach, differential expression of a selection of these proteins was confirmed by Western blotting, immunofluorescence confocal microscopy, and fluorescence-activated cell sorting. Additionally two other independently isolated and cultured human ESC lines as well as their differentiated derivatives were monitored for differential expression of selected proteins. Some of these proteins were identified exclusively in ESCs of all three human lines and may thus serve as generic ESC markers. Our wide scale proteomic approach enabled us to screen thousands of proteins rapidly and select putative ESC-associated proteins for further analysis. Validation by three independent conventional protein analysis techniques shows that our methodology is robust, provides an excellent tool to characterize ESCs at the protein level, and may disclose novel ESC-specific benchmarks. [Abstract/Link to Full Text]
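
The data set comparison described in this abstract amounts to set arithmetic over four protein lists. The sketch below is illustrative only. The function name and the identifiers in the usage note are hypothetical, and it assumes that human and mouse proteins have already been mapped to a shared identifier (for example an orthologue group), a step the abstract does not describe.

```python
def shared_esc_specific(human_esc, human_diff, mouse_esc, mouse_diff):
    """Proteins found in ESCs of both species but in neither differentiated set.

    All four arguments are iterables of identifiers that are assumed to be
    comparable across species.
    """
    in_both_escs = set(human_esc) & set(mouse_esc)
    in_any_differentiated = set(human_diff) | set(mouse_diff)
    return in_both_escs - in_any_differentiated
```

With the toy inputs `{"A", "B", "C"}`, `{"C"}`, `{"A", "B", "D"}` and `{"B"}`, only `"A"` is returned: `"B"` appears in differentiated mouse cells, `"C"` in differentiated human cells, and `"D"` was not seen in human ESCs.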

Maltman DJ, Przyborski SP
Can large-scale analysis of the proteome identify effective new markers for embryonic stem cells?

Luo Q, Nieves E, Kzhyshkowska J, Angeletti RH
Endogenous transforming growth factor-beta receptor-mediated Smad signaling complexes analyzed by mass spectrometry.
Smad proteins are the central feature of the transforming growth factor-beta (TGF-beta) intracellular signaling cascade. They function by carrying signals from the cell surface to the nucleus through the formation of a series of signaling complexes. Changes in Smad proteins and their complexes upon treatment with TGF-beta were studied in mink lung epithelial (Mv1Lu) cell cultures. A time course of incubation with TGF-beta was carried out to determine the peak of appearance of phosphorylated Smad2. Immobilized monoclonal antibody against Smad2 was then used to isolate the naturally occurring complexes. Three strategies were used to identify changes in proteins partnering with Smad2: separation by one-dimensional SDS-PAGE followed by MALDI peptide mass fingerprinting, cleavable ICAT labeling of the protein mixtures analyzed by LC-MS/MS, and nano-LC followed by MALDI MS TOF/TOF. Smad2 forms complexes with many other polypeptides both in the presence and absence of TGF-beta. Some of the classes of proteins identified include: transcription regulators, proteins of the cytoskeletal scaffold and other tethering proteins, motility proteins, proteins involved in transport between the cytoplasm and nucleus, and a group of membrane adaptor proteins. Although some of these have been reported in the literature, most have not been reported previously. This work expands the repertoire of proteins known to participate in the TGF-beta signal transduction processes. [Abstract/Link to Full Text]

Zhang L, Shao C, Zheng D, Gao Y
An integrated machine learning system to computationally screen protein databases for protein binding peptide ligands.
A fairly large set of protein interactions is mediated by families of peptide binding domains, such as Src homology 2 (SH2), SH3, PDZ, major histocompatibility complex, etc. To identify their ligands by experimental screening is not only labor-intensive but almost futile in screening low abundance species due to the suppression by high abundance species. An ideal way of studying protein-protein interactions is to use high throughput computational approaches to screen protein sequence databases to direct the validating experiments toward the most promising peptides. Predictors with only good cross-validation were not good enough to screen protein databases. In the current study we built integrated machine learning systems using three novel coding methods and screened the Swiss-Prot and GenBank protein databases for potential ligands of 10 SH3 and three PDZ domains. A large fraction of predictions has already been experimentally confirmed by other independent research groups, indicating a satisfying generalization capability for future applications in identifying protein interactions. [Abstract/Link to Full Text]

Bhalla J, Storchan GB, MacCarthy CM, Uversky VN, Tcherkasskaya O
Local flexibility in molecular function paradigm.
It is generally accepted that the functional activity of biological macromolecules requires tightly packed three-dimensional structures. Recent theoretical and experimental evidence indicates, however, the importance of molecular flexibility for the proper functioning of some proteins. We examined high resolution structures of proteins in various functional categories with respect to the secondary structure assessment. The latter was considered as a characteristic of the inherent flexibility of a polypeptide chain. We found that the proteins in functionally competent conformational states might be comprised of 20-70% flexible residues. For instance, proteins involved in gene regulation, e.g. transcription factors, are on average largely disordered molecules with over 60% of amino acids residing in "coiled" configurations. In contrast, oxygen transporters constitute a class of relatively rigid molecules with only 30% of residues being locally flexible. Phylogenic comparison of a large number of protein families with respect to the propagation of secondary structure illuminates the growing role of the local flexibility in organisms of greater complexity. Furthermore the local flexibility in protein molecules appears to be dependent on the molecular confinement and is essentially larger in extracellular proteins. [Abstract/Link to Full Text]

Lukas TJ, Luo WW, Mao H, Cole N, Siddique T
Informatics-assisted protein profiling in a transgenic mouse model of amyotrophic lateral sclerosis.
Mutations in Cu,Zn-superoxide dismutase (SOD1) are one of the causes of amyotrophic lateral sclerosis (ALS). The mutant protein exhibits a toxic gain of function that adversely affects the function of neurons in the spinal cord, brain stem, and motor cortex. A proteomic analysis of protein expression in a widely used mouse model of ALS was undertaken to identify differences in protein expression in the spinal cords of mice expressing a mutant protein with the G93A mutation found in human ALS. Protein profiling was done on soluble and particulate fractions of spinal cord extracts using high throughput two-dimensional liquid chromatography coupled to tandem mass spectrometry. An integrated proteomics-informatics platform was used to identify relevant differences in protein expression based upon the abundance of peptides identified by database searching of mass spectrometry data. Changes in the expression of proteins associated with mitochondria were particularly prevalent in spinal cord proteins from both mutant G93A-SOD1 and wild-type SOD1 transgenic mice. G93A-SOD1 mouse spinal cord also exhibited differences in proteins associated with metabolism, protein kinase regulation, antioxidant activity, and lysosomes. Using gene ontology analysis, we found an overlap of changes in mRNA expression in presymptomatic mice (from microarray analysis) in three different gene categories. These included selected protein kinase signaling systems, ATP-driven ion transport, and neurotransmission. Therefore, alterations in selected cellular processes are detectable before symptomatic onset in ALS mouse models. However, in late stage disease, mRNA expression analysis did not reveal significant changes in mitochondrial gene expression but did reveal concordant changes in lipid metabolism, lysosomes, and the regulation of neurotransmission. Thus, concordance of proteomic and mRNA expression data within multiple categories validates the use of gene ontology analysis to compare different types of "omic" data. [Abstract/Link to Full Text]

