PipMaker Examples


Introduction

This page illustrates the utility of PipMaker with several examples, most of which compare a region of the human genome with the homologous region from the mouse.


Bruton's tyrosine kinase

The color pip, made with coding-region specifications in the "exons" file and a first sequence underlay file, paints exons Green, introns LightYellow, and several conserved non-coding regions Red (or "Red +").
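The coloring described above can be derived mechanically from exon coordinates. The following is a minimal sketch, not PipMaker's own tooling: it assumes 1-based inclusive coordinates and a simple "start end Color" line layout, and the function names and example coordinates are hypothetical. The exact underlay syntax should be checked against the PipMaker instructions.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # assumed 1-based, inclusive (start, end)


def introns_from_exons(exons: List[Interval]) -> List[Interval]:
    """Return the gaps between consecutive exons of one gene."""
    ordered = sorted(exons)
    introns = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start > prev_end + 1:
            introns.append((prev_end + 1, next_start - 1))
    return introns


def underlay_lines(exons: List[Interval], conserved: List[Interval]) -> List[str]:
    """Build 'start end Color' lines (assumed layout) for one gene."""
    rows = [(s, e, "LightYellow") for s, e in introns_from_exons(exons)]
    rows += [(s, e, "Green") for s, e in exons]
    rows += [(s, e, "Red") for s, e in conserved]
    return [f"{s} {e} {color}" for s, e, color in sorted(rows)]


if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    for line in underlay_lines([(100, 200), (400, 500)], [(250, 300)]):
        print(line)
```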

Defects in the Bruton's tyrosine kinase (BTK) gene lead to X-linked agammaglobulinemia (XLA) (Tsukada et al. 1993; Vetrie et al. 1993), a disorder characterized by a severe deficiency of circulating immunoglobulins and mature B cells (OMIM 300300). Examination of the B cells present in the bone marrow of XLA patients shows increased levels of pro-B cells but reduced levels of both pre-B and mature B cells (Campana et al. 1990), suggesting that BTK function is crucial for B cell maturation. The expression pattern of BTK further suggests its involvement in B cell development: expression is restricted to B cells and myeloid cells and is not seen in T cells. In addition, BTK is expressed in the early stages of B cell differentiation, before immunoglobulin heavy- or light-chain rearrangement. Expression continues throughout B cell development but is down-regulated once the B cell matures into a plasma cell (de Weers et al. 1993; Smith et al. 1994).

Understanding the elements that regulate BTK expression is crucial to understanding its involvement in the complexities of B cell development. Both in vitro and in vivo experiments have demonstrated that binding sites for Spi-1/PU.1, SpiB, and Sp1 within the 280 bp 5' of BTK exon 1 contribute to the hematopoietic cell lineage-specific expression (Sideras et al. 1994; Himmelmann et al. 1996; Muller et al. 1996). Under the hypothesis that elements important in regulating the expression of BTK are conserved between species, the human (GenBank U78027) and murine (GenBank U58105) genomic sequences in the region were compared (Oeltjen et al. 1997).

Genomic sequencing of BTK has revealed a region that is both gene-rich and repeat-dense (Oeltjen et al. 1995). In addition to BTK, four genes previously mapped to the region (Vorechovsky et al. 1994) were localized: the single-exon RNA-binding gene, FTP-3; the seven-exon gene, alpha-D-galactosidase A (GLA), defects in which result in Fabry disease (a lysosomal storage disease); a five-exon ribosomal gene, L44L; and a two-exon gene of unknown function, FCI-12. As shown in the pip diagram (PDF), the comparison of the mouse and human genomic sequences demonstrates not only conservation of the entire coding sequence, but also extensive conservation of the noncoding sequence. When the two ubiquitously expressed genes, L44L and GLA, are compared with the more specifically expressed BTK gene, the noncoding sequence within the BTK locus appears to be more conserved (Oeltjen et al. 1997). While conservation within both the L44L and GLA loci is primarily restricted to the regions flanking the first exons, conservation within the BTK locus occurs in clusters throughout the locus. These clusters lie in the region flanking the first exon, at the 3' end of the first intron, within the fourth and fifth introns, between the eighth and tenth exons, and between the thirteenth and sixteenth exons.

Transient transfection experiments that included the conserved sequence regions upstream and downstream of the first exon have demonstrated that both regions contribute to the cell lineage-specific expression pattern of BTK (Oeltjen et al. 1997). These data suggest that other conserved regions within the locus are also important in the regulation of BTK.


Pro-alpha1 type II collagen

The pip diagram of the human and mouse pro-alpha1 type II collagen genes produced with the default PipMaker settings reveals a high degree of duplication within the gene. In particular, note that a number of human exons, including exon 7, have multiple matches to mouse exons. However, if the Chaining option is selected on the Advanced PipMaker page, the resulting pip diagram shows matches that are limited to orthologous exons.
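The effect of chaining can be illustrated with a small sketch. This is not PipMaker's implementation: it assumes that chaining means keeping the highest-scoring set of local alignments that appear in the same order in both sequences, and the `Match` record, the `best_chain` function, and the coordinates and scores are hypothetical.

```python
from typing import List, NamedTuple


class Match(NamedTuple):
    start1: int   # interval in the first (e.g. human) sequence
    end1: int
    start2: int   # interval in the second (e.g. mouse) sequence
    end2: int
    score: float


def best_chain(matches: List[Match]) -> List[Match]:
    """Highest-scoring subset of matches that is colinear in both sequences."""
    ordered = sorted(matches, key=lambda m: (m.start1, m.start2))
    n = len(ordered)
    if n == 0:
        return []
    best = [m.score for m in ordered]   # best chain score ending at each match
    prev = [-1] * n                     # back-pointers for chain recovery
    for j in range(n):
        for i in range(j):
            a, b = ordered[i], ordered[j]
            # a may precede b only if it ends before b starts in BOTH sequences
            if a.end1 < b.start1 and a.end2 < b.start2 and best[i] + b.score > best[j]:
                best[j] = best[i] + b.score
                prev[j] = i
    k = max(range(n), key=best.__getitem__)
    chain = []
    while k != -1:
        chain.append(ordered[k])
        k = prev[k]
    return chain[::-1]


if __name__ == "__main__":
    # Two similar exons: each matches its ortholog and, more weakly, the other exon.
    toy = [
        Match(100, 150, 100, 150, 50),
        Match(300, 350, 300, 350, 60),
        Match(300, 350, 100, 150, 30),   # cross match to the duplicated exon
        Match(100, 150, 300, 350, 25),   # cross match to the duplicated exon
        Match(500, 550, 500, 550, 40),
    ]
    for m in best_chain(toy):
        print(m)
```

In this toy input the two cross matches are dropped because they cannot be placed in order with the stronger matches, which mirrors how the chained pip shows only one match per exon.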


Beta-like globin gene cluster

The pip diagram of the human and chicken beta-globin genes produced with the default PipMaker settings reveals gene duplications. For instance, the second exons of the G-gamma and A-gamma genes clearly show at least three distinct matches. (Actually, there are four, corresponding to the four genes in the chicken cluster.) The pip diagram produced with the "Chaining" option shows no matches for the gene around position 20k (epsilon-globin) or the gene around 46k (the eta-globin pseudogene). Since there are only four globin genes in the chicken sequence, at most four human genes can be matched with chaining. However, the pip diagram produced with the "Single coverage" option shows all human regions having matches in the chicken sequence, with no duplicate matches.
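The contrast with single coverage can be sketched in the same spirit. The code below is an assumption about the idea, not PipMaker's algorithm: it treats single coverage as "each position of the first sequence is aligned at most once" and enforces that greedily by score, discarding overlapping matches rather than trimming them. The function name and toy coordinates are hypothetical.

```python
from typing import List, Tuple

# (start1, end1, start2, end2, score); sequence 1 is the one drawn in the pip
Match = Tuple[int, int, int, int, float]


def single_coverage(matches: List[Match]) -> List[Match]:
    """Keep the best-scoring matches so that no first-sequence position is used twice."""
    kept: List[Match] = []
    for m in sorted(matches, key=lambda x: x[4], reverse=True):
        # accept m only if it is disjoint, in sequence 1, from everything kept so far
        if all(m[1] < k[0] or k[1] < m[0] for k in kept):
            kept.append(m)
    return sorted(kept)


if __name__ == "__main__":
    # Toy cluster: six genes in sequence 1, four in sequence 2, all cross-matching.
    toy = [
        (1000 * i, 1000 * i + 100, 500 * j, 500 * j + 100, 100 - 10 * abs(i - j))
        for i in range(1, 7)
        for j in range(1, 5)
    ]
    for m in single_coverage(toy):
        print(m)
```

Every one of the six toy genes in the first sequence keeps exactly one match, and genes in the second sequence may be reused. An order-preserving chain, by contrast, could pair at most four of them.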


The CD4 region

Long genomic regions containing the CD4 gene from human and mouse have been analyzed by Ansari-Lari et al. 1998. To measure the effectiveness of analyzing the human sequence using comparison to incomplete sequence from the mouse, we generated a variety of coverage levels using a procedure described by Bouck et al. 1998. Inspection of the pip for 1.8X coverage and the pip for 5X coverage shows that they are useful approximations to the final pip. According to Bouck et al., alignment of the human sequence with the fragments from 1.8X coverage identified 69% of the most highly conserved regions, whereas 89% were identified by fragments from 5X coverage. Remember that this sort of analysis requires using the "Search both strands" option.
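To give a feel for what a coverage level such as 1.8X or 5X means, the sketch below samples random fixed-length fragments until their total length reaches the requested multiple of the sequence length, then reports the fraction of positions covered at least once. This is an illustrative assumption, not the procedure of Bouck et al.; the fragment length, sequence length, and function names are hypothetical, and the covered fraction is not the same quantity as the percentage of conserved regions quoted above.

```python
import random
from typing import List, Tuple


def sample_fragments(seq_len: int, coverage: float,
                     frag_len: int = 500, seed: int = 0) -> List[Tuple[int, int]]:
    """Sample random half-open [start, end) fragments totalling coverage * seq_len bases."""
    rng = random.Random(seed)
    n = round(coverage * seq_len / frag_len)
    starts = (rng.randrange(0, seq_len - frag_len + 1) for _ in range(n))
    return [(s, s + frag_len) for s in starts]


def fraction_covered(seq_len: int, fragments: List[Tuple[int, int]]) -> float:
    """Fraction of sequence positions that lie in at least one fragment."""
    covered = bytearray(seq_len)
    for start, end in fragments:
        covered[start:end] = b"\x01" * (end - start)
    return sum(covered) / seq_len


if __name__ == "__main__":
    length = 100_000  # hypothetical sequence length
    for level in (1.8, 5.0):
        frags = sample_fragments(length, level)
        print(f"{level}X: {len(frags)} fragments, "
              f"{fraction_covered(length, frags):.1%} of positions covered")
```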


The ERCC2 gene region

Comparison of the human and mouse sequences (Lamerdin et al. 1996) yields a pip diagram showing little sequence conservation outside of the coding regions. Matches extend upstream of exon 1 for only about 200 bp, indicative of a very small regulatory region. One plausible explanation is that ERCC2 may be expressed at about the same level in all cells, given the ubiquitous need for excision-repair of the DNA. Thus it may be under relatively simple control, manifested in this analysis as a limited number of cis-regulatory sequences. The adjacent, oppositely transcribed KLC gene shows a series of short matching segments for about 1000 bp 5' to the cap site. In cases like these, where the matching sequences are mostly restricted to exons, and especially when the pattern of expression differs in some respects between human and rodent, examination of the homologous locus in a species more closely related to humans, such as a prosimian primate, could be informative. For instance, regulatory elements that are conserved in primates but divergent in some other mammalian order should be readily detectable.


The T-cell receptor alpha/delta region

Sequences from the diversity, joining and constant regions of the human and mouse alpha/delta T-cell receptor locus show an unusually high level of conservation, particularly around the alpha joining and constant segments. Another unusual phenomenon observable in the pip diagram is that sequence conservation frequently shows no apparent relationship to segments known to have a biological function. However, two areas of high conservation and known enhancer function are colored in the pip: the Conserved Sequence Block of Kuo et al. 1993 is colored green, and the alpha enhancer, reported by Winoto and Baltimore 1989, is colored red. Perhaps comparison of a more divergent pair of sequences, say from human and chicken, would show less background sequence conservation, and hence be more informative about the location of essential elements.


The C. elegans bli-4 gene region

A comparison of the bli-4 gene locus from two Caenorhabditis species, C. elegans and C. briggsae, reveals a number of discrepancies between the putative exons annotated in GenBank entry AF039719, which were determined by a gene-prediction program, and the gap-free regions of high similarity. In the pip diagram, several putative exons that show low conservation are identified by green stripes, while highly conserved regions not annotated as exons are marked in red. Thacker et al. 1999 have thoroughly investigated this gene and shown that sequence conservation between these two nematodes is a reliable indicator of unidentified exons.