NAT2PRED: a web-server for inferring the acetylator phenotype of the human
N-acetyltransferase-2 (NAT2) from SNPs observed in the NAT2 gene

Main page: http://nat2pred.rit.albany.edu

Summary

N-acetyltransferase-2 (NAT2) is an enzyme that catalyzes the acetylation of aromatic and heterocyclic amine carcinogens. Because NAT2 is involved in the detoxification of carcinogens, mutations within NAT2 that affect its enzymatic acetylator activity may also modify the risk of cancer development. Individuals in human populations have been shown to fall into three enzymatic acetylator phenotypes: slow, intermediate, and rapid. A number of single nucleotide polymorphisms (SNPs) within the NAT2 gene have been found to affect the NAT2 acetylator phenotype. This web-server implements a Support Vector Machine (SVM), a supervised pattern recognition method, to infer the NAT2 acetylator phenotype from six SNPs found in the NAT2 gene at positions 282, 341, 481, 590, 803, and 857. Given the combination of these SNPs observed in a particular individual (i.e., his/her genotype), the web-server assigns one of the three NAT2 phenotypes (slow, intermediate, or rapid) to this individual. The web-server can be used for fast determination of the NAT2 acetylator phenotype in genetic screens.

NAT2PRED was developed on a dataset in which the majority of subjects are Caucasian (94%, see Table 1). However, the model utilizes a generally observed linkage disequilibrium between the six NAT2 SNPs and performs very well on other ethnic groups. For the results of an independent critical evaluation of NAT2PRED conducted using a worldwide dataset composed of 56 populations, please refer to Sabbagh et al., 2009, BMC Medical Genetics.

Dataset

The dataset used in this work to develop the SVM predictor was obtained from the Prostate, Lung, Colorectal, and Ovarian (PLCO) cancer screening trial conducted at the National Cancer Institute. Complete data on the NAT2 acetylator phenotype and six NAT2 SNPs (C282T, T341C, C481T, G590A, A803G, and G857A) were available for 1,377 subjects (Table 1). The dataset contains 790 subjects with the slow phenotype, 503 subjects with the intermediate phenotype, and 84 subjects with the rapid phenotype.

Table 1. The dataset used in the study.

NAT2 SNPs    Genotype    Number of cases    Location    Amino acid substitution

C282T*       CC          642                Exon 2      No change
             CT          577
             TT          158

T341C*       TT          464                Exon 2      I114T
             TC          631
             CC          282

C481T*       CC          489                Exon 2      No change
             CT          634
             TT          254

G590A*       GG          713                Exon 2      R197Q
             GA          531
             AA          133

A803G*       AA          485                Exon 2      K268R
             AG          636
             GG          256

G857A*       GG          1288               Exon 2      G286E
             GA          87
             AA          2

Each row name shows the NAT2 gene position and the SNP observed at this position. *Minor allele. dbSNP IDs: C282T: rs1041983; T341C: rs1801280; C481T: rs1799929; G590A: rs1799930; A803G: rs1208; G857A: rs1799931. Ethnic makeup of the dataset: subjects were recruited from participating centers in the U.S.; 94% are Caucasian; 3% are African-American; 3% are American Indian/Alaskan Native, Pacific Islander, Asian, or Hispanic.

Performance evaluation

We used 7-fold cross-validation to test the SVM predictor of the acetylator phenotype. In this approach, the dataset is randomly partitioned into 7 groups, each containing approximately 1/7 of the dataset. In each cross-validation run, one group is removed, and the predictor is trained on the remaining observations and tested on the removed group. The process is repeated 7 times, so that each group is used for testing exactly once. To assess different aspects of classification quality, we used the following performance measures: overall accuracy (ACC), sensitivity (SN) for phenotype i, and specificity (SP) for phenotype i (Baldi et al., 2000). The results of the cross-validation are shown in Table 2.
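The partitioning scheme described above can be sketched in a few lines of Python. This is only an illustration of generic k-fold splitting, not the code used to build NAT2PRED: the function name k_fold_splits, the random seed, and the round-robin assignment of shuffled indices to folds are all assumptions, and the SVM training step itself is not shown.

```python
import random


def k_fold_splits(n_samples, k=7, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Hypothetical helper: the seed and the way folds are formed are
    illustrative assumptions, not details taken from NAT2PRED.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold j takes every k-th shuffled index, so fold sizes differ by at most 1.
    folds = [indices[j::k] for j in range(k)]
    for j in range(k):
        test = folds[j]
        train = [i for m, fold in enumerate(folds) if m != j for i in fold]
        yield train, test


# 1,377 subjects split into 7 folds; each subject is tested exactly once.
for run, (train, test) in enumerate(k_fold_splits(1377), start=1):
    print(run, len(train), len(test))
```

Because 1,377 is not divisible by 7, the folds in this sketch contain 196 or 197 subjects each.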

Table 2. The performance of the predictor of the NAT2 acetylator phenotype.

NAT2 phenotype              Sensitivity (SN)    Specificity (SP)

Rapid (84 cases)            99.6%               100%
Intermediate (503 cases)    100%                99.7%
Slow (790 cases)            100%                100%
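As a rough illustration of how per-phenotype measures of this kind can be computed, the sketch below counts true/false positives and negatives for one phenotype at a time. The exact formulas are an assumption: this page cites Baldi et al. (2000) without restating them, and two conventions for "specificity" are in common use, so both are computed here under separate, hypothetical key names. The labels in the example are invented toy data, not PLCO results.

```python
def overall_accuracy(true_labels, predicted_labels):
    """Fraction of subjects whose predicted phenotype equals the true one."""
    pairs = list(zip(true_labels, predicted_labels))
    return sum(t == p for t, p in pairs) / len(pairs)


def per_class_measures(true_labels, predicted_labels, phenotype):
    """One-vs-rest measures for a single phenotype.

    ASSUMPTION: which specificity formula NAT2PRED reports is not stated
    on this page, so both common conventions are returned.
    """
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == phenotype and p == phenotype for t, p in pairs)
    fn = sum(t == phenotype and p != phenotype for t, p in pairs)
    fp = sum(t != phenotype and p == phenotype for t, p in pairs)
    tn = sum(t != phenotype and p != phenotype for t, p in pairs)
    nan = float("nan")
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        # Convention 1: fraction of true negatives among all negatives.
        "specificity_tn": tn / (tn + fp) if tn + fp else nan,
        # Convention 2: fraction of correct calls among all positive calls.
        "specificity_tp": tp / (tp + fp) if tp + fp else nan,
    }


# Invented toy labels: one rapid subject is misclassified as intermediate.
true = ["slow"] * 4 + ["intermediate"] * 3 + ["rapid"] * 3
pred = ["slow"] * 4 + ["intermediate"] * 3 + ["rapid", "rapid", "intermediate"]

print(overall_accuracy(true, pred))
for phenotype in ("rapid", "intermediate", "slow"):
    print(phenotype, per_class_measures(true, pred, phenotype))
```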

Input page

The user is asked to select the genotype for each of the six SNPs. Each SNP position has three possible genotypes, which correspond to three radio buttons per position. For instance, if an individual has the heterozygous genotype CT at position 341, this genotype corresponds to the second radio button, as shown in Figure 1 below:

Figure 1. The input page of NAT2PRED.

Output page

After clicking the 'submit' button, the user receives the results of the prediction. The output page displays the selected genotype and the probabilities of each of the three acetylator phenotypes (slow, intermediate, and rapid) for this genotype. Each probability lies in the range from 0.0 to 1.0. The final prediction is the phenotype with the highest probability. The higher this probability, the greater the confidence in the predicted phenotype. If the probability of the predicted phenotype is close to the second-largest probability, the prediction is ambiguous. See Figure 2 below for details on the output format.
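The decision rule described above can be expressed as a short sketch. The function name call_phenotype and the min_margin threshold are hypothetical: this page does not state how close two probabilities must be for a prediction to count as ambiguous, so the value 0.1 is purely an illustrative assumption, and the probabilities in the example are invented.

```python
def call_phenotype(probabilities, min_margin=0.1):
    """Return (predicted_phenotype, is_ambiguous).

    probabilities: dict mapping phenotype name to its probability.
    min_margin: ASSUMED cutoff for the gap between the two largest
    probabilities; NAT2PRED does not document a specific value.
    """
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    (best, p_best), (_, p_second) = ranked[0], ranked[1]
    return best, (p_best - p_second) < min_margin


# Invented probabilities, for illustration only.
confident = {"slow": 0.05, "intermediate": 0.90, "rapid": 0.05}
borderline = {"slow": 0.48, "intermediate": 0.47, "rapid": 0.05}

print(call_phenotype(confident))   # ('intermediate', False)
print(call_phenotype(borderline))  # ('slow', True)
```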

Figure 2. An example of the NAT2PRED output page.

Batch submission of multiple genotypes

Instead of using radio buttons to enter the genotype of one individual at a time, the user may submit multiple genotypes in a properly formatted ASCII text file and receive the predictions for the submitted genotypes by e-mail. The results are usually e-mailed within 1-2 minutes of submission. Each genotype in the input file must be specified as six comma-delimited integers, each equal to 1, 2, or 3, that correspond to the selection of one of the three radio buttons (1st, 2nd, or 3rd) for each of the six SNPs. The first number corresponds to the selection for position 282, the second to position 341, the third to position 481, the fourth to position 590, the fifth to position 803, and the sixth to position 857. For instance, the line 1,2,1,1,1,1 corresponds to the selection shown in Figure 1: first radio button (CC) for position 282, second radio button (CT) for position 341, first radio button (CC) for position 481, first radio button (GG) for position 590, first radio button (AA) for position 803, and first radio button (GG) for position 857. The input file must contain one genotype per line as shown below:

1,2,3,1,1,1
1,2,2,1,1,2
2,2,2,2,2,2
1,2,3,1,1,1
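For users who prepare batch files programmatically, the following sketch converts genotype calls into the comma-delimited codes and checks that a line is well formed. It assumes that the radio-button order at each position matches the genotype order listed in Table 1, which agrees with the 1,2,1,1,1,1 example above but should be verified against the input page. The names BUTTON_ORDER, encode_genotype, and parse_batch_line are hypothetical and are not part of NAT2PRED.

```python
POSITIONS = (282, 341, 481, 590, 803, 857)

# ASSUMPTION: radio buttons 1-3 follow the genotype order given in Table 1.
BUTTON_ORDER = {
    282: ("CC", "CT", "TT"),
    341: ("TT", "TC", "CC"),
    481: ("CC", "CT", "TT"),
    590: ("GG", "GA", "AA"),
    803: ("AA", "AG", "GG"),
    857: ("GG", "GA", "AA"),
}


def encode_genotype(genotype):
    """Turn {position: two-letter genotype} into a batch line such as '1,2,1,1,1,1'."""
    codes = []
    for position in POSITIONS:
        # Sort the two alleles so that 'CT' and 'TC' are treated as the same call.
        call = "".join(sorted(genotype[position].upper()))
        options = ["".join(sorted(g)) for g in BUTTON_ORDER[position]]
        codes.append(str(options.index(call) + 1))  # raises ValueError if unknown
    return ",".join(codes)


def parse_batch_line(line):
    """Validate one batch line and return its six integer codes."""
    fields = line.strip().split(",")
    if len(fields) != len(POSITIONS):
        raise ValueError("expected 6 comma-delimited values: %r" % line)
    codes = [int(field) for field in fields]
    if any(code not in (1, 2, 3) for code in codes):
        raise ValueError("each value must be 1, 2, or 3: %r" % line)
    return codes


example = {282: "CC", 341: "CT", 481: "CC", 590: "GG", 803: "AA", 857: "GG"}
print(encode_genotype(example))         # 1,2,1,1,1,1
print(parse_batch_line("1,2,3,1,1,1"))  # [1, 2, 3, 1, 1, 1]
```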

The output for a batch submission is returned by e-mail as a tab-delimited ASCII text file. See Figure 3 below for details on the format of the output file.

Figure 3. An example of the output file for batch submission.


Citation

If you use this web-server, please cite the following article:

I.B. Kuznetsov, M. McDuffie, R. Moslehi, 2009. A web-server for inferring the human N-acetyltransferase-2 (NAT2)
enzymatic phenotype from NAT2 genotype. Bioinformatics, 25(9):1185-1186.

Please address your questions and comments to Igor Kuznetsov