For example, for the hydrophobicity attribute, class 1 comprises polar amino acids, class 2 neutral amino acids, and class 3 hydrophobic amino acids. The composition descriptors then represent the overall percentage of each class in the sequence. Since there are seven attributes and three classes, 7 × 3 = 21 composition descriptors can be computed. The transition descriptors represent the frequencies with which an attribute changes class along the sequence, e.g., a class 1 amino acid is followed by a class 2 amino acid or vice versa. Since there are three possible transitions between classes, 7 × 3 = 21 transition descriptors can be computed. The distribution descriptors represent the distribution of each attribute in the sequence.
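As an illustration of the composition and transition descriptors for a single attribute, the following is a minimal sketch rather than the PROFEAT implementation. The class membership for hydrophobicity (polar RKEDQN, neutral GASTPHY, hydrophobic CLVIMFW) is an assumption based on a grouping commonly used for CTD descriptors; the text here does not list the members. The function names are hypothetical, values are returned as fractions rather than percentages, and transition frequencies are normalised by the number of adjacent residue pairs, which is also an assumption.

```python
from collections import Counter

# Assumed three-class grouping for the hydrophobicity attribute.
HYDROPHOBICITY_CLASSES = {
    1: "RKEDQN",   # class 1: polar
    2: "GASTPHY",  # class 2: neutral
    3: "CLVIMFW",  # class 3: hydrophobic
}
AA_TO_CLASS = {
    aa: cls for cls, members in HYDROPHOBICITY_CLASSES.items() for aa in members
}


def composition(seq):
    """Fraction of residues falling in each of the three classes."""
    classes = [AA_TO_CLASS[aa] for aa in seq]
    counts = Counter(classes)
    return {cls: counts[cls] / len(classes) for cls in (1, 2, 3)}


def transition(seq):
    """Frequency of class changes between neighbouring residues.

    A change from class a to b and from b to a are counted together,
    giving three transition types: (1, 2), (1, 3) and (2, 3).
    """
    classes = [AA_TO_CLASS[aa] for aa in seq]
    changes = Counter(
        frozenset(pair) for pair in zip(classes, classes[1:]) if pair[0] != pair[1]
    )
    n_pairs = len(classes) - 1  # assumed normalisation: adjacent pairs
    return {
        (a, b): changes[frozenset((a, b))] / n_pairs
        for a, b in ((1, 2), (1, 3), (2, 3))
    }
```

Repeating the same computation for each of the seven attributes would give the 21 composition and 21 transition values mentioned in the text.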
For each attribute and for each class, five distribution descriptors are computed based on the following criteria: the location of the first residue, 25% of residues, 50% of residues, 75% of residues and 100% of residues with a given property. For instance, if the total length of a sequence is N amino acids, and all polar amino acids are among the first i residues of the sequence, then the distribution descriptor for 100% of residues of the given class would be calculated as i/N. Thus, the total number of distribution descriptors is 5 × 7 × 3 = 105. CTD descriptors were computed using the PROFEAT web server.

Sequence order and pseudo amino acid descriptors

The sequence order and pseudo amino acid descriptors were proposed by Chou and have been used most successfully to predict protein subcellular location. Here we used the PROFEAT web server to calculate 60 sequence order coupling numbers, 100 quasi sequence order descriptors, and 50 pseudo amino acid descriptors.
The sequence order coupling numbers are derived from the physicochemical distance matrix between pairs of amino acids. The coupling number of rank d is defined as the sum of squared physicochemical distances between all amino acids located d residues from each other. This is mathematically described by the equation

$$\tau_d = \sum_{i=1}^{N-d} \left( d_{i,i+d} \right)^2$$

where $d_{i,i+d}$ is the physicochemical distance between the two amino acids at positions i and i + d, and N is the total length of the sequence. PROFEAT allows computing these descriptors starting from rank d = 1 up to d = 30 and using two different distance matrices. Quasi sequence order descriptors are thereafter computed from the coupling numbers and from the protein amino acid composition.

Amino acid and dipeptide composition

Amino acid composition descriptors represent the fractions of each of the twenty natural amino acids in a protein sequence, while dipeptide composition descriptors represent the fractions of the 20 × 20 = 400 possible dipeptides in the sequence. Despite its simplicity the method has been applied successfully, e.g., for classification of G protein coupled receptors and nuclear receptors, prediction of protein fold, and prediction of the subcellular localization of proteins.
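To make the sequence order coupling numbers concrete (the sum of squared physicochemical distances between residues located d positions apart), here is a minimal sketch. The function names are hypothetical, and the three-residue scale used for the distance is a made-up toy, not one of the distance matrices offered by PROFEAT; a real calculation would substitute a full 20 × 20 physicochemical distance matrix.

```python
def coupling_number(seq, d, distance):
    """Sum of squared distances between residues d positions apart."""
    return sum(distance(seq[i], seq[i + d]) ** 2 for i in range(len(seq) - d))


def coupling_numbers(seq, max_rank, distance):
    """Coupling numbers for ranks d = 1 .. max_rank."""
    return [coupling_number(seq, d, distance) for d in range(1, max_rank + 1)]


# Hypothetical toy property scale, for illustration only.
TOY_SCALE = {"A": 0.0, "C": 1.0, "D": 3.0}


def toy_distance(a, b):
    """Toy distance: absolute difference on the made-up scale."""
    return abs(TOY_SCALE[a] - TOY_SCALE[b])


print(coupling_numbers("ACDA", 3, toy_distance))  # [14.0, 10.0, 0.0]
```

With ranks 1 to 30 and two distance matrices, the same loop would yield the 60 coupling numbers mentioned in the text.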