
Proceedings Paper

Protein secondary structure prediction using support vector machine with advanced encoding schemes
Author(s): Hae-Jin Hu; Yi Pan; Robert Harrison; Phang C. Tai

Paper Abstract

Over the decades, many studies have addressed protein structure prediction. Because protein secondary structure is closely related to tertiary structure, many approaches begin by predicting the secondary structure and then use the results to predict the tertiary structure. Recent secondary structure prediction studies are mostly based on neural networks or support vector machines (SVMs). In this study, an SVM is used as the machine learning tool for secondary structure prediction, and several encoding schemes, including the orthogonal matrix, a hydrophobicity matrix, the BLOSUM62 substitution matrix, and combined matrices of these, are developed and optimized to improve prediction accuracy. Based on the best encoding scheme, each protein sequence is expressed as a series of consecutive sliding windows, and each amino acid inside a window is represented by 20 matrix values. After extensive experiments establish 13 as the optimal window length for the six SVM binary classifiers, the new encoding scheme is tested with this window size using 7-fold cross-validation. The results show a 2% increase in the accuracy of the binary classifiers compared with the classical orthogonal matrix encoding. The RS126 data set is used to train and test the SVM binary classifiers, since it is the common set adopted by previous research groups. Finally, several existing tertiary (three-state) classifiers are applied to combine the outputs of the six SVM binary classifiers, and the efficiency of each is compared.
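The sliding-window encoding described in the abstract can be illustrated with a minimal Python sketch. It assumes a one-hot ("orthogonal") encoding over the 20 standard amino acids in an arbitrary alphabetical order, zero-padding where the window runs past either end of the chain, and an odd window length (13 by default, matching the optimum reported in the abstract), so each residue yields 13 × 20 = 260 features. The names `orthogonal_vector`, `encode_windows`, and `combine_one_vs_rest` are hypothetical and not taken from the paper. The combination step picks the state whose one-vs-rest classifier gives the largest decision value; this is one simple rule assumed for illustration, not necessarily one of the tertiary classifiers the authors compared.

```python
from typing import Callable, Dict, List

# The 20 standard amino acids; this ordering is an arbitrary choice for the sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def orthogonal_vector(residue: str) -> List[float]:
    """One-hot ("orthogonal") encoding: 20 values with a single 1.0.

    Residues outside the 20 standard letters (e.g. 'X') map to all zeros.
    """
    vec = [0.0] * len(AMINO_ACIDS)
    idx = AMINO_ACIDS.find(residue)
    if idx >= 0:
        vec[idx] = 1.0
    return vec


def encode_windows(
    sequence: str,
    window: int = 13,
    encoder: Callable[[str], List[float]] = orthogonal_vector,
) -> List[List[float]]:
    """Return one feature row per residue, built from the window centred on it.

    Each row concatenates `window` blocks of 20 values; positions past the
    chain ends are filled with a zero block (an assumed padding convention).
    """
    if window % 2 == 0:
        raise ValueError("window length must be odd so each window has a centre residue")
    half = window // 2
    padding = [0.0] * len(AMINO_ACIDS)
    features: List[List[float]] = []
    for centre in range(len(sequence)):
        row: List[float] = []
        for pos in range(centre - half, centre + half + 1):
            if 0 <= pos < len(sequence):
                row.extend(encoder(sequence[pos]))
            else:
                row.extend(padding)
        features.append(row)
    return features


def combine_one_vs_rest(decisions: Dict[str, float]) -> str:
    """Pick the state (e.g. 'H', 'E', 'C') with the largest one-vs-rest
    SVM decision value; ties go to the first key in insertion order."""
    return max(decisions, key=decisions.get)


if __name__ == "__main__":
    rows = encode_windows("MKTAYIAKQR", window=13)
    print(len(rows), len(rows[0]))  # 10 260
    print(combine_one_vs_rest({"H": 0.7, "E": -0.4, "C": 0.1}))  # H
```

A hydrophobicity or BLOSUM62 encoding would be plugged in by passing a different `encoder` that returns 20 values per residue (for example, the residue's row of the substitution matrix restricted to the 20 standard amino acids); the window layout and feature length stay the same.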

Paper Details

Date Published: 12 April 2004
PDF: 8 pages
Proc. SPIE 5433, Data Mining and Knowledge Discovery: Theory, Tools, and Technology VI, (12 April 2004); doi: 10.1117/12.542174
Author Affiliations
Hae-Jin Hu, Georgia State Univ. (United States)
Yi Pan, Georgia State Univ. (United States)
Robert Harrison, Georgia State Univ. (United States)
Phang C. Tai, Georgia State Univ. (United States)

Published in SPIE Proceedings Vol. 5433:
Data Mining and Knowledge Discovery: Theory, Tools, and Technology VI
Belur V. Dasarathy, Editor(s)

© SPIE.