College of Liberal Arts & Social Sciences Theses and Dissertations

Graduation Date

8-2011

Document Type

Thesis

Department/Program Conferring Degree

Physics

Keywords

protein, secondary structure prediction, discrete wavelet transform, hydrophobicity

Abstract

This project develops a secondary structure prediction approach that uses the discrete wavelet transform. To apply the wavelet technique, we convert the primary amino acid sequence of the protein to a numerical signal using the hydrophobic tendencies associated with the amino acids. The data used in this project consist of both α + β and α/β proteins from the Structural Classification of Proteins (SCOP) protein database. These data provide both protein primary sequences and secondary structure locations. In total, 13,435 individual proteins and nearly 15,511 unique protein subunits are analyzed. We use three different experimentally determined hydrophobicity scales for comparison. A control data set is formed by creating 200 realizations of each protein, each realization being a random permutation of the protein's amino acid sequence. The realizations are subjected to the same analysis as the parent protein. Our analysis involves examining the correlation between locations of significant hydrophobicity fluctuations and secondary structure, where significance is determined by comparison to the control data set. Our focus is on the first and second scales of the wavelet detail, but we also construct a scale-scale measure that combines these scales to detect secondary structure. Using standard performance measures, such as the Matthews correlation coefficient (MCC) and the accuracy (Q), we find that our method shows promise as a tool for predicting the locations of secondary structures in proteins given only the amino acid sequence.
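
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation, and several choices are assumptions made only for illustration: the Kyte-Doolittle hydropathy values stand in for the three scales, which the abstract does not name; the Haar wavelet stands in for the unspecified wavelet; and significance is taken to mean exceeding the 95th percentile of the permuted-sequence coefficients. The function names `to_signal`, `haar_details`, `significant_positions`, and `mcc` are hypothetical.

```python
import math
import random

# Kyte-Doolittle hydropathy values. Assumption: used here only as a
# familiar stand-in; the abstract does not name its three scales.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def to_signal(seq):
    """Map an amino acid sequence to a numerical hydrophobicity signal."""
    return [KD[residue] for residue in seq]

def haar_details(signal, levels=2):
    """Return Haar detail coefficients for scales 1..levels.

    Assumption: the Haar wavelet is an illustrative choice only.
    """
    approx = list(signal)
    details = []
    for _ in range(levels):
        if len(approx) % 2:
            approx.append(approx[-1])  # pad odd lengths by repeating the last sample
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / math.sqrt(2) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return details

def significant_positions(seq, scale=1, n_perm=200, quantile=0.95, seed=0):
    """Flag detail coefficients that exceed a permutation-based threshold.

    The control set is n_perm random permutations of the same sequence.
    Assumption: the quantile rule is a hypothetical significance criterion.
    """
    rng = random.Random(seed)
    observed = haar_details(to_signal(seq), scale)[scale - 1]
    residues = list(seq)
    null = []
    for _ in range(n_perm):
        rng.shuffle(residues)
        null.extend(abs(d) for d in haar_details(to_signal(residues), scale)[scale - 1])
    null.sort()
    threshold = null[min(len(null) - 1, int(quantile * len(null)))]
    return [abs(d) > threshold for d in observed]

def mcc(pred, truth):
    """Matthews correlation coefficient for two equal-length boolean lists."""
    tp = sum(p and t for p, t in zip(pred, truth))
    tn = sum((not p) and (not t) for p, t in zip(pred, truth))
    fp = sum(p and (not t) for p, t in zip(pred, truth))
    fn = sum((not p) and t for p, t in zip(pred, truth))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

In use, the flags returned by `significant_positions` would be compared against known secondary structure locations with `mcc`. Each scale-1 coefficient covers two residues and each scale-2 coefficient covers four, so the flags need to be mapped back to residue positions before that comparison.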
