protein, secondary structure prediction, discrete wavelet transform, hydrophobicity
This project develops a secondary structure prediction approach that uses the discrete wavelet transform. To apply the wavelet technique, we convert the primary amino acid sequence of the protein to a numerical signal using the hydrophobic tendencies associated with the amino acids. The data used in this project consist of both α+β and α/β proteins from the Structural Classification of Proteins (SCOP) database. These data provide both protein primary sequences and secondary structure locations. In total, 13,435 individual proteins and nearly 15,511 unique protein subunits are analyzed. We use three different experimentally determined hydrophobicity scales for comparison. A control data set is formed by creating 200 realizations of each protein, each realization being a random permutation of the protein's amino acid sequence. The realizations are subjected to the same analysis as the parent protein. Our analysis involves examining the correlation between the locations of significant hydrophobicity fluctuations and secondary structure, where significance is determined by comparison to the control data set. Our focus is on the first and second scales of the wavelet detail, but we also construct a scale-scale measure that combines these scales to detect secondary structure. Using standard performance measures, such as the Matthews correlation coefficient (MCC) and the accuracy (Q), we find that our method shows promise as a tool for predicting the locations of secondary structures in proteins given only the amino acid sequence.
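The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation, and several choices are assumptions made only for illustration: the Kyte-Doolittle hydropathy values stand in for the three scales used in the thesis, the Haar wavelet stands in for whichever wavelet the thesis uses, a 95th-percentile cutoff over the permuted realizations stands in for its significance criterion, and the function names (`to_signal`, `haar_details`, `significant_positions`) are hypothetical.

```python
import numpy as np

# Kyte-Doolittle hydropathy values (assumed stand-in; the thesis uses
# three experimentally determined scales that are not named here).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def to_signal(seq):
    """Map a one-letter amino acid sequence to a hydrophobicity signal."""
    return np.array([KD[aa] for aa in seq], dtype=float)


def haar_details(x, levels=2):
    """Return Haar detail coefficients for scales 1..levels.

    Assumption: a trailing sample is dropped whenever the current
    approximation has odd length.
    """
    details = []
    approx = np.asarray(x, dtype=float)
    for _ in range(levels):
        n = len(approx) - len(approx) % 2
        even, odd = approx[0:n:2], approx[1:n:2]
        details.append((even - odd) / np.sqrt(2))
        approx = (even + odd) / np.sqrt(2)
    return details


def significant_positions(seq, n_perm=200, q=95, levels=2, seed=0):
    """Flag detail coefficients that exceed the permutation control.

    For each scale, a coefficient is marked significant when its
    magnitude exceeds the q-th percentile (assumed cutoff) of the
    magnitudes found at the same position across n_perm random
    permutations of the sequence.
    """
    rng = np.random.default_rng(seed)
    signal = to_signal(seq)
    observed = haar_details(signal, levels)
    null = [np.empty((n_perm, len(d))) for d in observed]
    for i in range(n_perm):
        shuffled = rng.permutation(signal)
        for j, d in enumerate(haar_details(shuffled, levels)):
            null[j][i] = np.abs(d)
    return [
        np.abs(d) > np.percentile(nl, q, axis=0)
        for d, nl in zip(observed, null)
    ]
```

Calling `significant_positions("MKTAYIAKQRQISFVKSHFS")` returns one boolean array per scale. Mapping the flagged coefficients back to residue positions and scoring them against annotated secondary structure (for example with the MCC) is left out of this sketch.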
Vanderleest, Timothy E., "Analysis of protein secondary structure via the discrete wavelet transform" (2011). College of Liberal Arts & Social Sciences Theses and Dissertations. 95.