Improved profile HMM performance by assessment of critical algorithmic features in SAM and HMMER

Markus Wistrand; Erik L L Sonnhammer

doi:10.1186/1471-2105-6-99


BMC Bioinformatics. 2005 Apr 15;6:99. doi: 10.1186/1471-2105-6-99.

Authors

Markus Wistrand¹, Erik L L Sonnhammer

Affiliation

¹ Center for Genomics and Bioinformatics, Karolinska Institutet, S-17177 Stockholm, Sweden. markus.wistrand@cgb.ki.se

Abstract

Background: Profile hidden Markov model (HMM) techniques are among the most powerful methods for protein homology detection. Yet the features critical for successful modelling are not fully known. In the present work we addressed this question using two of the most popular HMM packages, SAM and HMMER. The programs' abilities to build models and to score sequences were compared on a SCOP/Pfam-based test set. The comparison was done separately for local and global HMM scoring.

Results: Using default settings, SAM was more sensitive overall. SAM's model estimation was superior, while HMMER's model scoring was more accurate. Critical features for model building were then analysed by comparing the two packages' algorithmic choices and parameters. The weighting between prior probabilities and multiple alignment counts was the primary explanation for why SAM's model building was superior. Our analysis suggests that HMMER gives too much weight to the sequence counts. SAM's emission prior probabilities were also shown to be more sensitive. The two packages use different relative sequence weighting schemes, but these performed equivalently.
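The central finding concerns how strongly a model trusts the alignment counts relative to the prior. The sketch below is a minimal illustration, not either package's actual estimator. It assumes a single Dirichlet prior, whereas real profile-HMM packages use richer priors such as Dirichlet mixtures. It also assumes a hypothetical effective-sequence-number parameter `n_eff` that rescales one column's counts before they are combined with the pseudocounts `alpha`. Raising `n_eff` pulls the estimate toward the observed frequencies, while lowering it lets the prior dominate.

```python
from typing import List, Sequence


def posterior_mean_emissions(
    counts: Sequence[float], alpha: Sequence[float], n_eff: float
) -> List[float]:
    """Posterior-mean emission probabilities for one alignment column.

    Simplified sketch (assumption): a single Dirichlet prior with
    parameters `alpha`, and observed counts rescaled so that they sum to
    the hypothetical effective sequence number `n_eff`.
    """
    if len(counts) != len(alpha):
        raise ValueError("counts and alpha must have the same length")
    total = sum(counts)
    if total <= 0:
        raise ValueError("column has no observed counts")
    # Rescale counts: n_eff controls how much weight the data receive.
    scaled = [c * n_eff / total for c in counts]
    denom = n_eff + sum(alpha)
    # Posterior mean of a Dirichlet: (count + pseudocount) / total mass.
    return [(s + a) / denom for s, a in zip(scaled, alpha)]


# Toy 4-letter alphabet, fully conserved column, uniform pseudocounts.
column = [10, 0, 0, 0]
prior = [1, 1, 1, 1]
print([round(p, 3) for p in posterior_mean_emissions(column, prior, 10)])
# -> [0.786, 0.071, 0.071, 0.071]  (counts weighted heavily)
print([round(p, 3) for p in posterior_mean_emissions(column, prior, 2)])
# -> [0.5, 0.167, 0.167, 0.167]    (prior weighted more heavily)
```

In this toy setting, "too much weight to the sequence counts" corresponds to a large `n_eff`. A conserved column then assigns very low probability to unseen residues, which can hurt sensitivity to remote homologues.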

Conclusion: SAM model estimation was more sensitive, while HMMER model scoring was more accurate. By combining the best algorithmic features from both packages the accuracy was substantially improved compared to their default performance.

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Base Sequence
Computational Biology / methods*
Data Interpretation, Statistical
Evaluation Studies as Topic
Markov Chains
Models, Biological
Models, Chemical
Models, Genetic
Models, Molecular
Models, Statistical
Proteins / chemistry
Sensitivity and Specificity
Sequence Alignment
Sequence Analysis, Protein
Software*

Substances

Proteins