xHMMER3x2: Utilizing HMMER3’s speed and HMMER2’s sensitivity and specificity in the glocal alignment mode for improved large-scale protein domain annotation

xHMMER3x2: Utilizing HMMER3’s speed and HMMER2’s sensitivity and specificity in the glocal alignment mode for improved large-scale protein domain annotation
Title:
xHMMER3x2: Utilizing HMMER3’s speed and HMMER2’s sensitivity and specificity in the glocal alignment mode for improved large-scale protein domain annotation
Other Titles:
Biology Direct
Keywords:
Publication Date:
29 November 2016
Citation:
Yap et al. Biology Direct (2016) 11:63
Abstract:
While the local-mode HMMER3 is notable for its massive speed improvement, the slower glocal-mode HMMER2 is more exact for domain annotation by enforcing full domain-to-sequence alignments. Since a unit of domain necessarily implies a unit of function, local-mode HMMER3 alone remains insufficient for precise function annotation tasks. In addition, the incomparable E-values for the same domain model by different HMMER builds create difficulty when checking for domain annotation consistency on a large-scale basis. RESULTS: In this work, both the speed of HMMER3 and glocal-mode alignment of HMMER2 are combined within the xHMMER3x2 framework for tackling the large-scale domain annotation task. Briefly, HMMER3 is utilized for initial domain detection so that HMMER2 can subsequently perform the glocal-mode, sequence-to-full-domain alignments for the detected HMMER3 hits. An E-value calibration procedure is required to ensure that the search space by HMMER2 is sufficiently replicated by HMMER3. We find that the latter is straightforwardly possible for ~80% of the models in the Pfam domain library (release 29). However in the case of the remaining ~20% of HMMER3 domain models, the respective HMMER2 counterparts are more sensitive. Thus, HMMER3 searches alone are insufficient to ensure sensitivity and a HMMER2-based search needs to be initiated. When tested on the set of UniProt human sequences, xHMMER3x2 can be configured to be between 7× and 201× faster than HMMER2, but with descending domain detection sensitivity from 99.8 to 95.7% with respect to HMMER2 alone; HMMER3's sensitivity was 95.7%. At extremes, xHMMER3x2 is either the slow glocal-mode HMMER2 or the fast HMMER3 with glocal-mode. Finally, the E-values to false-positive rates (FPR) mapping by xHMMER3x2 allows E-values of different model builds to be compared, so that any annotation discrepancies in a large-scale annotation exercise can be flagged for further examination by dissectHMMER. CONCLUSION: The xHMMER3x2 workflow allows large-scale domain annotation speed to be drastically improved over HMMER2 without compromising for domain-detection with regard to sensitivity and sequence-to-domain alignment incompleteness. The xHMMER3x2 code and its webserver (for Pfam release 27, 28 and 29) are freely available at http://xhmmer3x2.bii.a-star.edu.sg/ .
License type:
http://creativecommons.org/licenses/by/4.0/
Funding Info:
Supported by A*STAR Singapore, the grants “A*STAR-NKTH JGC 007”, “SERC IMaGIN”, and “IAF CAT3 Integrated Genomics Platform”.
Description:
ISSN:
1745-6150
Files uploaded:

File Size Format Action
825-document3.pdf 2.60 MB PDF Open