A large-scale evaluation of computational protein function prediction.
Radivojac P., Clark WT., Oron TR., Schnoes AM., Wittkop T., Sokolov A., Graim K., Funk C., Verspoor K., Ben-Hur A., Pandey G., Yunes JM., Talwalkar AS., Repo S., Souza ML., Piovesan D., Casadio R., Wang Z., Cheng J., Fang H., Gough J., Koskinen P., Törönen P., Nokso-Koivisto J., Holm L., Cozzetto D., Buchan DWA., Bryson K., Jones DT., Limaye B., Inamdar H., Datta A., Manjari SK., Joshi R., Chitale M., Kihara D., Lisewski AM., Erdin S., Venner E., Lichtarge O., Rentzsch R., Yang H., Romero AE., Bhat P., Paccanaro A., Hamp T., Kaßner R., Seemayer S., Vicedo E., Schaefer C., Achten D., Auer F., Boehm A., Braun T., Hecht M., Heron M., Hönigschmid P., Hopf TA., Kaufmann S., Kiening M., Krompass D., Landerer C., Mahlich Y., Roos M., Björne J., Salakoski T., Wong A., Shatkay H., Gatzmann F., Sommer I., Wass MN., Sternberg MJE., Škunca N., Supek F., Bošnjak M., Panov P., Džeroski S., Šmuc T., Kourmpetis YAI., van Dijk ADJ., ter Braak CJF., Zhou Y., Gong Q., Dong X., Tian W., Falda M., Fontana P., Lavezzo E., Di Camillo B., Toppo S., Lan L., Djuric N., Guo Y., Vucetic S., Bairoch A., Linial M., Babbitt PC., Brenner SE., Orengo C., Rost B., Mooney SD., Friedberg I.
Automated annotation of protein function is challenging. As the number of sequenced genomes grows rapidly, the overwhelming majority of protein products can be annotated only computationally. If computational predictions are to be relied upon, these methods must be highly accurate. Here we report the results of the first large-scale, community-based critical assessment of protein function annotation (CAFA) experiment. Fifty-four methods representing the state of the art in protein function prediction were evaluated on a target set of 866 proteins from 11 organisms. Two findings stand out: (i) today's best protein function prediction algorithms substantially outperform widely used first-generation methods, with large gains on all types of targets; and (ii) although the top methods perform well enough to guide experiments, currently available tools still need considerable improvement.