Leveraging base-pair mammalian constraint to understand genetic variation and human disease
Sullivan PF., Meadows JRS., Gazal S., Phan BN., Li X., Genereux DP., Dong MX., Bianchi M., Andrews G., Sakthikumar S., Nordin J., Roy A., Christmas MJ., Marinescu VD., Wang C., Wallerman O., Xue J., Yao S., Sun Q., Szatkiewicz J., Wen J., Huckins LM., Lawler A., Keough KC., Zheng Z., Zeng J., Wray NR., Li Y., Johnson J., Chen J., Paten B., Reilly SK., Hughes GM., Weng Z., Pollard KS., Pfenning AR., Forsberg-Nilsson K., Karlsson EK., Lindblad-Toh K., Andrews G., Armstrong JC., Bianchi M., Birren BW., Bredemeyer KR., Breit AM., Christmas MJ., Clawson H., Damas J., Di Palma F., Diekhans M., Dong MX., Eizirik E., Fan K., Fanter C., Foley NM., Forsberg-Nilsson K., Garcia CJ., Gatesy J., Gazal S., Genereux DP., Goodman L., Grimshaw J., Halsey MK., Harris AJ., Hickey G., Hiller M., Hindle AG., Hubley RM., Hughes GM., Johnson J., Juan D., Kaplow IM., Karlsson EK., Keough KC., Kirilenko B., Koepfli K-P., Korstian JM., Kowalczyk A., Kozyrev SV., Lawler AJ., Lawless C., Lehmann T., Levesque DL., Lewin HA., Li X., Lind A., Lindblad-Toh K., Mackay-Smith A., Marinescu VD., Marques-Bonet T., Mason VC., Meadows JRS., Meyer WK., Moore JE., Moreira LR., Moreno-Santillan DD., Morrill KM., Muntané G., Murphy WJ., Navarro A., Nweeia M., Ortmann S., Osmanski A., Paten B., Paulat NS., Pfenning AR., Phan BN., Pollard KS., Pratt HE., Ray DA., Reilly SK., Rosen JR., Ruf I., Ryan L., Ryder OA., Sabeti PC., Schäffer DE., Serres A., Shapiro B., Smit AFA., Springer M., Srinivasan C., Steiner C., Storer JM., Sullivan KAM., Sullivan PF., Sundström E., Supple MA., Swofford R., Talbot J-E., Teeling E., Turner-Maier J., Valenzuela A., Wagner F., Wallerman O., Wang C., Wang J., Weng Z., Wilder AP., Wirthlin ME., Xue JR., Zhang X.
Thousands of genomic regions have been associated with heritable human diseases, but attempts to elucidate biological mechanisms are impeded by an inability to discern which genomic positions are functionally important. Evolutionary constraint is a powerful predictor of function, agnostic to cell type or disease mechanism. Single-base phyloP scores from 240 mammals identified 3.3% of the human genome as significantly constrained and likely functional. We compared phyloP scores to genome annotation, association studies, copy-number variation, clinical genetics findings, and cancer data. Constrained positions are more strongly enriched for variants that explain common disease heritability than other functional annotations are. Our results improve variant annotation but also highlight that the regulatory landscape of the human genome needs to be explored further and linked to disease.
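As a rough illustration of how per-base scores can be turned into a "significantly constrained" call, the sketch below thresholds phyloP scores with a Benjamini-Hochberg false discovery rate procedure. It is not the authors' pipeline. It assumes that a positive phyloP score is -log10 of a conservation p-value and that negative scores (acceleration) are never called constrained. The function names and the 0.05 FDR level are hypothetical choices for the example.

```python
def phylop_to_pvalue(score):
    """Convert a phyloP score to a conservation p-value.

    Assumption: positive scores equal -log10(p) for conservation;
    zero or negative scores are treated as p = 1 (not constrained).
    """
    return 10.0 ** (-score) if score > 0 else 1.0


def constrained_mask(scores, fdr=0.05):
    """Return one boolean per base: True if called constrained.

    Uses the Benjamini-Hochberg step-up procedure at the given FDR.
    """
    pvals = [phylop_to_pvalue(s) for s in scores]
    m = len(pvals)
    # Indices of bases ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest rank k whose p-value satisfies p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank / m * fdr:
            cutoff_rank = rank
    # Every base ranked at or below the cutoff is called constrained.
    mask = [False] * m
    for idx in order[:cutoff_rank]:
        mask[idx] = True
    return mask


def constrained_fraction(scores, fdr=0.05):
    """Fraction of bases called constrained (0.0 for empty input)."""
    mask = constrained_mask(scores, fdr)
    return sum(mask) / len(mask) if mask else 0.0


if __name__ == "__main__":
    # Toy scores for ten bases; real input would be genome-wide.
    toy = [5.0, 0.1, -1.0, 3.0, 0.5, 0.0, 0.2, -0.3, 0.4, 0.1]
    print(constrained_mask(toy))
    print(constrained_fraction(toy))
```

Applied genome-wide, the returned fraction corresponds to the kind of summary quoted above (3.3% of bases), although the exact value depends on the scoring model and the multiple-testing procedure used.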