Endpoint definitions
Stroke | Stroke | Binary |
AD | Alzheimer's disease (AD) or family history of AD | Binary |
RA | Rheumatoid arthritis | Binary |
T2D | Type 2 diabetes | Binary |
Breast cancer | Breast cancer | Binary |
Prostate cancer | Prostate cancer | Binary |
T1D | Type 1 diabetes | Binary |
IBD | Inflammatory bowel disease | Binary |
eGFR | Estimated glomerular filtration rate | Continuous |
Height | Height | Continuous |
Gout | Gout | Binary |
BMI | Body mass index | Continuous |
HDL | High-density lipoprotein | Continuous |
HbA1c | Glycated hemoglobin | Continuous |
Urate | Urate | Continuous |
Polygenic score development methods
DBSLMM | Yang, S. & Zhou, X. Accurate and Scalable Construction of Polygenic Scores in Large Biobank Data Sets. The American Journal of Human Genetics 106, 679–693 (2020). |
lassosum | Mak, T. S. H., Porsch, R. M., Choi, S. W., Zhou, X. & Sham, P. C. Polygenic scores via penalized regression on summary statistics. Genetic Epidemiology 41, 469–480 (2017). |
LDpred2 | Privé, F., Arbel, J. & Vilhjálmsson, B. J. LDpred2: Better, faster, stronger. Bioinformatics 36, 5424–5431 (2020). |
MegaPRS | Zhang, Q., Privé, F., Vilhjálmsson, B. & Speed, D. Improved genetic prediction of complex traits from individual-level data or summary statistics. Nat Commun 12, 4192 (2021). |
PRS-CS | Ge, T., Chen, C.-Y., Ni, Y., Feng, Y.-C. A. & Smoller, J. W. Polygenic prediction via Bayesian regression and continuous shrinkage priors. Nat Commun 10, 1776 (2019). |
pt.clump | For background, see Privé, F. et al. Making the most of clumping and thresholding for polygenic scores. The American Journal of Human Genetics 105(6) (2019). |
sbayesr | Lloyd-Jones, L. R. et al. Improved polygenic prediction by Bayesian multiple regression on summary statistics. Nat Commun 10, 5086 (2019). |
UKBB.EnsPRS | Monti, R. et al. Evaluation of polygenic scoring methods in five biobanks shows larger variation between biobanks than methods and finds benefits of ensemble learning. The American Journal of Human Genetics 111(7) (2024). |
Genetic ancestry
We use two genetic ancestry groups in this work, which were inherited from the 1000 Genomes (1KG) project:
- EUR: People similar to 1KG-EUR (originally defined as a European superpopulation)
- SAS: People similar to 1KG-SAS (originally defined as a South Asian superpopulation)
These labels were assigned to individuals by measuring their genetic similarity to each group, using the 1000 Genomes dataset as a reference panel. By performing analyses within ancestry-matched populations, we sought to reduce differences between biobanks that arise mainly from differing ancestry composition, and to highlight potential performance differences between genetic ancestries. The vast majority of study participants in the UK Biobank, FinnGen, HUNT and Estonia Biobank matched the 1KG-EUR ancestry group. The only other replicated ancestry group was 1KG-SAS, matched by participants in the UK Biobank and Genes & Health. The use of continental labels as group labels is a limitation of this work.
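The exact similarity procedure is not described here. As a rough illustration only, the sketch below assumes that each individual and each reference sample has already been projected onto shared principal components, and assigns the label of the nearest reference-group centroid. The function names, the toy coordinates, and the `max_distance` cutoff are all hypothetical and are not taken from this work.

```python
import math
from collections import defaultdict

def group_centroids(reference):
    """Average the principal-component coordinates of each reference group."""
    grouped = defaultdict(list)
    for label, coords in reference:
        grouped[label].append(coords)
    return {
        label: [sum(axis) / len(axis) for axis in zip(*members)]
        for label, members in grouped.items()
    }

def assign_ancestry(sample, centroids, max_distance):
    """Return the label of the nearest centroid, or None if none is close enough."""
    label, distance = min(
        ((name, math.dist(sample, centre)) for name, centre in centroids.items()),
        key=lambda pair: pair[1],
    )
    return label if distance <= max_distance else None

# Toy two-dimensional coordinates, invented purely for illustration.
reference = [
    ("EUR", [0.9, 0.1]), ("EUR", [1.1, -0.1]),
    ("SAS", [-1.0, 0.9]), ("SAS", [-1.0, 1.1]),
]
centroids = group_centroids(reference)
print(assign_ancestry([0.8, 0.0], centroids, max_distance=0.5))
print(assign_ancestry([0.0, 0.5], centroids, max_distance=0.5))
```

Under this assumed scheme, an individual who is not close to any reference centroid receives no label and would be left out of the ancestry-matched analyses.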
Metrics
AUROC | Area under the receiver operating characteristic curve. This metric depends only on the relative ordering of observations and is available only for binary traits. |
β | Standardized regression coefficient. For continuous traits, this is the change in the trait (in standard deviations) per standard deviation of the PGS. For binary traits, this is the change in the log-odds per standard deviation of the PGS. This is the metric that was used for meta-analyses. |
Odds Ratio (OR) | The multiplicative change in the odds per standard deviation of the PGS, exp(β). |
R² | The variance explained by the PGS, on the observed scale for continuous traits or on the liability scale for binary traits. |
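To make the definitions of AUROC, β and OR concrete, the following is a minimal sketch in plain Python. It assumes unadjusted models with the PGS as the only predictor. Any covariate adjustment used in the actual evaluation is not described here and is not reproduced, and R² on the liability scale is omitted. All function names and the toy data are illustrative.

```python
import math
from statistics import mean, pstdev

def standardize(values):
    """Rescale values to mean 0 and standard deviation 1."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]

def auroc(labels, scores):
    """Probability that a random case outscores a random control (ties count 0.5)."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))

def beta_continuous(trait, pgs):
    """Slope of the standardized trait on the standardized PGS (equals Pearson r)."""
    y, x = standardize(trait), standardize(pgs)
    return sum(a * b for a, b in zip(x, y)) / len(x)

def beta_binary(labels, pgs, iterations=25):
    """Log-odds change per SD of the PGS, from a logistic regression with an
    intercept b0 and slope b1 fitted by Newton-Raphson."""
    x = standardize(pgs)
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, labels):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # gradient with respect to b0
            g1 += (yi - p) * xi     # gradient with respect to b1
            h00 += w                # entries of the 2x2 information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b1

# Toy data, invented for illustration: cases tend to have a higher PGS.
pgs = [-1.2, -0.8, -0.5, -0.1, 0.0, 0.3, 0.6, 0.9, 1.4, 2.0]
labels = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]

beta = beta_binary(labels, pgs)
print("AUROC:", auroc(labels, pgs))
print("beta (log-odds per SD):", beta)
print("OR per SD:", math.exp(beta))
```

Because AUROC uses only the ordering of scores, rescaling the PGS leaves it unchanged, whereas β and OR are tied to the per-standard-deviation scale.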