Glossary

Modified

December 17, 2024

Doi

Endpoint definitions

Target trait GWAS trait Binary or continuous?
Stroke Stroke Binary
AD AD or family history of AD Binary
RA Rheumatoid arthritis Binary
T2D Type 2 Diabetes Binary
Breast cancer Breast cancer Binary
Prostate cancer Prostrate cancer Binary
T1D Type 1 Diabetes Binary
IBD Inflammatory Bowel Disease Binary
eGFR estimated glomerular filtration rate Continuous
Height Height Continuous
Gout Gout Binary
BMI Body Mass Index Continuous
HDL high-density lipoprotein Continuous
HbA1c Glycated hemoglobin Continuous
Urate Urate Continuous

Polygenic score development methods

Method Reference
DBSLMM Yang, S. & Zhou, X. Accurate and Scalable Construction of Polygenic Scores in Large Biobank Data Sets. The American Journal of Human Genetics 106, 679–693 (2020).
lassosum Mak, T. S. H., Porsch, R. M., Choi, S. W., Zhou, X. & Sham, P. C. Polygenic scores via penalized regression on summary statistics. Genetic Epidemiology 41, 469–480 (2017).
LDpred2 Privé, F., Arbel, J. & Vilhjálmsson, B. J. LDpred2: Better, faster, stronger. Bioinformatics 36, 5424–5431 (2020).
MegaPRS Zhang, Q., Privé, F., Vilhjálmsson, B. & Speed, D. Improved genetic prediction of complex traits from individual-level data or summary statistics. Nat Commun 12, 4192 (2021).
PRS-CS Ge, T., Chen, C.-Y., Ni, Y., Feng, Y.-C. A. & Smoller, J. W. Polygenic prediction via Bayesian regression and continuous shrinkage priors. Nat Commun 10, 1776 (2019).
pt.clump See Privé, F, et al. Making the most of clumping and thresholding for polygenic scores. The American journal of human genetics 105.6 (2019) for background.
sbayesr Lloyd-Jones, L. R. et al. Improved polygenic prediction by Bayesian multiple regression on summary statistics. Nat Commun 10, 5086 (2019).
UKBB.EnsPRS Monti, Remo, et al. Evaluation of polygenic scoring methods in five biobanks shows larger variation between biobanks than methods and finds benefits of ensemble learning. The American Journal of Human Genetics 111.7 (2024).

Genetic ancestry

We use two genetic ancestry groups in this work, which were inherited from the 1000 Genomes (1KG) project:

  • EUR: People similar to 1KG-EUR (originally defined as a European superpopulation)
  • SAS: People similar to 1KG-SAS (originally defined as a South Asian superpopulation)

These labels were assigned to individuals by measuring their genetic similarity to these groups, using the 1000 Genomes dataset as a reference panel. By performing analyses within ancestry-matched populations, we sought to reduce differences between biobanks that arise mainly from different ancestry composition, and highlight potential performance differences between genetic ancestries. The vast majority of study participants in the UK Biobank, FinnGen, HUNT and Estonia Biobank matched to the 1KG-EUR ancestry group. The only other replicated ancesetry group was 1KG-SAS, matched by participants in the UK Biobank and Genes & Health. The use of continental labels as group labels is a limitation of this work.

Metrics

Metric Explanation
AUROC Area Under Receiver Operating CharacteristicThis metric cares only about relative ordering of observations, and is available only for binary traits.
β Standardized regression coefficients. For continuous traits, this is the change in the trait (in standard deviations) per standard deviation of the PGS. For binary traits, this is the change in the log-odds per standard deviation of the PGS. This is the metric that was used for meta-analyses.
Odds Ratio (OR) This is the change in the odds ratio per standard deviation of the PGS (exp(β))
This is the variance explained by the PGS on the observed scale for continuous traits, or on the liability scale for continuous traits