Glossary

Modified

December 17, 2024

Doi

10.1016/j.ajhg.2024.06.003

Endpoint definitions

Target trait	GWAS trait	Binary or continuous?
Stroke	Stroke	Binary
AD	AD or family history of AD	Binary
RA	Rheumatoid arthritis	Binary
T2D	Type 2 Diabetes	Binary
Breast cancer	Breast cancer	Binary
Prostate cancer	Prostrate cancer	Binary
T1D	Type 1 Diabetes	Binary
IBD	Inflammatory Bowel Disease	Binary
eGFR	estimated glomerular filtration rate	Continuous
Height	Height	Continuous
Gout	Gout	Binary
BMI	Body Mass Index	Continuous
HDL	high-density lipoprotein	Continuous
HbA1c	Glycated hemoglobin	Continuous
Urate	Urate	Continuous

Polygenic score development methods

Method	Reference
DBSLMM	Yang, S. & Zhou, X. Accurate and Scalable Construction of Polygenic Scores in Large Biobank Data Sets. The American Journal of Human Genetics 106, 679–693 (2020).
lassosum	Mak, T. S. H., Porsch, R. M., Choi, S. W., Zhou, X. & Sham, P. C. Polygenic scores via penalized regression on summary statistics. Genetic Epidemiology 41, 469–480 (2017).
LDpred2	Privé, F., Arbel, J. & Vilhjálmsson, B. J. LDpred2: Better, faster, stronger. Bioinformatics 36, 5424–5431 (2020).
MegaPRS	Zhang, Q., Privé, F., Vilhjálmsson, B. & Speed, D. Improved genetic prediction of complex traits from individual-level data or summary statistics. Nat Commun 12, 4192 (2021).
PRS-CS	Ge, T., Chen, C.-Y., Ni, Y., Feng, Y.-C. A. & Smoller, J. W. Polygenic prediction via Bayesian regression and continuous shrinkage priors. Nat Commun 10, 1776 (2019).
pt.clump	See Privé, F, et al. Making the most of clumping and thresholding for polygenic scores. The American journal of human genetics 105.6 (2019) for background.
sbayesr	Lloyd-Jones, L. R. et al. Improved polygenic prediction by Bayesian multiple regression on summary statistics. Nat Commun 10, 5086 (2019).
UKBB.EnsPRS	Monti, Remo, et al. Evaluation of polygenic scoring methods in five biobanks shows larger variation between biobanks than methods and finds benefits of ensemble learning. The American Journal of Human Genetics 111.7 (2024).

Genetic ancestry

We use two genetic ancestry groups in this work, which were inherited from the 1000 Genomes (1KG) project:

EUR: People similar to 1KG-EUR (originally defined as a European superpopulation)
SAS: People similar to 1KG-SAS (originally defined as a South Asian superpopulation)

These labels were assigned to individuals by measuring their genetic similarity to these groups, using the 1000 Genomes dataset as a reference panel. By performing analyses within ancestry-matched populations, we sought to reduce differences between biobanks that arise mainly from different ancestry composition, and highlight potential performance differences between genetic ancestries. The vast majority of study participants in the UK Biobank, FinnGen, HUNT and Estonia Biobank matched to the 1KG-EUR ancestry group. The only other replicated ancesetry group was 1KG-SAS, matched by participants in the UK Biobank and Genes & Health. The use of continental labels as group labels is a limitation of this work.

Metrics

Metric	Explanation
AUROC	Area Under Receiver Operating CharacteristicThis metric cares only about relative ordering of observations, and is available only for binary traits.
β	Standardized regression coefficients. For continuous traits, this is the change in the trait (in standard deviations) per standard deviation of the PGS. For binary traits, this is the change in the log-odds per standard deviation of the PGS. This is the metric that was used for meta-analyses.
Odds Ratio (OR)	This is the change in the odds ratio per standard deviation of the PGS (exp(β))
R²	This is the variance explained by the PGS on the observed scale for continuous traits, or on the liability scale for continuous traits