Validating the Framingham Hypertension Risk Score
Results From the Whitehall II Study
A promising hypertension risk prediction score using data from the US Framingham Offspring Study has been developed, but this score has not been tested in other cohorts. We examined the predictive performance of the Framingham hypertension risk score in a European population, the Whitehall II Study. Participants were 6704 London-based civil servants aged 35 to 68 years, 31% women, free from prevalent hypertension, diabetes mellitus, and coronary heart disease. Standard clinical examinations of blood pressure, weight and height, current cigarette smoking, and parental history of hypertension were undertaken every 5 years for a total of 4 times. We recorded a total of 2043 incident (new-onset) cases of hypertension in three 5-year baseline follow-up data cycles. Both discrimination (C statistic: 0.80) and calibration (Hosmer-Lemeshow χ2: 11.5) of the Framingham hypertension risk score were good. Agreement between the predicted and observed hypertension incidences was excellent across the risk score distribution. The overall predicted:observed ratio was 1.08, slightly better among individuals >50 years of age (0.99 in men and 1.02 in women) than in younger participants (1.16 in men and 1.18 in women). Reclassification with a modified score on the basis of our study population did not improve the prediction (net reclassification improvement: −0.5%; 95% CI: −2.5% to 1.5%). These data suggest that the Framingham hypertension risk score provides a valid tool with which to estimate near-term risk of developing hypertension.
Hypertension, defined as systolic/diastolic blood pressure of ≥140/90 mm Hg,1 is a risk factor for coronary heart disease, chronic heart failure, stroke,1–7 chronic kidney disease,8 premature mortality,1–3 and possibly also for dementia, in particular, poststroke dementia.9,10 There is evidence to show that targeting high-risk but nonhypertensive individuals for treatment may delay hypertension onset.11,12 However, simple office-based tools to help clinicians identify high-risk people are lacking.
Recently, a promising risk prediction score using data from the US Framingham Offspring Study has been developed.13 The score has 2 attractive features. First, it is simple, including only 7 items: age, sex, systolic and diastolic blood pressures, body mass index, parental hypertension, and cigarette smoking (calculator available at www.annals.org). Second, the risk score was highly successful in estimating an individual’s risk for hypertension for ≤4 years among participants in the Framingham study. These findings clearly warrant further testing beyond the cohort in which the risk score was developed. In this study, we examine the Framingham hypertension risk score in a large European population, the British Whitehall II Study.
Population and Study Design
The Whitehall II Study is a prospective occupational cohort study.14 The target population was all London-based office staff, aged 35 to 55 years, working in 20 civil service departments on recruitment to the study in 1985–1988 (phase 1). With a response of 73%, the cohort consisted of 10 308 employees (6895 men and 3413 women). Since the phase 1 medical examination, follow-up examinations have taken place approximately every 5 years (phase 3, 1991–1993, n=8104; phase 5, 1997–1999, n=6551; and phase 7, 2003–2004, n=6483).
The present analysis was based on 3 baseline follow-up screening cycles (Table 1). Participants were included if they attended 2 consecutive screenings between phases 1 and 7. At the baseline for each of the 3 screening cycles, we successively excluded those participants who had prevalent hypertension (n=1472, 1196, and 1574 at phases 1, 3, and 5, respectively), prevalent cardiovascular disease (n=38, 86, and 155), prevalent diabetes mellitus (n=48, 34, and 66), or missing data on risk factors (n=491, 377, and 789). The baseline population at phase 1 included 4620 men and 2084 women.
Ethical approval for the Whitehall II Study was obtained from the University College London Medical School Committee on the Ethics of Human Research. All of the participants provided written informed consent.
Assessment of Risk Factors and Prevalent Disease
We measured systolic blood pressure and diastolic blood pressure twice in the sitting position after 5 minutes of rest with the Hawksley random-0 sphygmomanometer (phases 1 to 5) and OMRON HEM 907 (phase 7). The average of the 2 readings was taken to be the measured systolic and diastolic blood pressures. Prehypertension was defined as systolic blood pressure from 120 to 139 mm Hg or diastolic blood pressure from 80 to 89 mm Hg. Current smoking and parental hypertension were self-reported. Weight was measured in underwear to the nearest 0.1 kg on Soehnle electronic scales. Height was measured in bare feet to the nearest 1 mm using a stadiometer with the participant standing erect with head in the Frankfort plane. Body mass index was calculated as weight (kilograms)/height (meters) squared.
Prevalent coronary heart disease was defined by meeting Multinational Monitoring of Trends and Determinants in Cardiovascular Disease (MONICA) project criteria,15 positive responses to questions about chest pain16 and physician diagnoses, evidence from medical charts, or positive ECG findings. Diabetes mellitus was defined as a fasting glucose ≥7.0 mmol/L, a 2-hour postload glucose ≥11.1 mmol/L (75-g oral glucose tolerance test), or reported doctor-diagnosed diabetes mellitus or use of diabetes medication.17
Assessment of Incident Hypertension
Hypertension was defined according to the seventh report of the Joint National Committee on Prevention, Detection, Evaluation, and Treatment of High Blood Pressure (systolic/diastolic ≥140/90 mm Hg or use of antihypertensive medication).1 In each of the 3 screening cycles, we determined incident hypertension by the presence of hypertension at follow-up among participants free of this condition at baseline (Table 1).
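The blood pressure categories used in this study can be summarized in a few lines of code. The following is a minimal sketch, assuming the JNC 7 hypertension threshold of ≥140/90 mm Hg and the prehypertension ranges given above; the function names are illustrative and are not taken from the study's analysis code.

```python
def blood_pressure_status(systolic, diastolic, on_antihypertensives=False):
    """Classify one examination.

    Inputs are assumed to be the mean of 2 sitting readings in mm Hg.
    """
    # Hypertension: >=140 systolic, >=90 diastolic, or antihypertensive medication.
    if on_antihypertensives or systolic >= 140 or diastolic >= 90:
        return "hypertension"
    # Prehypertension: systolic 120 to 139 or diastolic 80 to 89.
    if systolic >= 120 or diastolic >= 80:
        return "prehypertension"
    return "normotension"


def is_incident_case(baseline_status, follow_up_status):
    """Incident hypertension: free of hypertension at baseline, present at follow-up."""
    return baseline_status != "hypertension" and follow_up_status == "hypertension"
```

For example, a participant with 125/70 mm Hg at baseline and 142/88 mm Hg at follow-up would be counted as a prehypertensive person-examination with incident hypertension.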
Participants were followed across the screening cycles until incident hypertension or last study phase, whichever came first, contributing to a total of 13 679 person-examinations. The association between prehypertension at baseline and subsequent incident hypertension was summarized using odds ratios and 95% CIs, which were computed using standard methods. We examined the validity of the Framingham risk score in 4 steps. First, we examined whether the prediction of incident hypertension on the basis of prehypertension status improved after reclassification on the basis of high Framingham risk score (corresponding with >20% predicted risk between successive screening cycles) using the net reclassification improvement.18 We repeated this analysis with cutoffs >10% and >15% predicted risk to examine whether the findings were sensitive to the threshold used to define high risk using the Framingham risk score.
Second, we randomly split the person-examination observations into 2 groups, 60% for a “derivation” data set and 40% for a “validation” data set. We developed a comparison risk prediction score on the basis of the derivation data using the same variables and statistical procedures as those used for the development of the Framingham hypertension risk score.13 We identified significant predictors and interaction terms for incident hypertension in multivariable-adjusted Weibull regression models for interval-censored data.
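A random 60%/40% split of person-examinations can be sketched as follows. This is an illustration only: the seed, the function name, and the use of simple shuffling are assumptions, and the analyses reported here were run in SAS.

```python
import random


def split_person_examinations(records, derivation_share=0.6, seed=2009):
    """Randomly split records into derivation and validation sets.

    The seed is a hypothetical value chosen only to make the sketch reproducible.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * derivation_share)
    return shuffled[:cut], shuffled[cut:]


# Stand-in identifiers for the 13 679 person-examinations.
derivation, validation = split_person_examinations(range(13679))
print(len(derivation), len(validation))
```

With 13 679 person-examinations, a 60% share rounds to 8207 observations for derivation, leaving 5472 for validation.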
Third, we calculated a risk prediction score (the “Whitehall II risk score”) for the validation data set from the β-coefficients obtained from the derivation cohort. We calculated the Framingham risk score using the β-coefficients derived in the Framingham study.13 The variable parental hypertension included 2 categories (yes versus no) in the present study but 3 categories (neither parent, 1 parent, or both parents) in the Framingham study. To produce a comparable β-coefficient that could be applied to the Whitehall data, we used the parental hypertension distribution presented in the Framingham article to give a weighted average to the estimates from the per-category increment coefficient. This resulted in a coefficient that shows the effect for the presence versus absence of parental hypertension. Both of the above scores were computed using the observed follow-up time for each participant within the follow-up cycle so that we could compare this predicted risk with the observed incident hypertension. With study examinations occurring every 5 years, results were expressed per 5 years.
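The two calculations described above can be sketched as follows. The sketch rests on assumptions: the Weibull model is written in its accelerated failure time form, with survival function S(t) = exp(−exp((ln t − μ)/σ)), and the weighted average is one plausible reading of the collapsing step, taken over participants with 1 or 2 hypertensive parents. All numeric inputs are hypothetical placeholders, not coefficients from either study.

```python
import math


def collapse_parental_beta(beta_per_parent, share_one_parent, share_both_parents):
    """Collapse a per-category coefficient into a yes-versus-no coefficient.

    Weighted average of 1*beta (1 parent) and 2*beta (both parents) among
    those with any parental hypertension. Assumed interpretation.
    """
    total = share_one_parent + share_both_parents
    weighted_steps = share_one_parent * 1 + share_both_parents * 2
    return beta_per_parent * weighted_steps / total


def weibull_risk(linear_predictor, scale, years):
    """Probability of the event by `years` under a Weibull AFT model.

    linear_predictor is mu (intercept plus the sum of beta * x);
    scale is sigma. Both must come from a fitted model.
    """
    z = (math.log(years) - linear_predictor) / scale
    return 1.0 - math.exp(-math.exp(z))


# Hypothetical inputs, for illustration only.
binary_beta = collapse_parental_beta(0.2, share_one_parent=0.5, share_both_parents=0.5)
risk_5_years = weibull_risk(linear_predictor=3.0, scale=0.9, years=5.0)
```

Passing each participant's observed follow-up time as `years` gives a predicted risk that can be compared directly with the observed outcome in that cycle.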
Fourth, we assessed the performance of the risk prediction for both the Framingham and Whitehall II risk scores in the validation cohort. We compared the predicted hypertension incidence with the observed incidence for each decile category of both risk scores. We calculated the overall predicted:observed risk ratios for the whole validation cohort and separately by sex, age, risk factor groups, and study cycle. We assessed discrimination on the basis of C statistics and calibration by using the modified Hosmer-Lemeshow χ2 statistics, again following the same procedures as the Framingham study.13 Finally, we estimated the net reclassification improvement18 to examine whether prediction on the basis of the Framingham risk score categories (corresponding with <5%, 5% to 20%, and >20% predicted risk between successive screening cycles) was significantly improved after reclassification on the basis of the Whitehall II score. All of the analyses were run with SAS version 9.2.
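Discrimination and the overall predicted:observed ratio can be illustrated with a short sketch. It assumes a simple binary outcome per person-examination and computes the C statistic as the proportion of case/noncase pairs in which the case has the higher predicted risk, with ties counted as one half. The version used for the reported results may differ in how it handles follow-up time.

```python
def c_statistic(predicted, outcomes):
    """Concordance between predicted risks and binary outcomes (1 = case)."""
    cases = [p for p, y in zip(predicted, outcomes) if y == 1]
    noncases = [p for p, y in zip(predicted, outcomes) if y == 0]
    if not cases or not noncases:
        raise ValueError("need at least one case and one noncase")
    concordant = 0.0
    for case_risk in cases:
        for noncase_risk in noncases:
            if case_risk > noncase_risk:
                concordant += 1.0
            elif case_risk == noncase_risk:
                concordant += 0.5  # ties count as half
    return concordant / (len(cases) * len(noncases))


def predicted_observed_ratio(predicted, outcomes):
    """Sum of predicted risks divided by the number of observed cases."""
    observed = sum(outcomes)
    if observed == 0:
        raise ValueError("no observed cases")
    return sum(predicted) / observed
```

A ratio above 1 indicates that the score predicts more cases than were observed (overestimation); a ratio below 1 indicates underestimation.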
Table 2 presents characteristics of the 6704 participants. Their mean age at baseline was 44.6 years, and 31% were women. Mean blood pressure was 118.9/74.6 mm Hg, and 3646 (54.4%) were prehypertensive at baseline. Clinical features for the derivation and validation subcohorts were determined on the basis of the 3 baseline examinations. As expected, the cohorts were very similar.
Prehypertension Risk Category Versus Framingham Hypertension Risk Score
During the baseline follow-up cycles (median length: 5.6 years), we recorded a total of 2043 incident cases of hypertension (5-year hypertension incidence was 13.6 per 100). Of these, 1690 person-examinations were associated with baseline prehypertension and 353 with baseline normotension. For nonhypertension cases at follow-up, the corresponding figures were 5312 and 6324, respectively, giving an odds ratio of incident hypertension for those with baseline prehypertension compared with those with normotension of 5.70 (95% CI: 5.04 to 6.44).
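The odds ratio can be reproduced from the 4 counts given above. The sketch below assumes a Woolf (log-based) 95% CI as the "standard method"; this is an assumption, and the interval it yields is close to, but not necessarily identical with, the reported one.

```python
import math


def odds_ratio_with_ci(exposed_cases, unexposed_cases,
                       exposed_noncases, unexposed_noncases, z=1.96):
    """Odds ratio from a 2x2 table with a Woolf (log-based) confidence interval."""
    odds_ratio = (exposed_cases * unexposed_noncases) / (
        unexposed_cases * exposed_noncases)
    se_log = math.sqrt(1 / exposed_cases + 1 / unexposed_cases
                       + 1 / exposed_noncases + 1 / unexposed_noncases)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Counts from the text: prehypertensive vs normotensive at baseline.
estimate, lower, upper = odds_ratio_with_ci(1690, 353, 5312, 6324)
print(f"{estimate:.2f}")  # 5.70
```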
High Framingham score (>20% predicted risk) was a significantly better predictor of incident hypertension than prehypertension. Among those not developing hypertension, the net percentage of individuals correctly reclassified (ie, correct reclassifications−incorrect reclassifications) using the Framingham score compared with the prehypertension risk category was 24.6%. Among those with incident hypertension, the net percentage was −18.0%. The overall net reclassification improvement from defining high risk on the basis of the Framingham score rather than prehypertension was, therefore, 6.6% (95% CI: 3.2% to 10.1%). Repeating this analysis with the high-risk group defined by the Framingham score corresponding with >10% and >15% predicted risks resulted in net reclassification improvements of 4.5% (95% CI: 2.1% to 7.0%) and 8.5% (95% CI: 5.6% to 11.3%). These findings suggest that the superior prediction of incident hypertension with the Framingham hypertension risk score rather than prehypertension status was robust to various cutoff points to define high risk.
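The net reclassification improvement is the sum of the net proportion correctly reclassified among cases and among noncases. A minimal sketch follows; the counts in the usage example are hypothetical, and only the final line uses the percentages reported above.

```python
def net_reclassification_improvement(cases_up, cases_down, n_cases,
                                     noncases_down, noncases_up, n_noncases):
    """Return (net among cases, net among noncases, overall NRI) as proportions.

    Moving up is correct for cases; moving down is correct for noncases.
    """
    net_cases = (cases_up - cases_down) / n_cases
    net_noncases = (noncases_down - noncases_up) / n_noncases
    return net_cases, net_noncases, net_cases + net_noncases


# Hypothetical counts, for illustration only.
net_cases, net_noncases, nri = net_reclassification_improvement(
    cases_up=30, cases_down=10, n_cases=200,
    noncases_down=50, noncases_up=90, n_noncases=800)

# Reported net percentages: 24.6% among noncases and -18.0% among cases.
reported_nri = round(24.6 + (-18.0), 1)  # 6.6 percentage points
```

The reported example shows that a gain among noncases can outweigh a loss among cases: the high Framingham score category flags fewer people than prehypertension does, which lowers both false-positive and true-positive classifications.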
Developing a Comparison Score (the Whitehall II Risk Score)
To create a comparison risk score on the basis of the Whitehall II data, we drew a 60% random sample from the total data. This derivation data set included 8207 person-examinations. The Weibull β-coefficients for incident hypertension from a multivariable-adjusted model were used to calculate the Whitehall II hypertension risk score (Table 3). The hazard ratio for body mass index was slightly greater (1.071 versus 1.039) than that obtained in the Framingham study,13 but differences in all of the other hazard ratios between the present study and the Framingham study were nonsignificant.
Comparison Between the Framingham and Whitehall II Risk Scores
The validation data set was independent of the derivation data and was composed of 5472 person-examinations in total. The overall agreement between the predicted and observed incidences of hypertension was high across the risk score distribution for both the Framingham and Whitehall II risk scores (Figure). The predicted:observed ratio for incident hypertension was close to 1.00, at 1.08 for both the Framingham risk score and the Whitehall II risk score. The Framingham score slightly overestimated hypertension risk among men <50 years old and those with normal weight but not other subgroups (Table 4). There were no differences in risk prediction among the 3 study cycles (P=0.13).
The C statistic was 0.803 for the Framingham risk score and 0.804 for the Whitehall II risk score, indicating good discrimination for both. Hosmer-Lemeshow χ2 values of 11.5 for the Framingham score and 14.3 for the Whitehall II score were both <20.0, indicating good calibration.
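Calibration by risk decile can be illustrated with the standard Hosmer-Lemeshow statistic for binary outcomes. This sketch is not the modified version for follow-up data used in the analysis above; it only shows the general idea of comparing observed and expected case counts within groups ordered by predicted risk.

```python
def hosmer_lemeshow_chi2(predicted, outcomes, groups=10):
    """Standard Hosmer-Lemeshow chi-square for binary outcomes (1 = case)."""
    pairs = sorted(zip(predicted, outcomes), key=lambda pair: pair[0])
    n = len(pairs)
    chi2 = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(p for p, _ in chunk)   # expected number of cases
        observed = sum(y for _, y in chunk)   # observed number of cases
        mean_risk = expected / size
        variance = size * mean_risk * (1.0 - mean_risk)
        if variance > 0:
            chi2 += (observed - expected) ** 2 / variance
    return chi2
```

Smaller values indicate closer agreement between predicted and observed incidence across the groups.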
Table 5 shows the reclassification of individuals between risk categories after replacing the Framingham risk score with the Whitehall II risk score. Among the incident hypertension cases, 32 person-examinations were appropriately reclassified to higher risk categories, whereas 20 person-examinations were inappropriately reclassified to lower risk categories. Among those who did not develop hypertension, 175 person-examinations were appropriately reclassified to lower risk categories and 261 person-examinations inappropriately to higher risk categories. Because the net reclassification improvement was −0.5%, replacing the Framingham risk score with the Whitehall II risk score did not result in a better prediction of incident hypertension. On repeating this analysis with the highest-risk groups split into a high-risk (20% to 40%) group and a very high-risk (>40%) group, we found that the net reclassification improvement was −1.0%, again showing that the Whitehall risk score did not result in a better prediction of incident hypertension than the Framingham hypertension risk score.
In a large cohort of nonhypertensive men and women aged 35 to 68 years, we showed that the Framingham hypertension risk score has high calibration and discrimination for predicting the risk of incident hypertension. The ratio of predicted:observed absolute risk of incident hypertension was close to 1 through the entire score distribution. Reclassification showed that the original Framingham risk score performs as well as the alternative Whitehall risk score derived here. These findings provide strong support for the validity of the Framingham hypertension risk score.
Clinical trials have demonstrated that treatment of prehypertensive individuals can prevent hypertension.1,11,12,19 However, prehypertension is highly prevalent, and, therefore, treating all prehypertensive people would require substantial resources.13 Our results show that the Framingham hypertension risk score improved the prediction of incident hypertension compared with that based on prehypertension status alone and enabled a better identification of nonhypertensive individuals at the greatest risk. Superior prediction by the Framingham score is unsurprising given that it takes into account multiple independent risk factors. Furthermore, the algorithm treats blood pressure as a continuous variable rather than as a categorical one; this corresponds with the observation that the risk of cardiovascular disease increases in a continuous manner with increasing systolic and diastolic blood pressure levels >115/75 mm Hg.2
Studies evaluating a risk score on the same data on which the score was developed are prone to overoptimistic estimates of predictive performance. Our analysis shows that the results of the Framingham Offspring Study that developed the hypertension risk score were highly replicable in an independent cohort and, thus, probably realistic. Several similarities in the Whitehall II and Framingham Offspring studies may have contributed to the similar predictive performance of the Framingham hypertension risk score observed in these 2 cohorts. That is, both were predominantly white populations and free of diabetes mellitus; there was little difference in mean age (45 versus 42 years), mean blood pressure (119/75 versus 116/75 mm Hg), or hypertension incidence. It may be that the measurement of blood pressure, with a mercury-column sphygmomanometer in the Framingham Offspring Study rather than the Hawksley random-0 sphygmomanometer as in the Whitehall II Study, partially explains the slightly higher blood pressure values in our cohort.20,21 Differences between the 2 cohorts, in turn, support the generalizability of the hypertension risk score across heterogeneous populations. Indeed, the British Whitehall II participants were leaner (body mass index: 24.3 versus 25.1 kg/m2), with lower prevalence of current smokers (16% versus 35%) and from a different cultural setting (European versus American, metropolitan versus moderate-sized town) than the Framingham Offspring cohort.13
No previous study has examined the predictive performance of the Framingham hypertension risk score by sex, by age, and in specific risk factor subgroups. We found that the predicted:observed ratio was very similar in men and women, smokers and nonsmokers, and participants with and without a history of parental hypertension. The score slightly overestimated risk for normal-weight people and those <50 years of age. Thus, if the Framingham hypertension risk score were used, preventive treatment would be considered for these specific groups at a lower level of absolute hypertension risk than for the other groups.
Strengths and Limitations
The present study benefits from the large sample size, a design with multiple screening cycles, and the standardized protocols to assess risk factors. This study also has several limitations. First, the initial examination was in the late 1980s when the prevalence of obesity was lower than at present. However, our findings increase the credibility of the Framingham hypertension risk score in more contemporary cohorts, because they confirm its predictive validity in the most recent data cycle (between 1997–1999 and 2003–2004) and among overweight participants. Second, we measured blood pressure with a random-0 sphygmomanometer at the first 3 examinations but used an automated oscillometric device at the latest examination. Although the latter device is known to provide higher blood pressure values,22 sensitivity analyses showed the predictive performance of the Framingham hypertension risk score to be similar in the cycle with a change in blood pressure device and in those cycles in which blood pressure was measured with the same device. Third, we used self-report data to assess the history of parental hypertension, whereas in the original Framingham Offspring Study parents’ blood pressure levels were measured. Direct measurement of parental hypertension is likely to provide more accurate information, but such data are seldom available. Because the hypertension risk score will typically be determined on the basis of self-reported parental hypertension in clinical practice, our measurement method is justified and strengthens ecological validity. Fourth, because our cohort was composed mostly of white participants and did not include the unemployed, further validation studies are needed to confirm the generalizability of the findings in more heterogeneous populations.
Risk models are used to target preventive treatments to individuals at the highest risk to facilitate cost-effectiveness. Our investigation represents a crucial step in validating the simple office-based Framingham hypertension risk score, which has not been tested previously beyond the cohort for whom the scoring method was developed. Our study validated this risk score in a well-characterized British cohort that was larger than the original derivation data set; we also demonstrated the predictive validity of the score separately among men and women and among various risk groups. This evidence further justifies use of the Framingham hypertension risk score in clinical practice to identify individuals at increased near-term risk of developing hypertension.
Sources of Funding
This work was supported by the Medical Research Council, British Heart Foundation, Wellcome Trust, Health and Safety Executive, Department of Health, Agency for Health Care Policy Research (United Kingdom), John D. and Catherine T. MacArthur Foundation Research Networks on Successful Midlife Development and Socio-economic Status and Health, National Institute on Aging (National Institutes of Health), Academy of Finland, and European Science Foundation. G.D.B. is a Wellcome Trust Fellow.
- Received March 11, 2009.
- Revision received April 10, 2009.
- Accepted June 17, 2009.
Chobanian AV, Bakris GL, Black HR, Cushman WC, Green LA, Izzo JL Jr, Jones DW, Materson BJ, Oparil S, Wright JT Jr, Roccella EJ. Seventh report of the Joint National Committee on Prevention, Detection, Evaluation, and Treatment of High Blood Pressure. Hypertension. 2003; 42: 1206–1252.
Seshadri S, Beiser A, Kelly-Hayes M, Kase CS, Au R, Kannel WB, Wolf PA. The lifetime risk of stroke: estimates from the Framingham Study. Stroke. 2006; 37: 345–350.
Staessen JA, Richart T, Birkenhager WH. Less atherosclerosis and lower blood pressure for a meaningful life perspective with more brain. Hypertension. 2007; 49: 389–400.
The Trials of Hypertension Prevention Collaborative Research Group. Effects of weight loss and sodium reduction intervention on blood pressure and hypertension incidence in overweight people with high-normal blood pressure: the Trials of Hypertension Prevention, Phase II. Arch Intern Med. 1997; 157: 657–667.
Tunstall-Pedoe H, Kuulasmaa K, Amouyel P, Arveiler D, Rajakangas AM, Pajak A. Myocardial infarction and coronary deaths in the World Health Organization MONICA Project: registration procedures, event rates, and case-fatality rates in 38 populations from 21 countries in four continents. Circulation. 1994; 90: 583–612.
Rose GA, Blackburn H, Gillum RF, Prineas RJ. Cardiovascular Survey Methods. 2nd ed. Geneva, Switzerland: World Health Organization; 1982.
World Health Organization. Definition, Diagnosis and Classification of Diabetes Mellitus and Its Complications. Geneva, Switzerland: World Health Organization; 1997.
He J, Whelton PK, Appel LJ, Charleston J, Klag MJ. Long-term effects of weight loss and dietary sodium reduction on incidence of hypertension. Hypertension. 2000; 35: 544–549.
Conroy RM, O'Brien E, O'Malley K, Atkins N. Measurement error in the Hawksley random zero sphygmomanometer: what damage has been done and what can we learn? BMJ. 1993; 306: 1319–1322.
Hense HW. The Hawksley random zero sphygmomanometer: comparison with mercury instrument is illogical. BMJ. 1993; 307: 562–563.
Stang A, Moebus S, Mohlenkamp S, Dragano N, Schmermund A, Beck EM, Siegrist J, Erbel R, Jockel KH. Algorithms for converting random-zero to automated oscillometric blood pressure values, and vice versa. Am J Epidemiol. 2006; 164: 85–94.