Reports of relationships


Study Protocol

Word count (2260 plus references)

1. Title

Dietary sugars and risk of breast cancer in the UK Women's Cohort.

2. Background

Cohort studies have reported conflicting findings on the relationship between dietary sugars and breast cancer risk. However, "the possibility remains that a diet high in sugars may increase breast cancer risk" (Gnagnarella et al., 2008). Evidence suggests that abnormal glucose metabolism may be important in breast carcinogenesis (Muti et al., 2002). It has been hypothesised that "glucose and other factors related to glucose metabolism, such as insulin and insulin-like growth-factors may contribute to breast cancer development" (Muti et al., 2002). A plausible mechanism is that a high intake of sugars may elevate serum levels of insulin-like growth factor-1, and studies show that high levels of insulin-like growth factor-1 may increase breast cancer risk (Romieu et al., 2004).

Some case-control studies have shown an increased risk of breast cancer with higher sugar intake (Muti et al., 2002). However, this study design is prone to recall bias, so cohort studies are potentially more reliable. Few prospective cohort studies have explored the relationship between sugar and breast cancer risk, and those that have been conducted have produced conflicting evidence (Wen et al., 2009). Researchers in this field have called for "further prospective studies to investigate this hypothesis" (Bradshaw et al., 2009).

The UK Women's Cohort Study (UKWCS) was designed to explore the association between diet and disease (Greenwood et al., 2000). This cohort is well suited to the present study because baseline and follow-up information has been collected on intake of foods high in natural and processed sources of sugars.

3. Hypothesis and Aims


  • That there is no direct association between dietary intake of sugars and increased risk of breast cancer within the UK Women's Cohort.


  • What is the effect of total sugar intake on risk of breast cancer within the UK Women's Cohort?
  • What is the effect of intake of sweet foods, compared to sugar from fruit, on risk of breast cancer within the UK Women's Cohort?
  • What is the effect of intake of fructose and sucrose on risk of breast cancer within the UK Women's Cohort?
  • What is the effect of intake of total sugars, sucrose and fructose on risk of breast cancer for both pre- and post-menopausal women within the UK Women's Cohort?
  • How does the study compare to other cohorts investigating similar associations?

4. Methods & Data Sources

Study Design

The UKWCS was created with the specific intention to "investigate the long-term relationship between diet and health outcomes and, in particular, neoplasms and circulatory disease" (Cade et al., 2004).

This investigation will use a prospective cohort analytic study design to explore associations between intake of total sugars, sucrose, and fructose and the risk of breast cancer.

Cohort studies are suitable for exploring the long-term relationship between diet and cancer because they track individuals over time (Cade et al., 2007). The UKWCS was constructed by Professor Janet Cade and colleagues and started in 1993. Case-control studies have inferred an association between sugar intake and breast cancer (Tavani et al., 2006). However, a recent cohort study, which followed participants over a much longer period than a retrospective case-control study, failed to find a similar association (Nielsen et al., 2005).

This project is a small-scale Master's dissertation project using data previously obtained from the UKWCS. All data were collected by the University of Leeds and provided in anonymous form to the student, to undertake a research project as part of the submission for a master's degree.

Data Sources

Selection of Participants & Selection Criteria

The cohort was taken from responders to a direct mail survey by the World Cancer Research Fund (WCRF). Women living in England, Wales, and Scotland were specifically targeted. Of the 500,000 people who responded to the survey, 35,372 women met the criteria for selection into the UKWCS (Cade et al., 2004). This cohort is described by the authors as "health conscious with only 11% current smokers and 58% taking dietary supplements. 28% of the participants reported to be vegetarian and 1% vegan" (Cade et al., 2004).

Women aged 35-69 years who completed the original mail survey were eligible for inclusion in the study (n = 35,372). Information on around 600 variables was recorded from participants; the response rate was 58% of the 61,000 women mailed.

Participants were specifically selected based on gender. Women were recruited because one of the cancers being investigated was breast cancer, and women are at much greater risk of this disease than men, who account for only about 1% of breast cancer cases (Dumitrescu and Cotarla, 2005). A minimum age of 35 years and a maximum of 69 years were set in order to obtain enough cases of cancer in pre- and post-menopausal women. All recruits were volunteers and could opt out at any time.

The researchers collected baseline data in two phases. Phase 1 took place between 1995 and 1998; participants filled in lifestyle questionnaires and a validated 217-item food frequency questionnaire (FFQ) posted to each participant. Between 1999 and 2002, participants were sent a further 4-day weighed food and drink diary, a 1-day physical activity diary, and a questionnaire relating to personal and familial dietary and medical history. Follow-up data were collected 5 years after baseline using a questionnaire and a 4-day food diary.

Health information was obtained on participants by recording disease history using ICD-10 codes (International Classification of Diseases, WHO 1990). Table 1, Appendix C, lists the health information obtained on the original cohort. The cohort was last updated on 1st January 2010; the participation rate on that date was 17%.

Data from the WCRF mail survey were used to estimate the numbers required for the cohort. This was based on data from the cohort formation in 1989. The figures indicated that, at 1-year follow-up, a sample of 30,000 participants would be needed to detect differences in cancer registration. The authors projected "after allowing for participant effect, staggered entry and time lag. This sample size should have approximately 80% power to detect an effect which should identify approximately 550 cases of breast cancer" within the cohort sample of 35,372 (Cade et al., 2004).

Nutrient intakes were generated from the FFQ by applying reference standard values from McCance & Widdowson's The Composition of Foods (5th Edition). Enough data are available to examine the specific nutrients of interest: total sugar, sugar from sweet foods, sucrose, and fructose. Health information recorded at baseline is included as Appendix C, Table 1.


The relationship between breast cancer incidence rate (outcome variable) and exposure to total sugars, sucrose, and fructose (predictor variables) at 5-year follow-up will be measured. Mean intakes will be calculated for each variable and compared by menopausal status and quintile of intake.

Issues of concern

Appropriate time needs to be allowed for training on the statistical software, STATA, used for analysis of the data. Time for the data analysis required by the University has also been built into the timetable. Data sets for the cohort are sensitive and therefore need to be analysed within the University; this is being arranged at present. Work commitments also need to be taken into account, so the timetable is considered flexible other than the final dates for completion.

5. Ethical Considerations


Baseline human data were collected by the Leeds University research team. The researchers also contacted one hundred and seventy-four local research ethics committees to obtain permission for the baseline study.


All participants in the study gave written consent to be contacted and for baseline and follow-up data to be obtained. Copies of information sheets and consent forms have not been included in this protocol to ensure participants cannot be identified. Participants also gave consent for data to be collected through the food diary, the questionnaires, and the 5-year follow-up. Health information was collected on disease history, medications, familial disease history, early life, and women's health. Data on socio-demographic and socioeconomic characteristics, including smoking and alcohol consumption, were also collected. Biological samples were obtained from a subset of participants: 5343 buccal (cheek) cell DNA samples and 2589 blood samples.


The study is regulated by an Ethics and Governance Framework and is covered by the Human Tissue Act and the Data Protection Act. Data will be held at Leeds University by the immediate research team in secure databases.

Student researchers must sign confidentiality declarations and be supervised by an appropriate member of staff to ensure data are used appropriately. Data used for student research projects have been made anonymous.

6. Analysis of Data

Data will be analysed using STATA 10. Nutrient amounts in the cohort data are recorded in "grams per day calculated by multiplying frequency of consumption of each food group by the nutrient content of the indicated portion size and summing all foods" (Cade et al., 2007). Nutrient composition was derived from UK food composition tables to enable analysis of individual nutrient intake.
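The calculation quoted above can be sketched in a few lines. This is an illustrative Python sketch only, not the procedure used by the UKWCS team (the protocol specifies STATA); the food names, portion values, and the function name `daily_intake` are invented for the example.

```python
# Hypothetical nutrient table: grams of total sugars per indicated portion.
# These values are invented for illustration, not taken from McCance & Widdowson.
SUGAR_PER_PORTION_G = {"apple": 11.8, "biscuit": 3.4, "cola": 35.0}


def daily_intake(freq_per_day, nutrient_per_portion):
    """Sum over foods of (portions per day) x (nutrient grams per portion)."""
    return sum(
        freq * nutrient_per_portion[food]
        for food, freq in freq_per_day.items()
    )


# Hypothetical FFQ responses for one participant, converted to portions per day.
ffq_response = {"apple": 1.0, "biscuit": 2.0, "cola": 0.5}

total_sugar_g = daily_intake(ffq_response, SUGAR_PER_PORTION_G)
print(f"Estimated total sugars: {total_sugar_g:.1f} g/day")
```

The same function would apply to sucrose or fructose by swapping in a different nutrient table.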

Cohort baseline characteristics will be summarised as means and standard deviations for the total sample. The statistical relationship between breast cancer and intake of total sugars, sucrose, and fructose will be investigated using Cox proportional hazard regression. Predictor variables will be total sugars, total sucrose, and total fructose intake. The outcome variable will be exact age at diagnosis of breast cancer.

A number of multivariate survival analysis models are available that use metric and non-metric independent variables. The advantage of Cox proportional hazard regression is that it can be used to assess the effect of multiple covariates on survival (Cleves et al., 2008), and it is the most widely used multivariate method. It is easy to implement in STATA, works with a hazard model, and conveniently separates the baseline hazard function from the covariates (Cleves et al., 2008). Cox regression can also handle both continuous and categorical predictor variables and, without knowing the baseline hazard h0(t), can still calculate a coefficient for each covariate and therefore a hazard ratio (Margetts & Nelson, 2008). It assumes multiplicative risk, the proportional hazard assumption (Cleves et al., 2008).
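For reference, the standard form of the Cox model makes the separation of baseline hazard and covariates explicit. The notation below is an assumption for illustration (covariates x_1, ..., x_p with coefficients β_1, ..., β_p); it is not taken from the cited texts.

```latex
h(t \mid x_1, \dots, x_p) = h_0(t)\,\exp(\beta_1 x_1 + \dots + \beta_p x_p)

\mathrm{HR}_j = \frac{h(t \mid x_j + 1)}{h(t \mid x_j)} = \exp(\beta_j)
```

Because the baseline hazard h_0(t) cancels in the ratio, the hazard ratio for a one-unit increase in x_j (other covariates held fixed) does not depend on time, which is the proportional hazard assumption.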

Associations with sugar intake will be investigated separately for pre- and post-menopausal women. This will be done using a simple model that adjusts for age and total energy intake, presented in a table showing mean and standard deviation of nutrient intakes for the total sample by breast cancer status, so that cancer cases can be compared between pre-menopausal and post-menopausal women. The results can then be compared with the cohort paper by Nielsen et al. (2005, pp. 124-129), who looked at a number of different sugars in their cohort.

A second, more complex model will be derived to reduce the influence of confounding variables commonly seen in this type of cohort dietary analysis. The complex model will adjust for age, body mass index, use of the oral contraceptive pill, current smoking status, alcohol consumption, use of hormone replacement therapy, physical activity, number of children, age at menarche, age at first birth, nulliparity, total energy intake at baseline, education, and socioeconomic status. Because of the stratified sampling scheme, individuals will need to be weighted in both models by the inverse probability of being sampled.
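The weighting step can be illustrated with a minimal Python sketch. The stratum labels, sampling probabilities, and intake values below are hypothetical and do not describe the actual UKWCS sampling scheme; in practice the weights would be supplied to the STATA survival commands.

```python
def inverse_probability_weights(strata, sampling_prob):
    """Return one weight per participant: 1 / P(sampled | stratum)."""
    return [1.0 / sampling_prob[s] for s in strata]


def weighted_mean(values, weights):
    """Weighted mean, so under-sampled strata count proportionally more."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical sampling probabilities: everyone in stratum A was sampled,
# one in four in stratum B.
sampling_prob = {"A": 1.0, "B": 0.25}

strata = ["A", "B", "B"]            # stratum of each participant
sugar_g = [80.0, 100.0, 120.0]      # hypothetical total sugar intake, g/day

weights = inverse_probability_weights(strata, sampling_prob)
print(weights)                      # each stratum-B participant stands for four
print(round(weighted_mean(sugar_g, weights), 2))
```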

A sensitivity analysis will exclude women who were diagnosed with breast cancer in the first year after completing the FFQ. It will also exclude individuals with previous malignant cancers, because participants with previous cancers are at increased risk and may therefore bias any baseline measure.

When the data were originally collected by Cade et al. (2004), a second FFQ was taken from a subsample of the cohort. This was done so that the amount of random error could be "estimated using regression calibration approach". This allowed individual predicted values of dietary exposure to be calculated for all participants.
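The idea behind regression calibration can be sketched as follows. This is a simplified, single-covariate illustration in Python under stated assumptions: the repeat FFQ in the subsample is regressed on the baseline FFQ, and the fitted line is used to predict exposure for every participant. The intake values are invented, and the actual UKWCS calibration may differ in its details.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical subsample with both measurements (g/day of total sugars).
baseline_ffq = [60.0, 80.0, 100.0, 120.0]
repeat_ffq = [70.0, 80.0, 90.0, 100.0]

intercept, slope = fit_line(baseline_ffq, repeat_ffq)


def calibrated_intake(observed):
    """Predicted exposure for a participant with only a baseline FFQ."""
    return intercept + slope * observed


print(intercept, slope)
print(calibrated_intake(140.0))
```

A slope below 1 pulls extreme reported intakes towards the mean, which is how the calibration compensates for random measurement error.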

Cox's proportional hazard regression will then be run using the predicted values for each participant. Participants will be categorised into quintiles of nutrient intake to give estimated hazard ratios. These estimates will need to be corrected for the effects of measurement error in dietary intake when relating diet to breast cancer (Cade et al., 2004). The results of the complex model will be presented in a table allowing comparison across quintiles for pre-menopausal and post-menopausal cases and non-cases, showing person-years of exposure and hazard ratios with 95% confidence intervals obtained from bootstrap estimates (Cade et al., 2004).
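Categorising participants into quintiles of intake can be sketched with the Python standard library. The intake values and the helper name `assign_quintile` are hypothetical; STATA provides its own commands for this step, and the exact cut points depend on the quantile method used.

```python
from bisect import bisect_right
from statistics import quantiles

# Hypothetical predicted total sugar intakes (g/day) for ten participants.
intakes = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0]

# Four cut points split the distribution into five equally sized groups.
cut_points = quantiles(intakes, n=5)


def assign_quintile(value, cuts):
    """Return 1 (lowest intake) to 5 (highest intake)."""
    return bisect_right(cuts, value) + 1


quintile = [assign_quintile(v, cut_points) for v in intakes]
print(cut_points)
print(quintile)
```

The lowest quintile would typically serve as the reference category when hazard ratios are estimated for the other four.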