World’s largest genetic project unveils incredible new data

This is set to drive the discovery of new diagnostics, treatments and cures. Uniquely, it is also available to approved researchers worldwide, via a protected database containing only de-identified data (identifying details such as name, address, date of birth and name of GP are stripped out).

This abundance of genomic data is unparalleled. But what cements it as a defining moment for the future of healthcare is its use in combination with the existing wealth of data UK Biobank has collected over the past 15 years, which includes:

lifestyle
whole body imaging scans
health information
proteins found in the blood

UK Research and Innovation (UKRI) has supported UK Biobank since its very conception, with the Medical Research Council (MRC) being one of the joint founding funders of the biomedical database.

The pilot study, which sequenced the whole genomes of the first 50,000 UK Biobank participants, was funded by MRC. The sequencing for the remaining 450,000 participants was completed as part of the Innovate UK-led Industrial Strategy Challenge Fund Data to Early Diagnosis and Precision Medicine Challenge.

An important milestone

Professor Dame Ottoline Leyser DBE FRS, Chief Executive of UKRI, who will attend a reception at the House of Lords to celebrate the landmark scientific achievement, said:

It is an honour to represent UKRI during this landmark event for science, following our support of UK Biobank since its conception. Researchers can now apply to access de-identified full genome data from half a million participants, alongside a rich combination of medical, biochemical, lifestyle and environmental data from volunteers involved.

Today marks an important milestone in UKRI’s commitment to realise the potential of genetics for biomedical research, innovation and translation to the clinic.

After five years, more than 350,000 hours of genome sequencing, and over £200 million of investment, UK Biobank is releasing by far the world’s largest single set of sequencing data, completing the most ambitious project of its kind ever undertaken.

A veritable treasure trove

Professor Sir Rory Collins FRS FMedSci, Principal Investigator at UK Biobank, said:

This is a veritable treasure trove for approved scientists undertaking health research, and I expect it to have transformative results for diagnoses, treatments and cures around the globe.

Game-changing data for health research

Today’s addition of sequencing data comes after a series of great leaps made using the vast UK Biobank biomedical database. These leaps include:

finding genes associated with protection against obesity and type 2 diabetes, which has the potential to lead to the development of new drugs
identifying individuals at very high genetic risk for diseases such as heart disease, breast cancer and prostate cancer, which may help with screening
finding a link between activity and Parkinson’s that allows the disease to be predicted from smartwatch data up to seven years before diagnosis, potentially leading to early intervention

Enhancing existing data’s potential

The new sequencing data will dramatically enhance the existing data’s potential.
Whole genome sequencing data on this scale, combined with UK Biobank’s existing data and biological samples, will result in extraordinary biomedical innovations, including:

more targeted drug discovery and development
discovering thousands of disease-causing non-coding genetic variants
accelerating precision medicine
understanding the biological underpinnings of disease

Democratising data

To date, over 30,000 researchers from more than 90 countries have registered to use UK Biobank, with over 9,000 peer-reviewed papers published as a result.

Researchers are given the tools and computing power to analyse the de-identified data via UK Biobank’s secure, cloud-based Research Analysis Platform.

Cheryl Moore, Chief Research Programmes Officer, Wellcome Trust, said:

From the sequencing of the genomes themselves through to innovative and secure data storage, the release of this rich dataset marks a significant and impressive moment in scientific research. It’s truly field-opening for understanding the interactions between our genetics, environment and health.

Wellcome’s funding has supported a new, bespoke data platform that will provide approved researchers with the tools they need to analyse the wealth of data. Crucially, this opens up exciting opportunities for early-career researchers and those in low- and middle-income countries, in turn offering huge potential to unlock new discoveries and enhance our understanding of health to improve lives around the world.

The consortium behind this joint venture

This project was funded by Wellcome, UKRI and four biopharmaceutical companies:

Amgen
AstraZeneca
GSK
Johnson & Johnson

In return for significant investment, UK Biobank gives nine months’ exclusive data access to industry members of the consortium.

In this way, commercial companies invest heavily to enhance a ground-breaking health dataset that is then available to approved researchers across the world.

The DNA sequencing was completed by Amgen’s subsidiary, deCODE Genetics, and the Wellcome Sanger Institute, using Illumina NovaSeq technology, with deCODE providing additional informatics processing support.

Brilliant minds and cross-sector collaboration

Professor Sir John Bell CH GBE FRS FMedSci added:

UK Life Sciences are going from strength to strength, and UK Biobank is leading the way by combining world-leading data, fantastic infrastructure, brilliant minds and cross-sector collaboration.

Further information

This data and the rest of UK Biobank’s de-identified data are now globally accessible to approved researchers on the UK Biobank Research Analysis Platform. The platform is hosted on Amazon Web Services (AWS) in the London region and enabled by DNAnexus.

This is the first time that a globally accessible resource, together with the computing power and storage required to analyse data of this size and type, has been made available to researchers.

Following completion of the sequencing, the industry consortium led efforts to process and joint-call the genomes using the DRAGEN pipeline on AWS infrastructure. This enabled Illumina to transform this vast volume of data into a single combined genetic dataset.

These outputs further enrich the scientific importance of the data, enhancing the potential to identify less frequent genetic variants and making it more cross-comparable with other large-scale population health studies.

The four pharmaceutical companies plan to publicly share their summary statistical analyses arising from the consortium collaboration, including genome-wide association results. These will provide the research community with highly valuable insights without the costly and time-consuming burden of analysing raw data.