Why is infrastructure important?
Infrastructure is a fundamental enabler of the research and innovation ecosystem. It is the equipment, facilities, resources, and services used by research and innovation communities and businesses to:
- conduct research
- foster innovation
- deliver real-world impacts
This ranges from national-level facilities offering globally unique capabilities through to the instruments found in laboratories across the country.
Over the last few decades, there have been rapid improvements in the sensitivity, throughput and availability of technology used in research. These improvements, coupled with advances in computer science and artificial intelligence (AI), have led to a huge increase in the volume and complexity of data generated through bioscience research.
Researchers need more effective ways to track, store, process, use and reuse this information in their work, helping them to realise real-world impacts for public benefit.
EMBL-EBI, a global leader for biological data
The European Molecular Biology Laboratory’s European Bioinformatics Institute (EMBL-EBI) is a centre for resources, expertise, and excellence in bioinformatics (the application of computer science in the management and analysis of complex biological data).
As a global leader in the storage, analysis, and dissemination of large biological datasets, EMBL-EBI provides freely accessible data, tools, services, and training for the life science research community, enabling wide-ranging scientific development.
The institute is part of (and funded by) EMBL, an intergovernmental organisation made up of 29 member states, including the UK. EMBL-EBI is located in the UK at the Wellcome Genome Campus in Hinxton, Cambridgeshire. As well as the Biotechnology and Biological Sciences Research Council (BBSRC), other key funders of EMBL-EBI include:
- Medical Research Council (MRC)
- European Commission
- US National Institutes of Health (NIH)
- Wellcome Trust
About 4% of EMBL-EBI’s funding comes from private companies.
EMBL-EBI’s data resources are used globally and have become an integral part of how researchers across the life sciences navigate data and create impact. On an average day, EMBL-EBI resources receive over 100 million web requests from millions of scientists worldwide.
A foundational global research infrastructure
Dr Amanda Collis, Executive Director, Research, Strategy and Programmes, BBSRC said:
Since its inception 30 years ago, EMBL-EBI has become a foundational global research infrastructure. It has benefited from considerable investment from BBSRC and other funders over the years.
The data resources, tools, and services developed by EMBL-EBI are essential for researchers in delivering bio-based solutions for the challenges we face as a society. EMBL-EBI enables scientific discoveries in all areas of bioscience, from food security to infectious disease, genomic medicine and biodiversity conservation.
Video: ‘EMBL-EBI: the home of big data in biology’. Credit: EMBL-EBI
Open data enabling international collaboration
UK Research and Innovation (UKRI) advocates for open data in research. Good research and data management means that publicly funded research is:
- more transparent and easily scrutinised, helping to increase public trust
- easy to reuse and build upon
- collaborative and efficient
EMBL-EBI has led the way for decades in data for open science. Researchers internationally add their experimental data to EMBL-EBI’s databases and resources. In return, EMBL-EBI curates the data, creating free-to-use tools and resources that make it easier to navigate, analyse and collaborate. This benefits collaboration not only within research communities but also between them, supporting interdisciplinary research.
As well as enabling open access to biological data, EMBL-EBI also provides open-access software. With ever-increasing data in the biosciences, researchers are reliant on tools for data mining and analysis. It’s vital that these tools remain accessible, accurate and adaptable.
User needs rather than profit
Crowd-sourcing expertise via open-source software means:
- users can share their work, enabling collaboration and equity
- bugs and security issues can be spotted and fixed quickly
- users can improve and test features that are useful for the whole community, pushing development towards user needs rather than profit
An external evaluation led by Charles Beagrie in 2021 found that EMBL-EBI’s users would place an estimated value of £1.25 billion per year on its services if they were not free to use. EMBL-EBI’s resource contributions to wider research and future impact were estimated to be worth £2.2 billion per year.
The bricks behind the biology
BBSRC has provided substantial capital investment to EMBL-EBI, totalling over £100 million, with further investment from across UKRI.
In 2005, BBSRC, MRC and the Wellcome Trust funded a 1,500-square-metre expansion, increasing staff capacity and enabling expansion of services, research and training. A £10 million BBSRC grant also established an early EMBL-EBI data centre in London in 2010.
In 2012, the UK Large Facilities Capital Programme (via BBSRC) awarded £75 million for a new bioinformatics technical hub, resulting in:
- a new building (the ‘South Building’) that houses the secretariat (central hub) of European life science infrastructure for biological information (ELIXIR)
- a training centre
- Open Targets, an industry-led clinical translation suite for bioinformatics
MRC, the Natural Environment Research Council and the Wellcome Trust also contributed funding.
Infrastructure for key global priorities
In 2018, EMBL-EBI and BBSRC developed a business case for a £45 million UKRI Strategic Priorities Fund grant, which was awarded in 2019 to boost EMBL-EBI’s technical infrastructure.
Capital investment from BBSRC in 2020 dovetailed with funding from the UK government and the Wellcome Trust for office space, resulting in EMBL-EBI’s Thornton Building.
In 2023, UKRI awarded over £80 million to EMBL-EBI from the Infrastructure Fund to develop data and platforms that will help address key global priorities, such as:
- antimicrobial resistance
- emerging infectious diseases
- sustainability
- biodiversity
Ensembl
The Ensembl project began in 1999, funded by the Wellcome Trust and EMBL. It was created to annotate the human genome as part of the Human Genome Project, adding information such as gene function, associated diseases and sequence.
Since then, and with BBSRC support, Ensembl’s annotation has broadened to cover the genomes of hundreds of other plant and animal species, including:
- farm animals like chickens, cows and pigs
- veterinary animals like dogs, cats, horses and rabbits
- wild species like warthogs and water buffalo
- important aquaculture species like Atlantic salmon, rainbow trout and European seabass
- essential crop plants, including wheat, rice, maize and soybean
The impact of free access to this genetic information is immense, far-reaching, and difficult to quantify. Understanding the genome of key farmed animal species like cows and chickens has enabled decades of research into how to improve these species through breeding practices, improving welfare, productivity and food security.
For plants, Ensembl helps researchers improve crucial crop species, for example by breeding for drought or pest resistance, which is vital for future food security.
Application for human health
Ensembl has now had over 100 releases and is integrating increasing numbers of genomes. As of December 2024, it hosted genetic information for over 30,000 species across the tree of life, acting as a platform for many spheres of bioscience.
Ensembl plays an important role in human health and genomic medicine. The NHS was the first national health care system to offer whole genome sequencing as part of routine care, in partnership with Genomics England.
Genomics England accelerated large-scale whole genome sequencing via the 100,000 Genomes Project, using EMBL-EBI data as a reference for analysis and interpretation. This is a game changer for patients with rare diseases, who will have a greatly improved likelihood of diagnosis and who may benefit from treatment tailored to their condition.
EMBL-EBI-managed data, such as that within Ensembl, is critical to this service. Genomics England and the NHS Genomic Laboratory Hub use EMBL-EBI resources, like the Ensembl gene annotation database and the Variant Effect Predictor (VEP), which determines the effect of changes to the genetic code. These gene annotations and the Ensembl VEP are used routinely in NHS clinical genomics labs to accelerate patient diagnosis.
PDBe: a database for proteins
EMBL-EBI’s Protein Data Bank in Europe (PDBe) is a database for the collection and organisation of 3D structural data. Researchers can add the structure of proteins or other biological macromolecules to PDBe after determining and verifying them in the lab.
PDBe derives from the original Protein Data Bank project and is now part of the Worldwide Protein Data Bank, established in 2003 for wider collaboration and data gathering. The database is core-funded by EMBL and the Wellcome Trust, with BBSRC, MRC, NIH, and the EU funding further projects. BBSRC has supported numerous parts of PDBe. Cumulatively, this support has improved the accessibility, usability and range of PDBe’s protein and macromolecule structural data offerings, as well as collaboration around them.
The development of Google DeepMind’s revolutionary, Nobel Prize-winning AI-powered tool AlphaFold was made possible by the critical structural biology and data infrastructure hosted and managed by EMBL-EBI, such as PDBe.
Read more about BBSRC’s structural biology research, including the underpinning biology to support the development of AlphaFold.
ELIXIR: coordinating data, tools, standards and training across Europe
BBSRC also supports ELIXIR, an initiative to connect and coordinate bioinformatics resources across participating countries. ELIXIR coordinates the provision of data, FAIR (findable, accessible, interoperable, and reusable) resources, training, computing and data management support across Europe.
A national node in each participating country helps to coordinate the services of around 240 organisations, ensuring alignment and connections with the world-leading data resources run by EMBL-EBI. The UK node of ELIXIR brings together 28 UK academic partners to deliver life science data services and is coordinated via the Earlham Institute.
ELIXIR helps participating countries build their national data infrastructure for life sciences and works to connect these activities across countries.
Addressing differences, collaboration and standardisation
With over 1,000 bioinformaticians active across Europe, ELIXIR aims to address several challenges, including national differences in data regulation, format and standards. It also fosters:
- collaboration, both academia-academia and academia-industry
- topical communities with shared aims, like the ELIXIR Plant Sciences Community and ELIXIR Rare Diseases Community, for the sharing of niche tools and databases
- standardisation of vocabulary, metadata and other developments to support interoperability
- collaborative efforts to sustain Europe’s life science data resources
ELIXIR is funded by a wide range of sources, including its member states and the EU, with many individual nodes receiving national-level investment in their respective countries.
Since its inception, ELIXIR has identified a set of Core Data Resources that have fundamental importance to the wider life science community and the long-term preservation of biological data. Many of these are managed by EMBL-EBI and partners.
International collaboration, training, and communication
A range of BBSRC funding has also facilitated knowledge exchange, collaboration, and standards-setting between EMBL-EBI and other institutes, organisations and networks.
This has included the funding of workshops across a variety of topics, including ontology, data management challenges and metagenomics. It has also included funding under a BBSRC Brazil partnership award, which supported EMBL-EBI’s first trainer-support workshop for scientists in Brazil.
EMBL-EBI’s Bioinformatics for Discovery programme was also funded by BBSRC. It was tailored for researchers working on real-life issues, such as agri-food and pharmaceutics, and included face-to-face workshops, webinars, and a peer-to-peer forum for interaction.
EMBL-EBI international partnerships, supported via BBSRC International Partnering Awards, include:
- collaboration with Japan to integrate jPOST into EMBL-EBI’s ProteomeXchange consortium, improving the availability of proteomics data
- collaboration with China to integrate iProX into the same consortium, bringing the same benefits for international data availability
- partnering with India on PDBe data sharing with workshops and exchange visits, supporting the establishment of PDB India
There are also major external collaborations, partnerships, and interdisciplinary projects that use EMBL-EBI’s resources, such as the NHS-Genomics England collaboration.
The Darwin Tree of Life Project is another great example, with EMBL-EBI being one of ten consortium members. This ambitious project to sequence, assemble and publish the genomes of over 70,000 species found in Britain and Ireland has been making great progress. EMBL-EBI has developed a separate data portal to host the genomes of species recorded via this project.
Top image credit: EMBL-EBI