
A comprehensive bibliographic database of the world’s scholarly literature

The world’s largest collection of open access research papers. Core features: machine access to our vast, unique full-text corpus and indexing of the world’s repositories.

We serve the global network of repositories and journals

Comprehensive data coverage

We provide both metadata and full-text access to our comprehensive collection through our APIs and datasets.

Powerful services

We create powerful services for researchers, universities, and industry

Cutting-edge solutions

We research and develop innovative data-driven and AI solutions

Committed to the Principles of Open Scholarly Infrastructure (POSI)

Cost-free PIDs for your repository

OAI identifiers are unique identifiers minted cost-free by repositories. Ensure that your repository is correctly configured so that the CORE OAI Resolver can redirect your identifiers to your repository landing pages.

OAI IDs provide a cost-free option for assigning Persistent Identifiers (PIDs) to your repository records.
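To illustrate the identifier scheme only, the sketch below splits an OAI identifier into its namespace and local parts. It assumes the `oai:<namespace-identifier>:<local-identifier>` syntax from the OAI identifier guidelines, with the namespace being a domain name; the function name `parse_oai_id` and the example identifier are hypothetical, and the sketch does not model how the CORE OAI Resolver builds its redirects.

```python
import re
from typing import NamedTuple


class OaiId(NamedTuple):
    namespace: str  # repository namespace, assumed to be a domain name
    local_id: str   # record identifier, unique within that repository


# Assumed syntax: "oai:" + namespace-identifier + ":" + local-identifier.
# The namespace cannot contain a colon, so everything after the second colon
# (including further colons) belongs to the local identifier.
_OAI_RE = re.compile(r"^oai:([A-Za-z0-9][A-Za-z0-9.-]*\.[A-Za-z0-9-]+):(.+)$")


def parse_oai_id(identifier: str) -> OaiId:
    """Split an OAI identifier into (namespace, local_id) or raise ValueError."""
    match = _OAI_RE.match(identifier)
    if match is None:
        raise ValueError(f"not an OAI identifier: {identifier!r}")
    return OaiId(namespace=match.group(1), local_id=match.group(2))


print(parse_oai_id("oai:repository.example.ac.uk:12345"))
```

Because the namespace is the repository's own domain, two repositories cannot mint the same identifier, which is what makes these identifiers usable as PIDs at no cost.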

Who do we serve?

Enabling others to create new tools and innovate using a global comprehensive collection of research papers.

Companies

“Our partnership with CORE will provide Turnitin with vast amounts of metadata and full texts that we can ...”

Gareth Malcolm, Content Partner Manager at Turnitin


Making research more discoverable, improving metadata quality, helping to meet and monitor open access compliance.

Academic institutions

“CORE’s role in providing a unified search of repository content is a great tool for the researcher and ex...”

Nicola Dowson, Library Services Manager at Open University


Tools to find, discover and explore the wealth of open access research. Free for everyone, forever.

Researchers & general public

“With millions of research papers available across thousands of different systems, CORE provides an invalu...”

Jon Tennant, Rogue Paleontologist and Founder of the Open Science MOOC

Helping funders to analyse, audit and monitor open research and accelerate towards open science.

Funders

“Aggregation plays an increasingly essential role in maximising the long-term benefits of open access, hel...”

Ben Johnson, Research Policy Adviser at Research England

Our services

Access to raw data

Create new and innovative solutions.

Content discovery

Find relevant research and make your research more visible.

Managing content

Manage how your research content is exposed to the world.

Companies using CORE

Gareth Malcolm


Content Partner Manager at Turnitin

Our partnership with CORE will provide Turnitin with vast amounts of metadata and full texts that we can utilise in our plagiarism detection software.

Academic institution using CORE

Kathleen Shearer

Executive Director of the Confederation of Open Access Repositories (COAR)

CORE has significantly assisted the academic institutions participating in our global network with their key mission of exposing their scientific content. In addition, CORE has helped our content administrators to showcase the real benefits of repositories through its value-added services.

Partner projects

Ben Johnson


Research Policy Adviser

Aggregation plays an increasingly essential role in maximising the long-term benefits of open access, helping to turn the promise of a 'research commons' into a reality. The aggregation services that CORE provides therefore make a very valuable contribution to the evolving open access environment in the UK.


The Royal Society

Free to access articles

As well as open access, we provide ‘free to access’ journal content. Articles denoted by a tick symbol can be accessed free of charge.

Access for low-and middle-income countries

In keeping with our role as the UK's national academy of science, we offer free access to readers in low- and middle-income countries.

Journal archive

The Royal Society’s Philosophical Transactions, launched in 1665, was the world’s first scientific journal. It established the fundamental principles of scientific priority and peer review.

Most of our oldest content is now freely available: specifically, all papers older than 70 years. In addition, papers published online more than 12 months ago (biological sciences and history of science) or more than 24 months ago (physical sciences), but less than 10 years ago, are freely available, counting from online issue publication. All issues of Biographical Memoirs are also freely available.

Please refer to the table below for an overview of the freely available content in our journals.

* 1 year for biological sciences (including JRSI, Focus and N&R) and 2 years for physical sciences.
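The availability rules above can be restated as a small predicate. This is a hypothetical sketch, not the Royal Society’s own system: it measures age in whole months since online issue publication, treats the 12-, 24- and 120-month boundaries as inclusive (the text does not say), and reads “older than 70 years” as more than 840 months. The names `Subject` and `is_free_to_access` are illustrative.

```python
from enum import Enum


class Subject(Enum):
    BIOLOGICAL_SCIENCES = "biological sciences"
    HISTORY_OF_SCIENCE = "history of science"
    PHYSICAL_SCIENCES = "physical sciences"


# Months after online issue publication before a paper enters the free window.
EMBARGO_MONTHS = {
    Subject.BIOLOGICAL_SCIENCES: 12,
    Subject.HISTORY_OF_SCIENCE: 12,
    Subject.PHYSICAL_SCIENCES: 24,
}
FREE_WINDOW_END_MONTHS = 10 * 12  # the window closes 10 years after publication
ARCHIVE_FREE_MONTHS = 70 * 12     # everything older than 70 years is free


def is_free_to_access(age_months: int, subject: Subject,
                      biographical_memoirs: bool = False) -> bool:
    """Return True if a paper of this age and subject is freely available
    under the rules as described (boundaries assumed inclusive)."""
    if age_months < 0:
        raise ValueError("age_months must be non-negative")
    if biographical_memoirs:
        return True  # all Biographical Memoirs issues are free
    if age_months > ARCHIVE_FREE_MONTHS:
        return True  # oldest archive content
    return EMBARGO_MONTHS[subject] <= age_months <= FREE_WINDOW_END_MONTHS
```

For example, a physical sciences paper published 18 months ago is still embargoed while a biological sciences paper of the same age is free, and a 20-year-old paper in either field falls outside both free windows.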

Package subscribers receive free lease access to the full journal archive while their subscription remains active. If you are interested in purchasing perpetual access to the non-free content in our archive, please see our archive purchasing options.

Open access week

Each year, during Open Access Week, we make all of our published content free to access.

Related content


Open access publishing

The Royal Society supports open access publishing as part of our commitment to the widest possible…


Royal Society Open Access Equity

Free journal access and automatic APC waivers to researchers in low- and middle-income countries


History of scientific journals

Over 350 years of scientific publishing

Email updates

We promote excellence in science so that, together, we can benefit humanity and tackle the biggest challenges of our time.

Subscribe to our newsletters to be updated with the latest news on innovation, events, articles and reports.


Access to Research

Discover a world of published academic research at your local library

Access to Research gives free, walk-in access to over 30 million academic articles in participating public libraries across the UK. Start now by viewing which articles and journals are available from home, then find a participating library where you can view the full text.

  • Art, Architecture, Biological Sciences, Business, Engineering, Environmental Science, Film, Health, History, Journalism, Languages, Politics, Philosophy, Physics, Religion, Social Sciences, Mathematics

Other Areas of interest

What, Why, Who? Find out more.

Which publishers are taking part?

Which libraries are participating?

About Open Access

Supported by

Publishers Licensing Society (PLS)

Oxford University ranked number 1 in the Times Higher Education (THE) World University Rankings for the eighth year running, and at the heart of this success is our ground-breaking research and innovation.


AI at Oxford

Applying AI to society's greatest challenges and tackling its ethical issues


Brain and Mental Health

How Oxford experts are exploring the most complex object in the known universe


Oxford’s REF 2021 results show largest volume of world-leading research

The Research Excellence Framework assesses the quality of research in UK higher education.


Oxford's innovation case studies

Helping you to change the world

Oxford's Global Research Map


Explore Oxford's world-class research from pole to pole and in every continent

Oxford is world-famous for research excellence and home to some of the most talented people from across the globe. Our work helps the lives of millions, solving real-world problems through a huge network of partnerships and collaborations. The breadth and interdisciplinary nature of our research sparks imaginative and inventive insights and solutions.


Oxford profiles

Meet some of the talented people behind Oxford’s world-class research. Pushing forward the boundaries of knowledge, their work solves real world problems and creates a positive impact on our societies, economies and health.


Started in Oxford

The Oxford region is one of the most innovative in the UK, with new enterprises continuing to join a growing band of spinouts, startups and entrepreneurs.


Oxford's experience in Policy Engagement

Oxford’s researchers and academics have a wealth of experience in engaging with policymakers and contributing to policy impact.


Engaged research at Oxford

Have a look at some of the short films below for excellent examples of Public Engagement with Research (PER) activities that take place at Oxford which Inform/Inspire, Consult and Collaborate with the public.


Research Collaboration Values

Our approach to research collaboration and partnership is underpinned by five core values.


Oxford Impact Films

Watch our Research Impact films: 3-4 minute videos of how our research has benefitted policy, health, business and culture. 


Oxford leads Nature Positive Universities Alliance to reverse biodiversity decline

The Nature Positive Universities Alliance brings higher education institutions together to use their unique power and influence as drivers of positive change.


New Academic Champion for Women and Diversity in Entrepreneurship

To support diversity in innovation and entrepreneurship, and to enhance the University’s commitment to these goals, Professor Stevens will work with the IDEA (Increasing Diversity in Enterprising Activities) programme and


AI, automation in the home and its impact on women

As we mark International Women’s Day, Professor Ekaterina Hertog spoke to us about AI, the increase of automation in the home and its impact on women and wider society.


Research to policy impact: strategies for translating findings into policy messages


Getting started in Policy Engagement: pathways to engagement

Experts call for responsible use of generative AI in adult social care


‘Adult social care is about supporting people to live independently and to protect fundamental human rights. Generative AI offers many potential benefits and opportunities to adult social care.


Boosting Policy Engagement Through OPEN Leaders


Urgent call for UK Government to develop a heat resilience strategy


A new device to detect cardiovascular disease


New research shows the Cerne Abbas Giant was a muster station for King Alfred’s armies

Research undertaken by Martin Papworth for the National Trust showed that the Giant was carved in the Anglo-Saxon period, not, as most people thought, in prehistory or more recently, yet the reason he was made has remained a mystery.


Engaging communities in wildlife conservation through storybooks

Impact case study.


A Partnership in Learning by Doing: using research to engage policymakers to pave the way for electric car clubs in Oxfordshire


10 recommendations for best practice stakeholder engagement


Developing the next generation of wildlife conservation leaders


A Global Resilience Index: Supporting climate adaptation of global infrastructure systems


Tackling mental illness by supporting industry to develop new drugs

King Charles presents President Macron with Oxford University research on Voltaire’s work


The gift is an extract from a University of Oxford research project to produce and publish the first ever scholarly edition of the Œuvres complètes de Voltaire (Complete works of Voltaire), which began in 1968 and was completed over 50 years later, in 2022, in 205 volumes.


Development of a Malaria vaccine - R21/Matrix-M

Impact case studies.

Oxford University welcomes UK associate membership of Horizon Europe


Horizon Europe is the EU’s funding programme for research and innovation projects for the years 2021 to 2027. The programme has a budget of €95.5 billion (£81bn). It is the successor to Horizon 2020 and the previous Framework Programmes for Research and Technological Development.


Development and roll-out of Typhoid Vi-conjugate vaccine (TCV)

Latest research news.


• Research strategy • Engagement and partnership strategy • Divisions

Latest campaigns


We are developing fundamental AI tools, applying AI to global challenges, and addressing the ethical issues of new technologies.


Experts at Oxford are expanding our understanding of brain health at a cellular level, exploring the impacts of mental health issues on the individual, and examining population-wide global health problems.

Other campaigns

Cancer at Oxford

Digital collections at Oxford, True Planet, Coronavirus research.


Open Access

Peer-reviewed

Research Article

The varying impacts of COVID-19 and its related measures in the UK: A year in review

Roles Conceptualization, Formal analysis, Investigation, Methodology, Software, Validation, Visualization, Writing – original draft


Affiliation Department of Sociology, University of Oxford, Oxford, United Kingdom


Roles Funding acquisition, Writing – review & editing

  • Muzhi Zhou, 
  • Man-Yee Kan

PLOS

  • Published: September 29, 2021
  • https://doi.org/10.1371/journal.pone.0257286

Fig 1

We examine how the earnings, time use, and subjective wellbeing of different social groups changed at different stages (waves) of the pandemic in the United Kingdom (UK). We analyze longitudinal data from the latest UK Household Longitudinal Survey (UKHLS) COVID study and earlier waves of the UKHLS to investigate within-individual changes in labor income, paid work time, housework time, childcare time, and distress level during the three lockdown periods and the easing period between them (from April 2020 to late March 2021). We find that, as the pandemic developed, COVID-19 and its related lockdown measures in the UK had unequal and varying impacts on people’s income, time use, and subjective wellbeing depending on their gender, ethnicity, and educational level. In conclusion, both the extent of the impacts of COVID-19 and COVID-induced measures and the speed at which these impacts developed varied across social groups with different types of vulnerabilities.

Citation: Zhou M, Kan M-Y (2021) The varying impacts of COVID-19 and its related measures in the UK: A year in review. PLoS ONE 16(9): e0257286. https://doi.org/10.1371/journal.pone.0257286

Editor: Florian Fischer, Charité – Universitätsmedizin Berlin, GERMANY

Received: October 13, 2020; Accepted: August 27, 2021; Published: September 29, 2021

Copyright: © 2021 Zhou, Kan. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All data files are available from the UK Data Service database (study number(s) 6641, 8644). Data file URLs: https://beta.ukdataservice.ac.uk/datacatalogue/studies/study?id=8644 and https://beta.ukdataservice.ac.uk/datacatalogue/studies/study?id=6641

Funding: This work is supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (awardee: Man-Yee Kan, grant number 771736). Funding website: https://ec.europa.eu/programmes/horizon2020/en . The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

More than one year has passed since the United Kingdom (UK) officially announced its first national lockdown on 23 March 2020 due to the rapid spread of COVID-19. The outbreak of COVID-19 and the massive lockdown measures have greatly changed people’s lives. When people were instructed to stay at home and maintain physical distancing, the lives of millions of people were affected. For months, many people were unable to go to work or school, nor could they meet friends and relatives. What was unexpected was that people in the UK experienced a total of three national lockdowns over the past year. Now, people’s lives are far from what they were before the first lockdown, and the pandemic is still not over.

Recent evidence has shown that the COVID-19 pandemic and related social and economic measures, such as physical distancing and business closures, have had differential impacts on various social groups. In the UK, for example, women and parents were found to have experienced a larger reduction in subjective wellbeing [ 1 , 2 ]. Black, Asian, and minority ethnic (BAME) immigrants were more likely to experience economic hardship immediately after the first national lockdown [ 3 ]. In addition, among those known to have had COVID-19, people of BAME background in the UK had a higher death rate than white people [ 4 ]. As Damian Barr put it in his poem, “we are in the same storm, but we are not all in the same boat” [ 5 ].

These earlier findings identified immediate unequal impacts on different social groups, but our understanding of the longer-term impacts of COVID-19 and related measures remains limited. We know little about how the impacts might have changed since the first lockdown. The COVID-19 pandemic has already lasted for more than one year, and the UK has experienced three national lockdowns. Early research was constrained by data that covered only two time points, such as before and shortly after the announcement of the first lockdown. Little is known about how unequal social impacts reveal themselves at different stages of the COVID-19 pandemic, especially with repeated lockdowns. This gap hinders our understanding of how COVID-19 and COVID-induced social policies, such as physical distancing measures, working from home, and the closure of certain businesses, which have been changing on a weekly or even daily basis, progressively affect people’s lives. Documenting how the impacts of COVID-19 and COVID-induced measures develop is important for understanding the consequences of this rapidly evolving pandemic and for helping policymakers plan for future waves and future pandemics.

We need more comprehensive and up-to-date research on how inequalities have changed as the COVID-19 pandemic has developed through repeated waves and as the various measures to contain it were implemented over the past year. We analyze nationally representative data from the latest UK Household Longitudinal Survey (UKHLS), conducted before the first lockdown in March 2020, during the first lockdown from April to June 2020, during the easing of the first lockdown (June to September 2020), and during the later two lockdowns (November 2020, and January to March 2021). In this paper, we contribute to COVID-19 research by providing a dynamic picture of how people’s labor earnings, time use, and wellbeing changed across different stages of the pandemic. We further investigate whether, and to what extent, the inequalities in these outcomes based on gender, ethnicity, and educational level have changed over the past year.

In what follows, we first review the latest work on the impact of COVID-19 and COVID-induced measures on people’s lives, focusing on three dimensions of social inequality: gender, race/ethnicity, and education. We then outline the development of the COVID-19 pandemic and the lockdown measures in the UK from March 2020 to April 2021. Next, we introduce the data and its longitudinal design, which enables us to compare the same individuals before the start of the pandemic and at different time points over the past year. Finally, we report the results of fixed-effect regression analyses and discuss our conclusions.
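Fixed-effect regression identifies effects from changes within the same individual. The sketch below illustrates that idea with a single regressor; it is a generic within (demeaning) estimator, not the authors’ model, whose covariates and period indicators are not given in this section, and the function name `within_fe_slope` and the toy data are hypothetical. Subtracting each person’s own mean from both the outcome and the regressor removes every characteristic that does not change over time, so only within-person variation determines the slope.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def within_fe_slope(person_ids: Sequence[Hashable],
                    x: Sequence[float],
                    y: Sequence[float]) -> float:
    """Within (fixed-effects) estimate of the slope of y on one regressor x."""
    if not (len(person_ids) == len(x) == len(y)):
        raise ValueError("inputs must have equal length")

    # Accumulate per-person sums of x and y and observation counts.
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for pid, xi, yi in zip(person_ids, x, y):
        s = sums[pid]
        s[0] += xi
        s[1] += yi
        s[2] += 1

    # OLS on person-demeaned data: slope = sum(dx * dy) / sum(dx * dx).
    num = den = 0.0
    for pid, xi, yi in zip(person_ids, x, y):
        sx, sy, n = sums[pid]
        dx = xi - sx / n  # deviation from this person's own mean of x
        dy = yi - sy / n  # deviation from this person's own mean of y
        num += dx * dy
        den += dx * dx
    if den == 0.0:
        raise ValueError("x does not vary within any individual")
    return num / den


# Toy data: two people with very different baselines, each observed
# once before the pandemic (0) and once during a lockdown wave (1).
ids = ["a", "a", "b", "b"]
lockdown = [0, 1, 0, 1]
hours = [5.0, 7.0, 10.0, 12.0]
print(within_fe_slope(ids, lockdown, hours))
```

When `x` is a 0/1 lockdown indicator and each person contributes one baseline and one lockdown observation, the slope equals the average within-person change in the outcome; here both hypothetical people gain 2 units, so the estimate is 2.0 regardless of their different baseline levels.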

The impacts of COVID-19 and its related measures

The COVID-19 pandemic has now lasted for over a year, and many countries have seen repeated waves. The primary aim of COVID-19-induced measures is to contain the virus by reducing physical contact between people. Many of these measures immediately affect people’s behaviors, but others could have longer-term impacts. For example, business closures and work-from-home guidance dramatically altered people’s working patterns. Reductions in paid work time and earnings were recorded immediately in countries that introduced lockdown measures, such as Australia [ 6 ], the UK [ 3 , 7 ], and the United States (US) [ 8 ]. When more people stayed at home and the option of outsourcing domestic work was reduced due to business closures or the fear of contracting COVID-19, it is not surprising that people spent substantially more time on unpaid domestic work than they had in the past [ 6 , 7 , 9 , 10 ].

People’s feelings also changed. COVID-19 infection is associated with a series of symptoms such as a high temperature, a continuous cough, and a loss of or change to the sense of smell or taste. Serious cases can result in hospital admission and death. In the UK, the case-fatality rate is estimated to be 2.1% [ 11 ]. Daily news reporting the surging number of new cases and deaths brought a high level of worry about health and security [ 2 ]. In addition, loss of employment, financial strain, and social isolation are well-known factors that negatively affect mental health [ 12 – 14 ]. Not surprisingly, soon after the start of the pandemic, worsened subjective wellbeing was observed in Australia [ 6 , 15 ], the UK [ 2 , 16 , 17 ], and the US [ 18 ]. Once the daily increase in COVID-19 cases declined and lockdown restrictions began to be lifted, people’s subjective wellbeing started to recover. As Pierce et al. [ 2 ] noted, using the first five waves of the same UKHLS COVID study data as this paper, “[b]etween April and October 2020, the mental health of most UK adults remained resilient or returned to pre-pandemic levels.” However, “[a]round one in nine individuals had deteriorating or consistently poor mental health.”
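The case-fatality rate cited above is the share of confirmed cases that end in death. A minimal sketch of that arithmetic follows; the function name is hypothetical and the counts are made up to reproduce a 2.1% figure, not the data behind [ 11 ].

```python
def case_fatality_rate(deaths: int, confirmed_cases: int) -> float:
    """Deaths among confirmed cases, as a percentage of those cases."""
    if confirmed_cases <= 0:
        raise ValueError("confirmed_cases must be positive")
    if not 0 <= deaths <= confirmed_cases:
        raise ValueError("deaths must lie between 0 and confirmed_cases")
    return 100.0 * deaths / confirmed_cases


# Hypothetical counts: 21 deaths among 1,000 confirmed cases.
print(case_fatality_rate(21, 1_000))
```

Because the denominator counts only confirmed cases, the rate depends on how much testing takes place: wider testing finds more mild cases and lowers the figure.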

The COVID-19 pandemic and its related measures have raised growing concerns about exacerbated social inequalities. Gender inequalities in the labor market long predate the pandemic. In the UK, the labor force participation rate for men is higher than that for women, and men are also much more likely to work full time [ 9 , 19 ]. Women are more likely to be at-home workers. Reasons for this inequality include inflexible workplace expectations, gender norms expecting men to be the primary earners and women the primary caregivers, and discrimination in the labor market. When people are required to work from home, the spatial boundary between market work and family life is blurred. Many studies have investigated whether the changes in time use due to lockdown measures were the same for women and men. Between March and May 2020 (the UK’s first lockdown), British men were found to be more likely than women to be furloughed or dismissed from work [ 20 ]. However, studies focusing on the labor market outcomes of parents reveal a different pattern. In the UK, during the first lockdown period from April to May 2020, among parents with children aged between 4 and 15, mothers were found to be more likely to be laid off, to be furloughed, or to quit their jobs [ 21 ]. Similarly, in Australia [ 6 ], Canada [ 22 ], and the US [ 23 ], mothers with young children experienced a larger change in their paid work time or were more likely to leave their jobs. On the other hand, several studies have reported improvements in the domestic division of labor: the increase in domestic work was larger for men than for women during the lockdown period in Australia [ 6 ], Canada [ 24 ], France [ 25 ], and the US [ 26 ]. However, contrary results were reported in Germany [ 27 ] and Spain [ 28 ]. The decline in subjective wellbeing also differs between women and men. In the UK and Australia, women were found to experience a larger reduction in subjective wellbeing than men [ 1 , 2 , 6 , 9 , 29 ].

In the UK, BAME immigrants were more likely to experience economic hardship just after the first lockdown [ 3 ]. Compared with their white counterparts, BAME immigrants were also found to suffer a larger decline in subjective wellbeing at the beginning of the March 2020 lockdown in the UK [ 3 , 30 ]. In the US state of Indiana, Black Americans were more than three times as likely as whites to lose their jobs [ 31 ]. In contrast, another study of the first UK lockdown highlights that white Britons in middle-income jobs were more likely to experience job loss, primarily because many BAME people are employed in key sectors such as health and social care services, which were exempt from the lockdown measures and instead saw a surge in work demand [ 20 ]. Notably, in the UK, people of BAME backgrounds had a higher death rate than white people after they were confirmed to have COVID-19 [ 4 ].

People with less education and lower income suffered substantially during the pandemic. They were hit particularly hard, with a higher chance of losing their jobs and earnings, in countries such as Canada [ 32 ], the UK [ 20 ], and the US [ 31 ]. Many of the less educated are trapped in lower-skilled occupations with tight financial constraints. Consequently, the less educated group reported heightened distress during the first lockdown in the UK [ 33 ]. However, one US study reports that the decline in subjective wellbeing up to April 2020 was larger among the more educated, possibly because they felt a greater loss of control and wealth due to COVID-19-related uncertainties [ 18 ]. Another study conducted in the US between April 2020 and June 2021 attributed part of the deterioration in mental health to the concurrent presidential election and domestic political unrest [ 34 ].

Again, the current literature has focused extensively on the early stage of this pandemic. In particular, studies that have used the same British data source as the present study have examined the changes in earnings, time use, and subjective wellbeing during the first national lockdown, implemented in late March 2020 [ 3 , 7 , 9 , 10 , 20 ]. Pierce et al.’s work [ 2 ] on subjective wellbeing is an exception: it examined the recovery of subjective wellbeing when the first lockdown measures were eased from June to October 2020. However, their study did not cover the later lockdowns in November 2020 and January 2021. In this article, we provide a first-year review of COVID-19 developments in the UK and document how people responded to the first lockdown, its easing, and the later two lockdowns. This evaluation reveals whether people responded similarly to repeated lockdowns and whether the changes in earnings, time use, and feelings were temporary or long-lasting.

Timeline of the lockdown measures in the UK

On 31 January 2020, the first two positive cases of COVID-19 were confirmed in the UK. On 5 March 2020, the first patient who tested positive for COVID-19 died. On 23 March 2020, the Prime Minister placed the UK on lockdown to slow down the outbreak of this pandemic. These measures included physical distancing, school closures, working from home, and closure of non-essential businesses, including pubs and cafes. Key sectors, including health and social care, education and childcare, and key public services, were allowed to operate.

To maintain employment and protect individuals and businesses from economic hardship, the Coronavirus Job Retention Scheme was implemented for the period between late March and the end of October 2021, covering 80 percent of the regular salary of furloughed employees, up to a maximum of £2,500 per month [ 35 ]. In April, the UK had more than 10,000 deaths related to COVID-19. In May, a phased reopening of shops and schools was announced, and those who were unable to work from home were expected to return to the workplace.
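The grant rule described above is a capped percentage of salary. The sketch below encodes only that rule as stated (80 percent, capped at £2,500 per month); it ignores any other conditions or later changes to the scheme, and `furlough_grant` is a hypothetical name.

```python
def furlough_grant(monthly_salary: float, rate: float = 0.80,
                   monthly_cap: float = 2_500.0) -> float:
    """Monthly grant: `rate` of regular salary, capped at `monthly_cap` pounds."""
    if monthly_salary < 0:
        raise ValueError("monthly_salary must be non-negative")
    return min(rate * monthly_salary, monthly_cap)


for salary in (2_000.0, 3_125.0, 4_000.0):
    print(salary, furlough_grant(salary))
```

A furloughed employee earning £2,000 a month would receive £1,600, while the cap binds for anyone earning £3,125 a month or more (since 0.8 × 3,125 = 2,500), so higher earners lost a larger share of their regular pay.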

Beginning on 1 June 2020, schools reopened for all Reception, Year 1, and Year 6 pupils, but the summer holiday soon arrived. Nonessential businesses reopened gradually from 15 June. From 4 July, pubs, cinemas, and restaurants reopened. Physical distancing rules were relaxed from a “two-meter” to a “one-meter plus” rule. In August, restrictions were eased further, although the pandemic was far from over.

The UK variant of the coronavirus (lineage B.1.1.7, WHO name Alpha) was first identified in September 2020 and was considered to be more transmissible and potentially deadlier. In late September, people were required to work from home, and a 10 pm curfew was imposed on the hospitality sector. In October, England entered a 3-tier system in which regions were classified into tiers depending on the level of spread of the virus. Soon after, the second national lockdown came into force on 5 November and lasted until 2 December. People were told to stay at home. Other measures included the closure of the hospitality sector and nonessential shops, but schools stayed open, and people could leave their homes for outdoor exercise. After 2 December, the UK entered a stricter 3-tier restriction system.

However, this 3-tier system did not last long. After Scotland announced a lockdown, on 4 January 2021, a third national lockdown was announced. Schools were closed again, and people were urged to stay at home. This time, the measures were stricter than those in the second lockdown. They included “Stay at home at all times, wherever possible,” “Not allowed to meet others from outside your household (or support bubble),” “All retail and hospitality venues must close,” and “Personal care services have to close.” Schools were closed to most pupils, except for the children of critical workers and the most vulnerable children. Nurseries were kept open.

Since 8 March, schools in the UK have been completely reopened. Nonessential retail and personal care services have been reopened since 12 April. People have been allowed to meet outdoors, as a number of restrictive measures have been lifted since 17 May. A complete easing will occur on 19 July 2021. The Prime Minister has pledged that all adults in the UK will be offered their first dose of a COVID-19 vaccine by the end of July.

By 16 April 2021, the recorded number of deaths related to COVID-19 in the UK exceeded 127,000. Fig 1 shows the development of the COVID-19 pandemic and related deaths in the UK during the research period, based on data provided by the UK government. A more detailed timeline of the UK lockdowns is available at https://www.instituteforgovernment.org.uk/sites/default/files/timeline-lockdown-web.pdf.

Note: Data source: https://coronavirus.data.gov.uk/details. The crude death rate is the number of new deaths within 28 days of a positive test per 100,000 population.

https://doi.org/10.1371/journal.pone.0257286.g001
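The crude death rate defined in the Fig 1 note is a simple ratio scaled to 100,000 people. A minimal Python sketch, assuming a count of deaths and a single population denominator; the function name and the example numbers are hypothetical and are not the official UK figures:

```python
def crude_death_rate(new_deaths: int, population: int, per: int = 100_000) -> float:
    """Deaths within 28 days of a positive test per `per` people (hypothetical helper)."""
    if population <= 0:
        raise ValueError("population must be positive")
    if new_deaths < 0:
        raise ValueError("death counts cannot be negative")
    # Multiply before dividing so round inputs give an exact result.
    return new_deaths * per / population


# Illustrative inputs only: 670 deaths in a population of 67 million.
print(crude_death_rate(670, 67_000_000))  # 1.0
```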

Data and methods

Data and sample.

We use data from the first eight waves of the UKHLS COVID study and the two preceding waves (2017/18 and 2018/19) of the UKHLS main survey [36]. The UKHLS is a household panel survey whose first wave, in 2009, comprised a nationally representative sample of 51,000 adults (aged 16 and above) from approximately 40,000 households. Individuals were followed up annually and interviewed face-to-face. This research is based entirely on anonymized UKHLS data that are publicly available through the UK Data Service (study numbers 6614 and 8644).

For the COVID study, members of households that had participated in previous UKHLS surveys were invited to complete a monthly online questionnaire beginning in April 2020; a complementary telephone survey started in May 2020. Participation was voluntary. Approximately 16,000 respondents (aged 16 and above) completed the first wave of the COVID survey, a response rate of 42%. Data are currently available for the first eight waves, conducted in the last week of April, May, June, July, September, and November 2020 and of January and March 2021.

Our analytic sample contains individuals who participated in the UKHLS main survey and in at least one of the eight waves of the COVID study. Because all respondents needed internet or telephone access to take part, the sample may be subject to selection bias. A supplementary analysis shows that the COVID study sample is socioeconomically advantaged in terms of employment, occupation, education, and homeownership compared with the full UKHLS sample. If socioeconomic status protects against the negative consequences of COVID-19 and the related lockdown measures, the reported results may underestimate the negative impacts of the pandemic and the lockdowns. Nonetheless, one paper examines this issue of nonrandom sample selection and demonstrates that the resulting bias is very limited once weights are applied [37]. In the following analysis, we apply the individual weights supplied in the data, which are adjusted for “unequal selection probabilities and differential nonresponse” [38]. According to the User Guide, these weights “scale respondents to the eligible population in the UKHLS wave 9 sample, adjusted for death, incapacity and emigration occurring between wave 9 and the start of the COVID-19 web survey” [38]. This approach has been used in previous work analyzing the same data [2, 3, 20].

Our sample includes respondents of prime working age (20 to 65) in 2020. Two percent of the UKHLS COVID sample have missing values on the predictors used in the regressions. The numbers of observations with no missing predictors are 10,484, 9,008, 8,478, 8,210, 7,642, 7,083, 7,019, and 7,525 in waves 1 to 8 of the COVID study, respectively. The final sample for each regression depends on which outcome values are nonmissing (some outcomes were not asked in certain waves) and on the subgroup selected (for example, people who had a job before the pandemic). See S1 Table for details of the sample selection process. Because we focus on within-individual changes in the outcomes, respondents must be observed in more than one wave. Previous analyses of the same data that selected individuals interviewed in more than one wave found no evidence that this selection biases the results [39].

The five outcomes of interest (dependent variables) are monthly labor income, weekly paid work hours, subjective wellbeing, weekly housework hours, and weekly childcare hours.

Monthly labor income.

Respondents’ labor income in January or February 2020 (before the lockdown) was collected retrospectively in the COVID survey, and respondents also reported their current labor income in each subsequent wave. We use the natural log of labor income. The analysis of this outcome is restricted to respondents who had a job in January or February 2020.
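Because earnings enter the models as natural logs, a coefficient b on a wave indicator is a change in log points; the implied percentage change is 100 × (exp(b) - 1), which is close to 100 × b only when b is small. A minimal sketch, assuming strictly positive incomes (how zero incomes are handled is not described here); the helper names and figures are hypothetical:

```python
import math


def log_income(income: float) -> float:
    """Natural log of monthly labor income; undefined for zero or negative values."""
    if income <= 0:
        raise ValueError("log income requires a strictly positive income")
    return math.log(income)


def pct_change_from_log_coef(b: float) -> float:
    """Exact percentage change implied by a coefficient b on a log outcome."""
    return 100.0 * (math.exp(b) - 1.0)


# Hypothetical respondent: 1,000 per month before the pandemic, 900 afterwards.
b = log_income(900) - log_income(1000)
print(round(b, 4))                            # -0.1054
print(round(pct_change_from_log_coef(b), 2))  # -10.0
```

The gap between -10.5 log points and a -10% change is small here but grows with larger coefficients.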

Weekly paid work hours.

Respondents reported their paid work hours in the previous week and, retrospectively, their usual weekly working hours in January or February 2020. For the pandemic period, the question was “How many hours did you work, as an employee or self-employed, last week?” For the prepandemic period, the question was “During January and February 2020, how many hours did you usually work per week?” The analysis of this outcome is restricted to respondents who had a job in January or February 2020.

Subjective wellbeing.

Subjective wellbeing is measured by respondents’ mental wellbeing as reported in the General Health Questionnaire (GHQ). The value is the sum of 12 items (GHQ-12), each scored on a Likert scale from 0 to 3: “ability to concentrate,” “losing sleep,” “playing a useful role in life,” “capability of making decisions,” “feeling under stress,” “overcoming difficulties,” “ability to enjoy activities,” “ability to face problems,” “feeling unhappy or depressed,” “losing confidence,” “believing in self-worth,” and “feeling generally happy.” The overall scale ranges from 0 (least distressed) to 36 (most distressed), so a higher score indicates lower wellbeing. This is a validated and widely used measure of nonspecific mental distress in surveys [40]. The same information was collected in earlier waves of the UKHLS main survey and in each wave of the COVID study. The full sample is used for this outcome.
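The GHQ-12 scoring described above (twelve items, each 0 to 3, summed to a 0 to 36 distress score) can be sketched as follows. This assumes each item is already coded so that 0 is the least and 3 the most distressed response, and it rejects incomplete responses rather than imputing them, since the handling of missing items is not described here; the function name is hypothetical.

```python
from typing import Sequence

N_ITEMS = 12                 # GHQ-12 has twelve items
VALID_SCORES = range(0, 4)   # each item is scored 0, 1, 2 or 3


def ghq12_likert_score(items: Sequence[int]) -> int:
    """Sum the 12 item scores into a 0 (least) to 36 (most distressed) total."""
    if len(items) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item scores, got {len(items)}")
    for score in items:
        if score not in VALID_SCORES:
            raise ValueError(f"item score {score!r} is outside 0-3")
    return sum(items)


# Hypothetical item responses for one person.
responses = [1, 2, 1, 1, 3, 1, 1, 1, 2, 0, 1, 1]
print(ghq12_likert_score(responses))  # 15
```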

Weekly housework hours.

Respondents’ weekly housework hours were collected with the question “Thinking about last week, how much time did you spend on housework, such as time spent cooking, cleaning and doing the laundry?” Prepandemic housework hours were taken from the earlier UKHLS waves (the latest collected in 2018/19). The full sample is used for this outcome.

Weekly childcare hours.

Respondents’ childcare hours were collected with the question “About how many hours did you spend on childcare or home-schooling last week?” This information is only available in the COVID survey. Only respondents with a child younger than 16 in the household (referred to as parents in later analyses) were asked this question, and the analysis of this outcome is restricted to them.

Independent variables.

To examine the dynamics of the outcomes, we include wave dummies, which indicate the time point at which the information was collected.

The key socioeconomic independent variables are constant within individuals across waves. They are gender (52.7% female), whether an individual belongs to a Black, Asian or another minority ethnic (BAME) group (10.1%) or not (reference group: white), and educational level (university degree holders: 32.2%). Ethnic minority groups are underrepresented in our sample: the 2011 census reported that 85.6% of working-age people were from white ethnic groups, implying a minority share of about 14%. Such underrepresentation is common in panel survey samples because the fixed-effect models require people with repeated observations, and people from disadvantaged backgrounds are known to be more likely to drop out of repeated surveys [41]. The regression analysis addresses this selection issue using weights, as discussed above. Moreover, attrition in panel surveys has not been found to have a significant impact on estimates predicting income [42], time use [43], or attitudes [44].

In each wave, respondents were asked whether they had been tested for COVID-19 and, if so, the result. We included this variable in the model to control for the impact of contracting COVID-19, so that the period indicators better capture the spread of COVID-19 and COVID-19-related policy changes at the macro level. This variable has four categories: “having no test” (reference, 89.7%), “tested positive” (0.8%), “tested negative” (9.0%), and “result pending” (0.5%).

All models controlled for respondents’ partnership status (whether they live with a partner) and parenthood status (the presence of a child younger than age 16 in the household) to account for potential changes in family status that are correlated with the outcomes [45, 46].

Analytical strategies

We applied linear fixed-effect regressions to predict the five outcomes. By interacting the wave (month) indicators with gender, BAME status, and educational level, we examined how changes in income, time use, and wellbeing differed across these three sociodemographic dimensions at different stages of the pandemic. The reference time point is January and February 2020 for earnings and weekly paid work hours, the 2018/19 survey year for subjective wellbeing (distress level) and weekly housework hours, and April 2020, during the first national lockdown, for weekly childcare hours. Because the models compare information reported by the same individuals at each time point, the estimates capture within-person changes, which allowed us to trace the trajectories of the outcomes over the past year for the same individuals.
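The interaction design can be illustrated with a small sketch: each wave except the reference period gets a dummy, and each dummy is also multiplied by a 0/1 group indicator (for example, female). Under this coding, the change for the reference group at wave w is the wave coefficient, and the change for the other group is the wave coefficient plus the interaction coefficient. The wave labels, column names, and coefficient values below are hypothetical, not estimates from this study.

```python
from typing import Dict, List


def wave_interaction_row(wave: str, group: int, waves: List[str], reference: str) -> Dict[str, int]:
    """Dummy-code `wave` (omitting `reference`) and interact each dummy with a 0/1 group flag."""
    if group not in (0, 1):
        raise ValueError("group must be coded 0 or 1")
    if wave not in waves:
        raise ValueError(f"unknown wave {wave!r}")
    row: Dict[str, int] = {}
    for w in waves:
        if w == reference:
            continue  # the reference period is the omitted category
        dummy = 1 if wave == w else 0
        row[f"wave_{w}"] = dummy
        row[f"wave_{w}_x_group"] = dummy * group
    return row


def group_change(wave_coef: Dict[str, float], inter_coef: Dict[str, float], wave: str, group: int) -> float:
    """Change relative to the reference period implied for a group at `wave`."""
    return wave_coef[wave] + group * inter_coef[wave]


waves = ["2020-01/02", "2020-04", "2020-09"]
print(wave_interaction_row("2020-04", 1, waves, reference="2020-01/02"))
# {'wave_2020-04': 1, 'wave_2020-04_x_group': 1, 'wave_2020-09': 0, 'wave_2020-09_x_group': 0}

# Hypothetical coefficients: the reference group loses 12 hours in April; group 1 loses 3 hours less.
beta = {"2020-04": -12.0, "2020-09": -4.0}
gamma = {"2020-04": 3.0, "2020-09": 0.5}
print(group_change(beta, gamma, "2020-04", 1))  # -9.0
```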

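The fixed-effect regression method fully accounts for time-constant individual characteristics, observed or unobserved, that are correlated with both the independent variables and the outcomes. This is achieved by subtracting person-specific means from the dependent and independent variables (demeaning) [47]. Because gender, ethnicity, and education do not vary within individuals, their main effects are absorbed by the individual fixed effects; only their interactions with the wave indicators are estimated.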
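The demeaning step can be sketched in standard-library Python for a single regressor: subtract each person's mean from both the outcome and the regressor, then fit a no-intercept least-squares slope on the demeaned data. A time-constant variable such as gender becomes identically zero after demeaning, so its main effect cannot be estimated. This illustrates the within estimator under simplified assumptions (one regressor, no weights) and is not the Stata code used in the paper; all names and data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def demean_by_person(ids: Sequence[Hashable], values: Sequence[float]) -> List[float]:
    """Subtract each person's own mean from their observations (within transformation)."""
    if len(ids) != len(values):
        raise ValueError("ids and values must have the same length")
    sums: Dict[Hashable, float] = defaultdict(float)
    counts: Dict[Hashable, int] = defaultdict(int)
    for pid, v in zip(ids, values):
        sums[pid] += v
        counts[pid] += 1
    return [v - sums[pid] / counts[pid] for pid, v in zip(ids, values)]


def within_slope(ids: Sequence[Hashable], x: Sequence[float], y: Sequence[float]) -> float:
    """Fixed-effect (within) OLS slope of y on a single regressor x."""
    xd = demean_by_person(ids, x)
    yd = demean_by_person(ids, y)
    sxx = sum(a * a for a in xd)
    if sxx == 0:
        raise ValueError("x has no within-person variation; its effect is absorbed")
    return sum(a * b for a, b in zip(xd, yd)) / sxx


# Two hypothetical people observed before (x=0) and during (x=1) a lockdown.
ids = [1, 1, 2, 2]
lockdown = [0, 1, 0, 1]
hours = [40.0, 30.0, 20.0, 10.0]  # very different levels, same within-person change
print(within_slope(ids, lockdown, hours))  # -10.0

female = [1, 1, 0, 0]  # time-constant, so it demeans to all zeros
print(demean_by_person(ids, female))  # [0.0, 0.0, 0.0, 0.0]
```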

The samples in the UKHLS main survey and the COVID survey are clustered and stratified probability samples of postal addresses. We therefore use clustered standard errors to account for this sampling design [48].
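To illustrate why clustering matters, the sketch below computes a cluster-robust (sandwich) standard error for a single-regressor slope on already-demeaned data: the residual scores x·e are summed within each cluster before squaring, so errors that move together within a cluster widen the standard error. It deliberately omits the finite-sample adjustments that statistical packages typically apply, and the cluster labels (which could stand for sampling units or individuals) and data are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, Sequence, Tuple


def slope_with_cluster_se(x: Sequence[float], y: Sequence[float],
                          clusters: Sequence[Hashable]) -> Tuple[float, float]:
    """No-intercept OLS slope and its cluster-robust (CR0) standard error.

    Assumes x and y are already demeaned, as in the within estimator.
    No finite-sample correction is applied.
    """
    if not (len(x) == len(y) == len(clusters)):
        raise ValueError("x, y and clusters must have the same length")
    sxx = sum(a * a for a in x)
    if sxx == 0:
        raise ValueError("x has no variation")
    b = sum(a * c for a, c in zip(x, y)) / sxx
    # Sum the score x_i * e_i within each cluster before squaring.
    scores: Dict[Hashable, float] = defaultdict(float)
    for xi, yi, g in zip(x, y, clusters):
        scores[g] += xi * (yi - b * xi)
    meat = sum(s * s for s in scores.values())
    return b, math.sqrt(meat) / sxx


x = [1.0, -1.0, 1.0, -1.0]
y = [2.0, -1.0, 1.0, -2.0]
b, se_cluster = slope_with_cluster_se(x, y, ["A", "B", "B", "A"])
_, se_robust = slope_with_cluster_se(x, y, [0, 1, 2, 3])  # one observation per cluster
print(b, round(se_cluster, 4), round(se_robust, 4))  # 1.5 0.3536 0.25
```

With one observation per cluster the formula reduces to the heteroskedasticity-robust standard error, which is smaller here because the within-cluster scores share a sign.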

All analyses were conducted in Stata/SE 16.1. Replication code is available at https://github.com/jomuzhi/ukcovidunderstandingsociety.

Descriptive results

Table 1 reports the weighted mean values of the key outcomes. Note that the information was collected at the end of each survey month.

https://doi.org/10.1371/journal.pone.0257286.t001
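The weighted means in Table 1 apply the individual survey weights described in the data section. A minimal sketch of a weighted mean, assuming observations with missing outcome values have already been dropped and weights are nonnegative; the function name and numbers are hypothetical:

```python
from typing import Sequence


def weighted_mean(values: Sequence[float], weights: Sequence[float]) -> float:
    """Sum of w_i * v_i divided by the sum of w_i."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("survey weights must be nonnegative")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical weekly paid work hours for three respondents and their weights.
hours = [30.0, 40.0, 20.0]
wts = [1.0, 3.0, 0.0]  # a zero weight removes the third respondent
print(weighted_mean(hours, wts))  # 37.5
```

Unweighted, the same three values average 30.0 hours, which shows how much weights can shift a descriptive mean.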

First, among those who worked before the pandemic (in January and February 2020), average earnings fell clearly when the pandemic started in the UK. Their income recovered by almost 10 percent in May relative to the April level, a recovery probably driven mainly by the implementation of the job retention scheme. Some workers who could not work from home, such as those working on construction sites, also returned to the workplace in May. Since then, average monthly net earnings have remained at approximately £1,550. Notably, since the first lockdown, take-home earnings have never returned to their prepandemic level, but they have never fallen below 90% of it.

Before the pandemic, those who worked in January and February 2020 worked 34.7 hours per week on average. A record low of 21.9 hours per week was observed in April 2020. Paid work time remained persistently lower over the past year, although working hours recovered gradually from May and peaked at approximately 30 hours per week in September 2020. The two later national lockdowns (November 2020 and January 2021) did not reduce working hours as much as the first national lockdown did; during these lockdowns, weekly paid work hours remained at approximately 28 hours.

People felt more distressed from March 2020 onward. The worst level, 13.4, was recorded during the two later lockdowns (November 2020 and January 2021), when new cases and deaths grew sharply at the start of each lockdown.

People’s housework hours increased, reaching the highest level of 12.3 hours per week in April and May 2020. Housework time then declined gradually and stabilized at 10.5 hours per week. Compared with September 2020, when most lockdown restrictions were eased, housework time in January 2021 did not change significantly, even though a stricter lockdown was in place. This finding is consistent with the small reduction in paid work hours from September 2020 to January 2021.

Parents’ average childcare time reached 16.7 hours per week in April, but it gradually declined to approximately 13 hours per week before the third national lockdown. In January 2021, childcare time was only 0.5 hours per week higher than in September, even though schools were closed to most pupils during the third lockdown. Overall, people’s time use had become less responsive to repeated lockdowns.

Changes in earnings, paid work time, subjective wellbeing, housework and childcare time

Fig 2 reports within-individual changes in earnings, paid work hours, distress level, and housework hours across waves. The red lines mark the start of each national lockdown. Note that the information was collected at the end of each survey month. Detailed coefficients are reported in S2 Table.

https://doi.org/10.1371/journal.pone.0257286.g002

Respondents’ earnings remained below the prepandemic level over the entire year, with the largest decline (~9%) recorded in late April, the first month after the first national lockdown was announced. Earnings recovered slightly after the gradual relaxation of restrictive measures and the implementation of the job retention scheme. After the third lockdown began, when measures almost as strict as those of the first lockdown were imposed, we found a similar decline in earnings (~8%) relative to the prepandemic period. One year after the onset of the pandemic in the UK, earnings in our sample were still 7.4% below the prepandemic level.

Paid work hours remained much lower than the prepandemic level over the entire year. The largest drop, of nearly 13 hours, was observed in the first month after the March 2020 lockdown. Paid work hours then recovered and never fell back to that low point. People worked the longest hours in September 2020, when restrictive measures were minimal. Interestingly, despite the second and the stricter third national lockdowns, paid work hours fell only slightly from the September figure and were even higher than in July 2020, when shops had already been allowed to reopen. This observation suggests increasing adaptation to working from home. After the first lockdown, more firms announced long-term strategies allowing employees to work from home [49]. Accordingly, people’s paid work time could increase even if they continued to work from home.

The pandemic has damaged people’s subjective wellbeing. The distress level (where a higher score indicates more distress) remained above the prepandemic level throughout the past year. Distress was high in the three months after the first lockdown, and subjective wellbeing then improved from July until the second lockdown was imposed. The November lockdown brought a further decline in subjective wellbeing, consistent with the findings of an earlier study [2]. Distress levels in November 2020 and January 2021 were even higher than in the first lockdown period, suggesting that people became less optimistic and suffered more as the pandemic dragged on. Distress eased slightly in March 2021, but only to its April 2020 level; one year after the onset of the pandemic in the UK, respondents’ subjective wellbeing was no better than one month after the first national lockdown was announced.

The increase in housework hours was largest during the first lockdown. Compared with housework hours during the easing period in September 2020, the January 2021 lockdown was not associated with an increase in housework time. This pattern echoes the relatively high level of paid work time in the two later lockdown periods.

Next, we examine childcare time since the first national lockdown. Fig 3 shows that childcare hours declined from April 2020 (during the first lockdown). The lowest level was observed in September 2020, when schools fully reopened. Interestingly, childcare hours in January 2021 were similar to those in September 2020, despite the closure of schools to most children in January 2021.

https://doi.org/10.1371/journal.pone.0257286.g003

Differential impacts on women and men

Figs 4 and 5 report whether changes in the five indicators differ between women and men. For monthly net earnings and weekly paid work hours, we also analyzed a sample that includes only non-key workers. This allows us to examine whether the disproportionate share of women working in certain key sectors, such as health and social care, drives the results.

https://doi.org/10.1371/journal.pone.0257286.g004

https://doi.org/10.1371/journal.pone.0257286.g005

First, the reduction in earnings during the first lockdown in April 2020 was smaller for female workers (those who worked in Jan/Feb 2020) than for male workers (p = 0.011). Since then, there has been no significant difference between women and men in earnings changes, reflecting the faster recovery of men’s earnings. No gender difference was found among non-key workers. Therefore, the higher proportion of women working in key sectors, which operated much more actively than other sectors during the first lockdown, is likely the main reason for the gender difference in the earnings decline during the first lockdown.

During the first lockdown, the decline in paid work hours was smaller for female workers than for male workers, regardless of key worker status (p<0.001). The gender difference narrowed as the first lockdown ended and became statistically insignificant at the 0.05 level from July to September 2020, indicating a faster recovery of paid work time for men than for women. Among non-key workers, the gender difference in paid work hours observed in the first lockdown did not reappear in the later lockdowns.

As Fig 5 shows, the increase in distress was much larger for women than for men in the first month of the first lockdown (p<0.001). Women’s subjective wellbeing then recovered, while men’s distress levels began to rise. These findings suggest that, in terms of subjective wellbeing, men’s response to the pandemic lagged behind that of women in the first lockdown. The distress levels of both women and men fell to their lowest point from July to September 2020, when daily life had largely returned to normal. Once COVID-19 cases surged and lockdown restrictions were reimposed in November 2020 (p = 0.056) and January 2021 (p = 0.061), women again experienced a larger increase in distress than men, although these differences were only marginally significant. Women’s distress reached a similar peak in each of the three lockdowns. Men’s distress was higher in the later lockdowns, when COVID-19 cases and deaths were worse, than in the first lockdown.

We do not observe a gender-specific impact on housework time. The gender gap in housework time was maintained over the past year.

Differential impacts on BAME people and white people

Figs 6 and 7 report whether changes in the five indicators differ between BAME people and whites. For monthly net earnings and weekly paid work hours, we analyzed an additional sample that includes only non-key workers.

https://doi.org/10.1371/journal.pone.0257286.g006

https://doi.org/10.1371/journal.pone.0257286.g007

Compared with white people, BAME people experienced a particularly large negative impact of the pandemic on their earnings. The differential impacts on earnings persisted in almost every month over the past year, except during the third lockdown. The gap was large even when most lockdown restrictions were eased in September 2020 (p = 0.003), and it was even larger among non-key workers. Over the past year, the decline in paid work time was similar for the BAME group and whites in both the full and the non-key worker samples. The exception was March 2021, when paid work time decreased less for the BAME group than for whites (p = 0.006).

Regarding the distress level (Fig 7), the increase for the BAME group was larger than that for whites during the first lockdown, but the difference was not statistically significant at the 0.05 level because of the large standard errors of the estimates for the BAME group. From September 2020, the changes in distress levels were similar for the BAME group and whites. The increase in housework hours appears larger for the BAME group, but the large standard errors prevent us from drawing a reliable conclusion.

Differential impacts on degree and non-degree holders

Figs 8 and 9 report whether changes in the five indicators differ between degree and non-degree holders. For monthly net earnings and weekly paid work hours, we analyzed an additional sample that includes only non-key workers.

https://doi.org/10.1371/journal.pone.0257286.g008

https://doi.org/10.1371/journal.pone.0257286.g009

As expected, the decline in earnings and paid work hours was particularly acute among non-degree holders, and these differential impacts were even larger among non-key workers. When the spread of the virus slowed and most restrictive measures were eased from July to September 2020, the difference in earnings impacts between non-degree and degree holders narrowed but persisted. For paid work hours, the difference was insignificant between July and September 2020 in both the full and the non-key worker samples. Once restrictive measures were reimposed, the difference became substantial again (p<0.001).

As Fig 9 shows, there was no significant difference in the change in subjective wellbeing between degree and non-degree holders before January 2021. During the third national lockdown, which started in January 2021, degree holders experienced a larger increase in distress (p = 0.028), but this differential effect disappeared in March 2021.

We do not observe a statistically significant difference in the changes in housework time between the two groups.

Changes in weekly childcare hours since April 2020

Fig 10 reports whether changes in the weekly childcare hours differ across these groups.

https://doi.org/10.1371/journal.pone.0257286.g010

Our findings show that women and men, BAME people and whites, and degree and non-degree holders did not differ significantly in changes in their childcare time since April 2020. However, the reduction in childcare time in September, which is likely associated with pupils returning to school after the summer holidays, tended to be larger for mothers and for the more educated group. This suggests that women and the more educated may have spent more time caring for children at home before schools reopened.

For more details of the results, please refer to S2–S5 Tables. The within-individual R-squared values are small when predicting earnings, subjective wellbeing, housework time, and childcare time. Small within-individual R-squared values are not uncommon in fixed-effect regressions, especially when predicting housework time and subjective wellbeing [50, 51]. They reflect the fact that relatively few individuals changed their partnership status, parenthood status, or COVID-19 test results, whereas their earnings, time use, and subjective wellbeing changed considerably over the past year. Including more time-varying variables, such as whether respondents were furloughed, participated in the job retention scheme, or went back to work or school, might improve the explanatory power. However, the purpose of this paper is to estimate the overall net impact of COVID-19 and its related measures on individuals rather than to focus on a specific policy or on the spread of COVID-19. Given our focus on the trajectories of earnings, time use, and subjective wellbeing at different stages of the pandemic, we do not include these time-varying variables.

Discussion and conclusion

In this article, we have used the latest UK COVID panel data to provide a comprehensive analysis of the dynamics of earnings, time use, and subjective wellbeing at different stages of the pandemic over the past year. With its much longer time frame, our research extends past UK studies that followed only a short period after the first lockdown imposed in March 2020 [for example, 3, 7, 9, 20]. Our analysis covers multiple outcome domains across several social groups. We aim to examine how the spread of COVID-19 and COVID-induced policies have had unequal and dynamic impacts on different social groups in the UK. Our findings offer important insights into whether inequalities in changes in income, time use, and wellbeing are likely to be long lasting or temporary.

Overall, the initial outbreak of COVID-19 and the first national lockdown brought the largest changes in earnings and time use. The two later lockdowns, together with repeated new highs in COVID-19 cases and deaths, had the largest impact on people’s subjective wellbeing. Although strict measures aimed at reducing physical contact were imposed in the two later lockdowns, people’s time use did not respond as strongly as it did during the first lockdown. None of the five indicators had returned to its prepandemic level by late March 2021. It remains uncertain whether and when earnings, working patterns, family life, and subjective wellbeing will return to their prepandemic levels.

Female workers experienced a smaller reduction in earnings than male workers during the first lockdown, largely because of the relatively high proportion of women working in key sectors, especially the health and social care industry. Women have made an important contribution to the fight against COVID-19 by working in key sectors. Even among non-key workers, however, the decline in paid work hours was smaller for women, but only during the first lockdown. These findings are consistent with earlier research reporting that men in the UK were more likely than women to be laid off or furloughed during the first lockdown [20]. Once lockdown measures were gradually lifted from June 2020, men’s paid work time recovered faster than women’s. This finding is similar to previous work on the gendered impact of natural disasters on market labor [52]. In summary, our analysis shows that in the UK, men’s paid work time was more responsive to the restrictive measures of the first lockdown, whereas women’s and men’s paid work time responded similarly in the two later lockdowns.

The subjective wellbeing of women was more sensitive than that of men to the outbreak of COVID-19 and the related lockdown measures. For example, women’s distress level rose substantially in April but then gradually declined until the next lockdown. Men’s responses lagged behind those of women. Past COVID-19 research has highlighted gender differences in social networks, with women tending to have more friends [29]. Greater exposure to COVID-19-related news among those with more close friends might explain the diverging trajectories of women’s and men’s subjective wellbeing [53, 54]. These differential impacts became smaller in the two later lockdowns, once the pandemic had been under way for some time. At the beginning of the pandemic, women and men seem to have perceived the danger of this infectious disease differently.

The gender gap in housework time was maintained over the past year. Overall, the gender-specific changes in earnings, paid work time, and subjective wellbeing were mainly observed when strict restrictions were in place, and the gender gap returned to its prepandemic level once those measures were lifted.

People of a BAME background experienced a larger loss in earnings than whites. This finding is consistent with an earlier finding on BAME immigrants in the UK [3]. We have further shown that the enlarged earnings gap between BAME and white people persisted over almost the entire year.

Persistently enlarged earnings gaps were observed between non-degree and degree holders, and the gap was even larger among non-key workers. Non-degree holders suffered a larger reduction in earnings in every month over the past year, and this gap was particularly large during the national lockdown periods. A similar pattern was found for weekly paid work hours: the spread of COVID-19 and lockdown restrictions were associated with an enlarged gap in paid work time between non-degree and degree holders. This effect on paid work time is likely to be temporary because differential impacts were not observed from July to September 2020, when lockdown measures were mostly lifted.

One limitation of this study is that some changes could reflect seasonal fluctuations unrelated to COVID-19 and its related restrictions. For example, people’s paid work time in winter may differ from that in summer, and general psychological health is usually worse in winter than in summer [55]. The ideal solution would be to compare information collected in the same month before the pandemic and in 2020, but this is not possible with the current data. If the survey retains its monthly or bimonthly data collection frequency, future work can compare the same month in 2020 and subsequent years to examine pandemic and postpandemic differences. We also included measures of the spread of COVID-19 (daily new cases or daily new death rates, as shown in Fig 1) to examine whether the outcomes are affected by the macrolevel development of the COVID-19 pandemic in the UK, but we find no strong evidence that those measures are associated with the outcomes. Our results reveal the trajectories of earnings, time use, and subjective wellbeing at different time points over the past year but cannot identify the exact impact of a specific lockdown policy. Other, non-COVID-19-related policy changes that occurred in parallel over the past year may also have affected the same outcomes. Nonetheless, the trends in income, time use, and subjective wellbeing corresponded closely to the different waves of the pandemic and the lockdown timeline. Therefore, the main sources of those changes are likely related to the spread of COVID-19 and its related lockdown measures.

In conclusion, our findings suggest that the long-lasting pandemic and the restrictions imposed to contain the virus over the past year have produced persistent negative consequences for earnings, work patterns, and subjective wellbeing. The spread of COVID-19 and the national lockdowns differed in their patterns and measures at different stages, and so did their impacts on labor earnings, time use, and subjective wellbeing. Time use patterns became less sensitive to the later lockdowns, but distress levels reached new highs with repeated lockdowns in multiple waves of the pandemic. Gender differences in the impacts of the lockdown measures became insignificant once the measures were lifted. However, the earnings gaps between BAME and white people and between non-degree and degree holders remained persistently enlarged. The negative impacts of the spread of COVID-19 and its related measures vary among social groups not only in their extent but also in their speed. Further research is needed to understand the factors driving these social inequalities and to monitor whether inequalities based on gender, educational level, and ethnic minority status persist or even worsen in the long term.

Supporting information

S1 Table. Samples and sample selection.

https://doi.org/10.1371/journal.pone.0257286.s001

S2 Table. Baseline model: Changes in the five indicators across waves.

https://doi.org/10.1371/journal.pone.0257286.s002

S3 Table. Gender and period interaction model results.

https://doi.org/10.1371/journal.pone.0257286.s003

S4 Table. Ethnicity and period interaction models.

https://doi.org/10.1371/journal.pone.0257286.s004

S5 Table. Education and period interaction model results.

https://doi.org/10.1371/journal.pone.0257286.s005

  • 5. Barr D. Same storm, different boats. 2020. Available: https://www.damianbarr.com/latest/https/we-are-not-all-in-the-same-boat
  • 11. Coronavirus Resource Center, Johns Hopkins. Mortality analysis. 2021. Available: https://coronavirus.jhu.edu/data/mortality
  • 19. Office for National Statistics. Full report—women in the labour market: 2013. Office for National Statistics. 2013. Available: http://www.ons.gov.uk/employmentandlabourmarket/peopleinwork/employmentandemployeetypes/articles/womeninthelabourmarket/2013-09-25
  • 29. Etheridge B, Spantig L. The gender gap in mental well-being during the covid-19 outbreak: Evidence from the UK. Colchester: University of Essex, Institute for Social and Economic Research (ISER); 2020. Report No.: 2020–08. Available: http://hdl.handle.net/10419/227789
  • 30. Giovanis E. Income losses and subjective well-being: Gender and ethnic inequalities during the Covid-19 lockdown period in the UK. Research Square; 2021. https://doi.org/10.21203/rs.3.rs-132600/v2
  • 34. Blanchflower DG, Bryson A. Biden, COVID and mental health in America. NBER Working Paper. 2021. https://doi.org/10.3386/w29040
  • 35. UK Government. Check if your employer can use the coronavirus job retention scheme. GOV.UK. 2020. Available: https://www.gov.uk/guidance/check-if-you-could-be-covered-by-the-coronavirus-job-retention-scheme
  • 36. Institute for Social and Economic Research, University of Essex. 2020. https://doi.org/10.5255/UKDA-SN-8644-3
  • 38. Institute for Social and Economic Research. Understanding Society COVID-19 user guide. Version 8.0. Colchester: University of Essex. 2021.
  • 39. Uhrig SCN. The nature and causes of attrition in the British Household Panel Study. Colchester: University of Essex, Institute for Social and Economic Research (ISER); 2008. Report No.: 2008–05. Available: http://hdl.handle.net/10419/92025
  • 47. Allison PD. Fixed effects regression models. SAGE Publications; 2009. Available: https://books.google.co.uk/books?id=3UxaBQAAQBAJ
  • 48. Abadie A, Athey S, Imbens GW, Wooldridge J. When should you adjust standard errors for clustering? National Bureau of Economic Research, Inc; 2017 Nov. Report No.: 24003. Available: https://ideas.repec.org/p/nbr/nberwo/24003.html
  • 50. Borra C, Browning M, Sevilla Sanz A. Marriage and housework. IZA Discussion Paper. 2017. Available: https://ssrn.com/abstract=2960549
  • 52. Bradshaw S. Socio-economic impacts of natural disasters: A gender analysis. United Nations Economic Commission for Latin America and the Caribbean. 2004. Available: https://www.cepal.org/en/publications/5596-socio-economic-impacts-natural-disasters-gender-analysis


Powering Britain's Battery Revolution


Scientific Publications


Research from the Faraday Institution’s programme is internationally recognised as a mark of excellence. Scientific discoveries have led to highly cited publications, a suite of patents, and commercial spin outs. Since its inception, the Faraday Institution has contributed over 777 publications to the scientific literature, more than 85 of which represent collaborative work across Faraday Institution research projects.

The following statistical data derives from the SciVal record from April 2018 to October 2023, which recognises 733 papers and 2,228 authors. 90.2% of publications are in open access journals, with 17.3% categorised as gold open access. 45 papers were published in collaboration with an industry partner. Almost half (44.1%) of the published research coming out of the Faraday Institution has international collaborators, spanning over 391 institutions, 39 countries and 6 continents. Key countries that collaborate most frequently with the Faraday Institution include the USA, China, Germany, France, Sweden, South Korea and Spain in that order.

Faraday Institution publication map (papers by continent): North America 104, South America 7, Africa 3, Europe 800+, Asia 151, Australia 10.

Number of Faraday Institution papers involving an international collaboration, per country, October 2023, source: Scopus via SciVal

Overall, Faraday Institution publications are of measurably high quality: 91.5% appear in top-quartile journals, with 64.6% in the top 10% of journals. Notably, 47.5% fall into the top 10% most cited publications worldwide, which helps raise the UK average Field-Weighted Citation Impact (FWCI) in the research domains in which the Faraday Institution operates (chemistry, materials science, energy, engineering, chemical engineering, physics & astronomy, environmental science). Faraday Institution publications have 19,601 citations overall, an average of 26.7 citations per publication. UK research in 2022 carries an FWCI of 1.54; Faraday Institution research is on target to exceed this, with an FWCI of 2.4 (as of October 2023).
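The headline citation figures above can be checked with simple arithmetic. The sketch below is illustrative only: it reproduces the citations-per-publication figure (19,601 / 733 ≈ 26.7) and computes an FWCI under two stated assumptions, namely that a paper's FWCI is its actual citations divided by the citations expected for publications of the same field, type and year, and that the FWCI of a set of papers is the mean of those ratios. The expected-citation values in the example are made up, and SciVal's exact procedure may differ.

```python
def citations_per_publication(total_citations: int, n_publications: int) -> float:
    """Average number of citations per paper."""
    return total_citations / n_publications


def fwci(actual: list[float], expected: list[float]) -> float:
    """Assumed definition: mean over papers of actual / expected citations,
    where 'expected' is the world average for the same field, type and year."""
    if len(actual) != len(expected) or not actual:
        raise ValueError("need one expected value per paper")
    ratios = [a / e for a, e in zip(actual, expected)]
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    print(round(citations_per_publication(19_601, 733), 1))  # 26.7
    # Made-up expected values: every paper is cited twice as often as expected.
    print(fwci([20, 10, 4], [10, 5, 2]))  # 2.0
```

An FWCI of 1.0 corresponds to the world average, so values such as 1.54 and 2.4 read as "cited 54% and 140% more than expected".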

Technology Roadmaps

The following Faraday Institution technology roadmaps present an overview of the fundamental challenges impeding the commercial development of a range of energy storage technologies, the necessary advances to understand the underlying science, and the multidisciplinary approach being taken by our researchers in facing these challenges. It is our hope that these roadmaps will guide academia, industry, and funding agencies towards the further development of such batteries in the future.

2023 Neutron and Muon Characterisation Techniques for Battery Materials

2023 Neutron and muon characterisation techniques for battery materials, Gabriel Perex et al., Journal of Materials Chemistry A, April 2023, DOI: 10.1039/D2TA07235A

2023 Roadmap for a Sustainable Circular Economy in Lithium-Ion and Future Battery Technologies

2023 Roadmap for a sustainable circular economy in lithium-ion and future battery technologies, Gavin Harper et al., J. Phys. Energy, DOI: 10.1088/2515-7655/acaa57

2022 Roadmap on Li-Ion Battery Manufacturing Research

2022 Roadmap on Li-ion battery manufacturing research, Patrick S. Grant et al., J. Phys. Energy, DOI: 10.1088/2515-7655/ac8e30

2021 Roadmap on Lithium-Ion Battery Cathode Materials

2021 Roadmap on Lithium-Ion Battery Cathode Materials, Samuel G. Booth et al., APL Materials, October 2021, DOI: 10.1063/5.0051092

2021 Roadmap on Sodium-ion Batteries

2021 Roadmap on Sodium-ion Batteries, Nuria Tapia-Ruiz et al., J. Phys. Energy, July 2021, DOI: 10.1088/2515-7655/ac01ef

2020 Roadmap on Solid-state Batteries

2020 Roadmap on Solid-state Batteries, Mauro Pasta et al., J. Phys. Energy, August 2020, DOI: 10.1088/2515-7655/ab95f4

Publications

A full list of publications to October 2023 for each project can be found below.

Papers from the Lithium-ion Battery Projects

The Faraday Institution’s portfolio of research includes seven projects that aim to optimise the performance of lithium-ion technologies.

  • Battery Degradation
  • Multi-scale Modelling
  • Recycling and Reuse (ReLiB)
  • Electrode Manufacturing (Nextrode)
  • Battery Safety (SafeBatt)
  • Cathode Materials (FutureCat)
  • Cathode Materials (CATMAT)

Papers from the Beyond Lithium-ion Projects

The Faraday Institution’s portfolio of research includes three projects that explore new battery chemistries.

  • Solid-state Batteries (SOLBAT)
  • Sodium-ion Batteries (NEXGENNA)
  • Lithium-sulfur Batteries (LiSTAR)

Papers from the Characterisation Projects


Tel 01235 425300 Registered Charity, number 1176500 A company limited by guarantee, registered in England and Wales, number 10959095 Registered office and correspondence address: The Faraday Institution, Quad One, Becquerel Avenue, Harwell Campus, Didcot, OX11 0RA, UK


Published: 07 June 2023

CORE: A Global Aggregation Service for Open Access Papers

  • Petr Knoth (ORCID: 0000-0003-1161-7359)
  • Drahomira Herrmannova (ORCID: 0000-0002-2730-1546)
  • Matteo Cancellieri
  • Lucas Anastasiou
  • Nancy Pontika
  • Samuel Pearce
  • Bikash Gyawali
  • David Pride

Scientific Data, volume 10, Article number 366 (2023)


  • Research data

This paper introduces CORE, a widely used scholarly service, which provides access to the world’s largest collection of open access research publications, acquired from a global network of repositories and journals. CORE was created with the goal of enabling text and data mining of scientific literature and thus supporting scientific discovery, but it is now used in a wide range of use cases within higher education, industry, not-for-profit organisations, as well as by the general public. Through the provided services, CORE powers innovative use cases, such as plagiarism detection, in market-leading third-party organisations. CORE has played a pivotal role in the global move towards universal open access by making scientific knowledge more easily and freely discoverable. In this paper, we describe CORE’s continuously growing dataset and the motivation behind its creation, present the challenges associated with systematically gathering research papers from thousands of data providers worldwide at scale, and introduce the novel solutions that were developed to overcome these challenges. The paper then provides an in-depth discussion of the services and tools built on top of the aggregated data and finally examines several use cases that have leveraged the CORE dataset and services.


Introduction

Scientific literature contains some of the most important information we have assembled as a species, such as how to treat diseases, how to solve difficult engineering problems, and how to address many of the challenges the world faces today. The body of scientific literature is growing at an enormous rate, with more than 5 million new articles each year (almost 7.2 million papers were published in 2022 according to Crossref, the largest Digital Object Identifier (DOI) registration agency). Furthermore, the amount of research published each year has been estimated to grow by about 10% annually 1 . At the same time, an ever-growing amount of research literature, estimated at well over 1 million publications per year in 2015 2 , is being published as open access (OA) and can therefore be read and processed with limited or no copyright restrictions. As reading all of this knowledge is now beyond the capacity of any human being, text mining offers the potential not only to improve the way we access and analyse this knowledge 3 , but also to lead to new scientific insights 4 .

However, systematically gathering scientific literature so that automated methods can process it at scale is a significant problem. Scientific literature is spread across thousands of publishers, repositories, journals, and databases, which often lack common data exchange protocols and other support for interoperability. Even when protocols are in place, the lack of infrastructure for collecting and processing this data, restrictive copyright, and the fact that OA is not yet the default publishing route in most parts of the world further complicate the machine processing of scientific knowledge.

To alleviate these issues and support text and data mining of scientific literature we have developed CORE ( https://core.ac.uk/ ). CORE aggregates open access research papers from thousands of data providers from all over the world including institutional and subject repositories, open access and hybrid journals. CORE is the largest collection of OA literature–at the time of writing this article, it provides a single point of access to scientific literature collected from over ten thousand data providers worldwide and it is constantly growing. It provides a number of ways for accessing its data for both users and machines, including a free API and a complete dump of its data.

As of January 2023, there are 4,700 registered API users and 2,880 registered dataset users, and more than 70 institutions have registered to use CORE Recommender in their repository systems.

The main contributions of this work are the development of CORE’s continuously growing dataset and the tools and services built on top of this corpus. In this paper, we describe the motivation behind the dataset’s creation and the challenges and methods of assembling it and keeping it continuously up-to-date. Overcoming the challenges posed by creating a collection of research papers of this scale required devising innovative solutions to harvesting and resource management. Our key innovations in this area which have contributed to the improvement of the process of aggregating research literature include:

Devising methods to extend the functionality of existing widely-adopted metadata exchange protocols which were not designed for content harvesting, to enable efficient harvesting of research papers’ full texts.

Developing a novel harvesting approach (referred to here as CHARS) which allows us to continuously utilise the available compute resources while providing improved horizontal scalability, recoverability, and reliability.

Designing an efficient algorithm for scheduling updates of harvested resources which optimises the recency of our data while effectively utilising the compute resources available to us.

This paper is organised as follows. First, in the remainder of this section, we present several use cases requiring large scale text and data mining of scientific literature, and explain the challenges in obtaining data for these tasks. Next, we present the data offered by CORE and our approach for systematically gathering full text open access articles from thousands of repositories and key scientific publishers.

Terminology

In digital libraries the term record is typically used to denote a digital object such as text, image, or video. In this paper and when referring to data in CORE, we use the term metadata record to refer to the metadata of a research publication, i.e. the title, authors, abstract, project funding details, etc., and the term full text record to describe a metadata record which has an associated full text.

We use the term data provider to refer to any database or a dataset from which we harvest records. Data providers harvested by CORE include disciplinary and institutional repositories, publishers and other databases.

When talking about open access (OA) to scientific literature, we refer to the Budapest Open Access Initiative (BOAI) definition, which defines OA as “free availability on the public internet, permitting any users to read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose” ( https://www.budapestopenaccessinitiative.org/read ). There are two routes to open access: 1) OA repositories and 2) OA journals. The former is achieved by self-archiving (depositing) publications in repositories (green OA), and the latter by publishing articles directly in OA journals (gold OA).

Text and Data Mining of Scientific Literature

Text and data mining (TDM) is the discovery by a computer of new, previously unknown information by automatically extracting information from different written resources ( http://bit.ly/jisc-textm ). The broad goal of TDM of scientific literature is to build tools that can retrieve useful information from digital documents, improve access to these documents, or use these documents to support scientific discovery. OA and TDM of scientific literature have one thing in common: they both aim to improve people’s access to scientific knowledge. While OA aims to widen the availability of research, TDM aims to improve our ability to discover, understand and interpret scientific knowledge.

TDM of scientific literature is being used in a growing number of applications, many of which were until recently not viable due to the difficulties of accessing data from across many publishers and other data providers. Because many text and data mining use cases can only realise their full potential when they are executed on as large a corpus of research papers as possible, these data access difficulties have made many of the use cases described below very difficult to achieve. For example, to reliably detect plagiarism in newly submitted publications, it is necessary to have access to an always up-to-date dataset of published literature spanning all disciplines. Based on data needs, scientific literature TDM use cases can be broadly divided into the following two categories, which are shown in Fig. 1 :

A priori defined sample use cases: Use cases which require access to a subset of scientific publications that can be specified prior to the execution of the use case. For example, gathering the list of all trialled treatments for a particular disease in the period 2000–2010 is a typical example of such a use case.

Undefined sample use cases: Use cases which cannot be completed using data samples that are defined a priori. Executing such use cases might require access to data not known prior to execution, or access to all available data. Plagiarism detection is a typical example of such a use case.

Figure 1

Example use cases of text and data mining of scientific literature. Depending on data needs, TDM use cases can be categorised into a) a priori defined sample use cases, and b) undefined sample use cases. Furthermore, TDM use cases can broadly be categorised into 1) indirect applications, which aim to improve access to and organisation of literature, and 2) direct applications, which focus on answering specific questions or gaining insights.

However, there are a number of factors that significantly complicate access to data for these applications. The required data is often spread across many publishers, repositories, and other databases that frequently lack interoperability (these factors are discussed further in the next section). Consequently, researchers and developers working in these areas typically invest a considerable amount of time in corpus collection, which can account for up to 90% of the total investigation time 5 . For many, this task can even prove impossible due to technical restrictions and limitations of publisher platforms, some of which are discussed in the next section. There is therefore a need for a global, continuously updated, and downloadable dataset of full text publications to enable such analysis.

Challenges in machine access to scientific literature

Probably the largest obstacle to the effective and timely retrieval of relevant research literature is that it may be stored in a wide variety of locations with little to no interoperability: repositories of individual institutions, publisher databases, conference and journal websites, pre-print databases, and other locations, each of which typically offers different means of accessing its data. While repositories often implement a standard protocol for metadata harvesting, the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), publishers typically allow access to their data through custom-made APIs, which are not standardised and are subject to change 6 . Other data sources may provide static data dumps in a variety of formats or not offer programmatic access to their data at all.
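For readers unfamiliar with OAI-PMH, the sketch below shows how a client harvests metadata with the `ListRecords` verb. Each response may end with a `resumptionToken`, and only that token lets the client request the next page, which is why harvesting a single repository is inherently sequential. This is a minimal sketch using only the Python standard library: the `fetch` parameter is a hypothetical hook added so the loop can be exercised without network access, and OAI-PMH error responses (for example `noRecordsMatch`) are not handled.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator, Union
from urllib.parse import urlencode
from urllib.request import urlopen

OAI = "{http://www.openarchives.org/OAI/2.0/}"  # OAI-PMH XML namespace


def http_fetch(url: str) -> bytes:
    """Default fetcher: a plain HTTP GET returning the raw XML bytes."""
    with urlopen(url, timeout=60) as resp:
        return resp.read()


def list_records(base_url: str, metadata_prefix: str = "oai_dc",
                 fetch: Callable[[str], Union[bytes, str]] = http_fetch) -> Iterator[ET.Element]:
    """Yield every <record> of a repository, one ListRecords page at a time."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(f"{base_url}?{urlencode(params)}"))
        yield from root.iter(f"{OAI}record")
        token = root.find(f".//{OAI}resumptionToken")
        if token is None or not (token.text or "").strip():
            return  # no token, or an empty one, marks the last page
        # The next page can only be requested with the token from this page,
        # so pages are fetched strictly one after another. A resumed request
        # carries the token as its only argument besides the verb.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

Usage would look like `for record in list_records("https://repository.example.org/oai"): ...`, where the base URL is a placeholder. If a request fails mid-way, the harvest can only continue from a token that the repository still accepts.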

However, even when publication metadata can be obtained, other steps involved in the data collection process complicate the creation of a final dataset suitable for TDM applications. For example, the identification of scientific publications within all downloaded documents, matching these publications correctly to the original publication metadata, and their conversion from formats used in publishing, such as the PDF format, into a textual representation suitable for text and data mining, are just some of the additional difficulties involved in this process. The typical minimum steps involved in this process are illustrated in Fig.  2 . As there are no widely adopted solutions providing interoperability across different platforms, custom harvesting solutions need to be created for each.

Figure 2

Example illustration of the data collection process. The figure depicts the typical minimum steps which are necessary to produce a dataset for TDM of scientific literature. Depending on the use case, tens or hundreds of different data sources may need to be accessed, each potentially requiring a different process–for example accessing a different set of API methods or a different process for downloading publication full text. Furthermore, depending on the use case, additional steps may be needed, such as extraction of references, identification of duplicate items or detection of the publication’s language. In the context of CORE, we provide the details of this process in Section Methods.

Challenges in systematically gathering open access research literature

Open access journals and repositories are increasingly becoming the central providers of open access content, in part thanks to the introduction of funder and institutional open access policies 7 . Open access repositories include institutional repositories, such as the University of Cambridge Repository ( https://www.repository.cam.ac.uk/ ), and subject repositories, such as arXiv ( https://arxiv.org/ ). As of February 2023, there are 6,015 open access repositories indexed in the Directory of Open Access Repositories (OpenDOAR, http://v2.sherpa.ac.uk/opendoar/ ), as well as 18,935 open access journals indexed in the Directory of Open Access Journals (DOAJ, https://doaj.org/ ). However, open access research literature can be stored in a wide variety of other locations, including publisher and conference websites, individual researchers’ websites, and elsewhere. Consequently, a system for harvesting open access content needs to be able to harvest effectively from thousands of data providers. Furthermore, a large number of open access repositories (69.4% of repositories indexed in OpenDOAR as of January 2018) expose their data through the OAI-PMH protocol, often without providing any alternative. An open access harvesting system therefore also needs to be able to use OAI-PMH effectively for content harvesting. However, these two requirements, harvesting from thousands of data providers and using OAI-PMH for content harvesting, pose a number of significant scalability challenges.

Challenges related to harvesting from thousands of data providers

Open access data providers vary greatly in size, with some hosting millions of documents and others far fewer. Data providers add new documents and often update existing ones on a daily basis.

Different geographic locations and internet connection speeds may result in vastly different times needed to harvest information from different providers, even when they hold the same number of publications. As illustrated in Table 1 , commonly used repository platforms also differ in their OAI-PMH implementations, which deliver significantly different harvesting performance. To construct this table, we analysed the OAI-PMH metadata harvesting performance of 1,439 repositories in CORE, covering eight different repository platforms. It should be noted that the OAI-PMH protocol only mandates that metadata be available in the Dublin Core (DC) format, although it can be extended to express metadata in other formats. Because the Dublin Core standard is limited to just 15 elements, it is not uncommon for OAI-PMH repositories to also use an extended metadata format such as Rioxx ( https://rioxx.net ) or the OpenAIRE Guidelines ( https://www.openaire.eu/openaire-guidelines-for-literature-institutional-and-thematic-repositories ).

Additionally, harvesting is limited not only by factors related to the data providers but also by the compute resources (hardware) available to the aggregator. As many of the use cases listed in the Introduction, such as plagiarism detection or systematic review automation, require access to very recent data, keeping the harvested data up to date while using the available compute resources efficiently poses a significant challenge.

To overcome these challenges, we designed the CORE Harvesting System (CHARS), which relies on two key principles. The first is the application of microservices software principles to open access content harvesting 8 . The second is a strategy we call pro-active harvesting, in which providers are scheduled automatically according to current need. This strategy is implemented in the harvesting Scheduler (Section CHARS architecture), which uses a formula we designed for prioritising data providers.

The combination of the Scheduler with CHARS microservices architecture enables us to schedule harvesting according to current compute resource utilisation, thus greatly increasing our harvesting efficiency. Since switching from a fixed-schedule approach described above to pro-active harvesting, we have been able to greatly improve the data recency of our collection as well as to increase the size of the collection threefold within the span of three years.
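The paper does not give the Scheduler's formula, so the following is only a hypothetical sketch of the idea behind pro-active harvesting: each provider gets a priority score, and whenever harvesting workers become free, the highest-priority providers are dispatched. The `Provider` fields and the `priority()` formula (staleness multiplied by update rate, divided by expected harvest cost) are invented for illustration and are not CORE's formula.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Provider:
    """Hypothetical per-provider statistics an aggregator might track."""
    name: str
    days_since_harvest: float
    updates_per_day: float     # observed rate of new or changed records
    est_harvest_hours: float   # expected duration of one full harvest


def priority(p: Provider) -> float:
    """Hypothetical score (not CORE's formula): stale, frequently updated,
    cheap-to-harvest providers are scheduled first."""
    return p.days_since_harvest * (1.0 + p.updates_per_day) / max(p.est_harvest_hours, 0.1)


def schedule(providers: list[Provider], free_workers: int) -> list[str]:
    """Return the names of the providers to dispatch to the idle workers now."""
    # heapq is a min-heap, so priorities are negated to pop the largest first.
    heap = [(-priority(p), p.name) for p in providers]
    heapq.heapify(heap)
    chosen: list[str] = []
    while heap and len(chosen) < free_workers:
        _, name = heapq.heappop(heap)
        chosen.append(name)
    return chosen


if __name__ == "__main__":
    providers = [
        Provider("large-repository", 2, 500, 48),
        Provider("small-repository", 30, 1, 0.5),
        Provider("oa-journal", 7, 20, 2),
    ]
    print(schedule(providers, free_workers=2))  # ['small-repository', 'oa-journal']
```

Dispatching only as many providers as there are idle workers is what lets this kind of scheduler keep the compute resources busy without overcommitting them. A fixed schedule, by contrast, ignores both current load and how stale each provider's data has become.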

Challenges related to the use of OAI-PMH protocol for content harvesting

As explained above, OAI-PMH is currently the standard method for exchanging data across repositories. Although the OAI-PMH protocol was originally designed for metadata harvesting only, its wide adoption and the lack of alternatives have led to its use as an entry point for full text harvesting. Full text harvesting is achieved by extracting URLs from the metadata records collected through OAI-PMH; the extracted URLs are then used to discover the location of the actual resource 9 . However, a number of limitations of the OAI-PMH protocol make it unsuitable for large-scale content harvesting:

It directly supports only metadata harvesting, meaning additional functionality has to be implemented in order to use it for content harvesting.

The location of full text links in the OAI-PMH metadata is not standardised, and OAI-PMH metadata records typically contain multiple links. The metadata does not make clear which of these links points to the described representation of the resource, and in many cases none of them does so directly. Therefore, all possible links to the resource itself have to be extracted from the metadata and tested to identify the correct resource. Furthermore, OAI-PMH does not provide any validation mechanism for ensuring that the discovered resource is truly the described resource. To overcome these issues, the adoption of the RIOXX metadata format ( https://rioxx.net/ ) or the OpenAIRE guidelines ( https://guidelines.openaire.eu/ ) has been promoted. However, the problem of unambiguously connecting metadata records to the described resource remains.

The architecture of the OAI-PMH protocol is inherently sequential, which makes it ill-suited for harvesting from very large repositories: the harvesting of a large repository cannot be parallelised, and a harvest that fails part-way cannot be recovered.

Scalability differs dramatically across OAI-PMH implementations. Our analysis (Table 1 ) shows that performance can differ significantly even when only a single repository software is considered 10 .

Other limitations include difficulties in incremental harvesting, reliability issues, metadata interoperability issues, and scalability issues 11 .
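To illustrate the link-ambiguity problem described above, the sketch below collects every HTTP(S) URL from the `dc:identifier` and `dc:relation` elements of an `oai_dc` record and places links ending in `.pdf` first. The choice of elements and the ordering heuristic are assumptions for illustration; a real harvester would still have to download each candidate and check that it really is the described document.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes used in oai_dc records (OAI-PMH 2.0 and Dublin Core).
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def candidate_fulltext_links(record_xml: str) -> list[str]:
    """Collect all http(s) URLs in dc:identifier and dc:relation, PDFs first."""
    root = ET.fromstring(record_xml)
    urls: list[str] = []
    for path in (".//dc:identifier", ".//dc:relation"):
        for el in root.findall(path, NS):
            text = (el.text or "").strip()
            if text.startswith(("http://", "https://")) and text not in urls:
                urls.append(text)
    # Heuristic ordering only: sorted() is stable, so links ending in .pdf move
    # to the front while all other links keep their original order.
    return sorted(urls, key=lambda u: not u.lower().endswith(".pdf"))
```

For a typical record, the result mixes a landing page, a direct PDF link and a DOI resolver link, and nothing in the metadata says which one is the full text. Every candidate therefore has to be fetched and checked.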

We have designed solutions to overcome a number of these issues, which have enabled us to use OAI-PMH efficiently and effectively to harvest open access content from repositories. We present these solutions in Section Using OAI-PMH for content harvesting. While we currently rely on a variety of solutions and workarounds to enable content harvesting through OAI-PMH, most of the limitations listed in this section could also be addressed by adopting more sophisticated data exchange protocols, such as ResourceSync ( http://www.openarchives.org/rs/1.1/resourcesync ), which was designed with content harvesting in mind 10 , provided that such protocols are adopted in the systems of the data providers we support.
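To illustrate what ResourceSync offers over OAI-PMH, consider a ResourceSync resource list. It is a sitemap whose `<url>` entries carry a `<loc>` and, optionally, a `<lastmod>` timestamp, so a client can work out which resources changed since its last visit without following a chain of resumption tokens. The sketch below is a simplified example built on two assumptions: entries without `<lastmod>` are treated as changed, and ResourceSync change lists and resource dumps, which the specification provides for incremental synchronisation, are ignored.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

SITEMAP = "{http://www.sitemaps.org/schemas/sitemap/0.9}"  # namespace used by ResourceSync lists


def parse_lastmod(value: str) -> datetime:
    """Parse a W3C datetime such as 2023-01-10T12:00:00Z or 2023-01-10."""
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11.
    parsed = datetime.fromisoformat(value.strip().replace("Z", "+00:00"))
    # Date-only values come back naive; treat them as UTC so comparisons work.
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def changed_since(resource_list_xml: str, since: datetime) -> list[str]:
    """Return resource URLs whose <lastmod> is later than `since` (timezone-aware).

    Entries without <lastmod> are conservatively treated as changed.
    """
    root = ET.fromstring(resource_list_xml)
    changed = []
    for entry in root.iter(f"{SITEMAP}url"):
        loc = entry.findtext(f"{SITEMAP}loc")
        lastmod = entry.findtext(f"{SITEMAP}lastmod")
        if loc and (lastmod is None or parse_lastmod(lastmod) > since):
            changed.append(loc.strip())
    return changed
```

Because each entry stands alone, the list can be split and the changed resources fetched in parallel. The sequential token chain of OAI-PMH does not allow this.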

Our solution

In the above sections we have highlighted a critical need, shared by many researchers and organisations worldwide, for large-scale, always up-to-date, seamless machine access to full-text scientific literature originating from thousands of data providers. Providing this seamless access has become both a defining goal and a feature of CORE, and it has enabled other researchers to design and test innovative methods on CORE data, often powered by artificial intelligence. In order to put together this vast, continuously updated dataset, we had to overcome a number of research challenges, such as those related to the lack of interoperability, scalability, regular content synchronisation, content redundancy and inconsistency. Our key innovation in this area is the improvement of the process of aggregating research literature, as specified in the Introduction section.

This underpinning research has allowed CORE to become a leading provider of open access papers. The amount of data made available by CORE has been growing since 2011 12 and is continuously kept up to date. As of February 2023, CORE provides access to over 291 million metadata records and 32.8 million full text open access articles, making it the world’s largest archive of open access research papers, significantly larger than PubMed, arXiv and JSTOR datasets.

Whilst there are other publication databases that could initially be viewed as similar to CORE, such as BASE or Unpaywall, we will demonstrate the significant differences that set CORE apart and show how it provides access to a unique, harmonised corpus of Open Access literature. A major difference between CORE and these existing services is that CORE is completely free to use for the end user, hosts full text content, and offers several methods for accessing its data for machine processing. Consequently, it removes the need to harvest and pre-process full texts for text mining: CORE provides plain text access to the full texts via its raw data services, so text and data miners do not need to work with PDFs. A detailed comparison with other publication databases is provided in the Discussion. In addition, CORE enables building powerful services on top of the collected full texts, supporting all the categories of use cases outlined in the Use cases section.

As of today, CORE provides three services for accessing its raw data: API, dataset, and a FastSync service. The CORE API provides real-time machine access to both metadata and full texts of research papers. It is intended for building applications that need reliable access to a fraction of CORE data at any time. CORE provides a RESTful API. Users can register for an API key to access the service. Full documentation and Python notebooks containing code examples can be found on the CORE documentation pages online ( https://api.core.ac.uk/docs/v3 ). The CORE Dataset can be used to download CORE data in bulk. Finally, CORE FastSync enables third party systems to keep an always up to date copy of all CORE data within their infrastructure. Content can be transferred as soon as it becomes available in CORE using a data synchronisation service on top of the ResourceSync protocol 13 optimised by us for improved synchronisation scalability with an on-demand resource dumps capability. CORE FastSync provides fast, incremental and enterprise data synchronisation.
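As an illustration of machine access through the CORE API, the sketch below builds (but does not send) a search request using only the Python standard library. The base URL, the `/search/works` path, the `q` and `limit` parameters and bearer-token authentication are assumptions based on our reading of the v3 documentation and should be checked against https://api.core.ac.uk/docs/v3 before use.

```python
import urllib.parse
import urllib.request

# Assumed base URL of the CORE API v3; check https://api.core.ac.uk/docs/v3.
CORE_API_BASE = "https://api.core.ac.uk/v3"


def build_search_request(api_key: str, query: str, limit: int = 10) -> urllib.request.Request:
    """Build, but do not send, a search request against the CORE API."""
    params = urllib.parse.urlencode({"q": query, "limit": limit})
    # Assumed endpoint path for searching works; verify against the docs.
    url = f"{CORE_API_BASE}/search/works?{params}"
    # Assumption: the API key is passed as a bearer token in the Authorization header.
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})


if __name__ == "__main__":
    request = build_search_request("YOUR_API_KEY", "open access text mining", limit=5)
    print(request.full_url)
    # Sending the request needs a valid key and network access:
    # with urllib.request.urlopen(request) as response:
    #     body = response.read()
```

Keeping request construction separate from sending makes the client easy to test offline and to wrap with retries or rate limiting.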

CORE is the largest up-to-date full text open access dataset as well as one of the most widely used services worldwide supporting access to freely available research literature. CORE regularly releases data dumps licensed as ODC-By, making the data freely available for both commercial and non-commercial purposes. Access to CORE data via the API is provided freely to individuals conducting work in their own personal capacity and to public research organisations for unfunded research purposes. CORE offers licenses to commercial organisations that want a convenient way of accessing CORE data with a guaranteed level of service support. CORE is operated as a not-for-profit entity by The Open University, and this business model makes it possible for CORE to remain free for more than 99.99% of its users.

A large number of commercial organisations have benefited from these licenses in areas as diverse as plagiarism detection in research, building specialised scholarly publication search engines, developing scientific assistants and machine translation systems, and supporting education ( https://core.ac.uk/about/endorsements/partner-projects ). The CORE data services, the CORE API and Dataset, have been used by over 7,000 experts to analyse data, develop text-mining applications and embed CORE into existing production systems.

Additionally, more than 70 repository systems have registered to use the CORE Recommender, and the service is notably used by prestigious institutions, including the University of Cambridge, and by popular pre-print services such as arXiv.org. Other CORE services are CORE Discovery and the CORE Repository Dashboard. The former was released in July 2019 and at the time of writing has more than 5,000 users. The latter is a tool designed specifically for repository managers, providing access to a range of tools for managing the content within their repositories. The CORE Repository Dashboard is currently used by 499 users from 36 countries.

In the rest of this paper we describe the CORE dataset and the methods of assembling it and keeping it continuously up-to-date. We also present the services and tools built on top of the aggregated corpus and provide several examples of how the CORE dataset has been used to create real-world applications addressing specific use-cases.

As highlighted in the Introduction, CORE is a continuously growing dataset of scientific publications for both human and machine processing. As we will show in this section, it is a global dataset spanning all disciplines and containing publications aggregated from more than ten thousand data providers including disciplinary and institutional repositories, publishers, and other databases. To improve access to the collected publications, CORE performs a number of data enrichment steps. These include metadata and full text extraction, language and DOI detection, and linking with other databases. Furthermore, CORE provides a number of services which are built on top of the data: a publications recommender ( https://core.ac.uk/services/recommender/ ), CORE Discovery service ( https://core.ac.uk/services/discovery/ ) (a tool for discovering OA versions of scientific publications), and a dashboard for repository managers ( https://core.ac.uk/services/repository-dashboard/ ).

Dataset size

As of February 2023, CORE is the world’s largest dataset of open access papers (a comparison with other systems is provided in the Discussion). CORE hosts over 291 million metadata records, including over 34 million articles with full text, written in 82 languages and aggregated from over ten thousand data providers located in 150 countries. Full details of the CORE Dataset size are presented in Table 2. In the table, “Metadata records” represent all valid (not retracted, deleted, or otherwise withdrawn) records in CORE. About 13% of records in CORE contain full text. This number represents records for which a manuscript was successfully downloaded and converted to plain text. However, a much higher proportion of records contains links to additional freely available full text articles hosted by third-party providers. Based on analysing a subset of our data, we estimate that about 48% of metadata records in CORE fall into this category, indicating that CORE is likely to contain links to open access full texts for 139 million articles. Due to the nature of academic publishing, there will be instances where multiple versions of the same paper are deposited in different repositories. For example, an early version of an article can be deposited by an author to a pre-print server such as arXiv or bioRxiv, and a later version then uploaded to an institutional repository. Identifying and matching these different versions is a significant undertaking. CORE has carried out research to develop techniques based on locality sensitive hashing for identifying duplicates 8 and integrated these into its ingestion pipeline to link versions of papers from across the network of OA repositories and group them under a single works entity. The large number of records in CORE is reflected in the size of the dataset in bytes: the uncompressed version of the dataset including PDFs is about 100 TB, while the version with plain texts only amounts to 393 GB compressed and 3.5 TB uncompressed.
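The locality sensitive hashing idea mentioned above can be sketched with MinHash signatures split into bands: documents whose signatures agree on at least one band become candidate duplicates. This is a generic textbook sketch, not CORE's published technique (see reference 8); the shingle size (3 words), the 64 hash functions and the 16 bands of 4 rows are illustrative choices of ours, and candidate pairs would normally still be verified.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=3):
    """Word k-grams of a lower-cased text."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _hash(seed, item):
    # Seeded 64-bit hash; each seed plays the role of one random permutation.
    digest = hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(shingle_set, num_perm=64):
    return [min(_hash(seed, s) for s in shingle_set) for seed in range(num_perm)]


def lsh_candidate_pairs(docs, num_perm=64, bands=16):
    """Return pairs of doc ids whose signatures agree on at least one band."""
    rows = num_perm // bands
    buckets = defaultdict(set)
    for doc_id, text in docs.items():
        sig = minhash_signature(shingles(text), num_perm)
        for b in range(bands):
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].add(doc_id)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs
```

Because only documents sharing a bucket are compared, the cost grows roughly with the number of documents rather than with the number of document pairs.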

Recent studies have estimated that around 24%–28% of all articles are available free to read 2 , 14 . There are a number of reasons why the proportion of full text content in CORE is lower than these estimates. The main reason is likely that a significant proportion of free-to-read articles is hosted on platforms that heavily restrict machine access; some repositories severely restrict or fully prohibit content harvesting 9 .

The growth of CORE has been made possible by the introduction of a novel harvesting system and the creation of an efficient harvesting scheduler, both of which are described in the Methods section. The growth of metadata and full text records in CORE is shown in Fig. 3. Finally, Fig. 4 shows the age of publications in CORE.

figure 3

Growth of records in CORE per month since February 2012. “Full text growth” represents growth of records containing full text, while “Metadata growth” represents growth of records without full text, i.e. the two numbers do not overlap. The two area plots are stacked on top of each other, their sum therefore represents the total number of records in CORE.

figure 4

Age of publications in CORE. As in Fig. 3, the “Metadata” and “Full text” bars are stacked on top of each other.

Data sources and languages

As of February 2023, CORE was aggregating content from 10,744 data sources. These include institutional repositories (for example the USC Digital Library or the University of Michigan Library Repository), academic publishers (Elsevier, Springer), open access journals (PLOS), subject repositories, including those hosting eprints (arXiv, bioRxiv, ZENODO, PubMed Central), and aggregators (e.g. DOAJ). The ten largest data sources in CORE are shown in Table 3. When calculating the total number of data providers in CORE, we count each aggregator and publisher as one data source, even though each aggregates data from multiple sources. A full list of all data providers can be found on the CORE website ( https://core.ac.uk/data-providers ).

The data providers aggregated by CORE are located in 150 different countries. Figure 5 shows the top ten countries by number of data providers aggregated by CORE, alongside the top ten languages. The geographic spread of repositories largely reflects the size of the research economy in those countries. We see the US, Japan, Germany, Brazil and the UK all in the top six. One result that may at first appear surprising is the significant number of repositories in Indonesia, enough to place it at the top of the list. A 2019 article in Nature suggested that Indonesia may be the world’s OA leader, finding that 81% of 20,000 journal articles published in 2017 with an Indonesia-affiliated author are available to read for free somewhere online ( https://www.nature.com/articles/d41586-019-01536-5 ). Additionally, a large number of Indonesian open access journals are registered with Crossref. This in turn leads to a much higher number of individual repositories in this country.

figure 5

Top ten languages and top ten provider locations in CORE.

As part of the enrichment process, CORE performs language detection. Language is either extracted from the attached metadata where available or identified automatically from full text in case it is not available in metadata. More than 80% of all documents with language information are in English. Overall, CORE contains publications in a variety of languages, the top 10 of which are shown in Fig.  5 .

Document types

The CORE dataset comprises documents gathered from various sources, many of which contain articles of different types. Consequently, aside from research articles from journals and conferences, it includes other types of research outputs such as theses, presentations, and technical reports. To distinguish between these types, CORE has implemented a method that automatically classifies documents into one of the following four categories 15 : (1) research article, (2) thesis, (3) presentation, (4) unknown (for articles not belonging to any of the previous three categories). This method is based on a supervised machine learning model trained on article full texts. Figure 6 shows the distribution of articles in CORE across these four categories. The collection aggregated by CORE consists predominantly of research articles. We have observed in the data collected from repositories that the vast majority of theses deposited in repositories have full text associated with the metadata. As this is not always the case for research articles, and as Fig. 6 is based on articles with full text only, we expect that the proportion of research articles relative to theses is actually higher across the entire CORE collection.
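The source does not specify which supervised model CORE uses (see reference 15). Purely to illustrate supervised document type classification from full text, the sketch below swaps in a multinomial Naive Bayes classifier with add-one smoothing, trained on a tiny, made-up training set; the class name and training texts are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesDocTypeClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            counts = self.word_counts[label]
            total = sum(counts.values())
            # Log prior plus the log likelihood of each token under this class.
            score = math.log(self.class_counts[label] / n_docs)
            for w in tokenize(text):
                score += math.log((counts[w] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

A production classifier would be trained on many labelled full texts and would also need a way to emit the "unknown" category, for example by thresholding the winning score.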

figure 6

Distribution of document types.

Research disciplines

To analyse the distribution of disciplines in CORE we have leveraged a third-party service. Figure  7 shows a subject distribution of a sample of 20,758,666 publications in CORE. For publications with multiple subjects we count the publication towards each discipline.

figure 7

Subject distribution of a sample of 20,758,666 CORE publications.

The subject for each article was obtained using Microsoft Academic ( https://academic.microsoft.com/home ) prior to its retirement in November 2021. Our results are consistent with other studies, which have reported Biology, Medicine, and Physics to be the largest disciplines in terms of number of publications 16 , 17 , suggesting that the distribution of articles in CORE is representative of research publications in general.

Additional CORE Tools and Services

CORE has built several additional tools for a range of stakeholders including institutions, repository managers and researchers from across all scientific domains. Details of the usage of these services are covered in the Uptake of CORE section.

The CORE Repository Dashboard provides a suite of tools for repository management, content enrichment, metadata quality assessment and open access compliance checking. Further, it can provide statistics on content downloads and suggestions for improving the efficiency of harvesting and the quality of metadata.

CORE Discovery helps users discover freely accessible copies of research papers. There are several ways of interacting with the Discovery tool: first, as a plugin for repositories, which enriches metadata-only pages with links to open access copies of the full text; second, as a browser extension for researchers and anyone interested in reading scientific documents; and finally, as an API service for developers.

Recommender

The recommender is a plugin for repositories, journal systems and web interfaces that suggests articles relevant to the one currently displayed. Its purpose is to support users in discovering articles of interest from across the network of open access repositories. It is notably used by prestigious institutions, including the University of Cambridge, and by popular pre-print services such as arXiv.org.

OAI Resolver

An OAI (Open Archives Initiative) identifier is a unique identifier of a metadata record. OAI identifiers are used by repositories that implement the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). OAI identifiers are viable persistent identifiers for repositories: unlike DOIs, they can be minted in a distributed fashion and cost-free, and they can be resolved directly to the repository rather than to the publisher. The CORE OAI Resolver can resolve any OAI identifier either to a metadata page of the record in CORE or directly to the relevant repository page. This approach has the potential to increase the importance of repositories in the process of disseminating knowledge.
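OAI-PMH recommends an `oai-identifier` syntax of the form `oai:<namespace-identifier>:<local-identifier>`, where the namespace identifier is a domain name. The minimal sketch below validates that syntax and builds a resolver URL. The resolver base URL is a hypothetical placeholder, not the address of the CORE OAI Resolver, and the helper names are ours.

```python
import re
import urllib.parse

# Namespace identifier: domain-name words separated by at least one dot.
_NAMESPACE_RE = re.compile(r"[A-Za-z][A-Za-z0-9-]*(\.[A-Za-z][A-Za-z0-9-]*)+")

# Hypothetical resolver base URL, used only for illustration.
RESOLVER_BASE = "https://resolver.example.org/"


def parse_oai_identifier(oai_id):
    """Split an oai-identifier into (namespace, local identifier)."""
    scheme, sep, rest = oai_id.partition(":")
    namespace, sep2, local = rest.partition(":")  # local part may itself contain ':'
    if scheme != "oai" or not sep or not sep2 or not local or not _NAMESPACE_RE.fullmatch(namespace):
        raise ValueError(f"not a valid oai-identifier: {oai_id!r}")
    return namespace, local


def resolver_url(oai_id, base=RESOLVER_BASE):
    parse_oai_identifier(oai_id)  # validate before building the URL
    return base + urllib.parse.quote(oai_id, safe=":/")
```

Because the namespace is a domain controlled by the repository, identifiers minted this way are globally unique without any central registration fee.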

Uptake of CORE

As of February 2023, CORE averages over 40 million monthly active users and is ranked 10th among websites in the Science and Education category according to SimilarWeb ( https://www.similarweb.com/ ). There are currently 4,700 registered API users and 2,880 registered dataset users. The CORE Dashboard is currently used by 499 institutional repositories to manage their open access content, monitor content download statistics, manage issues with metadata within the repository and ensure compliance with OA funder policies, notably REF in the U.K. The CORE Discovery plugin has been integrated into 434 repositories and the browser extension has been downloaded by more than 5,000 users via the Google Chrome Web Store ( https://chrome.google.com/webstore/category/extensions ). The CORE Recommender has been embedded in 70 repository systems, including those of the University of Cambridge and arXiv.

In this section we discuss differences between CORE and other open access aggregation services, present several real-world use cases where CORE was used to develop services to support science, and outline our future plans.

Existing open access aggregation services

Currently there are a number of open access aggregation services available (Table 4), for example BASE ( https://base-search.net/ ), OpenAIRE ( https://www.openaire.eu/ ), Unpaywall ( http://unpaywall.org/ ) and Paperity ( https://paperity.org/ ). BASE (Bielefeld Academic Search Engine) is a global metadata harvesting service. It harvests repositories and journals via OAI-PMH and exposes the harvested content through an API and a dataset. OpenAIRE is a network of open access data providers who support open access policies. Although the project focused on European repositories in the past, it has recently expanded to include institutional and subject repositories from outside Europe. A key focus of OpenAIRE is to assist the European Commission in monitoring compliance with its open access policies. OpenAIRE data is exposed via an API. Paperity is a service which harvests publications from open access journals. Paperity harvests both metadata and full text but does not host full texts. SHARE (Shared Access Research Ecosystem) is a harvester of open access content from US repositories. Its aim is to assist with compliance with the open access policies of the White House Office of Science and Technology Policy (OSTP). Although SHARE harvests both metadata and full text, it does not host the latter. Unpaywall is not primarily a harvester; rather, it starts from Crossref records and identifies those for which a free-to-read version can be retrieved. It processes both metadata and full text but does not host them. It exposes the discovered links to documents through an API.

CORE differs from these services in a number of ways. CORE is currently the largest database of full text OA documents. In addition, CORE offers via its API a rich metadata record for each item in its collection, including additional enrichments, unlike, for example, Unpaywall’s API, which focuses only on telling the user whether a free-to-read version is available. CORE also provides the largest number of links to OA content. To simplify access for end users, it provides a number of ways to access its collection. All of the above services are free to use for research purposes; however, both CORE and Unpaywall also offer services to commercial partners on a paid-for basis.

Existing publication databases

Apart from OA aggregation services, a number of other services exist for searching and downloading scientific literature (Table 5). One of the main publication databases is Crossref ( https://www.crossref.org/ ), an authoritative index of DOIs. Its primary function is to maintain the metadata associated with each DOI. The metadata stored by Crossref includes both OA and non-OA records. Crossref does not store publication full texts, but for many publications it provides full text links. As of February 2023, 5.9 million records in Crossref were associated with an explicit Creative Commons license (we used the Crossref API to determine this number). Although Crossref provides an API, it does not offer its data for bulk download or provide a data sync service.
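The source does not say which Crossref query produced the 5.9 million figure. As one plausible approach, offered here only as an assumption, the Crossref REST API `/works` endpoint accepts a `license.url` filter, and `rows=0` returns just the count in `message.total-results`. The sketch builds such a query for a single license URL and parses a response; summing counts over several Creative Commons URLs could double-count works that list more than one license.

```python
import json
import urllib.parse

CROSSREF_WORKS = "https://api.crossref.org/works"


def license_count_url(license_url):
    # rows=0 asks only for the total count, not the records themselves.
    query = urllib.parse.urlencode({"filter": f"license.url:{license_url}", "rows": 0})
    return f"{CROSSREF_WORKS}?{query}"


def total_results(response_text):
    """Extract the total number of matching works from a /works response."""
    return json.loads(response_text)["message"]["total-results"]


if __name__ == "__main__":
    print(license_count_url("http://creativecommons.org/licenses/by/4.0/"))
```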

The remaining services from Table 5 can be roughly grouped into two categories: 1) citation indices, and 2) academic search engines and scholarly graphs. The two major citation indices are Elsevier’s Scopus ( https://www.elsevier.com/solutions/scopus ) and Clarivate’s Web of Science ( https://clarivate.com/webofsciencegroup/solutions/web-of-science/ ), both of which are premium subscription services. Google Scholar, the best-known academic search engine, does not provide an API for accessing its data and does not permit crawling its website. Semantic Scholar ( https://www.semanticscholar.org/ ) is a relatively new academic search service which aims to create an “intelligent academic search engine” 18 . Dimensions ( https://www.dimensions.ai/ ) is a service focused on data analysis. It integrates publications, grants, policy documents, and metrics. 1findr ( https://1findr.1science.com/home ) is a curated abstract indexing service. It provides links to full text, but no API or dataset for download.

The added value of CORE

There are other services that claim to provide access to a large dataset of open access papers. In particular, Unpaywall 2 claims to provide access to 46.4 million free-to-read articles, and BASE states that it provides access to the full texts of about 60% of its 300 million metadata records. However, these statistics are not directly comparable to the numbers we report, as they reflect the different focus of these two projects: both BASE and Unpaywall define “providing access to” in terms of having a list of URLs from which a human user can navigate to the full text of the resource. This means that neither Unpaywall nor BASE collects these full text resources, which is also why they do not face many of the challenges we described in the Introduction. Using this approach, we could say that the CORE Dataset provides access to approximately 139 million full texts, i.e. about 48% of our 291 million metadata records point to a URL from which a human can navigate to the full text. However, for people concerned with text and data mining of scientific literature, it makes little sense to count URLs pointing to many different domains on the Web as the number of full texts made available.

Our 32.8 million figure therefore refers to the number of OA documents we identified, downloaded, extracted text from, and validated against their metadata records, and whose full texts we host on the CORE servers and make available to others. In contrast, BASE and Unpaywall do not aggregate the full texts of the resources they provide access to, and consequently do not offer the means to interact with those full texts or to download them in bulk for text analytics over scholarly literature.

We have also integrated CORE data with the OpenMinTeD infrastructure, a European Commission funded project which aimed to provide a platform for text mining of scholarly literature in the cloud 6 .

A number of academic and industry partners have utilised CORE in their services. In this section we present three existing uses of CORE, demonstrating how CORE can support text and data mining use cases.

Since 2017, CORE has been collaborating with a range of scholarly search and discovery systems, including NAVER ( https://naver.com/ ), Lean Library ( https://www.leanlibrary.com/ ) and Ontochem ( https://ontochem.com/ ). As part of this work, CORE provides full text copies of research papers for existing records in these systems (Lean Library) or supplies both metadata and full texts for indexing (Ontochem, NAVER). This collaboration also benefits CORE’s data providers, as it increases the visibility of their content.

In 2019, CORE entered into a collaboration with Turnitin, a global leader in plagiarism detection software. By using the CORE FastSync service, Turnitin’s proprietary web crawler searches through CORE’s global database of open access content and metadata to check for text similarity. This partnership enables Turnitin to significantly enlarge its content database in a fast and efficient manner. In turn, it also helps protect open access content from misuse, thus protecting authors and institutions.

As of February 2023, CORE Recommender 19 is actively running in over 70 repositories, including the University of Cambridge institutional repository and arXiv.org. The purpose of the recommender is to improve the discoverability of research outputs by suggesting similar research papers both within the collection of the hosting repository and from the CORE collection. Repository managers can install the recommender to improve access to other scientific papers and to reach other scientific communities, since the CORE Recommender acts as a gateway to millions of open access research papers. The recommender is integrated with the CORE search functionality and is also offered as a plugin for repository software, for example EPrints and DSpace, as well as for open access journals and any other web page. Because CORE harvests open repositories, the recommender only displays research articles whose full text is available as open access, i.e. for immediate use, without access barriers or restrictive rights. Through the recommender, CORE promotes the widest discoverability and distribution of open access scientific papers.
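The recommendation method itself is described in reference 19 and is not reproduced here. As a generic stand-in, content-based recommendation can be sketched as ranking documents by the cosine similarity of their TF-IDF vectors to the currently displayed article; the function names and the smoothed IDF variant below are our own illustrative choices.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Map each doc id to a sparse TF-IDF vector {term: weight}."""
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n = len(docs)
    df = Counter(term for toks in tokens.values() for term in set(toks))
    vectors = {}
    for doc_id, toks in tokens.items():
        tf = Counter(toks)
        # Smoothed IDF, so terms present in every document keep a small weight.
        vectors[doc_id] = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
    return vectors


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def recommend(target_id, docs, k=3):
    """Return the ids of the k documents most similar to target_id."""
    vectors = tfidf_vectors(docs)
    scores = [(cosine(vectors[target_id], vec), doc_id)
              for doc_id, vec in vectors.items() if doc_id != target_id]
    # Highest similarity first; ties broken by document id for determinism.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [doc_id for _, doc_id in scores[:k]]
```

At CORE's scale the pairwise loop would be replaced by an inverted index or approximate nearest-neighbour search, but the scoring principle is the same.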

Future work

An ongoing goal of CORE is to keep growing the collection to become a single point of access to all of the world’s open access research. However, we are also planning to improve both the size of the collection and the ease of access to it in a number of other ways. The CORE Harvesting System was designed to enable adding new harvesting steps and enrichment tasks, and there remains scope for adding more such enrichments. Some of these are machine learning powered, such as classification of scientific citations 20 . Further, CORE is currently developing new methodologies to identify and link different versions of the same article. The proposed system, titled CORE Works, will leverage CORE’s central position in the OA infrastructure landscape and will link different versions of the same paper using a unique identifier. We will continue linking the CORE collection to scholarly entities from other services, thereby making CORE data part of a global scholarly knowledge graph.
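Once a matcher has decided which pairs of records are versions of the same paper (for example a duplicate detection step such as the locality sensitive hashing approach described in the Dataset size section), the pairs still have to be merged into works. A union-find (disjoint set) structure does this transitively. The sketch below is a hypothetical illustration, not the CORE Works implementation; record ids and the input pairs are assumed.

```python
def group_versions(record_ids, same_work_pairs):
    """Group records into works: any chain of matched pairs ends up in one group."""
    parent = {r: r for r in record_ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in same_work_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for r in record_ids:
        groups.setdefault(find(r), []).append(r)
    return sorted(sorted(g) for g in groups.values())
```

Each resulting group could then be assigned a single work identifier, while singleton groups remain works with one version.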

In the Introduction section we focused on a number of challenges researchers face when collecting research literature for text and data mining. In this section, we instead focus on the perspective of a research literature aggregator, i.e. a system whose goal is to continuously provide seamless access to research literature aggregated from thousands of data providers worldwide in a way that enables the resulting research publication collection to be used by others in production applications. We describe the challenges we had to overcome to build this collection and to keep it continuously up-to-date, and present the key technical innovations which allowed us to greatly increase the size of the CORE collection and become a leading provider of open access literature, which we illustrate using our content growth statistics.

CORE Harvesting system (CHARS)

CORE Harvesting System (CHARS) is the backbone of our harvesting process. CHARS uses the Harvesting Scheduler (Section CHARS architecture) to select data providers to be processed next. It manages all the running processes (tasks) and ensures the available compute resources are well utilised.
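The source does not specify the policy the Harvesting Scheduler uses to pick providers. Purely as a hypothetical illustration, the sketch below fills the currently free task slots with the idle data providers that were harvested least recently, skipping providers that already have a task running.

```python
import heapq


def select_providers(last_harvested, running, free_slots):
    """Pick up to `free_slots` idle providers, least recently harvested first.

    last_harvested: {provider_id: unix timestamp of the last completed harvest}
    running: provider ids that already have a task in progress
    """
    if free_slots <= 0:
        return []
    idle = ((ts, pid) for pid, ts in last_harvested.items() if pid not in running)
    return [pid for _, pid in heapq.nsmallest(free_slots, idle)]
```

A real scheduler would likely weigh further signals, such as provider size or expected update frequency; tying the number of selections to free slots is what keeps compute resources busy without overloading them.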

Prior to implementing CHARS, CORE was organised around data providers rather than around the individual tasks needed to harvest and process them (e.g. metadata download and parsing, full text download, etc.). Consequently, although the system could still be scaled up and kept running, the infrastructure was not horizontally scalable and the architecture suffered from tight coupling of services. This was inconsistent with CORE’s high availability requirements and regularly made maintenance complex. In response to these challenges, we designed CHARS using a microservices architecture, i.e. using small, manageable, autonomous components that work together as part of a larger infrastructure 21 . One of the key benefits of a microservices-oriented architecture is that the implementation focus can be put on the individual components, which can be improved and redeployed as frequently as needed and independently of the rest of the infrastructure. As the process of open access content harvesting can be naturally split into individual consecutive tasks, a microservices-oriented architecture is a natural fit for aggregation systems like CHARS.

Tasks involved in open access content harvesting

The harvesting process can be described as a pipeline in which each task performs a certain action and the output of each task feeds into the next. The input to this pipeline is a set of data providers and the final output is a system populated with records of the research papers available from them. The key tasks currently performed as part of CORE’s harvesting system are (Fig. 8):

Metadata download: The metadata exposed by a data provider via OAI-PMH are downloaded and stored in the file system (typically as XML). The downloading process is sequential: a repository typically provides between 100 and 1,000 metadata records per request together with a resumption token, which is then used to request the next batch. As a result, a full harvest can take a significant amount of time (hours to days) for large data providers. This process has therefore been implemented to be resilient to a range of communication failures.

Metadata extraction : Metadata extraction parses, cleans, and harmonises the downloaded metadata and stores them into the CORE internal data structure (database). The harmonisation and cleaning process addresses the fact that different data providers/repository platforms describe the same information in different ways (syntactic heterogeneity) as well as having different interpretations for the same information (semantic heterogeneity).

Full text download : Using links extracted from the metadata, CORE attempts to download and store publication manuscripts. This process is non-trivial and is further described in the Using OAI-PMH for content harvesting section.

Information extraction : Plain text from the downloaded manuscripts is extracted and processed to create a semi-structured representation. This process includes a range of information extraction tasks, such as references extraction.

Enrichment : The enrichment task supplements both the metadata and the full texts harvested from the data providers with additional data from multiple sources. Some of the enrichments, such as language detection and document type detection, are performed directly by specific tasks in the pipeline. The remaining enrichments, which involve external datasets, are performed independently of the CHARS pipeline and ingested into the dataset as described in the Enrichments section.

Indexing : The final step in the harvesting pipeline is indexing the harvested data. The resulting index powers CORE’s services, including search, API and FastSync.
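The sequential metadata download step follows the OAI-PMH flow control defined in the protocol specification: the first `ListRecords` request carries `metadataPrefix`, each subsequent request carries only the `resumptionToken` returned by the previous response, and an empty or missing token marks the end of the list. The sketch below is a minimal, generic harvester, not CORE's implementation. It yields record identifiers and takes a caller-supplied `fetch(url)` function (a hypothetical parameter) so it can be tested offline; OAI-PMH error responses and retries are not handled.

```python
import urllib.parse
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def harvest_record_ids(base_url, fetch, metadata_prefix="oai_dc"):
    """Yield record identifiers from an OAI-PMH ListRecords response sequence."""
    token = None
    while True:
        if token:
            # resumptionToken is an exclusive argument: no other parameters allowed.
            params = {"verb": "ListRecords", "resumptionToken": token}
        else:
            params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        root = ET.fromstring(fetch(f"{base_url}?{urllib.parse.urlencode(params)}"))
        for record in root.iterfind("oai:ListRecords/oai:record", OAI_NS):
            yield record.findtext("oai:header/oai:identifier", namespaces=OAI_NS)
        token_el = root.find("oai:ListRecords/oai:resumptionToken", OAI_NS)
        token = (token_el.text or "").strip() if token_el is not None else ""
        if not token:
            return  # an empty or absent token marks the last batch
```

Each batch depends on the token from the previous one, which is why a single repository cannot be harvested in parallel this way.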

figure 8

CORE Harvesting Pipeline. Each task’s output becomes the input for the following task. In some cases the input is considered as a whole, for example all the content harvested from a data provider, while in other cases the output is split into multiple small tasks performed at the record level.
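The shape of this pipeline, a provider-level task whose output is split into record-level tasks that each feed the next, can be sketched in a few lines. All task names here are hypothetical stand-ins for the real CHARS services, and the validation step illustrates checking each record right after a task rather than only at the end.

```python
def extract_records(provider_payload):
    """Provider-level task: split one provider's harvested payload into records."""
    return [dict(r) for r in provider_payload["records"]]


def require_full_text(record):
    # Validate right after the step instead of only at the end of the pipeline.
    if not record.get("text", "").strip():
        raise ValueError("no full text extracted")
    return record


def add_word_count(record):
    # Stand-in for a real enrichment such as language or document type detection.
    record["word_count"] = len(record["text"].split())
    return record


def run_pipeline(provider_payload, record_tasks, index):
    """Run record-level tasks in order; index successes, report failures."""
    failures = []
    for record in extract_records(provider_payload):
        try:
            for task in record_tasks:
                record = task(record)
        except ValueError as err:
            failures.append((record["id"], str(err)))
            continue
        index[record["id"]] = record  # indexing is the final step
    return failures
```

For example, `run_pipeline(payload, [require_full_text, add_word_count], index)` indexes records with usable text and reports the others without stopping the rest of the batch.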

Scalable infrastructure requirements

We drew on our experience of developing and maintaining our harvesting system and took into consideration the features of the CiteSeerX 22 architecture. On this basis, we have defined a set of requirements for a scalable harvesting infrastructure 8. These requirements are generic and apply to any aggregation or digital library scenario. They informed, and are reflected in, the architecture design of CHARS (see the CHARS architecture section):

Easy to maintain: The system should be easy to manage, maintain, fix, and improve.

High levels of automation: The system should be completely autonomous while allowing manual interaction.

Fail fast: Items in the harvesting pipeline should be validated immediately after a task is performed, instead of having only one and final validation at the end of the pipeline. This has the benefit of recognising issues and enabling fixes earlier in the process.

Easy to troubleshoot: It should be easy to locate and diagnose bugs in the code.

Distributed and scalable: Adding compute resources should increase the system’s capacity, and scaling should be transparent and replicable.

No single point of failure: A single crash should not affect the whole harvesting pipeline; individual tasks should work independently.

Decoupled from user-facing systems: Any failure in the ingestion processing services should not have an immediate impact on user-facing services.

Recoverable: When a harvesting task stops, either manually or due to a failure, the system should be able to recover and resume the task without manual intervention.

Performance observable: The system’s progress must be properly logged at all times. Monitoring services layered on top of the system should give a transparent overview of its progress, so that scalability problems are detected early and potential bottlenecks are identified.
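The fail fast requirement can be illustrated with a small Python sketch. Every task is paired with a validator that runs immediately after it. An invalid intermediate result therefore stops the item at the step that produced it, not at the end of the pipeline. The `Step` structure and the way validators are expressed are assumptions made for illustration; they are not part of CHARS.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


class ValidationError(Exception):
    """Raised as soon as a task's output fails its check."""


@dataclass
class Step:
    name: str
    run: Callable[[Any], Any]
    validate: Callable[[Any], bool]


def run_pipeline(steps: List[Step], item: Any) -> Any:
    """Run steps in order and validate after every step (fail fast),
    instead of performing a single validation at the end."""
    for step in steps:
        item = step.run(item)
        if not step.validate(item):
            # Naming the failing step also supports easy troubleshooting.
            raise ValidationError(f"step '{step.name}' produced invalid output")
    return item
```

In this sketch, later steps never see an invalid item.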

CHARS architecture

An overview of CHARS is shown in Fig. 9. The system consists of the following main software components:

Scheduler: it becomes active when a task finishes. It monitors resource utilisation and selects and submits data providers to be harvested.

Queue (Q_n): a messaging system that assists with communication between parts of the harvesting pipeline. Every individual task, such as metadata download, metadata parsing, full text download and language detection, has its own message queue.

Worker (W_i): an independent, standalone application capable of executing a specific task. Every individual task has its own set of workers.

Figure 9

CORE Harvesting System.

A complete harvest of a data provider proceeds as follows. When an existing task finishes, the scheduler is activated and informed of the result. It then uses the formula described in Appendix A to assign a score to each data provider. The scheduler then checks current resource utilisation (i.e. whether there are any idle workers) and the number of data providers already scheduled for harvesting. Depending on these, it places the data provider with the highest score in the first queue, Q_1, which contains data providers scheduled for metadata download. When one of the metadata download workers W_i to W_j becomes available, it takes a data provider out of the queue and starts downloading its metadata. Upon completion, the worker notifies the scheduler. If the task completed successfully, the worker also places the data provider in the next queue. This process continues until the data provider has passed through the entire pipeline.
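The flow just described can be simulated in a single Python process, purely for illustration. Each task has its own queue, and a loop stands in for whichever worker becomes available next. The worker reports every result to a `notify` callback that represents the scheduler, and only providers whose task succeeded are handed to the next queue. The task names and handler functions are hypothetical. In CHARS the workers are independent applications running concurrently.

```python
from collections import deque
from typing import Callable, Dict, List

# Hypothetical task names; each task has its own queue (Q_1, Q_2, ...).
TASKS = ["metadata_download", "metadata_extraction", "fulltext_download"]


def simulate_queues(providers: List[str],
                    handlers: Dict[str, Callable[[str], bool]],
                    notify: Callable[[str, str, bool], None]) -> List[str]:
    """Move data providers through one queue per task, in pipeline order.

    Returns the providers that passed through the entire pipeline.
    """
    queues = {task: deque() for task in TASKS}
    queues[TASKS[0]].extend(providers)  # the scheduler fills Q_1
    completed = []
    for i, task in enumerate(TASKS):
        while queues[task]:
            provider = queues[task].popleft()  # an available worker takes it
            ok = handlers[task](provider)
            notify(task, provider, ok)  # the worker informs the scheduler
            if not ok:
                continue  # a failure only stops this provider
            if i + 1 < len(TASKS):
                queues[TASKS[i + 1]].append(provider)  # hand over to next queue
            else:
                completed.append(provider)
    return completed
```

A failing provider is dropped from the flow without blocking the others, which mirrors the "no single point of failure" requirement.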

Some tasks in the pipeline, specifically metadata download and parsing, must be performed at the granularity of data providers. Other tasks, such as full text extraction and language detection, can be performed at the granularity of individual records. These record-level tasks are initially scheduled per data provider. However, only those records of the selected data provider that require processing are then placed, independently of each other, in the appropriate queue. Workers assigned to these tasks process the individual records in the queue, and each record moves on through the pipeline once completed.
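The switch from provider-level to record-level scheduling can be sketched as follows. The `Record` fields are assumptions chosen for illustration, and so is the rule for deciding that a record still needs language detection.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Optional


@dataclass
class Record:
    record_id: str
    full_text: Optional[str] = None
    language: Optional[str] = None


def fan_out_language_detection(records: List[Record],
                               queue: Deque[str]) -> int:
    """Scheduled for a whole data provider, but enqueues individual records:
    only those with a full text and no detected language yet."""
    pending = [r.record_id for r in records
               if r.full_text and r.language is None]
    queue.extend(pending)
    return len(pending)
```

Record-level workers would then take record IDs from this queue one at a time.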

A more detailed description of CHARS, including the technologies used to implement it, can be found in 8.

The harvesting scheduler is the component responsible for identifying which data providers need to be harvested next and placing them in the harvesting queue. In the original design of CORE, our harvesting schedule was created manually and assigned the same harvesting frequency to every data provider. We found this approach inefficient because it does not scale. Data providers vary in size, their databases are updated at different frequencies, and their repository platforms have different maximum data delivery speeds. To address these limitations, we designed the CHARS scheduler according to our new concept of “pro-active harvesting”. This means that the scheduler is event driven. It is triggered whenever resources become available in the underlying hardware infrastructure, and it then determines which data provider should be harvested next. The underlying idea is to maximise the number of documents ingested per unit of time. The pseudocode and the formula we use to determine which repository to harvest next are described in Algorithm 1.

The size of the metadata download queue, i.e. the queue which represents an entry into the harvesting pipeline, is kept limited in order to keep the system responsive to the prioritisation of data providers. A long queue makes prioritising data providers harder, as it is not known beforehand how long the processing of a particular data provider will take. An appropriate size of the queue ensures a good balance between the reactivity and utilisation of the available resources.
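The following is a minimal sketch of the event-driven scheduling step. It assumes a caller-supplied `score` function that stands in for the formula in Appendix A, which is not reproduced here. The handler runs when a task finishes and does nothing if no workers are idle. Otherwise it fills the bounded entry queue with the highest-scoring providers that are not yet scheduled. Here `queue.Queue(maxsize=...)` provides the limited queue length discussed above.

```python
import heapq
import queue
from typing import Callable, Set


def on_task_finished(entry_queue: "queue.Queue[str]",
                     providers: Set[str],
                     scheduled: Set[str],
                     score: Callable[[str], float],
                     idle_workers: int) -> None:
    """Event handler called whenever a task finishes.

    `score` is a placeholder for the formula in Appendix A. The entry queue
    is created with a small maxsize so that prioritisation stays responsive.
    """
    if idle_workers == 0:
        return  # no free resources: nothing to schedule now
    # Max-heap via negated scores; ties are broken by provider name.
    candidates = [(-score(p), p) for p in providers - scheduled]
    heapq.heapify(candidates)
    while candidates and not entry_queue.full():
        _, best = heapq.heappop(candidates)
        entry_queue.put_nowait(best)
        scheduled.add(best)
```

With a small `maxsize`, a provider whose score rises sharply can still be picked at the next event instead of waiting behind a long backlog.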

Using OAI-PMH for content harvesting

We now describe the third key technical innovation, which enables us to harvest full text content (as opposed to just metadata) from data providers using the OAI-PMH protocol. This process represents one step in the harvesting pipeline (Fig. 9). Specifically, it is the third step, which is activated after the data provider’s metadata have been downloaded and parsed.

The OAI-PMH protocol was originally designed for metadata harvesting only. Owing to its wide adoption and the lack of alternatives, however, it has been used as an entry point for full text harvesting from repositories. Full text harvesting is achieved by using URLs found in the metadata records to discover the location of the actual resource and subsequently downloading it 9. We summarised the key challenges of this approach in the Challenges related to the use of OAI-PMH protocol for content harvesting section.

The procedure works as follows. First, all metadata records from a selected data provider that have no full text are collected. Records for which a full text download was attempted within the retry period (RP), usually six months, are filtered out. This avoids repeatedly downloading URLs that do not lead to the sought-after documents. The downside of this approach is that if a data provider updates a link in the metadata, it might take up to the length of the retry period to acquire the full text.
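The retry-period filter can be expressed as a short Python function. The 182-day value approximates the six months mentioned above. The dictionaries that record full text availability and the time of the last attempt are hypothetical stand-ins for CORE's internal database.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional

RETRY_PERIOD = timedelta(days=182)  # assumed value for "usually six months"


def records_to_attempt(records: List[str],
                       has_full_text: Dict[str, bool],
                       last_attempt: Dict[str, Optional[datetime]],
                       now: datetime,
                       retry_period: timedelta = RETRY_PERIOD) -> List[str]:
    """Keep records without a full text whose last download attempt,
    if there was one, lies outside the retry period."""
    selected = []
    for rid in records:
        if has_full_text.get(rid, False):
            continue  # already has a full text
        tried = last_attempt.get(rid)
        if tried is None or now - tried >= retry_period:
            selected.append(rid)
    return selected
```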

Algorithm 1


Next, the candidate URLs of the remaining records are filtered using a set of rules and heuristics we developed. These rules aim to (a) increase the chances of quickly identifying the URL that leads to the described document and (b) ensure that we identify the correct document. They include:

Accepted file extensions: URLs are filtered according to a list of accepted file extensions. URLs ending with extensions such as .pptx that clearly indicate that the URL does not link to the required resource are removed from the list.

Same domain policy: URLs in the OAI-PMH metadata can link to any resource on any domain. For example, records commonly link to an associated presentation, dataset or other related resource, and these are often stored in external databases. A simple way to avoid downloading resources that are very unlikely to be the target document is therefore to filter out all URLs that lead to an external domain, i.e. a domain different from that of the data provider. Exceptions include the dx.doi.org and hdl.handle.net domains, whose purpose is to provide persistent identifiers pointing to the document. The same domain policy is disabled for data providers that are aggregators, because these link to many different domains by design.

Provider-specific crawling heuristics: Many data providers follow a specific pattern when composing URLs. For example, a link to a full text document may be composed of the following parts: data provider URL + record handle + .pdf. For data providers using such patterns, URLs can be composed automatically when the relevant information (the record handle) is known to us from the metadata. These generated URLs are then added to the list of URLs obtained from the metadata.

Prioritising certain URLs: A PDF URL is more likely than an HTML URL to contain the target record. The final step is therefore to sort URLs according to file and URL type. The highest priority is assigned to URLs that use repository-software-specific patterns to identify full text, document and PDF file types. The lowest priority is assigned to hdl.handle.net URLs.
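The filtering and prioritisation rules above can be sketched in Python with the standard `urllib.parse` module. The concrete extension list is an illustrative assumption. So are the `/bitstream/` path pattern (as used by DSpace) and the three-level priority scale. The accepted-extensions rule is expressed here as a list of rejected extensions, so that links without an extension, such as landing pages, are kept for the next step.

```python
from typing import List, Optional
from urllib.parse import urlparse

# Illustrative values only; CORE's actual lists are not published here.
REJECTED_EXTENSIONS = (".pptx", ".ppt", ".docx", ".zip", ".jpg", ".png")
PID_DOMAINS = {"dx.doi.org", "hdl.handle.net"}


def url_priority(url: str) -> int:
    """Lower is better: direct PDF/bitstream links first, handles last."""
    parsed = urlparse(url)
    path = parsed.path.lower()
    if parsed.hostname == "hdl.handle.net":
        return 2
    if path.endswith(".pdf") or "/bitstream/" in path:
        return 0
    return 1


def candidate_urls(urls: List[str], provider_domain: str,
                   is_aggregator: bool = False,
                   pattern_url: Optional[str] = None) -> List[str]:
    """Filter and order the URLs found in (or derived from) a metadata record."""
    if pattern_url:  # provider-specific heuristic, e.g. base URL + handle + ".pdf"
        urls = urls + [pattern_url]
    kept = []
    for url in urls:
        parsed = urlparse(url)
        host = (parsed.hostname or "").lower()
        if parsed.path.lower().endswith(REJECTED_EXTENSIONS):
            continue  # accepted file extensions rule
        same_domain = (host == provider_domain
                       or host.endswith("." + provider_domain))
        if not (is_aggregator or same_domain or host in PID_DOMAINS):
            continue  # same domain policy, except persistent identifier resolvers
        kept.append(url)
    return sorted(kept, key=url_priority)  # stable: ties keep original order
```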

The system then attempts to request and download the document at each URL. After each download, checks are performed to determine whether the downloaded document represents the target record. Currently, the downloaded document has to be a valid PDF with a title matching the original metadata record. If the target record is identified, the downloaded document is stored and the download process for that record ends. If the downloaded document is an HTML page, URLs are extracted from it and filtered using the method described above. This is necessary because some of the most widely used repository systems, such as DSpace, commonly do not reference documents directly from the metadata records. Instead, the records typically link to an HTML overview page of the document. To deal with this problem, we use the concept of harvesting levels, where the maximum harvesting level corresponds to the maximum search depth for the referenced document. The algorithm finishes either as soon as the first matching document is found or after all available URLs up to the maximum harvesting level have been exhausted. Algorithm 2 describes our approach for collecting full texts using the OAI-PMH protocol, which follows a depth-first search strategy with prioritisation.
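The following compact Python sketch shows a depth-limited, prioritised search over harvesting levels. It assumes three caller-supplied, hypothetical functions. `fetch` returns a content type and body for a URL. `filter_urls` applies the filtering and ordering rules described above. `title_matches` compares the PDF's title with the metadata record; extracting titles from PDFs is outside the scope of this sketch. HTML pages are parsed with the standard library's `html.parser` to collect links for the next level.

```python
from html.parser import HTMLParser
from typing import Callable, List, Optional, Set, Tuple
from urllib.parse import urljoin


class _LinkParser(HTMLParser):
    """Collects href values of <a> tags on an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def find_full_text(urls: List[str],
                   fetch: Callable[[str], Tuple[str, bytes]],
                   filter_urls: Callable[[List[str]], List[str]],
                   title_matches: Callable[[bytes], bool],
                   max_level: int = 2,
                   level: int = 0,
                   seen: Optional[Set[str]] = None) -> Optional[bytes]:
    """Depth-first search over candidate URLs, best-ranked first. Returns the
    first PDF whose title matches the record, or None."""
    seen = set() if seen is None else seen
    for url in filter_urls(urls):  # filtered and sorted by priority
        if url in seen:
            continue
        seen.add(url)
        content_type, body = fetch(url)
        if content_type == "application/pdf" and body.startswith(b"%PDF-"):
            if title_matches(body):
                return body  # the first matching document ends the search
        elif content_type == "text/html" and level < max_level:
            parser = _LinkParser()
            parser.feed(body.decode("utf-8", errors="replace"))
            parser.close()
            links = [urljoin(url, href) for href in parser.links]
            found = find_full_text(links, fetch, filter_urls, title_matches,
                                   max_level, level + 1, seen)
            if found is not None:
                return found
    return None
```

The shared `seen` set prevents the search from downloading the same URL twice when overview pages link to each other.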

Algorithm 2


CHARS limitations

Although CHARS overcomes the key obstacles to scalable harvesting of content from repositories, a number of important challenges remain. The first is the difficulty of estimating the optimal number of workers needed for the system to run efficiently. Worker allocation is still largely determined empirically, but we are investigating more sophisticated approaches based on formal models of distributed computation, such as Petri nets. This will allow us to explore new ways of dynamically allocating and launching workers to optimise the use of our resources.

Enrichments

Conceptually, two types of enrichment processes are used within CORE (Fig. 10): 1) an online enrichment process, which enriches a single record at the time it is processed by the CHARS pipeline, and 2) a periodic offline enrichment process, which enriches a record based on information in external datasets.

Figure 10

CORE Offline Enrichments.

Online enrichments

Online enrichments are fully integrated into the CHARS pipeline described earlier in this section. These enrichments generally apply machine learning models and rule-based tools to gather additional insights about the record, such as its language and document type. Unlike offline enrichments, online enrichments are performed only once for a given record. The following is a list of the enrichments currently performed online:

Article type detection: A machine learning algorithm assigns each publication one of the following four types: presentation, thesis, research paper, other. In the future we may include other types.

Language identification: This task uses third-party libraries to identify the language based on the full text of a document. The resulting language is then compared to the one provided by the metadata record. Some heuristics are applied to disambiguate and harmonise languages.
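The comparison between the detected language and the language in the metadata might look like the following sketch. The detected code and its confidence are assumed to come from a third-party language identifier. The alias table and the confidence threshold used to resolve disagreements are hypothetical heuristics, not the ones CORE applies.

```python
from typing import Optional

# Hypothetical mapping of metadata variants to ISO 639-1 codes.
LANGUAGE_ALIASES = {
    "en": "en", "eng": "en", "english": "en", "en-gb": "en", "en-us": "en",
    "de": "de", "deu": "de", "ger": "de", "german": "de",
    "fr": "fr", "fra": "fr", "fre": "fr", "french": "fr",
}


def normalise_language(value: Optional[str]) -> Optional[str]:
    """Harmonise a free-text or coded language value, if recognised."""
    if not value:
        return None
    return LANGUAGE_ALIASES.get(value.strip().lower())


def harmonise_language(metadata_lang: Optional[str],
                       detected_lang: Optional[str],
                       confidence: float,
                       threshold: float = 0.9) -> Optional[str]:
    """Reconcile the metadata language with the one detected from the full
    text. The threshold-based tie-break is an illustrative assumption."""
    meta = normalise_language(metadata_lang)
    detected = normalise_language(detected_lang)
    if meta is None or meta == detected:
        return detected or meta
    if detected is not None and confidence >= threshold:
        return detected  # trust a confident detection over the metadata
    return meta
```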

Offline enrichments

Offline enrichments are carried out by gathering a range of information from large third-party scholarly datasets (research graphs). Such information includes metadata that typically do not change, such as a DOI, as well as metadata that evolve, such as the number of citations. Largely because of the latter, CORE performs offline enrichments periodically, i.e. all records in CORE go through this process repeatedly at specified time intervals (currently once per month).

The process is depicted in Fig. 10. The initial mapping of a record is carried out using its DOI, if available. However, the majority of records from repositories do not come with a DOI. For these, we carry out a matching process against the Crossref database using a subset of metadata fields, including title, authors and year. Once the mapping is performed, we can harmonise fields and gather a wide range of additional useful data from relevant external databases, thereby enriching the CORE record. Such data include ORCID identifiers, citation information, additional links to freely available full texts, field of study information and PubMed identifiers. Our solution is based on a set of map-reduce tasks that enrich the dataset and is implemented on a Cloudera Enterprise Data Hub (https://www.cloudera.com/products/enterprise-data-hub.html) 23, 24, 25, 26.
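The DOI-first mapping followed by metadata matching can be sketched with Python's `difflib`. The following are assumptions for illustration: the record layout (a list of author surnames and an integer year), the 0.95 similarity threshold, and the use of the first author only. In CORE this matching runs as map-reduce tasks over the Crossref data rather than over a small candidate list.

```python
import re
import unicodedata
from difflib import SequenceMatcher
from typing import Dict, List, Optional


def normalise_title(title: str) -> str:
    """Lower-case, strip accents and collapse punctuation to single spaces."""
    ascii_text = (unicodedata.normalize("NFKD", title)
                  .encode("ascii", "ignore").decode("ascii"))
    return re.sub(r"[^a-z0-9]+", " ", ascii_text.lower()).strip()


def first_author(rec: Dict) -> str:
    return rec["authors"][0].lower() if rec.get("authors") else ""


def match_record(record: Dict, candidates: List[Dict],
                 min_similarity: float = 0.95) -> Optional[str]:
    """Return a DOI for the record, or None if no candidate is close enough."""
    if record.get("doi"):
        return record["doi"]  # the DOI is used directly when available
    title = normalise_title(record["title"])
    best_doi, best_score = None, min_similarity
    for cand in candidates:
        if cand.get("year") != record.get("year"):
            continue
        ours, theirs = first_author(record), first_author(cand)
        if ours and theirs and ours != theirs:
            continue
        score = SequenceMatcher(None, title, normalise_title(cand["title"])).ratio()
        if score >= best_score:
            best_doi, best_score = cand["doi"], score
    return best_doi
```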

Data availability

CORE provides several large data dumps of the processed and aggregated data under the ODC-BY licence ( https://core.ac.uk/documentation/dataset ). The only condition for both commercial and non-commercial reuse of these datasets is to acknowledge the use of CORE in their outputs. Additionally, CORE makes its API and most recent data dump freely available to registered individual users and researchers. Please note that CORE claims no rights in the aggregated content itself which is open access and therefore freely available to everyone. All CORE data rights correspond to the sui generis database rights of the aggregated and processed collection.

Licences for CORE services, such as the API and FastSync, are available for commercial users who want convenient access to CORE data with a guaranteed level of customer support. The organisation running CORE, i.e. The Open University, is a charitable organisation fully committed to the Open Research mission. CORE is a signatory of the Principles of Open Scholarly Infrastructure (POSI) (https://openscholarlyinfrastructure.org/posse). CORE does not operate for profit. Instead, CORE’s income from licences to commercial parties is used solely to make CORE sustainable by reducing its reliance on unstable project grants, thus offsetting and reducing the cost of CORE to the taxpayer. This is done in full compliance with the principles and best practices of sustainable open science infrastructure.

Code availability

CORE consists of multiple services. Most of our source code is open source and available in our public repository on GitHub (https://github.com/oacore/). Unfortunately, we are not yet able to release the source code of our data ingestion module. However, we want to be as transparent as possible with our community, so we have documented the key algorithms and processes we apply in this paper using pseudocode.

Bornmann, L. & Mutz, R. Growth rates of modern science: A bibliometric analysis based on the number of publications and cited references. JASIST 66 (11), 2215–2222 (2015).

Piwowar, H. et al. The State of OA: A large-scale analysis of the prevalence and impact of Open Access articles. PeerJ 6, e4375 (2018).

Saggion, H. & Ronzano, F. Scholarly data mining: making sense of scientific literature. 2017 ACM/IEEE Joint Conference on Digital Libraries (JCDL): 1–2 (2017).

Kim, E. et al. Materials synthesis insights from scientific literature via text extraction and machine learning. Chemistry of Materials 29 (21), 9436–9444 (2017).

Jacobs, N. & Ferguson, N. Bringing the UK’s open access research outputs together: Barriers on the Berlin road to open access. Jisc Repository (2014).

Knoth, P. & Pontika, N. Aggregating Research Papers from Publishers’ Systems to Support Text and Data Mining: Deliberate Lack of Interoperability or Not? In: INTEROP2016 (2016).

Herrmannova, D., Pontika, N. & Knoth, P. Do Authors Deposit on Time? Tracking Open Access Policy Compliance. Proceedings of the 2019 ACM/IEEE Joint Conference on Digital Libraries, Urbana-Champaign, IL (2019).

Cancellieri, M., Pontika, N., Pearce, S., Anastasiou, L. & Knoth, P. Building Scalable Digital Library Ingestion Pipelines Using Microservices. Proceedings of the 11th International Conference on Metadata and Semantics Research (MTSR 2017): 275–285. Springer (2017).

Knoth, P. From open access metadata to open access content: two principles for increased visibility of open access content. Proceedings of the 2013 Open Repositories Conference, Charlottetown, Prince Edward Island, Canada (2013).

Knoth, P., Cancellieri, M. & Klein, M. Comparing the Performance of OAI-PMH with ResourceSync. Proceedings of the 2019 Open Repositories Conference, Hamburg, Germany (2019).

Kapidakis, S. Metadata Synthesis and Updates on Collections Harvested Using the Open Archive Initiative Protocol for Metadata Harvesting. Digital Libraries for Open Knowledge. TPDL 2018. Lecture Notes in Computer Science 11057, 16–31 (2018).

Knoth, P. & Zdrahal, Z. CORE: three access levels to underpin open access. D-Lib Magazine 18 (11/12) (2012).

Haslhofer, B. et al. ResourceSync: leveraging sitemaps for resource synchronization. Proceedings of the 22nd International Conference on World Wide Web: 11–14 (2013).

Khabsa, M. & Giles, C. L. The number of scholarly documents on the public web. PLOS One 9 (5), e93949 (2014).

Charalampous, A. & Knoth, P. Classifying document types to enhance search and recommendations in digital libraries. Research and Advanced Technology for Digital Libraries. TPDL 2017. Lecture Notes in Computer Science 10450, 181–192 (2017).

Rosvall, M. & Bergstrom, C. T. Maps of random walks on complex networks reveal community structure. Proceedings of the National Academy of Sciences 105 (4), 1118–1123 (2008).

D’Angelo, C. A. & Abramo, G. Publication rates in 192 research fields of the hard sciences. Proceedings of the 15th ISSI Conference: 915–925 (2015).

Ammar, W. et al. Construction of the Literature Graph in Semantic Scholar. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers): 84–91 (2018).

Knoth, P. et al. Towards effective research recommender systems for repositories. Open Repositories, Bozeman, USA (2017).

Pride, D. & Knoth, P. An Authoritative Approach to Citation Classification. Proceedings of the 2020 ACM/IEEE Joint Conference on Digital Libraries (JCDL 2020), Virtual–China (2020).

Newman, S. Building microservices: designing fine-grained systems. O’Reilly Media, Inc. (2015).

Li, H. et al. CiteSeerX: a scalable autonomous scientific digital library. Proceedings of the 1st International Conference on Scalable Information Systems, ACM (2006).

Bastian, H., Glasziou, P. & Chalmers, I. Seventy-five trials and eleven systematic reviews a day: how will we ever keep up? PLoS Medicine 7 (9), e1000326 (2010).

Shojania, K. G. et al. How quickly do systematic reviews go out of date? A survival analysis. Annals of Internal Medicine 147 (4), 224–233 (2007).

Tsafnat, G. et al. Systematic review automation technologies. Systematic Reviews 3 (1), 74 (2014).

Harzing, A.-W. & Alakangas, S. Microsoft Academic is one year old: The Phoenix is ready to leave the nest. Scientometrics 112 (3), 1887–1894 (2017).

Acknowledgements

We would like to acknowledge the generous support of Jisc, under a number of grants and service contracts with The Open University. These included the projects CORE, ServiceCORE, UK Aggregation (1 and 2) and DiggiCORE, which was co-funded by Jisc with NWO. Since 2015, CORE has been supported in three iterations under the Jisc Digital Services–CORE (JDSCORE) service contract with The Open University. Within Jisc, we would like to thank primarily the CORE project managers, Andy McGregor, Alastair Dunning, Neil Jacobs and Balviar Notay. We would also like to thank the European Commission for funding that contributed to CORE, namely OpenMinTeD (739563) and EOSC Pilot (654021). We would like to show our gratitude to all current CORE Team members who contributed to CORE but are not authors of the manuscript, namely Valeriy Budko, Ekaterine Chkhaidze, Viktoriia Pavlenko, Halyna Torchylo, Andrew Vasilyev and Anton Zhuk. We are also grateful to all past CORE Team members who have contributed to CORE over the years, namely Lucas Anastasiou, Giorgio Basile, Aristotelis Charalampous, Josef Harag, Drahomira Herrmannova, Alexander Huba, Bikash Gyawali, Tomas Korec, Dominika Koroncziova, Magdalena Krygielova, Catherine Kuliavets, Sergei Misak, Jakub Novotny, Gabriela Pavel, Vojtech Robotka, Svetlana Rumyanceva, Maria Tarasiuk, Ian Tindle, Bethany Walker, Viktor Yakubiv, Zdenek Zdrahal and Anna Zelinska.

Author information

Drahomira Herrmannova

Present address: Oak Ridge National Laboratory, Oak Ridge, TN, USA

Authors and Affiliations

Knowledge Media Institute, The Open University, Walton Hall, Milton Keynes, UK

Petr Knoth, Drahomira Herrmannova, Matteo Cancellieri, Lucas Anastasiou, Nancy Pontika, Samuel Pearce, Bikash Gyawali & David Pride


Contributions

P.K. is the Founder and Head of CORE. He conceived the idea and has been the project lead since the start in 2011. He researched and created the first version of CORE, acquired funding, built the team, and has been managing and leading all research and development. M.C., L.A., S.P. and P.K. designed the system, worked out all technical details, and implemented significant parts of it, including CHARS, the harvesting scheduler and the OAI-PMH content harvesting method. All authors contributed to the maintenance, operation and improvement of the system. D.H. drafted the initial version of the manuscript based on consultations with P.K. D.P. and P.K. wrote the final manuscript with additional input from L.A. and N.P. D.H., M.C. and L.A. performed the data analysis for the paper and D.H. produced the figures. D.H., D.P., B.G. and L.A. participated in research activities and tasks related to CORE, following the instructions of, and under the direct supervision of, P.K.

Corresponding author

Correspondence to Petr Knoth.

Ethics declarations

Competing interests.

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article.

Knoth, P., Herrmannova, D., Cancellieri, M. et al. CORE: A Global Aggregation Service for Open Access Papers. Sci Data 10 , 366 (2023). https://doi.org/10.1038/s41597-023-02208-w


Received : 18 May 2021

Accepted : 03 May 2023

Published : 07 June 2023

DOI : https://doi.org/10.1038/s41597-023-02208-w

The BMJ research homepage: make an impact, change clinical practice

At a glance: Research


Delirium and incident dementia in hospital patients


Glucagon-like peptide 1 receptor agonist use and risk of thyroid cancer


Quantifying possible bias in clinical and epidemiological studies with quantitative bias analysis: common approaches and limitations


Use of progestogens and the risk of intracranial meningioma


Community based complex interventions to sustain independence in older people

Research papers

  • Efficacy of a single low dose of esketamine after childbirth for mothers with symptoms of prenatal depression
  • Derivation and external validation of a simple risk score for predicting severe acute kidney injury after intravenous cisplatin
  • Quality and safety of artificial intelligence generated health information
  • Current safeguards, risk mitigation, and transparency measures of large language models against the generation of health disinformation
  • Assessing robustness to worst case publication bias using a simple subset meta-analysis
  • 25 year trends in cancer incidence and mortality among adults in the UK
  • Cervical pessary versus vaginal progesterone in women with a singleton pregnancy
  • Comparison of prior authorization across insurers
  • Diagnostic accuracy of magnetically guided capsule endoscopy with a detachable string for detecting oesophagogastric varices in adults with cirrhosis
  • Ultra-processed food exposure and adverse health outcomes
  • Added benefit and revenues of oncology drugs approved by the European Medicines Agency between 1995 and 2020
  • Regression discontinuity design studies: a guide for health researchers
  • Exposure-response associations between chronic exposure to fine particulate matter and risks of hospital admission for major cardiovascular diseases
  • Short term exposure to low level ambient fine particulate matter and natural cause, cardiovascular, and respiratory morbidity
  • Optimal timing of influenza vaccination in young children
  • Effect of exercise for depression
  • Association of non-alcoholic fatty liver disease with cardiovascular disease and all cause death in patients with type 2 diabetes mellitus
  • Process guide for inferential studies using healthcare data from routine clinical practice to evaluate causal effects of drugs (PRINCIPLED): considerations from the FDA Sentinel Innovation Center
  • Duration of cardiopulmonary resuscitation and outcomes for adults with in-hospital cardiac arrest
  • Clinical effectiveness of an online supervised group physical and mental health rehabilitation programme for adults with post-covid-19 condition
  • Updated recommendations for the Cochrane rapid review methods guidance

Research news

  • Covid-19: nearly 20% of patients receive psychiatric diagnosis within three months of covid
  • Antibiotics are as good as surgery for appendicitis, study reports
  • Black babies are less likely to die when cared for by black doctors, US study finds


Latest video

Light therapy for depression . . . and other stories

Yellow plaques on the trunk and limbs

Senile non-rheumatic mitral valve calcification


University of Bristol Law School

Law working papers series.


Welcome to the Bristol Law Working Papers Series. The series publishes a broad range of legal scholarship in all subject areas from members of the University of Bristol Law School. All papers are published electronically and available to download as pdf files.

Working papers

Exceptions and Regulatory Autonomy (PDF, 1,504kB) Author: Joshua Paine

Default Norms in Labour Law- From Private Right to Public Law (PDF, 1,525kB) Author: Alan Bogg

An Analysis of the UK–Australia FTA’s Investment Chapter (PDF, 630kB) Author: Joshua Paine

A Kantian moral cosmopolitan approach to teaching professional legal ethics (PDF, 693kB) Author: Omar Madhloom

COVID-19 at Work: How risk is assessed & its consequences in England & Sweden (PDF, 837kB) Authors: Peter Andersson and Tonia Novitz

Capturing the value of community fuel poverty alleviation (PDF, 1,891kB) Authors: Colin Nolden, Daniela Rossade and Peter Thomas

Bridging the Spaces in-between? The IWGB and Strategic Litigation (PDF, 522kB) Author: Manoj Dias-Abey



The research papers and reports covering our sectors are independent. Please note that, while we have consulted on them, we do not necessarily endorse any findings or recommendations contained within them.

Many of our public bodies also produce a range of research on our sectors. Links to their websites can be found on our research links page.

02/07/2012 Cost-Benefit Analysis of Radio Switchover: Methodology Report

12/08/2011 International comparisons of public engagement in culture and sport

20/06/2011 DCMS Longitudinal Data Library

24/05/2011 Digital Radio Switchover: Willingness to Pay and Consumer Behaviour Research

24/05/2011 Intertek - Research Study on Energy Consumption of Digital Radios: Phase 2

15/12/2010 Measuring the value of culture: a report to the Department for Culture, Media and Sport

05/11/2010 Understanding the relationship between taste and value in culture and sport

Cultural and Sporting Evidence (CASE) Research

22/03/2010 Libraries Omnibus

09/11/2009 Digital Britain: Attitudes to supporting non-BBC regional news from the TV licence fee

06/10/2009 London 2012 Olympic and Paralympic Games impacts and legacy evaluation framework

16/06/2009 To accompany the Final Digital Britain Report, the Department has also published ‘Digital Britain: Attitudes towards internet content among adults’.

02/01/2009 Commissioned report ‘Capturing the Impact of Libraries’ by BOP Consulting has been published.

23/06/2008 Research report by the National Children’s Bureau, Play and Exercise in Early Years: Physically active play in early childhood provision

26/02/2008 Scoping study for a UK Gambling Act 2005 impact assessment framework

17/12/2007 Survey of Live Music in England and Wales in 2007

23/10/2007 DCMS commissioned BMG Research to study Residential care and nursing homes: readiness for digital switchover

26/09/2007 The feasibility of a live music economic impact study

01/08/2007 An Assessment of Productivity Indicators for the Creative Industries

24/07/2007 Evaluation of the Cultural Pathfinder programme

06/07/2007 “Culture on demand” report, proposing practical ways to engage the broadest possible audience for culture by building on existing demand

16/05/2007 Framework for evaluating Cultural Policy Investment

10/01/2007 BMRB on Digital switchover - readiness of social housing

07/12/2006 Licensing Act 2003: the experience of smaller establishments in applying for live music authorisation

For a full archive of previous research, see the publications page in The National Archives.

DCMS has licensed Qualtrics online survey software enabling us to undertake sophisticated online surveys and in-house consultations.




COMMENTS

  1. CORE

    “Aggregation plays an increasingly essential role in maximising the long-term benefits of open access, helping to turn the promise of a 'research commons' into a reality. The aggregation services that CORE provides therefore make a very valuable contribution to the evolving open access environment in the UK.” (Research Policy Adviser)

  2. Free to access articles

    Most of our oldest content is now freely available, specifically, all papers older than 70 years. In addition, papers published between 10 years ago and either 12 months ago (biological sciences and history of science) or 24 months ago (physical sciences) from online issue publication are freely available. For Biographical Memoirs all issues ...

  3. Google Scholar

    Google Scholar provides a simple way to broadly search for scholarly literature. Search across a wide variety of disciplines and sources: articles, theses, books, abstracts and court opinions.

  4. Access To Research

    Discover a world of published academic research at your local library. Access to Research gives free, walk-in access to over 30 million academic articles in participating public libraries across the UK. Start now by viewing which articles and journals are available from home, then find a participating library where you can view the full text.

  5. ResearchGate

    Access 160+ million publications and connect with 25+ million researchers. Join for free and gain visibility by uploading your research.

  6. Research

    Oxford University welcomes UK associate membership of Horizon Europe. Horizon Europe is the EU's funding programme for research and innovation projects for the years 2021 to 2027. The programme has a budget of €95.5 billion (£81bn). It is the successor to Horizon 2020 and the previous Framework Programmes for Research and Technological ...

  7. Research

    A briefing paper explaining how council tax is applied to empty properties in England, Scotland and Wales, including the 'empty homes premium'. ... Research covering key Brexit moments, negotiations, the EU and its institutions, and UK-EU relations after Brexit. Coronavirus. Research relating to Covid-19 from the Commons Library and other ...

  8. The varying impacts of COVID-19 and its related measures in the UK: A

    Nonetheless, one paper discusses this issue of nonrandom sample selection and demonstrates that the bias due to sample selection is very limited ... Hu Y. Intersecting ethnic and native-migrant inequalities in the economic impact of the COVID-19 pandemic in the UK. Research in Social Stratification and Mobility. 2020;68: 100528. pmid:32834346 ...

  9. British Journal of Cancer

    Special 19 Apr 2021. Open for submissions. Published in association with Cancer Research UK. Its mission has always been to encourage communication of research from laboratories ...

  10. British Journal of Psychology

    The power threat meaning framework 5 years on − A scoping review of the emergent empirical literature. Counselling children and adolescents: Working in school and clinical mental health settings (Special Indian edition) By Jolie Ziomek-Daigle, New York, NY: Routledge. 2017. UK £56.99. ISBN: 9780367240356.

  11. A systematic review of forest schools literature in England

    This paper draws on the breadth of Forest School research literature spanning the past ten years in order to categorise theorisations across the papers. As Forest Schools in the UK are still a fairly recent development research is still limited in quantity and can lack theorisation at a broader level of abstraction.

  12. Search

    Find the research you need | With 160+ million publications, 1+ million questions, and 25+ million researchers, this is where everyone can access science

  13. Research briefings

    Our flagship briefings, POSTnotes and POSTbriefs, are publicly available. They are a product of peer review and rigorous horizon scanning. POST works on a range of topics including climate change, education, health and social care, digital tech and more. UK Parliament produces impartial analysis and research on a variety of topics.

  14. JSTOR Home

    Harness the power of visual materials—explore more than 3 million images now on JSTOR. Enhance your scholarly research with underground newspapers, magazines, and journals. Explore collections in the arts, sciences, and literature from the world's leading museums, archives, and scholars. JSTOR is a digital library of academic journals ...

  15. Scientific Publications

    Since its inception, the Faraday Institution has contributed over 777 publications to the scientific literature, more than 85 of which represent collaborative work across Faraday Institution research projects. The following statistical data derives from the SciVal record from April 2018 to October 2023, which recognises 733 papers and 2,228 ...

  16. CORE: A Global Aggregation Service for Open Access Papers

    Abstract. This paper introduces CORE, a widely used scholarly service, which provides access to the world's largest collection of open access research publications, acquired from a global ...

  17. The BMJ research homepage: make an impact, change clinical practice

    Original medical research, research reviews and news, research methods and reporting, ... Cases of cancer among UK men and women aged 35-69 years have seen a modest rise over the past quarter of a century, but there has also been a substantial decline in death rates, finds this study ... Research papers. Research paper Comparison of prior ...

  18. Legal research papers

    Working papers. 2022. An Analysis of the UK-Australia FTA's Investment Chapter (PDF, 630kB) Author: Joshua Paine A Kantian moral cosmopolitan approach to teaching professional legal ethics (PDF, 693kB) Author: Omar Madhloom COVID-19 at Work: How risk is assessed & its consequences in England & Sweden (PDF, 837kB) Authors: Peter Andersson and Tonia Novitz

  19. PDF International comparison of the UK research base, 2022

    Introduction. This note summarises key findings from the latest 'International comparison of the UK research base' statistical release[2] and is an update of the 2019 release[3]. The release evaluates the UK's research performance in an international setting, by comparing different aspects of scholarly outputs across a selection of comparators.

  20. The Complete Guide to Writing Computer Science Research Papers for UK

    Developing Research Skills: Writing a research paper for Computer Science in the UK requires thorough reading, analysis, and evaluation of the literature. The student's capacity to gather data ...

  21. Research

    Scoping study for a UK Gambling Act 2005, impact assessment framework. 17/12/2007. Survey of Live Music in England and Wales in 2007. 23/10/2007. DCMS commissioned BMG Research to study ...

  22. Newspaper headlines: Major gender care review, and 'Mr Bates vs ...

    Leading Wednesday's coverage across most of the papers is a newly published review by paediatrician Hilary Cass into NHS provision of gender care for children. The Daily Telegraph highlights the ...

  23. Rise in civil engineering activity boosts UK construction sector

    The S&P Global/Cips UK construction purchasing managers' index, a measure of the health of the industry, rose to 50.2 from 49.7 in February, the fourth consecutive month-on-month increase.