A systematic literature review on Wikidata

Data Technologies and Applications

ISSN : 2514-9288

Article publication date: 20 August 2019

Issue publication date: 2 September 2019

Purpose

The purpose of this paper is to review the current status of research on Wikidata and, in particular, of articles that either describe applications of Wikidata or provide empirical evidence, in order to uncover the topics of interest, the fields that are benefiting from its applications and which researchers and institutions are leading the work.

Design/methodology/approach

A systematic literature review is conducted to identify and review how Wikidata is being dealt with in academic research articles and the applications that are proposed. A rigorous and systematic process is implemented, aiming not only to summarize existing studies and research on the topic, but also to include an element of analytical criticism and a perspective on gaps and future research.

Findings

Despite Wikidata’s potential and the notable rise in research activity, the field is still in the early stages of study. Most research is published in conferences, highlighting such immaturity, and provides little empirical evidence of real use cases. Only a few disciplines currently benefit from Wikidata’s applications and do so with a significant gap between research and practice. Studies are dominated by European researchers, mirroring Wikidata’s content distribution and limiting its worldwide applications.

Originality/value

The results collect and summarize existing Wikidata research articles published in the major international journals and conferences, delivering a meticulous summary of all the available empirical research on the topic which is representative of the state of the art at this time, complemented by a discussion of identified gaps and future work.

Keywords

  • Literature review
  • Applications
  • Empirical studies
  • Knowledge graphs

Mora-Cantallops, M., Sánchez-Alonso, S. and García-Barriocanal, E. (2019), "A systematic literature review on Wikidata", Data Technologies and Applications, Vol. 53 No. 3, pp. 250-268. https://doi.org/10.1108/DTA-12-2018-0110


Copyright © 2019, Emerald Publishing Limited



A systematic review of Wikidata in Digital Humanities projects


Fudie Zhao, A systematic review of Wikidata in Digital Humanities projects, Digital Scholarship in the Humanities, Volume 38, Issue 2, June 2023, Pages 852–874, https://doi.org/10.1093/llc/fqac083


Wikidata has been widely used in Digital Humanities (DH) projects. However, a focused discussion regarding the current status, potential, and challenges of its application in the field is still lacking. A systematic review was conducted to identify and evaluate how DH projects perceive and utilize Wikidata, as well as its potential and challenges as demonstrated through use. This research concludes that: (1) Wikidata is understood in DH projects as a content provider, a platform, and a technology stack; (2) it is commonly implemented for annotation and enrichment, metadata curation, knowledge modelling, and Named Entity Recognition (NER); (3) most projects tend to consume data from Wikidata, whereas there is more potential to utilize it as a platform and a technology stack to publish data on Wikidata or to create an ecosystem of data exchange; and (4) projects face two types of challenges: technical issues in the implementations and concerns with Wikidata’s data quality. Based on these findings, the discussion addresses three issues related to coping with the challenges in the specific context of the DH field: the relevance and authority of other available domain sources; domain communities and their practices; and workflow design that coordinates technical and labour resources from projects and Wikidata.

According to Wikidata’s main page (https://www.wikidata.org/wiki/Wikidata:Main_Page), Wikidata is ‘a free and open knowledge base that both humans and machines can edit’. It acts as central storage for the structured data of its Wikimedia sister projects such as Wikipedia (which stores unstructured data like texts) and provides support to other sites beyond Wikimedia projects. It is Wikimedia’s response to semantic technologies, featuring a user-generated ontology,[1] regular RDF (Resource Description Framework) dumps,[2] a live SPARQL (Simple Protocol and RDF Query Language) endpoint for data query,[3] and interlinks to other open datasets.[4] Like other Wikimedia projects, it maintains a simple, user-friendly, collaboration-oriented editing interface which makes it easier for novice users to create and publish semantically rich structured data that conforms to the Wikidata ontology by following its tutorials.[5]
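For readers unfamiliar with the query service mentioned above, the following is a minimal sketch of what a programmatic lookup against Wikidata’s public SPARQL endpoint looks like. It is an illustration, not code from any of the reviewed projects; the identifiers used (Q42 for Douglas Adams, P50 for ‘author’) are real Wikidata IDs chosen as a neutral example, and the User-Agent string is a placeholder to be replaced with your own contact details.

```python
import requests

# Wikidata's public SPARQL endpoint (the Wikidata Query Service).
ENDPOINT = "https://query.wikidata.org/sparql"

# Ask for works whose author (P50) is Douglas Adams (Q42); the label
# service resolves language-neutral item IDs to English labels.
QUERY = """
SELECT ?work ?workLabel WHERE {
  ?work wdt:P50 wd:Q42 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 10
"""

response = requests.get(
    ENDPOINT,
    params={"query": QUERY, "format": "json"},
    # The query service asks clients to identify themselves.
    headers={"User-Agent": "WikidataReviewExample/0.1 (example@example.org)"},
    timeout=30,
)
response.raise_for_status()

for row in response.json()["results"]["bindings"]:
    print(row["work"]["value"], "-", row["workLabel"]["value"])
```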

Wikidata has been adopted and systematically reviewed in fields relevant to the Digital Humanities (DH) domain, such as Information Science (IS) (Mora-Cantallops et al., 2019) and library science (Tharani, 2021). DH-related projects have also embraced Wikidata. As illustrated in Fig. 1,[6] 43 abstracts for presentations at the DH conferences hosted by the Alliance of Digital Humanities Organizations (ADHO) mentioned Wikidata between 2016 and 2020. However, except for a brief account of Wikidata’s use in DH (Cook, 2017), a comprehensive overview of Wikidata’s status quo, potential, and challenges in the field is still lacking.

Fig. 1. Wikidata-related presentations at ADHO annual conferences

Such a survey of DH projects’ adoption of Wikidata is therefore necessary. It is important for early adopters to reflect on their work within the context of the DH domain and for potential adopters to understand what Wikidata can achieve for their projects, where they can find pioneer projects for reference, and what pitfalls to avoid. In addition, a review of the advancements in the DH sector allows the Wikidata communities, composed of specialists from other fields and the general public, to find areas for inspiration and improvement.

This article aims to fill this research gap in the literature by conducting a systematic review of Wikidata usage in DH-related initiatives to better serve the DH communities’ use of Wikidata. The second section presents the four research questions proposed to uncover Wikidata’s definition, applications, possibilities, and challenges in the DH field. The third section describes the methods while the fourth section offers the results. The fifth section addresses issues uncovered by the research. The sixth section points out limitations and future directions. The seventh section summarizes the research’s findings.

Q1: How is Wikidata described in the current DH literature?

Q2: To what end is Wikidata being experimented with in the DH domain?

Q3: What are the potentials of incorporating Wikidata into DH projects?

Q4: What are the challenges associated with Wikidata mentioned in DH projects?

To answer the questions, a systematic review of DH projects that adopted Wikidata was conducted using Kitchenham’s (2004) guidelines for systematic review. The practices carried out in other relevant systematic reviews (Mora-Cantallops et al., 2019; Tharani, 2021) were also consulted during the process. Although initially developed for the software domain, this procedure is commonly performed to summarize empirical evidence of the benefits and limitations of a technology (Kitchenham, 2004), which fits this article’s purpose. In contrast to a traditional literature review, Kitchenham’s systematic review features a predefined search strategy and documentation of the search and selection process of relevant literature, which allows a reader to assess the completeness of the search (Kitchenham, 2004, p. 2). Because of the interdisciplinary nature of the DH field, some Wikidata-related projects will inevitably be overlooked despite best efforts. This article therefore presents the search and selection process so that readers can identify gaps in its routes to resources, and invites future contributions not covered by this search.

3.1 Search strategy

The Book of Abstracts from ADHO annual conferences, a compiled list of journals exclusive to the DH field,[7] five online academic research databases (ACM Digital Library, IEEE Xplore, Springer Link, Web of Science, and Science Direct), as well as Diff, a Wikimedia community blog, were searched and screened in accordance with the pre-determined search strategies, and 195 papers and presentations were identified, as shown in Table 1. The end date of the search was 31 December 2021. While a specific start date was not set, the earliest paper included in this study was published in 2016.

Table 1. Total number of articles and presentations identified from each source

3.2 Inclusion and exclusion criteria

Fifty projects were selected based on the following criteria: English only, no duplicates, and Wikidata implementation. These criteria mean that: (1) projects reported in languages other than English were excluded; (2) duplicated papers recorded in different databases were excluded, and when multiple papers and presentations reported on the same project, only the most recent or most comprehensive one was retained; (3) projects in which Wikidata was cited only in the literature review, as previous work or as an example to illustrate a point, were excluded, because Wikidata was not implemented or planned to be implemented. As this article is a survey of the field’s use of Wikidata, these criteria also mean that projects reporting Wikidata usage in as little as a single sentence were accepted. In addition, projects from GLAM (galleries, libraries, archives, and museums) were included if they were reported at DH conferences, in DH journals, or if their abstracts, keywords, or texts explicitly mentioned DH. The selection process is summarized in Table 2.

Table 2. Selection process for the included projects

3.3 Data collection and analysis

Each of the fifty projects was tabulated according to the research questions. Table 3 shows the abridged results of the systematic review. For Q1, the reviewed projects were investigated to identify which terms were primarily used to describe Wikidata; the terms were then grouped into categories to classify projects further. For Q2, this article takes TaDiRAH (Taxonomy of Digital Research Activities in the Humanities) as its primary reference to categorize the various practices in the DH domain. TaDiRAH is a controlled vocabulary designed to structure DH-related information. It was chosen because it is anticipated to be particularly beneficial to endeavours aiming to collect information on DH projects.[8] TaDiRAH offers three main categories (research activities, research objects, and research techniques) which assist in describing the reviewed projects with varying focuses. This article adapts TaDiRAH to classify DH projects’ interactions with Wikidata into five types: activity-focused (annotation and enrichment; modelling), object-focused (metadata curation), technique-focused (Named Entity Recognition), and miscellaneous. For Q3, this article identifies potentials by examining the relationships between Wikidata’s features identified in Q1 and the user applications identified in Q2, with an emphasis on the data flow between them. For Q4, this article identifies the challenges reported by the reviewed projects in their interaction with Wikidata.

Table 3. Summary of the projects included in the systematic review

The URL links pointing to the cited abstracts in the online version of the Book of Abstracts for DH2019 were available when this systematic review was conducted in December 2021 but became inaccessible in October 2022. The links for projects presented at DH2019 have therefore been replaced with URLs from The Index of Digital Humanities Conferences, which also contains metadata about the reviewed projects. Readers will be able to access the original content when the issue is fixed.

4.1 Q1: How is Wikidata described in the current DH literature?

The descriptions of Wikidata in the current DH literature fall into three categories: a content provider of open, free, generic, editable, heterogeneous, linked data; a platform for crowdsourcing, collaboration, dissemination, and linking datasets on the Semantic Web and to the general public; and a technology stack to access Linked Data, as shown in Fig. 2 .

Fig. 2. Wikidata components described in the reviewed projects

4.1.1 Content provider

DH is highly interdisciplinary, involving the Arts and Humanities, GLAM, and IS. Consequently, Wikidata as a content provider is referred to differently depending on the reviewed projects’ domains and specific tasks. This article divides the descriptions of Wikidata as a data source into two categories: content and form. In terms of content, Wikidata is described as multilingual [P16, P25], open [P25, P36], free [P17], generic [P5, P21, P43], editable [P31], heterogeneous [P31], global [P20, P48], online [P17, P32, P39], crowdsourced [P18, P48], and collaborative [P17, P21, P25].

In terms of form, Wikidata can be viewed as a database [P18, P21, P31, P38], an ontology [P22], a knowledge base [P1, P3, P5, P6, P13, P14, P15, P21, P25, P34, P39, P43, P48], or a knowledge graph [P16, P34, P36]. What it is called is also domain-specific. In the context of metadata curation, such a dataset is commonly referred to as an authority file [P20, P24, P25, P32, P44, P48] or a controlled vocabulary [P26, P46]. On the Semantic Web it is known as linked open data (LOD) [P2, P9, P13, P45]. Even within the same domain, whether these terms are used interchangeably or distinctly is context-dependent.[9] Their use is even more inconsistent in an interdisciplinary field such as DH. Therefore, these terms are not strictly defined in this article, unless differences in the interpretation of form result in different applications.

4.1.2 Platform

According to one of the reviewed projects [P46], ‘Wikidata is not just a data source, but a platform as well’. Wikidata can serve as a dissemination platform to reach a broader audience [P25, P44]. It is also a Wikimedia platform for linking data from other Wikimedia sister projects, such as various Wikipedia language editions [P7, P47]. Moreover, Wikidata is evolving as a linking hub of data, so it is a good starting point for projects that intend to link their data to existing datasets [P3, P17, P22, P46]. Aware of this development, some projects [P38, P48] utilize Wikidata to access datasets from other external references.

4.1.3 Technology stack

In contrast to the other two categories, Wikidata as a technology stack for publishing linked data is only implied and practised, not explicitly mentioned, in the reviewed projects. The technology-stack view, particularly with respect to Linked Data, highlights Wikidata’s utility as a toolset for creating linked datasets. According to the W3C Working Group’s best practices for publishing linked data,[10] Linked Data modelling employs international standards for data interchange (RDF) and querying (SPARQL), and it requires good URIs and standard vocabularies (multilingual if needed). Wikidata, which provides RDF dumps and a live SPARQL endpoint, language-neutral URIs, and classes, instances, and properties with URIs assigned, is suitable for Linked Data publication. The National Library of Israel converts its metadata to LOD by exporting metadata to Wikidata and creating new items on Wikidata for each selected manuscript [P33]. The library domain’s promotion of Wikidata as a way to lower barriers to LOD production (Allison-Cassin and Scott, 2018) should be read in the context of these practices.
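As a side note for readers who want to see what ‘accessing Linked Data’ means concretely, the sketch below dereferences an item’s language-neutral URI to retrieve its RDF serialization via Wikidata’s Special:EntityData mechanism. Q42 (Douglas Adams) is an arbitrary well-known item used purely for illustration, not an entity from the reviewed projects, and the User-Agent value is a placeholder.

```python
import requests

# Every Wikidata item has a language-neutral URI whose data can be fetched
# in several RDF serializations; appending .ttl requests Turtle.
url = "https://www.wikidata.org/wiki/Special:EntityData/Q42.ttl"

response = requests.get(
    url,
    headers={"User-Agent": "WikidataReviewExample/0.1 (example@example.org)"},
    timeout=30,
)
response.raise_for_status()

# Print the first few lines of the Turtle serialization of the item.
for line in response.text.splitlines()[:10]:
    print(line)
```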

4.2 Q2: To what end is Wikidata being experimented with in the DH domain?

The primary applications of Wikidata are annotation and enrichment, metadata curation, modelling, and Named Entity Recognition (NER).

4.2.1 Annotation and enrichment

Annotation and enrichment are the most common uses of Wikidata. In such instances, Wikidata is often utilized as an external LOD resource to enrich a project’s own materials, such as corpora and texts [P20, P30, P31, P34, P43, P48], linguistic dictionaries and elements [P9, P26], datasets and databases [P3, P17, P19, P28, P31, P35, P47], journal articles [P25], and an annotation tool [P1].

4.2.2 Metadata curation

Metadata curation in the reviewed projects is closely associated with practices in the library domain. Wikidata is primarily implemented either for the integration and interoperability of authority datasets [P8, P11, P24, P32, P37, P41, P46] or for the improvement of metadata quality and processes, which falls into two categories: some institutes share their unique metadata for public use [P10, P33, P44], while others use Wikidata’s information to enhance their local records [P13, P15, P40]. One project uses Wikidata to visualize and analyse its metadata [P33].

4.2.3 Modelling

In terms of modelling, Wikidata is related to Linked Data best practices, which play a crucial role in knowledge modelling on the Semantic Web. In most cases, Wikidata is an external LOD source referenced when populating an ontology with instances [P5, P6, P22, P27, P39, P45]. Project 20 links not only its instances but also its classes and properties with Wikidata. In addition to Linked Data practices, Wikidata’s ontology is an important reference for the creation of data models [P16, P22].

4.2.4 NER and miscellaneous

Wikidata is used in NER-related tasks for corpora [P4, P18, P29, P42] and for NER tools and services [P2, P14, P21]. Wikidata is also used in pedagogy [P38] to teach archaeology students about the use of LOD. In data aggregation projects, it is one of the resources to link with domain-specific data [P23]. In addition, it is a source of data for research [P7, P12, P36, P39]. Unlike projects in the annotation and enrichment category, these projects do not have their own materials to be processed but rather utilize Wikidata as their primary source.

4.3 Q3: What are the potentials of incorporating Wikidata into DH projects?

Q1 and Q2 are not independent of each other. Wikidata’s features promote or constrain the possibilities of use in the DH domain, while DH projects’ perceptions and practices influence how they appropriate Wikidata for their own purposes. Wikidata as a content provider serves a variety of applications that consume external datasets for enrichment. Wikidata as a platform, repository, and technology stack suits applications that need to disseminate, store, and produce data. Some applications show that the combined use of Wikidata’s features can lead to a collaborative environment for data-related activities.

4.3.1 Data consumption—Wikidata as a data source

In all, 45 out of 50 projects consume Wikidata’s content. Wikidata is a data source for a wide range of disciplines, domains, time periods, and languages as a result of its diversity as a content provider. These include literary studies and related fields [P4, P5, P6, P8, P11, P18, P19, P20, P24, P27, P34, P41, P42], history [P12, P28, P35, P43], linguistics [P7, P9, P26, P30], archaeology [P38], and philosophy [P36]. The data utilized come from multiple domains at different granularity levels. Among them, the most prevalent are geographical [P2, P3] and biographical data [P8, P19, P20, P28, P41]. The data range from antiquity [P20] to the present [P18, P39]. Linguistically, Wikidata supports research on French [P42], Latin [P5, P11], Classical Chinese [P50], Cuneiform languages [P26], German [P4, P24], Russian [P30], Belarusian [P8], Spanish [P41], Italian [P6], Finnish [P29], and ancient Greek [P20], as well as multilingual materials [P7, P16].

The various forms of Wikidata offer a variety of consumption options. Its language-neutral identifiers are widely utilized by projects [P5, P6, P8, P11, P20, P41]. Entities in Wikidata are often used to identify and enrich entities in a project’s data [P21, P29, P50]. Some projects consume not only entities but also Wikidata’s ontological classes and semantic relations [P16, P34, P45].
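As an illustration of this consumption pattern (again, a sketch rather than code from any reviewed project), the snippet below fetches an item’s English label and an external authority identifier through Wikidata’s public wbgetentities API. Q42 and the VIAF ID property (P214) are real identifiers used only as examples; metadata-curation projects would substitute the items and authority properties relevant to their own records.

```python
import requests

API = "https://www.wikidata.org/w/api.php"

def fetch_entity(qid):
    """Fetch English labels and claims for one item via wbgetentities."""
    response = requests.get(
        API,
        params={
            "action": "wbgetentities",
            "ids": qid,
            "props": "labels|claims",
            "languages": "en",
            "format": "json",
        },
        headers={"User-Agent": "WikidataReviewExample/0.1 (example@example.org)"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["entities"][qid]

entity = fetch_entity("Q42")  # Q42 = Douglas Adams, an arbitrary example
print("Label:", entity["labels"]["en"]["value"])

# P214 is the "VIAF ID" property, the kind of external identifier that
# authority files are reconciled against.
for claim in entity["claims"].get("P214", []):
    print("VIAF ID:", claim["mainsnak"]["datavalue"]["value"])
```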

In addition to the variety and volume of its own content, Wikidata as a platform, particularly as a linking hub, enables projects [P38, P48] to consume external datasets that are linked to Wikidata.

4.3.2 Data publication—Wikidata as a platform, a repository, and a technology stack

Data publication is related to Wikidata as a platform for data dissemination, a repository for storing various types of data, and a technology stack for producing linked data. Fewer projects publish data on Wikidata. The motivations behind data publication can be divided into three types. The first is to export existing data to Wikidata; in this scenario, Wikidata is taken as a platform for dissemination. One example is project 44, which exports its archival authorities for audiovisual resources onto Wikidata in order to promote the utilization of these undervalued and underutilized materials. The second type is the production of linked data on Wikidata; in this instance, Wikidata serves not only as a platform for dissemination but also as a technology stack for the creation of linked data. The Hebrew manuscript catalogue in the National Library of Israel exports its metadata to Wikidata and creates new items for selected manuscripts in order to convert them to LOD that can then be queried and visualized using Wikidata’s features and tools. The third type is the production of data for the project’s own use. For instance, project 31 requires a reference knowledge base with which to link its items, but Wikidata is insufficient for its needs; to address this, it adds new entities to Wikidata for consumption by the project. A minimal sketch of what such programmatic publication involves is given below.
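The reviewed projects do not publish their export scripts here, so the following is only a rough sketch of programmatic publication using pywikibot, a widely used Wikimedia bot framework. It assumes a configured pywikibot installation (user-config.py) with a logged-in account, and it writes to the public Wikidata sandbox item (Q4115189) so that the illustrative statement, instance of (P31) pointing to human (Q5), does no harm to real data.

```python
import pywikibot

# Assumes pywikibot is configured with credentials for a Wikidata account.
site = pywikibot.Site("wikidata", "wikidata")
repo = site.data_repository()

# Q4115189 is the public Wikidata sandbox item, intended for test edits.
item = pywikibot.ItemPage(repo, "Q4115189")
item.get()  # fetch the item's current labels, descriptions, and claims

# Add an illustrative statement: instance of (P31) -> human (Q5).
claim = pywikibot.Claim(repo, "P31")
claim.setTarget(pywikibot.ItemPage(repo, "Q5"))
item.addClaim(claim, summary="Example edit from an illustrative sketch")
```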

4.3.3 Data exchange—Wikidata as a virtual environment for data-related activities

Several projects [P17, P25, P49] endeavour to exchange data with Wikidata. The exchange between these projects and Wikidata improves the quality of data on both sides. By exporting local datasets to Wikidata, they can invite a broader audience to their own collections and potentially gain data enrichment from the crowd and other domain institutes. The crowdsourced data can then be incorporated into their own systems. The DH community has long envisioned a collaborative, well-integrated environment of its own. These projects demonstrate the possibility of achieving this vision with Wikidata by combining its use as a content provider, a platform, and a technology stack.

4.4 Q4: What are the challenges associated with Wikidata mentioned in DH projects?

Most reviewed projects did not report the obstacles encountered. The challenges that were mentioned fall into two categories: technical implementation issues and data quality concerns. The technical discussion focuses on identifier mismatches [P3, P13, P17, P48] and ontological incompatibility [P13, P17]. When addressing data quality, projects prioritize coverage over accuracy. Wikidata is reported as rich in contemporary [P18] and notable [P3] entities but as having a public-interest bias [P13]. It often poses challenges for projects working with historical data [P14, P42] and local data [P48].

According to Cook (2017) , DH scholars who have a particular area of expertise find Wikidata to be overly generic and of low quality. The current review, however, has shown that Wikidata is primarily perceived as a content provider to fulfil a variety of tasks that require consuming external datasets. The gap between perception and practices leaves a question to ponder: why did the projects choose to consume Wikidata if its data is unspecific and of poor quality?

A close examination of the projects reveals that in practice, DH projects contextualize their perception of Wikidata’s quality by comparing or combining it with other data sources available for the project’s purpose. In other words, the quality of Wikidata is context-sensitive and case-specific. In some scenarios, Wikidata is of higher quality than alternative sources. Wikidata is suitable for projects requiring that the entities be well-known, popular, or infamous for something. For instance, Wikidata facilitates the extraction of this type of information in NER tasks [P18]. In newly emerging or niche fields where domain-specific sources are frequently lacking, Wikidata is recognized as a substitute with which domain sources can link to achieve linked data best practices. Wikidata is reported to be a reliable source for fictional [P13] and mythical figures [P20] in comparison to expert-curated sources such as VIAF (Virtual International Authority File). For cross-domain tasks, Wikidata becomes the knowledge anchor of choice for entity-fishing, because it is better than domain-specific sources in terms of supporting a generic service like entity-fishing [P21]. Even for the same type of data, different projects may have varying perceptions of Wikidata’s quality. In terms of geographical data, for instance, project 2 indicates that Wikidata has poor spatial data, primarily point data, whereas project 14 shows that Wikidata has substantial coverage for its geographical places.

Wikidata is often used in conjunction with other sources, such as DBpedia [P2, P3, P9, P13, P21, P23, P26, P43, P45, P50] and Wikipedia [P1, P3, P4, P7, P21, P28, P46, P47, P48]. VIAF [P8, P10, P11, P12, P13, P17, P22, P24, P35, P37, P41, P44] is another generic source frequently cited alongside Wikidata. Regarding domain-specific sources, Wikidata, together with other generic geographical data sources such as GeoNames [P4, P15, P16, P23, P29], is combined with local geographical resources. In project 29, where Finnish geographical sources lack global coverage, Wikidata is queried to retrieve international place names.

This review also shows that only a few projects have utilized Wikidata as a platform and a technology stack for sharing, storing, and creating data on Wikidata. Even fewer projects exchange data with Wikidata. However, as the IS systematic review ( Mora-Cantallops et al. , 2019 ) points out, Wikidata’s quality is affected by user types and editing practices, as well as external references. While the DH communities are concerned about the anonymity of Wikidata’s crowd, it is frequently overlooked that DH projects are expert users and a high-quality source of external references that have a positive impact on Wikidata’s quality. The greater a domain’s usage of Wikidata, the more likely its breadth and depth will increase on Wikidata. In this context, crowdsourcing is not only about how to manage the crowd but also about how to join the crowd. GLAM projects serve as excellent examples in this regard. GLAM projects [P16, P17, P33, P44] understand Wikidata as a linking hub and a technology stack for linked data and are more willing to contribute their datasets to Wikidata, which improves the data quality of their domains on Wikidata. Behind these projects are the development of community practices, tactics, and strategies to collaborate with Wikidata. This mode of collaboration can inspire the Arts and Humanities. There are pioneering works: the opening of data from academic research in film and media studies [P25] and the production-for-consumption strategy adopted by project 31 are examples of exchanging data with Wikidata to improve data quality. Future practitioners may consider taking advantage of Wikidata’s underused features, incorporating data sharing with Wikidata in their plan, and encouraging other domain players to form a community of practice on Wikidata.

This review did not survey the solutions adopted to cope with the challenges the projects encountered. Since DH is a cross-disciplinary field, it is difficult for this research to cover all of the solutions used in DH projects in detail. Nevertheless, it can be concluded from the studied projects that a well-designed workflow for allocating limited labour and technical resources between automated and manual tasks is crucial for dealing with the challenges. A good illustration of this point is the various approaches adopted by projects for data reconciliation with Wikidata. In each case that presents its reconciliation process, a certain amount of manual labour is required. In some instances, tools are created for domain experts [P31, P46] or a crowdsourcing force [P3, P48] to match entities manually through curation. In other instances, where human resources are insufficient, matching algorithms are developed to identify and reduce the number of situations requiring a human decision [P17]. For projects with sufficient technical support, Wikidata’s matching tools, such as Mix’n’match,[11] may not be an optimal solution; project 17 chose to customize its own matching algorithms because the outcomes of Mix’n’match were limited. However, such projects still need to be aware of the technical communities’ efforts, such as empirical research on Wikidata’s knowledge organization and data coverage, and the tools and methods that have been developed to facilitate data matching and assess data distribution. As demonstrated by project 46, for projects with inadequate technical resources, tools developed on Wikidata and other open sources, such as OpenRefine,[12] are still viable options. A sketch of the semi-automated pattern described here follows.
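Since the reviewed projects do not publish their reconciliation pipelines in detail, the sketch below only illustrates the general pattern: candidate matches are pulled from Wikidata’s public wbsearchentities API, high-confidence matches are accepted automatically, and borderline cases are routed to human review. The threshold value and helper names are assumptions made for illustration, not taken from any project.

```python
import difflib
import requests

API = "https://www.wikidata.org/w/api.php"
AUTO_ACCEPT = 0.95  # illustrative threshold, not from any reviewed project

def search_wikidata(name, language="en", limit=5):
    """Return candidate items for a local name via wbsearchentities."""
    response = requests.get(
        API,
        params={
            "action": "wbsearchentities",
            "search": name,
            "language": language,
            "type": "item",
            "limit": limit,
            "format": "json",
        },
        headers={"User-Agent": "WikidataReviewExample/0.1 (example@example.org)"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json().get("search", [])

def reconcile(name):
    """Match a local entity name to a QID, or flag it for manual review."""
    best_qid, best_score = None, 0.0
    for candidate in search_wikidata(name):
        score = difflib.SequenceMatcher(
            None, name.lower(), candidate.get("label", "").lower()
        ).ratio()
        if score > best_score:
            best_qid, best_score = candidate["id"], score
    if best_qid and best_score >= AUTO_ACCEPT:
        return ("matched", best_qid)
    return ("needs review", best_qid)  # borderline cases go to a curator

print(reconcile("Douglas Adams"))
```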

The geographical distribution of the reviewed projects is unbalanced, which can be attributed to the search strategy employed for the systematic review. The selected search engines, digital libraries, and conferences tend to offer English-language resources, and the keyword searches were also conducted in English. Identifying projects outside the Anglophone and European spheres necessitates a more localized search strategy, employing multilingual keywords and a broader range of search targets. Besides, some initiatives are reported only on Wikimedia platforms and individual project sites. These projects are difficult to track using the search strategies outlined in this article and instead rely on other researchers’ occasional discoveries or contributions. In the future, more focused discussion of, and contributions regarding, Wikidata use cases in multilingual DH communities would be welcome.

This article finds that: (1) Wikidata is more than a knowledge base. In DH and related fields, it is understood as a content provider that offers data on a wide range of topics in various granularities and formats, a platform for collaboration, crowdsourcing, dissemination, and integration, and a technology stack to publish linked data. (2) Wikidata is frequently used for annotating and enriching DH project materials, curating metadata to improve the interoperability of authority datasets and local metadata quality, and publishing and interlinking linked data in knowledge modelling. (3) Most projects tend to take Wikidata as a content provider to consume data from, whereas there is more significant potential to use it as a platform and technology stack to publish linked data or to create a data exchange ecosystem between projects and Wikidata. (4) The reviewed projects face both technical challenges in their implementations and quality concerns regarding Wikidata. Data integration problems, such as mismatched identifiers and incompatible data models, are frequently reported as technical issues. Regarding data quality, projects prioritize data coverage over data accuracy; Wikidata is reported to have weaker coverage of historical, local, and obscure entities.

Based on the research findings, this article makes the following three recommendations about the use of Wikidata in the specific context of the DH field: (1) Utilize domain sources in comparison with, or as a complement to, Wikidata. The perception of Wikidata is contextualized by combining or contrasting it with other data sources available to serve a project’s purpose. Wikidata serves as a supplement to domain-specific resources and is a significant resource for projects that require niche or cross-domain resources. (2) Encourage domain communities to co-develop practices, tactics, and strategies for interacting with Wikidata. The quality of the data is influenced by the general practices, tactics, and strategies of other players in the specific domain; it improves for domains where community members have reached a consensus on Wikidata’s role as a central hub for data exchange and are actively utilizing it. (3) Design workflows that coordinate technical and labour resources from projects and Wikidata. A workflow design that optimizes technical support from both Wikidata and the project’s technicians, as well as the human participation of domain experts, project users, and crowdsourcing forces on Wikidata or the project’s own platforms, is essential for taming Wikidata’s ambivalent nature and maximizing its potential for DH initiatives.

Notes

[1] For a detailed description of Wikidata’s ontology, see https://www.wikidata.org/wiki/Wikidata:WikiProject_Ontology

[2] The RDF dumps are available at https://www.wikidata.org/wiki/Wikidata:Database_download

[3] For the SPARQL service, see https://query.wikidata.org

[4] There are many ways datasets can be reconciled and linked with Wikidata. For example, a simple way to achieve reconciliation is by using the Mix’n’match tool (https://mix-n-match.toolforge.org/#/) provided by Wikidata.

[5] For beginner tutorials about Wikidata, see https://www.wikidata.org/wiki/Wikidata:Tours

[6] Data about the ADHO annual conferences are collected from the Index of Digital Humanities Conferences site, which aggregates and presents conference metadata: https://dh-abstracts.library.cmu.edu/conferences

[7] The journals are selected from a dataset of DH journals published on Zenodo (Spinaci et al., 2019).

[8] https://github.com/dhtaxonomy/TaDiRAH

[9] See an attempt to distinguish ‘database’ from ‘knowledge base’ in the library domain (Tharani, 2021, p. 4).

[10] https://www.w3.org/TR/ld-bp/#CONVERT

[11] https://mix-n-match.toolforge.org/#/

[12] https://openrefine.org/

References

Abrami G., Mehler A., Manuel S. (2020). TextAnnotator: a web-based annotation suite for texts. In Digital Humanities 2020: Conference Abstracts, Ottawa: Carleton University and Université d’Ottawa (University of Ottawa), p. 137.

Adams B. (2021). Chronotopic information interaction: integrating temporal and spatial structure for historical indexing and interactive search. Digital Scholarship in the Humanities, 36(3): 525–41.


Allison-Cassin S., Scott D. (2018). Wikidata: a platform for your library’s linked open data. The Code4Lib Journal, (40). https://journal.code4lib.org/articles/13424 (accessed 22 October 2022).

Almeida P.D., Rocha J.G., Ballatore A. and Zipf A. (2016). ‘Where the streets have known names’. In Computational Science and Its Applications – ICCSA 2016. Cham: Springer International Publishing, pp. 1–12. https://doi.org/10.1007/978-3-319-42089-9_1


Barbaresi A. (2017). Toponyms as entry points into a digital edition: mapping the torch (1899–1936). In Digital Humanities 2017: Conference Abstracts, Montréal: McGill University and Université de Montréal, pp. 159–61.

Bartalesi V., Pratelli N. and Meghini C. (2021). A formal representation of the divine comedy’s primary sources: the Hypermedia Dante Network ontology. Digital Scholarship in the Humanities, 37(3): 630–43. https://doi.org/10.1093/llc/fqab080

Bartalesi V., Metilli D., Pratelli N. and Pontari P. (2021). Towards a knowledge base of medieval and renaissance geographical Latin works: the IMAGO ontology. Digital Scholarship in the Humanities, 37(1): 34–35. https://doi.org/10.1093/llc/fqab060

Blessing A., Kuhn J. (2016). Crosslingual textual emigration analysis. In Digital Humanities 2016: Conference Abstracts, Kraków: Jagiellonian University and Pedagogical University, pp. 744–745.

Börner I., Kohler G.-B., Looschen S. (2017). Database of Belarusian periodicals. In Digital Humanities 2017: Conference Abstracts, Montréal: McGill University and Université de Montréal, pp. 679–80.

Bowers J., Romary L. (2017). Deep encoding of etymological information in TEI. Journal of the Text Encoding Initiative [Online], Issue 10. https://doi.org/10.4000/jtei.1643

Camlot J., Neugebauer T., Berrizbeitia F. (2020). Dynamic systems for humanities audio collections: the theory and rationale of swallow. In Digital Humanities 2020: Conference Abstracts, Ottawa: Carleton University and Université d’Ottawa (University of Ottawa), pp. 487–491.

Carbé E., Giannelli N. (2019). A digital platform for the ‘latin silk road’: issues and perspectives in building a multilingual corpus for textual analysis. In Digital Humanities 2019: Conference Abstracts, Utrecht: Utrecht University.

Cook S. (2017). The uses of Wikidata for galleries, libraries, archives and museums and its place in the digital humanities. Comma, 2017(2): 117–24.

Daquino M., Daga E., Tomasi F. (2019). MAuth – mining authoritativeness in art history. In Digital Humanities 2019: Conference Abstracts, Utrecht: Utrecht University.

Egloff M., Picca D., Adamou A. (2019). Extraction of character profiles from the Gutenberg Archive. In Metadata and Semantic Research. Cham: Springer, pp. 367–72.

Ehrmann M., Romanello M., Flückiger A. and Clematide S. (2020). Overview of CLEF HIPE 2020: Named Entity Recognition and linking on historical newspapers. In Experimental IR Meets Multilinguality, Multimodality, and Interaction. Lecture Notes in Computer Science. Cham: Springer International Publishing, pp. 288–310.

Eslao C.F., Osadetz S. (2018). Using linked open data to enrich concept searching in large text corpora. In Digital Humanities 2018: Conference Abstracts, Mexico City: El Colegio de México and Universidad Nacional Autónoma de México (UNAM), pp. 569–571.

Eyharabide V., Lully V., Morel F. (2019). MusicKG: representations of sound and music in the middle ages as linked open data. In Acosta M., Cudré-Mauroux P., Maleshkova M., Pellegrini T., Sack H., Sure-Vetter Y. (eds), Semantic Systems. The Power of AI and Knowledge Graphs. SEMANTiCS 2019. Cham: Springer International Publishing, pp. 57–63.

Faraj G., Micsik A. (2019). Enriching Wikidata with cultural heritage data from the COURAGE Project. In Garoufallou E., Fallucchi F., William De Luca E. (eds), Metadata and Semantic Research. Communications in Computer and Information Science. Cham: Springer International Publishing, pp. 407–18.

Fischer F., Börner I. and Göbel M. (2019). Programmable corpora: introducing DraCor, an infrastructure for the research on European drama. In Digital Humanities 2019: Conference Abstracts, Utrecht: Utrecht University.

Fischer F., Jäschke R. (2020). ‘The Michael Jordan of greatness’ – extracting Vossian antonomasia from two decades of The New York Times, 1987–2007. Digital Scholarship in the Humanities, 35(1): 34–42.

Foka A., Barker E. and Konstantinidou K. (2020). Semantically geo-annotating an ancient Greek ‘travel guide’: itineraries, chronotopes, networks, and linked data. In Proceedings of the 4th ACM SIGSPATIAL International Workshop on Geospatial Humanities, GeoHumanities 2020, pp. 1–9.

Foppiano L., Romary L. (2020). Entity-fishing: a DARIAH entity recognition and disambiguation service. Journal of the Japanese Association for Digital Humanities, 5(1): 22–60.

Giovannetti E., Albanesi D. and Bellandi A. (2021). An ontology of masters of the Babylonian Talmud. Digital Scholarship in the Humanities, 37(3): 725–737. https://doi.org/10.1093/llc/fqab043

Grossner K., Mostern R.M. (2019). World-historical gazetteer. In Digital Humanities 2019: Conference Abstracts, Utrecht: Utrecht University.

Hechtl A., Börner I., Fischer F. and Trilcke P. (2017). Cäsar Flaischlen’s ‘Graphische Litteratur-Tafel’ – digitizing a giant historical flowchart of foreign influences on German literature. In Digital Humanities 2017: Conference Abstracts, Montréal: McGill University and Université de Montréal, pp. 468–69.

Heftberger A., Höper J., Müller-Birn C. and Walkowski N.-O. (2020). Opening up research data in film studies by using the structured knowledge base Wikidata. In Kremers H. (ed.), Digital Cultural Heritage, pp. 401–410. https://doi.org/10.1007/978-3-030-15200-0_27

Homburg T. (2019). Towards creating a best practice digital processing pipeline for cuneiform languages. In Digital Humanities 2019: Conference Abstracts, Utrecht: Utrecht University.

Huber A. (2020). ‘Telling bigger stories’: formal ontological modelling of scholarly argumentation. In Digital Humanities 2020: Conference Abstracts, Ottawa: Carleton University and Université d’Ottawa (University of Ottawa), pp. 428–431.

Hyvönen E., Heino E., Leskinen P., et al. (2016). WarSampo data service and semantic portal for publishing linked open data about the second world war history. In The Semantic Web. Latest Advances and New Domains. Lecture Notes in Computer Science. Cham: Springer International Publishing, pp. 758–73.

Kettunen K., Mäkelä E., Ruokolainen T., Kuokkala J. and Löfberg L. (2017). Old content and modern tools – searching named entities in a Finnish OCRed historical newspaper collection 1771–1910. Digital Humanities Quarterly [Online], 11(3).

Kitchenham B. (2004). Procedures for performing systematic reviews. Technical Report. Keele, UK: Keele University, p. 33.

Kovalenko K., Wandl-Vogt E. (2017). Collaborative approaches to open up Russian manuscript lexicons. In Digital Humanities 2017: Conference Abstracts, Montréal: McGill University and Université de Montréal, pp. 735–36.

Kräutli F., Valleriani M. (2018). CorpusTracer: a CIDOC database for tracing knowledge networks. Digital Scholarship in the Humanities, 33(2): 336–46.

Mellet M., Fauchié A., Sauret N., Vitali-Rosati M. and Juchereau A. (2020). Stylo, a semantic writing tool for scientific publishing in human sciences. In Digital Humanities 2020: Conference Abstracts, Ottawa: Carleton University and Université d’Ottawa (University of Ottawa), p. 119.

Miller Y., Prebor G. (2020). From metadata to linked open data and Wikidata: Yemenite Hebrew manuscripts and Wikidata. In Digital Humanities 2020: Conference Abstracts, Ottawa: Carleton University and Université d’Ottawa (University of Ottawa), p. 68.

Mora-Cantallops M., Sánchez-Alonso S., García-Barriocanal E. (2019). A systematic literature review on Wikidata. Data Technologies and Applications, 53(3): 250–68. https://doi.org/10.1108/DTA-12-2018-0110

Müller S., Brunzel M., Kaun D., et al. (2019). HistorEx: exploring historical text corpora using word and document embeddings. In The Semantic Web: ESWC 2019 Satellite Events. Lecture Notes in Computer Science. Cham: Springer International Publishing, pp. 136–40.

Nijboer H., Brouwer J., Bok M.J. (2019). Unthinking Rubens and Rembrandt: counterfactual analysis and digital art history. In Digital Humanities 2019: Conference Abstracts, Utrecht: Utrecht University.

O’Sullivan J., Durity A., Vulcu G., Bordea G. and Jones J.E. (2016). The categories of philosophy in the digital era. In Digital Humanities 2016: Conference Abstracts, Kraków: Jagiellonian University and Pedagogical University, pp. 783–785.

Page K., Burrows T., Hankinson A., et al. (2019). A layered digital library for cataloguing and research: practical experiences with medieval manuscripts, from TEI to linked data. In Digital Humanities 2019: Conference Abstracts, Utrecht: Utrecht University.

Palladino C., Bergman J., Trammell C., Mixon E. and Fulford R. (2019). Using linked open data to navigate the past: an experiment in teaching archaeology. In Digital Humanities 2019: Conference Abstracts, Utrecht: Utrecht University.

Broadwell P., Tangherlini T.R. (2021). Comparative K-Pop choreography analysis through deep-learning pose estimation across a large video corpus. Digital Humanities Quarterly [Online], 15(1).

Reeve J. (2020). Corpus-DB: a scriptable textual corpus database for cultural analytics. In Digital Humanities 2020: Conference Abstracts, Ottawa: Carleton University and Université d’Ottawa (University of Ottawa), p. 230.

Ruiz Fabo P., Bermúdez Sabel H., Martínez Cantón C. and González-Blanco E. (2021). The Diachronic Spanish Sonnet Corpus: TEI and linked open data encoding, data distribution, and metrical findings. Digital Scholarship in the Humanities, 36(Supplement_1): i68–i80.

Soudani A., Meherzi Y., Bouhafs A., et al. (2019). Adapting a system for named entity recognition and linking for 19th century French novels. In Digital Humanities 2019: Conference Abstracts, Utrecht: Utrecht University.

Spinaci G., Colavizza G., Peroni S. (2019). List of Digital Humanities journals. Zenodo. https://doi.org/10.5281/zenodo.4164710

Steiner C. (2019). Cooking recipes of the middle ages: corpus, analysis, visualization. In Digital Humanities 2019: Conference Abstracts, Utrecht: Utrecht University.

Sapienza S., Hoyt E., John M.S., Summers E. and Bersch J. (2021). Healing the gap: digital humanities methods for the virtual reunification of split media and paper collections. Digital Humanities Quarterly [Online], 15(1).

Sugimoto G. (2020). Building linked open date entities for historical research. In Metadata and Semantic Research. Cham: Springer, pp. 323–35.

Thalhath N., Nagamori M., Sakaguchi T. and Sugimoto S. (2020). Wikidata centric vocabularies and URIs for linking data in semantic web driven digital curation. In Metadata and Semantic Research. Cham: Springer, pp. 336–44.

Tharani K. (2021). Much more than a mere technology: a systematic review of Wikidata in libraries. The Journal of Academic Librarianship, 47(2): 102326. https://doi.org/10.1016/j.acalib.2021.102326

Thompson R., Mukhopadhyay T.P. (2021). Digital arts in Latin America: a report on the archival history of intersections in art and technology in Latin America. Digital Scholarship in the Humanities, 36(Supplement_1): i113–i123.

Veja C., Hocker J., Schindler C. and Kollmann S. (2018). Bridging citizen science and open educational resource. In Proceedings of the 14th International Symposium on Open Collaboration, pp. 1–12.

Vitali-Rosati M., Monjour S., Casenave J., Bouchard E. and Mellet M. (2020). Editorializing the Greek Anthology: the Palatin manuscript as a collective imaginary. Digital Humanities Quarterly [Online], 14(1).

Wang Q., Nurmikko-Fuller T., Swift B. (2019). Analysis and visualization of narrative in Shanhaijing using linked data. In Digital Humanities 2019: Conference Abstracts, Utrecht: Utrecht University.


How to Critically Utilise Wikidata – A Systematic Review of Wikidata in DH Projects

Fudie Zhao

Oxford University

Initiated in 2012, Wikidata is a free and open knowledge base that acts as central storage for the structured data of its Wikimedia sister projects. It has been adopted and systematically reviewed in Information Science/Computer Science (Mora-Cantallops et al., 2019) and the library domain (Tharani, 2021). Projects in the DH domain have also been embracing Wikidata in their data-related activities. For example, since 2016, 43 presentations at DH conferences held by ADHO have mentioned Wikidata in their abstracts, as shown in Fig. 1.

Data about the ADHO annual conferences is collected from the Index of Digital Humanities Conferences site, which aggregates and presents conference metadata: https://dh-abstracts.library.cmu.edu/conferences

However, except for Cook’s paper about Wikidata’s use in GLAMs and DH (Cook, 2017), a systematic review of Wikidata’s status quo, potential, and challenges in the field is still lacking.

Fig.1: Wikidata-related presentations at ADHO annual conferences

This short paper intends to fill this research gap by proposing four research questions: Q1: How is Wikidata described in the current DH literature? Q2: To what end is Wikidata being experimented with in the DH domain? Q3: What are the potentials of embracing Wikidata in data-related activities in DH projects? Q4: What are the challenges and possible solutions associated with Wikidata in DH projects? To answer the questions, a systematic literature review of DH projects that adopted Wikidata has been conducted based on the guidelines for systematic review proposed by Kitchenham (2004). The Book of Abstracts from ADHO annual conferences, a compiled list of DH journals, and five online academic research databases (ACM Digital Library, IEEE Xplore, Springer Link, Web of Science, and Science Direct) were searched and screened, guided by pre-determined search strategies and inclusion and exclusion criteria. 196 papers/presentations were identified in the sources, and after screening, 58 were selected based on the criteria (English only, no duplicates, only application studies, Wikidata implemented) for further analysis, as shown in Table 1.

The search covered publications up to 31 December 2021.

Table 1: Total number of articles and presentations identified from each source

This paper finds that the descriptions of Wikidata in the current DH literature fall into three categories: a technology stack to access Linked Data; a platform for crowdsourcing, collaboration, dissemination, and linking datasets on the Semantic Web; and a content provider of open, free, generic, editable, heterogeneous, linked data, as shown in Fig. 2.

Fig. 2: Wikidata Components

Wikidata has been included in data-related tasks such as annotation and enrichment, metadata curation, named entity recognition and disambiguation, knowledge representation and ontological engineering, data sourcing, aggregation of datasets, and the pursuit of open citation data and pedagogical practices (miscellaneous) as shown in Table 2.

Table 2: Wikidata application areas in the reviewed items

Projects in the DH domain can use Wikidata for data consumption and publication: 1) data consumption – Wikidata is a data source for enrichment; 2) data publication and exchange – Wikidata is an access point to disseminate data to the broader landscape of the Web for public engagement, a platform for crowdsourcing and collaborative production of linked data, and a linked data approach towards the integration of data within a specific domain.

The use of Wikidata is accompanied by doubt about its data quality. Cook (2017: 122) points out that Wikidata’s data is too generic and short of quality for DH scholars who tend to work in a specific area, while Wikimedians pay less attention to research-oriented DH projects and focus more on projects which gather data and edit pages. The DH community can learn from the technical community regarding the factors that influence Wikidata’s data quality and possible solutions. Factors specified in the research include user types and their editing activities, the effectiveness of systems and tools that facilitate the detection and improvement of data quality, and the relevance and authoritativeness of its external references and sources. The solutions proposed by the technical community encompass 1) a better understanding of users and the editorial process via research, and 2) the development of systems, measures, and tools for evaluating and improving different dimensions of data quality. The technical side, however, has its limitations. As pointed out by the IS systematic review (Mora-Cantallops et al., 2019: 262), such applications are mostly limited to Wikidata itself and are yet to be linked to disciplines outside information systems.

The contribution of this paper is to address three factors and relevant solutions in the specific context of DH projects: the relevance and authoritativeness of other available domain sources; domain communities and their activities; and workflow designs that balance automated and manual work by utilising the technical and labour resources of a project’s own and those offered by Wikidata. This paper intends to invite discussion from participants at DH2022 about Wikidata’s possible use in the DH context and the challenges it may face.

Bibliography

Cook, S. (2017). The uses of Wikidata for galleries, libraries, archives and museums and its place in the digital humanities. Comma, 2017(2):117-124.

Kitchenham, B. (2004). Procedures for performing systematic reviews. Keele, UK: Keele University, 33(2004): 1–26.

Mora-Cantallops, M., Sánchez-Alonso, S. and García-Barriocanal, E. (2019). A systematic literature review on Wikidata. Data Technologies and Applications, 53(3): 250–68.

Tharani, K. (2021). Much more than a mere technology: A systematic review of Wikidata in libraries. The Journal of Academic Librarianship, 47(2).


ADHO 2022 – ‘Responding to Asian Diversity’

Tokyo, Japan, 25–29 July 2022 (held in Tokyo and remotely, hybrid, on account of COVID-19)

Conference website: https://dh2022.adho.org/



Ten quick tips for editing Wikidata

Thomas Shafee, Daniel Mietchen, Tiago Lubiana, Dariusz Jemielniak, Andra Waagmeester

Affiliations: Swinburne University of Technology, Melbourne, Australia; Ronin Institute, Montclair, New Jersey, United States of America; Institute for Globally Distributed Open Research and Education (IGDORE), Gothenburg, Sweden; Leibniz Institute for Freshwater Ecology and Inland Fisheries (IGB), Berlin, Germany; FIZ Karlsruhe – Leibniz Institute for Information Infrastructure, Berlin, Germany; University of São Paulo, São Paulo, Brazil; Kozminski University, Warsaw, Poland; Micelio, Ekeren, Belgium

PLOS

Published: July 20, 2023

  • https://doi.org/10.1371/journal.pcbi.1011235
  • Reader Comments

Fig 1

Citation: Shafee T, Mietchen D, Lubiana T, Jemielniak D, Waagmeester A (2023) Ten quick tips for editing Wikidata. PLoS Comput Biol 19(7): e1011235. https://doi.org/10.1371/journal.pcbi.1011235

Editor: Francis Ouellette, McGill University, CANADA

Copyright: © 2023 Shafee et al. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: T.L. is supported by FAPESP grant #19/26284-1 (São Paulo Research Foundation). This funder played no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

This is a PLOS Computational Biology Software paper.

Introduction

This article acts as a successor to the 10 simple rules for editing Wikipedia from a decade ago [ 1 ]. It addresses Wikipedia’s machine-readable cousin: Wikidata—a project potentially even more relevant from the point of view of Computational Biology.

Wikidata is a free collaborative knowledgebase [ 2 ] providing structured data to every Wikipedia page and beyond. It relies on the same peer production principle as Wikipedia: anyone can contribute. Open, collaborative models often surprise in how productively they work in practice, given how unlikely they might be expected to work in theory. Nevertheless, they can still be met with a lot of resistance and suspicion in academic circles [ 3 , 4 ].

Since its launch in 2012, Wikidata has rapidly grown into a cross-disciplinary open knowledgebase with items ranging from genes to cell types to researchers [ 2 , 5 – 7 ]. It has wide-ranging applications, such as validating statistical information about disease outbreaks [ 8 ], aligning resources on human coronaviruses [ 9 ], or assessing biodiversity [ 10 , 11 ]. It can be thought of as a vast network graph ( Fig 1A ), wherein the items act as nodes (now over 100 million) linked to one another by over a billion statements, and further linked out to the wider web by many billions more. We’ll link to example Wikidata items and properties by using italics throughout the text as we refer to them ( Fig 1 ).

Fig 1. Wikidata items are linked to one another and to outside databases via properties that describe the relationships between them. (A) Some example links to and from the item human retinoic acid receptor alpha ( Q254943 ). Items can have outgoing links, e.g., to the concept of a protein ( Q8054 ); incoming links, e.g., from the human RAR–SRC1 complex ( Q107514806 ); or both, e.g., to and from the human RARA gene ( Q18031040 ). There can be multiple links out with the same property (e.g., multiple molecular functions) and links out to external websites and identifiers; e.g., it has the MeSH ID ( P486 ) of D011506. The links formed by properties can be further annotated with qualifiers; e.g., its physical interaction with ( P129 ) tretinoin ( Q29417 ) is with the role of ( P2868 ) being an agonist ( Q389934 ). Now imagine this for a hundred million node items and many billions of property edges. (B) The human-readable interface for this item is organised into the label, description, and aliases, followed by a list of statements with their qualifications and references, with a final section listing any Wikipedia (and other Wikimedia) pages for the item. (C) Example labels, descriptions, and aliases for virus ( Q808 ) from the 410 currently supported languages. These screenshots contain only text and data released under a CC0 licence.

https://doi.org/10.1371/journal.pcbi.1011235.g001

The online interface makes the items themselves somewhat human-readable ( Fig 1B ), but their structured nature makes it possible to query and combine the information in ways that can’t be achieved for information sources written entirely in prose. This versatility makes its applications in computational biology, arguably, even more universal and flexible than just relying on Wikipedia alone [ 12 ]. Queries on Wikidata can vary from which gene variants predict a positive prognosis in colorectal cancer to taxa by number of streets in the Netherlands that bear their name . We’ll try to use examples relevant to computational biology, but bear in mind that the same can be done with almost everything from a map of mediaeval witch executions in Scotland to emergency phone numbers by population using them to paintings depicting frogs .

Since it’s under a CC0 copyright waiver, Wikidata’s structured content is essentially released into the public domain to be used on other projects [ 13 ]. You’ll probably have already seen its structured data at the top of search engine results but it’s also used behind the scenes on thousands of sites, becoming the backbone infrastructure for using, sharing, and collaboratively curating structured reference knowledge.

Tip 1: Learn by doing

If you’re thinking of editing Wikidata, you can start right away, perhaps by exploring and experimenting with one of its sandbox items like Q4115189 , or by taking some of the introductory tours . While it is possible to edit without an account, it is best to register one. Wikidata uses the same user account as Wikipedia or Wikimedia Commons. This enables you to build a reputation within the editor community as you contribute, makes it easier for other editors to contact and collaborate with you, and will enable you to use some additional tools (see Tip 9). Paradoxically, it can also protect your anonymity better: you edit under a username of your choice instead of your edits being tagged with your IP address. Once you’ve created your account, it’s useful to click on your username in the top right of the screen to add some basic information to your userpage—particularly your topics of interest and your areas of expertise. It is increasingly common, although not required, for researchers on Wikidata to also link out to their real-world identity (faculty profile, professional social media, personal website, etc.) or simply to the Wikidata entry about them.

Whereas Wikipedia strictly prohibits editing a page about yourself (if you have one), in Wikidata, it is acceptable to add uncontroversial statements to the Wikidata item about you if you can reference them to publicly available sources (see Tip 7). It can therefore be useful to search for yourself in Wikidata and add statements, for example, your ORCID ( P496 ) , Github account ( P2037 ) , or Wikimedia username ( P4174 ) . Also note that while it is technically possible to add phone numbers or email addresses, be extremely cautious about adding any information—to any item—that may violate privacy (the policy about living people provides guidance here).

Tip 2: Think of knowledge as structured statements

Information in Wikidata is organised into statements. A basic statement is a triple containing a subject, a predicate, and an object. Although the subject of a statement is always a Wikidata item, the object can be either another Wikidata entity or another data type such as strings, URLs, quantities, or external identifiers. For example, Human retinoic acid receptor alpha ( Q254943 ) has the molecular function ( P680 ) of retinoic acid binding ( Q14901431 ) ( Fig 1 ). The identifiers beginning with Q are items and indicate objects, concepts, or events. Identifiers beginning with P are the properties that define relationships.

This model of statements is common to linked data repositories aligned to the Semantic Web [ 14 – 16 ], and Wikidata extends it with qualifiers and references that enable capturing specific detail and provenance (see Tip 7). For example, the statement Retinoic acid receptor alpha ( Q254943 ) physically interacts with ( P129 ) tretinoin ( Q29417 ) , with the role ( P2868 ) of agonist ( Q389934 ) cites as a reference that it is stated in ( P248 ) the IUPHAR/BPS database ( Q17091219 ) .
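To make the shape of that statement concrete, it can be sketched as plain Python data. This is an illustrative simplification rather than Wikidata’s actual JSON serialisation, which wraps each of these parts in further metadata:

    # A conceptual sketch of one Wikidata statement: a subject-predicate-object
    # triple, enriched with a qualifier and a reference. The real data model
    # nests each of these parts in additional "snak" metadata.
    statement = {
        "subject": "Q254943",     # retinoic acid receptor alpha
        "predicate": "P129",      # physically interacts with
        "object": "Q29417",       # tretinoin
        "qualifiers": [
            {"property": "P2868", "value": "Q389934"},   # role: agonist
        ],
        "references": [
            {"property": "P248", "value": "Q17091219"},  # stated in: IUPHAR/BPS
        ],
    }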

Besides Ps and Qs, some other identifiers with a leading letter are important in the Wikidata ecosystem. For example, identifiers starting with Ls are for lexemes that indicate linguistic properties of a word or phrase, e.g., the Swedish noun “modell” ( L47542 ) has multiple meanings, only one of which is a simplified representation of reality ( Q1979154 ) . Similarly, Wikidata identifiers starting with E are for entity schemas, which are particularly useful for defining and validating items (see Tip 9).

Wikidata is based on the knowledge graph management software Wikibase . Since the software is open-source, it is also used in a range of other specialist applications to host data as structured statements. Learning this way of thinking about information therefore enables participation beyond Wikidata. The main other example within the Wikimedia ecosystem is annotation of the Wikimedia Commons media-sharing platform. It is also being implemented in projects outside of Wikimedia that range from ontologies for botanical collections [ 17 ] and a semantic map of the trade of enslaved people [ 18 ] to general research data management applications [ 19 ].

Tip 3: Take a look at what’s already there

The main reason to have data in a multidisciplinary knowledgebase is to be able to extract and combine it in interesting ways. It is possible to search for and view items individually via the user interface on the web or browse geographically nearby items , but a more powerful counterpart to this is to explore the data using database queries. Wikidata can be queried using the SPARQL language via tools such as the Wikidata Query Service . It is worth noting that queries are organised around semantic concepts rather than simple keyword text strings, so searching for “ diseases associated with human pancreatic beta cells via markers ” is essentially asking “find items listed as gene markers of human beta cells; for those genes, find diseases associated with them; count how often each gene occurs.”
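As a minimal sketch of running such a query programmatically (assuming the third-party SPARQLWrapper package, installable as sparqlwrapper), the snippet below asks the public Wikidata Query Service for a handful of human genes:

    from SPARQLWrapper import SPARQLWrapper, JSON

    # Query the public Wikidata Query Service for items that are
    # instance of (P31) gene (Q7187) and found in taxon (P703)
    # Homo sapiens (Q15978631). WDQS asks clients for a descriptive
    # User-Agent, supplied here via the agent parameter.
    sparql = SPARQLWrapper(
        "https://query.wikidata.org/sparql",
        agent="ExampleScript/0.1 (add your contact details here)",
    )
    sparql.setQuery("""
        SELECT ?gene ?geneLabel WHERE {
          ?gene wdt:P31 wd:Q7187 ;
                wdt:P703 wd:Q15978631 .
          SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
        }
        LIMIT 10
    """)
    sparql.setReturnFormat(JSON)

    for row in sparql.query().convert()["results"]["bindings"]:
        print(row["gene"]["value"], row["geneLabel"]["value"])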

The Wikidata Query Service also has several inbuilt lightweight visualisation options. The simplest is probably scatterplots of categorical ( Fig 2A ) and continuous ( Fig 2B ) data. For geographical data, it is possible to overlay coordinates over a map ( Fig 2C ). In the self-referential tradition of the Ten Simple Rules series [ 20 ], looking at the subset of the Wikidata network showing co-occurrence of main subjects in the “Simple Rules” and “Quick Tips” articles series illustrates the main clusters around the themes of career, learning, and software ( Fig 2D ). These visualisations are usually best viewed interactively, so links to the SPARQL queries are included in the figure legend. For those not experienced with SPARQL queries, you can request help for building queries (see Tip 4).

Fig 2. (A) Which drugs target genes involved in cell proliferation. (B) Litter size versus lifespan for endangered species. (C) Birthplaces of people after whom taxa are named. (D) Co-occurrence of main topics of PLOS “Simple Rules” and “Quick Tips” articles (cropped to the subset around learning and career). The map image is produced using base map and data from OpenStreetMap and the OpenStreetMap Foundation.

https://doi.org/10.1371/journal.pcbi.1011235.g002

Some of these example queries are highly specialist, whereas others take advantage of Wikidata’s interdisciplinarity to show things that are difficult to answer with specialist databases, such as combining biological, historical, and geographic data to illustrate a sociological phenomenon of taxon naming bias ( Fig 2C ).

Some sites, such as Scholia (an open-source alternative to the likes of Google Scholar), are built entirely from Wikidata query visualisations [ 21 ], for example, this summary of publications about human astrocytes . The Wikidata Query Service also provides a code section from which snippets can be copied into different programming tools for more in-depth analysis and visualisation. Many of those tools can also dynamically read from and write to Wikidata so that items can be kept up-to-date and integrated with other programming pipelines (see Tip 9).

Tip 4: Join a community

Wikidata’s community portals are available in multiple languages ( Box 1 ) and have broad introductory help and training; however, the majority of its help comes from its community of users, who can be contacted individually or via several rapid-response locations like general project chat (discussions are archived after about a week). There is also a request-a-query page to ask for assistance in creating or refining SPARQL queries. Wikiprojects are self-organised communities of practice consisting of volunteers and their bots, focused on items in a specific topic, their structure, and typology, and are typically highly communicative and supportive of each other [ 22 ]. Some have relatively broad scope (e.g., Molecular Biology , Taxonomy , Medicine , Source MetaData ) and others are narrower (e.g., COVID-19 , Haplogroups )—see the full directory here . These projects have discussion pages as well as exemplar items that can help you align newly added content with best practice (see Tip 6). In case of more serious problems or dispute resolution, consulting with current admins may be useful, too.
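
Box 1. Community portal languages

Afrikaans, Bahasa Indonesia, Bahasa Melayu, Basa Bali, British English, Bân-lâm-gú, Chi-Chewa, Cymraeg, Deutsch, English, Esperanto, Frysk, Gaeilge, Gagana Samoa, Ghanaian Pidgin, Hausa, Ido, Igbo, Ilokano, Jawa, Kreyòl ayisyen, Lingua Franca Nova, Lëtzebuergesch, Malagasy, Minangkabau, Mirandés, Māori, Nederlands, Ripoarisch, Scots, Sesotho sa Leboa, Simple English, Sunda, Tiếng Việt, Tyap, Türkçe, Volapük, Zazaki, asturianu, azərbaycanca, bosanski, brezhoneg, català, dansk, dolnoserbski, eesti, emiliàn e rumagnòl, español, euskara, eʋegbe, français, føroyskt, galego, hornjoserbsce, hrvatski, interlingua, isiXhosa, italiano, kurdî, latviešu, lietuvių, magyar, norsk bokmål, occitan, polski, português, português do Brasil, română, shqip, slovenčina, suomi, svenska, tarandíne, tatarça, toki pona, vèneto, íslenska, čeština, ślůnski, Ελληνικά, башҡортса, беларуская, беларуская (тарашкевіца), български, македонски, монгол, русский, српски / srpski, татарча / tatarça, тоҷикӣ, українська, қазақша, հայերեն, ייִדיש, עברית, ئۇيغۇرچە, ئۇيغۇرچە / Uyghurche, اردو, العربية, بهاس ملايو, تۆرکجه, سرائیکی, سنڌي, فارسی, مصرى, پښتو, ߒߞߏ, अंगिका, अवधी, भोजपुरी, मगही, मराठी, हिन्दी, অসমীয়া, বাংলা, ਪੰਜਾਬੀ, ગુજરાતી, தமிழ், తెలుగు, ಕನ್ನಡ, മലയാളം, සිංහල, ไทย, ဖၠုံလိက်, ဘာသာ မန်, မြန်မာဘာသာ, ქართული, ትግርኛ, አማርኛ, ᐃᓄᒃᑎᑐᑦ / inuktitut, ភាសាខ្មែរ, 中文, 吴语, 日本語, 粵語, 閩南語, ꯃꯤꯇꯩ ꯂꯣꯟ, 한국어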

Additionally, it is possible to join Wikidata communities outside the Wikidata platform. Wikidata’s contributors are generally keen to help and collaborate with anyone interested in the platform, so consider also reaching out to researchers who’ve used Wikidata . You can join in-person events organised by the Wikimedia affiliates in most regions. Multiple groups might exist for any given topic, so once you have found one that resonates with your interests, keep an eye out for others exploring other facets of the same topic. Finally, there are active Wikidata communities on social media platforms such as Mastodon , Telegram , Twitter, or Facebook .

Tip 5: Improve existing data

The easiest first edit to make is to add a new statement to an existing item. Just use the button and Wikidata will attempt to autocomplete and suggest potential properties and items as you type. A good way to get started with editing is to check out the external identifiers section on an item’s page and perhaps add some missing identifiers for the concept from a database you are familiar with. So, for example, if you are on an item about a taxon, you could check whether it correctly states the corresponding GBIF taxon ID ( P846 ) , NCBI taxonomy ID ( P685 ) , MycoBank taxon name ID ( P962 ) , IPNI plant ID ( P961 ) , WoRMS-ID for taxa ( P850 ) , etc. These sorts of links out to external identifiers make Wikidata a valuable tool for easily cross-referencing items between different resources for each concept.
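As a short sketch of how you might survey that section programmatically (assuming the requests package), the snippet below fetches an item’s full record from the Special:EntityData endpoint and prints the external identifier statements it already carries, making gaps easier to spot:

    import requests

    # Fetch the full JSON record for one item and print its external
    # identifier statements (claims whose datatype is "external-id").
    qid = "Q254943"  # human retinoic acid receptor alpha (see Fig 1)
    url = f"https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"
    entity = requests.get(url, timeout=30).json()["entities"][qid]

    for prop, claims in entity["claims"].items():
        for claim in claims:
            snak = claim["mainsnak"]
            if snak.get("datatype") == "external-id" and "datavalue" in snak:
                print(prop, snak["datavalue"]["value"])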

Another good way to get started is to explore items about research articles and review—and possibly add—statements for main subject ( P921 ) . A way of annotating such articles that is unique to Wikidata is adding statements for describes a project that uses ( P4510 ) to add important tools, techniques, or materials that the article highlights in its methods section. You can introduce a lot of extra richness to a statement by including qualifiers ( Fig 1 ). The web interface can be customised with a range of extra tools and gadgets via your preferences to align its capabilities to what is most useful to you.

You can also edit an item’s short description using the button at the top. Even though these aren’t machine-readable, the text is useful for humans to disambiguate between items at a glance (for example, the word “translation” might indicate “the creation of proteins using information from nucleic acids” or “a function that moves every point a constant distance in a specified direction in euclidean geometry” or “transfer of meaning from one language into another”).

Tip 6: Be bold, but not reckless

Like editing Wikipedia [ 1 ], the apparent complexity of Wikidata can make getting started seem intimidating. The trick is to start small. Try looking up Wikidata items on some key papers in your field of research (or this list of PLOS Comp Biol articles) and see if you can add their keywords as main subject ( P921 ) or their methods as describes a project that uses ( P4510 ) . Such annotation can get pretty detailed and granular as you can see in this example .

To work out how to best model new data you want to integrate, you can check out the showcases that many Wikiprojects maintain (see Tip 4) to see how similar item types should be organised for consistency. If your planned additions extend on current examples, involving those experienced contributor communities in the data modelling decisions can ensure that new content is modelled consistently with existing statements.

Remember, you can easily revert edits if you’ve made a mistake—go to the history tab at the top and click “undo.” If doing mass edits or additions (see Tip 9), remember to validate the updated data to make sure you’ve made the changes you intended to [ 8 , 23 ].

Tip 7: Add references (cite, cite, cite)

Just like in Wikipedia, Wikidata is primarily a secondary resource and acts as a hub or proxy to other resources, ideally in a way that facilitates verifiability. All statements should therefore, whenever possible, cite their provenance to existing knowledge in other external reliable sources. These are added via the button. To cite research articles, books, and other common reference types, you can reference their Wikidata QID ( Fig 3A ). If the source you want to use as a reference doesn’t have a Wikidata item yet, you can add it using tools such as Scholia. It is also possible to reference entries in external databases ( Fig 3B ) or webpages ( Fig 3C ). For sources that might change over time, like databases and webpages, it is best to include the date retrieved or even an archived URL. Lastly, especially when a concrete reference isn’t possible, it is useful to provide the heuristic used ( Fig 3D ; list ). It’s worth including citations for even seemingly trivial statements if a reference is available, for example, the statement that an intron ( Q207551 ) is part of ( P361 ) a primary transcript ( Q7243183 ) references 2 papers ( Fig 3A ).

Fig 3. Examples of (A) referencing to Wikidata items for journal review articles, (B) referencing to a database entry, (C) referencing to a website, or (D) using a heuristic estimate to justify a statement. These screenshots contain only text and data released under a CC0 licence.

https://doi.org/10.1371/journal.pcbi.1011235.g003

Tip 8: Create new entities

Don’t be afraid to create new items. In general, each item should describe a single concept. For example, there are separate items for the ɑ-defensin protein domain ( Q4063641 ) , ɑ-defensin propeptide domain ( Q24727071 ) , ɑ-defensin gene family ( Q81639709 ) , ɑ-defensin 1 mouse gene ( Q18248700 ) , ɑ-defensin 1 mouse protein ( Q21421153 ) , etc.

It is trivially simple to create a new item: the “create new item” link on the left will allow you to define an item, assign a short description, and add any aliases that it might also be known by. Newly created items always need to be given an instance of ( P31 ) or subclass of ( P279 ) statement to link them into the wider knowledgebase, but otherwise there are no compulsory fields. An easy way to identify additional statements to add is by checking items of a similar type. The interface will also attempt to suggest potential properties as you add statements ( Fig 4 ). Although it’s best to avoid duplicates, merging items later is easy if it turns out there’s more than one for the same thing. You can also use Cradle , which populates new items via a lightweight form that prompts you to include the most common fields.

Fig 4. Once you start to add statements to an item (especially instance of/subclass of), the interface will begin to suggest common properties to add that other similar items include. Depicted are the suggestions given for a protein domain. For some properties, it will then also suggest common values for that statement. This screenshot contains only text and data released under a CC0 licence.

https://doi.org/10.1371/journal.pcbi.1011235.g004

While all scientific concepts fit on Wikidata in principle, there are notability guidelines that advise on which things should or should not have items. For example, valid taxa, type specimens, or reference genomes are essentially automatically notable. In contrast, not all humans are sufficiently notable, though researchers who have published peer-reviewed articles usually are.

Proposing new properties that can be used to link items is trickier. Compared to the >100M items, there are only 8K properties, so these play more the role of a controlled vocabulary. To propose a new property, simply list it and some example use cases at Wikidata:Property_proposal and experienced contributors will check whether it makes sense to implement as proposed or with some changes, or whether an already existing property can be adapted.

Tip 9: Edit information in bulk

Once you’ve learnt how to add single statements and create single items, you’ll likely want to scale this up to edit information in bulk. Databases with a CC0 waiver are becoming more common and can be integrated into Wikidata in full (e.g., CIViC, Wikipathways, Disease Ontology, and the Evidence and Conclusion Ontology). Other datasets (e.g., Uniprot, CC BY 4.0 licence) can still be integrated by linking out to them via external identifiers ( example ) or have their data integrated as a statement with proper referencing to attribute it ( example ).

When getting into larger scale editing, it is generally best to scale up test sets to identify any issues that come up—do a batch of 10 or a hundred edits before trying a thousand or a million. There are a range of ways to achieve this. There are Wikidata Tools available that cover a range of common situations. Editing tools can generally only be used after a minimum number of manual edits (typically 50) or a minimum age of the account (typically 4 days).

OpenRefine and Ontotext Refine take a spreadsheet of statements to be added and reconcile text strings in that spreadsheet to their most likely Wikidata items, flagging required manual intervention for ambiguous matches [ 24 ]. Ontotext Refine also contains an “RDF mapper,” which can help integrate Wikidata into external databases by generating a separate RDF that uses Wikidata’s identifiers but can be used outside of Wikidata. Quickstatements is a similar Wikidata editing tool, though it does not include the reconciliation functions, so you’ll need to know any Wikidata QIDs to be included in statements beforehand [ 25 ]. Libraries are available in a range of languages ( Table 1 ) to interface with Wikidata via its dynamic API and the query service. For example, the Wikidata integrator library can update items based on external resources and then confirm data consistency via SPARQL queries. It is used by multiple Python bots to keep biology topics up to date, such as genes, diseases, and drugs ( ProteinBoxBot ) [ 14 ], or cell lines ( CellosaurusBot ) [ 26 ].

Table 1. Libraries available in a range of languages for interfacing with Wikidata. https://doi.org/10.1371/journal.pcbi.1011235.t001
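In that spirit of starting small, the sketch below generates a tiny QuickStatements batch as tab-separated text. It assumes QuickStatements’ V1 syntax, in which each line is subject, property, value, optionally followed by reference columns written with an S-prefixed property (S248 for stated in, P248); the QIDs here are placeholders for testing rather than real annotations.

    # Sketch: build a QuickStatements (V1) batch that tags articles with
    # main subject (P921), each statement referenced with stated in (S248).
    # Placeholder QIDs only; Q4115189 is the Wikidata sandbox item.
    annotations = [
        # (article item, main-subject item, reference item)
        ("Q4115189", "Q7187", "Q17091219"),
    ]

    with open("batch.tsv", "w", encoding="utf-8") as f:
        for article, subject, source in annotations:
            f.write(f"{article}\tP921\t{subject}\tS248\t{source}\n")

The resulting file can then be pasted into the QuickStatements web tool; following the advice above, trial a handful of rows before scaling up.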

Since Wikidata is expressed as RDF, it comes with an EntitySchema extension [ 27 ] that enables describing the schema of captured knowledge as Shape Expressions (ShEx)—a formal language to describe data on the Semantic Web [ 28 ]. Entity schemas have been created for a range of item classes ( list ), for example, the Protein Reactome Schema ( E39 ) or the clinical trial schema ( E189 ) . They act as documentation for the data deposited by data donors, and they also describe what users of the data can expect [ 8 , 28 ].

Tip 10: Mind the gaps: What data is currently missing?

Wikidata is a secondary source for data and so, though it is rapidly growing, it will never be complete. This means that some level of inconsistency and incompleteness in its contents is currently inevitable [ 8 , 29 – 31 ]. There is thorough coverage of some items, such as protein classes [ 32 ], human genes [ 14 ], cell types [ 26 , 33 ], and metabolic pathways [ 34 ]. However, this is not true across all topics, and inconsistencies fall into a few categories ( Box 2 ).

Box 2. Main classes of data inconsistency in Wikidata

  • (A) Item incompleteness. Since Wikidata is still in an exponential growth phase, it can be difficult to predict which topics will have already been well developed by the community and which are not yet well covered or linked out to external databases. For example, at time of writing, many common bioinformatics techniques and equipment types are currently missing. The opposite issue can also come up—duplicate items—which are resolvable via the merge function.
  • (B) Statement incompleteness. The issue of incompleteness can affect any part of Wikidata’s data model. For example, for many items about people, there are no statements about their date or place of birth. In cases where multiple statements are common for a given property on a given item (e.g., someone’s employer), some or all of them might be missing or outdated. Present statements can also sometimes be inconsistent—e.g., an occupation statement might include values that aren’t occupations, but rather a field of work, a produced good, or a genre.
  • (C) Language incompleteness. The language coverage for item labels and descriptions also varies, where core items—e.g., evolution ( Q1063 ) —will be in hundreds of languages, whereas items towards the edge of the network—e.g., evolvability ( Q909622 ) —may be in only a few languages or even just one.
  • (D) Referencing incompleteness. For example, at time of writing: the fact that the SARS–CoV–2 NSP9 complex ( Q89792653 ) is found in SARS–CoV–2 is referenced (to the EBI complex portal), but that it contains 2 NSP9 subunits isn’t referenced.
  • (E) Classification and description disparity. For example, at time of writing, principal component analysis ( Q2873 ) is listed as a subclass of multivariate statistics, used for dimensionality reduction, but factor analysis ( Q726474 ) is listed as a subclass of statistical method, used for looking for latent variables. Some inconsistency also stems from the fundamental lack of a single universal classification (“how many countries exist?” being a classic example).
  • (F) External databases and controlled vocabularies. Mapping to external databases can vary, since some are proprietary or have other licensing issues, while others are simply incomplete. For example, there is currently minimal mapping over to Research Resource Identifiers, although a dedicated property for them exists— Research Resource Identifier ( P9712 ) .

The practical upshot of this affects both contributing and using data. For adding new entities and new links between them, there is plenty to be done and huge scope for contribution. For using the data, it requires initial checking to ensure that it’s being stored in the structure that you’d expect (e.g., is the item listed as an instance of a gene, or instance of a protein, and for which species?) and that there aren’t obvious false negatives. Despite this, it is surprising how powerful Wikidata already is even in these early days, supporting COVID dashboards [ 7 ], a literature search engine [ 21 ], and genome browsers for several organisms [ 35 ].
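As a minimal sketch of such a check (reusing the Special:EntityData endpoint and the requests package from Tip 5), the function below collects an item’s instance of (P31) values so a pipeline can verify the typing before use. Note that the declared class may be a subclass such as protein-coding gene (Q20747295) rather than gene (Q7187) itself, which is exactly the kind of disparity described in Box 2:

    import requests

    def instance_of(qid):
        """Return the set of QIDs that the item is declared an instance of (P31)."""
        url = f"https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"
        claims = requests.get(url, timeout=30).json()["entities"][qid]["claims"]
        return {
            c["mainsnak"]["datavalue"]["value"]["id"]
            for c in claims.get("P31", [])
            if "datavalue" in c["mainsnak"]
        }

    # Human RARA gene (Q18031040, from Fig 1): inspect how it is actually
    # typed before treating it as a gene in a downstream pipeline.
    print(instance_of("Q18031040"))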

Acknowledgments

The authors extend acknowledgement to the wide and varied community of developers and contributors who have made and populate the Wikidata platform.

References

  • 11. Zárate M, Buckle C. LOBD: Linked data dashboard for marine biodiversity. Springer; 2021. p. 151–164.
  • 15. Erxleben F, Günther M, Krötzsch M, Mendez J, Vrandečić D. Introducing Wikidata to the Linked Data Web. In: Mika P, Tudorache T, Bernstein A, Welty C, Knoblock C, Vrandečić D, et al., editors. The Semantic Web–ISWC 2014. Cham: Springer International Publishing; 2014. p. 50–65. https://doi.org/10.1007/978-3-319-11964-9_4
  • 16. Malyshev S, Krötzsch M, González L, Gonsior J, Bielefeldt A. Getting the Most Out of Wikidata: Semantic Technology Usage in Wikipedia’s Knowledge Graph. In: Vrandečić D, Bontcheva K, Suárez-Figueroa MC, Presutti V, Celino I, Sabou M, et al., editors. The Semantic Web–ISWC 2018. Cham: Springer International Publishing; 2018. p. 376–394. https://doi.org/10.1007/978-3-030-00668-6_23
  • 19. Rossenova L. Examining Wikidata and Wikibase in the context of research data management applications. In: TIB-Blog [Internet]. 16 Mar 2022 [cited 2023 Apr 21]. Available from: https://blogs.tib.eu/wp/tib/2022/03/16/examining-wikidata-and-wikibase-in-the-context-of-research-data-management-applications/
  • 21. Nielsen FÅ, Mietchen D, Willighagen E. Scholia, Scientometrics and Wikidata. In: Blomqvist E, Hose K, Paulheim H, Ławrynowicz A, Ciravegna F, Hartig O, editors. The Semantic Web: ESWC 2017 Satellite Events. Cham: Springer International Publishing; 2017. p. 237–259.
  • 24. Delpeuch A. Running a reconciliation service for Wikidata. Proceedings of the 1st Wikidata Workshop. 2020. Paper 17. Available from: http://ceur-ws.org/Vol-2773/paper-17.pdf
  • 25. Aycock M, Critchley N, Scott A. Gateway into Linked Data: Breaking Silos with Wikidata. Texas Conference on Digital Libraries. 2021. Available from: https://digital.library.txstate.edu/handle/10877/13529
  • 28. Thornton K, Solbrig H, Stupp GS, Labra Gayo JE, Mietchen D, Prud’hommeaux E, et al. Using Shape Expressions (ShEx) to Share RDF Data Models and to Guide Curation with Rigorous Validation. In: Hitzler P, Fernández M, Janowicz K, Zaveri A, Gray AJG, Lopez V, et al., editors. The Semantic Web. Cham: Springer International Publishing; 2019. p. 606–620. https://doi.org/10.1007/978-3-030-21348-0_39
  • 30. Piscopo A, Simperl E. What we talk about when we talk about Wikidata quality: A literature survey. Proceedings of the 15th International Symposium on Open Collaboration. New York, NY, USA: Association for Computing Machinery; 2019. p. 1–11. https://doi.org/10.1145/3306446.3340822
  • 32. User:ProteinBoxBot–Wikidata. [cited 2022 Oct 5]. Available from: https://www.wikidata.org/wiki/User:ProteinBoxBot
  • 33. Lubiana T. Cell Type Query Book. 2022. Available from: https://github.com/lubianat/cell_type_query_book

U.S. flag

An official website of the United States government

The .gov means it’s official. Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

The site is secure. The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

  • Publications
  • Account settings

Preview improvements coming to the PMC website in October 2024. Learn More or Try it out now .

  • Advanced Search
  • Journal List
  • PLoS Comput Biol
  • v.19(7); 2023 Jul
  • PMC10358883

Logo of ploscomp

Ten quick tips for editing Wikidata

Thomas shafee.

1 Swinburne University of Technology, Melbourne, Australia

Daniel Mietchen

2 Ronin Institute, Montclair, New Jersey, United States of America

3 Institute for Globally Distributed Open Research and Education (IGDORE), Gothenburg, Sweden

4 Leibniz Institute for Freshwater Ecology and Inland Fisheries (IGB), Berlin, Germany

5 FIZ Karlsruhe–Leibniz Institute for Information Infrastructure, Berlin, Germany

Tiago Lubiana

6 University of São Paulo, São Paulo, Brazil

Dariusz Jemielniak

7 Kozminski University, Warsaw, Poland

Andra Waagmeester

8 Micelio, Ekeren, Belgium

This is a PLOS Computational Biology Software paper.

Introduction

This article acts as a successor to the 10 simple rules for editing Wikipedia from a decade ago [ 1 ]. It addresses Wikipedia’s machine-readable cousin: Wikidata—a project potentially even more relevant from the point of view of Computational Biology.

Wikidata is a free collaborative knowledgebase [ 2 ] providing structured data to every Wikipedia page and beyond. It relies on the same peer production principle as Wikipedia: anyone can contribute. Open, collaborative models often surprise in how productively they work in practice, given how unlikely they might be expected to work in theory. Nevertheless, they can still be met with a lot of resistance and suspicion in academic circles [ 3 , 4 ].

Since its launch in 2012, Wikidata has rapidly grown into a cross-disciplinary open knowledgebase with items ranging from genes to cell types to researchers [ 2 , 5 – 7 ]. It has wide-ranging applications, such as validating statistical information about disease outbreaks [ 8 ], aligning resources on human coronaviruses [ 9 ], or assessing biodiversity [ 10 , 11 ]. It can be thought of as a vast network graph ( Fig 1A ), wherein the items act as nodes (now over 100 million) linked to one another by over a billion statements, and further linked out to the wider web by many billions more. We’ll link to example Wikidata items and properties by using italics throughout the text as we refer to them ( Fig 1 ).

An external file that holds a picture, illustration, etc.
Object name is pcbi.1011235.g001.jpg

Wikidata items are linked to one another and to outside databases via properties that describe the relationships between them. ( A ) Some example links to and from the item human retinoic acid receptor alpha ( Q254943 ) . Items can have outgoing links; e.g., to the concept of a protein ( Q8054 ) , incoming links; e.g. from the human RAR–SRC1 complex ( Q107514806 ) , or both; e.g., to and from the human RARA gene ( Q18031040 ) . There can be multiple links out with the same property (e.g., multiple molecular functions) and links out to external websites and identifiers; e.g., it has the MeSH ID ( P486 ) of D011506. The links formed by properties can be further annotated with qualifiers; e.g., its physical interaction with ( P129 ) tretinoin ( {"type":"entrez-protein","attrs":{"text":"Q29417","term_id":"130925"}} Q29417 ) is with the role of ( P2868 ) being an agonist ( Q389934 ) . Now imagine this for a hundred million node items and many billions of property edges. ( B ) The human–readable interface for this item is organised into the label, description, and aliases, followed by a list of statements with their qualifications and references, with a final section listing any Wikipedia (and other wikimedia) pages for the item. ( C ) Example labels, descriptions, and aliases for virus ( Q808 ) from the 410 currently supported languages. These screenshots contain only text and data released under a CC0 licence .

The online interface makes the items themselves somewhat human-readable ( Fig 1B ), but their structured nature makes it possible to query and combine the information in ways that can’t be achieved for information sources written entirely in prose. This versatility makes its applications in computational biology, arguably, even more universal and flexible than just relying on Wikipedia alone [ 12 ]. Queries on Wikidata can vary from which gene variants predict a positive prognosis in colorectal cancer to taxa by number of streets in the Netherlands that bear their name . We’ll try to use examples relevant to computational biology, but bear in mind that the same can be done with almost everything from a map of mediaeval witch executions in Scotland to emergency phone numbers by population using them to paintings depicting frogs .

Since it’s under a CC0 copyright waiver, Wikidata’s structured content is essentially released into the public domain to be used on other projects [ 13 ]. You’ll probably have already seen its structured data at the top of search engine results but it’s also used behind the scenes on thousands of sites, becoming the backbone infrastructure for using, sharing, and collaboratively curating structured reference knowledge.

Tip 1: Learn by doing

If you’re thinking of editing Wikidata, you can start right away, perhaps by exploring and experimenting with one of its sandbox items like Q4115189 , or by taking some of the introductory tours . While it is possible to edit without an account, it is best to register one. Wikidata uses the same user account as Wikipedia or Wikimedia Commons. This enables you to build a reputation within the editor community as you contribute, makes it easier for other editors to contact and collaborate with you, and will enable you to use some additional tools (see Tip 9). Paradoxically, it can also protect your anonymity better: you edit under a username of your choice instead of your edits being tagged with your IP address. Once you’ve created your account, it’s useful to click on your username in the top right of the screen to add some basic information to your userpage—particularly your topics of interest and your areas of expertise. It is increasingly common, although not required, for researchers on Wikidata to also link out to their real-world identity (faculty profile, professional social media, personal website, etc.) or simply to the Wikidata entry about them.

Whereas Wikipedia strictly prohibits editing a page about yourself (if you have one), in Wikidata, it is acceptable to add uncontroversial statements to the Wikidata item about you if you can reference them to publicly available sources (see Tip 7). It can therefore be useful to search for yourself in Wikidata and add statements, for example, your ORCID ( P496 ) , Github account ( P2037 ) , or Wikimedia username ( P4174 ) . Also note that while it is technically possible to add phone numbers or email addresses, be extremely cautious about adding any information—to any item—that may violate privacy (the policy about living people provides guidance here).

Tip 2: Think of knowledge as structured statements

Information in Wikidata is organised into statements. A basic statement is a triple containing a subject, a predicate, and an object. Although the subject of a statement is always a Wikidata item, the object can be either another Wikidata entity or another data type such as strings, URLs, quantities, or external identifiers. For example, Human retinoic acid receptor alpha ( Q254943 ) has the molecular function ( P680 ) of retinoic acid binding ( Q14901431 ) ( Fig 1 ). The identifiers beginning with Q are items and indicate objects, concepts, or events. Identifiers beginning with P are the properties that define relationships.

This model of statements is common to linked data repositories aligned to the Semantic Web [ 14 – 16 ], and Wikidata extends it with qualifiers and references that enable capturing specific detail and provenance (see Tip 7). For example, the statement Retinoic acid receptor alpha ( Q254943 ) physically interacts with ( P129 ) tretinoin ( {"type":"entrez-protein","attrs":{"text":"Q29417","term_id":"130925"}} Q29417 ) , with the role ( P2868 ) of agonist ( Q389934 ) cites as a reference that it is stated in ( P248 ) the IUPHAR/BPS database ( Q17091219 ) .

Besides Ps and Qs, some other identifiers with a leading letter are important in the Wikidata ecosystem. For example, identifiers starting with Ls are for lexemes that indicate linguistic properties of a word or phrase, e.g., the Swedish noun “modell” ( {"type":"entrez-nucleotide","attrs":{"text":"L47542","term_id":"995951","term_text":"L47542"}} L47542 ) has multiple meanings, only one of which is a simplified representation of reality ( Q1979154 ) . Similarly, Wikidata identifiers starting with E are for entity schemas, which are particularly useful for defining and validating items (see Tip 9).

Wikidata is based on the knowledge graph management software Wikibase . Since the software is open-source, it is also used in a range of other specialist applications to host data as structured statements. Learning this way of thinking about information therefore enables participation beyond Wikidata. The main other example within the Wikimedia ecosystem is annotation of the Wikimedia Commons media-sharing platform. It is also being implemented in projects outside of Wikimedia that range from ontologies for botanical collections [ 17 ], a semantic map of the trade of enslaved people [ 18 ], or general research data management applications [ 19 ].

Tip 3: Take a look at what’s already there

The main reason to have data in a multidisciplinary knowledgebase is to be able to extract and combine it in interesting ways. It is possible to search for and view items individually via the user interface on the web or browse geographically nearby items , but a more powerful counterpart to this is to explore the data using database queries. Wikidata can be queried using the SPARQL language via tools such as the Wikidata Query Service . It is worth noting that queries are organised around semantic concepts rather than simple keyword text strings, so searching for “ diseases associated with human pancreatic beta cells via markers ” is essentially asking “find items listed as gene markers of human beta cells; for those genes, find diseases associated with them; count how often each gene occurs.”

The Wikidata Query Service also has several inbuilt lightweight visualisation options. The simplest is probably scatterplots of categorical ( Fig 2A ) and continuous ( Fig 2B ) data. For geographical data, it is possible to overlay coordinates over a map ( Fig 2C ). In the self-referential tradition of the Ten Simple Rules series [ 20 ], looking at the subset of the Wikidata network showing co-occurrence of main subjects in the “Simple Rules” and “Quick Tips” articles series illustrates the main clusters around the themes of career, learning, and software ( Fig 2D ). These visualisations are usually best viewed interactively, so links to the SPARQL queries are included in the figure legend. For those not experienced with SQL queries, you can request help for building queries (see Tip 9).

An external file that holds a picture, illustration, etc.
Object name is pcbi.1011235.g002.jpg

( A ) Which drugs target genes involved in cell proliferation . ( B ) Litter size versus lifespan for endangered species . ( C ) Birthplaces of people after whom taxa are named . ( D ) Co–occurance of main topics of PLOS “Simple Rules” and “Quick Tips” articles (cropped to subset around learning and career). The map image is produced using base map and data from OpenStreetMap and the OpenStreetMap Foundation .

Some of these example queries are highly specialist, whereas others take advantage of Wikidata’s interdisciplinarity to show things that are difficult to answer with specialist databases, such as combining biological, historical, and geographic data to illustrate a sociological phenomenon of taxon naming bias ( Fig 2C ).

Some sites, such as Scholia (an open-source alternative to the likes of Google Scholar), are built entirely from Wikidata query visualisations [ 21 ], for example, this summary of publications about human astrocytes . The Wikidata Query Service also provides a code section from which snippets can be copied into different programming tools for more in-depth analysis and visualisation. Many of those tools are also able to dynamically read from and write to Wikidata so that items can be kept dynamically up-to-date and integrated with other programming pipelines (see Tip 9).

Tip 4: Join a community

Wikidata’s community portals are available in multiple languages ( Box 1 ) and have broad introductory help and training; however, the majority of its help comes from its community of users, who can be contacted individually or via several rapid-response locations like general project chat (discussions are archived after about a week). There is also a request-a-query page to ask for assistance in creating or refining SPARQL queries. Wikiprojects are self-organised communities of practice consisting of volunteers and their bots, focused on items in a specific topic, their structure, and typology, and are typically highly communicative and supportive to each other [ 22 ]. Some have relatively broad scope (e.g., Molecular Biology , Taxonomy , Medicine , Source MetaData ) and others are more narrow (e.g., COVID-19 , Haplogroups )—see the full directory here . These projects have discussion pages as well as exemplar items that can help you align newly added content with best practice (see Tip 6). In case of more serious problems or dispute resolution, consulting with current admins may be useful, too.

Box 1. Community portal languages

Afrikaans , Bahasa Indonesia , Bahasa Melayu , Basa Bali , British English , Bân–lâm–gú , Chi–Chewa , Cymraeg , Deutsch , English , Esperanto , Frysk , Gaeilge , Gagana Samoa , Ghanaian Pidgin , Hausa , Ido , Igbo , Ilokano , Jawa , Kreyòl ayisyen , Lingua Franca Nova , Lëtzebuergesch , Malagasy , Minangkabau , Mirandés , Māori , Nederlands , Ripoarisch , Scots , Sesotho sa Leboa , Simple English , Sunda , Tiếng Việt , Tyap , Türkçe , Volapük , Zazaki , asturianu , azərbaycanca , bosanski , brezhoneg , català , dansk , dolnoserbski , eesti , emiliàn e rumagnòl , español , euskara , eʋegbe , français , føroyskt , galego , hornjoserbsce , hrvatski , interlingua , isiXhosa , italiano , kurdî , latviešu , lietuvių , magyar , norsk bokmål , occitan , polski , português , português do Brasil , română , shqip , slovenčina , suomi , svenska , tarandíne , tatarça , toki pona , vèneto , íslenska , čeština , ślůnski , Ελληνικά , башҡортса , беларуская , беларуская (тарашкевіца) , български , македонски , монгол , русский , српски / srpski , татарча / tatarça , тоҷикӣ , українська , қазақша , հայերեն , יי ִ דיש , עברית , ئۇيغۇرچە , ئۇيغۇرچە / Uyghurche , اردو , العربية , بهاس ملايو , تۆرکجه , سرائیکی , سنڌي , فارسی , مصرى , پښتو , ߒߞߏ , अंगिका , अवधी , भोजपुरी , मगही , मराठी , हिन्दी , অসমীয়া , বাংলা , ਪੰਜਾਬੀ , ગુજરાતી , தமிழ் , తెలుగు , ಕನ್ನಡ , മലയാളം , සිංහල , ไทย , ဖၠုံလိက် , ဘာသာ မန် , မြန်မာဘာသာ , ქართული , ትግርኛ , አማርኛ , ᐃᓄᒃᑎᑐᑦ / inuktitut , ភាសាខ្មែរ , 中文 , 吴语 , 日本語 , 粵語 , 閩南語 , ꯃꯤꯇꯩ ꯂꯣꯟ , 한국어

Additionally, it is possible to join Wikidata communities outside the Wikidata platform. Wikidata’s contributors are generally keen to help and collaborate with anyone interested in the platform, so consider also reaching out to researchers who’ve used Wikidata . You can join in-person events organised by the Wikimedia affiliates in most regions. Since multiple groups might exist for any given topic, so once you have found one that resonates with your interests, keep an eye out for others exploring other facets of the same topic. Finally, there are active Wikidata communities on social media platforms such as Mastodon , Telegram , Twitter, or Facebook .

Tip 5: Improve existing data

The easiest first edit to make is to add a new statement to an existing item. Just use the button and Wikidata will attempt to autocomplete and suggest potential properties and items as you type. A good way to get started with editing is to check out the external identifiers section on an item’s page and perhaps add some missing identifiers for the concept from a database you are familiar with. So for example, if you are on an item about a taxon, you could check whether it correctly states the corresponding GBIF taxon ID ( P846 ) , NCBI taxonomy ID ( P685 ) , MycoBank taxon name ID ( P962 ) , IPNI plant ID ( P961 ) , WoRMS-ID for taxa ( P850 ) , etc. These sorts of links out to external identifiers make Wikidata a valuable tool for easily cross referencing items between different resources for each concept.

Another good way to get started is to explore items about research articles and review—and possibly add—statements for main subject ( P921 ) . A way of annotating such articles that is particularly unique to Wikidata is adding statements for describes a project that uses ( P4510 ) to add important tools, techniques, or materials that the article highlights in its methods section. You can introduce a lot of extra richness to a statement including qualifiers via ( Fig 1 ). The web interface can be customised with a range of extra tools and gadgets via your preferences to align its capabilities to what is most useful to you.

You can also edit an item’s short description using the button at the top. Even though these aren’t machine-readable, the text is useful for humans to disambiguate between items at a glance (for example, the word “translation” might indicate “the creation of proteins using information from nucleic acids” or “a function that moves every point a constant distance in a specified direction in euclidean geometry” or “transfer of meaning from one language into another”).

Tip 6: Be bold, but not reckless

Like editing Wikipedia [ 1 ], the apparent complexity of Wikidata can make getting started seem intimidating. The trick is to start small. Try looking up Wikidata items on some key papers in your field of research (or this list of PLOS Comp Biol articles) and see if you can add its keywords as main subject ( P921 ) or its methods as describes a project that uses ( P4510 ) . Such annotation can get pretty detailed and granular as you can see in this example .

To work out how to best model new data you want to integrate, you can check out the showcases that many Wikiprojects maintain (see Tip 4) to see how similar item types should be organised for consistency. If your planned additions extend on current examples, involving those experienced contributor communities in the data modelling decisions can ensure that new content is modelled consistently with existing statements.

Remember, you can easily revert edits if you’ve made a mistake—go to the history tab at the top and click “undo.” If doing mass edits or additions (see Tip 9), remember to validate the updated data to make sure you’ve made the changes you intended to [ 8 , 23 ].

Tip 7: Add references (cite, cite, cite)

Just like in Wikipedia, Wikidata is primarily a secondary resource and acts as a hub or proxy to other resources, ideally in a way that facilitates verifiability. All statements should therefore, whenever possible, cite their provenance to existing knowledge in other external reliable sources. These are added via the button. To cite research articles, books, and other common reference types, you can reference their Wikidata QID ( Fig 3A ). If the source you want to use as a reference doesn’t have a Wikidata item yet, you can add it using tools such as Scholia. It is also possible to reference entries in external databases ( Fig 3B ) or webpages ( Fig 3C ). For sources that might change over time like databases and webpages—it is best to include the date retrieved or even an archived URL. Lastly, especially when a concrete reference isn’t possible, it is useful to provide the heuristic used ( Fig 3D ; list ). It’s worth including citations for even seemingly trivial statements if a reference is available, for example, the statement that an intron ( Q207551 ) is part of ( P361 ) a primary transcript ( Q7243183 ) references 2 papers ( Fig 3A ).

An external file that holds a picture, illustration, etc.
Object name is pcbi.1011235.g003.jpg

Examples of ( A ) referencing to Wikidata items for journal review articles, ( B ) referencing to a database entry, ( C ) referencing to a website, or ( D ) using a heuristic estimate to justify a statement. These screenshots contain only text and data released under a CC0 licence .

Tip 8: Create new entities

Don’t be afraid to create new items. In general, each item should describe a single concept. For example, there are separate items for the ɑ-defensin protein domain ( Q4063641 ) , ɑ-defensin propeptide domain ( Q24727071 ) , ɑ-defensin gene family ( Q81639709 ) , ɑ-defensin 1 mouse gene ( Q18248700 ) , ɑ-defensin 1 mouse protein ( Q21421153 ) , etc.

It is trivially simple to create a new item: the “create new item” link on the left will allow you to define an item, assign a short description, and add any aliases that it might also be known by. Newly created items always need to be given an instance of ( P31 ) or subclass of ( P279 ) statement to link it into the wider knowledgebase, but otherwise there are no compulsory fields. An easy way to identify additional statements to add is by checking items of a similar type. The interface will also attempt to suggest potential properties as you add statements ( Fig 4 ). Although it’s best to avoid duplicates, merging items later is easy if it turns out there’s more than one for the same thing. You can also use Cradle where you can populate new items via a lightweight form which prompts you to include the most common fields.

An external file that holds a picture, illustration, etc.
Object name is pcbi.1011235.g004.jpg

Once you start to add statements to an item (especially instance of/subclass of), the interface will begin to suggest common properties to add that other similar items include. Depicted are the suggestions given for a protein domain. For some properties, it will then also suggest common values for that statement. This screenshot contains only text and data released under a CC0 licence .

While all scientific concepts fit on Wikidata in principle, there are notability guidelines that advise on which things should or should not have items. For example, valid taxa, type specimens, or reference genomes are essentially automatically notable. In contrast, not all humans are sufficiently notable, though researchers who have published peer-reviewed articles usually are.

Proposing new properties that can be used to link items is trickier. Compared to the >100M items, there are only 8K properties, so these have more a role of a controlled vocabulary. To propose a new property, simply list it and some example use cases at Wikidata:Property_proposal and experienced contributors will check if it makes sense to implement as proposed or with some changes or whether an already existing property can be adapted.

Tip 9: Edit information in bulk

Once you’ve learnt how to add single statements and create single items, you’ll likely want to scale this up to edit information in bulk. Databases with a CC0 are becoming more common and can be integrated into Wikidata in full (e.g., CIViC, Wikipathways, Disease Ontology, and the Evidence and Conclusion Ontology). Other datasets (e.g., Uniprot, CC BY 4.0 licence) can still be integrated by linking out to them via external identifiers ( example ) or have their data integrated as a statement with proper referencing to attribute it ( example ).

When getting into larger scale editing, it is generally best to scale up test sets to identify any issues that come up—do a batch of 10 or a hundred edits before trying a thousand or a million. There are a range of ways to achieve this. There are Wikidata Tools available that cover a range of common situations. Editing tools can generally only be used after a minimum number of manual edits (typically 50) or a minimum age of the account (typically 4 days).

OpenRefine and Ontotext Refine take a spreadsheet of statements to be added and reconcile text strings in that spreadsheet to their most likely Wikidata items, flagging required manual intervention for ambiguous matches [ 24 ]. Ontotext Refine also contains an “RDF mapper,” which can help integrate Wikidata into external databases by generating a separate RDF that uses Wikidata’s identifiers but can be used outside of Wikidata. Quickstatements is a similar Wikidata editing tool, though it does not include the reconciliation functions so you’ll need to know any Wikidata QIDs to be included in statements beforehand [ 25 ]. Libraries are available in a range of languages ( Table 1 ) to interface with Wikidata via its dynamic API and the query service. For example, the Wikidata integrator library can update items based on external resources and then confirm data consistency via SPARQL queries. It is used by multiple python bots to keep biology topics up to date, such as genes, diseases, and drugs ( ProteinBoxBot ) [ 14 ], or cell lines ( CellosaurusBot ) [ 26 ].

Since Wikidata is expressed as RDF, it comes with an EntitySchema extension [ 27 ] that enables describing the schema of captured knowledge as Shape Expressions (ShEx)—a formal language to describe data on the Semantic Web [ 28 ]. EntitySchema have been created for a range of item classes ( list ), for example, the Protein Reactome Schema ( E39 ) or clinical trial schema ( E189 ) . They act as documentation for the data deposited by data donors, but they also act as a document to describe expectations by users [ 8 , 28 ].

Tip 10: Mind the gaps: What data is currently missing?

Wikidata is a secondary source for data and so, though it is rapidly growing, it will never be complete. This means that some level of inconsistency and incompleteness in its contents is currently inevitable [ 8 , 29 – 31 ]. There is thorough coverage of some items, such as protein classes [ 32 ], human genes [ 14 ], cell types [ 26 , 33 ], and metabolic pathways [ 34 ]. However, this is not true across all topics, and inconsistencies fall into a few categories ( Box 2 ).

Box 2. Main classes of data inconsistency in Wikidata

  • (A) Item incompleteness. Since Wikidata is still in an exponential growth phase, it can be difficult to predict which topics have already been well developed by the community and which are not yet well covered or linked out to external databases. For example, at time of writing, many common bioinformatics techniques and equipment types are missing. The opposite issue—duplicate items—can also come up, and is resolvable via the merge function.
  • (B) Statement incompleteness. Incompleteness can affect any part of Wikidata’s data model. For example, many items about people carry no statements about their date or place of birth. In cases where multiple statements are common for a given property on a given item (e.g., someone’s employer), some or all of them might be missing or outdated. Present statements can also be inconsistent—e.g., an occupation statement might include values that aren’t occupations, but rather a field of work, a produced good, or a genre. (A query sketch for surfacing such gaps follows this box.)
  • (C) Language incompleteness. The language coverage for item labels and descriptions also varies: core items—e.g., evolution ( Q1063 ) —will have labels in hundreds of languages, whereas items towards the edge of the network—e.g., evolvability ( Q909622 ) —may be covered in only a few languages or even just one.
  • (D) Referencing incompleteness. For example, at time of writing: the fact that the SARS-CoV-2 NSP9 complex ( Q89792653 ) is found in SARS-CoV-2 is referenced (to the EBI Complex Portal), but the fact that it contains 2 NSP9 subunits isn’t referenced.
  • (E) Classification and description disparity. For example, at time of writing, principal component analysis ( Q2873 ) is listed as a subclass of multivariate statistics, used for dimensionality reduction, whereas factor analysis ( Q726474 ) is listed as a subclass of statistical method, used for looking for latent variables. Some inconsistency also stems from the fundamental lack of a single universal classification (“how many countries exist?” being a classic example).
  • (F) External databases and controlled vocabularies. Mapping to external databases varies in completeness, since some are proprietary or have other licensing issues, while others are simply incomplete. For example, there is currently minimal mapping over to Research Resource Identifiers, although a dedicated property for them exists— Research Resource Identifier ( P9712 ) .
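As referenced in (B) above, the sketch below is one way to surface statement gaps with SPARQL, reusing the Python pattern from Tip 9. The chosen gap (human gene items lacking an HGNC gene symbol, P353) is an illustrative assumption, and queries over very large classes may need narrower filters or limits to avoid timeouts.

    import requests

    # Illustrative gap: human gene items with no HGNC gene symbol (P353).
    QUERY = """
    SELECT ?gene ?geneLabel WHERE {
      ?gene wdt:P31 wd:Q7187 ;       # instance of: gene
            wdt:P703 wd:Q15978631 .  # found in taxon: Homo sapiens
      FILTER NOT EXISTS { ?gene wdt:P353 ?symbol }
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
    LIMIT 20
    """

    response = requests.get(
        "https://query.wikidata.org/sparql",
        params={"query": QUERY, "format": "json"},
        headers={"User-Agent": "ExampleBot/0.1 (mailto:user@example.org)"},
    )
    response.raise_for_status()

    for row in response.json()["results"]["bindings"]:
        # Items returned here are candidates for contribution, not errors.
        print(row["gene"]["value"], row.get("geneLabel", {}).get("value", ""))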

The practical upshot of this affects both contributing and using data. For adding new entities and new links between them, there is plenty to be done and huge scope for contribution. Using the data requires some initial checking to ensure that it is stored in the structure you would expect (e.g., is the item listed as an instance of a gene or an instance of a protein, and for which species?) and that there are no obvious false negatives; one way to perform such a check is sketched below. Despite this, it is surprising how powerful Wikidata already is even in these early days, supporting COVID dashboards [ 7 ], a literature search engine [ 21 ], and genome browsers for several organisms [ 35 ].
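The sketch below performs that structure check by inspecting an item’s instance of (P31) statements through the MediaWiki Action API’s wbgetclaims module before the data is relied upon. The item QID is reused from Box 2 purely as an example; in practice you would substitute the items your analysis depends on.

    import requests

    API = "https://www.wikidata.org/w/api.php"
    ITEM = "Q89792653"  # example item from Box 2 (SARS-CoV-2 NSP9 complex)

    # wbgetclaims returns the statements for one property of one item.
    response = requests.get(
        API,
        params={
            "action": "wbgetclaims",
            "entity": ITEM,
            "property": "P31",  # instance of
            "format": "json",
        },
        headers={"User-Agent": "ExampleBot/0.1 (mailto:user@example.org)"},
    )
    response.raise_for_status()

    claims = response.json().get("claims", {}).get("P31", [])
    classes = [
        c["mainsnak"]["datavalue"]["value"]["id"]
        for c in claims
        if c["mainsnak"]["snaktype"] == "value"
    ]
    print(ITEM, "is an instance of:", classes or "nothing (statement missing!)")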

Acknowledgments

The authors extend acknowledgement to the wide and varied community of developers and contributors who have made and populate the Wikidata platform.

Funding Statement

T.L. is supported by FAPESP grant #19/26284-1 (São Paulo Research Foundation). This funder played no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.
