Within a few years, Wikidata has developed into a central knowledge base for structured data through the collaborative efforts of its peer production community. One benefit of peer production is that knowledge is curated and maintained by a wide range of editors with different cultural, experiential, and educational backgrounds, which can result in fewer biases and a more topically diverse knowledge base.
Ensuring data quality is thus of utmost importance: the goal of Wikidata is to “give more people more access to knowledge”, and the data therefore needs to be “fit for use by data consumers”. The Wikidata community has already developed methods and tools that monitor relative completeness (e.g., the Recoin gadget), encourage link validation and correction (e.g., Mix’n’match), and help editors observe recent changes and identify vandalism. Moreover, the community has started a global discussion about relevant dimensions of data quality in a recent RFC, which used a survey of Linked Data quality methods as its starting point to better describe and categorize quality issues, add further quality aspects, and ultimately develop a data quality framework for Wikidata. These quality aspects are grouped into several dimensions, of which the intrinsic and contextual dimensions are the most crucial. Despite this progress, recent research has shown the dominant role of a Western perspective in the represented languages; more work thus needs to be done to strive for greater knowledge diversity. A major concern of data quality is therefore to support such knowledge diversity and to ensure that Wikidata covers a wide variety of topics from various trustworthy sources, even where facts contradict each other.
In this workshop, we would like to emphasize this perspective and discuss existing challenges and opportunities in the field of data quality monitoring and data quality assurance in the context of Wikidata. We would especially like to focus on Wikidata’s unique characteristics: its central role in a network of knowledge bases and other peer production projects (like Wikipedia), its ability to host plural statements and illustrate misinformation from Web information sources, its multilinguality, its community of humans and machines, as well as its dynamicity.
The workshop will give scientific researchers and community members the opportunity to discuss and present preliminary findings, ideas, opinions and demos.
TOPICS OF INTEREST
We are specifically interested in tackling data quality on Wikidata from three perspectives:
WHAT ARE SUITABLE DIMENSIONS AND MEASURES OF DATA QUALITY IN THE CONTEXT OF WIKIDATA THAT ARE NOT YET (FULLY) ADDRESSED?
The role of bots in data quality management
WHAT ARE SUITABLE AND NEEDED METHODS AND TOOLS FOR WIKIDATA’S DATA EDITORS TO EDIT AND MAINTAIN THE DATA?
Usage of property constraints in Wikidata
Approaches to identify vandalism
Ways to identify knowledge gaps (e.g. missing types of entities and statements that are relevant to Wikidata)
Approaches to identify deprecated values
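Several of these editor-facing topics lend themselves to mechanical checks. As one illustration of how property constraints can support maintenance work, the following minimal Python sketch validates values against a format constraint expressed as a regular expression; the pattern and the sample values are illustrative assumptions, not taken from Wikidata's live constraint definitions:

```python
import re

# A format constraint, such as the one on P214 (VIAF ID), is a regular
# expression that every value of the property should match. The pattern
# and sample values below are illustrative assumptions.
FORMAT_CONSTRAINTS = {
    "P214": re.compile(r"^[1-9]\d{0,9}$"),  # assumed pattern: digits, no leading zero
}

def violations(prop, values):
    """Return the values that violate the property's format constraint."""
    pattern = FORMAT_CONSTRAINTS[prop]
    return [v for v in values if not pattern.fullmatch(v)]

print(violations("P214", ["113230702", "0123", "abc"]))  # prints ['0123', 'abc']
```

A real constraint checker would, of course, read the constraint statements from the property pages themselves rather than from a hard-coded table.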
WHAT ARE SUITABLE AND NEEDED METHODS AND TOOLS TO CONSUME HIGH QUALITY DATA FROM WIKIDATA?
Identification of trustworthy authorities
Identifying the usefulness of criteria that define trust (e.g., data freshness, data dynamicity, number of people involved in its curation)
Defining a new layer in the query service that ranks data according to these data quality criteria
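To make the ranking idea concrete, here is a minimal Python sketch that combines the example trust criteria above (data freshness, number of curators, number of supporting references) into a single score; the data structure, field names, weights, and statement IDs are hypothetical assumptions for illustration, not an existing Wikidata API:

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical record of quality signals for a Wikidata statement.
@dataclass
class StatementSignals:
    statement_id: str
    last_edited: date    # data freshness
    num_curators: int    # people involved in curation
    num_references: int  # supporting sources

def quality_score(s: StatementSignals, today: date) -> float:
    """Combine the example criteria into one ranking score: fresher,
    more heavily curated, better referenced statements rank higher."""
    age_days = (today - s.last_edited).days
    freshness = 1.0 / (1.0 + age_days / 365.0)  # decays with age
    curation = min(s.num_curators, 10) / 10.0   # capped to avoid dominance
    sourcing = min(s.num_references, 5) / 5.0
    return 0.4 * freshness + 0.3 * curation + 0.3 * sourcing  # assumed weights

signals = [
    StatementSignals("Q42$a", date(2019, 1, 10), 5, 2),
    StatementSignals("Q42$b", date(2015, 6, 1), 1, 0),
]
ranked = sorted(signals, key=lambda s: quality_score(s, date(2019, 5, 1)), reverse=True)
```

A ranking layer on the query service would compute such scores server-side; the weighting itself is exactly the kind of design question the workshop aims to discuss.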
Delpeuch Antonin. "Quality assurance for Wikidata imports with OpenRefine".
Farda-Sarbas Mariam, Zhu Hong, Nest Marisa, and Müller-Birn Claudia. "Analysis of Wikidata Bots".
Kleineidam Christian. "Why Wikidata needs bulk edits".
Razniewski Simon. "Completeness - complete mess?".
Rizza Ettore, and van Hooland Seth. "Using external identifiers to improve Wikidata and its related datasets: state of play and future work".
Voß Jakob. "Data modeling in Wikidata: Requirements for a Wikidata schema language".
Werkmeister Lucas, and Pintscher Lydia. "Overview on Data Quality Tool on Wikidata".
Wisesa Avicenna, Darari Fariz, Razniewski Simon, and Nutt Werner. "ProWD: A tool for profiling Wikidata completeness".
Heindorf Stefan, Paderborn University (DE)
Kaffee Lucie-Aimée, University of Southampton (GB)
Krötzsch Markus, TU Dresden (DE)
Nutt Werner, Free University of Bozen-Bolzano (IT)
Paulheim Heiko, University of Mannheim (DE)
Pintscher Lydia, Wikimedia Deutschland (DE)
Piscopo Alessandro, University of Southampton (GB)
Razniewski Simon, Max Planck Institute for Informatics (DE)
Werkmeister Lucas, Wikimedia Deutschland (DE)
Zaveri Amrapali, Maastricht University (NL)
08:30-09:00 Registration and Coffee
09:05-09:35 Opening Keynote by Amrapali Zaveri: "Open Data Quality: dimensions, metrics, assessment and improvement"
09:35-10:15 Participant introductions
10:15-10:30 Coffee Break
10:30-11:30 Sprint 1: What are the key challenges of pursuing data quality on Wikidata?
11:30-12:00 Group Presentations
12:00-13:00 Lunch Break
13:00-14:00 Sprint 2: How might we address the identified challenges?
14:00-14:30 Group Presentations
14:30-14:45 Coffee Break
14:45-15:45 Sprint 3: How can we prioritise our actions to address the challenges?
15:45-16:15 Group Presentations
16:15-16:45 Closing Keynote by Daniel Mietchen
16:45-17:00 Wrap up
R. Y. Wang and D. M. Strong, “Beyond Accuracy: What Data Quality Means to Data Consumers,” J Manage Inf Syst, vol. 12, no. 4, pp. 5–33, Mar. 1996.
V. Balaraman, S. Razniewski, and W. Nutt, “Recoin: Relative Completeness in Wikidata,” in Companion Proceedings of the The Web Conference 2018, Republic and Canton of Geneva, Switzerland, 2018, pp. 1787–1792.
L. Pintscher, “Data Quality in Wikidata,” Wikimania 2018. https://commons.wikimedia.org/wiki/File:Wikimania_2018_-_data_quality_in_Wikidata_poster.pdf
A. Zaveri, A. Rula, A. Maurino, R. Pietrobon, J. Lehmann, and S. Auer, “Quality Assessment for Linked Data: A Survey,” Semantic Web, vol. 7, no. 1, pp. 63–93, 2016.
L.-A. Kaffee and E. Simperl, “Analysis of Editors’ Languages in Wikidata,” in Proceedings of the 14th International Symposium on Open Collaboration, New York, NY, USA, 2018, pp. 21:1–21:5.