We publish a comprehensive Wikipedia dataset

InfluScience Wikipedia

We have just published an article in Quantitative Science Studies (QSS) in which we carry out a twofold study of Wikipedia. The article is titled Wikinformetrics: Construction and description of an open Wikipedia knowledge graph dataset for informetric purposes.

On the one hand, after reviewing Wikipedia's characteristics, we offer a conceptual framework for better understanding how it works, comparing its properties with those of a scientific paper. On the other hand, we build a comprehensive dataset from the English Wikipedia, including several metrics for each page, other entities such as the referenced resources and the categories, and the relationships among all of them. In addition, we explore this dataset to illustrate its possibilities, offering several interactive visualizations through a Shiny app.

The abstract follows:

«Wikipedia is one of the most visited websites in the world and is also a frequent subject of scientific research. However, the analytical possibilities of Wikipedia information have not yet been analyzed considering at the same time both a large volume of pages and attributes. The main objective of this work is to offer a methodological framework and an open knowledge graph for the informetric large-scale study of Wikipedia. Features of Wikipedia pages are compared with those of scientific publications to highlight the (di)similarities between the two types of documents. Based on this comparison, different analytical possibilities that Wikipedia and its various data sources offer are explored, ultimately offering a set of metrics meant to study Wikipedia from different analytical dimensions. In parallel, a complete dedicated dataset of the English Wikipedia was built (and shared) following a relational model. Finally, a descriptive case study is carried out on the English Wikipedia dataset to illustrate the analytical potential of the knowledge graph and its metrics.»

DOWNLOAD ARTICLE | DOWNLOAD DATASET

Arroyo-Machado, W., Torres-Salinas, D., & Costas, R. (2022). Wikinformetrics: Construction and description of an open Wikipedia knowledge graph dataset for informetric purposes. Quantitative Science Studies, 1-35. https://doi.org/10.1162/qss_a_00226

By Wenceslao Arroyo Machado

Postdoctoral researcher at the Universidad de Granada and COO of EC3Metrics