From c8b886364f8f3f3a8ee68dadba78126b57594588 Mon Sep 17 00:00:00 2001
From: Nathan TeBlunthuis
Date: Tue, 31 Mar 2020 16:22:30 -0700
Subject: [PATCH] add documentation for the output files

---
 keywords/README.md | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/keywords/README.md b/keywords/README.md
index 5bf27ba..490d7eb 100644
--- a/keywords/README.md
+++ b/keywords/README.md
@@ -5,3 +5,10 @@ This code finds trending web searches related to the COVID-19 pandemic using Goo
 We search the Wikidata API for entities in `src/wikidata_search.py` and then we make simple SPARQL queries in `src/wikidata_translations.py` to collect labels and aliases the entities. The labels come with language metadata. This seems to provide a decent initial list of relevant terms across multiple languages.
 
 The output data lives at [covid19.communitydata.science](https://covid19.communitydata.science/datasets/keywords).
+
+The output files have 4 columns:
+
+- `itemid` links to the Wikidata entity.
+- `label` is the translation of the relevant keyword.
+- `langcode` is the [ISO 639-1](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) code corresponding to the language of the label.
+- `is_alt` indicates whether the label is an [alias](https://www.wikidata.org/wiki/Help:Aliases).
-- 
2.39.5
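
Note on the documented columns: the sketch below shows one way a consumer might read a file with the four columns `itemid`, `label`, `langcode`, and `is_alt`, and group labels by language. It is not part of the patch. The comma delimiter, the `True`/`False` encoding of `is_alt`, the sample rows, and the helper name `labels_by_language` are all assumptions made for illustration; check the actual files before relying on any of them.

```python
import csv
import io
from collections import defaultdict

# Illustrative rows only. The delimiter, the itemid format, and the
# True/False encoding of is_alt are assumptions, not taken from the patch.
SAMPLE = """itemid,label,langcode,is_alt
http://www.wikidata.org/entity/Q84263196,COVID-19,en,False
http://www.wikidata.org/entity/Q84263196,coronavirus disease 2019,en,True
http://www.wikidata.org/entity/Q84263196,COVID-19,de,False
"""


def labels_by_language(handle, include_aliases=True):
    """Group labels by langcode, optionally skipping rows flagged as aliases."""
    grouped = defaultdict(list)
    for row in csv.DictReader(handle):
        # Treat any casing of "true" as an alias flag (assumed encoding).
        is_alias = row["is_alt"].strip().lower() == "true"
        if is_alias and not include_aliases:
            continue
        grouped[row["langcode"]].append(row["label"])
    return dict(grouped)


# Primary labels only, then primary labels plus aliases.
primary = labels_by_language(io.StringIO(SAMPLE), include_aliases=False)
everything = labels_by_language(io.StringIO(SAMPLE))
```

To read a real file, pass an open file handle in place of `io.StringIO(SAMPLE)`. If the files turn out to be tab-separated, `csv.DictReader(handle, delimiter="\t")` would be the corresponding change.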