Data Exchange Routines between the ZDL and Wikimedia Projects

zentrum-lexikographie/wikimedia

Exporting Lexical Resources to Wikidata

Prerequisites

  • Clojure v1.11: Export routines are written in Clojure.
  • Java (JDK) >= v11: Clojure, being a hosted language, requires a current Java runtime.
  • Docker: A local Wikibase setup for testing the import is provided via Docker containers.
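
The prerequisites above can be checked before running any of the commands in this README. The following is a minimal sketch and not part of the repository: `missing_tools` is a hypothetical helper name, and it only tests whether each command can be found on the `PATH`. It does not verify that the installed versions meet the minimums listed above.

```shell
# Hypothetical helper (not part of the repository): print each required
# command that cannot be found on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Prints nothing if clojure, java, and docker are all available.
missing_tools clojure java docker
```

Empty output means all three commands were found. Any name that is printed still needs to be installed.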

Testing Lexeme Coverage

$ clojure -M:test:kaocha --focus-meta :coverage

--- unit (clojure.test) ---------------------------
dwds.wikidata.coverage-test
  increased-coverage FAIL FAIL

Randomized with --seed 396611200

FAIL in dwds.wikidata.coverage-test/increased-coverage (coverage_test.clj:34)
expected: (< 0.8 (:tokens-pct coverage))
  actual: (not (< 0.8 0.68013096))

FAIL in dwds.wikidata.coverage-test/increased-coverage (coverage_test.clj:35)
expected: (< 0.2 (:forms-pct coverage))
  actual: (not (< 0.2 0.1069699))
1 tests, 2 assertions, 2 failures.

[…]

Importing Lexemes

Export a digest of existing Wikidata lexemes (based on a current dump):

$ clojure -M:dump >lexemes.csv

Then run the import, filtering out lexemes that already exist according to the previously exported digest:

$ clojure -M:import \
    -l 1 \
    -e lexemes.csv \
    -e lexemes.wikidata.csv \
    -s ../../data/zdl/wb \
    >>lexemes.wikidata.csv

Testing

Start a containerized, local test setup of Wikibase/MySQL:

$ docker-compose up

Run lexeme import tests:

$ clojure -M:test:kaocha
--- unit (clojure.test) ---------------------------
dwds.wikidata.lexeme-import-test
  conversion
  test-wb-import

2 tests, 2 assertions, 0 failures.
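
The two steps above can be combined so that the containers are stopped again after the tests finish. This is a sketch under stated assumptions, not something the repository provides: `run_tests_with_wikibase` and the `COMPOSE` variable are hypothetical names, and the function assumes it is called from the directory that contains the Compose file. It starts the containers detached and does not wait for Wikibase to become ready, so a delay or readiness check may be needed before the tests can pass.

```shell
# Sketch only, not part of the repository. COMPOSE is an assumed override
# so that Compose v2 installations can set COMPOSE="docker compose".
COMPOSE=${COMPOSE:-docker-compose}

run_tests_with_wikibase() {
  # Start the Wikibase/MySQL containers in the background.
  $COMPOSE up -d || return 1

  # Run the tests and remember their exit status.
  status=0
  clojure -M:test:kaocha || status=$?

  # Stop and remove the containers, then report the test result.
  $COMPOSE down
  return "$status"
}
```

Calling `run_tests_with_wikibase` returns the exit status of the test run, so it can be used in scripts or CI steps that need to fail when a test fails.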

License

Copyright 2022 Gregor Middell.

This project is licensed under the GNU General Public License v3.0.
