Source strings and zh-CN translation resources for Telegram
Updated Dec 13, 2020 - Python
Collect tweets (a tweet corpus) using the Twitter API. Collection can be based on hashtags, keywords, or geographical location.
BERT models with tokenization for Japanese texts.
💬 Cross-platform application for the creation of language resources from ELAN linguistic analysis files, or from scratch.
Build scripts for the UniSegments collection of morphologically segmented lexicons for many languages
A web application intended to be the community-driven go-to site for finding Chinese resources and learning Mandarin.
Scripts for compiling the Universal Derivations collections of harmonised word-formation resources for multiple languages.
Script for simplifying the process of translating MineOS Language (.lang) files
Selected data-processing scripts, including a language-agnostic multilingual Wiktionary parser
This repo contains PDF, text, and manually edited files from the ka Dienshonhia dictionary digitalisation work
Simple parser for the Dict.cc dictionary
Create a flashcard dataset from a .txt file. Fields: word, meaning, etymology, examples, class.
Takes input text and extracts the sentences written in a given language, using that language's lexicon.
Estonian Grammatical Error Correction (GEC) test and development corpus that contains L2 learner texts error-annotated in the M2 format.