Label existing new Bürokratt corpus #70

Open
1 task
Tracked by #49
kunnark opened this issue Oct 31, 2022 · 4 comments

@kunnark
Contributor

kunnark commented Oct 31, 2022

AS A Data Scientist
I WANT TO use a labelled new dataset
IN ORDER TO start training new BERT models on that.

Acceptance Criteria:

  • Label the existing corpora using IOB notation and unify them into a single corpus.
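
As an illustration of the criterion above, here is a minimal Python sketch of IOB labelling and corpus unification. The helper names (`spans_to_iob`, `is_valid_iob`, `unify`), the token-index span format, and the sample sentence are all assumptions made for this example; they are not taken from the Bürokratt tooling.

```python
from typing import List, Tuple

# Hypothetical formats used only in this sketch.
Span = Tuple[int, int, str]        # (start_token, end_token_exclusive, class)
Sentence = List[Tuple[str, str]]   # list of (token, IOB tag)


def spans_to_iob(tokens: List[str], spans: List[Span]) -> Sentence:
    """Tag the first token of each span B-<class>, the rest I-<class>, others O."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return list(zip(tokens, tags))


def is_valid_iob(sentence: Sentence) -> bool:
    """An I-X tag must directly follow a B-X or I-X tag of the same class."""
    prev = "O"
    for _, tag in sentence:
        if tag.startswith("I-") and prev[2:] != tag[2:]:
            return False
        prev = tag
    return True


def unify(*corpora: List[Sentence]) -> List[Sentence]:
    """Concatenate several corpora, rejecting sentences with broken IOB sequences."""
    merged: List[Sentence] = []
    for corpus in corpora:
        for sentence in corpus:
            if not is_valid_iob(sentence):
                raise ValueError(f"invalid IOB sequence: {sentence}")
            merged.append(sentence)
    return merged


# Made-up sample sentence, already tokenized.
tokens = ["Mari", "Maasikas", "works", "at", "Swedbank", "."]
example = spans_to_iob(tokens, [(0, 2, "PER"), (4, 5, "ORG")])
single_corpus = unify([example], [example])
```

Validating each sentence during the merge is one possible design choice: it surfaces inconsistencies between source corpora before any model training starts.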

Additional:
Labeling classes:

PER - person names
GPE - geopolitical entities
LOC - geographical locations
ORG - organizations
PROD - products, things, works of art
EVENT - events
DATE - dates
TIME - times
TITLE - titles and professions
MONEY - monetary expressions
PERCENT - percentages
DOC_ORG - organisation document ID
CARD - bank or similar card number
IBAN - IBAN account number
DOC_PER - personal document number
IDCODE - personal ID code
EMAIL - email address
TEL - phone number
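
To show how the classes above combine with IOB notation, the following Python sketch expands them into a full tag set with integer mappings, as a token-classification model typically requires. The names `CLASSES`, `build_labels`, `label2id`, and `id2label` are assumptions for this example, and the ordering of the tags is arbitrary.

```python
from typing import Dict, List

# The 18 labelling classes listed in this issue.
CLASSES: List[str] = [
    "PER", "GPE", "LOC", "ORG", "PROD", "EVENT", "DATE", "TIME", "TITLE",
    "MONEY", "PERCENT", "DOC_ORG", "CARD", "IBAN", "DOC_PER", "IDCODE",
    "EMAIL", "TEL",
]


def build_labels(classes: List[str]) -> List[str]:
    """Return the IOB tag set: O, then B-<class> and I-<class> for every class."""
    labels = ["O"]
    for name in classes:
        labels += [f"B-{name}", f"I-{name}"]
    return labels


LABELS = build_labels(CLASSES)

# Integer mappings in both directions (assumed layout, not the project's).
label2id: Dict[str, int] = {label: i for i, label in enumerate(LABELS)}
id2label: Dict[int, str] = {i: label for label, i in label2id.items()}
```

With 18 classes this yields 37 tags: one `O` tag plus a `B-` and an `I-` tag per class.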

Task covered by functionalities to add additional corpora and functionality to prelabel new corpora #41

@kunnark kunnark self-assigned this Oct 31, 2022
@kunnark kunnark moved this to ✅ Done in Data Anonymizer Oct 31, 2022
@kunnark kunnark mentioned this issue Oct 31, 2022
9 tasks
@turnerrainer
Contributor

@alimuhammadahmer @kunnark

  1. This issue is marked as "Done" but I can not see any code commits;
  2. ACs must contain all REST endpoints used to get the result;
  3. All technical ACs are missing.

@kunnark kunnark moved this from ✅ Done to 📋 Backlog in Data Anonymizer Nov 11, 2022
@kunnark
Contributor Author

kunnark commented Nov 11, 2022

@turnerrainer

  > 1. This issue is marked as "Done" but I can not see any code commits;
  > 2. ACs must contain all REST endpoints used to get the result;
  > 3. All technical ACs are missing.

  1. Status changed.
  2. No REST endpoints are in use; this is DS work done to get the corpus for the training.
  3. The technical task for the labeler was to label a dataset.

@turnerrainer turnerrainer assigned vmugra and unassigned kunnark Jan 18, 2023
@turnerrainer
Contributor

@vmugra please verify whether the AC of this issue is met.

@turnerrainer turnerrainer moved this from 📋 Backlog to 👀 In review in Data Anonymizer Jan 20, 2023
@vmugra vmugra moved this from 👀 In review to ✅ Done in Data Anonymizer Jan 23, 2023
@vmugra

vmugra commented Jan 23, 2023

Task covered by functionalities to add additional corpora and functionality to prelabel new corpora #41

Projects
Status: ✅ Done