diff --git a/docs/entityrecognition.md b/docs/entityrecognition.md
new file mode 100644
index 0000000..bd7c6c6
--- /dev/null
+++ b/docs/entityrecognition.md
@@ -0,0 +1,46 @@
+# Entity Recognition
+Entity recognition is performed with the Danish and English pre-trained models published by spaCy.
+
+## Model Links
+- Danish model: [https://spacy.io/models/da#da_core_news_lg](https://spacy.io/models/da#da_core_news_lg)
+- English model: [https://spacy.io/models/en#en_core_web_lg](https://spacy.io/models/en#en_core_web_lg)
+
+## Custom Danish Model
+The custom Danish model was trained on top of the Danish pre-trained spaCy model to improve its accuracy and to let it recognize literals. See [PyPI Repository](https://github.com/Knox-AAU/PreProcessingLayer_EntityRecognitionAndLinking/blob/main/docs/pypi.md) for more information on where to find the custom model.
+
+## Loading a spaCy Model
+```python
+import en_core_web_lg
+import da_core_news_lg
+
+nlp_en = en_core_web_lg.load()
+nlp_da = da_core_news_lg.load()
+```
+
+> Full code available [here](https://github.com/Knox-AAU/PreProcessingLayer_EntityRecognitionAndLinking/blob/5fcd59bac0fbd91b2543d7d78a893f16da49f25f/components/GetSpacyData.py#L17#L18).
+
+## Performing Entity Recognition on Input
+Entity recognition is performed with either the `nlp_en` or the `nlp_da` pipeline defined in [Loading a spaCy Model](https://github.com/Knox-AAU/PreProcessingLayer_EntityRecognitionAndLinking/blob/main/docs/entityrecognition.md#loading-a-spacy-model), depending on the detected language of the input.
+
+```python
+def GetTokens(text: str):
+    result = DetectLang(text)
+    if result == "da":
+        return nlp_da(text)
+    elif result == "en":
+        return nlp_en(text)
+    else:
+        raise UndetectedLanguageException()
+```
+
+> Full code available [here](https://github.com/Knox-AAU/PreProcessingLayer_EntityRecognitionAndLinking/blob/5fcd59bac0fbd91b2543d7d78a893f16da49f25f/components/GetSpacyData.py#L31#L38).
+
+The function returns a spaCy [Doc](https://spacy.io/api/doc), which holds the recognized entities along with information such as each entity's start and end index and the sentence it belongs to.
+
+-----------
+
+  Up next:
+
+  Entity Linker
+
+
\ No newline at end of file
diff --git a/docs/our-part-of-the-pipeline.md b/docs/our-part-of-the-pipeline.md
index fcf5e94..3ff3634 100644
--- a/docs/our-part-of-the-pipeline.md
+++ b/docs/our-part-of-the-pipeline.md
@@ -12,7 +12,7 @@ Our part of the pipeline is concerned with Entity Recognition and Entity Linking
 See the [Getting started](https://github.com/Knox-AAU/PreProcessingLayer_EntityRecognitionAndLinking/blob/main/docs/gettingstarted.md) guide.
 
 ## The input that the solution takes
-See the [input](https://github.com/Knox-AAU/PreProcessingLayer_EntityRecognitionAndLinking/blob/main/docs/our-part-of-the-pipeline/pipeline-input.md) explanation
+See the [input](https://github.com/Knox-AAU/PreProcessingLayer_EntityRecognitionAndLinking/blob/main/docs/our-part-of-the-pipeline/pipeline-input.md) explanation.
 
 ## Entity Recognition
 Check out the [Entity Recognition documentation](https://github.com/Knox-AAU/PreProcessingLayer_EntityRecognitionAndLinking/blob/main/docs/entityrecognition.md)
@@ -21,8 +21,13 @@ Check out the [Entity Recognition documentation](https://github.com/Knox-AAU/Pre
 Check out the [Entity Linker documentation](https://github.com/Knox-AAU/PreProcessingLayer_EntityRecognitionAndLinking/blob/main/docs/entitylinker.md)
 
 ## The output it produces
-See the [output](https://github.com/Knox-AAU/PreProcessingLayer_EntityRecognitionAndLinking/blob/main/docs/our-part-of-the-pipeline/pipeline-output.md) explanation
+See the [output](https://github.com/Knox-AAU/PreProcessingLayer_EntityRecognitionAndLinking/blob/main/docs/our-part-of-the-pipeline/pipeline-output.md) explanation.
 
+## Deployment
+See the [Docker Watchtower](https://github.com/Knox-AAU/PreProcessingLayer_EntityRecognitionAndLinking/blob/main/docs/docker-watchtower.md) guide.
+
+## Writing Tests
+See the [Testing](https://github.com/Knox-AAU/PreProcessingLayer_EntityRecognitionAndLinking/blob/main/docs/testing.md) guide.
 
 ## Other components
 - The [DirectoryWatcher](https://github.com/Knox-AAU/PreProcessingLayer_EntityRecognitionAndLinking/blob/main/docs/DirectoryWatcher.md)
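
The `Doc` that `GetTokens` returns (see `docs/entityrecognition.md` in this diff) exposes its entities through `doc.ents`. The following is a minimal sketch, not the repository's code, of how the fields mentioned there might be read. It assumes character offsets are the indices of interest (`ent.start` and `ent.end` give token indices instead). To run without downloading the pre-trained models, it swaps in a blank English pipeline with a rule-based `entity_ruler`; the pattern and the helper name `describe_entities` are illustrative only.

```python
import spacy

# Assumption: a blank pipeline plus an entity ruler stands in for the
# pre-trained models (da_core_news_lg / en_core_web_lg) used in the repo.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")  # sets sentence boundaries so `ent.sent` works
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([{"label": "ORG", "pattern": "Aalborg University"}])


def describe_entities(doc):
    """Collect text, label, character offsets and sentence for each entity."""
    return [
        {
            "text": ent.text,
            "label": ent.label_,
            "start": ent.start_char,    # character offset where the entity begins
            "end": ent.end_char,        # character offset just past the entity
            "sentence": ent.sent.text,  # sentence the entity belongs to
        }
        for ent in doc.ents
    ]


doc = nlp("Aalborg University is in Denmark. It was founded in 1974.")
entities = describe_entities(doc)
for entity in entities:
    print(entity)
```

With the real models, the same helper could be applied to the result of `GetTokens(text)`; the labels and entities found would then depend on the model rather than on a hand-written pattern.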