
Commit

Merge branch 'master' of github.com:wikilinks/nel
andychisholm committed Apr 27, 2016
2 parents a366d15 + 913e879 commit ea08134
Showing 3 changed files with 14 additions and 15 deletions.
Empty file added doc.requirements.txt
8 changes: 4 additions & 4 deletions docs/guides/models.md
@@ -18,7 +18,7 @@ First, we must extract redirect mappings to properly resolve inter-article links
sift build-corpus --save redirects WikipediaRedirects latest json
```

-The extracted rediect mappings are now stored under the `redirects` directory.
+The extracted redirect mappings are now stored under the `redirects` directory.

Next, we perform full plain-text extraction over the Wikipedia dump, mapping links to their current Wikipedia target.
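
For intuition, resolving a link target through the extracted redirect mappings is essentially a dictionary lookup. A minimal sketch with made-up titles (not sift's actual output schema):

```python
# Hypothetical redirect mapping: source title -> canonical target title.
redirects = {'Obama': 'Barack Obama', 'POTUS': 'President of the United States'}

def resolve(title):
    """Follow a redirect if one exists, so links accrue to one canonical entity."""
    return redirects.get(title, title)

assert resolve('Obama') == 'Barack Obama'         # redirected
assert resolve('Barack Obama') == 'Barack Obama'  # already canonical
```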

@@ -32,17 +32,17 @@ Our wikipedia corpus is now in the standard format for __sift__ corpora from which we can extract models

We will now extract two simple count driven models from this corpus which are useful in entity linking.

-The first of model, "EntityCounts" is simply the total number of linkings for an entity over the corpus.
+The first model, "EntityCounts" simply collects the total count of inlinks for each entity over the corpus.

-We use this statistic as a proxy for the prior probability of an entity and expect that entities with a higher prior are more likely to linked.
+We use this statistic as a proxy for the prior probability of an entity and expect that entities with higher counts are more likely to be linked.

```
sift build-doc-model --save ecounts EntityCounts processed redis --prefix models:ecounts[wikipedia]:
```
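
As a rough sketch of how a linker might consume this model, assuming counts are stored as plain string values under keys of the form `models:ecounts[wikipedia]:<entity>` (the actual key encoding nel uses may differ):

```python
import redis

r = redis.Redis()

def entity_prior(entity, total_links):
    """Approximate P(entity) as its inlink count over all links in the corpus."""
    count = r.get('models:ecounts[wikipedia]:' + entity)
    return (int(count) if count is not None else 0) / total_links

# e.g. entity_prior('Barack Obama', total_links) -> a small probability
```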

The second model, "EntityNameCounts" collects the number of times a given anchor text string is used to link an entity.

-This statistic helps us model the conditional probability of an entity given the name used in text.
+This statistic helps us model the conditional probability of an entity given the name used to reference it in text.

```
sift build-doc-model --save necounts EntityNameCounts processed --lowercase redis --prefix models:necounts[wikipedia]:
```
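
To make the conditional-probability reading concrete, here is a minimal sketch. It assumes each name maps to a redis hash of entity-to-count fields under `models:necounts[wikipedia]:<name>`; that layout is an assumption for illustration, not nel's documented schema. Since the model above is built with `--lowercase`, names are lowercased before lookup:

```python
import redis

r = redis.Redis()

def name_posteriors(name):
    """Approximate P(entity | name) from anchor-text link counts."""
    key = 'models:necounts[wikipedia]:' + name.lower()
    counts = {e.decode(): int(c) for e, c in r.hgetall(key).items()}
    total = sum(counts.values())
    return {e: c / total for e, c in counts.items()} if total else {}

# e.g. name_posteriors('obama') might give {'Barack Obama': 0.93, ...}
```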
21 changes: 10 additions & 11 deletions docs/index.md
@@ -4,21 +4,20 @@ __nel__ is a fast, accurate and highly modular framework for linking entities in text

Out of the box, __nel__ provides:

-- named entity recognition (DIY, or plug-in a NER system like Stanford, spaCy or Schwa)
-- in-document coreference clustering
-- candidate generation
-- multiple disambiguation features
+- named entity recognition
+- coreference clustering and candidate generation
+- multiple entity disambiguation feature models
- a supervised learning-to-rank framework for entity disambiguation
- a supervised nil detection system with configurable confidence thresholds
-- nil clustering
-- support for evaluation and error analysis of linking system output
+- basic nil clustering for out-of-KB entities
+- support for evaluating linker performance and running error analysis

-__nel__ is completely modular, it can:
+__nel__ is modular, it can:

-- link entities to any knowledge base you like (not limited to just Wikipedia or Freebase)
-- update, rebuild and redeploy linking models as a knowledge base changes over time
-- retrain recognition and disambiguation models on your own corpus of documents
-- easily adapt a linking pipeline to meet performance and accuracy tradeoffs
+- link entity mentions to any knowledge base you like (not just Wikipedia and Freebase!)
+- update, rebuild and redeploy models as a knowledge base changes over time
+- retrain recognition and disambiguation classifiers on your own corpus of documents
+- adapt linking pipelines to meet performance, precision and recall tradeoffs

__nel__ is flexible, you can run it:

