Integrate link graph into main wiki pipeline #429
Great to see the handling of Wikidata links implemented.
And thanks for making the documentation more up-to-date as well.
doc/guide/caspar.md
Outdated
## Preparing the training data

The LDC2013T19 OntoNotes 5 corpus is needed to produce the training data for
CASPAR. This is licensed by LDC and you need a LDC license to use the corpus:
a -> an
Done
doc/guide/caspar.md
Outdated
## Pre-trained word embeddings

The CASPAR parser uses pre-trained word embeddings which can be download from
download -> downloaded
Done.
doc/guide/install.md
Outdated
with Python 3 you can install a pre-built wheel:

```
sudo -H pip3 install http://www.jbox.dk/sling/sling-2.0.0-py3-none-linux_x86_64.whl
```
and download the pre-trained model:

You can test the installing by trying to import the `sling` package:
installing -> installation
Done
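For reference, the import check that the quoted install.md line describes could be sketched as below. This is only an illustration: the helper name `package_available` is hypothetical and not part of SLING, and the check only confirms that Python can locate the package, not that the pre-trained model is present.

```python
import importlib.util


def package_available(name: str) -> bool:
    """Return True if a top-level package can be located without importing it."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # "sling" is the package name given in install.md; the result depends on
    # whether the wheel has been installed in the current environment.
    print("sling available:", package_available("sling"))
```

Running `python3 -c "import sling"` directly is the simpler equivalent; the helper above just avoids an exception when the package is missing.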
doc/guide/wikiflow.md
Outdated
graph that is stored in the `/w/item/links` property for each item. The link
graph is built over all the Wikipedias being processed. The fan-in,
i.e. the number of links to the item, is also computed and stored in the
`/w/item/popularity` property. Tge popularity count also includes the number of
Tge -> The
Done.
doc/guide/wikiflow.md
Outdated
graph is built over all the Wikipedias being processed. The fan-in,
i.e. the number of links to the item, is also computed and stored in the
`/w/item/popularity` property. Tge popularity count also includes the number of
times the item is a fact taget in other items.
taget -> target
Done.
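To make the popularity count described in the quoted wikiflow.md text concrete, here is a minimal sketch of a fan-in computation that also counts fact targets. It does not use the SLING API; the function name `compute_popularity`, the tuple layouts, and the sample ids are assumptions made for illustration only.

```python
from collections import Counter


def compute_popularity(links, facts):
    """Count how often each item is a link target or a fact target.

    links: iterable of (source_item, target_item) pairs (hypothetical layout).
    facts: iterable of (item, property, target_item) triples (hypothetical layout).
    """
    popularity = Counter()
    # Fan-in from the link graph: one count per incoming link.
    for _source, target in links:
        popularity[target] += 1
    # Additional counts for each time the item is the target of a fact.
    for _item, _prop, target in facts:
        popularity[target] += 1
    return popularity


# Made-up sample data: two links and one fact point at Q2, one link at Q3.
links = [("Q1", "Q2"), ("Q3", "Q2"), ("Q1", "Q3")]
facts = [("Q1", "P31", "Q2")]
popularity = compute_popularity(links, facts)
```

In this toy data `popularity["Q2"]` is 3 (two links plus one fact) and `popularity["Q3"]` is 1.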
sling/nlp/wiki/wikipedia-links.cc
Outdated
Text id = store->FrameId(target);
if (id.empty()) continue;

if (!store->IsFrame(target)) continue;
Looks like this line might be redundant.
Oops. Removed.
sling/nlp/wiki/wikipedia-links.cc
Outdated
if (id.empty()) continue;

if (!store->IsFrame(target)) continue;
accumulator_.Increment(id);
Indentation is off by one.
Fixed.
Forgot to hit approve in my previous update.
LGTM.
Thanks for the review
I have integrated the link graph into the main wiki pipeline. The fan-in values now also include counts from the basic facts in the items.
I have also updated the documentation to use the new Myelin-based parser trainer.