Pipeline components

melisa-qordoba edited this page Sep 23, 2020 · 8 revisions

replaCy pipeline

The replaCy pipeline API is inspired by spaCy pipelines.

What is a pipeline component?

A pipeline component has the signature List[Span] -> List[Span]: each component takes a list of spans and returns a list of spans, which is then passed to the next component. Any function with this signature can be added to the replaCy pipeline, which makes it easy to write and use custom replaCy extensions.
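
As an illustration of that signature, here is a minimal sketch of a custom component. To keep it runnable without spaCy or a model, it uses `FakeSpan`, a hypothetical stand-in for `spacy.tokens.Span` that only mimics the `text`, `start`, and `end` attributes; `drop_short_spans` is likewise an illustrative name, not part of replaCy.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeSpan:
    """Hypothetical stand-in for spacy.tokens.Span (assumption, for this sketch only)."""
    text: str
    start: int
    end: int


def drop_short_spans(spans: List[FakeSpan]) -> List[FakeSpan]:
    """Example component: takes a list of spans and returns a list of spans.

    Keeps only spans whose text is longer than one character.
    """
    return [span for span in spans if len(span.text) > 1]


# The output is again a list of spans, so it can be fed to the next component.
spans = [FakeSpan("a", 0, 1), FakeSpan("make sure", 2, 4)]
kept = drop_short_spans(spans)
kept_texts = [span.text for span in kept]
```

With real spaCy spans, a function of the same shape should be usable as a pipeline component, since it only reads `span.text`.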

Default pipeline components:

By default, the replaCy pipeline consists of sorter, filter, and joiner.

If replaCy is instantiated with a kenlm model (see: ranking), the components behave as follows:

  • sorter - sorts suggestions
  • filter - filters suggestions according to the max_count properties (see: filtering); otherwise it does nothing
  • joiner - joins spans into text

If no kenlm model is passed, sorter and filter do nothing.

Check pipeline components

import en_core_web_sm
from replacy import ReplaceMatcher
from replacy.db import load_json

nlp = en_core_web_sm.load()
replaCy = ReplaceMatcher(nlp, load_json('path to match dict(s)'))
replaCy.pipe_names

Adding custom pipeline components:

Example:

import en_core_web_sm
from replacy import ReplaceMatcher
from replacy.db import load_json
from spacy.util import filter_spans

nlp = en_core_web_sm.load()
replaCy = ReplaceMatcher(nlp, load_json('path to match dict(s)'))
replaCy.add_pipe(filter_spans, name="filter_spans", before="joiner")
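
The effect of `before="joiner"` can be pictured with a simplified model of an ordered pipeline. This is only a sketch under assumptions, not replaCy's actual implementation: `insert_component` and `run_pipeline` are hypothetical helpers, the pipeline is modeled as a plain list of `(name, component)` pairs, and strings stand in for spans.

```python
from typing import Callable, List, Optional, Tuple

Component = Callable[[list], list]
Pipeline = List[Tuple[str, Component]]


def insert_component(
    pipeline: Pipeline, name: str, component: Component, before: Optional[str] = None
) -> Pipeline:
    """Return a new pipeline with the component placed before `before`.

    If `before` is None, the component is appended at the end.
    Raises ValueError if `before` is not a known component name.
    """
    names = [n for n, _ in pipeline]
    index = names.index(before) if before is not None else len(pipeline)
    return pipeline[:index] + [(name, component)] + pipeline[index:]


def run_pipeline(pipeline: Pipeline, spans: list) -> list:
    """Pass the spans through each component in order."""
    for _, component in pipeline:
        spans = component(spans)
    return spans


def identity(spans: list) -> list:
    """Placeholder for the default components (they do nothing in this sketch)."""
    return spans


def drop_empty(spans: list) -> list:
    """Illustrative custom component: removes empty items."""
    return [s for s in spans if s]


default_pipeline = [("sorter", identity), ("filter", identity), ("joiner", identity)]
custom_pipeline = insert_component(default_pipeline, "drop_empty", drop_empty, before="joiner")
names = [n for n, _ in custom_pipeline]
result = run_pipeline(custom_pipeline, ["a", "", "b"])
```

In this model the custom component runs after sorter and filter but before joiner, which is the ordering the `add_pipe` call above asks for.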

Check the list of custom replaCy extensions.
