update v4 with latest from master and develop #11572

Merged · 38 commits · Oct 3, 2022

Commits
515d5c6
Add dev docs on satellite packages (#11435)
polm Sep 7, 2022
1f23c61
Refactor KB for easier customization (#11268)
rmitsch Sep 8, 2022
2602a30
Fix DVC command example (#11457)
polm Sep 8, 2022
aac9a58
Add docs for the `spacy.models_and_pipes_with_nvtx_range.v1` callback…
shadeMe Sep 9, 2022
0c72c6b
Auto-format code with black (#11468)
github-actions[bot] Sep 9, 2022
8a86a35
Remove has_letters in config template (#11465)
adrianeboyd Sep 9, 2022
6b83fee
Assets message (#11458)
kadarakos Sep 9, 2022
0ec9a69
Fix config validation failures caused by NVTX pipeline wrappers (#11460)
shadeMe Sep 12, 2022
cc10a27
Prevent tok2vec to broadcast to listeners when predicting (#11385)
svlandeg Sep 12, 2022
6be6913
Update cupy extras (#11279)
adrianeboyd Sep 13, 2022
3f0c3ad
Correct alignment example and documentation (#11491)
richardpaulhudson Sep 14, 2022
7c98245
Add levenshtein from polyleven (#11418)
adrianeboyd Sep 14, 2022
ca1ad67
disable mypy run for Python 3.10 (#11508)
svlandeg Sep 15, 2022
0509f90
add dot (#11500)
svlandeg Sep 15, 2022
d5c8498
disable mypy run for Python 3.10 (#11508) (#11511)
svlandeg Sep 15, 2022
df0b815
more explicit Example constructor example (#11489)
svlandeg Sep 16, 2022
279358b
Auto-format code with black (#11513)
github-actions[bot] Sep 16, 2022
af9b01e
Add dependency check to project step runs (#11226)
rmitsch Sep 16, 2022
f40d2fa
fix: remove duplicate v3.2 (#11530)
bdura Sep 23, 2022
6f692a0
Remove side effects from Doc.__init__() (#11506)
richardpaulhudson Sep 26, 2022
936a5f0
Fix English pipeline names in 3.4 release notes (#11542)
polm Sep 27, 2022
877671e
Preserve missing entity annotation in augmenters (#11540)
adrianeboyd Sep 27, 2022
a44b7d4
Add experimental coref docs (#11291)
polm Sep 27, 2022
3e8bc12
add punctuation to grc (#11426)
jmyerston Sep 27, 2022
9557b0f
Add spacy-partial-tagger to spaCy Universe (#11538)
yasufumy Sep 27, 2022
aea1671
Simplify and clarify enable/disable behavior of spacy.load() (#11459)
rmitsch Sep 27, 2022
e794d4a
`debug data` Spancat Table Improvements (#11504)
pmbaumgartner Sep 28, 2022
6d7630c
Allow overriding spacy_version in spacy package meta (#11552)
adrianeboyd Sep 29, 2022
ba63f57
Update docs to reflect Doc input to Language (#11555)
polm Sep 29, 2022
bcda8bc
update mypy to latest version (#11546)
svlandeg Sep 29, 2022
ff9002b
Add Zshot Spacy plugin (#11557)
GabrielePicco Sep 29, 2022
9c8cdb4
Merge branch 'master_copy' into develop_copy
svlandeg Sep 30, 2022
bf6e43a
Merge pull request #11563 from svlandeg/develop_copy
svlandeg Oct 3, 2022
087cc74
Remove mention of 1.7 from issue template (#11570)
polm Oct 3, 2022
70e21df
PR to test importlib-metadata (#11569)
svlandeg Oct 3, 2022
83425d4
Merge branch 'copy_master' into copy_develop
svlandeg Oct 3, 2022
e3027c6
Merge branch 'copy_develop' into copy_v4
svlandeg Oct 3, 2022
d4922f2
fix test for EL activations with refactored KB
svlandeg Oct 3, 2022
2 changes: 1 addition & 1 deletion .github/ISSUE_TEMPLATE/01_bugs.md
@@ -10,7 +10,7 @@ about: Use this template if you came across a bug or unexpected behaviour differ
 <!-- Include a code example or the steps that led to the problem. Please try to be as specific as possible. -->

 ## Your Environment
-<!-- Include details of your environment. If you're using spaCy 1.7+, you can also type `python -m spacy info --markdown` and copy-paste the result here.-->
+<!-- Include details of your environment. You can also type `python -m spacy info --markdown` and copy-paste the result here.-->
 * Operating System:
 * Python Version Used:
 * spaCy Version Used:
2 changes: 1 addition & 1 deletion .github/azure-steps.yml
@@ -27,7 +27,7 @@ steps:

   - script: python -m mypy spacy
     displayName: 'Run mypy'
-    condition: ne(variables['python_version'], '3.10')
+    condition: ne(variables['python_version'], '3.6')

   - task: DeleteFiles@1
     inputs:
1 change: 1 addition & 0 deletions .gitignore
@@ -24,6 +24,7 @@ quickstart-training-generator.js
 cythonize.json
 spacy/*.html
 *.cpp
+*.c
 *.so

 # Vim / VSCode / editors
2 changes: 1 addition & 1 deletion azure-pipelines.yml
@@ -31,7 +31,7 @@ jobs:
       inputs:
         versionSpec: "3.7"
     - script: |
-        pip install flake8==3.9.2
+        pip install flake8==5.0.4
        python -m flake8 spacy --count --select=E901,E999,F821,F822,F823,W605 --show-source --statistics
       displayName: "flake8"
82 changes: 82 additions & 0 deletions extra/DEVELOPER_DOCS/Satellite Packages.md
@@ -0,0 +1,82 @@
# spaCy Satellite Packages

This is a list of all the active repos relevant to spaCy besides the main one, with short descriptions, history, and current status. Archived repos will not be covered.

## Always Included in spaCy

These packages are always pulled in when you install spaCy. Most of them are direct dependencies, but some are transitive dependencies through other packages.

- [spacy-legacy](https://github.com/explosion/spacy-legacy): When an architecture in spaCy changes enough to get a new version, the old version is frozen and moved to spacy-legacy. This allows us to keep the core library slim while also preserving backwards compatibility.
- [thinc](https://github.com/explosion/thinc): Thinc is the machine learning library that powers trainable components in spaCy. It wraps backends like NumPy, PyTorch, and TensorFlow to provide a functional interface for specifying architectures.
- [catalogue](https://github.com/explosion/catalogue): Small library for adding function registries, like those used for model architectures in spaCy.
- [confection](https://github.com/explosion/confection): This library contains the functionality for config parsing that was formerly contained directly in Thinc.
- [spacy-loggers](https://github.com/explosion/spacy-loggers): Contains loggers beyond the default logger available in spaCy's core code base. This includes loggers integrated with third-party services, which may differ in release cadence from spaCy itself.
- [wasabi](https://github.com/explosion/wasabi): A command line formatting library, used for terminal output in spaCy.
- [srsly](https://github.com/explosion/srsly): A wrapper that vendors several serialization libraries for spaCy. Includes parsers for JSON, JSONL, MessagePack, (extended) Pickle, and YAML.
- [preshed](https://github.com/explosion/preshed): A Cython library for low-level data structures like hash maps, used for memory-efficient data storage.
- [cython-blis](https://github.com/explosion/cython-blis): Fast matrix multiplication using BLIS without depending on system libraries. Required by Thinc, rather than spaCy directly.
- [murmurhash](https://github.com/explosion/murmurhash): A wrapper library for a C++ murmurhash implementation, used for string IDs in spaCy and preshed.
- [cymem](https://github.com/explosion/cymem): A small library for RAII-style memory management in Cython.

## Optional Extensions for spaCy

These are repos that can be used by spaCy but aren't part of a default installation. Many of these are wrappers to integrate various kinds of third-party libraries.

- [spacy-transformers](https://github.com/explosion/spacy-transformers): A wrapper for the [HuggingFace Transformers](https://huggingface.co/docs/transformers/index) library; it handles the extensive conversion necessary to coordinate spaCy's `Doc` representation, training pipeline, and the Transformer embeddings. When released, this was known as `spacy-pytorch-transformers`, but it changed to the current name when HuggingFace renamed their library as well.
- [spacy-huggingface-hub](https://github.com/explosion/spacy-huggingface-hub): This package has a CLI script for uploading a packaged spaCy pipeline (created with `spacy package`) to the [Hugging Face Hub](https://huggingface.co/models).
- [spacy-alignments](https://github.com/explosion/spacy-alignments): A wrapper for the tokenizations library (mentioned below) with a modified build system to simplify cross-platform wheel creation. Used in spacy-transformers for aligning spaCy and HuggingFace tokenizations.
- [spacy-experimental](https://github.com/explosion/spacy-experimental): Experimental components that are not quite ready for inclusion in the main spaCy library. Usually there are unresolved questions around their APIs, so the experimental library allows us to expose them to the community for feedback before fully integrating them.
- [spacy-lookups-data](https://github.com/explosion/spacy-lookups-data): A repository of linguistic data, such as lemmas, that takes up a lot of disk space. Originally created to reduce the size of the spaCy core library. This is mainly useful if you want the data included but aren't using a pretrained pipeline; for the affected languages, the relevant data is included in pretrained pipelines directly.
- [coreferee](https://github.com/explosion/coreferee): Coreference resolution for English, French, German and Polish, optimised for limited training data and easily extensible for further languages. Used as a spaCy pipeline component.
- [spacy-stanza](https://github.com/explosion/spacy-stanza): This is a wrapper that allows the use of Stanford's Stanza library in spaCy.
- [spacy-streamlit](https://github.com/explosion/spacy-streamlit): A wrapper for the Streamlit dashboard building library to help with integrating [displaCy](https://spacy.io/api/top-level/#displacy).
- [spacymoji](https://github.com/explosion/spacymoji): A library to add extra support for emoji to spaCy, such as including character names.
- [thinc-apple-ops](https://github.com/explosion/thinc-apple-ops): A special backend for OSX that uses Apple's native libraries for improved performance.
- [os-signpost](https://github.com/explosion/os-signpost): A Python package that allows you to use the `OSSignposter` API in OSX for performance analysis.
- [spacy-ray](https://github.com/explosion/spacy-ray): A wrapper to integrate spaCy with Ray, a distributed training framework. Currently a work in progress.

## Prodigy

[Prodigy](https://prodi.gy) is Explosion's easy-to-use and highly customizable tool for annotating data. Prodigy itself requires a license, but the repos below contain documentation, examples, and editor or notebook integrations.

- [prodigy-recipes](https://github.com/explosion/prodigy-recipes): Sample recipes for Prodigy, along with notebooks and other examples of usage.
- [vscode-prodigy](https://github.com/explosion/vscode-prodigy): A VS Code extension that lets you run Prodigy inside VS Code.
- [jupyterlab-prodigy](https://github.com/explosion/jupyterlab-prodigy): An extension for JupyterLab that lets you run Prodigy inside JupyterLab.

## Independent Tools or Projects

These are tools that may be related to or use spaCy, but are functional, independent projects in their own right as well.

- [floret](https://github.com/explosion/floret): A modification of fastText to use Bloom Embeddings. Can be used to add vectors with subword features to spaCy, and also works independently in the same manner as fastText.
- [sense2vec](https://github.com/explosion/sense2vec): A library to make embeddings of noun phrases or words coupled with their part of speech. This library uses spaCy.
- [spacy-vectors-builder](https://github.com/explosion/spacy-vectors-builder): This is a spaCy project that builds vectors using floret and a lot of input text. It handles downloading the input data as well as the actual building of vectors.
- [holmes-extractor](https://github.com/explosion/holmes-extractor): Information extraction from English and German texts based on predicate logic. Uses spaCy.
- [healthsea](https://github.com/explosion/healthsea): Healthsea is a project to extract information from comments about health supplements. Structurally, it's a self-contained, large spaCy project.
- [spacy-pkuseg](https://github.com/explosion/spacy-pkuseg): A fork of the pkuseg Chinese tokenizer. Used for Chinese support in spaCy, but also works independently.
- [ml-datasets](https://github.com/explosion/ml-datasets): This repo includes loaders for several standard machine learning datasets, like MNIST or WikiNER, and has historically been used in spaCy example code and documentation.

## Documentation and Informational Repos

These repos are used to support the spaCy docs or otherwise present information about spaCy or other Explosion projects.

- [projects](https://github.com/explosion/projects): The projects repo is used to show detailed examples of spaCy usage. Individual projects can be checked out using the spaCy command line tool, rather than checking out the projects repo directly.
- [spacy-course](https://github.com/explosion/spacy-course): Home to the interactive spaCy course for learning about how to use the library and some basic NLP principles.
- [spacy-io-binder](https://github.com/explosion/spacy-io-binder): Home to the notebooks used for interactive examples in the documentation.

## Organizational / Meta

These repos are used for organizing data around spaCy, but are not something an end user would need to install as part of using the library.

- [spacy-models](https://github.com/explosion/spacy-models): This repo contains metadata (but not training data) for all the spaCy models. This includes information about where their training data came from, version compatibility, and performance information. It also includes tests for the model packages, and the built models are hosted as releases of this repo.
- [wheelwright](https://github.com/explosion/wheelwright): A tool for automating our PyPI builds and releases.
- [ec2buildwheel](https://github.com/explosion/ec2buildwheel): A small project that allows you to build Python packages in the manner of cibuildwheel, but on any EC2 image. Used by wheelwright.

## Other

Repos that don't fit in any of the above categories.

- [blis](https://github.com/explosion/blis): A fork of the official BLIS library. The main branch is not updated, but work continues in various branches. This is used for cython-blis.
- [tokenizations](https://github.com/explosion/tokenizations): A library originally by Yohei Tamura to align strings with tolerance to some variations in features like case and diacritics, used for aligning tokens and wordpieces. Adopted and maintained by Explosion, but usually spacy-alignments is used instead.
- [conll-2012](https://github.com/explosion/conll-2012): A repo to hold some slightly cleaned up versions of the official scripts for the CoNLL 2012 shared task involving coreference resolution. Used in the coref project.
- [fastapi-explosion-extras](https://github.com/explosion/fastapi-explosion-extras): Some small tweaks to FastAPI used at Explosion.

31 changes: 31 additions & 0 deletions licenses/3rd_party_licenses.txt
@@ -127,3 +127,34 @@ distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.


polyleven
---------

* Files: spacy/matcher/polyleven.c

MIT License

Copyright (c) 2021 Fujimoto Seiji <fujimoto@ceptord.net>
Copyright (c) 2021 Max Bachmann <kontakt@maxbachmann.de>
Copyright (c) 2022 Nick Mazuk
Copyright (c) 2022 Michael Weiss <code@mweiss.ch>

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
3 changes: 2 additions & 1 deletion requirements.txt
@@ -30,9 +30,10 @@ pytest-timeout>=1.3.0,<2.0.0
 mock>=2.0.0,<3.0.0
 flake8>=3.8.0,<3.10.0
 hypothesis>=3.27.0,<7.0.0
-mypy>=0.910,<0.970; platform_machine!='aarch64'
+mypy>=0.980,<0.990; platform_machine != "aarch64" and python_version >= "3.7"
 types-dataclasses>=0.1.3; python_version < "3.7"
 types-mock>=0.1.1
 types-setuptools>=57.0.0
 types-requests
+types-setuptools>=57.0.0
 black>=22.0,<23.0
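The tightened mypy pin relies on PEP 508 environment markers to skip platforms and Python versions without compatible wheels. As a side note, a marker string like this can be evaluated directly with the `packaging` library — a minimal sketch, assuming `packaging` is installed (pip evaluates markers itself; spaCy does not do this at runtime):

    from packaging.markers import Marker

    # The same marker expression used in the new mypy requirement line
    marker = Marker('platform_machine != "aarch64" and python_version >= "3.7"')
    print(marker.evaluate())  # True on e.g. x86_64 with Python 3.10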
36 changes: 20 additions & 16 deletions setup.cfg
@@ -68,37 +68,41 @@ transformers =
 ray =
     spacy_ray>=0.1.0,<1.0.0
 cuda =
-    cupy>=5.0.0b4,<11.0.0
+    cupy>=5.0.0b4,<12.0.0
 cuda80 =
-    cupy-cuda80>=5.0.0b4,<11.0.0
+    cupy-cuda80>=5.0.0b4,<12.0.0
 cuda90 =
-    cupy-cuda90>=5.0.0b4,<11.0.0
+    cupy-cuda90>=5.0.0b4,<12.0.0
 cuda91 =
-    cupy-cuda91>=5.0.0b4,<11.0.0
+    cupy-cuda91>=5.0.0b4,<12.0.0
 cuda92 =
-    cupy-cuda92>=5.0.0b4,<11.0.0
+    cupy-cuda92>=5.0.0b4,<12.0.0
 cuda100 =
-    cupy-cuda100>=5.0.0b4,<11.0.0
+    cupy-cuda100>=5.0.0b4,<12.0.0
 cuda101 =
-    cupy-cuda101>=5.0.0b4,<11.0.0
+    cupy-cuda101>=5.0.0b4,<12.0.0
 cuda102 =
-    cupy-cuda102>=5.0.0b4,<11.0.0
+    cupy-cuda102>=5.0.0b4,<12.0.0
 cuda110 =
-    cupy-cuda110>=5.0.0b4,<11.0.0
+    cupy-cuda110>=5.0.0b4,<12.0.0
 cuda111 =
-    cupy-cuda111>=5.0.0b4,<11.0.0
+    cupy-cuda111>=5.0.0b4,<12.0.0
 cuda112 =
-    cupy-cuda112>=5.0.0b4,<11.0.0
+    cupy-cuda112>=5.0.0b4,<12.0.0
 cuda113 =
-    cupy-cuda113>=5.0.0b4,<11.0.0
+    cupy-cuda113>=5.0.0b4,<12.0.0
 cuda114 =
-    cupy-cuda114>=5.0.0b4,<11.0.0
+    cupy-cuda114>=5.0.0b4,<12.0.0
 cuda115 =
-    cupy-cuda115>=5.0.0b4,<11.0.0
+    cupy-cuda115>=5.0.0b4,<12.0.0
 cuda116 =
-    cupy-cuda116>=5.0.0b4,<11.0.0
+    cupy-cuda116>=5.0.0b4,<12.0.0
 cuda117 =
-    cupy-cuda117>=5.0.0b4,<11.0.0
+    cupy-cuda117>=5.0.0b4,<12.0.0
+cuda11x =
+    cupy-cuda11x>=11.0.0,<12.0.0
+cuda-autodetect =
+    cupy-wheel>=11.0.0,<12.0.0
 apple =
     thinc-apple-ops>=0.1.0.dev0,<1.0.0
 # Language tokenizers with external dependencies
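The two new extras at the end are the notable part: `cuda11x` installs the unified `cupy-cuda11x` wheel covering all CUDA 11.x toolkits, and `cuda-autodetect` installs `cupy-wheel`, which tries to detect the local CUDA installation and pick a matching CuPy build at install time — e.g. `pip install "spacy[cuda-autodetect]"` instead of choosing a version-specific extra by hand. Whether autodetection succeeds depends on the local CUDA setup.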
15 changes: 14 additions & 1 deletion setup.py
@@ -30,7 +30,9 @@
     "spacy.lexeme",
     "spacy.vocab",
     "spacy.attrs",
-    "spacy.kb",
+    "spacy.kb.candidate",
+    "spacy.kb.kb",
+    "spacy.kb.kb_in_memory",
     "spacy.ml.parser_model",
     "spacy.morphology",
     "spacy.pipeline.dep_parser",
@@ -205,6 +207,17 @@ def setup_package():
         get_python_inc(plat_specific=True),
     ]
     ext_modules = []
+    ext_modules.append(
+        Extension(
+            "spacy.matcher.levenshtein",
+            [
+                "spacy/matcher/levenshtein.pyx",
+                "spacy/matcher/polyleven.c",
+            ],
+            language="c",
+            include_dirs=include_dirs,
+        )
+    )
     for name in MOD_NAMES:
         mod_path = name.replace(".", "/") + ".pyx"
         ext = Extension(
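The first hunk reflects the KB refactor (#11268) splitting `spacy.kb` into separate modules; the second compiles the vendored polyleven C source into `spacy.matcher.levenshtein`. A minimal usage sketch of the new module, assuming it exposes the plain two-string `levenshtein` function added in #11418:

    # Hypothetical usage; assumes spaCy was built with the new extension
    from spacy.matcher.levenshtein import levenshtein

    print(levenshtein("kitten", "sitting"))  # 3: two substitutions plus one insertion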
6 changes: 3 additions & 3 deletions spacy/__init__.py
@@ -31,9 +31,9 @@ def load(
     name: Union[str, Path],
     *,
     vocab: Union[Vocab, bool] = True,
-    disable: Union[str, Iterable[str]] = util.SimpleFrozenList(),
-    enable: Union[str, Iterable[str]] = util.SimpleFrozenList(),
-    exclude: Union[str, Iterable[str]] = util.SimpleFrozenList(),
+    disable: Union[str, Iterable[str]] = util._DEFAULT_EMPTY_PIPES,
+    enable: Union[str, Iterable[str]] = util._DEFAULT_EMPTY_PIPES,
+    exclude: Union[str, Iterable[str]] = util._DEFAULT_EMPTY_PIPES,
     config: Union[Dict[str, Any], Config] = util.SimpleFrozenDict(),
 ) -> Language:
     """Load a spaCy model from an installed package or a local path.
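`_DEFAULT_EMPTY_PIPES` is a shared sentinel default that lets `spacy.load()` tell "argument omitted" apart from "empty list passed", so it can warn about conflicting `enable`/`disable` combinations (PR #11459). A minimal sketch of the user-facing behavior, assuming the `en_core_web_sm` pipeline is installed:

    import spacy

    # Run only the listed components; everything else stays loaded but disabled.
    nlp = spacy.load("en_core_web_sm", enable=["tok2vec", "ner"])
    doc = nlp("Apple is looking at buying U.K. startup for $1 billion")
    print([(ent.text, ent.label_) for ent in doc.ents])  # NER output is unaffected
    print(nlp.disabled)  # the components that were not enabled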
9 changes: 9 additions & 0 deletions spacy/cli/_util.py
@@ -573,3 +573,12 @@ def setup_gpu(use_gpu: int, silent=None) -> None:
         local_msg.info("Using CPU")
         if gpu_is_available():
             local_msg.info("To switch to GPU 0, use the option: --gpu-id 0")
+
+
+def _format_number(number: Union[int, float], ndigits: int = 2) -> str:
+    """Formats a number (float or int) rounding to `ndigits`, without truncating trailing 0s,
+    as happens with `round(number, ndigits)`"""
+    if isinstance(number, float):
+        return f"{number:.{ndigits}f}"
+    else:
+        return str(number)
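A short sketch of why this helper exists rather than calling `round()` directly — `round()` drops trailing zeros once the result is stringified, which makes table columns ragged:

    print(str(round(0.50, 2)))  # '0.5'  — trailing zero lost
    print(f"{0.50:.2f}")        # '0.50' — what _format_number returns for floats
    print(str(17))              # '17'   — ints pass through unchanged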
29 changes: 24 additions & 5 deletions spacy/cli/debug_data.py
@@ -9,7 +9,7 @@
 import math

 from ._util import app, Arg, Opt, show_validation_error, parse_config_overrides
-from ._util import import_code, debug_cli
+from ._util import import_code, debug_cli, _format_number
 from ..training import Example, remove_bilu_prefix
 from ..training.initialize import get_sourced_components
 from ..schemas import ConfigSchemaTraining
@@ -989,7 +989,8 @@ def _get_kl_divergence(p: Counter, q: Counter) -> float:
 def _format_span_row(span_data: List[Dict], labels: List[str]) -> List[Any]:
     """Compile into one list for easier reporting"""
     d = {
-        label: [label] + list(round(d[label], 2) for d in span_data) for label in labels
+        label: [label] + list(_format_number(d[label]) for d in span_data)
+        for label in labels
     }
     return list(d.values())
@@ -1004,6 +1005,10 @@ def _get_span_characteristics(
         label: _gmean(l)
         for label, l in compiled_gold["spans_length"][spans_key].items()
     }
+    spans_per_type = {
+        label: len(spans)
+        for label, spans in compiled_gold["spans_per_type"][spans_key].items()
+    }
     min_lengths = [min(l) for l in compiled_gold["spans_length"][spans_key].values()]
     max_lengths = [max(l) for l in compiled_gold["spans_length"][spans_key].values()]
@@ -1031,6 +1036,7 @@ def _get_span_characteristics(
     return {
         "sd": span_distinctiveness,
         "bd": sb_distinctiveness,
+        "spans_per_type": spans_per_type,
         "lengths": span_length,
         "min_length": min(min_lengths),
         "max_length": max(max_lengths),
@@ -1045,12 +1051,15 @@ def _get_span_characteristics(
 def _print_span_characteristics(span_characteristics: Dict[str, Any]):
     """Print all span characteristics into a table"""
-    headers = ("Span Type", "Length", "SD", "BD")
+    headers = ("Span Type", "Length", "SD", "BD", "N")
+    # Wasabi has this at 30 by default, but we might have some long labels
+    max_col = max(30, max(len(label) for label in span_characteristics["labels"]))
     # Prepare table data with all span characteristics
     table_data = [
         span_characteristics["lengths"],
         span_characteristics["sd"],
         span_characteristics["bd"],
+        span_characteristics["spans_per_type"],
     ]
     table = _format_span_row(
         span_data=table_data, labels=span_characteristics["labels"]
@@ -1061,8 +1070,18 @@ def _print_span_characteristics(span_characteristics: Dict[str, Any]):
         span_characteristics["avg_sd"],
         span_characteristics["avg_bd"],
     ]
-    footer = ["Wgt. Average"] + [str(round(f, 2)) for f in footer_data]
-    msg.table(table, footer=footer, header=headers, divider=True)
+
+    footer = (
+        ["Wgt. Average"] + ["{:.2f}".format(round(f, 2)) for f in footer_data] + ["-"]
+    )
+    msg.table(
+        table,
+        footer=footer,
+        header=headers,
+        divider=True,
+        aligns=["l"] + ["r"] * (len(footer_data) + 1),
+        max_col=max_col,
+    )


 def _get_spans_length_freq_dist(
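Together these hunks add an `N` column (spans per type) to the `debug data` span table and right-align the numeric columns. A hedged sketch of the resulting `wasabi` call with illustrative values (the keyword arguments mirror the diff above):

    from wasabi import msg

    headers = ("Span Type", "Length", "SD", "BD", "N")
    table = [
        ["ORG", "2.21", "0.54", "0.43", "1230"],
        ["PRODUCT", "1.47", "0.62", "0.51", "340"],
    ]
    footer = ["Wgt. Average", "2.05", "0.56", "0.45", "-"]
    msg.table(table, footer=footer, header=headers, divider=True,
              aligns=["l", "r", "r", "r", "r"], max_col=30)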