Comparing changes

base repository: binh-vu/sm
base: 3.4.0
head repository: binh-vu/sm
compare: master

Commits on May 3, 2022

  1. 001ea47

Commits on May 10, 2022

  1. update dep and readme

    Binh Vu committed May 10, 2022
    2bc1839

Commits on May 14, 2022

  1. [3.4.3] fix bug in evaluating empty model

    Binh Vu committed May 14, 2022
    62b94d6
  2. a314c1f
  3. support zip compressed dataset

    Binh Vu committed May 14, 2022
    0a97ba0

Commits on May 31, 2022

  1. f151a1e

Commits on Jun 23, 2022

  1. remove unuse dependency

    Binh Vu committed Jun 23, 2022
    a8a21aa
  2. bump version

    Binh Vu committed Jun 23, 2022
    9738c02
  3. bcdd411

Commits on Jun 28, 2022

  1. fix get_latest_path, get_incremental_path to allow for flexible delimiter (not only dot before the suffix)

    Binh Vu committed Jun 28, 2022
    21e424b

Commits on Jul 7, 2022

  1. fixes get_incremental_path and get_latest_path

    Binh Vu committed Jul 7, 2022
    6c19084

Commits on Jul 11, 2022

  1. fix bug in get_incremental_path

    Binh Vu committed Jul 11, 2022
    8e4504b

Commits on Jul 12, 2022

  1. set default context for parallel map to fork

    Binh Vu committed Jul 12, 2022
    b0d104e

Commits on Aug 2, 2022

  1. improve timer to save results to csv file

    Binh Vu committed Aug 2, 2022
    5e59e86
  2. Update README.md

    binh-vu authored Aug 2, 2022
    d6fd4a6

Commits on Aug 12, 2022

  1. format timer.py

    Binh Vu committed Aug 12, 2022
    87e1a34

Commits on Sep 27, 2022

  1. update slugify version

    Binh Vu committed Sep 27, 2022
    155a295

Commits on Oct 3, 2022

  1. - cachemethod: allow customize the attribute to store cache object

    - add matrix object
    Binh Vu committed Oct 3, 2022
    22fab92

Commits on Oct 5, 2022

  1. - table: lazy init dataframe

    - add get_instance method
    Binh Vu committed Oct 5, 2022
    c614aaf

Commits on Oct 8, 2022

  1. add CacheMethod.as_json

    Binh Vu committed Oct 8, 2022
    a8b477d

Commits on Oct 10, 2022

  1. add lz4 compression

    Binh Vu committed Oct 10, 2022
    153a051

Commits on Oct 13, 2022

  1. + add batch method

    + add new key function to cache method: cache selected arguments and keyword arguments
    Binh Vu committed Oct 13, 2022
    295429c

Commits on Oct 15, 2022

  1. update dependency

    Binh Vu committed Oct 15, 2022
    14cc5db

Commits on Oct 19, 2022

  1. bump graph-wrapper from 1.4.0 to 1.5.0

    Binh Vu committed Oct 19, 2022
    ddf4794

Commits on Nov 4, 2022

  1. - add context, link models to inputs

    - add __slots__ to table and column models
    - batch function: batching an array won't return a list of tuple of a single list but only a list of list
    - datasets support individual table compression
    Binh Vu committed Nov 4, 2022
    c85da53

Commits on Nov 5, 2022

  1. version 4: deprecate timer, deser, exp_manager to favor new packages and fix minor bugs

    Binh Vu committed Nov 5, 2022
    385b8c1

Commits on Nov 6, 2022

  1. 6226bf3
  2. fix dataset format issue

    Binh Vu committed Nov 6, 2022
    e37735e
  3. 88b43a1
  4. add ray helper methods

    Binh Vu committed Nov 6, 2022
    3b6e1e6
  5. add knowledge graph namespace trait, get_entity_rel_uri is now comply with custom prefix

    Binh Vu committed Nov 6, 2022
    9c6af0e
  6. remove deprecated modules from prelude

    Binh Vu committed Nov 6, 2022
    43ab54d
  7. release new version

    Binh Vu committed Nov 6, 2022
    d8423e5

Commits on Nov 7, 2022

  1. bump serde2 version

    Binh Vu committed Nov 7, 2022
    c23249a

Commits on Nov 8, 2022

  1. fix bug that we cannot pickle EntityId class

    Binh Vu committed Nov 8, 2022
    bff6458

Commits on Nov 9, 2022

  1. update lock

    Binh Vu committed Nov 9, 2022
    8a92115
  2. add missing package timer4

    Binh Vu committed Nov 9, 2022
    f813003
  3. rerun ci

    Binh Vu committed Nov 9, 2022
    6b21065
  4. Update README.md

    binh-vu authored Nov 9, 2022
    42b9c01

Commits on Nov 30, 2022

  1. add fulltable.keep_rows

    Binh Vu committed Nov 30, 2022
    e7e3d1c
  2. bump version

    Binh Vu committed Nov 30, 2022
    cd0dfe4
  3. bump orjson version

    Binh Vu committed Nov 30, 2022
    7fb3f6c

Commits on Dec 7, 2022

  1. fix bug in get_latest_version

    Binh Vu committed Dec 7, 2022
    d4590e1

Commits on Dec 20, 2022

  1. add keep_columns and clone functions

    Binh Vu committed Dec 20, 2022
    c5668e0

Commits on Dec 24, 2022

  1. bump deps

    Binh Vu committed Dec 24, 2022
    80d5051

Commits on Dec 25, 2022

  1. add ray_init helper

    Binh Vu committed Dec 25, 2022
    c42bd2d
  2. update batch function to allow return as tuple when passing a single variable

    Binh Vu committed Dec 25, 2022
    e5aa49f

Commits on Dec 26, 2022

  1. update lockfile

    Binh Vu committed Dec 26, 2022
    8b1c465
  2. fix type annotation

    Binh Vu committed Dec 26, 2022
    b674166
  3. fix type annotation

    Binh Vu committed Dec 26, 2022
    2cda6c2
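
The Nov 4 and Dec 25 commit messages describe a change in how the batch helper shapes its output: batching a single array yields a list of lists rather than a list of one-element tuples, with an option to get tuples back. The following is a minimal sketch of that behavior only. The signature `batch(size, *arrays, return_tuple=False)` is an assumption made for illustration; the actual function in `sm/misc/funcs.py` may differ in name, parameters, and edge-case handling.

```python
from typing import Any, List, Sequence


def batch(size: int, *arrays: Sequence[Any], return_tuple: bool = False) -> List[Any]:
    """Split one or more equal-length sequences into chunks of `size` items.

    Hypothetical sketch based on the commit messages, not the library's code.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if not arrays:
        raise ValueError("at least one sequence is required")
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all sequences must have the same length")

    chunks: List[Any] = []
    for start in range(0, n, size):
        parts = tuple(list(a[start:start + size]) for a in arrays)
        # A single input is unwrapped to a plain list unless tuples are requested.
        if len(arrays) == 1 and not return_tuple:
            chunks.append(parts[0])
        else:
            chunks.append(parts)
    return chunks
```

Under these assumptions, `batch(2, [1, 2, 3])` returns `[[1, 2], [3]]`, while `batch(2, [1, 2, 3], return_tuple=True)` returns `[([1, 2],), ([3],)]`, which keeps the element shape uniform for callers that always unpack tuples.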
Showing with 3,871 additions and 1,370 deletions.
  1. +1 −1 .github/workflows/publish.yml
  2. +2 −1 .gitignore
  3. +57 −0 CHANGELOG.md
  4. +3 −0 README.md
  5. +32 −21 pyproject.toml
  6. +528 −37 sm/dataset.py
  7. +26 −14 sm/evaluation/cpa_cta_metrics.py
  8. +58 −22 sm/evaluation/hierarchy_scoring_fn.py
  9. +60 −0 sm/evaluation/precision_recall_f1.py
  10. +20 −3 sm/evaluation/prelude.py
  11. +37 −41 sm/evaluation/sm_metrics.py
  12. +10 −2 sm/evaluation/transformation.py
  13. +50 −0 sm/evaluation/utils.py
  14. +0 −2 sm/inputs/__init__.py
  15. +35 −1 sm/inputs/column.py
  16. +83 −0 sm/inputs/context.py
  17. +130 −0 sm/inputs/link.py
  18. +14 −0 sm/inputs/prelude.py
  19. +95 −10 sm/inputs/table.py
  20. +0 −9 sm/misc/__init__.py
  21. +88 −0 sm/misc/bijection.py
  22. +11 −4 sm/misc/deser.py
  23. +8 −8 sm/misc/exp_manager.py
  24. +74 −24 sm/misc/fn_cache.py
  25. +320 −50 sm/misc/funcs.py
  26. +0 −1 sm/misc/graph/.gitignore
  27. +0 −3 sm/misc/graph/__init__.py
  28. +0 −64 sm/misc/graph/deser.py
  29. +0 −197 sm/misc/graph/edmonds.py
  30. +0 −181 sm/misc/graph/graph.html
  31. +0 −306 sm/misc/graph/query.py
  32. +0 −87 sm/misc/graph/test.py
  33. +0 −98 sm/misc/graph/test2.py
  34. +0 −68 sm/misc/graph/viz.py
  35. +131 −0 sm/misc/matrix.py
  36. +16 −0 sm/misc/prelude.py
  37. +599 −0 sm/misc/ray_helper.py
  38. +67 −5 sm/misc/timer.py
  39. +0 −2 sm/namespaces/__init__.py
  40. +38 −0 sm/namespaces/dbpedia.py
  41. +161 −15 sm/namespaces/namespace.py
  42. +13 −9 sm/namespaces/prefix_index.py
  43. +20 −0 sm/namespaces/prelude.py
  44. +41 −0 sm/namespaces/utils.py
  45. +67 −43 sm/namespaces/wikidata.py
  46. +32 −1 sm/outputs/__init__.py
  47. +225 −0 sm/outputs/_sm_formatter.py
  48. +270 −0 sm/outputs/_sm_transform.py
  49. +398 −36 sm/outputs/semantic_model.py
  50. +5 −4 sm/prelude.py
  51. 0 sm/py.typed
  52. +9 −0 sm/typing.py
  53. +21 −0 tests/namespaces/test_wikidata.py
  54. +16 −0 tests/test_import.py
2 changes: 1 addition & 1 deletion .github/workflows/publish.yml
@@ -11,7 +11,7 @@ jobs:
       - name: setup python
         uses: actions/setup-python@v3.1.0
         with:
-          python-version: 3.8
+          python-version: 3.9
       - name: setup dependencies
         run: |
           pip install poetry twine
3 changes: 2 additions & 1 deletion .gitignore
@@ -127,4 +127,5 @@ dmypy.json

 # Pyre type checker
 .pyre/
-.idea
+.idea
+poetry.lock
57 changes: 57 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,57 @@
# Changelog

## [6.12.0] - 2024-06-18

### Added

- Added integer/decimal/boolean to LiteralDataNodeType

## [6.11.2] - 2024-05-11

### Added

- Add helper functions

### Changed

- Make ray optional

## [6.11.1] - 2024-04-26

### Changed

- Upgrade rdflib to 7.0.0

## [6.11.0] - 2024-04-21

### Added

- Add `__str__` to `EntityIdWithScore` class
- Add `assert_not_empty` function to check if a list is not empty
- Add `SemanticModel.is_entity_column` and `SemanticModel.iter_data_nodes` functions
- Add `assert_is_unique`, `is_monotonic_decreasing`, and `KnownSizeIntegerEncoder` helpers

### Fixed

- Add missing prefixes to DBpediaNamespace and main URIs
- Fix division by zero error in `percentage`
- Fix `FullTable.keep_columns` function to make links consistent

## [6.10.1] - 2024-03-12

### Added

- Add `before_shutdown` function to ray map & ray actor map to copy the data stored in shared memory before shutting down the ray cluster to avoid data corruption.

## [6.10.0] - 2024-03-06

### Added

- Add DBpedia namespace

## [6.9.0] - 2024-03-03

### Added

- Add function to remove empty rows from a table
- Add new format (txt -- combination of csv & json) for saving table data, making tables easier to edit & view
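
The 6.11.0 entry above names several small helpers (`percentage`, `assert_not_empty`, `assert_is_unique`, `is_monotonic_decreasing`) without showing their signatures. The sketch below illustrates what such helpers could look like. Every signature and return type here is an assumption: in particular, returning `0.0` for a zero denominator and treating equal neighbors as "decreasing" are choices made for this example, not confirmed behavior of the package.

```python
from typing import Hashable, Sequence, TypeVar

T = TypeVar("T")
H = TypeVar("H", bound=Hashable)


def percentage(part: float, total: float) -> float:
    """Return part/total as a percentage; 0.0 when total is zero (assumed)."""
    if total == 0:
        return 0.0
    return 100.0 * part / total


def assert_not_empty(items: Sequence[T]) -> Sequence[T]:
    """Return `items` unchanged, raising if it has no elements."""
    if len(items) == 0:
        raise AssertionError("expected a non-empty sequence")
    return items


def assert_is_unique(items: Sequence[H]) -> Sequence[H]:
    """Return `items` unchanged, raising if any element repeats."""
    if len(set(items)) != len(items):
        raise AssertionError("expected all items to be unique")
    return items


def is_monotonic_decreasing(values: Sequence[float]) -> bool:
    """True when no value is larger than its predecessor (non-strict, assumed)."""
    return all(a >= b for a, b in zip(values, values[1:]))
```

Returning the input from the `assert_*` helpers lets them be used inline, for example `rows = assert_not_empty(load_rows())`, where `load_rows` stands for any caller-supplied function.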
3 changes: 3 additions & 0 deletions README.md
@@ -0,0 +1,3 @@
# SEM-DESC [![PyPI](https://img.shields.io/pypi/v/sem-desc)](https://pypi.org/project/sem-desc/)

Containing basic functions (input, output, dataset, evaluation metrics) for the semantic modeling problem.
53 changes: 32 additions & 21 deletions pyproject.toml
@@ -1,36 +1,47 @@
 [tool.poetry]
 name = "sem-desc"
-version = "3.4.0"
+version = "6.18.3"
 description = "Package providing basic functionalities for the semantic modeling problem"
 authors = ["Binh Vu <binh@toan2.com>"]
 license = "MIT"
-packages = [
-{ include = "sm" }
-]
-include = [
-]
+packages = [{ include = "sm" }]
+
+readme = "README.md"
+homepage = "https://github.com/binh-vu/sm"
+repository = "https://github.com/binh-vu/sm"

 [tool.poetry.dependencies]
-python = "^3.8"
-pandas = "^1.4.1"
-python-slugify = "^5.0.2"
-lbry-rocksdb-optimized = "^0.8.1"
+python = "^3.9"
+pandas = { version = "^2.1.3", extras = ["excel"] }
+python-slugify = "^8.0.4"
 pyrsistent = "^0.17.3"
-orjson = "^3.6.4"
-loguru = "^0.5.3"
-tqdm = "^4.63.1"
-matplotlib = "^3.4.2"
+orjson = ">= 3.9.0, < 4.0.0"
+loguru = "^0.7.0"
+tqdm = "^4.64.0"
+matplotlib = "^3.5.3"
 pydot = "^1.4.2"
 ipython = "^8.0.1"
 chardet = "^4.0.0"
-ujson = "^5.1.0"
-"ruamel.yaml" = "^0.17.9"
+ujson = "^5.5.0"
+"ruamel.yaml" = "^0.17.21"
 colorama = ">=0.4.4"
-graph-wrapper = "^1.4.0"
-rdflib = "^6.1.1"
+graph-wrapper = "^1.7.0"
+rdflib = "^7.0.0"
+serde2 = { version = "^1.7.0", extras = ["all"] }
+rsoup = "^3.0.1"
+Deprecated = "^1.2.13"
+ray = { version = "^2.0.1", extras = ["default", "serve"], optional = true }
+starlette = { version = "^0.45.3", optional = true }
+timer4 = ">= 1.0.4, < 2.0.0"
+typing-extensions = "^4.7.1"
+transformers = "^4.44.2"
+httpx = "^0.28.1"
+
+[tool.poetry.group.dev.dependencies]
+pytest = "^7.1.3"
+pytest-cov = "^4.0.0"

-[tool.poetry.dev-dependencies]
-pytest = "^6.2.5"
+[tool.poetry.extras]
+all = ["ray", "starlette"]

 [build-system]
 requires = ["poetry-core>=1.0.0"]
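
The updated `pyproject.toml` marks `ray` and `starlette` as `optional = true` and groups them under the `all` extra, so a plain install of `sem-desc` does not pull them in. Code that depends on an optional package then has to defer the import and fail with a useful message when the package is absent. A minimal sketch of that pattern follows; `require_optional` is a hypothetical helper written for this example and is not taken from the `sm` codebase.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, extra: str = "all") -> ModuleType:
    """Import an optional dependency, or raise ImportError with an install hint.

    Hypothetical helper; the extra name "all" mirrors [tool.poetry.extras].
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{module_name}' is an optional dependency; "
            f"install it with: pip install 'sem-desc[{extra}]'"
        ) from exc
```

A caller would write `ray = require_optional("ray")` inside the function that needs it, so importing the rest of the package keeps working without the extra installed.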