Merge pull request #16 from AlessioNar/development
Release 0.1.2
AlessioNar authored Jan 30, 2025
2 parents e4a358a + f656fe4 commit fbdc27c
Showing 10 changed files with 373 additions and 2,120 deletions.
9 changes: 8 additions & 1 deletion docs/source/client.rst
Original file line number Diff line number Diff line change
@@ -1,7 +1,9 @@
Client
===============

This module contains the following submodules:
The tulit Client library currently supports only the Cellar endpoint.

Some draft implementations are provided for Normattiva, LegiLux, and the Veneto Region, but because machine-readable endpoints are not sufficiently supported, they should be considered provisional solutions until such endpoints become available.

.. toctree::
   :maxdepth: 2
@@ -22,6 +24,11 @@ This module contains the following submodules:
   :show-inheritance:

.. automodule:: tulit.client.legilux
   :members:
   :undoc-members:
   :show-inheritance:

.. automodule:: tulit.client.veneto
   :members:
   :undoc-members:
   :show-inheritance:
4 changes: 2 additions & 2 deletions docs/source/conf.py
@@ -8,8 +8,8 @@
project = 'tulit'
author = 'AlessioNar'

release = '0.1.0'
version = '0.1.0'
release = '0.1.2'
version = '0.1.2'

# -- General configuration
sys.path.insert(0, os.path.abspath('../../tulit'))
47 changes: 32 additions & 15 deletions docs/source/getting_started.rst
@@ -1,6 +1,8 @@
Getting Started
===============

tulit is hosted on PyPI, and its documentation is published on readthedocs.io.

Installation
------------

@@ -34,13 +36,23 @@ The `tulit` package provides a client to query and retrieve data from a variety

.. code-block:: python

    from tulit.download.cellar import CellarDownloader
    client = CellarDownloader()
    downloader = CellarDownloader(download_dir='./tests/data/formex', log_dir='./tests/logs')
    with open('./tests/metadata/query_results/query_results.json', 'r') as f:
        results = json.loads(f.read())
    documents = downloader.download(results, format='fmx4')
    from tulit.client.cellar import CellarClient
    from tulit.sparql import send_sparql_query

    client = CellarClient(download_dir='./database', log_dir='./logs')

    # CELEX identifier of the document to download
    celex = '32024R0903'
    format = 'fmx4'

    # Pick the SPARQL query file that matches the requested format
    if format == 'fmx4':
        sparql_query = './tests/metadata/queries/formex_query.rq'
    elif format == 'xhtml':
        sparql_query = './tests/metadata/queries/html_query.rq'
    else:
        raise ValueError('No valid format')

    results = send_sparql_query(sparql_query_filepath=sparql_query, celex=celex)
    documents = client.download(results, format=format)
    print(documents)
@@ -63,13 +75,6 @@ The following code snippet shows how to use the `tulit` package to parse a legal
    file_to_parse = 'tests/data/akn/eu/32014L0092.akn'
    parser.parse(file_to_parse)
    # The various attributes of the parser can be accessed as follows
    print(parser.preface)
    print(parser.citations)
    print(parser.recitals)
    print(parser.chapters)
    print(parser.articles)
A similar approach can be used to parse a legal document in FORMEX and XHTML format:

.. code-block:: python
@@ -88,7 +93,19 @@ A similar approach can be used to parse a legal document in FORMEX and XHTML for
    parser = HTMLParser()
    parser.parse(html_file)
Alternatively, the `tulit` parsers can also be called via command line by providing input and output paths of the file to be parsed, for example:
After parsing the document, the various attributes of the parser can be accessed as follows:

.. code-block:: python

    print(parser.preface)
    print(parser.citations)
    print(parser.recitals)
    print(parser.chapters)
    print(parser.articles)
`tulit` clients and parsers can also be called via command line by providing input and output paths of the file to be parsed, for example:

.. code-block:: bash
289 changes: 288 additions & 1 deletion poetry.lock

Large diffs are not rendered by default.

6 changes: 4 additions & 2 deletions pyproject.toml
@@ -1,11 +1,11 @@
[project]
name = "tulit"
version = "0.1.0"
version = "0.1.2"
description = "TULIT - The Universal Legal Informatics Toolkit, is a set of legal informatics utilities collected in a Python package that focuses on the retrieval of legal data and metadata from official sources in the EU, and their transformation into Pythonic data structures"

[tool.poetry]
name = "tulit"
version = "0.1.0"
version = "0.1.2"
description = "TULIT - The Universal Legal Informatics Toolkit, is a set of legal informatics utilities collected in a Python package that focuses on the retrieval of legal data and metadata from official sources in the EU, and their transformation into Pythonic data structures"
authors = ["AlessioNar <alessio.nardin@gmail.com>"]
license = "EUPL 1.2"
@@ -28,6 +28,8 @@ pytest = "^8.3.3"
coverage = "^7.6.9"
pytest-cov = "^6.0.0"
genbadge = "^1.1.1"
pygments = "^2.19.1"
sphinx = "^8.1.3"


[build-system]
2 changes: 1 addition & 1 deletion tests/client/test_cellar.py
@@ -9,7 +9,7 @@
class TestCellarClient(unittest.TestCase):
    def setUp(self):
        self.maxDiff = None
        self.downloader = CellarClient(download_dir='./tests/data', log_dir='./tests/logs')
        self.downloader = CellarClient(download_dir='./tests/data', log_dir='./tests/logs', proxies=None)

    def test_download_documents(self):


Large diffs are not rendered by default.

44 changes: 34 additions & 10 deletions tulit/client/cellar.py
@@ -1,13 +1,14 @@
import logging

import argparse
import requests
import json
from tulit.client.client import Client
from tulit.sparql import send_sparql_query

class CellarClient(Client):

    def __init__(self, download_dir, log_dir):
        super().__init__(download_dir, log_dir)
    def __init__(self, download_dir, log_dir, proxies=None):
        super().__init__(download_dir, log_dir, proxies)
        self.endpoint = 'http://publications.europa.eu/resource/cellar/'


@@ -50,7 +51,10 @@ def fetch_content(self, url) -> requests.Response:
                'Content-Type': "application/x-www-form-urlencoded",
                'Host': "publications.europa.eu"
            }
            response = requests.request("GET", url, headers=headers)
            if self.proxies is not None:
                response = requests.request("GET", url, headers=headers, proxies=self.proxies)
            else:
                response = requests.request("GET", url, headers=headers)
            response.raise_for_status()
            return response
        except requests.RequestException as e:
@@ -151,13 +155,33 @@ def download(self, results, format=None):
logging.error(f"Error processing range: {e}")

return document_paths


def main():
    parser = argparse.ArgumentParser(description='Download a Cellar document to a folder')
    parser.add_argument('--celex', type=str, default='32024R0903', help='CELEX identifier of the document')
    parser.add_argument('--format', type=str, default='fmx4', help='Format of the document, either fmx4 or xhtml')
    parser.add_argument('--dir', type=str, default='tests/data/formex', help='Path to the directory')

    args = parser.parse_args()

    client = CellarClient(download_dir=args.dir, log_dir='./tests/logs')

    if args.format == 'fmx4':
        sparql_query = './tests/metadata/queries/formex_query.rq'
    elif args.format == 'xhtml':
        sparql_query = './tests/metadata/queries/html_query.rq'
    else:
        print('No valid format')
        return None

    results = send_sparql_query(sparql_query_filepath=sparql_query, celex=args.celex)

    documents = client.download(results, format=args.format)

    print(documents)


# Example usage
if __name__ == "__main__":
    downloader = CellarClient(download_dir='./tests/data/formex', log_dir='./tests/logs')
    with open('./tests/metadata/query_results/ai_act.json', 'r') as f:
        results = json.loads(f.read())
    documents = downloader.download(results, format='fmx4')
    print(documents)
    main()
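
Review note on the new `main()` entry point: the format check could be pushed into the argument parser itself. The sketch below is not part of this commit; it assumes the same three options and the same two query file paths shown in the diff, and introduces two hypothetical names, `QUERY_FILES` and `select_query`, purely for illustration. It uses the standard `choices` option of `argparse` so that an unsupported format is rejected before any client is created.

```python
import argparse

# Hypothetical mapping from format name to the SPARQL query file used in main().
QUERY_FILES = {
    'fmx4': './tests/metadata/queries/formex_query.rq',
    'xhtml': './tests/metadata/queries/html_query.rq',
}


def build_parser():
    """Build an argument parser that only accepts the known formats."""
    parser = argparse.ArgumentParser(description='Download a Cellar document to a folder')
    parser.add_argument('--celex', type=str, default='32024R0903',
                        help='CELEX identifier of the document')
    parser.add_argument('--format', type=str, default='fmx4',
                        choices=sorted(QUERY_FILES),
                        help='Format of the document')
    parser.add_argument('--dir', type=str, default='tests/data/formex',
                        help='Path to the directory')
    return parser


def select_query(fmt):
    """Return the query file for a format, raising instead of printing on bad input."""
    try:
        return QUERY_FILES[fmt]
    except KeyError:
        raise ValueError(f'No valid format: {fmt!r}') from None


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(['--celex', '32024R0903', '--format', 'xhtml'])
print(select_query(args.format))
```

With this shape, supporting a further format would only require one more entry in the mapping, and library callers get an exception rather than a printed message followed by `None`.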

3 changes: 2 additions & 1 deletion tulit/client/client.py
@@ -8,7 +8,7 @@ class Client:
    """
    A generic document downloader class.
    """
    def __init__(self, download_dir, log_dir):
    def __init__(self, download_dir, log_dir, proxies=None):
        """
        Initializes the downloader with directories for downloads and logs.
@@ -21,6 +21,7 @@ def __init__(self, download_dir, log_dir):
        """
        self.download_dir = download_dir
        self.log_dir = log_dir
        self.proxies = proxies
        self._ensure_directories()

    def _ensure_directories(self):
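
Review note on the new `proxies` argument: the sketch below illustrates how the stored value might be used by a caller. It is a stand-in, not the `tulit` class. `SketchClient`, `request_kwargs`, and the proxy URL are hypothetical, and the body of `_ensure_directories` is an assumption, since the diff does not show it. The dictionary keyed by URL scheme is the shape that `requests` accepts for its `proxies` parameter.

```python
import os
import tempfile


class SketchClient:
    """Stand-in for the base client; not the tulit class itself."""

    def __init__(self, download_dir, log_dir, proxies=None):
        self.download_dir = download_dir
        self.log_dir = log_dir
        # Mapping of URL scheme to proxy URL, or None for a direct connection.
        self.proxies = proxies
        self._ensure_directories()

    def _ensure_directories(self):
        # Assumption: the real method creates both directories if they are missing.
        os.makedirs(self.download_dir, exist_ok=True)
        os.makedirs(self.log_dir, exist_ok=True)

    def request_kwargs(self):
        # Hypothetical helper: only include proxies when they were configured.
        return {'proxies': self.proxies} if self.proxies is not None else {}


# Use a temporary directory so the sketch leaves the working tree untouched.
base = tempfile.mkdtemp()
proxies = {
    'http': 'http://proxy.example.org:8080',   # placeholder proxy URL
    'https': 'http://proxy.example.org:8080',  # placeholder proxy URL
}
client = SketchClient(os.path.join(base, 'data'), os.path.join(base, 'logs'), proxies=proxies)
print(client.request_kwargs())
```

Because the argument defaults to `None`, existing callers that pass only `download_dir` and `log_dir` keep working unchanged.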
2 changes: 1 addition & 1 deletion tulit/client/legilux.py
@@ -35,5 +35,5 @@ def download(self, eli):
return None

if __name__ == "__main__":
    downloader = LegiluxDownloader(download_dir='./tests/data/legilux', log_dir='./tests/metadata/logs')
    downloader = LegiluxClient(download_dir='./tests/data/legilux', log_dir='./tests/metadata/logs')
    downloader.download(eli='http://data.legilux.public.lu/eli/etat/leg/loi/2006/07/31/n2/jo')
