Merge pull request #11 from letuananh/dev-0.1.1
texttaglib 0.1.1 stable release
letuananh authored May 13, 2021
2 parents 3f01f81 + dee7b2d commit 7a5bce4
Showing 22 changed files with 168 additions and 49 deletions.
2 changes: 0 additions & 2 deletions .gitignore
@@ -1,6 +1,4 @@
.env_python3/
chirptext
puchikarui

# Byte-compiled / optimized / DLL files
__pycache__/
8 changes: 8 additions & 0 deletions .gitmodules
@@ -0,0 +1,8 @@
[submodule "modules/chirptext"]
	path = modules/chirptext
	url = https://github.com/letuananh/chirptext
	branch = chirptext-0.1
[submodule "modules/puchikarui"]
	path = modules/puchikarui
	url = https://github.com/letuananh/puchikarui
	branch = puchikarui-0.1
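The two `[submodule ...]` entries pin each embedded dependency to a legacy branch. The authoritative reader for this file is git itself (`git config --file .gitmodules --list`). As a hedged illustration that is not part of this commit, the same mapping can be read back with Python's standard `configparser`. This assumes the file keeps this simple `key = value` layout. The helper name `parse_gitmodules` is hypothetical.

```python
import configparser

# Contents of .gitmodules as added in this commit (tab-indented, git-config style).
GITMODULES = """\
[submodule "modules/chirptext"]
\tpath = modules/chirptext
\turl = https://github.com/letuananh/chirptext
\tbranch = chirptext-0.1
[submodule "modules/puchikarui"]
\tpath = modules/puchikarui
\turl = https://github.com/letuananh/puchikarui
\tbranch = puchikarui-0.1
"""


def parse_gitmodules(text):
    """Hypothetical helper: map each submodule path to its settings.

    Only the plain key = value layout above is handled; other git-config
    features (includes, quoted values, escapes) are not supported.
    """
    # interpolation=None so a '%' in a URL is never treated specially
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    submodules = {}
    for section in parser.sections():
        # Section names look like: submodule "modules/chirptext"
        if section.startswith("submodule "):
            entry = dict(parser[section])
            submodules[entry["path"]] = entry
    return submodules


for path, entry in parse_gitmodules(GITMODULES).items():
    print(path, "->", entry["branch"])
```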
2 changes: 2 additions & 0 deletions MANIFEST.in
@@ -2,3 +2,5 @@ include README.md
include LEGACY.md
include requirements*.txt
recursive-include texttaglib/data *.sql
recursive-include texttaglib/chirptext/data/luke/swadesh *.txt
recursive-include texttaglib/chirptext/data/sino *.csv
21 changes: 19 additions & 2 deletions README.md
@@ -1,12 +1,14 @@
# texttaglib

a Python library for managing and annotating text corpuses in different formats (ELAN, TIG, TTL, et cetera).

[![ReadTheDocs Badge](https://readthedocs.org/projects/texttaglib/badge/?version=latest&style=plastic)](https://texttaglib.readthedocs.io/)
[![Total alerts](https://img.shields.io/lgtm/alerts/g/letuananh/texttaglib.svg?logo=lgtm&logoWidth=18)](https://lgtm.com/projects/g/letuananh/texttaglib/alerts/)
[![Language grade: Python](https://img.shields.io/lgtm/grade/python/g/letuananh/texttaglib.svg?logo=lgtm&logoWidth=18)](https://lgtm.com/projects/g/letuananh/texttaglib/context:python)

## Warning

⚠️ THIS PROJECT IS ARCHIVED. ALL FUTURE DEVELOPMENT WILL BE ON [speach](https://pypi.org/project/speach/) LIBRARY ⚠️
⚠️ `texttaglib` package is now in maintenance mode for supporting legacy software only. All new development will be moved to [speach](https://pypi.org/project/speach/) library.

Migration from `texttaglib` to `speach` should be trivial

@@ -26,4 +28,19 @@ pip install texttaglib
pip install speach
```

- For more information, please visit: https://github.com/neocl/speach
For more information, please visit: https://github.com/neocl/speach

## Legacy

texttaglib legacy releases (>= 0.1.1, < 0.2) use embedded chirptext-0.1 and puchikarui-0.1 for supporting legacy APIs.

Here is the sample code:

```python
from texttaglib import ttl
from texttaglib.chirptext import chio
from texttaglib.chirptext import deko
from texttaglib.puchikarui import Schema
```

Legacy documentation: https://texttaglib.readthedocs.io
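Both the README and `docs/index.rst` in this commit describe migration as renaming the package in import statements. Below is a minimal sketch of that rename, assuming a regular-expression rewrite is acceptable. The helper `migrate_imports` is hypothetical and has these limits:

- It only rewrites a `texttaglib` name that directly follows `from` or `import` at the start of a line.
- It does not check that a given submodule exists in `speach`. For example, the embedded `texttaglib.chirptext` modules may need separate handling.

```python
import re

# "from texttaglib ..." or "import texttaglib..." at the start of a line.
# \b stops names such as texttaglib_extra from matching.
_IMPORT_RE = re.compile(r"^(\s*(?:from|import)\s+)texttaglib\b", re.MULTILINE)


def migrate_imports(source):
    """Hypothetical helper: rename the top-level package in import lines."""
    return _IMPORT_RE.sub(r"\1speach", source)


old_source = "from texttaglib import elan\nimport texttaglib\n"
print(migrate_imports(old_source))
```

Anchoring the pattern at the start of a line leaves string literals and comments that merely mention `texttaglib` untouched.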
46 changes: 46 additions & 0 deletions dev_config.sh
@@ -0,0 +1,46 @@
#!/bin/bash

bold=$(tput bold)
normal=$(tput sgr0)
WORKSPACE_FOLDER=~/workspace
py3=`python -c "import sys; print('1' if sys.version_info >= (3,0) else '0')"`

function link_folder {
    FOLDER_PATH=$1
    SYMLINK_NAME=$2
    if [ ! -d ${FOLDER_PATH} ]; then
        echo "WARNING: Target folder ${bold}${FOLDER_PATH}${normal} does not exist"
    elif [ ! -d ${SYMLINK_NAME} ]; then
        ln -sv ${FOLDER_PATH} ${SYMLINK_NAME}
    else
        echo "Folder ${bold}${SYMLINK_NAME}${normal} exists."
    fi
}

function link_file {
    TARGET_FILE=$1
    SYMLINK_NAME=$2
    if [ ! -f ${TARGET_FILE} ]; then
        echo "WARNING: Target file ${bold}${TARGET_FILE}${normal} does not exist"
    elif [ ! -f ${SYMLINK_NAME} ]; then
        ln -sv ${TARGET_FILE} ${SYMLINK_NAME}
    else
        echo "File ${bold}${SYMLINK_NAME}${normal} exists."
    fi
}

if [ ${py3} -eq 0 ]; then
    echo "+-------------------------------+"
    echo "| WARNING: Python 3 is required |"
    echo "+-------------------------------+"
fi

# init submodules
git submodule init && git submodule update

# prerequisite packages
pip install -r requirements.txt -qq

link_folder `readlink -f ./modules/chirptext/chirptext` texttaglib/chirptext
link_folder `readlink -f ./modules/puchikarui/puchikarui` texttaglib/puchikarui
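`link_folder` in `dev_config.sh` is idempotent:

- It warns when the target folder is missing.
- It creates the symlink only when nothing directory-like exists at the link path.
- Otherwise it reports that the folder already exists.

Below is a rough Python counterpart for readers who prefer to script this step without Bash. The function name mirrors the shell function but is otherwise hypothetical. It returns a status string instead of printing bold terminal output. It uses `os.path.realpath` to stand in for `readlink -f`.

```python
import os
import tempfile


def link_folder(folder_path, symlink_name):
    """Rough Python counterpart of dev_config.sh's link_folder (hypothetical)."""
    # Like `[ ! -d ... ]`, os.path.isdir follows symlinks.
    if not os.path.isdir(folder_path):
        return "WARNING: Target folder {} does not exist".format(folder_path)
    if not os.path.isdir(symlink_name):
        target = os.path.realpath(folder_path)  # similar to `readlink -f`
        # target_is_directory only has an effect on Windows
        os.symlink(target, symlink_name, target_is_directory=True)
        return "linked {} -> {}".format(symlink_name, target)
    return "Folder {} exists.".format(symlink_name)


with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "modules", "chirptext", "chirptext")
    os.makedirs(source)
    link = os.path.join(tmp, "chirptext")
    print(link_folder(source, link))  # first call creates the link
    print(link_folder(source, link))  # second call reports it exists
```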

44 changes: 40 additions & 4 deletions docs/index.rst
@@ -3,11 +3,49 @@
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Welcome to texttaglib's documentation!
======================================
texttaglib's documentation!
===========================

.. warning::
    ⚠️ THIS PROJECT HAS BEEN RENAMED AND ARCHIVED. ALL FUTURE DEVELOPMENT WILL BE ON
    `speach <https://pypi.org/project/speach/>`__ LIBRARY ⚠️

Migration to speach
-------------------

Migration from ``texttaglib`` to `speach <https://speach.readthedocs.io/>`_ should be trivial


.. code:: python

    # just change import statements from something like
    from texttaglib import elan
    # to the new package name
    from speach import elan

Installation

.. code:: bash

    # change
    pip install texttaglib
    # into
    pip install speach

- For more information, please visit: https://speach.readthedocs.io/

(Legacy) Introduction
---------------------

texttaglib is a Python library for managing and annotating text corpuses in different formats.

.. image:: https://readthedocs.org/projects/texttaglib/badge/?version=latest&style=plastic
   :target: https://texttaglib.readthedocs.io/
.. image:: https://img.shields.io/lgtm/alerts/g/letuananh/texttaglib.svg?logo=lgtm&logoWidth=18
   :target: https://lgtm.com/projects/g/letuananh/texttaglib/alerts/
.. image:: https://img.shields.io/lgtm/grade/python/g/letuananh/texttaglib.svg?logo=lgtm&logoWidth=18
   :target: https://lgtm.com/projects/g/letuananh/texttaglib/context:python

Main functions are:

- Multiple storage formats (text files, JSON files, SQLite databases)
@@ -24,8 +62,6 @@ texttaglib is availble on PyPI.
.. code:: bash

    pip install texttaglib
    # or more explicit
    python3 -m pip install texttaglib

Basic usage
-----------
1 change: 1 addition & 0 deletions docs/requirements.txt
@@ -1,2 +1,3 @@
chirptext >= 0.1a18
puchikarui
Sphinx
1 change: 1 addition & 0 deletions modules/chirptext
Submodule chirptext added at 620fc7
1 change: 1 addition & 0 deletions modules/puchikarui
Submodule puchikarui added at eb35d2
2 changes: 0 additions & 2 deletions requirements.txt
@@ -1,2 +0,0 @@
chirptext >= 0.1a18
puchikarui
11 changes: 7 additions & 4 deletions setup.py
@@ -66,7 +66,7 @@ def read(*filenames, **kwargs):
"Bug Tracker": "https://github.com/letuananh/texttaglib/issues",
"Source Code": "https://github.com/letuananh/texttaglib/"
},
keywords="nlp",
keywords=["nlp", "annotation", "text", "corpus", "linguistics", "ELAN", "transcription"],
license=pkg_info['__license__'],
author=pkg_info['__author__'],
tests_require=requirements,
@@ -76,14 +76,17 @@ def read(*filenames, **kwargs):
long_description=long_description,
long_description_content_type='text/markdown',
packages=['texttaglib',
'texttaglib.data'],
package_data={'texttaglib': ['data/*.sql', 'data/*.gz']},
'texttaglib.data',
'texttaglib.chirptext',
'texttaglib.puchikarui'],
package_data={'texttaglib': ['data/*.sql', 'data/*.gz'],
'texttaglib.chirptext': ['data/luke/swadesh/*.txt', 'data/sino/*.csv']},
include_package_data=True,
platforms='any',
test_suite='test',
# Reference: https://pypi.python.org/pypi?%3Aaction=list_classifiers
classifiers=['Programming Language :: Python',
'Development Status :: 2 - Pre-Alpha',
'Development Status :: {}'.format(pkg_info['__status__']),
'License :: OSI Approved :: {}'.format(pkg_info['__license__']),
'Environment :: Plugins',
'Intended Audience :: Education',
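The new `package_data` entry ships the embedded chirptext data files. setuptools documents these entries as glob patterns relative to each package directory, so `*` does not cross directory boundaries. The sketch below checks which files such patterns would pick up. It runs them against a throwaway tree with made-up file names, because the real contents of `data/luke/swadesh` and `data/sino` are not shown in this diff. The helpers `make_fake_tree` and `matched_files` are hypothetical.

```python
import glob
import os
import tempfile

# Patterns copied from the texttaglib.chirptext entry in setup.py.
PATTERNS = ["data/luke/swadesh/*.txt", "data/sino/*.csv"]

# Made-up layout for illustration only.
FAKE_FILES = [
    "data/luke/swadesh/example.txt",
    "data/sino/example.csv",
    "data/sino/nested/skipped.csv",  # '*' does not descend into sub-folders
    "data/luke/readme.md",           # matches no pattern
]


def make_fake_tree(root, files):
    """Create empty files at the given package-relative paths."""
    for rel in files:
        full = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(full), exist_ok=True)
        with open(full, "w"):
            pass


def matched_files(package_dir, patterns):
    """Return sorted package-relative paths matched by the glob patterns."""
    found = set()
    for pattern in patterns:
        for path in glob.glob(os.path.join(package_dir, pattern)):
            if os.path.isfile(path):
                rel = os.path.relpath(path, package_dir)
                found.add(rel.replace(os.sep, "/"))
    return sorted(found)


with tempfile.TemporaryDirectory() as pkg:
    make_fake_tree(pkg, FAKE_FILES)
    print(matched_files(pkg, PATTERNS))
```

The `MANIFEST.in` lines added in this commit (`recursive-include ... *.txt` / `*.csv`) cover the source distribution. `package_data` covers what gets installed with the package.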
2 changes: 1 addition & 1 deletion test/__init__.py
@@ -14,7 +14,7 @@
# Homepage: https://github.com/letuananh/texttaglib

import os
from chirptext.cli import setup_logging
from texttaglib.chirptext.cli import setup_logging

TEST_DIR = os.path.dirname(__file__)
TEST_DATA = os.path.join(TEST_DIR, 'data')
4 changes: 2 additions & 2 deletions test/test_ttlig.py
@@ -41,8 +41,8 @@
import logging
from collections import OrderedDict

from chirptext import chio
from chirptext import deko
from texttaglib.chirptext import chio
from texttaglib.chirptext import deko

from texttaglib import ttl
from texttaglib import ttlig
2 changes: 1 addition & 1 deletion texttaglib/__init__.py
@@ -36,7 +36,7 @@
from .__version__ import __credits__, __license__, __description__, __url__
from .__version__ import __version_major__, __version_long__, __version__, __status__

from chirptext import texttaglib as ttl
from .chirptext import texttaglib as ttl
from texttaglib.sqlite import TTLSQLite


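Across this commit, absolute imports such as `from chirptext import texttaglib as ttl` become relative ones (`from .chirptext import texttaglib as ttl`). A relative import resolves inside the enclosing package, so the embedded copy under `texttaglib/` is used rather than any separately installed `chirptext`. The self-contained sketch below builds a throwaway package on disk to show that resolution. The package and module names (`demo_pkg`, `vendored`, `tool`) are hypothetical.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical layout mirroring texttaglib/ with an embedded dependency:
#   demo_pkg/__init__.py
#   demo_pkg/vendored/__init__.py   (stands in for texttaglib/chirptext)
#   demo_pkg/tool.py                (uses a relative import)
FILES = {
    "demo_pkg/__init__.py": "",
    "demo_pkg/vendored/__init__.py": "ORIGIN = 'embedded copy'\n",
    "demo_pkg/tool.py": "from .vendored import ORIGIN\n",
}


def load_tool(root):
    """Write FILES under root and import demo_pkg.tool from there."""
    for rel, text in FILES.items():
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as fh:
            fh.write(text)
    sys.path.insert(0, root)
    try:
        importlib.invalidate_caches()  # make the freshly written files visible
        return importlib.import_module("demo_pkg.tool")
    finally:
        sys.path.remove(root)


with tempfile.TemporaryDirectory() as tmp:
    tool = load_tool(tmp)
    print(tool.ORIGIN)
```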
42 changes: 24 additions & 18 deletions texttaglib/__main__.py
@@ -33,11 +33,10 @@

import os
import logging
from lxml import etree

from chirptext import TextReport, FileHelper
from chirptext import chio
from chirptext.cli import CLIApp, setup_logging
from .chirptext import TextReport, FileHelper
from .chirptext import chio
from .chirptext.cli import CLIApp, setup_logging

from texttaglib import ttl, TTLSQLite, ttlig, orgmode
from texttaglib.elan import parse_eaf_stream
@@ -185,21 +184,28 @@ def make_text(sent, delimiter=' '):

def make_html(cli, args):
''' Convert TTL to HTML '''
print("Reading document ...")
ttl_doc = ttl.Document.read_ttl(args.ttl)
output = TextReport(args.output)
doc_node = etree.Element('doc')
for sent in ttl_doc:
sent_node = etree.SubElement(doc_node, 'sent')
text_node = etree.XML(make_text(sent, delimiter=args.delimiter))
sent_node.append(text_node)
if sent.get_tag('translation'):
_LXML_AVAILABLE = False
try:
from lxml import etree
_LXML_AVAILABLE = True
except Exception:
print("lxml is required for this function")
if _LXML_AVAILABLE:
print("Reading document ...")
ttl_doc = ttl.Document.read_ttl(args.ttl)
output = TextReport(args.output)
doc_node = etree.Element('doc')
for sent in ttl_doc:
sent_node = etree.SubElement(doc_node, 'sent')
text_node = etree.XML(make_text(sent, delimiter=args.delimiter))
sent_node.append(text_node)
if sent.get_tag('translation'):
etree.SubElement(sent_node, 'br')
trans_node = etree.SubElement(sent_node, 'trans')
trans_node.text = sent.get_tag('translation').label
etree.SubElement(sent_node, 'br')
etree.SubElement(sent_node, 'br')
trans_node = etree.SubElement(sent_node, 'trans')
trans_node.text = sent.get_tag('translation').label
etree.SubElement(sent_node, 'br')
etree.SubElement(sent_node, 'br')
output.write(etree.tostring(doc_node, encoding='unicode', pretty_print=not args.compact))
output.write(etree.tostring(doc_node, encoding='unicode', pretty_print=not args.compact))


def sec_str(a_float):
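`make_html` now imports `lxml` inside the function and prints a message when it is unavailable, so the rest of the CLI no longer needs `lxml`. The same optional-dependency pattern can be factored into a small helper. `optional_import` is hypothetical. It catches `ImportError` (which includes `ModuleNotFoundError`) instead of the broader `Exception` used in the commit. That is a narrower design choice: errors raised for other reasons while importing the module are not hidden.

```python
import importlib


def optional_import(name, feature):
    """Return the named module, or None (after a message) if it cannot be imported."""
    try:
        return importlib.import_module(name)
    except ImportError:
        print("{} is required for {}".format(name, feature))
        return None


etree = optional_import("lxml.etree", "HTML export")
if etree is not None:
    doc_node = etree.Element("doc")
    etree.SubElement(doc_node, "sent").text = "hello"
    print(etree.tostring(doc_node, encoding="unicode"))
```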
6 changes: 3 additions & 3 deletions texttaglib/__version__.py
@@ -6,10 +6,10 @@
__copyright__ = "Copyright (c) 2018, texttaglib, Le Tuan Anh"
__credits__ = []
__license__ = "MIT License"
__description__ = "Python library for managing and annotating textual corpus using TextTagLib (TTL) format"
__description__ = "Python library for managing and annotating text corpuses in different formats (ELAN, TIG, TTL, et cetera)"
__url__ = "https://github.com/letuananh/texttaglib"
__maintainer__ = "Le Tuan Anh"
__version_major__ = "0.1"
__version_major__ = "0.1.1"
__version__ = "{}".format(__version_major__)
__version_long__ = "{}".format(__version_major__)
__status__ = "Prototype"
__status__ = "5 - Production/Stable"
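With `__status__` changed to `"5 - Production/Stable"`, `setup.py` builds its Trove classifier directly from it via `'Development Status :: {}'.format(pkg_info['__status__'])`. The status string therefore has to match a real classifier suffix exactly, since PyPI rejects uploads that declare unknown classifiers. This diff does not show how `pkg_info` is populated. The sketch assumes it is a dict filled by executing `__version__.py`, which is a common pattern but only an assumption here.

```python
# Excerpt of the new texttaglib/__version__.py values (from this commit).
VERSION_PY = '''
__license__ = "MIT License"
__version_major__ = "0.1.1"
__version__ = "{}".format(__version_major__)
__status__ = "5 - Production/Stable"
'''

# Assumption: setup.py collects these names into a dict, e.g. via exec().
pkg_info = {}
exec(VERSION_PY, pkg_info)

classifiers = [
    'Development Status :: {}'.format(pkg_info['__status__']),
    'License :: OSI Approved :: {}'.format(pkg_info['__license__']),
]
print(pkg_info['__version__'])
print(classifiers)
```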
1 change: 1 addition & 0 deletions texttaglib/chirptext
4 changes: 2 additions & 2 deletions texttaglib/elan.py
@@ -34,8 +34,8 @@
from typing import List, Tuple
import xml.etree.ElementTree as ET

from chirptext import DataObject
from chirptext import chio
from .chirptext import DataObject
from .chirptext import chio

from .vtt import sec2ts, ts2sec

2 changes: 1 addition & 1 deletion texttaglib/orgmode.py
@@ -39,7 +39,7 @@
import os
import logging

from chirptext import chio
from .chirptext import chio


# ----------------------------------------------------------------------
1 change: 1 addition & 0 deletions texttaglib/puchikarui
6 changes: 3 additions & 3 deletions texttaglib/sqlite.py
@@ -33,9 +33,9 @@

import logging

from chirptext import DataObject
from puchikarui import Schema, with_ctx
from texttaglib import ttl
from .puchikarui import Schema, with_ctx
from .chirptext import DataObject
from .chirptext import ttl
from .data import INIT_TTL_SQLITE


8 changes: 4 additions & 4 deletions texttaglib/ttlig.py
@@ -37,11 +37,11 @@
from collections import OrderedDict
import warnings

from chirptext import DataObject, piter
from chirptext import chio
from chirptext.deko import is_kana, parse
from .chirptext import DataObject, piter
from .chirptext import chio
from .chirptext.deko import is_kana, parse
from .chirptext import ttl

from texttaglib import ttl


# ----------------------------------------------------------------------