Merge commit '8f52578e85b27831ab8a68a6d86721ea3348a553' into develop
* commit '8f52578e85b27831ab8a68a6d86721ea3348a553':
  Run black locally with nox (pdfminer#776)
  Install typing_extensions on Python 3.6 and 3.7 (pdfminer#775)
  Fix `TypeError` by Ignoring null characters in PSBaseParser (pdfminer#768)
  Fix `ValueError` with unencrypted metadata values (Fixes pdfminer#766). (pdfminer#774)
  Fix `TypeError` when getting default width of font (pdfminer#772)
  Deprecate usage of `if __name__ == "__main__"` in scripts that are not documented. Also deprecate usage of scripts that are only there for testing purposes. (pdfminer#756)
  Fix Sphinx warnings and error (pdfminer#760)
  Update CHANGELOG.md for pdfminer#755
  Remove upper version bounds (pdfminer#755)
  Ignore path constructors that do not begin with  m (pdfminer#749)
  Bump version 20220506 & fix small issue with types
Beants committed Aug 5, 2022
2 parents 766b787 + 8f52578 commit 7c7568c
Showing 27 changed files with 302 additions and 64 deletions.
24 changes: 7 additions & 17 deletions .github/pull_request_template.md
@@ -1,25 +1,15 @@
**Pull request**

Please remove this paragraph and replace it with a description of your PR.
Also include links to the issues that it fixes.
Please *remove* this paragraph and replace it with a description of your PR. Also include the issue that it fixes.

**How Has This Been Tested?**

Please repalce this paragraph with a description of how this PR has been
tested. Include the necessary instructions and files such that other can
reproduce it.
Please *replace* this paragraph with a description of how this PR has been tested.

**Checklist**

- [ ] I have formatted my code with [black](https://github.com/psf/black).
- [ ] I have added tests that prove my fix is effective or that my feature
works
- [ ] I have added docstrings to newly created methods and classes
- [ ] I have optimized the code at least one time after creating the initial
version
- [ ] I have updated the [README.md](../README.md) or verified that this
is not necessary
- [ ] I have updated the [readthedocs](../docs/source) documentation or
verified that this is not necessary
- [ ] I have added a concise human-readable description of the change to
[CHANGELOG.md](../CHANGELOG.md)
- [ ] I have read [CONTRIBUTING.md](../CONTRIBUTING.md).
- [ ] I have added a concise human-readable description of the change to [CHANGELOG.md](../CHANGELOG.md).
- [ ] I have tested that this fix is effective or that this feature works.
- [ ] I have added docstrings to newly created methods and classes.
- [ ] I have updated the [README.md](../README.md) and the [readthedocs](../docs/source) documentation. Or verified that this is not necessary.
55 changes: 55 additions & 0 deletions .github/workflows/actions.yml
@@ -14,6 +14,61 @@ env:
default-python: "3.10"

jobs:

check-code-formatting:
name: Check coding style
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v2
- name: Set up Python ${{ env.default-python }}
uses: actions/setup-python@v2
with:
python-version: ${{ env.default-python }}
- name: Upgrade pip, Install nox
run: |
python -m pip install --upgrade pip
python -m pip install nox
- name: Check coding style
run: |
nox --error-on-missing-interpreters --non-interactive --session format
check-coding-style:
name: Check coding style
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v2
- name: Set up Python ${{ env.default-python }}
uses: actions/setup-python@v2
with:
python-version: ${{ env.default-python }}
- name: Upgrade pip, Install nox
run: |
python -m pip install --upgrade pip
python -m pip install nox
- name: Check coding style
run: |
nox --error-on-missing-interpreters --non-interactive --session lint
check-static-types:
name: Check static types
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v2
- name: Set up Python ${{ env.default-python }}
uses: actions/setup-python@v2
with:
python-version: ${{ env.default-python }}
- name: Upgrade pip, Install nox
run: |
python -m pip install --upgrade pip
python -m pip install nox
- name: Check static types
run: |
nox --error-on-missing-interpreters --non-interactive --session types
tests:
name: Run tests
runs-on: ${{ matrix.os }}
39 changes: 31 additions & 8 deletions CHANGELOG.md
@@ -3,6 +3,29 @@
All notable changes in pdfminer.six will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
## [20220805]

### Fixed

- `ValueError` when trying to decrypt empty metadata values ([#766](https://github.com/pdfminer/pdfminer.six/issues/766))
- Sphinx errors during building of documentation ([#760](https://github.com/pdfminer/pdfminer.six/pull/760))
- `TypeError` when getting default width of font ([#720](https://github.com/pdfminer/pdfminer.six/issues/720))
- Install typing-extensions on Python 3.6 and 3.7 ([#775](https://github.com/pdfminer/pdfminer.six/pull/775))
- `TypeError` in cmapdb.py when parsing null characters ([#768](https://github.com/pdfminer/pdfminer.six/pull/768))
- `ValueError` when trying to convert `str` to `int` in
### Deprecated

- Usage of `if __name__ == "__main__"` where it was only intended for testing purposes ([#756](https://github.com/pdfminer/pdfminer.six/pull/756))

## [20220524]

### Fixed

- Ignoring (invalid) path constructors that do not begin with `m` ([#749](https://github.com/pdfminer/pdfminer.six/pull/749))

### Changed

- Removed upper version bounds ([#755](https://github.com/pdfminer/pdfminer.six/pull/755))

## [20220426]

@@ -225,13 +248,13 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

- Group text lines if they are centered ([#384](https://github.com/pdfminer/pdfminer.six/pull/384))

## [20200124] - 2020-01-24
## [20200124]

### Security

- Removed samples/issue-00152-embedded-pdf.pdf because it contains a possible security threat; a javascript enabled object ([#364](https://github.com/pdfminer/pdfminer.six/pull/364))

## [20200121] - 2020-01-21
## [20200121]

### Fixed

@@ -244,23 +267,23 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

- The command-line utility latin2ascii.py ([#360](https://github.com/pdfminer/pdfminer.six/pull/360))

## [20200104] - 2019-01-04
## [20200104]

## Removed
### Removed

- Support for Python 2 ([#346](https://github.com/pdfminer/pdfminer.six/pull/346))

### Changed

- Enforce pep8 coding style by adding flake8 to CI ([#345](https://github.com/pdfminer/pdfminer.six/pull/345))

## [20191110] - 2019-11-10
## [20191110]

### Fixed

- Wrong order of text box grouping introduced by PR #315 ([#335](https://github.com/pdfminer/pdfminer.six/pull/335))

## [20191107] - 2019-11-07
## [20191107]

### Deprecated

@@ -290,7 +313,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

- Files for external applications such as django, cgi and pyinstaller ([#320](https://github.com/pdfminer/pdfminer.six/pull/320))

## [20191020] - 2019-10-20
## [20191020]

### Deprecated

@@ -316,7 +339,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

- All dependencies are managed in `setup.py` ([#306](https://github.com/pdfminer/pdfminer.six/pull/306) and [#219](https://github.com/pdfminer/pdfminer.six/pull/219))

## [20181108] - 2018-11-08
## [20181108]

### Changed

19 changes: 9 additions & 10 deletions CONTRIBUTING.md
@@ -26,20 +26,25 @@ Any contribution is appreciated! You might want to:

## Guideline for creating pull request

* A pull request should close an existing issue.
* Pull requests should be merged to master. Version tags are used indicate the releases.
* A pull request should close an existing issue. For example, use "Fix #123" to indicate that your PR fixes issue 123.
* Pull requests should be merged to master.
* Include unit tests when possible. In case of bugs, this will help to prevent the same mistake in the future. In case
of features, this will show that your code works correctly.
* Code should work for Python 3.6+.
* Code should be formatted with [black](https://github.com/psf/black).
* Test your code by using nox (see below).
* New features should be well documented using docstrings.
* Check if the [README.md](../README.md) or [readthedocs](../docs/source) documentation needs to be updated.
* Check spelling and grammar.
* Don't forget to update the [CHANGELOG.md](CHANGELOG.md#[Unreleased])
* Don't forget to update the [CHANGELOG.md](CHANGELOG.md#[Unreleased]).

## Guidelines for posting comments

* [Be cordial and positive](https://www.kennethreitz.org/essays/be-cordial-or-be-on-your-way)

## Guidelines for publishing

* Publishing is automated. Add a YYYYMMDD version tag and GitHub workflows will do the rest.

## Getting started

1. Clone the repository
@@ -68,9 +73,3 @@ Any contribution is appreciated! You might want to:
```sh
nox -e py36
```

4. After changing the code, run the black formatter.

```sh
black .
```
11 changes: 7 additions & 4 deletions docs/source/howto/acro_forms.rst
@@ -1,7 +1,7 @@
.. _acro_forms:

How to extract AcroForm interactive form fields from a PDF using PDFMiner
********************************
*************************************************************************

Before you start, make sure you have :ref:`installed pdfminer.six<install>`.

@@ -78,14 +78,16 @@ How it works:
doc = PDFDocument(parser)
- Get the catalog
(the catalog contains references to other objects defining the document structure, see section 7.7.2 of PDF 32000-1:2008 specs: https://www.adobe.com/devnet/pdf/pdf_reference.html)

(the catalog contains references to other objects defining the document structure, see section 7.7.2 of PDF 32000-1:2008 specs: https://www.adobe.com/devnet/pdf/pdf_reference.html)

.. code-block:: python
res = resolve1(doc.catalog)
- Check if the catalog contains the AcroForm key and raise ValueError if not
(the PDF does not contain Acroform type of interactive forms if this key is missing in the catalog, see section 12.7.2 of PDF 32000-1:2008 specs)

(the PDF does not contain Acroform type of interactive forms if this key is missing in the catalog, see section 12.7.2 of PDF 32000-1:2008 specs)

.. code-block:: python
@@ -119,7 +121,8 @@ How it works:
values = resolve1(value)
- Call the value(s) decoding method as needed
(a single field can hold multiple values, for example a combo box can hold more than one value at time)

(a single field can hold multiple values, for example a combo box can hold more than one value at time)

.. code-block:: python
2 changes: 1 addition & 1 deletion docs/source/reference/commandline.rst
@@ -11,7 +11,7 @@ pdf2txt.py

.. argparse::
:module: tools.pdf2txt
:func: maketheparser
:func: create_parser
:prog: python tools/pdf2txt.py

.. _api_dumppdf:
4 changes: 2 additions & 2 deletions docs/source/reference/highlevel.rst
@@ -21,10 +21,10 @@ extract_text_to_fp
.. autofunction:: extract_text_to_fp


.. _api_extract_pages:

extract_pages
=============

.. currentmodule:: pdfminer.high_level
.. autofunction:: extract_pages

.. _api_extract_pages:
21 changes: 19 additions & 2 deletions noxfile.py
@@ -1,20 +1,37 @@
import os

import nox


PYTHON_ALL_VERSIONS = ["3.6", "3.7", "3.8", "3.9", "3.10"]
PYTHON_MODULES = ["pdfminer", "tools", "tests", "noxfile.py", "setup.py"]


@nox.session
def format(session):
session.install("black")
# Format files locally with black, but only check in cicd
if "CI" in os.environ:
session.run("black", "--check", *PYTHON_MODULES)
else:
session.run("black", *PYTHON_MODULES)


@nox.session
def lint(session):
session.install("flake8")
session.run("flake8", "pdfminer/", "tools/", "tests/", "--count", "--statistics")
session.run("flake8", *PYTHON_MODULES, "--count", "--statistics")


@nox.session
def types(session):
session.install("mypy")
session.run(
"mypy", "--install-types", "--non-interactive", "--show-error-codes", "."
"mypy",
"--install-types",
"--non-interactive",
"--show-error-codes",
*PYTHON_MODULES,
)
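
The `format` session above switches between rewriting files and only checking them, depending on whether a `CI` environment variable is set. A minimal sketch of that decision, pulled out into a plain function so it can be exercised without nox; the helper name `black_command` is hypothetical and not part of this commit:

```python
from typing import List, Mapping

# Same module list as in noxfile.py above.
PYTHON_MODULES = ["pdfminer", "tools", "tests", "noxfile.py", "setup.py"]


def black_command(environ: Mapping[str, str]) -> List[str]:
    """Return the black invocation for the given environment (hypothetical helper)."""
    if "CI" in environ:
        # In CI, only report formatting problems; do not modify files.
        return ["black", "--check", *PYTHON_MODULES]
    # Locally, let black rewrite the files in place.
    return ["black", *PYTHON_MODULES]
```

Calling `black_command(os.environ)` would mirror what the session passes to `session.run`, assuming the CI provider exports a `CI` variable, as GitHub Actions does.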


9 changes: 9 additions & 0 deletions pdfminer/cmapdb.py
@@ -481,6 +481,15 @@ def _warn_once(self, msg: str) -> None:


def main(argv: List[str]) -> None:
from warnings import warn

warn(
"The function main() from cmapdb.py will be removed in 2023. It was probably "
"introduced for testing purposes a long time ago, and no longer relevant. "
"Feel free to create a GitHub issue if you disagree.",
DeprecationWarning,
)

args = argv[1:]
for fname in args:
fp = open(fname, "rb")
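
The deprecation pattern used in `main()` can be checked in isolation. The sketch below is illustrative only: `legacy_main` and `collect_warnings` are hypothetical names, and the `stacklevel=2` argument is an addition that the commit itself does not use. Python hides `DeprecationWarning` by default outside `__main__`, so the sketch enables all warnings explicitly while recording them.

```python
import warnings
from typing import List


def legacy_main() -> None:
    """Stand-in for a deprecated entry point (hypothetical)."""
    warnings.warn(
        "legacy_main() will be removed in a future release.",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller; not in the commit
    )


def collect_warnings() -> List[warnings.WarningMessage]:
    """Call legacy_main() and return every warning it raised."""
    with warnings.catch_warnings(record=True) as caught:
        # Make sure DeprecationWarning is not filtered out.
        warnings.simplefilter("always")
        legacy_main()
    return list(caught)
```

A test suite could assert on the recorded category to confirm that the warning is still emitted until the function is removed.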
11 changes: 10 additions & 1 deletion pdfminer/converter.py
@@ -109,7 +109,16 @@ def paint_path(
"""Paint paths described in section 4.4 of the PDF reference manual"""
shape = "".join(x[0] for x in path)

if shape.count("m") > 1:
if shape[:1] != "m":
# Per PDF Reference Section 4.4.1, "path construction operators may
# be invoked in any sequence, but the first one invoked must be m
# or re to begin a new subpath." Since pdfminer.six already
# converts all `re` (rectangle) operators to their equivalent
# `mlllh` representation, paths ingested by `.paint_path(...)` that
# do not begin with the `m` operator are invalid.
pass

elif shape.count("m") > 1:
# recurse if there are multiple m's in this shape
for m in re.finditer(r"m[^m]+", shape):
subpath = path[m.start(0): m.end(0)]
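
The branch structure above can be illustrated with a simplified stand-alone function. This is a sketch under assumptions, not the `paint_path` implementation: `split_subpaths` is a hypothetical helper, and path segments are assumed to be tuples whose first element is a one-character operator such as `m`, `l`, or `h`.

```python
import re
from typing import List, Sequence, Tuple

PathSegment = Tuple  # e.g. ("m", x, y) or ("h",); assumed layout


def split_subpaths(path: Sequence[PathSegment]) -> List[List[PathSegment]]:
    """Split a path into subpaths that each start with `m` (hypothetical helper)."""
    # One character per segment, e.g. "mlmlh".
    shape = "".join(segment[0] for segment in path)

    if shape[:1] != "m":
        # Paths that do not begin with `m` are treated as invalid and ignored.
        return []

    # Each match covers one `m` followed by at least one non-`m` operator.
    return [
        list(path[match.start(0): match.end(0)])
        for match in re.finditer(r"m[^m]+", shape)
    ]
```

Because every segment contributes exactly one character to `shape`, the match offsets can be used directly as indices into `path`.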