Fixed Encoding Problems with harmonized-company-names.json
Improved README.MD
ptmrio committed Apr 18, 2023
1 parent 7767f89 commit ebeb244
Showing 2 changed files with 23 additions and 19 deletions.
37 changes: 21 additions & 16 deletions README.md
@@ -9,17 +9,22 @@ Installation
To use AIAutoRename, you'll need Python 3.6 or later. You can download it from the [official Python website](https://www.python.org/downloads/) or the Microsoft Store.

1. Clone or download this repository and navigate to the root directory of the project in your terminal.

2. Install the required packages using the `requirements.txt` file:


```
pip install -r requirements.txt
```
```
git clone https://github.com/ptmrio/AIAutoRename.git
cd AIAutoRename
```
2. Install the required Python packages using the `requirements.txt` file:
```
pip install -r requirements.txt
```
3. Install [Tesseract OCR](https://github.com/UB-Mannheim/tesseract/) for Windows by following the installation instructions on their GitHub page. During the installation process, ensure that the "Add tesseract to PATH" option is checked. This will automatically add Tesseract to your PATH environment variable.
3. Install [Tesseract OCR](https://github.com/UB-Mannheim/tesseract/) for Windows by following the installation instructions on their GitHub page. After installation, add the Tesseract installation folder (typically `C:\Program Files\Tesseract-OCR`) to your PATH environment variable.
4. Download and install [poppler for Windows](https://github.com/oschwartz10612/poppler-windows). After installation, add the `bin` folder of the installed poppler directory to your PATH environment variable. Here's a [guide](https://www.architectryan.com/2018/03/17/add-to-the-path-on-windows-10/) on how to add directories to the PATH variable on Windows 10.
4. Download and extract [poppler for Windows](https://github.com/oschwartz10612/poppler-windows). After extracting it, add the `bin` folder of the extracted poppler directory (e.g. `C:\poppler\Library\bin`) to your PATH environment variable.
Here's a [guide](https://www.architectryan.com/2018/03/17/add-to-the-path-on-windows-10/) on how to add directories to the PATH variable on Windows 10.
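To confirm that the PATH changes from steps 3 and 4 took effect, a small check like the following can help. This is a minimal sketch and not part of AIAutoRename: the helper name `find_missing_tools` is hypothetical, and it assumes the executables to look for are `tesseract` (Tesseract OCR) and `pdftoppm` (one of the command-line tools shipped with poppler).

```python
import shutil


def find_missing_tools(tools=("tesseract", "pdftoppm")):
    """Return the names of the given executables that are not found on PATH.

    Hypothetical helper for illustration; the default tool names are an
    assumption about which binaries the OCR and PDF steps rely on.
    """
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("Tesseract and poppler were both found on PATH.")
```

Open a new terminal before running the check, since a terminal that was already open will not pick up the changed PATH.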
Configuration
@@ -33,7 +38,7 @@ OPENAI_MODEL=gpt-3.5-turbo
MY_COMPANY_NAME=<your-company-name>
```
Replace `<your-api-key>` with your OpenAI API key, which can be obtained from the [OpenAI website](https://platform.openai.com/docs/developer-quickstart/your-api-keys). Set `<your-company-name>` to your company's name. This information will help the OpenAI API to better understand the context and decide whether to use the sender or recipient of the PDF document.
Replace `<your-api-key>` with your OpenAI API key, which can be obtained from the [OpenAI website](https://platform.openai.com/account/api-keys). Set `<your-company-name>` to your company's name. This information will help the OpenAI API to better understand the context and decide whether to use the sender or recipient of the PDF document.
Usage
-----
@@ -43,10 +48,10 @@ Usage
To rename a single PDF file, run the following command in your terminal (cmd on Windows, terminal on Mac):
```
python autorename.py path/to/invoice.pdf
python autorename.py "C:\Users\username\Downloads\invoice123.pdf"
```
Replace `path/to/invoice.pdf` with the path to your PDF file.
Replace `C:\Users\username\Downloads\invoice123.pdf` with the path to your PDF file.
**Example:**
@@ -57,21 +62,21 @@ Suppose your PDF file is named `invoice123.pdf` and is located in the `invoices`
To rename all PDF files in a folder and its subfolders, run the following command in your terminal:
```
python autorename.py path/to/folder
python autorename.py "C:\Users\username\Downloads"
```
Replace `path/to/folder` with the path to your folder (no trailing slash).
Replace `C:\Users\username\Downloads` with the path to your folder (no trailing slash).
**Example:**
Suppose you have a folder named `invoices` on your desktop containing multiple PDF files. After running AIAutoRename on the folder, all PDF files within the folder and its subfolders will be renamed according to their content, such as document date, company name, and document type. For example, a file originally named `invoice123.pdf` might be renamed to `20220215 MegaCorp PO.pdf`, where `20220215` is the document date, `MegaCorp` is the company name, and `PO` is the document type (purchase order).
Suppose you downloaded a batch of documents into your `Downloads` folder. After running AIAutoRename on the folder, all PDF files within the folder will be renamed according to their content, such as document date, company name, and document type. For example, a file originally named `invoice123.pdf` might be renamed to `20220215 MegaCorp PO.pdf`, where `20220215` is the document date, `MegaCorp` is the company name, and `PO` is the document type (purchase order).
Contributing
------------
We welcome contributions from everyone! If you find a bug or have a feature request, please open an issue on our [GitHub repository](https://github.com/example/AIAutoRename). If you'd like to contribute code, please open a pull request with your changes. We appreciate your support in making AIAutoRename even better!
We welcome contributions from everyone! If you find a bug or have a feature request, please open an issue on our [GitHub repository](https://github.com/ptmrio/AIAutoRename). If you'd like to contribute code, please open a pull request with your changes. We appreciate your support in making AIAutoRename even better!
Support
-------
If you encounter any issues or need assistance using AIAutoRename, please don't hesitate to reach out by opening an issue on our [GitHub repository](https://github.com/example/AIAutoRename). We'll do our best to help you as soon as possible.
If you encounter any issues or need assistance using AIAutoRename, please don't hesitate to reach out by opening an issue on our [GitHub repository](https://github.com/ptmrio/AIAutoRename). We'll do our best to help you as soon as possible.
5 changes: 2 additions & 3 deletions autorename.py
@@ -64,7 +64,7 @@ def get_openai_response(text):
"Example incoming invoice: {\"company_name\": \"ACME\", \"document_date\": \"01.01.2021\", \"document_type\": \"ER\"} " +
"Example outgoing invoice: {\"company_name\": \"ACME\", \"document_date\": \"01.01.2021\", \"document_type\": \"AR\"} " +
"Example document: {\"company_name\": \"ACME\", \"document_date\": \"01.01.2021\", \"document_type\": \"Angebot\"}"
"If date is unavailable: {\"company_name\": \"ACME\", \"document_date\": \"00.00.0000\", \"document_type\": \"Angebot\"}"
"Example if date is unavailable: {\"company_name\": \"ACME\", \"document_date\": \"00.00.0000\", \"document_type\": \"Angebot\"}"
},
{"role": "user", "content": f"Extract the \"company_name\", \"document_date\", \"document_type\" from this PDF document and return a JSON object:\n\n{text}"},
]
@@ -105,7 +105,7 @@ def harmonize_company_name(company_name):
f'harmonized-company-names.json not found, using original name: {company_name}')
return company_name

with open("harmonized-company-names.json", "r") as file:
with open("harmonized-company-names.json", "r", encoding='utf-8') as file:
harmonized_names = json.load(file)

best_match = company_name
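The `encoding='utf-8'` change in the hunk above matters because `open()` without an explicit encoding uses the locale's preferred encoding, which on many Windows systems is cp1252 rather than UTF-8. The sketch below illustrates the effect with a temporary file. The JSON layout and the company names are made up for the demonstration and are not taken from the real `harmonized-company-names.json`.

```python
import json
import os
import tempfile

# Hypothetical content; the real file's schema is not shown in this commit.
names = {"Müller GmbH": ["Mueller GmbH", "Müller"]}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "names.json")

    # Write the file as UTF-8, keeping the umlaut as a literal character.
    with open(path, "w", encoding="utf-8") as file:
        json.dump(names, file, ensure_ascii=False)

    # Reading with the matching encoding round-trips the data unchanged.
    with open(path, "r", encoding="utf-8") as file:
        loaded = json.load(file)

    # Reading the raw bytes lets us simulate a wrong default encoding.
    with open(path, "rb") as file:
        raw = file.read()

# Decoding UTF-8 bytes as cp1252 turns "ü" into two unrelated characters.
garbled = raw.decode("cp1252")

print(loaded == names)
print(garbled)
```

With the wrong encoding, a company name containing non-ASCII characters no longer matches its counterpart extracted from the PDF, so passing the encoding explicitly keeps the behavior the same across platforms.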
@@ -152,7 +152,6 @@ def parse_openai_response(response):
return company_name, document_date, document_type



def rename_invoice(pdf_path, company_name, document_date, document_type):
if document_date is not None:
base_name = f'{document_date.strftime("%Y%m%d")} {company_name} {document_type}'
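The prompt examples and `rename_invoice` above imply a flow in which a `DD.MM.YYYY` date string is parsed, `00.00.0000` signals a missing date, and the parsed date is formatted as `YYYYMMDD` in the new file name. A minimal sketch of that flow follows, under stated assumptions: `parse_document_date` and `build_base_name` are hypothetical helpers, and the fallback name used when no date is available is a guess, since that branch is not visible in this diff.

```python
from datetime import datetime


def parse_document_date(raw):
    """Parse a DD.MM.YYYY string; return None for the 00.00.0000 sentinel
    or any other string that is not a valid date (hypothetical helper)."""
    if raw == "00.00.0000":
        return None
    try:
        return datetime.strptime(raw, "%d.%m.%Y")
    except ValueError:
        return None


def build_base_name(company_name, document_date, document_type):
    """Build the file name without extension (hypothetical helper)."""
    if document_date is not None:
        # Same format string as the rename_invoice line in the hunk above.
        return f'{document_date.strftime("%Y%m%d")} {company_name} {document_type}'
    # Assumption: without a date, omit the date prefix entirely.
    return f"{company_name} {document_type}"


print(build_base_name("MegaCorp", parse_document_date("15.02.2022"), "PO"))
print(build_base_name("ACME", parse_document_date("00.00.0000"), "ER"))
```

Returning `None` for an unparseable date keeps the `document_date is not None` check in `rename_invoice` meaningful instead of raising inside the rename step.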