Minor updates.
gdiaz384 committed Mar 7, 2024
1 parent 7a402ad commit f913db9
Showing 3 changed files with 33 additions and 73 deletions.
10 changes: 6 additions & 4 deletions README.md
@@ -314,16 +314,17 @@ Variable name | Description | Examples
- DeepL is picky about the target English dialect based upon the source language.
- However, language dictionaries can be used with any dialect of that language (TODO: double-check this).
- DeepL's [API Free](//support.deepl.com/hc/en-us/articles/360021200939-DeepL-API-Free) vs Pro plans.
- The formal vs informal feature is only available for Pro users, so not available for the deepl-api-free or deepl-web translation engines. [About-the-formal-informal-feature](https://support.deepl.com/hc/en-us/articles/4406432463762-About-the-formal-informal-feature).
- The formal vs informal feature is only available for Pro users, so not available for the deepl-api-free or deepl-web translation engines. [About-the-formal-informal-feature](//support.deepl.com/hc/en-us/articles/4406432463762-About-the-formal-informal-feature).
- If translating to Japanese, not from, then read DeepL's [plain vs polite feature](//support.deepl.com/hc/en-us/articles/6306700061852-About-the-plain-polite-feature-in-Japanese).

### Regarding XLSX

- XLSX (XML... TODO: This part.) is the native format used in py3TranslateLLM to store data internally during processing and should be the most convenient way to edit translated entries directly without any.
- Here are some free and open source software ([FOSS](//en.wikipedia.org/wiki/Free_and_open-source_software)) office suites that can read and write the spreadsheet formats (.csv, .xlsx, .xls, .ods):
- [Office Open XML](//en.wikipedia.org/wiki/Office_Open_XML), .xlsx, is the native format used in py3TranslateLLM to store data internally during processing and should be the most convenient way to edit translated entries and the cache directly without any unnecessary conversions that could introduce formatting bugs.
- Here are some free and open source software ([FOSS](//en.wikipedia.org/wiki/Free_and_open-source_software)) office suites that can read and write Office Open XML and the other spreadsheet formats (.csv, .xlsx, .xls, .ods):
- Apache [OpenOffice](//www.openoffice.org). [License](//www.openoffice.org/license.html) and [source](//openoffice.apache.org/downloads.html). Note: Can read but not write to .xlsx.
- [LibreOffice](//www.libreoffice.org). [License](//www.libreoffice.org/about-us/licenses) and [source](//www.libreoffice.org/download/download-libreoffice/).
- [OnlyOffice](//www.onlyoffice.com/download-desktop.aspx) is [AGPL v3](//github.com/ONLYOFFICE/DesktopEditors/blob/master/LICENSE). [Source](//github.com/ONLYOFFICE/DesktopEditors).
- [OpenPyXL](//openpyxl.readthedocs.io), the library used in the core data structure, follows the OOXML standard closely, and [will not load](//openpyxl.readthedocs.io/en/stable/tutorial.html#errors-loading-workbooks) documents that do not follow the standard closely. In other words, Microsoft Office will probably not work.

### Text Encoding and py3TranslateLLM:

@@ -388,7 +389,7 @@ Variable name | Description | Examples

- Recommended: If you do not want to deal with this, then use a binary file in the [releases](//github.com/gdiaz384/py3TranslateLLM/releases) page instead.
- py3TranslateLLM was developed on Python 3.7.6.
- deepl-python is going to start requiring Python 3.8+ because ???.
- deepl-python is going to start requiring Python 3.8+ in 2024 because ???.
- It is not necessarily clear what versions work with what other versions, in part due to the shenanigans of some developers creating deliberate incompatibilities, so just install whatever and hope it works.

Library name | Required, Recommended, or Optional | Description | Install command | Version used to develop py3TranslateLLM
@@ -408,6 +409,7 @@ Libraries can also require other libraries.

- deepl-python requires: `requests`, `charset-normalizer`, `idna`, `urllib3`, `certifi`.
- odfpy requires: `defusedxml`.
- openpyxl has `defusedxml` as an optional library.
- py3TranslateLLM and the libraries above also use libraries from the Python standard library. For an enumeration of those, check the source code.
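
The optional libraries listed above could be probed at runtime before they are needed. The following is a minimal sketch using only the standard library; the helper name `library_is_available` is hypothetical and is not part of py3TranslateLLM, and the module names in the usage lines are placeholders chosen for illustration.

```python
import importlib.util


def library_is_available(module_name):
    """Return True if a top-level module named module_name can be found.

    Hypothetical helper for illustration only. It checks top-level import
    names (not pip package names) without actually importing the module.
    """
    return importlib.util.find_spec(module_name) is not None


# 'csv' ships with Python, so it should always be found.
csv_found = library_is_available('csv')

# A made-up name that should not exist in any environment.
missing_found = library_is_available('surely_not_installed_module_xyz')

print('csv available:', csv_found)
print('made-up module available:', missing_found)
```

Note that the name passed in is the import name, which can differ from the name used with `pip install`.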

### Guide: Installing and managing Python library versions with `pip`:
77 changes: 16 additions & 61 deletions py3TranslateLLM.py
@@ -110,8 +110,8 @@
commandLineParser.add_argument('-fe', '--fileToTranslateEncoding', help='The encoding of the input file. Default='+str(defaultTextEncoding), default=None, type=str)
commandLineParser.add_argument('-o', '--outputFile', help='The file to insert translations into, including path. Default is same as input file.', default=None, type=str)
commandLineParser.add_argument('-ofe', '--outputFileEncoding', help='The encoding of the output file. Default is same as input file.', default=None, type=str)
commandLineParser.add_argument('-pfile', '--parsingSettingsFile', help='This file defines how to parse raw text and .ks files. It is required for text and .ks files. If not specified, a template will be created.', default=None, type=str)
commandLineParser.add_argument('-pfe', '--parsingSettingsFileEncoding', help='Specify encoding for parsing definitions file, default='+str(defaultTextEncoding), default=None, type=str)
#commandLineParser.add_argument('-pfile', '--parsingSettingsFile', help='This file defines how to parse raw text and .ks files. It is required for text and .ks files. If not specified, a template will be created.', default=None, type=str)
#commandLineParser.add_argument('-pfe', '--parsingSettingsFileEncoding', help='Specify encoding for parsing definitions file, default='+str(defaultTextEncoding), default=None, type=str)
commandLineParser.add_argument('-p', '--promptFile', help='This file has the prompt for the LLM.', default=None, type=str)
commandLineParser.add_argument('-pe', '--promptFileEncoding', help='Specify encoding for prompt file, default='+str(defaultTextEncoding), default=None, type=str)

@@ -137,7 +137,7 @@
commandLineParser.add_argument('-rc', '--readOnlyCache', help='Opens the cache file in read-only mode and disables updates to it. This dramatically decreases the memory used by the cache file. Default=Read and write to the cache file.', action='store_true')

commandLineParser.add_argument('-hl', '--contextHistoryLength', help='The number of previous translations that should be sent to the translation engine to provide context for the current translation. Sane values are 2-10. Set to 0 to disable. Not all translation engines support context. Default='+str(defaultContextHistoryLength), default=None, type=int)
commandLineParser.add_argument('-lbl', '--lineByLineMode', help='Store and translate lines one at a time. Disables grouping lines by delimiter and paragraph style translations.', action='store_true')
#commandLineParser.add_argument('-lbl', '--lineByLineMode', help='Store and translate lines one at a time. Disables grouping lines by delimiter and paragraph style translations.', action='store_true')
commandLineParser.add_argument('-r', '--resume', help='Attempt to resume previously interrupted operation. No guarantees.', action='store_true')

commandLineParser.add_argument('-a', '--address', help='Specify the protocol and IP for NMT/LLM server, Example: http://192.168.0.100', default=None,type=str)
@@ -159,7 +159,7 @@

translationEngine=commandLineArguments.translationEngine
fileToTranslate=commandLineArguments.fileToTranslate
parsingSettingsFile=commandLineArguments.parsingSettingsFile
#parsingSettingsFile=commandLineArguments.parsingSettingsFile
outputFile=commandLineArguments.outputFile
promptFile=commandLineArguments.promptFile

@@ -180,7 +180,7 @@
readOnlyCache=commandLineArguments.readOnlyCache

contextHistoryLength=commandLineArguments.contextHistoryLength
lineByLineMode=commandLineArguments.lineByLineMode
#lineByLineMode=commandLineArguments.lineByLineMode
resume=commandLineArguments.resume

address=commandLineArguments.address #Must be reachable. How to test for that?
@@ -329,7 +329,7 @@

#The variable names needed to be exact for the locals()[x] dictionary shenanigans to work, but now that they have been updated, rename them to be more descriptive. The variable names for encoding options will be fixed later.
fileToTranslateFileName=fileToTranslate
parseSettingsFileName=parsingSettingsFile
#parseSettingsFileName=parsingSettingsFile
outputFileName=outputFile
promptFileName=promptFile

@@ -433,6 +433,8 @@
# Certain files must always exist, like fileToTranslateFileName, and usually languageCodesFileName.
# parseSettingsFileName is only needed if reading or writing to text files. Reading from text files is easy to check.
# But how to check if writing to them? If output is .txt, .ks, .ts, then writing to text file. Output can also be based upon input. For output, parseOnly must not be specified, so this output text file check should not be checked with the parseOnly block. Alternatively: The only time parseSettingsFileName is not needed is when writing to output files.
# Updated py3TranslateLLM to only accept spreadsheet inputs in order to divide the logic between parsing files and translating spreadsheets.
# Use py3AnyText2Spreadsheet to create spreadsheets from raw text files from now on.
#Errors out if myFile does not exist.
"""
#Syntax:
@@ -448,7 +450,6 @@ def checkIfThisFolderExists(myFolder):
"""



if languageCodesFileName == None:
languageCodesFileName = currentScriptPathOnly + '/' + defaultLanguageCodesFile

@@ -481,59 +482,20 @@ def checkIfThisFolderExists(myFolder):
fileToTranslateIsASpreadsheet=True
else:
fileToTranslateIsASpreadsheet=False
#fileToTranslateFileName does not need to have an extension. If it has no extension, then it is assumed to be a text file and will thus require a parsefile.
print( ('Error: Unrecognized extension for a spreadsheet: ' + str(fileToTranslateFileExtensionOnly)).encode(consoleEncoding) )
sys.exit(1)


# Either a raw.unparsed.txt must be specified or a raw.untranslated.csv if selecting one of the other engines.
# if using parseOnly, a valid file (raw.unparsed.txt and parseDefinitionsFile.txt) must exist.
if mode == 'parseOnly':
# Check if valid parsing definition file exists. Example: parseKirikiri.py
py3TranslateLLMfunctions.verifyThisFileExists(parseSettingsFileName,'parseSettingsFileName')
pass

if fileToTranslateIsASpreadsheet == True:
sys.exit( ('parseOnly is only valid with text files. It is not valid with spreadsheets: ' + str(fileToTranslateFileName)).encode(consoleEncoding) )
# Check if valid parsing definition file exists. Example: parseKirikiri.py
#py3TranslateLLMfunctions.verifyThisFileExists(parseSettingsFileName,'parseSettingsFileName')

# Edit: parseOnly mode has been updated to act purely as a proxy to py3Any2Spreadsheet.py, which returns a chocolate.Strawberry(), and the settings file that must be specified is currently the python script that is passed to py3Any2Spreadsheet.py. All notion of character encodings is handled within that file.
# In other words, py3Any2Spreadsheet must import successfully as a library in order to be used in this program.
try:
# This works if py3Any2Spreadsheet\py3Any2Spreadsheet.py exists.
import py3Any2Spreadsheet.py3Any2Spreadsheet
#print('pie0')
except ImportError:
try:
# This works if ..\py3Any2Spreadsheet\py3Any2Spreadsheet.py exists.
# https://peps.python.org/pep-0328/#guido-s-decision
#from ...py3Any2Spreadsheet.py3Any2Spreadsheet import py3Any2Spreadsheet #Does not work.
tempPath=str( pathlib.Path(currentScriptPathOnly).absolute() ) + '/../py3Any2Spreadsheet'
if debug == True:
print( 'absolute=' + tempPath )
if py3TranslateLLMfunctions.checkIfThisFolderExists(tempPath) == True:
#print('pie1')
sys.path.append(tempPath)
import py3Any2Spreadsheet
else:
raise ImportError
#print('pie2')
except ImportError:
try:
# This works if resources\py3Any2Spreadsheet\py3Any2Spreadsheet.py exists.
import resources.py3Any2Spreadsheet.py3Any2Spreadsheet as py3Any2Spreadsheet
#print('pie3')
except ImportError:
try:
# This is nearly last because it will import any folder named resources\py3Any2Spreadsheet as opposed to only resources\py3Any2Spreadsheet.py .
# this should also work to import resources\py3Any2Spreadsheet.py
import resources.py3Any2Spreadsheet as py3Any2Spreadsheet
#print('pie4')
except ImportError:
try:
# This is last because it will import any folder named py3Any2Spreadsheet as opposed to only py3Any2Spreadsheet.py .
import py3Any2Spreadsheet
#print('pie5')
except ImportError:
sys.exit( 'Error: parseOnly specified but parser library could not be imported.')

sys.exit(1)
# if fileToTranslateIsASpreadsheet == True:
# sys.exit( ('parseOnly is only valid with text files. It is not valid with spreadsheets: ' + str(fileToTranslateFileName)).encode(consoleEncoding) )

# if parseSettingsFileName != None:
# if os.path.isfile(parseSettingsFileName) != True:
@@ -614,11 +576,6 @@ def checkIfThisFolderExists(myFolder):
# Syntax: os.environ['CT2_VERBOSE'] = '1'


# if the input file is not a spreadsheet, then a parse file is required. If it is a spreadsheet, then it will not have a parse file.
if fileToTranslateIsASpreadsheet == False:
py3TranslateLLMfunctions.verifyThisFileExists(parseSettingsFileName,'parseSettingsFileName')


if outputFileName == None:
# If no outputFileName was specified, then set it the same as the input file. This will have the date and appropriate extension appended to it later.
outputFileName=fileToTranslateFileName
@@ -696,7 +653,7 @@ def checkIfThisFolderExists(myFolder):
#set rest of encodings using dealWithEncoding.ofThisFile(myFileName, rawCommandLineOption, fallbackEncoding):

#parsingSettingsFileEncoding=commandLineArguments.parsingSettingsFileEncoding
parseSettingsFileEncoding = dealWithEncoding.ofThisFile(parseSettingsFileName, parsingSettingsFileEncoding, defaultTextEncoding)
#parseSettingsFileEncoding = dealWithEncoding.ofThisFile(parseSettingsFileName, parsingSettingsFileEncoding, defaultTextEncoding)

promptFileEncoding = dealWithEncoding.ofThisFile(promptFileName, promptFileEncoding, defaultTextEncoding)

@@ -826,8 +783,6 @@ def checkIfThisFolderExists(myFolder):
print( ('postWritingToFileDictionary='+str(postWritingToFileDictionary)).encode(consoleEncoding) )




# Next turn the main inputFile into a data structure.
#then create data structure separately from reading the file
# This returns a very special dictionary where the value in key=value is a special list and then add data row by row using the dictionary values #Edit, moved to chocolate.py so as to not have to do that. All spreadsheets that require a parseFile will therefore always be Strawberries from the chocolate library.
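
The change in this file makes py3TranslateLLM reject any input that is not a spreadsheet. That check can be sketched in isolation as follows. This is a minimal illustration, not the project's actual code: the function name `is_spreadsheet`, the constant `SPREADSHEET_EXTENSIONS`, and the case-insensitive comparison are all assumptions, and the extension list is taken from the formats named in the README.

```python
import os.path

# Assumption: these extensions mirror the spreadsheet formats named in the README.
SPREADSHEET_EXTENSIONS = frozenset({'.csv', '.xlsx', '.xls', '.ods'})


def is_spreadsheet(file_name):
    """Return True if file_name ends in a recognized spreadsheet extension.

    Hypothetical helper. The comparison is case-insensitive, which is an
    assumption and may not match how py3TranslateLLM compares extensions.
    """
    extension = os.path.splitext(file_name)[1].lower()
    return extension in SPREADSHEET_EXTENSIONS


# Example file names, echoing the ones mentioned in the source comments.
for candidate in ('raw.untranslated.csv', 'raw.unparsed.txt', 'noExtension'):
    if is_spreadsheet(candidate):
        print(candidate, 'would be accepted as a spreadsheet.')
    else:
        print(candidate, 'would be rejected; convert it to a spreadsheet first.')
```

A caller that wants the behavior shown in the diff would print an error and call `sys.exit(1)` when the function returns `False`.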
19 changes: 11 additions & 8 deletions resources/py3TranslateLLMfunctions.py
@@ -25,14 +25,14 @@


#These must be here or the library will crash even if these modules have already been imported by main program.
import os, os.path #Extract extension from filename, and test if file exists.
from pathlib import Path #Override file in file system with another and create subfolders.
import requests
import sys #End program on fail condition.
import io #Manipulate files (open/read/write/close).
import datetime #Used to get current date and time.
import openpyxl #Used as the core internal data structure and to read/write xlsx files. Must be installed using pip.
import csv #Read and write to csv files. Example: Read in 'resources/languageCodes.csv'
import os, os.path # Extract extension from filename, and test if file exists.
#import pathlib # For pathlib.Path Override file in file system with another and create subfolders. Sane path handling.
import requests # Check if internet exists.
import sys # End program on fail condition.
import io # Manipulate files (open/read/write/close).
import datetime # Used to get current date and time.
import csv # Read and write to csv files. Example: Read in 'resources/languageCodes.csv'
import openpyxl # Used as the core internal data structure and to read/write xlsx files. Must be installed using pip.
try:
import odfpy #Provides interoperability for Open Document Spreadsheet (.ods).
odfpyLibraryIsAvailable=True
@@ -343,6 +343,9 @@ def importDictionaryFromCSV(myFile, myFileEncoding,ignoreWhitespace=False):

def importDictionaryFromXLSX(myFile, myFileEncoding):
print('Hello World'.encode(consoleEncoding))
workbook = openpyxl.load_workbook(filename=myFile) #, data_only=)
spreadsheet=workbook.active


def importDictionaryFromXLS(myFile, myFileEncoding):
print('Hello World'.encode(consoleEncoding))
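
The `importDictionaryFromXLSX` stub above only loads the workbook and selects the active sheet. One way the remaining step could look is sketched below. This is an assumption about intent, not the author's implementation: the function name `import_dictionary_from_xlsx`, the two-column key/value layout, the header row, and the sample data are all hypothetical. It requires the third-party `openpyxl` library.

```python
import os
import tempfile

import openpyxl  # Third-party library; install with: pip install openpyxl


def import_dictionary_from_xlsx(file_name, skip_header=True):
    """Read columns A and B of the active sheet into a dict.

    Hypothetical helper: assumes column A holds keys and column B holds
    values, with an optional header in the first row.
    """
    workbook = openpyxl.load_workbook(filename=file_name)
    worksheet = workbook.active
    first_row = 2 if skip_header else 1
    dictionary = {}
    # values_only=True yields plain cell values instead of Cell objects.
    for key, value in worksheet.iter_rows(min_row=first_row, max_col=2, values_only=True):
        if key is None:
            continue  # Skip rows with an empty key cell.
        dictionary[str(key)] = value
    return dictionary


# Build a small sample workbook in a temporary folder, then read it back.
with tempfile.TemporaryDirectory() as folder:
    sample_path = os.path.join(folder, 'dictionary.xlsx')
    sample_workbook = openpyxl.Workbook()
    sample_sheet = sample_workbook.active
    sample_sheet.append(['rawText', 'translation'])  # Hypothetical header row.
    sample_sheet.append(['こんにちは', 'Hello'])
    sample_sheet.append(['ありがとう', 'Thank you'])
    sample_workbook.save(sample_path)
    result = import_dictionary_from_xlsx(sample_path)

print(result)
```

The sketch drops the `myFileEncoding` parameter because .xlsx files store their own encoding; whether the real function needs it is not clear from the diff.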
