PI invoices are now exported as PDFs #129

QuanMPhm · 2024-12-31T04:51:05Z

Closes #84. This PR consists of the last commit.
This PR mostly changes the PI-specific invoice class; more details are in the commit message. Two Python packages were crucial for this task: Jinja and Selenium.

@knikolla @larsks I am aware that running the invoice script with this change will make pandas print a bunch of FutureWarnings for each PDF printed. Should I address them after the current state of the PR is approved?

larsks

I think using selenium is overkill here. You can drive chrome's print-to-pdf functionality directly from the command line:

google-chrome --headless --print-to-pdf=output.pdf --no-pdf-header-footer document.html

This takes a single subprocess.run() call. You can control the page size via CSS in your source document. Note that in the US, "Letter" size (8.5x11 inches) is much more common than A4.
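A minimal sketch of that single call, for illustration only. The helper name `html_to_pdf` and its injectable `run` parameter are assumptions (the latter is there only so the command can be inspected without Chrome installed); the flags are the ones from the command line above.

```python
import subprocess


def html_to_pdf(html_path, pdf_path, chrome_bin="google-chrome", run=subprocess.run):
    """Render html_path to pdf_path with headless Chrome.

    html_to_pdf and its parameters are hypothetical names for illustration.
    """
    command = [
        chrome_bin,
        "--headless",
        f"--print-to-pdf={pdf_path}",
        "--no-pdf-header-footer",
        html_path,
    ]
    # check=True raises CalledProcessError if Chrome exits with a non-zero status.
    return run(command, check=True)
```

The page size stays in the source document, for example with a CSS rule such as `@page { size: letter; }`.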

process_report/invoices/pi_specific_invoice.py

QuanMPhm · 2025-01-09T22:06:23Z

@larsks I have removed the selenium dependency and followed your approach. However, now I have a question regarding the unit test...

@larsks @knikolla The unit test failed because it did not find the chromium binary. Should I change the github action file to install Chrome, or should I just mock the subprocess.run() call to just write a pdf file?

knikolla · 2025-01-17T15:07:13Z

> @larsks @knikolla The unit test failed because it did not find the chromium binary. Should I change the github action file to install Chrome, or should I just mock the subprocess.run() call to just write a pdf file?

Mock os.path.exists and subprocess.run and verify that it was called with the parameters you expected it to be called with.
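A rough sketch of what such a test could look like. `export_pdf` is a hypothetical stand-in for the PR's export step, not the actual code under review; only the mocking pattern is the point.

```python
import os
import subprocess
import sys
import unittest
from unittest import mock


def export_pdf(html_path, pdf_path, chrome_bin="/usr/bin/chromium"):
    """Hypothetical stand-in for the PR's PDF export step."""
    if not os.path.exists(chrome_bin):
        sys.exit(f"Chrome binary does not exist at {chrome_bin}.")
    subprocess.run(
        [
            chrome_bin,
            "--headless",
            f"--print-to-pdf={pdf_path}",
            "--no-pdf-header-footer",
            html_path,
        ],
        check=True,
    )


class TestExportPdf(unittest.TestCase):
    # Decorators apply bottom-up, so the os.path.exists mock is the first argument.
    @mock.patch("subprocess.run")
    @mock.patch("os.path.exists", return_value=True)
    def test_chrome_called_with_expected_args(self, mock_exists, mock_run):
        export_pdf("invoice.html", "invoice.pdf")

        # The binary check ran against the default path ...
        mock_exists.assert_called_once_with("/usr/bin/chromium")
        # ... and Chrome was invoked exactly once with exactly these arguments.
        mock_run.assert_called_once_with(
            [
                "/usr/bin/chromium",
                "--headless",
                "--print-to-pdf=invoice.pdf",
                "--no-pdf-header-footer",
                "invoice.html",
            ],
            check=True,
        )
```

Because both calls are mocked, the test needs neither a Chrome binary nor a writable output directory.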

larsks · 2025-01-17T15:14:23Z

> @larsks @knikolla The unit test failed because it did not find the chromium binary. Should I change the github action file to install Chrome, or should I just mock the subprocess.run() call to just write a pdf file?

Unit tests shouldn't have external dependencies, which means you should mock out subprocess.run.

It might be nice to have an integration test that calls out to chrome; this would ensure that our chrome command line is valid. If @knikolla agrees that's a good idea, that could be a future pull request.

knikolla · 2025-01-17T15:25:02Z

> @larsks @knikolla The unit test failed because it did not find the chromium binary. Should I change the github action file to install Chrome, or should I just mock the subprocess.run() call to just write a pdf file?

> Unit tests shouldn't have external dependencies, which means you should mock out subprocess.run.

> It might be nice to have an integration test that calls out to chrome; this would ensure that our chrome command line is valid. If @knikolla agrees that's a good idea, that could be a future pull request.

Considering the rapid pace of chrome development and inability to rely on semantic versioning as they bump the major version number too often, I think it's worth having a test in the future that verifies that the command line arguments that we're passing are still accepted by the current version of chrome.

For the purposes of this PR, a unit test is enough.

The PI-specific dataframes will first be converted to HTML tables using Jinja templates, and then converted to PDFs using Chromium. Users of the script must now provide a path to the Chromium/Chrome binary through the env var `CHROME_BIN_PATH`. An HTML template folder has been added, and the test cases for the PI-specific invoice will now check both whether the dataframe is formatted correctly and whether the PDFs are correctly generated. The Dockerfile has been updated to install Chromium.

QuanMPhm · 2025-01-18T04:32:39Z

I have created a new issue to create the integration test at #145. I have also implemented all feedback so far.

larsks · 2025-01-22T15:04:49Z

process_report/invoices/pi_specific_invoice.py

+            else:
+                return "$" + str(data)


You don't really need the else clause here, and prefer f-strings over string concatenation:

Suggested change

-            else:
-                return "$" + str(data)
+            return f"${data}"

But see my comment later on about whether or not we even need this function.
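For reference, a sketch of how the whole function might read with the early return and the f-string. To keep the example free of third-party imports, `is_missing` is a hypothetical stdlib stand-in for `pandas.isna` that only covers `None` and float NaN; the code in the PR uses `pandas.isna`.

```python
import math


def is_missing(value):
    """Stdlib stand-in for pandas.isna, covering only None and float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def add_dollar_sign(data):
    if is_missing(data):
        return data
    # No else needed: the early return above already handled the missing case.
    return f"${data}"
```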

larsks · 2025-01-22T15:08:58Z

process_report/invoices/pi_specific_invoice.py

+        column_sums = list()
+        sum_columns_list = list()


In general, prefer [] over list() to initialize list variables:

Suggested change

-        column_sums = list()
-        sum_columns_list = list()
+        column_sums = []
+        sum_columns_list = []

larsks · 2025-01-22T15:10:14Z

process_report/invoices/pi_specific_invoice.py

+
+        def _create_pdf_invoice(temp_fd_name):
+            chrome_binary_location = os.environ.get(
+                "CHROME_BIN_PATH", "usr/bin/chromium"


Should this actually be /usr/bin/chromium?

Suggested change

-                "CHROME_BIN_PATH", "usr/bin/chromium"
+                "CHROME_BIN_PATH", "/usr/bin/chromium"
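A small sketch of why the leading slash matters, assuming a POSIX system. The function name and its `environ` parameter are hypothetical, added so the lookup can be exercised without touching the real environment.

```python
import os


def chrome_binary_location(environ=os.environ):
    """Return CHROME_BIN_PATH if set, else the absolute default path."""
    return environ.get("CHROME_BIN_PATH", "/usr/bin/chromium")


# A path without the leading slash is relative, so it resolves against
# whatever the current working directory happens to be.
relative_default = "usr/bin/chromium"
resolved = os.path.abspath(relative_default)
```

With the relative default, the existence check would only succeed when the script is run from `/`.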

larsks · 2025-01-22T15:13:35Z

process_report/invoices/pi_specific_invoice.py

+                    "--no-sandbox",
+                    f"--print-to-pdf={invoice_pdf_path}",
+                    "--no-pdf-header-footer",
+                    "file://" + temp_fd_name,


Prefer an f-string (especially since you're using this syntax just a few lines earlier):

Suggested change

-                    "file://" + temp_fd_name,
+                    f"file://{temp_fd_name}",

larsks · 2025-01-22T15:17:34Z

process_report/invoices/pi_specific_invoice.py

        if not os.path.exists(
            self.name
        ):  # self.name is name of folder storing invoices
            os.mkdir(self.name)


You can simplify this using os.makedirs:

Suggested change

-        if not os.path.exists(
-            self.name
-        ):  # self.name is name of folder storing invoices
-            os.mkdir(self.name)
+        os.makedirs(self.name, exist_ok=True)
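A short sketch showing that the call is safe to repeat. The folder name `pi_invoices` is made up for the example, and a temporary directory is used so nothing is left behind.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base_dir:
    invoice_dir = os.path.join(base_dir, "pi_invoices")  # hypothetical folder name
    os.makedirs(invoice_dir, exist_ok=True)  # creates the folder
    os.makedirs(invoice_dir, exist_ok=True)  # already exists: no FileExistsError
    created = os.path.isdir(invoice_dir)
```

Unlike the exists-then-mkdir pattern, this leaves no window between the check and the creation, and it also creates any missing parent directories.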

larsks · 2025-01-22T15:27:25Z

process_report/tests/unit/invoices/test_pi_specific_invoice.py

+        def add_dollar_sign(data):
+            if pandas.isna(data):
+                return data
+            else:
+                return "$" + str(data)


(See my earlier comment about this function.)

larsks · 2025-01-22T15:33:08Z

process_report/tests/unit/invoices/test_pi_specific_invoice.py

+            for answer_arg in answer_arglist:
+                self.assertTrue(answer_arg in chrome_arglist[0])


I would make your check stricter:

Suggested change

-            for answer_arg in answer_arglist:
-                self.assertTrue(answer_arg in chrome_arglist[0])
+            self.assertTrue(answer_arglist == chrome_arglist[0][:-1])
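To illustrate the difference, here is a sketch with made-up argument lists (the values, including `--extra-flag`, are illustrative and not taken from the PR). The per-item membership check passes even when the arguments are reordered or an unexpected flag sneaks in, while the list comparison does not.

```python
expected_args = ["/usr/bin/chromium", "--headless", "--no-sandbox"]

# Same arguments, but reordered and with an unexpected extra flag.
# "--extra-flag" is a made-up value for illustration.
actual_args = ["/usr/bin/chromium", "--no-sandbox", "--extra-flag", "--headless"]

# Loose check: every expected argument appears somewhere in the call.
membership_passes = all(arg in actual_args for arg in expected_args)

# Strict check: same arguments, same order, nothing extra.
equality_passes = expected_args == actual_args
```

In a `unittest.TestCase`, `assertEqual` on two lists would give the same strictness and additionally report which elements differ on failure.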

larsks · 2025-01-22T15:41:22Z

process_report/tests/unit/invoices/test_pi_specific_invoice.py

-        self.assertIn("ProjectC", pi_df["Project - Allocation"].tolist())
+        mock_filter_cols.return_value = test_invoice
+        mock_path_exists.return_value = True
+        output_dir = tempfile.TemporaryDirectory()


We don't need to create a temporary directory -- because we're mocking out subprocess.run, we never create any output.

You can do this instead:

pi_inv = test_utils.new_pi_specific_invoice(
    "/fakedir", invoice_month, data=test_invoice
)
pi_inv.process()
pi_inv.export()
pi_pdf_1 = f"/fakedir/BU_PI1_{invoice_month}.pdf"
pi_pdf_2 = f"/fakedir/HU_PI2_{invoice_month}.pdf"

larsks · 2025-01-22T15:43:45Z

process_report/tests/unit/invoices/test_pi_specific_invoice.py

-        pi_inv = test_utils.new_pi_specific_invoice(
-            output_dir.name, invoice_month=self.invoice_month, data=self.dataframe
+        if not group_name:
+            group_name = [None for _ in range(len(pi))]


You can write instead:

Suggested change

-            group_name = [None for _ in range(len(pi))]
+            group_name = [None] * len(pi)

Compare:

>>> pi=[1,2,3,4,5]
>>> [None for _ in range(len(pi))]
[None, None, None, None, None]
>>> [None] * len(pi)
[None, None, None, None, None]

larsks · 2025-01-22T15:46:58Z

process_report/invoices/pi_specific_invoice.py

+            )
+            if not os.path.exists(chrome_binary_location):
+                sys.exit(
+                    f"Chrome binary does not exist at {chrome_binary_location}. Make sure the env var CHROME_BIN_PATH is set correctly or that Google Chrome is installed"


The message directs the user to ensure that Google Chrome is installed, but the code defaults to Chromium (which could lead to the situation in which Google Chrome is installed but the user still receives this error message). It makes more sense to replace "or" with "and":

Suggested change

-                    f"Chrome binary does not exist at {chrome_binary_location}. Make sure the env var CHROME_BIN_PATH is set correctly or that Google Chrome is installed"
+                    f"Chrome binary does not exist at {chrome_binary_location}. Make sure that Google Chrome is installed and the environment variable CHROME_BIN_PATH is set correctly."
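A sketch of the check with the reworded message, wrapped in a hypothetical helper `require_chrome` so it can be exercised on its own. Passing a string to `sys.exit` raises `SystemExit` with that string as its `code`; if uncaught, Python prints it to stderr and exits with status 1.

```python
import os
import sys


def require_chrome(chrome_binary_location):
    """Exit with a readable message if the binary is missing (hypothetical helper)."""
    if not os.path.exists(chrome_binary_location):
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit(
            f"Chrome binary does not exist at {chrome_binary_location}. "
            "Make sure that Google Chrome is installed and the environment "
            "variable CHROME_BIN_PATH is set correctly."
        )
    return chrome_binary_location
```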

QuanMPhm requested review from larsks, knikolla and hakasapl December 31, 2024 04:51

QuanMPhm force-pushed the 84/pi_pdf branch from ff8f0a7 to 172d6da Compare January 2, 2025 20:07

QuanMPhm marked this pull request as ready for review January 2, 2025 20:08

larsks requested changes Jan 2, 2025

larsks reviewed Jan 2, 2025

QuanMPhm force-pushed the 84/pi_pdf branch from 172d6da to 6df2cee Compare January 9, 2025 21:58

QuanMPhm force-pushed the 84/pi_pdf branch from 6df2cee to 3e86dd8 Compare January 10, 2025 14:37

QuanMPhm mentioned this pull request Jan 17, 2025

Add an integration test for chromium when generating PDF invoices #145

Open

QuanMPhm force-pushed the 84/pi_pdf branch from 3e86dd8 to 8e9798c Compare January 18, 2025 04:31

QuanMPhm requested a review from larsks January 22, 2025 15:18

larsks reviewed Jan 22, 2025
