Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not redirect stderr to stdout when executing commands #73

Merged

Conversation

mcdonnnj
Copy link
Contributor

Update the execute() helper function to no longer pipe stderr to stdout when executing commands.

If cloc has an issue processing a file it will output an error message to stderr, and with stderr and stdout combined this breaks the ability to read the JSON output on stdout. Since cloc is designed to process text files formatted for human consumption it will often error when processing minified JavaScript files of enough size, as the timeout for processing is based on the number of lines in a file. The error message(s) for such files then breaks the ability to get information about the rest of the project.

On the main branch:

2023-03-15 11:47:36,546 - INFO: Connected to: https://github.com
2023-03-15 11:47:36,977 - INFO: Processing: cisagov/cset
2023-03-15 11:49:44,728 - ERROR: Error Decoding: url=https://github.com/cisagov/cset.git, out=
3 errors:
Line count, exceeded timeout:  /tmp/tmph8eqj0h3/clone-dir/CSETWebApi/CSETWeb_Api/CSETWeb_ApiCore/Diagram/src/main/webapp/js/app.min.js
Line count, exceeded timeout:  /tmp/tmph8eqj0h3/clone-dir/CSETWebApi/CSETWeb_Api/CSETWeb_ApiCore/Diagram/src/main/webapp/math/extensions/a11y/mathjax-sre.js
Line count, exceeded timeout:  /tmp/tmph8eqj0h3/clone-dir/CSETWebNg/src/app/models/xmlCompletionItemProvider.model.ts
{"header" : {
  "cloc_url"           : "github.com/AlDanial/cloc",
  "cloc_version"       : "1.94",
<omitted>
"SUM": {
  "blank": 104009,
  "comment": 101199,
  "code": 1661696,
  "nFiles": 5391} }

2023-03-15 11:49:45,427 - INFO: SLOC: 0
2023-03-15 11:49:45,427 - INFO: labor_hours: 0
2023-03-15 11:49:46,045 - INFO: Number of Projects: 1
2023-03-15 11:49:46,047 - INFO: Writing output to: code.json

On this branch:

2023-03-15 12:01:56,303 - INFO: Connected to: https://github.com
2023-03-15 12:01:56,725 - INFO: Processing: cisagov/cset
2023-03-15 12:04:23,443 - WARNING: Error encountered while analyzing: url=https://github.com/cisagov/cset.git stderr=
3 errors:
Line count, exceeded timeout:  /tmp/tmpc_di791i/clone-dir/CSETWebApi/CSETWeb_Api/CSETWeb_ApiCore/Diagram/src/main/webapp/js/app.min.js
Line count, exceeded timeout:  /tmp/tmpc_di791i/clone-dir/CSETWebApi/CSETWeb_Api/CSETWeb_ApiCore/Diagram/src/main/webapp/math/extensions/a11y/mathjax-sre.js
Line count, exceeded timeout:  /tmp/tmpc_di791i/clone-dir/CSETWebNg/src/app/models/xmlCompletionItemProvider.model.ts

2023-03-15 12:04:24,025 - INFO: SLOC: 1661696
2023-03-15 12:04:24,025 - INFO: labor_hours: 1555362
2023-03-15 12:04:24,468 - INFO: Number of Projects: 1
2023-03-15 12:04:24,468 - INFO: Writing output to: code.json

If stderr is piped into stdout analyzing a repository will fail if any
file fails during cloc's processing. This is most problematic for
projects that contain minified JavaScript code. The design of cloc is
around typical multi-line source files. Large minified JavaScript will
run the risk of causing analysis to fail with a "Line count, exceeded
timeout" error. This then causes the entire cloc analysis to fail due
to the design of the execute() function. With this change to execute()
I added a warning if stderr is not empty following the cloc call.
@IanLee1521
Copy link
Member

This looks good to me, thanks!

One question, in the case where there is an error, what ends up in out ? Should there be any sort of "make sure the cloc value is something reasonable, 0 maybe?"

@mcdonnnj
Copy link
Contributor Author

This looks good to me, thanks!

One question, in the case where there is an error, what ends up in out ? Should there be any sort of "make sure the cloc value is something reasonable, 0 maybe?"

So if cloc crashed completely then it should have a non-0 return code which is handled by

scraper/scraper/util.py

Lines 29 to 34 in e19717f

if process.returncode:
logging.error(
"Error Executing: command=%s, returncode=%d",
" ".join(command),
process.returncode,
)

In the case of no output (no files found but cloc ran successfully) then it would get caught by

scraper/scraper/util.py

Lines 148 to 150 in e19717f

except json.decoder.JSONDecodeError:
logger.error("Error Decoding: url=%s, out=%s", url, out)
sloc = 0

Otherwise it should still produce output. As an example I created a temporary directory that only contains the <cset repository root>/CSETWebApi/CSETWeb_Api/CSETWeb_ApiCore/Diagram/src/main/webapp/js/app.min.js file from the PR description.

Directory contents:

$ ls /tmp/scraper_testing/
app.min.js

Running cloc as the scraper package would:

$ cloc --json /tmp/scraper_testing/

1 error:
Line count, exceeded timeout:  /tmp/scraper_testing/app.min.js
{"header" : {
  "cloc_url"           : "github.com/AlDanial/cloc",
  "cloc_version"       : "1.94",
  "elapsed_seconds"    : 12.0404329299927,
  "n_files"            : 1,
  "n_lines"            : 12136,
  "files_per_second"   : 0.0830534919977008,
  "lines_per_second"   : 1007.9371788841},
"JavaScript" :{
  "nFiles": 1,
  "blank": 1,
  "comment": 12135,
  "code": 0},
"SUM": {
  "blank": 1,
  "comment": 12135,
  "code": 0,
  "nFiles": 1} }

Saving stdout and stderr separately:

$ cloc --json /tmp/scraper_testing/ 2>testing.err 1>testing.out
$ cat testing.out
{"header" : {
  "cloc_url"           : "github.com/AlDanial/cloc",
  "cloc_version"       : "1.94",
  "elapsed_seconds"    : 12.0349020957947,
  "n_files"            : 1,
  "n_lines"            : 12136,
  "files_per_second"   : 0.0830916605752387,
  "lines_per_second"   : 1008.4003927411},
"JavaScript" :{
  "nFiles": 1,
  "blank": 1,
  "comment": 12135,
  "code": 0},
"SUM": {
  "blank": 1,
  "comment": 12135,
  "code": 0,
  "nFiles": 1} }
$ cat testing.err

1 error:
Line count, exceeded timeout:  /tmp/scraper_testing/app.min.js

Does that address your concerns?

@IanLee1521
Copy link
Member

Yup, thanks!

@IanLee1521 IanLee1521 merged commit 9e3fa03 into LLNL:main Mar 22, 2023
@mcdonnnj mcdonnnj deleted the improvement/do_not_combine_stdout_and_stderr branch March 22, 2023 22:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants