Feature: command line parse xml #197

Merged
merged 14 commits into from
Aug 24, 2018

Conversation

Contributor

@kratsg kratsg commented Aug 23, 2018

Description

This lets users parse XML workspaces with pyhf from the command line via the readxml.parse functionality. It uses click to set up options and arguments.

This also updates readxml with some slight reorganization to allow for a tqdm progress bar that can be enabled (default: disabled) for reading in the channels in various XML files.

pyhf_xml2json --entrypoint-xml validation/multibin_multibjets/config/NormalMeasurement.xml --workspace validation/multibin_multibjets/ --output_file test.json

screenshot 2018-08-23 10 45 23

NB: both click and tqdm are already installed via our dependencies, but I've explicitly listed them in setup.py now.

Checklist Before Requesting Approver

  • Tests are passing
  • "WIP" removed from the title of the pull request

@kratsg kratsg added the feat/enhancement New feature or request label Aug 23, 2018
@coveralls

coveralls commented Aug 23, 2018

Coverage Status

Coverage decreased (-0.0001%) to 96.878% when pulling 29bca28 on feature/commandLineParseXML into 4b04b01 on master.


@click.command()
@click.option('--entrypoint-xml', required=True, prompt='Top-level XML', help='The top-level XML file for the workspace definition.', type=click.Path(exists=True))
@click.option('--workspace', required=True, prompt='Workspace directory', help='The location of workspace.', type=click.Path(exists=True))
Contributor

I think workspace is a bit of a misnomer here. The workspace is the result of parsing the XML and ROOT files via hist2workspace. Should we use the same name as the parse function, i.e. --rootdir?

Contributor Author

maybe top-level directory?

import pyhf

# see test_import.py for the same (detailed) test
def test_import_prepHistFactory(tmpdir, script_runner):
Contributor

where is this fixture defined?

Contributor Author

Comes from pytest-console-scripts (#198) which adds the fixture (see readme)

Contributor

ah ok click has some built in testing capabilities from click.testing. Could use that and avoid the dependency unless pytest-console-scripts adds some nice features (haven't used it)

example usage
https://github.com/yadage/yadage/blob/master/tests/test_maincli.py#L5

Contributor Author

CliRunner doesn't isolate stdout/stderr. It's probably only specific to running click-enabled commands. The pytest-console-scripts is much more generic (runs any script). I would use CliRunner if I spent more time figuring out stderr extraction.

Contributor

ok yes, testing stdout/stderr separately is important, especially if we want to do e.g. > bla.json where we must ensure that the stdout is JSON deserializable. let's go with pytest-console-scripts then.
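
The property discussed here (diagnostics on stderr, clean JSON on stdout) can be checked with the standard library alone. This is a minimal sketch and not the PR's test: the child script is a stand-in for a command-line tool rather than pyhf's CLI, and `CHILD_SCRIPT` and `run_child` are names made up for illustration.

```python
import json
import subprocess
import sys

# Stand-in for a CLI (hypothetical): diagnostics go to stderr,
# the JSON payload goes to stdout.
CHILD_SCRIPT = r"""
import json, sys
print("processing channels...", file=sys.stderr)
json.dump({"channels": ["channel1"]}, sys.stdout)
"""


def run_child():
    """Run the stand-in script, capturing stdout and stderr separately."""
    return subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )


result = run_child()
# json.loads raises if anything other than JSON reached stdout.
payload = json.loads(result.stdout)
```

Because the two streams are captured independently, a shell redirect such as `> bla.json` would receive only the JSON payload in this setup.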

import json
from . import readxml

@click.command()
Contributor

what do you think about a top-level pyhf command, with xml2json as one of the subcommands?

> pyhf --help
> pyhf xml2json --help

@click.group()
@click.option(...)  # some global opts
def pyhf():
    pass

@pyhf.command()
@click.option('--entrypoint-xml')
...
def xml2json(...):
    ...

Contributor

one global option could be the logging verbosity

@click.group()
@click.option('--verbosity', default='INFO')
def pyhf(verbosity):
    logging.basicConfig(level=getattr(logging, verbosity), format=LOGFORMAT)

btw I kinda like using log formats with fixed-width sections like the one below, thoughts?

LOGFORMAT = '%(asctime)s | %(name)20.20s | %(levelname)6s | %(message)s'
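
To see what the fixed-width fields do in practice, here is a small sketch that formats two records with that string. The `render` helper and the logger names are made up for illustration. `%(name)20.20s` both pads and truncates to 20 characters, while `%(levelname)6s` only pads, so longer level names such as WARNING still overflow the column.

```python
import logging

LOGFORMAT = '%(asctime)s | %(name)20.20s | %(levelname)6s | %(message)s'

formatter = logging.Formatter(LOGFORMAT)


def render(name, level, message):
    """Format a single record with LOGFORMAT without touching global logging state."""
    record = logging.LogRecord(name, level, 'example.py', 0, message, None, None)
    return formatter.format(record)


# A short logger name is left-padded to 20 characters.
short = render('pyhf', logging.INFO, 'parsing XML')
# A long logger name is cut off after 20 characters.
long_ = render('pyhf.readxml.some.very.long.logger.name', logging.WARNING, 'missing HistoPath')

print(short)
print(long_)
```

If strict alignment matters, something like `%(levelname)8.8s` would be one option, since CRITICAL is the longest standard level name.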

Contributor Author

I thought about that too. I'm fine with doing that as well.
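
For reference, a runnable sketch of the group-plus-subcommand layout being agreed on here. It assumes click is installed. The `xml2json` body is a placeholder that only echoes its input, and the `click.Choice` restriction on `--verbosity` is an addition for illustration, not something proposed in the thread.

```python
import logging

import click
from click.testing import CliRunner


@click.group()
@click.option('--verbosity', default='INFO',
              type=click.Choice(['DEBUG', 'INFO', 'WARNING', 'ERROR']),
              help='Logging level shared by all subcommands.')
def pyhf(verbosity):
    """Top-level command; global options are handled here."""
    logging.basicConfig(level=getattr(logging, verbosity))


@pyhf.command()
@click.option('--entrypoint-xml', required=True)
def xml2json(entrypoint_xml):
    """Placeholder subcommand: echoes its input, does no parsing."""
    click.echo('would convert {}'.format(entrypoint_xml))


runner = CliRunner()
# Equivalent to `pyhf --help` and `pyhf xml2json --entrypoint-xml input.xml`.
help_result = runner.invoke(pyhf, ['--help'])
run_result = runner.invoke(pyhf, ['xml2json', '--entrypoint-xml', 'input.xml'])
```

Global options go before the subcommand name on the command line, e.g. `pyhf --verbosity DEBUG xml2json ...`.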

setup.py Outdated
@@ -52,6 +55,7 @@
]
},
entry_points = {
'console_scripts': ['pyhf_xml2json=pyhf.commandline:xml2json']
Contributor

possibly renamed after comment above
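
If the top-level group is adopted, the entry point might look like the fragment below. This is a guess at the renamed entry, assuming the group function is called `pyhf` and lives in `pyhf/commandline.py`; the merged code may differ.

```python
# Hypothetical setup.py fragment: a single console script pointing at a click group.
entry_points = {
    'console_scripts': ['pyhf=pyhf.commandline:pyhf'],
}

# Each entry has the form "<script name>=<module path>:<callable>".
script_name, target = entry_points['console_scripts'][0].split('=')
module_path, callable_name = target.split(':')
```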

@kratsg
Contributor Author

kratsg commented Aug 24, 2018

This now spawns three lines of progress bars when running from the command line. The first one is overall progress (how many channels are left to process), the next line is the number of samples for the given channel, and the last (third) line is the modifiers for the given sample for the given channel.

screenshot 2018-08-23 18 08 41
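
A rough sketch of how three nested bars like these can be produced with tqdm. It assumes tqdm is installed. The `workspace` dictionary, the `walk` function and the `show_progress` parameter are made up for illustration and do not mirror the actual readxml code.

```python
from tqdm import tqdm

# Hypothetical nested structure: channel -> sample -> list of modifier names.
workspace = {
    'channel1': {
        'signal': ['mu', 'lumi'],
        'background': ['syst1'],
    },
}


def walk(workspace, show_progress=True):
    """Visit every (channel, sample, modifier) triple, with one bar per nesting level."""
    disable = not show_progress
    visited = []
    for channel, samples in tqdm(workspace.items(), desc='channels', disable=disable):
        # leave=False clears each inner bar once its loop finishes.
        for sample, modifiers in tqdm(samples.items(), desc='samples',
                                      leave=False, disable=disable):
            for modifier in tqdm(modifiers, desc='modifiers',
                                 leave=False, disable=disable):
                visited.append((channel, sample, modifier))
    return visited


visited = walk(workspace)
```

With `disable=True`, tqdm acts as a plain pass-through iterator, so the same loop runs silently when progress tracking is off.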

pyhf/readxml.py Outdated
@@ -11,6 +11,7 @@ def import_root_histogram(rootdir, filename, path, name):
#import pdb; pdb.set_trace()
#assert path == ''
# strip leading slashes as uproot doesn't use "/" for top-level
if path is None: path = ''
Contributor Author

This was needed to handle situations where HistoPath wasn't included -- and in these cases, it's equivalent to ''. This code does need to be fixed up more to normalize the XMLs better...

Contributor

maybe

path = path or ''

is more pythonic?
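
For what it's worth, the two spellings agree for `None` and for ordinary strings; they only differ for other falsy values. A quick sketch with two made-up helper functions:

```python
def normalize_explicit(path):
    # Mirrors the line in the diff: only None is replaced.
    if path is None:
        path = ''
    return path


def normalize_or(path):
    # Mirrors the suggestion: any falsy value (None, '', 0, [], ...) is replaced.
    return path or ''


for value in (None, '', 'dir/subdir'):
    # Same result for the inputs a HistoPath attribute would plausibly produce.
    assert normalize_explicit(value) == normalize_or(value)
```

So the shorter form is safe as long as `path` is only ever a string or `None`.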

@pyhf.command()
@click.option('--entrypoint-xml', required=True, prompt='Top-level XML', help='The top-level XML file for the PDF definition.', type=click.Path(exists=True))
@click.option('--basedir', required=True, prompt='Base directory', help='The base directory for the XML files to point relative to.', type=click.Path(exists=True))
@click.option('--output-file', required=True, prompt='Output file', help='The location of the output json file. If not specified, prints to screen.')
Contributor

what do you think about making the entrypoint-xml a click.argument? there is really no way to convert without input, so pyhf xml2json input.xml seems to be a good cmd line

--basedir could default to os.getcwd()

also, maybe it's somewhat more unixy to print to stdout if the output file is not provided?

pyhf xml2json input.xml > test.json ?

Contributor Author

pyhf xml2json input.xml > test.json ?

The good part is that tqdm writes to stderr, so we can definitely do that.
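
A sketch of what the suggested interface could look like: positional input, `--basedir` defaulting to the current directory, and JSON on stdout when no output file is given. It assumes click is installed. The parsing step is replaced by a placeholder dictionary, and the `click.Path(exists=True)` checks from the diff are left out so the example runs without real files.

```python
import json
import os

import click
from click.testing import CliRunner


@click.command()
@click.argument('entrypoint_xml')
@click.option('--basedir', default=os.getcwd(),
              help='Base directory that paths in the XML are relative to.')
@click.option('--output-file', default=None,
              help='Write JSON here; if omitted, print to stdout.')
def xml2json(entrypoint_xml, basedir, output_file):
    # Placeholder for the real parsing step: echo the inputs back as JSON.
    spec = {'entrypoint': entrypoint_xml, 'basedir': basedir}
    if output_file is None:
        click.echo(json.dumps(spec, indent=4))
    else:
        with open(output_file, 'w') as out:
            json.dump(spec, out, indent=4)


# Equivalent to running `xml2json input.xml` with no other options.
result = CliRunner().invoke(xml2json, ['input.xml'])
```

Used from a shell, this shape supports `pyhf xml2json input.xml > test.json` provided nothing else is written to stdout.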

@click.option('--entrypoint-xml', required=True, prompt='Top-level XML', help='The top-level XML file for the PDF definition.', type=click.Path(exists=True))
@click.option('--basedir', required=True, prompt='Base directory', help='The base directory for the XML files to point relative to.', type=click.Path(exists=True))
@click.option('--output-file', required=True, prompt='Output file', help='The location of the output json file. If not specified, prints to screen.')
@click.option('--tqdm/--no-tqdm', default=True)
Contributor

suggestion: --track/--no-track or --track-progress/--no-track-progress

pyhf/readxml.py Outdated
@@ -26,7 +28,7 @@ def import_root_histogram(rootdir, filename, path, name):

raise KeyError('Both {0:s} and {1:s} were tried and not found in {2:s}'.format(name, os.path.join(path, name), os.path.join(rootdir, filename)))

-def process_sample(sample,rootdir,inputfile, histopath, channelname):
+def process_sample(sample,rootdir,inputfile, histopath, channelname, enable_tqdm=False):
if 'InputFile' in sample.attrib:
Contributor

here and in the other cases, I'd also suggest renaming to track_progress instead of enable_tqdm

@lukasheinrich
Contributor

looks good to me. we have a cli! ✨
