Run 'Export Direct' on Stats NZ's Infoshare website programmatically.

cmhh/infoshare

Fetch saved queries from Infoshare programmatically

This repository contains a small sbt project. Once compiled, it provides a simple command-line utility, written in Scala, which uses Selenium to automate the download of files from the Stats NZ Infoshare website. The utility drives Chrome, so you will need to download and install both Chrome and a compatible version of ChromeDriver yourself for this to work.

To create the program:

sbt assembly

This will create ./target/scala-2.13/infoshare.jar, which can then be run as follows:

java -jar target/scala-2.13/infoshare.jar test.sch blah.csv

Here, test.sch is a file containing a list of series identifiers (a sample file is included in the repository for reference), and blah.csv is the output file. Note that Infoshare's default export is not a valid CSV file: rows contain trailing spaces, each row ends with a hanging comma, source attribution is appended to the data, and the header row does not name the date column. This program removes these issues before writing the output.
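The cleanup described above can be sketched as follows. This is not the program's actual code: it is a minimal Scala sketch that assumes a hypothetical export layout (the first non-blank line is the header, with an empty first field where the date column name should be; data rows have the same number of fields as the header; anything else, such as blank lines or the attribution text, is footer noise). It does not handle quoted fields. The object name `InfoshareCsv` and the default column name `date` are illustrative.

```scala
object InfoshareCsv {
  // Split one raw line into trimmed fields. Trailing whitespace is removed
  // first so the "hanging" comma is the last character, then exactly one
  // trailing comma is dropped. split(",", -1) keeps empty fields (missing values).
  def fields(line: String): Vector[String] =
    line.replaceAll("\\s+$", "").stripSuffix(",").split(",", -1).map(_.trim).toVector

  // Assumptions (hypothetical layout, not taken from the real export):
  //  - the first non-blank line is the header, with an empty first field;
  //  - data rows have the same number of fields as the header;
  //  - any other line (blank lines, source attribution) is discarded.
  // An attribution line that happens to match the header's field count would
  // slip through; fields are assumed not to contain quoted commas.
  def clean(raw: Seq[String], dateName: String = "date"): Vector[String] = {
    val parsed = raw.map(fields).toVector.dropWhile(_.forall(_.isEmpty))
    parsed.headOption match {
      case None => Vector.empty
      case Some(header) =>
        // Name the date column only if Infoshare left it blank.
        val named = if (header.head.isEmpty) dateName +: header.tail else header
        val data = parsed.tail.filter(r => r.length == header.length && r.head.nonEmpty)
        (named +: data).map(_.mkString(","))
    }
  }
}
```

For example, given made-up input lines `,HLFQ.SAA1AZ,HLFQ.SAA2AZ, ` and `1986Q1,1.0 ,2.0 , ` followed by a one-field source line, `InfoshareCsv.clean` returns the header `date,HLFQ.SAA1AZ,HLFQ.SAA2AZ` and the single data row `1986Q1,1.0,2.0`.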

The jar actually contains two programs, with entry points org.cmhh.FetchSch and org.cmhh.FetchIds. FetchSch is the default, so the example above is equivalent to:

java -cp target/scala-2.13/infoshare.jar org.cmhh.FetchSch test.sch blah.csv

To specify series identifiers directly on the command line instead, pass them to FetchIds as a comma-separated list:

java -cp target/scala-2.13/infoshare.jar org.cmhh.FetchIds HLFQ.SAA1AZ,HLFQ.SAA2AZ hlfsemp.csv
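Because the utility is an ordinary jar, it can also be driven from another JVM program. The following is a minimal sketch, assuming the jar path shown above and that `java` is on the PATH; `RunFetchIds` and its methods are hypothetical helpers, not part of this repository. It uses `scala.sys.process` to assemble the same comma-separated identifier argument that FetchIds takes on the command line.

```scala
import scala.sys.process._

object RunFetchIds {
  // Hypothetical default: the jar produced by `sbt assembly`.
  val DefaultJar = "target/scala-2.13/infoshare.jar"

  // Build the argument vector for FetchIds. Identifiers are joined with
  // commas into a single argument, matching the command-line usage above.
  def command(ids: Seq[String], out: String, jar: String = DefaultJar): Seq[String] = {
    require(ids.nonEmpty, "at least one series identifier is required")
    require(ids.forall(id => id.nonEmpty && !id.contains(",")),
      "identifiers must be non-empty and must not contain commas")
    Seq("java", "-cp", jar, "org.cmhh.FetchIds", ids.mkString(","), out)
  }

  // Run the command (output goes to this process's stdout/stderr)
  // and return its exit code.
  def run(ids: Seq[String], out: String, jar: String = DefaultJar): Int =
    command(ids, out, jar).!
}
```

With this helper, `RunFetchIds.run(Seq("HLFQ.SAA1AZ", "HLFQ.SAA2AZ"), "hlfsemp.csv")` would launch the same command as the FetchIds example above. Building the argument vector separately from running it keeps the command easy to inspect or log before anything is launched.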

To simplify the process of installing compatible versions of Chrome and ChromeDriver, a simple Dockerfile is provided. To build the image (after running sbt assembly):

docker build -t infoshare .

One can run a container as a command-line program of sorts:

docker run -d --rm -v "${PWD}:/work" infoshare \
  org.cmhh.FetchSch /work/test.sch /work/blah.csv

Here the current working directory is mounted inside the container at /work, so the output will be visible at ${PWD}/blah.csv once the container terminates. Because the container runs as root, blah.csv will be owned by root. I'll revisit this at some point.
