This is a simple program to reformat certain spreadsheets of results downloaded by Caltech Library staff from caltech.tind.io when they are doing inventory.
- Introduction
- Installation
- Usage
- Known issues and limitations
- Getting help
- Contributing
- License
- Authors and history
- Acknowledgments
The librarians at Caltech Library periodically have to perform library inventory. In the process of doing that, they download spreadsheets of data from caltech.tind.io that have a format illustrated by the following example fragment:
574524,35047011136967,on shelf,,QA7 .A664 1991,
501345,350470002009169; 35047010046266,on shelf; on shelf,,QA7 .A67 1983,
381367,350470000767183; 350470000767192; 35047010794485,on shelf; on shelf; Limited circulation,,QA7 .S44,
The 2nd and 3rd lines above show examples where there are multiple results in a row, separated by semicolon (;) characters. In these rows, columns 2 and 3 are parallel mappings of values, meaning that the barcode numbers in column 2 should be matched to the corresponding values in the semicolon-separated list of the 3rd column. The other column values apply to each of the individual values in the compound.
Split It! is a simple program that takes such a spreadsheet, splits the compound rows into separate rows, and produces a new spreadsheet with the collected results. For the fragment above, the output would look like this:
574524,35047011136967,on shelf,,QA7 .A664 1991,
501345,350470002009169,on shelf,,QA7 .A67 1983,
501345,35047010046266,on shelf,,QA7 .A67 1983,
381367,350470000767183,on shelf,,QA7 .S44,
381367,350470000767192,on shelf,,QA7 .S44,
381367,35047010794485,Limited circulation,,QA7 .S44,
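The row-splitting transformation shown above can be sketched in a few lines of Python. This is only an illustration of the idea, not the actual Split It! implementation: the function names `split_compound_rows` and `split_csv_text` are hypothetical, and the sketch assumes that the barcode and status values are in the 2nd and 3rd columns and that a mismatch in the number of values should be treated as an error.

```python
import csv
import io


def split_compound_rows(rows, barcode_col=1, status_col=2, sep=";"):
    """Yield one row per barcode, copying all other columns unchanged.

    Hypothetical helper for illustration; column positions are assumptions
    based on the example fragment (0-based indexes 1 and 2).
    """
    for row in rows:
        if len(row) <= max(barcode_col, status_col):
            # Row is too short to hold both columns; pass it through as-is.
            yield list(row)
            continue
        barcodes = [value.strip() for value in row[barcode_col].split(sep)]
        statuses = [value.strip() for value in row[status_col].split(sep)]
        if len(barcodes) != len(statuses):
            raise ValueError(f"mismatched value counts in row: {row!r}")
        # Pair each barcode with the status at the same position.
        for barcode, status in zip(barcodes, statuses):
            new_row = list(row)
            new_row[barcode_col] = barcode
            new_row[status_col] = status
            yield new_row


def split_csv_text(text):
    """Apply split_compound_rows to CSV text and return the result as CSV text."""
    reader = csv.reader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(split_compound_rows(reader))
    return out.getvalue()


if __name__ == "__main__":
    sample = (
        "574524,35047011136967,on shelf,,QA7 .A664 1991,\n"
        "501345,350470002009169; 35047010046266,on shelf; on shelf,,QA7 .A67 1983,\n"
    )
    print(split_csv_text(sample), end="")
```

Stripping whitespace after splitting handles the space that follows each semicolon in the downloaded data.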
Binary installers for Windows can be downloaded from the project's releases page on GitHub. Alternatively, you can use Python pip to install this from the repository using the following command:
sudo python3 -m pip install git+https://github.com/caltechlibrary/splitit.git --upgrade
As a final alternative, you can instead clone this GitHub repository and then run setup.py manually. First, create a directory somewhere on your computer where you want to store the files, and cd to it from a terminal shell. Next, execute the following commands:
git clone https://github.com/caltechlibrary/splitit.git
cd splitit
sudo python3 -m pip install . --upgrade
Split It! is a command-line program that is run from a terminal shell. On all systems, the installation should place a new program called splitit (or splitit.exe on Windows) on your shell's search path, so that you can start Split It! with a simple terminal command:
splitit
Split It! accepts various command-line arguments. To get information about the available options, use the -h argument (or /h on Windows):
splitit -h
In normal operation, Split It! does not need to be started with any options; it will use GUI file dialogs to prompt for a file to open as input and a destination where to write the output. Alternatively, these files can be provided on the command line via two options: an -i option (/i on Windows) to indicate the input CSV file, and an -o option (/o on Windows) to indicate the file where it should write the output in CSV format. Here's an example to illustrate this:
splitit -i input.csv -o output.csv
If either of these options is not supplied, Split It! will resort to using GUI file dialogs, unless the option -G (/G on Windows) is used to indicate that no GUI should be used.
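For unattended use, the options described above can be combined in a small wrapper script. The following is a minimal sketch, not part of Split It! itself: the helper names `build_splitit_command` and `run_splitit` are hypothetical, the option spellings are the non-Windows forms (-G, -i, -o), and the use of `check=True` assumes that splitit exits with a nonzero status when it fails, which has not been verified here.

```python
import shutil
import subprocess


def build_splitit_command(input_csv, output_csv, program="splitit"):
    """Return the argument list for a run with no GUI dialogs.

    Hypothetical helper; uses the -G, -i and -o options described above.
    """
    return [program, "-G", "-i", input_csv, "-o", output_csv]


def run_splitit(input_csv, output_csv):
    """Run splitit if it is installed on the search path."""
    if shutil.which("splitit") is None:
        raise FileNotFoundError("splitit was not found on the search path")
    # Assumption: a failed run is reported through a nonzero exit status.
    return subprocess.run(build_splitit_command(input_csv, output_csv), check=True)
```

Passing the arguments as a list avoids shell quoting problems when file names contain spaces.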
Split It! currently assumes that the input spreadsheet has a format consisting of 5 columns, with the 2nd and 3rd columns being the ones that contain semicolon-separated values. It does not verify that the input spreadsheet has this format; it simply proceeds on that assumption. If the input spreadsheet does not conform to this format, the results are unpredictable.
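Because Split It! does not check the input format, it may be worth checking a file before processing it. The sketch below is one possible pre-check and is not part of Split It!: the function name `find_suspect_rows` is hypothetical, and the two conditions it tests (at least 5 columns, and equal numbers of semicolon-separated values in columns 2 and 3) are assumptions drawn from the format description above.

```python
import csv
import io


def find_suspect_rows(rows, min_columns=5):
    """Return (row_number, reason) pairs for rows that look malformed.

    Hypothetical helper for illustration; row numbers start at 1.
    """
    problems = []
    for number, row in enumerate(rows, start=1):
        if len(row) < min_columns:
            problems.append(
                (number, f"expected at least {min_columns} columns, found {len(row)}")
            )
            continue
        # Columns 2 and 3 (indexes 1 and 2) should hold parallel lists.
        barcode_count = len(row[1].split(";"))
        status_count = len(row[2].split(";"))
        if barcode_count != status_count:
            problems.append(
                (number, f"{barcode_count} barcode values but {status_count} status values")
            )
    return problems


if __name__ == "__main__":
    text = "501345,350470002009169; 35047010046266,on shelf,,QA7 .A67 1983,\n"
    for number, reason in find_suspect_rows(csv.reader(io.StringIO(text))):
        print(f"row {number}: {reason}")
```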
If you find an issue, please submit it in the GitHub issue tracker for this repository.
We would be happy to receive your help and participation with enhancing Split It! Please visit the guidelines for contributing for some tips on getting started.
Software produced by the Caltech Library is Copyright (C) 2019, Caltech. This software is freely distributed under a BSD/MIT type license. Please see the LICENSE file for more information.
Mike Hucka designed and implemented Split It! based on requests from Laurel Narizny in mid-2019.
This work was funded by the California Institute of Technology Library.
The vector artwork of a "page break" icon used as a starting point for the logo for this repository was created by Garrett Knoll for the Noun Project. It is licensed under the Creative Commons Attribution 3.0 Unported license. The vector graphic was modified by Mike Hucka to change the color.
Split It! makes use of numerous open-source packages, without which it would have been effectively impossible to develop Split It!. I want to acknowledge this debt. In alphabetical order, the packages are:
- colorama – makes ANSI escape character sequences work under MS Windows terminals
- ipdb – the IPython debugger
- plac – a command line argument parser
- setuptools – library for setup.py
- termcolor – ANSI color formatting for output in terminal
- PyInstaller – a packaging program that creates standalone applications from Python programs for Windows, macOS, Linux and other platforms