Made some small edits
gchure committed Oct 26, 2018
1 parent 54987b4 commit b0c46a0
Showing 3 changed files with 35 additions and 23 deletions.
52 changes: 31 additions & 21 deletions README.md
@@ -1,36 +1,46 @@
# Git + GitHub As A Platform For Reproducible Research

## Overview
This repository sets out the skeleton of an organizational structure used for scientific research. It loosely follows what I have used for several of my research projects and I hope it inspires you to conduct your research in an open, reproducible, and honest manner.

## Layout
a
The repository is split into

### Directories
1. **`code`**: Where all of the *executed* code lives. This includes pipelines, scripts, and figure files.
+ **`processing`**: Any code used to *transform* the data into another type should live here. This can include everything from parsing of text data, image segmentation/filtering, or simulations.
+ **`analysis`**: Any code used to *draw conclusions* from an experiment or data set. This may include regression, dimensionality reduction, or calculation of various quantities.
+ **`exploratory`**: A sandbox where you keep a record of your different approaches to transformation, interpretation, cleaning, or generation of data.
+ **`figures`**: Any code used to generate figures for your finished work, presentations, or for any other use.
2. **`data`**: All raw data collected from your experiments as well as copies of the transformed data from your processing code.
3. **`miscellaneous`**: Files that may not be code, but are important for reproducibility of your findings.
+ **`protocols`**: A well-annotated and general description of your experiments. These protocols should be descriptive enough for someone to follow your experiments independently.
+ **`materials`**: Information regarding the materials used in your experiments or data generation. This could include manufacturer information, records of purity, and/or lot and catalog numbers.
+ **`software details`**: Information about your computational environment that is necessary for others to execute your code. This includes details about your operating system, software versions, and required packages.
4. **`tests`**: All test suites for your code. *Any custom code you've written should be thoroughly and adequately tested to make sure you know how it is working.*
5. **`software_module`**: Custom code you've written that is *not* executed directly, but is called from files in the `code` directory. If you've written your code in Python, for example, this can be the root folder for your custom software module or simply house a file with all of your functions.
6. **`templates`**: Files that serve as blank templates that document the procedures taken for each experiment, simulation, or analysis routine.

The repository is split into six main directories, many of which have subdirectories. This structure has been designed to be easily navigable by humans and computers alike, allowing for rapid location of specific files and instructions. Within each directory is a `README.md` file which summarizes the purpose of that directory and provides some examples where necessary. This structure may not be perfect for your intended use and may need to be modified. Each section is briefly described below.

### **`code`**
Where all of the *executed* code lives. This includes pipelines, scripts, and figure files.
* **`processing`**: Any code used to *transform* the data into another type should live here. This can include everything from parsing of text data, image segmentation/filtering, or simulations.
* **`analysis`**: Any code used to *draw conclusions* from an experiment or data set. This may include regression, dimensionality reduction, or calculation of various quantities.
* **`exploratory`**: A sandbox where you keep a record of your different approaches to transformation, interpretation, cleaning, or generation of data.
* **`figures`**: Any code used to generate figures for your finished work, presentations, or for any other use.

### **`data`**
All raw data collected from your experiments as well as copies of the transformed data from your processing code.

### **`miscellaneous`**
Files that may not be code, but are important for reproducibility of your findings.
* **`protocols`**: A well-annotated and general description of your experiments. These protocols should be descriptive enough for someone to follow your experiments independently.
* **`materials`**: Information regarding the materials used in your experiments or data generation. This could include manufacturer information, records of purity, and/or lot and catalog numbers.
* **`software details`**: Information about your computational environment that is necessary for others to execute your code. This includes details about your operating system, software versions, and required packages.
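
If your code is written in Python, one way to record these details is a short script like the sketch below. It uses only the standard library; the function name `describe_environment` is hypothetical and not part of this repository, and where you save the output (for example, a file inside `software details`) is up to you.

```python
import json
import platform
import sys
from importlib import metadata


def describe_environment():
    """Collect the operating system, Python version, and installed package versions."""
    packages = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions whose metadata lacks a name
            packages[name] = dist.version
    return {
        "operating_system": platform.platform(),
        "python_version": platform.python_version(),
        # Sort case-insensitively so the record is easy to read and to diff.
        "packages": dict(sorted(packages.items(), key=lambda kv: kv[0].lower())),
    }


if __name__ == "__main__":
    # Print the record as JSON; redirect it to a file to keep it with the project.
    json.dump(describe_environment(), sys.stdout, indent=2)
```

Rerunning the script whenever the environment changes and committing the result keeps the record in step with the code.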

### **`tests`**
All test suites for your code. *Any custom code you've written should be thoroughly and adequately tested to make sure you know how it is working.*
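
As a rough illustration, a test file in this directory might look like the following sketch, which uses Python's built-in `unittest`. The function `fold_change` is a made-up example defined inline so the block is self-contained; in practice it would be imported from your own module.

```python
import unittest


def fold_change(sample, reference):
    """Hypothetical analysis function: ratio of a measurement to its reference."""
    if reference == 0:
        raise ValueError("reference must be nonzero")
    return sample / reference


class TestFoldChange(unittest.TestCase):
    def test_known_value(self):
        # A hand-computed case: 10 / 4 = 2.5.
        self.assertAlmostEqual(fold_change(10.0, 4.0), 2.5)

    def test_zero_reference_is_rejected(self):
        # Invalid input should fail loudly rather than return a meaningless number.
        with self.assertRaises(ValueError):
            fold_change(1.0, 0.0)
```

Tests written this way can be run from the repository root with `python -m unittest discover tests`, assuming the file name starts with `test`.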

### **`software_module`**
Custom code you've written that is *not* executed directly, but is called from files in the `code` directory. If you've written your code in Python, for example, this can be the root folder for your custom software module or simply house a file with all of your functions.

### **`templates`**
Files that serve as blank templates that document the procedures taken for each experiment, simulation, or analysis routine.
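
If you would rather build this skeleton from scratch than copy the repository, the layout above can be sketched in a few lines of Python. The function name `scaffold` and the one-line placeholder `README.md` contents are assumptions made for illustration; the directory names are the ones listed above.

```python
from pathlib import Path

# Directory names are taken from the layout described above.
LAYOUT = {
    "code": ["processing", "analysis", "exploratory", "figures"],
    "data": [],
    "miscellaneous": ["protocols", "materials", "software details"],
    "tests": [],
    "software_module": [],
    "templates": [],
}


def scaffold(root):
    """Create the directory skeleton under `root` with a README.md in each folder."""
    root = Path(root)
    created = []
    for parent, children in LAYOUT.items():
        folders = [root / parent] + [root / parent / child for child in children]
        for folder in folders:
            folder.mkdir(parents=True, exist_ok=True)
            readme = folder / "README.md"
            if not readme.exists():  # never overwrite an existing README
                readme.write_text(f"# `{folder.name}`\n")
            created.append(folder)
    return created
```

Calling `scaffold("my_project")` would create the six top-level directories and their subdirectories, leaving any existing files untouched.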

### Required Files
There are some files which I consider to be mandatory for any project.

1. **`LICENSE`**: A legal protection of your work. *It is important to think deeply about the licensing of your work; this is not a decision to be made lightly. See [this useful site](https://choosealicense.com/) for more information about licensing and choosing the correct license for your project.*

1. **`LICENSE`**: A legal protection of your work. *It is important to think deeply about the licensing of your work, and is not a decision to be made lightly. See [blah blah]() for more information about licensing and choosing the correct license for your project.*
2. **`README.md`**: A thorough yet succinct description of your research project and information regarding the structure outlined below.

## A Pipeline for Reproducible Research
1.

# License
# License Information
<p xmlns:dct="http://purl.org/dc/terms/">
<a rel="license" href="http://creativecommons.org/publicdomain/mark/1.0/">
<img src="http://i.creativecommons.org/p/mark/1.0/88x31.png"
4 changes: 3 additions & 1 deletion data/README.md
@@ -2,4 +2,6 @@

This directory houses all small (< 50 MB) data sets that are a result of individual experiments and/or simulations. Depending on the type of data collected, you may want to split them up based on file type.

If possible, data sets from individual experiments should be compiled in a long-form tidy format. This is important not only for your analysis, but for others who wish to reproduce your work. While you may have an intimate knowledge of your data and experimental structure, it may not be obvious to anyone else. It is much easier if you can combine the individual data sets into as few files as possible so only one or two files have to be read to perform the analysis and generate the figures.
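
As one possible illustration of what this means, the sketch below melts two small per-experiment tables into a single long-form table using only Python's standard library. The experiment names, column names, and values are made up for the example, and `to_long_form` and `write_csv` are hypothetical helpers, not part of this repository.

```python
import csv
import io


def to_long_form(experiments, id_columns):
    """Melt wide per-experiment rows into one list of long-form records."""
    records = []
    for name, rows in experiments.items():
        for row in rows:
            ids = {column: row[column] for column in id_columns}
            for column, value in row.items():
                if column not in id_columns:
                    # One record per measured quantity, tagged with its experiment.
                    records.append(
                        {"experiment": name, **ids, "variable": column, "value": value}
                    )
    return records


def write_csv(records, handle):
    """Write the long-form records to an open text handle as CSV."""
    writer = csv.DictWriter(handle, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)


# Made-up measurements from two hypothetical experiments.
experiments = {
    "run_1": [{"sample": "A", "od_600": 0.31, "fluorescence": 1200}],
    "run_2": [{"sample": "A", "od_600": 0.29, "fluorescence": 1150}],
}
tidy = to_long_form(experiments, id_columns=["sample"])
buffer = io.StringIO()
write_csv(tidy, buffer)
```

Every row of the resulting file carries the experiment it came from, so a single file is enough to reproduce an analysis that spans many experiments.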

This is **not** a place to store all of your large (> 50 MB) data files, such as images. For accessibility of these large data sets, there are myriad online data repositories such as [Zenodo](https://zenodo.org) which provide free storage and DOI generation. In addition, you should have all of your data backed up locally with redundancy.
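
To check that nothing in this directory exceeds that threshold before committing, something like the following Python sketch could be used. The function name `find_large_files` is hypothetical, and the limit assumes decimal megabytes (50 × 10^6 bytes).

```python
from pathlib import Path

# 50 MB threshold from the text above, assuming decimal megabytes.
SIZE_LIMIT_BYTES = 50 * 1000 * 1000


def find_large_files(directory, limit=SIZE_LIMIT_BYTES):
    """Return (path, size_in_bytes) for each file under `directory` larger than `limit`."""
    large = []
    for path in Path(directory).rglob("*"):
        if path.is_file():
            size = path.stat().st_size
            if size > limit:
                large.append((path, size))
    # Largest files first, so the worst offenders are easy to spot.
    return sorted(large, key=lambda item: item[1], reverse=True)
```

Any file this reports is a candidate for an external data repository rather than for the Git history.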
2 changes: 1 addition & 1 deletion software_module/README.md
@@ -1,3 +1,3 @@
##`software_module`
## `software_module`

All of your custom-made functions, processing routines, and analysis methods should be placed here either as a formal software module or, at a minimum, as a single file with enumerated functions that can be directly called. I have not provided an example here as it is highly dependent on the programming language used.
