Big commit of text. Need to add more to main README and make sure I did not miss anything big
gchure · Oct 26, 2018 · 54987b4 · 54987b4
1 parent 457a75c
commit 54987b4
Showing 22 changed files with 216 additions and 3 deletions.
diff --git a/LICENSE b/LICENSE
@@ -1,6 +1,9 @@
+This is an example MIT license file. This may not be the license that is best for your 
+work, so use with caution!!
+
 MIT License
 
-Copyright (c) 2018 Griffin Chure
+Copyright (c) YEAR AUTHOR
 
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal

diff --git a/README.md b/README.md
@@ -2,9 +2,9 @@
 ## Overview
 This repository sets out the skeleton of an organizational structure used for scientific research. It loosely follows what I have used for several of my research projects and I hope it inspires you to conduct your research in an open, reproducible, and honest manner.
 
-The layout of this repository is described below.
-
 ## Layout
+a
+The repository is split into 
 
 ### Directories
 1. **`code`**:  Where all of the *executed* code lives. This includes pipelines, scripts, and figure files. 

diff --git a/code/README.md b/code/README.md
@@ -0,0 +1,7 @@
+# `code`
+
+It's important to keep the code that is executed separated from that which is called. The distinct difference here is that executed code should change from day-to-day or analysis to analysis. You're not necessarily making all measurements at the same time, or at the exact same concentration, or even on the same materials. As your experiments change, your code should also change to make sure you are transforming or interpreting the data correctly. 
+
+However, any code that is critical for an analysis, such as a function that computes some quantity from your data, should be written such that it is *modular* and is used the same way from day-to-day. This type of code should be referenced by your analysis or processing scripts such that if you change the way you perform something, it can be applied to all of your experiments simply by rerunning the processing scripts. You *don't* want to go back through each experiment and make that change by hand.
+
+This directory is broken into several subdirectories, each of which has a `README.md` file describing what should be there with some examples. 
diff --git a/code/analysis/README.md b/code/analysis/README.md
@@ -0,0 +1,7 @@
+## `analysis`
+
+Much as executed code should be separated from code that is called, the analysis of your data should be separate from the processing. In this folder, you should have a collection of scripts and/or notebooks that use preprocessed data to draw conclusions or calculate quantities that will be reported. 
+
+In many fields, processing of raw data is the bottleneck of research. Processing images, reading and cleaning large data sets, or running simulations can often take considerable amounts of time. While you may fiddle with different ways of analyzing your data (once you have an idea of how to answer your scientific question), you usually don't want to reprocess everything you've taken thus far.
+
+By following this structure, the creation, transformation, and interpretation of data are kept separate and clear to anyone trying to follow your thought process. 
diff --git a/code/exploratory/README.md b/code/exploratory/README.md
@@ -0,0 +1,5 @@
+## `exploratory`
+
+Almost nobody goes into a research project knowing exactly how to solve the problem. The scientific process often requires many attempts at solving a problem, processing a data set, performing a regression, or myriad other ways of thinking about the problem. As a result, one often generates many files of failed approaches before the "correct way" is found. Rather than having these files clutter your desktop or your `analysis` folder, you can put all of those files here. 
+
+Once you arrive at a working bit of code, you should save a reduced version of it as a template file in the `templates` directory in the project root. You can keep all of your other attempts here in case you have to go back or if someone wants to see how else you attempted to solve the problem.  
diff --git a/code/figures/README.md b/code/figures/README.md
@@ -0,0 +1,5 @@
+## `figures`
+
+All figures containing data should be stored here, preferably labeled with a descriptive file name that also points to the correct figure, e.g. `Fig1_power_spectrum.py`. 
+
+Figure scripts should not perform any data processing, data generation, or inference whenever possible. This maintains a separation between each layer of the scientific method in a clear and reproducible way. Keeping figure generation separate from other parts of the scientific pipeline allows for simple modification of figures, such as correcting axis labels, without needing to rerun the processing and analysis steps.
diff --git a/code/processing/README.md b/code/processing/README.md
@@ -0,0 +1,12 @@
+## `processing`
+
+This folder contains all code executed to transform or generate data. Within this directory, there should be one subdirectory per experiment, each containing a summary of that experiment, the code used to process and transform the data, and the output of any processing functions.
+
+Individual subdirectories should have names that can be easily read and
+understood by both human eyes and your computer. The subdirectory shown here
+as an example has the folder name `YEAR-MONTH-DATE_experiment_description`.
+Note that all spaces are replaced with underscores (`_`) which will make any
+computational iteration through the folders simpler. In general, spaces in
+file and folder names should be avoided at all costs. 
+
+Within each subdirectory, there exists another `README.md` file that gives a summary of that particular experimental run. See the example for more information.
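The naming convention described in `code/processing/README.md` is what makes iterating over experiment folders simple. A minimal sketch of that idea follows, assuming the date is written as an ISO `YYYY-MM-DD` string; the helper name `parse_experiment_dir` and the folder names are hypothetical and not part of this commit.

```python
import re
from datetime import date

# Assumed pattern: an ISO date, an underscore, then a description without spaces.
FOLDER_PATTERN = re.compile(r"^(\d{4})-(\d{2})-(\d{2})_(\S+)$")


def parse_experiment_dir(name):
    """Return (date, description) for a folder name, or None if it does not match."""
    match = FOLDER_PATTERN.match(name)
    if match is None:
        return None
    year, month, day, description = match.groups()
    return date(int(year), int(month), int(day)), description


# Hypothetical folder names; in practice they could come from os.listdir().
folders = ["2012-03-04_gliding_assay", "2012-03-01_gliding_assay", "README.md"]
experiments = sorted(p for p in map(parse_experiment_dir, folders) if p is not None)
for run_date, description in experiments:
    print(run_date.isoformat(), description)
```

Because the date comes first and contains no spaces, the folders sort chronologically and anything that is not an experiment folder (such as `README.md`) is skipped.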
diff --git a/code/processing/YEAR-MONTH-DATE_experiment_description/README.md b/code/processing/YEAR-MONTH-DATE_experiment_description/README.md
@@ -0,0 +1,46 @@
+---
+status: Accepted/Rejected/Questionable
+reason: Provide a short (fewer than 3 sentences) summary of the experiment and note any objective indicators that render the data set invalid or suspect.
+---
+
+# YEAR-MONTH-DATE Experiment 1
+
+## Purpose
+Include a one-sentence explanation of why this experiment was performed. This
+could be as simple as "This experiment serves as a biological replicate of a
+kinesin-microtubule gliding assay."
+
+## Materials
+Set up a short table that includes vital information about the materials used. For biological experiments, this can include organism used and any genotypic information. For chemistry experiments, this could include reagents, amounts, molecular weights, and CAS numbers.
+
+| **Organism** | **Parental Cross** | **Shorthand** | **Location** |
+| :--: | :--: |:--:| :--:|
+| *Drosophila melanogaster*| ♂*w*;;bcd/bcd x ♀ ;;+/+ | *bcd/+* | `Flat 4 vial 10`|
+
+## Notes and Observations
+* Did anything go awry? Did you notice a particular vial was mixed up or accidentally thrown away? Did you run at a different voltage than previous runs? Anything that occurs during your experiment that differs from the canonical protocol written down in the `miscellaneous/protocols` directory should be clearly and succinctly annotated here. 
+
+## Summary of Results
+Briefly summarize the results of the experiment and report data validation outcomes here (if possible). For instance, to verify that an experiment didn't have any pathological results, you could generate a specific plot and include it in this section along with some comments. For example,
+
+![](output/validation_plot.png)
+Highlighted distribution shows oversampling relative to other samples.
+
+
+## Experimental Protocol
+What protocol did you follow? Write it down here in simple, easy to follow, and detailed steps. This will likely be identical to many other experimental runs, especially if you repeat the experiment. However, it should be modified wherever needed to ensure it represents the actual procedure followed. For example,
+
+1. A strip of double-sticky tape was mounted on a glass slide.
+2. Sample (listed in materials above) was harvested from crossing vials at 09h28m by flushing with CO_2_ and swiping with a wet paintbrush.
+3. Embryos were dechorionated with 50% bleach for 20 - 30 seconds before being thoroughly washed with water. 
+4. A strip of apple juice agarose was cut and mounted on a stereo microscope. Between 5 and 10 embryos were picked and mounted down the centerline of the agarose.
+5. The embryos were transferred to the tape by gently pressing the tape-covered slide against the agarose pad. 
+6. Two more strips of tape were mounted on either side of the embryos to create a channel approximately 5mm wide and two layers deep.
+7. 20µL of halocarbon oil was transferred to the channel followed by a cover slip. 
+8. Embryos were mounted on an inverted microscope. Between 5 and 10 positions were marked and a time lapse was acquired with the following parameters.
+    * Channel: Brightfield
+    * Exposure: 200ms
+    * Autofocus settings: 10µm range, 1µm step, 10 slices
+    * Time interval: 30 sec
+    * Number of frames: 50
+9. After the time lapse had completed, embryos were thrown in biological waste and data was transferred to the backup server. 
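The `status` and `reason` fields at the top of each experiment `README.md` can be read by a script, for instance to skip rejected runs during analysis. The sketch below is one possible approach using plain string handling rather than a YAML library; the function name `read_status` and the example text are hypothetical.

```python
def read_status(readme_text):
    """Extract the key: value pairs found between the two leading '---' lines."""
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


# Hypothetical README contents for a single experimental run.
example = """---
status: Accepted
reason: Replicate passed the validation plot check.
---

# 2012-03-04 Experiment 1
"""
info = read_status(example)
print(info["status"])
```

A file without the leading `---` line yields an empty dictionary, so a calling script can treat missing metadata as a separate case instead of failing.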
diff --git a/...experiment_description/output/YEAR-MONTH-DATE_description_processed_experimental_data.csv b/...experiment_description/output/YEAR-MONTH-DATE_description_processed_experimental_data.csv
@@ -0,0 +1,2 @@
+date, quantity 1, quantity 2, quantity 3
+2012-03-04, 0.998, 1.005e4, 2
diff --git a/code/processing/YEAR-MONTH-DATE_experiment_description/output/validation_plot.png b/code/processing/YEAR-MONTH-DATE_experiment_description/output/validation_plot.png
diff --git a/code/processing/YEAR-MONTH-DATE_experiment_description/processing.py b/code/processing/YEAR-MONTH-DATE_experiment_description/processing.py
@@ -0,0 +1,13 @@
+from software_module import processing_function
+
+# Experimental parameters
+DATE = "2012-03-04"
+DESCRIPTION = 'description'
+
+# Define a relative path to the data -- don't hardcode a path. 
+data_location = '../../data/{}_{}'.format(DATE, DESCRIPTION)
+processed_data = processing_function(data_location)
+
+# Save the processed data to the local output directory. 
+processed_data.save('output/{}_{}_processed_experimental_data.csv'.format(DATE, DESCRIPTION))
+
diff --git a/data/README.md b/data/README.md
@@ -0,0 +1,5 @@
+## `data`
+
+This directory houses all small (< 50 MB) data sets that are a result of individual experiments and/or simulations. Depending on the type of data collected, you may want to split them up based on file type.
+
+If possible, data sets from individual experiments should be compiled in a long-form tidy format. This is important not only for your analysis, but for others who wish to reproduce your work. While you may have an intimate knowledge of your data and experimental structure, it may not be obvious to anyone else. It is much easier if you can combine the individual data sets into as few files as possible so only one or two files have to be read to perform the analysis and generate the figures. 
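To illustrate what compiling individual data sets into a long-form tidy format could look like, here is a small sketch using only the standard library. The two CSV strings mimic the header style of the example output file in this commit; their values, the column names `variable` and `value`, and the function name `to_long_form` are assumptions made for illustration.

```python
import csv
import io

# Two hypothetical per-experiment outputs in wide format.
run_a = "date, quantity 1, quantity 2\n2012-03-04, 0.998, 1.005e4\n"
run_b = "date, quantity 1, quantity 2\n2012-03-05, 1.002, 9.870e3\n"


def to_long_form(csv_texts):
    """Stack several wide CSVs into rows of (date, variable, value)."""
    rows = []
    for text in csv_texts:
        # skipinitialspace drops the blank that follows each comma.
        reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
        for record in reader:
            run_date = record.pop("date")
            for variable, value in record.items():
                rows.append({"date": run_date, "variable": variable, "value": float(value)})
    return rows


tidy = to_long_form([run_a, run_b])

# Write one compiled file (to memory here; a real script would write into `data/`).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["date", "variable", "value"])
writer.writeheader()
writer.writerows(tidy)
print(buffer.getvalue())
```

Each row then holds exactly one measurement, so a single file can feed every analysis and figure script.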
diff --git a/doc/README.md b/doc/README.md
diff --git a/miscellaneous/README.md b/miscellaneous/README.md
@@ -0,0 +1,3 @@
+## `miscellaneous`
+
+While the other directories in this repository are focused on code and data, this folder can serve as a laboratory notebook where you can write down various details about your project. I find it useful to break this up by section where I can easily share a direct link if someone asks for a protocol. 
diff --git a/miscellaneous/materials/README.md b/miscellaneous/materials/README.md
@@ -0,0 +1,12 @@
+# Materials
+
+Here, you can provide a detailed list of all experimental materials used. You should invest the time to be exhaustive and accurate so there is no question about which particular manufacturer or variant you used for your experiments. A short example is shown below. 
+
+
+## Chemical Supplies
+
+| **Chemical** | **Manufacturer** | **CAS Number** | **Catalog No.** | **Notes**|
+|:--:|:--:| :--:| :--:| :--:|
+| Xanthosine dihydrate | Sigma-Aldrich | `31319-70-7` | `X3003` | > 98% (HPLC) |
+| Cycloheximide Solution | Sigma-Aldrich | `66-81-9` | `46401` | Analytical Standard |
+
diff --git a/miscellaneous/protocols/README.md b/miscellaneous/protocols/README.md
@@ -0,0 +1,3 @@
+## `protocols`
+
+Each experimental protocol should have its own file and should be sufficiently detailed for someone to print out and follow at a lab bench. While it may not be identical for each and every experiment, it should serve as the gold standard that is followed.
diff --git a/miscellaneous/software_information/README.md b/miscellaneous/software_information/README.md
@@ -0,0 +1,21 @@
+## `software_information`
+
+Just as it is important to know what chemicals, instruments, or samples were used, knowing the computing environment used for the analysis is vital to reproducibility. While there are many ways to enumerate the various versions of the software used, you should list here the exact software used including which particular version and other dependencies that are necessary to reproduce your results. An example can be seen below. 
+
+
+### Computing Environment
+
+* **Compiler**: GCC 4.8.2 20140120 (Red Hat 4.8.2-15)
+* **System**: Linux
+* **Release**: 4.4.0-59-generic
+* **Machine**: x86_64
+* **Processor**: x86_64
+* **Cores**: 4
+* **Interpreter**: 64 Bit
+
+
+### Required Software
+
+* Python:  v3.7
+* `numpy`: 1.11.2
+* `scipy`: 0.18.1
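Most of the fields in the computing environment list can be collected from Python itself. The following is a rough sketch using the standard `platform` and `os` modules; the function name `describe_environment` and the mapping of fields to calls are my own assumptions, not how the example list in this commit was produced.

```python
import os
import platform


def describe_environment():
    """Collect basic facts about the machine and interpreter as a dict."""
    return {
        "Compiler": platform.python_compiler(),
        "System": platform.system(),
        "Release": platform.release(),
        "Machine": platform.machine(),
        "Processor": platform.processor(),
        "Cores": os.cpu_count(),
        "Interpreter": platform.architecture()[0],
        "Python": platform.python_version(),
    }


# Print the result as a Markdown bullet list that can be pasted into a README.
for key, value in describe_environment().items():
    print("* **{}**: {}".format(key, value))
```

Versions of third-party packages such as `numpy` and `scipy` still need to be recorded separately, for example from each package's `__version__` attribute.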
diff --git a/software_module/README.md b/software_module/README.md
@@ -0,0 +1,3 @@
+## `software_module`
+
+All of your custom-made functions, processing routines, and analysis methods should be given here either as a formal software module or, at a minimum, a single file with enumerated functions that can be directly called. I have not provided an example here as it is highly dependent on the programming language used. 
diff --git a/templates/README.md b/templates/README.md
@@ -0,0 +1,10 @@
+## `templates`
+While you may repeat an experiment many, many (many) times, **you must never
+directly copy one day's folder and simply rename the files!** This is a
+recipe for disaster and confusion as you may forget to change a detail that
+is critical to the functioning of the experiment. 
+
+In this directory, you can store templates of your experiment descriptions
+and processing scripts such that updating them for each experiment is simple.
+In the example `README.md` file, I've put parameters that need to be changed in
+bold capitalized text so it is easy to review.  While it may be tempting, **never copy one experimental folder to another, even if you know the experiments are identical.** It takes only a few minutes to read through a file and change a handful of parameters and can save you much misery in the future. 
diff --git a/templates/experiment_A_README.md b/templates/experiment_A_README.md
@@ -0,0 +1,40 @@
+---
+status: Rejected
+reason: Experiment not yet completed 
+---
+
+# YEAR-MONTH-DATE Experiment 1
+
+## Purpose
+
+## Materials
+
+| **Organism** | **Parental Cross** | **Shorthand** | **Location** |
+| :--: | :--: |:--:| :--:|
+| | | | |
+
+## Notes and Observations
+
+
+## Summary of Results
+
+
+![](output/)
+
+
+## Experimental Protocol
+
+1. A strip of double-sticky tape was mounted on a glass slide.
+2. Sample (listed in materials above) was harvested from crossing vials at **XX TIME HARVESTED** by flushing with CO_2_ and swiping with a wet paintbrush.
+3. Embryos were dechorionated with 50% bleach for 20 - 30 seconds before being thoroughly washed with water. 
+4. A strip of apple juice agarose was cut and mounted on a stereo microscope. Between 5 and 10 embryos were picked and mounted down the centerline of the agarose.
+5. The embryos were transferred to the tape by gently pressing the tape-covered slide against the agarose pad. 
+6. Two more strips of tape were mounted on either side of the embryos to create a channel approximately 5mm wide and two layers deep.
+7. 20µL of halocarbon oil was transferred to the channel followed by a cover slip. 
+8. Embryos were mounted on an inverted microscope. Between 5 and 10 positions were marked and a time lapse was acquired with the following parameters.
+    * Channel: Brightfield
+    * Exposure: **EXPOSURE**
+    * Autofocus settings: **AUTOFOCUS SETTINGS**
+    * Time interval: **TIME INTERVAL**
+    * Number of frames: **FRAME NUMBER**
+9. After the time lapse had completed, embryos were thrown in biological waste and data was transferred to the backup server. 
diff --git a/templates/experiment_A_processing.py b/templates/experiment_A_processing.py
@@ -0,0 +1,13 @@
+from software_module import processing_function
+
+# Experimental parameters
+DATE = ""
+DESCRIPTION = ''
+
+# Define a relative path to the data -- don't hardcode a path. 
+data_location = '../../data/{}_{}'.format(DATE, DESCRIPTION)
+processed_data = processing_function(data_location)
+
+# Save the processed data to the local output directory. 
+processed_data.save('output/{}_{}_processed_experimental_data.csv'.format(DATE, DESCRIPTION))
+
diff --git a/tests/README.md b/tests/README.md
@@ -0,0 +1,3 @@
+## `tests`
+
+Any mission-critical code you have written in the `software_module` folder should be thoroughly tested to make sure you understand how it works and that it produces the result you would expect. How this is done varies between programming languages, but testing should be performed without exception.
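For a Python project, one way to follow the advice in `tests/README.md` is the standard `unittest` module. The sketch below tests a made-up function, `mean_intensity`, which stands in for something that would live in `software_module`; it is purely illustrative and not part of this commit.

```python
import unittest


def mean_intensity(values):
    """Hypothetical stand-in for a function that would live in `software_module`."""
    if len(values) == 0:
        raise ValueError("values must not be empty")
    return sum(values) / len(values)


class TestMeanIntensity(unittest.TestCase):
    def test_known_answer(self):
        # A hand-computed case confirms the function does what we expect.
        self.assertAlmostEqual(mean_intensity([1.0, 2.0, 3.0]), 2.0)

    def test_empty_input_is_rejected(self):
        # Edge cases should fail loudly rather than return a misleading number.
        with self.assertRaises(ValueError):
            mean_intensity([])


# Run the tests programmatically so the example also works as a plain script.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestMeanIntensity)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Pairing a known-answer test with an edge-case test documents both the intended behavior and the inputs the function refuses to handle.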