revisions

IdahoAgStats · Jan 13, 2025 · 6e011b8 · 6e011b8
1 parent 6a51ee8
commit 6e011b8

Showing 4 changed files with 149 additions and 62 deletions.
diff --git a/chapters/factorial-design.qmd b/chapters/factorial-design.qmd
@@ -32,7 +32,7 @@ library(dplyr); library(performance)
 ```
 :::
 
-Next, we will load the dataset named 'cochran.factorial' from the '**agridat**' package. This data comprises a yield response of beans to different levels of manure (d), nitrogen (n), phosphorus The goal of this analysis is the estimate the effect on d, n, p, k, and their interactions on bean yield.
+Next, we will load the dataset named 'cochran.factorial' from the '**agridat**' package. These data comprise the yield response of beans to different levels of manure (d), nitrogen (n), phosphorus (p), and potassium (k). The goal of this analysis is to estimate the effect of d, n, p, k, and their interactions on bean yield.
 
 Note, while importing the data, d, n, p, and k were converted into factor variables using the `mutate()` function from dplyr package. This helps in reducing the extra steps of converting each single variable to factor manually.
 
@@ -62,7 +62,7 @@ The objective of this example is evaluate the individual and interactive effect
 
 ### Data Integrity Checks
 
-Verify the class of variables, where rep, block, d, n, p, and k are supposed to be a factor/character and yield should be numeric/integer.
+The first step is to verify the class of each variable: rep, block, d, n, p, and k should be factor or character, and yield should be numeric or integer.
 
 ```{r}
 str(data1)
@@ -91,8 +91,7 @@ hist(data1$yield, main = "", xlab = "yield", cex.lab = 1.8, cex.axis = 1.5)
 ```{r, eval=FALSE}
 hist(data1$yield)
 ```
-
-The range is roughly falling into the expected range. I didn't observe any extreme observations (too high/low), indicating no issues with data. don't see 
+No extreme (low or high) yield values were observed in the data.
 
 ### Model fitting
 
@@ -125,7 +124,7 @@ tidy(model2_lme)
 
 :::: column-margin
 ::: callout-note
-Instead of `summary()` function, we used `tidy()` function from 'broom.mixed' package to get a short summary output of the model.
+Instead of the `summary()` function, we used the `tidy()` function from the 'broom.mixed' package to get a short summary of the model output.
 :::
 ::::
 
@@ -163,7 +162,7 @@ anova(model2_lme, type = "marginal")
 ```
 :::
 
-Let’s find estimates for some of the factors such as n, p, and n:k interaction. We will try the random intercept model first.
+Let’s find estimates for some of the factors, such as n, p, and the n:k interaction. The n:k estimates show the combined effect of n and k on bean yield.
 
 ::: panel-tabset
 ### lme4
@@ -181,6 +180,6 @@ emmeans(model2_lme, specs = ~ n:k)
 ```
 :::
 
-2.  Unbalanced factorial design
+In summary, when working with factorial designs, interpret the ANOVA table and the estimated marginal means carefully for both main and interaction effects.
 
 
diff --git a/chapters/split-plot-design.qmd b/chapters/split-plot-design.qmd
@@ -4,7 +4,7 @@
 source(here::here("settings.r"))
 ```
 
-Split-plot design is frequently used for factorial experiments. Such design may incorporate one or more of the completely randomized (CRD), completely randomized block (RCBD), and Latin square designs. The main principle is that there are whole plots or whole units, to which the levels of one or more factors are applied. Thus each whole plot becomes a block for the subplot treatments.
+Split-plot design is frequently used for factorial experiments. Such a design may incorporate a completely randomized design (CRD) or a randomized complete block design (RCBD). The main principle is that there are whole plots or whole units, to which the levels of one or more factors are applied. Thus each whole plot becomes a block for the subplot treatments.
 
 ## Details for Split Plot Designs
 
@@ -126,7 +126,9 @@ The levels of whole plots and subplots are balanced.
 str(height_data)
 ```
 
-The 'time', 'manage', and 'rep' are character and variable height is numeric. The structure of the data is in format as needed. - Check the number of missing values in each column.
+The variables 'time', 'manage', and 'rep' are character and 'height' is numeric. The data are structured in the format needed.
+
+-   Check the number of missing values in each column.
 
 ```{r}
 apply(height_data, 2, function(x) sum(is.na(x)))
@@ -139,6 +141,21 @@ ggplot(data = height_data, aes(y = height, x = time)) +
   geom_boxplot(aes(fill = manage), alpha = 0.6)
 ```
 
+Lastly, check the distribution of the dependent variable by plotting a histogram of the height data.
+```{r, echo=FALSE}
+#| label: fig-split_hist
+#| fig-cap: "Histogram of the dependent variable."
+#| column: margin
+par(mar=c(5.1, 5, 2.1, 2.1))
+hist(height_data$height, main = "", xlab = "height", cex.lab = 1.8, cex.axis = 1.5)
+```
+
+```{r, eval=FALSE}
+hist(height_data$height, main = "", xlab = "height")
+```
+
+The distribution of height data looks close to normal.
+
 #### Model building
 
 ::: column-margin
@@ -250,7 +267,7 @@ pairs(m2)
 ::: callout-note
 ## `pairs()`
 
-The default p-value adjustment in `pairs()` function is "tukey", other options include “holm”, “hochberg”, “BH”, “BY”, and “none”. In addition, it's okay to use this function when independent variable has few factors (2-4). For variable with multiple levels, it's better to use custom contrasts. For more information on custom contrasts **please check this link**.
+The default p-value adjustment in the `pairs()` function is "tukey"; other options include “holm”, “hochberg”, “BH”, “BY”, and “none”. In addition, this function is suitable when the independent variable has few levels (2-4). For variables with many levels, it is better to use custom contrasts. For more information on custom contrasts, please visit [**Chapter 12**](means-and-contrasts.qmd).
 :::
 
 ### Example model for RCBD Split Plot Designs
@@ -290,6 +307,27 @@ Next, run the table() command to verify the levels of main-plots and sub-plots.
 table(oats$V, oats$N)
 ```
 
+-   Check the number of missing values in each column.
+
+```{r}
+apply(oats, 2, function(x) sum(is.na(x)))
+```
+
+Lastly, check the distribution of the dependent variable by plotting a histogram of the yield data.
+```{r, echo=FALSE}
+#| label: fig-split-rcbd_hist
+#| fig-cap: "Histogram of the dependent variable."
+#| column: margin
+par(mar=c(5.1, 5, 2.1, 2.1))
+hist(oats$Y, main = "", xlab = "yield", cex.lab = 1.8, cex.axis = 1.5)
+```
+
+```{r, eval=FALSE}
+hist(oats$Y, main = "", xlab = "yield")
+```
+
+
+
 #### Model Building the Model
 
 We are evaluating the effect of V, N and their interaction on yield. The `1|B/V` implies that random intercepts vary with block and V within each block.
@@ -377,4 +415,4 @@ emm1
 ```
 :::
 
-In the next chapter we will continue with extension of split plot design called split-split plot design.
+In the next chapter, we will continue with an extension of the split plot design called the split-split plot design.