Fix minor typos (#214)
Alexander Verge authored Mar 14, 2022
1 parent ca44d13 commit b1ee3ec
Showing 2 changed files with 3 additions and 3 deletions.
4 changes: 2 additions & 2 deletions learning/bayesian/index.md
@@ -84,7 +84,7 @@ The idea we presented here is usually called "conjugacy". Using standard terminl

### Categorical Distribution

- We give another example of a conjugate prior which generalizes the Bernoulli example above. Instead of being limited to binary outcomes, we can now consider the categorical distribution (think of a $$K$$-sided dice). Let $$\mathcal{D} = \{ X_1, \ldots, X_N \}$$ be $N$ rolls of the dice, where $$X_j \in \{ 1, \ldots, K \}$$ is the outcome of the $$j$$-th roll. The parameter of the categorical distribution is denoted by $$\theta$$
+ We give another example of a conjugate prior which generalizes the Bernoulli example above. Instead of being limited to binary outcomes, we can now consider the categorical distribution (think of a $$K$$-sided dice). Let $$\mathcal{D} = \{ X_1, \ldots, X_N \}$$ be $$N$$ rolls of the dice, where $$X_j \in \{ 1, \ldots, K \}$$ is the outcome of the $$j$$-th roll. The parameter of the categorical distribution is denoted by $$\theta$$

$$
\theta =(\theta_1, \cdots, \theta_K) := (P(X_j = 1), \ldots, P(X_j = K))
$$
@@ -100,7 +100,7 @@

where $$B(\alpha)$$ is still a normalization constant.

- To show that the Dirichlet distribution is the conjugate prior for the categorial distribution, we need to show that the posterior is also a Dirichlet distribution. To calaulate the posterior $$p(\theta \mid \mathcal{D})$$ with Bayes rule we first calculate the likelihood $$p(\mathcal{D} \mid \theta)$$ as
+ To show that the Dirichlet distribution is the conjugate prior for the categorial distribution, we need to show that the posterior is also a Dirichlet distribution. To calculate the posterior $$p(\theta \mid \mathcal{D})$$ with Bayes rule we first calculate the likelihood $$p(\mathcal{D} \mid \theta)$$ as

$$
P(\mathcal{D} \mid \theta) = \prod_{k=1}^K \theta_k^{\sum_{j=1}^N 1\{ X_j = k \}}
$$
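The excerpt above gives the categorical likelihood and states that a Dirichlet prior stays Dirichlet after conditioning on the rolls. As a minimal sketch of that conjugate update (not part of this commit, and assuming NumPy; the names `K`, `alpha`, and `counts` are illustrative choices, not anything from the notes), the posterior parameters are just the prior parameters plus the per-category counts:

```python
# Illustrative sketch of the Dirichlet-categorical conjugate update (not part of the commit).
import numpy as np

rng = np.random.default_rng(0)

K = 6                              # number of categories (a K-sided die); illustrative value
alpha = np.ones(K)                 # Dirichlet prior parameters alpha_1, ..., alpha_K
theta_true = rng.dirichlet(alpha)  # a "true" theta, used only to simulate data

# Simulate N rolls X_1, ..., X_N with outcomes in {0, ..., K-1}.
N = 100
X = rng.choice(K, size=N, p=theta_true)

# The likelihood is proportional to prod_k theta_k^{n_k}, where n_k counts the rolls equal to k,
# so multiplying by the Dirichlet prior gives a Dirichlet posterior with parameters alpha_k + n_k.
counts = np.bincount(X, minlength=K)
alpha_post = alpha + counts

# Posterior mean of theta (normalized posterior parameters).
print(alpha_post / alpha_post.sum())
```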
2 changes: 1 addition & 1 deletion learning/latent/index.md
@@ -41,7 +41,7 @@

To generate a new data point, we sample a cluster $$k$$ and then sample its Gaussian $$\mathcal{N}(x; \mu_k, \Sigma_k)$$.

- {% include maincolumn_img.html src='assets/img/gmm2.png' caption='Example of a Gaussian mixture model, consisting of three components with different class proportions (a). The true class of each point is unobserved, so the distribution over $x$ looks like in (b); it is both multi-modal and non-Gaussian. Visualizing it in 3D shows the effects of class proportions on the magnitudes of the modes.' %}
+ {% include maincolumn_img.html src='assets/img/gmm2.png' caption='Example of a Gaussian mixture model, consisting of three components with different class proportions (a). The true class of each point is unobserved, so the distribution over $$x$$ looks like in (b); it is both multi-modal and non-Gaussian. Visualizing it in 3D shows the effects of class proportions on the magnitudes of the modes.' %}
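The surrounding text describes the generative process of a Gaussian mixture: draw a cluster $$k$$ from the class proportions, then draw $$x$$ from $$\mathcal{N}(\mu_k, \Sigma_k)$$. A small NumPy sketch of that process follows; it is not part of this commit, and the three-component parameters (`pi`, `mus`, `Sigmas`) are made-up values for illustration only.

```python
# Illustrative sketch of sampling from a Gaussian mixture model (not part of the commit):
# draw a cluster k from the class proportions, then draw x ~ N(mu_k, Sigma_k).
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-component mixture in 2D; all numbers are illustrative.
pi = np.array([0.5, 0.3, 0.2])                         # class proportions
mus = np.array([[0.0, 0.0], [3.0, 3.0], [-3.0, 2.0]])  # component means
Sigmas = np.array([np.eye(2) * s for s in (1.0, 0.5, 2.0)])  # component covariances

def sample_gmm(n):
    ks = rng.choice(len(pi), size=n, p=pi)             # sample cluster assignments
    xs = np.array([rng.multivariate_normal(mus[k], Sigmas[k]) for k in ks])
    return xs, ks

x, k = sample_gmm(5)
print(k)
print(x)
```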

### Why are latent variable models useful?

