GitHub - prendradjaja/phoneme-frequencies

Summary

Estimates of the relative frequencies of English phonemes, both overall and for the phonemes that follow /w/.

Methodology

Reproducing the work of Doug Blumeyer, I correlated the CMU Pronouncing Dictionary ("CMUdict") and Adam Kilgarriff's unlemmatized frequency list for the British National Corpus to find phoneme frequencies generally. I extended this technique to estimate post-/w/ phoneme frequencies as well.
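The core computation can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the repository's actual scripts: `pronunciations` is assumed to map each word to a list of phoneme symbols, `word_counts` is assumed to map each word to its corpus count, and the function names are hypothetical. The sketch also assumes that "following /w/" means adjacent within a single word, so sequences across word boundaries are not counted.

```python
from collections import Counter


def _normalize(totals):
    """Turn raw weighted counts into relative frequencies that sum to 1."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {phoneme: n / grand_total for phoneme, n in totals.items()}


def phoneme_frequencies(pronunciations, word_counts):
    """Weight every phoneme occurrence by the corpus count of its word."""
    totals = Counter()
    for word, count in word_counts.items():
        phonemes = pronunciations.get(word)
        if phonemes is None:
            continue  # word is missing from the pronouncing dictionary
        for phoneme in phonemes:
            totals[phoneme] += count
    return _normalize(totals)


def following_frequencies(pronunciations, word_counts, target="W"):
    """Relative frequencies of phonemes immediately after `target`.

    Assumption: only within-word adjacency is considered.
    """
    totals = Counter()
    for word, count in word_counts.items():
        phonemes = pronunciations.get(word, [])
        for current, following in zip(phonemes, phonemes[1:]):
            if current == target:
                totals[following] += count
    return _normalize(totals)
```

Words that appear in the frequency list but not in the dictionary are simply skipped here; a fuller version might report how much of the corpus they account for.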

Tools used

I used a combination of Python and Unix tools (grep, sed) for text processing.

Because the pipeline has many steps with dependencies between them, GNU Make was a decent fit for modeling those dependencies: after each modification to my code, I can just run make.

Limitations

As Blumeyer notes, the source datasets have some limitations. CMUdict conflates "schwa with the near-open central vowel" and has "several noticeable errors." Kilgarriff's frequency list has some formatting issues that make it hard to work with words containing accents or apostrophes, including common contractions. At this time, I've completely ignored this issue.

Blumeyer did manual error checking on several hundred of the most common words. I have not done this.

The CMUdict has multiple pronunciations for some words. For these words, I used only the first pronunciation given. It's not clear to me whether the multiple pronunciations are ordered in some meaningful way or arbitrarily.
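As a rough illustration of keeping only the first pronunciation, here is a minimal Python sketch. It assumes the CMUdict 0.7b text layout as I understand it: comment lines start with `;;;`, each entry is a word followed by its phonemes, and alternate pronunciations are marked with a numeric suffix such as `A(1)`. The function name is hypothetical, and lowercasing words and stripping stress digits are choices made for this sketch, not necessarily what the repository's scripts do.

```python
import re

# Alternate pronunciations are assumed to look like "WORD(1)", "WORD(2)", ...
ALTERNATE_SUFFIX = re.compile(r"\(\d+\)$")


def first_pronunciations(lines):
    """Map each word to the phonemes of its first listed pronunciation."""
    result = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;;"):
            continue  # skip blank lines and comments
        entry, *phonemes = line.split()
        if ALTERNATE_SUFFIX.search(entry):
            continue  # skip alternates; only the unsuffixed entry is kept
        # Assumption: stress digits (0, 1, 2) are dropped so that
        # "AH0" and "AH1" count as the same phoneme.
        stripped = [p.rstrip("012") for p in phonemes]
        result.setdefault(entry.lower(), stripped)
    return result
```

For example, given the entries `A  AH0` and `A(1)  EY1`, only the `AH` pronunciation is kept for "a".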

Other notes

While the Kilgarriff list is for the British National Corpus, a quick inspection suggests that it uses American pronunciations over British ones.

References

Doug Blumeyer, "Relative Frequencies of English Phonemes"
CMU Pronouncing Dictionary (Local copy at version 0.7b. Retrieved May 28, 2018.)
Adam Kilgarriff, word frequencies for the BNC (Local copy retrieved May 28, 2018.)

