Where can I find an "official" list of English graphemes?

Question

Do you know of a list provided by some academic institution? I did find some lists, but I am unable to judge the quality and/or completeness of these:

This PDF, referenced here,
and this PDF, referenced here.

Background: I am trying to program a random name generator for project working titles, using the approach outlined here: extract graphemes from these downloadable free corpus samples and feed them to some kind of Markov chain.
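
The extraction step could be sketched roughly as below: a greedy longest-match splitter over a fixed grapheme inventory. The `MULTI_LETTER` set and the `segment` function are hypothetical names, and the inventory is a deliberately tiny stand-in for whatever list ends up being used. This is only a sketch under those assumptions, not a linguistically reliable segmenter.

```python
# Hypothetical, deliberately tiny inventory of multi-letter graphemes.
# A real inventory would be much longer and come from a reference list.
MULTI_LETTER = {"tch", "igh", "th", "sh", "ch", "ph", "ng", "oa", "ee", "ou"}
MAX_LEN = max(len(g) for g in MULTI_LETTER)


def segment(word):
    """Split a word into graphemes by greedy longest match."""
    word = word.lower()
    out = []
    i = 0
    while i < len(word):
        # Try the longest candidate first, down to two letters.
        for size in range(min(MAX_LEN, len(word) - i), 1, -1):
            if word[i:i + size] in MULTI_LETTER:
                out.append(word[i:i + size])
                i += size
                break
        else:
            # No multi-letter grapheme starts here: emit a single letter.
            out.append(word[i])
            i += 1
    return out


print(segment("coat"))   # ['c', 'oa', 't']
print(segment("watch"))  # ['w', 'a', 'tch']
```

A purely orthographic matcher like this cannot tell when a letter pair is not acting as one unit: it would also split "coalesce" as c-oa-l-e-s-c-e. For a name generator that is probably tolerable, but it is a known limitation of the approach.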

UPDATE:

I used the Wikipedia list as suggested by @tchrist and the free COCA sample corpus referenced above. The approach worked quite well for my purposes. Here is a small random set of generated words for anyone interested:

Wanstasy, Indricis, Voformer, Colutove, Ingerstr, Tottione, Lspheres,
Umandsam, Extivelo, Pironoba, Zofiropr, Bingernt, Kitleron, Viewinef,
Juntialt, Enabbyth, Uplpofor, Everopeo, Heventri, Ntozzler, Buncener, 
Granalse, Nocosacc, Randeren, Randantu, Caredyou, Ftedowla, Ncesnarr, 
Ulilkien, Factitur, Grontoft, Noughtoo, Lackeded, Zofricsp, Viewedon, 
Tuartand, Dossions, Kifreaps, Xicatage, Evertsom, Emorever, Manksgis, 
Ponkiold, Nsualina, Atofficl, Mallitsi, Spmethir, Dayspeed, Anditout, 
Xatofrse, Izamedoo, Bupleati, Plitteni, Failitha, Hinglood, Dcoveyou

"Official" makes no sense. Furthermore, whether something is a digraph/trigraph etc. depends. OA is such in coat but not coalesce; TH is such in bathhouse but not in boathouse. There is no end of these. — tchrist, Apr 12 '15 at 17:02
@tchrist Eventually I will work with whatever I can find, including the lists provided in the question... — Reto Höhener, Apr 12 '15 at 17:57
I wonder whether the various spellings for each given phoneme listed in Wikipedia’s section on “Sound to Spelling Correspondences” in their article on English Orthography might be of any use to you for sussing out possible graphemes in English. — tchrist, Apr 12 '15 at 18:03
Ok, I’ve looked at both your PDF sources: the Wikipedia section is better than either of those. Your task is harder than you may realize. — tchrist, Apr 12 '15 at 18:12
Thanks for that - I wasn't aware of that Wikipedia list. You might want to turn that into an answer. — Reto Höhener, Apr 12 '15 at 18:38
I'm voting to close this question as off-topic because it's a resource request. — Helmar, Oct 24 '16 at 12:48

score 3 · Accepted Answer · answered Apr 12 '15 at 18:40

If you look at the various spellings for each given phoneme listed in Wikipedia’s section on “Sound to Spelling Correspondences” in their article on English Orthography, this may help.

I’ve looked at both your PDF sources: the Wikipedia section is better than either of those. Your task is harder than you may realize.

answered Apr 12 '15 at 18:40 by tchrist

I ended up using this Wikipedia list. It worked quite well for my purposes. Thanks again. – Reto Höhener Apr 12 '15 at 23:40

@Zalumon Your results are quite good, and I bet you could make them even better. Some of the initial and final sequences don't work. If you include a special empty element to represent the beginning and end of the word when feeding your Markov chains, that issue might go away. – tchrist Apr 13 '15 at 03:20
Yes, I see your point. Currently I treat all graphemes identically. I should definitely have separate probabilities for word boundaries. – Reto Höhener Apr 13 '15 at 13:48
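
The boundary-marker idea from the comments above might look something like this. It is a minimal first-order sketch, assuming the words have already been split into graphemes. `START`, `END`, `train` and `generate` are hypothetical names chosen for illustration, not anything used in the actual generator.

```python
import random
from collections import defaultdict

# Hypothetical boundary symbols; any values that cannot occur as
# graphemes would do.
START, END = "<s>", "</s>"


def train(words):
    """Count first-order transitions, including the word boundaries."""
    counts = defaultdict(lambda: defaultdict(int))
    for units in words:
        seq = [START, *units, END]
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def generate(counts, rng, max_units=12):
    """Walk the chain from START until END is drawn or a length cap is hit."""
    out = []
    state = START
    while len(out) < max_units:
        choices = counts[state]
        state = rng.choices(list(choices), weights=list(choices.values()))[0]
        if state == END:
            break
        out.append(state)
    return "".join(out)


# Tiny illustrative training set (already segmented into graphemes).
corpus = [["c", "oa", "t"], ["b", "oa", "t"], ["sh", "ee", "t"]]
model = train(corpus)
print(generate(model, random.Random()))
```

Because every word is wrapped in `START` and `END`, a generated word can only begin with a grapheme that began some training word, and it can only stop after a grapheme that ended one. That should address sequences that are fine mid-word but odd at the edges. The `max_units` cap can still cut a word short, so a real version might reject capped words instead of returning them.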

score 0 · Answer 2 · edited Jan 15 '16 at 01:23

I see what you mean: there are officially more than 44 once both American and British English are counted, but I don't know where to find the full list, and I'm looking for it as well. In the meantime, check page R45 of the Oxford Advanced Learner's Dictionary; you'll find a list there with one example each. If you don't want to use a dictionary, then I'm sorry, I have nothing else to suggest.

edited Jan 15 '16 at 01:23 by jimm101

answered Jan 14 '16 at 19:41 by user155484
