
I am trying to break a word down into syllables and am not quite sure how to do it for English. Some problems I face:

  1. The letter-to-sound rules are not one-to-one. For example, z"ea"l and "ee"l are spelled differently but contain the same vowel sound.

  2. English has a large number of foreign words incorporated into it, which makes "sticking" to a certain set of rules all the more difficult.

So my questions are:
1. Are there any definite rules for breaking an English word down into its constituent syllables (CV, CVV, CCV, etc.)?
2. What is the "gold standard" (something used by a majority of the community) for this?

JSBձոգչ
Sriram
  • For what purpose are you doing this? There is no single answer, and you might get very different answers if you are thinking about typesetting or about phonetic analysis. – Colin Fine Nov 16 '11 at 12:09
  • For example, my American Heritage Dictionary gives ra-tion-al for line-breaking during typesetting, and rash-ə-nəl for the pronunciation. – Peter Shor Nov 16 '11 at 13:45
  • No answer, but I love your word syllabification. I'm picturing one of those weird cartoon machines from Schoolhouse Rock as The Syllabificator. – T.E.D. Nov 16 '11 at 14:42
  • @T.E.D., syllabification is a perfectly cromulent linguistic term: http://en.wikipedia.org/wiki/Syllabification – JSBձոգչ Nov 16 '11 at 17:19
  • @ColinFine: I am trying to build a TTS (text-to-speech) engine. One way to do that is to break a word into its constituent syllables and then synthesize it syllable by syllable. – Sriram Nov 17 '11 at 05:38
  • @T.E.D: Even I was pretty surprised when I realized that the word I had been using for so long was actually a word. :) – Sriram Nov 17 '11 at 05:39
  • I hope you are actually naming it The Syllabificator :-) – T.E.D. Nov 17 '11 at 14:18
  • @Sriram - in that case you need to make it clear that you are interested in phonetic, not orthographic syllabification, as some of the answers below are irrelevant. – Colin Fine Nov 18 '11 at 11:31
  • @Colin: Correct me if I am wrong, but isn't all syllabification in English done on phonetic transcriptions? Even when we do it manually, we say the word before we break it into syllables. This is especially so for a language like English, where the letter-to-sound rules are not one-to-one. Please correct me if I am wrong. – Sriram Nov 23 '11 at 12:07

5 Answers


The TeX typesetting system (used mostly by mathematicians) incorporates a syllable-breaking algorithm for English. For more information, you can probably ask in https://tex.stackexchange.com/
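TeX's hyphenation is based on Frank Liang's pattern-matching algorithm. Short letter patterns carry digits in the gaps between letters, the highest digit found at each gap wins, and an odd digit marks an allowed break. The Python sketch below reproduces that mechanism with a tiny hand-picked pattern list (an assumption that is just enough for the one word shown). The real US English pattern set has thousands of entries generated from a word list, which is why, as the comment below notes, words outside that list can come out wrong. The `left_min=2` and `right_min=3` defaults mimic TeX's usual minimum fragment lengths. Note that this is orthographic hyphenation, not phonetic syllabification.

```python
import re

# Tiny hand-picked pattern set (an assumption, enough for "hyphenation" only);
# a digit between letters scores that gap, and odd scores allow a break.
PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"]


def parse_pattern(pattern):
    """Split a pattern such as 'hen5at' into its letters and gap digits."""
    letters = re.sub(r"\d", "", pattern)
    digits = [0] * (len(letters) + 1)  # one slot per gap, including both ends
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            digits[pos] = int(ch)
        else:
            pos += 1
    return letters, digits


def hyphenate(word, patterns=PATTERNS, left_min=2, right_min=3):
    """Liang-style hyphenation: keep the highest digit at each gap; odd = break."""
    padded = "." + word.lower() + "."  # dots mark the word edges, as in TeX
    scores = [0] * (len(padded) + 1)   # scores[j] is the gap before padded[j]
    for pattern in patterns:
        letters, digits = parse_pattern(pattern)
        start = padded.find(letters)
        while start != -1:  # apply every (possibly overlapping) occurrence
            for k, d in enumerate(digits):
                scores[start + k] = max(scores[start + k], d)
            start = padded.find(letters, start + 1)
    pieces, last = [], 0
    for i in range(left_min, len(word) - right_min + 1):
        if scores[i + 1] % 2 == 1:  # gap before word[i]; +1 skips the leading dot
            pieces.append(word[last:i])
            last = i
    pieces.append(word[last:])
    return "-".join(pieces)


print(hyphenate("hyphenation"))  # hy-phen-ation
```

With these patterns, `hyphenate("hyphenation")` returns `"hy-phen-ation"`, and a word that no pattern touches, such as `"cat"`, comes back unbroken.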

GEdgar
  • The TeX typesetting system (if it's the same as when I looked at it) has been tuned to hyphenate all English words in some dictionary with a minimal amount of memory and computing. It works great for words in its original dictionary, but can occasionally fail miserably on words that weren't included. – Peter Shor Nov 16 '11 at 18:48
  • Phonetic syllabification and typographical hyphenation are not the same thing. The latter also takes into account etymology and stress. – Toothrot Mar 26 '16 at 23:50

English syllabification differs from that of many other languages, in which syllables follow a C*VN* pattern.

C*VN* = consonants + vowel + nasal (an asterisk means "zero or more")
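To make the notation concrete, here is a minimal Python sketch that maps a syllable's phonemes to a C/V/N skeleton and tests it against C*VN*. It reads the pattern as any onset consonants (nasals included), one vowel, and a coda made only of nasals. That reading and the tiny phoneme sets are assumptions for illustration.

```python
import re

# Toy phoneme classes (assumptions for illustration, far from complete).
VOWELS = {"a", "e", "i", "o", "u", "ɪ", "ɛ", "æ", "ʌ", "ɒ", "ʊ", "ə"}
NASALS = {"m", "n", "ŋ"}


def skeleton(phonemes):
    """Map each phoneme to V (vowel), N (nasal) or C (other consonant)."""
    return "".join(
        "V" if p in VOWELS else "N" if p in NASALS else "C" for p in phonemes
    )


def fits_cvn(phonemes):
    """Onset of any consonants (nasals included), one vowel, nasal-only coda."""
    return re.fullmatch(r"[CN]*VN*", skeleton(phonemes)) is not None


print(skeleton(["s", "t", "r", "ɛ", "ŋ", "θ"]))  # CCCVNC
print(fits_cvn(["k", "a", "n"]))                 # True
print(fits_cvn(["k", "æ", "t"]))                 # False: /t/ is not a nasal
```

A syllable like /kan/ fits the template, while English /kæt/ and /strɛŋθ/ do not, because their codas contain non-nasal consonants.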

L2 speakers (people speaking English as a second language) have an accent because they apply their L1 (native-language) syllabification to English words, and because of the way they map English CV patterns onto L1 monosyllables.

In English, stress, phonotactics and formatives play a crucial role in syllabification. In rapid speech, phonotactic constraints can be violated.

Here is a paper of interest by Charles-James N. Bailey, "Evidence for variable syllabic boundaries in English":

http://goo.gl/kkQUb

RainDoctor

The procedures for determining syllable boundaries can be rather complex. The starting point is what is known as the Maximal Onset Principle: where there is a choice about where to place a consonant, it goes into the onset rather than the coda, that is, at the beginning of the following syllable rather than at the end of the preceding one. The principle applies only where phonotactic constraints permit. These do not allow a stressed syllable to end with a short vowel, and they do not allow a syllable to begin or end with a consonant cluster that is not found at the beginning or end of an English word.
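The principle lends itself to a simple procedure: find the vowel nuclei, then, for each consonant cluster between two nuclei, give the following syllable the longest stretch that is a legal English onset. Below is a minimal Python sketch, assuming the word is already a list of phoneme symbols with diphthongs as single symbols. `VOWELS` and `LEGAL_ONSETS` are small illustrative subsets, not complete inventories, and the short-vowel constraint is not modelled.

```python
# Illustrative subsets only (assumptions); a real system needs full inventories.
VOWELS = {"ɪ", "ɛ", "æ", "ʌ", "ɒ", "ʊ", "ə", "iː", "uː", "eɪ", "aɪ", "əʊ"}
LEGAL_ONSETS = {  # clusters that can begin an English word
    (), ("p",), ("t",), ("k",), ("s",), ("r",), ("l",), ("w",), ("m",), ("n",),
    ("p", "r"), ("t", "r"), ("k", "r"), ("p", "l"), ("s", "t"), ("s", "t", "r"),
}


def syllabify(phonemes):
    """Split a list of phonemes into syllables by the Maximal Onset Principle."""
    nuclei = [i for i, p in enumerate(phonemes) if p in VOWELS]
    cuts = []
    for left, right in zip(nuclei, nuclei[1:]):
        # Try the whole intervocalic cluster as an onset first, then shorter
        # tails; the empty onset () always succeeds, so a cut is always found.
        for cut in range(left + 1, right + 1):
            if tuple(phonemes[cut:right]) in LEGAL_ONSETS:
                cuts.append(cut)
                break
    bounds = [0] + cuts + [len(phonemes)]
    return [phonemes[a:b] for a, b in zip(bounds, bounds[1:])]


def show(phonemes):
    """Join the syllables with the conventional dot, e.g. 'weɪ.trɪs'."""
    return ".".join("".join(s) for s in syllabify(phonemes))


print(show(["w", "eɪ", "t", "r", "ɪ", "s"]))  # weɪ.trɪs
print(show(["ɛ", "k", "s", "t", "r", "ə"]))   # ɛk.strə
```

For 'waitress' this gives /weɪ.trɪs/, the dictionary syllabification discussed in the comments, and for 'extra' it gives /ɛk.strə/, because /kstr/ is not a legal onset but /str/ is.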

Barrie England
  • What I see in your answer applies in languages where the rules are clear and definite. Is there some reference where I can find rules for this process concerning the English language? I've never come across such rules, and I've always been advised to split words only after I've consulted a good dictionary, since (to quote the words of my teachers) "There are no fixed rules to split words in English". – Irene Nov 16 '11 at 13:12
  • @Irene: You’d really need to consult a specialist publication such as Peter Roach’s ‘English Phonetics and Phonology’ http://www.amazon.com/English-Phonetics-Phonology-Paperback-Audio/dp/052171740X/ref=sr_1_2?ie=UTF8&qid=1321449556&sr=8-2 There are a few pages on the topic in ‘Linguistics: An Introduction’ by Andrew Radford and others http://www.amazon.com/Linguistics-Introduction-Andrew-Radford/dp/0521849489/ref=sr_1_8?s=books&ie=UTF8&qid=1321449748&sr=1-8. – Barrie England Nov 16 '11 at 13:23
  • Thank you for answering. From what I understand, schoolchildren don't learn such rules in English-speaking schools, just as I didn't learn any. I'll look this information up. – Irene Nov 16 '11 at 13:37
  • @Barrie: you also have to take into account that syllabification generally respects morphemes. For example, waitress is generally pronounced wait-ress rather than way-tress. – Peter Shor Nov 16 '11 at 13:52
  • @Irene: I wouldn't really expect them to. It's quite an advanced topic. – Barrie England Nov 16 '11 at 13:52
  • @Peter Shor: I’d say, rather, that the three morphemes in ‘waitress’ are ‘wait’, ‘(e)r’ and ‘ess’. In phonetic terms, the syllabification of ‘waitress’ is /weɪ.trɪs/ (the dot is conventionally used to show the division). That’s because the Maximal Onset Principle places the /t/ in the onset of the second syllable rather than the coda of the first syllable, and there are no phonotactic constraints that prevent it from doing so. – Barrie England Nov 16 '11 at 14:11
  • @BarrieEngland: Absolutely. But this isn't the case with other languages. Thanks again. – Irene Nov 16 '11 at 14:13
  • @Barrie: the dictionaries say you're right: /weɪ.trɪs/. But are they using the maximal onset principle to get this syllable division, or are they listening to people? Because that's not the way I pronounce it; I say /weɪt.rɪs/ but (e.g.) /peɪ.tri.ət/. And I bet I'm not the only one who pronounces it this way. (Although let me retract the claim in my previous comment and admit I have no idea which pronunciation is most common.) – Peter Shor Nov 16 '11 at 15:16
  • @PeterShor: A rough and ready way to identify syllables is by singing the word. Where you change the note, you change the syllable. – Barrie England Nov 16 '11 at 16:39
  • @Barrie: but don't people sing words differently than they say them? – Peter Shor Nov 16 '11 at 16:49
  • @PeterShor: Of course, but if you were setting 'waitress' to music, you would have to spread it over two notes, let us say G and C. Consider what you would sing on the first note. Would it be 'wait' or would it be 'wai'? It's important to remember that in linguistics syllables are phonetic concepts, not lexical ones. – Barrie England Nov 16 '11 at 17:06
  • Maybe 'waitress' wasn't a good example. But my basic point is that morpheme boundaries play a role in syllabification; consider 'Whitestone' and 'Hightstown'. – Peter Shor Nov 17 '11 at 13:41
  • @PeterShor: I’ll have to concede the point. Where phonetic and morphemic considerations conflict, the morphemic ones will sometimes prevail. As the British linguist David Crystal has written of syllable boundaries, ‘English is full of cases where alternative analyses are possible.’ In your examples, there are no phonotactic constraints for ‘White.stone’, but I’m not sure whether the morphemes are ‘White’ and ‘stone’ or ‘Whites’ and ‘tone’. The MOP would sanction ‘Hight.stown’, but I agree that in that case morphemic considerations will produce ‘Hights.town’. – Barrie England Nov 17 '11 at 15:39
  • In the placename "Kingswinford", moving the morpheme boundary actually changes which phoneme the "s" represents: it's voiced if it attaches to the previous syllable (and morpheme) but unvoiced if it attaches to the following one. Since I have hardly ever heard it pronounced, I actually do not know which the locals say. Note, Sriram, that this is an example, albeit a proper name, where no algorithm will work: even a native English speaker (me) is unsure not just of the syllabification, but of the pronunciation. – Colin Fine Nov 18 '11 at 12:00
  • @Colin: Whitestone, a neighborhood of Queens, NY, is an example where the morpheme boundary seems to have moved. It was apparently originally White's Town, but if you listen to New York traffic reports on the radio, you hear about congestion on the White.stone Bridge. – Peter Shor Nov 18 '11 at 12:31
  • @PeterShor: Then the vowel in the second syllable would have changed too, I imagine, from /ə/ to /əʊ/ and there would be a more even stress between the two syllables. – Barrie England Nov 18 '11 at 13:44
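The exchange above suggests one way to combine the two considerations: treat morpheme boundaries as fixed syllable breaks and apply the Maximal Onset Principle only inside each morpheme. A minimal Python sketch under that assumption follows. The morpheme boundaries must be supplied by the caller (for example, by a hypothetical morphological analyser), the phoneme sets are toy subsets, and every morpheme is assumed to contain a vowel. It does not decide when morphology should win, which, as the Kingswinford example shows, may not be decidable from the spelling at all.

```python
# Toy inventories (assumptions), just enough for the place-name below.
VOWELS = {"aɪ", "aʊ", "əʊ", "ɪ", "ə"}
LEGAL_ONSETS = {(), ("h",), ("t",), ("s",), ("w",), ("n",), ("s", "t")}


def mop(phonemes):
    """Maximal Onset Principle inside one morpheme (assumed to contain a vowel)."""
    nuclei = [i for i, p in enumerate(phonemes) if p in VOWELS]
    cuts = [
        # earliest cut whose remainder is a legal onset = longest legal onset
        next(c for c in range(left + 1, right + 1)
             if tuple(phonemes[c:right]) in LEGAL_ONSETS)
        for left, right in zip(nuclei, nuclei[1:])
    ]
    bounds = [0] + cuts + [len(phonemes)]
    return [phonemes[a:b] for a, b in zip(bounds, bounds[1:])]


def syllabify(phonemes, morpheme_breaks=()):
    """Treat morpheme boundaries as fixed breaks, then apply MOP within each."""
    syllables, start = [], 0
    for end in list(morpheme_breaks) + [len(phonemes)]:
        syllables.extend(mop(phonemes[start:end]))
        start = end
    return ".".join("".join(s) for s in syllables)


HIGHTSTOWN = ["h", "aɪ", "t", "s", "t", "aʊ", "n"]
print(syllabify(HIGHTSTOWN))                        # haɪt.staʊn (pure MOP)
print(syllabify(HIGHTSTOWN, morpheme_breaks=(4,)))  # haɪts.taʊn (Hights + town)
```

Without boundary information the sketch yields the MOP answer ‘Hight.stown’; given a break after ‘Hights’, it yields ‘Hights.town’, matching the analysis in the comments.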

I guess you could say the word and see how many separate little breaths (efforts) are needed to say it, not counting a final consonant, such as the last burst of breath from the 't' when you say 'eat'. 'Zeal' and 'eel' require one. 'Appeal' has two, 'reconsider' has four, and so on.

The word's language of origin doesn't have much to do with it, really; if I'm wrong, I'd like to be corrected on this.

Where you need to pay attention is with diphthongs and triphthongs: they're not really separate breaths, but shifts in sound within a single effort.
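To automate this breath-counting heuristic, one option is to count vowel nuclei in a phonemic transcription, matching the longest vowel symbol first so that a diphthong or triphthong counts only once. The minimal Python sketch below assumes IPA input and an incomplete, illustrative vowel list. It ignores syllabic consonants (as in some pronunciations of 'bottle') and simply assumes that a triphthong such as /aɪə/ is a single nucleus, which is itself debatable.

```python
# Incomplete, illustrative IPA vowel list (an assumption). Sorting longest first
# lets a triphthong or diphthong win over the single vowels it starts with.
VOWEL_SYMBOLS = sorted(
    ["aɪə", "aʊə", "eɪ", "aɪ", "ɔɪ", "əʊ", "aʊ", "ɪə", "eə", "ʊə",
     "iː", "uː", "ɑː", "ɔː", "ɜː", "ɪ", "e", "æ", "ʌ", "ɒ", "ʊ", "ə", "i", "u"],
    key=len,
    reverse=True,
)


def count_syllables(ipa):
    """Count vowel nuclei in an IPA string; consonants and marks are skipped."""
    count, i = 0, 0
    while i < len(ipa):
        match = next((v for v in VOWEL_SYMBOLS if ipa.startswith(v, i)), None)
        if match:
            count += 1
            i += len(match)
        else:
            i += 1  # a consonant, stress mark or other non-vowel symbol
    return count


for word, ipa in [("zeal", "ziːl"), ("eel", "iːl"), ("appeal", "əpiːl"),
                  ("reconsider", "riːkənsɪdə"), ("fire", "faɪə")]:
    print(word, count_syllables(ipa))
```

With these British-style transcriptions the counts are 1 for 'zeal' and 'eel', 2 for 'appeal' and 4 for 'reconsider', in line with the counts above; 'fire' counts as 1 because /aɪə/ is matched as one triphthong.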

Akin
  • It is easy when you say a word, but my task is to automate it, and therein lies the need for a set of clear rules. – Sriram Nov 16 '11 at 14:28
  • Which suggests you are talking about speech, not writing, but others have answered in terms of writing. Please make your question clear. – Colin Fine Nov 16 '11 at 15:40
  • @Sriram: what exactly do you need to do, and why do you need to automate it, rather than (say) using a dictionary and looking it up? – Peter Shor Nov 16 '11 at 16:34

There are no definite rules for breaking English words into syllables. There are rules in other languages, for example French and Greek, in which you are allowed to split words at the end of a line. In English, however, you are advised never to split a word into syllables when writing. If you need to do so, consult a good dictionary, which will show you the syllables of the word in question.

Irene
  • Of course you're allowed to split a word at the end of a line while writing in English, and of course there are rules for how to do so. It's just that it's easier to look up the syllabification in the dictionary than to learn the rules. And for some words, the rules are ambiguous, which means that somebody has to make a judgment call, and the lexicographers don't want to leave this to us amateurs (but in these cases, it's probably better not to hyphenate the word anyway). – Peter Shor Nov 16 '11 at 12:27
  • @PeterShor: Any reference where I can find these rules? I found the information I give in my answer in grammar books and dictionaries. – Irene Nov 16 '11 at 12:46
  • I figured out some of them after receiving galley proofs typeset in Hungary with some truly terrible line breaks: always break at morpheme boundaries, never break a word after a stressed short vowel, and choose the earliest line break consistent with English phonotactics (as Barrie says in his answer). For example, the pronunciation of 'a' makes 'rather' hyphenate differently in the U.S. and the U.K. I can't answer what to do when these rules conflict with each other (ra-tion-al), or why American dictionaries count the ear vowel as long but the air vowel as short (wea-ry vs. char-i-ty). – Peter Shor Nov 18 '11 at 23:25
  • Thank you. Unfortunately, they don't help much without a good dictionary (which isn't the case for other languages; that's why I talked about the lack of specific rules in English), but some of them can be of use. – Irene Nov 19 '11 at 07:01
  • I think you can figure out the hyphenation of over 90% of words just from the rules and their pronunciation. But then you run into questions like "Is 'term' a morpheme in ter-mi-nal?" and problems like the fact that do-ry and cor-al are hyphenated differently because the vowel 'o' in them used to be pronounced differently, even though they are now generally pronounced identically by Americans. In my opinion, if you need to know how words were pronounced a century ago to hyphenate them, you should change the hyphenation system. The OED won't hyphenate anywhere near an r-influenced vowel. – Peter Shor Nov 19 '11 at 11:55