
Here is the Chinese version of this article.

Zhuyin (Chinese: 注音, Pinyin: zhù yīn, Zhuyin: ㄓㄨˋ ㄧㄣ), also known as Bopomofo (a name that comes from its first four characters), is a Chinese transcription system capable of transcribing all the sounds of standard Mandarin Chinese, although it can also be used to transcribe other varieties of Chinese such as Taiwanese Hokkien. First introduced by the then Chinese government in the 1910s, Zhuyin played an important role in documenting the sounds of the most prominent logographic language and later served as an electronic input method for Chinese. Zhuyin is technically a semisyllabary: the characters for the consonants at the start of words function akin to an alphabet, but the word codas (some vowels and ending consonants) are grouped together and represented by distinct characters, much as in syllabic scripts such as Devanagari (the Hindi script) or Hiragana (the primary Japanese script). After the formation of the People’s Republic of China, Zhuyin was replaced there by Pinyin, although it is still used as a reference.

Today, Zhuyin is actively used only in Taiwan. Mainland Chinese people are usually uninterested in learning Zhuyin, as it offers little benefit for their daily use of Chinese. In fact, learning Zhuyin is often discouraged on the Chinese internet as well.[1] Although I had always had a faint idea of Zhuyin, my first real encounter with it was last Christmas, when a Taiwanese family friend came to visit and had an in-depth conversation with me about it, after which I began looking for online resources for learning Zhuyin. To my astonishment, the internet had very little to offer. Interestingly, I was flooded with Taiwanese tutorials teaching students how to convert Zhuyin into Pinyin. There were some Zhuyin resources for non-Chinese speakers; however, I found no resources for Chinese speakers who are already familiar with Pinyin. Furthermore, as I learned more about Zhuyin, I found that its script derives from ancient Chinese characters and the Chinese tradition rather than the Latin alphabet, and that it consequently provides a more intuitive way of thinking about Chinese sounds, without the unnecessary complexities of Pinyin. Hence, I set out to figure out the system myself, aiming for a general method for translating Pinyin to Zhuyin and the ability to spell out Chinese characters in Zhuyin by sound.[2]

This post is meant to be a guide and introduction to Zhuyin, organized in the way I approached it. It is meant for Chinese speakers with a solid knowledge of Pinyin, although someone new to Chinese might still benefit from learning the characters and the sounds in the introductory section. The first section goes over the basics of Zhuyin, and the second section presents the Pinyin-to-Zhuyin translation rules that I devised after some analysis.

The Basics

Zhuyin is a semisyllabic script: it has consonants, called initials, that go at the start of words, and vowel groups that follow the consonants and make up the vocalized part of the word. The vowels split into two main types: the semivowel sounds i, u, and ü (see below), called medials, and the other vowels, possibly including codas such as n and ng, called finals. As the name suggests, finals always come after medials. The cover image has a complete chart of Zhuyin characters (initials are coloured blue, medials are grey, and finals are orange). Here is a list of Zhuyin characters.[3]

Consonants (Initials)

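| Zhuyin | IPA | Pinyin Equivalent |
| --- | --- | --- |
| ㄅ | p | b |
| ㄆ | pʰ | p |
| ㄇ | m | m |
| ㄈ | f | f |
| ㄉ | t | d |
| ㄊ | tʰ | t |
| ㄋ | n | n |
| ㄌ | l | l |
| ㄍ | k | g |
| ㄎ | kʰ | k |
| ㄏ | x | h |
| ㄐ | t͡ɕ | j |
| ㄑ | t͡ɕʰ | q |
| ㄒ | ɕ | x |
| ㄓ | ʈ͡ʂ | zh |
| ㄔ | ʈ͡ʂʰ | ch |
| ㄕ | ʂ | sh |
| ㄖ | ʐ/ɻ | r |
| ㄗ | t͡s | z |
| ㄘ | t͡sʰ | c |
| ㄙ | s | s |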

The consonants are organized in the cover image above by consonant type, which is a useful way to reference and memorize them. All Chinese students learn sounds (Pinyin or Zhuyin) in a similar order that preserves the adjacency of these column groups.
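As a small illustration, the initials table can be written as a lookup. This is only a minimal sketch in Python; the names `INITIALS` and `split_initial` are my own hypothetical choices, and the function assumes a toneless, lowercase Pinyin syllable. It tries the two-letter initials (zh, ch, sh) before the one-letter ones.

```python
# Pinyin initial -> Zhuyin initial, following the table above.
INITIALS = {
    "b": "ㄅ", "p": "ㄆ", "m": "ㄇ", "f": "ㄈ",
    "d": "ㄉ", "t": "ㄊ", "n": "ㄋ", "l": "ㄌ",
    "g": "ㄍ", "k": "ㄎ", "h": "ㄏ",
    "j": "ㄐ", "q": "ㄑ", "x": "ㄒ",
    "zh": "ㄓ", "ch": "ㄔ", "sh": "ㄕ", "r": "ㄖ",
    "z": "ㄗ", "c": "ㄘ", "s": "ㄙ",
}


def split_initial(syllable):
    """Split a toneless Pinyin syllable into (initial, rest).

    Two-letter initials are tried first so that "zh" is not read as "z".
    Syllables that start with a vowel (or with y/w) get an empty initial.
    """
    for length in (2, 1):
        head = syllable[:length]
        if head in INITIALS:
            return head, syllable[length:]
    return "", syllable


if __name__ == "__main__":
    initial, rest = split_initial("zhuang")
    print(initial, INITIALS[initial], rest)
```

Note that y and w are deliberately absent from the lookup: they are spelling devices of Pinyin rather than initials.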

Vowels

The first category of vowels is the semivowels, which comprise the vowels i, u, and ü (similar to the German pronunciation) and go between initial and final sounds.[4] However, they can also occur at the start or end of words, just never before an initial or after a final. Note that the vowel “ㄧ” is written as a horizontal bar when Zhuyin is written vertically (as it is traditionally).

| Zhuyin | IPA | Pinyin Equivalent |
| --- | --- | --- |
| ㄧ | j/i | y/i |
| ㄨ | w/u | w/u |
| ㄩ | ɥ/y | yu/u |

Then, we further divide the remaining vowels into two categories. First, there are the simple vowels (finals).

| Zhuyin | IPA | Pinyin Equivalent |
| --- | --- | --- |
| ㄚ | ä | a |
| ㄛ | o | o |
| ㄜ | ɤ | e |
| ㄝ | ɛ | e |

Note that Pinyin has an ambiguity with its vowel “e”, as it doesn’t distinguish between two very different sounds (the ㄜ sound in 的 de and the ㄝ sound in 也 ye). You have to tell them apart from the vowel cluster in which the e appears. Lastly, there are the compound vowels (finals).

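| Zhuyin | IPA | Pinyin Equivalent |
| --- | --- | --- |
| ㄞ | ai̯ | ai |
| ㄟ | ei̯ | ei |
| ㄠ | ɑu̯ | ao |
| ㄡ | ou̯ | ou |
| ㄢ | an | an |
| ㄣ | ən | en |
| ㄤ | ɑŋ | ang |
| ㄥ | ɤŋ | eng |
| ㄦ | ɑɻ | er |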

Tip for memorization: To learn Zhuyin you must know all of these characters by heart. It is recommended that you learn the correct stroke order, so you will not need to spend more time fixing incorrect habits later on.
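The ambiguity of Pinyin “e” mentioned above can be resolved mechanically. The sketch below is my own illustration, not a standard routine: the function name `e_sound` is hypothetical, and it assumes a toneless, lowercase Pinyin syllable that ends in a bare e (so not ei, en, eng, or er, which have their own Zhuyin characters). The idea is that a bare e is ㄝ when it directly follows i, y, u, or ü, and ㄜ otherwise.

```python
U_UMLAUT = "\u00fc"  # the letter ü, written as an escape to avoid encoding surprises


def e_sound(syllable):
    """Return the Zhuyin character for the bare final 'e' of a Pinyin syllable.

    Assumption: the syllable is toneless, lowercase, and ends in 'e'
    (for example "de", "ye", "xie", "yue", "jue").
    """
    if not syllable.endswith("e"):
        raise ValueError("expected a syllable ending in a bare 'e'")
    previous = syllable[-2:-1]  # empty string when the syllable is just "e"
    if previous in ("i", "y", "u", U_UMLAUT):
        return "ㄝ"  # as in 也 ye
    return "ㄜ"      # as in 的 de


if __name__ == "__main__":
    for s in ("de", "ye", "xie", "yue", "e"):
        print(s, e_sound(s))
```

This works because, in standard Pinyin, “ue” after j, q, x, or y is always a spelling of üe, so a u before a bare e never stands for ㄨ.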

Tones

Standard Mandarin Chinese has five tones (the “fifth tone” being the neutral tone), so Zhuyin developed notation to distinguish between them. Unlike Pinyin, which puts the tone mark above the vowel letter with the least lexicographical value in the spelling (with exceptions!), Zhuyin usually places the tone mark to the side of the middle or final vowel character in vertically aligned writing and after the spelling in horizontally aligned writing. Pinyin leaves the neutral tone unmarked, whereas Zhuyin leaves the first tone unmarked and uses a dot to denote the neutral tone instead. For completeness, here are the tone markers.

| Tone | 1 | 2 | 3 | 4 | Neutral |
| --- | --- | --- | --- | --- | --- |
| Zhuyin | (none) | ˊ | ˇ | ˋ | ˙ |
| Pinyin | ¯ | ˊ | ˇ | ˋ | (none) |

See an example from the Taiwanese Mandarin Daily Dictionary below.

Example of Zhuyin with tone markers in text
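To make the tone table concrete, here is a minimal Python sketch that reads the tone off a Pinyin spelling with a diacritic and looks up the matching Zhuyin marker. The names `split_tone` and `ZHUYIN_TONES` are my own, and the approach (Unicode decomposition via the standard `unicodedata` module) is just one way to do it. An unmarked Pinyin syllable is treated as neutral, which mirrors the defaults described above.

```python
import unicodedata

# Combining marks used by Pinyin, mapped to tone numbers.
PINYIN_TONE_MARKS = {
    "\u0304": 1,  # macron
    "\u0301": 2,  # acute accent
    "\u030c": 3,  # caron
    "\u0300": 4,  # grave accent
}

# Zhuyin tone markers; the first tone is unmarked, the neutral tone (5) is a dot.
ZHUYIN_TONES = {1: "", 2: "\u02ca", 3: "\u02c7", 4: "\u02cb", 5: "\u02d9"}


def split_tone(pinyin):
    """Return (toneless_spelling, tone_number) for one Pinyin syllable.

    The diaeresis of ü is a different combining mark, so it is kept.
    """
    tone = 5  # no tone mark in Pinyin means the neutral tone
    kept = []
    for ch in unicodedata.normalize("NFD", pinyin):
        if ch in PINYIN_TONE_MARKS:
            tone = PINYIN_TONE_MARKS[ch]
        else:
            kept.append(ch)
    return unicodedata.normalize("NFC", "".join(kept)), tone


if __name__ == "__main__":
    base, tone = split_tone("zhu\u00e0ng")  # zhuàng
    print(base, tone, repr(ZHUYIN_TONES[tone]))
```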

Structure

Zhuyin is an efficient system because any word in Mandarin Chinese takes at most three characters to transcribe (four if you count the tone marker), compared to as many as six letters in Pinyin (e.g. 壯 zhuàng vs ㄓㄨㄤˋ). A Zhuyin spelling always follows the order initial-medial-final-tone.

Of course, any of them may be omitted: for example, the tone mark when the word is in the first tone, the initial when the word begins with a vowel sound, and so on, though each word must have at least one character. This design also simplifies Zhuyin because the user knows where each character can go (based on its type), greatly reducing the chance of spelling errors.[5]
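The fixed initial-medial-final-tone order lends itself to a simple check. The following Python sketch is an assumption-laden illustration: `parse_zhuyin` is a hypothetical helper of mine that only verifies the ordering of character types with a regular expression. It does not check whether the combination is an actual Mandarin syllable.

```python
import re

# Each slot is optional, but the slots must appear in this order.
SYLLABLE = re.compile(
    r"^(?P<initial>[ㄅㄆㄇㄈㄉㄊㄋㄌㄍㄎㄏㄐㄑㄒㄓㄔㄕㄖㄗㄘㄙ])?"
    r"(?P<medial>[ㄧㄨㄩ])?"
    r"(?P<final>[ㄚㄛㄜㄝㄞㄟㄠㄡㄢㄣㄤㄥㄦ])?"
    r"(?P<tone>[\u02ca\u02c7\u02cb\u02d9])?$"
)


def parse_zhuyin(spelling):
    """Return (initial, medial, final, tone) or None if the order is violated.

    Missing slots come back as empty strings. A spelling made of a tone
    marker alone is rejected, since at least one character is required.
    """
    match = SYLLABLE.match(spelling)
    if match is None:
        return None
    initial, medial, final, tone = (match.group(name) or "" for name in
                                    ("initial", "medial", "final", "tone"))
    if not (initial or medial or final):
        return None
    return initial, medial, final, tone


if __name__ == "__main__":
    print(parse_zhuyin("ㄓㄨㄤ\u02cb"))  # 壯
    print(parse_zhuyin("ㄤㄨ"))          # final before medial: rejected
```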

Pinyin to Zhuyin Conversion

The rest of the article assumes familiarity with the Chinese language as well as the Pinyin spelling system.

Notice that Pinyin was amongst the information presented above for each Zhuyin character. Once you remember all the canonical Zhuyin-Pinyin conversions that I provided above, you should be able to convert a majority of the possible Chinese word sounds between the two systems. However, there are still many edge cases of which we need to take care. Unfortunately, almost all of these edge cases are due to phonetic swaps or simplifications in Pinyin. For example, Pinyin does not like semivowels at the start of words, so words such as 翁 ueng become weng and 园 üan become yuan in Pinyin. Also, Pinyin simplifies many vowel clusters and deletes vowels when they are not needed to distinguish between sounds (since Chinese is not very densely populated when it comes to sound variety). For instance, 同, spelt ㄊㄨㄥˊ (tueng), becomes tong in Pinyin.[6]

However, this is a little harder to do in reverse, i.e. from Pinyin to Zhuyin, because you do not intuitively know when to add back an extra vowel or switch an “o” for a “ue.” My goal was to generalize all of these patterns for standard Mandarin Chinese (other dialectal pronunciations may vary in notation in both Pinyin and Zhuyin, and moreover I am not proficient enough in any other dialect of Chinese to provide apt analysis). The method I chose was brute force. After a bit of web surfing, I came across this website with a list of (almost) all Pinyin combinations in standard Mandarin Chinese. This was good enough to begin with, and I moved on to pasting all the Zhuyin into alternating rows for comparison. This was extremely easy to do, as the table was organized by initial consonant and the rest of the word (vowel group or “rhyme”), and while Pinyin has many simplification rules, as mentioned above, Zhuyin has no such complications. The whole operation amounted to a teaspoon of Excel magic: typing out each Zhuyin consonant (initial) and vowel group (medial+final) in its respective column or row and concatenating them in the appropriate order and cells.
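The concatenation step can be mimicked outside a spreadsheet. The Python sketch below is not the spreadsheet I used; it is a small stand-in with a hand-picked, hypothetical subset of initials and rhymes, just to show that Zhuyin cells are plain string concatenation. Like the spreadsheet, it produces a cell for every pair, including combinations that do not occur in Mandarin, which would have to be blanked out against a syllable list.

```python
from itertools import product

# A tiny, illustrative subset: rows are rhymes, columns are initials.
initials = ["", "ㄅ", "ㄊ", "ㄌ"]        # "" stands for "no initial"
rhymes = ["ㄚ", "ㄧㄡ", "ㄨㄥ"]          # medial + final, already joined

# Every cell is simply initial + rhyme; Zhuyin needs no spelling adjustments.
grid = {(i, r): i + r for i, r in product(initials, rhymes)}


def render(grid, initials, rhymes):
    """Return the grid as tab-separated rows, one row per rhyme."""
    lines = []
    for r in rhymes:
        lines.append("\t".join(grid[(i, r)] for i in initials))
    return "\n".join(lines)


if __name__ == "__main__":
    print(render(grid, initials, rhymes))
```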

After that, I had a thought: earlier I stated that the canonical Zhuyin-Pinyin conversions I gave above almost work; they just need a few tweaks. So I created another version of the comparison chart.

Figure: Zhuyin-Pinyin difference comparison test. Highlighted cells are mismatches between the two approximated spelling systems.

Thanks to this Excel comparison, I could finally use this spreadsheet as a mask on the first sheet to identify every Zhuyin-Pinyin spelling pair that does not conform to the canonical, naïve conversion method.

finalized zhuyin-pinyin conversion chart
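The mismatch mask can also be sketched in a few lines of Python. This is an illustration of the idea rather than the spreadsheet itself: `naive_romanize` is a hypothetical helper that spells a Zhuyin syllable letter by letter using the canonical equivalents from the tables above, and the five sample pairs are ones I picked by hand.

```python
# Canonical Zhuyin -> Pinyin equivalents, one entry per character.
ZHUYIN_TO_PINYIN = {
    "ㄅ": "b", "ㄆ": "p", "ㄇ": "m", "ㄈ": "f", "ㄉ": "d", "ㄊ": "t",
    "ㄋ": "n", "ㄌ": "l", "ㄍ": "g", "ㄎ": "k", "ㄏ": "h", "ㄐ": "j",
    "ㄑ": "q", "ㄒ": "x", "ㄓ": "zh", "ㄔ": "ch", "ㄕ": "sh", "ㄖ": "r",
    "ㄗ": "z", "ㄘ": "c", "ㄙ": "s",
    "ㄧ": "i", "ㄨ": "u", "ㄩ": "\u00fc",
    "ㄚ": "a", "ㄛ": "o", "ㄜ": "e", "ㄝ": "e",
    "ㄞ": "ai", "ㄟ": "ei", "ㄠ": "ao", "ㄡ": "ou",
    "ㄢ": "an", "ㄣ": "en", "ㄤ": "ang", "ㄥ": "eng", "ㄦ": "er",
}


def naive_romanize(zhuyin):
    """Spell a toneless Zhuyin syllable by simple per-character lookup."""
    return "".join(ZHUYIN_TO_PINYIN[ch] for ch in zhuyin)


# (actual Pinyin, Zhuyin) pairs; a tiny hand-picked sample.
pairs = [("ma", "ㄇㄚ"), ("tong", "ㄊㄨㄥ"), ("liu", "ㄌㄧㄡ"),
         ("shi", "ㄕ"), ("lan", "ㄌㄢ")]

# The "mask": keep only the pairs where the naive spelling disagrees.
mismatches = [(p, z, naive_romanize(z)) for p, z in pairs
              if naive_romanize(z) != p]

if __name__ == "__main__":
    for pinyin, zhuyin, naive in mismatches:
        print(pinyin, zhuyin, naive)
```

Every flagged pair is a place where Pinyin deviates from the letter-by-letter reading, which is exactly what the rules in the next section have to undo.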

Being able to visualize the “shape” of all valid word-sounds in standard Mandarin Chinese on this spreadsheet offered an interesting perspective and was an enjoyable part of this project. Here is a link to a high-quality PDF containing both the difference test and the conversion charts for further study. Finally, we can try to sum up what we learned from the chart.

Summing Up the Rules

Given a Pinyin spelling of a word, use the following steps to convert it into Zhuyin.

  1. Undo any of the following initial transformations (transformations at the start of words to remove semivowels).
    1. At the start of a word, y becomes an i, which gets removed if an i follows immediately. For example, ya -> ia and yi -> i.
    2. w becomes an u, which gets removed if an u follows immediately. For example, wo -> uo and wu -> u.
    3. yu always becomes ü (in place). Check for this before applying the y rule above, so that yuan -> üan rather than iuan.
  2. Due to the nature of the Zhuyin characters’ internal readings, if any of the consonants zh, ch, sh, r, z, c, s is followed by i and no more characters afterwards, then the i is removed. This is the only simplification that happens in Zhuyin, and it is really easy to remember. The consonants j, q, and x do not follow this rule: they always keep their ㄧ.
    Example: the Zhuyin character ㄔ reads as chi in Pinyin by default, so the word 吃 chī is spelled with just ㄔ, whereas the word 氣 qì keeps its i and is spelled ㄑㄧˋ, and the word 秋 qiū likewise remains ㄑㄧㄡ.
  3. A couple of vowel pairings (that deal with semivowels) have to be exchanged. They are listed as follows.
    1. If i is followed by n or ng, an e is added between the i and the following coda.
    2. If u is followed by i or n, an e is added between the u and the following letter. The same goes for ü followed by n (for example, yun -> ün -> üen).
    3. If u follows immediately after j, q, or x, it is changed to a ü.[7]
  4. Lastly, as an extension to the previous rule, there are three (possibly annoying) pairing transformations that one will have to remember by heart (or brain, or other means such as audio).
    1. iu becomes iou.
    2. ong becomes ueng.
    3. iong becomes üeng.
  5. After Step 4, all transformations should be complete. If there is a leading consonant, replace it with the relevant Zhuyin character. Repeat the same process with the semivowel. Finally, what remains should be one single vowel (non-semivowel), and substituting the pertinent character completes the conversion. Don’t forget to add the tone if necessary.
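The steps above can be sketched as a short Python function. This is my own minimal illustration under stated assumptions, not a reference implementation: the input is a single toneless, lowercase Pinyin syllable plus a tone number (5 for neutral), all names are hypothetical, and Steps 3 and 4 are folded into one table of whole-rhyme rewrites. Two details are spelled out explicitly in the code: the lone i is dropped only after zh, ch, sh, r, z, c, s (standard Zhuyin keeps ㄧ after ㄐ, ㄑ, ㄒ, as in 氣 ㄑㄧˋ), and ün is expanded to üen so that yun comes out as ㄩㄣ.

```python
UE = "\u00fc"  # ü
EH = "\u00ea"  # ê, used here only as an internal label for the ㄝ sound

INITIALS = {"b": "ㄅ", "p": "ㄆ", "m": "ㄇ", "f": "ㄈ", "d": "ㄉ", "t": "ㄊ",
            "n": "ㄋ", "l": "ㄌ", "g": "ㄍ", "k": "ㄎ", "h": "ㄏ", "j": "ㄐ",
            "q": "ㄑ", "x": "ㄒ", "zh": "ㄓ", "ch": "ㄔ", "sh": "ㄕ", "r": "ㄖ",
            "z": "ㄗ", "c": "ㄘ", "s": "ㄙ"}
MEDIALS = {"i": "ㄧ", "u": "ㄨ", UE: "ㄩ"}
FINALS = {"a": "ㄚ", "o": "ㄛ", "e": "ㄜ", EH: "ㄝ", "ai": "ㄞ", "ei": "ㄟ",
          "ao": "ㄠ", "ou": "ㄡ", "an": "ㄢ", "en": "ㄣ", "ang": "ㄤ",
          "eng": "ㄥ", "er": "ㄦ"}
TONES = {1: "", 2: "\u02ca", 3: "\u02c7", 4: "\u02cb", 5: "\u02d9"}

# Steps 3 and 4, written as rewrites of the whole rhyme.
REWRITES = {"in": "ien", "ing": "ieng", "ui": "uei", "un": "uen",
            UE + "n": UE + "en", "iu": "iou", "ong": "ueng",
            "iong": UE + "eng"}


def pinyin_to_zhuyin(syllable, tone=1):
    s = syllable.lower()
    # Step 1: undo the initial transformations (yu first, then y and w).
    if s.startswith("yu"):
        s = UE + s[2:]
    elif s.startswith("y"):
        s = s[1:] if s.startswith("yi") else "i" + s[1:]
    elif s.startswith("w"):
        s = s[1:] if s.startswith("wu") else "u" + s[1:]
    # Separate the initial, trying zh/ch/sh before single letters.
    initial = next((c for c in ("zh", "ch", "sh") if s.startswith(c)), "")
    if not initial and s[:1] in INITIALS:
        initial = s[0]
    rhyme = s[len(initial):]
    # Step 2: a lone i after these initials is not written in Zhuyin.
    if initial in ("zh", "ch", "sh", "r", "z", "c", "s") and rhyme == "i":
        rhyme = ""
    # Step 3.3: u after j, q, x is really ü.
    if initial in ("j", "q", "x") and rhyme.startswith("u"):
        rhyme = UE + rhyme[1:]
    # Steps 3.1, 3.2 and 4: restore the vowels Pinyin leaves out.
    rhyme = REWRITES.get(rhyme, rhyme)
    # Step 5: split into medial and final, then substitute characters.
    medial = rhyme[0] if rhyme[:1] in MEDIALS else ""
    final = rhyme[len(medial):]
    if final == "e" and medial in ("i", UE):
        final = EH  # the e of ie and üe is ㄝ, not ㄜ
    if final and final not in FINALS:
        raise ValueError("unrecognized final: " + final)
    return (INITIALS.get(initial, "") + MEDIALS.get(medial, "")
            + FINALS.get(final, "") + TONES[tone])


if __name__ == "__main__":
    for syllable, tone in (("zhuang", 4), ("tong", 2), ("qiu", 1), ("yuan", 2)):
        print(syllable, tone, pinyin_to_zhuyin(syllable, tone))
```

The sketch does not validate its input, so a string that is not a real Pinyin syllable may still produce output. The tone marker is always appended at the end, following the horizontal convention described earlier.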

Conclusion

Taking a step back, for readers who are not convinced that these rule transformations actually preserve the sounds of words with which you are familiar, you could try pronouncing them aloud, starting slowly and keeping each Zhuyin character separate, and then gradually speeding up until the sounds blend into one another. You should find that the pronunciation from the Zhuyin spelling ends up almost merging into the pronunciation of the word that you would expect. Disclaimer: this may or may not work depending on the dialect(s) of Chinese with which you may be familiar. Personally, I grew up speaking a Pekingese dialect of Chinese, which is very closely related to standard Mandarin Chinese.

All of these rules can be memorized, though perhaps it may be better for some of them to “come naturally.” For example, it would certainly be easier for you to accept a rule or pattern if you could first convince yourself that it actually works. Another helpful trick is to remember the difference between the semivowels and other vowels in the Zhuyin spelling system. For instance, ong became ueng because it had to. There is no Zhuyin character representing the sound ong, and something like oueng doesn’t sound right, and most importantly it is against the rules because the only way a two-vowel word could exist is if the first one was a semivowel and the second one was not. This way, ueng seems to be the only choice left with a similar sound (which ends up sounding the same when spoken at conversational speed).

Also, I figured out these patterns retroactively, which means I learned Zhuyin after using Pinyin exclusively for over a decade, simply by intentionally using Zhuyin: guessing the spelling of a word I encountered and checking it against a dictionary or the internet.

As for spelling out Zhuyin by sound, there are far fewer tricks, and all I can say is that practice is key. Learning Zhuyin has been a fascinating experience for me. Thanks to the lack of resources online, probably because greater value is placed on Pinyin due to its political ties with China and its simplicity and compatibility with the West as a script derived from the Latin alphabet, learning Zhuyin became a hands-on, exploratory process for me. Figuring out these rules helped me understand the Chinese pronunciation system in a more intimate way, since I don’t normally think about it as a native speaker. This project made me take a step back and analyze the language and spelling systems with which I was familiar in ways that I would not have discovered otherwise (e.g. the difference between semivowels and non-semivowels). My hope is that more people (especially Chinese speakers from the Mainland) will become interested in Zhuyin and learn how it works as well as how to use it.

Finally, one might wonder, since Pinyin was engineered to help Chinese fit on a QWERTY keyboard, how would a Zhuyin user type? Well, the Zhuyin characters can be neatly and logically organized into “columns” (or vertical rows) of three to four characters. This allows Zhuyin to have a completely logical keyboard layout that you don’t even have to learn! Just invert the cover image (as the keyboard goes left to right whereas the cover image reads right to left) and voilà, you have a keyboard capable of typing any Chinese character. Whether this layout is efficient or fast is beyond the scope of this article, and it probably is not; however, it’s always good to have one less keyboard to memorize.

Figure: Zhuyin computer keyboard layout. The most common variant of the Zhuyin computer keyboard layout (source: Wikipedia).
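To illustrate the column idea, here is a small Python sketch that maps QWERTY keystrokes to Zhuyin. The key assignments below are my recollection of the common (Dachen) layout and should be treated as an assumption; please check them against the figure before relying on them. The names `COLUMNS` and `keys_to_zhuyin` are hypothetical.

```python
# Each keyboard column (top to bottom) paired with one group of Zhuyin characters.
# Assumed from the common Zhuyin layout; verify against the figure above.
COLUMNS = {
    "1qaz": "ㄅㄆㄇㄈ", "2wsx": "ㄉㄊㄋㄌ", "edc": "ㄍㄎㄏ", "rfv": "ㄐㄑㄒ",
    "5tgb": "ㄓㄔㄕㄖ", "yhn": "ㄗㄘㄙ", "ujm": "ㄧㄨㄩ",
    "8ik,": "ㄚㄛㄜㄝ", "9ol.": "ㄞㄟㄠㄡ", "0p;/": "ㄢㄣㄤㄥ", "-": "ㄦ",
}

# Tone keys; the space bar confirms a first-tone syllable and adds no marker.
TONE_KEYS = {"6": "\u02ca", "3": "\u02c7", "4": "\u02cb", "7": "\u02d9", " ": ""}

# Flatten the columns into a single key -> character lookup.
KEYMAP = {key: char
          for keys, chars in COLUMNS.items()
          for key, char in zip(keys, chars)}
KEYMAP.update(TONE_KEYS)


def keys_to_zhuyin(keystrokes):
    """Translate a run of QWERTY keystrokes into Zhuyin characters."""
    return "".join(KEYMAP[key] for key in keystrokes.lower())


if __name__ == "__main__":
    print(keys_to_zhuyin("su3cl3"))  # the keystrokes for 你好 under this layout
```

A real input method would then look up Chinese characters for each completed syllable; this sketch stops at the Zhuyin spelling.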

Colophon for Cover Image

This image was obtained from this link. (Though I’m not sure that the link will persist. If it is dead, a simple reverse image search on Google should do the trick as well.) I am an avid user of macOS’s Preview.app, though I have recently been trying out GIMP due to Preview’s limited functionality. I mostly deal with simple pixel-art images, hence sophisticated applications such as Photoshop are probably overkill for my purposes. I first removed the background of the image to make it compatible with dark mode by following these instructions. Then, I played around with colour transformations to change the colour of the middle group.

  1. Example

  2. Source: Wikipedia

  3. Source: Wikipedia (Chinese). 

  4. Vocabulary from this blog post

  5. See this blog post

  6. For these hypothetical pinyin transcriptions, I have avoided inserting a tone marker because there is no standard procedure/agreement on where to place them. These theoretical spellings are just for demonstration purposes. 

  7. This is because Pinyin u cannot follow j, q, or x. This is technically an initial transformation rule in Pinyin, but since it does not change the first letter, I think it is more intuitive to classify it as a vowel transformation instead.