An Akha shaman performance (Thailand)
in In the borderland between song and speech

This chapter analyses one section of a long shaman performance. A comparison of repeated lines shows rather stable patterns for the realization of lexical tones. A line includes rhythm pairs: units of two syllables in which the latter is stressed. A repeated line may be compressed into a shorter time span. Analysis from the performance-template perspective reveals techniques that the performer uses.

The Akha people live in the border areas of China, Burma, Thailand, Laos, and Vietnam. Most of them live in the south-western part of the Yunnan province of China, forming part of the Hani nationality, and in adjacent areas in Burma. In Thailand, they are reported to have arrived from Burma at the beginning of the twentieth century. In Akha tradition, long texts are transmitted in the death ritual performed by priests, phirma, and in the rituals of the shamans, nyirpaq. The recording analysed here, a seance by the shaman Sjhá-gàw, was made in the Akha village of Saensuk in the Chiengrai province of north-western Thailand, close to the Burmese border (Map 5); the priest Àbáw-gàw assisted with the translation.

The shaman texts are said to be personal to each shaman, and to vary from one performance to the next. The shaman, male or female, makes a spiritual journey to find the reason for sickness. While on the journey, the shaman recites or chants everything she encounters. Inga-Lill Hansson visited one shaman twice, with a seven-year gap between the visits, just to see if her texts were the same. The texts are not identical, but they are very similar. Certain parts of the shaman’s journey are the same and are described in more or less the same words: the journey goes from the house, out of the village, to the borderline dividing humans and spirits, and continues to the ancestors, where a rope leading back to the shaman’s house is kept. Other parts of the journey seem to be specific to each shaman and to each journey.

The shamans grow up listening to both the priests’ and the shamans’ reciting, thereby learning the rhythms and many of the stock phrases. They then build on this and express their own personal visions, drawing from others’ texts and adding their new experiences which are modelled on what they have learned by listening. I believe that the shamans, to a certain extent, reformulate themselves at each ‘performance’; thus, for instance, their visions may occur in a different order. The performance probably also involves a moment of creation and depends on what the shaman sees on this particular journey. The recitations are, however, expressed and moulded according to a given pattern that appears to be shared by all shamans.

Akha ritual texts are organized into lines based on iambic feet with a degenerate final foot. Briefly, each line is built upon what I call a rhythm pair, i.e. two syllables of which the second is more stressed. Each line contains a row of such pairs plus one last syllable, so that each line in slow recitation consists of an odd number of syllables.1
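The line structure described above (a row of rhythm pairs plus one last syllable) can be illustrated with a minimal Python sketch. The function name `split_rhythm_pairs` and the placeholder syllables are assumptions made for illustration only; the grouping is purely positional and is not derived from the recording.

```python
def split_rhythm_pairs(syllables):
    """Return ([(weak, strong), ...], final_syllable) for an odd-length line."""
    if len(syllables) % 2 == 0:
        # Pairs plus one last syllable always give an odd count.
        raise ValueError("expected an odd number of syllables")
    body = syllables[:-1]
    # Group the body two by two: the second member of each pair is the stressed one.
    pairs = [(body[i], body[i + 1]) for i in range(0, len(body), 2)]
    return pairs, syllables[-1]


# Placeholder syllables (hypothetical), not taken from the field notes.
line = ["s1", "s2", "s3", "s4", "s5", "s6", "s7"]
pairs, final = split_rhythm_pairs(line)
print(pairs)  # [('s1', 's2'), ('s3', 's4'), ('s5', 's6')]
print(final)  # s7
```

A line with n rhythm pairs therefore has 2n + 1 syllables, which is why the sketch rejects even-length input.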

Akha language and the shaman’s vocal expression

The words of Akha shaman performances have been studied to some extent but, to our knowledge, there has been no previous study of the relationship between the linguistic and musical aspects of the performances. It was rather obvious that the long vocal expressions, based on a fundamental recurring melodic formula and variation, would be suitable for analysis from the performance-template perspective. Since Akha is a tone language, it was also obvious that relationships between lexical tones and pitches would play an important role. The main objective of this chapter is to analyse the way that musical phrases are modified – lengthened or shortened – in the realization of the prosodic phrases. This was done purely by use of Melodyne graphs, which are well suited for comparisons as they permit simultaneous aural and visual observations.

The Akha language belongs to the Tibeto-Burman sub-group of the Sino-Tibetan language family. It is closely related to the Hani language in China. There are three lexical tones: High (H), Mid (M), and Low (L). They are marked with accents over the vowel, for example á (High) and à (Low); the Mid tone is unmarked. Traditionally, Akha has no writing system; but during recent decades different writing systems, mainly based on Latin letters, have been introduced in Burma, Thailand, and China. The recording of a seance by the shaman Sjhá-gàw was made by Hansson in the Akha village of Saensuk in north-western Thailand, close to the Burmese border, in December 1983.2 The materials used for this study were a cassette tape-recording and Hansson’s typewritten field notes, which contain glossed transcriptions of the words with lexical tones.

General starting points

  • A section of approximately 13 minutes of the recording was digitized by means of the free Audacity audio software and saved as a WAV file.
  • The field notes had been organized into lines, numbered for each page in the document, so that each page starts with the number 1. For this study, the lines were coded ‘page: number’, so that, for example, 166:7 refers to line number 7 on page 166.
  • The starting points of the 68 lines corresponding to the recording were tagged in Audacity.
  • The WAV file was transferred into Melodyne for analysis.
  • It was found that the recording could be divided into 3 sections:
    • Section 1 (approximately 166:6–168:13): The shaman speaks about beating bamboo on the way to the spirit village (so that the spirits would hear the shaman coming).
    • Section 2 (approximately 168:14–169:9): The shaman is in the spirit village, but is not able to tell everything.
    • Section 3 (approximately 169:10–171:7): The shaman returns to the human world.
  • Each of these sections contains sub-sections marked by the repetition of certain lines.
  • Some of the lines and the corresponding musical phrases are performed in pairs. The second musical phrase of a pair is often more compact, i.e. performed in a shorter time, and usually at a lower pitch. There may also be relatively long passages in which more lines are grouped together in this manner.
  • It is possible to distinguish some identical or nearly identical lines that appear during the performance, usually only within the same main section. These provide possible material for analysing the performance manner. Some lines have identical words. Some lines contain syllables that are identical – or fixed – and syllables that are varied. Therefore, lines can be used for measuring consistency as well as variation in performance, particularly concerning lexical tones.
  • A number of lines with obvious similarities were made into separate WAV files and were run through the Melodyne software with the aim of looking for regularities or irregularities with regard to musical phrases and the realization of lexical tones.
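The ‘page: number’ coding and the three approximate sections listed above lend themselves to a small lookup. The following Python sketch is only an illustration: the names `LineRef`, `parse_line_ref`, `SECTIONS`, and `section_of` are hypothetical, and the section boundaries are the approximate ones given in the list, not exact cut points.

```python
import re
from typing import NamedTuple


class LineRef(NamedTuple):
    page: int
    line: int
    suffix: str = ""  # e.g. the "a" in 167:8a


def parse_line_ref(code):
    """Parse a code such as '166:7' or '167:8a' into page, line, and suffix."""
    match = re.fullmatch(r"(\d+):(\d+)([a-z]?)", code.strip())
    if match is None:
        raise ValueError(f"not a page:line code: {code!r}")
    return LineRef(int(match.group(1)), int(match.group(2)), match.group(3))


# Approximate section boundaries as stated in the list above.
SECTIONS = {
    1: ("166:6", "168:13"),
    2: ("168:14", "169:9"),
    3: ("169:10", "171:7"),
}


def section_of(code):
    """Return the section number containing the line, or None if outside."""
    ref = parse_line_ref(code)
    key = (ref.page, ref.line)
    for number, (start, end) in SECTIONS.items():
        first, last = parse_line_ref(start), parse_line_ref(end)
        # Tuples compare page first, then line number within the page.
        if (first.page, first.line) <= key <= (last.page, last.line):
            return number
    return None


print(parse_line_ref("166:7"))  # LineRef(page=166, line=7, suffix='')
print(section_of("167:8a"))     # 1
print(section_of("170:5"))      # 3
```

Comparing (page, line) tuples works here because line numbering restarts at 1 on every page of the field notes.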

The performance

The performance tells a story at a rather slow tempo. The initial line starts with a very long tone (Example 100). Line 2 in the example contains words presented at a fairly slow tempo, alternating between a couple of pitches. The third line is a compressed line with many syllables delivered rather fast. The section ends with a long tone that appears as the tonal centre. The performer uses a creaky voice quality so that a rather steady pitch one octave below the notated ‘c’ is clearly audible, almost like a drone.

Example 100 The first line of the section discussed in this chapter (166:6). Stressed tones are marked with >. The arrow shows a long final tone performed in a creaky voice. Performed by Sjhá-gàw, Saensuk village, north-western Thailand in December 1983. Original pitch: c ≈ 130 Hz. Pulse ≈ 60 beats per minute. For translation and Melodyne graph, see Example 106. 23 Sjhá-gàw 166:6.

Analysis 21 Realization of lexical tones in identical lines

Comparison 1

Lines 166:8–9 and 167:8a–8b are both pairs of phrases. The pairs contain identical words and lexical tones.3

166:8 and 167:8a ŋà da làq-bö́-shɛ́-há í gà ɛ́ xhɔ̀ xhə á là
166:9 and 167:8b da lɔ̀ làq- bö́-shɛ́-há gà ɛ́ xhɔ̀ xhə
beating to let my spirit father làq- bö́-shɛ́-há hear it

In performance, there are restrictions on which types of grammatical forms can occupy which position within a rhythm pair. For example, the negation can only be in the first position, and noun and verb particles only in the second. Any problem with regard to ensuring the correct placing of the words may be resolved by means of, for instance, prefixes or suffixes. The filler syllable lɔ̀ is frequently used to fill an otherwise empty second position, exemplified by da lɔ̀ in line 166:9 above.4

In both cases, the first part (166:8 and 167:8a) starts high and ends low around G, below the tonic C, while most of the second part is lower than, or at most a second (2 semitones) above, the tonic.

The Melodyne graphs in Example 101 show that the pitches in the two versions of this linguistic phrase are nearly identical, and so it is possible to see how lexical tones are realized:

  • The initial 5 syllables, with lexical tones LMLHH, are performed LHHLH; the aberrations are in syllables 2–4.

In the remainder of the phrase:

  • The High lexical tones are performed a fourth to a fifth (5–7 semitones) above the final tone, but in two cases a sixth to a seventh (8–11 semitones) above. They are often falling.
  • The Low lexical tones are performed a second (2 semitones) above the final tone, or a second below (in the phrase ending).
  • In both examples, only one Mid lexical tone (syllable 11) is performed higher than the previous Low and lower than the following High.

Example 101 Melodyne graphs of a) 166:8 (top) 24 Sjhá-gàw 166:8 and b) 167:8a (bottom) 25 Sjhá-gàw 167:8a. Vertical: pitch, horizontal: time (1 column = 1 second). Syllable numbers and lexical tones (L, M, H) are given with the words.

Example 102 Melodyne graphs of 166:9 (top) and 167:8b (bottom).

The graphs in Example 102 show that the two performances of this linguistic phrase are quite similar to the previous one:

  • The first Mid tone syllable is performed higher than the following Low, albeit relatively high in pitch.
  • Syllables 3–10 are performed in a quick pendulum movement, the tones being short–long or unstressed–stressed. If the unstressed syllables are written with lower-case letters (l, m, h) and the stressed ones with capitals, this section is performed lHlHlHlH, though the lexical tones are LHHHLHLM; the aberrations are in syllables 5 and 10. This may be a case of melodic dominance over lexical tones.
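The comparison of lexical and performed tone sequences in Comparison 1 can be sketched in Python as follows. The helper `tone_deviations` is a hypothetical name, and it simply reports every syllable whose performed level label differs from its lexical label (ignoring the upper/lower-case stress notation); this is a mechanical reading and may not coincide in every case with what the analysis treats as an aberration.

```python
def tone_deviations(lexical, performed, first_syllable=1):
    """Return the syllable numbers where the performed level differs from the lexical tone."""
    if len(lexical) != len(performed):
        raise ValueError("sequences must have the same length")
    return [
        first_syllable + offset
        for offset, (lex, perf) in enumerate(zip(lexical, performed))
        # Case only encodes stress, so compare the level letters case-insensitively.
        if lex.upper() != perf.upper()
    ]


# Initial five syllables of 166:8 and 167:8a.
print(tone_deviations("LMLHH", "LHHLH"))  # [2, 3, 4]

# Syllables 3-10 of 166:9 and 167:8b (pendulum movement).
print(tone_deviations("LHHHLHLM", "lHlHlHlH", first_syllable=3))  # [5, 10]
```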

Comparison 2

166:11, 167:4, and 167:11, which are nearly identical, are compared.

166:11 já pyq já né [le] bɔ̀ áŋ shḿ làq xhɔ̀ xhə lé la é
167:4 já pyq já né gàŋ áŋ shḿ làq xhɔ̀ xhə lé la é
167:11 já pyq já né gàŋ áŋ shḿ làq xhɔ̀ xhə lé la
beating three times on the burnt field, the soil becomes red

There are some differences at the very ends of the phrases in Examples 103–105, but generally:

  • High lexical tones are the same as the tonal centre (C or B♭ respectively) or higher. Syllable 7 in Example 103 and syllable 6 in Example 104 (both áŋ) are slightly lower than the tonic, but higher than the preceding Low.
  • Low lexical tones are a second to a fourth (2–5 semitones) lower than the tonal centre.
  • Mid lexical tones are at the tonic level or lower.
  • One word’s lexical tone is uncertain (syllable 5, le, interpreted as Mid in Example 103).

Example 103 Melodyne graph of 166:11.

Example 104 Melodyne graph of 167:4.

Example 105 Melodyne graph of 167:11.

Characteristics of the Akha shaman performance

  • Lexical tones are realized.
  • High lexical tones: higher than the tonal centre, normally from a second to a fifth (2–7 semitones).
  • Mid lexical tones: at the level of the tonal centre, or slightly lower.
  • Low lexical tones: between a second (2 semitones) below and a second (2 semitones) above the tonal centre.
  • Exceptions: There are some aberrations in one case of introductory movement and one case of fast pendulum movement.

Analysis 22 Performance of line-pairs

The performance is, for the most part, carried out with lines/linguistic phrases that constitute pairs, semantically and musically. Normally, the first line of a pair is slow and performed in long tones, whereas the second part is faster and compressed.

The lines 166:6 and 166:7 form a pair in which the first (6) is performed in long tones and the second (7) is compressed and shorter, while the number of syllables is almost the same.


166:6 zàq mja də lə̀q xhɔ̀ xhə lə̀q ó xhɔ̀ xhə lə̀q í / xhɔ̀ xhə lə̀q í ò xhɔ̀ xhə lə̀q hí xhɔ̀ í xhə
beating on the zàq-mja bamboo də lə̀q ó, beating lə̀q í / beating lə̀q í, beating lə̀q hí


166:7 zàq mja də lə̀q xhɔ̀ xhə lə̀q í ò xhɔ̀ xhə lə̀q / xhɔ̀ xhə lə́ í hí xhɔ̀ xhə lə̀q xhɔ̀ xhə lə́ hí
beating on the zàq-mja bamboo də lə̀q, beating lə̀q í, beating lə̀q ò / beating lə̀ í hí, beating lə̀q, beating lə́ hí

Example 106 Melodyne graph of 166:6. 23 Sjhá-gàw 166:6

166:6 in Example 106 is basically built on the combination short + long (of varying lengths). Normally it is 1 short + 1 long, but in two cases extra syllables are squeezed in: 3 short + 1 long. The two lines are about the same length, 23 and 24 syllables, respectively. While 166:6 is performed in 37 seconds (Example 106), the second line of the pair, 166:7, lasts for only 22 seconds (Example 107).
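The degree of compression can be made concrete with a little arithmetic on the figures just given (23 syllables in 37 seconds against 24 syllables in 22 seconds). The Python sketch below is illustrative only; the function name `syllables_per_second` is an assumption, and the durations are the rounded values stated in the text.

```python
def syllables_per_second(n_syllables, duration_s):
    """Average delivery rate of a line."""
    return n_syllables / duration_s


rate_first = syllables_per_second(23, 37)   # 166:6
rate_second = syllables_per_second(24, 22)  # 166:7

print(f"{rate_first:.2f}")                # 0.62
print(f"{rate_second:.2f}")               # 1.09
print(f"{rate_second / rate_first:.2f}")  # 1.75
print(f"{22 / 37:.2f}")                   # 0.59
```

On these figures the second line is delivered about 1.75 times as fast as the first, in roughly 59 per cent of the time.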

Example 107 Melodyne graph of 166:7. The number of syllables is indicated. The jump to the low octave on syllable 23 (lə̀) is the result of a creaky voice on the last long tone of the phrase. 26 Sjhá-gàw 166:7

The extent of contraction increases in the second line. More syllables are performed with very short durations, there are fewer long syllables, and those long syllables that do occur become shorter. This can be seen when corresponding phrases are compared (long syllables are marked with lines).

166:6 (Example 106): zàq mja də lə̀q––––––

166:7 (Example 107): zàq mja də lə̀q

166:6 (Example 106): xhɔ̀ xhə–––––– lə̀q ó––––

166:7 (Example 107): xhɔ̀ xhə lə̀q í ò––

Several syllables are squeezed in, the effect being that the words are pronounced in shorter time. At the end of the second line, 6–7 syllables are squeezed in between the longer ones.

166:7 (Example 107): xhə lə̀q xhɔ̀ xhə lə́–––– hí

xhɔ̀ xhə lə̀q xhɔ̀ xhə lə́ í–––––– hí


  • Mid lexical tones are principally about a fourth to a fifth (5–7 semitones) above the tonal centre (c).
  • High lexical tones are principally a fifth to a seventh (7–11 semitones) above.
  • Low lexical tones are principally a second to a fourth (2–5 semitones) above.
  • The long í in the middle of the line is performed high with a dip.
  • The initial melodic pattern is falling G#–F#–D (Example 107).


  • The first syllable is Low, but performed slightly high. That might be explained as a starting pattern.
  • Syllable 8 (ó) is performed low between two Low syllables (Example 106).
  • Syllable 11 (lə̀q) is performed high (glissando or upward-sliding movement). This might be explained by the movement up to the following High syllable. Similarly, syllable 13 (xhɔ̀) is performed high, while the melody falls from a preceding high pitch (Example 106).
  • Syllable 6 (xhə) is Mid but performed low, possibly because it is preceded and followed by Low syllables (Example 107).

Analysis 23 Performance of the final part

In the latter part of the performance, section 3, the performance changes character and becomes more melodious. The words here are about the shaman’s soul returning from the spirit world to the human world. The sub-sections start with həə (169:6, 10; 170:2, 5, 7, 12; 171:4, not transcribed; see, however, Example 110).


169:6 lo ma lɛq áŋ mɔ́ le mɛ́ nɔ̀ hὲ lo / ma àmjàŋ lɛq áŋ mɔ́ le mɛ́ i
seeing the big stone on the market, seeing the big stone on the market making us look like fools [for sitting on it even though it isn’t allowed].


169:10 àda nὲq ɣòq zá poq zá poq ɣòq dḿ dzɛ[?] í mía hὲ ɛ́
I’m going home on the spirits road


170:2 zə́ za mì màq tjhɔ̀ bɔ zə́ thé gà lá mía hὲ ɛ́
longing for the voice of the child-maker, who accompanies me

Example 108 Melodyne graph of 169:6. 27 Sjhá-gàw 169:6

Characteristics of 169:6 (Example 108)

  • The performance starts with a long həə in a high position (about 200 Hz, not included in Example 108).
  • The final tone is about a fifth (7 semitones) lower and feels like a tonic (128 Hz).
  • Syllables 1–3 (disregarding the initial həə) are Mid-tone syllables, performed high and gradually lower.
  • High syllables (4, 5, 7, 15) are in all cases performed higher than the tonic, approximately a third higher (3–4 semitones, 155–165 Hz), but they may be performed rising or falling.
  • Syllables 16 and 18, both High, are performed at the tonic; this is the ending of the musical phrase.
  • Low syllables (8, 9, 12, 13) are performed approximately a fourth lower (5 semitones, around 95 Hz) than the tonic, but sometimes start much higher and slide down. One exception is syllable 12. It is the first of two consecutive Low lexical tones, preceded by a Mid-tone syllable performed as if High. Hence, it is performed at a lower pitch than the preceding syllable, and the pitch continues falling for the next Low.
  • Mid syllables (1, 2, 3, 6, 10, 11, 14, 17, 19):
    • Syllables 1–3 seem to be part of the initial formula at high pitches. Syllable 6 is between two High syllables and performed lower (at the tonic pitch).
    • Syllables 10–11 are between two Low syllables; syllable 10 is performed higher than the preceding Low and syllable 11 higher than the following Low (same level as a High syllable).
    • Syllable 14 is between a Low and a High syllable, and the pitch is in between.
    • Syllable 17 is between two High syllables that are performed at the tonic pitch, and is itself performed lower.
    • Syllable 19 is performed at the tonic or final pitch, as part of the final formula.
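The correspondence between the frequencies and the intervals quoted in the list above can be checked with the standard equal-tempered relation, semitones = 12 · log2(f / f_ref). The Python sketch below applies it to the approximate values given for 169:6, taking the 128 Hz final tone as reference; the function name `semitones_from` is an assumption made for illustration.

```python
import math


def semitones_from(reference_hz, pitch_hz):
    """Signed distance in equal-tempered semitones from the reference pitch."""
    return 12 * math.log2(pitch_hz / reference_hz)


tonic = 128.0  # final tone of 169:6 as stated above

print(round(semitones_from(tonic, 200), 1))  # 7.7  initial long tone, about a fifth above
print(round(semitones_from(tonic, 155), 1))  # 3.3  lower end of the High syllables
print(round(semitones_from(tonic, 165), 1))  # 4.4  upper end of the High syllables
print(round(semitones_from(tonic, 95), 1))   # -5.2 Low syllables, about a fourth below
```

The results agree with the interval names used in the list to within about one semitone, which is consistent with the frequencies being stated as approximate.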

Example 109 Melodyne graph of 169:10.

Characteristics of 169:10 (Example 109)

  • The performance starts with a long həə in a high position (c. 200 Hz, not shown in Example 109). The final tone is about a fifth (7 semitones) lower and feels like a tonic (131 Hz).
  • Syllables 1–4 are part of the initial formula; they are performed high and gradually lower. In this case, syllables 1, 3, and 4 are Low and 2 is Mid.
  • Syllables 5, 7, 10, 13, and 14 are High. They are performed about a second higher than the tonic (2 semitones, 144–148 Hz), but the maxima are higher. Syllable 13 is really high, nearly a fifth (7 semitones, about 195 Hz).
  • The last syllable (17) is High. It ends approximately on the tonic but starts high.
  • Syllable 9 is Low at 119 Hz and 16 is Low at 116 Hz.
  • Syllables 6, 8, and 11 are Mid. Syllables 6 and 11 are placed between two Highs and performed still higher than those (180 and 195 Hz, respectively). Syllable 8 is located between a High and a Low.

Characteristics of 170:2 (Example 110)

  • Main outline as above (169:6 and 10).
  • Syllable 8 is High. It is preceded by a High and followed by a Low. The preceding and following syllables are performed as high and low, but syllable 8 is relatively low (tonic pitch). This may be because of a downward movement.

Example 110 Melodyne graph of 170:2.

Analysis 24 Variation

The basic delivery of the words is iambic and consists of a number of rhythm pairs, i.e. short + long, and a final long tone.5 This conclusion is based on performances by priests and shamans. The priest appears to have learnt the words by heart and repeats them fairly consistently from one performance to another, whereas the shaman performances are personal and open to changes from time to time, even during the same performance. This is evident in the section analysed here. The rhythm pairs are certainly there, but there is also a high degree of variation.

Example 111 Melodyne graph of rhythm pairs in the beginning of 166:6. Arrows indicate the short initial syllable of a rhythm pair consisting of 2 syllables.

Example 111 shows three rhythm pairs from the very first line of the digitized part of the performance. The short initial syllables of each pair are marked by arrows. It may be noted that even the first long tone (which occurs in some lines) contains a rhythm pair: it is the second syllable of the line that is really long. Apart from such an exceptional long tone, the length of the second syllable in a pair is not constant, whereas there is not so much room for varying the length of the first short syllable. The tones are rising or falling, mainly – but not exclusively – being determined by the lexical tones.

When the speed increases, a line is compressed into a shorter time space, the differences in time between short and long become marginal, and the syllables are mainly distinguished by pitch and stress, sometimes – it would seem – disregarding lexical tones. In Example 112, the word shɛ́ is performed as if it had a Low lexical tone. In cases like this, it may be that the iambic pattern dominates over the lexical tones. This can be regarded as ‘condensed rhythm pairs’.

Example 112 Melodyne graph of condensed rhythm pairs in 166:9. Arrows show first syllables of condensed word-pairs.

Another way of performing more syllables in the same time slot is to squeeze in 2, 3, or even 4 short syllables before the long one (Example 113). Occasionally, the long syllable is divided as well.

Example 113 Melodyne graph of compressed rhythm pairs in 166:7. Boxes show short syllables squeezed into the first slot of two word-pairs (3 syllables in the first and 4 in the second). Lines show long syllables.

Sometimes a compressed rhythm pair is organized in a falling pattern, as is the first rhythm pair in Example 113 (zàq mja də lə̀q). In the very last section (section 3) of the performance, this falling pattern takes on a more ‘song-like’ form, perhaps signalling the return of the shaman’s spirit to the human world (Example 114).


170:5 Həə, thḿ lé mí màq dɔ̀ dáŋ mà dö tjhö́ ə
Həə, talking, but I can’t tell all the words to the end

Example 114 Melodyne graph of 170:5. The arrow shows stepwise downward melodic movement.

The Akha shaman performance template


  • Undulating, with much downward motion and a tonal centre (see Examples 100, 110).
  • After a couple of short syllables of prosodic phrasing, the melody moves to high and falls gradually, a pattern which generally lasts for two rhythm pairs and is repeated within the same phrase.
  • Range: 20 semitones.


  • Slow regular pulse.
  • An iambic pattern dominates: iambic rhythmic pairs consisting of two syllables with the second one more stressed.


  • Litany with sections.


  • Prosodic phrases built on iambic rhythmic pairs with variations.
  • The prosodic phrases are organized in pairs in which the first is generally slow and long, while the second is faster and shorter. Such a pair may be continued with further parallel phrases.
  • The musical phrases are organized in pairs, the first generally being slow and long whereas the second is faster and shorter.
  • Contraction occurs: musical phrases are made shorter by squeezing several syllables into one space.
  • Final syllables are lengthened.

Initial/final formulae

  • The initial formula of a section consists of an initial long, high-pitched tone (Həə) and falls stepwise with the very first syllables.
  • The initial formula of a sub-section consists of a lengthened syllable in the mid pitch area.
  • In a final formula, the pitches level out at the tonic.

Word variations

  • Certain grammatical forms can be located only in certain positions within the rhythm pair, e.g. a negation can only be in the first position, noun and verb particles only in the second.
  • This location pattern is realized by the use of, for example, prefixes, suffixes, and filler syllables (lɔ̀).

Lexical tones

  • High lexical tones are generally performed above the tonic, a second (about 145 Hz) or a third (about 165 Hz), sometimes much higher.
  • Low lexical tones are performed significantly lower, often lower than the tonic, about a second to a fourth below the tonic, at about 95–115 Hz.
  • The pitches may be approached by an upward or left by a downward sliding motion.
  • Mid lexical tones tend to be between the High and the Low in cases where the lexical tone is realized.
  • In other cases, such syllables with Mid lexical tone are used in order to separate consecutive identical lexical word tones: for example, High–Mid–High or Low–Mid–Low can be performed high–higher–high or low–high–low.

It can be concluded that the description of this vocal expression may be considered as a performance template – or rather as performance templates, since it consists of different sections. There are melody-centred parts in initial and final formulae and certain other parts, like a fast pendulum movement, and there are also tone-centred parts. The basic structure is that of line-pairs with rhythm pairs within the lines. Lines may be performed slow or fast and, in the latter case, short syllables are apt to be inserted. The general movement is from a high pitch to a low one. Lexical tones are realized, except when musical or rhythmical movement dominates.

1 Summarized from Hansson 1994: 26–27.
2 Hansson conducted extensive fieldwork among the Akha in northern Thailand in several periods. She stayed two years there in 1977–78, about two months per year in 1981–91, and paid shorter visits more or less yearly until 2013, when one of the main informants passed away. The shaman, Sjhá-gàw, and the priest Àbáw-gàw, who assisted with the translation, both passed away in 1984.
3 Actually, 166:9 has 3 additional syllables at the end that are disregarded here since the initial 10 syllables are identical to 167:8b.
4 Hansson 2014: 283–285. See Hansson 1991 for a detailed description of these factors in a death-ritual text.
5 Hansson 2014: 284.

In the borderland between song and speech

Vocal expressions in oral cultures

