principal metaphor for musical composition must change from one of architecture
to one of chemistry... We may imagine a new personality combing the
beach of sonic possibilities, not someone who selects, rejects, classifies
and measures the acceptable, but a chemist who can take any pebble, and,
by numerical sorcery, separate its constituents, merge the constituents
from two quite different pebbles and, in fact, transform black pebbles
into gold pebbles.
'Audible Design' takes a completely fresh look at the craft of musical composition, for
an age when all imaginable sounds have become accessible to us, and the
computer allows us to transform these in any way we desire.
Compositional techniques are discussed in detail, with applications to
radically different kinds of sound material, and each is illustrated by
appropriate sound examples.
The text concludes with an extensive appendix in which all processes described
in the text are laid bare in diagrammatic form, avoiding mathematical
or technical language.
The accompanying CD of music examples illustrates each point in the text.
Below is Chapter 3 from 'Audible Design' in full with RealAudio excerpts
from the CD.
Harmonicity - Inharmonicity
Noise, "Noisy Noise" & Complex Spectra
Spectral Fission and Constructive Distortion
Spectral manipulation in the frequency domain
Spectral manipulation in the time domain
WHAT IS TIMBRE?
The spectral characteristics of sounds have, for so long, been inaccessible
to the composer that we have become accustomed to lumping together all
aspects of the spectral structure under the catch-all term "timbre"
and regarding it as an elementary, if unquantifiable, property of sounds.
Most musicians with a traditional background almost equate "timbre"
with instrument type (some instruments producing a variety of "timbres"
e.g. pizz, arco, legno, etc). Similarly, in the earliest analogue studios,
composers came into contact with oscillators producing featureless pitches,
noise generators producing featureless noise bands, and "envelope
generators" which added simple loudness trajectories to these elementary
sources. This gave no insight into the subtlety and multidimensionality
of sound spectra.
However, a whole book could be devoted to the spectral characteristics
of sounds. The most important feature to note is that all sound spectra
of musical interest are time-varying, either in micro-articulation or
HARMONICITY - INHARMONICITY
As discussed in Chapter 2, if the partials which make up a sound have
frequencies which are exact multiples of some frequency in the audible
range (known as the fundamental) and, provided this relationship persists
for at least a grain-size time-frame, the spectrum fuses and we hear a
specific (possibly gliding) pitch. If the partials are not in this relationship,
and provided the relationships (from window to window) remain relatively
stable, the ear's attempts to extract harmonicity (whole number) relationships
amongst the partials will result in our hearing several pitches in the
sound. These several pitches will trace out the same micro-articulations
and hence will be fused into a single percept (as in a bell sound). The
one exception to this is that certain partials may decay more quickly
than others without destroying this perceived fusion (as in sustained
acoustic bell sounds).
In Sound example 3.1 we hear the syllable "ko->u" being gradually
spectrally stretched (Appendix p19). This means that the partials are
moved upwards in such a way that their whole number relationships are
preserved less and less exactly and eventually lost. (See Diagram 1).
Initially, the sound appears to have an indefinable "aura" around
it, akin to phasing, but gradually becomes more and more bell-like.
It is important to understand that this transformation "works"
due to a number of factors apart from the harmonic/inharmonic transition.
As the process proceeds, the tail of the sound is gradually time-stretched
to give it the longer decay time we would expect from an acoustic bell.
More importantly, the morphology (changing shape) of the spectrum is already
bell-like. The syllable "ko->u" begins with a very short
broad band spectrum with lots of high-frequency information ("k")
corresponding to the initial clang of a bell. This leads immediately into
a steady pitch, but the vowel formant is varied from "o" to
"u", a process which gradually fades out the higher partials
leaving the lower to continue. Bell sounds have this similar property,
the lower partials, and hence the lower heard pitches, persisting longer
than the higher components. A different initial morphology would have
produced a less bell-like result.
This example (used in the composition of Vox
5) illustrates the importance of the time-varying structure of the
spectrum (not simply its loudness trajectory).
We may vary this spectral stretching process by changing the overall stretch
(i.e. the top of the spectrum moves further up or further down from its
initial position) and we may vary the type of stretching involved. (Appendix
Different types of stretching will produce different relationships between
the pitches heard within the sounds.
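The two controls described here, the overall amount of stretch and the type of stretching, can be sketched on a bare list of partial frequencies. This is only an illustration under stated assumptions: the function names, and the power-law mapping between the fixed bottom of the spectrum and its displaced top, are hypothetical and are not taken from the Appendix.

```python
def stretch_partials(freqs, top_stretch, exponent=1.0):
    """Move partials so the lowest stays put and the highest is scaled
    by `top_stretch`. `exponent` selects the (assumed) type of stretching."""
    lo, hi = min(freqs), max(freqs)
    if hi == lo:
        return list(freqs)
    out = []
    for f in freqs:
        pos = (f - lo) / (hi - lo)  # 0 at the bottom, 1 at the top
        factor = 1.0 + (top_stretch - 1.0) * pos ** exponent
        out.append(f * factor)
    return out


def is_harmonic(freqs, tol=1e-6):
    """True if every partial is a whole-number multiple of the lowest one
    (the lowest partial is assumed to be the fundamental)."""
    f0 = min(freqs)
    return all(abs(f / f0 - round(f / f0)) < tol for f in freqs)


harmonic = [100.0 * k for k in range(1, 9)]  # 100, 200, ... 800 Hz
stretched = stretch_partials(harmonic, top_stretch=1.3)
```

With `top_stretch=1.0` the spectrum is returned unchanged; values a little above 1.0 correspond to the small stretches discussed next.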
Note that small stretches produce an ambiguous area in which the original
sound appears "coloured" in some way rather than genuinely
example 3.3). Inharmonicity does not therefore necessarily mean multipitchedness.
Nor (as we have seen from the "ko->u" example), does it mean
bell sounds. Very short inharmonic sounds will sound percussive, like
drums, strangely coloured drums, or akin to wood-blocks (Sound
example 3.4). These inharmonic sounds can be transposed and caused
to move (subtle or complex pitch-gliding) just like pitched sounds (also
see Chapter 5 on Continuation).
Proceeding further, the spectrum can be made to vary, either slowly or
quickly, between the harmonic and the inharmonic creating a dynamic interpolation
between a harmonic and an inharmonic state (or between any state and something
more inharmonic) so that a sound changes its spectral character as it
unfolds. We can also imagine a kind of harmonic to inharmonic vibrato-like
fluctuation within a sound. (Sound
Once we vary the spectrum too quickly, and especially if we do so irregularly,
we no longer perceive individual moments or grains with specific spectral
qualities. We reach the area of noise (see below).
When transforming the harmonicity of the spectrum, we run into problems
about the position of formants akin to those encountered when pitch-changing
(see Chapter 2) and to preserve the formant characteristics of the source
we need to preserve the spectral contour of the source and apply it to
the resulting spectrum (see formant preserving spectral manipulation :
In any window, the contour of the spectrum will have peaks and troughs.
The peaks, known as formants, are responsible for such features as the
vowel-state of a sung note. For a vowel to persist, the spectral contour
(and therefore the position of the peaks and troughs) must remain where
it is even if the partials themselves move. (See Appendix p10).
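A minimal sketch of this idea, assuming the spectral contour is available as a few (frequency, amplitude) breakpoints and that linear interpolation between them is adequate. The function names and the breakpoint format are hypothetical.

```python
def contour_at(contour, freq):
    """Linearly interpolate a contour given as (freq, amp) points
    sorted by strictly increasing frequency."""
    if freq <= contour[0][0]:
        return contour[0][1]
    if freq >= contour[-1][0]:
        return contour[-1][1]
    for (f0, a0), (f1, a1) in zip(contour, contour[1:]):
        if f0 <= freq <= f1:
            t = (freq - f0) / (f1 - f0)
            return a0 + t * (a1 - a0)


def transpose_keep_formants(partial_freqs, contour, ratio):
    """Move the partials by `ratio`, then read their loudness from the
    unmoved contour, so the peaks and troughs stay where they were."""
    moved = [f * ratio for f in partial_freqs]
    return [(f, contour_at(contour, f)) for f in moved]


# One formant peak at 500 Hz (assumed example data).
contour = [(0.0, 0.0), (500.0, 1.0), (1000.0, 0.2), (2000.0, 0.0)]
partials = [250.0, 500.0, 750.0, 1000.0]
up_octave = transpose_keep_formants(partials, contour, 2.0)
```

After the octave shift the partials lie at 500, 1000, 1500 and 2000 Hz, but the loudest one is still the partial at 500 Hz, where the formant peak sits.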
As we know from singing, and as we can deduce from this diagram, the frequencies
of the partials in the spectrum (determining pitch(es), harmonicity-inharmonicity,
noisiness) and the position of the spectral peaks, can be varied independently
of each other. This is why we can produce coherent speech while singing
or whispering. (Sound
Because most conventional acoustic instruments have no articulate time-varying
control over spectral contour (one of the few examples is hand-manipulable
brass mutes), the concept of formant control is less familiar as a musical
concept to traditional composers. However, we all use articulate formant
control when speaking.
It is possible to extract the (time varying) spectral contour from one
signal and impose it on another, a process originally developed in the
analogue studios and known as vocoding (no connection with the phase vocoder).
For this to work effectively, the sound to be vocoded must have energy
distributed over the whole spectrum so that the spectral contour to be
imposed has something to work on. Vocoding hence works well on noisy sounds
(e.g. the sea) or on sounds which are artificially prewhitened by adding
broad band noise, or subjected to some noise producing distortion process.
It is also possible to normalise the spectrum before imposing the new
contour. This process is described in Chapter 2, and under formant
preserving spectral manipulation in Appendix p17.
Formant-variation of the spectrum does not need to be speech-related and,
in complex signals, is often more significant than spectral change. We
can use spectral freezing to freeze certain aspects of the spectrum at
a particular moment. We hold the frequencies of the partials, allowing
their loudnesses to vary as originally. Or we can hold their amplitudes
stationary, allowing the frequencies to vary as originally. In a complex
signal, it is often holding steady the amplitudes, and hence the spectral
contour, which produces a sense of "freezing" the spectrum when
we might have anticipated that holding the frequencies would create this
percept more directly. (Sound
NOISE, "NOISY NOISE" & COMPLEX SPECTRA
Once the spectrum begins to change so rapidly and irregularly that we
cannot perceive the spectral quality of any particular grain, we hear
"noise". Noise spectra are not, however, a uniform grey area
of musical options (or even a few shades of pink and blue) which the name
(and past experience with noise generators) might suggest. The subtle
differences between unvoiced staccato "t", "d", "p",
"k", "s", "sh", "f", and the variety
amongst cymbals and unpitched gongs give the lie to this.
Noisiness can be a matter of degree, particularly as the number of heard
out components in an inharmonic spectrum increases gradually to the point
of noise saturation. It can, of course, vary formant-wise in time: whispered
speech is the ideal example. It can be more or less focused towards static
or moving pitches, using band-pass filters or delay (see Chapter 2), and
it can have its own complex internal structure. In Sound
example 3.9 we hear portamentoing inharmonic spectra created by filtering
noise. This filtering is gradually removed and the bands become more noise-like.
A good example of the complexity of noise itself is "noisy noise",
the type of crackling signal one gets from very poor radio reception tuned
to no particular station, from masses of broad-band click-like sounds
(either in regular layers - cicadas - or irregular - masses of breaking
twigs or pebbles falling onto tiles - or semi-regular - the gritty vocal
sounds produced by water between the tongue and palate in e.g. Dutch "gh")
or from extremely time-contracted speech streams. There are also fluid
noises produced by portamentoing components, e.g. the sound of water falling
in a wide stream around many small rocks. These shade off into the area
of "Texture" which we will discuss in Chapter 8. (Sound
These examples illustrate that the rather dull sounding word "noise"
hides whole worlds of rich sonic material largely unexplored in detail
by composers in the past.
Two processes are worth mentioning in this respect. Noise with transient
pitch content like water falling in a stream (rather than dripping, flowing
or bubbling), might be pitch-enhanced by spectral tracing (see below) (Sound
example 3.11). Conversely, all sounds can be amassed to create a sound
with a noise-spectrum if superimposed randomly in a sufficiently frequency-dense
and time-dense way. At the end of Sound
example 3.9 the noise band finally resolves into the sound of voices.
The noise band was in fact simply a very dense superimposition of many
Different sounds (with or without harmonicity, soft or hard-edged, spectrally
bright or dull, grain-like, sustained, evolving, iterated or sequenced)
may produce different qualities of noise (see Chapter 8 on Texture). There
are also undoubtedly vast areas to be explored at the boundaries of inharmonicity/noise
and time-fluctuating-spectrum/noise. (Sound
A fruitful approach to this territory might be through spectral focusing,
described in Chapter 2 (and Appendix p20). This allows us to extract,
from a pitched sound, either the spectral contour only, or the true partials,
and to then use this data to filter a noise source. The filtered result
can vary from articulated noise formants (like unvoiced speech) following
just the formant articulation of the original source, to a reconstitution
of the partials of the original sound (and hence of the original sound
itself). We can also move fluidly between these two states by varying
the analysis window size through time. This technique can be applied to
any source, whether it be spectrally pitched (harmonic), or inharmonic,
and gives us a means of passing from articulate noise to articulate not-noise
spectra in a seamless fashion.
Many of the sound phenomena we have discussed in this section are complex
concatenations of simpler units. It is therefore worthwhile to note that
any arbitrary collection of sounds, especially mixed in mono, has a well-defined
time-varying spectrum - a massed group of talkers at a party; a whole
orchestra individually, but simultaneously, practising their difficult
passages before a concert. At each moment there is a composite spectrum
for these events and any portion of it could be grist for the mill of
The already existing structure of a spectrum can be utilised to enhance
the original sound. This is particularly important with respect to the
onset portion of a sound and we will leave discussion of this until Chapter
4. We may reinforce the total spectral structure, adding additional partials
by spectral shifting the sound (without changing its duration) (Appendix
p18) and mixing the shifted spectrum with the original. As the digital signal
will retain its duration precisely, all the components in the shifted
signal will line up precisely with their non-shifted sources and the spectrum
will be thickened while retaining its (fused) integrity. Octave enhancement
is the most obvious approach but any interval of transposition (e.g. the
tritone) might be chosen. The process might be repeated and the relative
balance of the components adjusted as desired. (Appendix p48). (Sound
A further enrichment may be achieved by mixing an already stereo spectrum
with a pitch-shifted version which is left-right inverted. Theoretically
this produces merely a stage-centre resultant spectrum but in practice
there appear to be frequency dependent effects which lend the resultant
sound a new and richer spatial "fullness". (Sound
Finally, we can introduce a sense of multiple-sourcedness (!) to a sound
(e.g. make a single voice appear crowd-like) by adding small random time-changing
perturbations to the loudnesses of the spectral components (spectral shaking). This
mimics part of the effect of several voices attempting to deliver the
same information. (Sound
example 3.15). We may also perturb the partial frequencies (Sound
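Spectral shaking of the loudnesses can be sketched as follows, assuming each analysis window is a list of (frequency, amplitude) channel pairs. The function name, the `depth` parameter and the uniform random distribution are assumptions made for illustration.

```python
import random


def spectral_shake(windows, depth, seed=0):
    """Scale every channel amplitude by a random factor in
    [1 - depth, 1 + depth], chosen afresh in each window.
    Frequencies are left untouched."""
    rng = random.Random(seed)
    return [[(f, a * (1.0 + depth * rng.uniform(-1.0, 1.0))) for f, a in w]
            for w in windows]


# Four identical windows of a static two-partial spectrum (example data).
steady = [[(100.0, 0.5), (200.0, 0.25)] for _ in range(4)]
shaken = spectral_shake(steady, depth=0.2)
```

Perturbing the partial frequencies instead would apply the same random factor to `f` rather than `a`.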
Once we understand that a spectrum contains many separate components,
we can imagine processing the sound to isolate or separate these components.
Filters, by permitting components in some frequency bands to pass and
rejecting others, allow us to select parts of the spectrum for closer
observation. With dense or complex spectra the results of filtering can
be relatively unexpected, revealing aspects of the sound material not previously
appreciated. A not-too-narrow and static band pass filter will transform
a complex sound-source (usually) retaining its morphology (time-varying
shape) so that the resulting sound will relate to the source sound through
its articulation in time. (Sound
A filter may also be used to isolate some static or moving feature of
a sound. In a crude way, filters may be used to eliminate unwanted noise
or hums in recorded sounds, especially as digital filters can be very
precisely tuned. In the frequency domain, spectral components can be eliminated
on a channel-by-channel basis, either in terms of their frequency location
(using spectral splitting to define a frequency band and setting the band
loudness to zero) or in terms of their time-varying relative loudness (spectral
tracing will eliminate the N least significant, i.e. quietest, channel
components, window by window. At an elementary level this can be used
for signal-dependent noise reduction. But see also "Spectral Fission"
below). More radically, sets of narrow band pass filters can be used to
force a complex spectrum onto any desired Hpitch set (HArmonic field in
the traditional sense). (Sound
In a more signal sensitive sense a filter or a frequency-domain channel
selector can be used to separate some desired feature of a sound, e.g.
a moving high frequency component in the onset, a particular strong middle
partial etc, for further compositional development. In particular, we
can separate the spectrum into parts (using band pass filters or spectral
splitting) and apply processes to the N separated parts (e.g. pitch-shift,
add vibrato) and then recombine the parts, perhaps reconstituting the
spectrum in a new form. However, if the spectral parts are changed too
radically e.g. adding completely different vibrato to each part, they
will not fuse when remixed, but we may be interested in the gradual dissociation
of the spectrum. This leads us into the next area.
Ultimately we may use a procedure which follows the partials themselves,
separating the signal into its component partials (partial tracking).
This is quite a complex task which will involve pitch tracking and pattern-matching
(to estimate where the partials might lie) on a window by window basis.
Ideally it must deal in some way with inharmonic sounds (where the form
of the spectrum is not known in advance) and noise sources (where there
are, in effect, no partials). This technique is however particularly powerful
as it allows us to set up an additive synthesis model of our analysed
sound and thereby provides a bridge between unique recorded sound-events
and the control available through synthesis.
SPECTRAL FISSION & CONSTRUCTIVE DISTORTION
We have mentioned several times the idea of spectral fusion where the
parallel micro-articulation of the many components of a spectrum causes
us to perceive it as a unified entity - in the case of a harmonic spectrum,
as a single pitch. The opposite process, whereby the spectral components
seem to split apart, we will describe as spectral fission. Adding two
different sets of vibrato to two different groups of partials within the
same spectrum will cause the two sets of partials to be perceived independently
- the single aural stream will split into two. (Sound
Spectral fission can be achieved in a number of quite different ways in
the frequency domain. Spectral arpeggiation is a process that draws our
attention to the individual spectral components by isolating, or emphasising,
each in sequence. This can be achieved purely vocally over a drone pitch
by using appropriate vowel formants to emphasise partials above the fundamental.
The computer can apply this process to any sound-source, even whilst it
is in motion. (Sound
Spectral tracing strips away the spectral components in order of increasing
loudness (Appendix p25). When only a few components are left, any sound
is reduced to a delicate tracery of (shifting) sine-wave constituents.
Complexly varying sources produce the most fascinating results as those
partials which are at any moment in the permitted group (the loudest)
change from window to window. We hear new partials entering (while others
leave) producing "melodies" internal to the source sound. This
feature can often be enhanced by time-stretching so that the rate of partial
change is slowed down. Spectral tracing can also be done in a time-variable
manner so that a sound gradually dissolves into its internal sine-wave constituents.
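The core of spectral tracing, keeping only the loudest channels in each window, can be sketched like this. The window format (a list of (frequency, amplitude) pairs per window) and the function names are assumptions.

```python
def trace_window(window, keep):
    """Zero the amplitude of every channel except the `keep` loudest."""
    order = sorted(range(len(window)), key=lambda i: window[i][1], reverse=True)
    kept = set(order[:keep])
    return [(f, a if i in kept else 0.0) for i, (f, a) in enumerate(window)]


def spectral_trace(windows, keep):
    """Apply the selection independently, window by window."""
    return [trace_window(w, keep) for w in windows]


# Two successive windows of a changing four-channel spectrum (example data).
w1 = [(100.0, 0.9), (200.0, 0.1), (300.0, 0.5), (400.0, 0.2)]
w2 = [(100.0, 0.1), (200.0, 0.8), (300.0, 0.3), (400.0, 0.6)]
traced = spectral_trace([w1, w2], keep=2)
```

In the first window the 100 Hz and 300 Hz channels survive; in the second, the 200 Hz and 400 Hz channels. This change of membership from window to window is what produces the internal "melodies". A time-variable version could pass a different `keep` value for each window.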
Spectral time-stretching, which we will deal with more fully in Chapter
11, can produce unexpected spectral consequences when applied to noisy
sounds. In a noisy sound the spectrum is changing too quickly for us to
gain any pitch or inharmonic multi-pitched percept from any particular
time-window. Once, however, we slow down the rate of change the spectrum
becomes stable or stable-in-motion for long enough for us to hear out
the originally instantaneous window values. In general, these are inharmonic
and hence we produce a "metallic" inharmonic (usually moving)
ringing percept. By making perceptible what was not previously perceptible
we effect a "magical" transformation of the sonic material.
Again, this can be effected in a time-varying manner so that the inharmonicity
emerges gradually from within the stretching sound. (Sound
Alternatively we may elaborate the spectrum in the time-domain by a process
of constructive distortion. By searching for wavesets (zero-crossing pairs:
Appendix p50) and then repeating the wavesets before proceeding to the
next (Waveset time-stretching) we may time stretch the source without
altering its pitch (see elsewhere for the limitations on this process).
(Appendix p55). Wavesets correspond to wavecycles in many pitched sounds,
but not always (Appendix p50). Their advantage in the context of constructive
distortion is that very noisy sounds, having no pitch, have no true wavecycles
- but we can still segment them into wavesets (Appendix p50).
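A minimal sketch of waveset segmentation and waveset time-stretching on a list of samples. It assumes a waveset runs from one upward zero-crossing to the next; the function names are hypothetical, and material before the first crossing and after the last is simply discarded here.

```python
import math


def find_wavesets(samples):
    """Cut the signal at each upward zero-crossing
    (a sample at or below zero followed by a positive one)."""
    starts = [i for i in range(1, len(samples))
              if samples[i - 1] <= 0.0 < samples[i]]
    return [samples[a:b] for a, b in zip(starts, starts[1:])]


def waveset_stretch(samples, repeats):
    """Repeat each waveset `repeats` times before moving to the next."""
    out = []
    for ws in find_wavesets(samples):
        out.extend(ws * repeats)
    return out


# A sine with a 20-sample period; the half-sample offset keeps samples off zero.
source = [math.sin(2 * math.pi * (n + 0.5) / 20) for n in range(200)]
wavesets = find_wavesets(source)
stretched = waveset_stretch(source, 3)
```

For this steady test tone every waveset is one 20-sample cycle, so the stretched signal is three times as long with the same cycle length, and hence the same pitch.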
In a very simple sound source (e.g. a steady waveform, from any oscillator)
waveset time-stretching produces no artefacts. In a complexly evolving
signal (especially a noisy one) each waveset will be different, often
radically different, to the previous one, but we will not perceptually
register the content of that waveset in its own right (see the discussion
of time-frames in Chapter 1). It merely contributes to the more general
percept of noisiness. The more we repeat each waveset however, the closer
it comes to the grain threshold where we can hear out the implied pitch
and the spectral quality implicit in its waveform. With a 5 or 6 fold
repetition therefore, the source sound begins to reveal a lightning fast
stream of pitched beads, all of a slightly different spectral quality.
A 32 fold repetition produces a clear "random melody" apparently
quite divorced from the source. A three or four fold repetition produces
a "phasing"-like aura around the sound in which a glimmer of
the bead stream is beginning to be apparent. (Sound
Again, we have a compositional process which makes perceptible aspects
of the signal which were not perceptible. But in this case, the aural
result is entirely different. The new sounds are time-domain artefacts
consistent with the original signal, rather than revelations of an intrinsic
internal structure. For this reason I refer to these processes as constructive
distortion.
SPECTRAL MANIPULATION IN THE FREQUENCY DOMAIN
There are many other processes of spectral manipulation we can apply to
signals in the frequency domain. Most of these are only interesting if
we apply them to moving spectra because they rely on the interaction of
data in different (time) windows - and if these sets of data are very
similar we will perceive no change.
We may select a window (or sets of windows) and freeze either the frequency
data or the loudness data which we find there over the ensuing signal
(spectral freezing). If the frequency data is held constant, the channel
amplitudes (loudnesses) continue to vary as in the original signal but
the channel frequencies do not change. If the amplitude data is held constant
then the channel frequencies continue to vary as in the original signal.
As mentioned previously, in a complex signal, holding the amplitude data
is often more effective in achieving a sense of "freezing" the
signal. We can also freeze both amplitude and frequency data but, with
a complex signal, this tends to sound like a sudden splice between a moving
signal and a synthetic drone.
(Sound example 3.24).
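The three freezing options can be sketched as below, assuming each window is a list of (frequency, amplitude) channel pairs. The function name and the `hold` argument are hypothetical.

```python
def spectral_freeze(windows, at, hold="amp"):
    """From window index `at` onwards, hold the chosen data constant.
    hold="freq": frequencies frozen, amplitudes vary as in the original.
    hold="amp":  amplitudes frozen, frequencies vary as in the original.
    hold="both": the frozen window is simply repeated."""
    frozen = windows[at]
    out = [list(w) for w in windows[:at + 1]]
    for w in windows[at + 1:]:
        new = []
        for (f, a), (ff, fa) in zip(w, frozen):
            if hold == "freq":
                new.append((ff, a))
            elif hold == "amp":
                new.append((f, fa))
            else:
                new.append((ff, fa))
        out.append(new)
    return out


# Three windows of a moving two-channel spectrum (example data).
moving = [[(100.0, 0.5), (200.0, 0.3)],
          [(110.0, 0.4), (210.0, 0.6)],
          [(120.0, 0.1), (220.0, 0.9)]]
held_freq = spectral_freeze(moving, at=0, hold="freq")
held_amp = spectral_freeze(moving, at=0, hold="amp")
```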
We may average the spectral data in each frequency-band channel over
N time-windows (spectral blurring) thus reducing the amount of detail
available for reconstructing the signal. This can be used to "wash
out" the detail in a segmented signal and works especially effectively
on spiky, crackly signals (those with brief, bright peaks). We can do
this and also reduce the number of partials (spectral trace & blur)
and we may do either of these things in a time-variable manner so that
the details of a sequence gradually become blurred or gradually emerge
as distinct. (Sound
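Spectral blurring can be sketched as a running average of each channel over the most recent N windows. The trailing-average scheme is an assumption made for illustration (other ways of averaging over N windows are possible), as are the function name and the window format of (frequency, amplitude) pairs.

```python
def spectral_blur(windows, n):
    """Replace each window by the channel-wise mean of itself and the
    preceding n - 1 windows (fewer at the very start of the signal)."""
    out = []
    for t in range(len(windows)):
        group = windows[max(0, t - n + 1): t + 1]
        channels = len(windows[t])
        out.append([
            (sum(w[c][0] for w in group) / len(group),
             sum(w[c][1] for w in group) / len(group))
            for c in range(channels)
        ])
    return out


# One channel carrying a single brief spike (example data).
spiky = [[(100.0, a)] for a in (0.0, 0.0, 1.0, 0.0, 0.0, 0.0)]
blurred = spectral_blur(spiky, 3)
```

The single-window spike of amplitude 1.0 is spread across three windows at one third of its height, which is the "washing out" of detail described above.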
Finally, we may shuffle the time-window data in any way we choose (spectral
shuffling), shuffling windows in groups of 1 or 2 or 64 etc. With large
numbers of windows in a shuffled group we produce an audible rearrangement
of signal segments, akin to brassage, but with only a few windows we create
another process of sound blurring, particularly apparent in rapid sequences.
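One reading of spectral shuffling is sketched below: the windows are cut into consecutive groups of a fixed size, and the groups are then put in random order with each group kept intact. This interpretation, the function name and the seeded random generator are all assumptions.

```python
import random


def spectral_shuffle(windows, group_size, seed=0):
    """Randomly reorder consecutive groups of `group_size` windows."""
    groups = [windows[i:i + group_size]
              for i in range(0, len(windows), group_size)]
    random.Random(seed).shuffle(groups)
    return [w for g in groups for w in g]


# Window indices stand in for the window data (example data).
order = list(range(8))
by_twos = spectral_shuffle(order, group_size=2)
```

With a group size of 64 the result would be a brassage-like rearrangement of audible segments; with 1 or 2 it becomes a local blurring.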
SPECTRAL MANIPULATION IN THE TIME-DOMAIN
A whole series of spectral transformations can be effected in the time-domain
by operating on wavesets defined as pairs of zero-crossings. Bearing in
mind that these do not necessarily correspond to true wavecycles, even
in relatively simple signals, we will anticipate producing various unexpected
artefacts in complex sounds. In general, the effects produced will not
be entirely predictable, but they will be tied to the morphology (time
changing characteristics) of the original sound. Hence the resulting sound
will be clearly related to the source in a way which may be musically
useful. As this process destroys the original form of the wave I will
refer to it as destructive distortion. The following manipulations suggest
We may replace wavesets with a waveform of a different shape but the same
amplitude (waveset substitution : Appendix p52). Thus we may convert all
the wavesets to square waves, triangular waves, sine-waves, or even user-defined
waveforms. Superficially, one might expect that sine-wave replacement
would in some way simplify, or clarify, the spectrum. Again, this may
be true with simple sound input but complex sounds are just changed in
spectral "feel" as a rapidly changing sine-wave is no less perceptually
chaotic than a rapidly changing arbitrary wave-shape. In the sound examples
the sound with a wood-like attack has wavesets replaced by square waves,
and then by sine waves. Two interpolating sequences (see Chapter 12) between
the 'wood' and each of the transformed sounds are then created by inbetweening
(see Appendix p46 & Chapter 12). (Sound
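Waveset substitution for a single waveset can be sketched as follows, assuming the waveset is a list of samples and that "the same amplitude" means the same peak value. The function name and the two shapes offered are illustrative.

```python
import math


def substitute(waveset, shape="sine"):
    """Return a new waveform with the same length and peak amplitude
    as `waveset`, but with a fixed shape."""
    n = len(waveset)
    peak = max(abs(s) for s in waveset)
    if shape == "sine":
        return [peak * math.sin(2 * math.pi * i / n) for i in range(n)]
    if shape == "square":
        return [peak if i < n // 2 else -peak for i in range(n)]
    raise ValueError("unknown shape: " + shape)


# An irregular eight-sample waveset (example data).
original = [0.1, 0.4, 0.2, -0.1, -0.3, -0.2, -0.05, 0.0]
as_sine = substitute(original, "sine")
as_square = substitute(original, "square")
```

Because length and peak are copied from each waveset in turn, a rapidly changing source still yields a rapidly changing sequence of sine or square cycles.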
the half-wave-cycles (waveset inversion: Appendix p51) usually produces
an "edge" to the spectral characteristics of the sound. We might
also change the spectrum by applying a power factor to the waveform shape
itself (waveset distortion: Appendix p52) (Sound
We may average the waveset shape over N wavesets (waveset averaging).
Although this process appears to be similar to the process of spectral
blurring, it is in fact quite irrational, averaging the waveset length
and the wave shape (and hence the resulting spectral contour) in a perceptually
unpredictable way. More interesting (though apparently less promising)
we may replace N in every M wavesets by silence (waveset omission : Appendix
p51). For example, every alternate waveset may be replaced by silence.
Superficially, this would appear to be an unpromising approach but we
are in fact thus changing the waveform. Again, this process introduces
a slightly rasping "edge" to the sound quality of the source
sound which increases as more "silence" is introduced. (Sound
We may add 'harmonic components' to the waveset in any desired proportions
(waveset harmonic distortion) by making copies of the waveset which are
1/2 as long (1/3 as long etc) and superimposing 2 (3) of these on the
original waveform in any specified amplitude weighting. With an elementary
waveset form this adds harmonics in a rational and predictable way. With
a complex waveform, it enriches the spectrum in a not wholly predictable
way, though we can fairly well predict how the spectral energy will be
redistributed. (Appendix p52).
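For a single waveset, the superimposition of shortened copies can be sketched by reading the waveset k times as fast and wrapping around. Taking every k-th sample without interpolation is a simplifying assumption, and the function name is hypothetical.

```python
import math


def add_harmonic(waveset, k, weight):
    """Superimpose k copies of the waveset, each 1/k as long, on the
    original, scaled by `weight`."""
    n = len(waveset)
    # Index (i * k) % n steps through the waveset k times over n samples.
    return [waveset[i] + weight * waveset[(i * k) % n] for i in range(n)]


# An elementary waveset: one sine cycle of 32 samples (example data).
sine = [math.sin(2 * math.pi * i / 32) for i in range(32)]
enriched = add_harmonic(sine, k=2, weight=0.5)
```

For this elementary sine waveset the result is the original plus its second harmonic at half amplitude, the rational and predictable case. With a complex waveset the same operation redistributes energy less predictably.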
We may also rearrange wavesets in any specified way (waveset shuffling
: Appendix p51) or reverse wavesets or groups of N wavesets (waveset reversal
: Appendix p51). Again, where N is large we produce a fairly predictable
brassage of reverse segments, but with smaller values of N the signal
is altered in subtle ways. Values of N at the threshold of grain perceptibility
are especially interesting. Finally, we may introduce small, random changes
to the waveset lengths in the signal (waveset shaking : Appendix p51).
This has the effect of adding "roughness" to clearly pitched
Such distortion procedures work particularly well with short sounds having
distinctive loudness trajectories. In the sound example a set of such
sounds, suggesting a bouncing object, is destructively distorted in various
ways, suggesting a change in the physical medium in which the 'bouncing'
takes place (e.g. bouncing in sand). (Sound
In a sense, almost any manipulation of a signal will alter its spectrum.
Even editing (most obviously in very short time-frames in brassage e.g.)
alters the time-varying nature of the spectrum. But, as we have already
made clear, many of the areas discussed in the different chapters of this
book overlap considerably. Here we have attempted to focus on sound composition
in a particular way, through the concept of "spectrum". Spectral
thinking is integral to all sound composition and should be borne in mind
as we proceed to explore other aspects of this world.