Rich, 3D environments need sound to give the user a feeling of complete immersion. When visualization is provided by a cluster of computers, the sound processing can be handled by a single machine that is in turn connected to the speakers and other audio equipment. The Open Audio Server (OAS) provides this functionality via a network interface on a Linux machine. OAS is based loosely on the aging Windows DirectSound AudioServer project by Marc Schreier, and is backwards compatible with all known client applications.
The OAS project is separated into two components: server and client. The server component contains the source code and build system necessary to compile and run the actual audio server. The client component contains a client API to help applications communicate with the server. There is also sample code that can be compiled and played to demonstrate the server's functionality.
- CMake (minimum v2.8) for the build system
- OpenAL for the backend audio processing
- ALUT for file loading and OpenAL context management
- MXML (Minimal XML) library for XML parsing
- Optional, but highly recommended: FLTK (v1.3.0) for the GUI
- Optional: Doxygen (minimum 1.4.7) for documentation generation
- Install the dependencies above, if you have not already done so
- Get the OAS source code from the GitHub repository:
git clone git://github.com/CalVR/Open-Audio-Server.git
- Go to the
server/build
directory - If you want to configure build options yourself, do
ccmake ..
Otherwise, to go with the default options, you can docmake ..
- Assuming everything went okay with configuration, you can then compile with
make
- With sudo priviledges, you can install with
make install
- To generate documentation using Doxygen, use
make docs
- With sudo priviledges, you can install with
Before running the server, you will want to set some runtime options, such as which port you want the server to listen for connections on. Look at the sample configuration file `sample_oas_config.xml` to view this and other server options.
Pass the path to your XML configuration file as a command line argument to your OAS binary executable.
OAS ~/oas_config.xml
- Get the OAS source code from the GitHub repository:
git clone git://github.com/CalVR/Open-Audio-Server.git
- Go to the
client/build
directory - If you want to configure build options yourself (such as compiling examples), do
ccmake ..
Otherwise, to go with the default options, you can docmake ..
- Assuming everything went okay with configuration, you can then compile with
make
- With sudo priviledges, you can install with
make install
- To generate documentation using Doxygen, use
make docs
- With sudo priviledges, you can install with
There is a directory with source code containing basic examples that demonstrate the server's functionality. They are not compiled by default. The examples take various command line arguments, such as the IP address and port of the server the examples should connect to. By default, the examples will try to connect to the localhost, 127.0.0.1, at port 31231.
To enable compilation of the examples:
- Go to the
client/build
directory - Configure build options using
ccmake ..
- Enable the
BUILD_EXAMPLES
option, and then reconfigure by pressing'c'
- Enable the examples you want, and then reconfigure and generate the makefiles
- Compile with
make
- Your examples should compile into the
client/build/examples
directory.
All of the low level sound rendering is handled by OpenAL. There are two basic components to rendering sound: the listener, and the set of all individual sound sources.
The Listener represents the end-user inside the client application. For 3-D virtual environments, this is typically associated with the camera. The position, orientation, and velocity of the Listener will usually map to their corresponding counterparts for the camera.
Each sound source is associated with a particular sound file, and multiple sources can share the same sound file. These sound sources can also be moved around in the 3-D world. They all have their own position, velocity, and direction, as well as other properties like gain or pitch. OAS does not set a hard limit to the number of sound sources that can be allocated at the same time. However, the underlying OpenAL implementation and sound card will most likely have a limit, (usually around ~256). As long as the client application releases unneeded sound sources in a timely manner, this limit should not be an issue. The server maps each sound source to a unique handle number, which is greater than or equal to 0. The client can then use this handle to reference the corresponding sound source.
If a sound source is associated with a file that contains multiple channels of audio (such as stereo, or 5.1 surround), modifying the sound source's position, velocity, or direction will not have any effect. This is because multi-channel sources are assumed to be pre-mixed, and already contain all of the spatial information that is needed to render them in a 3D environment. Mono sources, or single-channel sources, can be thought of as a single point that emits sound, and therefore can be moved around in the 3D-world and rendered accurately. Although it is technically possible to handle positional audio rendering of multi-channel sound sources, it was not a primary goal of this project, and lies outside the scope of the desired feature set. Multi-channel/stereo sources can still be used very effectively for ambient/environmental audio. For example, stormy weather, forest noises, and background music are cases where the position of the sound is ambiguous or arbitrary. These can be great opportunities to use stereo or multi-channel premixed audio samples.
The doppler effect is when the pitch (frequency) of a sound source changes due to the sound source and the listener having a relative velocity to each other. In OpenAL and OAS, the velocity properties of both sound sources and the listener contribute only to the doppler shift calculations. As per the OpenAL specification, setting the velocity for the listener or sound sources will NOT cause the corresponding position property to be updated.
The position properties of the listener and sound sources are specified in terms of 3-D world coordinates, <x, y, z>. Orientation of the listener and direction of sound sources are also vectors in the same world coordinate system. The units used for position can be arbitrary, as long as they are consistent between different sound sources and the listener. Although the velocity property is also a vector in the 3D world coordinate system, the units are completely independent from the units used for position. Velocity, as mentioned above, is only used for the Doppler Effect, and its units must be the same as the units used to define the speed of sound. By default, the speed of sound is defined as 343.3 meters per second. Therefore, by default, velocities need to be specified in terms of meters per second. The speed of sound is a global parameter that can be modified, as shown here.
OAS uses OpenAL's "inverse distance clamped" model to calculate the effective gain/volume of a sound based
on the distance between the sound and the listener. This is equivalent to the IASIG I3DL2 model, and
is also the default distance model in OpenAL. In short, the attenuation is calculated as follows:
distance = Euclidean distance between sound source and listener
refDistance = the reference distance property of the sound source
rollOff = the rolloff factor property of the sound source
distance = MAX(distance, refDistance)
gain = refDistance / (refDistance + rollOff * (distance - refDistance))
Suppose you have two sounds - a helicopter, and someone talking (speech). The helicopter should be very audible in a wide surrounding area, whereas the speech should only be audible in a small area. To get the desired effect, and assuming units that are approximately in meters, we would want the reference distance of the helicopter to be ~25. On the other hand, the speech sound would have a reference distance of ~0.5. The rolloff factor for the helicopter would be in the range of ~0.1, while the speech sound could have a rolloff factor between 1 and 2. The helicopter would then be audible even if the listener was far away, and the listener would have to be in the immediate vicinity of the speech source in order to hear it. Naturally, a wide range of other values can be used depending on the application, but the desired effect would be the same.
When a sound source is directional, there are three important properties that determine how it is heard. Directional audio works by emulating a cone of sound. The effective gain of the sound changes based on where the listener is in this sound cone.
- Cone Inner Angle
- The cone inner angle specifies another conical region inside the main sound cone, where the sound is not attenuated. If the listener falls within this region, it will hear the sound at the maximum possible gain, after accounting for distance. For the default non-directional sources, this value is essentially 360 degrees.
- Cone Outer Angle
- The cone outer angle specifies the outer, large conical region, where sound is attenuated based on the angular distance away from the central inner cone, as defined above. When the listener is close to the inner sound cone, the sound will have a higher effective gain. When the listener is far away from the inner sound cone and is near the extremities of the outer cone, the sound will be attenuated greatly, and will have a low effective gain.
- Cone Outer Gain
- The cone outer gain specifies the gain to be applied when the listener falls completely outside of the conical regions defined by the above angles. By default, this is 0, so that if the sound source is pointing completely away from a listener, the listener will not be able to hear the source.
The server uses a plain-text (ASCII) based communication protocol to receive instructions from the client. This allows the client to be even a simple telnet connection, if desired. The general format for each message passed from client to the server is a four letter, uppercase message identifier, followed by any relevant parameters. Integer and floating point numbers are also passed as ASCII plaintext.
Parameters to messages can be separated by commas, semicolons, or extra spaces. The server's parser is lenient with how parameters are separated. So,
SSPO 4, 3.5, 6, 2.5is the same as
SSPO 4 3.5 6 2.5is the same as
SSPO 4; 3.5, 6, 2.5is the same as
SSPO 4;;;;; 3.5,, ;6; 2.5However, it is recommended that you stay true to a convention. The provided client API uses the following convention, to minimize the number of bytes sent over the network:
SSPO 4 3.5 6 2.5
The server can receive multiple messages per network packet. Each message should be separated by either a null character ('\0') or a whitespace character (' ', '\n', '\t').
Disclaimer: Clients should not attempt to fill up network packets before transmitting them to the server, especially if the different messages all modify the same sound sources. There are two reasons:
- The contents of each packet are processed very quickly, and many of the instructions in the packet may have no audible effect. For example, if the packet contains multiple instructions to change a sound source's position, only the very last of these instructions would have any kind of lasting effect that may be heard.
- Attempting to fill up packets before transmission can introduce a potentially noticeable delay between the client application and the server. This is why the provided client API explicitly sets the TCP_NODELAY option on the socket.
It should be noted that even with TCP_NODELAY enabled, the network layer of the client API will automatically buffer multiple messages into one packet if the messages are being added faster than they can be transmitted.
Message and Example(s) | Description | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
GHDL filename
GHDL beachsound.wav | Load filename into a sound source, and return a handle for accessing the source. If filename cannot be found in the server's cache directory, or a sound source cannot be created for filename, the server will respond with "-1". Otherwise, the response will be a handle number, with values "0", "1", "2", "3", etc. | ||||||||||||
WAVE type frequency phase duration
WAVE 1 261.3 0.0 3.5 |
Generate a new sound source based on the specified simple waveform. Similar to GHDL, the server
will respond with a non-negative integer value in ASCII form on success, which will be the handle
for the generated source. On failure, the server will respond with "-1".
The first parameter, type, describes the shape of the wave that should be generated, and takes the following values:
Phase specifies the phase shift of the waveform, in degrees from -180 to +180, and can be a floating point. Duration specifies how long the sound should last through one playback, in seconds, and can be a floating point. The example creates a sinusoidal wave, with a frequency corresponding to middle-C, a phase shift of 0 degrees, and a duration of 3.5 seconds. | ||||||||||||
RHDL handle
RHDL 1 | Release the resources allocated for the source corresponding to handle. |
Message and Example(s) | Description | |||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
PLAY handle
PLAY 5 | Play the source specified by handle. If the source is already playing, this will do nothing. | |||||||||||||||||||||
STOP handle
STOP 5 | Stop the source specified by handle. The playback position is reset to the beginning. | |||||||||||||||||||||
PAUS handle
PAUS 5 | Pause the source specified by handle. The current playback position is saved. To resume playback, PLAY must be used. Pausing a source that is already paused has no effect. | |||||||||||||||||||||
SSEC handle seconds
SSEC 7 2.95 |
Set the playback position, in seconds, of the source specified by handle. If the source is already playing,
playback will skip to the desired location. Otherwise, the playback position will be applied next time the source is
played. If the specified position is outside the bounds of the sound source, this will have no effect.
The example sets sound 7's play position to 2.95 seconds. | |||||||||||||||||||||
STAT handle
STAT 4 |
Retrieve the current state of the sound specified by handle. The server responds with an integer ranging from
0 to 5. Here are the possible values for the state:
|
Message and Example(s) | Description | |||||||||
---|---|---|---|---|---|---|---|---|---|---|
SSPO handle x y z
SSPO 3 4.5 0 22.337 |
Set the position of the source specified by handle to <x, y, z>.
The position values can be floating point, and have a default of <0, 0, 0> The example sets sound 3's position to <4.5, 0, 22.337>. | |||||||||
SSVE handle x y z
SSVE 2 5.0 0 0 |
Set the velocity of the sound specified by handle to <x, y, z>. These values
are only used for doppler effect calculations. OpenAL does not use the velocity for updating the
sound's position, and OAS conforms to this specification. See the doppler effect
section for more information on how it works in OAS and OpenAL.
The example sets sound 2's velocity to 5.0 in the X direction. | |||||||||
SSVO handle gain
SSVO 17 0.85 |
Set the gain (volume) of the sound specified by handle to gain. A gain of 0 will mute the sound
completely, and the default gain for all sounds is 1. A gain of 0.5 corresponds to an attenuation of 6 dB. A gain
value greater than 1.0 (to amplify the sound) is possible. However, the final gain value (after source-to-listener
distance and orientation attentuation calculations) may be clamped by the sound card and drivers. If portability
between systems is a key issue, it is not recommended to have widespread use of gain values greater than 1.
The example sets sound 17's gain to 0.85, which will be quieter than the default. | |||||||||
FADE handle gain time
FADE 12 0.7 4 |
Gradually and linearly change the sound's current gain value to the specified gain value, over the duration
time, in seconds. This can be used to slowly fade a sound's volume in or out, such as, but not limited to, when
introducing objects/scenes or transitioning between different objects/scenes. Both gain and time can be
floating point.
The example fades sound twelve's current gain value (whatever it may be) to 0.7, over the course of 4 seconds. | |||||||||
SSLP handle doLoop
SSLP 3 1 |
Set the sound specified by handle to loop continuously, or to disable looping. The parameter doLoop takes
boolean values of 0 or 1, with 0 disabling looping, and 1 enabling looping. A sound with looping enabled
will resume playback from the beginning immediately after it is finished playing all the way through. Sounds have
looping disabled by default.
The example turns looping on in sound 3. | |||||||||
SSDI handle x y z
SSDI 3 0.5 0.0 -2.5 |
Set the direction of the sound specified by handle. The sound's direction is the vector specified by
<x, y, z>. If the sound does not point towards the listener,
the listener will not hear the sound at full volume. If the sound is pointing completely away from the listener, then
it will not be audible at all. When the direction vector is the default of <0,0,0>,
the sound source has no direction associated with it. Sound is then emitted equally in all directions, similarly to
a point light source.
The example sets the direction of sound 3 to <0.5, 0.0, -2.5>. | |||||||||
SSDI handle angle
SSDI 3 2.944 |
Set the direction of the sound specified by handle using angle, in radians. The angle is converted to a unit
vector in the X-Z plane using sine and cosine. There is no default angle value, because sound sources are not
directional by default.. Once you use this version of SSDI on a sound source, the sound source will remain directional
until you use "SSDI handle 0 0 0", to fully disable the directionality.
The example has the same result as the example for SSDI with a vector parameter. The direction of sound 3 is set to 2.944 radians. | |||||||||
SSDV handle angle gain
SSDV 3 2.944 0.869 |
Set both the direction and gain of the sound given by handle. The direction is specified by angle, in radians,
and the volume is set by gain. This effectively combines two messages: SSDI with the angle parameter and SSVO.
The example sets the direction and gain of sound 3 to 2.944 and 0.869, respectively. | |||||||||
SPIT handle pitch
SPIT 3 1.25 |
Set the pitch of sound specified by handle. Pitch can be any floating point greater than 0. A value of 1 is
the default. Doubling the pitch will increase the pitch of the sound by one octave, and halving the pitch will drop the
sound by one octave. Modifying the pitch also changes the playback speed, so a sound with pitch greater than 1 will play
at a faster speed, and a sound with a pitch less than 1 will play at a slower speed.
The example sets the pitch of sound 3 to 1.25. | |||||||||
SPAR handle parameter value
SPAR 8 1 0.001 SPAR 8 2 100 |
This instruction can be used to set sound rendering parameters for the sound source specified by handle. Currently,
the following parameters are available:
The first example sets the rolloff factor of sound source 8 to 0.001, which is appropriate for generic sounds and positional units of millimeters. The second example sets the reference distance of sound source 8 to 100. |
Message and Example(s) | Description |
---|---|
GAIN value
GAIN 0.0 | Change the gain (volume) for the listener to value. The default gain for the listener is 1.0. This can be useful especially to mute all sounds for the listener, as the example demonstrates. |
SLPO x y z
SLPO 20.1 0.08 -9.23 |
Set the listener's position to <x, y, z>, in the same coordinate
system as the sound sources. The default position is <0, 0, 0>
The example sets the position to <20.1, 0.08, -9.23>. |
SLOR aX aY aZ uX uY uZ
SLOR 1 0 0 0 1 0 |
Set the listener's orientation using two vectors. The first,
<aX, aY, aZ>, specifies the "look-at" direction. The second,
<uX, uY, uZ>, specifies the "up" direction. If two vectors are linearly
dependent, the behavior is undefined. They do not need to be normalized. The defaults are
<0, 0, -1> for the look-at vector and <0, 1, 0> for the
up vector.
The example sets the listener's orientation to <1, 0, 0> for the look-at vector and <0, 1, 0> for the up vector. |
SLVE x y z
SLVE -2.0 0 0 |
Set the listener's velocity to <x, y, z>. Similar to setting a sound source's
velocity, the server does not update the position of the listener based on this specified velocity. It is only used as a
parameter for doppler effect calculations. See the doppler effect
section for more information.
The example sets the velocity of the listener to -2.0 in the X direction. |
Message and Example(s) | Description | |||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
PARA parameter value
PARA 1 343300 PARA 3 0.001 |
This instruction can be used to set various global sound rendering parameters. Currently, the following parameters are available:
The first example sets the speed of sound to 343300, corresponding to the units of millimeters per second, through the medium of air. The second example sets the default rolloff factor to 0.001, which is appropriate for generic sounds and positional units of millimeters. |
Message and Example(s) | Description |
---|---|
SSVE handle speed
SSVE 6 3.5 |
(deprecated) Set the sound's speed. The sound's current direction is used to compute the effective velocity. This assumes that the sound is moving in the same direction it is facing. Note that if the sound has no direction associated with it, the behavior is undefined. This message is leftover to support old applications that were tailored to the old Windows AudioServer project. New applications should avoid using this functionality, or use at their own risk. The example sets sound 6's speed to 3.5. |
There were some messages that had inconsistent, incomplete, or altogether broken support in the old Windows AudioServer. This meant that in order to maintain full backwards compatibility, OAS would have had to implement these deprecated messages in a similar, broken manner. Instead, support for these messages has been dropped completely, under the assumption that old client applications would not be dependent on something that was broken. Support may be added in the future as needed.
Messages that are no longer supported: |
SSDR handle angle |
SSRV handle angle gain |
SSRV handle x y z gain |
- Extend the osgAudio nodekit to add support for the OAS client API
- Enhance sockets code to add support for multiple connected clients. Asynchronous socket I/O.
Software Developers:
Project Advisors: Misc. Development Assistance:- Philip Weber
- Andrew Prudhomme