The combined source and listener gains are now limited to a multiplier of 16
(~24dB). This is to keep mixes from getting out of control with large volume
boosts, which also reduce the effective precision afforded by floating-point.
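A rough sketch of the check (illustrative names, not the actual internals):

    /* Hypothetical sketch: clamp the combined gain to a 16x (~24dB) ceiling
     * before mixing. */
    float gain = SourceGain * ListenerGain;
    if(gain > 16.0f) gain = 16.0f;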
This appears to be how Creative's Windows drivers handle it, and is necessary
for at least the Windows version of UT2k4 (otherwise it tries to play a source
while suspended, checks and sees it's stopped, then kills it before it's given
a chance to start playing).
Consequently, the internal properties it gets mixed with are determined by what
the source properties are at the time of the play call, and the listener
properties at the time of the suspend call.
This does not change alDeferUpdatesSOFT, which will still hold the play state
change until alProcessUpdatesSOFT.
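For reference, a batched start with AL_SOFT_defer_updates looks something like
this (assuming the extension is present and its functions are loaded):

    alDeferUpdatesSOFT();
    alSourcef(source, AL_GAIN, 0.5f);
    alSourcePlay(source);      /* held until the process call below */
    alProcessUpdatesSOFT();    /* applies the property and play state changes */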
Note that this now also causes all playing sources to update when an effect
slot is updated. This is a bit wasteful, as it should only need to re-update
sources that are using the effect slot (and only when a relevant property is
changed), but it's good enough, especially with deferring, since all playing
sources are going to get updated on the process call anyway.
This means we no longer have to play around with trying to avoid duplicate
state pointers, since the reference count will ensure they're deleted as
appropriate.
The only caveat is that the mixer is not allowed to decrement references, since
that can cause the object to be freed (which the mixer code is not allowed to
do).
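A minimal sketch of the constraint, with hypothetical names:

    #include <stdatomic.h>
    #include <stdlib.h>

    /* Hypothetical sketch: the mixer may read or add references, but only
     * non-mixer threads release them, since dropping the last reference
     * frees the object and the mixer isn't allowed to free memory. */
    typedef struct SharedProps {
        atomic_uint ref;
        /* ... property data ... */
    } SharedProps;

    static void SharedProps_Release(SharedProps *props)
    {
        if(atomic_fetch_sub(&props->ref, 1) == 1)
            free(props);  /* never reached from the mixer thread */
    }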
The source's voice holds a copy of the last properties it received, so listener
updates can make sources recalculate internal properties from that stored copy.
Last time this attempted to average the HRIRs according to their contribution
to a given B-Format channel as if they were loudspeakers, as well as averaging
the HRIR delays. The latter part resulted in the loss of the ITD (inter-aural
time delay), a key component of HRTF.
This time, the HRIRs are averaged similarly to the above, except that instead
of averaging the delays, they're applied to the resulting coefficients (for
example, a delay of 8 would apply the HRIR starting at the 8th sample of the
target HRIR). This does roughly double the IR length, as the largest delay is
about 35 samples while the filter is normally 32 samples. However, this is
still smaller than the original data set IR (which was 256 samples), and it
only needs to be applied to 4 channels for first-order ambisonics, rather than
the 8-channel cube. So
it's doing twice as much work per sample, but only working on half the number
of samples.
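Roughly, the build step looks like this sketch (hypothetical names; the encode
gains are each measured HRIR's contribution to the target B-Format channel):

    /* Hypothetical sketch: accumulate each measured HRIR into the target
     * B-Format HRIR at an offset equal to its measured delay, instead of
     * averaging the delays separately. */
    for(size_t i = 0;i < num_hrirs;i++)
    {
        size_t delay = hrir_delay[i];        /* largest is about 35 samples */
        for(size_t j = 0;j < IR_LENGTH;j++)  /* base filter length, 32 samples */
            target_hrir[delay + j] += hrir[i][j] * encode_gain[i];
    }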
Additionally, since the resulting HRIRs no longer rely on an extra delay line,
a more efficient HRTF mixing function can be made that doesn't use one. Such a
function can also avoid the per-sample stepping parameters the original uses.
Currently incomplete, as second- and third-order output will not correctly
handle B-Format input buffers. A standalone up-sampler will be needed, similar
to the high-quality decoder.
Also, output is ACN ordering with SN3D normalization. A config option will
eventually be provided to change this if desired.
Sometimes the mixer is temporarily prevented from applying updates, for
example when multiple sources need to be updated simultaneously, but this does
not prevent mixing. If the mixer runs during that time and a voice was just
started, it would've mixed the voice without any internal properties being set
for it.
This will also allow backends to better synchronize the tracked clock time with
the device output latency, without necessarily needing to lock if the backend
API allows for it.
If an unapplied update was superseded, it would be placed in the freelist with
its effect state object intact. This would cause another update with the same
effect state object to be placed into the freelist as well, or worse, cause it
to get deleted while in use when the container had its effect state cleared.
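The fix amounts to something like this sketch (illustrative names): release
the superseded update's state reference and clear the pointer before the
container is recycled.

    struct EffectSlotProps *old = exchange_ptr(&slot->PendingUpdate, newprops);
    if(old)
    {
        if(old->State) release_state(old->State);  /* drop its reference */
        old->State = NULL;
        freelist_push(&slot->FreeList, old);
    }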
This necessitates a change in how source updates are handled. Rather than just
being able to update sources when a dependent object state is changed (e.g. a
listener gain change), now all source updates must be proactively provided.
Consequently, apps that do not utilize any deferring (AL_SOFT_defer_updates or
alcSuspendContext/alcProcessContext) may use more CPU, since it'll be filling
out more update containers for the mixer thread to use.
The upside is that there's less blocking between the app's calling thread and
the mixer thread, particularly for vectors and other multi-value properties
(filters and sends). Deferring behavior when used is also improved, since
updates that shouldn't be applied yet are simply not provided. And when they
are provided, the mixer doesn't have to ignore them, meaning the actual
deferring of a context doesn't have to synchronously force an update -- the
process call will send any pending updates, which the mixer will apply even if
another deferral occurs before the mixer runs, because it'll still be there
waiting on the next mixer invocation.
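Conceptually, providing an update looks something like this sketch (names are
illustrative): the contents are written first, and only the publishing pointer
swap needs to be atomic.

    struct SourceProps *props = freelist_pop(&source->FreeList);
    if(!props) props = calloc(1, sizeof(*props));

    fill_props(props, source);   /* gains, position, vectors, filters, sends */

    struct SourceProps *old = exchange_ptr(&source->PendingUpdate, props);
    if(old) freelist_push(&source->FreeList, old);  /* superseded; recycle */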
There is one slight bug introduced by this commit. When a listener change is
made, or changes to multiple sources while updates are being deferred, it is
possible for the mixer to run while the sources are prepping their updates,
causing some of the source updates to be seen before the others. This will be
fixed in short order.
Similar to the listener, separate containers are provided atomically for the
mixer thread to apply updates without needing to block, and a free-list is used
to reuse container objects.
A couple of things to note. First, the lock is still used when the effect
state's
deviceUpdate method is called to prevent asynchronous calls to reset the device
from interfering. This can be fixed by using the list lock in ALc.c instead.
Secondly, old effect states aren't immediately deleted when the effect type
changes (the actual type, not just its properties). This is because the mixer
thread is intended to be real-time safe, and so can't be freeing anything. They
are cleared away when updates reuse the container they were kept in, and they
don't incur any extra processing cost, but there may be cases where the memory
is kept around until the effect slot is deleted.
This uses a separate container to provide the relevant properties to the
internal update method, using atomic pointer swaps. A free-list is used to
avoid having too many individual containers.
This allows the mixer to update the internal listener properties without
requiring the lock to protect against async updates. It also allows concurrent
read access to the user-facing property values, even the multi-value ones (e.g.
the vectors).
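On the mixer side, picking up an update is a single pointer exchange, roughly
like this sketch (illustrative names):

    struct ListenerProps *props = exchange_ptr(&listener->PendingUpdate, NULL);
    if(props)
    {
        apply_listener_props(listener, props);    /* recompute internal params */
        freelist_push(&listener->FreeList, props);  /* reuse, don't free */
    }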
Unfortunately they conflict with AL_EXT_SOURCE_RADIUS, as AL_SOURCE_RADIUS and
AL_BYTE_RW_OFFSETS_SOFT share the same source property value. A replacement for
AL_SOFT_buffer_samples will eventually be made.
Instead of looping over all the coefficients for each channel with multiplies,
when we know only one will have a non-0 factor for ambisonic mixing buffers,
just index the one with a non-0 factor.
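As a sketch of the difference (hypothetical names):

    /* Before: every coefficient is multiplied and summed, even though only
     * one is non-zero for an ambisonic mixing buffer's output channel. */
    gain = 0.0f;
    for(c = 0;c < num_coeffs;c++)
        gain += coeffs[c] * chan_coeffs[chan][c];

    /* After: each output channel maps to exactly one coefficient. */
    gain = coeffs[chan_map[chan]] * chan_scale[chan];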
Could really do with some optimizations to the mixing gain calculations. For
ambisonic targets, the coefficients will only have 1 non-0 entry for each
output, so the double loop is unnecessarily wasteful. Similarly, most uses
won't need a full height encoding either, so a horizontal-only or mixed-order
target could reduce the number of channels.
This uses a virtual B-Format buffer for mixing, and then uses a dual-band
decoder for improved positional quality. This currently only works with first-
order output since first-order input (from the AL_EXT_BFORMAT extension) would
not sound correct when fed through a second- or third-order decoder.
This also does not currently implement near-field compensation since near-field
rendering effects are not implemented.
There were phase issues caused by applying HRTF directly to the B-Format
channels, since the HRIR delays were all averaged which removed the inter-aural
time-delay, which in turn removed significant spatial information.
This mixes to a 4-channel first-order ambisonics buffer. With ACN ordering and
N3D scaling, this makes it easy to remain compatible with effects that only
care about mono input since channel 0 is an unattenuated mono signal.
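For example, the first-order ACN/N3D encoding gains for a unit direction
(dir_x, dir_y, dir_z in the ambisonic coordinate convention) are:

    coeffs[0] = 1.0f;                  /* W (ACN 0): direction-independent */
    coeffs[1] = 1.732050808f * dir_y;  /* Y (ACN 1): sqrt(3)*y */
    coeffs[2] = 1.732050808f * dir_z;  /* Z (ACN 2): sqrt(3)*z */
    coeffs[3] = 1.732050808f * dir_x;  /* X (ACN 3): sqrt(3)*x */

so an effect that only needs mono input can read channel 0 directly.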
This helps the stability of transforms to local space for sources that are at
or near the listener. With a single-precision matrix, even FLT_EPSILON might
not be enough to detect matching positions.
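Roughly (the bound here is illustrative):

    /* Hypothetical sketch: treat the source as being at the listener when the
     * relative position is within a small bound, rather than relying on an
     * exact or FLT_EPSILON comparison surviving the matrix transform. */
    float dx = Position[0]-Listener[0], dy = Position[1]-Listener[1],
          dz = Position[2]-Listener[2];
    if(dx*dx + dy*dy + dz*dz < 1.0e-5f)
    {
        /* at/near the listener: avoid a direction-dependent transform */
    }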
This is essentially a 12-point sinc resampler, unless it's resampling to a rate
higher than the output, at which point it will vary between 12 and 24 points
and do anti-aliasing to avoid/reduce frequencies going over Nyquist.
Code provided by Christopher Fitzgerald.
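The gist of the anti-aliasing part, as a sketch (illustrative, not the actual
filter construction):

    /* Hypothetical sketch: when the source rate exceeds the output rate,
     * lower the sinc cutoff and widen the filter proportionally (capped at
     * 24 points) to keep the result below the output Nyquist. */
    float ratio = (float)src_rate / (float)dst_rate;
    float cutoff = 1.0f;   /* as a fraction of the output Nyquist */
    int points = 12;
    if(ratio > 1.0f)
    {
        cutoff = 1.0f / ratio;
        points = (int)(12.0f*ratio + 0.5f);
        if(points > 24) points = 24;
    }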
The EAX-only effect properties will be set to compatible defaults when standard
reverb is set, and the EAX-only effects will be skipped during sample
processing.
If the filter properties are continually updated, and the HF or LF gain goes
from <1, to 1, and later back to <1, the history shouldn't hold stale values
from before it was at 1.
The extension's not going anywhere, and it can't do anything fluidsynth can't.
The code maintenance and bloat is not worth keeping around, and ideally the AL
API would be able to facilitate MIDI-like behavior anyway (envelopes, start-at-
time, etc).
This helps avoid different results when looping is toggled within a couple
samples of the loop point, or when a processed buffer is removed while the
source is only a couple samples into the next buffer.
The method takes a marked-up filename (e.g. may include %r for a sample rate,
%% for %, etc), and returns a vector of strings of found filenames that match.
It will search the CWD, then the local and global data directories, in that
order.
Note that this is the multiple above the device sample rate, rather than the
source property limit. It could theoretically be increased to 511 by testing
against UINT_MAX instead of INT_MAX, since the increment and positions are
using unsigned integers. I'm just being paranoid about overflows.
It's possible for another invocation to increase the array size in between the
ReadUnlock and WriteLock calls, causing the 'i' index to refer to a taken
entry.
Note that it still uses FuMa scalings internally. Coefficients loaded from
config files specify if they're FuMa (in both ordering and scaling) or N3D,
and will get reordered or rescaled as needed.
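For reference, the standard first-order relationship between the two
conventions (here converting FuMa ordering/scaling to ACN/N3D; the inverse
just divides by the same factors and reorders back):

    n3d[0] = fuma[0] * 1.414213562f;  /* W -> ACN 0, scale by sqrt(2) */
    n3d[1] = fuma[2] * 1.732050808f;  /* Y -> ACN 1, scale by sqrt(3) */
    n3d[2] = fuma[3] * 1.732050808f;  /* Z -> ACN 2, scale by sqrt(3) */
    n3d[3] = fuma[1] * 1.732050808f;  /* X -> ACN 3, scale by sqrt(3) */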
It seems a simple scaling on the coefficients will allow first-order content to
work with second- and third-order coefficients, although obviously not with any
improved locality. That may be something to look into for the future, but this
is good enough for now.
This basically acts as if the app created a new context with the specified
attributes (causing the device to reset with new parameters), then immediately
deleted it. Existing contexts remain undisturbed, except for a temporary pause
while the device output is reconfigured.
DISABLED - Generic disabled status
ENABLED - Generic enabled status
DENIED - Not allowed (user has configured HRTF to be off)
REQUIRED - Forced (user has forced HRTF to be used)
HEADPHONES_DETECTED - Enabled because headphones were detected
UNSUPPORTED_FORMAT - Device format is not compatible with available filters
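As an example, assuming the ALC_SOFT_HRTF extension names, enabling HRTF and
checking the resulting status might look like:

    ALCint attrs[] = { ALC_HRTF_SOFT, ALC_TRUE, 0 };
    if(alcResetDeviceSOFT(device, attrs))  /* loaded via alcGetProcAddress */
    {
        ALCint status = 0;
        alcGetIntegerv(device, ALC_HRTF_STATUS_SOFT, 1, &status);
        /* status is one of the values listed above */
    }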
This method is intended to help development by easily testing the quality of
the B-Format encode and B-Format-to-HRTF decode. When used with HRTF, all
sources are rendered using the virtual B-Format output, rather than just
B-Format sources.
Despite the CPU cost savings (only four channels need to be filtered with HRTF,
while sources all render normally), the spatial acuity offered by the B-Format
output is pretty poor since it's only first-order ambisonics, so "full" HRTF
rendering is definitely preferred.
It's /possible/ for some systems to be edge cases that prefer the CPU cost
savings provided by basic over the sharper localization provided by full, and
you do still get 3D positional cues, but this is unlikely to be an actual use-
case in practice.
Largely copied from JACK, it's extended to work with user-specified element
sizes instead of bytes. This is necessary to be able to work with 6- and 7-
channel output modes.
The sound localization with virtual channel mixing was just too poor, so while
it's more costly to do per-source HRTF mixing, it's unavoidable if you want
good localization.
This is only partially reverted because having the virtual channel is still
beneficial, particularly with B-Format rendering and effect mixing which
otherwise skip HRTF processing. As before, the number of virtual channels can
potentially be customized, specifying more or fewer channels depending on the
system's needs.
This way takes into account a new stereo-mode config option, which when set to
"headphones" will default to using HRTF. Eventually the device will also be
able to specify if headphones are being used.
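For example, something like this in the config file (alsoft.conf/.alsoftrc):

    stereo-mode = headphones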
This new method mixes sources normally into a 14-channel buffer with the
channels placed all around the listener. HRTF is then applied to the channels
given their positions and written to a 2-channel buffer, which gets written out
to the device.
This method has the benefit that HRTF processing becomes more scalable. The
costly HRTF filters are applied to the 14-channel buffer after the mix is done,
turning it into a post-process with a fixed overhead. Mixing sources is done
with normal non-HRTF methods, so increasing the number of playing sources only
incurs normal mixing costs.
Another benefit is that it improves B-Format playback since the soundfield gets
mixed into speakers covering all three dimensions, which then get filtered
based on their locations.
The main downside to this is that the spatial resolution of the HRTF dataset
does not play a big role anymore. However, the hope is that with ambisonics-
based panning, the perceptual position of panned sounds will still be good. It
is also an option to increase the number of virtual channels for systems that
can handle it, or maybe even decrease it for weaker systems.
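Schematically, each device update does something like this sketch
(illustrative names):

    /* Stage 1: normal (non-HRTF) mixing of every playing source into a
     * 14-channel virtual-speaker buffer. */
    for(size_t s = 0;s < num_playing;s++)
        mix_source(virt_buffer /* [14][BUFFERSIZE] */, voices[s], todo);

    /* Stage 2: fixed-cost post-process, one HRTF filter per virtual channel,
     * summed into the 2-channel device output. */
    for(size_t c = 0;c < 14;c++)
        apply_hrtf(stereo_out /* [2][BUFFERSIZE] */, virt_buffer[c],
                   &chan_hrir[c], todo);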
Apparently, 5.1 surround sound is supposed to use the "side" channels, not the
back channels, and we've been wrong this whole time. That means the "5.1 Side"
is actually the correct 5.1 setup, and using the back channels is anomalous.
Additionally, this means the 5.1 buffer format should also use the side
channels instead of the back channels.
A final note: the 5.1 mixing coefficients are changed so both use the original
5.1 surround sound set (with the surround channels at +/-110 degrees). So the
only difference now between 5.1 "side" and 5.1 "back" is the channel labels.
This behavior better matches Creative's hardware drivers and Rapture3D's OpenAL
driver. A compatibility environment variable is provided to restore the old
no-op behavior for any app that behaves badly from this change (set
__ALSOFT_SUSPEND_CONTEXT to "ignore").
If too many apps have a problem with this, the default behavior may need to be
changed to ignore, with the env var providing an option to defer/batch instead.
For mono sources, third-order ambisonics is utilized to generate panning gains.
The general idea is that a panned mono sound can be encoded into B-Format
ambisonics as:
    w[i] = sample[i] * 0.7071;
    x[i] = sample[i] * dir[0];
    y[i] = sample[i] * dir[1];
    ...
and subsequently rendered using:
    output[chan][i] = w[i] * w_coeffs[chan] +
                      x[i] * x_coeffs[chan] +
                      y[i] * y_coeffs[chan] +
                      ...;
By reordering the math, channel gains can be generated by doing:
    gain[chan] = 0.7071 * w_coeffs[chan] +
                 dir[0] * x_coeffs[chan] +
                 dir[1] * y_coeffs[chan] +
                 ...;
which then get applied as normal:
    output[chan][i] = sample[i] * gain[chan];
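In code, generating the gains is just a dot product of the encoding gains with
each output channel's decoder coefficients, e.g.:

    /* encode[] holds the B-Format encoding gains for the source direction
     * (0.7071, dir[0], dir[1], ... as above); chan_coeffs[chan][] holds each
     * output speaker's decoder coefficients. */
    for(int chan = 0;chan < num_channels;chan++)
    {
        float gain = 0.0f;
        for(int i = 0;i < num_coeffs;i++)
            gain += encode[i] * chan_coeffs[chan][i];
        gains[chan] = gain;
    }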
One of the reasons to use ambisonics for panning is that it provides arguably
better reproduction for sounds emanating from between two speakers. As well,
this makes it easier to pan in all 3 dimensions, with for instance a "3D7.1" or
8-channel cube speaker configuration by simply providing the necessary
coefficients (this will need some work since some methods still use angle-based
panpot, particularly multi-channel sources).
Unfortunately, the math to reliably generate the coefficients for a given
speaker configuration is too costly to do at run-time. They have to be pre-
generated based on a pre-specified speaker arrangement, which means the config
options for tweaking speaker angles are no longer supportable. Eventually I
hope to provide config options for custom coefficients, which can either be
generated and written in manually, or via alsoft-config from user-specified
speaker positions.
The current default set of coefficients were generated using the MATLAB scripts
(compatible with GNU Octave) from the excellent Ambisonic Decoder Toolbox, at
https://bitbucket.org/ambidecodertoolbox/adt/
There are apparently some issues with it causing noise or killing the output.
It
might be due to the per-sample changes being too harsh for the filter to keep
up with, but it's not something I can take care of in time for release.
This commit should be reverted after release when work on fixing it can resume.