Given that they're nearly identical, it should be relatively simple to use the
same effect state to process either of them, similar to the reverbs. The big
differences seem to be the delay range (much shorter with flanger) and the
defaults.
Turns out the C version of the cubic resampler is just slightly faster than
even the SSE3 version of the FIR4 resampler. This is likely due to not using a
64KB random-access lookup table along with unaligned loads, both offseting the
gains from SSE.
And fix the output filtering. The modulation code is still there since it's
(probably) technically correct, but the interaction with the feedback loop and
filtering on the output caused improper behavior which needs to be sorted out.
Even though it's taking the address of a member, it's still technically a
derefernce and thus undefined behavior. sizeof doesn't "execute" the
expression, so derefering in it instead is fine.
The core LateReverb_* functions are explicitly written out now, since the
tapping and blending done by the Faded version is a bit more complex and it's
not so easy to ensure proper optimizing on the Unfaded version.
The outputs themselves use a variale-delay tap, but using a separate fixed-
delay tap on the feedback helps improve the perceived "wobble" with sustained
notes. This also applies to the flanger effect.
This will be to allow buffer layering, multiple buffers of the same format and
sample rate that are mixed together prior to resampling, filtering, and
panning. This will allow composing sounds from individual components that can
be swapped around on different invocations (e.g. layer SoundA and SoundB on one
instance and SoundA and SoundC on a different instance for a slightly different
sound, then just SoundA for a third instance, and so on). The longest buffer
within the list item determines the length of the list item.
More work needs to be done to fully support it, namely the ability to specity
multiple buffers to layer for static and streaming sources. Also the behavior
of loop points for layered static sources should be worked out. Should also
consider allowing each layer to have a sample offset.
Now FuMa and ACN channel orders are required, as are FuMa, SN3D, and N3D
normalization schemes. An integer query (alcGetIntegerv) is added for the
maximum ambisonic order.
The context state properties are less likely to change compared to the listener
state, and future changes may prefer more infrequent updates to the context
state.
Note that this puts the MetersPerUnit in as a context state, even though it's
handled through the listener functions. Considering the infrequency that it's
updated at (generally set just once for the context's lifetime), it makes more
sense to put it there than with the more frequently updated listener
properties. The aforementioned future changes would also prefer MetersPerUnit
to not be updated unnecessarily.
Apparently there is a bug with at least MinGW-W64 where fegetenv and fesetenv
do not properly save and restore the FPU rounding mode, resulting in the
rounding mode remaining as round-to-zero after certain function calls. I do not
know if this also affects MSVC, but better safe than sorry for now.
This improves the transition width, allowing more of the higher frequencies
remain audible. It would be preferrable to have an upper limit of 32 points
instead of 48, to reduce the overall table size and the CPU cost for down-
sampling.
This generates the filters using the proper size and scale. The 'a' divisor
should represent the +/- sample range (and thus be a whole number), with the
number of sample points being double that. Increasing the filter size to a
multiple of 4 (for SIMD) can be done by padding in 0s afterward.
Despite the claim that it was an 11th order filter, the transition width was
generated by specifying 12th order. A 12th order filter would need 14 sample
points rather than the 12 it had.
Rather than storing individual pointers to filter, scale delta, phase delta,
and scale phase delta entries, per phase index, the new table layout makes it
trivial to access the per-phase filter and delta entries given the base offset
and coefficient count.
The old layout separated filters, scale deltas, phase deltas, and scale phase
deltas into separate segments that each contained a numbers of scale and phase
entries, Since processing a sample needed a filter and one of each delta entry
relating to a particular scale and phase, the memory needed would be spread
across the whole table. And since subsequent samples would use a different
phase, it would jump around the table a whole lot as well.
The new layout packs the data in a way more consistent with its use. The
filters, scale deltas, phase deltas, and scale phase deltas are interleaved,
such that for a particular scale and phase, the filter and delta entries used
are contiguous. And the phase entries for a particular scale are kept together,
so the ~500 to ~1000 samples processed per source update stay within the same
3KB to 6KB area of the 70+KB table, which is much more cache friendly.