- added definition and gathering method to Vtr::Level
- extended Far::EndCap...PatchFactories with VSpan[4] for patch corners
- extended Far::EndCapGregoryPatchFactory to avoid last level assumption
- adapted Far::PatchTableFactory to use above extensions
- extended Far::GregoryBasis to recognize VSpan corners
- simplified Far::GregoryBasis treatement of boundaries
- fixed bug in Far::GregoryBasis related to smooth corners
An earlier change improved transient memory used during the construction
of end cap stencil tables but could leave unused capacity in the internal
containers of the resulting stencil tables. This change adds an additional
internal factory helper method which trims this storage.
This change make the bspline patch tess control/hull shader revert to
control vertex mirroring for boundary edges when the patch sharpness is
zero. This change helps improve some shader codegen optimization and
L1 cache behavior on (at least) Kepler GPUs with recent drivers.
Re-organized the single-crease patch code path in the tessellation
control/hull shaders to improve performance in the case where no patches
have non-zero crease sharpnesses.
Changed a couple of local 4x4 matrices to global uniforms to
work around a performance problem on recent GL drivers.
There are two local 4x4 float matrices with constant initializers
in the function OsdComputePerPatchVertexBSpline(...). Changing
these from local variables to global initialized uniforms improves
performance dramatically on recent NVIDIA drivers (e.g. 361.48 windows).
There is no such difference with Direct3D, but this change updates
the shader code for both implementations for consistency.
This method now returns the number of _farLevels where previously
it returned the number of _levels. This is primarily a semantic
difference, as the two containers should have equal size. But this
method is intended to accompany Far::TopologyRefiner::GetLevel()
which returns a reference to an element in _farLevels;
My previous fix added some defensive logic in case the
local point stencil table was empty when attempting to
append local point stencils to an existing stencil table.
This change fixes Far::PatchTableFactory to not return
empty local point stencil tables when there are no local
point stencils. This behavior is now more consistent with
earlier releases.
We will leave the earlier defensive logic in place as well.
Now StencilTableFactory::AppendLocalPointStencilTable() does
nothing when the localPointStencilTable is empty. This avoids
a potential crash (failed assertion) when both the baseStencilTable
and the localPointStencilTable are empty, as is the case for
simple geometry like the all-quads torus regression test shape.
For now, the common patch shader code supports fractional spacing
modes only when screen-space tessellation is also enabled.
It's possible to relax this restriction, but that requires changing
the client shader interface.
This change includes support for both fractional_even_spacing
and fractional_odd_spacing.
The implementation follows the existing pattern of re-parameterizing
the tessellation domain only along transition boundary edges. This
allows for crack-free tessellation, but it might be better to
consistently re-parameterize all of the outer edges of all patches,
which also would be required for numerically watertight tessellation.
This is implemented in a way that requires no changes to the client
shader API. It should be more efficient to move some computations to
the control/hull shaders and reduce divergence in the execution of
eval/domain shaders.
- instead of accumulating GregoryBasis::Point (fixed size stencils
backed by stackbuffer), pack the stencils into StencilTable as they
are evaluated
- use single integer for varying stencils of patch points, not
a GregoryBasis::Point
- cap the reserved stencil entry size.
optimize PatchTableFactory::Create performance
The primary hesitation here is that this change alters the approximation for BSpline end caps, however the consensus is that BSpline end caps were already the most extreme approximation, so it makes sense to sacrifice quality for speed.
Reviewed by David, Barry, George and myself.
- replace std::vector with vtr::StackBuffer in GregoryBasis::Point
- remote getQuadOffsets call from ProtoBasis
- rewrite some inefficient code in the endcap generation.
Note that this is a temporary remedy for the performance issue in 3.0.
We'll fix it again in the later release.