Updated MtlComputeEvaluator documentation to be
consistent with the documentation for other compute
evaluator implementations and fixed missing or
incorrect doxygen tags.
Also, updated the overloads for the EvalStencils
and EvalPatches methods to account for 1st and
2nd derivative evaluation.
Most GL implementations support a maximum of 4 transform
feedback buffer bindings. With the addition of 1st and 2nd
derivative evaluation up to 6 bindings might be required,
i.e. dst, du, dv, duu, duv, dvv.
This change extends the GLXFB Evaluator interface to allow
a client to specialize the evaluator when it is known that
(at least) the 1st derivative and 2nd derivative outputs
are interleaved together into shared buffers.
When this option is used, the maximum number of transform
feedback buffer bindings can be reduced to 3 instead of 6.
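As a rough illustration of why interleaving reduces the binding count, the sketch below counts the distinct buffers referenced by a set of requested outputs. OutputStream and CountFeedbackBindings are hypothetical names for this example only. They loosely mirror an offset/length/stride descriptor and are not the GLXFB evaluator's actual interface.

```cpp
#include <cassert>
#include <set>

// Hypothetical description of one evaluator output (dst, du, dv, duu, ...):
// the buffer it is written to and its layout within that buffer.
struct OutputStream {
    unsigned buffer;  // buffer id (e.g. a GL buffer name)
    int offset;       // first component within an interleaved element
    int length;       // number of components; 0 means "not requested"
    int stride;       // distance between consecutive elements
};

// Count the distinct buffers, i.e. the transform feedback bindings, that the
// requested outputs would occupy.
int CountFeedbackBindings(const OutputStream* streams, int count) {
    assert(count >= 0);
    std::set<unsigned> buffers;
    for (int i = 0; i < count; ++i) {
        if (streams[i].length > 0) {
            buffers.insert(streams[i].buffer);
        }
    }
    return static_cast<int>(buffers.size());
}
```

With six separate buffers the count is 6. Interleaving du/dv into one buffer and duu/duv/dvv into another brings it down to 3, which fits under the common limit of 4.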
Now that Far::LimitStencilTable and Far::PatchTable
support evaluation of 1st and 2nd derivatives, the
Osd Evaluator API for evaluating stencils and patches
has been updated to match.
- added Far::PatchTableFactory::Options::generateLegacySharpCornerPatches
- legacy behavior of sharp patches at smooth corners preserved by default
- added corresponding option bit to Osd::MeshBits
- updated examples/glViewer with option
Noticed a few typos when browsing comments. Proceeded with a "manual
spell check", reading all comments and tweaking spelling, grammar,
punctuation.
Didn't bother with Hbr library.
Comments only, no functional changes.
The symbol OPENSUBDIV_GREGORY_EVAL_TRUE_DERIVATIVES
determines the method used to compute derivative weights
for Gregory basis patches.
Setting this symbol during CMake configuration (and
hence during C++ and shader compilation) will enable
the use of true derivative weights.
The default behavior is to use a simpler approximation
for consistency with earlier releases.
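The difference between the two modes can be illustrated with a scalar stand-in for the rational blend at one Gregory patch corner. This is a sketch under assumptions: GregoryBlend and the other names are hypothetical, the real patches blend 3-D points and multiply the result by Bezier basis functions, and the default approximation is assumed to simply drop the derivative of the rational weights (as another commit elsewhere in this log describes for the shaders).

```cpp
#include <cassert>

// Scalar stand-in for the rational blend of the two interior points Fp and Fm
// associated with one Gregory patch corner: P(u,v) = (u*Fp + v*Fm) / (u + v).
double GregoryBlend(double u, double v, double fp, double fm) {
    assert(u + v > 0.0);  // the blend is undefined exactly at the corner
    return (u * fp + v * fm) / (u + v);
}

// Exact contribution of the rational weights to dP/du, using
//   d/du [u/(u+v)] = v/(u+v)^2  and  d/du [v/(u+v)] = -v/(u+v)^2.
double GregoryBlendDuTrue(double u, double v, double fp, double fm) {
    double s = u + v;
    assert(s > 0.0);
    return v * (fp - fm) / (s * s);
}

// Compile-time selection between the two behaviors (sketch).
double GregoryBlendDu(double u, double v, double fp, double fm) {
#if defined(OPENSUBDIV_GREGORY_EVAL_TRUE_DERIVATIVES)
    return GregoryBlendDuTrue(u, v, fp, fm);
#else
    // Approximation: treat the blend weights as constants, so they
    // contribute nothing to the derivative.
    (void)u; (void)v; (void)fp; (void)fm;
    return 0.0;
#endif
}
```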
The methods which return arrays of FVarPatchParam have
been made plural, e.g. GetFVarPatchParams(), for consistency
with the other methods in PatchTable.
Also fixed a missing doxygen tag.
Recent CUDA SDKs no longer support the "compute_11"
GPU architecture. We now fall back to "compute_20"
instead for newer SDK versions. Additionally, this
behavior can be overridden using the new CMake list
variable OSD_CUDA_NVCC_FLAGS so that it is easier
for clients to target newer architectures and specify
additional arguments.
Implemented EvalPatchesVarying and EvalPatchesFaceVarying
methods for Osd::*Evaluator classes, i.e. cpu, omp, tbb,
GLXFB, GLSLCompute, OpenCL, and CUDA.
Also, the GPU Kernel implementations have been updated to use
the common patchBasis implementation instead of re-implementing
methods to compute patch basis weights locally.
This is used to compute patch basis weights for
the Osd::*Evaluator classes that are unable to
use the C++ implementation from far/patchBasis.h,
e.g. the GLSL, HLSL, OpenCL, and CUDA kernels.
Instead of duplicating this code for each different
kernel language, we share a single implementation
which is minimally adapted to accommodate specific
language restrictions and syntax.
This implementation can also be used by client
shader code executed while drawing, e.g. to
compute patch basis weights for evaluating varying
and face-varying patches.
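To show what "minimally adapted" can mean in practice, here is a sketch of a uniform cubic B-spline basis evaluation written in a restricted, C-like subset of C++ (fixed-size arrays, no templates, no references, no library calls in the function body). EvalBSplineCurveWeights is a hypothetical name and this is not the contents of the shared patchBasis source. Porting it to GLSL or HLSL would mainly mean adding out qualifiers to the array parameters.

```cpp
#include <cassert>

// Uniform cubic B-spline weights (wP) and first derivatives (wDP) at t in [0,1].
void EvalBSplineCurveWeights(float t, float wP[4], float wDP[4]) {
    float t2 = t * t;
    float t3 = t2 * t;
    float s = 1.0f - t;

    wP[0] = s * s * s / 6.0f;
    wP[1] = (3.0f * t3 - 6.0f * t2 + 4.0f) / 6.0f;
    wP[2] = (-3.0f * t3 + 3.0f * t2 + 3.0f * t + 1.0f) / 6.0f;
    wP[3] = t3 / 6.0f;

    // Derivatives of the four weights with respect to t.
    wDP[0] = -0.5f * s * s;
    wDP[1] = 1.5f * t2 - 2.0f * t;
    wDP[2] = -1.5f * t2 + t + 0.5f;
    wDP[3] = 0.5f * t2;
}
```

Tensor products of these 1-D weights give the bicubic patch weights; the weights sum to 1 and the derivative weights sum to 0 for any t.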
This reverts most of the recent changes to the
organization of Far::PatchParam. In particular,
the core parameterization is no longer exposed
as a separate PatchParamBase class.
We'll revisit this again in a later release, but
for now we will stick with a more straightforward
implementation.
These methods now compute the patch basis in terms
of Far::PatchParamBase instead of Far::PatchParam.
This allows these methods to be more easily reused
for evaluating patches for face-varying data.
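The core of computing a patch basis "in terms of" a patch parameterization is mapping coordinates from the base face into the patch and rescaling derivatives accordingly. The sketch below assumes a simplified parameterization (a depth plus integer u/v offsets, ignoring non-quad faces); SimplePatchParam, Normalize, and DerivativeScale are hypothetical names, not the Far API.

```cpp
#include <cassert>

// Hypothetical, simplified patch parameterization: a patch at depth d covers
// a square of size 1/2^d in its base face, starting at (uOffset, vOffset)
// measured in units of that size.
struct SimplePatchParam {
    int depth;
    int uOffset;
    int vOffset;
};

// Map (u,v) from the base face's [0,1]^2 domain into the patch's own [0,1]^2.
void Normalize(const SimplePatchParam& p, float& u, float& v) {
    assert(p.depth >= 0 && p.depth < 16);
    float scale = static_cast<float>(1 << p.depth);
    u = u * scale - static_cast<float>(p.uOffset);
    v = v * scale - static_cast<float>(p.vOffset);
}

// First derivatives evaluated in patch-local coordinates are multiplied by
// d(local)/d(face) = 2^depth to be expressed with respect to the base face
// (second derivatives by the square of this factor).
float DerivativeScale(const SimplePatchParam& p) {
    return static_cast<float>(1 << p.depth);
}
```

Because nothing here depends on vertex data, the same mapping applies unchanged when the patch being evaluated holds face-varying values.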
This change makes the B-spline patch tess control/hull shader revert to
control vertex mirroring for boundary edges when the patch sharpness is
zero. This change helps improve some shader codegen optimization and
L1 cache behavior on (at least) Kepler GPUs with recent drivers.
Re-organized the single-crease patch code path in the tessellation
control/hull shaders to improve performance in the case where no patches
have non-zero crease sharpnesses.
Changed a couple of local 4x4 matrices to global uniforms to
work around a performance problem on recent GL drivers.
There are two local 4x4 float matrices with constant initializers
in the function OsdComputePerPatchVertexBSpline(...). Changing
these from local variables to global initialized uniforms improves
performance dramatically on recent NVIDIA drivers (e.g. 361.48 windows).
There is no such difference with Direct3D, but this change updates
the shader code for both implementations for consistency.
For now, the common patch shader code supports fractional spacing
modes only when screen-space tessellation is also enabled.
It's possible to relax this restriction, but that requires changing
the client shader interface.
This change includes support for both fractional_even_spacing
and fractional_odd_spacing.
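For reference, the sketch below reproduces the clamping and rounding that the OpenGL specification applies to a floating-point outer tessellation level under each spacing mode. It is not code from the patch shaders. EffectiveSegments and Spacing are hypothetical names, and maxLevel stands in for GL_MAX_TESS_GEN_LEVEL, assumed to be even (commonly 64).

```cpp
#include <cassert>
#include <cmath>

enum class Spacing { Equal, FractionalEven, FractionalOdd };

// Integer segment count n derived from a floating-point tessellation level:
//   equal:           clamp to [1, max],     round up to an integer
//   fractional_even: clamp to [2, max],     round up to an even integer
//   fractional_odd:  clamp to [1, max - 1], round up to an odd integer
int EffectiveSegments(float level, Spacing spacing, int maxLevel) {
    assert(maxLevel >= 2 && maxLevel % 2 == 0);
    float lo, hi;
    switch (spacing) {
    case Spacing::Equal:          lo = 1.0f; hi = static_cast<float>(maxLevel);     break;
    case Spacing::FractionalEven: lo = 2.0f; hi = static_cast<float>(maxLevel);     break;
    default:                      lo = 1.0f; hi = static_cast<float>(maxLevel - 1); break;
    }
    float clamped = std::fmin(std::fmax(level, lo), hi);
    int n = static_cast<int>(std::ceil(clamped));
    if (spacing == Spacing::FractionalEven && n % 2 != 0) n += 1;
    if (spacing == Spacing::FractionalOdd && n % 2 == 0) n += 1;
    return n;
}
```

In the fractional modes, a level that is not exactly n yields n - 2 equal segments plus two shorter ones, which is what lets the tessellation change smoothly as the level varies.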
The implementation follows the existing pattern of re-parameterizing
the tessellation domain only along transition boundary edges. This
allows for crack-free tessellation, but it might be better to
consistently re-parameterize all of the outer edges of all patches,
which also would be required for numerically watertight tessellation.
This is implemented in a way that requires no changes to the client
shader API. It should be more efficient to move some computations to
the control/hull shaders and reduce divergence in the execution of
eval/domain shaders.
The issue here is that some of the functions were no longer templates,
because all of their template arguments were specified (i.e. they were
explicit specializations), so the compiler emitted a definition for them
in every translation unit that included the header. This caused
multiple-definition errors at link time, since the same symbol was
defined in several places.
Marking those functions inline solves the problem and should not have any
adverse side effects, since they are small and likely to be inlined by
the optimizer anyway.
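A minimal sketch of the situation, assuming a header included from several .cpp files. PrecisionBits is a hypothetical function used only to illustrate the rule: the primary template can be defined in the header as-is, but each explicit (full) specialization is an ordinary function under the one-definition rule and needs inline.

```cpp
#include <cassert>

// Primary template: defining it in a header is fine; identical
// instantiations emitted by different translation units are merged.
template <typename REAL>
int PrecisionBits() { return 0; }

// Explicit specializations fix every template argument, so they behave like
// ordinary functions. Without `inline`, each translation unit including the
// header would emit its own definition and the link would fail with a
// duplicate symbol.
template <>
inline int PrecisionBits<float>() { return 24; }

template <>
inline int PrecisionBits<double>() { return 53; }
```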
This change restores the use of 4-bits in Far::PatchParam to
encode the refinement level of a patch. This restores one bit
that was stolen to allow for more general encoding of boundary
edge and transition edge masks. In order to accommodate all
of the bits that are required, the transition edge mask bits
are now stored along with the faceId bits.
Also, accessors are now exposed directly as members of Far::PatchParam
and the internal bitfield class is no longer directly exposed.
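The sketch below shows the general technique of packing several small fields, including a 4-bit refinement level, into two 32-bit words and exposing them through member accessors. The layout and names (PackedPatchParam, Set, Level, and so on) are hypothetical and chosen only for illustration; they are not Far::PatchParam's actual bit positions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical two-word layout (NOT Far::PatchParam's real bit positions):
//   field0: faceId (28 bits) | transition mask (4 bits)
//   field1: u (10) | v (10) | boundary mask (4) | level (4) | unused (4)
struct PackedPatchParam {
    std::uint32_t field0 = 0;
    std::uint32_t field1 = 0;

    static std::uint32_t pack(std::uint32_t value, int width, int shift) {
        assert(value < (1u << width));  // value must fit in its field
        return value << shift;
    }
    static std::uint32_t unpack(std::uint32_t field, int width, int shift) {
        return (field >> shift) & ((1u << width) - 1u);
    }

    void Set(std::uint32_t faceId, std::uint32_t transition, std::uint32_t u,
             std::uint32_t v, std::uint32_t boundary, std::uint32_t level) {
        field0 = pack(faceId, 28, 0) | pack(transition, 4, 28);
        field1 = pack(u, 10, 0) | pack(v, 10, 10) |
                 pack(boundary, 4, 20) | pack(level, 4, 24);
    }

    std::uint32_t FaceId() const     { return unpack(field0, 28, 0); }
    std::uint32_t Transition() const { return unpack(field0, 4, 28); }
    std::uint32_t U() const          { return unpack(field1, 10, 0); }
    std::uint32_t V() const          { return unpack(field1, 10, 10); }
    std::uint32_t Boundary() const   { return unpack(field1, 4, 20); }
    std::uint32_t Level() const      { return unpack(field1, 4, 24); }
};
```

With 4 bits the level can range from 0 to 15; stealing one of them would cap it at 7.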
Unified transition patch drawing affects the calculation of
tessellation level metrics. Because a single edge of a shader
patch might be split into two halves along a transition edge,
the effective maximum number of spans along any adaptive edge
is limited to half of the device maximum.
Besides the fact that we have not been computing accurate derivatives on
Gregory patches, there was a separate bug in the shaders which gave
completely bogus dUdV on corner vertices. This change fixes that significant
artifact; however, derivatives are still approximated by ignoring the
rational components.
- add HLSL equivalents of the previous GLSL change
- rename OsdGetSingleCreaseSegmentParameter to
OsdGetPatchSingleCreaseSegmentParameter.
- add shadingMode UI for dxViewer similar to glViewer
Use boundaryMask to identify the crease edge among the 4 edges.
With this change, a single-crease patch no longer needs to be rotated
when the patch table is populated.
In the shader, experimentally use the same infinitely sharp matrix for both
boundary and single-crease patches.
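A sketch of the idea, under assumptions: bit i of a 4-bit mask marks edge i as the crease edge, and edges are numbered counter-clockwise starting from v = 0. Both the convention and the helper names (CreaseEdgeFromMask, RotateToEdge0) are hypothetical. The point is that the orientation can be recovered at evaluation time from the mask instead of being baked into the control point order when the patch table is populated.

```cpp
#include <cassert>

// Return the index (0..3) of the first edge flagged in the mask, or -1.
int CreaseEdgeFromMask(unsigned boundaryMask) {
    for (int edge = 0; edge < 4; ++edge) {
        if (boundaryMask & (1u << edge)) {
            return edge;
        }
    }
    return -1;
}

// Rotate (u,v) by quarter turns so that `edge` maps onto edge 0 (v == 0),
// assuming edges 0: v=0, 1: u=1, 2: v=1, 3: u=0. Each step maps
// (u,v) -> (v, 1-u), which sends edge k to edge k-1 (mod 4).
void RotateToEdge0(int edge, float& u, float& v) {
    assert(edge >= 0 && edge < 4);
    for (int r = 0; r < edge; ++r) {
        float nu = v;
        float nv = 1.0f - u;
        u = nu;
        v = nv;
    }
}
```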
Added a size specifier to the shader output array declaration
in the GregoryBasis and Gregory control shaders. This seems
to be required by the GLSL compiler on AMD and is harmless elsewhere.
Added a size specifier to the shader output array declaration
in the BSpline control shader. This seems to be required by the
GLSL compiler on AMD and is harmless elsewhere.
This change refactors the GLSL and HLSL patch shader code so that
most of the work is implemented within a library of common functions
and the remaining shader snippets just manage plumbing.
There is more to do here:
- varying and face-varying data can be managed entirely by the client
- similarly, displacement can be implemented in client code
- there's still quite a bit of residual boiler-plate code needed
in each shader stage that we should be able to wrap up in a more
convenient form.
To hide the end-cap functions from the public API, add methods that
report the number of patch points needed (GetNumLocalPoints()) and
compute those patch points through a change of basis from
the refined vertices (ComputeLocalPointValues()).
ComputeLocalPointValues() takes contiguous source data for all levels,
including the level 0 control vertices.
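The following sketch illustrates what taking contiguous source data means for the change of basis: every local point is a weighted sum of entries in one array holding the level 0 control vertices followed by the refined vertices of all later levels, so stencil indices need no per-level offsets. LocalPointStencil and ComputeLocalPoints are hypothetical stand-ins that operate on scalar values for brevity; they are not the end-cap classes' actual interfaces.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stencil: one local (end-cap) point expressed as a weighted
// combination of source points addressed by index.
struct LocalPointStencil {
    std::vector<int>   indices;  // into the contiguous source array
    std::vector<float> weights;
};

// Apply each stencil to a single contiguous source array
// (level 0 vertices first, then each refined level in order).
std::vector<float> ComputeLocalPoints(const std::vector<LocalPointStencil>& stencils,
                                      const std::vector<float>& src) {
    std::vector<float> dst(stencils.size(), 0.0f);
    for (std::size_t i = 0; i < stencils.size(); ++i) {
        const LocalPointStencil& s = stencils[i];
        assert(s.indices.size() == s.weights.size());
        for (std::size_t k = 0; k < s.indices.size(); ++k) {
            dst[i] += s.weights[k] * src[s.indices[k]];
        }
    }
    return dst;
}
```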
It looks like there's a compiler bug in some earlier NVIDIA driver 340/346 releases.
It has been fixed in 348.07 (win) as far as I can tell.
The following code behaves incorrectly:
    void f(int a) {
        for (int i=0; i<3; ++i) doSomething(a, i);
    }
    void g() {
        for (int i=0; i<100; ++i) f(i);
    }
The workaround is to use a different loop variable identifier in each function.
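Applied to the snippet above, the workaround amounts to renaming the loop counter in one of the two functions. The version below is plain C++ so that it compiles on the host; doSomething and the call counter are hypothetical stand-ins, and the driver bug itself cannot be reproduced here.

```cpp
#include <cassert>

// Hypothetical stand-in for the per-iteration work; counts its calls.
static int g_calls = 0;
void doSomething(int a, int i) { (void)a; (void)i; ++g_calls; }

// Workaround: the inner function's loop counter is named `j`, so no two
// nested call levels share the identifier `i`.
void f(int a) {
    for (int j = 0; j < 3; ++j) doSomething(a, j);
}

void g() {
    for (int i = 0; i < 100; ++i) f(i);
}
```

Correct behavior means doSomething runs 3 times for each of the 100 outer iterations.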
Add EvalStencils and EvalPatches API for most of the CPU and GPU evaluators.
With this change, the Eval API in the osd layer consists of the following parts:
- Evaluators (Cpu, Omp, Tbb, Cuda, CL, GLXFB, GLCompute, D3D11Compute)
implement EvalStencils and EvalPatches(*). Both support derivatives
(not fully implemented yet).
- Interop vertex buffer classes (optional, same as before)
Note that these classes are not required in order to use Evaluators.
All evaluators have EvalStencils/EvalPatches methods which take device-specific
buffer objects. For example, GLXFBEvaluator can take a GLuint directly
for both stencil tables and input primvars. However, using these
interop classes makes it easy to integrate osd into relatively
simple applications.
- device-dependent StencilTable and PatchTable (optional)
These are also optional, but can simply be used as substitutes for
Far::StencilTable and Far::PatchTable with osd evaluators.
- PatchArray, PatchCoord, PatchParam
They are tiny structs used for GPU based patch evaluation.
(*) TODO and known issues:
- CLEvaluator and D3D11Evaluator's EvalPatches() have not been implemented.
- GPU Gregory patch evaluation has not been implemented in EvalPatches().
- CudaEvaluator::EvalPatches() is very unstable.
- The patch evaluation kernels have not been well optimized yet.
- Currently the GLXFB kernel doesn't support derivative evaluation.
There's a technical difficulty with multi-stream output.
- it takes a count and a pointer for the input PatchCoords.
- add derivative evaluations.
- enhance the glEvalLimit example to show that derivative evaluation works.
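To make the count-plus-pointer input and the optional derivative outputs concrete, here is a CPU sketch. PatchCoord, Point2, and this EvalPatches are hypothetical simplifications (a bilinear basis instead of the real B-spline and Gregory bases, and a bare patch index instead of a full patch handle); they do not reproduce the signatures of the Osd evaluators.

```cpp
#include <cassert>

// Hypothetical, simplified coordinate: a patch index plus (s,t) within it.
struct PatchCoord {
    int patchIndex;
    float s, t;
};

struct Point2 {
    float x, y;
};

// Evaluate numCoords coordinates read through a plain pointer. Positions are
// always written; du and dv are written only when non-null. Each patch is a
// bilinear quad of 4 counter-clockwise control points.
void EvalPatches(const Point2* cvs, int numCoords, const PatchCoord* coords,
                 Point2* dst, Point2* du, Point2* dv) {
    for (int i = 0; i < numCoords; ++i) {
        const Point2* p = cvs + 4 * coords[i].patchIndex;
        const float s = coords[i].s;
        const float t = coords[i].t;
        // Basis weights and their partial derivatives in s and t.
        const float w[4]  = {(1 - s) * (1 - t), s * (1 - t), s * t, (1 - s) * t};
        const float ws[4] = {-(1 - t), 1 - t, t, -t};
        const float wt[4] = {-(1 - s), -s, s, 1 - s};

        Point2 pos = {0.0f, 0.0f}, ds = {0.0f, 0.0f}, dt = {0.0f, 0.0f};
        for (int k = 0; k < 4; ++k) {
            pos.x += w[k] * p[k].x;   pos.y += w[k] * p[k].y;
            ds.x  += ws[k] * p[k].x;  ds.y  += ws[k] * p[k].y;
            dt.x  += wt[k] * p[k].x;  dt.y  += wt[k] * p[k].y;
        }
        dst[i] = pos;
        if (du) du[i] = ds;
        if (dv) dv[i] = dt;
    }
}
```

Passing nullptr for du and dv skips derivative output, which loosely corresponds to evaluating positions only.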
Cleaned up the Legacy Gregory shader source by accessing buffer
data through helper functions.
Switched to performing tessellation in untransformed (object) space.