Changing all device kernels to take two buffer identifiers for
source and destination separately.
This change is an intermediate step toward upcoming context/controller
refactoring.
Previously, the source and destination vertex buffers had to be the
same buffer, since the subdivision kernels are applied iteratively,
level by level.
With stencil tables, that limitation no longer applies, so we may
want to apply stencils from one source buffer to a separate
destination buffer.
To specify the output location within the destination buffer,
we can use VertexBufferDescriptor.offset. This not only allows
arbitrary batching schemes, but also relaxes the requirement that
the source and destination buffers share the same interleaved
layout. For example, we could include derivatives only in the
destination buffer, so they don't need to be allocated in the
source buffer.
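A minimal sketch of the idea, not the actual Osd kernel code: `BufferDesc`, `StencilTable` and `ApplyStencils` are hypothetical stand-ins introduced only for illustration. It assumes a descriptor made of offset, length and stride, and shows how the source and destination buffers can use different interleaved layouts while stencils in the range [start, end) are applied.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a vertex buffer descriptor: where the element
// starts inside each interleaved vertex, how many floats it has, and the
// distance (in floats) between consecutive vertices.
struct BufferDesc {
    int offset;
    int length;
    int stride;
};

// Hypothetical flattened stencil table: stencil i combines sizes[i] source
// vertices whose indices and weights are stored consecutively.
struct StencilTable {
    std::vector<int>   sizes;
    std::vector<int>   indices;
    std::vector<float> weights;
};

// Apply stencils [start, end), reading from src and writing to dst.
// Assumes srcDesc.length == dstDesc.length; only offset and stride differ.
void ApplyStencils(const StencilTable &table,
                   const std::vector<float> &src, BufferDesc srcDesc,
                   std::vector<float> &dst, BufferDesc dstDesc,
                   int start, int end) {
    // Skip the index/weight entries of the stencils before 'start'.
    std::size_t cursor = 0;
    for (int i = 0; i < start; ++i) cursor += table.sizes[i];

    const int length = srcDesc.length;
    for (int i = start; i < end; ++i) {
        float *out = &dst[dstDesc.offset + i * dstDesc.stride];
        for (int k = 0; k < length; ++k) out[k] = 0.0f;

        for (int j = 0; j < table.sizes[i]; ++j, ++cursor) {
            const float *in =
                &src[srcDesc.offset + table.indices[cursor] * srcDesc.stride];
            for (int k = 0; k < length; ++k) {
                out[k] += table.weights[cursor] * in[k];
            }
        }
    }
}
```

With a destination stride larger than the source stride, the extra floats per vertex (for instance derivatives) exist only in the destination buffer and are left untouched by this call.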
We're teasing out ptex-specific data from core osd entities,
so there's no reason to keep ptex texturing utilities in core osd.
Move them into example libs and let clients assemble shader snippets
as needed.
Also remove the older ptex texturing code (without mipmap).
Each patch has a corresponding patchParam. This is a set of three values
specifying additional information about the patch:
faceId -- topological face identifier (e.g. Ptex FaceId)
bitfield -- refinement-level, non-quad, boundary, transition, uv-offset
sharpness -- crease sharpness for a single-crease patch
These are stored in OsdPatchParamBuffer indexed by the value returned
from OsdGetPatchIndex() which is a function of the current PrimitiveID
along with an optional client provided offset.
Accessors are provided to extract values from a patchParam. These are
all named OsdGetPatch*().
While drawing patches, the patchParam is condensed into a patchCoord which
has four values (u, v, faceLevel, faceId). These patchCoords are treated
as int values during per-prim processing but are converted to float values
during per-vertex processing where the values are interpolated.
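The relationship between a patchParam and a patchCoord can be sketched in C++ as follows. This is an illustration only: the struct and function names are hypothetical, and the bit layout (refinement level in the low 4 bits of the bitfield) is an assumption made for the example, not something stated above.

```cpp
#include <cstdint>

// The three values of a patchParam, as described above.
// Field names are illustrative.
struct PatchParam {
    std::int32_t  faceId;     // topological face identifier
    std::uint32_t bitfield;   // refinement level, non-quad, boundary, ...
    float         sharpness;  // crease sharpness (single-crease patch)
};

// ASSUMPTION: the refinement level occupies the low 4 bits of the bitfield.
inline int GetPatchRefinementLevel(const PatchParam &p) {
    return static_cast<int>(p.bitfield & 0xfu);
}

// patchCoord as used during per-prim processing (integer values).
struct PatchCoordInt {
    int u, v, faceLevel, faceId;
};

// patchCoord as used during per-vertex processing (interpolated floats).
struct PatchCoordFloat {
    float u, v, faceLevel, faceId;
};

// Condense a patchParam plus an integer (u, v) into a patchCoord.
inline PatchCoordInt MakePatchCoord(const PatchParam &p, int u, int v) {
    PatchCoordInt c = { u, v, GetPatchRefinementLevel(p), p.faceId };
    return c;
}

// Convert to floats so that the values can be interpolated.
inline PatchCoordFloat ToInterpolated(const PatchCoordInt &c) {
    PatchCoordFloat f = { static_cast<float>(c.u), static_cast<float>(c.v),
                          static_cast<float>(c.faceLevel),
                          static_cast<float>(c.faceId) };
    return f;
}
```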
Also, cleaned up more of the shader namespace by giving an Osd prefix
to public functions, and consolidated boundary and transition handling
code into the PatchCommon shader files. The functions determining
tessellation levels are now all named OsdGetTessLevel*().
- resolves DX-CL interop functions in Osd::ClD3D11VertexBuffer.
- enable CL kernels in DX build.
- more cleanup in test harnesses, adding D3D11 initializations into DeviceContext.
- add new defines OPENSUBDIV_HAS_OPENGL and OPENSUBDIV_HAS_DX for convenience.
refactor CL/CUDA specific initialization code into
examples/common/clDeviceContext and cudaDeviceContext, and
update examples to use those structs.
also
- remove CL/CUDA tests from osd_regression. The tests for those kernels will be covered by glImaging.
- update cuda initialization to use the GL-interoperable device if available.
- remove CL specialization from glShareTopology, following the same pattern as we took in the previous OsdGLMesh refactoring. (still something strange with XFB kernels though)
- fix file permissions.
Removed the OpenCL/D3D11 specializations and added DEVICE_CONTEXT as a
template parameter. Kernels which don't need a context object (e.g.
CPU, OpenGL, CUDA) just ignore the context, and kernels which use a
context (e.g. OpenCL, DirectX) take either the context itself or a
user-defined class which encapsulates the device contexts. Note that
OpenCL requires two objects, cl_context and cl_command_queue. The
user-defined class must provide GetContext() and GetCommandQueue() for
strongly typed binding to osd VertexBuffers and ComputeContexts.
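A rough sketch of the pattern, using stand-in types so that it compiles without OpenCL headers. `FakeContext` and `FakeQueue` take the place of cl_context and cl_command_queue, and every class and function name below is hypothetical; only GetContext() and GetCommandQueue() come from the description above.

```cpp
// Hypothetical stand-ins for cl_context and cl_command_queue.
struct FakeContext { int id; };
struct FakeQueue   { int id; };

// User-defined class encapsulating the device contexts. It only has to
// provide GetContext() and GetCommandQueue().
class MyCLDevice {
public:
    MyCLDevice(FakeContext *context, FakeQueue *queue)
        : _context(context), _queue(queue) {}
    FakeContext *GetContext() const { return _context; }
    FakeQueue *GetCommandQueue() const { return _queue; }
private:
    FakeContext *_context;
    FakeQueue *_queue;
};

// A back end that needs a context: it pulls both objects out of whatever
// DEVICE_CONTEXT type the client supplies.
struct CLLikeVertexBuffer {
    FakeContext *context;
    FakeQueue *queue;
    int numVertices;

    template <typename DEVICE_CONTEXT>
    static CLLikeVertexBuffer Create(int numVertices, DEVICE_CONTEXT *dc) {
        CLLikeVertexBuffer b = { dc->GetContext(), dc->GetCommandQueue(),
                                 numVertices };
        return b;
    }
};

// A back end that does not need a context simply ignores the argument.
struct CpuLikeVertexBuffer {
    int numVertices;

    static CpuLikeVertexBuffer Create(int numVertices,
                                      void * /*deviceContext*/ = 0) {
        CpuLikeVertexBuffer b = { numVertices };
        return b;
    }
};

// Generic code, with no per-API specialization, can now create either kind.
template <typename VERTEX_BUFFER, typename DEVICE_CONTEXT>
VERTEX_BUFFER CreateBuffer(int numVertices, DEVICE_CONTEXT *dc) {
    return VERTEX_BUFFER::Create(numVertices, dc);
}
```

The same `CreateBuffer` call works for both back ends, which is what removes the need for API-specific specializations in the generic code.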
Osd::Mesh and MeshInterface have been used as a handy harness to host
multiple GPU kernels and graphics APIs. However, they have CL/DirectX
specializations and duplicate a large amount of plumbing code. With this
change, glMesh.h and d3d11Mesh.h become just typedefs and all logic is
put into mesh.h without specializations.
Also cleaned up unused header files and code formatting.
- rename "Regular end cap" to "BSplineBasis end cap"
- revert templating and add EndCapType into PatchTablesFactory::Options.
- make EndCapFactories internal in PatchTablesFactory.
- move end cap stencils into PatchTables, keep them relative to the max level.
- add a utility StencilTablesFactory::AppendEndCapStencilTables to splice and factorize endcap stencil tables.
computes edge lengths using limit surface points. Made this
the default screen-space metric so that we avoid cracks when
using Gregory Basis or Regular B-spline end caps.
The alternative method which computes edge lengths using the
distance between B-spline control points is still available.
Added a diagram and comments to explain how the control
points and limit points are organized.
This change moves all gregory patch generation out of Far::PatchTablesFactory
so that we can construct patch tables without stencil tables, and so that clients
can choose any end patch strategy (we have 3 options for now: legacy 2.x style
gregory patch, gregory basis patch and experimental regular patch approximation).
Also, Far::EndCapGregoryBasisPatchFactory provides an index mapping from patch index
to vtr face index, which can be used for single gregory patch evaluation on top
of refined points, without involving heavier stencil table generation.
This is the first step to tease apart the Osd compute controller/contexts
from the Far API.
Currently FarStencilTable only creates a kernel batch for the entire range,
so we can use [0, numStencils) for all cases instead of KernelBatch.
This might not hold if we apply non-factorized level-wise stencils;
in that case we'll add another modular utility to serve those cases.
PatchTablesFactory fills a 20-index topology into the patch table, and uses it for eval and draw.
note: currently screen-space adaptive tessellation of gregory basis patches is
broken and cracks appear around them.
- extend Far::PatchTables data structures & interfaces to store requisite
information for channels of face-varying bi-cubic patches
- implement gather function in Far::PatchTablesFactory to populate face-varying
channels with adaptive patches
- extend accessor interface in Vtr::Level
- propagate code fall-out throughout OpenSubdiv code base, examples & tutorials
- extend vtrViewer code to visualize tessellated bi-cubic face-varying patches
- move patch interpolation code out of Far::PatchTables into far/interpolate
- add bilinear quad interpolation function with derivatives
- switch OsdCpuEvalLimitController to far/interpolate
- add support for bilinear quad interpolation & clean varying interpolation
- changed ptex layout data types in shaders to match srv format
- changed ptex srv type to unorm format for uchar data
- fixed hlsl compiler warning: initialized edgeDistance of OutputVertex struct in domain shader even if we are not in wireframe mode
- added directx debug device and enabled automatic break points to easily spot dx errors
- move level of refinement / isolation into the Options structs
- fix splash damage in rest of the code
note 1: this is less than ideal, because most compilers accept the previous
call to these functions with an incorrect parameter list (i.e. passing
the level instead of the struct issues no warnings and compiles...)
caveat emptor...
note 2: the level parameter names may not be final for adaptive modes
as we will likely want independent controls over crease vs.
extraordinary vertex isolation.
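One plausible way the pitfall in note 1 arises is an options struct with a converting (non-explicit) constructor taking the level. The sketch below illustrates that mechanism under this assumption; the struct, field and function names are hypothetical and are not the actual Far API.

```cpp
#include <type_traits>

// Hypothetical options struct with a converting constructor: an int is
// silently turned into an options object.
struct UniformOptionsSketch {
    UniformOptionsSketch(int level)
        : refinementLevel(level), fullTopologyInLastLevel(false) {}
    int  refinementLevel;
    bool fullTopologyInLastLevel;
};

// New-style function taking the struct. An old-style call that passes the
// level directly, e.g. RefineUniform(3), still compiles without warnings.
inline int RefineUniform(UniformOptionsSketch options) {
    return options.refinementLevel;
}

// Marking the constructor 'explicit' would turn the old call into a
// compile error, at the cost of more verbose call sites.
struct ExplicitOptionsSketch {
    explicit ExplicitOptionsSketch(int level)
        : refinementLevel(level), fullTopologyInLastLevel(false) {}
    int  refinementLevel;
    bool fullTopologyInLastLevel;
};

inline int RefineUniformStrict(ExplicitOptionsSketch options) {
    return options.refinementLevel;
}

// The implicit conversion exists in the first case and not in the second.
static_assert(std::is_convertible<int, UniformOptionsSketch>::value,
              "int converts implicitly to the non-explicit options struct");
static_assert(!std::is_convertible<int, ExplicitOptionsSketch>::value,
              "int does not convert implicitly when the ctor is explicit");
```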
- don't rotate (s,t) coordinates but rotate the patch instead !
- refactor osd/cpuEvalLimitKernels to share Far::PatchTables cubic spline
interpolation functions : this replaces tensor product formulation with
weight matrices, which does not really impact performance here, but would
have to be replaced when implementing regular gridding functions.
- fix OsdCpuEvalLimitController to not rotate coordinates and pass the rotation bitfields
- expose Far::PatchTables spline interpolation API (protected -> public)
- fix glEvalLimit tangent buffers (remove empty padding - see below)
- change policy for tangent buffers : the output buffer descriptor is
**NO LONGER APPLIED** to tangent output buffers. Tangent primvar data
buffers are no longer applying the offset and stride from the descriptor
(because it doesn't make sense to share it). If more flexibility is
required, we will consider adding independent descriptors for the tangent
buffers. This change will impact existing code that generates tangents
with the EvalLimit controller.
fixes #370
Const-declared instances of Vtr::Array do not properly protect the pointer
held privately by the class. In order to force the compiler to
protect this pointer, we removed all non-const accessors from Vtr::Array
(now renamed Vtr::ConstArray) and moved them to a child class (Vtr::Array),
which requires const_cast<> operators internally to allow access.
The change & renaming is then propagated to all internal dependencies.
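A simplified sketch of the resulting split, assuming a plain pointer-plus-size array view. The member names are illustrative and the real Vtr classes have more accessors than shown here.

```cpp
// Read-only view: holds a pointer to const, so no accessor can hand out a
// mutable reference, whether or not the instance itself is const.
template <typename T>
class ConstArray {
public:
    ConstArray(const T *ptr, int size) : _begin(ptr), _size(size) {}

    int size() const { return _size; }
    const T &operator[](int index) const { return _begin[index]; }

protected:
    const T *_begin;
    int _size;
};

// Mutable view: child class that adds the non-const accessors. Because the
// base stores a pointer to const, it has to const_cast internally.
template <typename T>
class Array : public ConstArray<T> {
public:
    Array(T *ptr, int size) : ConstArray<T>(ptr, size) {}

    // Keep the const accessor of the base visible alongside the new one.
    using ConstArray<T>::operator[];

    T &operator[](int index) {
        return const_cast<T *>(this->_begin)[index];
    }
};
```

A function that receives a `ConstArray<T>` (or a `const Array<T> &`) can then only read the elements, and the compiler enforces it.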
- make sure we don't get conflicting enums (CODE_ERROR)
- fix template specialization for Far::TopologyRefinerFactory in regression/common/vtr_utils
- fix remaining error reporting code around osd
- change error codes from situational to general (fatal / coding / run-time...)
- pull error functions from Osd into Far
- add a templated topology validation reporting system to Far::TopologyRefinerFactory
- fix fallout on rest of code-base
- split Far::PatchDescriptor into its own class (mirrors Far::PatchParam)
- hide PatchArray as a private internal structure
- add public accessors patterned after Far::TopologyRefiner (returning Vtr::Arrays)
- propagate new API to all dependent code
note: some direct table accessors have not been removed *yet* - see code for details
- re-implement the pool allocator
- use templates to remove code redundancy between regular & limit stencils
- leverage [] operator overloading to simplify stencil factorization
- add the ability to treat subdivision levels independently (see below)
- refactor Far::TopologyRefiner::Interpolate<>() methods to pass buffers by reference
(allows overloading of [] operator)
- rename some of the stencil factory options
- propagate changes to Osd / examples / tutorials...
- remove #version declaration from the kernel code
- move it in front of shader sources before compiling to prevent some drivers from throwing errors
fixes #360
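GLSL requires the #version directive to come before anything other than comments and whitespace, which is why it has to be placed in front of any prepended defines. A minimal sketch of that assembly step is shown below; `AssembleShaderSource` and its parameters are hypothetical names, not the actual Osd code.

```cpp
#include <string>

// Build the final shader string: version line first, then the generated
// defines, then the kernel code (which no longer carries its own #version).
std::string AssembleShaderSource(const std::string &versionLine,
                                 const std::string &defines,
                                 const std::string &kernelSource) {
    std::string source = versionLine;
    // Make sure the directive ends its own line.
    if (source.empty() || source[source.size() - 1] != '\n') {
        source += '\n';
    }
    source += defines;
    source += kernelSource;
    return source;
}
```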
Catmull-Clark Subdivision Surfaces", Niessner et al, Eurographics 2012.
This change includes:
- topology identification for single-crease patch during adaptive refinement.
- patch array population (similar to boundary)
- sharpness buffer generation
- glsl shader
Eval support will follow.
Sync'ing the 'dev' branch with the 'feature_3.0dev' branch at commit 68c6d11fc36761ae1a5e6cdc3457be16f2e9704a
The branch 'feature_3.0dev' is now locked and preserved for historical purposes.
* assembler kernels are based on the C implementation in neonKernel.cpp
* enable assembler kernel functions in neonComputeController.cpp with #define USE_ASM_KERNELS 1
An unused argument `pass` was defined in the CUDA kernel, but it was never
passed to this function from the C++ code. The argument also wasn't
used by the function itself.
Solved by checking at runtime whether texture buffer objects
are supported.
When building with the GLEW library, a compile-time check is
not enough, because the features that actually exist
are only known at runtime.
This only makes it so the CPU backend works; the GLSL backends still
require some work if we want to make them work. Not sure
it's worth doing this now.
* added `OsdMeshInterface::GetFarMesh` and `OsdMesh::GetFarMesh` to match `OsdGLMesh` and `OsdD3D11Mesh`
* added `interleaved` argument to `OsdMesh::Refine` to match `OsdMeshInterface::Refine`
* The CATMARK_QUAD_FACE_VERTEX kernel calculates the face-vertex for a quadrilateral face. It applies to every face after the first subdivision step, and may be applied for the first subdivision step of a quadrilateral coarse mesh.
* The CATMARK_TRI_QUAD_FACE_VERTEX kernel calculates the face-vertex for a triangle or quadrilateral face. It may be applied for the first subdivision step of a coarse mesh composed of triangles and/or quadrilaterals.
* Both kernels calculate each face-vertex using four vertex indices (triangles are specified by repeating the third index). Therefore neither kernel uses the F_ITa codex table, and instead the first vertex offset in the F_IT index table is stored in the FarKernelBatch's table offset.
If the system has CLEW installed (which is detected by the recently
added FindCLEW routines) then OpenSubdiv is compiled against
this library.
It makes binaries and libraries more portable across systems,
so it's possible to run the same binary on systems with and without
the OpenCL SDK installed.
The most annoying part of the change is updating examples to load
OpenCL libraries, but ideally the code around controllers and interface
creation is to be de-duplicated anyway.
Based on the pull request #303 from Martijn Berger
Moved transient states (current vertex buffer etc) to controller.
ComputeContext becomes constant so that it's well suited for coarse-grain
parallelism on cpu.
Client-facing API has changed slightly - limitEval example has been adjusted
- fix some variable names (private vs. public)
- implement constructors to guarantee initialized pointers (d'oh)
- add a 'Reset' method to unbind buffers
Note: while the new contexts have been cleaned up, we now have a fair amount of duplicated code in the controllers...
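The split between a constant context and a controller that owns the transient state can be sketched as below. The class names and the trivial "refinement" are hypothetical and only illustrate where the bound buffer lives; they are not the Osd classes.

```cpp
#include <cstddef>
#include <vector>

// Immutable after construction: holds only the tables, so a single
// instance can be shared read-only by several controllers.
class ComputeContextSketch {
public:
    explicit ComputeContextSketch(const std::vector<int> &table)
        : _table(table) {}
    const std::vector<int> &GetTable() const { return _table; }
private:
    const std::vector<int> _table;
};

// The controller owns the transient state (the currently bound buffer).
class ComputeControllerSketch {
public:
    ComputeControllerSketch() : _currentVertexBuffer(0) {}

    void Refine(const ComputeContextSketch &context,
                std::vector<float> *vertexBuffer) {
        _currentVertexBuffer = vertexBuffer;          // bind
        const std::vector<int> &table = context.GetTable();
        for (std::size_t i = 0; i < table.size(); ++i) {
            // Placeholder for the real kernel work.
            (*_currentVertexBuffer)[table[i]] *= 0.5f;
        }
        _currentVertexBuffer = 0;                     // unbind
    }
private:
    std::vector<float> *_currentVertexBuffer;
};
```

With this arrangement, one controller per thread can work on its own buffer while all of them read the same context.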