Add EvalStencils and EvalPatches API for most of CPU and GPU evaluators.
With this change, the Eval API in the osd layer consists of the following parts:
- Evaluators (Cpu, Omp, Tbb, Cuda, CL, GLXFB, GLCompute, D3D11Compute)
implement EvalStencils and EvalPatches(*). Both support derivatives
(not fully implemented though)
- Interop vertex buffer classes (optional, same as before)
Note that these classes are not necessary to use Evaluators.
All evaluators have EvalStencils/Patches which take device-specific
buffer objects. For example, GLXFBEvaluator can take GLuint directly
for both stencil tables and input primvars. That said, using these
interop classes makes it easy to integrate osd into relatively
simple applications.
- device-dependent StencilTable and PatchTable (optional)
These are also optional, but can be used simply as a substitute for
Far::StencilTable and Far::PatchTable for osd evaluators.
- PatchArray, PatchCoord, PatchParam
They are tiny structs used for GPU based patch evaluation.
(*) TODO and known issues:
- CLEvaluator and D3D11Evaluator's EvalPatches() have not been implemented.
- GPU Gregory patch evaluation has not been implemented in EvalPatches().
- CudaEvaluator::EvalPatches() is very unstable.
- All patch evaluation kernels have not been well optimized.
- Currently GLXFB kernel doesn't support derivative evaluation.
There's a technical difficulty for the multi-stream output.
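As a rough illustration of what an EvalStencils-style entry point computes on the CPU, here is a minimal sketch. It is not the osd implementation: StencilTableSketch, BufferDescSketch and EvalStencilsSketch are hypothetical names, and the flat sizes/offsets/indices/weights layout is an assumption made for this example only.

```cpp
#include <cassert>
#include <vector>

// Hypothetical flat stencil table: stencil i combines sizes[i] source
// elements, listed contiguously in indices/weights starting at offsets[i].
struct StencilTableSketch {
    std::vector<int>   sizes;
    std::vector<int>   offsets;
    std::vector<int>   indices;
    std::vector<float> weights;
};

// Hypothetical buffer layout: 'length' floats per element, starting at
// float 'offset', with 'stride' floats between consecutive elements.
struct BufferDescSketch {
    int offset;
    int length;
    int stride;
};

// Evaluates the stencils in the half-open range [start, end). Destination
// element i receives the weighted sum of the source elements referenced by
// stencil i (an assumption of this sketch, chosen for simplicity).
inline void EvalStencilsSketch(const float *src, BufferDescSketch srcDesc,
                               float *dst, BufferDescSketch dstDesc,
                               const StencilTableSketch &table,
                               int start, int end) {
    assert(srcDesc.length == dstDesc.length);
    assert(start >= 0 && end <= static_cast<int>(table.sizes.size()));
    for (int i = start; i < end; ++i) {
        float *out = dst + dstDesc.offset + i * dstDesc.stride;
        for (int k = 0; k < dstDesc.length; ++k) {
            out[k] = 0.0f;
        }
        int first = table.offsets[i];
        for (int j = 0; j < table.sizes[i]; ++j) {
            const float *in = src + srcDesc.offset
                            + table.indices[first + j] * srcDesc.stride;
            float w = table.weights[first + j];
            for (int k = 0; k < srcDesc.length; ++k) {
                out[k] += w * in[k];
            }
        }
    }
}
```

Because the function is free-standing and takes raw pointers, a caller needs no evaluator instance or interop buffer class, which is the property the description above attributes to the CPU evaluators.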
- created new class Far::PrimvarRefiner with interpolation methods
- removed interpolation and limit methods from Far::TopologyRefiner
- replaced internal usage in Far::StencilTableFactory
- replaced usage in regressions, tutorials and examples
- it takes number and pointer for the input PatchCoords.
- add derivative evaluations.
- enhance glEvalLimit example to see the derivative evaluation works.
- replaced Evaluate() with EvaluateBasis() in far/tutorial_6
- commented out use of EvaluateFaceVarying() in examples/farViewer
- face-varying patches are work in progress
In osd layer, we use GLPatchTable (D3D11PatchTable) as a
device-specific representation of FarPatchTables instead of
DrawContext. GLPatchTable may be used not only for drawing
but also for GPU eval APIs (not yet supported though.
We may add CudaPatchTable etc as needed).
The legacy gregory patch drawing buffers are carved out to
the separate class, named GLLegacyGregoryPatchTable.
Also face-varying data are split into client side for now, until
we add new and more robust face-varying drawing structure
(scheduled at 3.1 release)
Tentatively replicate PatchArray structure in GLPatchTables. It will
be revised in the upcoming change.
Shifting hard-coded SRV locations of legacy gregory buffers in HLSL shaders.
hlslPatchGregoryBasis.hlsl is an equivalent to glslPatchGregoryBasis.
Update dxViewer to be able to switch among bspline, gregorybasis, legacy
end capping.
also fixes a bug of GLSL legacy gregory shader which had an inconsistent
resource naming with example codes.
It looks like there's still an issue of D3D11 patchParam data fetching.
we'll come back to that bug.
The code in farViewer that was used to draw the Hbr representation
of meshes is now gone. This code was mostly used as a way to compare
against the Vtr implementation. However, we don't want this to serve
as an example for others as the Hbr code is not meant to be instructive
otherwise.
As a preparation for retiring DrawContext, move SupportsAdaptiveTessellation
method to examples/common/glUtils, which is renamed and namespaced
from gl_common.{cpp,h} to be consistent with other files.
Same renamings applied to other example files.
This example is rewritten as a more comprehensive example
of Far and Osd APIs to generate batched index buffer and
vertex buffer, as well as sharing same topology and stencil
table among multiple objects.
Also this change includes an experimental code path of using
glMultiDrawElementsIndirect. It's currently incomplete due to
the missing interface of osd tessellation shader.
This restores the previous defaults and works around an
apparent runtime error on some platforms which is triggered
in the legacy gregory patch drawing code when patch culling
is disabled.
Remove DrawRegistry from osd layer and put a simple shader caching
utility into examples/common. The osd layer only provides patch shader
snippets and lets clients configure and compile the code. Clients also
maintain the lifetime of shader object, which is preferable for the
actual application integration.
update all examples to use the new scheme.
These are now redundant since all bspline patches are encoded in
the patch tables consistently using 16 point indices with boundary
and corner edges indicated in the boundary mask of the patch params.
My earlier change which simplified the categorization of
patch types broke evaluation for boundary and corner patches.
Previously, boundary and corner patches were always rotated
into a canonical orientation by permuting the point indices
of the patch. This was convenient in some cases, but generally
made things unnecessarily complicated, since the parameterization
of the patch had to be counter-rotated to compensate.
Now patches always remain correctly oriented with respect
to the underlying surface topology and evaluation of boundary
and corner patches is accommodated by simply adjusting the
spline weights to account for the missing/invalid patch
points along boundary and corner edges.
There is more to clean up and optimize, but this restores
correct behavior.
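The weight adjustment described above can be sketched in one dimension. The sketch assumes the missing point past a boundary is replaced by a linear extrapolation of its two neighbors (P0 = 2*P1 - P2), so its weight is folded into theirs; that extrapolation rule, the two-bit mask layout and all function names are assumptions of this example, not taken from the commit.

```cpp
#include <cassert>
#include <cmath>

// Uniform cubic B-spline basis weights for a parameter t in [0, 1].
inline void CubicBSplineWeights(float t, float w[4]) {
    float t2 = t * t;
    float t3 = t2 * t;
    w[0] = (1.0f - 3.0f * t + 3.0f * t2 - t3) / 6.0f;
    w[1] = (4.0f - 6.0f * t2 + 3.0f * t3) / 6.0f;
    w[2] = (1.0f + 3.0f * t + 3.0f * t2 - 3.0f * t3) / 6.0f;
    w[3] = t3 / 6.0f;
}

// Assumed rule: the missing first point is P0 = 2*P1 - P2, so its weight
// w0 is redistributed as +2*w0 onto P1 and -w0 onto P2.
inline void FoldMissingFirstPoint(float w[4]) {
    w[1] += 2.0f * w[0];
    w[2] -= w[0];
    w[0] = 0.0f;
}

// Mirror image of the above for a missing last point (P3 = 2*P2 - P1).
inline void FoldMissingLastPoint(float w[4]) {
    w[2] += 2.0f * w[3];
    w[1] -= w[3];
    w[3] = 0.0f;
}

// Hypothetical 2-bit mask: bit 0 = first point missing, bit 1 = last point
// missing. The point indices themselves are never permuted.
inline void BoundaryAwareWeights(float t, int boundaryMask, float w[4]) {
    assert(t >= 0.0f && t <= 1.0f);
    CubicBSplineWeights(t, w);
    if (boundaryMask & 1) FoldMissingFirstPoint(w);
    if (boundaryMask & 2) FoldMissingLastPoint(w);
}
```

A bicubic patch would apply the same adjustment independently to the weights in each parametric direction; the adjusted weights still sum to one, and the slots for missing points end up with weight zero, so whatever index is stored there does not affect the result.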
Since unified shading work already removed subPatch info from
Osd::PatchDescriptor, the difference between Far::PatchDescriptor and
Osd::PatchDescriptor is just maxValence and numElements. They are used
for legacy gregory patch drawing.
Both maxValence and numElements are actually constant within a topology
(drawContext). This change moves maxValence to DrawContext and lets the client
manage numElements, so we can eliminate Osd::PatchDescriptor and simply
use Far::PatchDescriptor instead.
This is still an intermediate step toward further DrawRegistry refactoring.
For the time being, adding EffectDesc struct to include maxValence and
numElements to be maintained by the clients. They will be cleaned up later.
The side benefit of this change is we no longer need to recompile regular b-spline
shaders for the different max-valences.
- Remove MeshPtexData bit from Osd::MeshBits. It's not used any more
- Rename ptexIndexBuffer in D3D11DrawContext to patchParamBuffer
- Remove Is/SetPtexEnabled from D3D11DrawRegistry
In OpenSubdiv 2.x, we encapsulated subdivision tables into
compute context in osd layer since those tables are order-dependent
and have to be applied in a certain manner. In 3.0, we adopted stencil
table based refinement. It's simpler and such an encapsulation is
no longer needed. Also 2.0 API has several ownership issues of GPU
kernel caching, and forces unnecessary instantiation of controllers
even though the cpu kernels typically don't need instances unlike GPU ones.
This change completely revisits the osd client-facing APIs. All contexts and
controllers were replaced with device-specific tables and evaluators.
While we can still use consistent API across various device backends,
unnecessary complexities have been removed. For example, cpu evaluator
is just a set of static functions and also there's no need to replicate
FarStencilTables to ComputeContext.
Also the new API delegates the ownership of compiled GPU kernels
to clients, for the better management of resources especially in multiple
GPU environment.
In addition to integrating ComputeController and EvalStencilController into
a single function Evaluator::EvalStencils(), EvalLimit API is also added
into Evaluator. This is working but still in progress, and we'll make a followup
change for the complete implementation.
-some naming convention changes:
GLSLTransformFeedback to GLXFBEvaluator
GLSLCompute to GLComputeEvaluator
-move LimitLocation struct into examples/glEvalLimit.
We're still discussing patch evaluation interface. Basically we'd like
to tease all ptex-specific parametrization out of far/osd layer.
TODO:
-implement EvalPatches() in the right way
-derivative evaluation API is still interim.
-VertexBufferDescriptor needs a better API to advance its location
-synchronization mechanism is not ideal (too global).
-OsdMesh class is hacky. need to fix it.
we're teasing out ptex specific data from core osd entities,
so there's no reason to keep ptex texturing utilities in core osd.
move them into example libs and let clients assemble shader snippets
as needed.
Also removing older ptex texturing code (without mipmap)
Each patch has a corresponding patchParam. This is a set of three values
specifying additional information about the patch:
faceId -- topological face identifier (e.g. Ptex FaceId)
bitfield -- refinement-level, non-quad, boundary, transition, uv-offset
sharpness -- crease sharpness for a single-crease patch
These are stored in OsdPatchParamBuffer indexed by the value returned
from OsdGetPatchIndex() which is a function of the current PrimitiveID
along with an optional client provided offset.
Accessors are provided to extract values from a patchParam. These are
all named OsdGetPatch*().
While drawing patches, the patchParam is condensed into a patchCoord which
has four values (u, v, faceLevel, faceId). These patchCoords are treated
as int values during per-prim processing but are converted to float values
during per-vertex processing where the values are interpolated.
Also, cleaned up more of the shader namespace by giving an Osd prefix
to public functions, and consolidated boundary and transition handling
code into the PatchCommon shader files. The functions determining
tessellation levels are now all named OsdGetTessLevel*().
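To make the patchParam bitfield idea concrete, here is a small packing sketch. The field widths and bit positions below are hypothetical and chosen only so the example fits in 32 bits; they are not the layout used by the osd shaders, and the struct and function names are invented for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical decoded view of the bitfield component of a patchParam.
struct PatchBitsSketch {
    unsigned level;       // refinement level, 4 bits (bits 0-3)
    unsigned nonQuad;     // non-quad flag,    1 bit  (bit 4)
    unsigned boundary;    // boundary mask,    4 bits (bits 5-8)
    unsigned transition;  // transition mask,  4 bits (bits 9-12)
    unsigned u;           // uv-offset, u part 9 bits (bits 13-21)
    unsigned v;           // uv-offset, v part 9 bits (bits 22-30)
};

// Places 'value' into a field of the given width at the given shift.
inline std::uint32_t PackField(unsigned value, int width, int shift) {
    assert(value < (1u << width));
    return static_cast<std::uint32_t>(value) << shift;
}

// Extracts a field of the given width at the given shift.
inline unsigned UnpackField(std::uint32_t bits, int width, int shift) {
    return static_cast<unsigned>((bits >> shift) & ((1u << width) - 1u));
}

inline std::uint32_t PackPatchBits(const PatchBitsSketch &p) {
    return PackField(p.level, 4, 0)
         | PackField(p.nonQuad, 1, 4)
         | PackField(p.boundary, 4, 5)
         | PackField(p.transition, 4, 9)
         | PackField(p.u, 9, 13)
         | PackField(p.v, 9, 22);
}

inline PatchBitsSketch UnpackPatchBits(std::uint32_t bits) {
    PatchBitsSketch p;
    p.level      = UnpackField(bits, 4, 0);
    p.nonQuad    = UnpackField(bits, 1, 4);
    p.boundary   = UnpackField(bits, 4, 5);
    p.transition = UnpackField(bits, 4, 9);
    p.u          = UnpackField(bits, 9, 13);
    p.v          = UnpackField(bits, 9, 22);
    return p;
}
```

The accessor-per-field shape mirrors the OsdGetPatch*() naming mentioned above: callers ask for one value at a time and never depend on where it sits in the word.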
- resolves DX-CL interop functions in Osd::ClD3D11VertexBuffer.
- enable CL kernels in DX build.
- more cleanup in test harnesses, adding D3D11 initializations into DeviceContext.
- add new defines OPENSUBDIV_HAS_OPENGL and OPENSUBDIV_HAS_DX for convenience.
refactor CL/CUDA specific initialization code into
examples/common/clDeviceContext and cudaDeviceContext, and
update examples to use those structs.
also
- remove CL/CUDA tests from osd_regression. The tests for those kernels will be covered by glImaging.
- update cuda initialization to use the GL-interoperable device if available.
- remove CL specialization from glShareTopology, following the same pattern as we took in the previous OsdGLMesh refactoring. (still something strange with XFB kernels though)
- fix file permissions.
Removed OpenCL/D3D11 specialization and add DEVICE_CONTEXT as a template
parameter. Kernels which don't need a context object (e.g.
CPU, OpenGL, cuda) just ignore the context, and kernels which
use a context (e.g. OpenCL, DirectX) take a context or a user-defined
class which encapsulates device contexts. Note that OpenCL requires
two objects, cl_context and cl_command_queue. The user-defined
class must provide GetContext() and GetCommandQueue() for strongly typed
binding to osd VertexBuffers and ComputeContexts.
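The DEVICE_CONTEXT pattern can be sketched without any OpenCL headers. In the sketch below, FakeClContext and FakeClCommandQueue stand in for cl_context and cl_command_queue, and every class name is hypothetical; only the GetContext() and GetCommandQueue() accessor names come from the description above.

```cpp
#include <cassert>

// Stand-ins for cl_context and cl_command_queue so that the sketch builds
// without an OpenCL SDK. They are assumptions of this example.
typedef int FakeClContext;
typedef int FakeClCommandQueue;

// User-defined class that encapsulates both CL objects and exposes the two
// accessors a CL-backed buffer expects.
class MyClDeviceContext {
public:
    MyClDeviceContext(FakeClContext context, FakeClCommandQueue queue)
        : _context(context), _queue(queue) {}
    FakeClContext GetContext() const { return _context; }
    FakeClCommandQueue GetCommandQueue() const { return _queue; }
private:
    FakeClContext _context;
    FakeClCommandQueue _queue;
};

// CPU-style buffer: accepts any context type and simply ignores it.
struct CpuBufferSketch {
    int numElements;
    template <typename DEVICE_CONTEXT>
    static CpuBufferSketch Create(int n, DEVICE_CONTEXT * /*ignored*/) {
        CpuBufferSketch b;
        b.numElements = n;
        return b;
    }
};

// CL-style buffer: pulls what it needs out of the context through the
// strongly typed accessors.
struct ClBufferSketch {
    int numElements;
    FakeClContext context;
    FakeClCommandQueue queue;
    template <typename DEVICE_CONTEXT>
    static ClBufferSketch Create(int n, DEVICE_CONTEXT *deviceContext) {
        assert(deviceContext != nullptr);
        ClBufferSketch b;
        b.numElements = n;
        b.context = deviceContext->GetContext();
        b.queue = deviceContext->GetCommandQueue();
        return b;
    }
};

// One generic harness for every backend, with no per-API specialization.
template <typename BUFFER, typename DEVICE_CONTEXT>
BUFFER CreateBufferSketch(int n, DEVICE_CONTEXT *deviceContext) {
    return BUFFER::Create(n, deviceContext);
}
```

Because the accessors are only instantiated for backends that call them, a CPU buffer can be handed any pointer (including a null `void *`) while a CL buffer fails to compile if the context class lacks GetContext() or GetCommandQueue().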
Osd::Mesh and MeshInterface have been used as a handy harness to host
multiple GPU kernels and graphics APIs. However it has CL/DirectX
specializations and duplicates a large amount of plumbing code. With this
change, glMesh.h and d3d11Mesh.h become just typedefs and all logic is
put into mesh.h without specializations.
Also cleaned up unused header files and code formatting.
- rename "Regular end cap" to "BSplineBasis end cap"
- revert templating and add EndCapType into PatchTablesFactory::Options.
- make EndCapFactories internal in PatchTablesFactory.
- move end cap stencils into PatchTables, keep them relative to the max level.
- add a utility StencilTablesFactory::AppendEndCapStencilTables to splice and factorize endcap stencil tables.
Remove the ptex-specific code from the Far::TopologyRefiner and instead provide it in a separate class Far::PtexIndices. Clients who need to use the Ptex API need to first build a Far::PtexIndices object by providing it with a refiner.
This has the advantage of keeping the API on the TopologyRefiner a little cleaner. The ptex methods were const but were mutating state with const_casts. The new mechanism still achieves the same lazy initialization behavior by forcing clients to instantiate them exactly when needed.
A disadvantage of this approach is that the PatchTablesFactory creates its own PtexIndices and throws it out after the patch tables are created. This is great if you're never going to need the ptex indices again, but not so great if you will need them again.
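The shape of this split can be sketched as follows: a read-only refiner and a separate index object that is built from it on demand, so no const method has to mutate cached state. RefinerSketch and PtexIndicesSketch are hypothetical stand-ins, and the numbering rule (one ptex face per quad, one per corner for any other face) is the usual Catmull-Clark convention assumed for this example.

```cpp
#include <cassert>
#include <vector>

// Hypothetical minimal refiner: it only knows the vertex count of each
// base face and has no ptex-related state at all.
class RefinerSketch {
public:
    explicit RefinerSketch(const std::vector<int> &faceVertCounts)
        : _faceVertCounts(faceVertCounts) {}
    int GetNumFaces() const {
        return static_cast<int>(_faceVertCounts.size());
    }
    int GetNumFaceVertices(int face) const {
        return _faceVertCounts[face];
    }
private:
    std::vector<int> _faceVertCounts;
};

// Built explicitly by clients that need ptex indices. All the work happens
// in the constructor, so every accessor is honestly const.
class PtexIndicesSketch {
public:
    explicit PtexIndicesSketch(const RefinerSketch &refiner) {
        int numFaces = refiner.GetNumFaces();
        _firstPtexFace.resize(numFaces + 1);
        int next = 0;
        for (int f = 0; f < numFaces; ++f) {
            int numVerts = refiner.GetNumFaceVertices(f);
            assert(numVerts >= 3);
            _firstPtexFace[f] = next;
            // Assumed convention: a quad maps to one ptex face, any other
            // n-gon maps to n ptex faces (one per corner).
            next += (numVerts == 4) ? 1 : numVerts;
        }
        _firstPtexFace[numFaces] = next;
    }
    int GetNumFaces() const { return _firstPtexFace.back(); }
    int GetFaceId(int face) const {
        assert(face >= 0 &&
               face + 1 < static_cast<int>(_firstPtexFace.size()));
        return _firstPtexFace[face];
    }
private:
    std::vector<int> _firstPtexFace;
};
```

A client that needs the indices more than once would keep the PtexIndicesSketch object alive rather than rebuilding it, which is the cost the last paragraph above points out for the factory's throwaway copy.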
This change moves all gregory patch generation from Far::PatchTablesFactory
so that we can construct patch tables without stencil tables as well as client
can choose any end patch strategy (we have 3 options for now: legacy 2.x style
gregory patch, gregory basis patch and experimental regular patch approximation).
Also Far::EndCapGregoryBasisPatchFactory provides index mapping from patch index
to vtr face index, which can be used for single gregory patch evaluation on top
of refined points, without involving heavier stencil tables generation.
If glew is present it will use it to query for tessellation information.
Otherwise it will use the information provided by the header
files during compilation.
This is done since the header usually defines opengl
functionality that is not implemented. So it needs
to be queried at runtime using glew if available,
otherwise the header information is used.
This was done since sometimes the headers indicate
an opengl feature that is not implemented. Now the
function queries the opengl capabilities during runtime
to determine which version to use.
This is the first step to tease off Osd compute controller/contexts
from Far API.
Currently FarStencilTable only creates a kernelbatch for the entire range,
so we can use [0, numStencils) for all cases instead of KernelBatch.
This might not be true if we apply non-factorized level-wise stencils,
then we'll add another modular utility to serve those cases.
PatchTablesFactory fills 20 indices topology into patchtable, and use it for eval and draw.
note: currently screen-space adaptive tessellation of gregory basis patches is
broken and cracks appear around them.
When CLEW is present, CL functions become macros which may resolve to null
function pointers. A previous change attempted to guard against CL-related
crashes, but introduced compiler warnings.
This change conditionally tests the function when CLEW is present.
Resolves: #400
- extend Far::PatchTables data structures & interfaces to store requisite
information for channels of face-varying bi-cubic patches
- implement gather function in Far::PatchTablesFactory to populate face-varying
channels with adaptive patches
- extend accessor interface in Vtr::Level
- propagate code fall-out throughout OpenSubdiv code base, examples & tutorials
- extend vtrViewer code to visualize tessellated bi-cubic face-varying patches
- recent MSVC versions attempt to compile files with hlsl
extensions when passed on the command-line. This breaks
the build because these files are not meant to be compiled
directly by MSVC. I removed the dependency from the
CMakeList to prevent this from happening.
- changed ptex layout data types in shaders to match srv format
- changed ptex srv type to unorm format for uchar data
- fixed hlsl compiler warning: initialized edgeDistance of OutputVertex struct in domain shader even if we are not in wireframe mode
- added directx debug device and enabled automatic break points to easily spot dx errors
- renamed Sdc::Type to SchemeType and TypeTraits to SchemeTypeTraits
- renamed TYPE_ prefix to SCHEME_
- updated all usage within core library
- updated all usage in examples, tutorials, etc.
- move level of refinement / isolation into the Options structs
- fix splash damage in rest of the code
note 1: this is less than ideal, because most compilers accept the previous
call to these functions with an incorrect parameter list (ie. passing
the level instead of the struct issues no warnings and compiles...)
caveat emptor...
note 2: the level parameter names may not be final for adaptive modes
as we will likely want independent controls over crease vs.
extraordinary vertex isolation.
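One plausible reason a call that passes the level instead of the struct still compiles is an options struct with a non-explicit single-argument constructor, which lets an int convert silently. The sketch below illustrates that mechanism and one possible mitigation; both structs are hypothetical and are not the actual Far option types.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical options struct with a NON-explicit constructor: an int
// passed where the struct is expected converts without any diagnostic.
struct LooseOptionsSketch {
    LooseOptionsSketch(int level)
        : refinementLevel(level), fullTopologyInLastLevel(false) {}
    int  refinementLevel;
    bool fullTopologyInLastLevel;
};

// Same struct with an explicit constructor: the old call style is rejected
// at compile time and callers must spell out the struct.
struct StrictOptionsSketch {
    explicit StrictOptionsSketch(int level)
        : refinementLevel(level), fullTopologyInLastLevel(false) {}
    int  refinementLevel;
    bool fullTopologyInLastLevel;
};

inline int RefineLooseSketch(LooseOptionsSketch options) {
    assert(options.refinementLevel >= 0);
    return options.refinementLevel;
}

inline int RefineStrictSketch(StrictOptionsSketch options) {
    assert(options.refinementLevel >= 0);
    return options.refinementLevel;
}

// The implicit conversion exists for the loose struct only.
static_assert(std::is_convertible<int, LooseOptionsSketch>::value,
              "an int silently becomes LooseOptionsSketch");
static_assert(!std::is_convertible<int, StrictOptionsSketch>::value,
              "an int does not silently become StrictOptionsSketch");
```

With the loose struct, `RefineLooseSketch(3)` compiles and behaves like the old level-based call; with the strict one the caller has to write `RefineStrictSketch(StrictOptionsSketch(3))`.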
- don't rotate (s,t) coordinates but rotate the patch instead !
- refactor osd/cpuEvalLimitKernels to share Far::PatchTables cubic spline
interpolation functions : this replaces tensor product formulation with
weight matrices, which does not really impact performance here, but would
have to be replaced when implementing regular gridding functions.
- fix OsdCpuEvalLimitController to not rotate coordinates and pass the rotation bitfields
- expose Far::PatchTables spline interpolation API (protected -> public)
- fix glEvalLimit tangent buffers (remove empty padding - see below)
- change policy for tangent buffers : the output buffer descriptor is
**NO LONGER APPLIED** to tangent output buffers. Tangent primvar data
buffers are no longer applying the offset and stride from the descriptor
(because it doesn't make sense to share it). If more flexiblity is
required, we will consider adding independent descriptors for the tangent
buffers. This change will impact existing code that generates tangents
with the EvalLimit controller.
fixes #370
- change topology refiner to check for edge sharpnesses when selecting faces for isolation
- add face-aggregator for edge tags to Vtr::Level
- fix logic in Far::PatchTablesFactory to correctly tag single-crease patches along infinitely sharp edges
note : this fix is a bit of a kludge - barfowl confirms that the vertex crease tags (VTags) are intended to
carry neighborhood information, which they currently do not. we will revisit this shortly and fix the tags,
which will allow us to simplify the traversal logic when isolating topology features.
fixes #369
'Const' declared instances of Vtr::Array do not protect the pointer held
privately by the class properly. In order to force the compiler to
protect this pointer, we removed all non-const accessors from Vtr::Array
(now renamed Vtr::ConstArray) and moved them to a child class (Vtr::Array),
which requires const_cast<> operators internally to allow access.
The change & renaming is then propagated to all internal dependencies.
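The split described above can be sketched with two small templates: a read-only base that holds a pointer to const data, and a derived class that adds the mutable accessors through const_cast. ConstArraySketch and ArraySketch are hypothetical names; the real Vtr classes may differ in interface and detail.

```cpp
#include <cassert>

// Read-only view: the stored pointer is to const data, so no accessor can
// hand out a mutable reference, whether or not the view itself is const.
template <typename T>
class ConstArraySketch {
public:
    ConstArraySketch(const T *begin, int size) : _begin(begin), _size(size) {}
    int size() const { return _size; }
    const T &operator[](int index) const {
        assert(index >= 0 && index < _size);
        return _begin[index];
    }
    const T *begin() const { return _begin; }
    const T *end() const { return _begin + _size; }
private:
    const T *_begin;
    int _size;
};

// Mutable view: constructed only from a non-const pointer, so casting the
// constness away again internally is safe.
template <typename T>
class ArraySketch : public ConstArraySketch<T> {
public:
    ArraySketch(T *begin, int size) : ConstArraySketch<T>(begin, size) {}

    // Keep the read-only accessor visible next to the mutable one.
    using ConstArraySketch<T>::operator[];

    T &operator[](int index) {
        return const_cast<T &>(ConstArraySketch<T>::operator[](index));
    }
    T *begin() { return const_cast<T *>(ConstArraySketch<T>::begin()); }
    T *end() { return const_cast<T *>(ConstArraySketch<T>::end()); }
};
```

A function that should only read takes a ConstArraySketch<T>; an ArraySketch<T> converts to it by slicing, after which the compiler rejects any write through the view.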
- VVarBoundaryInterpolation is now VtxBoundaryInterpolation
- enum prefix change from VVAR to VTX
- general cleanup / doxyfication
- update of beta / release notes
- make sure we don't get conflicting enums (CODE_ERROR)
- fix template specialization for Far::TopologyRefinerFactory in regression/common/vtr_utils
- fix remaining error reporting code around osd
- change error codes from situational to general (fatal / coding / run-time...)
- pull error functions from Osd into Far
- add a templated topology validation reporting system to Far::TopologyRefinerFactory
- fix fallout on rest of code-base
- split Far::PatchDescriptor into its own class (mirrors Far::PatchParam)
- hide PatchArray as a private internal structure
- add public accessors patterned after Far::TopologyRefiner (returning Vtr::Arrays)
- propagate new API to all dependent code
note: some direct table accessors have not been removed *yet* - see code for details
- adding support for StencilTables creation from a Gregory basis
- fix a bug in the proto-stencil allocator (slow memory pool was not being cleared properly)
- adaptive mode: remove faces tagged as holes from the selection of faces to isolate
- uniform mode: faces tagged as holes are still included in the refinement process,
however they are removed from patch tables
- future improvements: add a 'selective refinement' code path separate from 'uniform refinement'
to handle this case without un-necessary subdivision work.
- re-implement the pool allocator
- use templates to remove code redundancy between regular & limit stencils
- leverage [] operator overloading to simplify stencil factorization
- add the ability to treat subdivision levels independently (see below)
- refactor Far::TopologyRefiner::Interpolate<>() methods to pass buffers by reference
(allows overloading of [] operator)
- rename some of the stencil factory options
- propagate changes to Osd / examples / tutorials...
Catmull-Clark Subdivision Surfaces", Niessner et al, Eurographics 2012.
This change includes:
-topology identification for single-crease patch during adaptive refinement.
-patch array population (similar to boundary)
-sharpness buffer generation
-glsl shader
Eval support will be coming.
Make the sample locations dynamic by adding a velocity vector. Face boundary
crossing is handled using the new ptex adjacency functionality recently
added to the Far::TopologyRefiner.
- added an option to Far::StencilTablesFactory to generate stencils for
coarse control vertices
- refactored interpolation code out into Far::PatchTables
- corrected tangent interpolation
- code cleanup & comments
Sync'ing the 'dev' branch with the 'feature_3.0dev' branch at commit 68c6d11fc36761ae1a5e6cdc3457be16f2e9704a
The branch 'feature_3.0dev' is now locked and preserved for historical purposes.
- remove obsolete glPushClient functions
note: this code is still duplicated in the ptexViewer (which still needs to be upgraded to the new framebuffer)
fixes #307
If the system has CLEW installed (which is detected by recently
added FindCLEW routines) then OpenSubdiv would be compiled against
this library.
It makes binaries and libraries more portable across the systems,
so it's possible to run the same binary on systems with and without
OpenCL SDK installed.
The most annoying part of the change is updating examples to load
OpenCL libraries, but ideally code around controllers and interface
creation is to be de-duplicated anyway.
Based on the pull request #303 from Martijn Berger
- Some missing includes of <algorithm> in order to have
std::min() and similar functions.
- Need to cast numIndices and numNVerts to int explicitly
in order to solve warning treated as an error about
precision loss.
- Can't do vector[0] for an empty vector, it'll generate
a runtime range check error.
- MSVC only works fine with make_pair(foo, bar) syntax,
without explicit template substitution here. Otherwise
weird 'can't cast int to int&&' errors are happening.
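The four fixes listed above follow common portable-C++ patterns, sketched below. The helper names are hypothetical and the functions are illustrations of each pattern, not the code that was actually changed.

```cpp
#include <algorithm>  // std::min / std::max are declared here
#include <cassert>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Taking &v[0] of an empty vector is undefined behavior and trips MSVC's
// checked iterators, so return null instead.
inline const int *FirstElementOrNull(const std::vector<int> &v) {
    return v.empty() ? nullptr : &v[0];
}

// Converting size_t to int implicitly warns about precision loss (an error
// when warnings are treated as errors), so check the range and cast.
inline int CountAsInt(const std::vector<int> &v) {
    assert(v.size() <=
           static_cast<std::size_t>(std::numeric_limits<int>::max()));
    return static_cast<int>(v.size());
}

// Let make_pair deduce its template arguments. Writing
// std::make_pair<int, int>(lo, hi) would turn the parameters into int&&,
// which cannot bind to the lvalues lo and hi.
inline std::pair<int, int> OrderedPair(int a, int b) {
    int lo = std::min(a, b);
    int hi = std::max(a, b);
    return std::make_pair(lo, hi);
}
```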
- add a framebuffer to gl_hud with programmable image shader
- add optional SSAO image shader to the new framebuffer
- add screenshot to png functionality
- implement in glViewer
note: ptexViewer and some others still need refactoring to use the new hud capabilities
Moved transient states (current vertex buffer etc) to controller.
ComputeContext becomes constant so that it's well suited for coarse-grain
parallelism on cpu.
Client-facing API has changed slightly - limitEval example has been adjusted
All kernels take offset/length/stride to apply subdivision partially in each vertex elements.
Also the offset can be used for client-based VBO aggregation, without modifying index buffers.
This is useful for topology sharing, in conjunction with glDrawElementsBaseVertex etc.
However, the gregory patch shader fetches the vertex buffer via a texture buffer, whose index should also
be offset. Although the gl_BaseVertexARB extension should be able to do that job, it's a
relatively new extension. So we use OsdBaseVertex() call to mitigate the compatibility
issue as clients can provide it in their way at least for the time being.
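A toy kernel shows how offset/length/stride and a base vertex interact when several objects share one aggregated buffer. ElementRangeSketch and MidpointKernelSketch are hypothetical, and the midpoint rule merely stands in for a real subdivision kernel.

```cpp
#include <cassert>

// Hypothetical descriptor: a kernel touches 'length' floats starting at
// float 'offset' inside each vertex, and vertices are 'stride' floats apart.
struct ElementRangeSketch {
    int offset;
    int length;
    int stride;
};

// Writes the midpoint of vertices a and b into vertex dst. The three
// indices are relative to 'baseVertex', so the same index data can be
// reused for every object aggregated into the buffer.
inline void MidpointKernelSketch(float *buffer, ElementRangeSketch range,
                                 int baseVertex, int a, int b, int dst) {
    assert(range.offset >= 0 && range.length > 0);
    assert(range.offset + range.length <= range.stride);
    const float *pa = buffer + (baseVertex + a) * range.stride + range.offset;
    const float *pb = buffer + (baseVertex + b) * range.stride + range.offset;
    float *pd = buffer + (baseVertex + dst) * range.stride + range.offset;
    for (int k = 0; k < range.length; ++k) {
        pd[k] = 0.5f * (pa[k] + pb[k]);
    }
}
```

Floats outside [offset, offset + length) in each vertex are left untouched, which is what lets one interleaved buffer carry data that a given kernel pass should not modify.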
- fix default selection for pulldown widgets
- move widgets around to prevent overlap in examples
- add a little triangle indicator to the pulldown widget
- switch color from white to yellow for selected pulldown item
- switch shading radio buttons to pulldowns
- re-ordered elements on screen in most viewers
note: the ptex viewer has not been updated to the new look yet
* maintenance work on the D3D11 specialization of OsdMesh to bring it in line with the other template specializations
* updated the facePartition example to derive PartitionedMesh from OsdMesh in order to allow other vertex buffer and compute controller configurations
* added the numVertexElements argument to Osd*DrawContext::Create, which is used to initialize the patch arrays when calling OsdDrawContext::ConvertPatchArrays
* removed the unused level argument from Osd*DrawContext::_initialize
* maintenance work on CL/D3D11 bindings to get them to compile
* replace void* of all kernel applications with CONTEXT template parameter.
It eliminates many static_casts from void* for both far and osd classes.
* move the big switch-cases of far default kernel launches out of Refine so
that osd controllers can arbitrarily mix default kernels and custom kernels.
* change FarKernelBatch::kernelType from enum to int, clients can add
custom kernel types.
* remove a back-pointer to farmesh from subdivision table.
* untemplate all subdivision table classes and template their compute methods
instead. Those methods take a typed vertex storage.
* remove an unused argument FarMesh from the constructor of subdivision
table factories.
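The combined effect of a CONTEXT template parameter and an int kernel type can be sketched as a two-stage dispatch: built-in kernels are tried first, and anything they do not recognize falls through to the controller's own handling. All names, kernel ids and kernel bodies below are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Built-in kernel ids plus one client-defined id. Since kernelType is a
// plain int, clients can pick any value the defaults do not use.
enum {
    KERNEL_SCALE_BY_TWO = 0,
    KERNEL_NEGATE       = 1,
    KERNEL_USER_DEFINED = 100
};

struct KernelBatchSketch {
    int kernelType;
    int start;
    int end;
};

// Example typed context; a template parameter replaces void* so no
// static_cast is needed inside the kernels.
struct FloatContextSketch {
    std::vector<float> values;
};

// Applies a built-in kernel. Returns false for an unknown kernel type so
// the caller can supply its own implementation.
template <class CONTEXT>
bool ApplyDefaultKernelSketch(const KernelBatchSketch &batch,
                              CONTEXT *context) {
    assert(context != nullptr);
    switch (batch.kernelType) {
    case KERNEL_SCALE_BY_TWO:
        for (int i = batch.start; i < batch.end; ++i) {
            context->values[i] *= 2.0f;
        }
        return true;
    case KERNEL_NEGATE:
        for (int i = batch.start; i < batch.end; ++i) {
            context->values[i] = -context->values[i];
        }
        return true;
    default:
        return false;
    }
}

// Controller that mixes default and custom kernels in one pass.
struct CustomControllerSketch {
    template <class CONTEXT>
    void Refine(const std::vector<KernelBatchSketch> &batches,
                CONTEXT *context) const {
        for (std::size_t b = 0; b < batches.size(); ++b) {
            if (ApplyDefaultKernelSketch(batches[b], context)) {
                continue;
            }
            if (batches[b].kernelType == KERNEL_USER_DEFINED) {
                for (int i = batches[b].start; i < batches[b].end; ++i) {
                    context->values[i] += 1.0f;
                }
            }
        }
    }
};
```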
The existing code in the node assumes that all the faces in the mesh have the same valence
when populating the component tables describing properties such as per-face material assignment.
This fix accounts for arbitrary topology.
fixes #269
- fix crash in the case of partial uv & color sets
- make sure that the default uv set is not dropped if its name is not 'map1'
note: the current fix carries the first uv set to the refined mesh, but custom
names are dropped and replaced by 'map1'. We appear to be running into a Maya
API bug.
fixes #267
On platforms with multiple devices (e.g. OS X Mavericks
with both Intel and Discrete GPU CL devices) we must
create the context and command queue with the correct
device in order to share resources with GL.
This follows existing patterns (more or less), but
there are certainly opportunities to move more of this
sort of logic into macros defined at the top level.
The example code now uses the new glfw*FramebufferSize methods
to determine the size of the window's framebuffer for rendering
and glfw*WindowSize method for user interaction
Fixes #263
- renders off-screen a higher resolution version of the current view
- saves the render to image files (screenshot.x.png)
- modified the background color to have alpha set to 0 (screenshots can be composited)
Because our plugin sets UVs with individual face-varying vertices,
Maya interprets the buffer as discontinuous everywhere. Adding a
node in the graph that merges UVs along non-boundary edges resolves
this problem (until the plugin outputs the UV vertex indices in
an aggregated manner).
- added a _stringify function to top CMakeLists
- switched all stringification tasks to use the macro
- all suffixes are now .gen.h instead of .inc (to help cmake track dependencies)
- set OBJECT targets for osd cpu & gpu libs, and use the obj target for
static and dynamic linking
- add a new examples_common_obj OBJECT target
- replace direct source dependencies to obj target in all examples CMakeLists
This change makes it possible to not re-compile the same source files
multiple times when they are used in multiple targets. Thanks to jcowles
for uncovering the CMake functionality.
Note: it seems that multi-process build is working again (gmake -j <x>)
Do feature adaptive refinement, then use the cpuEvalLimit API to evaluate
grids of points on faces.
Test harness is tessellateObjFile which has a -blender option to trigger
the gridding tessellation code.