* assembler kernels are based on the C implementation in neonKernel.cpp
* enable assembler kernel functions in neonComputeController.cpp with #define USE_ASM_KERNELS 1
The unused argument `pass` was defined in the CUDA kernel but was never
passed to this function from the C++ code. The argument also wasn't
used by the function itself.
Solved by checking at run time whether texture buffer objects
are supported.
When building with the GLEW library, a compile-time check is
not enough, because the features that actually exist are
only known at run time.
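A minimal sketch of the run-time check described above, assuming the extension list is available as a single space-separated string. `HasExtension` is a hypothetical helper, not part of OpenSubdiv; with GLEW the equivalent query would normally go through `glewIsSupported("GL_ARB_texture_buffer_object")` after `glewInit()`. The point of the whole-token comparison is that a plain substring search would wrongly match longer extension names.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper (not an OpenSubdiv or GLEW function): returns true
// if `name` appears as a whole token in a space-separated extension list.
bool HasExtension(std::string const &extensionList, std::string const &name) {
    assert(!name.empty());
    std::istringstream tokens(extensionList);
    std::string token;
    while (tokens >> token) {
        if (token == name) {
            return true;
        }
    }
    return false;
}
```

A caller would evaluate this once after context creation and pick the code path (texture buffer or fallback) from the result.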
This only makes it so the CPU backend works; the GLSL backends still
require some work if we want to make them work. Not sure
it is worth doing this now.
* added `OsdMeshInterface::GetFarMesh` and `OsdMesh::GetFarMesh` to match `OsdGLMesh` and `OsdD3D11Mesh`
* added `interleaved` argument to `OsdMesh::Refine` to match `OsdMeshInterface::Refine`
* The CATMARK_QUAD_FACE_VERTEX kernel calculates the face-vertex for a quadrilateral face. It applies to every face after the first subdivision step, and may be applied for the first subdivision step of a quadrilateral coarse mesh.
* The CATMARK_TRI_QUAD_FACE_VERTEX kernel calculates the face-vertex for a triangle or quadrilateral face. It may be applied for the first subdivision step of a coarse mesh composed of triangles and/or quadrilaterals.
* Both kernels calculate each face-vertex using four vertex indices (triangles are specified by repeating the third index). Therefore neither kernel uses the F_ITa codex table, and instead the first vertex offset in the F_IT index table is stored in the FarKernelBatch's table offset.
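A minimal sketch of the face-vertex computation described above, assuming a face-vertex is the centroid of the face's vertices and that a triangle can be recognized by its repeated third index (`idx[2] == idx[3]`). `Point3` and `ComputeFaceVertex` are hypothetical names for illustration; the actual kernels operate on the Far index tables.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Point3 { float x, y, z; };

// Computes the face-vertex (centroid) of one face from exactly four
// vertex indices. A triangle is assumed to be encoded by repeating its
// third index, so only three distinct vertices are averaged.
Point3 ComputeFaceVertex(std::vector<Point3> const &verts, int const idx[4]) {
    int const count = (idx[2] == idx[3]) ? 3 : 4;
    Point3 sum = {0.0f, 0.0f, 0.0f};
    for (int i = 0; i < count; ++i) {
        assert(idx[i] >= 0 && static_cast<std::size_t>(idx[i]) < verts.size());
        Point3 const &p = verts[idx[i]];
        sum.x += p.x;
        sum.y += p.y;
        sum.z += p.z;
    }
    Point3 result = {sum.x / count, sum.y / count, sum.z / count};
    return result;
}
```

Because every face uses a fixed count of four indices, the index of the first entry for a face is simply four times the face number plus a base offset, which is why no per-face codex entry is needed.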
If the system has CLEW installed (which is detected by the recently
added FindCLEW routines), then OpenSubdiv is compiled against
this library.
It makes binaries and libraries more portable across the systems,
so it's possible to run the same binary on systems with and without
OpenCL SDK installed.
The most annoying part of the change is updating examples to load
OpenCL libraries, but ideally code around controllers and interface
creation is to be de-duplicated anyway.
Based on the pull request #303 from Martijn Berger
Moved transient states (current vertex buffer etc) to controller.
ComputeContext becomes constant so that it's well suited for coarse-grain
parallelism on cpu.
Client-facing API has changed slightly - limitEval example has been adjusted
- fix some variable names (private vs. public)
- implement constructors to guarantee initialized pointers (d'oh)
- add a 'Reset' method to unbind buffers
Note: while the new contexts have been cleaned up, we now have a fair amount of duplicated code in the controllers...
Moved transient states (current vertex buffer etc) to controller.
ComputeContext becomes constant so that it's well suited for coarse-grain
parallelism on cpu.
All kernels take offset/length/stride so that subdivision can be applied to a subset of each vertex's elements.
Also the offset can be used for client-based VBO aggregation, without modifying index buffers.
This is useful for topology sharing, in conjunction with glDrawElementsBaseVertex etc.
However, the Gregory patch shader fetches the vertex buffer via a texture buffer, whose index must
also be offset. Although the gl_BaseVertexARB extension should be able to do that job, it's a
relatively new extension. So we use an OsdBaseVertex() call to mitigate the compatibility
issue, as clients can provide it in their own way, at least for the time being.
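A minimal sketch of how an offset/length/stride triple can restrict a kernel to part of each vertex, assuming an interleaved float buffer. `ElementRange` and `AddWithWeight` are hypothetical names for illustration and do not reproduce the actual kernel signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical descriptor: which elements of each interleaved vertex to touch.
struct ElementRange {
    int offset;  // first element within a vertex
    int length;  // number of elements to process
    int stride;  // total elements per vertex
};

// Accumulates weight * source vertex into the destination vertex, but only
// for the elements selected by `range`. Other elements are left untouched.
void AddWithWeight(std::vector<float> &buffer, int dstVertex, int srcVertex,
                   float weight, ElementRange const &range) {
    assert(range.offset >= 0 && range.length >= 0);
    assert(range.offset + range.length <= range.stride);
    assert(dstVertex >= 0 && srcVertex >= 0);
    std::size_t const stride = static_cast<std::size_t>(range.stride);
    std::size_t const length = static_cast<std::size_t>(range.length);
    std::size_t const dstBase = static_cast<std::size_t>(dstVertex) * stride + range.offset;
    std::size_t const srcBase = static_cast<std::size_t>(srcVertex) * stride + range.offset;
    assert(dstBase + length <= buffer.size());
    assert(srcBase + length <= buffer.size());
    for (std::size_t i = 0; i < length; ++i) {
        buffer[dstBase + i] += weight * buffer[srcBase + i];
    }
}
```

With a stride of 6 (position plus color), a range of `{3, 3, 6}` would refine only the color channels, while `{0, 3, 6}` would refine only the positions.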
Moved transient states (current vertex buffer etc) to controller.
ComputeContext becomes constant so that it's well suited for coarse-grain
parallelism on cpu. The prims sharing same topology (ComputeContext) can
be refined simultaneously by having multiple compute controllers.
Client facing API doesn't change.
- add a limit evaluation method to EvalLimitController that allows
client code to directly pass the output buffer without binding it
to the Context (the call only computes vertex interpolation of a
single sample)
- switch the OsdUtilAdaptiveEvaluator to use the new method from the controller
and stop stomping member
- cleanup buffer and member variables no longer used
- cleanup initialization logic to be better aware of uniform / adaptive
- add some assert sanity checks in the cpuEvalLimitKernels
fixes #293
* rolled getNumFVarVertices into allocateTables
* renamed tessellate to triangulateQuads (technically speaking, Loop scheme uses a trivial triangulation)
* condensed the pointer arithmetic used for triangulating the data tables
* maintenance work on the D3D11 specialization of OsdMesh to bring it in line with the other template specializations
* updated the facePartition example to derive PartitionedMesh from OsdMesh in order to allow other vertex buffer and compute controller configurations
* added the numVertexElements argument to Osd*DrawContext::Create, which is used to initialize the patch arrays when calling OsdDrawContext::ConvertPatchArrays
* removed the unused level argument from Osd*DrawContext::_initialize
* maintenance work on CL/D3D11 bindings to get them to compile
Delete scheme specialized subdivision tables. The base class FarSubdivisionTables
already has all tables, so we just need scheme enum to identify which scheme
the subdivision tables belong to.
This brings a lot of code cleanups around far factory classes.
* replace void* of all kernel applications with CONTEXT template parameter.
It eliminates many static_casts from void* for both far and osd classes.
* move the big switch-cases of far default kernel launches out of Refine so
that osd controllers can arbitrarily mix default kernels and custom kernels.
* change FarKernelBatch::kernelType from enum to int, so that clients can add
custom kernel types.
* remove a back-pointer to farmesh from subdivision table.
* untemplate all subdivision table classes and template their compute methods
instead. Those methods take a typed vertex storage.
* remove an unused argument FarMesh from the constructor of subdivision
table factories.
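A minimal sketch of the pattern these changes describe: a typed `CONTEXT` template parameter instead of `void*`, an integer kernel type, and a default-kernel switch that lives outside the refine loop so custom kernels can be mixed in. All names here (`KernelBatch`, `CpuContext`, `ApplyDefaultKernel`, `Refine`, the kernel constants) are hypothetical stand-ins, not the actual Far or Osd classes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a kernel batch; kernelType is a plain int so
// client code can define values beyond the built-in ones.
struct KernelBatch { int kernelType; int start; int end; };

enum { KERNEL_FACE_VERTEX = 0, KERNEL_EDGE_VERTEX = 1, KERNEL_USER_DEFINED = 100 };

// Hypothetical context that just records which kernels ran.
struct CpuContext { std::vector<int> log; };

// Handles the built-in kernel types; returns false for anything else.
// The context is typed, so no static_cast from void* is needed.
template <class CONTEXT>
bool ApplyDefaultKernel(KernelBatch const &batch, CONTEXT *context) {
    switch (batch.kernelType) {
        case KERNEL_FACE_VERTEX:
        case KERNEL_EDGE_VERTEX:
            context->log.push_back(batch.kernelType);
            return true;
        default:
            return false;
    }
}

// A controller-style loop that falls back to a custom kernel when the
// default dispatch does not recognize the batch type.
template <class CONTEXT>
void Refine(std::vector<KernelBatch> const &batches, CONTEXT *context) {
    assert(context != 0);
    for (std::size_t i = 0; i < batches.size(); ++i) {
        if (ApplyDefaultKernel(batches[i], context)) {
            continue;
        }
        // Custom kernel: recorded as a negative value to tell it apart.
        context->log.push_back(-batches[i].kernelType);
    }
}
```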
Prevent boundaryEdgeNeighbors[2] from being overrun when an interior
vertex has more than 2 boundary neighbor vertices. The fix is applied
to the GLSL / HLSL and CPU implementations.
Note: this appears to fix long-standing problems with Gregory patches,
but I am not entirely convinced that this fixes the general case.
fixes #259
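A minimal sketch of the kind of guard that prevents a two-element array from being overrun, assuming the neighbors are visited in order and only the first two boundary neighbors are kept. `CollectBoundaryNeighbors` and its parameters are hypothetical; the actual fix in the shaders and CPU code may select neighbors differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Records at most two boundary neighbors in boundaryEdgeNeighbors and
// returns how many boundary neighbors were encountered in total.
int CollectBoundaryNeighbors(std::vector<int> const &neighbors,
                             std::vector<bool> const &isBoundary,
                             int boundaryEdgeNeighbors[2]) {
    assert(neighbors.size() == isBoundary.size());
    int found = 0;
    for (std::size_t i = 0; i < neighbors.size(); ++i) {
        if (!isBoundary[i]) {
            continue;
        }
        if (found < 2) {
            // Guarded write: never index past the second slot.
            boundaryEdgeNeighbors[found] = neighbors[i];
        }
        ++found;
    }
    return found;
}
```

A return value greater than 2 tells the caller that the vertex is one of the unusual cases the note above is cautious about.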
Also:
- Add a _numVertices member to cpuSmoothNormalContext (for memory reset function)
- Fix memory reset function in cpuSmoothNormalContext (was performing redundant memsets)
- Add a resetMemory boolean to cpuSmoothNormalContext to make reset step optional (default is off)
- added a _stringify function to top CMakeLists
- switched all stringification tasks to use the macro
- all suffixes are now .gen.h instead of .inc (to help cmake track dependencies)
Further leverage cmake object libraries to share object files for CPU
and GPU OSD libraries, avoiding duplicate compilation for dynamic/static
build passes.
CMake restricts object library inputs to header and source files, so the
.inc files were renamed to .gen.h (which seems like a better name
anyway) to make CMake happy.
Also updated the .gitignore file to ignore .gen.h files.
Conflicts:
opensubdiv/osd/CMakeLists.txt
- set OBJECT targets for osd cpu & gpu libs, and use the obj target for
static and dynamic linking
- add a new examples_common_obj OBJECT target
- replace direct source dependencies to obj target in all examples CMakeLists
This change makes it possible to not re-compile the same source files
multiple times when they are used in multiple targets. Thanks to jcowles
for uncovering the CMake functionality.
Note: it seems that multi-process build is working again (gmake -j <x>)
An object library allows other build targets to use the object files
from this library.
The change introduces osd_static_cpu_obj which is consumed by
osd_static_cpu.
This will be useful for emscripten integration, where we can't use the
compiled library; rather, it will use the object files, targeting
osd_static_cpu_obj.
Important notice: all client shader code must have following functions and compose them to osd intrinsic shaders (vertex/tessEval/tessControl)
mat4 OsdModelViewMatrix()
mat4 OsdProjectionMatrix()
mat4 OsdModelViewProjectionMatrix()
float OsdTessLevel()
int OsdGregoryQuadOffsetBase()
int OsdPrimitiveIdBase()
We probably should write a utility class for basic binding of them, to make client code simpler.
Moving Takahito's implementation into the core API:
- added <gl/d3d11>PtexCommon.<glsl/hlsl> shader code
- added control to enable Ptex common trunk in <gl/d3d11>DrawRegistryBase classes
- fixed GL & D3D11 ptexViewer examples to use the new API
New text:
Copyright 2013 Pixar
Licensed under the Apache License, Version 2.0 (the "Apache License")
with the following modification; you may not use this file except in
compliance with the Apache License and the following modification to it:
Section 6. Trademarks. is deleted and replaced with:
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor
and its affiliates, except as required to comply with Section 4(c) of
the License and to reproduce the content of the NOTICE file.
You may obtain a copy of the Apache License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the Apache License with the above modification is
distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the Apache License for the specific
language governing permissions and limitations under the Apache License.
- Adding FarStencilTables and FarStencilTablesFactory classes
- Adding Osd EvalStencil context & controllers for CPU, OMP and TBB backends
- Adding the code example glStencilViewer
- Adding reST documentation
- Changing version to 2.2.0_dev
- Fix HbrMesh::Unrefine function
- Fix "CanEval" function in OsdVertexBufferDescriptor
Note 0: there is no stencil support for hierarchical edits
Note 1: there is no support for face-varying data stencils yet
Note 2: the current stencil factory is lazy but the caching system is not re-entrant
- move python build section into the python directory (cleaning up)
- fix some broken dependencies
- remove the public_headers targets if doxygen was not found
TODO :
- fix MSVC targets for public headers (it would be nice if MSVC didn't require the pro version
in order to support solution folders)
- fix osd_regression to not build if -DNO_LIB is present (ie. fix the broken dependency)
- add macro "_add_doxy_headers" in order to track all header files eligible for
doxygen documentation. This captures public header files that would otherwise be
excluded from installation because they are not supported by the OS. Private
header files remain excluded though.
- add custom targets and commands so that documentation build produces functioning
RST and Doxygen documentation both in the build and install stages
- switched to Doxygen 1.8 (because markdown will make in-lined documentation easier)
- added build switches to disable examples, regression and python-SWIG targets
- fixed doxygen link in the nav bar
- modified python html processing tool to match Cmake changes
- Added OSD_ prefix to preprocessor symbols
- Adjusted transition sub-patch parameterization to be
consistent with non-transition patches
- Unified BSpline shader code
- Removed duplicate Boundary, Corner, and Transition shader source
- Fixed a few discrepancies in the remaining duplicate code paths
- Replaced EvalData and EvalVertexData classes with a simpler DataStream class that only
accesses a single data stream, binds and unbinds it
- DataStream has both an input and an output version, which avoids much of the
const-related ambiguity of the previous design pattern
- Vertex, varying and face-varying data now all have a dedicated struct (VertexData, VaryingData, FaceVaryingData)
as a way of gathering the various data-streams required to perform sampling
- renamed some "Buffers" into "Tables" for better naming consistency with Far
- remove PatchMap from FarPatchTables
- add a new FarPatchMap quad-tree class (constructed from FarPatchTables)
- refactor the EvalLimitController to use the quad-tree search instead of a
serial loop access
fixes #174
- defined a fallback value for ROTATE
- made GetPatchLevel() a macro to avoid
referencing gl_PrimitiveID from vertex shaders
- fixed float array initializers
- minor refactoring of the LimitEvalContext to accommodate all the data buffers
- pushing some minor sub-patch functionality back to FarPatchParams
- extend example code with randomly generated varying vertex colors
adding an arbitrary break if vertex valence is > 256
- add a Warning function to Osd error reporting
- minor cleanup /refactor / document of OsdError
fixes #167
and adding the requisite accessors
Note : all our example code goes through the same boiler-plate texture
binding code - we might want to move it as a member function of the DrawContext.
- added boundary / corner kernel code
- bug fixes for Gregory patch kernel
- wired the new kernels in the controller class
Note 1 : corner / gregory kernels are not working yet
Note 2 : the vertex mirroring solution used for boundary / corner kernels could be incorrect...
- FarKernelBatch becomes a class w/ accessors
- split the FarKernelBatchFactory to its own header file
- add doxy doc
- propagate fallout to the rest of the code base
be used as intended to specify an installation directory, which can be located anywhere on the
file system.
Also improved the doxygen target and made the doxy build "quiet".
fixes #154
- replace ptex indexing with the FarPtexCoord structure as a way to pass per-patch
ptex data to the shaders.
We are replacing a vector<int> arranged as :
int[0] : ptex face index
int[1] : (u,v) as 16 bits encoding the log2 coordinate of the top left corner
Instead we are now using a struct arranged as :
int[0] : ptex face index
int[1] : is a bit-field containing u,v, rotation, depth and non-quad
The u,v coordinates have been reduced to 10 bits instead of 16, which still
gives us a lot of margin.
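A minimal sketch of packing the per-patch ptex data into two integers, as described above. Only the 10-bit width of u and v comes from the text; the bit positions and the widths chosen for rotation (2 bits) and depth (4 bits) are assumptions for illustration, and `PtexCoord`, `PackPtexCoord` and the `Unpack*` helpers are hypothetical names rather than the actual FarPtexCoord interface.

```cpp
#include <cassert>
#include <cstdint>

// Two integers per patch: the ptex face index and a packed bit-field.
struct PtexCoord {
    std::int32_t faceIndex;
    std::uint32_t bitField;
};

// Assumed layout: u in bits 0-9, v in bits 10-19, rotation in bits 20-21,
// depth in bits 22-25, non-quad flag in bit 26.
enum { U_SHIFT = 0, V_SHIFT = 10, ROT_SHIFT = 20, DEPTH_SHIFT = 22, NONQUAD_SHIFT = 26 };

PtexCoord PackPtexCoord(std::int32_t faceIndex, std::uint32_t u, std::uint32_t v,
                        std::uint32_t rotation, std::uint32_t depth, bool nonQuad) {
    assert(u < 1024u && v < 1024u);        // 10 bits each
    assert(rotation < 4u && depth < 16u);  // assumed 2 and 4 bits
    PtexCoord c;
    c.faceIndex = faceIndex;
    c.bitField = (u << U_SHIFT) | (v << V_SHIFT) | (rotation << ROT_SHIFT) |
                 (depth << DEPTH_SHIFT) | ((nonQuad ? 1u : 0u) << NONQUAD_SHIFT);
    return c;
}

std::uint32_t UnpackU(PtexCoord const &c) { return (c.bitField >> U_SHIFT) & 0x3ffu; }
std::uint32_t UnpackV(PtexCoord const &c) { return (c.bitField >> V_SHIFT) & 0x3ffu; }
std::uint32_t UnpackRotation(PtexCoord const &c) { return (c.bitField >> ROT_SHIFT) & 0x3u; }
std::uint32_t UnpackDepth(PtexCoord const &c) { return (c.bitField >> DEPTH_SHIFT) & 0xfu; }
bool UnpackNonQuad(PtexCoord const &c) { return ((c.bitField >> NONQUAD_SHIFT) & 0x1u) != 0u; }
```

Ten bits give coordinates in the range 0 to 1023, which is the margin the text refers to.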
- Replace OsdVertexBufferDescriptor with something more adequate for general
primvar representation (this name will probably eventually change...)
- Improve OsdPatchDescriptor
- add a "loop" boolean (true if the patch is of loop type)
- add a GetPatchSize() accessor
- OsdPatchArray :
- remove some redundant elements (still more to do there)
- Fix all shader / examples / regressions & stuff to make this all work.
fixes #143
2 client APIs are changed.
- VertexBuffer::UpdateData() takes start vertex offset
- ComputeController::Refine() takes FarKernelBatchVector
Also, ComputeContext no longer holds farmesh.
Client can free farmesh after OsdComputeContext is created.
(but still need FarKernelBatchVector to apply subdivision kernels)
Now a ComputeController is passed as an
argument to OsdMesh::Create(). This is
a better match to the underlying object
model and can be much more efficient for
compute controllers that have expensive
resources, e.g. compiled shader kernels.
Fixes #103
- add bool OsdGLDrawContext::SupportsAdaptiveTessellation() method
- modify glViewer to use that instead of #ifdefs
Note : this is not the final word on this as OSD really needs a more comprehensive
system to provide run-time information about available features to the client code.
fixes #111
Model the GL VB after the D3D11 one, where there are no data read-backs; however, this means
an extra memory copy of the buffer. For 4th-level uniform subdiv on Car, glGetBufferSubData
was taking 50% of CPU time before (actual subdiv 22%); now that is gone. Full CPU Draw went from
62ms to 54ms (it looks like most of the overhead now is just waiting on GL queries).
In example code, GLUT has been replaced with GLFW so that glViewer/ptexViewer can run on OSX (10.7 or later).
OSX note: there are still some problems with clang; you may need to explicitly specify gcc on the cmake command line:
-DCMAKE_CXX_COMPILER=/usr/bin/g++
fixes #98
- remove the GL error check in cudaGLVertexBuffer :
* unrelated GL errors left on the stack were triggering erroneous
vertexBuffer allocation errors
* we should not be checking for GL errors here anyway (as most other
buffer allocations aren't checked either)
- add some pointer checking in the GL / D3D drawContexts in case the
vertexBuffer pointers passed are NULL
- add some additional typedefs in OsdError to report some of the new
CUDA / GL related errors
This avoids adaptive tessellation artifacts near silhouette edges
by using the projected diameter of an edge's bounding sphere
rather than the length of the projected edge itself.
There is a nice writeup of this by Bryan Dudash of NVIDIA at:
https://developer.nvidia.com/content/dynamic-hardware-tessellation-basics
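A minimal sketch of the bounding-sphere metric described above, assuming view-space edge endpoints with the camera looking down -Z and a standard perspective projection. `Vec3`, `ProjectedSphereDiameter`, `TessLevelFromDiameter` and the `pixelsPerSegment` parameter are hypothetical names; the actual implementation lives in shader code and may differ in detail.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Projected diameter (in normalized device coordinates) of the sphere
// whose diameter is the edge p0-p1. projScaleY is assumed to be the
// [1][1] element of a standard perspective projection matrix.
float ProjectedSphereDiameter(Vec3 const &p0, Vec3 const &p1, float projScaleY) {
    float const dx = p1.x - p0.x;
    float const dy = p1.y - p0.y;
    float const dz = p1.z - p0.z;
    float const diameter = std::sqrt(dx * dx + dy * dy + dz * dz);
    float const depth = -0.5f * (p0.z + p1.z);  // distance of the sphere center
    assert(depth > 0.0f);
    return diameter * projScaleY / depth;
}

// Converts the projected diameter to a tessellation level, clamped to
// the range [1, maxLevel].
float TessLevelFromDiameter(float ndcDiameter, float viewportHeight,
                            float pixelsPerSegment, float maxLevel) {
    assert(pixelsPerSegment > 0.0f && maxLevel >= 1.0f);
    float const pixels = 0.5f * ndcDiameter * viewportHeight;  // NDC spans 2 units
    return std::min(maxLevel, std::max(1.0f, pixels / pixelsPerSegment));
}
```

An edge pointing straight at the camera projects to nearly zero length, yet its bounding sphere projects to the same diameter as an edge of equal length seen side-on at the same depth, which is why this metric behaves better near silhouettes.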
- [Feature Adaptive GPU Rendering of Catmull-Clark Surfaces](http://research.microsoft.com/en-us/um/people/cloop/tog2012.pdf).
- New API architecture : we are planning to lock on to this new framework as the basis for backward compatibility, which we will enforce from Release 1.0 onward. Subsequent releases of OpenSubdiv should not break client code.
- DirectX 11 support
- and much more...
- All data representation classes are now single-templated for a vertex class 'U'
- All constructors / instancing code has been moved into "Factory" functions that are dual-templated
for two vertex classes <class T, class U=T>. This allows hbr specialization with a placeholder
vertex class 'T' for faster analysis without paying interpolation costs, while far can still specialize
a fully implemented vertex class 'U' with full subdivision functionality.
- Some preliminary clean-up work on FarVertexEditTables with the addition of a FarVertexEdit class
as a replacement for the former HbrVertexEdit which was introducing back dependencies on hbr. The
implementation is very lightweight. Some slight renaming / cleanup of the code, with some more to
be done.
- there are no more dependencies on hbr (not even #include) from far's data structure !
Notes :
- the FarDispatcher mechanism has become somewhat awkward and should be re-evaluated when refactoring osd.
- the "Factory" pattern survives this round of refactoring until we can find something better.
Closes #34
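A minimal sketch of the single-templated data class with a dual-templated factory described above. `PlaceholderVertex`, `FullVertex`, `Mesh` and `MeshFactory` are hypothetical names for illustration; they only show how `<class T, class U=T>` lets the analysis side use a cheap vertex type while the produced data uses a full one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct PlaceholderVertex {};          // no data, so analysis pays no interpolation cost
struct FullVertex { float pos[3]; };  // fully implemented vertex used for the result

// Data representation: templated on a single vertex class U.
template <class U>
struct Mesh {
    std::vector<U> vertices;
};

// Factory: templated on the analysis vertex T and the result vertex U.
template <class T, class U = T>
class MeshFactory {
public:
    explicit MeshFactory(std::size_t numCoarseVertices)
        : _analysis(numCoarseVertices) {
        assert(numCoarseVertices > 0);
    }

    // Builds the result from topology gathered with T, storing vertices of type U.
    Mesh<U> Create() const {
        Mesh<U> mesh;
        mesh.vertices.resize(_analysis.size());
        return mesh;
    }

private:
    std::vector<T> _analysis;  // stand-in for the hbr-side representation
};
```

`MeshFactory<FullVertex>` falls back to using the same class on both sides, matching the `U=T` default.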
mutex class with Lock / Unlock public functions.
- remove Mutex implementation from Hbr (and revert to original PRman code)
- provide a Mutex class stub in osd
- add some forward declarations in OsdMesh to limit some of the mutex spills
- #include <osd/mutex.h> where needed (little hackish until we can refactor
some of far better)
- remove ILM_BASE from some CMakeLists
Closes #48
where it can cause havoc downstream, and move vertexBuffers into the cpp
file to avoid gl.h inclusion and to fix dynamic cast issues. These were
found during Presto integration.
- modify shape_utils to return a vector of coarse vertices when creating an hbr mesh
- minor cleanup of osd mesh and the addition of a vector parameter in the creator to
save the remapping between the hbr mesh progenitor and the current serialized osd mesh.
- minor fallout modifications to the glutViewer & far regression code
Notes :
- the dual template of far is causing a lot of complications
-> suggest finding a way to isolate the T template to the factory code.
-> far needs a concept of a vector of vertex & varying data (to abstract the vertex buffer
away from osd)
-> the dispatcher mechanism is awkward and needs refactoring
-> suggest moving the default CPU kernels away from the subdivision tables
-> suggest finding a way to completely untemplate the tables (we might need a templated
factory function though)
-> osd should be able to call delete on the far mesh to get rid of all the CPU-bound data
once the GPU data has been laid-out.
Closes #18.
kernel, call OsdKernelDispatcher::Factory::Register() and keep the integer
result value as kernel handle.
Attempted to eliminate the registration function from client code, but this is currently
disabled (in kernelDispatcher.cpp) because the Maya plugin doesn't work with the cuda
kernel.
glutViewer creates its kernel menu dynamically according to the linked kernels.
Fix a bug that crashed the Maya plugin.
Closes #14
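A minimal sketch of a registration scheme that hands back an integer handle, in the spirit of the `Register()` call mentioned above. `KernelRegistry`, `Creator`, `MakeCpuKernel` and `MakeCudaKernel` are hypothetical names, and the creator here returns a plain name instead of a dispatcher object; the actual OsdKernelDispatcher::Factory interface is not reproduced.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical registry: each registered creator gets an integer handle.
class KernelRegistry {
public:
    typedef std::string (*Creator)();  // stands in for "create a dispatcher"

    // Stores the creator and returns its handle (its index in the list).
    int Register(Creator creator) {
        assert(creator != 0);
        _creators.push_back(creator);
        return static_cast<int>(_creators.size()) - 1;
    }

    // Invokes the creator associated with a previously returned handle.
    std::string Create(int handle) const {
        assert(handle >= 0 && static_cast<std::size_t>(handle) < _creators.size());
        return _creators[static_cast<std::size_t>(handle)]();
    }

    // Number of registered kernels, e.g. for building a menu dynamically.
    int Count() const { return static_cast<int>(_creators.size()); }

private:
    std::vector<Creator> _creators;
};

std::string MakeCpuKernel() { return "cpu"; }
std::string MakeCudaKernel() { return "cuda"; }
```

Client code keeps the returned handle and passes it back when it needs that kernel, so only the kernels actually linked into the binary ever show up.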
- use find_package(OpenMP) to test that the compiler supports OMP
(looks like the "express" versions of MSVC do not)
- if not available, make sure that osd does not register those
compute kernels (but does register the CPU standalone ones)
- similar refinements on other dependencies (Maya, CUDA) where
the build "opts in" depending on which libs are found.
some CMakeLists still need more cleanup...
Closes #9
specification (how many elements exist in the buffer).
The client creates an OsdVertexBuffer and provides it as an argument to the
OsdMesh::Subdivide() function. This should be more flexible and hopefully matches
various use cases.
Since each dispatcher has to accept an arbitrary vertex buffer, a simple
shader registry was introduced into glslDispatcher. It configures shaders for the given
vertex elements on demand (for now, this works only for the varying buffer).
Fixed the cuda kernel's GL resource leak. Since cuda GL interop seems one-way,
OsdCudaVertexBuffer manages vertex updates instead of just using
OsdGpuVertexBuffer.
Cleaned up some kernel code and renamed ambiguous names.