While this may be worth revisiting, we should first quantify the benefits and
identify the compilers that support it. Ultimately, we may never use pragma
once in favor of strictly using standard C++.
In OpenSubdiv 2.x, we encapsulated subdivision tables into
compute context in osd layer since those tables are order-dependent
and have to be applied in a certain manner. In 3.0, we adopted stencil
table based refinement. It's more simple and such an encapsulation is
no longer needed. Also 2.0 API has several ownership issues of GPU
kernel caching, and forces unnecessary instantiation of controllers
even though the cpu kernels typically don't need instances unlike GPU ones.
This change completely revisit osd client facing APIs. All contexts and
controllers were replaced with device-specific tables and evaluators.
While we can still use consistent API across various device backends,
unnecessary complexities have been removed. For example, cpu evaluator
is just a set of static functions and also there's no need to replicate
FarStencilTables to ComputeContext.
Also the new API delegates the ownership of compiled GPU kernels
to clients, for the better management of resources especially in multiple
GPU environment.
In addition to integrating ComputeController and EvalStencilController into
a single function Evaluator::EvalStencils(), EvalLimit API is also added
into Evaluator. This is working but still in progress, and we'll make a followup
change for the complete implementation.
-some naming convention changes:
GLSLTransformFeedback to GLXFBEvaluator
GLSLCompute to GLComputeEvaluator
-move LimitLocation struct into examples/glEvalLimit.
We're still discussing patch evaluation interface. Basically we'd like
to tease all ptex-specific parametrization out of far/osd layer.
TODO:
-implments EvalPatches() in the right way
-derivative evaluation API is still interim.
-VertexBufferDescriptor needs a better API to advance its location
-synchronization mechanism is not ideal (too global).
-OsdMesh class is hacky. need to fix it.
Changing all device kernels to take two buffer identifiers for
source and destination separately.
This change is an intermediate step toward upcoming context/controller
refactoring.
Previously we have a limitation that the source and destination
vertex buffer has to be a single buffer, since the subdivision
kernels are iteratively applied by level.
With stencil tables, we don't have such a limitation any more,
so we may want to apply stencils from seprate source buffer to
another.
To specifiy the output location within the destination buffer,
we can use VertexBufferDescriptor.offset. This allows us not only
configuring arbitrary batching scheme, but also relaxing the
limitation that source and destination buffers are in same
interleaved layout. For examples, we could include derivatives only
in the destination buffer, which doesn't need to be allocated in
the source buffer.
Sync'ing the 'dev' branch with the 'feature_3.0dev' branch at commit 68c6d11fc36761ae1a5e6cdc3457be16f2e9704a
The branch 'feature_3.0dev' is now locked and preserved for historical purposes.
* The CATMARK_QUAD_FACE_VERTEX kernel calculates the face-vertex for a quadrilateral face. It applies to every face after the first subdivision step, and may be applied for the first subdivision step of a quadrilateral coarse mesh.
* The CATMARK_TRI_QUAD_FACE_VERTEX kernel calculates the face-vertex for a triangle or quadrilateral face. It may be applied for the first subdivision step of a coarse mesh composed of triangles and/or quadrilaterals.
* Both kernels calculate each face-vertex using four vertex indices (triangles are specified by repeating the third index). Therefore neither kernel uses the F_ITa codex table, and instead the first vertex offset in the F_IT index table is stored in the FarKernelBatch's table offset.
All kernels take offset/length/stride to apply subdivision partially in each vertex elements.
Also the offset can be used for client-based VBO aggregation, without modifying index buffers.
This is useful for topology sharing, in conjunction with glDrawElementsBaseVertex etc.
However, gregory patch shader fetches vertex buffer via texture buffer, which index should also
be offsetted too. Although gl_BaseVertexARB extension should be able to do that job, it's a
relatively new extension. So we use OsdBaseVertex() call to mitigate the compatibility
issue as clients can provide it in their way at least for the time being.
New text:
Copyright 2013 Pixar
Licensed under the Apache License, Version 2.0 (the "Apache License")
with the following modification; you may not use this file except in
compliance with the Apache License and the following modification to it:
Section 6. Trademarks. is deleted and replaced with:
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor
and its affiliates, except as required to comply with Section 4(c) of
the License and to reproduce the content of the NOTICE file.
You may obtain a copy of the Apache License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the Apache License with the above modification is
distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the Apache License for the specific
language governing permissions and limitations under the Apache License.
2 client APIs are changed.
- VertexBuffer::UpdateData() takes start vertex offset
- ComputeController::Refine() takes FarKernelBatchVector
Also, ComputeContext no longer holds farmesh.
Client can free farmesh after OsdComputeContext is created.
(but still need FarKernelBatchVector to apply subdivision kernels)
- [Feature Adaptive GPU Rendering of Catmull-Clark Surfaces](http://research.microsoft.com/en-us/um/people/cloop/tog2012.pdf).
- New API architecture : we are planning to lock on to this new framework as the basis for backward compatibility, which we will enforce from Release 1.0 onward. Subsequent releases of OpenSubdiv should not break client code.
- DirectX 11 support
- and much more...
- All data representation classes are now single-templated for a vertex class 'U'
- All constructors / instancing code has been moved into "Factory" functions that are dual-templated
for two vertex classes <class T, class U=T>. This allows hbr specialization with a placeholder
vertex flass 'T' for faster analysis without paying interpolation costs, while far can still specialize
a fully implemented vertex class 'U' with full subdivision functionality.
- Some preliminary clean-up work on FarVertexEditTables with the addition of a FarVertexEdit class
as a replacement for the former HbrVertedEdit which was introducing back dependencies on hbr. The
implementation is very lightweight. Some slight renaming / cleanup of the code, with some more to
be done.
- there are no more dependencies on hbr (not even #include) from far's data structure !
Notes :
- the FarDispatcher mechanism has become somewhat awkward and should be re-evaluated when refactoring osd.
- the "Factory" pattern survives this round of refactoring until we can find something better.
Closes#34
specification (how many elements exists in the buffer).
client will create OsdVertexBuffer and provide it as an argument of
OsdMesh::Subdivide() function. It would be more flexible and hopefully matches
various use cases.
Since each dispatcher has to accept arbitrary vertex buffer, introduced a simple
shader registry into glslDispatcher. It will configure shaders for given vertex
elements on demand (for now, just works only for varying buffer).
Fixed cuda kernel's GL resource leakage. Since cuda GL interop seems one-way,
OsdCudaVertexBuffer manages vertex updating instead of just using
OsdGpuVertexBuffer.
Cleaned up some kernel codes and renamed ambiguous names.