- Limits the maximum number of threads to 64, since btThreadSupportPosix
and btThreadSupportWin32 don't support more than 64 threads at this
moment, due to their use of UINT64 bitmasks. This could be fixed by using
std::bitset or some other alternative.
- Introduces a threadpool class, b3ThreadPool, which is a simple wrapper
around btThreadSupportInterface and uses this instead of the global task
scheduler for parallel raycasting. This is actually quite a bit faster
than the task scheduler (~10-15% in my tests for parallel raycasts),
since the advanced features (parallelFor) are not necessary for the
parallel raycasts.
- Wraps the 16*1024 value of MAX_RAY_INTERSECTION_MAX_SIZE_STREAMING in
parentheses, since the unparenthesized expansion otherwise binds to
adjacent operators of equal precedence. Also introduces a smaller
constant for Apple targets.
- Refactors the parallel raycasts code and adds some more profiling.
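
The parenthesization issue can be illustrated with a minimal sketch. The macro and function names below are hypothetical and chosen only for this example; only the 16*1024 value comes from the notes above.

```cpp
#include <cassert>

// Hypothetical macro names for illustration only.
#define STREAM_SIZE_UNPARENTHESIZED 16 * 1024
#define STREAM_SIZE_PARENTHESIZED (16 * 1024)

// Number of full batches in `total` rays, using the unparenthesized macro.
inline int batchesUnparenthesized(int total)
{
    assert(total >= 0);
    // Expands to total / 16 * 1024, which parses as (total / 16) * 1024.
    return total / STREAM_SIZE_UNPARENTHESIZED;
}

// Same computation using the parenthesized macro.
inline int batchesParenthesized(int total)
{
    assert(total >= 0);
    // Expands to total / (16 * 1024), which is the intended division.
    return total / STREAM_SIZE_PARENTHESIZED;
}
```

With 32768 rays, the parenthesized version yields 2 batches, while the unparenthesized one evaluates (32768 / 16) * 1024.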
Use the b3RaycastBatchAddRays API to submit up to MAX_RAY_INTERSECTION_BATCH_SIZE_STREAMING rays.
The old API (b3RaycastBatchAddRay) stays limited to 256 rays, MAX_RAY_INTERSECTION_BATCH_SIZE.
Fix an issue in TaskScheduler/btTaskScheduler.cpp: add JobQueue::exit and call it first, since it uses m_threadSupport, which was deleted before the destructor was called.
Use a hashmap to store user timers, to avoid allocating many identical strings.
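
A minimal sketch of the idea, assuming the timers are keyed by name: each distinct name is inserted into a hash map once and later lookups reuse the existing entry instead of allocating a new string. The class UserTimerRegistry and its members are hypothetical and are not taken from the Bullet sources.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical registry: maps a timer name to a small integer id.
// A name is copied into the map only the first time it is seen.
class UserTimerRegistry
{
public:
    // Returns the id for `name`, creating a new entry on first use.
    int idFor(const std::string& name)
    {
        auto it = m_ids.find(name);
        if (it != m_ids.end())
            return it->second;  // already known: no new string is stored
        int id = static_cast<int>(m_ids.size());
        m_ids.emplace(name, id);
        return id;
    }

    // Number of distinct timer names seen so far.
    std::size_t size() const { return m_ids.size(); }

private:
    std::unordered_map<std::string, int> m_ids;
};
```

Repeated calls with the same name return the same id, so the number of stored strings grows with the number of distinct timers rather than with the number of timing events.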
Reduce 'm_cooldownTime' from 1000 microseconds to 100 microseconds (the overhead in raycasts was too large).
If needed, we can expose this cooldown time.
Replace malloc with btAlignedObjectArray (going through Bullet's memory allocator).
Add a method isAlive(self), which by default returns self._alive < 0; each environment can override this method (Half Cheetah would implement return False).
(In response to bea468fb93)
As suggested in https://github.com/bulletphysics/bullet3/pull/1759. The default isDone sets done = alive<0, and a special case is made for halfcheetah, forcing done=False.
I had to pass the 'alive' condition as an additional parameter of WalkerBaseBulletEnv.
To enable the feature, enable the BULLET2_MULTITHREADING option.
Increases the number of rays that can go in a batch request by storing
them in the shared memory stream instead of the shared memory command.
Adds the API b3RaycastBatchSetNumThreads to specify the number of
threads to use for the raycast batch. Also adds the argument numThreads
to the pybullet function rayTestBatch.
Rays are distributed among the threads in a greedy fashion: there's a shared
queue of work, and once a thread finishes its task, it picks the next
available ray from the queue. This works better than pre-distributing the
rays among threads, since there's a large variance in computation time per ray.
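
The greedy distribution can be sketched as follows. This is not the code from this change: it uses std::thread and a shared atomic index as a stand-in for the shared queue, and the names runGreedy and countVisits are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Calls work(i) exactly once for every i in [0, numItems), using numThreads
// workers. Each worker claims the next unclaimed index from a shared atomic
// counter, so a thread that finishes early simply takes more items.
template <typename Work>
void runGreedy(std::size_t numItems, unsigned numThreads, Work work)
{
    assert(numThreads > 0);
    std::atomic<std::size_t> next(0);
    auto worker = [&]() {
        for (;;)
        {
            std::size_t i = next.fetch_add(1);  // claim one item
            if (i >= numItems)
                break;  // nothing left to claim
            work(i);
        }
    };
    std::vector<std::thread> threads;
    for (unsigned t = 0; t < numThreads; ++t)
        threads.emplace_back(worker);
    for (auto& th : threads)
        th.join();
}

// Demo helper: records how many times each item was processed.
inline std::vector<int> countVisits(std::size_t numItems, unsigned numThreads)
{
    std::vector<int> visits(numItems, 0);
    // Each index is claimed by exactly one thread, so writes do not overlap.
    runGreedy(numItems, numThreads, [&](std::size_t i) { visits[i] += 1; });
    return visits;
}
```

Compared with splitting the rays into numThreads fixed chunks up front, no thread sits idle while another works through a chunk of expensive rays. The cost is one atomic increment per ray.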
Some controversial changes:
- Added a pointer to PhysicsClient to the SharedMemoryCommand struct. This
was necessary to keep the C-API the same for b3RaycastBatchAddRay, while
adding the ray to the shared memory stream instead of the command
struct. I think this may also be useful to simplify other APIs that
take both a client handle and a command handle.
- Moved #define SHARED_MEMORY_MAX_STREAM_CHUNK_SIZE to
SharedMemoryPublic. This was necessary for the definition of
MAX_RAY_INTERSECTION_BATCH_SIZE.