MSL: Add support for subgroup operations.
Some support for subgroups is present starting in Metal 2.0 on both iOS
and macOS. macOS gains more complete support in 10.14 (Metal 2.1).
Some restrictions are present. On iOS and on macOS 10.13, the
implementation of `OpGroupNonUniformElect` is incorrect: if thread 0 has
already terminated or is not executing a conditional branch, the first
thread that *is* will falsely believe itself not to be. Unfortunately,
this operation is part of the "basic" feature set; without it, subgroups
cannot be supported at all.
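For reference, the behaviour `OpGroupNonUniformElect` is meant to have can be written in terms of a ballot: the elected invocation is the lowest-numbered one that is currently active. The following is a minimal GLSL sketch of that definition, not the code SPIRV-Cross emits. The helper `electLowestActive`, the `Result` buffer, and the workgroup size are hypothetical, and the sketch relies on `GL_KHR_shader_subgroup_ballot`, which goes beyond the "basic" feature set.

```glsl
#version 450
#extension GL_KHR_shader_subgroup_basic : require
#extension GL_KHR_shader_subgroup_ballot : require

layout(local_size_x = 64) in; // hypothetical workgroup size

layout(std430, binding = 0) buffer Result // hypothetical output buffer
{
    uint elected_count;
};

// Hypothetical helper: true only for the lowest-numbered active invocation.
bool electLowestActive()
{
    // One bit is set for every invocation that reaches this call.
    uvec4 active_mask = subgroupBallot(true);
    return gl_SubgroupInvocationID == subgroupBallotFindLSB(active_mask);
}

void main()
{
    // Only odd invocations take this branch, so invocation 0 is inactive here.
    if ((gl_SubgroupInvocationID & 1u) != 0u)
    {
        // Exactly one invocation per subgroup should pass this test.
        if (electLowestActive())
            atomicAdd(elected_count, 1u);
    }
}
```

With a correct elect, invocation 1 would be chosen inside the branch, which is the case the restriction above describes as failing.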
The `SubgroupSize` and `SubgroupLocalInvocationId` builtins are only
available in compute shaders (and, by extension, tessellation control
shaders), despite SPIR-V making them available in all stages. This
limits the usefulness of some of the subgroup operations in fragment
shaders.
Although Metal on macOS supports some clustered, inclusive, and
exclusive operations, it does not support them all. In particular,
inclusive and exclusive min, max, and, or, and xor are not supported, nor
are cluster sizes other than 4. If this becomes a problem, they
could be emulated, but at a significant performance cost due to the need
for non-uniform operations.
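As an illustration of what such emulation might look like, an inclusive min scan can be built from relative shuffles in a Hillis-Steele style loop. This is a sketch only, not the approach taken by SPIRV-Cross. The function `inclusiveMinEmulated`, the `Data` buffer, and the workgroup size are hypothetical, and the sketch assumes that every invocation in the subgroup is active and that the buffer holds one value per invocation.

```glsl
#version 450
#extension GL_KHR_shader_subgroup_basic : require
#extension GL_KHR_shader_subgroup_shuffle_relative : require

layout(local_size_x = 64) in; // hypothetical workgroup size

layout(std430, binding = 0) buffer Data // hypothetical in/out buffer
{
    float values[];
};

// Hypothetical helper: inclusive min scan across the subgroup.
// Assumes all invocations in the subgroup are active.
float inclusiveMinEmulated(float v)
{
    // Double the reach each step: 1, 2, 4, ... below the subgroup size.
    for (uint offset = 1u; offset < gl_SubgroupSize; offset <<= 1u)
    {
        // Value held by the invocation `offset` slots below this one.
        float from_below = subgroupShuffleUp(v, offset);

        // The shuffle result is undefined when no such invocation exists,
        // so only invocations with a valid source fold it in.
        if (gl_SubgroupInvocationID >= offset)
            v = min(v, from_below);
    }
    return v;
}

void main()
{
    uint i = gl_GlobalInvocationID.x;
    values[i] = inclusiveMinEmulated(values[i]);
}
```

Each of the roughly log2(subgroup size) steps needs a shuffle plus a per-invocation branch, which gives a sense of the cost mentioned above.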
2019-05-15 21:03:30 +00:00
#version 450
#extension GL_KHR_shader_subgroup_basic : require
#extension GL_KHR_shader_subgroup_ballot : require
#extension GL_KHR_shader_subgroup_vote : require
#extension GL_KHR_shader_subgroup_shuffle : require
#extension GL_KHR_shader_subgroup_shuffle_relative : require
#extension GL_KHR_shader_subgroup_arithmetic : require
#extension GL_KHR_shader_subgroup_clustered : require
#extension GL_KHR_shader_subgroup_quad : require
layout(local_size_x = 1) in;

layout(std430, binding = 0) buffer SSBO
{
    float FragColor;
};

void main()
{
    // basic
    FragColor = float(gl_NumSubgroups);
    FragColor = float(gl_SubgroupID);
    FragColor = float(gl_SubgroupSize);
    FragColor = float(gl_SubgroupInvocationID);
    subgroupBarrier();
    subgroupMemoryBarrier();
    subgroupMemoryBarrierBuffer();
    subgroupMemoryBarrierShared();
    subgroupMemoryBarrierImage();
    bool elected = subgroupElect();

    // ballot
    FragColor = float(gl_SubgroupEqMask);
    FragColor = float(gl_SubgroupGeMask);
    FragColor = float(gl_SubgroupGtMask);
    FragColor = float(gl_SubgroupLeMask);
    FragColor = float(gl_SubgroupLtMask);
    vec4 broadcasted = subgroupBroadcast(vec4(10.0), 8u);
    bvec2 broadcasted_bool = subgroupBroadcast(bvec2(true), 8u);
    vec3 first = subgroupBroadcastFirst(vec3(20.0));
    bvec4 first_bool = subgroupBroadcastFirst(bvec4(false));
    uvec4 ballot_value = subgroupBallot(true);
    bool inverse_ballot_value = subgroupInverseBallot(ballot_value);
    bool bit_extracted = subgroupBallotBitExtract(uvec4(10u), 8u);
    uint bit_count = subgroupBallotBitCount(ballot_value);
    uint inclusive_bit_count = subgroupBallotInclusiveBitCount(ballot_value);
    uint exclusive_bit_count = subgroupBallotExclusiveBitCount(ballot_value);
    uint lsb = subgroupBallotFindLSB(ballot_value);
    uint msb = subgroupBallotFindMSB(ballot_value);

    // shuffle
    uint shuffled = subgroupShuffle(10u, 8u);
    bool shuffled_bool = subgroupShuffle(true, 9u);
    uint shuffled_xor = subgroupShuffleXor(30u, 8u);
    bool shuffled_xor_bool = subgroupShuffleXor(false, 9u);

    // shuffle relative
    uint shuffled_up = subgroupShuffleUp(20u, 4u);
    bool shuffled_up_bool = subgroupShuffleUp(true, 4u);
    uint shuffled_down = subgroupShuffleDown(20u, 4u);
    bool shuffled_down_bool = subgroupShuffleDown(false, 4u);

    // vote
    bool has_all = subgroupAll(true);
    bool has_any = subgroupAny(true);
    bool has_equal = subgroupAllEqual(0);
    has_equal = subgroupAllEqual(true);
    has_equal = subgroupAllEqual(vec3(0.0, 1.0, 2.0));
    has_equal = subgroupAllEqual(bvec4(true, true, false, true));

    // arithmetic
    vec4 added = subgroupAdd(vec4(20.0));
    ivec4 iadded = subgroupAdd(ivec4(20));
    vec4 multiplied = subgroupMul(vec4(20.0));
    ivec4 imultiplied = subgroupMul(ivec4(20));
    vec4 lo = subgroupMin(vec4(20.0));
    vec4 hi = subgroupMax(vec4(20.0));
    ivec4 slo = subgroupMin(ivec4(20));
    ivec4 shi = subgroupMax(ivec4(20));
    uvec4 ulo = subgroupMin(uvec4(20));
    uvec4 uhi = subgroupMax(uvec4(20));
    uvec4 anded = subgroupAnd(ballot_value);
    uvec4 ored = subgroupOr(ballot_value);
    uvec4 xored = subgroupXor(ballot_value);

    added = subgroupInclusiveAdd(added);
    iadded = subgroupInclusiveAdd(iadded);
    multiplied = subgroupInclusiveMul(multiplied);
    imultiplied = subgroupInclusiveMul(imultiplied);
    //lo = subgroupInclusiveMin(lo); // FIXME: Unsupported by Metal
    //hi = subgroupInclusiveMax(hi);
    //slo = subgroupInclusiveMin(slo);
    //shi = subgroupInclusiveMax(shi);
    //ulo = subgroupInclusiveMin(ulo);
    //uhi = subgroupInclusiveMax(uhi);
    //anded = subgroupInclusiveAnd(anded);
    //ored = subgroupInclusiveOr(ored);
    //xored = subgroupInclusiveXor(ored);
    //added = subgroupExclusiveAdd(lo);

    added = subgroupExclusiveAdd(multiplied);
    multiplied = subgroupExclusiveMul(multiplied);
    iadded = subgroupExclusiveAdd(imultiplied);
    imultiplied = subgroupExclusiveMul(imultiplied);
    //lo = subgroupExclusiveMin(lo); // FIXME: Unsupported by Metal
    //hi = subgroupExclusiveMax(hi);
    //ulo = subgroupExclusiveMin(ulo);
    //uhi = subgroupExclusiveMax(uhi);
    //slo = subgroupExclusiveMin(slo);
    //shi = subgroupExclusiveMax(shi);
    //anded = subgroupExclusiveAnd(anded);
    //ored = subgroupExclusiveOr(ored);
    //xored = subgroupExclusiveXor(ored);

    // clustered
    added = subgroupClusteredAdd(added, 4u);
    multiplied = subgroupClusteredMul(multiplied, 4u);
    iadded = subgroupClusteredAdd(iadded, 4u);
    imultiplied = subgroupClusteredMul(imultiplied, 4u);
    lo = subgroupClusteredMin(lo, 4u);
    hi = subgroupClusteredMax(hi, 4u);
    ulo = subgroupClusteredMin(ulo, 4u);
    uhi = subgroupClusteredMax(uhi, 4u);
    slo = subgroupClusteredMin(slo, 4u);
    shi = subgroupClusteredMax(shi, 4u);
    anded = subgroupClusteredAnd(anded, 4u);
    ored = subgroupClusteredOr(ored, 4u);
    xored = subgroupClusteredXor(xored, 4u);

    // quad
    vec4 swap_horiz = subgroupQuadSwapHorizontal(vec4(20.0));
    bvec4 swap_horiz_bool = subgroupQuadSwapHorizontal(bvec4(true));
    vec4 swap_vertical = subgroupQuadSwapVertical(vec4(20.0));
    bvec4 swap_vertical_bool = subgroupQuadSwapVertical(bvec4(true));
    vec4 swap_diagonal = subgroupQuadSwapDiagonal(vec4(20.0));
    bvec4 swap_diagonal_bool = subgroupQuadSwapDiagonal(bvec4(true));
    vec4 quad_broadcast = subgroupQuadBroadcast(vec4(20.0), 3u);
    bvec4 quad_broadcast_bool = subgroupQuadBroadcast(bvec4(true), 3u);
}