How to “allocate” memory atomically in a GLSL SSBO?
up vote
0
down vote
favorite
I am building a tree structure inside of a buffer in glsl.
Each node in the tree has 8 integers which determine the location of the 8 children of the node in the array.
So for example:
tree[0].children = [1,5,3,9,100000,7,100,10]
tree[1].children = [2,3,4,900,8,11,20,13]
Are potential possible outcomes.
The way a single thread "allocates" memory is. The buffer has a global index, initialized to 0.
The current thread checks whether the one of the children of the current node is 0, if it is, it increases the index by 1 and sets the value of the child to the previous value of index.
In other words:
if(tree[node].children[current_child] == 0) {
tree[node].children[current_child] = tree_index++;
}
This quite clearly creates some awful race conditions, as the value of the child and the index may be changed by other threads while the above is executing.
My solution to this problem was to create a busy loop mutex to lock all threads attempting to modify the child node when one thread is allocating it. It looks like this:
int mrun = 100;
while ((tree[0].children[child] <= 0) && (mrun > 0))
{
mrun--;
//If the node wasn't allocated, allocate its memory (which also releases the lock)
if( (atomicCompSwap( tree[0].children[child] , 0 , -1) == 0 ))
{
tree[0].children[child] = int(atomicAdd(t_index, 1));
}
}
This works, but has multiple caveats. First, the while loop may run too much, so I had to limit the number of times it executes, which causes the loss of some values (there are multiple repeated values so this is mostly fine). It also forces me to waste computation cycles waiting and checking. And finally, while loop in shaders are not exactly ideal.
I attempted to fix the problem by doing:
atomicCompSwap(tree[0].children[child], 0, atomicAdd(t_index, 1));
However this does not have the desired effect, those operations are not running atomically simultaneously, so I end up doing wrong allocations.
Is there a way I can do this allocation more efficiently? Maybe in multiple passes?
multithreading opengl glsl gpu mutex
add a comment |
up vote
0
down vote
favorite
I am building a tree structure inside of a buffer in glsl.
Each node in the tree has 8 integers which determine the location of the 8 children of the node in the array.
So for example:
tree[0].children = [1,5,3,9,100000,7,100,10]
tree[1].children = [2,3,4,900,8,11,20,13]
Are potential possible outcomes.
The way a single thread "allocates" memory is. The buffer has a global index, initialized to 0.
The current thread checks whether the one of the children of the current node is 0, if it is, it increases the index by 1 and sets the value of the child to the previous value of index.
In other words:
if(tree[node].children[current_child] == 0) {
tree[node].children[current_child] = tree_index++;
}
This quite clearly creates some awful race conditions, as the value of the child and the index may be changed by other threads while the above is executing.
My solution to this problem was to create a busy loop mutex to lock all threads attempting to modify the child node when one thread is allocating it. It looks like this:
int mrun = 100;
while ((tree[0].children[child] <= 0) && (mrun > 0))
{
mrun--;
//If the node wasn't allocated, allocate its memory (which also releases the lock)
if( (atomicCompSwap( tree[0].children[child] , 0 , -1) == 0 ))
{
tree[0].children[child] = int(atomicAdd(t_index, 1));
}
}
This works, but has multiple caveats. First, the while loop may run too much, so I had to limit the number of times it executes, which causes the loss of some values (there are multiple repeated values so this is mostly fine). It also forces me to waste computation cycles waiting and checking. And finally, while loop in shaders are not exactly ideal.
I attempted to fix the problem by doing:
atomicCompSwap(tree[0].children[child], 0, atomicAdd(t_index, 1));
However this does not have the desired effect, those operations are not running atomically simultaneously, so I end up doing wrong allocations.
Is there a way I can do this allocation more efficiently? Maybe in multiple passes?
multithreading opengl glsl gpu mutex
add a comment |
up vote
0
down vote
favorite
up vote
0
down vote
favorite
I am building a tree structure inside of a buffer in glsl.
Each node in the tree has 8 integers which determine the location of the 8 children of the node in the array.
So for example:
tree[0].children = [1,5,3,9,100000,7,100,10]
tree[1].children = [2,3,4,900,8,11,20,13]
Are potential possible outcomes.
The way a single thread "allocates" memory is. The buffer has a global index, initialized to 0.
The current thread checks whether the one of the children of the current node is 0, if it is, it increases the index by 1 and sets the value of the child to the previous value of index.
In other words:
if(tree[node].children[current_child] == 0) {
tree[node].children[current_child] = tree_index++;
}
This quite clearly creates some awful race conditions, as the value of the child and the index may be changed by other threads while the above is executing.
My solution to this problem was to create a busy loop mutex to lock all threads attempting to modify the child node when one thread is allocating it. It looks like this:
int mrun = 100;
while ((tree[0].children[child] <= 0) && (mrun > 0))
{
mrun--;
//If the node wasn't allocated, allocate its memory (which also releases the lock)
if( (atomicCompSwap( tree[0].children[child] , 0 , -1) == 0 ))
{
tree[0].children[child] = int(atomicAdd(t_index, 1));
}
}
This works, but has multiple caveats. First, the while loop may run too much, so I had to limit the number of times it executes, which causes the loss of some values (there are multiple repeated values so this is mostly fine). It also forces me to waste computation cycles waiting and checking. And finally, while loop in shaders are not exactly ideal.
I attempted to fix the problem by doing:
atomicCompSwap(tree[0].children[child], 0, atomicAdd(t_index, 1));
However this does not have the desired effect, those operations are not running atomically simultaneously, so I end up doing wrong allocations.
Is there a way I can do this allocation more efficiently? Maybe in multiple passes?
multithreading opengl glsl gpu mutex
I am building a tree structure inside of a buffer in glsl.
Each node in the tree has 8 integers which determine the location of the 8 children of the node in the array.
So for example:
tree[0].children = [1,5,3,9,100000,7,100,10]
tree[1].children = [2,3,4,900,8,11,20,13]
Are potential possible outcomes.
The way a single thread "allocates" memory is. The buffer has a global index, initialized to 0.
The current thread checks whether the one of the children of the current node is 0, if it is, it increases the index by 1 and sets the value of the child to the previous value of index.
In other words:
if(tree[node].children[current_child] == 0) {
tree[node].children[current_child] = tree_index++;
}
This quite clearly creates some awful race conditions, as the value of the child and the index may be changed by other threads while the above is executing.
My solution to this problem was to create a busy loop mutex to lock all threads attempting to modify the child node when one thread is allocating it. It looks like this:
int mrun = 100;
while ((tree[0].children[child] <= 0) && (mrun > 0))
{
mrun--;
//If the node wasn't allocated, allocate its memory (which also releases the lock)
if( (atomicCompSwap( tree[0].children[child] , 0 , -1) == 0 ))
{
tree[0].children[child] = int(atomicAdd(t_index, 1));
}
}
This works, but has multiple caveats. First, the while loop may run too much, so I had to limit the number of times it executes, which causes the loss of some values (there are multiple repeated values so this is mostly fine). It also forces me to waste computation cycles waiting and checking. And finally, while loop in shaders are not exactly ideal.
I attempted to fix the problem by doing:
atomicCompSwap(tree[0].children[child], 0, atomicAdd(t_index, 1));
However this does not have the desired effect, those operations are not running atomically simultaneously, so I end up doing wrong allocations.
Is there a way I can do this allocation more efficiently? Maybe in multiple passes?
multithreading opengl glsl gpu mutex
multithreading opengl glsl gpu mutex
edited Nov 10 at 19:23
asked Nov 10 at 17:32
Makogan
1,5541728
1,5541728
add a comment |
add a comment |
active
oldest
votes
active
oldest
votes
active
oldest
votes
active
oldest
votes
active
oldest
votes
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53241606%2fhow-to-allocate-memory-atomically-in-a-glsl-ssbo%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown