Using Task.Yield to overcome ThreadPool starvation while implementing producer/consumer pattern

While answering the question "Task.Yield - real usages?", I proposed using Task.Yield to let a pool thread be reused by other tasks, in a pattern like this:



    using System.Threading;
    using System.Threading.Tasks;

    class Worker
    {
        CancellationTokenSource cts;

        void Start()
        {
            cts = new CancellationTokenSource();

            // run the async operation on the thread pool
            var task = Task.Run(() => SomeWork(cts.Token), cts.Token);
            // wait for completion
            // after completion, handle the result / cancellation / errors
        }

        async Task<int> SomeWork(CancellationToken cancellationToken)
        {
            int result = 0;

            bool loopAgain = true;
            while (loopAgain)
            {
                // do something ... meaning substantial work or a micro-batch here, not processing a single byte

                // keep looping until the end condition is met or cancellation is requested
                loopAgain = /* check for loop end && */ !cancellationToken.IsCancellationRequested;
                if (loopAgain)
                {
                    // reschedule the rest of this method on the thread pool and free this thread for other waiting tasks
                    await Task.Yield();
                }
            }
            cancellationToken.ThrowIfCancellationRequested();
            return result;
        }

        void Cancel()
        {
            // request cancellation
            cts.Cancel();
        }
    }


But one user wrote:

"I don't think using Task.Yield to overcome ThreadPool starvation while implementing producer/consumer pattern is a good idea. I suggest you ask a separate question if you want to go into details as to why."

Does anybody know why it is not a good idea?

  • 2
    I have no conclusive idea about the original commenter's motivation, but you should try to avoid having a busy loop waiting for data to arrive; instead, you should use a mechanism that allows you to trigger the processing.
    – Lasse Vågsæther Karlsen
    Nov 12 at 13:33

  • 3
    I'd argue that hot loops are bad with or without adding async to the mix. I'd forgive it a lot more if it was await Task.Delay(50) or something, but it would be even better to use an async activation rather than checking in this way; there is the new "channels" API, for example (nuget.org/packages/System.Threading.Channels), which is designed for async producer/consumer scenarios.
    – Marc Gravell
    Nov 12 at 13:40

  • 1
    @MaximT indeed - it is what I'm using for ordered message queues in SE.Redis: github.com/StackExchange/StackExchange.Redis/blob/master/src/…
    – Marc Gravell
    Nov 12 at 16:20

  • 2
    It is the exact opposite of what the thread pool manager tries to do. It makes an effort to limit the number of active thread-pool threads to the ideal number in order to cut down on context-switching overhead. When you use Task.Yield, you add context-switching overhead. If you have too many thread-pool threads that don't execute code efficiently (blocking too much), then use SetMinThreads().
    – Hans Passant
    Nov 12 at 16:22

  • 1
    Well, of course that's the way it must work. No affordable amount of money is going to buy you a machine with a thousand processor cores. You can't slam the thread pool with that many jobs and expect instant magic. These are important details that belong in the question, by the way.
    – Hans Passant
    Nov 13 at 13:44

c# multithreading async-await task-parallel-library threadpool

edited Nov 13 at 7:58
asked Nov 12 at 13:30 by Maxim T
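
The "async activation" suggested in the comments can be sketched with System.Threading.Channels. This is a minimal sketch, assuming .NET Core 3.0 or later (where the Channels library ships with the framework); the names ConsumeAsync and ProduceAsync, the capacity of 100 and the int payload are hypothetical stand-ins for real messages. The consumer awaits WaitToReadAsync, so no thread spins or yields while the queue is empty, and it drains whatever is buffered as a micro-batch.

```csharp
using System;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// Bounded channel: when it is full, producers wait asynchronously (back-pressure).
var channel = Channel.CreateBounded<int>(new BoundedChannelOptions(100)
{
    SingleReader = true,
    FullMode = BoundedChannelFullMode.Wait
});

// Hypothetical consumer: wakes up only when data is available, no polling.
async Task<int> ConsumeAsync(ChannelReader<int> reader, CancellationToken token)
{
    int processed = 0;
    // WaitToReadAsync completes when an item arrives, or returns false once the
    // writer has completed and the channel is drained.
    while (await reader.WaitToReadAsync(token))
    {
        // Drain everything currently buffered as one micro-batch.
        while (reader.TryRead(out int item))
        {
            processed++; // placeholder: handle 'item' here
        }
    }
    return processed;
}

// Hypothetical producer: writes `count` items, then marks the channel complete.
async Task ProduceAsync(ChannelWriter<int> writer, int count)
{
    for (int i = 0; i < count; i++)
        await writer.WriteAsync(i); // waits without blocking a thread if the channel is full
    writer.Complete();              // lets the consumer's loop finish
}

using var cts = new CancellationTokenSource(); // cts.Cancel() would stop the consumer
Task<int> consumer = ConsumeAsync(channel.Reader, cts.Token);
await ProduceAsync(channel.Writer, 1000);
int total = await consumer;
Console.WriteLine($"processed {total} items");
```

The bounded capacity is a design choice: if the consumer falls behind, WriteAsync waits asynchronously instead of letting the buffer grow without limit or blocking a producer thread.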

2 Answers

There are some good points left in the comments to your question. Being the user you quoted, I'd just like to sum it up: use the right tool for the job.



Using the ThreadPool doesn't feel like the right tool for executing multiple continuous CPU-bound tasks, even if you try to organize some cooperative execution by turning them into state machines that yield CPU time to each other with await Task.Yield(). Thread switching is rather expensive; by doing await Task.Yield() in a tight loop you add significant overhead. Besides, you should never take over the whole ThreadPool, as the .NET Framework (and the underlying OS process) may need it for other things. On a related note, TPL even has the TaskCreationOptions.LongRunning option, which requests that the task not run on a ThreadPool thread (rather, it creates a normal thread with new Thread() behind the scenes).
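
A minimal sketch of the LongRunning option, assuming the default scheduler (TaskScheduler.Default); the worker loop and the 100 ms budget are illustrative only. Keep in mind that LongRunning is documented as a hint, and that for an async delegate the dedicated thread is only used up to the first await, after which the continuation runs on the pool again.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

using var cts = new CancellationTokenSource();

// Hypothetical continuous CPU-bound worker that should not occupy a pool thread for its lifetime.
var worker = Task.Factory.StartNew(() =>
{
    bool onPool = Thread.CurrentThread.IsThreadPoolThread;
    long iterations = 0;
    while (!cts.Token.IsCancellationRequested)
        iterations++; // stand-in for real CPU-bound work
    return (onPool, iterations);
},
CancellationToken.None,           // the loop observes cts.Token itself
TaskCreationOptions.LongRunning,  // hint: give this task its own thread
TaskScheduler.Default);

cts.CancelAfter(TimeSpan.FromMilliseconds(100));
var (onPoolThread, count) = await worker;
Console.WriteLine($"ran on a pool thread: {onPoolThread}, iterations: {count}");
```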



That said, using a custom TaskScheduler with limited parallelism on some dedicated, out-of-pool threads with thread affinity for individual long-running tasks might be a different thing. At least, await continuations would be posted on the same thread, which should help reduce the switching overhead. This reminds me of a different problem I was trying to solve a while ago with ThreadAffinityTaskScheduler.
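
A dedicated-thread scheduler with affinity (such as ThreadAffinityTaskScheduler) needs a custom class. As a swapped-in, built-in approximation, the sketch below uses ConcurrentExclusiveSchedulerPair to cap parallelism at two. Its tasks still run on pool threads and there is no thread affinity, but it does show the related effect that await Task.Yield() inside such a task queues the continuation back to the same limited scheduler rather than to the unrestricted pool. WorkAsync, the 8 jobs and the spin counts are hypothetical.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Built-in scheduler that runs at most 2 of its tasks at a time (on pool threads).
var pair = new ConcurrentExclusiveSchedulerPair(TaskScheduler.Default, maxConcurrencyLevel: 2);
TaskScheduler limited = pair.ConcurrentScheduler;
var factory = new TaskFactory(limited);

int running = 0, maxObserved = 0;
bool stayedOnScheduler = true;

// Hypothetical cooperative job: 5 CPU-bound slices separated by Task.Yield.
async Task WorkAsync()
{
    for (int slice = 0; slice < 5; slice++)
    {
        int now = Interlocked.Increment(ref running);
        int seen;
        while (now > (seen = Volatile.Read(ref maxObserved)) &&
               Interlocked.CompareExchange(ref maxObserved, now, seen) != seen)
        {
            // another thread raised maxObserved first; re-read and retry
        }
        Thread.SpinWait(20_000); // simulated CPU-bound slice
        Interlocked.Decrement(ref running);

        // With no SynchronizationContext, Task.Yield queues the continuation to
        // TaskScheduler.Current, i.e. back to `limited`, so other queued jobs get a turn.
        await Task.Yield();
        if (TaskScheduler.Current != limited) stayedOnScheduler = false;
    }
}

// StartNew with an async lambda returns Task<Task>; Unwrap yields the inner task.
Task[] jobs = Enumerable.Range(0, 8)
    .Select(_ => factory.StartNew(() => WorkAsync()).Unwrap())
    .ToArray();
await Task.WhenAll(jobs);

Console.WriteLine($"max concurrency observed: {maxObserved}, stayed on scheduler: {stayedOnScheduler}");
```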



Still, depending on the particular scenario, it's usually better to use an existing, well-established and tested tool. To name a few: the Parallel class, TPL Dataflow, System.Threading.Channels, Reactive Extensions.
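
As one example of leaning on the Parallel class instead of a hand-rolled loop, here is a minimal sketch that processes many very short jobs in batches, so each scheduled work item amortizes its scheduling cost. The item count, the batch size of 10,000 and the `i % 7` stand-in workload are arbitrary assumptions. Note that Parallel.ForEach is a blocking call; the calling thread participates in the work.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

const int itemCount = 1_000_000;
const int batchSize = 10_000; // arbitrary: one scheduled work item per 10,000 tiny jobs
long total = 0;

var options = new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount };

// Partitioner.Create(from, to, rangeSize) hands out [start, end) ranges, so each
// invocation of the body handles a whole batch instead of a single item.
Parallel.ForEach(Partitioner.Create(0, itemCount, batchSize), options, range =>
{
    long local = 0;
    for (int i = range.Item1; i < range.Item2; i++)
        local += i % 7; // stand-in for a very short CPU-bound job
    Interlocked.Add(ref total, local); // one synchronized update per batch
});

Console.WriteLine(total);
```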



There is also a whole range of existing industrial-strength solutions for the Publish-Subscribe pattern (RabbitMQ, PubNub, Redis, Azure Service Bus, Firebase Cloud Messaging (FCM), Amazon Simple Queue Service (SQS), etc.).

  • 1
    I know about all the well-established solutions; I have used some of them myself. In terms of performance, Kafka is the best in this range, and NATS as well. But the performance gains usually come at the expense of reliability. Reliable message processing requires reading messages from a durable store rather than buffering them in memory. And the tasks are usually not as simple as processing a single byte; they usually take some milliseconds. I usually use WorkStealingTaskScheduler for long CPU-bound tasks (it has a pool of custom threads). So it all depends on the context where it is used.
    – Maxim T
    Nov 13 at 3:49

  • The key point in your answer is: don't use Task.Yield in a tight loop. I agree 100%. But if the time taken to process each iteration exceeds the cost of the additional ThreadPool.QueueUserWorkItem, then the performance decrease is negligible, while responsiveness and task cooperation improve. By the way, it's easy to test with a small custom setup. In a tight loop the performance decrease is around 100%, but if some job (around several milliseconds) is done in each iteration, then the decrease is less than 10%.
    – Maxim T
    Nov 13 at 3:53

  • @MaximT, in any case I wouldn't overload the default ThreadPool with a million small computational tasks. But let's say you have a custom pool, e.g. you created one with WorkStealingTaskScheduler. It would be interesting to see actual benchmarks, e.g. 10000 tasks, each calculating the first 10000 digits of Pi. Then compare it to ThreadPoolTaskScheduler (make sure to fix the number of threads with SetMinThreads/SetMaxThreads). Then compare it to a task scheduler with actual thread affinity (AFAIR, WorkStealingTaskScheduler isn't affine for await continuations).
    – noseratio
    Nov 13 at 4:20

  • In my experience the performance killer is not context switching as such, but context switching that happens too often. If the job is very short, like calculating digits of Pi, then batching 10000 iterations into one ThreadPool.QueueUserWorkItem will solve the problem. The WorkStealingTaskScheduler is used for longer CPU-bound synchronous tasks without async/await in them. Look at github.com/BBGONE/REBUS-TaskCoordinator/blob/master/Showdown/… to see how the mixed job is split into subtasks (HandleLongRunMessage method).
    – Maxim T
    Nov 13 at 5:04

  • [The price of context switch] = [context switch duration] / ([job duration] + [context switch duration]). The shorter the job, the bigger the price. For very long jobs, the thread pool is not a solution anyway. The drawback of the ThreadAffinityTaskScheduler is that it is not portable to .NET Core; it is platform-dependent. And this problem could be solved with micro-batching: the batch will be processed on a single thread.
    – Maxim T
    Nov 13 at 7:10
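
The "easy to test with a small custom setup" remark can be approximated with a rough timing sketch like this one, which compares a loop that yields after every job with one that doesn't, for a tiny and a larger per-job workload. It is not a rigorous benchmark (no warm-up, no statistical runs), the spin counts and the iteration count of 1,000 are arbitrary assumptions, and the numbers will vary by machine and runtime, so no particular percentages are implied.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Runs `iterations` simulated jobs; optionally yields to the thread pool after each one.
async Task<TimeSpan> RunAsync(int iterations, int spinPerJob, bool yieldEachTime)
{
    var sw = Stopwatch.StartNew();
    for (int i = 0; i < iterations; i++)
    {
        Thread.SpinWait(spinPerJob);           // simulated CPU-bound job
        if (yieldEachTime) await Task.Yield(); // re-queue the rest of the loop to the pool
    }
    return sw.Elapsed;
}

var results = new List<(int Spin, TimeSpan Plain, TimeSpan Yielded)>();
foreach (int spin in new[] { 10, 10_000 })
{
    TimeSpan plain = await Task.Run(() => RunAsync(1_000, spin, yieldEachTime: false));
    TimeSpan yielded = await Task.Run(() => RunAsync(1_000, spin, yieldEachTime: true));
    results.Add((spin, plain, yielded));

    double overhead = (yielded - plain).TotalMilliseconds / plain.TotalMilliseconds;
    Console.WriteLine($"spin={spin,6}: plain={plain.TotalMilliseconds:F2} ms, " +
                      $"with Yield={yielded.TotalMilliseconds:F2} ms, overhead={overhead:P0}");
}
```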

After some debate on the issue with other users, who are worried about context switching and its influence on performance, I see their point.



But I meant "do something ..." inside the loop to be a substantial task, usually in the form of a message handler that reads a message from the queue and processes it. Message handlers are usually user-defined, and the message bus executes them through some sort of dispatcher. A user can implement a handler that executes synchronously (nobody knows what the user will do), and without Task.Yield the dispatcher would block the thread, processing those synchronous handlers one after another in a loop.
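
A minimal sketch of that situation, assuming a hypothetical dispatcher (DispatchAsync) that drains an in-memory ConcurrentQueue and invokes a user-supplied Func<string, Task> handler per message. A synchronous handler returns an already-completed Task, so awaiting it never releases the thread; the explicit await Task.Yield() between messages is what re-queues the loop so other pool work can interleave. A real message bus would also need to wait for new messages (for example via a channel), which is omitted here.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical dispatcher: runs a user-supplied handler for each queued message.
async Task<int> DispatchAsync(ConcurrentQueue<string> queue, Func<string, Task> handler, CancellationToken token)
{
    int handled = 0;
    while (!token.IsCancellationRequested && queue.TryDequeue(out var message))
    {
        // A synchronous handler returns an already-completed Task,
        // so this await continues on the same thread without yielding.
        await handler(message);
        handled++;

        // Explicitly re-queue the rest of the loop so other pool work can run
        // between messages, even if every handler is synchronous.
        await Task.Yield();
    }
    return handled;
}

// A user handler that happens to be fully synchronous (CPU work, no awaits).
Task SyncHandler(string message)
{
    Thread.SpinWait(1_000); // simulated processing of `message`
    return Task.CompletedTask;
}

var queue = new ConcurrentQueue<string>();
for (int i = 0; i < 100; i++) queue.Enqueue($"msg-{i}");

int count = await Task.Run(() => DispatchAsync(queue, SyncHandler, CancellationToken.None));
Console.WriteLine($"handled {count} messages");
```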



To back this up with more than words, I added tests to GitHub: https://github.com/BBGONE/TestThreadAffinity
They compare the ThreadAffinityTaskScheduler, the .NET ThreadScheduler with BlockingCollection, and the .NET ThreadScheduler with Threading.Channels.



The tests show that for ultra-short jobs the performance degradation is around 15%. To use Task.Yield without performance degradation (even a small one), avoid extremely short tasks; if a task is too short, combine shorter tasks into a bigger batch.



[The price of context switch] = [context switch duration] / ([job duration]+[context switch duration]).
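
As a worked illustration of this formula, using purely hypothetical numbers (a context switch cost of 5 µs is an assumption, not a figure from the tests):

$$\text{price} = \frac{t_{\text{switch}}}{t_{\text{job}} + t_{\text{switch}}}$$

$$t_{\text{job}} = 50\,\mu\text{s}: \quad \frac{5}{50 + 5} \approx 0.091 \ (9.1\%)$$

$$t_{\text{job}} = 5\,\text{ms} = 5000\,\mu\text{s}: \quad \frac{5}{5000 + 5} \approx 0.001 \ (0.1\%)$$

Under these assumptions, making the unit of work 100 times longer cuts the relative price of each switch by roughly the same factor, which is why batching very short jobs helps.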



In that case, the influence of switching tasks on performance is negligible, but it improves task cooperation and the responsiveness of the system.



For long-running tasks, it is better to use a custom scheduler that executes tasks on its own dedicated thread pool (like the WorkStealingTaskScheduler).



Mixed jobs can contain different parts: short-running CPU-bound code, asynchronous code and long-running code. For those, it is better to split the task into subtasks:



    private async Task HandleLongRunMessage(TestMessage message, CancellationToken token = default(CancellationToken))
    {
        // SHORT SYNCHRONOUS TASK - execute as is on the default thread (from the thread pool)
        CPU_TASK(message, 50);
        // IO-BOUND ASYNC TASK - used as is
        await Task.Delay(50, token);
        // BUT WRAP the LONG SYNCHRONOUS TASK inside a Task
        // which is scheduled on the custom thread pool
        // (to save thread-pool threads)
        await Task.Factory.StartNew(() =>
        {
            CPU_TASK(message, 100000);
        }, token, TaskCreationOptions.DenyChildAttach, _workStealingTaskScheduler);
    }
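
The snippet above depends on helpers from the author's test project (CPU_TASK, TestMessage, _workStealingTaskScheduler). A self-contained sketch of the same split could look like the following, under the assumption that TaskCreationOptions.LongRunning on the default scheduler is an acceptable stand-in for the custom WorkStealingTaskScheduler (which is not part of the BCL). CpuTask and HandleMixedMessageAsync are hypothetical placeholders.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for CPU_TASK: `units` iterations of trivial arithmetic.
long CpuTask(int units)
{
    long acc = 0;
    for (int i = 0; i < units; i++) acc += i % 3;
    return acc;
}

async Task<long> HandleMixedMessageAsync(CancellationToken token = default)
{
    // Short synchronous part: run inline on the current (pool) thread.
    long shortPart = CpuTask(50);

    // I/O-bound part: the thread is released while the delay is pending.
    await Task.Delay(50, token);

    // Long synchronous part: offloaded to a dedicated thread via LongRunning
    // (stand-in for the custom scheduler) so no pool thread is held for its duration.
    long longPart = await Task.Factory.StartNew(
        () => CpuTask(10_000_000),
        token,
        TaskCreationOptions.DenyChildAttach | TaskCreationOptions.LongRunning,
        TaskScheduler.Default);

    return shortPart + longPart;
}

long result = await Task.Run(() => HandleMixedMessageAsync());
Console.WriteLine(result);
```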

  • The info on the channels: github.com/stephentoub/corefxlab/blob/master/src/…
    – Maxim T
    Nov 14 at 12:48

  • It seems I figured it out. Although the performance is the same in all usages, channels have the benefit of a non-blocking wait until the channel can be written to. This matters in bounded scenarios: BlockingCollection blocks the thread when it is full, while channels leave the thread free to be used by others.
    – Maxim T
    Nov 17 at 12:54

  • I ported the test for Threading.Channels to CoreFX (instead of the full .NET Framework), and it started to work 2.5 times faster. Now it is above 1 million messages per second on my computer. I added this solution to the test. They are really good.
    – Maxim T
    Nov 17 at 18:15
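
The blocking-versus-non-blocking difference described in the comments can be observed directly. This minimal sketch, assuming .NET Core 3.0 or later, fills a BlockingCollection and a bounded channel, each with capacity 1, and then tries to add a second item to each.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Channels;
using System.Threading.Tasks;

// Both queues have room for exactly one item.
using var blocking = new BlockingCollection<int>(boundedCapacity: 1);
var channel = Channel.CreateBounded<int>(1);

blocking.Add(1);
await channel.Writer.WriteAsync(1);

// BlockingCollection: Add(2) would block the calling thread until space frees up.
// TryAdd with a 0 ms timeout shows that it cannot proceed without blocking.
bool blockingAccepted = blocking.TryAdd(2, millisecondsTimeout: 0);

// Channel: WriteAsync(2) returns an incomplete ValueTask instead of blocking,
// so the calling thread stays free for other work in the meantime.
ValueTask pendingWrite = channel.Writer.WriteAsync(2);
bool completedImmediately = pendingWrite.IsCompleted;

// Reading one item makes room; the pending write then completes.
int first = await channel.Reader.ReadAsync();
await pendingWrite;
int second = await channel.Reader.ReadAsync();

Console.WriteLine($"BlockingCollection accepted 2 without blocking: {blockingAccepted}"); // False
Console.WriteLine($"Channel write completed immediately: {completedImmediately}");       // False
Console.WriteLine($"Channel read: {first}, {second}");                                     // 1, 2
```

In a producer loop, blocking.Add would park the producer thread until a consumer takes an item, whereas awaiting WriteAsync releases that thread for other work.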













Your Answer






StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53263258%2fusing-task-yield-to-overcome-threadpool-starvation-while-implementing-producer-c%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























2 Answers
2






active

oldest

votes








2 Answers
2






active

oldest

votes









active

oldest

votes






active

oldest

votes









1














There are some good points left in the comments to your question. Being the user you quoted, I'd just like to sum it up: use the right tool for the job.



Using ThreadPool doesn't feel like the right tool for executing multiple continuous CPU-bound tasks, even if you try to organize some cooperative execution by turning them into state machines which yield CPU time to each other with await Task.Yield(). Thread switching is rather expensive; by doing await Task.Yield() on a tight loop you add a significant overhead. Besides, you should never take over the whole ThreadPool, as the .NET framework (and the underlying OS process) may need it for other things. On a related note, TPL even has the TaskCreationOptions.LongRunning option that requests to not run the task on a ThreadPool thread (rather, it creates a normal thread with new Thread() behind the scene).



That said, using a custom TaskScheduler with limited parallelism on some dedicated, out-of-pool threads with thread affinity for individual long-running tasks might be a different thing. At least, await continuations would be posted on the same thread, which should help reducing the switching overhead. This reminds me of a different problem I was trying to solve a while ago with ThreadAffinityTaskScheduler.



Still, depending on a particular scenario, it's usually better to use an existing well-established and tested tool. To name a few: Parallel Class, TPL Dataflow, System.Threading.Channels, Reactive Extensions.



There is also a whole range of existing industrial-strength solutions to deal with Publish-Subscribe pattern (RabbitMQ, PubNub, Redis, Azure Service Bus, Firebase Cloud Messaging (FCM), Amazon Simple Queue Service (SQS) etc).






share|improve this answer



















  • 1




    i know about all the well established solutions - i used some of them myself. In terms of performance, Kafka is the best in this range, and NATS as well. But the performance gains are usually at the expense of the reliability. For reliable message processing it is needed to read them from a durable store and don't buffer them in memory. And the tasks are usually not that simple as to process a single byte, but usually take some milliseconds. I usually use WorkStealingTaskScheduler for long CPU bound tasks (it has a pool of custom threads). So it all depends on the context where it used.
    – Maxim T
    Nov 13 at 3:49












  • The key in your answer - don't use Task.Yield in a tight loop. I agree 100%. But if the time taken to process each iteration exceeds additional ThreadPool.QueueUserWorkItem then the performance decrease is negligible, but increases responsiveness and tasks cooperation. By the way - it easy to test with a small custom setup. In a tight loop a decrease around 100%, but if some job (around several milliseconds) is done in each iteration - then the decrease is less than 10%.
    – Maxim T
    Nov 13 at 3:53












  • @MaximT, in any case I wouldn't overload the default ThreadPool with a million of small computational tasks. But let's say you have a custom pool, e.g. you created one with WorkStealingTaskScheduler. It would be interesting to see the actual benchmarks. E.g., having 10000 tasks each calculating the first 10000 digits of Pi number. Then compare it to ThreadPoolTaskScheduler (make sure to fix the number of threads with SetMinThreads/SetMaxThreads). Then compare it to a task scheduler with actual thread affinity (AFAIR, WorkStealingTaskScheduler isn't affine for await continuations).
    – noseratio
    Nov 13 at 4:20










  • I my experience the performance killer is not a context switching, but very often context switching. If the job is very short as calculating PI numbers, then batching 10000 iterations into one ThreadPool.QueueUserWorkItem will solve the problem. the WorkStealingTaskScheduler is used for longer CPU bound synchronous tasks without async await in them. Look at github.com/BBGONE/REBUS-TaskCoordinator/blob/master/Showdown/… how the mixed job is split into subtasks (HandleLongRunMessage method).
    – Maxim T
    Nov 13 at 5:04










  • [The price of context switch] = [context switch duration] / ([job duration]+[context switch duration]). The shorter the job, the bigger the price. For too long jobs - the thread pool is not a solution anyway. The drawback of the ThreadAffinityTaskScheduler is that it is not portable to the NET.Core - it is platform dependent. And this problem could be solved with microbatching - the batch will be processed on a single thread.
    – Maxim T
    Nov 13 at 7:10


















1














There are some good points left in the comments to your question. Being the user you quoted, I'd just like to sum it up: use the right tool for the job.



Using ThreadPool doesn't feel like the right tool for executing multiple continuous CPU-bound tasks, even if you try to organize some cooperative execution by turning them into state machines which yield CPU time to each other with await Task.Yield(). Thread switching is rather expensive; by doing await Task.Yield() on a tight loop you add a significant overhead. Besides, you should never take over the whole ThreadPool, as the .NET framework (and the underlying OS process) may need it for other things. On a related note, TPL even has the TaskCreationOptions.LongRunning option that requests to not run the task on a ThreadPool thread (rather, it creates a normal thread with new Thread() behind the scene).



That said, using a custom TaskScheduler with limited parallelism on some dedicated, out-of-pool threads with thread affinity for individual long-running tasks might be a different thing. At least, await continuations would be posted on the same thread, which should help reducing the switching overhead. This reminds me of a different problem I was trying to solve a while ago with ThreadAffinityTaskScheduler.



Still, depending on a particular scenario, it's usually better to use an existing well-established and tested tool. To name a few: Parallel Class, TPL Dataflow, System.Threading.Channels, Reactive Extensions.



There is also a whole range of existing industrial-strength solutions to deal with Publish-Subscribe pattern (RabbitMQ, PubNub, Redis, Azure Service Bus, Firebase Cloud Messaging (FCM), Amazon Simple Queue Service (SQS) etc).






edited Nov 13 at 2:03

























answered Nov 12 at 23:29









noseratio

  • I know about all the well-established solutions; I have used some of them myself. In terms of performance, Kafka is the best in this range, and NATS as well. But performance gains usually come at the expense of reliability. For reliable message processing, messages need to be read from a durable store rather than buffered in memory. And the tasks are usually not as simple as processing a single byte; they usually take some milliseconds. I usually use WorkStealingTaskScheduler for long CPU-bound tasks (it has a pool of custom threads). So it all depends on the context where it is used.
    – Maxim T
    Nov 13 at 3:49












  • The key point of your answer: don't use Task.Yield in a tight loop. I agree 100%. But if the time taken to process each iteration is much larger than the cost of the additional ThreadPool.QueueUserWorkItem, then the performance decrease is negligible, while responsiveness and task cooperation improve. By the way, it's easy to test with a small custom setup: in a tight loop the performance decrease is around 100%, but if some job (around several milliseconds) is done in each iteration, the decrease is less than 10%.
    – Maxim T
    Nov 13 at 3:53












  • @MaximT, in any case I wouldn't overload the default ThreadPool with a million small computational tasks. But let's say you have a custom pool, e.g. one you created with WorkStealingTaskScheduler. It would be interesting to see actual benchmarks, e.g. 10000 tasks, each calculating the first 10000 digits of Pi. Then compare it to ThreadPoolTaskScheduler (make sure to fix the number of threads with SetMinThreads/SetMaxThreads). Then compare it to a task scheduler with actual thread affinity (AFAIR, WorkStealingTaskScheduler isn't affine for await continuations).
    – noseratio
    Nov 13 at 4:20










  • In my experience, the performance killer is not context switching itself, but context switching that happens too often. If the job is very short, like calculating digits of Pi, then batching 10000 iterations into one ThreadPool.QueueUserWorkItem call will solve the problem. The WorkStealingTaskScheduler is used for longer CPU-bound synchronous tasks without async/await in them. Look at github.com/BBGONE/REBUS-TaskCoordinator/blob/master/Showdown/… to see how a mixed job is split into subtasks (the HandleLongRunMessage method).
    – Maxim T
    Nov 13 at 5:04










  • [The price of a context switch] = [context switch duration] / ([job duration] + [context switch duration]). The shorter the job, the bigger the price. For jobs that are too long, the thread pool is not a solution anyway. The drawback of the ThreadAffinityTaskScheduler is that it is not portable to .NET Core; it is platform-dependent. And this problem could be solved with micro-batching: each batch is processed on a single thread.
    – Maxim T
    Nov 13 at 7:10
















After a bit of debate on the issue with other users, who are worried about context switching and its influence on performance, I see what they are worried about.



But by the "do something ..." comment inside the loop I meant a substantial task, usually a message handler that reads a message from the queue and processes it. Message handlers are usually user-defined, and the message bus executes them through some sort of dispatcher. A user can implement a handler that executes synchronously (nobody knows what the user will do), and without Task.Yield the dispatcher loop would keep the thread blocked while it processes those synchronous handlers one after another.



So as not to make empty claims, I added tests to GitHub: https://github.com/BBGONE/TestThreadAffinity
They compare the ThreadAffinityTaskScheduler, the default .NET thread-pool scheduler with BlockingCollection, and the default .NET thread-pool scheduler with System.Threading.Channels.



The tests show that for ultra-short jobs the performance degradation is around 15%. To use Task.Yield without any performance degradation (even a small one), don't use extremely short tasks; if a task is too short, combine several short tasks into a bigger batch.
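A minimal sketch of that batching idea, assuming a hypothetical dispatcher that takes messages from a ConcurrentQueue<T> and calls a possibly synchronous user handler: it awaits Task.Yield() once per batch instead of once per message, so the cost of requeueing to the ThreadPool is shared by up to batchSize handler calls. BatchingDispatcher, DispatchAsync and batchSize are illustrative names, not taken from the linked test project.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class BatchingDispatcher
{
    // Hypothetical dispatcher loop: runs user handlers (which may be synchronous)
    // and yields the thread back to the pool after every batch of messages.
    public static async Task<int> DispatchAsync<T>(
        ConcurrentQueue<T> queue, Action<T> handler, int batchSize, CancellationToken token)
    {
        if (batchSize < 1) throw new ArgumentOutOfRangeException(nameof(batchSize));
        int processed = 0;
        while (!token.IsCancellationRequested)
        {
            int inBatch = 0;
            while (inBatch < batchSize && queue.TryDequeue(out T message))
            {
                handler(message); // may run synchronously for a while
                inBatch++;
            }
            processed += inBatch;
            if (inBatch == 0)
                break; // queue drained; a real dispatcher would wait for new messages here
            // Requeue the rest of the loop so other pool work items get a chance to run.
            await Task.Yield();
        }
        token.ThrowIfCancellationRequested();
        return processed;
    }
}
```

With batchSize = 1 this degenerates to the per-message Task.Yield from the question; larger batches trade a little latency for less scheduling overhead.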



[The price of a context switch] = [context switch duration] / ([job duration] + [context switch duration]).
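Plugging some hypothetical durations into this formula (1 µs per switch is an illustrative assumption, not a measured value) shows why batching helps:

```latex
P = \frac{t_{\text{switch}}}{t_{\text{job}} + t_{\text{switch}}}

% ultra-short job: t_switch = 1 us, t_job = 9 us
P = \frac{1}{9 + 1} = 0.1 = 10\%

% the same jobs combined into a batch of 100: t_job = 900 us
P = \frac{1}{900 + 1} \approx 0.0011 \approx 0.11\%

% a job of several milliseconds: t_job = 1 ms = 1000 us
P = \frac{1}{1000 + 1} \approx 0.001 \approx 0.1\%
```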



In that case, the effect of switching tasks on performance is negligible, but it improves task cooperation and the responsiveness of the system.



For long-running tasks it is better to use a custom scheduler that executes tasks on its own dedicated pool of threads (like the WorkStealingTaskScheduler).



Mixed jobs can contain different parts: short-running CPU-bound, asynchronous, and long-running code. For them it is better to split the task into subtasks:



private async Task HandleLongRunMessage(TestMessage message, CancellationToken token = default(CancellationToken))
{
    // CPU_TASK, TestMessage and _workStealingTaskScheduler come from the author's project (not shown here)
    // SHORT SYNCHRONOUS TASK - execute as is on the current (thread pool) thread
    CPU_TASK(message, 50);
    // IO-BOUND ASYNC TASK - awaited as is (passing the token lets the delay be cancelled)
    await Task.Delay(50, token);
    // BUT WRAP the LONG SYNCHRONOUS TASK inside a Task
    // which is scheduled on the custom thread pool
    // (to save thread pool threads)
    await Task.Factory.StartNew(() =>
    {
        CPU_TASK(message, 100000);
    }, token, TaskCreationOptions.DenyChildAttach, _workStealingTaskScheduler);
}





edited Nov 14 at 16:14

























answered Nov 13 at 8:14









Maxim T

  • The info on the channels: github.com/stephentoub/corefxlab/blob/master/src/…
    – Maxim T
    Nov 14 at 12:48










  • It seems I figured it out. Although the performance is the same in all cases, channels have the benefit of a non-blocking wait until the channel can be written to. This happens in bounded scenarios: BlockingCollection blocks the thread while writing is not allowed, whereas Channels leave the thread free to be used by others.
    – Maxim T
    Nov 17 at 12:54












  • I ported the Threading.Channels test to CoreFX (instead of the full .NET Framework), and it started to run 2.5 times faster. Now it is above 1 million messages per second on my computer. I added this solution to the test. Channels are really good.
    – Maxim T
    Nov 17 at 18:15



















