Understanding Dask's Task Stream



























I'm running Dask locally using the distributed scheduler on my machine, which has 8 cores. On initialization I see:



[Image: client summary shown on initialization]



This looks correct, but I'm confused by the task stream in the diagnostics dashboard (shown below):
[Image: Task Stream plot from the diagnostics dashboard]



I was expecting 8 rows, one for each of the 8 workers/cores. Is that incorrect?



Thanks



AJ



I've added the code I'm running:



import dask.dataframe as dd
from dask.distributed import Client, progress

client = Client()  # local cluster sized from the machine's CPU count
progress(client)   # note: progress() expects futures or collections, not a Client

# load datasets
trd = (dd.read_csv('trade_201811*.csv', compression='gzip',
                   blocksize=None, dtype={'Notional': 'float64'})
       .assign(timestamp=lambda x: dd.to_datetime(x.timestamp.str.replace('D', 'T')))
       .set_index('timestamp', sorted=True))









      python-3.x dask






      edited Nov 15 '18 at 16:15







      Andy Johnson

















      asked Nov 14 '18 at 13:33









      Andy Johnson

          1 Answer

          Each row corresponds to a single thread. Some more sophisticated Dask operations start additional threads. This happens particularly when tasks launch other tasks, which is especially common in machine learning workloads.
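          Since each row is one thread, the number of rows to expect before any task launches subtasks is the total thread count across all workers (8 for 8 single-threaded workers). A minimal sketch for checking that number, assuming a recent `dask.distributed` where `Client.nthreads()` and `Client.wait_for_workers()` exist (older releases named the former `ncores()`); `expected_rows` is a hypothetical helper, and the small in-process cluster stands in for the one in the question:

```python
from dask.distributed import Client


def expected_rows(n_workers=2, threads_per_worker=2):
    # Hypothetical helper: start a small in-process cluster (no dashboard)
    # and add up the configured worker threads, i.e. the baseline rows.
    with Client(processes=False, n_workers=n_workers,
                threads_per_worker=threads_per_worker,
                dashboard_address=None) as client:
        client.wait_for_workers(n_workers)  # make sure all workers registered
        per_worker = client.nthreads()      # {worker_address: thread_count}
        return sum(per_worker.values())


if __name__ == "__main__":
    print(expected_rows())  # 4
```

          This counts only the configured threads; threads a worker starts later (for example, to replace a task that has seceded) are not included, which is how the plot can end up with more rows than this total.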



          My guess is that you're using one of the following approaches:





          • dask.distributed.get_client or dask.distributed.worker_client

          • Scikit-Learn's Joblib

          • Dask-ML


          If so, the behavior that you're seeing is normal. The task stream plot will look a little odd, yes, but hopefully it is still interpretable.
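          The first approach listed above is the most direct way to get extra rows. A minimal sketch, assuming a recent `dask.distributed`: a parent task launches subtasks either with `worker_client()` or with `get_client()` plus `secede()`/`rejoin()`. While it waits, the parent leaves the worker's thread pool and the worker starts a replacement thread, which is how such work can appear on rows beyond the configured thread count. The helper names (`inc`, `fan_out_*`, `run_demo`) are hypothetical; the cluster deliberately has a single thread, so the subtasks can only run because the parent secedes.

```python
from dask.distributed import Client, get_client, rejoin, secede, worker_client


def inc(x):
    return x + 1


def fan_out_worker_client(n):
    # worker_client() secedes this task's thread from the worker's pool for
    # the duration of the block, so the worker can open a new thread.
    with worker_client() as wc:
        futures = wc.map(inc, range(n))
        return sum(wc.gather(futures))


def fan_out_get_client(n):
    client = get_client()              # client connected to this worker's scheduler
    futures = client.map(inc, range(n))
    secede()                           # leave the thread pool before blocking
    results = client.gather(futures)
    rejoin()                           # take a slot in the thread pool again
    return sum(results)


def run_demo(n=10):
    # One worker with one thread: without seceding, the parent task would
    # occupy the only thread and its subtasks could never run.
    with Client(processes=False, n_workers=1, threads_per_worker=1,
                dashboard_address=None) as client:
        a = client.submit(fan_out_worker_client, n).result()
        b = client.submit(fan_out_get_client, n).result()
        return a, b


if __name__ == "__main__":
    print(run_demo())  # (55, 55)
```

          With the dashboard enabled (omit `dashboard_address=None`), the subtasks should appear on a different row from the parent task, even though the worker was configured with only one thread.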






          answered Nov 14 '18 at 16:59









          MRocklin

          • OK, that makes sense, although since I'm running on an 8-core machine with 8 workers, I assumed any subtasks/processes/threads would appear on one of the existing 8 rows.

            – Andy Johnson
            Nov 15 '18 at 16:25




















