Overhead of multiple synchronized on the same object












Consider this code:

    void A() {
        synchronized (obj) {
            for (int i = 0; i < 1000; i++) {
                B();
            }
        }
    }

    void B() {
        synchronized (obj) {
            // Do something
        }
    }

How much runtime overhead does "synchronized" add when calling A? Is it close to the overhead of a single "synchronized" block?







java synchronized






asked Nov 15 '18 at 16:23 by Shayan
edited Nov 15 '18 at 16:31













  • What do you mean by "overhead"? Memory consumption? Runtime? Something else?
    – Korashen Nov 15 '18 at 16:26

  • I mean runtime!
    – Shayan Nov 15 '18 at 16:27

  • etutorials.org/Programming/Java+performance+tuning/…
    – fantaghirocco Nov 15 '18 at 16:27

  • It will be more, as synchronized is executed 1000 more times than in the single-call case. So the VM has to put 1000 additional lock tokens on obj.
    – Korashen Nov 15 '18 at 16:28

  • @Korashen Won't it understand that it has already acquired the lock?
    – Shayan Nov 15 '18 at 16:29



















2 Answers
The answer to this (legitimate) question depends on the OS, the hardware, and the specific VM implementation.

Setting aside the cost of the method call itself, the inner synchronized may cost next to nothing on one platform (a modern processor, OS, and VM) and much more on another (for example, a purely software-emulated processor). On a VM that runs green threads on a single OS thread, it may cost close to zero apart from the call overhead. The cost differs even between ARM and Intel processors of comparable power.

synchronized is usually implemented inside the VM on top of OS synchronization primitives, with heuristics that speed up the common cases. The OS, in turn, relies on hardware instructions and its own heuristics to do this. Reacquiring a synchronization primitive that the same thread already holds is usually extremely cheap at the OS level and very cheap in a typical production-grade VM.

On a modern VM running on Windows or Linux with an Intel or AMD processor, it usually costs only a few CPU cycles (assuming an otherwise idle machine) and stays in the low-nanosecond range.



Note that, in general, this is a very complex topic. Multiple layers of software and hardware are involved, along with the effect of other tasks competing for the same hardware resources. Rigorous research into even a small sub-topic here could fill several Ph.D. theses.

In practice, though, my advice is to treat the cost of the second synchronized in small loops as zero unless you run into a specific bottleneck (which is quite unlikely).

With a large number of iterations, the nested synchronized will definitely cost more than a single synchronized, and the overall effect depends on what you do inside the loop. Usually each iteration does enough work to make the relative overhead negligible. In some cases, though, it may prevent loop optimizations and add substantial overhead (substantial compared to a single synchronized, not necessarily in practical terms). However, in the common practical cases of huge loops, you should consider a different design that avoids holding the outer synchronized for the whole loop, in order to reduce lock contention.
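As a rough illustration of the two designs contrasted above, here is a minimal sketch. The class `BatchCounter` and its method names are hypothetical and not from the question. The first method holds the lock for the whole loop and calls an unsynchronized helper, so there are no reentrant acquisitions; the second drops the outer lock so that each iteration holds the lock only briefly.

```java
final class BatchCounter {
    private final Object lock = new Object();
    private long total;

    // Option 1: take the lock once and call an unsynchronized helper,
    // so the loop performs no reentrant acquisitions at all.
    void addAllHoldingLock(int iterations) {
        synchronized (lock) {
            for (int i = 0; i < iterations; i++) {
                incrementLocked();
            }
        }
    }

    // Option 2: no outer lock; each iteration locks briefly, so other
    // threads can acquire the lock between iterations.
    void addAllPerIteration(int iterations) {
        for (int i = 0; i < iterations; i++) {
            synchronized (lock) {
                incrementLocked();
            }
        }
    }

    // Precondition (hypothetical convention): the caller holds `lock`.
    private void incrementLocked() {
        total++;
    }

    long total() {
        synchronized (lock) {
            return total;
        }
    }

    public static void main(String[] args) {
        BatchCounter c = new BatchCounter();
        c.addAllHoldingLock(1000);
        c.addAllPerIteration(1000);
        System.out.println(c.total()); // prints 2000
    }
}
```

Which option fits better depends on whether the whole loop must appear atomic to other threads; the second option gives up that atomicity in exchange for shorter lock hold times.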



To get a sense of how a VM implements this, you can look, for example, at the Synchronization section of this paper. It is a bit outdated but straightforward to understand.







answered Nov 15 '18 at 17:31 by Fedor Losev
edited Nov 15 '18 at 18:00

























synchronized locks are re-entrant. When a thread acquires a lock it already holds, the cost is (a) the time to check that it already holds the lock and (b) the time to increment a counter and later decrement it.
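The counter behaviour described above can be observed with `java.util.concurrent.locks.ReentrantLock`, which exposes its hold count. This is only an analogy: `ReentrantLock` is a library lock, not the intrinsic monitor behind synchronized, and the class name `HoldCountDemo` is made up for this sketch. For the intrinsic lock itself, `Thread.holdsLock` at least confirms that the nested block is entered without blocking.

```java
import java.util.concurrent.locks.ReentrantLock;

final class HoldCountDemo {
    static final ReentrantLock LOCK = new ReentrantLock();

    // Acquires LOCK twice on the same thread and reports the hold count
    // seen inside the inner acquisition.
    static int innerHoldCount() {
        LOCK.lock();                 // first acquisition: hold count 1
        try {
            LOCK.lock();             // owner re-enters: hold count 2, no blocking
            try {
                return LOCK.getHoldCount();
            } finally {
                LOCK.unlock();       // hold count back to 1
            }
        } finally {
            LOCK.unlock();           // hold count 0, lock released
        }
    }

    // Same nesting with the intrinsic lock used by synchronized.
    static boolean nestedSynchronized(Object obj) {
        synchronized (obj) {
            synchronized (obj) {     // re-entrant: the thread already owns obj's monitor
                return Thread.holdsLock(obj);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(innerHoldCount());                 // prints 2
        System.out.println(LOCK.getHoldCount());              // prints 0
        System.out.println(nestedSynchronized(new Object())); // prints true
    }
}
```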



The first step takes the longest and adds about 10-50 ns each time.
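To check a figure like this on your own machine, a very rough timing sketch along the following lines can give a first impression. It assumes an uncontended lock and uses a naive `System.nanoTime` loop; the class and method names are hypothetical. A JIT compiler may merge or remove some of the nested acquisitions, so the measured difference can be much smaller than expected, and a proper benchmark harness such as JMH is the better tool for real measurements.

```java
final class NestedSyncTiming {
    private static final Object OBJ = new Object();
    static long sink; // updated in the loops so the work is not trivially dead code

    // B() from the question: locks OBJ, which the caller may already hold.
    static void b() {
        synchronized (OBJ) {
            sink++;
        }
    }

    // Same work as b() without the inner lock.
    static void bUnlocked() {
        sink++;
    }

    // A() from the question: one outer lock plus one reentrant acquisition per iteration.
    static long nested(int n) {
        long start = System.nanoTime();
        synchronized (OBJ) {
            for (int i = 0; i < n; i++) {
                b();
            }
        }
        return System.nanoTime() - start;
    }

    // Baseline: only the outer lock.
    static long single(int n) {
        long start = System.nanoTime();
        synchronized (OBJ) {
            for (int i = 0; i < n; i++) {
                bUnlocked();
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        for (int warmup = 0; warmup < 20; warmup++) { // give the JIT a chance to compile both paths
            nested(n);
            single(n);
        }
        long nestedNs = nested(n);
        long singleNs = single(n);
        System.out.printf("nested: %.2f ns/iter, single: %.2f ns/iter%n",
                (double) nestedNs / n, (double) singleNs / n);
    }
}
```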







answered Nov 15 '18 at 16:34 by Peter Lawrey
edited Nov 15 '18 at 16:52





























