Why is division more expensive than multiplication?





I am not really trying to optimize anything, but I hear this from programmers all the time, so I took it as truth. After all, they are supposed to know this stuff.

But why is division actually slower than multiplication? Isn't division just glorified subtraction, and multiplication glorified addition? Mathematically, I don't see why going one way or the other should have very different computational costs.

Can anyone please clarify the reason/cause of this, so I know more than what the other programmers I asked told me, which was: "because".
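To make the "glorified addition/subtraction" picture concrete, here is a naive sketch in Python (the function names are made up for illustration, and this is nothing like real hardware): multiplication as summed shifted copies, division as repeated subtraction.

```python
def glorified_addition(a, b, bits=8):
    """Long multiplication: add a shifted copy of `a` for each set bit of `b`."""
    total = 0
    for i in range(bits):
        if (b >> i) & 1:          # mask: is bit i of b set?
            total += a << i       # add the shifted partial product
    return total

def glorified_subtraction(n, d):
    """Schoolbook division: subtract `d` until it no longer fits."""
    q = 0
    while n >= d:
        n -= d
        q += 1
    return q, n                   # quotient, remainder

print(glorified_addition(13, 7))     # 91
print(glorified_subtraction(91, 7))  # (13, 0)
```

Both really are loops of additions or subtractions; the interesting question, answered below, is which loop hardware can collapse into parallel work.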










  • "After all they are supposed to know this stuff." - You might be surprised what most people don't know. – David, Apr 1 '13 at 15:01

  • You will have to ask an electronics engineer; it is a circuit-design problem. Creating a hardware multiplier is pretty easy; a hardware divider is not. Practical divider circuits are iterative and therefore take longer. Ask at electronics.stackexchange.com. – Hans Passant, Apr 1 '13 at 15:08

  • Wikipedia (cf. the article on FLOPS) and other sources (en.community.dell.com/techcenter/high-performance-computing/w/…) claim that typical CPUs can execute 4 floating-point operations per clock cycle, seemingly regardless of the type. Following this, division would be as expensive/cheap as multiplication. Who is volunteering to do a benchmark? – Axel Kemper, Apr 1 '13 at 15:52

  • In short: quotient estimate and correction steps. – Brett Hale, Apr 1 '13 at 17:18

  • You're right that multiplication breaks down into multiple additions and division breaks down into multiple subtractions. The difference is that the additions in multiplication can be done in parallel, whereas in division you can't do the next subtraction until you finish the previous one and do a comparison. So a hardware multiplier exploits this inherent parallelism by computing and summing many sub-products simultaneously, at the cost of increased area real estate. Division does not have this luxury. – Mysticial, Apr 3 '13 at 4:07
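Mysticial's parallelism point in miniature: each partial product below depends only on the two inputs, never on another partial product, so hardware can form them all in the same cycle and combine them with a tree of adders. A toy sketch (illustrative only, not real hardware):

```python
def partial_products(a, b, bits=8):
    # Each term depends only on a and b, so all of them could be
    # generated simultaneously by independent pieces of logic.
    return [a << i for i in range(bits) if (b >> i) & 1]

# sum() stands in for the hardware's adder tree: because addition is
# associative, the terms can be combined pairwise in ~log2(bits) levels
# instead of one long serial chain.
print(sum(partial_products(13, 7)))   # 91
```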




















Tags: performance, division, cpu-architecture, multiplication
asked Apr 1 '13 at 14:58 by Joan Venge · edited Nov 27 '18 at 21:01 by Peter Cordes
2 Answers
The CPU's ALU (arithmetic logic unit) executes algorithms, even though they are implemented in hardware. Classic multiplication algorithms include the Wallace tree and the Dadda tree. More sophisticated techniques are available in newer processors. Generally, processors strive to operate on pairs of bits in parallel in order to minimize the clock cycles required. Multiplication algorithms can be parallelized quite effectively (though more transistors are required).

Division algorithms can't be parallelized as efficiently. The most efficient division algorithms are quite complex (the Pentium FDIV bug demonstrates the level of complexity). Generally, they require more clock cycles per bit. If you're after more technical details, Intel has published a nice explanation; Intel actually patented its division algorithm.

– Lior Kogan (answered Jun 28 '13 at 18:26, edited Jun 28 '13 at 19:11)
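Brett Hale's "quotient estimate and correction steps" can be sketched with Newton-Raphson reciprocal refinement, one common family of fast-division techniques: start from a rough guess at 1/d and correct it with multiplications. This is a minimal illustration of the idea, not Intel's patented algorithm, and the function names are made up:

```python
import math

def reciprocal(d, iterations=5):
    """Approximate 1/d for 0.5 <= d < 1 via x' = x * (2 - d*x).

    Each iteration roughly doubles the number of correct bits, and each
    costs two multiplications -- so even a "fast" divide takes several
    multiply latencies back to back.
    """
    x = 48 / 17 - 32 / 17 * d          # classic linear initial estimate on [0.5, 1)
    for _ in range(iterations):
        x = x * (2 - d * x)            # correction step: depends on previous x
    return x

def divide(n, d):
    """Divide by scaling d into [0.5, 1), estimating 1/d, multiplying back."""
    e = math.frexp(d)[1]               # exponent so that d / 2**e is in [0.5, 1)
    m = d / (2 ** e)
    return n * reciprocal(m) / (2 ** e)

print(divide(1.0, 3.0))   # ≈ 0.3333333333
```

Note the serial chain: each corrected estimate feeds the next correction, which is exactly the dependency structure that keeps division latency high.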






"But I wonder why is division actually slower than multiplication? Isn't division just a glorified subtraction, and multiplication is a glorified addition?"

The big difference is that in long multiplication you just need to add up a bunch of numbers after shifting and masking. In long division you have to test for overflow after each subtraction.

Let's consider a long multiplication of two n-bit binary numbers:

• shift (no time)

• mask (constant time)

• add (naively looks like time proportional to n²)

But if we look closer, it turns out we can optimise the addition using two tricks (there are further optimisations, but these are the most important):

1. We can add the numbers in groups rather than sequentially.

2. Until the final step, we can add three numbers to produce two rather than adding two to produce one. While adding two numbers to produce one takes time proportional to n, adding three numbers to produce two can be done in constant time because we can eliminate the carry chain.

So now our algorithm looks like:

• shift (no time)

• mask (constant time)

• add numbers in groups of three to produce two until there are only two left (time proportional to log(n))

• perform the final addition (time proportional to n)

In other words, we can build a multiplier for two n-bit numbers in time roughly proportional to n (and space roughly proportional to n²). As long as the CPU designer is willing to dedicate the logic, multiplication can be almost as fast as addition.
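The "three numbers to produce two" step is a carry-save adder: sum bits and carry bits are computed independently per bit position, with no carry rippling between columns. A toy sketch on Python ints (illustrative; real hardware does this with gates, and the function names are my own):

```python
def csa(a, b, c):
    """3:2 compressor: reduce three addends to two, with no carry propagation.

    Each output bit depends only on the three input bits in the same column,
    so in hardware this takes one full-adder delay regardless of width.
    Invariant: s + carry == a + b + c.
    """
    s = a ^ b ^ c                                  # per-column sum bits
    carry = ((a & b) | (b & c) | (a & c)) << 1     # per-column carry bits, shifted
    return s, carry

def reduce_addends(addends):
    """Wallace-style reduction: 3 -> 2 until two remain, then one real add."""
    while len(addends) > 2:
        a, b, c, *rest = addends
        addends = list(csa(a, b, c)) + rest
    return addends[0] + addends[1]   # the only carry-propagating addition

print(reduce_addends([5, 9, 12, 30]))   # 56
```

Feeding a multiplier's partial products through `reduce_addends` is exactly the log(n)-depth reduction described above.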





In long division, we need to know whether each subtraction overflowed before we can decide what inputs to use for the next one, so we can't apply the same parallelising tricks as we can with long multiplication.

There are methods of division that are faster than basic long division, but they are still slower than multiplication.

– plugwash (answered Nov 16 '18 at 23:09, edited Nov 27 '18 at 21:13)
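The serial dependency can be seen in a bit-at-a-time restoring-division sketch (illustrative, hypothetical function name): the outcome of each trial subtraction must be known before the next quotient bit can even be attempted.

```python
def restoring_divide(n, d, bits=8):
    """Restoring long division: one compare-and-subtract per quotient bit.

    The branch on `remainder >= d` is the overflow test described above --
    its result gates the next iteration, so the steps cannot overlap the
    way a multiplier's partial-product additions can.
    """
    quotient, remainder = 0, 0
    for i in reversed(range(bits)):                  # MSB first
        remainder = (remainder << 1) | ((n >> i) & 1)
        if remainder >= d:                           # must resolve before next bit
            remainder -= d
            quotient |= 1 << i
    return quotient, remainder

print(restoring_divide(200, 9))   # (22, 2)
```

This is why hardware dividers are typically iterative, producing only one or a few quotient bits per cycle.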





