splitting a node in b+ tree












6















I'm trying to figure out what exactly happens when there is a node overflow.
info:
in my b+ tree there are 4 pointers per block and 3 data sections .
problem:
I understood that when there is an overflow we split into 2 nodes in my case each with 2
keys,
and insert the parent node the mid value, without erasing from the son(unlike in b tree).



however I got into situation:



                                |21|30|50|

|10|20|-| |21|22|25| |30|40|-| |50|60|80|


and I want to insert the key 23
first I split |21|22|25| into: |21|22|-| and |23|25|-|
now I need to insert the key 23 to the parent |21|30|50| witch causes another split.
|21|23|-| and |30|50|-|
but where does the pointer before 30 points to?
is it possible that both this pointer & the one after 23 point to |23|25|-|
?










share|improve this question





























    6















    I'm trying to figure out what exactly happens when there is a node overflow.
    info:
    in my b+ tree there are 4 pointers per block and 3 data sections .
    problem:
    I understood that when there is an overflow we split into 2 nodes in my case each with 2
    keys,
    and insert the parent node the mid value, without erasing from the son(unlike in b tree).



    however I got into situation:



                                    |21|30|50|

    |10|20|-| |21|22|25| |30|40|-| |50|60|80|


    and I want to insert the key 23
    first I split |21|22|25| into: |21|22|-| and |23|25|-|
    now I need to insert the key 23 to the parent |21|30|50| witch causes another split.
    |21|23|-| and |30|50|-|
    but where does the pointer before 30 points to?
    is it possible that both this pointer & the one after 23 point to |23|25|-|
    ?










    share|improve this question



























      6












      6








      6


      2






      I'm trying to figure out what exactly happens when there is a node overflow.
      info:
      in my b+ tree there are 4 pointers per block and 3 data sections .
      problem:
      I understood that when there is an overflow we split into 2 nodes in my case each with 2
      keys,
      and insert the parent node the mid value, without erasing from the son(unlike in b tree).



      however I got into situation:



                                      |21|30|50|

      |10|20|-| |21|22|25| |30|40|-| |50|60|80|


      and I want to insert the key 23
      first I split |21|22|25| into: |21|22|-| and |23|25|-|
      now I need to insert the key 23 to the parent |21|30|50| witch causes another split.
      |21|23|-| and |30|50|-|
      but where does the pointer before 30 points to?
      is it possible that both this pointer & the one after 23 point to |23|25|-|
      ?










      share|improve this question
















      I'm trying to figure out what exactly happens when there is a node overflow.
      info:
      in my b+ tree there are 4 pointers per block and 3 data sections .
      problem:
      I understood that when there is an overflow we split into 2 nodes in my case each with 2
      keys,
      and insert the parent node the mid value, without erasing from the son(unlike in b tree).



      however I got into situation:



                                      |21|30|50|

      |10|20|-| |21|22|25| |30|40|-| |50|60|80|


      and I want to insert the key 23
      first I split |21|22|25| into: |21|22|-| and |23|25|-|
      now I need to insert the key 23 to the parent |21|30|50| witch causes another split.
      |21|23|-| and |30|50|-|
      but where does the pointer before 30 points to?
      is it possible that both this pointer & the one after 23 point to |23|25|-|
      ?







      tree b-tree multiway-tree






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited Nov 16 '18 at 4:57









      Cœur

      19.1k9114155




      19.1k9114155










      asked Jun 11 '11 at 7:48









      farchyfarchy

      3112




      3112
























          4 Answers
          4






          active

          oldest

          votes


















          2














          When inserting 23:




          • as you said, 21|22|-| and |23|25|-| are created

          • the 2 nodes require a parent

          • a parent is created in the the root node: |21|23|30|50|

          • the root has too many elements now

          • split the root in 2 nodes |21|23|- and |30|50|-

          • add a new parent for the 2 new nodes (which happens to be the new root of the tree)


          Basically, that insert will increase the depth of the tree by 1






          share|improve this answer
























          • thnx for the quick answer,but the question is: what happen to the pointers as mentioned. could it be that 2 pointers point to the same node?

            – farchy
            Jun 11 '11 at 8:00













          • somebody answered that the pointer right to 23 key would point to the newly created node |30|50|-|. for some reason it was deleted, is this true for a non-leaf node???

            – farchy
            Jun 11 '11 at 8:50





















          0














          Here are how pointers should be handled. this is your B+ tree before insertion. (some padding used to make pointers easier to see)



                          [21             30           50]
          /
          [10|20|-] => [21|22|25] => [30|40|-] => [50|60|80]


          After inserting 23 you will have 2 nodes. Its important to know that left split should be always same instance and right split should be a fresh node. this will make handling pointers somewhat easier.



          So splits should look like this.



            old          fresh 
          [21|22|-] => [23|25|-]


          since left node is same instance, key 21 at root has the correct right pointer. Here is what nodes look like before splitting root and after splitting leaf. we are in middle of process.



                          [21                             30           50]
          /
          [10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]


          only a new item with key 23 should be added next to 21 in root. since root has no space it will also split. middle key is 23 (first item of right split).



                          [21                          30]          [ 50 ]
          / *
          [10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]


          Because root is split, we must add our middle key to left or right splits and find a new middle key to promote. notice the null pointer at right split. because that node is fresh, it must be initialized soon!



          (root was split from right meaning that less items lay in right node in case of having odd amount of items. but in general it doesn't matter which way you split.)



          our middle key was 23. so lets add 23.



                          [21            23             30]          [ 50 ]
          / *
          [10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]


          Good! If there was no split here, our insertion would be finished by now. but since there is split at this level, we must find new middle key to promote. also don't forget null pointer for fresh node.



          (In case of splitting leafs, you must keep key-values at leafs and promote copy of middle key, but in case of splitting internal nodes you can safely move and promote keys up.)



          here new middle key is 30. lets pop it from left node.



           30

          [30|40|-]


          (Its important how to choose middle key. Always a node that gets the middle key from bottom splits, should drop one item for a new middle key. if that node is left node, drop last item from it, otherwise drop first item.)



                          [21            23]            30           [ 50 ]
          / *
          [10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]


          Notice that 30 is not inside any of splits. that's our middle key we have to handle. Always give right node of middle key to left of fresh node.



                          [21            23]       30            [50]
          / * /
          [10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]


          Then right pointer of middle key will point to fresh node on right split. finally middle key is promoted, new root with left of left split and right of right split and key of middle key is created.



                                             [       30        ]
          /
          [21 23] [50]
          / /
          [10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]


          Congrats! you have done your insertion. it may look easy on paper, but I have to admit its a bit of challenge when coding it.





          There are couple of ways to represent these key-node items in memory. my approach is each key-node item has right pointer with a key. so each internal node will have array of key-nodes. this way, keys and node pointers are always kept together. left most child is handled by a separate left pointer, this pointer is kept on each internal node and is never null (after complete operations).






          share|improve this answer

































            -1














            The problem you have to understand what happens is because of the bad representation model you are using. You should have one data value associated with a pointer and the invariant that the data value is the smallest value in the subtree referenced by the pointer.



            So this is how you should represent the b-tree node prior of insertion.



                                     10|21|30|50|                       <--- root node

            10|20| 21|22|25| 30|40| 50|60|80| <--- leaf nodes


            In this representation the pointer after the value 10 in the root node points to the leaf node with the first value 10, etc.



            When you insert 23, it is inserted in the leaf node with first value of 21. This yields the following tree and no need to split.



                                     10|21|30|50|                       <--- root node

            10|20| 21|22|23|25| 30|40| 50|60|80| <--- leaf nodes


            When inserting 24 which produces the effect you described, what you get is the following



                                          10|30|                         <--- new root node

            10|21|24| 30|50| <--- intermediate nodes

            10|20| 21|22|23| 24|25| 30|40| 50|60|80| <--- leaf nodes


            As you see there is no ambiguity anymore. The leaf node was split, and the key pointer pair 24| had to be inserted in the root node. Since it was full, it had to be split. Now that there are 5 values, you get one node with 3 values and one node with 2. You are free to choose if the left node or the right node get the three value. What is important is that the fundamental invariant is preserved. Each key value in a node is the smallest element in the subtree referenced by its associated pointer. A new root node had to be added and its key value pointer set should now be obvious. No ambiguity.



            This also makes obvious that many optimization strategies are possible. Instead of splitting the root node, we could have moved the value 21 into the left leaf node which was not full. The one with the first value 10. This avoids the need to split the root node and keeps the b-tree hight unchanged and yields a better filling of the b-tree. So you minimize space and search time. But this implies one can efficiently check if sideway balancing is feasible. A change of key value in the root node is still needed. etc.



            As you see, the value 10 in your b-tree is not really required in none leaf nodes and it is often omitted in b-tree representation (i.e. wikipedia) but this can make it confusing and was probably the reason you where. :)






            share|improve this answer


























            • I believe there is mistake in your model. The first representation has 4 values which imply that 5 child nodes exist from root. I think you actually consider the values represented as the values of the pointers but the author of the initial question represented the values IN the node. So, the first model in this question has 3 values in the root node and 4 pointers, not represented, to leaf nodes.

              – Victor Hurdugaci
              Jun 11 '11 at 16:35











            • This is not a mistake because it is a b+tree and not a b-tree. In a b+tree each key/data pair is stored in the leaf. In a b-tree, key/data pairs are stored in leafs and in upper level nodes. What is shown in the question is a b+tree, and this is also what I show. As I said, one may drop one of the key/data pair because it is not required to be known to find the key, but this only works in some nodes. The invariant I described, makes it all very simple and removes all ambiguity.

              – chmike
              Jun 13 '11 at 9:59





















            -1














            From what I've been taught, it is ok for the last node to have one node less than n/2. So in your example, the top nodes will become:



                         |23|
            /
            |21|22|-| |25|-|-|


            I think it kind of make sense. If you think about it, you have 5 leave nodes, so you need 5 pointers from the upper level. This arrangement is the only way such that you can have 5 pointers, all other combinations would either overflow the node or create extra pointers.



            If the nodes were |21|22|-| and |23|25|-|, then the root node would be |23|-|-|. Then, it does not make sense to have 23 in the right node, because whatever that is in the right node must be equal or greater than 23 anyway!






            share|improve this answer























              Your Answer






              StackExchange.ifUsing("editor", function () {
              StackExchange.using("externalEditor", function () {
              StackExchange.using("snippets", function () {
              StackExchange.snippets.init();
              });
              });
              }, "code-snippets");

              StackExchange.ready(function() {
              var channelOptions = {
              tags: "".split(" "),
              id: "1"
              };
              initTagRenderer("".split(" "), "".split(" "), channelOptions);

              StackExchange.using("externalEditor", function() {
              // Have to fire editor after snippets, if snippets enabled
              if (StackExchange.settings.snippets.snippetsEnabled) {
              StackExchange.using("snippets", function() {
              createEditor();
              });
              }
              else {
              createEditor();
              }
              });

              function createEditor() {
              StackExchange.prepareEditor({
              heartbeatType: 'answer',
              autoActivateHeartbeat: false,
              convertImagesToLinks: true,
              noModals: true,
              showLowRepImageUploadWarning: true,
              reputationToPostImages: 10,
              bindNavPrevention: true,
              postfix: "",
              imageUploader: {
              brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
              contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
              allowUrls: true
              },
              onDemand: true,
              discardSelector: ".discard-answer"
              ,immediatelyShowMarkdownHelp:true
              });


              }
              });














              draft saved

              draft discarded


















              StackExchange.ready(
              function () {
              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f6314790%2fsplitting-a-node-in-b-tree%23new-answer', 'question_page');
              }
              );

              Post as a guest















              Required, but never shown

























              4 Answers
              4






              active

              oldest

              votes








              4 Answers
              4






              active

              oldest

              votes









              active

              oldest

              votes






              active

              oldest

              votes









              2














              When inserting 23:




              • as you said, 21|22|-| and |23|25|-| are created

              • the 2 nodes require a parent

              • a parent is created in the the root node: |21|23|30|50|

              • the root has too many elements now

              • split the root in 2 nodes |21|23|- and |30|50|-

              • add a new parent for the 2 new nodes (which happens to be the new root of the tree)


              Basically, that insert will increase the depth of the tree by 1






              share|improve this answer
























              • thnx for the quick answer,but the question is: what happen to the pointers as mentioned. could it be that 2 pointers point to the same node?

                – farchy
                Jun 11 '11 at 8:00













              • somebody answered that the pointer right to 23 key would point to the newly created node |30|50|-|. for some reason it was deleted, is this true for a non-leaf node???

                – farchy
                Jun 11 '11 at 8:50


















              2














              When inserting 23:




              • as you said, 21|22|-| and |23|25|-| are created

              • the 2 nodes require a parent

              • a parent is created in the the root node: |21|23|30|50|

              • the root has too many elements now

              • split the root in 2 nodes |21|23|- and |30|50|-

              • add a new parent for the 2 new nodes (which happens to be the new root of the tree)


              Basically, that insert will increase the depth of the tree by 1






              share|improve this answer
























              • thnx for the quick answer,but the question is: what happen to the pointers as mentioned. could it be that 2 pointers point to the same node?

                – farchy
                Jun 11 '11 at 8:00













              • somebody answered that the pointer right to 23 key would point to the newly created node |30|50|-|. for some reason it was deleted, is this true for a non-leaf node???

                – farchy
                Jun 11 '11 at 8:50
















              2












              2








              2







              When inserting 23:




              • as you said, 21|22|-| and |23|25|-| are created

              • the 2 nodes require a parent

              • a parent is created in the the root node: |21|23|30|50|

              • the root has too many elements now

              • split the root in 2 nodes |21|23|- and |30|50|-

              • add a new parent for the 2 new nodes (which happens to be the new root of the tree)


              Basically, that insert will increase the depth of the tree by 1






              share|improve this answer













              When inserting 23:




              • as you said, 21|22|-| and |23|25|-| are created

              • the 2 nodes require a parent

              • a parent is created in the the root node: |21|23|30|50|

              • the root has too many elements now

              • split the root in 2 nodes |21|23|- and |30|50|-

              • add a new parent for the 2 new nodes (which happens to be the new root of the tree)


              Basically, that insert will increase the depth of the tree by 1







              share|improve this answer












              share|improve this answer



              share|improve this answer










              answered Jun 11 '11 at 7:55









              Victor HurdugaciVictor Hurdugaci

              25.7k57499




              25.7k57499













              • thnx for the quick answer,but the question is: what happen to the pointers as mentioned. could it be that 2 pointers point to the same node?

                – farchy
                Jun 11 '11 at 8:00













              • somebody answered that the pointer right to 23 key would point to the newly created node |30|50|-|. for some reason it was deleted, is this true for a non-leaf node???

                – farchy
                Jun 11 '11 at 8:50





















              • thnx for the quick answer,but the question is: what happen to the pointers as mentioned. could it be that 2 pointers point to the same node?

                – farchy
                Jun 11 '11 at 8:00













              • somebody answered that the pointer right to 23 key would point to the newly created node |30|50|-|. for some reason it was deleted, is this true for a non-leaf node???

                – farchy
                Jun 11 '11 at 8:50



















              thnx for the quick answer,but the question is: what happen to the pointers as mentioned. could it be that 2 pointers point to the same node?

              – farchy
              Jun 11 '11 at 8:00







              thnx for the quick answer,but the question is: what happen to the pointers as mentioned. could it be that 2 pointers point to the same node?

              – farchy
              Jun 11 '11 at 8:00















              somebody answered that the pointer right to 23 key would point to the newly created node |30|50|-|. for some reason it was deleted, is this true for a non-leaf node???

              – farchy
              Jun 11 '11 at 8:50







              somebody answered that the pointer right to 23 key would point to the newly created node |30|50|-|. for some reason it was deleted, is this true for a non-leaf node???

              – farchy
              Jun 11 '11 at 8:50















              0














              Here are how pointers should be handled. this is your B+ tree before insertion. (some padding used to make pointers easier to see)



                              [21             30           50]
              /
              [10|20|-] => [21|22|25] => [30|40|-] => [50|60|80]


              After inserting 23 you will have 2 nodes. Its important to know that left split should be always same instance and right split should be a fresh node. this will make handling pointers somewhat easier.



              So splits should look like this.



                old          fresh 
              [21|22|-] => [23|25|-]


              since left node is same instance, key 21 at root has the correct right pointer. Here is what nodes look like before splitting root and after splitting leaf. we are in middle of process.



                              [21                             30           50]
              /
              [10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]


              only a new item with key 23 should be added next to 21 in root. since root has no space it will also split. middle key is 23 (first item of right split).



                              [21                          30]          [ 50 ]
              / *
              [10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]


              Because root is split, we must add our middle key to left or right splits and find a new middle key to promote. notice the null pointer at right split. because that node is fresh, it must be initialized soon!



              (root was split from right meaning that less items lay in right node in case of having odd amount of items. but in general it doesn't matter which way you split.)



              our middle key was 23. so lets add 23.



                              [21            23             30]          [ 50 ]
              / *
              [10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]


              Good! If there was no split here, our insertion would be finished by now. but since there is split at this level, we must find new middle key to promote. also don't forget null pointer for fresh node.



              (In case of splitting leafs, you must keep key-values at leafs and promote copy of middle key, but in case of splitting internal nodes you can safely move and promote keys up.)



              here new middle key is 30. lets pop it from left node.



               30

              [30|40|-]


              (Its important how to choose middle key. Always a node that gets the middle key from bottom splits, should drop one item for a new middle key. if that node is left node, drop last item from it, otherwise drop first item.)



                              [21            23]            30           [ 50 ]
              / *
              [10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]


              Notice that 30 is not inside any of splits. that's our middle key we have to handle. Always give right node of middle key to left of fresh node.



                              [21            23]       30            [50]
              / * /
              [10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]


              Then right pointer of middle key will point to fresh node on right split. finally middle key is promoted, new root with left of left split and right of right split and key of middle key is created.



                                                 [       30        ]
              /
              [21 23] [50]
              / /
              [10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]


              Congrats! you have done your insertion. it may look easy on paper, but I have to admit its a bit of challenge when coding it.





              There are couple of ways to represent these key-node items in memory. my approach is each key-node item has right pointer with a key. so each internal node will have array of key-nodes. this way, keys and node pointers are always kept together. left most child is handled by a separate left pointer, this pointer is kept on each internal node and is never null (after complete operations).






              share|improve this answer






























                0














                Here are how pointers should be handled. this is your B+ tree before insertion. (some padding used to make pointers easier to see)



                                [21             30           50]
                /
                [10|20|-] => [21|22|25] => [30|40|-] => [50|60|80]


                After inserting 23 you will have 2 nodes. Its important to know that left split should be always same instance and right split should be a fresh node. this will make handling pointers somewhat easier.



                So splits should look like this.



                  old          fresh 
                [21|22|-] => [23|25|-]


                since left node is same instance, key 21 at root has the correct right pointer. Here is what nodes look like before splitting root and after splitting leaf. we are in middle of process.



                                [21                             30           50]
                /
                [10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]


                only a new item with key 23 should be added next to 21 in root. since root has no space it will also split. middle key is 23 (first item of right split).



                                [21                          30]          [ 50 ]
                / *
                [10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]


                Because root is split, we must add our middle key to left or right splits and find a new middle key to promote. notice the null pointer at right split. because that node is fresh, it must be initialized soon!



                (root was split from right meaning that less items lay in right node in case of having odd amount of items. but in general it doesn't matter which way you split.)



                our middle key was 23. so lets add 23.



                                [21            23             30]          [ 50 ]
                / *
                [10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]


                Good! If there was no split here, our insertion would be finished by now. but since there is split at this level, we must find new middle key to promote. also don't forget null pointer for fresh node.



                (In case of splitting leafs, you must keep key-values at leafs and promote copy of middle key, but in case of splitting internal nodes you can safely move and promote keys up.)



                here new middle key is 30. lets pop it from left node.



                 30

                [30|40|-]


                (Its important how to choose middle key. Always a node that gets the middle key from bottom splits, should drop one item for a new middle key. if that node is left node, drop last item from it, otherwise drop first item.)



                                [21            23]            30           [ 50 ]
                / *
                [10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]


                Notice that 30 is not inside any of splits. that's our middle key we have to handle. Always give right node of middle key to left of fresh node.



                                [21            23]       30            [50]
                / * /
                [10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]


                Then right pointer of middle key will point to fresh node on right split. finally middle key is promoted, new root with left of left split and right of right split and key of middle key is created.



                                                   [       30        ]
                /
                [21 23] [50]
                / /
                [10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]


                Congrats! you have done your insertion. it may look easy on paper, but I have to admit its a bit of challenge when coding it.





                There are couple of ways to represent these key-node items in memory. my approach is each key-node item has right pointer with a key. so each internal node will have array of key-nodes. this way, keys and node pointers are always kept together. left most child is handled by a separate left pointer, this pointer is kept on each internal node and is never null (after complete operations).






                share|improve this answer




























                  0












                  0








                  0







                  Here are how pointers should be handled. this is your B+ tree before insertion. (some padding used to make pointers easier to see)



                                  [21             30           50]
                  /
                  [10|20|-] => [21|22|25] => [30|40|-] => [50|60|80]


                  After inserting 23 you will have 2 nodes. Its important to know that left split should be always same instance and right split should be a fresh node. this will make handling pointers somewhat easier.



                  So splits should look like this.



                    old          fresh 
                  [21|22|-] => [23|25|-]


                  since left node is same instance, key 21 at root has the correct right pointer. Here is what nodes look like before splitting root and after splitting leaf. we are in middle of process.



                                  [21                             30           50]
                  /
                  [10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]


                  only a new item with key 23 should be added next to 21 in root. since root has no space it will also split. middle key is 23 (first item of right split).



                                  [21                          30]          [ 50 ]
                  / *
                  [10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]


                  Because root is split, we must add our middle key to left or right splits and find a new middle key to promote. notice the null pointer at right split. because that node is fresh, it must be initialized soon!



                  (root was split from right meaning that less items lay in right node in case of having odd amount of items. but in general it doesn't matter which way you split.)



                  our middle key was 23. so lets add 23.



                                  [21            23             30]          [ 50 ]
                  / *
                  [10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]


                  Good! If there was no split here, our insertion would be finished by now. but since there is split at this level, we must find new middle key to promote. also don't forget null pointer for fresh node.



                  (In case of splitting leafs, you must keep key-values at leafs and promote copy of middle key, but in case of splitting internal nodes you can safely move and promote keys up.)



                  here new middle key is 30. lets pop it from left node.



                   30

                  [30|40|-]


                  (Its important how to choose middle key. Always a node that gets the middle key from bottom splits, should drop one item for a new middle key. if that node is left node, drop last item from it, otherwise drop first item.)



                                  [21            23]            30           [ 50 ]
                  / *
                  [10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]


                  Notice that 30 is not inside any of splits. that's our middle key we have to handle. Always give right node of middle key to left of fresh node.



                                  [21            23]       30            [50]
                  / * /
                  [10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]


                  Then right pointer of middle key will point to fresh node on right split. finally middle key is promoted, new root with left of left split and right of right split and key of middle key is created.



                                                     [       30        ]
                  /
                  [21 23] [50]
                  / /
                  [10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]


                  Congrats! you have done your insertion. it may look easy on paper, but I have to admit its a bit of challenge when coding it.





                  There are couple of ways to represent these key-node items in memory. my approach is each key-node item has right pointer with a key. so each internal node will have array of key-nodes. this way, keys and node pointers are always kept together. left most child is handled by a separate left pointer, this pointer is kept on each internal node and is never null (after complete operations).






                  share|improve this answer















                  Here are how pointers should be handled. this is your B+ tree before insertion. (some padding used to make pointers easier to see)



                                  [21             30           50]
                  /
                  [10|20|-] => [21|22|25] => [30|40|-] => [50|60|80]


                  After inserting 23 you will have 2 nodes. Its important to know that left split should be always same instance and right split should be a fresh node. this will make handling pointers somewhat easier.



                  So splits should look like this.



                    old          fresh 
                  [21|22|-] => [23|25|-]


                  since left node is same instance, key 21 at root has the correct right pointer. Here is what nodes look like before splitting root and after splitting leaf. we are in middle of process.



                                  [21                             30           50]
                  /
                  [10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]


                  only a new item with key 23 should be added next to 21 in root. since root has no space it will also split. middle key is 23 (first item of right split).



                                  [21                          30]          [ 50 ]
                  / *
                  [10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]


                  Because root is split, we must add our middle key to left or right splits and find a new middle key to promote. notice the null pointer at right split. because that node is fresh, it must be initialized soon!



                  (root was split from right meaning that less items lay in right node in case of having odd amount of items. but in general it doesn't matter which way you split.)



                  our middle key was 23. so lets add 23.



                                  [21            23             30]          [ 50 ]
                  / *
                  [10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]


                  Good! If there was no split here, our insertion would be finished by now. but since there is split at this level, we must find new middle key to promote. also don't forget null pointer for fresh node.



                  (In case of splitting leafs, you must keep key-values at leafs and promote copy of middle key, but in case of splitting internal nodes you can safely move and promote keys up.)



                  here new middle key is 30. lets pop it from left node.



                   30

                  [30|40|-]


                  (Its important how to choose middle key. Always a node that gets the middle key from bottom splits, should drop one item for a new middle key. if that node is left node, drop last item from it, otherwise drop first item.)



                                  [21            23]            30           [ 50 ]
                  / *
                  [10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]


                  Notice that 30 is not inside any of splits. that's our middle key we have to handle. Always give right node of middle key to left of fresh node.



                                  [21            23]       30            [50]
                  / * /
                  [10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]


                  Then right pointer of middle key will point to fresh node on right split. finally middle key is promoted, new root with left of left split and right of right split and key of middle key is created.



                                                     [       30        ]
                  /
                  [21 23] [50]
                  / /
                  [10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]


                  Congrats! you have done your insertion. it may look easy on paper, but I have to admit its a bit of challenge when coding it.





                  There are couple of ways to represent these key-node items in memory. my approach is each key-node item has right pointer with a key. so each internal node will have array of key-nodes. this way, keys and node pointers are always kept together. left most child is handled by a separate left pointer, this pointer is kept on each internal node and is never null (after complete operations).







                  share|improve this answer














                  share|improve this answer



                  share|improve this answer








                  edited Jan 9 '18 at 17:12

























                  answered Jan 9 '18 at 16:59









                  M.kazem AkhgaryM.kazem Akhgary

                  12.2k53574




                  12.2k53574























                      -1














                      The problem you have to understand what happens is because of the bad representation model you are using. You should have one data value associated with a pointer and the invariant that the data value is the smallest value in the subtree referenced by the pointer.



                      So this is how you should represent the b-tree node prior of insertion.



                                               10|21|30|50|                       <--- root node

                      10|20| 21|22|25| 30|40| 50|60|80| <--- leaf nodes


                      In this representation the pointer after the value 10 in the root node points to the leaf node with the first value 10, etc.



                      When you insert 23, it is inserted in the leaf node with first value of 21. This yields the following tree and no need to split.



                                               10|21|30|50|                       <--- root node

                      10|20| 21|22|23|25| 30|40| 50|60|80| <--- leaf nodes


                      When inserting 24 which produces the effect you described, what you get is the following



                                                    10|30|                         <--- new root node

                      10|21|24| 30|50| <--- intermediate nodes

                      10|20| 21|22|23| 24|25| 30|40| 50|60|80| <--- leaf nodes


                      As you see there is no ambiguity anymore. The leaf node was split, and the key pointer pair 24| had to be inserted in the root node. Since it was full, it had to be split. Now that there are 5 values, you get one node with 3 values and one node with 2. You are free to choose if the left node or the right node get the three value. What is important is that the fundamental invariant is preserved. Each key value in a node is the smallest element in the subtree referenced by its associated pointer. A new root node had to be added and its key value pointer set should now be obvious. No ambiguity.



                      This also makes obvious that many optimization strategies are possible. Instead of splitting the root node, we could have moved the value 21 into the left leaf node which was not full. The one with the first value 10. This avoids the need to split the root node and keeps the b-tree hight unchanged and yields a better filling of the b-tree. So you minimize space and search time. But this implies one can efficiently check if sideway balancing is feasible. A change of key value in the root node is still needed. etc.



                      As you see, the value 10 in your b-tree is not really required in none leaf nodes and it is often omitted in b-tree representation (i.e. wikipedia) but this can make it confusing and was probably the reason you where. :)






                      share|improve this answer


























                      • I believe there is mistake in your model. The first representation has 4 values which imply that 5 child nodes exist from root. I think you actually consider the values represented as the values of the pointers but the author of the initial question represented the values IN the node. So, the first model in this question has 3 values in the root node and 4 pointers, not represented, to leaf nodes.

                        – Victor Hurdugaci
                        Jun 11 '11 at 16:35











                      • This is not a mistake because it is a b+tree and not a b-tree. In a b+tree each key/data pair is stored in the leaf. In a b-tree, key/data pairs are stored in leafs and in upper level nodes. What is shown in the question is a b+tree, and this is also what I show. As I said, one may drop one of the key/data pair because it is not required to be known to find the key, but this only works in some nodes. The invariant I described, makes it all very simple and removes all ambiguity.

                        – chmike
                        Jun 13 '11 at 9:59


















                      -1














                      The problem you have to understand what happens is because of the bad representation model you are using. You should have one data value associated with a pointer and the invariant that the data value is the smallest value in the subtree referenced by the pointer.



                      So this is how you should represent the b-tree node prior of insertion.



                                               10|21|30|50|                       <--- root node

                      10|20| 21|22|25| 30|40| 50|60|80| <--- leaf nodes


                      In this representation the pointer after the value 10 in the root node points to the leaf node with the first value 10, etc.



                      When you insert 23, it is inserted in the leaf node with first value of 21. This yields the following tree and no need to split.



                                               10|21|30|50|                       <--- root node

                      10|20| 21|22|23|25| 30|40| 50|60|80| <--- leaf nodes


                      When inserting 24 which produces the effect you described, what you get is the following



                                                    10|30|                         <--- new root node

                      10|21|24| 30|50| <--- intermediate nodes

                      10|20| 21|22|23| 24|25| 30|40| 50|60|80| <--- leaf nodes


                      As you see there is no ambiguity anymore. The leaf node was split, and the key pointer pair 24| had to be inserted in the root node. Since it was full, it had to be split. Now that there are 5 values, you get one node with 3 values and one node with 2. You are free to choose if the left node or the right node get the three value. What is important is that the fundamental invariant is preserved. Each key value in a node is the smallest element in the subtree referenced by its associated pointer. A new root node had to be added and its key value pointer set should now be obvious. No ambiguity.



                      This also makes obvious that many optimization strategies are possible. Instead of splitting the root node, we could have moved the value 21 into the left leaf node which was not full. The one with the first value 10. This avoids the need to split the root node and keeps the b-tree hight unchanged and yields a better filling of the b-tree. So you minimize space and search time. But this implies one can efficiently check if sideway balancing is feasible. A change of key value in the root node is still needed. etc.



                      As you see, the value 10 in your b-tree is not really required in none leaf nodes and it is often omitted in b-tree representation (i.e. wikipedia) but this can make it confusing and was probably the reason you where. :)






                      share|improve this answer


























                      • I believe there is mistake in your model. The first representation has 4 values which imply that 5 child nodes exist from root. I think you actually consider the values represented as the values of the pointers but the author of the initial question represented the values IN the node. So, the first model in this question has 3 values in the root node and 4 pointers, not represented, to leaf nodes.

                        – Victor Hurdugaci
                        Jun 11 '11 at 16:35











                      • This is not a mistake because it is a b+tree and not a b-tree. In a b+tree each key/data pair is stored in the leaf. In a b-tree, key/data pairs are stored in leafs and in upper level nodes. What is shown in the question is a b+tree, and this is also what I show. As I said, one may drop one of the key/data pair because it is not required to be known to find the key, but this only works in some nodes. The invariant I described, makes it all very simple and removes all ambiguity.

                        – chmike
                        Jun 13 '11 at 9:59
















                      -1












                      -1








                      -1







                      The problem you have to understand what happens is because of the bad representation model you are using. You should have one data value associated with a pointer and the invariant that the data value is the smallest value in the subtree referenced by the pointer.



                      So this is how you should represent the b-tree node prior of insertion.



                                               10|21|30|50|                       <--- root node

                      10|20| 21|22|25| 30|40| 50|60|80| <--- leaf nodes


                      In this representation the pointer after the value 10 in the root node points to the leaf node with the first value 10, etc.



                      When you insert 23, it is inserted in the leaf node with first value of 21. This yields the following tree and no need to split.



                                               10|21|30|50|                       <--- root node

                      10|20| 21|22|23|25| 30|40| 50|60|80| <--- leaf nodes


                      When inserting 24 which produces the effect you described, what you get is the following



                                                    10|30|                         <--- new root node

                      10|21|24| 30|50| <--- intermediate nodes

                      10|20| 21|22|23| 24|25| 30|40| 50|60|80| <--- leaf nodes


                      As you see there is no ambiguity anymore. The leaf node was split, and the key pointer pair 24| had to be inserted in the root node. Since it was full, it had to be split. Now that there are 5 values, you get one node with 3 values and one node with 2. You are free to choose if the left node or the right node get the three value. What is important is that the fundamental invariant is preserved. Each key value in a node is the smallest element in the subtree referenced by its associated pointer. A new root node had to be added and its key value pointer set should now be obvious. No ambiguity.



                      This also makes obvious that many optimization strategies are possible. Instead of splitting the root node, we could have moved the value 21 into the left leaf node which was not full. The one with the first value 10. This avoids the need to split the root node and keeps the b-tree hight unchanged and yields a better filling of the b-tree. So you minimize space and search time. But this implies one can efficiently check if sideway balancing is feasible. A change of key value in the root node is still needed. etc.



                      As you see, the value 10 in your b-tree is not really required in none leaf nodes and it is often omitted in b-tree representation (i.e. wikipedia) but this can make it confusing and was probably the reason you where. :)






                      share|improve this answer















                      The problem you have to understand what happens is because of the bad representation model you are using. You should have one data value associated with a pointer and the invariant that the data value is the smallest value in the subtree referenced by the pointer.



                      So this is how you should represent the b-tree node prior of insertion.



                                               10|21|30|50|                       <--- root node

                      10|20| 21|22|25| 30|40| 50|60|80| <--- leaf nodes


                      In this representation the pointer after the value 10 in the root node points to the leaf node with the first value 10, etc.



                      When you insert 23, it is inserted in the leaf node with first value of 21. This yields the following tree and no need to split.



                                               10|21|30|50|                       <--- root node

                      10|20| 21|22|23|25| 30|40| 50|60|80| <--- leaf nodes


                      When inserting 24 which produces the effect you described, what you get is the following



                                                    10|30|                         <--- new root node

                      10|21|24| 30|50| <--- intermediate nodes

                      10|20| 21|22|23| 24|25| 30|40| 50|60|80| <--- leaf nodes


                      As you see there is no ambiguity anymore. The leaf node was split, and the key pointer pair 24| had to be inserted in the root node. Since it was full, it had to be split. Now that there are 5 values, you get one node with 3 values and one node with 2. You are free to choose if the left node or the right node get the three value. What is important is that the fundamental invariant is preserved. Each key value in a node is the smallest element in the subtree referenced by its associated pointer. A new root node had to be added and its key value pointer set should now be obvious. No ambiguity.



                      This also makes obvious that many optimization strategies are possible. Instead of splitting the root node, we could have moved the value 21 into the left leaf node which was not full. The one with the first value 10. This avoids the need to split the root node and keeps the b-tree hight unchanged and yields a better filling of the b-tree. So you minimize space and search time. But this implies one can efficiently check if sideway balancing is feasible. A change of key value in the root node is still needed. etc.



                      As you see, the value 10 in your b-tree is not really required in none leaf nodes and it is often omitted in b-tree representation (i.e. wikipedia) but this can make it confusing and was probably the reason you where. :)







                      share|improve this answer














                      share|improve this answer



                      share|improve this answer








                      edited Jun 11 '11 at 9:54

























                      answered Jun 11 '11 at 9:16









                      chmikechmike

                      8,212155372




                      8,212155372













                      • I believe there is mistake in your model. The first representation has 4 values which imply that 5 child nodes exist from root. I think you actually consider the values represented as the values of the pointers but the author of the initial question represented the values IN the node. So, the first model in this question has 3 values in the root node and 4 pointers, not represented, to leaf nodes.

                        – Victor Hurdugaci
                        Jun 11 '11 at 16:35











                      • This is not a mistake because it is a b+tree and not a b-tree. In a b+tree each key/data pair is stored in the leaf. In a b-tree, key/data pairs are stored in leafs and in upper level nodes. What is shown in the question is a b+tree, and this is also what I show. As I said, one may drop one of the key/data pair because it is not required to be known to find the key, but this only works in some nodes. The invariant I described, makes it all very simple and removes all ambiguity.

                        – chmike
                        Jun 13 '11 at 9:59





















                      • I believe there is mistake in your model. The first representation has 4 values which imply that 5 child nodes exist from root. I think you actually consider the values represented as the values of the pointers but the author of the initial question represented the values IN the node. So, the first model in this question has 3 values in the root node and 4 pointers, not represented, to leaf nodes.

                        – Victor Hurdugaci
                        Jun 11 '11 at 16:35











                      • This is not a mistake because it is a b+tree and not a b-tree. In a b+tree each key/data pair is stored in the leaf. In a b-tree, key/data pairs are stored in leafs and in upper level nodes. What is shown in the question is a b+tree, and this is also what I show. As I said, one may drop one of the key/data pair because it is not required to be known to find the key, but this only works in some nodes. The invariant I described, makes it all very simple and removes all ambiguity.

                        – chmike
                        Jun 13 '11 at 9:59



















                      I believe there is mistake in your model. The first representation has 4 values which imply that 5 child nodes exist from root. I think you actually consider the values represented as the values of the pointers but the author of the initial question represented the values IN the node. So, the first model in this question has 3 values in the root node and 4 pointers, not represented, to leaf nodes.

                      – Victor Hurdugaci
                      Jun 11 '11 at 16:35





                      I believe there is mistake in your model. The first representation has 4 values which imply that 5 child nodes exist from root. I think you actually consider the values represented as the values of the pointers but the author of the initial question represented the values IN the node. So, the first model in this question has 3 values in the root node and 4 pointers, not represented, to leaf nodes.

                      – Victor Hurdugaci
                      Jun 11 '11 at 16:35













                      This is not a mistake because it is a b+tree and not a b-tree. In a b+tree each key/data pair is stored in the leaf. In a b-tree, key/data pairs are stored in leafs and in upper level nodes. What is shown in the question is a b+tree, and this is also what I show. As I said, one may drop one of the key/data pair because it is not required to be known to find the key, but this only works in some nodes. The invariant I described, makes it all very simple and removes all ambiguity.

                      – chmike
                      Jun 13 '11 at 9:59







                      This is not a mistake because it is a b+tree and not a b-tree. In a b+tree each key/data pair is stored in the leaf. In a b-tree, key/data pairs are stored in leafs and in upper level nodes. What is shown in the question is a b+tree, and this is also what I show. As I said, one may drop one of the key/data pair because it is not required to be known to find the key, but this only works in some nodes. The invariant I described, makes it all very simple and removes all ambiguity.

                      – chmike
                      Jun 13 '11 at 9:59













                      -1














                      From what I've been taught, it is ok for the last node to have one node less than n/2. So in your example, the top nodes will become:



                                   |23|
                      /
                      |21|22|-| |25|-|-|


                      I think it kind of make sense. If you think about it, you have 5 leave nodes, so you need 5 pointers from the upper level. This arrangement is the only way such that you can have 5 pointers, all other combinations would either overflow the node or create extra pointers.



                      If the nodes were |21|22|-| and |23|25|-|, then the root node would be |23|-|-|. Then, it does not make sense to have 23 in the right node, because whatever that is in the right node must be equal or greater than 23 anyway!






                      share|improve this answer




























                        -1














                        From what I've been taught, it is ok for the last node to have one node less than n/2. So in your example, the top nodes will become:



                                     |23|
                        /
                        |21|22|-| |25|-|-|


                        I think it kind of make sense. If you think about it, you have 5 leave nodes, so you need 5 pointers from the upper level. This arrangement is the only way such that you can have 5 pointers, all other combinations would either overflow the node or create extra pointers.



                        If the nodes were |21|22|-| and |23|25|-|, then the root node would be |23|-|-|. Then, it does not make sense to have 23 in the right node, because whatever that is in the right node must be equal or greater than 23 anyway!






                        share|improve this answer


























                          -1












                          -1








                          -1







                          From what I've been taught, it is ok for the last node to have one node less than n/2. So in your example, the top nodes will become:



                                       |23|
                          /
                          |21|22|-| |25|-|-|


                          I think it kind of make sense. If you think about it, you have 5 leave nodes, so you need 5 pointers from the upper level. This arrangement is the only way such that you can have 5 pointers, all other combinations would either overflow the node or create extra pointers.



                          If the nodes were |21|22|-| and |23|25|-|, then the root node would be |23|-|-|. Then, it does not make sense to have 23 in the right node, because whatever that is in the right node must be equal or greater than 23 anyway!






                          share|improve this answer













                          From what I've been taught, it is ok for the last node to have one node less than n/2. So in your example, the top nodes will become:



                                       |23|
                          /
                          |21|22|-| |25|-|-|


                          I think it kind of make sense. If you think about it, you have 5 leave nodes, so you need 5 pointers from the upper level. This arrangement is the only way such that you can have 5 pointers, all other combinations would either overflow the node or create extra pointers.



                          If the nodes were |21|22|-| and |23|25|-|, then the root node would be |23|-|-|. Then, it does not make sense to have 23 in the right node, because whatever that is in the right node must be equal or greater than 23 anyway!







                          share|improve this answer












                          share|improve this answer



                          share|improve this answer










                          answered May 1 '13 at 12:07









                          kevinkevin

                          1,6091519




                          1,6091519






























                              draft saved

                              draft discarded




















































                              Thanks for contributing an answer to Stack Overflow!


                              • Please be sure to answer the question. Provide details and share your research!

                              But avoid



                              • Asking for help, clarification, or responding to other answers.

                              • Making statements based on opinion; back them up with references or personal experience.


                              To learn more, see our tips on writing great answers.




                              draft saved


                              draft discarded














                              StackExchange.ready(
                              function () {
                              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f6314790%2fsplitting-a-node-in-b-tree%23new-answer', 'question_page');
                              }
                              );

                              Post as a guest















                              Required, but never shown





















































                              Required, but never shown














                              Required, but never shown












                              Required, but never shown







                              Required, but never shown

































                              Required, but never shown














                              Required, but never shown












                              Required, but never shown







                              Required, but never shown







                              Popular posts from this blog

                              Xamarin.iOS Cant Deploy on Iphone

                              Glorious Revolution

                              Dulmage-Mendelsohn matrix decomposition in Python