splitting a node in b+ tree
I'm trying to figure out what exactly happens when there is a node overflow.
info:
in my b+ tree there are 4 pointers per block and 3 data sections .
problem:
I understood that when there is an overflow we split into 2 nodes in my case each with 2
keys,
and insert the parent node the mid value, without erasing from the son(unlike in b tree).
however I got into situation:
|21|30|50|
|10|20|-| |21|22|25| |30|40|-| |50|60|80|
and I want to insert the key 23
first I split |21|22|25| into: |21|22|-| and |23|25|-|
now I need to insert the key 23 to the parent |21|30|50| witch causes another split.
|21|23|-| and |30|50|-|
but where does the pointer before 30 points to?
is it possible that both this pointer & the one after 23 point to |23|25|-|
?
tree b-tree multiway-tree
add a comment |
I'm trying to figure out what exactly happens when there is a node overflow.
info:
in my b+ tree there are 4 pointers per block and 3 data sections .
problem:
I understood that when there is an overflow we split into 2 nodes in my case each with 2
keys,
and insert the parent node the mid value, without erasing from the son(unlike in b tree).
however I got into situation:
|21|30|50|
|10|20|-| |21|22|25| |30|40|-| |50|60|80|
and I want to insert the key 23
first I split |21|22|25| into: |21|22|-| and |23|25|-|
now I need to insert the key 23 to the parent |21|30|50| witch causes another split.
|21|23|-| and |30|50|-|
but where does the pointer before 30 points to?
is it possible that both this pointer & the one after 23 point to |23|25|-|
?
tree b-tree multiway-tree
add a comment |
I'm trying to figure out what exactly happens when there is a node overflow.
info:
in my b+ tree there are 4 pointers per block and 3 data sections .
problem:
I understood that when there is an overflow we split into 2 nodes in my case each with 2
keys,
and insert the parent node the mid value, without erasing from the son(unlike in b tree).
however I got into situation:
|21|30|50|
|10|20|-| |21|22|25| |30|40|-| |50|60|80|
and I want to insert the key 23
first I split |21|22|25| into: |21|22|-| and |23|25|-|
now I need to insert the key 23 to the parent |21|30|50| witch causes another split.
|21|23|-| and |30|50|-|
but where does the pointer before 30 points to?
is it possible that both this pointer & the one after 23 point to |23|25|-|
?
tree b-tree multiway-tree
I'm trying to figure out what exactly happens when there is a node overflow.
info:
in my b+ tree there are 4 pointers per block and 3 data sections .
problem:
I understood that when there is an overflow we split into 2 nodes in my case each with 2
keys,
and insert the parent node the mid value, without erasing from the son(unlike in b tree).
however I got into situation:
|21|30|50|
|10|20|-| |21|22|25| |30|40|-| |50|60|80|
and I want to insert the key 23
first I split |21|22|25| into: |21|22|-| and |23|25|-|
now I need to insert the key 23 to the parent |21|30|50| witch causes another split.
|21|23|-| and |30|50|-|
but where does the pointer before 30 points to?
is it possible that both this pointer & the one after 23 point to |23|25|-|
?
tree b-tree multiway-tree
tree b-tree multiway-tree
edited Nov 16 '18 at 4:57
Cœur
19.1k9114155
19.1k9114155
asked Jun 11 '11 at 7:48
farchyfarchy
3112
3112
add a comment |
add a comment |
4 Answers
4
active
oldest
votes
When inserting 23:
- as you said, 21|22|-| and |23|25|-| are created
- the 2 nodes require a parent
- a parent is created in the the root node: |21|23|30|50|
- the root has too many elements now
- split the root in 2 nodes |21|23|- and |30|50|-
- add a new parent for the 2 new nodes (which happens to be the new root of the tree)
Basically, that insert will increase the depth of the tree by 1
thnx for the quick answer,but the question is: what happen to the pointers as mentioned. could it be that 2 pointers point to the same node?
– farchy
Jun 11 '11 at 8:00
somebody answered that the pointer right to 23 key would point to the newly created node |30|50|-|. for some reason it was deleted, is this true for a non-leaf node???
– farchy
Jun 11 '11 at 8:50
add a comment |
Here are how pointers should be handled. this is your B+ tree before insertion. (some padding used to make pointers easier to see)
[21 30 50]
/
[10|20|-] => [21|22|25] => [30|40|-] => [50|60|80]
After inserting 23 you will have 2 nodes. Its important to know that left split should be always same instance and right split should be a fresh node. this will make handling pointers somewhat easier.
So splits should look like this.
old fresh
[21|22|-] => [23|25|-]
since left node is same instance, key 21
at root has the correct right pointer. Here is what nodes look like before splitting root and after splitting leaf. we are in middle of process.
[21 30 50]
/
[10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]
only a new item with key 23
should be added next to 21
in root. since root has no space it will also split. middle key is 23 (first item of right split).
[21 30] [ 50 ]
/ *
[10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]
Because root is split, we must add our middle key to left or right splits and find a new middle key to promote. notice the null pointer at right split. because that node is fresh, it must be initialized soon!
(root was split from right meaning that less items lay in right node in case of having odd amount of items. but in general it doesn't matter which way you split.)
our middle key was 23. so lets add 23.
[21 23 30] [ 50 ]
/ *
[10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]
Good! If there was no split here, our insertion would be finished by now. but since there is split at this level, we must find new middle key to promote. also don't forget null pointer for fresh node.
(In case of splitting leafs, you must keep key-values at leafs and promote copy of middle key, but in case of splitting internal nodes you can safely move and promote keys up.)
here new middle key is 30. lets pop it from left node.
30
[30|40|-]
(Its important how to choose middle key. Always a node that gets the middle key from bottom splits, should drop one item for a new middle key. if that node is left node, drop last item from it, otherwise drop first item.)
[21 23] 30 [ 50 ]
/ *
[10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]
Notice that 30 is not inside any of splits. that's our middle key we have to handle. Always give right node of middle key to left of fresh node.
[21 23] 30 [50]
/ * /
[10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]
Then right pointer of middle key will point to fresh node on right split. finally middle key is promoted, new root with left of left split and right of right split and key of middle key is created.
[ 30 ]
/
[21 23] [50]
/ /
[10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]
Congrats! you have done your insertion. it may look easy on paper, but I have to admit its a bit of challenge when coding it.
There are couple of ways to represent these key-node
items in memory. my approach is each key-node
item has right pointer with a key. so each internal node will have array of key-node
s. this way, keys and node pointers are always kept together. left most child is handled by a separate left pointer, this pointer is kept on each internal node and is never null (after complete operations).
add a comment |
The problem you have to understand what happens is because of the bad representation model you are using. You should have one data value associated with a pointer and the invariant that the data value is the smallest value in the subtree referenced by the pointer.
So this is how you should represent the b-tree node prior of insertion.
10|21|30|50| <--- root node
10|20| 21|22|25| 30|40| 50|60|80| <--- leaf nodes
In this representation the pointer after the value 10 in the root node points to the leaf node with the first value 10, etc.
When you insert 23, it is inserted in the leaf node with first value of 21. This yields the following tree and no need to split.
10|21|30|50| <--- root node
10|20| 21|22|23|25| 30|40| 50|60|80| <--- leaf nodes
When inserting 24 which produces the effect you described, what you get is the following
10|30| <--- new root node
10|21|24| 30|50| <--- intermediate nodes
10|20| 21|22|23| 24|25| 30|40| 50|60|80| <--- leaf nodes
As you see there is no ambiguity anymore. The leaf node was split, and the key pointer pair 24| had to be inserted in the root node. Since it was full, it had to be split. Now that there are 5 values, you get one node with 3 values and one node with 2. You are free to choose if the left node or the right node get the three value. What is important is that the fundamental invariant is preserved. Each key value in a node is the smallest element in the subtree referenced by its associated pointer. A new root node had to be added and its key value pointer set should now be obvious. No ambiguity.
This also makes obvious that many optimization strategies are possible. Instead of splitting the root node, we could have moved the value 21 into the left leaf node which was not full. The one with the first value 10. This avoids the need to split the root node and keeps the b-tree hight unchanged and yields a better filling of the b-tree. So you minimize space and search time. But this implies one can efficiently check if sideway balancing is feasible. A change of key value in the root node is still needed. etc.
As you see, the value 10 in your b-tree is not really required in none leaf nodes and it is often omitted in b-tree representation (i.e. wikipedia) but this can make it confusing and was probably the reason you where. :)
I believe there is mistake in your model. The first representation has 4 values which imply that 5 child nodes exist from root. I think you actually consider the values represented as the values of the pointers but the author of the initial question represented the values IN the node. So, the first model in this question has 3 values in the root node and 4 pointers, not represented, to leaf nodes.
– Victor Hurdugaci
Jun 11 '11 at 16:35
This is not a mistake because it is a b+tree and not a b-tree. In a b+tree each key/data pair is stored in the leaf. In a b-tree, key/data pairs are stored in leafs and in upper level nodes. What is shown in the question is a b+tree, and this is also what I show. As I said, one may drop one of the key/data pair because it is not required to be known to find the key, but this only works in some nodes. The invariant I described, makes it all very simple and removes all ambiguity.
– chmike
Jun 13 '11 at 9:59
add a comment |
From what I've been taught, it is ok for the last node to have one node less than n/2. So in your example, the top nodes will become:
|23|
/
|21|22|-| |25|-|-|
I think it kind of make sense. If you think about it, you have 5 leave nodes, so you need 5 pointers from the upper level. This arrangement is the only way such that you can have 5 pointers, all other combinations would either overflow the node or create extra pointers.
If the nodes were |21|22|-| and |23|25|-|, then the root node would be |23|-|-|. Then, it does not make sense to have 23 in the right node, because whatever that is in the right node must be equal or greater than 23 anyway!
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f6314790%2fsplitting-a-node-in-b-tree%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
4 Answers
4
active
oldest
votes
4 Answers
4
active
oldest
votes
active
oldest
votes
active
oldest
votes
When inserting 23:
- as you said, 21|22|-| and |23|25|-| are created
- the 2 nodes require a parent
- a parent is created in the the root node: |21|23|30|50|
- the root has too many elements now
- split the root in 2 nodes |21|23|- and |30|50|-
- add a new parent for the 2 new nodes (which happens to be the new root of the tree)
Basically, that insert will increase the depth of the tree by 1
thnx for the quick answer,but the question is: what happen to the pointers as mentioned. could it be that 2 pointers point to the same node?
– farchy
Jun 11 '11 at 8:00
somebody answered that the pointer right to 23 key would point to the newly created node |30|50|-|. for some reason it was deleted, is this true for a non-leaf node???
– farchy
Jun 11 '11 at 8:50
add a comment |
When inserting 23:
- as you said, 21|22|-| and |23|25|-| are created
- the 2 nodes require a parent
- a parent is created in the the root node: |21|23|30|50|
- the root has too many elements now
- split the root in 2 nodes |21|23|- and |30|50|-
- add a new parent for the 2 new nodes (which happens to be the new root of the tree)
Basically, that insert will increase the depth of the tree by 1
thnx for the quick answer,but the question is: what happen to the pointers as mentioned. could it be that 2 pointers point to the same node?
– farchy
Jun 11 '11 at 8:00
somebody answered that the pointer right to 23 key would point to the newly created node |30|50|-|. for some reason it was deleted, is this true for a non-leaf node???
– farchy
Jun 11 '11 at 8:50
add a comment |
When inserting 23:
- as you said, 21|22|-| and |23|25|-| are created
- the 2 nodes require a parent
- a parent is created in the the root node: |21|23|30|50|
- the root has too many elements now
- split the root in 2 nodes |21|23|- and |30|50|-
- add a new parent for the 2 new nodes (which happens to be the new root of the tree)
Basically, that insert will increase the depth of the tree by 1
When inserting 23:
- as you said, 21|22|-| and |23|25|-| are created
- the 2 nodes require a parent
- a parent is created in the the root node: |21|23|30|50|
- the root has too many elements now
- split the root in 2 nodes |21|23|- and |30|50|-
- add a new parent for the 2 new nodes (which happens to be the new root of the tree)
Basically, that insert will increase the depth of the tree by 1
answered Jun 11 '11 at 7:55
Victor HurdugaciVictor Hurdugaci
25.7k57499
25.7k57499
thnx for the quick answer,but the question is: what happen to the pointers as mentioned. could it be that 2 pointers point to the same node?
– farchy
Jun 11 '11 at 8:00
somebody answered that the pointer right to 23 key would point to the newly created node |30|50|-|. for some reason it was deleted, is this true for a non-leaf node???
– farchy
Jun 11 '11 at 8:50
add a comment |
thnx for the quick answer,but the question is: what happen to the pointers as mentioned. could it be that 2 pointers point to the same node?
– farchy
Jun 11 '11 at 8:00
somebody answered that the pointer right to 23 key would point to the newly created node |30|50|-|. for some reason it was deleted, is this true for a non-leaf node???
– farchy
Jun 11 '11 at 8:50
thnx for the quick answer,but the question is: what happen to the pointers as mentioned. could it be that 2 pointers point to the same node?
– farchy
Jun 11 '11 at 8:00
thnx for the quick answer,but the question is: what happen to the pointers as mentioned. could it be that 2 pointers point to the same node?
– farchy
Jun 11 '11 at 8:00
somebody answered that the pointer right to 23 key would point to the newly created node |30|50|-|. for some reason it was deleted, is this true for a non-leaf node???
– farchy
Jun 11 '11 at 8:50
somebody answered that the pointer right to 23 key would point to the newly created node |30|50|-|. for some reason it was deleted, is this true for a non-leaf node???
– farchy
Jun 11 '11 at 8:50
add a comment |
Here are how pointers should be handled. this is your B+ tree before insertion. (some padding used to make pointers easier to see)
[21 30 50]
/
[10|20|-] => [21|22|25] => [30|40|-] => [50|60|80]
After inserting 23 you will have 2 nodes. Its important to know that left split should be always same instance and right split should be a fresh node. this will make handling pointers somewhat easier.
So splits should look like this.
old fresh
[21|22|-] => [23|25|-]
since left node is same instance, key 21
at root has the correct right pointer. Here is what nodes look like before splitting root and after splitting leaf. we are in middle of process.
[21 30 50]
/
[10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]
only a new item with key 23
should be added next to 21
in root. since root has no space it will also split. middle key is 23 (first item of right split).
[21 30] [ 50 ]
/ *
[10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]
Because root is split, we must add our middle key to left or right splits and find a new middle key to promote. notice the null pointer at right split. because that node is fresh, it must be initialized soon!
(root was split from right meaning that less items lay in right node in case of having odd amount of items. but in general it doesn't matter which way you split.)
our middle key was 23. so lets add 23.
[21 23 30] [ 50 ]
/ *
[10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]
Good! If there was no split here, our insertion would be finished by now. but since there is split at this level, we must find new middle key to promote. also don't forget null pointer for fresh node.
(In case of splitting leafs, you must keep key-values at leafs and promote copy of middle key, but in case of splitting internal nodes you can safely move and promote keys up.)
here new middle key is 30. lets pop it from left node.
30
[30|40|-]
(Its important how to choose middle key. Always a node that gets the middle key from bottom splits, should drop one item for a new middle key. if that node is left node, drop last item from it, otherwise drop first item.)
[21 23] 30 [ 50 ]
/ *
[10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]
Notice that 30 is not inside any of splits. that's our middle key we have to handle. Always give right node of middle key to left of fresh node.
[21 23] 30 [50]
/ * /
[10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]
Then right pointer of middle key will point to fresh node on right split. finally middle key is promoted, new root with left of left split and right of right split and key of middle key is created.
[ 30 ]
/
[21 23] [50]
/ /
[10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]
Congrats! you have done your insertion. it may look easy on paper, but I have to admit its a bit of challenge when coding it.
There are couple of ways to represent these key-node
items in memory. my approach is each key-node
item has right pointer with a key. so each internal node will have array of key-node
s. this way, keys and node pointers are always kept together. left most child is handled by a separate left pointer, this pointer is kept on each internal node and is never null (after complete operations).
add a comment |
Here are how pointers should be handled. this is your B+ tree before insertion. (some padding used to make pointers easier to see)
[21 30 50]
/
[10|20|-] => [21|22|25] => [30|40|-] => [50|60|80]
After inserting 23 you will have 2 nodes. Its important to know that left split should be always same instance and right split should be a fresh node. this will make handling pointers somewhat easier.
So splits should look like this.
old fresh
[21|22|-] => [23|25|-]
since left node is same instance, key 21
at root has the correct right pointer. Here is what nodes look like before splitting root and after splitting leaf. we are in middle of process.
[21 30 50]
/
[10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]
only a new item with key 23
should be added next to 21
in root. since root has no space it will also split. middle key is 23 (first item of right split).
[21 30] [ 50 ]
/ *
[10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]
Because root is split, we must add our middle key to left or right splits and find a new middle key to promote. notice the null pointer at right split. because that node is fresh, it must be initialized soon!
(root was split from right meaning that less items lay in right node in case of having odd amount of items. but in general it doesn't matter which way you split.)
our middle key was 23. so lets add 23.
[21 23 30] [ 50 ]
/ *
[10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]
Good! If there was no split here, our insertion would be finished by now. but since there is split at this level, we must find new middle key to promote. also don't forget null pointer for fresh node.
(In case of splitting leafs, you must keep key-values at leafs and promote copy of middle key, but in case of splitting internal nodes you can safely move and promote keys up.)
here new middle key is 30. lets pop it from left node.
30
[30|40|-]
(Its important how to choose middle key. Always a node that gets the middle key from bottom splits, should drop one item for a new middle key. if that node is left node, drop last item from it, otherwise drop first item.)
[21 23] 30 [ 50 ]
/ *
[10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]
Notice that 30 is not inside any of splits. that's our middle key we have to handle. Always give right node of middle key to left of fresh node.
[21 23] 30 [50]
/ * /
[10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]
Then right pointer of middle key will point to fresh node on right split. finally middle key is promoted, new root with left of left split and right of right split and key of middle key is created.
[ 30 ]
/
[21 23] [50]
/ /
[10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]
Congrats! you have done your insertion. it may look easy on paper, but I have to admit its a bit of challenge when coding it.
There are couple of ways to represent these key-node
items in memory. my approach is each key-node
item has right pointer with a key. so each internal node will have array of key-node
s. this way, keys and node pointers are always kept together. left most child is handled by a separate left pointer, this pointer is kept on each internal node and is never null (after complete operations).
add a comment |
Here are how pointers should be handled. this is your B+ tree before insertion. (some padding used to make pointers easier to see)
[21 30 50]
/
[10|20|-] => [21|22|25] => [30|40|-] => [50|60|80]
After inserting 23 you will have 2 nodes. Its important to know that left split should be always same instance and right split should be a fresh node. this will make handling pointers somewhat easier.
So splits should look like this.
old fresh
[21|22|-] => [23|25|-]
since left node is same instance, key 21
at root has the correct right pointer. Here is what nodes look like before splitting root and after splitting leaf. we are in middle of process.
[21 30 50]
/
[10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]
only a new item with key 23
should be added next to 21
in root. since root has no space it will also split. middle key is 23 (first item of right split).
[21 30] [ 50 ]
/ *
[10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]
Because root is split, we must add our middle key to left or right splits and find a new middle key to promote. notice the null pointer at right split. because that node is fresh, it must be initialized soon!
(root was split from right meaning that less items lay in right node in case of having odd amount of items. but in general it doesn't matter which way you split.)
our middle key was 23. so lets add 23.
[21 23 30] [ 50 ]
/ *
[10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]
Good! If there was no split here, our insertion would be finished by now. but since there is split at this level, we must find new middle key to promote. also don't forget null pointer for fresh node.
(In case of splitting leafs, you must keep key-values at leafs and promote copy of middle key, but in case of splitting internal nodes you can safely move and promote keys up.)
here new middle key is 30. lets pop it from left node.
30
[30|40|-]
(Its important how to choose middle key. Always a node that gets the middle key from bottom splits, should drop one item for a new middle key. if that node is left node, drop last item from it, otherwise drop first item.)
[21 23] 30 [ 50 ]
/ *
[10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]
Notice that 30 is not inside any of splits. that's our middle key we have to handle. Always give right node of middle key to left of fresh node.
[21 23] 30 [50]
/ * /
[10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]
Then right pointer of middle key will point to fresh node on right split. finally middle key is promoted, new root with left of left split and right of right split and key of middle key is created.
[ 30 ]
/
[21 23] [50]
/ /
[10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]
Congrats! you have done your insertion. it may look easy on paper, but I have to admit its a bit of challenge when coding it.
There are couple of ways to represent these key-node
items in memory. my approach is each key-node
item has right pointer with a key. so each internal node will have array of key-node
s. this way, keys and node pointers are always kept together. left most child is handled by a separate left pointer, this pointer is kept on each internal node and is never null (after complete operations).
Here are how pointers should be handled. this is your B+ tree before insertion. (some padding used to make pointers easier to see)
[21 30 50]
/
[10|20|-] => [21|22|25] => [30|40|-] => [50|60|80]
After inserting 23 you will have 2 nodes. Its important to know that left split should be always same instance and right split should be a fresh node. this will make handling pointers somewhat easier.
So splits should look like this.
old fresh
[21|22|-] => [23|25|-]
since left node is same instance, key 21
at root has the correct right pointer. Here is what nodes look like before splitting root and after splitting leaf. we are in middle of process.
[21 30 50]
/
[10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]
only a new item with key 23
should be added next to 21
in root. since root has no space it will also split. middle key is 23 (first item of right split).
[21 30] [ 50 ]
/ *
[10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]
Because root is split, we must add our middle key to left or right splits and find a new middle key to promote. notice the null pointer at right split. because that node is fresh, it must be initialized soon!
(root was split from right meaning that less items lay in right node in case of having odd amount of items. but in general it doesn't matter which way you split.)
our middle key was 23. so lets add 23.
[21 23 30] [ 50 ]
/ *
[10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]
Good! If there was no split here, our insertion would be finished by now. but since there is split at this level, we must find new middle key to promote. also don't forget null pointer for fresh node.
(In case of splitting leafs, you must keep key-values at leafs and promote copy of middle key, but in case of splitting internal nodes you can safely move and promote keys up.)
here new middle key is 30. lets pop it from left node.
30
[30|40|-]
(Its important how to choose middle key. Always a node that gets the middle key from bottom splits, should drop one item for a new middle key. if that node is left node, drop last item from it, otherwise drop first item.)
[21 23] 30 [ 50 ]
/ *
[10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]
Notice that 30 is not inside any of splits. that's our middle key we have to handle. Always give right node of middle key to left of fresh node.
[21 23] 30 [50]
/ * /
[10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]
Then right pointer of middle key will point to fresh node on right split. finally middle key is promoted, new root with left of left split and right of right split and key of middle key is created.
[ 30 ]
/
[21 23] [50]
/ /
[10|20|-] => [21|22|-] => [23|25|-] => [30|40|-] => [50|60|80]
Congrats! you have done your insertion. it may look easy on paper, but I have to admit its a bit of challenge when coding it.
There are couple of ways to represent these key-node
items in memory. my approach is each key-node
item has right pointer with a key. so each internal node will have array of key-node
s. this way, keys and node pointers are always kept together. left most child is handled by a separate left pointer, this pointer is kept on each internal node and is never null (after complete operations).
edited Jan 9 '18 at 17:12
answered Jan 9 '18 at 16:59
M.kazem AkhgaryM.kazem Akhgary
12.2k53574
12.2k53574
add a comment |
add a comment |
The problem you have to understand what happens is because of the bad representation model you are using. You should have one data value associated with a pointer and the invariant that the data value is the smallest value in the subtree referenced by the pointer.
So this is how you should represent the b-tree node prior of insertion.
10|21|30|50| <--- root node
10|20| 21|22|25| 30|40| 50|60|80| <--- leaf nodes
In this representation the pointer after the value 10 in the root node points to the leaf node with the first value 10, etc.
When you insert 23, it is inserted in the leaf node with first value of 21. This yields the following tree and no need to split.
10|21|30|50| <--- root node
10|20| 21|22|23|25| 30|40| 50|60|80| <--- leaf nodes
When inserting 24 which produces the effect you described, what you get is the following
10|30| <--- new root node
10|21|24| 30|50| <--- intermediate nodes
10|20| 21|22|23| 24|25| 30|40| 50|60|80| <--- leaf nodes
As you see there is no ambiguity anymore. The leaf node was split, and the key pointer pair 24| had to be inserted in the root node. Since it was full, it had to be split. Now that there are 5 values, you get one node with 3 values and one node with 2. You are free to choose if the left node or the right node get the three value. What is important is that the fundamental invariant is preserved. Each key value in a node is the smallest element in the subtree referenced by its associated pointer. A new root node had to be added and its key value pointer set should now be obvious. No ambiguity.
This also makes obvious that many optimization strategies are possible. Instead of splitting the root node, we could have moved the value 21 into the left leaf node which was not full. The one with the first value 10. This avoids the need to split the root node and keeps the b-tree hight unchanged and yields a better filling of the b-tree. So you minimize space and search time. But this implies one can efficiently check if sideway balancing is feasible. A change of key value in the root node is still needed. etc.
As you see, the value 10 in your b-tree is not really required in none leaf nodes and it is often omitted in b-tree representation (i.e. wikipedia) but this can make it confusing and was probably the reason you where. :)
I believe there is mistake in your model. The first representation has 4 values which imply that 5 child nodes exist from root. I think you actually consider the values represented as the values of the pointers but the author of the initial question represented the values IN the node. So, the first model in this question has 3 values in the root node and 4 pointers, not represented, to leaf nodes.
– Victor Hurdugaci
Jun 11 '11 at 16:35
This is not a mistake because it is a b+tree and not a b-tree. In a b+tree each key/data pair is stored in the leaf. In a b-tree, key/data pairs are stored in leafs and in upper level nodes. What is shown in the question is a b+tree, and this is also what I show. As I said, one may drop one of the key/data pair because it is not required to be known to find the key, but this only works in some nodes. The invariant I described, makes it all very simple and removes all ambiguity.
– chmike
Jun 13 '11 at 9:59
add a comment |
The problem you have to understand what happens is because of the bad representation model you are using. You should have one data value associated with a pointer and the invariant that the data value is the smallest value in the subtree referenced by the pointer.
So this is how you should represent the b-tree node prior of insertion.
10|21|30|50| <--- root node
10|20| 21|22|25| 30|40| 50|60|80| <--- leaf nodes
In this representation the pointer after the value 10 in the root node points to the leaf node with the first value 10, etc.
When you insert 23, it is inserted in the leaf node with first value of 21. This yields the following tree and no need to split.
10|21|30|50| <--- root node
10|20| 21|22|23|25| 30|40| 50|60|80| <--- leaf nodes
When inserting 24 which produces the effect you described, what you get is the following
10|30| <--- new root node
10|21|24| 30|50| <--- intermediate nodes
10|20| 21|22|23| 24|25| 30|40| 50|60|80| <--- leaf nodes
As you see there is no ambiguity anymore. The leaf node was split, and the key pointer pair 24| had to be inserted in the root node. Since it was full, it had to be split. Now that there are 5 values, you get one node with 3 values and one node with 2. You are free to choose if the left node or the right node get the three value. What is important is that the fundamental invariant is preserved. Each key value in a node is the smallest element in the subtree referenced by its associated pointer. A new root node had to be added and its key value pointer set should now be obvious. No ambiguity.
This also makes obvious that many optimization strategies are possible. Instead of splitting the root node, we could have moved the value 21 into the left leaf node which was not full. The one with the first value 10. This avoids the need to split the root node and keeps the b-tree hight unchanged and yields a better filling of the b-tree. So you minimize space and search time. But this implies one can efficiently check if sideway balancing is feasible. A change of key value in the root node is still needed. etc.
As you see, the value 10 in your b-tree is not really required in none leaf nodes and it is often omitted in b-tree representation (i.e. wikipedia) but this can make it confusing and was probably the reason you where. :)
I believe there is mistake in your model. The first representation has 4 values which imply that 5 child nodes exist from root. I think you actually consider the values represented as the values of the pointers but the author of the initial question represented the values IN the node. So, the first model in this question has 3 values in the root node and 4 pointers, not represented, to leaf nodes.
– Victor Hurdugaci
Jun 11 '11 at 16:35
This is not a mistake because it is a b+tree and not a b-tree. In a b+tree each key/data pair is stored in the leaf. In a b-tree, key/data pairs are stored in leafs and in upper level nodes. What is shown in the question is a b+tree, and this is also what I show. As I said, one may drop one of the key/data pair because it is not required to be known to find the key, but this only works in some nodes. The invariant I described, makes it all very simple and removes all ambiguity.
– chmike
Jun 13 '11 at 9:59
add a comment |
The problem you have to understand what happens is because of the bad representation model you are using. You should have one data value associated with a pointer and the invariant that the data value is the smallest value in the subtree referenced by the pointer.
So this is how you should represent the b-tree node prior of insertion.
10|21|30|50| <--- root node
10|20| 21|22|25| 30|40| 50|60|80| <--- leaf nodes
In this representation the pointer after the value 10 in the root node points to the leaf node with the first value 10, etc.
When you insert 23, it is inserted in the leaf node with first value of 21. This yields the following tree and no need to split.
10|21|30|50| <--- root node
10|20| 21|22|23|25| 30|40| 50|60|80| <--- leaf nodes
When inserting 24 which produces the effect you described, what you get is the following
10|30| <--- new root node
10|21|24| 30|50| <--- intermediate nodes
10|20| 21|22|23| 24|25| 30|40| 50|60|80| <--- leaf nodes
As you see there is no ambiguity anymore. The leaf node was split, and the key pointer pair 24| had to be inserted in the root node. Since it was full, it had to be split. Now that there are 5 values, you get one node with 3 values and one node with 2. You are free to choose if the left node or the right node get the three value. What is important is that the fundamental invariant is preserved. Each key value in a node is the smallest element in the subtree referenced by its associated pointer. A new root node had to be added and its key value pointer set should now be obvious. No ambiguity.
This also makes obvious that many optimization strategies are possible. Instead of splitting the root node, we could have moved the value 21 into the left leaf node which was not full. The one with the first value 10. This avoids the need to split the root node and keeps the b-tree hight unchanged and yields a better filling of the b-tree. So you minimize space and search time. But this implies one can efficiently check if sideway balancing is feasible. A change of key value in the root node is still needed. etc.
As you see, the value 10 in your b-tree is not really required in none leaf nodes and it is often omitted in b-tree representation (i.e. wikipedia) but this can make it confusing and was probably the reason you where. :)
The problem you have to understand what happens is because of the bad representation model you are using. You should have one data value associated with a pointer and the invariant that the data value is the smallest value in the subtree referenced by the pointer.
So this is how you should represent the b-tree node prior of insertion.
10|21|30|50| <--- root node
10|20| 21|22|25| 30|40| 50|60|80| <--- leaf nodes
In this representation the pointer after the value 10 in the root node points to the leaf node with the first value 10, etc.
When you insert 23, it is inserted in the leaf node with first value of 21. This yields the following tree and no need to split.
10|21|30|50| <--- root node
10|20| 21|22|23|25| 30|40| 50|60|80| <--- leaf nodes
When inserting 24 which produces the effect you described, what you get is the following
10|30| <--- new root node
10|21|24| 30|50| <--- intermediate nodes
10|20| 21|22|23| 24|25| 30|40| 50|60|80| <--- leaf nodes
As you see there is no ambiguity anymore. The leaf node was split, and the key pointer pair 24| had to be inserted in the root node. Since it was full, it had to be split. Now that there are 5 values, you get one node with 3 values and one node with 2. You are free to choose if the left node or the right node get the three value. What is important is that the fundamental invariant is preserved. Each key value in a node is the smallest element in the subtree referenced by its associated pointer. A new root node had to be added and its key value pointer set should now be obvious. No ambiguity.
This also makes obvious that many optimization strategies are possible. Instead of splitting the root node, we could have moved the value 21 into the left leaf node which was not full. The one with the first value 10. This avoids the need to split the root node and keeps the b-tree hight unchanged and yields a better filling of the b-tree. So you minimize space and search time. But this implies one can efficiently check if sideway balancing is feasible. A change of key value in the root node is still needed. etc.
As you see, the value 10 in your b-tree is not really required in none leaf nodes and it is often omitted in b-tree representation (i.e. wikipedia) but this can make it confusing and was probably the reason you where. :)
edited Jun 11 '11 at 9:54
answered Jun 11 '11 at 9:16
chmikechmike
8,212155372
8,212155372
I believe there is mistake in your model. The first representation has 4 values which imply that 5 child nodes exist from root. I think you actually consider the values represented as the values of the pointers but the author of the initial question represented the values IN the node. So, the first model in this question has 3 values in the root node and 4 pointers, not represented, to leaf nodes.
– Victor Hurdugaci
Jun 11 '11 at 16:35
This is not a mistake because it is a b+tree and not a b-tree. In a b+tree each key/data pair is stored in the leaf. In a b-tree, key/data pairs are stored in leafs and in upper level nodes. What is shown in the question is a b+tree, and this is also what I show. As I said, one may drop one of the key/data pair because it is not required to be known to find the key, but this only works in some nodes. The invariant I described, makes it all very simple and removes all ambiguity.
– chmike
Jun 13 '11 at 9:59
add a comment |
I believe there is mistake in your model. The first representation has 4 values which imply that 5 child nodes exist from root. I think you actually consider the values represented as the values of the pointers but the author of the initial question represented the values IN the node. So, the first model in this question has 3 values in the root node and 4 pointers, not represented, to leaf nodes.
– Victor Hurdugaci
Jun 11 '11 at 16:35
This is not a mistake because it is a b+tree and not a b-tree. In a b+tree each key/data pair is stored in the leaf. In a b-tree, key/data pairs are stored in leafs and in upper level nodes. What is shown in the question is a b+tree, and this is also what I show. As I said, one may drop one of the key/data pair because it is not required to be known to find the key, but this only works in some nodes. The invariant I described, makes it all very simple and removes all ambiguity.
– chmike
Jun 13 '11 at 9:59
I believe there is mistake in your model. The first representation has 4 values which imply that 5 child nodes exist from root. I think you actually consider the values represented as the values of the pointers but the author of the initial question represented the values IN the node. So, the first model in this question has 3 values in the root node and 4 pointers, not represented, to leaf nodes.
– Victor Hurdugaci
Jun 11 '11 at 16:35
I believe there is mistake in your model. The first representation has 4 values which imply that 5 child nodes exist from root. I think you actually consider the values represented as the values of the pointers but the author of the initial question represented the values IN the node. So, the first model in this question has 3 values in the root node and 4 pointers, not represented, to leaf nodes.
– Victor Hurdugaci
Jun 11 '11 at 16:35
This is not a mistake because it is a b+tree and not a b-tree. In a b+tree each key/data pair is stored in the leaf. In a b-tree, key/data pairs are stored in leafs and in upper level nodes. What is shown in the question is a b+tree, and this is also what I show. As I said, one may drop one of the key/data pair because it is not required to be known to find the key, but this only works in some nodes. The invariant I described, makes it all very simple and removes all ambiguity.
– chmike
Jun 13 '11 at 9:59
This is not a mistake because it is a b+tree and not a b-tree. In a b+tree each key/data pair is stored in the leaf. In a b-tree, key/data pairs are stored in leafs and in upper level nodes. What is shown in the question is a b+tree, and this is also what I show. As I said, one may drop one of the key/data pair because it is not required to be known to find the key, but this only works in some nodes. The invariant I described, makes it all very simple and removes all ambiguity.
– chmike
Jun 13 '11 at 9:59
add a comment |
From what I've been taught, it is ok for the last node to have one node less than n/2. So in your example, the top nodes will become:
|23|
/
|21|22|-| |25|-|-|
I think it kind of make sense. If you think about it, you have 5 leave nodes, so you need 5 pointers from the upper level. This arrangement is the only way such that you can have 5 pointers, all other combinations would either overflow the node or create extra pointers.
If the nodes were |21|22|-| and |23|25|-|, then the root node would be |23|-|-|. Then, it does not make sense to have 23 in the right node, because whatever that is in the right node must be equal or greater than 23 anyway!
add a comment |
From what I've been taught, it is ok for the last node to have one node less than n/2. So in your example, the top nodes will become:
|23|
/
|21|22|-| |25|-|-|
I think it kind of make sense. If you think about it, you have 5 leave nodes, so you need 5 pointers from the upper level. This arrangement is the only way such that you can have 5 pointers, all other combinations would either overflow the node or create extra pointers.
If the nodes were |21|22|-| and |23|25|-|, then the root node would be |23|-|-|. Then, it does not make sense to have 23 in the right node, because whatever that is in the right node must be equal or greater than 23 anyway!
add a comment |
From what I've been taught, it is ok for the last node to have one node less than n/2. So in your example, the top nodes will become:
|23|
/
|21|22|-| |25|-|-|
I think it kind of make sense. If you think about it, you have 5 leave nodes, so you need 5 pointers from the upper level. This arrangement is the only way such that you can have 5 pointers, all other combinations would either overflow the node or create extra pointers.
If the nodes were |21|22|-| and |23|25|-|, then the root node would be |23|-|-|. Then, it does not make sense to have 23 in the right node, because whatever that is in the right node must be equal or greater than 23 anyway!
From what I've been taught, it is ok for the last node to have one node less than n/2. So in your example, the top nodes will become:
|23|
/
|21|22|-| |25|-|-|
I think it kind of make sense. If you think about it, you have 5 leave nodes, so you need 5 pointers from the upper level. This arrangement is the only way such that you can have 5 pointers, all other combinations would either overflow the node or create extra pointers.
If the nodes were |21|22|-| and |23|25|-|, then the root node would be |23|-|-|. Then, it does not make sense to have 23 in the right node, because whatever that is in the right node must be equal or greater than 23 anyway!
answered May 1 '13 at 12:07
kevinkevin
1,6091519
1,6091519
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f6314790%2fsplitting-a-node-in-b-tree%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown