Oracle SQL in string matches and table joins
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ height:90px;width:728px;box-sizing:border-box;
}
I need some guidance on how to approach this problem.
Consider the following 3 tables:
I inherited this dataset in Oracle. I need to find a way to merge these 3 tables into a single table. Why? Compliance and fraud checks. We're being audited, I have no choice but to comply.
The tables are millions of records. 69million. 7 million, 4 million. If I try to join and search by using contains, or instr or like, I'd be creating Cartesian joins (I think) and it will be quite slow.
How do I approach this? The data is formatted with the square brackets as in the picture. I have considered pulling these tables into pandas and using python to get the answer.
The resulting table would look like this:
python oracle pandas plsql
add a comment |
I need some guidance on how to approach this problem.
Consider the following 3 tables:
I inherited this dataset in Oracle. I need to find a way to merge these 3 tables into a single table. Why? Compliance and fraud checks. We're being audited, I have no choice but to comply.
The tables are millions of records. 69million. 7 million, 4 million. If I try to join and search by using contains, or instr or like, I'd be creating Cartesian joins (I think) and it will be quite slow.
How do I approach this? The data is formatted with the square brackets as in the picture. I have considered pulling these tables into pandas and using python to get the answer.
The resulting table would look like this:
python oracle pandas plsql
What is the resulting table supposed to look like? What's the format ofConcat_identifier_plate
?
– default locale
Nov 16 '18 at 13:48
The format of concat_identifier_plate can either be the identifier number of a person or a team, or 2 people, with a fish. It's really convoluted in some places.
– vwdewaal
Nov 16 '18 at 13:55
Hmm, I think this is going to be slow anyway, there're multiple meals and teams for a person which might lead to an unpredictable increase in the number of combinations. By the way, why is the "George" record duplicated in the sample result?
– default locale
Nov 16 '18 at 14:52
1
Because both George and Jenny appear in the same teams string. This is honestly a messy issue. I'm thinking of deconstructing the data first maybe, by table, and stitching again. There might just not be an easy answer.
– vwdewaal
Nov 16 '18 at 14:56
This might be a good idea. I'd also start with splitting concatenated strings into numeric ids. This should be relatively trivial (e.g.select to_number(regexp_substr(concat_identifier_partner, '[0-9]+', 1, 1)) as first_person_id from teams)
)
– default locale
Nov 16 '18 at 15:04
add a comment |
I need some guidance on how to approach this problem.
Consider the following 3 tables:
I inherited this dataset in Oracle. I need to find a way to merge these 3 tables into a single table. Why? Compliance and fraud checks. We're being audited, I have no choice but to comply.
The tables are millions of records. 69million. 7 million, 4 million. If I try to join and search by using contains, or instr or like, I'd be creating Cartesian joins (I think) and it will be quite slow.
How do I approach this? The data is formatted with the square brackets as in the picture. I have considered pulling these tables into pandas and using python to get the answer.
The resulting table would look like this:
python oracle pandas plsql
I need some guidance on how to approach this problem.
Consider the following 3 tables:
I inherited this dataset in Oracle. I need to find a way to merge these 3 tables into a single table. Why? Compliance and fraud checks. We're being audited, I have no choice but to comply.
The tables are millions of records. 69million. 7 million, 4 million. If I try to join and search by using contains, or instr or like, I'd be creating Cartesian joins (I think) and it will be quite slow.
How do I approach this? The data is formatted with the square brackets as in the picture. I have considered pulling these tables into pandas and using python to get the answer.
The resulting table would look like this:
python oracle pandas plsql
python oracle pandas plsql
edited Nov 16 '18 at 13:54
vwdewaal
asked Nov 16 '18 at 13:42
vwdewaalvwdewaal
3392418
3392418
What is the resulting table supposed to look like? What's the format ofConcat_identifier_plate
?
– default locale
Nov 16 '18 at 13:48
The format of concat_identifier_plate can either be the identifier number of a person or a team, or 2 people, with a fish. It's really convoluted in some places.
– vwdewaal
Nov 16 '18 at 13:55
Hmm, I think this is going to be slow anyway, there're multiple meals and teams for a person which might lead to an unpredictable increase in the number of combinations. By the way, why is the "George" record duplicated in the sample result?
– default locale
Nov 16 '18 at 14:52
1
Because both George and Jenny appear in the same teams string. This is honestly a messy issue. I'm thinking of deconstructing the data first maybe, by table, and stitching again. There might just not be an easy answer.
– vwdewaal
Nov 16 '18 at 14:56
This might be a good idea. I'd also start with splitting concatenated strings into numeric ids. This should be relatively trivial (e.g.select to_number(regexp_substr(concat_identifier_partner, '[0-9]+', 1, 1)) as first_person_id from teams)
)
– default locale
Nov 16 '18 at 15:04
add a comment |
What is the resulting table supposed to look like? What's the format ofConcat_identifier_plate
?
– default locale
Nov 16 '18 at 13:48
The format of concat_identifier_plate can either be the identifier number of a person or a team, or 2 people, with a fish. It's really convoluted in some places.
– vwdewaal
Nov 16 '18 at 13:55
Hmm, I think this is going to be slow anyway, there're multiple meals and teams for a person which might lead to an unpredictable increase in the number of combinations. By the way, why is the "George" record duplicated in the sample result?
– default locale
Nov 16 '18 at 14:52
1
Because both George and Jenny appear in the same teams string. This is honestly a messy issue. I'm thinking of deconstructing the data first maybe, by table, and stitching again. There might just not be an easy answer.
– vwdewaal
Nov 16 '18 at 14:56
This might be a good idea. I'd also start with splitting concatenated strings into numeric ids. This should be relatively trivial (e.g.select to_number(regexp_substr(concat_identifier_partner, '[0-9]+', 1, 1)) as first_person_id from teams)
)
– default locale
Nov 16 '18 at 15:04
What is the resulting table supposed to look like? What's the format of
Concat_identifier_plate
?– default locale
Nov 16 '18 at 13:48
What is the resulting table supposed to look like? What's the format of
Concat_identifier_plate
?– default locale
Nov 16 '18 at 13:48
The format of concat_identifier_plate can either be the identifier number of a person or a team, or 2 people, with a fish. It's really convoluted in some places.
– vwdewaal
Nov 16 '18 at 13:55
The format of concat_identifier_plate can either be the identifier number of a person or a team, or 2 people, with a fish. It's really convoluted in some places.
– vwdewaal
Nov 16 '18 at 13:55
Hmm, I think this is going to be slow anyway, there're multiple meals and teams for a person which might lead to an unpredictable increase in the number of combinations. By the way, why is the "George" record duplicated in the sample result?
– default locale
Nov 16 '18 at 14:52
Hmm, I think this is going to be slow anyway, there're multiple meals and teams for a person which might lead to an unpredictable increase in the number of combinations. By the way, why is the "George" record duplicated in the sample result?
– default locale
Nov 16 '18 at 14:52
1
1
Because both George and Jenny appear in the same teams string. This is honestly a messy issue. I'm thinking of deconstructing the data first maybe, by table, and stitching again. There might just not be an easy answer.
– vwdewaal
Nov 16 '18 at 14:56
Because both George and Jenny appear in the same teams string. This is honestly a messy issue. I'm thinking of deconstructing the data first maybe, by table, and stitching again. There might just not be an easy answer.
– vwdewaal
Nov 16 '18 at 14:56
This might be a good idea. I'd also start with splitting concatenated strings into numeric ids. This should be relatively trivial (e.g.
select to_number(regexp_substr(concat_identifier_partner, '[0-9]+', 1, 1)) as first_person_id from teams)
)– default locale
Nov 16 '18 at 15:04
This might be a good idea. I'd also start with splitting concatenated strings into numeric ids. This should be relatively trivial (e.g.
select to_number(regexp_substr(concat_identifier_partner, '[0-9]+', 1, 1)) as first_person_id from teams)
)– default locale
Nov 16 '18 at 15:04
add a comment |
1 Answer
1
active
oldest
votes
I think your first step would be to get the meals and teams tables in a usable format. Using these weird concatenation columns is never going to be easy or fast.
Doing something simple like this to split the ids onto different rows:
select trim(column_value) partner_id,
team
from teams,
xmltable(('"' || replace(teams.concat_identifier_partner, ']',']","') || '"'))
Gives an output like this:
+------------+------+
| PARTNER_ID | TEAM |
+------------+------+
| [012345] | Blue |
| [012346] | Blue |
| [012347] | Red |
+------------+------+
Then you could use these as inline views in your query or create a new temporary table for each of these. The temporary table would allow you to index them on the new ID column.
After your data is in the easier format, the joins become easy.
select people.identifier,
people.name,
formatted_teams.team,
formatted_meals.meal
from people
join (select trim(column_value) partner_id,
team
from teams,
xmltable(('"' || replace(teams.concat_identifier_partner, ']',']","') || '"')) ) formatted_teams
on people.identifier = formatted_teams.partner_id
left join (select trim(column_value) plate_id,
meal
from meals,
xmltable(('"' || replace(meals.concat_identifier_plate, ']',']","') || '"')) ) formatted_meals
on people.identifier = formatted_meals.plate_id
order by people.identifier
Output:
+------------+--------+------+--------+
| IDENTIFIER | NAME | TEAM | MEAL |
+------------+--------+------+--------+
| [012345] | George | Blue | Salmon |
| [012346] | Jenny | Blue | |
| [012347] | Alex | Red | Pike |
| [012347] | Alex | Red | Cod |
+------------+--------+------+--------+
Perfect! This i can work with. Thanks!
– vwdewaal
Nov 16 '18 at 23:37
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53339049%2foracle-sql-in-string-matches-and-table-joins%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
I think your first step would be to get the meals and teams tables in a usable format. Using these weird concatenation columns is never going to be easy or fast.
Doing something simple like this to split the ids onto different rows:
select trim(column_value) partner_id,
team
from teams,
xmltable(('"' || replace(teams.concat_identifier_partner, ']',']","') || '"'))
Gives an output like this:
+------------+------+
| PARTNER_ID | TEAM |
+------------+------+
| [012345] | Blue |
| [012346] | Blue |
| [012347] | Red |
+------------+------+
Then you could use these as inline views in your query or create a new temporary table for each of these. The temporary table would allow you to index them on the new ID column.
After your data is in the easier format, the joins become easy.
select people.identifier,
people.name,
formatted_teams.team,
formatted_meals.meal
from people
join (select trim(column_value) partner_id,
team
from teams,
xmltable(('"' || replace(teams.concat_identifier_partner, ']',']","') || '"')) ) formatted_teams
on people.identifier = formatted_teams.partner_id
left join (select trim(column_value) plate_id,
meal
from meals,
xmltable(('"' || replace(meals.concat_identifier_plate, ']',']","') || '"')) ) formatted_meals
on people.identifier = formatted_meals.plate_id
order by people.identifier
Output:
+------------+--------+------+--------+
| IDENTIFIER | NAME | TEAM | MEAL |
+------------+--------+------+--------+
| [012345] | George | Blue | Salmon |
| [012346] | Jenny | Blue | |
| [012347] | Alex | Red | Pike |
| [012347] | Alex | Red | Cod |
+------------+--------+------+--------+
Perfect! This i can work with. Thanks!
– vwdewaal
Nov 16 '18 at 23:37
add a comment |
I think your first step would be to get the meals and teams tables in a usable format. Using these weird concatenation columns is never going to be easy or fast.
Doing something simple like this to split the ids onto different rows:
select trim(column_value) partner_id,
team
from teams,
xmltable(('"' || replace(teams.concat_identifier_partner, ']',']","') || '"'))
Gives an output like this:
+------------+------+
| PARTNER_ID | TEAM |
+------------+------+
| [012345] | Blue |
| [012346] | Blue |
| [012347] | Red |
+------------+------+
Then you could use these as inline views in your query or create a new temporary table for each of these. The temporary table would allow you to index them on the new ID column.
After your data is in the easier format, the joins become easy.
select people.identifier,
people.name,
formatted_teams.team,
formatted_meals.meal
from people
join (select trim(column_value) partner_id,
team
from teams,
xmltable(('"' || replace(teams.concat_identifier_partner, ']',']","') || '"')) ) formatted_teams
on people.identifier = formatted_teams.partner_id
left join (select trim(column_value) plate_id,
meal
from meals,
xmltable(('"' || replace(meals.concat_identifier_plate, ']',']","') || '"')) ) formatted_meals
on people.identifier = formatted_meals.plate_id
order by people.identifier
Output:
+------------+--------+------+--------+
| IDENTIFIER | NAME | TEAM | MEAL |
+------------+--------+------+--------+
| [012345] | George | Blue | Salmon |
| [012346] | Jenny | Blue | |
| [012347] | Alex | Red | Pike |
| [012347] | Alex | Red | Cod |
+------------+--------+------+--------+
Perfect! This i can work with. Thanks!
– vwdewaal
Nov 16 '18 at 23:37
add a comment |
I think your first step would be to get the meals and teams tables in a usable format. Using these weird concatenation columns is never going to be easy or fast.
Doing something simple like this to split the ids onto different rows:
select trim(column_value) partner_id,
team
from teams,
xmltable(('"' || replace(teams.concat_identifier_partner, ']',']","') || '"'))
Gives an output like this:
+------------+------+
| PARTNER_ID | TEAM |
+------------+------+
| [012345] | Blue |
| [012346] | Blue |
| [012347] | Red |
+------------+------+
Then you could use these as inline views in your query or create a new temporary table for each of these. The temporary table would allow you to index them on the new ID column.
After your data is in the easier format, the joins become easy.
select people.identifier,
people.name,
formatted_teams.team,
formatted_meals.meal
from people
join (select trim(column_value) partner_id,
team
from teams,
xmltable(('"' || replace(teams.concat_identifier_partner, ']',']","') || '"')) ) formatted_teams
on people.identifier = formatted_teams.partner_id
left join (select trim(column_value) plate_id,
meal
from meals,
xmltable(('"' || replace(meals.concat_identifier_plate, ']',']","') || '"')) ) formatted_meals
on people.identifier = formatted_meals.plate_id
order by people.identifier
Output:
+------------+--------+------+--------+
| IDENTIFIER | NAME | TEAM | MEAL |
+------------+--------+------+--------+
| [012345] | George | Blue | Salmon |
| [012346] | Jenny | Blue | |
| [012347] | Alex | Red | Pike |
| [012347] | Alex | Red | Cod |
+------------+--------+------+--------+
I think your first step would be to get the meals and teams tables in a usable format. Using these weird concatenation columns is never going to be easy or fast.
Doing something simple like this to split the ids onto different rows:
select trim(column_value) partner_id,
team
from teams,
xmltable(('"' || replace(teams.concat_identifier_partner, ']',']","') || '"'))
Gives an output like this:
+------------+------+
| PARTNER_ID | TEAM |
+------------+------+
| [012345] | Blue |
| [012346] | Blue |
| [012347] | Red |
+------------+------+
Then you could use these as inline views in your query or create a new temporary table for each of these. The temporary table would allow you to index them on the new ID column.
After your data is in the easier format, the joins become easy.
select people.identifier,
people.name,
formatted_teams.team,
formatted_meals.meal
from people
join (select trim(column_value) partner_id,
team
from teams,
xmltable(('"' || replace(teams.concat_identifier_partner, ']',']","') || '"')) ) formatted_teams
on people.identifier = formatted_teams.partner_id
left join (select trim(column_value) plate_id,
meal
from meals,
xmltable(('"' || replace(meals.concat_identifier_plate, ']',']","') || '"')) ) formatted_meals
on people.identifier = formatted_meals.plate_id
order by people.identifier
Output:
+------------+--------+------+--------+
| IDENTIFIER | NAME | TEAM | MEAL |
+------------+--------+------+--------+
| [012345] | George | Blue | Salmon |
| [012346] | Jenny | Blue | |
| [012347] | Alex | Red | Pike |
| [012347] | Alex | Red | Cod |
+------------+--------+------+--------+
answered Nov 16 '18 at 19:52
Patrick HPatrick H
520513
520513
Perfect! This i can work with. Thanks!
– vwdewaal
Nov 16 '18 at 23:37
add a comment |
Perfect! This i can work with. Thanks!
– vwdewaal
Nov 16 '18 at 23:37
Perfect! This i can work with. Thanks!
– vwdewaal
Nov 16 '18 at 23:37
Perfect! This i can work with. Thanks!
– vwdewaal
Nov 16 '18 at 23:37
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53339049%2foracle-sql-in-string-matches-and-table-joins%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
What is the resulting table supposed to look like? What's the format of
Concat_identifier_plate
?– default locale
Nov 16 '18 at 13:48
The format of concat_identifier_plate can either be the identifier number of a person or a team, or 2 people, with a fish. It's really convoluted in some places.
– vwdewaal
Nov 16 '18 at 13:55
Hmm, I think this is going to be slow anyway, there're multiple meals and teams for a person which might lead to an unpredictable increase in the number of combinations. By the way, why is the "George" record duplicated in the sample result?
– default locale
Nov 16 '18 at 14:52
1
Because both George and Jenny appear in the same teams string. This is honestly a messy issue. I'm thinking of deconstructing the data first maybe, by table, and stitching again. There might just not be an easy answer.
– vwdewaal
Nov 16 '18 at 14:56
This might be a good idea. I'd also start with splitting concatenated strings into numeric ids. This should be relatively trivial (e.g.
select to_number(regexp_substr(concat_identifier_partner, '[0-9]+', 1, 1)) as first_person_id from teams)
)– default locale
Nov 16 '18 at 15:04