Oracle SQL in string matches and table joins





.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty{ height:90px;width:728px;box-sizing:border-box;
}







2















I need some guidance on how to approach this problem.



Consider the following 3 tables:
enter image description here



I inherited this dataset in Oracle. I need to find a way to merge these 3 tables into a single table. Why? Compliance and fraud checks. We're being audited, I have no choice but to comply.



The tables are millions of records. 69million. 7 million, 4 million. If I try to join and search by using contains, or instr or like, I'd be creating Cartesian joins (I think) and it will be quite slow.



How do I approach this? The data is formatted with the square brackets as in the picture. I have considered pulling these tables into pandas and using python to get the answer.



The resulting table would look like this:
enter image description here










share|improve this question

























  • What is the resulting table supposed to look like? What's the format of Concat_identifier_plate?

    – default locale
    Nov 16 '18 at 13:48













  • The format of concat_identifier_plate can either be the identifier number of a person or a team, or 2 people, with a fish. It's really convoluted in some places.

    – vwdewaal
    Nov 16 '18 at 13:55













  • Hmm, I think this is going to be slow anyway, there're multiple meals and teams for a person which might lead to an unpredictable increase in the number of combinations. By the way, why is the "George" record duplicated in the sample result?

    – default locale
    Nov 16 '18 at 14:52






  • 1





    Because both George and Jenny appear in the same teams string. This is honestly a messy issue. I'm thinking of deconstructing the data first maybe, by table, and stitching again. There might just not be an easy answer.

    – vwdewaal
    Nov 16 '18 at 14:56











  • This might be a good idea. I'd also start with splitting concatenated strings into numeric ids. This should be relatively trivial (e.g. select to_number(regexp_substr(concat_identifier_partner, '[0-9]+', 1, 1)) as first_person_id from teams))

    – default locale
    Nov 16 '18 at 15:04




















2















I need some guidance on how to approach this problem.



Consider the following 3 tables:
enter image description here



I inherited this dataset in Oracle. I need to find a way to merge these 3 tables into a single table. Why? Compliance and fraud checks. We're being audited, I have no choice but to comply.



The tables are millions of records. 69million. 7 million, 4 million. If I try to join and search by using contains, or instr or like, I'd be creating Cartesian joins (I think) and it will be quite slow.



How do I approach this? The data is formatted with the square brackets as in the picture. I have considered pulling these tables into pandas and using python to get the answer.



The resulting table would look like this:
enter image description here










share|improve this question

























  • What is the resulting table supposed to look like? What's the format of Concat_identifier_plate?

    – default locale
    Nov 16 '18 at 13:48













  • The format of concat_identifier_plate can either be the identifier number of a person or a team, or 2 people, with a fish. It's really convoluted in some places.

    – vwdewaal
    Nov 16 '18 at 13:55













  • Hmm, I think this is going to be slow anyway, there're multiple meals and teams for a person which might lead to an unpredictable increase in the number of combinations. By the way, why is the "George" record duplicated in the sample result?

    – default locale
    Nov 16 '18 at 14:52






  • 1





    Because both George and Jenny appear in the same teams string. This is honestly a messy issue. I'm thinking of deconstructing the data first maybe, by table, and stitching again. There might just not be an easy answer.

    – vwdewaal
    Nov 16 '18 at 14:56











  • This might be a good idea. I'd also start with splitting concatenated strings into numeric ids. This should be relatively trivial (e.g. select to_number(regexp_substr(concat_identifier_partner, '[0-9]+', 1, 1)) as first_person_id from teams))

    – default locale
    Nov 16 '18 at 15:04
















2












2








2


1






I need some guidance on how to approach this problem.



Consider the following 3 tables:
enter image description here



I inherited this dataset in Oracle. I need to find a way to merge these 3 tables into a single table. Why? Compliance and fraud checks. We're being audited, I have no choice but to comply.



The tables are millions of records. 69million. 7 million, 4 million. If I try to join and search by using contains, or instr or like, I'd be creating Cartesian joins (I think) and it will be quite slow.



How do I approach this? The data is formatted with the square brackets as in the picture. I have considered pulling these tables into pandas and using python to get the answer.



The resulting table would look like this:
enter image description here










share|improve this question
















I need some guidance on how to approach this problem.



Consider the following 3 tables:
enter image description here



I inherited this dataset in Oracle. I need to find a way to merge these 3 tables into a single table. Why? Compliance and fraud checks. We're being audited, I have no choice but to comply.



The tables are millions of records. 69million. 7 million, 4 million. If I try to join and search by using contains, or instr or like, I'd be creating Cartesian joins (I think) and it will be quite slow.



How do I approach this? The data is formatted with the square brackets as in the picture. I have considered pulling these tables into pandas and using python to get the answer.



The resulting table would look like this:
enter image description here







python oracle pandas plsql






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 16 '18 at 13:54







vwdewaal

















asked Nov 16 '18 at 13:42









vwdewaalvwdewaal

3392418




3392418













  • What is the resulting table supposed to look like? What's the format of Concat_identifier_plate?

    – default locale
    Nov 16 '18 at 13:48













  • The format of concat_identifier_plate can either be the identifier number of a person or a team, or 2 people, with a fish. It's really convoluted in some places.

    – vwdewaal
    Nov 16 '18 at 13:55













  • Hmm, I think this is going to be slow anyway, there're multiple meals and teams for a person which might lead to an unpredictable increase in the number of combinations. By the way, why is the "George" record duplicated in the sample result?

    – default locale
    Nov 16 '18 at 14:52






  • 1





    Because both George and Jenny appear in the same teams string. This is honestly a messy issue. I'm thinking of deconstructing the data first maybe, by table, and stitching again. There might just not be an easy answer.

    – vwdewaal
    Nov 16 '18 at 14:56











  • This might be a good idea. I'd also start with splitting concatenated strings into numeric ids. This should be relatively trivial (e.g. select to_number(regexp_substr(concat_identifier_partner, '[0-9]+', 1, 1)) as first_person_id from teams))

    – default locale
    Nov 16 '18 at 15:04





















  • What is the resulting table supposed to look like? What's the format of Concat_identifier_plate?

    – default locale
    Nov 16 '18 at 13:48













  • The format of concat_identifier_plate can either be the identifier number of a person or a team, or 2 people, with a fish. It's really convoluted in some places.

    – vwdewaal
    Nov 16 '18 at 13:55













  • Hmm, I think this is going to be slow anyway, there're multiple meals and teams for a person which might lead to an unpredictable increase in the number of combinations. By the way, why is the "George" record duplicated in the sample result?

    – default locale
    Nov 16 '18 at 14:52






  • 1





    Because both George and Jenny appear in the same teams string. This is honestly a messy issue. I'm thinking of deconstructing the data first maybe, by table, and stitching again. There might just not be an easy answer.

    – vwdewaal
    Nov 16 '18 at 14:56











  • This might be a good idea. I'd also start with splitting concatenated strings into numeric ids. This should be relatively trivial (e.g. select to_number(regexp_substr(concat_identifier_partner, '[0-9]+', 1, 1)) as first_person_id from teams))

    – default locale
    Nov 16 '18 at 15:04



















What is the resulting table supposed to look like? What's the format of Concat_identifier_plate?

– default locale
Nov 16 '18 at 13:48







What is the resulting table supposed to look like? What's the format of Concat_identifier_plate?

– default locale
Nov 16 '18 at 13:48















The format of concat_identifier_plate can either be the identifier number of a person or a team, or 2 people, with a fish. It's really convoluted in some places.

– vwdewaal
Nov 16 '18 at 13:55







The format of concat_identifier_plate can either be the identifier number of a person or a team, or 2 people, with a fish. It's really convoluted in some places.

– vwdewaal
Nov 16 '18 at 13:55















Hmm, I think this is going to be slow anyway, there're multiple meals and teams for a person which might lead to an unpredictable increase in the number of combinations. By the way, why is the "George" record duplicated in the sample result?

– default locale
Nov 16 '18 at 14:52





Hmm, I think this is going to be slow anyway, there're multiple meals and teams for a person which might lead to an unpredictable increase in the number of combinations. By the way, why is the "George" record duplicated in the sample result?

– default locale
Nov 16 '18 at 14:52




1




1





Because both George and Jenny appear in the same teams string. This is honestly a messy issue. I'm thinking of deconstructing the data first maybe, by table, and stitching again. There might just not be an easy answer.

– vwdewaal
Nov 16 '18 at 14:56





Because both George and Jenny appear in the same teams string. This is honestly a messy issue. I'm thinking of deconstructing the data first maybe, by table, and stitching again. There might just not be an easy answer.

– vwdewaal
Nov 16 '18 at 14:56













This might be a good idea. I'd also start with splitting concatenated strings into numeric ids. This should be relatively trivial (e.g. select to_number(regexp_substr(concat_identifier_partner, '[0-9]+', 1, 1)) as first_person_id from teams))

– default locale
Nov 16 '18 at 15:04







This might be a good idea. I'd also start with splitting concatenated strings into numeric ids. This should be relatively trivial (e.g. select to_number(regexp_substr(concat_identifier_partner, '[0-9]+', 1, 1)) as first_person_id from teams))

– default locale
Nov 16 '18 at 15:04














1 Answer
1






active

oldest

votes


















3














I think your first step would be to get the meals and teams tables in a usable format. Using these weird concatenation columns is never going to be easy or fast.



Doing something simple like this to split the ids onto different rows:



select trim(column_value) partner_id,
team
from teams,
xmltable(('"' || replace(teams.concat_identifier_partner, ']',']","') || '"'))


Gives an output like this:



+------------+------+
| PARTNER_ID | TEAM |
+------------+------+
| [012345] | Blue |
| [012346] | Blue |
| [012347] | Red |
+------------+------+


Then you could use these as inline views in your query or create a new temporary table for each of these. The temporary table would allow you to index them on the new ID column.



After your data is in the easier format, the joins become easy.



select people.identifier,
people.name,
formatted_teams.team,
formatted_meals.meal
from people
join (select trim(column_value) partner_id,
team
from teams,
xmltable(('"' || replace(teams.concat_identifier_partner, ']',']","') || '"')) ) formatted_teams
on people.identifier = formatted_teams.partner_id
left join (select trim(column_value) plate_id,
meal
from meals,
xmltable(('"' || replace(meals.concat_identifier_plate, ']',']","') || '"')) ) formatted_meals
on people.identifier = formatted_meals.plate_id
order by people.identifier


Output:



+------------+--------+------+--------+
| IDENTIFIER | NAME | TEAM | MEAL |
+------------+--------+------+--------+
| [012345] | George | Blue | Salmon |
| [012346] | Jenny | Blue | |
| [012347] | Alex | Red | Pike |
| [012347] | Alex | Red | Cod |
+------------+--------+------+--------+





share|improve this answer
























  • Perfect! This i can work with. Thanks!

    – vwdewaal
    Nov 16 '18 at 23:37












Your Answer






StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53339049%2foracle-sql-in-string-matches-and-table-joins%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes









3














I think your first step would be to get the meals and teams tables in a usable format. Using these weird concatenation columns is never going to be easy or fast.



Doing something simple like this to split the ids onto different rows:



select trim(column_value) partner_id,
team
from teams,
xmltable(('"' || replace(teams.concat_identifier_partner, ']',']","') || '"'))


Gives an output like this:



+------------+------+
| PARTNER_ID | TEAM |
+------------+------+
| [012345] | Blue |
| [012346] | Blue |
| [012347] | Red |
+------------+------+


Then you could use these as inline views in your query or create a new temporary table for each of these. The temporary table would allow you to index them on the new ID column.



After your data is in the easier format, the joins become easy.



select people.identifier,
people.name,
formatted_teams.team,
formatted_meals.meal
from people
join (select trim(column_value) partner_id,
team
from teams,
xmltable(('"' || replace(teams.concat_identifier_partner, ']',']","') || '"')) ) formatted_teams
on people.identifier = formatted_teams.partner_id
left join (select trim(column_value) plate_id,
meal
from meals,
xmltable(('"' || replace(meals.concat_identifier_plate, ']',']","') || '"')) ) formatted_meals
on people.identifier = formatted_meals.plate_id
order by people.identifier


Output:



+------------+--------+------+--------+
| IDENTIFIER | NAME | TEAM | MEAL |
+------------+--------+------+--------+
| [012345] | George | Blue | Salmon |
| [012346] | Jenny | Blue | |
| [012347] | Alex | Red | Pike |
| [012347] | Alex | Red | Cod |
+------------+--------+------+--------+





share|improve this answer
























  • Perfect! This i can work with. Thanks!

    – vwdewaal
    Nov 16 '18 at 23:37
















3














I think your first step would be to get the meals and teams tables in a usable format. Using these weird concatenation columns is never going to be easy or fast.



Doing something simple like this to split the ids onto different rows:



select trim(column_value) partner_id,
team
from teams,
xmltable(('"' || replace(teams.concat_identifier_partner, ']',']","') || '"'))


Gives an output like this:



+------------+------+
| PARTNER_ID | TEAM |
+------------+------+
| [012345] | Blue |
| [012346] | Blue |
| [012347] | Red |
+------------+------+


Then you could use these as inline views in your query or create a new temporary table for each of these. The temporary table would allow you to index them on the new ID column.



After your data is in the easier format, the joins become easy.



select people.identifier,
people.name,
formatted_teams.team,
formatted_meals.meal
from people
join (select trim(column_value) partner_id,
team
from teams,
xmltable(('"' || replace(teams.concat_identifier_partner, ']',']","') || '"')) ) formatted_teams
on people.identifier = formatted_teams.partner_id
left join (select trim(column_value) plate_id,
meal
from meals,
xmltable(('"' || replace(meals.concat_identifier_plate, ']',']","') || '"')) ) formatted_meals
on people.identifier = formatted_meals.plate_id
order by people.identifier


Output:



+------------+--------+------+--------+
| IDENTIFIER | NAME | TEAM | MEAL |
+------------+--------+------+--------+
| [012345] | George | Blue | Salmon |
| [012346] | Jenny | Blue | |
| [012347] | Alex | Red | Pike |
| [012347] | Alex | Red | Cod |
+------------+--------+------+--------+





share|improve this answer
























  • Perfect! This i can work with. Thanks!

    – vwdewaal
    Nov 16 '18 at 23:37














3












3








3







I think your first step would be to get the meals and teams tables in a usable format. Using these weird concatenation columns is never going to be easy or fast.



Doing something simple like this to split the ids onto different rows:



select trim(column_value) partner_id,
team
from teams,
xmltable(('"' || replace(teams.concat_identifier_partner, ']',']","') || '"'))


Gives an output like this:



+------------+------+
| PARTNER_ID | TEAM |
+------------+------+
| [012345] | Blue |
| [012346] | Blue |
| [012347] | Red |
+------------+------+


Then you could use these as inline views in your query or create a new temporary table for each of these. The temporary table would allow you to index them on the new ID column.



After your data is in the easier format, the joins become easy.



select people.identifier,
people.name,
formatted_teams.team,
formatted_meals.meal
from people
join (select trim(column_value) partner_id,
team
from teams,
xmltable(('"' || replace(teams.concat_identifier_partner, ']',']","') || '"')) ) formatted_teams
on people.identifier = formatted_teams.partner_id
left join (select trim(column_value) plate_id,
meal
from meals,
xmltable(('"' || replace(meals.concat_identifier_plate, ']',']","') || '"')) ) formatted_meals
on people.identifier = formatted_meals.plate_id
order by people.identifier


Output:



+------------+--------+------+--------+
| IDENTIFIER | NAME | TEAM | MEAL |
+------------+--------+------+--------+
| [012345] | George | Blue | Salmon |
| [012346] | Jenny | Blue | |
| [012347] | Alex | Red | Pike |
| [012347] | Alex | Red | Cod |
+------------+--------+------+--------+





share|improve this answer













I think your first step would be to get the meals and teams tables in a usable format. Using these weird concatenation columns is never going to be easy or fast.



Doing something simple like this to split the ids onto different rows:



select trim(column_value) partner_id,
team
from teams,
xmltable(('"' || replace(teams.concat_identifier_partner, ']',']","') || '"'))


Gives an output like this:



+------------+------+
| PARTNER_ID | TEAM |
+------------+------+
| [012345] | Blue |
| [012346] | Blue |
| [012347] | Red |
+------------+------+


Then you could use these as inline views in your query or create a new temporary table for each of these. The temporary table would allow you to index them on the new ID column.



After your data is in the easier format, the joins become easy.



select people.identifier,
people.name,
formatted_teams.team,
formatted_meals.meal
from people
join (select trim(column_value) partner_id,
team
from teams,
xmltable(('"' || replace(teams.concat_identifier_partner, ']',']","') || '"')) ) formatted_teams
on people.identifier = formatted_teams.partner_id
left join (select trim(column_value) plate_id,
meal
from meals,
xmltable(('"' || replace(meals.concat_identifier_plate, ']',']","') || '"')) ) formatted_meals
on people.identifier = formatted_meals.plate_id
order by people.identifier


Output:



+------------+--------+------+--------+
| IDENTIFIER | NAME | TEAM | MEAL |
+------------+--------+------+--------+
| [012345] | George | Blue | Salmon |
| [012346] | Jenny | Blue | |
| [012347] | Alex | Red | Pike |
| [012347] | Alex | Red | Cod |
+------------+--------+------+--------+






share|improve this answer












share|improve this answer



share|improve this answer










answered Nov 16 '18 at 19:52









Patrick HPatrick H

520513




520513













  • Perfect! This i can work with. Thanks!

    – vwdewaal
    Nov 16 '18 at 23:37



















  • Perfect! This i can work with. Thanks!

    – vwdewaal
    Nov 16 '18 at 23:37

















Perfect! This i can work with. Thanks!

– vwdewaal
Nov 16 '18 at 23:37





Perfect! This i can work with. Thanks!

– vwdewaal
Nov 16 '18 at 23:37




















draft saved

draft discarded




















































Thanks for contributing an answer to Stack Overflow!


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53339049%2foracle-sql-in-string-matches-and-table-joins%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Xamarin.iOS Cant Deploy on Iphone

Glorious Revolution

Dulmage-Mendelsohn matrix decomposition in Python