How to join 2 tables using names column with similar but not the same?
Table 1
SchoolId_1 Name
Table 2
SchoolId_2 Name
SchooldId_1
and SchooldId_2
have different values. Name have similar but not the same values.
In Table 1, I got name column value 'University of Maine -- Farmington'. In Table 2, I have 'University of Maine at Farmington'. I want to join these but sql skips them because they are not exactly the same.
Table1
University of Strathclyde
University of Maine Fort Kent
Bryn Athyn College of the New Church
Adirondack Community College
Allen Community College
Nova Scotia Community College
Unity College
Allen College
Table2
University of Strathclyde -- Glasgow
University of Maine at Fort Kent
Bryn Athyn College
Unity College
Allen College
I want to join them using the name column with values stated above and the results will be like
SchoolId_1 Name SchoolId_2
1 University of Strathclyde 1101
2 University of Maine Fort Kent 1102
3 Bryn Athyn College of the New Church 1103
4 Adirondack Community College NULL
5 Allen Community College NULL
6 Nova Scotia Community College NULL
7 Unity College 1104
8 Allen College 1105
I tried LIKE, CHARINDEX but they require Name columns to have exactly the same values.
Current code (but only returns exactly the same name values):
SELECT
dsa.SchoolId_1, dsa.Name, ds.SchoolId_2, ds.sch_address, ds.city, ds.sch_state
INTO
match_sch
FROM
dbo.sa_schools dsa
LEFT JOIN
dbo.school ds ON dsa.Name = ds.Name;
sql left-join sql-server-2014
|
show 2 more comments
Table 1
SchoolId_1 Name
Table 2
SchoolId_2 Name
SchooldId_1
and SchooldId_2
have different values. Name have similar but not the same values.
In Table 1, I got name column value 'University of Maine -- Farmington'. In Table 2, I have 'University of Maine at Farmington'. I want to join these but sql skips them because they are not exactly the same.
Table1
University of Strathclyde
University of Maine Fort Kent
Bryn Athyn College of the New Church
Adirondack Community College
Allen Community College
Nova Scotia Community College
Unity College
Allen College
Table2
University of Strathclyde -- Glasgow
University of Maine at Fort Kent
Bryn Athyn College
Unity College
Allen College
I want to join them using the name column with values stated above and the results will be like
SchoolId_1 Name SchoolId_2
1 University of Strathclyde 1101
2 University of Maine Fort Kent 1102
3 Bryn Athyn College of the New Church 1103
4 Adirondack Community College NULL
5 Allen Community College NULL
6 Nova Scotia Community College NULL
7 Unity College 1104
8 Allen College 1105
I tried LIKE, CHARINDEX but they require Name columns to have exactly the same values.
Current code (but only returns exactly the same name values):
SELECT
dsa.SchoolId_1, dsa.Name, ds.SchoolId_2, ds.sch_address, ds.city, ds.sch_state
INTO
match_sch
FROM
dbo.sa_schools dsa
LEFT JOIN
dbo.school ds ON dsa.Name = ds.Name;
sql left-join sql-server-2014
If you want to have a equality between two string with "kind of' logic prefer usingdsa.Name LIKE '%'+ ds.Name +'%'
– Max
Nov 13 '18 at 16:45
1
This is referred to as fuzzy matching; but we're probably going to need a few more examples of values from the two tables that you want to be matched, so that we can provide a suitable answer. Please edit your question to add a few more examples that you would expect to match.
– Richardissimo
Nov 13 '18 at 18:13
@Richardissimo I edited my question and provided more values. Hope it will be clearer but let me know if you need more
– Eirin Gonzales
Nov 14 '18 at 4:38
@Max I did what you suggested but It maps my data like Nova Scotia Community College to Unity College. But actually, it would really map because of 'unity College'. Still, thank you!
– Eirin Gonzales
Nov 14 '18 at 4:41
"join them using the name column with those kinds of values" is not clear. You have not told us exactly what result table should be given for any input tables. So how can we help you? (Rhetorical.) Please use enough words & sentences to clearly say what you mean. See How to Ask--what have you learned researching such "fuzzy" condtions? PS Please read & act on Minimal, Complete, and Verifiable example.
– philipxy
Nov 14 '18 at 5:19
|
show 2 more comments
Table 1
SchoolId_1 Name
Table 2
SchoolId_2 Name
SchooldId_1
and SchooldId_2
have different values. Name have similar but not the same values.
In Table 1, I got name column value 'University of Maine -- Farmington'. In Table 2, I have 'University of Maine at Farmington'. I want to join these but sql skips them because they are not exactly the same.
Table1
University of Strathclyde
University of Maine Fort Kent
Bryn Athyn College of the New Church
Adirondack Community College
Allen Community College
Nova Scotia Community College
Unity College
Allen College
Table2
University of Strathclyde -- Glasgow
University of Maine at Fort Kent
Bryn Athyn College
Unity College
Allen College
I want to join them using the name column with values stated above and the results will be like
SchoolId_1 Name SchoolId_2
1 University of Strathclyde 1101
2 University of Maine Fort Kent 1102
3 Bryn Athyn College of the New Church 1103
4 Adirondack Community College NULL
5 Allen Community College NULL
6 Nova Scotia Community College NULL
7 Unity College 1104
8 Allen College 1105
I tried LIKE, CHARINDEX but they require Name columns to have exactly the same values.
Current code (but only returns exactly the same name values):
SELECT
dsa.SchoolId_1, dsa.Name, ds.SchoolId_2, ds.sch_address, ds.city, ds.sch_state
INTO
match_sch
FROM
dbo.sa_schools dsa
LEFT JOIN
dbo.school ds ON dsa.Name = ds.Name;
sql left-join sql-server-2014
Table 1
SchoolId_1 Name
Table 2
SchoolId_2 Name
SchooldId_1
and SchooldId_2
have different values. Name have similar but not the same values.
In Table 1, I got name column value 'University of Maine -- Farmington'. In Table 2, I have 'University of Maine at Farmington'. I want to join these but sql skips them because they are not exactly the same.
Table1
University of Strathclyde
University of Maine Fort Kent
Bryn Athyn College of the New Church
Adirondack Community College
Allen Community College
Nova Scotia Community College
Unity College
Allen College
Table2
University of Strathclyde -- Glasgow
University of Maine at Fort Kent
Bryn Athyn College
Unity College
Allen College
I want to join them using the name column with values stated above and the results will be like
SchoolId_1 Name SchoolId_2
1 University of Strathclyde 1101
2 University of Maine Fort Kent 1102
3 Bryn Athyn College of the New Church 1103
4 Adirondack Community College NULL
5 Allen Community College NULL
6 Nova Scotia Community College NULL
7 Unity College 1104
8 Allen College 1105
I tried LIKE, CHARINDEX but they require Name columns to have exactly the same values.
Current code (but only returns exactly the same name values):
SELECT
dsa.SchoolId_1, dsa.Name, ds.SchoolId_2, ds.sch_address, ds.city, ds.sch_state
INTO
match_sch
FROM
dbo.sa_schools dsa
LEFT JOIN
dbo.school ds ON dsa.Name = ds.Name;
sql left-join sql-server-2014
sql left-join sql-server-2014
edited Nov 14 '18 at 6:20
Eirin Gonzales
asked Nov 13 '18 at 16:41
Eirin GonzalesEirin Gonzales
12
12
If you want to have a equality between two string with "kind of' logic prefer usingdsa.Name LIKE '%'+ ds.Name +'%'
– Max
Nov 13 '18 at 16:45
1
This is referred to as fuzzy matching; but we're probably going to need a few more examples of values from the two tables that you want to be matched, so that we can provide a suitable answer. Please edit your question to add a few more examples that you would expect to match.
– Richardissimo
Nov 13 '18 at 18:13
@Richardissimo I edited my question and provided more values. Hope it will be clearer but let me know if you need more
– Eirin Gonzales
Nov 14 '18 at 4:38
@Max I did what you suggested but It maps my data like Nova Scotia Community College to Unity College. But actually, it would really map because of 'unity College'. Still, thank you!
– Eirin Gonzales
Nov 14 '18 at 4:41
"join them using the name column with those kinds of values" is not clear. You have not told us exactly what result table should be given for any input tables. So how can we help you? (Rhetorical.) Please use enough words & sentences to clearly say what you mean. See How to Ask--what have you learned researching such "fuzzy" condtions? PS Please read & act on Minimal, Complete, and Verifiable example.
– philipxy
Nov 14 '18 at 5:19
|
show 2 more comments
If you want to have a equality between two string with "kind of' logic prefer usingdsa.Name LIKE '%'+ ds.Name +'%'
– Max
Nov 13 '18 at 16:45
1
This is referred to as fuzzy matching; but we're probably going to need a few more examples of values from the two tables that you want to be matched, so that we can provide a suitable answer. Please edit your question to add a few more examples that you would expect to match.
– Richardissimo
Nov 13 '18 at 18:13
@Richardissimo I edited my question and provided more values. Hope it will be clearer but let me know if you need more
– Eirin Gonzales
Nov 14 '18 at 4:38
@Max I did what you suggested but It maps my data like Nova Scotia Community College to Unity College. But actually, it would really map because of 'unity College'. Still, thank you!
– Eirin Gonzales
Nov 14 '18 at 4:41
"join them using the name column with those kinds of values" is not clear. You have not told us exactly what result table should be given for any input tables. So how can we help you? (Rhetorical.) Please use enough words & sentences to clearly say what you mean. See How to Ask--what have you learned researching such "fuzzy" condtions? PS Please read & act on Minimal, Complete, and Verifiable example.
– philipxy
Nov 14 '18 at 5:19
If you want to have a equality between two string with "kind of' logic prefer using
dsa.Name LIKE '%'+ ds.Name +'%'
– Max
Nov 13 '18 at 16:45
If you want to have a equality between two string with "kind of' logic prefer using
dsa.Name LIKE '%'+ ds.Name +'%'
– Max
Nov 13 '18 at 16:45
1
1
This is referred to as fuzzy matching; but we're probably going to need a few more examples of values from the two tables that you want to be matched, so that we can provide a suitable answer. Please edit your question to add a few more examples that you would expect to match.
– Richardissimo
Nov 13 '18 at 18:13
This is referred to as fuzzy matching; but we're probably going to need a few more examples of values from the two tables that you want to be matched, so that we can provide a suitable answer. Please edit your question to add a few more examples that you would expect to match.
– Richardissimo
Nov 13 '18 at 18:13
@Richardissimo I edited my question and provided more values. Hope it will be clearer but let me know if you need more
– Eirin Gonzales
Nov 14 '18 at 4:38
@Richardissimo I edited my question and provided more values. Hope it will be clearer but let me know if you need more
– Eirin Gonzales
Nov 14 '18 at 4:38
@Max I did what you suggested but It maps my data like Nova Scotia Community College to Unity College. But actually, it would really map because of 'unity College'. Still, thank you!
– Eirin Gonzales
Nov 14 '18 at 4:41
@Max I did what you suggested but It maps my data like Nova Scotia Community College to Unity College. But actually, it would really map because of 'unity College'. Still, thank you!
– Eirin Gonzales
Nov 14 '18 at 4:41
"join them using the name column with those kinds of values" is not clear. You have not told us exactly what result table should be given for any input tables. So how can we help you? (Rhetorical.) Please use enough words & sentences to clearly say what you mean. See How to Ask--what have you learned researching such "fuzzy" condtions? PS Please read & act on Minimal, Complete, and Verifiable example.
– philipxy
Nov 14 '18 at 5:19
"join them using the name column with those kinds of values" is not clear. You have not told us exactly what result table should be given for any input tables. So how can we help you? (Rhetorical.) Please use enough words & sentences to clearly say what you mean. See How to Ask--what have you learned researching such "fuzzy" condtions? PS Please read & act on Minimal, Complete, and Verifiable example.
– philipxy
Nov 14 '18 at 5:19
|
show 2 more comments
1 Answer
1
active
oldest
votes
Fantastic question! I have this problem constantly thanks to user-generated data on freeform text fields.
My solution was to build a UDF to use python to fix this. My UDF uses python and specifically the fuzzywuzzy library. I pass in two strings and it returns a numeric score which can be used for filtering.
eg
SELECT
t1.schoolid_1
,t2.schoolid_2
,t1.name as name_1
,t2.name as name_2
FROM
table_1 t1
LEFT JOIN
table_2 t2
ON
my_udf(t1.name, t2.name) > 70
Although it looks like you're running on SQL Server, so python might not be available. In that case you could try to hack together something similar in a SQL-based UDF.
eg
SELECT
1.0 * a.matches / a.rows as match_score
FROM
(SELECT
count(1) as rows
,count(CASE WHEN t1 is not NULL and t2 is not NULL THEN 1 END) as matches
FROM
(SELECT value FROM string_split(t1.name, '1')
) t1
FULL OUTER JOIN
(SELECT value FROM string_split(t2.name, '1')
) t2
ON
t1.value = t2.value
) a
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "1"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53285680%2fhow-to-join-2-tables-using-names-column-with-similar-but-not-the-same%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
Fantastic question! I have this problem constantly thanks to user-generated data on freeform text fields.
My solution was to build a UDF to use python to fix this. My UDF uses python and specifically the fuzzywuzzy library. I pass in two strings and it returns a numeric score which can be used for filtering.
eg
SELECT
t1.schoolid_1
,t2.schoolid_2
,t1.name as name_1
,t2.name as name_2
FROM
table_1 t1
LEFT JOIN
table_2 t2
ON
my_udf(t1.name, t2.name) > 70
Although it looks like you're running on SQL Server, so python might not be available. In that case you could try to hack together something similar in a SQL-based UDF.
eg
SELECT
1.0 * a.matches / a.rows as match_score
FROM
(SELECT
count(1) as rows
,count(CASE WHEN t1 is not NULL and t2 is not NULL THEN 1 END) as matches
FROM
(SELECT value FROM string_split(t1.name, '1')
) t1
FULL OUTER JOIN
(SELECT value FROM string_split(t2.name, '1')
) t2
ON
t1.value = t2.value
) a
add a comment |
Fantastic question! I have this problem constantly thanks to user-generated data on freeform text fields.
My solution was to build a UDF to use python to fix this. My UDF uses python and specifically the fuzzywuzzy library. I pass in two strings and it returns a numeric score which can be used for filtering.
eg
SELECT
t1.schoolid_1
,t2.schoolid_2
,t1.name as name_1
,t2.name as name_2
FROM
table_1 t1
LEFT JOIN
table_2 t2
ON
my_udf(t1.name, t2.name) > 70
Although it looks like you're running on SQL Server, so python might not be available. In that case you could try to hack together something similar in a SQL-based UDF.
eg
SELECT
1.0 * a.matches / a.rows as match_score
FROM
(SELECT
count(1) as rows
,count(CASE WHEN t1 is not NULL and t2 is not NULL THEN 1 END) as matches
FROM
(SELECT value FROM string_split(t1.name, '1')
) t1
FULL OUTER JOIN
(SELECT value FROM string_split(t2.name, '1')
) t2
ON
t1.value = t2.value
) a
add a comment |
Fantastic question! I have this problem constantly thanks to user-generated data on freeform text fields.
My solution was to build a UDF to use python to fix this. My UDF uses python and specifically the fuzzywuzzy library. I pass in two strings and it returns a numeric score which can be used for filtering.
eg
SELECT
t1.schoolid_1
,t2.schoolid_2
,t1.name as name_1
,t2.name as name_2
FROM
table_1 t1
LEFT JOIN
table_2 t2
ON
my_udf(t1.name, t2.name) > 70
Although it looks like you're running on SQL Server, so python might not be available. In that case you could try to hack together something similar in a SQL-based UDF.
eg
SELECT
1.0 * a.matches / a.rows as match_score
FROM
(SELECT
count(1) as rows
,count(CASE WHEN t1 is not NULL and t2 is not NULL THEN 1 END) as matches
FROM
(SELECT value FROM string_split(t1.name, '1')
) t1
FULL OUTER JOIN
(SELECT value FROM string_split(t2.name, '1')
) t2
ON
t1.value = t2.value
) a
Fantastic question! I have this problem constantly thanks to user-generated data on freeform text fields.
My solution was to build a UDF to use python to fix this. My UDF uses python and specifically the fuzzywuzzy library. I pass in two strings and it returns a numeric score which can be used for filtering.
eg
SELECT
t1.schoolid_1
,t2.schoolid_2
,t1.name as name_1
,t2.name as name_2
FROM
table_1 t1
LEFT JOIN
table_2 t2
ON
my_udf(t1.name, t2.name) > 70
Although it looks like you're running on SQL Server, so python might not be available. In that case you could try to hack together something similar in a SQL-based UDF.
eg
SELECT
1.0 * a.matches / a.rows as match_score
FROM
(SELECT
count(1) as rows
,count(CASE WHEN t1 is not NULL and t2 is not NULL THEN 1 END) as matches
FROM
(SELECT value FROM string_split(t1.name, '1')
) t1
FULL OUTER JOIN
(SELECT value FROM string_split(t2.name, '1')
) t2
ON
t1.value = t2.value
) a
answered Nov 13 '18 at 18:31
ScottieBScottieB
1,69042442
1,69042442
add a comment |
add a comment |
Thanks for contributing an answer to Stack Overflow!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53285680%2fhow-to-join-2-tables-using-names-column-with-similar-but-not-the-same%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
If you want to have a equality between two string with "kind of' logic prefer using
dsa.Name LIKE '%'+ ds.Name +'%'
– Max
Nov 13 '18 at 16:45
1
This is referred to as fuzzy matching; but we're probably going to need a few more examples of values from the two tables that you want to be matched, so that we can provide a suitable answer. Please edit your question to add a few more examples that you would expect to match.
– Richardissimo
Nov 13 '18 at 18:13
@Richardissimo I edited my question and provided more values. Hope it will be clearer but let me know if you need more
– Eirin Gonzales
Nov 14 '18 at 4:38
@Max I did what you suggested but It maps my data like Nova Scotia Community College to Unity College. But actually, it would really map because of 'unity College'. Still, thank you!
– Eirin Gonzales
Nov 14 '18 at 4:41
"join them using the name column with those kinds of values" is not clear. You have not told us exactly what result table should be given for any input tables. So how can we help you? (Rhetorical.) Please use enough words & sentences to clearly say what you mean. See How to Ask--what have you learned researching such "fuzzy" condtions? PS Please read & act on Minimal, Complete, and Verifiable example.
– philipxy
Nov 14 '18 at 5:19