How to join 2 tables using names column with similar but not the same?












-1















Table 1



SchoolId_1   Name


Table 2



SchoolId_2   Name


SchooldId_1 and SchooldId_2 have different values. Name have similar but not the same values.



In Table 1, I got name column value 'University of Maine -- Farmington'. In Table 2, I have 'University of Maine at Farmington'. I want to join these but sql skips them because they are not exactly the same.



Table1



University of Strathclyde



University of Maine Fort Kent



Bryn Athyn College of the New Church



Adirondack Community College



Allen Community College



Nova Scotia Community College



Unity College



Allen College



Table2



University of Strathclyde -- Glasgow



University of Maine at Fort Kent



Bryn Athyn College



Unity College



Allen College



I want to join them using the name column with values stated above and the results will be like



SchoolId_1    Name    SchoolId_2
1 University of Strathclyde 1101

2 University of Maine Fort Kent 1102

3 Bryn Athyn College of the New Church 1103

4 Adirondack Community College NULL

5 Allen Community College NULL

6 Nova Scotia Community College NULL

7 Unity College 1104

8 Allen College 1105


I tried LIKE, CHARINDEX but they require Name columns to have exactly the same values.



Current code (but only returns exactly the same name values):



SELECT
dsa.SchoolId_1, dsa.Name, ds.SchoolId_2, ds.sch_address, ds.city, ds.sch_state
INTO
match_sch
FROM
dbo.sa_schools dsa
LEFT JOIN
dbo.school ds ON dsa.Name = ds.Name;









share|improve this question

























  • If you want to have a equality between two string with "kind of' logic prefer using dsa.Name LIKE '%'+ ds.Name +'%'

    – Max
    Nov 13 '18 at 16:45






  • 1





    This is referred to as fuzzy matching; but we're probably going to need a few more examples of values from the two tables that you want to be matched, so that we can provide a suitable answer. Please edit your question to add a few more examples that you would expect to match.

    – Richardissimo
    Nov 13 '18 at 18:13











  • @Richardissimo I edited my question and provided more values. Hope it will be clearer but let me know if you need more

    – Eirin Gonzales
    Nov 14 '18 at 4:38











  • @Max I did what you suggested but It maps my data like Nova Scotia Community College to Unity College. But actually, it would really map because of 'unity College'. Still, thank you!

    – Eirin Gonzales
    Nov 14 '18 at 4:41













  • "join them using the name column with those kinds of values" is not clear. You have not told us exactly what result table should be given for any input tables. So how can we help you? (Rhetorical.) Please use enough words & sentences to clearly say what you mean. See How to Ask--what have you learned researching such "fuzzy" condtions? PS Please read & act on Minimal, Complete, and Verifiable example.

    – philipxy
    Nov 14 '18 at 5:19


















-1















Table 1



SchoolId_1   Name


Table 2



SchoolId_2   Name


SchooldId_1 and SchooldId_2 have different values. Name have similar but not the same values.



In Table 1, I got name column value 'University of Maine -- Farmington'. In Table 2, I have 'University of Maine at Farmington'. I want to join these but sql skips them because they are not exactly the same.



Table1



University of Strathclyde



University of Maine Fort Kent



Bryn Athyn College of the New Church



Adirondack Community College



Allen Community College



Nova Scotia Community College



Unity College



Allen College



Table2



University of Strathclyde -- Glasgow



University of Maine at Fort Kent



Bryn Athyn College



Unity College



Allen College



I want to join them using the name column with values stated above and the results will be like



SchoolId_1    Name    SchoolId_2
1 University of Strathclyde 1101

2 University of Maine Fort Kent 1102

3 Bryn Athyn College of the New Church 1103

4 Adirondack Community College NULL

5 Allen Community College NULL

6 Nova Scotia Community College NULL

7 Unity College 1104

8 Allen College 1105


I tried LIKE, CHARINDEX but they require Name columns to have exactly the same values.



Current code (but only returns exactly the same name values):



SELECT
dsa.SchoolId_1, dsa.Name, ds.SchoolId_2, ds.sch_address, ds.city, ds.sch_state
INTO
match_sch
FROM
dbo.sa_schools dsa
LEFT JOIN
dbo.school ds ON dsa.Name = ds.Name;









share|improve this question

























  • If you want to have a equality between two string with "kind of' logic prefer using dsa.Name LIKE '%'+ ds.Name +'%'

    – Max
    Nov 13 '18 at 16:45






  • 1





    This is referred to as fuzzy matching; but we're probably going to need a few more examples of values from the two tables that you want to be matched, so that we can provide a suitable answer. Please edit your question to add a few more examples that you would expect to match.

    – Richardissimo
    Nov 13 '18 at 18:13











  • @Richardissimo I edited my question and provided more values. Hope it will be clearer but let me know if you need more

    – Eirin Gonzales
    Nov 14 '18 at 4:38











  • @Max I did what you suggested but It maps my data like Nova Scotia Community College to Unity College. But actually, it would really map because of 'unity College'. Still, thank you!

    – Eirin Gonzales
    Nov 14 '18 at 4:41













  • "join them using the name column with those kinds of values" is not clear. You have not told us exactly what result table should be given for any input tables. So how can we help you? (Rhetorical.) Please use enough words & sentences to clearly say what you mean. See How to Ask--what have you learned researching such "fuzzy" condtions? PS Please read & act on Minimal, Complete, and Verifiable example.

    – philipxy
    Nov 14 '18 at 5:19
















-1












-1








-1








Table 1



SchoolId_1   Name


Table 2



SchoolId_2   Name


SchooldId_1 and SchooldId_2 have different values. Name have similar but not the same values.



In Table 1, I got name column value 'University of Maine -- Farmington'. In Table 2, I have 'University of Maine at Farmington'. I want to join these but sql skips them because they are not exactly the same.



Table1



University of Strathclyde



University of Maine Fort Kent



Bryn Athyn College of the New Church



Adirondack Community College



Allen Community College



Nova Scotia Community College



Unity College



Allen College



Table2



University of Strathclyde -- Glasgow



University of Maine at Fort Kent



Bryn Athyn College



Unity College



Allen College



I want to join them using the name column with values stated above and the results will be like



SchoolId_1    Name    SchoolId_2
1 University of Strathclyde 1101

2 University of Maine Fort Kent 1102

3 Bryn Athyn College of the New Church 1103

4 Adirondack Community College NULL

5 Allen Community College NULL

6 Nova Scotia Community College NULL

7 Unity College 1104

8 Allen College 1105


I tried LIKE, CHARINDEX but they require Name columns to have exactly the same values.



Current code (but only returns exactly the same name values):



SELECT
dsa.SchoolId_1, dsa.Name, ds.SchoolId_2, ds.sch_address, ds.city, ds.sch_state
INTO
match_sch
FROM
dbo.sa_schools dsa
LEFT JOIN
dbo.school ds ON dsa.Name = ds.Name;









share|improve this question
















Table 1



SchoolId_1   Name


Table 2



SchoolId_2   Name


SchooldId_1 and SchooldId_2 have different values. Name have similar but not the same values.



In Table 1, I got name column value 'University of Maine -- Farmington'. In Table 2, I have 'University of Maine at Farmington'. I want to join these but sql skips them because they are not exactly the same.



Table1



University of Strathclyde



University of Maine Fort Kent



Bryn Athyn College of the New Church



Adirondack Community College



Allen Community College



Nova Scotia Community College



Unity College



Allen College



Table2



University of Strathclyde -- Glasgow



University of Maine at Fort Kent



Bryn Athyn College



Unity College



Allen College



I want to join them using the name column with values stated above and the results will be like



SchoolId_1    Name    SchoolId_2
1 University of Strathclyde 1101

2 University of Maine Fort Kent 1102

3 Bryn Athyn College of the New Church 1103

4 Adirondack Community College NULL

5 Allen Community College NULL

6 Nova Scotia Community College NULL

7 Unity College 1104

8 Allen College 1105


I tried LIKE, CHARINDEX but they require Name columns to have exactly the same values.



Current code (but only returns exactly the same name values):



SELECT
dsa.SchoolId_1, dsa.Name, ds.SchoolId_2, ds.sch_address, ds.city, ds.sch_state
INTO
match_sch
FROM
dbo.sa_schools dsa
LEFT JOIN
dbo.school ds ON dsa.Name = ds.Name;






sql left-join sql-server-2014






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Nov 14 '18 at 6:20







Eirin Gonzales

















asked Nov 13 '18 at 16:41









Eirin GonzalesEirin Gonzales

12




12













  • If you want to have a equality between two string with "kind of' logic prefer using dsa.Name LIKE '%'+ ds.Name +'%'

    – Max
    Nov 13 '18 at 16:45






  • 1





    This is referred to as fuzzy matching; but we're probably going to need a few more examples of values from the two tables that you want to be matched, so that we can provide a suitable answer. Please edit your question to add a few more examples that you would expect to match.

    – Richardissimo
    Nov 13 '18 at 18:13











  • @Richardissimo I edited my question and provided more values. Hope it will be clearer but let me know if you need more

    – Eirin Gonzales
    Nov 14 '18 at 4:38











  • @Max I did what you suggested but It maps my data like Nova Scotia Community College to Unity College. But actually, it would really map because of 'unity College'. Still, thank you!

    – Eirin Gonzales
    Nov 14 '18 at 4:41













  • "join them using the name column with those kinds of values" is not clear. You have not told us exactly what result table should be given for any input tables. So how can we help you? (Rhetorical.) Please use enough words & sentences to clearly say what you mean. See How to Ask--what have you learned researching such "fuzzy" condtions? PS Please read & act on Minimal, Complete, and Verifiable example.

    – philipxy
    Nov 14 '18 at 5:19





















  • If you want to have a equality between two string with "kind of' logic prefer using dsa.Name LIKE '%'+ ds.Name +'%'

    – Max
    Nov 13 '18 at 16:45






  • 1





    This is referred to as fuzzy matching; but we're probably going to need a few more examples of values from the two tables that you want to be matched, so that we can provide a suitable answer. Please edit your question to add a few more examples that you would expect to match.

    – Richardissimo
    Nov 13 '18 at 18:13











  • @Richardissimo I edited my question and provided more values. Hope it will be clearer but let me know if you need more

    – Eirin Gonzales
    Nov 14 '18 at 4:38











  • @Max I did what you suggested but It maps my data like Nova Scotia Community College to Unity College. But actually, it would really map because of 'unity College'. Still, thank you!

    – Eirin Gonzales
    Nov 14 '18 at 4:41













  • "join them using the name column with those kinds of values" is not clear. You have not told us exactly what result table should be given for any input tables. So how can we help you? (Rhetorical.) Please use enough words & sentences to clearly say what you mean. See How to Ask--what have you learned researching such "fuzzy" condtions? PS Please read & act on Minimal, Complete, and Verifiable example.

    – philipxy
    Nov 14 '18 at 5:19



















If you want to have a equality between two string with "kind of' logic prefer using dsa.Name LIKE '%'+ ds.Name +'%'

– Max
Nov 13 '18 at 16:45





If you want to have a equality between two string with "kind of' logic prefer using dsa.Name LIKE '%'+ ds.Name +'%'

– Max
Nov 13 '18 at 16:45




1




1





This is referred to as fuzzy matching; but we're probably going to need a few more examples of values from the two tables that you want to be matched, so that we can provide a suitable answer. Please edit your question to add a few more examples that you would expect to match.

– Richardissimo
Nov 13 '18 at 18:13





This is referred to as fuzzy matching; but we're probably going to need a few more examples of values from the two tables that you want to be matched, so that we can provide a suitable answer. Please edit your question to add a few more examples that you would expect to match.

– Richardissimo
Nov 13 '18 at 18:13













@Richardissimo I edited my question and provided more values. Hope it will be clearer but let me know if you need more

– Eirin Gonzales
Nov 14 '18 at 4:38





@Richardissimo I edited my question and provided more values. Hope it will be clearer but let me know if you need more

– Eirin Gonzales
Nov 14 '18 at 4:38













@Max I did what you suggested but It maps my data like Nova Scotia Community College to Unity College. But actually, it would really map because of 'unity College'. Still, thank you!

– Eirin Gonzales
Nov 14 '18 at 4:41







@Max I did what you suggested but It maps my data like Nova Scotia Community College to Unity College. But actually, it would really map because of 'unity College'. Still, thank you!

– Eirin Gonzales
Nov 14 '18 at 4:41















"join them using the name column with those kinds of values" is not clear. You have not told us exactly what result table should be given for any input tables. So how can we help you? (Rhetorical.) Please use enough words & sentences to clearly say what you mean. See How to Ask--what have you learned researching such "fuzzy" condtions? PS Please read & act on Minimal, Complete, and Verifiable example.

– philipxy
Nov 14 '18 at 5:19







"join them using the name column with those kinds of values" is not clear. You have not told us exactly what result table should be given for any input tables. So how can we help you? (Rhetorical.) Please use enough words & sentences to clearly say what you mean. See How to Ask--what have you learned researching such "fuzzy" condtions? PS Please read & act on Minimal, Complete, and Verifiable example.

– philipxy
Nov 14 '18 at 5:19














1 Answer
1






active

oldest

votes


















1














Fantastic question! I have this problem constantly thanks to user-generated data on freeform text fields.



My solution was to build a UDF to use python to fix this. My UDF uses python and specifically the fuzzywuzzy library. I pass in two strings and it returns a numeric score which can be used for filtering.



eg



SELECT
t1.schoolid_1
,t2.schoolid_2
,t1.name as name_1
,t2.name as name_2
FROM
table_1 t1
LEFT JOIN
table_2 t2
ON
my_udf(t1.name, t2.name) > 70


Although it looks like you're running on SQL Server, so python might not be available. In that case you could try to hack together something similar in a SQL-based UDF.



eg



SELECT
1.0 * a.matches / a.rows as match_score
FROM
(SELECT
count(1) as rows
,count(CASE WHEN t1 is not NULL and t2 is not NULL THEN 1 END) as matches
FROM
(SELECT value FROM string_split(t1.name, '1')
) t1
FULL OUTER JOIN
(SELECT value FROM string_split(t2.name, '1')
) t2
ON
t1.value = t2.value
) a





share|improve this answer























    Your Answer






    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "1"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53285680%2fhow-to-join-2-tables-using-names-column-with-similar-but-not-the-same%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    1 Answer
    1






    active

    oldest

    votes








    1 Answer
    1






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    1














    Fantastic question! I have this problem constantly thanks to user-generated data on freeform text fields.



    My solution was to build a UDF to use python to fix this. My UDF uses python and specifically the fuzzywuzzy library. I pass in two strings and it returns a numeric score which can be used for filtering.



    eg



    SELECT
    t1.schoolid_1
    ,t2.schoolid_2
    ,t1.name as name_1
    ,t2.name as name_2
    FROM
    table_1 t1
    LEFT JOIN
    table_2 t2
    ON
    my_udf(t1.name, t2.name) > 70


    Although it looks like you're running on SQL Server, so python might not be available. In that case you could try to hack together something similar in a SQL-based UDF.



    eg



    SELECT
    1.0 * a.matches / a.rows as match_score
    FROM
    (SELECT
    count(1) as rows
    ,count(CASE WHEN t1 is not NULL and t2 is not NULL THEN 1 END) as matches
    FROM
    (SELECT value FROM string_split(t1.name, '1')
    ) t1
    FULL OUTER JOIN
    (SELECT value FROM string_split(t2.name, '1')
    ) t2
    ON
    t1.value = t2.value
    ) a





    share|improve this answer




























      1














      Fantastic question! I have this problem constantly thanks to user-generated data on freeform text fields.



      My solution was to build a UDF to use python to fix this. My UDF uses python and specifically the fuzzywuzzy library. I pass in two strings and it returns a numeric score which can be used for filtering.



      eg



      SELECT
      t1.schoolid_1
      ,t2.schoolid_2
      ,t1.name as name_1
      ,t2.name as name_2
      FROM
      table_1 t1
      LEFT JOIN
      table_2 t2
      ON
      my_udf(t1.name, t2.name) > 70


      Although it looks like you're running on SQL Server, so python might not be available. In that case you could try to hack together something similar in a SQL-based UDF.



      eg



      SELECT
      1.0 * a.matches / a.rows as match_score
      FROM
      (SELECT
      count(1) as rows
      ,count(CASE WHEN t1 is not NULL and t2 is not NULL THEN 1 END) as matches
      FROM
      (SELECT value FROM string_split(t1.name, '1')
      ) t1
      FULL OUTER JOIN
      (SELECT value FROM string_split(t2.name, '1')
      ) t2
      ON
      t1.value = t2.value
      ) a





      share|improve this answer


























        1












        1








        1







        Fantastic question! I have this problem constantly thanks to user-generated data on freeform text fields.



        My solution was to build a UDF to use python to fix this. My UDF uses python and specifically the fuzzywuzzy library. I pass in two strings and it returns a numeric score which can be used for filtering.



        eg



        SELECT
        t1.schoolid_1
        ,t2.schoolid_2
        ,t1.name as name_1
        ,t2.name as name_2
        FROM
        table_1 t1
        LEFT JOIN
        table_2 t2
        ON
        my_udf(t1.name, t2.name) > 70


        Although it looks like you're running on SQL Server, so python might not be available. In that case you could try to hack together something similar in a SQL-based UDF.



        eg



        SELECT
        1.0 * a.matches / a.rows as match_score
        FROM
        (SELECT
        count(1) as rows
        ,count(CASE WHEN t1 is not NULL and t2 is not NULL THEN 1 END) as matches
        FROM
        (SELECT value FROM string_split(t1.name, '1')
        ) t1
        FULL OUTER JOIN
        (SELECT value FROM string_split(t2.name, '1')
        ) t2
        ON
        t1.value = t2.value
        ) a





        share|improve this answer













        Fantastic question! I have this problem constantly thanks to user-generated data on freeform text fields.



        My solution was to build a UDF to use python to fix this. My UDF uses python and specifically the fuzzywuzzy library. I pass in two strings and it returns a numeric score which can be used for filtering.



        eg



        SELECT
        t1.schoolid_1
        ,t2.schoolid_2
        ,t1.name as name_1
        ,t2.name as name_2
        FROM
        table_1 t1
        LEFT JOIN
        table_2 t2
        ON
        my_udf(t1.name, t2.name) > 70


        Although it looks like you're running on SQL Server, so python might not be available. In that case you could try to hack together something similar in a SQL-based UDF.



        eg



        SELECT
        1.0 * a.matches / a.rows as match_score
        FROM
        (SELECT
        count(1) as rows
        ,count(CASE WHEN t1 is not NULL and t2 is not NULL THEN 1 END) as matches
        FROM
        (SELECT value FROM string_split(t1.name, '1')
        ) t1
        FULL OUTER JOIN
        (SELECT value FROM string_split(t2.name, '1')
        ) t2
        ON
        t1.value = t2.value
        ) a






        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered Nov 13 '18 at 18:31









        ScottieBScottieB

        1,69042442




        1,69042442






























            draft saved

            draft discarded




















































            Thanks for contributing an answer to Stack Overflow!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53285680%2fhow-to-join-2-tables-using-names-column-with-similar-but-not-the-same%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            Xamarin.iOS Cant Deploy on Iphone

            Glorious Revolution

            Dulmage-Mendelsohn matrix decomposition in Python