Select distinct row with an empty column but not when empty column is not part of a duplicate row












0















Using Hive, I have duplicate rows and i want to drop duplicate rows (selecting distinct row with non empty column) when a particular column is empty. But I want to keep the rows when the column is empty but not in duplicate row.



e.g. Input is



id | name | fathername | address
1 | bob | john | street1
1 | bob | john |
2 | amir | khan |
3 | roby | johanson | street3


Output



id | name | fathername | address
1 | bob | john | street1
2 | amir | khan |
3 | roby | johanson | street3


We dropped row for id 1 when address was empty because it was a duplicated row. Although address for id 2 is missing, we still want to keep the row because its not a duplicated row. I need it for hive. There are many columns in actual problem and solution need to work with selecting * rather than particular columns.










share|improve this question





























    0















    Using Hive, I have duplicate rows and i want to drop duplicate rows (selecting distinct row with non empty column) when a particular column is empty. But I want to keep the rows when the column is empty but not in duplicate row.



    e.g. Input is



    id | name | fathername | address
    1 | bob | john | street1
    1 | bob | john |
    2 | amir | khan |
    3 | roby | johanson | street3


    Output



    id | name | fathername | address
    1 | bob | john | street1
    2 | amir | khan |
    3 | roby | johanson | street3


    We dropped row for id 1 when address was empty because it was a duplicated row. Although address for id 2 is missing, we still want to keep the row because its not a duplicated row. I need it for hive. There are many columns in actual problem and solution need to work with selecting * rather than particular columns.










    share|improve this question



























      0












      0








      0








      Using Hive, I have duplicate rows and i want to drop duplicate rows (selecting distinct row with non empty column) when a particular column is empty. But I want to keep the rows when the column is empty but not in duplicate row.



      e.g. Input is



      id | name | fathername | address
      1 | bob | john | street1
      1 | bob | john |
      2 | amir | khan |
      3 | roby | johanson | street3


      Output



      id | name | fathername | address
      1 | bob | john | street1
      2 | amir | khan |
      3 | roby | johanson | street3


      We dropped row for id 1 when address was empty because it was a duplicated row. Although address for id 2 is missing, we still want to keep the row because its not a duplicated row. I need it for hive. There are many columns in actual problem and solution need to work with selecting * rather than particular columns.










      share|improve this question
















      Using Hive, I have duplicate rows and i want to drop duplicate rows (selecting distinct row with non empty column) when a particular column is empty. But I want to keep the rows when the column is empty but not in duplicate row.



      e.g. Input is



      id | name | fathername | address
      1 | bob | john | street1
      1 | bob | john |
      2 | amir | khan |
      3 | roby | johanson | street3


      Output



      id | name | fathername | address
      1 | bob | john | street1
      2 | amir | khan |
      3 | roby | johanson | street3


      We dropped row for id 1 when address was empty because it was a duplicated row. Although address for id 2 is missing, we still want to keep the row because its not a duplicated row. I need it for hive. There are many columns in actual problem and solution need to work with selecting * rather than particular columns.







      hive hiveql






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited Nov 14 '18 at 22:20









      VK_217

      7,68752145




      7,68752145










      asked Nov 14 '18 at 22:00









      user1859366user1859366

      5819




      5819
























          2 Answers
          2






          active

          oldest

          votes


















          0














          You can use GROUP BY with MAX:



          select id, name, fathername, max(address)
          from data
          group by id, name, fathername


          Or if you want to use select *:



          select *
          from data
          where address is not null
          union
          select *
          from data
          where address is null and id not in (
          select id
          from data
          where address is not null
          )





          share|improve this answer


























          • Will it work with select * as well because * will also contain address.

            – user1859366
            Nov 14 '18 at 22:27






          • 1





            No, you will need to use max in the first one. Added version with select *

            – slaakso
            Nov 14 '18 at 22:32



















          0














          You can prioritize the non-null address row in an order by using row_number.



          select *
          from (select t.*
          ,row_number() over(partition by id order by case when address is not null then 1 else 2 end) as rnum
          from tbl t
          ) t
          where rnum = 1


          Note: If there is more than one non-null row, you might have to specify one or more columns to break the ties.






          share|improve this answer























            Your Answer






            StackExchange.ifUsing("editor", function () {
            StackExchange.using("externalEditor", function () {
            StackExchange.using("snippets", function () {
            StackExchange.snippets.init();
            });
            });
            }, "code-snippets");

            StackExchange.ready(function() {
            var channelOptions = {
            tags: "".split(" "),
            id: "1"
            };
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function() {
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled) {
            StackExchange.using("snippets", function() {
            createEditor();
            });
            }
            else {
            createEditor();
            }
            });

            function createEditor() {
            StackExchange.prepareEditor({
            heartbeatType: 'answer',
            autoActivateHeartbeat: false,
            convertImagesToLinks: true,
            noModals: true,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: 10,
            bindNavPrevention: true,
            postfix: "",
            imageUploader: {
            brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
            contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
            allowUrls: true
            },
            onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            });


            }
            });














            draft saved

            draft discarded


















            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53309374%2fselect-distinct-row-with-an-empty-column-but-not-when-empty-column-is-not-part-o%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown

























            2 Answers
            2






            active

            oldest

            votes








            2 Answers
            2






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes









            0














            You can use GROUP BY with MAX:



            select id, name, fathername, max(address)
            from data
            group by id, name, fathername


            Or if you want to use select *:



            select *
            from data
            where address is not null
            union
            select *
            from data
            where address is null and id not in (
            select id
            from data
            where address is not null
            )





            share|improve this answer


























            • Will it work with select * as well because * will also contain address.

              – user1859366
              Nov 14 '18 at 22:27






            • 1





              No, you will need to use max in the first one. Added version with select *

              – slaakso
              Nov 14 '18 at 22:32
















            0














            You can use GROUP BY with MAX:



            select id, name, fathername, max(address)
            from data
            group by id, name, fathername


            Or if you want to use select *:



            select *
            from data
            where address is not null
            union
            select *
            from data
            where address is null and id not in (
            select id
            from data
            where address is not null
            )





            share|improve this answer


























            • Will it work with select * as well because * will also contain address.

              – user1859366
              Nov 14 '18 at 22:27






            • 1





              No, you will need to use max in the first one. Added version with select *

              – slaakso
              Nov 14 '18 at 22:32














            0












            0








            0







            You can use GROUP BY with MAX:



            select id, name, fathername, max(address)
            from data
            group by id, name, fathername


            Or if you want to use select *:



            select *
            from data
            where address is not null
            union
            select *
            from data
            where address is null and id not in (
            select id
            from data
            where address is not null
            )





            share|improve this answer















            You can use GROUP BY with MAX:



            select id, name, fathername, max(address)
            from data
            group by id, name, fathername


            Or if you want to use select *:



            select *
            from data
            where address is not null
            union
            select *
            from data
            where address is null and id not in (
            select id
            from data
            where address is not null
            )






            share|improve this answer














            share|improve this answer



            share|improve this answer








            edited Nov 14 '18 at 22:31

























            answered Nov 14 '18 at 22:08









            slaaksoslaakso

            3,016819




            3,016819













            • Will it work with select * as well because * will also contain address.

              – user1859366
              Nov 14 '18 at 22:27






            • 1





              No, you will need to use max in the first one. Added version with select *

              – slaakso
              Nov 14 '18 at 22:32



















            • Will it work with select * as well because * will also contain address.

              – user1859366
              Nov 14 '18 at 22:27






            • 1





              No, you will need to use max in the first one. Added version with select *

              – slaakso
              Nov 14 '18 at 22:32

















            Will it work with select * as well because * will also contain address.

            – user1859366
            Nov 14 '18 at 22:27





            Will it work with select * as well because * will also contain address.

            – user1859366
            Nov 14 '18 at 22:27




            1




            1





            No, you will need to use max in the first one. Added version with select *

            – slaakso
            Nov 14 '18 at 22:32





            No, you will need to use max in the first one. Added version with select *

            – slaakso
            Nov 14 '18 at 22:32













            0














            You can prioritize the non-null address row in an order by using row_number.



            select *
            from (select t.*
            ,row_number() over(partition by id order by case when address is not null then 1 else 2 end) as rnum
            from tbl t
            ) t
            where rnum = 1


            Note: If there is more than one non-null row, you might have to specify one or more columns to break the ties.






            share|improve this answer




























              0














              You can prioritize the non-null address row in an order by using row_number.



              select *
              from (select t.*
              ,row_number() over(partition by id order by case when address is not null then 1 else 2 end) as rnum
              from tbl t
              ) t
              where rnum = 1


              Note: If there is more than one non-null row, you might have to specify one or more columns to break the ties.






              share|improve this answer


























                0












                0








                0







                You can prioritize the non-null address row in an order by using row_number.



                select *
                from (select t.*
                ,row_number() over(partition by id order by case when address is not null then 1 else 2 end) as rnum
                from tbl t
                ) t
                where rnum = 1


                Note: If there is more than one non-null row, you might have to specify one or more columns to break the ties.






                share|improve this answer













                You can prioritize the non-null address row in an order by using row_number.



                select *
                from (select t.*
                ,row_number() over(partition by id order by case when address is not null then 1 else 2 end) as rnum
                from tbl t
                ) t
                where rnum = 1


                Note: If there is more than one non-null row, you might have to specify one or more columns to break the ties.







                share|improve this answer












                share|improve this answer



                share|improve this answer










                answered Nov 14 '18 at 22:32









                Vamsi PrabhalaVamsi Prabhala

                41.4k42039




                41.4k42039






























                    draft saved

                    draft discarded




















































                    Thanks for contributing an answer to Stack Overflow!


                    • Please be sure to answer the question. Provide details and share your research!

                    But avoid



                    • Asking for help, clarification, or responding to other answers.

                    • Making statements based on opinion; back them up with references or personal experience.


                    To learn more, see our tips on writing great answers.




                    draft saved


                    draft discarded














                    StackExchange.ready(
                    function () {
                    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53309374%2fselect-distinct-row-with-an-empty-column-but-not-when-empty-column-is-not-part-o%23new-answer', 'question_page');
                    }
                    );

                    Post as a guest















                    Required, but never shown





















































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown

































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown







                    Popular posts from this blog

                    Xamarin.iOS Cant Deploy on Iphone

                    Glorious Revolution

                    Dulmage-Mendelsohn matrix decomposition in Python