Automatic variable selection

I have a data set with the following columns:

Acres, FamilyType, NumBedrooms, NumChildren, NumPeople, NumRooms, NumUnits, NumVehicles, NumWorkers, OwnRent, YearBuilt, HouseCosts, ElectricBill, FoodStamp, HeatingFuel, Insurance, Language, above_150K
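Before fitting, it can help to check which of these columns R will treat as categorical, because those are the ones that get split up in the model summary. The sketch below uses a small, made-up data frame (every value in it is hypothetical, not taken from the real data set); with the real data you would run the last two steps on `df` itself. `stringsAsFactors = TRUE` stores the text columns as factors, which is what `read.csv()` did by default before R 4.0.0; `glm()` treats plain character columns as categorical as well.

```r
# Hypothetical toy data standing in for the real df
df <- data.frame(
  Acres       = c("Sub 1", "1-10", "10+", "Sub 1"),
  NumBedrooms = c(2, 3, 4, 3),
  HeatingFuel = c("Gas", "Electricity", "Wood", "Gas"),
  above_150K  = c(0, 1, 1, 0),
  stringsAsFactors = TRUE  # store the text columns as factors
)

# TRUE for columns that glm() will expand into indicator (dummy) columns
is_cat <- sapply(df, is.factor)
print(is_cat)

# Number of levels in each categorical column
n_levels <- sapply(df[is_cat], nlevels)
print(n_levels)
```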


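I fitted this model:

# family = binomial gives a logistic regression; data takes the data frame itself, not a quoted name
fit <- glm(above_150K ~ Acres + FamilyType + NumBedrooms + NumChildren + NumPeople + NumRooms + NumUnits + NumVehicles + NumWorkers + OwnRent + YearBuilt + HouseCosts + ElectricBill + FoodStamp + HeatingFuel + Insurance + Language,
           family = binomial, data = df)
summary(fit)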
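For reference, here is a minimal, self-contained version of the same kind of call on simulated data. The data, the effect size, and the reduced set of predictors are all hypothetical. It shows the two details that matter for a binary outcome: `family = binomial` makes `glm()` fit a logistic regression, and `data` takes the data frame object rather than a quoted name. The coefficient names already show the categorical column split into one entry per non-baseline level.

```r
set.seed(1)
n <- 200

# Hypothetical simulated data: one numeric and one categorical predictor
df <- data.frame(
  NumBedrooms = sample(1:5, n, replace = TRUE),
  HeatingFuel = factor(sample(c("Electricity", "Gas", "Oil", "Wood"), n, replace = TRUE))
)
# Made-up relationship: the chance of being above 150K rises with bedrooms
df$above_150K <- rbinom(n, 1, plogis(-3 + 0.8 * df$NumBedrooms))

# Logistic regression on the data frame object
fit <- glm(above_150K ~ NumBedrooms + HeatingFuel, family = binomial, data = df)

coef_names <- names(coef(fit))
print(coef_names)
```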


In the summary output, several of these columns are broken down further into sub-columns:

Coefficient               Abbreviation
Acres10+                  A
AcresSub 1                A1
FamilyTypeMale Head       FH
FamilyTypeMarried         FT
NumBedrooms               NB
NumChildren               NC
NumPeople                 NP
NumRooms                  NR
NumUnitsSingle attached   Na
NumUnitsSingle detached   Nd
NumVehicles               NV
NumWorkers                NW
OwnRentOutright           ORO
OwnRentRented             ORR
YearBuilt1940-1949        YB194
YearBuilt1950-1959        YB195
YearBuilt1960-1969        YB196
YearBuilt1970-1979        YB197
YearBuilt1980-1989        YB198
YearBuilt1990-1999        YB199
YearBuilt2000-2004        YB2000
YearBuilt2005             YB2005
YearBuilt2006             YB2006
YearBuilt2007             YB2007
YearBuilt2008             YB2008
YearBuilt2009             YB2009
YearBuilt2010             YB201
YearBuiltBefore 1939      Y1
HouseCosts                HC
ElectricBill              E
FoodStampYes              FS
HeatingFuelElectricity    HFE
HeatingFuelGas            HFG
HeatingFuelNone           HFN
HeatingFuelOil            HtngFlOl
HeatingFuelOther          HtngFlOt
HeatingFuelSolar          HFS
HeatingFuelWood           HFW
Insurance                 I
LanguageEnglish           LnE
LanguageOther             LO
LanguageOther European    LOE
LanguageSpanish           LS


As you can see, the single column HeatingFuel is broken down into:

HeatingFuelElectricity    HFE
HeatingFuelGas            HFG
HeatingFuelNone           HFN
HeatingFuelOil            HtngFlOl
HeatingFuelOther          HtngFlOt
HeatingFuelSolar          HFS
HeatingFuelWood           HFW


Why is this happening?
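This matches R's default handling of categorical predictors: a factor with k levels is encoded as k - 1 indicator (dummy) columns using treatment contrasts, and the first level acts as a baseline that is absorbed into the intercept. Seven HeatingFuel coefficients would therefore suggest eight levels, with the baseline level not printed. The sketch below uses a made-up three-level factor (hypothetical values) to show the encoding with `model.matrix()` and how `relevel()` changes which level is the baseline.

```r
# Hypothetical three-level factor (toy values)
d <- data.frame(HeatingFuel = factor(c("Electricity", "Gas", "Wood", "Gas")))

# Levels are sorted alphabetically; the first one ("Electricity") is the baseline
print(levels(d$HeatingFuel))

# Treatment contrasts: one indicator column per non-baseline level
mm <- model.matrix(~ HeatingFuel, data = d)
print(mm)

# Make "Gas" the baseline instead; now Electricity and Wood get indicator columns
d$HeatingFuel <- relevel(d$HeatingFuel, ref = "Gas")
mm_gas <- model.matrix(~ HeatingFuel, data = d)
print(colnames(mm_gas))
```

The baseline level is still in the model; its effect is the reference point that the other level coefficients are measured against.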



I want to select the variables for predicting above_150K. I tried stepwise and all-subsets automatic variable selection, and both suggest using all the variables. Could someone please clarify?
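One point that may help with the selection step: in R, `drop1()` and `step()` work on model terms, so a factor such as HeatingFuel is tested or removed as a whole rather than one dummy column at a time. The sketch below uses simulated data (hypothetical predictors and effect size; HeatingFuel is pure noise by construction) to show a likelihood-ratio test per term with `drop1(test = "LRT")` and AIC-based backward selection with `step()`. It only illustrates the mechanics; whether the real data keeps every variable depends on that data.

```r
set.seed(42)
n <- 300

# Simulated data: NumBedrooms drives the outcome, HeatingFuel has no effect (hypothetical)
df <- data.frame(
  NumBedrooms = sample(1:5, n, replace = TRUE),
  HeatingFuel = factor(sample(c("Electricity", "Gas", "Oil", "Wood"), n, replace = TRUE))
)
df$above_150K <- rbinom(n, 1, plogis(-3 + 0.8 * df$NumBedrooms))

fit <- glm(above_150K ~ NumBedrooms + HeatingFuel, family = binomial, data = df)

# Likelihood-ratio test per term: HeatingFuel appears once, with 3 degrees of freedom
tests <- drop1(fit, test = "LRT")
print(tests)

# Backward selection by AIC; whole terms are dropped, never single dummy columns
reduced <- step(fit, direction = "backward", trace = 0)
print(formula(reduced))
```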

Tags: r, logistic-regression, forecasting, feature-selection, variable-selection

edited Nov 22 '18 at 9:39 by Vivek Kumar
asked Nov 14 '18 at 21:49 by Raj