Maximum number of feature dimensions



























I have a classification problem, and my current feature vector does not seem to hold enough information. My training set has 10k entries, and I am using an SVM as the classifier (scikit-learn).


What is the maximum reasonable feature vector size (how many dimensions)?

(Training and evaluation run on a laptop CPU.)


100? 1k? 10k? 100k? 1M?
























































      machine-learning














      edited Nov 15 '18 at 19:17
























      asked Nov 15 '18 at 18:54

































1 Answer






































The question is not how many features you should have for a certain number of cases (i.e. entries), but rather the opposite:


"It's not who has the best algorithm that wins. It's who has the most data." (Banko and Brill, 2001)


Banko and Brill (2001) compared four different algorithms while growing the training set into the millions, and arrived at the conclusion quoted above.


Moreover, Prof. Andrew Ng covers this topic clearly; quoting him:


"If a learning algorithm is suffering from high variance, getting more training data is likely to help."

"If a learning algorithm is suffering from high bias, getting more training data will not (by itself) help much."


So, as a rule of thumb, the number of cases should be greater than the number of features in your dataset, and all features should be as informative as possible (i.e. not highly collinear, in other words not redundant).
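For example, one quick way to flag nearly collinear (redundant) features is to inspect the pairwise correlation matrix. This is only a minimal sketch: the feature matrix X and the 0.95 threshold below are placeholders, not part of the original answer.

    import numpy as np

    # Placeholder feature matrix: 10k samples, 300 features of random noise.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(10_000, 300))

    # Absolute pairwise Pearson correlations between features (columns).
    corr = np.abs(np.corrcoef(X, rowvar=False))
    np.fill_diagonal(corr, 0.0)

    # Flag feature pairs that are nearly collinear; each pair appears twice.
    redundant_pairs = np.argwhere(corr > 0.95)
    print(f"{len(redundant_pairs) // 2} highly correlated feature pairs")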



I have also read, in more than one place and somewhere in the scikit-learn documentation, that the number of samples should be at least the square of the number of features (i.e. n_samples > n_features ** 2).
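A tiny sanity check of that rule of thumb against the shapes in the question (the concrete numbers here are placeholders; substitute your real X.shape):

    import math

    # Hypothetical shapes for illustration: 10k training entries, 300 features.
    n_samples, n_features = 10_000, 300

    # Rule of thumb quoted above: n_samples should exceed n_features ** 2.
    if n_samples <= n_features ** 2:
        suggested_max = math.isqrt(n_samples - 1)
        print(f"With {n_samples} samples, this rule suggests at most "
              f"~{suggested_max} features (you have {n_features}).")

With 10k entries, this rule points to something on the order of 100 features rather than tens of thousands.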





Nevertheless, for SVMs in particular, the number of features n versus the number of entries m is an important factor in choosing which kernel to start with. As a second rule of thumb for SVMs (also according to Prof. Andrew Ng; a minimal sketch follows the list):


1. If the number of features is much greater than the number of entries (e.g. n up to 10K and m up to 1K) --> use an SVM without a kernel (i.e. a "linear kernel") or use logistic regression.

2. If the number of features is small and the number of entries is intermediate (e.g. n up to 1K and m up to 10K) --> use an SVM with a Gaussian (RBF) kernel.

3. If the number of features is small and the number of entries is much larger (e.g. n up to 1K and m > 50K) --> create/add more features, then use an SVM without a kernel or use logistic regression.
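A minimal scikit-learn sketch of those three regimes; the branch thresholds simply mirror the rough numbers quoted above, and X, y are random placeholders rather than real data:

    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import LinearSVC, SVC

    # Placeholder data: 2k entries with 300 features, binary labels.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(2_000, 300))
    y = rng.integers(0, 2, size=2_000)

    n_entries, n_features = X.shape

    if n_features >= n_entries:
        # Case 1: many more features than entries -> linear SVM (no kernel).
        clf = make_pipeline(StandardScaler(), LinearSVC())
    elif n_entries <= 50_000:
        # Case 2: few features, intermediate number of entries -> Gaussian (RBF) kernel.
        clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    else:
        # Case 3: few features, very many entries -> engineer more features first,
        # then fall back to a linear SVM or logistic regression.
        clf = make_pipeline(StandardScaler(), LinearSVC())

    clf.fit(X, y)

On a laptop CPU, the linear variants (LinearSVC or LogisticRegression) scale far better with both the number of entries and the number of features than the kernelised SVC.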
































































































































































                edited Nov 15 '18 at 20:18

























                answered Nov 15 '18 at 20:13









Yahya






















