How do you simulate statistically significant data according to an ANOVA F test in Python?

I am in 10th grade and I am looking to simulate data for a machine learning science fair project. The final model will be used on patient data and will predict how the day of the week or the time of day relates to medication adherence within a single patient's data. Adherence values will be decimal proportions (e.g., if on Wednesday the patient took the medication at the right time 2 out of the required 3 times, the adherence value for that Wednesday will be 0.67). I want the model to learn the relationship between day of the week, time of day, and a patient's adherence, and then predict adherence in test sets when given the "day of the week" or "time of day".

To do this, I am looking to simulate 1,000 patients' worth of data, with 30 weeks of data per patient. The data will be split into 4 groups of 250 patients. In the first group, only the time-of-day trend in adherence will be statistically significant. In the second group, only the day-of-week trend will be significant. In the third group, both trends will be significant, and in the fourth group, neither will be.
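As a rough sketch of the data I have in mind for one patient (the per-day probabilities, the three doses per day, and the function name `simulate_patient` are placeholder assumptions, not real values):

```python
import numpy as np

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
N_WEEKS = 30
DOSES_PER_DAY = 3  # assumed: three scheduled doses every day


def simulate_patient(day_probs, rng):
    """Return a (N_WEEKS, 7) array of daily adherence proportions.

    day_probs[d] is the assumed chance that a single dose on day d
    is taken on time.
    """
    day_probs = np.asarray(day_probs, dtype=float)
    # Count on-time doses out of DOSES_PER_DAY for every week and day.
    on_time = rng.binomial(DOSES_PER_DAY, day_probs, size=(N_WEEKS, len(DAYS)))
    return on_time / DOSES_PER_DAY


rng = np.random.default_rng(0)
# Hypothetical patient who is less adherent on Wednesdays.
probs = [0.9, 0.9, 0.5, 0.9, 0.9, 0.9, 0.9]
adherence = simulate_patient(probs, rng)
print(adherence.shape)  # (30, 7)
print(adherence[:2].round(2))
```

Each row is one week and each column one day, so every value is 0, 0.33, 0.67, or 1.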



However, to decide whether one of the times of day (or days of the week) has a statistically significant effect on adherence, an F test must be used, since more than two groups are being compared. I cannot just make the adherence on a specific day much lower than the others and hope for the best, because the trend must actually be statistically significant.



I understand there are modules for evaluating a data set using the F test, but I am looking for a way to create data which will pass the F test. This data would not be linear, making the problem much harder.
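For reference, this is roughly how I understand the checking side would work: build the effect into the simulated data, then confirm it with the F test. It is only a minimal sketch, assuming `scipy.stats.f_oneway` as the test and made-up probabilities (0.9 on most days, 0.3 on Wednesday); `simulate_days` is a hypothetical helper.

```python
import numpy as np
from scipy.stats import f_oneway

N_WEEKS = 30
DOSES_PER_DAY = 3  # assumed number of scheduled doses per day


def simulate_days(day_probs, rng):
    """Return one array of N_WEEKS adherence proportions per day of the week."""
    return [rng.binomial(DOSES_PER_DAY, p, size=N_WEEKS) / DOSES_PER_DAY
            for p in day_probs]


rng = np.random.default_rng(42)

# "Significant" patient: Wednesday (index 2) has a much lower on-time chance.
effect_groups = simulate_days([0.9, 0.9, 0.3, 0.9, 0.9, 0.9, 0.9], rng)
f_stat, p_value = f_oneway(*effect_groups)
print(f"with day effect: F = {f_stat:.2f}, p = {p_value:.4g}")

# "Null" patient: same chance every day, so any difference is only noise.
null_groups = simulate_days([0.9] * 7, rng)
null_f, null_p = f_oneway(*null_groups)
print(f"no day effect:   F = {null_f:.2f}, p = {null_p:.4g}")
```

A patient with no real effect will still come out "significant" roughly 5% of the time at the 0.05 level, so each simulated patient would probably need to be checked and regenerated if the result does not match its intended group. Adherence proportions are also not normally distributed, so the F test is only approximate here.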



Does anybody have a suggestion on how I would go about doing this, or whether I should use a different approach altogether?



Any help whatsoever (even general comments on my project) will be extremely appreciated!!

python machine-learning statistics supervised-learning

edited Nov 12 at 19:12 by juanpa.arrivillaga

asked Nov 12 at 18:47 by Neelasha Bhattacharjee

  • Welcome to Stack Overflow! If you want Python, why did you include the R tag?
    – G5W
    Nov 12 at 18:53

  • Your question doesn't fit here particularly well, but is a worthy question. Please check "Which site?". I suggest either the stats group or cross-validated.
    – Prune
    Nov 12 at 19:00













