How do you simulate statistically significant data according to an ANOVA F test in Python?
I am in 10th grade and want to simulate data for a machine learning science fair project. The final model will be used on patient data and will predict how the day of the week and the time of day relate to medication adherence within a single patient's data. Adherence values will be decimal proportions (e.g. if on a Wednesday the patient took the medication at the right time for 2 of the 3 required doses, the adherence value for that Wednesday is 0.67). I want to build a machine learning model that learns the relationship between day of the week, time of day, and a patient's adherence, and then predicts adherence on test sets when given the "day of the week" or "time of day".
To do this, I plan to simulate data for 1,000 patients, each with 30 weeks of data. The patients will be split into 4 groups of 250. In the first group, only one trend (time of day to adherence) will be statistically significant. In the second group, only the other trend (day of week to adherence) will be significant. In the third group both trends will be significant, and in the fourth group neither will be.
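A minimal sketch of this layout for a single patient, using only the standard library. The baseline probability of 0.8, the drop of 0.3, the choice of Wednesday and the evening dose as the "bad" levels, and the three named dose slots are all illustrative assumptions, not values from the project.

```python
import random

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
SLOTS = ["morning", "afternoon", "evening"]  # assumed: 3 required doses per day
WEEKS = 30                                   # weeks of data per patient
BASE_P = 0.8   # assumed baseline probability that a dose is taken on time
DROP = 0.3     # assumed size of a built-in effect

# (day-of-week effect?, time-of-day effect?) for the four groups of patients
GROUPS = {
    "time_only": (False, True),
    "day_only": (True, False),
    "both": (True, True),
    "neither": (False, False),
}

def dose_probability(day, slot, day_effect, time_effect):
    """Probability that one dose is taken on time, given the group's effects."""
    p = BASE_P
    if day_effect and day == "Wed":        # hypothetical "bad" day
        p -= DROP
    if time_effect and slot == "evening":  # hypothetical "bad" time of day
        p -= DROP
    return p

def simulate_patient(group, rng):
    """Return one record per day: 30 weeks x 7 days = 210 records."""
    day_effect, time_effect = GROUPS[group]
    rows = []
    for week in range(WEEKS):
        for day in DAYS:
            taken = {
                slot: rng.random() < dose_probability(day, slot, day_effect, time_effect)
                for slot in SLOTS
            }
            rows.append({
                "week": week,
                "day": day,
                "taken": taken,  # per-slot outcome, needed for the time-of-day trend
                "adherence": round(sum(taken.values()) / len(SLOTS), 2),
            })
    return rows

rng = random.Random(42)  # fixed seed so the simulation is reproducible
patient = simulate_patient("both", rng)
print(len(patient), patient[0])
```

Calling `simulate_patient` 250 times per group name would give the 1,000 patients. Keeping the per-slot outcomes alongside the daily proportion matters because the daily adherence value alone cannot show a time-of-day trend.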
However, to decide whether one of the several days (or times of day) has a statistically significant effect on adherence, an F test must be used, because more than two groups are being compared. I cannot just make the adherence on a specific day much lower than the others and hope for the best, because the trend must be statistically significant.
I understand there are modules for evaluating a data set using the F test, but I am looking for a way to create data which will pass the F test. This data would not be linear, making the problem much harder.
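One point worth stating: no single random draw is guaranteed to pass the test. What can be controlled is the size of the built-in effect and the amount of data, so that the test rejects in most draws, and each simulated patient can then be checked. The sketch below is a rough illustration of that check under assumed values (baseline 0.8, Wednesday lowered to 0.4, 3 doses per day). It computes the one-way ANOVA F statistic by hand and, to stay within the standard library, swaps in a permutation p-value for the usual F-distribution p-value. If SciPy is available, `scipy.stats.f_oneway(*groups)` returns the parametric statistic and p-value for the same groups.

```python
import random

def f_statistic(groups):
    """One-way ANOVA F: between-group mean square / within-group mean square."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:  # degenerate case: no variation inside any group
        return float("inf") if ss_between > 0 else 0.0
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

def permutation_p_value(groups, rng, n_perm=1000):
    """Share of label shuffles whose F is at least as large as the observed F."""
    observed = f_statistic(groups)
    pooled = [x for g in groups for x in g]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Cut the shuffled values back into groups of the original sizes.
        shuffled, start = [], 0
        for g in groups:
            shuffled.append(pooled[start:start + len(g)])
            start += len(g)
        if f_statistic(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # +1 keeps the estimate away from exactly 0

def simulate_days(day_probs, rng, weeks=30, doses=3):
    """One list per weekday, holding one adherence proportion per week."""
    return [
        [sum(rng.random() < p for _ in range(doses)) / doses for _ in range(weeks)]
        for p in day_probs
    ]

rng = random.Random(0)
# Assumed effect: Wednesday (index 2) has a much lower on-time probability.
effect = simulate_days([0.8, 0.8, 0.4, 0.8, 0.8, 0.8, 0.8], rng)
p_value = permutation_p_value(effect, rng)
print(f"F = {f_statistic(effect):.2f}, permutation p = {p_value:.4f}")
```

For the "no trend" group the same probabilities would be used for every day. Even then, about 5% of patients will show p < 0.05 by chance at that threshold, so the group label describes how the data were generated rather than a guaranteed test outcome. Adherence proportions are bounded and take only a few distinct values, so the normality assumption behind the parametric F test holds only approximately.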
Does anybody have a suggestion on how I would go about doing this, or whether I should use a different approach altogether?
Any help whatsoever (even general comments on my project) will be extremely appreciated!!
python machine-learning statistics supervised-learning
Welcome to Stack Overflow! If you want Python, why did you include the R tag?
– G5W
Nov 12 at 18:53
Your question doesn't fit here particularly well, but is a worthy question. Please check "Which site?". I suggest either the stats group or cross-validated.
– Prune
Nov 12 at 19:00
edited Nov 12 at 19:12
juanpa.arrivillaga
asked Nov 12 at 18:47
Neelasha Bhattacharjee