Replace value with the average of it's column - many columns
up vote
2
down vote
favorite
I have an excel sheet with over 1000 columns and 11000 rows - all with numeric data. Within the data, there are missing values represented with '*'.
I would like to replace all of the '*' values with the average of the column that it is in.
Doing this manually would take a long time, so is there a formula that would achieve this?
Thanks so much in advanced for any help.
excel machine-learning
add a comment |
up vote
2
down vote
favorite
I have an excel sheet with over 1000 columns and 11000 rows - all with numeric data. Within the data, there are missing values represented with '*'.
I would like to replace all of the '*' values with the average of the column that it is in.
Doing this manually would take a long time, so is there a formula that would achieve this?
Thanks so much in advanced for any help.
excel machine-learning
add a comment |
up vote
2
down vote
favorite
up vote
2
down vote
favorite
I have an excel sheet with over 1000 columns and 11000 rows - all with numeric data. Within the data, there are missing values represented with '*'.
I would like to replace all of the '*' values with the average of the column that it is in.
Doing this manually would take a long time, so is there a formula that would achieve this?
Thanks so much in advanced for any help.
excel machine-learning
I have an excel sheet with over 1000 columns and 11000 rows - all with numeric data. Within the data, there are missing values represented with '*'.
I would like to replace all of the '*' values with the average of the column that it is in.
Doing this manually would take a long time, so is there a formula that would achieve this?
Thanks so much in advanced for any help.
excel machine-learning
excel machine-learning
asked Nov 10 at 14:36
Sam
213
213
add a comment |
add a comment |
2 Answers
2
active
oldest
votes
up vote
1
down vote
I can give you a three sheet solution Sam?:
Sheet 2:
Cell A1=
=AVERAGE(Sheet1!A:A)
Paste that along the top row for each of 1000 columns in sheet 2.
Sheet 3:
Cell A1=
=IF(Sheet1!A1="*",Sheet2!A$1,Sheet1!A1)
Copy that and then paste it into the entire worksheet 3 (i.e., that top left corner symbol that allows you to do that). It's gonna take a while to update but will deliver what you want!
thank you sooo much - that works brilliantly!!
– Sam
Nov 10 at 15:25
You're welcome :-) ...please could you mark it as answered then :-) link
– RichardBJ
Nov 10 at 15:29
add a comment |
up vote
1
down vote
As you have mentioned machine learning I thought I would introduce you to how you could do this with Azure Machine Learning Studio (AML) using a free account.
By using AML you gain access to a number of methods for replacing missing values which are extremely quick. AML has a Clean Missing Data module which exposes methods of replacement such as Multivariate Imputation using Chained Equation, Mean, Median and several others. The great thing here is you can visualize the dataset columns by right clicking on the dataset and see which columns have skew. You can then select on a column by column basis which replacement method to use. If you have heavily skewed columns you might use median instead for instance. This also offers great opportunities for data normalization (scale and reduce). You also gain access to using Python and R with your dataset.
I don't know if there is a method for directly treating "*"
as missing values, I am trying to find that out, but if you do a little processing in advance of load then all is fine. The step before loading requires:
- Export the sheet as a CSV and save it.
- Use Ctrl+ F to bring up the find and replace dialog and enter
"~*"
for Find and leave Replace blank
Then login into AML and click the + New at the bottom of the screen
Select New > DATASET > FROM LOCAL FILE
and select your file
When selecting type ensure to select CSV with no header if you data has no header row or with header if it does:
Your dataset will start uploading as shown by progress bar at bottom of screen and then appear in the SAVED DATASETS
collection.
Click the + New button again and select EXPERIMENT > BLANK EXPERIMENT
Drag and drop your saved dataset onto the canvas on the right:
In the Search experiment items box on the right, type: Clean Missing Data
then drag the module that appears onto the canvas
Join the 2 boxes by clicking the dot at the bottom of the top box and dragging to the other box
Select the bottom box and then input the following parameters on the right (here is where you can choose which method to apply for missing values e.g. replace missing with mean, or perhaps median if your column data is skewed.
Right click the bottom module and select Run selected
Right click again and select Cleaned dataset > Save as Dataset
The progress bar at the bottom will inform you when complete
Type in the Search experiment items box again: convert to csv
and drag that onto the canvas and connect the left hand side bottom of the second module to the top of the newly added third:
Select the bottom module and right click > Run selected
Wait for the progress bar to complete.
Right-click the bottom module and hit Download
. Done.
add a comment |
2 Answers
2
active
oldest
votes
2 Answers
2
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
1
down vote
I can give you a three sheet solution Sam?:
Sheet 2:
Cell A1=
=AVERAGE(Sheet1!A:A)
Paste that along the top row for each of 1000 columns in sheet 2.
Sheet 3:
Cell A1=
=IF(Sheet1!A1="*",Sheet2!A$1,Sheet1!A1)
Copy that and then paste it into the entire worksheet 3 (i.e., that top left corner symbol that allows you to do that). It's gonna take a while to update but will deliver what you want!
thank you sooo much - that works brilliantly!!
– Sam
Nov 10 at 15:25
You're welcome :-) ...please could you mark it as answered then :-) link
– RichardBJ
Nov 10 at 15:29
add a comment |
up vote
1
down vote
I can give you a three sheet solution Sam?:
Sheet 2:
Cell A1=
=AVERAGE(Sheet1!A:A)
Paste that along the top row for each of 1000 columns in sheet 2.
Sheet 3:
Cell A1=
=IF(Sheet1!A1="*",Sheet2!A$1,Sheet1!A1)
Copy that and then paste it into the entire worksheet 3 (i.e., that top left corner symbol that allows you to do that). It's gonna take a while to update but will deliver what you want!
thank you sooo much - that works brilliantly!!
– Sam
Nov 10 at 15:25
You're welcome :-) ...please could you mark it as answered then :-) link
– RichardBJ
Nov 10 at 15:29
add a comment |
up vote
1
down vote
up vote
1
down vote
I can give you a three sheet solution Sam?:
Sheet 2:
Cell A1=
=AVERAGE(Sheet1!A:A)
Paste that along the top row for each of 1000 columns in sheet 2.
Sheet 3:
Cell A1=
=IF(Sheet1!A1="*",Sheet2!A$1,Sheet1!A1)
Copy that and then paste it into the entire worksheet 3 (i.e., that top left corner symbol that allows you to do that). It's gonna take a while to update but will deliver what you want!
I can give you a three sheet solution Sam?:
Sheet 2:
Cell A1=
=AVERAGE(Sheet1!A:A)
Paste that along the top row for each of 1000 columns in sheet 2.
Sheet 3:
Cell A1=
=IF(Sheet1!A1="*",Sheet2!A$1,Sheet1!A1)
Copy that and then paste it into the entire worksheet 3 (i.e., that top left corner symbol that allows you to do that). It's gonna take a while to update but will deliver what you want!
edited Nov 10 at 15:12
answered Nov 10 at 14:59
RichardBJ
1265
1265
thank you sooo much - that works brilliantly!!
– Sam
Nov 10 at 15:25
You're welcome :-) ...please could you mark it as answered then :-) link
– RichardBJ
Nov 10 at 15:29
add a comment |
thank you sooo much - that works brilliantly!!
– Sam
Nov 10 at 15:25
You're welcome :-) ...please could you mark it as answered then :-) link
– RichardBJ
Nov 10 at 15:29
thank you sooo much - that works brilliantly!!
– Sam
Nov 10 at 15:25
thank you sooo much - that works brilliantly!!
– Sam
Nov 10 at 15:25
You're welcome :-) ...please could you mark it as answered then :-) link
– RichardBJ
Nov 10 at 15:29
You're welcome :-) ...please could you mark it as answered then :-) link
– RichardBJ
Nov 10 at 15:29
add a comment |
up vote
1
down vote
As you have mentioned machine learning I thought I would introduce you to how you could do this with Azure Machine Learning Studio (AML) using a free account.
By using AML you gain access to a number of methods for replacing missing values which are extremely quick. AML has a Clean Missing Data module which exposes methods of replacement such as Multivariate Imputation using Chained Equation, Mean, Median and several others. The great thing here is you can visualize the dataset columns by right clicking on the dataset and see which columns have skew. You can then select on a column by column basis which replacement method to use. If you have heavily skewed columns you might use median instead for instance. This also offers great opportunities for data normalization (scale and reduce). You also gain access to using Python and R with your dataset.
I don't know if there is a method for directly treating "*"
as missing values, I am trying to find that out, but if you do a little processing in advance of load then all is fine. The step before loading requires:
- Export the sheet as a CSV and save it.
- Use Ctrl+ F to bring up the find and replace dialog and enter
"~*"
for Find and leave Replace blank
Then login into AML and click the + New at the bottom of the screen
Select New > DATASET > FROM LOCAL FILE
and select your file
When selecting type ensure to select CSV with no header if you data has no header row or with header if it does:
Your dataset will start uploading as shown by progress bar at bottom of screen and then appear in the SAVED DATASETS
collection.
Click the + New button again and select EXPERIMENT > BLANK EXPERIMENT
Drag and drop your saved dataset onto the canvas on the right:
In the Search experiment items box on the right, type: Clean Missing Data
then drag the module that appears onto the canvas
Join the 2 boxes by clicking the dot at the bottom of the top box and dragging to the other box
Select the bottom box and then input the following parameters on the right (here is where you can choose which method to apply for missing values e.g. replace missing with mean, or perhaps median if your column data is skewed.
Right click the bottom module and select Run selected
Right click again and select Cleaned dataset > Save as Dataset
The progress bar at the bottom will inform you when complete
Type in the Search experiment items box again: convert to csv
and drag that onto the canvas and connect the left hand side bottom of the second module to the top of the newly added third:
Select the bottom module and right click > Run selected
Wait for the progress bar to complete.
Right-click the bottom module and hit Download
. Done.
add a comment |
up vote
1
down vote
As you have mentioned machine learning I thought I would introduce you to how you could do this with Azure Machine Learning Studio (AML) using a free account.
By using AML you gain access to a number of methods for replacing missing values which are extremely quick. AML has a Clean Missing Data module which exposes methods of replacement such as Multivariate Imputation using Chained Equation, Mean, Median and several others. The great thing here is you can visualize the dataset columns by right clicking on the dataset and see which columns have skew. You can then select on a column by column basis which replacement method to use. If you have heavily skewed columns you might use median instead for instance. This also offers great opportunities for data normalization (scale and reduce). You also gain access to using Python and R with your dataset.
I don't know if there is a method for directly treating "*"
as missing values, I am trying to find that out, but if you do a little processing in advance of load then all is fine. The step before loading requires:
- Export the sheet as a CSV and save it.
- Use Ctrl+ F to bring up the find and replace dialog and enter
"~*"
for Find and leave Replace blank
Then login into AML and click the + New at the bottom of the screen
Select New > DATASET > FROM LOCAL FILE
and select your file
When selecting type ensure to select CSV with no header if you data has no header row or with header if it does:
Your dataset will start uploading as shown by progress bar at bottom of screen and then appear in the SAVED DATASETS
collection.
Click the + New button again and select EXPERIMENT > BLANK EXPERIMENT
Drag and drop your saved dataset onto the canvas on the right:
In the Search experiment items box on the right, type: Clean Missing Data
then drag the module that appears onto the canvas
Join the 2 boxes by clicking the dot at the bottom of the top box and dragging to the other box
Select the bottom box and then input the following parameters on the right (here is where you can choose which method to apply for missing values e.g. replace missing with mean, or perhaps median if your column data is skewed.
Right click the bottom module and select Run selected
Right click again and select Cleaned dataset > Save as Dataset
The progress bar at the bottom will inform you when complete
Type in the Search experiment items box again: convert to csv
and drag that onto the canvas and connect the left hand side bottom of the second module to the top of the newly added third:
Select the bottom module and right click > Run selected
Wait for the progress bar to complete.
Right-click the bottom module and hit Download
. Done.
add a comment |
up vote
1
down vote
up vote
1
down vote
As you have mentioned machine learning I thought I would introduce you to how you could do this with Azure Machine Learning Studio (AML) using a free account.
By using AML you gain access to a number of methods for replacing missing values which are extremely quick. AML has a Clean Missing Data module which exposes methods of replacement such as Multivariate Imputation using Chained Equation, Mean, Median and several others. The great thing here is you can visualize the dataset columns by right clicking on the dataset and see which columns have skew. You can then select on a column by column basis which replacement method to use. If you have heavily skewed columns you might use median instead for instance. This also offers great opportunities for data normalization (scale and reduce). You also gain access to using Python and R with your dataset.
I don't know if there is a method for directly treating "*"
as missing values, I am trying to find that out, but if you do a little processing in advance of load then all is fine. The step before loading requires:
- Export the sheet as a CSV and save it.
- Use Ctrl+ F to bring up the find and replace dialog and enter
"~*"
for Find and leave Replace blank
Then login into AML and click the + New at the bottom of the screen
Select New > DATASET > FROM LOCAL FILE
and select your file
When selecting type ensure to select CSV with no header if you data has no header row or with header if it does:
Your dataset will start uploading as shown by progress bar at bottom of screen and then appear in the SAVED DATASETS
collection.
Click the + New button again and select EXPERIMENT > BLANK EXPERIMENT
Drag and drop your saved dataset onto the canvas on the right:
In the Search experiment items box on the right, type: Clean Missing Data
then drag the module that appears onto the canvas
Join the 2 boxes by clicking the dot at the bottom of the top box and dragging to the other box
Select the bottom box and then input the following parameters on the right (here is where you can choose which method to apply for missing values e.g. replace missing with mean, or perhaps median if your column data is skewed.
Right click the bottom module and select Run selected
Right click again and select Cleaned dataset > Save as Dataset
The progress bar at the bottom will inform you when complete
Type in the Search experiment items box again: convert to csv
and drag that onto the canvas and connect the left hand side bottom of the second module to the top of the newly added third:
Select the bottom module and right click > Run selected
Wait for the progress bar to complete.
Right-click the bottom module and hit Download
. Done.
As you have mentioned machine learning I thought I would introduce you to how you could do this with Azure Machine Learning Studio (AML) using a free account.
By using AML you gain access to a number of methods for replacing missing values which are extremely quick. AML has a Clean Missing Data module which exposes methods of replacement such as Multivariate Imputation using Chained Equation, Mean, Median and several others. The great thing here is you can visualize the dataset columns by right clicking on the dataset and see which columns have skew. You can then select on a column by column basis which replacement method to use. If you have heavily skewed columns you might use median instead for instance. This also offers great opportunities for data normalization (scale and reduce). You also gain access to using Python and R with your dataset.
I don't know if there is a method for directly treating "*"
as missing values, I am trying to find that out, but if you do a little processing in advance of load then all is fine. The step before loading requires:
- Export the sheet as a CSV and save it.
- Use Ctrl+ F to bring up the find and replace dialog and enter
"~*"
for Find and leave Replace blank
Then login into AML and click the + New at the bottom of the screen
Select New > DATASET > FROM LOCAL FILE
and select your file
When selecting type ensure to select CSV with no header if you data has no header row or with header if it does:
Your dataset will start uploading as shown by progress bar at bottom of screen and then appear in the SAVED DATASETS
collection.
Click the + New button again and select EXPERIMENT > BLANK EXPERIMENT
Drag and drop your saved dataset onto the canvas on the right:
In the Search experiment items box on the right, type: Clean Missing Data
then drag the module that appears onto the canvas
Join the 2 boxes by clicking the dot at the bottom of the top box and dragging to the other box
Select the bottom box and then input the following parameters on the right (here is where you can choose which method to apply for missing values e.g. replace missing with mean, or perhaps median if your column data is skewed.
Right click the bottom module and select Run selected
Right click again and select Cleaned dataset > Save as Dataset
The progress bar at the bottom will inform you when complete
Type in the Search experiment items box again: convert to csv
and drag that onto the canvas and connect the left hand side bottom of the second module to the top of the newly added third:
Select the bottom module and right click > Run selected
Wait for the progress bar to complete.
Right-click the bottom module and hit Download
. Done.
edited 2 days ago
answered Nov 10 at 19:07
QHarr
25.3k81839
25.3k81839
add a comment |
add a comment |
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fstackoverflow.com%2fquestions%2f53239988%2freplace-value-with-the-average-of-its-column-many-columns%23new-answer', 'question_page');
}
);
Post as a guest
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password