Error while assigning label-encoded values to a column in a Dask DataFrame
I am getting an error when label encoding features. To reproduce my case (originally, I imported a CSV file into a Dask DataFrame, and after cleaning it is left with 28 columns), I created a Dask DataFrame as below:
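import numpy as np
import pandas as pd
import dask
import dask.dataframe as dd
from dask_ml.preprocessing import LabelEncoder
# 1.7 million random country codes plus a running integer column
country = np.random.choice(['US', 'UK', 'IN'], 1700000)
df = pd.DataFrame({'A': country, 'B': range(1700000)})
# sort=False leaves the Dask DataFrame with unknown divisions
ddf = dd.from_pandas(df, npartitions=2, sort=False)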
Then I tried to label encode the categorical column as below:
le = LabelEncoder()
ddf = ddf.assign(A=dd.from_dask_array(le.fit_transform(ddf['A'])))
which threw the following error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-106-480a5e12886a> in <module>()
10 type(le.fit_transform(ddf['A']))
11 #ddf['A'] = dd.from_array(le.fit_transform(ddf['A']))
---> 12 ddf = ddf.assign(A=dd.from_dask_array(le.fit_transform(ddf['A'])))
/opt/conda/lib/python3.6/site-packages/dask/dataframe/core.py in assign(self, **kwargs)
2698 # Figure out columns of the output
2699 df2 = self._meta.assign(**_extract_meta(kwargs))
-> 2700 return elemwise(methods.assign, self, *pairs, meta=df2)
2701
2702 @derived_from(pd.DataFrame, ua_args=['index'])
/opt/conda/lib/python3.6/site-packages/dask/dataframe/core.py in elemwise(op, *args, **kwargs)
3277
3278 from .multi import _maybe_align_partitions
-> 3279 args = _maybe_align_partitions(args)
3280 dasks = [arg for arg in args if isinstance(arg, (_Frame, Scalar, Array))]
3281 dfs = [df for df in dasks if isinstance(df, _Frame)]
/opt/conda/lib/python3.6/site-packages/dask/dataframe/multi.py in _maybe_align_partitions(args)
145 divisions = dfs[0].divisions
146 if not all(df.divisions == divisions for df in dfs):
--> 147 dfs2 = iter(align_partitions(*dfs)[0])
148 return [a if not isinstance(a, _Frame) else next(dfs2) for a in args]
149 return args
/opt/conda/lib/python3.6/site-packages/dask/dataframe/multi.py in align_partitions(*dfs)
101 raise ValueError("dfs contains no DataFrame and Series")
102 if not all(df.known_divisions for df in dfs1):
--> 103 raise ValueError("Not all divisions are known, can't align "
104 "partitions. Please use `set_index` "
105 "to set the index.")
ValueError: Not all divisions are known, can't align partitions. Please use `set_index` to set the index.
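The traceback suggests that `assign` is trying to align two Dask collections by index and gives up because their divisions are unknown. One way to sidestep alignment altogether, sketched here in plain Python rather than Dask, is to fit the label mapping once over all the data and then encode each partition on its own. In Dask the per-partition step could presumably be applied with `map_partitions`, but that is an assumption and is not tested here. The helper names `fit_labels` and `encode_partition` are hypothetical, and the lists simply stand in for the partitions of column `A`.

```python
# Plain-Python sketch: lists stand in for the partitions of a Dask column.
# fit_labels and encode_partition are hypothetical helper names.


def fit_labels(partitions):
    """Collect the distinct labels across all partitions, numbered in sorted order."""
    classes = sorted({value for part in partitions for value in part})
    return {label: code for code, label in enumerate(classes)}


def encode_partition(values, mapping):
    """Encode one partition; row order is kept, so no index alignment is needed."""
    return [mapping[value] for value in values]


# Two made-up partitions of column 'A'.
partitions = [["US", "UK", "IN", "US"], ["IN", "IN", "UK"]]

mapping = fit_labels(partitions)
encoded = [encode_partition(part, mapping) for part in partitions]

print(mapping)
print(encoded)
```

Because every partition is encoded with the same mapping and keeps its own row order, the result never has to be matched against another collection by index, which is the step that fails in the traceback above.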
python-3.x dask dask-ml
asked Nov 15 '18 at 11:03
bp89