Google Colab TPU and reading from disk while training

I have 100k pictures, which don't fit into RAM, so I need to read them from disk while training.

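import tensorflow as tf

def extract_fn(x):
    # Read the file at path x, decode it as an RGB JPEG, and resize it to 64x64.
    x = tf.read_file(x)
    x = tf.image.decode_jpeg(x, channels=3)
    x = tf.image.resize_images(x, [64, 64])
    return x

# in_pics is the list of image file paths.
dataset = tf.data.Dataset.from_tensor_slices(in_pics)
dataset = dataset.map(extract_fn)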

But when I try to train, I get this error:

File system scheme '[local]' not implemented (file: '/content/anime-faces/black_hair/danbooru_2629248_487b383a8a6e7cc0e004383300477d66.jpg')

Can I work around it somehow?
I also tried the TFRecords API and got the same error.

asked Nov 17 '18 at 1:11 by had, edited Nov 17 '18 at 14:34

python tensorflow google-colaboratory google-cloud-tpu


1 Answer

The Cloud TPU you use in this scenario is not colocated on the same VM where your Python code runs, so it cannot see files on the Colab VM's local disk. The easiest fix is to stage your data on GCS and use a gs:// URI to point the TPU at it.
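As a rough sketch of the path side of this, assuming the images have already been copied to a bucket with the same directory layout, the local paths from the question can be rewritten into gs:// URIs before they go into the dataset. The bucket name my-bucket and the helper to_gcs_uri are made up for illustration; only the /content/anime-faces path comes from the question.

```python
import posixpath


def to_gcs_uri(local_path, local_root, gcs_root):
    """Map a file under local_root to the same relative path under gcs_root."""
    rel = posixpath.relpath(local_path, local_root)
    if rel == ".." or rel.startswith("../"):
        raise ValueError(f"{local_path!r} is not under {local_root!r}")
    return gcs_root.rstrip("/") + "/" + rel


# Local layout from the question; the bucket is a hypothetical placeholder.
local_root = "/content/anime-faces"
gcs_root = "gs://my-bucket/anime-faces"

in_pics = [
    "/content/anime-faces/black_hair/"
    "danbooru_2629248_487b383a8a6e7cc0e004383300477d66.jpg",
]
gcs_pics = [to_gcs_uri(p, local_root, gcs_root) for p in in_pics]
print(gcs_pics[0])
```

The resulting gcs_pics list can then be passed to tf.data.Dataset.from_tensor_slices in place of in_pics.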

To optimize performance when using GCS, add prefetch(AUTOTUNE) to your tf.data pipeline, and for small (<50GB) datasets use cache().
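Here is a minimal sketch of what such a pipeline might look like, not a definitive implementation. It assumes TensorFlow 2.x names (tf.io.read_file, tf.image.resize, tf.data.experimental.AUTOTUNE), whereas the question uses the older 1.x names. The function names, batch size, and the size helper are illustrative choices. The helper only does the arithmetic for how much memory the decoded float32 images would occupy if cached.

```python
def cached_size_bytes(num_images, height, width, channels=3, bytes_per_value=4):
    """Approximate size of the decoded images if cache() holds them as float32."""
    return num_images * height * width * channels * bytes_per_value


def build_dataset(gcs_paths, batch_size=128, image_size=64):
    """Build a tf.data pipeline over gs:// image paths with cache and prefetch."""
    import tensorflow as tf  # imported here so the helper above works without TF

    autotune = tf.data.experimental.AUTOTUNE

    def extract_fn(path):
        data = tf.io.read_file(path)  # accepts gs:// URIs as well as local paths
        image = tf.image.decode_jpeg(data, channels=3)
        return tf.image.resize(image, [image_size, image_size])

    dataset = tf.data.Dataset.from_tensor_slices(gcs_paths)
    dataset = dataset.map(extract_fn, num_parallel_calls=autotune)
    dataset = dataset.cache()  # keep decoded images after the first pass
    # TPUs generally need fixed batch shapes, hence drop_remainder=True.
    dataset = dataset.batch(batch_size, drop_remainder=True)
    dataset = dataset.prefetch(autotune)  # overlap input reads with training
    return dataset


# 100k images at 64x64x3, as in the question.
size = cached_size_bytes(100_000, 64, 64)
print(f"decoded dataset is about {size / 1e9:.1f} GB")
```

Placing cache() after map() means the GCS reads and JPEG decoding happen only during the first epoch; later epochs are served from the cache.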

answered Dec 13 '18 at 2:05 by Ami F, edited Feb 4 at 21:09 by michaelb

That's how I do it, but it comes with a big performance hit.

– had
Dec 14 '18 at 3:03

That's odd; GCS should be the fastest way to get data to a TPU. Perhaps try increasing the replication of your stored data? (E.g. global or multi-regional storage instead of zonal.)

– Ami F
Dec 14 '18 at 15:36

I will try. So far I have only compared Colab VM RAM against Google Cloud Storage, and the second option is 3 times slower.

– had
Dec 15 '18 at 1:09

Tested with multi-regional: it's faster now, but still worse than Colab RAM.

– had
Jan 17 at 8:46
