minimizing the cost of uploading a very large tar file to Google Cloud Storage
I'm trying to upload and then untar a very large file (1.3 TB) into Google Cloud Storage at the lowest possible price.
I initially thought about creating a really cheap instance just to download the file and put it in a bucket, then creating a new instance with a good amount of RAM to untar the file and put the result in a new bucket.
However, since bucket pricing depends on the number of I/O requests, I'm not sure it's the best option, and it might not be the best for performance either.
What would be the best strategy to untar the file in the cheapest way?
google-cloud-platform google-cloud-storage google-compute-engine bucket
edited Nov 15 '18 at 19:58 by Dan
asked Nov 15 '18 at 8:31 by user1672455
1 Answer
First some background information on pricing:
Google has pretty good documentation about how to ingest data into GCS. From that guide:
Today, when you move data to Cloud Storage, there are no ingress traffic charges. The gsutil tool and the Storage Transfer Service are both offered at no charge. See the GCP network pricing page for the most up-to-date pricing details.
The "network pricing page" just says:
[Traffic type: Ingress] Price: No charge, unless there is a resource such as a load balancer that is processing ingress traffic. Responses to requests count as egress and are charged.
There is additional information on the GCS pricing page about your idea to use a GCE VM to write to GCS:
There are no network charges for accessing data in your Cloud Storage buckets when you do so with other GCP services in the following scenarios:
- Your bucket and GCP service are located in the same multi-regional or regional location. For example, accessing data in an asia-east1 bucket with an asia-east1 Compute Engine instance.
Later on that same page, there is also information about per-request pricing:
Class A Operations: storage.*.insert[1]
[1] Simple, multipart, and resumable uploads with the JSON API are each considered one Class A operation.
Class A operations are priced per 10,000 operations, at either $0.05 or $0.10 depending on the storage type. I believe you would be doing at most one Class A operation per file that you upload, so unless the tarball contains many millions of files, this probably won't add up to much overall.
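To put rough numbers on that, here is a minimal sketch of the arithmetic. It assumes one Class A operation per uploaded file, which is my reading of the pricing footnote rather than a guarantee, and the file counts are made up for illustration. The two prices are the ones quoted above.

```python
def class_a_upload_cost(num_files: int, price_per_10k: float) -> float:
    """Estimated Class A charge in dollars, assuming one operation per uploaded file."""
    return num_files / 10_000 * price_per_10k


# Hypothetical file counts; substitute the real number of files in your tarball.
for num_files in (1, 100_000, 10_000_000):
    for price in (0.05, 0.10):
        cost = class_a_upload_cost(num_files, price)
        print(f"{num_files:>10,} files at ${price:.2f} per 10k ops: ${cost:,.2f}")
```

Under that assumption, even ten million files would come to roughly $50 to $100 in operation charges, and a single large object costs a fraction of a cent.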
Now to answer your question:
For your use case, it sounds like you want the files in the tarball to be individual files in GCS (as opposed to just having a big tarball stored in one file in GCS). The first step is to untar it somewhere, and the second step is to use gsutil cp to copy it to GCS.
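Those two steps could be scripted roughly like this. It is only a sketch: the function names are mine, and the archive path, working directory, and bucket URL in the usage note are placeholders. It assumes gsutil is installed and authenticated, and that the local disk has room for the extracted files.

```python
import subprocess
import tarfile
from pathlib import Path
from typing import List


def extract_tarball(tar_path: str, dest_dir: str) -> Path:
    """Unpack tar_path into dest_dir and return the destination directory."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # tarfile auto-detects compression; only extract archives you trust.
    with tarfile.open(tar_path) as archive:
        archive.extractall(dest)
    return dest


def gsutil_copy_command(src_dir: str, bucket_url: str) -> List[str]:
    """Build a recursive gsutil copy; the top-level -m flag runs transfers in parallel."""
    return ["gsutil", "-m", "cp", "-r", src_dir, bucket_url]


def untar_and_upload(tar_path: str, work_dir: str, bucket_url: str) -> None:
    """Step 1: untar locally. Step 2: copy the extracted tree to GCS with gsutil."""
    extracted = extract_tarball(tar_path, work_dir)
    subprocess.run(gsutil_copy_command(str(extracted), bucket_url), check=True)
```

A call would look like untar_and_upload("big-archive.tar", "extracted", "gs://my-bucket/data"), with all three arguments replaced by your own values.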
Unless you have to (e.g. there is not enough space on the machine that holds the tarball now), I wouldn't recommend copying the tarball to an intermediate VM in GCE before uploading to GCS, for two reasons:
- gsutil cp already handles a bunch of annoying edge cases for you: parallel uploads, resuming an upload in case there's a network failure, retries, checksum comparisons, etc.
- Using any GCE VMs will add cost to this whole copy operation: costs for the disks plus costs for the VMs themselves.
If you want to try the procedure out with something lower-risk first, make a small directory with a few megabytes of data and a few files and use gsutil cp to copy it, then check how much you were charged for that. From the GCS pricing page:
Charges accrue daily, but Cloud Storage bills you only at the end of the billing period. You can view unbilled usage in your project's billing page in the Google Cloud Platform Console.
So you'd just have to wait a day to see how much you were billed.
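If it helps, a throwaway directory for that trial run can be generated with a few lines of Python. This is just a sketch: the function name, file names, and default sizes are arbitrary choices of mine.

```python
import os
from pathlib import Path


def make_trial_directory(root: str, num_files: int = 5, size_bytes: int = 1_000_000) -> Path:
    """Create num_files files of random bytes under root and return the directory."""
    directory = Path(root)
    directory.mkdir(parents=True, exist_ok=True)
    for index in range(num_files):
        (directory / f"sample-{index}.bin").write_bytes(os.urandom(size_bytes))
    return directory
```

After calling make_trial_directory("trial-data"), you would copy it with gsutil -m cp -r trial-data gs://my-bucket/trial (the bucket name is a placeholder) and check the billing page the next day.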
edited Nov 15 '18 at 20:01
answered Nov 15 '18 at 19:08 by Dan