How to use different loss functions at different time steps of an LSTM in Keras
I have a set of sentences and their scores, and I would like to train a marking system that predicts the score for a given sentence. For example:
(X = Tomorrow is a good day, Y = 0.9)
I would like to build this marking system with an LSTM so that it also takes into account the sequential relationship between the words in the sentence, so the training example above is transformed as follows:
(x1=Tomorrow, y1=is) (x2=is, y2=a) (x3=a, y3=good) (x4=day, y4=0.9)
When training this LSTM, I would like the first three time steps to use a softmax classifier and the final step to use MSE. The overall loss is therefore composed of two different loss functions. Keras does not seem to provide a direct way to do this. In addition, I am not sure whether my approach to building the marking system is correct.
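The combined objective described above can be written out explicitly. The following is a minimal NumPy sketch, not a Keras implementation: the function name `combined_loss`, the `score_weight` parameter, the token ids, and the probabilities are all made up for illustration. It averages the cross-entropy over the three word-prediction steps and adds the squared error of the final score.

```python
import numpy as np

def combined_loss(step_probs, next_token_ids, predicted_score, true_score,
                  score_weight=1.0):
    """Cross-entropy over the word steps plus squared error on the score.

    All names and the weighting scheme are illustrative assumptions.
    """
    step_probs = np.asarray(step_probs, dtype=float)
    # Probability the model assigned to the correct next word at each step.
    picked = step_probs[np.arange(len(next_token_ids)), next_token_ids]
    cross_entropy = -np.log(picked).mean()
    squared_error = (predicted_score - true_score) ** 2
    return cross_entropy + score_weight * squared_error

# Hypothetical vocabulary ids: 0=Tomorrow, 1=is, 2=a, 3=good, 4=day
probs = [
    [0.1, 0.6, 0.1, 0.1, 0.1],  # after "Tomorrow", target "is"
    [0.1, 0.1, 0.6, 0.1, 0.1],  # after "is", target "a"
    [0.1, 0.1, 0.1, 0.6, 0.1],  # after "a", target "good"
]
loss = combined_loss(probs, [1, 2, 3], predicted_score=0.8, true_score=0.9)
print(loss)
```

Raising or lowering `score_weight` shifts the balance between the word-prediction term and the score term, which is the same role `loss_weights` plays in Keras.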
keras lstm
edited Nov 13 '18 at 9:23 by Amir
asked Nov 13 '18 at 8:43 by Kevin Sun
1 Answer
Keras supports multiple loss functions:
model = Model(inputs=inputs,
              outputs=[lang_model, sent_model])
model.compile(optimizer='sgd',
              loss=['categorical_crossentropy', 'mse'],
              metrics=['accuracy'], loss_weights=[1., 1.])
Based on your explanation, I think you need a model that first predicts a token from the previous tokens (in NLP this is usually called a language model) and then computes a score, which I assume is a sentiment score (the same approach applies to other domains).
To do so, you can train your language model with an LSTM and use the last output of the LSTM for your ranking task. For this you need to define two loss functions: categorical_crossentropy for the language model and MSE for the ranking task.
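One way the two-output model described above might be wired up is sketched below. This is only an illustration under several assumptions: the sizes, the layer arrangement, and the random stand-in data are hypothetical, and `sparse_categorical_crossentropy` is swapped in for `categorical_crossentropy` so that integer token ids can be used as targets without one-hot encoding. The sketch also supplies a next-word target at every time step, so in practice the last target would need something like an end-of-sentence token.

```python
import numpy as np
from tensorflow.keras import layers, Model

# Hypothetical sizes chosen only for illustration.
vocab_size, seq_len, embed_dim, hidden_units = 50, 4, 16, 32

inputs = layers.Input(shape=(seq_len,), dtype="int32")
embedded = layers.Embedding(vocab_size, embed_dim)(inputs)

# return_sequences gives one output per time step (for the language model);
# return_state also returns the final hidden state (used for the score).
sequence, last_hidden, _ = layers.LSTM(
    hidden_units, return_sequences=True, return_state=True
)(embedded)

# Softmax over the vocabulary at every time step.
lang_model = layers.TimeDistributed(
    layers.Dense(vocab_size, activation="softmax"), name="lang_model"
)(sequence)
# Single regression value computed from the last LSTM output.
sent_model = layers.Dense(1, name="sent_model")(last_hidden)

model = Model(inputs=inputs, outputs=[lang_model, sent_model])
model.compile(
    optimizer="sgd",
    loss=["sparse_categorical_crossentropy", "mse"],
    loss_weights=[1.0, 1.0],
)

# Random stand-in data: token ids, next-token ids, and sentence scores.
rng = np.random.default_rng(0)
x = rng.integers(0, vocab_size, size=(8, seq_len)).astype("int32")
y_next = rng.integers(0, vocab_size, size=(8, seq_len)).astype("int32")
y_score = rng.random((8, 1)).astype("float32")

history = model.fit(x, [y_next, y_score], epochs=1, verbose=0)
next_word_probs, scores = model.predict(x, verbose=0)
```

Here `next_word_probs` has one probability distribution per time step and `scores` has one value per sentence; the total loss Keras minimizes is the weighted sum of the two losses.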
This tutorial would be helpful: https://www.pyimagesearch.com/2018/06/04/keras-multiple-outputs-and-multiple-losses/
Hi Amir, thanks very much for your reply. Does the "token" in your response mean the features of the sentence, i.e., the input to the softmax at the last time step?
– Kevin Sun
Nov 13 '18 at 20:21
@KevinSun I mean the things that you pass to your LSTM. These are usually word vectors (GloVe or word2vec).
– Amir
Nov 13 '18 at 21:04
Thanks very much again. As I understood from the tutorial you referred to, the multiple losses are built on different output layers that have no connections between them. For example, suppose we want two losses built on two layers named layer1 and layer2. In the tutorial, layer1 and layer2 have no connection to each other. In my problem, the losses are built on the outputs of layer3 and layer4, and the input of layer4 is the output of layer3. In that case, can I still use these multiple losses?
– Kevin Sun
Nov 13 '18 at 21:28
You're welcome. You could, but I am unsure about the convergence of the model.
– Amir
Nov 13 '18 at 21:44
Does your first reply to my question mean the same thing as what I asked you in the previous post?
– Kevin Sun
Nov 13 '18 at 21:53
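On the question raised in the comments about losses on connected layers, the sketch below suggests that Keras accepts a loss on each of two outputs even when one output layer feeds the other. The layer names `layer3` and `layer4` follow the comment; the sizes and random data are assumptions made up for illustration, and whether such a model converges well is a separate matter, as noted above.

```python
import numpy as np
from tensorflow.keras import layers, Model

inputs = layers.Input(shape=(10,))
layer3 = layers.Dense(5, activation="softmax", name="layer3")(inputs)
# layer4 consumes the output of layer3, so the two outputs are connected.
layer4 = layers.Dense(1, name="layer4")(layer3)

chained = Model(inputs=inputs, outputs=[layer3, layer4])
chained.compile(
    optimizer="sgd",
    loss=["categorical_crossentropy", "mse"],
    loss_weights=[1.0, 1.0],
)

# Random stand-in data (hypothetical): features, one-hot classes, scores.
rng = np.random.default_rng(0)
features = rng.random((16, 10)).astype("float32")
class_targets = np.eye(5, dtype="float32")[rng.integers(0, 5, size=16)]
score_targets = rng.random((16, 1)).astype("float32")

chained_history = chained.fit(
    features, [class_targets, score_targets], epochs=1, verbose=0
)
class_probs, chained_scores = chained.predict(features, verbose=0)
```

Because `layer4` depends on `layer3`, the MSE gradient also flows back through `layer3`, so the two losses jointly shape its weights.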
edited Nov 13 '18 at 21:04
answered Nov 13 '18 at 9:07 by Amir