Concatenate layer in Keras makes fitting fail
Whenever I concatenate the outputs of two layers (for example, because I want to use softmax on some outputs and another activation function on the rest), the network always fails to learn.



This is some example code to demonstrate the problem:



from tensorflow.keras.layers import (Lambda, Input, Dense, Concatenate, Dropout,
                                     Reshape, Conv2D, Flatten, MaxPooling2D)
from tensorflow.keras.models import Model
from tensorflow.keras.datasets import mnist
from tensorflow.keras.losses import mse, categorical_crossentropy, binary_crossentropy
from tensorflow.keras.utils import plot_model, to_categorical
from tensorflow.keras import backend as K
from tensorflow.keras import optimizers

import numpy as np
import matplotlib.pyplot as plt
import argparse
import os

# MNIST dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()

no_cls = max(y_train)+1
width = 20

extra_dims = True

image_size = x_train.shape[1]
original_dim = image_size * image_size
x_train = np.reshape(x_train, [-1, original_dim])
x_test = np.reshape(x_test, [-1, original_dim])
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255
y_train = to_categorical(y_train, num_classes=width if extra_dims else no_cls)
y_test = to_categorical(y_test, num_classes=width if extra_dims else no_cls)

hidden_dim = 512
batch_sz = 256
eps = 10

ins = Input(shape=(original_dim,))
x = Dense(hidden_dim)(ins)
cls_pred = Dense(no_cls, activation="softmax")(x)
other = Dense(width-no_cls)(x)
outs = Concatenate()([cls_pred, other])

encoder = Model(ins, outs if extra_dims else cls_pred, name="encoder")
encoder.summary()

def cust_loss_fn(y_true, y_pred):
    return categorical_crossentropy(y_true[:no_cls], y_pred[:no_cls])

optimiser = optimizers.SGD(lr=0.003, clipvalue=0.1)
encoder.compile(optimizer=optimiser, loss=cust_loss_fn,
                metrics=["accuracy"])

encoder.fit(x_train, y_train,
            batch_size=batch_sz,
            epochs=eps,
            validation_data=(x_test, y_test))

score = encoder.evaluate(x_test, y_test)
print(score)

print(encoder.predict(x_train[0:10]))


With extra_dims = False, i.e. no Concatenate layer, the network consistently reaches 88% accuracy within the 10 epochs. When it is True, the network stays at around 8% accuracy and the loss does not drop at all during training.



Am I doing something wrong?
python tensorflow keras

asked Nov 12 at 18:53
TheAbelo2

  • Are you sure the model runs? That way of slicing in the loss function is wrong: it selects the first no_cls samples of the batch rather than the first no_cls outputs, so the shapes would not be consistent. It must be [:, :no_cls] instead.
    – today
    Nov 12 at 19:36

  • @today Thank you so much, I changed that line and it started learning, although much slower than before - guess this was to be expected though. Thanks for pointing it out! Wonder why this works without the concatenate layer though? Wasn't having any problems with extra_dims = False
    – TheAbelo2
    Nov 12 at 21:36

  • It does not run and gives me errors when I use keras directly. I did not test it using tensorflow.keras and I don't know why it works with that. It's strange indeed.
    – today
    Nov 13 at 4:57
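
To make the fix from the comments concrete, here is a minimal sketch of the corrected loss function, assuming the same tf.keras setup and the question's values (batch size 256, width 20, no_cls 10). The shape check shows why [:no_cls] alone slices batch samples instead of output columns:

import numpy as np
from tensorflow.keras.losses import categorical_crossentropy

# With y of shape (batch, width), y[:no_cls] keeps the first no_cls
# *rows* (samples); y[:, :no_cls] keeps the first no_cls *columns* (outputs).
y = np.zeros((256, 20))
print(y[:10].shape)     # (10, 20)  -- wrong axis: 10 samples, all 20 outputs
print(y[:, :10].shape)  # (256, 10) -- right axis: all samples, 10 class outputs

no_cls = 10

def cust_loss_fn(y_true, y_pred):
    # Compare only the class columns; the extra width-no_cls outputs
    # are left out of the cross-entropy.
    return categorical_crossentropy(y_true[:, :no_cls], y_pred[:, :no_cls])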