Here is the dataset. I tried converting this implementation into its Keras analog. Why are my predictions so bad? They are almost always close to a single number, no matter whether I use more layers, more neurons, or even a convolutional NN. Am I doing something fundamentally wrong? I used the same dataset to train the network in the second link and had good results.

import numpy as np
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, LeakyReLU

data1_file_path = 'GSE106648_data1.csv'
data2_file_path = 'GSE106648_data2.csv'

# read in training data
train = np.loadtxt(data1_file_path, skiprows=1, delimiter=',')
print("Finished reading training set")

# read in test data
test = np.loadtxt(data2_file_path, skiprows=1, delimiter=',')
print("Finished reading test set")

# separate training/testing input features and labels
x_train = train[:, 1:]
y_train = train[:, 0].reshape(-1, 1)
x_test = test[:, 1:]
y_test = test[:, 0].reshape(-1, 1)

# define base model

def baseline_model():
    # create model: one hidden layer of 200 units with a leaky-ReLU activation
    model = Sequential()
    model.add(Dense(200, input_shape=(x_train.shape[1],)))
    model.add(LeakyReLU())  # use the LeakyReLU layer; not all Keras versions accept the string 'LeakyReLU' as an activation name
    model.add(Dense(1))
    # compile model
    model.compile(loss='mse', optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001))
    return model

model = baseline_model()
model.fit(x_train, y_train, batch_size=20, epochs=500)
y_pred = model.predict(x_test)  # predict with the fitted model, not a fresh instance

These are the predictions:

[figure: predictions on the test set]

And here is the training loss versus epochs:

[figure: training loss vs. epochs]

  • Yes, but all the predictions converge to zero. – Caterina Dec 09 '22 at 23:02
  • Smaller learning rate? Oh, you should center your response/X variables too (see the standardization sketch after these comments). – John Madden Dec 09 '22 at 23:07
  • You have 67 training samples with 10,000 features, and 70 testing samples. This is an overfitting problem. Use regularization or feature selection, or get hundreds of thousands more samples. – Sycorax Dec 09 '22 at 23:36
  • Ok, thanks! Do you know what the equivalent in Keras is for the l1 penalty they use? Is it kernel_regularizer, bias_regularizer, or activity_regularizer? (A sketch follows these comments.) – Caterina Dec 09 '22 at 23:42
  • I already used feature selection; initially there were 400k features. And getting more samples is not an option, unfortunately. – Caterina Dec 10 '22 at 01:10
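
Following up on the centering suggestion in the comments, below is a minimal sketch of standardizing both the features and the response before training. The use of scikit-learn's StandardScaler and the *_s variable names are my assumptions; the comment does not prescribe a particular method.

from sklearn.preprocessing import StandardScaler

# center and scale each feature column; fit on the training set only
x_scaler = StandardScaler()
x_train_s = x_scaler.fit_transform(x_train)
x_test_s = x_scaler.transform(x_test)  # reuse the training-set statistics

# center and scale the response as well, as the comment suggests
y_scaler = StandardScaler()
y_train_s = y_scaler.fit_transform(y_train)

# after training on (x_train_s, y_train_s), map predictions back:
# y_pred = y_scaler.inverse_transform(model.predict(x_test_s))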
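
On the l1 penalty question: in Keras, kernel_regularizer penalizes the layer's weight matrix, bias_regularizer penalizes the bias vector, and activity_regularizer penalizes the layer's output, so kernel_regularizer is the closest analogue to an l1 penalty on the coefficients. A minimal sketch, where the penalty strength 0.01 is a placeholder rather than a tuned value:

from tensorflow.keras import regularizers

def l1_model():
    model = Sequential()
    # l1 penalty on the hidden-layer weights; 0.01 is a placeholder strength
    model.add(Dense(200, input_shape=(x_train.shape[1],),
                    kernel_regularizer=regularizers.l1(0.01)))
    model.add(LeakyReLU())
    model.add(Dense(1))
    model.compile(loss='mse', optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001))
    return model

With 67 training samples and 10,000 features, the penalty strength would need tuning (e.g. by cross-validation) to be useful.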

0 Answers