Problem when training my own dataset on Seq2seq #128

ghost · 2020-04-08T18:58:57Z

Hi Breta,

First of all, thank you for your amazing work. I'm learning a lot from it!

Here is my problem. I am trying to train the Seq2Seq model on my own dataset of words. However, the dataset consists of French words with accented characters such as 'é' or 'è'.

How do I extend the alphabet and train the model with these new characters?

Here is what I tried. I added the new characters to the existing alphabet in ocr.datahelpers. Then, in the Seq2seq notebook, I loaded my images with their labels.

When tuning the parameters, I changed char_size to 98, which is the number of characters I use. I didn't touch any other parameter.
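To make the alphabet change concrete, here is a minimal sketch of what extending a character table could look like. It is an illustration only: `BASE_CHARS`, `FRENCH_CHARS`, `encode`, and `decode` are hypothetical names, not the actual contents of ocr.datahelpers, and the real alphabet may contain different characters.

```python
import string
import unicodedata

# Hypothetical base alphabet; the real list in ocr.datahelpers may differ.
BASE_CHARS = list(string.ascii_letters + string.digits)

# Accented characters, normalized to NFC so each one is a single code point.
FRENCH_CHARS = list(unicodedata.normalize("NFC", "àâçéèêëîïôùûüÀÂÇÉÈÊËÎÏÔÙÛÜ"))

CHARS = BASE_CHARS + FRENCH_CHARS
char2idx = {c: i for i, c in enumerate(CHARS)}
idx2char = {i: c for c, i in char2idx.items()}

# The character count passed to the model should match the table size.
CHAR_SIZE = len(CHARS)


def encode(word):
    """Map a word to a list of indices, normalizing accents to NFC first."""
    word = unicodedata.normalize("NFC", word)
    return [char2idx[c] for c in word]


def decode(indices):
    """Map a list of indices back to a string."""
    return "".join(idx2char[i] for i in indices)
```

Normalizing labels matters because 'é' can be stored either as one code point or as 'e' plus a combining accent, and only the first form would be found in the table. If the model reserves extra symbols (for example padding or end-of-sequence tokens), char_size may need to account for them as well; that is an assumption to check against the notebook.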

I then get this error when I run the last cell:

```
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
in
----> 1 train_iterator.next_feed(BATCH_SIZE)

in next_feed(self, size)
104 decoder_targets_,
105 encoder_inputs_length_,
--> 106 decoder_targets_length_) = self.next_batch(size)
107 return {
108 encoder_inputs: encoder_inputs_,

in next_batch(self, batch_size)
88 print('objet.shape = ' + str((input_seq[i][:res['in_length'].values[i]]).shape))
89 print('len(img)=' + str(len(img)))
---> 90 input_seq[i][:res['in_length'].values[i]] = img
91 input_seq = input_seq.swapaxes(0, 1)
92

ValueError: could not broadcast input array from shape (148) into shape (120)
```

I noticed that the first number (148) changes from run to run: (106), (108), (132), (268), (90), (70), ...
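The failing line assigns `img` into a slice of a padded array, so the same kind of error can be reproduced in isolation. The sketch below is a minimal NumPy illustration, not the notebook's code: `fill_padded`, the feature size of 4, and the array sizes are made up for the example.

```python
import numpy as np


def fill_padded(images, lengths, max_len, n_features):
    """Copy each variable-length sequence into a zero-padded batch array."""
    batch = np.zeros((len(images), max_len, n_features))
    for i, (img, length) in enumerate(zip(images, lengths)):
        # The slice has min(length, max_len) rows; img must have exactly
        # that many rows, otherwise NumPy cannot broadcast the assignment.
        batch[i][:length] = img
    return batch


rng = np.random.default_rng(0)
images = [rng.random((148, 4)), rng.random((106, 4))]

# Case 1: lengths match the data and the padded array is long enough.
ok = fill_padded(images, [148, 106], max_len=300, n_features=4)

# Case 2: the stored length (120) does not match the sequence (148 rows).
try:
    fill_padded(images, [120, 106], max_len=300, n_features=4)
    stale_length_error = None
except ValueError as exc:
    stale_length_error = str(exc)

# Case 3: the length is right, but the padded array is only 120 rows long,
# so the slice is silently truncated to 120 rows.
try:
    fill_padded(images, [148, 106], max_len=120, n_features=4)
    short_buffer_error = None
except ValueError as exc:
    short_buffer_error = str(exc)
```

Both failing cases raise the same kind of broadcast error. This suggests two things worth checking in my setup, though it is only a guess: whether the `in_length` column still matches the actual length of each image, and whether the padded `input_seq` array is sized for the longest image in the new dataset.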

Do you have any idea where the problem lies and how I could fix it?
