Problem of reproducing the results #4
Thank you for your interest in our work.
In fact, while the code was running, we didn't see any warnings about loading the pretrained model. To help you reproduce the results above, we provide our preprocessed data, which you can try to use (download links: [Google Drive]; [Baidu Pan] (code: nq9y)).
Thanks for your response! About question 2: yes, there is no error when loading the model, because PyTorch has two ways to save a model: saving the entire network or saving only the network parameters (the state_dict). The author uses the former, which also saves the network structure, so the checkpoint can be loaded even if its structure differs from the model structure in the code. The printed structure of the pre-trained model is attached to the file. In line 537 of the file, you can see three convolution layers that differ from the code's model. I didn't find the corresponding code in the model file, so I guess this is why I could not reproduce the author's results. Thanks a lot again!
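The two PyTorch saving modes mentioned above can be sketched as follows (a minimal illustration; `Net` is a hypothetical stand-in for the repo's actual model class, not its real code):

```python
import os
import tempfile

import torch
import torch.nn as nn


class Net(nn.Module):
    """Hypothetical stand-in for the repo's actual model class."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)


net = Net()
tmpdir = tempfile.mkdtemp()

# Way 1: save the entire network. This pickles the structure too, so
# loading reproduces the saved architecture even if the class in the
# code has since changed.
whole_path = os.path.join(tmpdir, "whole_model.pth")
torch.save(net, whole_path)
loaded = torch.load(whole_path, weights_only=False)

# Way 2: save only the parameters (state_dict). Here the structure
# comes from the class defined in the code at load time, so the two
# must match.
params_path = os.path.join(tmpdir, "params.pth")
torch.save(net.state_dict(), params_path)
net2 = Net()
net2.load_state_dict(torch.load(params_path, weights_only=True))
```

This is why loading the released checkpoint raises no error even though the checkpoint's structure and the code's model definition differ: Way 1 carries its own structure.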
I remembered the reason: once, when exploring the model structure, I defined some 3D convolutions in the initialization function, but we did not use them in the forward pass. Therefore, these 3D convolutions exist in our saved model structure. In my opinion, you can ignore their weights when loading the model to get the correct results.
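Ignoring the unused 3D-convolution weights when loading can be done with `strict=False`, or by filtering the checkpoint keys against the code's model. A sketch with a hypothetical small model (not the repo's actual classes):

```python
import torch
import torch.nn as nn


class CodeModel(nn.Module):
    """Stand-in for the model as defined in the code: no conv3d layers."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)


# Pretend this is the released checkpoint: it carries extra conv3d
# weights that were defined in __init__ but never used in forward.
ckpt = CodeModel().state_dict()
ckpt["conv3d.0.weight"] = torch.zeros(1)

model = CodeModel()

# Option 1: strict=False skips keys that don't match the model and
# reports them instead of raising an error.
missing, unexpected = model.load_state_dict(ckpt, strict=False)

# Option 2: drop the extra keys explicitly before loading.
filtered = {k: v for k, v in ckpt.items() if k in model.state_dict()}
model.load_state_dict(filtered)
```

With Option 1, `unexpected` lists the ignored conv3d keys, which is a quick way to confirm that only the unused layers were skipped.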
Thanks for your reply. I have reproduced the results in your paper. I guess the cause may be that there is no fixed random seed, which led to a bad result on my first attempt.
Thanks again for your great work and patient answers.
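Fixing the random seeds for a PyTorch run can be sketched as below (illustrative only; the exact set of flags that matters depends on the repo's data loading and model code):

```python
import random

import numpy as np
import torch


def set_seed(seed: int = 42) -> None:
    """Seed the RNGs that typically affect a PyTorch training run."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # no-op when no GPU is present
    # Make cuDNN deterministic (may slow training down somewhat).
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


# Two runs seeded the same way produce identical random draws.
set_seed(42)
a = torch.rand(3)
set_seed(42)
b = torch.rand(3)
```

Note that even with fixed seeds, some CUDA operations are non-deterministic, so small run-to-run variation can remain.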
I apologize for disturbing you again. Thanks a lot again.
The problem you mentioned does exist, so I repeated the training of the model five times to get the average scores. From my point of view, the problem currently exists in both the image captioning task and the change captioning task in the remote sensing field. I think this may be related to two factors: 1. There is a gap between the cross-entropy loss and the evaluation metrics. 2. Compared with the image-text datasets of natural images, the remote sensing image-text datasets are relatively small.
Thanks a lot for your reply and advice.
Hi Zhou @Zhou2019, how did you reproduce the results? I directly ran the code without any modification several times, but I only obtained much lower results, such as a CIDEr score of 90+. I ran the code on 3080 and 3090 GPUs, but I got similar results.
We can set batch_size above 35 to achieve stable results.
Firstly, thanks for your great work!
I am trying to reproduce the results of your paper using your code, but I encountered some problems. I hope you can help me solve them. Thank you for your time and attention.
In line 47 of the file train.py, there is a code snippet:
if i == 20:
break
This code needs to be commented out; otherwise the model is insufficiently trained and the performance is very poor. Please explain why you added this code and how it affects the training process. I guess this code snippet is for debugging?
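If the snippet really is a leftover debug guard, a common pattern is gating the early break behind a flag instead of hard-coding it (a sketch assuming an argparse-style `--debug` flag, which the repo may not actually have):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--debug", action="store_true",
                    help="run only a few batches for a quick smoke test")
# Simulating `python train.py --debug`; pass [] instead for a full run.
args = parser.parse_args(["--debug"])

processed = []
for i in range(100):          # stand-in for the training batch loop
    if args.debug and i == 20:
        break                 # truncate the epoch only in debug mode
    processed.append(i)
```

With the flag off, the loop runs to completion, so the truncation cannot silently cripple a real training run.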
The current open-source model structure is inconsistent with both the paper and the public pre-trained model. Specifically, the encoder_feat part of the model weights released by the author contains three convolutional layers:
(conv3d): ModuleList(
(0): Conv3d(512, 1024, kernel_size=(2, 3, 3), stride=(1, 1, 1), padding=(0, 1, 1))
(1): Conv3d(512, 1024, kernel_size=(2, 3, 3), stride=(1, 1, 1), padding=(0, 1, 1))
(2): Conv3d(512, 1024, kernel_size=(2, 3, 3), stride=(1, 1, 1), padding=(0, 1, 1))
)
But there is no such part in the code's model. Therefore, even after commenting out the code mentioned in the first problem, the model performance still does not reach the results in the author's paper. Could you provide the correct model structure code for the paper, or explain how to reproduce the results?
Thanks a lot!!