
Problem reproducing results #4

Open
Zhou2019 opened this issue Mar 25, 2023 · 9 comments

@Zhou2019
First, thanks for your great work!

I am trying to reproduce the results of your paper using your code, but I encountered some problems. I hope you can help me solve them. Thank you for your time and attention.

1. In line 47 of the file train.py, there is this snippet:

   ```python
   if i == 20:
       break
   ```

   This code needs to be commented out; otherwise the model is undertrained and performs very poorly. Could you explain why you added this code and how it affects the training process? I guess this snippet was left in for debugging?

2. The open-source model structure is inconsistent with the paper and with the released pre-trained model. Specifically, the encoder_feat part of the model weights released by the author contains three convolutional layers:

   ```
   (conv3d): ModuleList(
     (0): Conv3d(512, 1024, kernel_size=(2, 3, 3), stride=(1, 1, 1), padding=(0, 1, 1))
     (1): Conv3d(512, 1024, kernel_size=(2, 3, 3), stride=(1, 1, 1), padding=(0, 1, 1))
     (2): Conv3d(512, 1024, kernel_size=(2, 3, 3), stride=(1, 1, 1), padding=(0, 1, 1))
   )
   ```

   But this part does not exist in the model defined in the code. Therefore, even after commenting out the code from the first problem, the model's performance still does not reach the results in the author's paper. Could you provide the correct model structure code for the paper, or explain how to reproduce the paper's results?
   Thanks a lot!!

@Chen-Yang-Liu (Owner)

Thank you for your interest in our work.

1. Response to Question 1: You are right. Those two lines of code were used to confirm the training process was correct before we released the code; you can delete them now.

2. Response to Question 2: Based on our public code and model, I ran the eval.py file and got the following scores:

   [screenshot: eval.py evaluation scores]

   As you can see, the released model scores higher than the numbers in our paper, because the paper reports the average over five training runs.

In fact, while running the code, we did not see any warnings about loading the pretrained model. To help you reproduce the results above, we provide our preprocessed data that you can try (download links: [Google Drive]; [Baidu Pan] (code: nq9y)).

@Zhou2019 (Author)

Thanks for your response!

About question 2: Yes, there is no error when loading the model. PyTorch has two ways of saving: saving the entire network, or saving only the network parameters (the state_dict). The author saved the entire network, which stores the network structure, so the checkpoint runs even though its structure differs from the model structure in the code.
If you print the model structure from the training weights, you will find that its encoder_feat component has three more convolution layers than the model structure in the code.
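For reference, a minimal sketch of the two save modes being discussed (the model and file names here are placeholders for illustration, not the repository's actual ones):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # placeholder model purely for illustration

# Way 1: save the entire network. This pickles the class and its structure,
# so loading restores the structure stored in the checkpoint even if the
# class definition in the current code has diverged.
torch.save(model, 'checkpoint_full.pth')
restored = torch.load('checkpoint_full.pth')

# Way 2: save only the parameters (state_dict) -- the portable, usual way.
torch.save(model.state_dict(), 'checkpoint_sd.pth')
model.load_state_dict(torch.load('checkpoint_sd.pth'))
```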

The printed structure of the pre-trained model is attached: pretrain_model_acrch.txt

At line 537 of that file, you can see the three convolution layers that differ from the code's model. I could not find any code associated with them in the model file, so I guess this is why I could not reproduce the author's results.

Thanks a lot again!

@Chen-Yang-Liu (Owner)


I remembered the reason: once, while exploring the model structure, I defined some 3D convolutions in the initialization function, but we did not use them in the forward pass. Therefore, these 3D convolutions exist in our saved model structure. In my opinion, you can ignore their weights when loading the model and still get correct results.
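A minimal sketch of one way to ignore those unused weights, assuming the checkpoint stores the full network as described above (the checkpoint file name and model-construction call are hypothetical placeholders):

```python
import torch

# Hypothetical names -- substitute the repository's actual checkpoint and model.
full_checkpoint = torch.load('pretrained_full_model.pth')  # whole saved network
model = build_model_from_code()  # however the repo constructs its model

# strict=False skips keys that exist only in the checkpoint,
# e.g. the unused encoder_feat.conv3d.* weights.
missing, unexpected = model.load_state_dict(full_checkpoint.state_dict(), strict=False)
print('ignored checkpoint keys:', unexpected)
```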

@Zhou2019 (Author) commented Mar 26, 2023

Thanks for your reply.

I have reproduced the result in your paper. I guess the random seed was not fixed, which led to a bad result on my first attempt.
The following is the result of my reproduction:

```
['Bleu_1', 'Bleu_2', 'Bleu_3', 'Bleu_4'] [0.9441283734498986, 0.9350939664945913, 0.9295561396083202, 0.9255503767000808]
METEOR 0.7088351869561866
ROUGE_L 0.951196575871603
CIDEr 0.0
nochange_acc: 0.9367875647668393
change_metric:
['Bleu_1', 'Bleu_2', 'Bleu_3', 'Bleu_4'] [0.7669174160401347, 0.6236078299399817, 0.4933946274791389, 0.38481022734657844]
METEOR 0.25945607910671487
ROUGE_L 0.5328756478768945
CIDEr 0.6234487965908536
change_acc: 0.9128630705394191
.......................................................
['Bleu_1', 'Bleu_2', 'Bleu_3', 'Bleu_4'] [0.8504440043441608, 0.7665358703553551, 0.6927971568203759, 0.631244447164277]
METEOR 0.3970548039576966
ROUGE_L 0.7421445413527336
CIDEr 1.342158995158655
trans - beam size 1: BLEU-1 0.8504 BLEU-2 0.7665 BLEU-3 0.6928 BLEU-4 0.6312 METEOR 0.3971 ROUGE_L 0.7421 CIDEr 1.3422
```
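Since an unfixed seed seemed to matter here, a minimal seed-fixing sketch (standard PyTorch practice, not code from this repository):

```python
import random
import numpy as np
import torch

def set_seed(seed: int = 42) -> None:
    """Fix the common sources of randomness for more repeatable runs."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Trade speed for determinism in cuDNN kernels.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

set_seed(42)  # call once, before building the model and data loaders
```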

Thanks again for your great work and patient answers.

@Zhou2019 (Author)

I apologize for disturbing you again.
Did you encounter large variations in model performance during your experiments? I trained a model that reached the paper's metrics last time, but I couldn't reproduce it again.
Could you share any helpful log files or advice for reproduction?

Thanks a lot again.

Zhou2019 reopened this Mar 29, 2023
@Chen-Yang-Liu (Owner)


The problem you mentioned does exist, which is why I repeated the training five times and averaged the scores. From my point of view, this problem currently affects both image captioning and change captioning in the remote sensing field. I think it may be related to two factors: 1) there is a gap between the cross-entropy loss and the evaluation metrics; 2) compared with the image-text datasets of natural images, remote sensing image-text datasets are relatively small.

@Zhou2019 (Author)

Thanks a lot for your reply and advice.

@tuyunbin commented May 8, 2023

Hi Zhou @Zhou2019, how did you reproduce the results? I ran the code directly, without any modification, several times, but I only obtained much lower results, such as a CIDEr score around 90+. I ran the code on 3080 and 3090 GPUs and got similar results.

@TangZwei
We can set batch_size over 35 to achieve stable results.
