
Gradient overflow in finetuning #5

Open
ghaddarAbs opened this issue Jun 30, 2021 · 2 comments

Comments

@ghaddarAbs

Hi,

Thank you very much for the great work, and for sharing the fine-tuning data last week.
I ran into an issue when I tried to fine-tune and evaluate the model on Flickr30k using:

# I only ran the second command (GPU: 1, lr: 2e-5)
./bash/train_flickr.sh

Training starts off normally, but the loss suddenly begins to increase at epoch 6:


Epoch: 6: Step: 555/1511, loss=0.527620, loss_nce=0.527620, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 559/1511, loss=0.727350, loss_nce=0.727350, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 563/1511, loss=0.570808, loss_nce=0.570808, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 567/1511, loss=0.393095, loss_nce=0.393095, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 571/1511, loss=0.674848, loss_nce=0.674848, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 575/1511, loss=0.499143, loss_nce=0.499143, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 579/1511, loss=0.594417, loss_nce=0.594417, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 583/1511, loss=0.637567, loss_nce=0.637567, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 587/1511, loss=0.848309, loss_nce=0.848309, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 591/1511, loss=0.859852, loss_nce=0.859852, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 595/1511, loss=0.551946, loss_nce=0.551946, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 599/1511, loss=0.569656, loss_nce=0.569656, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 603/1511, loss=0.811136, loss_nce=0.811136, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 607/1511, loss=0.926843, loss_nce=0.926843, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 611/1511, loss=0.878590, loss_nce=0.878590, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 615/1511, loss=0.930382, loss_nce=0.930382, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 619/1511, loss=1.138345, loss_nce=1.138345, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 623/1511, loss=1.101084, loss_nce=1.101084, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 627/1511, loss=0.899013, loss_nce=0.899013, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 631/1511, loss=1.180095, loss_nce=1.180095, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 635/1511, loss=1.371186, loss_nce=1.371186, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 639/1511, loss=1.614157, loss_nce=1.614157, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 643/1511, loss=1.712646, loss_nce=1.712646, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 647/1511, loss=2.504568, loss_nce=2.504568, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 651/1511, loss=2.761936, loss_nce=2.761936, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 655/1511, loss=4.210203, loss_nce=4.210203, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 659/1511, loss=6.195764, loss_nce=6.195764, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 663/1511, loss=8.189028, loss_nce=8.189028, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 667/1511, loss=12.597887, loss_nce=12.597887, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 671/1511, loss=11.704583, loss_nce=11.704583, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 675/1511, loss=13.765331, loss_nce=13.765331, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 679/1511, loss=18.207155, loss_nce=18.207155, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 683/1511, loss=16.359169, loss_nce=16.359169, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 687/1511, loss=20.523600, loss_nce=20.523600, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 691/1511, loss=27.668240, loss_nce=27.668240, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 695/1511, loss=30.855385, loss_nce=30.855385, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 699/1511, loss=35.086441, loss_nce=35.086441, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 703/1511, loss=30.574892, loss_nce=30.574892, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 707/1511, loss=52.953876, loss_nce=52.953876, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 711/1511, loss=40.207417, loss_nce=40.207417, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 715/1511, loss=53.108303, loss_nce=53.108303, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 719/1511, loss=47.695160, loss_nce=47.695160, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 723/1511, loss=45.211182, loss_nce=45.211182, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 727/1511, loss=49.979271, loss_nce=49.979271, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 731/1511, loss=45.502415, loss_nce=45.502415, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 735/1511, loss=42.128304, loss_nce=42.128304, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 739/1511, loss=57.433262, loss_nce=57.433262, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 743/1511, loss=70.618607, loss_nce=70.618607, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 747/1511, loss=52.835541, loss_nce=52.835541, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 751/1511, loss=57.775532, loss_nce=57.775532, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 755/1511, loss=75.909271, loss_nce=75.909271, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 759/1511, loss=47.627548, loss_nce=47.627548, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 763/1511, loss=55.984451, loss_nce=55.984451, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 767/1511, loss=39.634636, loss_nce=39.634636, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 771/1511, loss=43.213181, loss_nce=43.213181, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 775/1511, loss=37.875175, loss_nce=37.875175, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 779/1511, loss=45.833000, loss_nce=45.833000, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 783/1511, loss=42.249699, loss_nce=42.249699, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 787/1511, loss=49.242207, loss_nce=49.242207, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 791/1511, loss=59.082058, loss_nce=59.082058, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 795/1511, loss=44.366467, loss_nce=44.366467, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 799/1511, loss=61.286034, loss_nce=61.286034, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 803/1511, loss=65.236374, loss_nce=65.236374, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 807/1511, loss=55.568848, loss_nce=55.568848, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 811/1511, loss=81.588463, loss_nce=81.588463, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 815/1511, loss=138.267487, loss_nce=138.267487, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 819/1511, loss=205.398163, loss_nce=205.398163, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 823/1511, loss=106.781647, loss_nce=106.781647, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 827/1511, loss=114.370003, loss_nce=114.370003, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 831/1511, loss=85.564255, loss_nce=85.564255, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 835/1511, loss=58.856918, loss_nce=58.856918, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 839/1511, loss=48.463295, loss_nce=48.463295, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 843/1511, loss=49.180916, loss_nce=49.180916, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 847/1511, loss=42.912064, loss_nce=42.912064, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 851/1511, loss=33.153042, loss_nce=33.153042, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 855/1511, loss=49.714306, loss_nce=49.714306, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 859/1511, loss=30.225197, loss_nce=30.225197, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 863/1511, loss=40.542446, loss_nce=40.542446, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 867/1511, loss=42.657013, loss_nce=42.657013, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 871/1511, loss=29.824253, loss_nce=29.824253, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 875/1511, loss=38.451778, loss_nce=38.451778, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 879/1511, loss=30.017517, loss_nce=30.017517, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 883/1511, loss=30.451855, loss_nce=30.451855, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 887/1511, loss=24.856079, loss_nce=24.856079, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 891/1511, loss=26.671665, loss_nce=26.671665, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 895/1511, loss=24.949318, loss_nce=24.949318, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 899/1511, loss=24.966484, loss_nce=24.966484, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 903/1511, loss=31.370058, loss_nce=31.370058, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 907/1511, loss=54.106686, loss_nce=54.106686, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 911/1511, loss=27.364002, loss_nce=27.364002, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 915/1511, loss=31.717720, loss_nce=31.717720, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 919/1511, loss=32.850029, loss_nce=32.850029, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 923/1511, loss=36.481514, loss_nce=36.481514, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 927/1511, loss=36.080856, loss_nce=36.080856, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 931/1511, loss=43.164818, loss_nce=43.164818, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 935/1511, loss=82.020950, loss_nce=82.020950, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 939/1511, loss=36.782185, loss_nce=36.782185, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 943/1511, loss=32.322525, loss_nce=32.322525, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 947/1511, loss=37.928696, loss_nce=37.928696, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 951/1511, loss=37.906788, loss_nce=37.906788, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 955/1511, loss=40.255390, loss_nce=40.255390, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 959/1511, loss=36.430790, loss_nce=36.430790, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 963/1511, loss=34.600498, loss_nce=34.600498, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 967/1511, loss=39.713654, loss_nce=39.713654, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 971/1511, loss=46.052864, loss_nce=46.052864, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 975/1511, loss=37.347187, loss_nce=37.347187, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 979/1511, loss=41.355392, loss_nce=41.355392, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 983/1511, loss=45.157066, loss_nce=45.157066, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 987/1511, loss=32.828815, loss_nce=32.828815, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 991/1511, loss=55.191578, loss_nce=55.191578, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 995/1511, loss=49.200516, loss_nce=49.200516, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 999/1511, loss=34.357136, loss_nce=34.357136, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 1003/1511, loss=37.069489, loss_nce=37.069489, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 1007/1511, loss=45.910133, loss_nce=45.910133, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 1011/1511, loss=41.456188, loss_nce=41.456188, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 1015/1511, loss=60.424339, loss_nce=60.424339, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 1019/1511, loss=35.902451, loss_nce=35.902451, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 1023/1511, loss=43.260071, loss_nce=43.260071, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 1027/1511, loss=39.661362, loss_nce=39.661362, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 1031/1511, loss=64.590012, loss_nce=64.590012, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 1035/1511, loss=34.630993, loss_nce=34.630993, loss_kd=0.0, lr=0.000012

It continues like this until the end of training, and then the code crashes at evaluation:

Epoch: 14: Step: 1459/1511, loss=1448.427734, loss_nce=1448.427734, loss_kd=0.0, lr=0.000000
Epoch: 14: Step: 1463/1511, loss=1645.300171, loss_nce=1645.300171, loss_kd=0.0, lr=0.000000
Epoch: 14: Step: 1467/1511, loss=1398.610107, loss_nce=1398.610107, loss_kd=0.0, lr=0.000000
Epoch: 14: Step: 1471/1511, loss=1394.673096, loss_nce=1394.673096, loss_kd=0.0, lr=0.000000
Epoch: 14: Step: 1475/1511, loss=2031.539795, loss_nce=2031.539795, loss_kd=0.0, lr=0.000000
Epoch: 14: Step: 1479/1511, loss=1238.061768, loss_nce=1238.061768, loss_kd=0.0, lr=0.000000
Epoch: 14: Step: 1483/1511, loss=1475.774780, loss_nce=1475.774780, loss_kd=0.0, lr=0.000000
Epoch: 14: Step: 1487/1511, loss=1240.767578, loss_nce=1240.767578, loss_kd=0.0, lr=0.000000
Epoch: 14: Step: 1491/1511, loss=1186.123657, loss_nce=1186.123657, loss_kd=0.0, lr=0.000000
Epoch: 14: Step: 1495/1511, loss=1728.326904, loss_nce=1728.326904, loss_kd=0.0, lr=0.000000
Epoch: 14: Step: 1499/1511, loss=1731.635498, loss_nce=1731.635498, loss_kd=0.0, lr=0.000000
Epoch: 14: Step: 1503/1511, loss=1679.102173, loss_nce=1679.102173, loss_kd=0.0, lr=0.000000
Epoch: 14: Step: 1507/1511, loss=1465.885498, loss_nce=1465.885498, loss_kd=0.0, lr=0.000000
Total data indexed 1014
Total data indexed 5070
Saved checkpoint at /path/to/flickr-bert-two_stream/2e-5_96_0_none_0.0_768_both_run1/biencoder.best.pt
Saved checkpoint at /path/to/flickr-bert-two_stream/2e-5_96_0_none_0.0_768_both_run1/biencoder.last.pt
test dataset len = 5000, dataloader len = 63
Selected optimization level O2:  FP16 training with FP32 batchnorm and FP32 master weights.

Defaults for this optimization level are:
enabled                : True
opt_level              : O2
cast_model_type        : torch.float16
patch_torch_functions  : False
keep_batchnorm_fp32    : True
master_weights         : True
loss_scale             : dynamic
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled                : True
opt_level              : O2
cast_model_type        : torch.float16
patch_torch_functions  : False
keep_batchnorm_fp32    : True
master_weights         : True
loss_scale             : dynamic
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 32768.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 8192.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 4096.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 2048.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1024.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 512.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 256.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 128.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 64.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 32.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 8.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 4.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 2.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 0.5
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 0.25
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 0.125
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 0.0625
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 0.03125
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 0.015625
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 0.0078125
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 0.00390625
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 0.001953125
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 0.0009765625
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 0.00048828125
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 0.000244140625
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 0.0001220703125
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 6.103515625e-05
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 3.0517578125e-05
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1.52587890625e-05
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 7.62939453125e-06
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 3.814697265625e-06
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1.9073486328125e-06
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 9.5367431640625e-07
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 4.76837158203125e-07
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 2.384185791015625e-07
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1.1920928955078125e-07
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 5.960464477539063e-08
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 2.9802322387695312e-08
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1.4901161193847656e-08
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 7.450580596923828e-09
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 3.725290298461914e-09
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1.862645149230957e-09
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 9.313225746154785e-10
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 4.656612873077393e-10
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 2.3283064365386963e-10
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1.1641532182693481e-10
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 5.820766091346741e-11
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 2.9103830456733704e-11
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1.4551915228366852e-11
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 7.275957614183426e-12
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 3.637978807091713e-12
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1.8189894035458565e-12
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 9.094947017729282e-13
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 4.547473508864641e-13
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 2.2737367544323206e-13
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1.1368683772161603e-13
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 5.684341886080802e-14
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 2.842170943040401e-14
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1.4210854715202004e-14
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 7.105427357601002e-15
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 3.552713678800501e-15
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1.7763568394002505e-15
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 8.881784197001252e-16
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 4.440892098500626e-16
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 2.220446049250313e-16
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1.1102230246251565e-16
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 5.551115123125783e-17
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 2.7755575615628914e-17
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1.3877787807814457e-17
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 6.938893903907228e-18
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 3.469446951953614e-18
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1.734723475976807e-18
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 8.673617379884035e-19
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 4.336808689942018e-19
Traceback (most recent call last):
  File "train_itm.py", line 369, in <module>
    args.txt_retrieval, img2txt)
AttributeError: 'Namespace' object has no attribute 'txt_retrieval'
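The crash seems to come from the evaluation call at the end of train_itm.py reading an args.txt_retrieval attribute that the training argument parser apparently never registers. Below is a minimal, hypothetical sketch of a defensive workaround; the flag name follows the traceback, but the default, help text, and usage are assumptions on my side, not the repo's actual interface:

import argparse

# Hypothetical sketch only: the flag name matches the attribute the traceback
# complains about, but everything else here is assumed, not taken from the repo.
parser = argparse.ArgumentParser()
parser.add_argument('--txt_retrieval', action='store_true',
                    help='also report text-retrieval recall after training')
args = parser.parse_args([])                      # empty argv, illustration only
print(getattr(args, 'txt_retrieval', False))      # defensive read; prints False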

I then tried to evaluate the best model biencoder.best.pt using the following command:

python eval_itm.py ./config/flickr30k_eval_config.json /path/to/flickr-bert-two_stream/2e-5_96_0_none_0.0_768_both_run1/biencoder.best.pt

and got the following results:

Total data indexed 1000
Total data indexed 5000
time cost = 10.698805809020996s
average loss = nan, accuracy = 0.0126
indexed  1000 data
image retrieval recall = {1: 0.001, 5: 0.005, 10: 0.01}
txt retrieval recall = {1: 0.001, 5: 0.005, 10: 0.01}
@intersun
Owner

From the loss curve it looks like training ran fine for the first 6 epochs and then the loss suddenly blew up, which seems very similar to your previous issue. Can you try to reproduce the error by training on a smaller dataset (say the Flickr dev set, or a subset of the training set if you prefer) and resolve it using the suggestions from the other thread?
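For what it's worth, a common mitigation for this kind of FP16 divergence is to clip gradients on the FP32 master parameters before each optimizer step. A minimal sketch, assuming the apex O2 setup shown in your log; the model, optimizer, and max_norm value below are placeholders, not the repo's actual training loop:

import torch
from apex import amp

# Placeholder model/optimizer standing in for whatever train_itm.py builds.
model = torch.nn.Linear(768, 768).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=2e-5)
model, optimizer = amp.initialize(model, optimizer, opt_level='O2',
                                  loss_scale='dynamic')

def training_step(loss):
    optimizer.zero_grad()
    # Scale the loss so FP16 gradients do not underflow, then clip the FP32
    # master gradients before stepping so one bad batch cannot blow up the weights.
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()
    torch.nn.utils.clip_grad_norm_(amp.master_params(optimizer), max_norm=1.0)
    optimizer.step()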

As for the evaluation issue, I will investigate more this weekend. To me it looks like the checkpoint is NOT loading successfully (can you double-check this part?), so the model just picks images at random as retrieval results.
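One quick way to check that is to load the checkpoint on CPU and see what load_state_dict reports. A minimal sketch, assuming the file is a PyTorch state dict; the model construction is commented out because the exact class and constructor come from the repo:

import torch

ckpt_path = ('/path/to/flickr-bert-two_stream/'
             '2e-5_96_0_none_0.0_768_both_run1/biencoder.best.pt')

# Load on CPU so the inspection works regardless of GPU / amp state.
state = torch.load(ckpt_path, map_location='cpu')

# The file may be a raw state dict or a wrapper dict; print the top-level keys
# to see which (wrapper key names vary by repo).
print(type(state))
if isinstance(state, dict):
    print(list(state.keys())[:10])

# model = BiEncoder(...)                              # repo-specific class
# result = model.load_state_dict(state, strict=False)
# print('missing keys:', result.missing_keys)         # both lists should be
# print('unexpected keys:', result.unexpected_keys)   # empty on a clean load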

@Zjamie813

Hello,
I am also trying to run the code to reproduce the fine-tuning results on Flickr30k, but I cannot find the shared data link for Flickr30k fine-tuning. Could you share it? Thank you.
