Weird "Missing ranks" error in parallel training using horovod #3937

DingChangjie · 2024-07-01T16:22:40Z

DingChangjie
Jul 1, 2024

Dear all,
I'm facing a weird error in parallel training using horovod. The version of deepmd-kit that I'm now using is v2.2.10. Parallel training is performed on four 3080Ti cards.
Everything worked fine and I was able to perform parallel training until several days ago. I was trying to update CUDA from 11.6 to 11.8, but the uninstallation of old CUDA was not very successful, and I had to manually purge some redundant files and directories (including NVIDIA driver). After that, I reinstalled NVIDIA driver, CUDA, and deepmd-kit. I'm pretty sure that the new CUDA11.8 was correctly installed since nvcc and nvidia-smi commands worked fine. However, the weirdest thing came: I was no longer able to perform parallel training even on the same task that worked fine before the CUDA update. The following "Missing ranks" errors kept occur until a bad termination of the training process:

[0] DEEPMD INFO    batch    1000 training time 15.32 s, testing time 0.04 s, total wall time 15.39 s
[0] DEEPMD INFO    batch    1100 training time 15.15 s, testing time 0.04 s, total wall time 15.23 s
[0] [2024-07-01 15:56:08.676962: W /home/conda/feedstock_root/build_artifacts/horovod_1700515829114/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
[0] Missing ranks:
[0] 2: [DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_103/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_3_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_104/HorovodAllreduce_gradients_2_AddN_437_0, DistributedAdamOptimizer_Allreduce/cond_105/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_4_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_106/HorovodAllreduce_gradients_2_AddN_378_0 ...]
[0] [2024-07-01 15:57:08.678199: W /home/conda/feedstock_root/build_artifacts/horovod_1700515829114/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
[0] Missing ranks:
[0] 2: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[0] [2024-07-01 15:58:08.679310: W /home/conda/feedstock_root/build_artifacts/horovod_1700515829114/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
...... 
(Many many similar errors)
......
===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 1983889 RUNNING AT localhost
=   EXIT CODE: 15
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Terminated (signal 15)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions

These errors typically occured after thousands of training steps, while they might also occur immediately after a training begins. During training, the GPU memory usage seemed okay, but GPU-Util was not balanced on four cards. For example, see nvidia-smi output below (the CUDA Runtime api version is actually 11.8 according to nvcc):

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.78                 Driver Version: 550.78         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3080 Ti     Off |   00000000:19:00.0 Off |                  N/A |
| 30%   29C    P8             29W /  350W |    4506MiB /  12288MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA GeForce RTX 3080 Ti     Off |   00000000:1A:00.0 Off |                  N/A |
| 30%   30C    P8             24W /  350W |    4506MiB /  12288MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA GeForce RTX 3080 Ti     Off |   00000000:67:00.0 Off |                  N/A |
| 30%   31C    P8             25W /  350W |    4506MiB /  12288MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA GeForce RTX 3080 Ti     Off |   00000000:68:00.0 Off |                  N/A |
| 33%   52C    P2            146W /  350W |    4519MiB /  12288MiB |    100%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

I have no idea what's going wrong, and I don't think it's a deepmd-kit bug since everything worked fine before the unsuccessful CUDA update. I've tried some methods, including (1) totally purge and reinstall driver, CUDA, and deepmd-kit, and then reboot my machine; (2) try different versions of deepmd-kit, from 2.2.7 to 2.2.10; (3) try different CUDA versions, from 11.8 to 12.0. Unluckily, none of these methods work. Does anyone have suggestions? Thanks.

P.S. The Lammps that shipped with deepmd-kit works fine.

Answered by DingChangjie

Jul 3, 2024

Finally, I've figured out that this problem is caused by the environment variable KMP_AFFINITY (I manually changed it to scatter). This variable should be automatically set by deepmd-kit ......

View full answer

njzjz · 2024-07-01T19:35:16Z

njzjz
Jul 1, 2024
Maintainer

Could you use --mpi-log workers to print the log on all ranks?

3 replies

DingChangjie Jul 2, 2024
Author

Could you use --mpi-log workers to print the log on all ranks?

Yes. Here is the output:

DEEPMD rank:3  INFO    ---Summary of the training---------------------------------------
DEEPMD rank:3  INFO    distributed
DEEPMD rank:3  INFO    world size:           4
DEEPMD rank:3  INFO    my rank:              3
DEEPMD rank:3  INFO    node list:            ['GPU']
DEEPMD rank:3  INFO    running on:           GPU
DEEPMD rank:3  INFO    computing device:     gpu:3
DEEPMD rank:3  INFO    CUDA_VISIBLE_DEVICES: unset
DEEPMD rank:3  INFO    Count of visible GPU: 4
DEEPMD rank:3  INFO    num_intra_threads:    0
DEEPMD rank:3  INFO    num_inter_threads:    0
DEEPMD rank:3  INFO    -----------------------------------------------------------------
DEEPMD rank:2  INFO    training without frame parameter
DEEPMD rank:2  INFO    data stating... (this step may take long time)
DEEPMD rank:1  INFO    training without frame parameter
DEEPMD rank:1  INFO    data stating... (this step may take long time)
DEEPMD rank:0  INFO    training without frame parameter
DEEPMD rank:0  INFO    data stating... (this step may take long time)
DEEPMD rank:3  INFO    training without frame parameter
DEEPMD rank:3  INFO    data stating... (this step may take long time)
/home/gpu/DeepMD/deepmd-kit_2_2_10/lib/python3.11/site-packages/deepmd/descriptor/se_t.py:611: RuntimeWarning: invalid value encountered in scalar divide
  val = np.sqrt(sumv2 / sumn - np.multiply(sumv / sumn, sumv / sumn))
DEEPMD rank:3  INFO    built lr
/home/gpu/DeepMD/deepmd-kit_2_2_10/lib/python3.11/site-packages/deepmd/descriptor/se_t.py:611: RuntimeWarning: invalid value encountered in scalar divide
  val = np.sqrt(sumv2 / sumn - np.multiply(sumv / sumn, sumv / sumn))
DEEPMD rank:0  INFO    built lr
/home/gpu/DeepMD/deepmd-kit_2_2_10/lib/python3.11/site-packages/deepmd/descriptor/se_t.py:611: RuntimeWarning: invalid value encountered in scalar divide
  val = np.sqrt(sumv2 / sumn - np.multiply(sumv / sumn, sumv / sumn))
DEEPMD rank:2  INFO    built lr
/home/gpu/DeepMD/deepmd-kit_2_2_10/lib/python3.11/site-packages/deepmd/descriptor/se_t.py:611: RuntimeWarning: invalid value encountered in scalar divide
  val = np.sqrt(sumv2 / sumn - np.multiply(sumv / sumn, sumv / sumn))
DEEPMD rank:1  INFO    built lr
DEEPMD rank:3  INFO    built network
DEEPMD rank:3  INFO    Scale learning rate by coef: 4.000000
DEEPMD rank:0  INFO    built network
DEEPMD rank:0  INFO    Scale learning rate by coef: 4.000000
DEEPMD rank:2  INFO    built network
DEEPMD rank:2  INFO    Scale learning rate by coef: 4.000000
DEEPMD rank:1  INFO    built network
DEEPMD rank:1  INFO    Scale learning rate by coef: 4.000000
DEEPMD rank:2  INFO    built training
WARNING:root:To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, TF_INTRA_OP_PARALLELISM_THREADS, and TF_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information.
DEEPMD rank:0  INFO    built training
WARNING:root:To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, TF_INTRA_OP_PARALLELISM_THREADS, and TF_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information.
2024-07-02 11:14:10.160886: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 10238 MB memory:  -> device: 2, name: NVIDIA GeForce RTX 3080 Ti, pci bus id: 0000:67:00.0, compute capability: 8.6
DEEPMD rank:1  INFO    built training
WARNING:root:To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, TF_INTRA_OP_PARALLELISM_THREADS, and TF_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information.
DEEPMD rank:3  INFO    built training
WARNING:root:To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, TF_INTRA_OP_PARALLELISM_THREADS, and TF_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information.
2024-07-02 11:14:10.206760: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 10238 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3080 Ti, pci bus id: 0000:19:00.0, compute capability: 8.6
2024-07-02 11:14:10.284670: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 10238 MB memory:  -> device: 1, name: NVIDIA GeForce RTX 3080 Ti, pci bus id: 0000:1a:00.0, compute capability: 8.6
2024-07-02 11:14:10.307274: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 10223 MB memory:  -> device: 3, name: NVIDIA GeForce RTX 3080 Ti, pci bus id: 0000:68:00.0, compute capability: 8.6
DEEPMD rank:0  INFO    initialize model from scratch
DEEPMD rank:2  INFO    receive global variables from task#0
DEEPMD rank:3  INFO    receive global variables from task#0
DEEPMD rank:1  INFO    receive global variables from task#0
DEEPMD rank:0  INFO    broadcast global variables to other tasks
DEEPMD rank:2  INFO    start training at lr 1.00e-03 (== 1.00e-03), decay_step 5000, decay_rate 0.978593, final lr will be 2.00e-08
WARNING:tensorflow:From /home/gpu/DeepMD/deepmd-kit_2_2_10/lib/python3.11/site-packages/deepmd/train/trainer.py:1194: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, there are two
    options available in V2.
    - tf.py_function takes a python function which manipulates tf eager
    tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
    an ndarray (just call tensor.numpy()) but having access to eager tensors
    means `tf.py_function`s can use accelerators such as GPUs as well as
    being differentiable using a gradient tape.
    - tf.numpy_function maintains the semantics of the deprecated tf.py_func
    (it is not differentiable, and manipulates numpy arrays). It drops the
    stateful argument making all functions stateful.
    
WARNING:tensorflow:From /home/gpu/DeepMD/deepmd-kit_2_2_10/lib/python3.11/site-packages/deepmd/train/trainer.py:1194: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, there are two
    options available in V2.
    - tf.py_function takes a python function which manipulates tf eager
    tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
    an ndarray (just call tensor.numpy()) but having access to eager tensors
    means `tf.py_function`s can use accelerators such as GPUs as well as
    being differentiable using a gradient tape.
    - tf.numpy_function maintains the semantics of the deprecated tf.py_func
    (it is not differentiable, and manipulates numpy arrays). It drops the
    stateful argument making all functions stateful.
    
DEEPMD rank:1  INFO    start training at lr 1.00e-03 (== 1.00e-03), decay_step 5000, decay_rate 0.978593, final lr will be 2.00e-08
WARNING:tensorflow:From /home/gpu/DeepMD/deepmd-kit_2_2_10/lib/python3.11/site-packages/deepmd/train/trainer.py:1194: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, there are two
    options available in V2.
    - tf.py_function takes a python function which manipulates tf eager
    tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
    an ndarray (just call tensor.numpy()) but having access to eager tensors
    means `tf.py_function`s can use accelerators such as GPUs as well as
    being differentiable using a gradient tape.
    - tf.numpy_function maintains the semantics of the deprecated tf.py_func
    (it is not differentiable, and manipulates numpy arrays). It drops the
    stateful argument making all functions stateful.
    
WARNING:tensorflow:From /home/gpu/DeepMD/deepmd-kit_2_2_10/lib/python3.11/site-packages/deepmd/train/trainer.py:1194: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, there are two
    options available in V2.
    - tf.py_function takes a python function which manipulates tf eager
    tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
    an ndarray (just call tensor.numpy()) but having access to eager tensors
    means `tf.py_function`s can use accelerators such as GPUs as well as
    being differentiable using a gradient tape.
    - tf.numpy_function maintains the semantics of the deprecated tf.py_func
    (it is not differentiable, and manipulates numpy arrays). It drops the
    stateful argument making all functions stateful.
    
DEEPMD rank:0  INFO    start training at lr 1.00e-03 (== 1.00e-03), decay_step 5000, decay_rate 0.978593, final lr will be 2.00e-08
WARNING:tensorflow:From /home/gpu/DeepMD/deepmd-kit_2_2_10/lib/python3.11/site-packages/deepmd/train/trainer.py:1194: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, there are two
    options available in V2.
    - tf.py_function takes a python function which manipulates tf eager
    tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
    an ndarray (just call tensor.numpy()) but having access to eager tensors
    means `tf.py_function`s can use accelerators such as GPUs as well as
    being differentiable using a gradient tape.
    - tf.numpy_function maintains the semantics of the deprecated tf.py_func
    (it is not differentiable, and manipulates numpy arrays). It drops the
    stateful argument making all functions stateful.
    
WARNING:tensorflow:From /home/gpu/DeepMD/deepmd-kit_2_2_10/lib/python3.11/site-packages/deepmd/train/trainer.py:1194: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, there are two
    options available in V2.
    - tf.py_function takes a python function which manipulates tf eager
    tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
    an ndarray (just call tensor.numpy()) but having access to eager tensors
    means `tf.py_function`s can use accelerators such as GPUs as well as
    being differentiable using a gradient tape.
    - tf.numpy_function maintains the semantics of the deprecated tf.py_func
    (it is not differentiable, and manipulates numpy arrays). It drops the
    stateful argument making all functions stateful.
    
DEEPMD rank:3  INFO    start training at lr 1.00e-03 (== 1.00e-03), decay_step 5000, decay_rate 0.978593, final lr will be 2.00e-08
WARNING:tensorflow:From /home/gpu/DeepMD/deepmd-kit_2_2_10/lib/python3.11/site-packages/deepmd/train/trainer.py:1194: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, there are two
    options available in V2.
    - tf.py_function takes a python function which manipulates tf eager
    tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
    an ndarray (just call tensor.numpy()) but having access to eager tensors
    means `tf.py_function`s can use accelerators such as GPUs as well as
    being differentiable using a gradient tape.
    - tf.numpy_function maintains the semantics of the deprecated tf.py_func
    (it is not differentiable, and manipulates numpy arrays). It drops the
    stateful argument making all functions stateful.
    
WARNING:tensorflow:From /home/gpu/DeepMD/deepmd-kit_2_2_10/lib/python3.11/site-packages/deepmd/train/trainer.py:1194: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, there are two
    options available in V2.
    - tf.py_function takes a python function which manipulates tf eager
    tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
    an ndarray (just call tensor.numpy()) but having access to eager tensors
    means `tf.py_function`s can use accelerators such as GPUs as well as
    being differentiable using a gradient tape.
    - tf.numpy_function maintains the semantics of the deprecated tf.py_func
    (it is not differentiable, and manipulates numpy arrays). It drops the
    stateful argument making all functions stateful.
    
2024-07-02 11:14:22.709714: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 10238 MB memory:  -> device: 2, name: NVIDIA GeForce RTX 3080 Ti, pci bus id: 0000:67:00.0, compute capability: 8.6
2024-07-02 11:14:22.742441: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 10223 MB memory:  -> device: 3, name: NVIDIA GeForce RTX 3080 Ti, pci bus id: 0000:68:00.0, compute capability: 8.6
2024-07-02 11:14:22.745999: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 10238 MB memory:  -> device: 1, name: NVIDIA GeForce RTX 3080 Ti, pci bus id: 0000:1a:00.0, compute capability: 8.6
2024-07-02 11:14:24.257847: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1886] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 10238 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3080 Ti, pci bus id: 0000:19:00.0, compute capability: 8.6
DEEPMD rank:1  INFO    batch     100 training time 22.83 s, testing time 0.00 s, total wall time 22.87 s
DEEPMD rank:3  INFO    batch     100 training time 22.74 s, testing time 0.00 s, total wall time 22.78 s
DEEPMD rank:2  INFO    batch     100 training time 22.84 s, testing time 0.00 s, total wall time 22.88 s
DEEPMD rank:0  INFO    batch     100 training time 20.35 s, testing time 0.05 s, total wall time 22.87 s
DEEPMD rank:1  INFO    batch     200 training time 13.20 s, testing time 0.00 s, total wall time 13.24 s
DEEPMD rank:2  INFO    batch     200 training time 13.20 s, testing time 0.00 s, total wall time 13.24 s
DEEPMD rank:3  INFO    batch     200 training time 13.21 s, testing time 0.00 s, total wall time 13.25 s
DEEPMD rank:0  INFO    batch     200 training time 13.15 s, testing time 0.04 s, total wall time 13.23 s
DEEPMD rank:1  INFO    batch     300 training time 12.85 s, testing time 0.00 s, total wall time 12.89 s
DEEPMD rank:3  INFO    batch     300 training time 12.85 s, testing time 0.00 s, total wall time 12.88 s
DEEPMD rank:2  INFO    batch     300 training time 12.85 s, testing time 0.00 s, total wall time 12.89 s
DEEPMD rank:0  INFO    batch     300 training time 12.81 s, testing time 0.05 s, total wall time 12.90 s
DEEPMD rank:3  INFO    batch     400 training time 12.88 s, testing time 0.00 s, total wall time 12.92 s
DEEPMD rank:1  INFO    batch     400 training time 12.88 s, testing time 0.00 s, total wall time 12.92 s
DEEPMD rank:2  INFO    batch     400 training time 12.88 s, testing time 0.00 s, total wall time 12.92 s
DEEPMD rank:0  INFO    batch     400 training time 12.82 s, testing time 0.04 s, total wall time 12.90 s
DEEPMD rank:1  INFO    batch     500 training time 12.80 s, testing time 0.00 s, total wall time 12.84 s
DEEPMD rank:3  INFO    batch     500 training time 12.80 s, testing time 0.00 s, total wall time 12.84 s
DEEPMD rank:2  INFO    batch     500 training time 12.80 s, testing time 0.00 s, total wall time 12.84 s
DEEPMD rank:0  INFO    batch     500 training time 12.76 s, testing time 0.04 s, total wall time 12.85 s
DEEPMD rank:2  INFO    batch     600 training time 12.61 s, testing time 0.00 s, total wall time 12.65 s
DEEPMD rank:1  INFO    batch     600 training time 12.61 s, testing time 0.00 s, total wall time 12.65 s
DEEPMD rank:3  INFO    batch     600 training time 12.62 s, testing time 0.00 s, total wall time 12.65 s
DEEPMD rank:0  INFO    batch     600 training time 12.57 s, testing time 0.05 s, total wall time 12.66 s
DEEPMD rank:3  INFO    batch     700 training time 13.02 s, testing time 0.00 s, total wall time 13.06 s
DEEPMD rank:1  INFO    batch     700 training time 13.02 s, testing time 0.00 s, total wall time 13.06 s
DEEPMD rank:2  INFO    batch     700 training time 13.02 s, testing time 0.00 s, total wall time 13.06 s
DEEPMD rank:0  INFO    batch     700 training time 12.97 s, testing time 0.05 s, total wall time 13.06 s
DEEPMD rank:1  INFO    batch     800 training time 12.86 s, testing time 0.00 s, total wall time 12.90 s
DEEPMD rank:3  INFO    batch     800 training time 12.86 s, testing time 0.00 s, total wall time 12.90 s
DEEPMD rank:2  INFO    batch     800 training time 12.87 s, testing time 0.00 s, total wall time 12.90 s
DEEPMD rank:0  INFO    batch     800 training time 12.81 s, testing time 0.04 s, total wall time 12.88 s
DEEPMD rank:1  INFO    batch     900 training time 13.07 s, testing time 0.00 s, total wall time 13.11 s
DEEPMD rank:3  INFO    batch     900 training time 13.08 s, testing time 0.00 s, total wall time 13.11 s
DEEPMD rank:2  INFO    batch     900 training time 13.08 s, testing time 0.00 s, total wall time 13.11 s
DEEPMD rank:0  INFO    batch     900 training time 13.04 s, testing time 0.04 s, total wall time 13.11 s
DEEPMD rank:3  INFO    batch    1000 training time 12.92 s, testing time 0.00 s, total wall time 12.96 s
DEEPMD rank:1  INFO    batch    1000 training time 12.92 s, testing time 0.00 s, total wall time 12.96 s
DEEPMD rank:2  INFO    batch    1000 training time 12.92 s, testing time 0.00 s, total wall time 12.96 s
DEEPMD rank:0  INFO    batch    1000 training time 12.88 s, testing time 0.04 s, total wall time 12.96 s
DEEPMD rank:1  INFO    batch    1100 training time 12.85 s, testing time 0.00 s, total wall time 12.89 s
DEEPMD rank:3  INFO    batch    1100 training time 12.85 s, testing time 0.00 s, total wall time 12.89 s
DEEPMD rank:2  INFO    batch    1100 training time 12.85 s, testing time 0.00 s, total wall time 12.89 s
DEEPMD rank:0  INFO    batch    1100 training time 12.81 s, testing time 0.05 s, total wall time 12.90 s
DEEPMD rank:1  INFO    batch    1200 training time 13.07 s, testing time 0.00 s, total wall time 13.11 s
DEEPMD rank:3  INFO    batch    1200 training time 13.07 s, testing time 0.00 s, total wall time 13.10 s
DEEPMD rank:2  INFO    batch    1200 training time 13.07 s, testing time 0.00 s, total wall time 13.10 s
DEEPMD rank:0  INFO    batch    1200 training time 13.02 s, testing time 0.05 s, total wall time 13.10 s
DEEPMD rank:1  INFO    batch    1300 training time 12.76 s, testing time 0.00 s, total wall time 12.80 s
DEEPMD rank:2  INFO    batch    1300 training time 12.76 s, testing time 0.00 s, total wall time 12.80 s
DEEPMD rank:3  INFO    batch    1300 training time 12.76 s, testing time 0.00 s, total wall time 12.80 s
DEEPMD rank:0  INFO    batch    1300 training time 12.72 s, testing time 0.04 s, total wall time 12.79 s
DEEPMD rank:2  INFO    batch    1400 training time 12.88 s, testing time 0.00 s, total wall time 12.92 s
DEEPMD rank:1  INFO    batch    1400 training time 12.88 s, testing time 0.00 s, total wall time 12.92 s
DEEPMD rank:3  INFO    batch    1400 training time 12.88 s, testing time 0.00 s, total wall time 12.92 s
DEEPMD rank:0  INFO    batch    1400 training time 12.84 s, testing time 0.04 s, total wall time 12.92 s
DEEPMD rank:2  INFO    batch    1500 training time 12.69 s, testing time 0.00 s, total wall time 12.72 s
DEEPMD rank:1  INFO    batch    1500 training time 12.68 s, testing time 0.00 s, total wall time 12.72 s
DEEPMD rank:3  INFO    batch    1500 training time 12.68 s, testing time 0.00 s, total wall time 12.72 s
DEEPMD rank:0  INFO    batch    1500 training time 12.64 s, testing time 0.03 s, total wall time 12.72 s
[2024-07-02 11:21:03.352521: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 11:22:03.353628: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 11:23:03.354108: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 11:24:03.354707: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 11:25:03.355225: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 11:26:03.356275: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 11:27:03.357132: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 11:28:03.358333: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 11:29:03.359294: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 11:30:03.359584: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 11:31:03.360315: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 11:32:03.360946: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 11:33:03.361792: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 11:34:03.362214: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 11:35:03.362840: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 11:36:03.363896: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 11:37:03.364480: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 11:38:03.365566: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 11:39:03.366085: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 11:40:03.367061: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 11:41:03.368216: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 11:42:03.369251: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 11:43:03.369866: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 11:44:03.371051: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 11:45:03.372060: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 11:46:03.373112: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 11:47:03.373565: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 11:48:03.373858: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 11:49:03.375059: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 11:50:03.375365: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 11:51:03.375685: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 11:52:03.376745: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 11:53:03.377403: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 11:54:03.378462: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 11:55:03.379680: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 11:56:03.380193: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 11:57:03.381384: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 11:58:03.382393: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 11:59:03.383470: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 12:00:03.383894: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 12:01:03.384952: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 12:02:03.386164: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 12:03:03.386398: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 12:04:03.387232: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 12:05:03.388048: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 12:06:03.388648: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 12:07:03.389725: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 12:08:03.390231: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 12:09:03.390638: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 12:10:03.391263: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 12:11:03.392320: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 12:12:03.393365: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 12:13:03.394430: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 12:14:03.394632: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 12:15:03.395475: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 12:16:03.396339: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 12:17:03.396792: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 12:18:03.397244: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 12:19:03.398002: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 12:20:03.398915: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 12:21:03.399457: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 12:22:03.399831: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 12:23:03.400661: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 12:24:03.401258: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 12:25:03.402223: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 12:26:03.402766: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 12:27:03.403720: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 12:28:03.404107: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 12:29:03.405110: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 12:30:03.405569: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 12:31:03.406315: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]
[2024-07-02 12:32:03.406619: W /home/conda/feedstock_root/build_artifacts/horovod_1717750507724/work/horovod/common/stall_inspector.cc:107] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock. 
Missing ranks:
0: [DistributedAdamOptimizer_Allreduce/cond/HorovodAllreduce_gradients_2_AddN_414_0, DistributedAdamOptimizer_Allreduce/cond_1/HorovodAllreduce_gradients_2_filter_type_0_0_BiasAdd_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_10/HorovodAllreduce_gradients_2_AddN_271_0, DistributedAdamOptimizer_Allreduce/cond_100/HorovodAllreduce_gradients_2_AddN_376_0, DistributedAdamOptimizer_Allreduce/cond_101/HorovodAllreduce_gradients_2_filter_type_all_1_BiasAdd_2_grad_tuple_control_dependency_1_0, DistributedAdamOptimizer_Allreduce/cond_102/HorovodAllreduce_gradients_2_AddN_446_0 ...]

njzjz Jul 2, 2024
Maintainer

I don't see anything wrong before step 1500.

DingChangjie Jul 3, 2024
Author

Finally, I've figured out that this problem is caused by the environment variable KMP_AFFINITY (I manually changed it to scatter). This variable should be automatically set by deepmd-kit ......

Answer selected by DingChangjie

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Weird "Missing ranks" error in parallel training using horovod #3937

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 1 comment 3 replies

{{title}}

{{title}}

{{title}}

{{title}}

Select a reply

Weird "Missing ranks" error in parallel training using horovod #3937

DingChangjie Jul 1, 2024

Replies: 1 comment · 3 replies

njzjz Jul 1, 2024 Maintainer

DingChangjie Jul 2, 2024 Author

njzjz Jul 2, 2024 Maintainer

DingChangjie Jul 3, 2024 Author

DingChangjie
Jul 1, 2024

Replies: 1 comment 3 replies

njzjz
Jul 1, 2024
Maintainer

DingChangjie Jul 2, 2024
Author

njzjz Jul 2, 2024
Maintainer

DingChangjie Jul 3, 2024
Author