Pdb files too big generated on GEMC_NpT for opening/transferring #350

Open
mleao882 opened this issue May 17, 2021 · 5 comments

Comments

@mleao882

While running GEMC_NpT simulations, the PDB files generated for both boxes (L and V) are larger than 100 MB. This makes them difficult to transfer from the supercomputer to my machine, difficult to open on the supercomputer itself, and therefore difficult to analyze in VMD. After checking the input file, I believe this is due to the CoordinatesFreq value I used: I set it to 10,000N (N being the number of molecules in the simulation), while the simulation ran for 500,000N steps. Is there anything I could do to sort this out? Extracting just the final configuration from the PDB file would be enough for me. Thanks in advance!

@LSchwiebert
Collaborator

By setting CoordinatesFreq to this value for a simulation with this number of steps, you are producing output 50 times (500,000N / 10,000N). Since you are generating about 2 MB of data for each output, you can increase the CoordinatesFreq value to output less often. You could also do as you suggest and set CoordinatesFreq to match the number of steps in your simulation, so that only the final configuration is output.
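
The arithmetic behind that estimate can be sketched in a few lines of Python. The molecule count `n_molecules = 1000` is a made-up value used only for illustration (it cancels out of the frame count), and the 2 MB per output is the rough figure quoted above, not a measured one. The sketch also ignores any extra frame that might be written at step 0.

```python
def trajectory_estimate(total_steps, coordinates_freq, mb_per_frame):
    """Return (number of coordinate outputs, approximate file size in MB)."""
    frames = total_steps // coordinates_freq
    return frames, frames * mb_per_frame


n_molecules = 1000                      # hypothetical system size
total_steps = 500_000 * n_molecules     # run length from the original post

# Settings from the original post: CoordinatesFreq = 10,000N
frames, size_mb = trajectory_estimate(total_steps, 10_000 * n_molecules, 2)
print(frames, size_mb)                  # 50 100

# CoordinatesFreq equal to the run length: only the final configuration
frames, size_mb = trajectory_estimate(total_steps, total_steps, 2)
print(frames, size_mb)                  # 1 2
```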

@mleao882
Author

@LSchwiebert, thanks for your reply. But is there an easy way for me to extract just the final configuration from this huge PDB file?

@GregorySchwing
Collaborator

@mleao882 set "RestartFreq true LastStepNumber" in the config file. This will only output the PDB at that step.

Alternatively, you could check out the development branch, which supports DCD coordinates.

Process for cloning/building the dev branch:
git clone https://github.com/GOMC-WSU/GOMC.git
cd GOMC
git checkout development
./metamake.sh

Add this to your conf file:
"DCDFreq true ANumber"

This will produce a file 100x smaller than the PDB trajectory.
We also support
"RestartFreq true LastStepNumber"
in the development branch.

@LSchwiebert
Collaborator

I think someone else in the collaboration would be better able to answer your question, but I believe it is a human-readable file, so you should be able to open it with an editor and extract the last part. I would search for the header in the file. If you have to open it with VMD or some similar tool, then you are stuck back at your original problem.
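
A minimal Python sketch of that text-based approach follows. It streams the file so that only one frame is held in memory at a time. It assumes each frame in the trajectory ends with a record starting with `END` (either `END` or `ENDMDL`); that delimiter is an assumption about the file layout, so check a few lines of your own PDB before relying on it. The function names are illustrative and are not part of GOMC.

```python
import sys


def last_frame(lines):
    """Return the lines of the last frame that contains atom records.

    Assumes (not verified against GOMC output) that every frame is
    terminated by a record starting with "END" (END or ENDMDL).
    """
    last, current, has_atoms = [], [], False
    for line in lines:
        current.append(line)
        if line.startswith(("ATOM", "HETATM")):
            has_atoms = True
        elif line.startswith("END"):
            if has_atoms:            # skip a bare trailing END record
                last = current
            current, has_atoms = [], False
    if has_atoms:                    # file ended without a terminator
        last = current
    return last


def extract_last_frame(src_path, dst_path):
    """Stream src_path and write only its last frame to dst_path."""
    with open(src_path) as src:
        frame = last_frame(src)
    with open(dst_path, "w") as dst:
        dst.writelines(frame)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python last_frame.py NAME_BOX_0.pdb last_frame.pdb
    extract_last_frame(sys.argv[1], sys.argv[2])
```

Because the file is read line by line, this runs on the supercomputer without loading the whole trajectory, and only the small output file needs to be transferred.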

@msoroush
Collaborator

@mleao882 to extract the final configuration, you can use VMD.
Follow these steps to save the last frame of the trajectory to a PDB file.

  1. Load the system into VMD: vmd NAME_merged.psf NAME_BOX_0.pdb
  2. Execute the following commands at the VMD prompt in your terminal:
set lastFrame [atomselect top all frame last]
$lastFrame set beta 0.0
$lastFrame set occupancy 0.0
$lastFrame writepdb last_frame.pdb

The only issue with this solution is that the occupancy and beta columns would otherwise carry the values from frame 0, because VMD reads occupancy and beta only from frame 0 and does not update them for each frame. In the script above, I set both to zero to avoid any mistakes. In GOMC, we use the occupancy column to indicate whether a molecule is in the box. However, we also set the coordinates of any molecule that is not in the box to zero, so you can exclude atoms with zero coordinates (X, Y, Z) from your analysis.
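
As a rough sketch of that filtering step, the Python below drops atom records whose X, Y and Z are all zero from a single frame. It reads the coordinates from the standard fixed-width PDB columns (31-38, 39-46, 47-54) and assumes well-formed ATOM/HETATM lines. The function names and the `ETH` residue in the example are made up for illustration. An atom that genuinely sits at the origin would also be dropped.

```python
def is_absent(line):
    """True for an ATOM/HETATM record whose X, Y and Z are all zero."""
    if not line.startswith(("ATOM", "HETATM")):
        return False
    # Fixed-width PDB coordinate columns: X 31-38, Y 39-46, Z 47-54.
    x, y, z = (float(line[a:b]) for a, b in ((30, 38), (38, 46), (46, 54)))
    return x == 0.0 and y == 0.0 and z == 0.0


def drop_absent_atoms(lines):
    """Keep every line except atom records placed at (0, 0, 0)."""
    return [line for line in lines if not is_absent(line)]


# Two hypothetical atom records laid out in PDB fixed-width columns.
template = "ATOM  {:5d} {:<4s} {:3s} {}{:4d}    {:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}"
frame = [
    template.format(1, "C1", "ETH", "A", 1, 12.345, 6.789, 10.111, 0.0, 0.0),
    template.format(2, "C1", "ETH", "A", 2, 0.0, 0.0, 0.0, 0.0, 0.0),
]
kept = drop_absent_atoms(frame)
print(len(kept))  # 1
```

The filtered frame has fewer atoms than the PSF file describes, so it is suited to direct analysis of the coordinates rather than to reloading alongside the original PSF.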

As @GregorySchwing mentioned, we now support binary coordinate files, which store coordinates with higher precision in less space. Similar to NAMD, GOMC writes the trajectory coordinates (.dcd) in single precision and the restart coordinates (.coor) in double precision. In addition, similar to the restart PDB file, GOMC now outputs a restart PSF file for each simulation box.
I would recommend trying the development branch.

4 participants