
Embeddings of atoms from different representations #35

Open
n0w0f opened this issue Apr 4, 2024 · 6 comments

Comments

@n0w0f (Contributor) commented Apr 4, 2024

@kjappelbaum, in order to check the similarity between atoms, or to do those "King - Queen = Man - Woman" analogy analyses, I would like to embed individual atoms with models trained on different representations. This is a follow-up to see whether composition or atoms mean anything for smaller models.

For the slice and composition representations, maybe I can keep the atom as the first token and pad all the other tokens, but for crystal-llm or cif_rep the atoms usually come in the later part of the representation. Would keeping the atom at the beginning work for those representations?
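The analogy check mentioned above can be sketched as follows. This is a minimal illustration with random placeholder vectors; real embeddings would have to be extracted from the trained models:

```python
import numpy as np

# Toy version of the "King - Queen = Man - Woman" analogy test, transferred
# to element embeddings. These vectors are random stand-ins; real ones would
# come from models trained on each representation.
rng = np.random.default_rng(1)
emb = {el: rng.normal(size=8) for el in ["Na", "K", "Mg", "Ca"]}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# In a well-structured space, the within-group offset Na -> K should be
# roughly parallel to the offset Mg -> Ca in the neighbouring group.
score = cosine(emb["K"] - emb["Na"], emb["Ca"] - emb["Mg"])
print(f"analogy score: {score:.3f}")  # a value in [-1, 1]
```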

@kjappelbaum (Contributor)

so you would like to have one vector per atom in a structure?

@n0w0f (Contributor, Author) commented Apr 4, 2024

> so you would like to have one vector per atom in a structure?

I would like to get a vector for an atom, not in the context of the atom being in any particular structure, but standalone, e.g. (Na -> model -> vector), so that I can see whether all the alkali elements are similar for models trained with different representations.
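One way to make the alkali comparison concrete is a pairwise cosine-similarity table. Again, the vectors here are random placeholders, since the real ones depend on the trained model:

```python
import numpy as np

# Pairwise similarity among alkali-metal vectors. The embeddings here are
# random placeholders; in practice each would come from a model trained on
# one representation (composition, slice, crystal-llm, cif_rep, ...),
# extracted standalone as described above.
rng = np.random.default_rng(0)
alkali = ["Li", "Na", "K", "Rb"]
emb = {el: rng.normal(size=8) for el in alkali}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

sims = {(a, b): cosine(emb[a], emb[b])
        for i, a in enumerate(alkali) for b in alkali[i + 1:]}
for (a, b), s in sims.items():
    print(f"{a}-{b}: {s:+.3f}")
```

If the alkali pairs score consistently higher than alkali-to-halogen pairs, that would suggest the representation encodes some chemistry.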

@n0w0f (Contributor, Author) commented Apr 4, 2024

Can I use the learned token embedding? Or do I even need to pass it through the model if it is not in the context of a structure?

@kjappelbaum (Contributor)

> Can I use the learned token embedding? Or do I even need to pass it through the model if it is not in the context of a structure?

Ah, for this, people have used the learned embeddings of different tokens. Some existing techniques are here: https://github.com/kjappelbaum/element-coder
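As a sketch of the learned-embedding route: the model's input embedding table is a `(vocab_size, hidden_dim)` matrix, so a standalone element vector is a single row lookup with no forward pass. This is not the repo's API; for a Hugging Face model the learned matrix would be `model.get_input_embeddings().weight`, but a random array stands in for it here:

```python
import numpy as np

# Context-free element lookup from a learned embedding table. The vocabulary
# and matrix below are hypothetical stand-ins for a trained model's weights.
rng = np.random.default_rng(42)
vocab = {"<pad>": 0, "Li": 1, "Na": 2, "K": 3}
embedding_matrix = rng.normal(size=(len(vocab), 16))  # stand-in for trained weights

def element_vector(symbol: str) -> np.ndarray:
    """Look up the learned, context-free embedding of one element token."""
    return embedding_matrix[vocab[symbol]]

na_vec = element_vector("Na")  # the standalone "Na -> vector" from the thread
print(na_vec.shape)  # (16,)
```

This only works cleanly if the tokenizer maps each element symbol to a single token; if a symbol is split into multiple subword tokens, some pooling over the rows would be needed.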

@kjappelbaum (Contributor)

@n0w0f did you ever give this a look? Do you still plan to look into it?

@n0w0f (Contributor, Author) commented May 29, 2024

I did not yet, but I think there can be a lot of hidden insights there, and I would love to follow up.
