MASA: Matching Anything By Segmenting Anything (CVPR24) #1474

Open
1 of 2 tasks
johnnynunez opened this issue Jun 11, 2024 · 14 comments
Labels
enhancement (New feature or request), help wanted (Extra attention is needed)

Comments

@johnnynunez
Contributor

Search before asking

  • I have searched the Yolov8 Tracking issues and found no similar enhancement requests.

Description

https://github.com/siyuanliii/masa

Use case

No response

Are you willing to submit a PR?

  • Yes I'd like to help by submitting a PR!
@johnnynunez johnnynunez added the enhancement New feature or request label Jun 11, 2024
@mikel-brostrom
Owner

Wow, this looks very promising

@mikel-brostrom
Owner

mikel-brostrom commented Jun 11, 2024

From MASA:

"Additionally, our learned detection head speeds up the original SAM dense uniform point proposals for segmenting everything by over tenfold, crucial for tracking applications."

"We treat the SAM outputs as dense object region proposals and learn to match those regions from a vast image collection. We further design a universal MAS encoder: A heavy ViT-based backbone for feature extraction."

From SAM:

"Given a precomputed image embedding, the prompt encoder and mask decoder run in a web browser, on CPU, in ∼50ms."

SAM uses a ViT as its image embedder. From ViT:

"ViT-B (Base Model): Inference times on high-end GPUs (such as NVIDIA V100 or A100) are typically around 20-40 milliseconds per image.

ViT-L (Large Model): Inference times are generally longer, around 50-80 milliseconds per image, depending on the exact setup and image resolution.

ViT-H (Huge Model): Inference times can exceed 100 milliseconds per image due to the increased model complexity and size."

This would require getting ViT working here as the embedder to start with.
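
To put the quoted figures side by side, here is a rough back-of-the-envelope sketch. It assumes the ViT encoder and the SAM prompt encoder plus mask decoder run back to back once per frame, and it ignores detection, association and any batching, so treat the result as an estimate rather than a measurement. The ~50 ms decoder figure was quoted for CPU in a browser, and ViT-H is quoted only as "can exceed 100 ms", so 100 ms is used as a lower bound.

```python
# Figures quoted in the comment above, in milliseconds per image.
# ViT-H is only quoted as "can exceed 100 ms", so 100 is a lower bound.
ENCODER_MS = {"ViT-B": (20, 40), "ViT-L": (50, 80), "ViT-H": (100, 100)}
DECODER_MS = 50  # SAM prompt encoder + mask decoder (quoted for CPU)


def frames_per_second(encoder_ms, decoder_ms=DECODER_MS):
    """FPS if encoder and decoder run sequentially, once per frame (assumption)."""
    return 1000.0 / (encoder_ms + decoder_ms)


for name, (best_ms, worst_ms) in ENCODER_MS.items():
    slow = frames_per_second(worst_ms)
    fast = frames_per_second(best_ms)
    print(f"{name}: about {slow:.1f} to {fast:.1f} FPS")
```

Under these assumptions ViT-B lands at roughly 11 to 14 FPS, and the larger backbones fall below 10 FPS.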

@mikel-brostrom mikel-brostrom changed the title from "New tracker CVPR2024" to "MASA: Matching Anything By Segmenting Anything (CVPR24)" Jun 12, 2024
@mikel-brostrom
Owner

This:

"This would require getting ViT working here as the embedder to start with."

may not be a problem anymore 😄, given that Ultralytics has a ViT encoder.


👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.
Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

@github-actions github-actions bot added the Stale label Jun 24, 2024
@github-actions github-actions bot closed this as not planned Jun 28, 2024
@mikel-brostrom mikel-brostrom reopened this Sep 9, 2024
@mikel-brostrom mikel-brostrom added the help wanted label and removed the Stale label Sep 9, 2024
@mikel-brostrom
Owner

Add the model under boxmot/appearance/backbones, together with the model weights URL (https://huggingface.co/dereksiyuanli/masa/resolve/main/gdino_masa.pth). Adapt:
https://github.com/siyuanliii/masa/blob/main/masa/apis/masa_inference.py
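
As a starting point for the weights part, here is a minimal sketch of fetching the checkpoint once and caching it locally. Only the URL comes from the comment above. The cache directory and both function names are hypothetical and are not part of BoxMOT or MASA.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve

# URL taken from the comment above.
MASA_WEIGHTS_URL = "https://huggingface.co/dereksiyuanli/masa/resolve/main/gdino_masa.pth"
# Hypothetical cache location, chosen for illustration only.
DEFAULT_CACHE_DIR = Path.home() / ".cache" / "masa_weights"


def local_weights_path(url=MASA_WEIGHTS_URL, cache_dir=DEFAULT_CACHE_DIR):
    """Map a weights URL to a file path inside cache_dir."""
    return Path(cache_dir) / Path(urlparse(url).path).name


def fetch_weights(url=MASA_WEIGHTS_URL, cache_dir=DEFAULT_CACHE_DIR):
    """Download the checkpoint if it is not cached yet and return its path."""
    target = local_weights_path(url, cache_dir)
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        urlretrieve(url, str(target))
    return target
```

Calling `fetch_weights()` would download `gdino_masa.pth` on first use. Loading it into a model is a separate step that depends on how the architecture ends up being implemented here.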

@johnnynunez
Contributor Author

So is it compatible with your repo now?

@mikel-brostrom
Owner

mikel-brostrom commented Sep 9, 2024

I have been investigating this thoroughly today. The whole architecture, which is built using mmdet, needs to be ported to PyTorch. This is no easy feat because of the many custom implementations and optimizations. I don't have time for such a research project in my free time at the moment.
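
One mechanical piece of a port like this is usually renaming checkpoint keys so that weights saved by the original model load into re-implemented modules. The sketch below shows that step on a plain dictionary. The key prefixes in the usage example are hypothetical and are not taken from the MASA checkpoint. The custom layers mentioned above would still need to be re-implemented by hand.

```python
def remap_prefixes(state_dict, prefix_map):
    """Rename key prefixes; return (remapped dict, list of keys that matched nothing)."""
    remapped, unmatched = {}, []
    for key, value in state_dict.items():
        for old_prefix, new_prefix in prefix_map.items():
            if key.startswith(old_prefix):
                remapped[new_prefix + key[len(old_prefix):]] = value
                break
        else:
            # No prefix matched: report the key instead of silently dropping it.
            unmatched.append(key)
    return remapped, unmatched


# Toy example with made-up key names (placeholders, not real MASA keys).
toy_checkpoint = {
    "detector.backbone.layer1.weight": 1,
    "detector.neck.conv.bias": 2,
    "track_head.fc.weight": 3,
}
toy_map = {"detector.backbone.": "backbone.", "detector.neck.": "neck."}
new_state, leftover = remap_prefixes(toy_checkpoint, toy_map)
print(sorted(new_state), leftover)
```

The list of unmatched keys is a quick way to see which parts of the original model have no counterpart yet.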

@rolson24
Contributor

rolson24 commented Sep 9, 2024 via email

@mikel-brostrom
Owner

mikel-brostrom commented Sep 9, 2024

"I was working on this a few weeks ago to integrate with huggingface, but never got around to finishing it. Would be happy to share my code from it."

Hey @rolson24! Hope everything is fine. That would be awesome. So just the architecture part?

@microchila

MASA is very powerful, will it be supported in the future?

@mikel-brostrom
Owner

mikel-brostrom commented Sep 11, 2024

"MASA is very powerful, will it be supported in the future?"

It will take some time, because the architecture needs to be ported from mmdet to pure PyTorch.


@github-actions github-actions bot added the Stale label Sep 29, 2024
@github-actions github-actions bot closed this as not planned Oct 3, 2024
@mikel-brostrom mikel-brostrom reopened this Oct 3, 2024
@github-actions github-actions bot removed the Stale label Oct 4, 2024

@github-actions github-actions bot added the Stale label Oct 14, 2024
@mikel-brostrom
Owner

Hi @rolson24! Any updates on integrating MASA here?

@github-actions github-actions bot removed the Stale label Oct 15, 2024
4 participants