Papers
Topics
Authors
Recent
Search
2000 character limit reached

Upsampling DINOv2 features for unsupervised vision tasks and weakly supervised materials segmentation

Published 20 Oct 2024 in cs.CV, cond-mat.mtrl-sci, and eess.IV | (2410.19836v1)

Abstract: The features of self-supervised vision transformers (ViTs) contain strong semantic and positional information relevant to downstream tasks like object localization and segmentation. Recent works combine these features with traditional methods like clustering, graph partitioning or region correlations to achieve impressive baselines without finetuning or training additional networks. We leverage upsampled features from ViT networks (e.g DINOv2) in two workflows: in a clustering based approach for object localization and segmentation, and paired with standard classifiers in weakly supervised materials segmentation. Both show strong performance on benchmarks, especially in weakly supervised segmentation where the ViT features capture complex relationships inaccessible to classical approaches. We expect the flexibility and generalizability of these features will both speed up and strengthen materials characterization, from segmentation to property-prediction.

Citations (1)

Summary

Paper to Video (Beta)

Whiteboard

No one has generated a whiteboard explanation for this paper yet.

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Continue Learning

We haven't generated follow-up questions for this paper yet.

Collections

Sign up for free to add this paper to one or more collections.

Tweets

Sign up for free to view the 1 tweet with 0 likes about this paper.