What is the Best Way to Train AI Models?

Tags: technology
DATE POSTED: March 1, 2025

:::info Authors:

(1) Hyeongjun Kwon, Yonsei University;

(2) Jinhyun Jang, Yonsei University;

(3) Jin Kim, Yonsei University;

(4) Kwonyoung Kim, Yonsei University;

(5) Kwanghoon Sohn, Yonsei University and Korea Institute of Science and Technology (KIST).

:::

Table of Links

Abstract and 1 Introduction

2. Related Work

3. Hyperbolic Geometry

4. Method

4.1. Overview

4.2. Probabilistic hierarchy tree

4.3. Visual hierarchy decomposition

4.4. Learning hierarchy in hyperbolic space

4.5. Visual hierarchy encoding

5. Experiments and 5.1. Image classification

5.2. Object detection and Instance segmentation

5.3. Semantic segmentation

5.4. Visualization

6. Ablation studies and discussion

7. Conclusion and References

A. Network Architecture

B. Theoretical Baseline

C. Additional Results

D. Additional visualization

C. Additional Results

C.1. Fine-tuning vs. full-training

We also investigate the effectiveness of our proposed method when it is used to train the model from scratch. For a fair comparison, we evaluate the classification performance of Hi-Mapper trained with the full-training scheme (350 epochs) and with the fine-tuning scheme (baseline + 50 epochs), using the same learning objectives on ImageNet-1K [36]. As shown in Tab. 6, the results demonstrate that the fine-tuning scheme is better suited than full-training for capturing the structural organization of visual scenes.
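To make the two recipes concrete, the sketch below sets up both schemes side by side. It is a minimal illustration only: `HiMapperHead` is a hypothetical placeholder for the hierarchy-mapping module (not the paper's actual architecture), and the learning rates and optimizer settings are assumptions rather than the paper's recipe.

```python
# Minimal sketch (not the paper's implementation) of the two schemes compared
# in Tab. 6: full-training from scratch (350 epochs) vs. fine-tuning a
# pre-trained baseline for 50 extra epochs with the same objectives.
import timm
import torch
import torch.nn as nn


class HiMapperHead(nn.Module):
    """Hypothetical stand-in for the hierarchy-mapping head; the real module
    decomposes a visual hierarchy tree as defined in the paper."""

    def __init__(self, dim: int, num_classes: int = 1000):
        super().__init__()
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, pooled_feats: torch.Tensor) -> torch.Tensor:
        # pooled_feats: [batch, dim] features from the backbone
        return self.classifier(pooled_feats)


def build_scheme(fine_tuning: bool):
    # Fine-tuning: pre-trained DeiT-S backbone, ~50 additional epochs.
    # Full-training: randomly initialized backbone, ~350 epochs from scratch.
    backbone = timm.create_model(
        "deit_small_patch16_224", pretrained=fine_tuning, num_classes=0
    )
    head = HiMapperHead(backbone.num_features)
    params = list(backbone.parameters()) + list(head.parameters())
    lr = 1e-5 if fine_tuning else 1e-3  # assumed values, not the paper's
    optimizer = torch.optim.AdamW(params, lr=lr, weight_decay=0.05)
    epochs = 50 if fine_tuning else 350
    return backbone, head, optimizer, epochs
```

Both schemes share the same objectives and data; only the initialization and training budget differ, which is what the Tab. 6 comparison isolates.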


D. Additional visualization

For a more comprehensive understanding, we provide additional visualization results that complement those in the main paper, and we also examine the visual hierarchy in CNNs [59], as shown in Figures 7 and 8. This offers insights into the feature representations of both transformer and CNN architectures, as well as the benefits of applying our method.

Figure 6. Illustration of the overall procedure of Hi-Mapper for dense prediction tasks.

Figure 7. Visualization of visual hierarchy trees decomposed by Hi-Mapper (DeiT-S) trained on ImageNet-1K with the classification objective. The same color family represents the same subtree.

Figure 8. Visualization of visual hierarchy trees decomposed by Hi-Mapper (ENB4) trained on ImageNet-1K [36] with the classification objective. The same color family represents the same subtree.
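For intuition on how the "same color family per subtree" rendering in Figures 7 and 8 can be produced, a minimal sketch is given below. It assumes a hypothetical `patch_to_subtree` assignment of image patches to subtree ids (in practice this comes from Hi-Mapper's decomposed hierarchy tree); the colormap choice and blending are illustrative assumptions, not the paper's plotting code.

```python
# Hedged sketch of subtree-colored patch overlays, as in Figures 7-8.
import numpy as np
import matplotlib.pyplot as plt


def overlay_subtrees(image: np.ndarray, patch_to_subtree: np.ndarray,
                     patch_size: int = 16, alpha: float = 0.5) -> None:
    """Overlay one color per subtree on top of the image patches.

    image:            HxWx3 array with values in [0, 1].
    patch_to_subtree: (H//patch_size, W//patch_size) integer subtree ids.
    """
    num_subtrees = int(patch_to_subtree.max()) + 1
    cmap = plt.get_cmap("tab20", num_subtrees)  # one color family per subtree
    # Upsample the patch-level assignment to pixel resolution.
    mask = np.kron(patch_to_subtree, np.ones((patch_size, patch_size), dtype=int))
    colors = cmap(mask)[..., :3]                # drop the alpha channel
    blended = (1 - alpha) * image + alpha * colors
    plt.imshow(blended)
    plt.axis("off")
    plt.show()
```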


:::info This paper is available on arxiv under CC BY 4.0 DEED license.

:::

