Multi-Stage Knowledge Distillation Based on Subspace Representations

Article type: Research article

Author

Majid Sepahvand

Arak University, Faculty of Engineering, Department of Computer Engineering

Abstract

Knowledge distillation builds compact student models under the guidance of large-scale teacher models, making it possible to deploy more efficient networks. Even so, the performance gap between teacher and student remains considerable, because an important part of the teacher's knowledge is not fully transferred to the student. To address this problem, this paper proposes a multi-stage knowledge distillation model that transfers knowledge simultaneously through feature alignment and logit matching while also modeling the inter-layer dependencies of the network. This approach produces more precise supervisory signals and enables the student to learn the teacher's representations more completely. The proposed model consists of three complementary components: a three-dimensional attention module that highlights important spatial and channel regions; an adversarial mask module that adaptively separates useful from non-useful subspaces; and a spherical space regularization module that aligns the distributions of teacher and student features on the hypersphere. Combining these three modules lets the student explore the teacher's feature space and output space more precisely and reach more general and more stable representations. Extensive experiments on CIFAR-100, STL-10, and TinyImageNet show that the proposed model outperforms existing state-of-the-art methods in most configurations.
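As a rough illustration of two ingredients named in the abstract, logit matching and feature alignment on a hypersphere, the following pure-Python sketch combines a temperature-softened KL term (the classic softened-softmax distillation loss) with a cosine-based alignment term on L2-normalized features. It is not the paper's formulation: the function names, the default temperature of 4.0, the `feature_weight` parameter, and the choice of cosine distance are all assumptions made for illustration, and the 3D attention and adversarial mask modules are not modeled here.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def logit_matching_loss(teacher_logits, student_logits, temperature=4.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Assumption: the standard softened-softmax loss, not the paper's exact term.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return temperature ** 2 * kl


def l2_normalize(vec, eps=1e-12):
    """Project a feature vector onto the unit hypersphere."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]


def spherical_alignment_loss(teacher_feat, student_feat):
    """One minus the cosine similarity of the two normalized feature vectors.

    Assumption: a hypothetical stand-in for spherical space regularization.
    """
    t = l2_normalize(teacher_feat)
    s = l2_normalize(student_feat)
    return 1.0 - sum(a * b for a, b in zip(t, s))


def distillation_loss(teacher_logits, student_logits,
                      teacher_feat, student_feat,
                      temperature=4.0, feature_weight=1.0):
    """Weighted sum of the logit term and the feature term (hypothetical)."""
    logit_term = logit_matching_loss(teacher_logits, student_logits, temperature)
    feature_term = spherical_alignment_loss(teacher_feat, student_feat)
    return logit_term + feature_weight * feature_term
```

Under these assumptions, both terms vanish when the student reproduces the teacher's logits and feature direction, and the feature term ignores feature magnitude because both vectors are normalized before comparison.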

Keywords

Knowledge distillation

Subspace-aware representation

Multi-granularity

Three-dimensional attention

Adversarial mask

Spherical space regularization

Volume 12, Issue 4
Spring 1405 (Solar Hijri calendar)
Pages 1-17

Journal of Machine Vision and Image Processing