Citation: Jiameng Yuan, Lang Chen, Weiya Chen, Hanbin Luo. Research on Multimodal Retrieval Methods for Historical Buildings. Journal of Information Technology in Civil Engineering and Architecture, 2024, 16(4): 7-13. doi: 10.16670/j.cnki.cn11-5823/tu.2024.04.02
Research on Multimodal Retrieval Methods for Historical Buildings
1. School of Civil and Hydraulic Engineering, Huazhong University of Science and Technology, Wuhan 430074, China
2. National Digital Construction Technology Innovation Center, Wuhan 430074, China
The retrieval of historical buildings in HBIM databases faces three main issues: 1) there are no universal rules for determining the similarity between buildings; 2) the historical and cultural information inherent to the buildings themselves is neglected; 3) most queries rely on keywords, which limits the information a query can express. To address these challenges, this paper introduces a multimodal retrieval approach for historical buildings. Users can retrieve a list of buildings matching the features they supply, either as images or as natural-language text. For image-based building retrieval, the "dino_vit16" model is used for feature extraction, and the proposed image-based building retrieval method achieves a retrieval accuracy of 90.08%. For text-based building retrieval, the CLIP model establishes a connection between images and text. The study explores the weights assigned to image-text similarity and to text similarity, and selects m=0.6 and n=0.4 as the optimal configuration. Experimental results show that the proposed text-based building retrieval algorithm performs best when the query contains a specific visual feature and worst when the query describes a particular function and architectural style. However, when the query combines four or more mixed features that accurately describe the basic appearance of a building, the algorithm can accurately retrieve the buildings that meet the criteria.
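Both retrieval modes described in the abstract reduce to ranking buildings by a similarity score. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes embeddings have already been extracted elsewhere (DINO ViT image features for image queries; CLIP image and text embeddings for text queries), it uses cosine similarity as the similarity measure, and it assumes m and n combine the two scores as a linear weighted sum, m * S_image-text + n * S_text. The function names and the separate text-text embedding inputs are hypothetical, because the abstract does not say which encoder produces the text similarity.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_image(query: Vector, gallery: Dict[str, Vector],
                  k: int = 5) -> List[Tuple[str, float]]:
    """Image-based retrieval (hypothetical helper).

    `query` and `gallery` values are assumed to be image features from the
    same backbone (e.g. a DINO ViT), computed outside this sketch.
    """
    scored = [(name, cosine(query, feat)) for name, feat in gallery.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


def rank_by_text(query_clip: Vector, query_text: Vector,
                 image_feats: Dict[str, Vector], text_feats: Dict[str, Vector],
                 m: float = 0.6, n: float = 0.4,
                 k: int = 5) -> List[Tuple[str, float]]:
    """Text-based retrieval (hypothetical helper).

    Assumption: final score = m * (query vs. CLIP image embedding)
                            + n * (query vs. building description embedding).
    The default weights m=0.6, n=0.4 are the values reported in the abstract.
    """
    scored = []
    for name, img_feat in image_feats.items():
        s_image_text = cosine(query_clip, img_feat)
        s_text = cosine(query_text, text_feats[name])
        scored.append((name, m * s_image_text + n * s_text))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

With toy two-dimensional vectors, a building whose image matches the query outranks one whose description matches it, because the image-text weight m is larger than n. Swapping the weights reverses that order, which is why the choice of m and n matters.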