Abstract:
With the widespread application of deep generative models in computer vision, the Stable Diffusion model has become a popular choice for architectural image generation owing to its strong performance on text-to-image tasks. However, architectural image generation is demanding: the model must not only produce visuals that satisfy architectural design requirements but also account for multi-dimensional features such as structure, spatial layout, and materials. Optimizing the model's training process to improve its generative performance has therefore become an important research topic.