- (arXiv 2020.12) SceneFormer: Indoor Scene Generation with Transformers, [Paper]
- (arXiv 2021.05) SCTN: Sparse Convolution-Transformer Network for Scene Flow Estimation, [Paper]
- (arXiv 2021.06) P2T: Pyramid Pooling Transformer for Scene Understanding, [Paper], [Code]
- (arXiv 2021.07) Scenes and Surroundings: Scene Graph Generation using Relation Transformer, [Paper]
- (arXiv 2021.07) Spatial-Temporal Transformer for Dynamic Scene Graph Generation, [Paper]
- (arXiv 2021.09) BGT-Net: Bidirectional GRU Transformer Network for Scene Graph Generation, [Paper]
- (arXiv 2021.11) Compositional Transformers for Scene Generation, [Paper]
- (arXiv 2021.11) Scene Representation Transformer: Geometry-Free Novel View Synthesis Through Set-Latent Scene Representations, [Paper], [Project]
- (arXiv 2021.12) SGTR: End-to-end Scene Graph Generation with Transformer, [Paper]
- (arXiv 2022.01) RelTR: Relation Transformer for Scene Graph Generation, [Paper], [Code]
- (arXiv 2022.03) Relationformer: A Unified Framework for Image-to-Graph Generation, [Paper]
- (arXiv 2022.05) ASSET: Autoregressive Semantic Scene Editing with Transformers at High Resolutions, [Paper], [Code]
- (arXiv 2022.06) Object Scene Representation Transformer, [Paper], [Project]
- (arXiv 2022.11) SG-Shuffle: Multi-aspect Shuffle Transformer for Scene Graph Generation, [Paper]
- (arXiv 2022.11) Iterative Scene Graph Generation with Generative Transformers, [Paper]
- (arXiv 2022.12) SrTR: Self-reasoning Transformer with Visual-linguistic Knowledge for Scene Graph Generation, [Paper]
- (arXiv 2023.03) Transformer-based Image Generation from Scene Graphs, [Paper], [Code]
- (arXiv 2023.03) Revisiting Transformer for Point Cloud-based 3D Scene Graph Generation, [Paper]
- (arXiv 2023.03) Learning Similarity between Scene Graphs and Images with Transformers, [Paper]
- (arXiv 2023.04) RePAST: Relative Pose Attention Scene Representation Transformer, [Paper]
- (arXiv 2023.05) HSCNet++: Hierarchical Scene Coordinate Classification and Regression for Visual Localization with Transformer, [Paper]
- (arXiv 2023.05) PanoContext-Former: Panoramic Total Scene Understanding with a Transformer, [Paper]
- (arXiv 2023.06) InvPT++: Inverted Pyramid Multi-Task Transformer for Visual Scene Understanding, [Paper], [Code]
- (arXiv 2023.06) ViTEraser: Harnessing the Power of Vision Transformers for Scene Text Removal with SegMIM Pretraining, [Paper], [Code]
- (arXiv 2023.08) Generalized Unbiased Scene Graph Generation, [Paper]
- (arXiv 2023.08) Vision Relation Transformer for Unbiased Scene Graph Generation, [Paper], [Code]
- (arXiv 2023.09) RoadFormer: Duplex Transformer for RGB-Normal Semantic Road Scene Parsing, [Paper], [Code]
- (arXiv 2023.09) Spatial-Temporal Knowledge-Embedded Transformer for Video Scene Graph Generation, [Paper]
- (arXiv 2023.10) Towards Grouping in Large Scenes with Occlusion-aware Spatio-temporal Transformers, [Paper], [Code]
- (arXiv 2023.11) Towards a Unified Transformer-based Framework for Scene Graph Generation and Human-object Interaction Detection, [Paper]
- (arXiv 2023.11) TSP-Transformer: Task-Specific Prompts Boosted Transformer for Holistic Scene Understanding, [Paper], [Code]
- (arXiv 2023.11) VLPrompt: Vision-Language Prompting for Panoptic Scene Graph Generation, [Paper]
- (arXiv 2023.12) Gaussian Grouping: Segment and Edit Anything in 3D Scenes, [Paper], [Code]
- (arXiv 2024.01) Dream360: Diverse and Immersive Outdoor Virtual Scene Creation via Transformer-Based 360 Image Outpainting, [Paper]
- (arXiv 2024.01) SGTR+: End-to-end Scene Graph Generation with Transformer, [Paper], [Code]
- (arXiv 2024.02) S^2Former-OR: Single-Stage Bimodal Transformer for Scene Graph Generation in OR, [Paper]
- (arXiv 2024.02) Vision Transformers with Natural Language Semantics, [Paper]
- (arXiv 2024.03) Can Transformers Capture Spatial Relations between Objects?, [Paper], [Code]
- (arXiv 2024.03) DSGG: Dense Relation Transformer for an End-to-end Scene Graph Generation, [Paper], [Code]
- (arXiv 2024.03) SceneTracker: Long-term Scene Flow Estimation Network, [Paper], [Code]
- (arXiv 2024.04) From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models, [Paper], [Code]
- (arXiv 2024.04) EGTR: Extracting Graph from Transformer for Scene Graph Generation, [Paper], [Code]
- (arXiv 2024.05) Recasting Generic Pretrained Vision Transformers As Object-Centric Scene Encoders For Manipulation Policies, [Paper], [Code]
- (arXiv 2024.07) BCTR: Bidirectional Conditioning Transformer for Scene Graph Generation, [Paper]
- (arXiv 2024.12) GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding, [Paper]