I'm a Computer Vision Engineer.
Main Research Areas:
Multimodal Large Language Models / Vision-Language Model Pre-training / 3D Vision
Base: Beijing
Email: [email protected]
- 2024 π Video Pretrain Speedup: V-SWIFT: Training a Small VideoMAE Model on a Single Machine in a Day.
- 2024 π Visual Auto-Regressive: VAR-CLIP: Text-to-Image Generator with Visual Auto-Regressive Modeling.
- 2024 π EPIC-Kitchens Challenges (CVPR Workshop 2024): Ranked 2nd place in Action Recognition/Action Detection.
- 2024 π MLLM: Croc: Pretraining Large Multimodal Models with Cross-Modal Comprehension.
- 2024 π Stereo Matching: A Transformer-Based Architecture for High-Resolution Stereo Matching.
- 2022 π PointCloud Classification Challenges (ECCV Workshop 2022): Ranked 2nd place with Point-Voxel Adaptive Feature Abstraction for Robust Point Cloud Classification.
- 2021 π PointCloud Registration Challenges (ICCV Workshop 2021): Ranked 2nd place with Point Cloud Registration using Representative Overlapping Points.
- 2018 π ImageMatch/FeatureMatch (Chinese Journals): A Review of Image Matching Methods.
- 2017 π ImageMatch/TemplateMatch (Chinese Journals): Template Selection and Matching Algorithm for Image Matching.