
End-to-End Image Stitching Network via Multi-Homography Estimation

Official project page for "End-to-End Image Stitching Network via Multi-Homography Estimation"

Accepted at IEEE Signal Processing Letters 2021

https://ieeexplore.ieee.org/document/9393563


Dae-Young Song[1], Gi-Mun Um[2], Hee Kyung Lee[2], and Donghyeon Cho[1]

This paper is one of the outcomes of an industry-academic cooperation project with ETRI (Electronics and Telecommunications Research Institute).

The dataset and source code are ETRI's assets.

[1] Department of Electronics Engineering, Chungnam National University

[2] Communication Media Research Laboratory, ETRI

I. Abstract

In this paper, we propose an end-to-end stitching network, which takes two images with a narrow field of view (FOV) as inputs and produces a single image with a wide FOV. Our method estimates multiple homographies to cover the depth differences in the scene and is therefore robust against parallax distortion. In particular, global warping maps are generated using the estimated multiple homographies and adjusted by local displacement maps. The final result is produced by warping the input images multiple times using the warping maps and then merging the warped images with the weight maps. Multiple homographies, local displacement maps, and weight maps are generated simultaneously by our stitching network. To train the stitching network, we construct a dataset using the CARLA simulator. Then, using this dataset, our network is trained by end-to-end supervised learning based on appearance matching loss and depth layer loss. In experiments, we show that our method is superior to existing methods both qualitatively and quantitatively. We also provide various empirical studies for in-depth analysis, as well as results of extending our method to 360-degree panoramas.
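The abstract describes a warp-and-merge pipeline: global grids from multiple homographies, local displacement refinement, and weighted blending. Below is a minimal PyTorch sketch of that compositing step, under our own assumptions; all tensor names, shapes, and the view assignment are illustrative, not the official implementation.

```python
import torch
import torch.nn.functional as F

def warp_and_merge(images, grids, displacements, weights):
    """Sketch of the final compositing step: warp each input view with each
    homography-induced sampling grid (locally adjusted by a displacement map),
    then blend all warped images with softmax weight maps.

    images:        list of two (B, 3, H, W) input views
    grids:         (B, M, H, W, 2) global sampling grids in [-1, 1]; M warps total
    displacements: (B, M, H, W, 2) local offsets refining each global grid
    weights:       (B, M, H, W) blending weights, softmax-normalized over M
    """
    M = grids.shape[1]
    warped = []
    for m in range(M):
        src = images[m % 2]                               # alternate views (assumption)
        grid = grids[:, m] + displacements[:, m]          # global warp + local adjustment
        warped.append(F.grid_sample(src, grid, align_corners=True))
    warped = torch.stack(warped, dim=1)                   # (B, M, 3, H, W)
    return (warped * weights.unsqueeze(2)).sum(dim=1)     # weighted merge -> (B, 3, H, W)
```

Note that `grid_sample` expects grids in normalized [-1, 1] coordinates, so homography warps would need to be expressed in that coordinate frame.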

II. End-to-End Stitching

[Figure: overview of the proposed end-to-end stitching pipeline]

III. Training Dataset in the Paper

All datasets are produced using the CARLA[3] simulator with autopilot.

  • Total: 42,777 sets

  • Input FoV: 90 degrees (rotated ±30 degrees from the ground-truth view)

  • Ground-truth FoV: 120 degrees

  • Direction: front (17,599 sets) + right (25,178 sets)

  • Input image configuration

    [Figure: input image configuration]

  • Examples

    [Images: left / right inputs (front dir.)]
    [Images: ground truth / depth (front dir.)]
    [Images: left / right inputs (right dir.)]
    [Images: ground truth / depth (right dir.)]

[3]: https://github.com/carla-simulator/carla

IV. Network Architecture

* (asterisk) indicates that BatchNorm and ELU layers follow immediately after the marked layer.

@ indicates that the layer receives a skip-connection input.

N indicates the number of depth layers (2N = K, where K is the total number of homographies).

1. Encoder

| layer | kernel | stride | channel (in/out) | pad | scale_in | scale_out |
| --- | --- | --- | --- | --- | --- | --- |
| Conv1* | 7 | 1 | 3/32 | 3 | 1 | 1 |
| Conv1b* | 7 | 2 | 32/32 | 3 | 1 | 2 |
| Conv2* | 5 | 1 | 32/64 | 2 | 2 | 2 |
| Conv2b* | 5 | 2 | 64/64 | 2 | 2 | 4 |
| Conv3* | 3 | 1 | 64/128 | 1 | 4 | 4 |
| Conv3b* | 3 | 2 | 128/128 | 1 | 4 | 8 |
| Conv4* | 3 | 1 | 128/256 | 1 | 8 | 8 |
| Conv4b* | 3 | 2 | 256/256 | 1 | 8 | 16 |
| Conv5* | 3 | 1 | 256/512 | 1 | 16 | 16 |
| Conv5b* | 3 | 2 | 512/512 | 1 | 16 | 32 |
| Conv6* | 3 | 1 | 512/512 | 1 | 32 | 32 |
| Conv6b* | 3 | 2 | 512/512 | 1 | 32 | 64 |
| Conv7* | 3 | 1 | 512/512 | 1 | 64 | 64 |
| Conv7b* | 3 | 2 | 512/512 | 1 | 64 | 128 |
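A minimal PyTorch reading of the encoder table: each row is a Conv2d followed by BatchNorm and ELU (the * convention above), grouped into seven stride-1/stride-2 pairs. The stage grouping and the choice of which features are kept as skips are our assumptions.

```python
import torch.nn as nn

def conv_bn_elu(in_ch, out_ch, kernel, stride, pad):
    """One 'Conv*' row from the table: Conv2d + BatchNorm + ELU."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel, stride=stride, padding=pad),
        nn.BatchNorm2d(out_ch),
        nn.ELU(inplace=True),
    )

class Encoder(nn.Module):
    """Seven stages per the table; each pairs a stride-1 conv (ConvN*) with a
    stride-2 conv (ConvNb*) that halves the spatial scale (1 -> 128 overall)."""
    def __init__(self):
        super().__init__()
        # (kernel, in_ch, out_ch, pad) for each ConvN*/ConvNb* pair
        cfg = [(7, 3, 32, 3), (5, 32, 64, 2), (3, 64, 128, 1),
               (3, 128, 256, 1), (3, 256, 512, 1), (3, 512, 512, 1), (3, 512, 512, 1)]
        stages = []
        for k, cin, cout, p in cfg:
            stages.append(conv_bn_elu(cin, cout, k, 1, p))   # ConvN*
            stages.append(conv_bn_elu(cout, cout, k, 2, p))  # ConvNb*
        self.stages = nn.ModuleList(stages)

    def forward(self, x):
        skips = []
        for i, stage in enumerate(self.stages):
            x = stage(x)
            if i % 2 == 1:          # keep each downsampled feature for the decoder
                skips.append(x)
        return x, skips
```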

2. Decoder

| layer | kernel | stride | channel (in/out) | pad | scale_in | scale_out |
| --- | --- | --- | --- | --- | --- | --- |
| Upconv7* | 3 | 1 | 512/512 | 1 | 128 | 64 |
| Iconv7@* | 3 | 1 | 1024/512 | 1 | 64 | 64 |
| Upconv6* | 3 | 1 | 512/512 | 1 | 64 | 32 |
| Iconv6@* | 3 | 1 | 1024/512 | 1 | 32 | 32 |
| Upconv5* | 3 | 1 | 512/256 | 1 | 32 | 16 |
| Iconv5@* | 3 | 1 | 512/256 | 1 | 16 | 16 |
| Upconv4* | 3 | 1 | 256/128 | 1 | 16 | 8 |
| Iconv4@* | 3 | 1 | 256/128 | 1 | 8 | 8 |
| Upconv3* | 3 | 1 | 128/64 | 1 | 8 | 4 |
| Iconv3@* | 3 | 1 | 128/64 | 1 | 4 | 4 |
| Upconv2* | 3 | 1 | 64/32 | 1 | 4 | 2 |
| Iconv2@* | 3 | 1 | 64/32 | 1 | 2 | 2 |
| Upconv1* | 3 | 1 | 32/16 | 1 | 2 | 1 |
| Iconv1* | 3 | 1 | 16/16 | 1 | 1 | 1 |
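A sketch of the decoder table, reusing `conv_bn_elu` from the encoder sketch above. Since every Upconv row has stride 1 but halves the scale, we assume a nearest-neighbor resize precedes each conv; the Iconv@ channel counts (e.g., 1024 = 512 upsampled + 512 skip) are consistent with concatenating the same-scale encoder skip.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Decoder(nn.Module):
    """Each stage upsamples 2x ('Upconv'), then an 'Iconv' fuses the result
    with the same-scale encoder skip; the last stage (Iconv1*) has no skip."""
    def __init__(self):
        super().__init__()
        # (upconv in/out, iconv in/out) per stage, deepest first
        cfg = [(512, 512, 1024, 512), (512, 512, 1024, 512), (512, 256, 512, 256),
               (256, 128, 256, 128), (128, 64, 128, 64), (64, 32, 64, 32)]
        self.up = nn.ModuleList([conv_bn_elu(a, b, 3, 1, 1) for a, b, _, _ in cfg])
        self.ic = nn.ModuleList([conv_bn_elu(c, d, 3, 1, 1) for _, _, c, d in cfg])
        self.up1 = conv_bn_elu(32, 16, 3, 1, 1)  # Upconv1*
        self.ic1 = conv_bn_elu(16, 16, 3, 1, 1)  # Iconv1* (no skip)

    def forward(self, x, skips):
        # x: deepest encoder feature (Conv7b); skips: Conv1b..Conv7b outputs
        for up, ic, skip in zip(self.up, self.ic, reversed(skips[:-1])):
            x = up(F.interpolate(x, scale_factor=2, mode='nearest'))
            x = ic(torch.cat([x, skip], dim=1))   # @ = skip-connection input
        x = self.up1(F.interpolate(x, scale_factor=2, mode='nearest'))
        return self.ic1(x)                        # 16-channel full-resolution feature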

3. Regressor

| layer | kernel | stride | channel (in/out) | pad | scale_in | scale_out |
| --- | --- | --- | --- | --- | --- | --- |
| Conv45* | 3 | 2 | 1024/512 | 1 | 128 | 128 |
| Conv5a* | 3 | 1 | 512/512 | 1 | 128 | 128 |
| Conv5b* | 3 | 1 | 512/512 | 1 | 128 | 128 |
| Conv56* | 1 | 2 | 512/512 | 1 | 128 | 128 |
| Conv6a* | 1 | 1 | 512/512 | 1 | 128 | 128 |
| Conv6b* | 1 | 1 | 512/512 | 1 | 128 | 128 |
| avg_pool | | | | | | |
| FC | | | 512/[2N x 2 x 3] | | | |
| reshape | | | [Batch x 2N x 2 x 3] | | | |
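A sketch of the regressor table, again reusing `conv_bn_elu` from the encoder sketch. The 1024-channel input presumably concatenates deep features from the two views, and the final FC emits 2N parameter sets of shape 2x3; both readings are assumptions.

```python
import torch.nn as nn

class Regressor(nn.Module):
    """Conv stack on the deepest feature, global average pooling, then an FC
    layer reshaped to [Batch x 2N x 2 x 3] (2N = K total homographies)."""
    def __init__(self, n_depth_layers):
        super().__init__()
        self.n = n_depth_layers
        self.convs = nn.Sequential(
            conv_bn_elu(1024, 512, 3, 2, 1),  # Conv45*
            conv_bn_elu(512, 512, 3, 1, 1),   # Conv5a*
            conv_bn_elu(512, 512, 3, 1, 1),   # Conv5b*
            conv_bn_elu(512, 512, 1, 2, 1),   # Conv56*
            conv_bn_elu(512, 512, 1, 1, 1),   # Conv6a*
            conv_bn_elu(512, 512, 1, 1, 1),   # Conv6b*
        )
        self.fc = nn.Linear(512, 2 * self.n * 2 * 3)

    def forward(self, x):
        x = self.convs(x)
        x = x.mean(dim=(2, 3))                        # avg_pool over spatial dims
        return self.fc(x).view(-1, 2 * self.n, 2, 3) # reshape to [Batch x 2N x 2 x 3]
```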

4. Displacement Map Generation

 - Conv (16 / [2 x 2N], kernel=3, stride=1, pad=1)
 - Tanh

5. Weight Map Generation

 - Conv (16 / [2 x 2N], kernel=3, stride=1, pad=1)
 - Softmax
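Both heads are single 3x3 convs on the decoder's 16-channel output, producing 2 x 2N channels each, differing only in the activation (Tanh for bounded displacement offsets, Softmax for blending weights). A combined sketch follows; the channel grouping and the softmax normalization axis are assumptions.

```python
import torch
import torch.nn as nn

class MapHeads(nn.Module):
    """Displacement and weight heads per sections 4 and 5: each maps the
    16-channel decoder feature to 2 x 2N channels with a 3x3 conv."""
    def __init__(self, n_depth_layers):
        super().__init__()
        out_ch = 2 * 2 * n_depth_layers  # 2 x 2N
        self.disp = nn.Conv2d(16, out_ch, 3, stride=1, padding=1)
        self.weight = nn.Conv2d(16, out_ch, 3, stride=1, padding=1)

    def forward(self, feat):
        displacement = torch.tanh(self.disp(feat))         # bounded local offsets
        weights = torch.softmax(self.weight(feat), dim=1)  # normalize across maps (assumption)
        return displacement, weights
```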

V. Performance

Ablation studies according to the number of homographies (K):

[Table 1: ablation results over the number of homographies K]


Method Evaluation

[Table 2: quantitative method evaluation]

VI. Citation

@ARTICLE{9393563,
  author={Song, Dae-Young and Um, Gi-Mun and Lee, Hee Kyung and Cho, Donghyeon},
  journal={IEEE Signal Processing Letters}, 
  title={End-to-End Image Stitching Network via Multi-Homography Estimation}, 
  year={2021},
  volume={28},
  number={},
  pages={763-767},
  doi={10.1109/LSP.2021.3070525}}
