Is TCN suitable for spatio-temporal data? #73

Open
taheramii opened this issue Dec 21, 2022 · 7 comments

Comments

@taheramii

I have multi-dimensional spatio-temporal data in which the spatial part is represented by 2D matrices (like an RGB image). How can I feed this data to the TCN?

@alexmehta

I have a similar question. I want to use a TCN for video data. Does anyone have any ideas?

@alexmehta

I have found a solution using an encoder.

@taheramii
Author

I have found a solution using an encoder.

I was wondering if you could share the solution?

Thanks,
Taher

@zeroocean

Did you find a solution?

@alexmehta

Just use any encoder and set the TCN's input channels to the encoder's output dimension for one time step. For example, if you have a CNN that takes images of shape (n_imgs, 112, 112) and outputs (n_imgs, channels), you simply feed that output into the TCN, making sure that n_channels = channels and that n_imgs is treated as the sequence length, not the channels (this may require reshaping).

Let me know if that makes sense.
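The reshaping described above can be sketched in PyTorch as follows. This is only an illustration under stated assumptions: `FrameEncoder`, `CausalConv1d`, and `encode_video` are hypothetical names, the frames are assumed to be 3-channel 112x112 images, and the small dilated causal stack is a stand-in for a real TCN, not the implementation from this repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FrameEncoder(nn.Module):
    """Hypothetical per-frame CNN: (N, 3, H, W) -> (N, feat_dim)."""

    def __init__(self, feat_dim=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, feat_dim, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
        )
        self.pool = nn.AdaptiveAvgPool2d(1)  # global average pooling

    def forward(self, x):
        return self.pool(self.conv(x)).flatten(1)


class CausalConv1d(nn.Module):
    """Dilated 1D convolution that pads only on the left (past) side."""

    def __init__(self, in_ch, out_ch, kernel_size, dilation):
        super().__init__()
        self.left_pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size, dilation=dilation)

    def forward(self, x):
        return self.conv(F.pad(x, (self.left_pad, 0)))


def encode_video(encoder, video):
    """(B, T, C, H, W) video -> (B, feat_dim, T) sequence for a 1D conv stack."""
    b, t, c, h, w = video.shape
    feats = encoder(video.reshape(b * t, c, h, w))   # (B*T, feat_dim)
    # Time becomes the sequence length; features become the channels.
    return feats.reshape(b, t, -1).transpose(1, 2)   # (B, feat_dim, T)


encoder = FrameEncoder(feat_dim=64)
# Stand-in for a TCN: two dilated causal conv layers with in_channels = feat_dim.
tcn = nn.Sequential(
    CausalConv1d(64, 64, kernel_size=3, dilation=1),
    nn.ReLU(),
    CausalConv1d(64, 64, kernel_size=3, dilation=2),
    nn.ReLU(),
)

video = torch.randn(2, 16, 3, 112, 112)  # 2 clips, 16 frames each
seq = encode_video(encoder, video)
out = tcn(seq)
print(seq.shape, out.shape)
```

The key point is the final `transpose(1, 2)`: `nn.Conv1d` expects input of shape (batch, channels, length), so the frame index has to end up in the last dimension and the encoder features in the channel dimension.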

@chc-tw

chc-tw commented Jul 20, 2023

Just use any encoder and set the TCN's input channels to the encoder's output dimension for one time step. For example, if you have a CNN that takes images of shape (n_imgs, 112, 112) and outputs (n_imgs, channels), you simply feed that output into the TCN, making sure that n_channels = channels and that n_imgs is treated as the sequence length, not the channels (this may require reshaping).

Let me know if that makes sense.

You are correct that we can first use any CNN backbone to transform the input images (n_imgs, W, H, C) into (n_imgs, W', H', C'), where W', H', and C' are the dimensions of the last feature map. To collapse the spatial dimensions W' and H', we can use either flattening or global average pooling (recommended). Global average pooling gives a shape of (n_imgs, C'), while flattening gives (n_imgs, W'·H'·C'). Afterward, we can feed the transformed data into the TCN.

Please let me know if you need any further clarification.
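To make the two reduction options concrete, here is a minimal PyTorch sketch comparing them on a made-up feature map. The sizes (16 frames, C' = 256, 7x7 spatial map) are illustrative assumptions, and the tensor uses PyTorch's channels-first layout (n_imgs, C', H', W') rather than the channels-last ordering written above.

```python
import torch
import torch.nn as nn

# Assumed last feature map of some CNN backbone for 16 frames.
fmap = torch.randn(16, 256, 7, 7)        # (n_imgs, C', H', W')

# Option 1: flattening keeps every spatial position as a separate feature.
flat = fmap.flatten(1)                   # (n_imgs, C' * H' * W')

# Option 2: global average pooling averages over the spatial positions.
gap = fmap.mean(dim=(2, 3))              # (n_imgs, C')

# Treat the 16 frames as one sequence: (batch=1, channels=C', length=n_imgs).
tcn_in = gap.t().unsqueeze(0)

print(flat.shape, gap.shape, tcn_in.shape)
```

With these sizes, flattening would require a TCN with 12544 input channels, whereas global average pooling needs only 256, which is one practical reason to prefer pooling.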

@Wei4lei

Wei4lei commented Sep 4, 2023

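Just use any encoder and set the channels to the encoder's output dim for one time step. For example, if you have some CNN model that takes images (n_imgs, 112, 112) as input and outputs (n_imgs, channels), you simply feed that into the TCN, making sure that n_channels = channels and that n_imgs is the length, not the channels (possibly requiring reshaping).
Let me know if that makes sense.

You are correct that we can first use any CNN backbone to transform the input images (n_imgs, W, H, C) into (n_imgs, W', H', C'), where W', H', and C' come from the last feature map. To reduce the W and H dimensions, we can use either flattening or global average pooling (recommended), so that the shape becomes (n_imgs, C'). Afterward, we can feed the transformed data into the TCN.

Please let me know if you need any further clarification.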

How does it perform?
