Allow to choose the indexes of the MultiDimensionalArray to map into the Eigen Matrix in the to_eigen method #88

Closed
xela-95 opened this issue Dec 18, 2024 · 8 comments · Fixed by #89
@xela-95
Member

xela-95 commented Dec 18, 2024

Hi,
I am working on loading some complex MAT files in C++ using matioCpp and then obtaining the corresponding Eigen matrices. One of the data structures I need to load is of type MultiDimensionalArray with dimensions (3,1,15000).

At the moment it is not possible to directly use the to_eigen method

template <typename type>
inline Eigen::Map<Eigen::Matrix<type, Eigen::Dynamic, Eigen::Dynamic>> matioCpp::to_eigen(matioCpp::MultiDimensionalArray<type>& input)
{
    assert(input.isValid());
    assert(input.dimensions().size() == 2);
    return Eigen::Map<Eigen::Matrix<type, Eigen::Dynamic, Eigen::Dynamic>>(input.data(), input.dimensions()(0), input.dimensions()(1));
}

to convert it to an Eigen Matrix, since the array has more than 2 dimensions.

It would be nice if I could select the indices of the dimensions I need when calling this method (in this case they are the 1st and 3rd dimension).
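
For the specific shape mentioned above, a quick check of the column-major layout suggests that the data can already be read as a 3 x 15000 matrix without copying, because the middle dimension has size 1. The sketch below is plain C++ and uses hypothetical helper names (`linearIndex3`, `matrixIndex`, `singletonLayoutMatches`); it does not call any matioCpp or Eigen API, and it assumes the buffer is stored in column-major order.

```cpp
#include <cassert>
#include <cstddef>

// Column-major linear index of element (i, j, k) in an array whose first two
// dimensions have sizes n and m. Hypothetical helper, for illustration only.
inline std::size_t linearIndex3(std::size_t i, std::size_t j, std::size_t k,
                                std::size_t n, std::size_t m)
{
    assert(i < n && j < m);
    return i + j * n + k * n * m;
}

// Column-major linear index of element (r, c) in a matrix with `rows` rows.
inline std::size_t matrixIndex(std::size_t r, std::size_t c, std::size_t rows)
{
    assert(r < rows);
    return r + c * rows;
}

// Returns true if every element (i, 0, k) of a (3, 1, 15000) array sits at the
// same buffer position as element (i, k) of a 3 x 15000 column-major matrix.
inline bool singletonLayoutMatches()
{
    const std::size_t n = 3, m = 1, p = 15000;
    for (std::size_t k = 0; k < p; ++k)
    {
        for (std::size_t i = 0; i < n; ++i)
        {
            if (linearIndex3(i, 0, k, n, m) != matrixIndex(i, k, n))
            {
                return false;
            }
        }
    }
    return true;
}
```

If this holds, the only thing preventing a direct map in this case is the assertion on the number of dimensions in `to_eigen`, not the memory layout.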

@traversaro
Collaborator

Just a shot in the dark, but it could be possible to just expose the MultiDimensionalArray as TensorMap and then do any further manipulation of the array just with Eigen?

@S-Dafarra
Member

> Just a shot in the dark, but it could be possible to just expose the MultiDimensionalArray as TensorMap and then do any further manipulation of the array just with Eigen?

At the moment TensorMap is not supported, right? I would avoid using that API.

Another direction would be to consider those cases in which only two dimensions are effectively different from 1. I could also check if it is possible to have a squeeze method for MultiDimensionalArray.
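
To make the squeeze idea concrete, here is a minimal sketch that works only on the list of dimensions and never touches the data buffer. The names `squeezeDimensions` and `matrixShapeAfterSqueeze` are hypothetical and are not part of matioCpp; the dimensions are assumed to be available as a plain `std::vector<std::size_t>`.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not a matioCpp function): returns the dimensions with
// every entry equal to 1 removed. The order of the remaining entries is kept.
inline std::vector<std::size_t> squeezeDimensions(const std::vector<std::size_t>& dims)
{
    std::vector<std::size_t> squeezed;
    for (std::size_t d : dims)
    {
        if (d != 1)
        {
            squeezed.push_back(d);
        }
    }
    return squeezed;
}

// Returns (rows, cols) when exactly two dimensions differ from 1.
// The assertion mirrors the two-dimension check done in to_eigen.
inline std::pair<std::size_t, std::size_t> matrixShapeAfterSqueeze(const std::vector<std::size_t>& dims)
{
    const std::vector<std::size_t> squeezed = squeezeDimensions(dims);
    assert(squeezed.size() == 2);
    return {squeezed[0], squeezed[1]};
}
```

Because removing singleton dimensions does not change the column-major position of any element, the squeezed shape can be used with the original buffer. Note that a genuine n x 1 matrix would squeeze down to a single dimension, so a real implementation would need to handle that case separately.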

@S-Dafarra
Member

> I could also check if it is possible to have a squeeze method for MultiDimensionalArray.

It is not possible directly, but maybe I can edit the internal matio variable. A bit risky, but I am doing the same to change the name. In the end, the data remains the same, I just need to remove the useless dimensions.

@S-Dafarra
Member

Discussing with @xela-95, another option could even be to exploit the Map stride. This would have the added value of giving the user the possibility of slicing an actual 3D array into a 2D eigen map.

@traversaro
Collaborator

> slicing an actual 3D array into a 2D eigen map

Just a comment: perhaps we can refer to order 3 and order 2 (or something like that?). I am afraid that 3D array may be confused with a first order 3 dimensional vector. Other terms to refer to that are degree or rank, but I think those are more ambiguous.

@S-Dafarra
Member

> > slicing an actual 3D array into a 2D eigen map
>
> Just a comment: perhaps we can refer to order 3 and order 2 (or something like that?). I am afraid that 3D array may be confused with a first order 3 dimensional vector. Other terms to refer to that are degree or rank, but I think those are more ambiguous.

Yeah, I would say to slice a generic MultiDimensionalArray into an Eigen Matrix.

@traversaro
Collaborator

> > Just a shot in the dark, but it could be possible to just expose the MultiDimensionalArray as TensorMap and then do any further manipulation of the array just with Eigen?
>
> At the moment TensorMap is not supported, right? I would avoid using that API.

As mentioned in person, the Eigen Tensor module does indeed need to be included with unsupported/Eigen/CXX11/Tensor, but other than that (which is indeed ugly) it seems to me that the Tensor functionality has been around for a long time and is used in TensorFlow and pybind11 (see https://github.com/tensorflow/tensorflow/blob/6f7232ea51b788ecde72aceb76ea859072969a12/tensorflow/compiler/aot/benchmark_main.template#L22 and pybind/pybind11#4201), so I do not think it would be a big problem to start using it. Even if the API changes, Eigen releases are so rare that it would not be a big problem to handle that.

@S-Dafarra
Member

S-Dafarra commented Dec 19, 2024

Thanks @traversaro. Nonetheless, I got intrigued by how to handle the slicing. Here is what I came up with.

Let's start by considering that the elements are stored in column-major order. This means that the element $(i, j, k, l)$ of a tensor of dimensions $(n, m, p, q)$ is stored at the position

$$index = i \cdot (1)+ j \cdot (n) + k \cdot (n \cdot m) + l \cdot (n \cdot m \cdot p)$$

Here it can be noticed that each index is multiplied by the product of the previous sizes (using 1 as the size of the dimension before the first). Let's call these components $\gamma_x$, such that

$$index = i \cdot \gamma_I+ j \cdot \gamma_J + k \cdot \gamma_K + l \cdot \gamma_L$$
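
The index formula can be checked numerically with a short sketch. It is plain C++ with hypothetical helper names (`columnMajorStrides`, `linearIndex`), written for an arbitrary number of dimensions rather than the four used above; nothing here is matioCpp or Eigen API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the gamma components: gamma[d] is the product of the sizes of all
// dimensions before d, with 1 used for the first dimension.
inline std::vector<std::size_t> columnMajorStrides(const std::vector<std::size_t>& dims)
{
    std::vector<std::size_t> gamma(dims.size());
    std::size_t product = 1;
    for (std::size_t d = 0; d < dims.size(); ++d)
    {
        gamma[d] = product;
        product *= dims[d];
    }
    return gamma;
}

// Linear position of the element with the given indices: the sum over all
// dimensions of index times gamma.
inline std::size_t linearIndex(const std::vector<std::size_t>& indices,
                               const std::vector<std::size_t>& dims)
{
    assert(indices.size() == dims.size());
    const std::vector<std::size_t> gamma = columnMajorStrides(dims);
    std::size_t index = 0;
    for (std::size_t d = 0; d < dims.size(); ++d)
    {
        assert(indices[d] < dims[d]);
        index += indices[d] * gamma[d];
    }
    return index;
}
```

For example, with sizes (2, 3, 4, 5) the components are (1, 2, 6, 24), and the last element (1, 2, 3, 4) lands at 1 + 4 + 18 + 96 = 119, which is the final position of a 120-element buffer.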

Now, if we want to slice the tensor into a matrix, we need to define 3 elements:

  • the offset, i.e. the index of the first element of the slice
  • the inner stride, i.e. the index variation to pass from one element to the next in the same column of the slice
  • the outer stride, i.e. the index variation to pass from one element to the next in the same row of the slice

In order to slice we need to define which dimensions are fixed. Then,

  • the offset is the index of the element with the non-fixed dimensions set to zero.
  • the inner stride and the outer stride are given by the $\gamma_x$ of the non-fixed dimensions. This can be easily obtained from the index formula, for example by taking the difference between the index of $(i, j + 1, k, l)$ and the index of $(i, j, k, l)$, which is exactly $\gamma_J$.
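
The rules above can be turned into a small computation of the three quantities. This is only a sketch under stated assumptions: `SliceLayout`, `computeSliceLayout` and `sliceElementIndex` are hypothetical names, the free dimensions are passed as `rowDim` and `colDim`, and the entries of `fixedIndices` at those two positions are ignored (treated as zero).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of a 2D slice taken from a column-major buffer.
struct SliceLayout
{
    std::size_t offset;      // index of the first element of the slice
    std::size_t innerStride; // step between consecutive elements of a column
    std::size_t outerStride; // step between consecutive elements of a row
    std::size_t rows;
    std::size_t cols;
};

// dims: sizes of all dimensions. fixedIndices: one index per dimension; the
// entries at rowDim and colDim are ignored. rowDim / colDim: the two free
// dimensions that become the rows and columns of the slice.
inline SliceLayout computeSliceLayout(const std::vector<std::size_t>& dims,
                                      const std::vector<std::size_t>& fixedIndices,
                                      std::size_t rowDim, std::size_t colDim)
{
    assert(fixedIndices.size() == dims.size());
    assert(rowDim < dims.size() && colDim < dims.size() && rowDim != colDim);

    SliceLayout layout{0, 0, 0, dims[rowDim], dims[colDim]};
    std::size_t gamma = 1; // product of the sizes of the previous dimensions
    for (std::size_t d = 0; d < dims.size(); ++d)
    {
        if (d == rowDim)
        {
            layout.innerStride = gamma;
        }
        else if (d == colDim)
        {
            layout.outerStride = gamma;
        }
        else
        {
            assert(fixedIndices[d] < dims[d]);
            layout.offset += fixedIndices[d] * gamma; // fixed dimensions only
        }
        gamma *= dims[d];
    }
    return layout;
}

// Buffer position of element (r, c) of the slice.
inline std::size_t sliceElementIndex(const SliceLayout& s, std::size_t r, std::size_t c)
{
    assert(r < s.rows && c < s.cols);
    return s.offset + r * s.innerStride + c * s.outerStride;
}
```

For the (3, 1, 15000) array from the first comment, choosing the 1st and 3rd dimensions gives offset 0, inner stride 1 and outer stride 3. With Eigen, these values would presumably be passed to an `Eigen::Map` that takes an `Eigen::Stride<Eigen::Dynamic, Eigen::Dynamic>` built as `(outerStride, innerStride)` on the pointer `input.data() + offset`; that wiring is an assumption about how a fix could look and is not shown here.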
