np.load, np.save missing #48

Open
Fokatu opened this issue Jan 28, 2023 · 9 comments

Comments

@Fokatu

Fokatu commented Jan 28, 2023

Would it be possible to add these two functions to load and save .npy files?
Thanks!

@KevinBaselinesw
Collaborator

I think I looked at np.load and np.save before. Implementing them is a very big task, and I didn't have time to do it. Nobody has asked for them in 5 years, so they are not the most commonly used functions.

In Python/NumPy, saving/loading arrays with those functions gives a very significant performance improvement because they are written in C rather than Python. My view is that an application user of numpydotnet should probably write their own specific load/save functions. Since that code would run in .NET either way, there is no performance to be gained by replicating these functions in the library.

Can you implement your own load and save functions?
Does the format need to be 100% compatible with python/numpy?
Are you exchanging files with python/numpy?

FYI, we do support np.tofile and np.fromfile.

If you really need to be 100% compatible with the python file format, let me know.
I can also offer tips on accessing any arrays you want to save to file.

Also, we recently added support for serializing ndarrays. See ndarray.ToSerializable(). It will return a class object that can be serialized to a string and written to a disk file.
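For the case where files really must be readable by Python's np.load, the following is a minimal sketch of writing an NPY version 1.0 file from plain C#. It follows the layout documented by NumPy (magic string, version bytes, a little-endian uint16 header length, then a Python-dict header padded so the data starts on a 64-byte boundary). It assumes float64 data already flattened in C (row-major) order into a double[]; NpyWriterSketch and WriteNpy are hypothetical names, not NumpyDotNet API, and extracting the flat data from an ndarray is left out.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

// Hypothetical helper, not part of NumpyDotNet: writes float64 data in C (row-major)
// order as an NPY version 1.0 stream that Python's np.load should be able to read.
public static class NpyWriterSketch
{
    public static void WriteNpy(Stream stream, double[] data, int[] shape)
    {
        long count = shape.Aggregate(1L, (acc, d) => acc * d);
        if (count != data.Length)
            throw new ArgumentException("shape does not match the number of elements");

        // Python tuple syntax: "(3, 3)" for 2-D, "(9,)" for 1-D, "()" for a 0-d array.
        string shapeText = shape.Length == 1
            ? "(" + shape[0] + ",)"
            : "(" + string.Join(", ", shape.Select(d => d.ToString())) + ")";
        string header = "{'descr': '<f8', 'fortran_order': False, 'shape': " + shapeText + ", }";

        // Pad with spaces and terminate with '\n' so that magic (6) + version (2)
        // + header-length field (2) + header is a multiple of 64 bytes.
        int unpadded = 10 + header.Length + 1;
        int padding = (64 - unpadded % 64) % 64;
        header = header + new string(' ', padding) + "\n";
        if (header.Length > ushort.MaxValue)
            throw new ArgumentException("header too long for NPY version 1.0");

        // BinaryWriter writes multi-byte values little-endian on .NET, matching '<f8'.
        using var writer = new BinaryWriter(stream, Encoding.ASCII, leaveOpen: true);
        writer.Write(new byte[] { 0x93, (byte)'N', (byte)'U', (byte)'M', (byte)'P', (byte)'Y', 1, 0 });
        writer.Write((ushort)header.Length);
        writer.Write(Encoding.ASCII.GetBytes(header));
        foreach (double value in data)
            writer.Write(value);
    }
}
```

Usage would look like: using var f = File.Create("a.npy"); NpyWriterSketch.WriteNpy(f, flatData, new[] { 3, 3 }); after which np.load("a.npy") in Python should yield a 3x3 float64 array. Other dtypes only need a different descr string and matching element writes.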

@rainyl
Contributor

rainyl commented Mar 28, 2023

I don't think it is necessary to add .npy support to numpy.net. Even as a Python NumPy user, I hardly ever use np.save or np.load; I prefer text files like .csv, and if higher performance is needed, I think Feather is a better choice.

@rainyl
Contributor

rainyl commented Mar 28, 2023

@KevinBaselinesw I noticed that you are working on np.load for binary files. One piece of advice: text files and binary files could use different APIs. For example, NumPy uses np.loadtxt for text files and np.load for binary files, which is clearer and more direct.

PS: the current np.fromfile in this project is also incomplete. For example, the skiprows and comments options of np.loadtxt() are useful when the input file contains extra information, but they are not included.
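To illustrate what skiprows and comments do, here is a minimal plain-C# sketch of a loadtxt-style reader for numeric data. TextLoaderSketch and LoadTxt are hypothetical names, not NumpyDotNet API. The sketch assumes a single comment marker, splits on whitespace unless delimiters are supplied, and returns a double[,] rather than an ndarray.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

// Hypothetical helper, not part of NumpyDotNet: a minimal loadtxt-style reader that
// mimics the 'skiprows' and 'comments' behaviour of NumPy's np.loadtxt for numeric data.
public static class TextLoaderSketch
{
    // delimiters == null splits on any whitespace (np.loadtxt's default).
    public static double[,] LoadTxt(TextReader reader, int skiprows = 0,
                                    string comments = "#", char[] delimiters = null)
    {
        var rows = new List<double[]>();
        string line;
        int lineNo = 0;
        while ((line = reader.ReadLine()) != null)
        {
            // Skip the first 'skiprows' physical lines, comment lines included.
            if (lineNo++ < skiprows) continue;

            // Drop everything from the comment marker to the end of the line.
            if (!string.IsNullOrEmpty(comments))
            {
                int pos = line.IndexOf(comments, StringComparison.Ordinal);
                if (pos >= 0) line = line.Substring(0, pos);
            }

            string[] fields = line.Split(delimiters, StringSplitOptions.RemoveEmptyEntries);
            if (fields.Length == 0) continue; // blank or comment-only line

            var row = new double[fields.Length];
            for (int i = 0; i < fields.Length; i++)
                row[i] = double.Parse(fields[i], NumberStyles.Float, CultureInfo.InvariantCulture);

            if (rows.Count > 0 && row.Length != rows[0].Length)
                throw new FormatException(
                    $"line {lineNo}: expected {rows[0].Length} columns but found {row.Length}");
            rows.Add(row);
        }

        // Copy the ragged-free row list into a rectangular array.
        int nCols = rows.Count > 0 ? rows[0].Length : 0;
        var result = new double[rows.Count, nCols];
        for (int r = 0; r < rows.Count; r++)
            for (int c = 0; c < nCols; c++)
                result[r, c] = rows[r][c];
        return result;
    }
}
```

As in np.loadtxt, skiprows counts physical lines (comment lines included), and blank or comment-only lines after that point are ignored. Rows with a different number of columns raise a FormatException instead of being padded.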

Finally, I REALLY appreciate your work. I explored many multi-dimensional array libraries for C#, such as NumSharp, Numpy.NET and Tensor.NET from SciSharp. Thanks for their efforts, but they are not as convenient or complete as this work. TorchSharp is great, but I don't want to install such a huge library just for multi-dimensional array calculations.

Thanks for your work again!!! 😄

@KevinBaselinesw
Collaborator

@rainyl I have tried to port np.load() but it is very complicated and too much work.

I recommend that people use the newly added .ToSerializable() method. This is probably a more modern way to save, share and restore data structures in a .NET application. The serializable data structures can then be converted to JSON or XML for writing to a file, database or network API. Of course, it is not binary compatible with the files written by Python's np.save() and read by np.load().

    [TestMethod]
    public void test_ndarray_serialization_newtonsoft()
    {
        var a = np.array(new int[] { 0, 1, 2, 3, 4, 5, 6, 7, 8 }).reshape(3, 3);
        AssertArray(a, new int[,] { { 0, 1, 2 }, { 3, 4, 5 }, { 6, 7, 8 } });

        // These two calls are equivalent: both return a serializable form of 'a'.
        var A_ArraySerializedFormat = a.ToSerializable();
        A_ArraySerializedFormat = np.ToSerializable(a);

        // Round-trip the serializable form through a JSON string.
        var A_Serialized = SerializationHelper.SerializeNewtonsoftJSON(A_ArraySerializedFormat);
        var A_Deserialized = SerializationHelper.DeSerializeNewtonsoftJSON<ndarray_serializable>(A_Serialized);

        Console.WriteLine("AA");
        print(A_Serialized);

        // Restore an ndarray from the deserialized data.
        var b = new ndarray(A_Deserialized);

        // Serialize the restored array again; the JSON should be identical.
        var B_ArraySerializedFormat = b.ToSerializable();
        var B_Serialized = SerializationHelper.SerializeNewtonsoftJSON(B_ArraySerializedFormat);
        var B_Deserialized = SerializationHelper.DeSerializeNewtonsoftJSON<ndarray_serializable>(B_Serialized);
        Console.WriteLine("\n\nBB");
        print(B_Serialized);

        Assert.AreEqual(0, string.Compare(A_Serialized, B_Serialized));

        // The restored array keeps the original dtype metadata.
        Assert.AreEqual(a.Dtype.TypeNum, b.Dtype.TypeNum);
        Assert.AreEqual(a.Dtype.str, b.Dtype.str);
        Assert.AreEqual(a.Dtype.alignment, b.Dtype.alignment);
        Assert.AreEqual(a.Dtype.ElementSize, b.Dtype.ElementSize);
        Assert.AreEqual(a.Dtype.Kind, b.Dtype.Kind);
    }

    // Extension methods for round-tripping objects through XML or JSON strings.
    public static class SerializationHelper
    {
        // Deserializes an XML string into an object of type T.
        public static T DeserializeXml<T>(this string toDeserialize)
        {
            System.Xml.Serialization.XmlSerializer xmlSerializer = new System.Xml.Serialization.XmlSerializer(typeof(T));
            using (System.IO.StringReader textReader = new System.IO.StringReader(toDeserialize))
            {
                return (T)xmlSerializer.Deserialize(textReader);
            }
        }

        // Serializes an object of type T to an XML string.
        public static string SerializeXml<T>(this T toSerialize)
        {
            System.Xml.Serialization.XmlSerializer xmlSerializer = new System.Xml.Serialization.XmlSerializer(typeof(T));
            using (System.IO.StringWriter textWriter = new System.IO.StringWriter())
            {
                xmlSerializer.Serialize(textWriter, toSerialize);
                return textWriter.ToString();
            }
        }

        // Serializes an object to a JSON string using Newtonsoft.Json.
        public static string SerializeNewtonsoftJSON<T>(this T toSerialize)
        {
            return Newtonsoft.Json.JsonConvert.SerializeObject(toSerialize);
        }

        // Deserializes a JSON string into an object of type T using Newtonsoft.Json.
        public static T DeSerializeNewtonsoftJSON<T>(this string toDeserialize)
        {
            return Newtonsoft.Json.JsonConvert.DeserializeObject<T>(toDeserialize);
        }
    }

@rainyl
Contributor

rainyl commented Mar 28, 2023

Yes, you are right: for saving files, .ToSerializable() is a better choice. Considering the complexity, I don't think it's necessary to support np.load, or at least it can have lower priority. Anyway, it's only an immature suggestion from a new user; thanks for your reply :)

@GeorgeS2019
Copy link

@KevinBaselinesw

Could you evaluate whether this code would be a useful starting point for addressing the missing np.load and np.save functions?

https://github.com/SciSharp/NumSharp/blob/master/src/NumSharp.Core/APIs/np.load.cs

Less relevant, but FYI
https://github.com/SciSharp/Numpy.NET/blob/main/src/Numpy/np.io.gen.cs

@thalesfm
Copy link

I've been working on implementing np.load for a project of mine, since I needed access to a dataset that uses the .npz format. It's not a 1-to-1 port since I don't handle structured data types, but I've gotten most of the primitive types working so far. Would you be open to including this feature if I submit a pull request?
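An .npz file is a zip archive in which np.savez stores each array as a separate "name.npy" member, so reading one mostly reduces to parsing .npy headers. The following is a minimal sketch of that idea in plain C#, not the contributor's implementation and not NumpyDotNet API: NpzReaderSketch and NpyArray are hypothetical names, only little-endian float64, int32 and int64 in C order are handled, and values are widened to double for simplicity.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical container (not part of NumpyDotNet): a decoded array as flat data plus shape.
public sealed class NpyArray
{
    public string Descr = "";
    public int[] Shape = Array.Empty<int>();
    public double[] Data = Array.Empty<double>();
}

public static class NpzReaderSketch
{
    // Reads every "<name>.npy" member of an .npz archive, keyed by <name>.
    public static Dictionary<string, NpyArray> ReadNpz(Stream stream)
    {
        var result = new Dictionary<string, NpyArray>();
        using var zip = new ZipArchive(stream, ZipArchiveMode.Read, leaveOpen: true);
        foreach (ZipArchiveEntry entry in zip.Entries)
        {
            if (!entry.FullName.EndsWith(".npy", StringComparison.Ordinal)) continue;
            using Stream entryStream = entry.Open();
            string name = entry.FullName.Substring(0, entry.FullName.Length - 4);
            result[name] = ReadNpy(entryStream);
        }
        return result;
    }

    // Parses an NPY (version 1.x or 2.x) stream holding a simple little-endian numeric dtype.
    public static NpyArray ReadNpy(Stream stream)
    {
        using var reader = new BinaryReader(stream, Encoding.ASCII, leaveOpen: true);
        byte[] magic = reader.ReadBytes(6);
        if (magic.Length != 6 || magic[0] != 0x93 || Encoding.ASCII.GetString(magic, 1, 5) != "NUMPY")
            throw new InvalidDataException("not an NPY stream");
        byte major = reader.ReadByte();
        reader.ReadByte(); // minor version, unused here

        // Version 1.x stores the header length as uint16; later versions use uint32.
        int headerLen = major == 1 ? reader.ReadUInt16() : checked((int)reader.ReadUInt32());
        string header = Encoding.ASCII.GetString(reader.ReadBytes(headerLen));

        string descr = Regex.Match(header, @"'descr':\s*'([^']*)'").Groups[1].Value;
        if (Regex.IsMatch(header, @"'fortran_order':\s*True"))
            throw new NotSupportedException("Fortran-ordered arrays are not handled in this sketch");
        if (descr != "<f8" && descr != "<i4" && descr != "<i8")
            throw new NotSupportedException($"dtype '{descr}' is not handled in this sketch");

        // "(2, 3)" -> [2, 3]; "(3,)" -> [3]; "()" -> [] (a 0-d array with one element).
        string shapeText = Regex.Match(header, @"'shape':\s*\(([^)]*)\)").Groups[1].Value;
        int[] shape = shapeText.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
                               .Select(s => int.Parse(s.Trim()))
                               .ToArray();
        long count = shape.Aggregate(1L, (acc, d) => acc * d);

        var data = new double[count];
        for (long i = 0; i < count; i++)
            data[i] = descr == "<f8" ? reader.ReadDouble()
                    : descr == "<i4" ? reader.ReadInt32()
                    : reader.ReadInt64();
        return new NpyArray { Descr = descr, Shape = shape, Data = data };
    }
}
```

Extending this to float32, bool, big-endian or structured dtypes means adding cases to the descr check and the element read; object arrays (which require pickle) are out of reach for a sketch like this.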

@GeorgeS2019
Copy link

Please submit the PR; it will be useful for the community, especially people from SciSharp.

@KevinBaselinesw
Copy link
Collaborator

Sorry, I was out of town for a few days.

Feel free to create a PR (with some unit tests please). I will review it and potentially accept it.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants