-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Convert all code to Cython #108
Comments
Converting COAL to Cython might reduce the cost of some of the pixel iteration, but I don't expect it would make a significant difference since I believe most time during mineral classification is spent inside of Spectral Python's spectral angle mapper. Be sure to weigh the cost of rewriting and the cost of using nonstandard Python against the possible time benefits. See the performance issue for a general discussion on improving time efficiency. Another thing to look into is how much we can shrink the data during processing (from 224 bands down to to 128 or 64 or less) and still maintain acceptable accuracy. There is probably significant room for improvement by throwing away extraneous data and by finding inefficiencies in the algorithm. Using the subset/threshold approach saves on a lot of the comparisons, but I found that classifying with the full spectral library was probably more accurate. Something that probably would benefit from being written in a more efficient language is the spectral angle mapper algorithm itself. That is, it might be worth looking into either contributing improvements to Spectral Python or reimplementing some of the algorithms from scratch. Spectral's implementation is not all that complex and could easily be reimplemented by hand. The linear algebra library methods (dot products, etc.) are probably the biggest time sinks in this code. Interpreted languages like MATLAB are efficient not because they have tight for loops but because they have fast built-in matrix operators. |
Also just FYI I don't mean to discourage work on this, I just tend to play devil's advocate when collaborating on things. |
Assuming that mineral classification is the biggest time sink, the callback we are considering implementing would provide one way to address this issue. The callback could point to an efficient compiled function which would hopefully speed things up even if the rest of the toolkit is in plain old Python. The NumPy SAM implementation is probably not half bad however. At this point the performance analysis is largely speculative so it would probably be best to profile things before attempting to optimize. |
In an effort to keep the issue tracker up to date, I will close this issue and recommend that the action items above be pursued incrementally in future issues: In particular, logging, profiling, and optimizing. However, if there are any arguments in favor of committing to the Cython language then this issue can be reopened and pursued by anyone motivated enough convert all the code, as I am not deeply attached to Python. The fundamental goal here is to improve efficiency which brings up a lot of subtleties in the current code and architectural issues in future versions of the library that I think have less to do with the language and more to do with the algorithm. The key question is whether the classifier or the code calling the classifier dominates the running time. This is not totally obvious given things like file I/O, but my intuition is that the classification algorithm would dominate especially when more intensive machine learning algorithms are utilized. What I am most likely to pursue for my own purposes down the road is factoring out the reusable parts of the library from the parts dedicated to coal mining, or otherwise implement them equivalently. A core library built on top of existing high-performance remote sensing and machine learning libraries (Orfeo ToolBox being one very good candidate) and possibly implemented in a language like C++ would expose a general interface that could then be encapsulated in COAL and other projects while maintaining the user-friendly Python API and compatibility with all of the existing deployment processes. |
Cython is a superset of the Python programming language, designed to give C-like performance with code which is mostly written in Python. It is well known to have significant speed improvements over Python code... something we would greatly benefit from when using XSEDE compute resources.
I will begin implementing the Cython enhancements after I've implemented application logging and timeit. We can then easily undertake comparisons in runtime code execution between Cython and Python versions of the pycoal toolkit.
The text was updated successfully, but these errors were encountered: