Interaction with other libraries
Keras
It's a very romantic notion to think that we can come up with the best features to model our world. That notion has now been dispelled.
Most object detection/labeling/segmentation/classification tasks now have neural network equivalent algorithms that perform on-par with or better than hand-crafted methods.
One library that gives Python users particularly easy access to deep learning is Keras: https://github.com/fchollet/keras/tree/master/examples (it works with both Theano and TensorFlow).
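As a taste of the API, here is a minimal sketch of a Keras model definition; the layer sizes and input shape are arbitrary placeholders, not taken from any particular example:

```python
from keras.models import Sequential
from keras.layers import Dense

# A tiny fully-connected classifier: 64 input features -> 10 classes
# (layer sizes and shapes are arbitrary placeholders)
model = Sequential([
    Dense(32, activation='relu', input_shape=(64,)),
    Dense(10, activation='softmax'),
])
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
```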
At SciPy 2017: "Fully Convolutional Networks for Image Segmentation", Daniil Pakhomov (Friday, 2:30pm)
Particularly interesting, because such networks can be applied to images of any size
... and because Daniil is a scikit-image contributor 😉
Configurations
See http://www.asimovinstitute.org/neural-network-zoo/ for an overview of common network configurations.
E.g., see how to fine tune a model on top of InceptionV3:
https://keras.io/applications/#fine-tune-inceptionv3-on-a-new-set-of-classes
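A condensed sketch of that recipe, following the pattern in the Keras docs linked above; the number of classes (`n_classes`) and the size of the new dense layer are placeholders:

```python
from keras.applications.inception_v3 import InceptionV3
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model

n_classes = 5  # placeholder: number of classes in your own dataset

# Load InceptionV3 pre-trained on ImageNet, without its final classifier
base_model = InceptionV3(weights='imagenet', include_top=False)

# Attach a new classification head for our own classes
x = GlobalAveragePooling2D()(base_model.output)
x = Dense(1024, activation='relu')(x)
predictions = Dense(n_classes, activation='softmax')(x)
model = Model(inputs=base_model.input, outputs=predictions)

# First train only the new head: freeze the pre-trained layers
for layer in base_model.layers:
    layer.trainable = False

model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
# model.fit(...) on your own images; optionally unfreeze the top
# Inception blocks afterwards and fine-tune with a low learning rate
```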
In the Keras docs, you may read about `image_data_format`. By default, this is `channels_last`, which is compatible with scikit-image's storage of `(rows, cols, ch)`.
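For example, a minimal sketch of preparing a scikit-image image for a Keras model (the resize to 299×299 simply matches InceptionV3's default input size):

```python
import numpy as np
from skimage import data, transform

# scikit-image images are (rows, cols, channels), i.e. channels-last
image = transform.resize(data.astronaut(), (299, 299))

# Keras models additionally expect a leading batch dimension:
# (batch, rows, cols, channels)
batch = image[np.newaxis, ...].astype(np.float32)
print(batch.shape)  # (1, 299, 299, 3)
```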
SciPy: LowLevelCallable
https://ilovesymposia.com/2017/03/12/scipys-new-lowlevelcallable-is-a-game-changer/
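The post shows how to pass a Numba-compiled callback into scipy.ndimage without per-pixel Python-call overhead. A minimal sketch in that spirit (here a 3×3 maximum filter; the callback signature is the one `ndimage.generic_filter` expects):

```python
import numpy as np
from numba import cfunc, carray
from numba.types import intc, intp, float64, voidptr, CPointer
from scipy import LowLevelCallable, ndimage

# Callback signature required by ndimage.generic_filter:
# int callback(double *buffer, npy_intp filter_size,
#              double *return_value, void *user_data)
@cfunc(intc(CPointer(float64), intp, CPointer(float64), voidptr))
def nb_max(values_ptr, len_values, result, data):
    values = carray(values_ptr, (len_values,), dtype=float64)
    out = values[0]
    for v in values:
        if v > out:
            out = v
    result[0] = out
    return 1  # success

image = np.random.random((512, 512))
filtered = ndimage.generic_filter(image,
                                  LowLevelCallable(nb_max.ctypes),
                                  size=3)
```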
Parallel and batch processing
Joblib (developed by the scikit-learn team) is used for:
transparent disk-caching of the output values and lazy re-evaluation (memoize pattern)
simple parallel computing (sketched below)
logging and tracing of the execution
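For instance, a small sketch combining the memoize pattern with Parallel/delayed; the cache directory and the Gaussian-smoothing task are arbitrary examples:

```python
from joblib import Memory, Parallel, delayed
from skimage import data, filters

# Disk-cache results: repeated calls with the same arguments are
# loaded from '/tmp/joblib_cache' instead of being recomputed
memory = Memory('/tmp/joblib_cache', verbose=0)

@memory.cache
def smooth(image, sigma):
    return filters.gaussian(image, sigma=sigma)

image = data.camera()

# Run the smoothing jobs on four processes in parallel
results = Parallel(n_jobs=4)(
    delayed(smooth)(image, sigma) for sigma in (1, 2, 4, 8)
)
```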
Dask is a parallel computing library. It has two components:
Dynamic task scheduling optimized for computation. This is similar to Airflow, Luigi, Celery, or Make, but optimized for interactive computational workloads.
“Big Data” collections like parallel arrays, dataframes, and lists that extend common interfaces like NumPy, Pandas, or Python iterators to larger-than-memory or distributed environments. These parallel collections run on top of the dynamic task schedulers.
See Matt Rocklin's blogpost for a more detailed example.
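A minimal sketch of the dask.array collection (the array shape and chunk sizes are arbitrary):

```python
import dask.array as da

# A large array split into 1000x1000 NumPy chunks; operations on it
# build a task graph lazily instead of executing immediately
x = da.random.random((10000, 10000), chunks=(1000, 1000))

# Familiar NumPy-style expressions...
y = (x + x.T).mean(axis=0)

# ...are only executed, chunk by chunk and in parallel, on compute()
result = y.compute()
print(result.shape)  # (10000,)
```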