3 projects lighting a fire under machine learning

PyTorch, MXNet, and upstart Java-centric Smile bring greater speed and ease to model training and deployment

Mention machine learning, and many common frameworks pop into mind, from “old” stalwarts like Scikit-learn to juggernauts like Google’s TensorFlow. But the field is large and diverse, and useful innovations are bubbling up across the landscape.

Recent releases from three open source projects continue the march toward making machine learning faster, more scalable, and easier to use. PyTorch and Apache MXNet bring GPU support to machine learning and deep learning in Python.

Smile promises speed, convenience, and a comprehensive library of machine learning algorithms to Java developers. Read on for the finer details.

PyTorch 0.2.0

Pythonistas have easily the biggest roster of machine learning libraries to choose from. PyTorch, a deep learning tensor library with GPU acceleration, provides yet another compelling option.

PyTorch was built first and foremost to be used in the Python ecosystem. Many of its functions can either replace or complement Python math-and-stats packages like NumPy, and it extends Python’s multiprocessing module so that tensors can be shared in memory between processes. But PyTorch is meant to be useful to almost anyone doing deep learning or machine learning, since it includes features such as the ability to modify an existing, trained neural network without having to start from scratch.

The 0.2.0 release of PyTorch unveils a major architectural change, dubbed Distributed PyTorch. Tensors in PyTorch can now be scaled and distributed across multiple machines for faster processing. Multiple network back ends let you choose a model for scaling that best suits your network topology or infrastructure. The Gloo library, for instance, can use Nvidia’s GPUDirect interconnect for fast transfers between GPUs on different machines.

Other additions to PyTorch 0.2.0 include:

Tensor broadcasting. Python’s NumPy library allows arrays with different but compatible shapes to be treated as if they were the same size, as a way to speed up processing across multiple arrays. This technique, called “broadcasting,” can now be performed on PyTorch tensors as well, without the explicit copy operations that kill performance and eat up memory.
Higher-order gradients. This often-requested addition lets you take gradients of gradients, which opens up a whole new class of computations (e.g., unrolled generative adversarial networks) directly in PyTorch. However, to take full advantage of it, you need to write certain kinds of PyTorch functions in a new style. (Old-style functions will continue to work.)
Advanced tensor and variable indexing. Another concept borrowed from NumPy, advanced indexing lets you select arbitrary slices of a tensor without having to jump through additional programming hoops.
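
The three additions above can be sketched in a few lines. This is a minimal illustration written against a current PyTorch release, not the 0.2.0 API itself (in 0.2.0, tensors had to be wrapped in a Variable before autograd could track them). The variable names and values are made up for the example.

```python
import torch

# Tensor broadcasting: a (3, 1) column and a (1, 4) row are treated as if
# both had shape (3, 4), with no manual expand or repeat step.
col = torch.arange(3.0).reshape(3, 1)   # [[0.], [1.], [2.]]
row = torch.arange(4.0).reshape(1, 4)   # [[0., 1., 2., 3.]]
grid = col * 10 + row                   # shape (3, 4)

# Higher-order gradients: differentiate y = x^3 twice at x = 3.
x = torch.tensor(3.0, requires_grad=True)
y = x ** 3
# create_graph=True records the gradient computation itself,
# so the first derivative can be differentiated again.
(dy_dx,) = torch.autograd.grad(y, x, create_graph=True)   # 3 * x^2
(d2y_dx2,) = torch.autograd.grad(dy_dx, x)                # 6 * x

# Advanced indexing: pick arbitrary elements with index lists or a mask.
t = torch.arange(12).reshape(3, 4)
picked = t[[0, 2], [1, 3]]   # elements at (row 0, col 1) and (row 2, col 3)
masked = t[t > 8]            # every element greater than 8

print(grid)
print(dy_dx.item(), d2y_dx2.item())
print(picked.tolist(), masked.tolist())
```

The same indexing and broadcasting expressions work on NumPy arrays, which is the point of the borrowing: code written in NumPy style carries over to tensors with little change.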

Apache MXNet 0.11.0

If TensorFlow is Google’s answer to deep learning, Apache MXNet is Amazon’s. The latest version of the MXNet framework, designed to work at massive scale, adds only a couple of new features. But both are intended to make MXNet useful to developers working to bring machine learning intelligence to user-facing products:

Support for MXNet models on Apple devices via the Core ML model format. Core ML is Apple’s framework for running machine learning models compact enough for smartphone-grade devices. MXNet models can now be converted to the Core ML format, so models trained in the cloud (specifically, Amazon’s) can be converted and deployed on iOS or macOS devices. The toolset used to perform the conversion can also convert models from the Caffe framework, so it isn’t as if MXNet has a monopoly on targeting Core ML.
Keras 1.2 support. Python library Keras makes it easier to program neural networks by way of frameworks like TensorFlow, Theano, and MXNet. MXNet now works with Keras 1.2, and allows Keras to scale across multiple GPUs when used as a back end.

Smile 1.4.0

The name is an acronym: Statistical Machine Intelligence and Learning Engine. Smile gives you a broad set of algorithms out of the box, from basics like classification and regression to sophisticated offerings like natural language processing. And all you need is Java, or any JVM language.

According to Smile’s quick-start document, the point of using Java APIs is to allow model training and implementation to be done in the same environment. Models created in Smile can be piped into other Java apps, including Apache Spark, using Java’s native serialization methods. For data visualization, Smile provides a library called SmilePlot, built with Swing.

Other languages that run on the JVM—in particular, Scala—can plug into Smile and build atop it. That said, a downside of Smile’s Java-centric approach is the complete absence of native support for non-JVM languages. Machine learning mavens relying on Python will either have to roll their own wrappers or switch to Java.