MLJ Setup for User Testing

Anthony Blaom, PhD edited this page Jun 16, 2019 · 9 revisions

These instructions are aimed at users with little or no prior experience of Julia.

Important note. Query any method's doc string in Julia with ?methodname at the REPL prompt.

Essentials

  • Download and install Julia.

  • Run the Julia REPL from the command line with $ julia. You should get a julia> prompt.

As in Python, a Julia environment specifies a collection of locally installed, versioned packages that can be safely loaded together. Julia can only load code from a package contained in the currently active environment.
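
As a minimal sketch of this idea (not part of the MLJ setup itself), the following uses Julia's built-in Pkg library to activate a throwaway environment in a temporary directory and then asks Julia which project file is active. The names env_dir and project_file are illustrative assumptions.

```julia
using Pkg

# Create a scratch directory to hold a throwaway environment (illustrative only).
env_dir = mktempdir()

# Make it the active environment; packages added from here on are recorded in it.
Pkg.activate(env_dir)

# Path of the Project.toml that defines (or will define) the active environment.
project_file = Base.active_project()
println("Active project: ", project_file)

# Return to the default environment.
Pkg.activate()
```

The Project.toml file itself is only written once a package is added to the environment.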

Create an environment called sprint for MLJ user testing as follows:

  • From the REPL enter the package manager by typing ]. You should now get a (v1.1) pkg> prompt or similar.

  • Enter these instructions:

activate --shared sprint
add MLJ#master
add MLJModels#master

Hit backspace (delete on a Mac keyboard) at the empty pkg> prompt to exit the package manager and return to the julia> prompt.
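
The same steps can also be scripted with the functional Pkg API, which may be convenient outside the REPL. This is a hedged sketch: the wrapper name setup_sprint is hypothetical, and calling it needs network access and assumes the master branches named above still exist.

```julia
using Pkg

# Hypothetical helper mirroring the package-manager commands above.
function setup_sprint()
    # Same as `activate --shared sprint`
    Pkg.activate("sprint", shared=true)
    # Same as `add MLJ#master` and `add MLJModels#master`
    Pkg.add(Pkg.PackageSpec(name="MLJ", rev="master"))
    Pkg.add(Pkg.PackageSpec(name="MLJModels", rev="master"))
end

# Call setup_sprint() once to create and populate the environment.
```

Defining the steps as a function means nothing is downloaded until setup_sprint() is actually called.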

You can now load MLJ itself with using MLJ. Entering the following in the REPL should now work:

models() # return a dictionary of MLJ registered models, keyed on package name
task = load_boston()
model = KNNRegressor(K=3)
mach = machine(model, task)
evaluate!(mach)

To load the code defining a model listed by models() into MLJ you must first add the package defining the model to your active environment. For example, having run add DecisionTree in the package manager, you can then run these commands at the julia> prompt:

@load DecisionTreeRegressor                 # load package code defining model
model = DecisionTreeRegressor(max_depth=4)  # instantiate model

Other auxiliary packages you may want to add to sprint now are: DataFrames, RDatasets, Plots (for plotting), and either PyPlot or Plotly (plotting backends). Load any of them with the using command at the REPL prompt, for example using DataFrames.

For more on using MLJ, see the MLJ documentation.

You can search all Julia packages and their documentation from here.

Exit the REPL with ctrl-D. Activate your sprint environment in each new REPL session with ]activate --shared sprint.

Running Jupyter notebooks containing Julia script

To run or create Julia notebooks in Jupyter you need to install a Julia kernel for Jupyter. Do this by following these instructions. To test this functionality and tour some of MLJ's features:

  • Clone the MLJ repo

  • Run jupyter notebook tour.ipynb from the subdirectory examples/tour/

The notebooks in examples activate and instantiate locally defined environments (given by the Project.toml and Manifest.toml files), so you do not need to set up an environment for them yourself. If you do want to activate your sprint environment from a notebook, run a cell with this code in it:

using Pkg
Pkg.activate("sprint", shared=true)

Other ways to interact with Julia

There is an Emacs julia-repl mode, a VS Code extension, and a Julia IDE called Juno.

Working with tabular data

MLJ supports any Julia format for tabular data, in-memory or out-of-memory, that implements the common API specified in Tables.jl. The most familiar in-memory format for Python and R users is the DataFrame format from DataFrames.jl. DataFrame objects are read from CSV-like files using a separate package, CSV.jl, as in this example (add CSV to your environment if necessary):

using CSV
df = CSV.read("iris.csv")
df.sepal_length == df[1]  # true
df[1:10, :]               # first 10 rows

A very basic but up-to-date tutorial on using DataFrames is here.
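
DataFrames are not the only option. As a small illustration that needs no extra packages, a named tuple of equal-length vectors is one of the in-memory formats that Tables.jl treats as a table, and its columns can be accessed much like those of the DataFrame above. The name column_table and the values below are illustrative assumptions, not data read from iris.csv.

```julia
# A column table: a named tuple whose fields are equal-length vectors.
# The data here is illustrative, not read from iris.csv.
column_table = (sepal_length = [5.1, 4.9, 4.7],
                species = ["setosa", "setosa", "setosa"])

# Access a column by name or by position.
by_name = column_table.sepal_length
by_position = column_table[1]

println(by_name == by_position)
```

This can be handy for quick experiments when you do not want to create a DataFrame.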