MLJ Setup for User Testing
These instructions are aimed at users with little or no prior experience of Julia.
Important note. Query any method's doc string in Julia with ?methodname
at the REPL prompt.
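As a small illustration of the note above (a sketch only; println is just an arbitrary example of a method name): typing ? at an empty julia> prompt switches the REPL to help mode, and the @doc macro performs the same lookup from ordinary code.

```julia
# Interactive form: type ? at the julia> prompt (it changes to help?>),
# then type the name you want to look up, e.g.
#   help?> println

# Programmatic form: @doc returns the same documentation as an object.
docs = @doc println
display(docs)
```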
- Download and install Julia.
- Run the Julia REPL from the command line with
$ julia
You should get a julia> prompt.
As in Python, a Julia environment specifies a collection of locally installed versioned packages that can be safely loaded simultaneously. Julia can only load code from a package contained in the currently active environment.
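To see which environment is currently active and what it contains, something like the following can be run at the julia> prompt. This is a minimal sketch using Julia's built-in Pkg standard library; the output depends entirely on your own installation.

```julia
using Pkg

# Path to the Project.toml file of the currently active environment
project_file = Base.active_project()
println(project_file)

# List the packages (and versions) recorded in the active environment
Pkg.status()
```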
Create an environment called sprint for MLJ user testing as follows:
- From the REPL, enter the package manager by typing ] at the prompt. You should now get a (v1.1) pkg> prompt or similar.
- Enter these instructions:
activate --shared sprint
add MLJ#master
add MLJModels#master
Hit backspace (the key labeled delete on Mac keyboards) to exit the package manager.
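The same setup can also be scripted from the julia> prompt instead of the pkg> prompt. The sketch below assumes Pkg's functional API (Pkg.activate with the shared keyword and Pkg.PackageSpec with name and rev) behaves as the equivalent of the package manager commands above; it needs network access and modifies your shared sprint environment.

```julia
using Pkg

# Equivalent of: activate --shared sprint
Pkg.activate("sprint", shared=true)

# Equivalent of: add MLJ#master and add MLJModels#master
# (rev="master" tracks the master branch rather than a registered release)
Pkg.add(Pkg.PackageSpec(name="MLJ", rev="master"))
Pkg.add(Pkg.PackageSpec(name="MLJModels", rev="master"))
```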
You can now load MLJ itself with using MLJ. Entering the following in the REPL should now work:
models() # return a dictionary of MLJ registered models, keyed on package name
task = load_boston()
model = KNNRegressor(K=3)
mach = machine(model, task)
evaluate!(mach)
To load the code defining a model listed by models() into MLJ, you must first add the package defining the model to your active environment. For example, having run add DecisionTree in the package manager, you can then run these commands at the julia> prompt:
@load DecisionTreeRegressor # load package code defining model
model = DecisionTreeRegressor(max_depth=4) # instantiate model
Other auxiliary packages you may want to add to sprint now are: DataFrames, RDatasets, Plots (for plotting), and either PyPlot or Plotly (plotting backends). To load any of these packages, run the using command at the REPL prompt, as in using DataFrames.
For more on using MLJ, see the MLJ documentation.
You can search all Julia packages and their documentation from here.
Exit the REPL with ctrl-D. In each new REPL session, activate your sprint environment by typing ]activate --shared sprint at the julia> prompt.
Running or creating Julia notebooks in Jupyter requires a Julia kernel for Jupyter. Install one by following these instructions. To test this functionality and tour some of MLJ's features:
- Clone the MLJ repo
- Run
jupyter notebook tour.ipynb
from the subdirectory examples/tour/
The notebooks in examples activate and instantiate locally defined environments (given by the Project.toml and Manifest.toml files), so there is no need to worry about that. If you do want to activate your sprint environment from a notebook, you need to run a cell with this code in it:
using Pkg
Pkg.activate("sprint", shared=true)
There is an Emacs julia-repl mode, a VS Code extension, and a Julia IDE called Juno.
MLJ supports any Julia format for tabular data, in-memory or out-of-memory, that implements the common API specified in Tables.jl. The most familiar in-memory format for Python and R users is the DataFrame format from DataFrames.jl. DataFrame objects are read from CSV-like files using a separate package, CSV.jl, as in this example (add CSV to your environment if necessary):
using CSV
df = CSV.read("iris.csv")
df.sepal_length == df[1] # true
df[1:10, :] # first 10 rows
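If you have no CSV file to hand, the same kind of access can be tried on a DataFrame built directly in memory. This is a minimal sketch; the column names and values are made up for illustration and are not taken from the iris data set.

```julia
using DataFrames

# Build a small table from keyword arguments (column name = column values)
df = DataFrame(sepal_length = [5.1, 4.9, 4.7],
               species = ["setosa", "setosa", "setosa"])

df.sepal_length   # access a column by name
df[1:2, :]        # first two rows, all columns
size(df)          # (3, 2): three rows and two columns
```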
A very basic but up-to-date tutorial on using DataFrames is here.