From 7bbd7e84c136e67fb854ff3b0fe951222a718ff7 Mon Sep 17 00:00:00 2001
From: "Documenter.jl"
Date: Wed, 3 Jul 2024 16:16:43 +0000
Subject: [PATCH] build based on 5ca73e4

Home · JudiLing.jl

JudiLing

JudiLing: An implementation for Linear Discriminative Learning in Julia

Maintainer: Maria Heitmeier @MariaHei
Original codebase: Xuefeng Luo @MegamindHenry

Installation

You can install JudiLing with the following commands:

using Pkg
Pkg.add("JudiLing")

For brave adventurers, install the test version of JudiLing with:

julia> Pkg.add(url="https://github.com/quantling/JudiLing.jl.git")

Or from the Julia REPL, type ] to enter the Pkg REPL mode and run

pkg> add https://github.com/quantling/JudiLing.jl.git

Running Julia with multiple threads

JudiLing supports the use of multiple threads. Simply start up Julia in your terminal as follows:

$ julia -t your_num_of_threads

For detailed information on using Julia with threads, see the Julia documentation on multi-threading.
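You can confirm the thread count from inside Julia (a quick sanity check, not part of the original instructions):

julia> Threads.nthreads()
4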

Include packages

Before we start, we first need to load the JudiLing package:

using JudiLing

Note: As of JudiLing 0.8.0, PyCall and Flux have become optional dependencies. This means that all code in JudiLing which requires calls to Python is only available if PyCall is loaded first, like this:

using PyCall
using JudiLing

Likewise, the code involving deep learning is only available if Julia's deep learning library Flux is loaded first, like this:

using Flux
using JudiLing

Note that Flux and PyCall have to be installed separately, and the newest version of Flux requires at least Julia 1.9. If you want to run deep learning on a GPU, make sure to also install and import CUDA.

Quick start example

The Latin dataset latin.csv contains lexemes and inflectional features for 672 inflected Latin verb forms for 8 lexemes from 4 conjugation classes. Word forms are inflected for person, number, tense, voice and mood.

"","Word","Lexeme","Person","Number","Tense","Voice","Mood"
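The full quick start walks through building the cue and semantic matrices, learning the mappings, and evaluating them; only its final accuracy check survives in this diff. As a rough sketch of that workflow (a minimal example following the JudiLing API documented in the manual pages, not the verbatim quick-start code):

using JudiLing, CSV, DataFrames

latin = DataFrame(CSV.File("latin.csv"))

# build the cue matrix C from triphones of the word forms
cue_obj = JudiLing.make_cue_matrix(latin, grams = 3, target_col = :Word)

# simulate a semantic matrix S from lexemes and inflectional features
S = JudiLing.make_S_matrix(latin, [:Lexeme],
    [:Person, :Number, :Tense, :Voice, :Mood], ncol = size(cue_obj.C, 2))

# learn the comprehension (F) and production (G) mappings
F = JudiLing.make_transform_matrix(cue_obj.C, S)
G = JudiLing.make_transform_matrix(S, cue_obj.C)

# evaluate comprehension and production accuracy
@show JudiLing.eval_SC(cue_obj.C * F, S)
@show JudiLing.eval_SC(S * G, cue_obj.C)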
@show acc_build_val

Output:

acc_learn_train = 0.9983
acc_learn_val = 0.6866
acc_build_train = 1.0
acc_build_val = 0.3284

Alternatively, we provide a wrapper function that incorporates all of the above functionality. With this function, you can quickly explore datasets under different parameter settings. Please find more details in the Test Combo Introduction.

Supports

The outputs contain two types of support: an utterance-level support and a set of supports for each cue. The former is also called "synthesis-by-analysis" support; it is calculated from the predicted and original S vectors and is used to select the best paths. Cue-level supports are slices of the Yt matrices at each timestep; they are used to determine whether a cue is eligible for constructing paths.

Acknowledgments

This project was supported by the ERC advanced grant WIDE-742545 and by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy - EXC number 2064/1 - Project number 390727645.

Citation

If you find this package helpful, please cite it as follows:

Luo, X., Heitmeier, M., Chuang, Y. Y., Baayen, R. H. JudiLing: an implementation of the Discriminative Lexicon Model in Julia. Eberhard Karls Universität Tübingen, Seminar für Sprachwissenschaft.

The following studies have made use of several algorithms now implemented in JudiLing instead of WpmWithLdl:

  • Baayen, R. H., Chuang, Y. Y., Shafaei-Bajestan, E., and Blevins, J. P. (2019). The discriminative lexicon: A unified computational model for the lexicon and lexical processing in comprehension and production grounded not in (de)composition but in linear discriminative learning. Complexity, 2019, 1-39.

  • Baayen, R. H., Chuang, Y. Y., and Blevins, J. P. (2018). Inflectional morphology with linear mappings. The Mental Lexicon, 13 (2), 232-270.

  • Chuang, Y.-Y., Lõo, K., Blevins, J. P., and Baayen, R. H. (2020). Estonian case inflection made simple. A case study in Word and Paradigm morphology with Linear Discriminative Learning. In Körtvélyessy, L., and Štekauer, P. (Eds.) Complex Words: Advances in Morphology, 1-19.

  • Chuang, Y.-Y., Bell, M. J., Banke, I., and Baayen, R. H. (2020). Bilingual and multilingual mental lexicon: a modeling study with Linear Discriminative Learning. Language Learning, 1-55.

  • Heitmeier, M., Chuang, Y.-Y., and Baayen, R. H. (2021). Modeling morphology with Linear Discriminative Learning: considerations and design choices. Frontiers in Psychology, 12, 4929.

  • Denistia, K., and Baayen, R. H. (2022). The morphology of Indonesian: Data and quantitative modeling. In Shei, C., and Li, S. (Eds.) The Routledge Handbook of Asian Linguistics, (pp. 605-634). Routledge, London.

  • Heitmeier, M., Chuang, Y.-Y., and Baayen, R. H. (2023). How trial-to-trial learning shapes mappings in the mental lexicon: Modelling lexical decision with linear discriminative learning. Cognitive Psychology, 1-30.

  • Chuang, Y. Y., Kang, M., Luo, X. F. and Baayen, R. H. (2023). Vector Space Morphology with Linear Discriminative Learning. In Crepaldi, D. (Ed.) Linguistic morphology in the mind and brain.

  • Heitmeier, M., Chuang, Y. Y., Axen, S. D., & Baayen, R. H. (2024). Frequency effects in linear discriminative learning. Frontiers in Human Neuroscience, 17, 1242720.

  • Plag, I., Heitmeier, M. & Domahs, F. (to appear). German nominal number interpretation in an impaired mental lexicon: A naive discriminative learning perspective. The Mental Lexicon.

All Manual index · JudiLing.jl

Cholesky · JudiLing.jl

Cholesky

JudiLing.make_transform_facFunction

The first part of making the transformation matrix, usually used by the learn_paths function to save time and computing resources.

source
JudiLing.make_transform_matrixMethod
make_transform_matrix(fac::Union{LinearAlgebra.Cholesky, SuiteSparse.CHOLMOD.Factor}, X::Union{SparseMatrixCSC, Matrix}, Y::Union{SparseMatrixCSC, Matrix})

Second step in calculating the Cholesky decomposition for the transformation matrix.

source
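To see how the two steps fit together, here is a minimal sketch (reusing the cue_obj and S names from the examples in this manual; this is not an example from the original docstring):

# step 1: factorize the C matrix once
fac_C = JudiLing.make_transform_fac(cue_obj.C)

# step 2: reuse the factorization to solve for the mapping from C to S
F = JudiLing.make_transform_matrix(fac_C, cue_obj.C, S)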
JudiLing.make_transform_matrixMethod
make_transform_matrix(X::SparseMatrixCSC, Y::Matrix)

Use Cholesky decomposition to calculate the transformation matrix from X to Y, where X is a sparse matrix and Y is a dense matrix.

Obligatory Arguments

  • X::SparseMatrixCSC: the X matrix, where X is a sparse matrix
  • Y::Matrix: the Y matrix, where Y is a dense matrix

Optional Arguments

  • method::Symbol = :additive: whether :additive or :multiplicative decomposition is required
  • shift::Float64 = 0.02: shift value for :additive decomposition
  • multiplier::Float64 = 1.01: multiplier value for :multiplicative decomposition
  • output_format::Symbol = :auto: force the output format to dense (:dense) or sparse (:sparse); with :auto the format is determined by the program
  • sparse_ratio::Float64 = 0.05: the ratio to decide whether a matrix is sparse
  • verbose::Bool = false: if true, more information will be printed out

Examples

# additive mode
 JudiLing.make_transform_matrix(
     C,
     S,
   ...
     output_format = :auto,
     sparse_ratio = 0.05,
    ...)
source
JudiLing.make_transform_matrixMethod
make_transform_matrix(X::Matrix, Y::Union{SparseMatrixCSC, Matrix})

Use the Cholesky decomposition to calculate the transformation matrix from X to Y, where X is a dense matrix and Y is either a dense matrix or a sparse matrix.

Obligatory Arguments

  • X::Matrix: the X matrix, where X is a dense matrix
  • Y::Union{SparseMatrixCSC, Matrix}: the Y matrix, where Y is either a sparse or a dense matrix

Optional Arguments

  • method::Symbol = :additive: whether :additive or :multiplicative decomposition is required
  • shift::Float64 = 0.02: shift value for :additive decomposition
  • multiplier::Float64 = 1.01: multiplier value for :multiplicative decomposition
  • output_format::Symbol = :auto: force the output format to dense (:dense) or sparse (:sparse); with :auto the format is determined by the program
  • sparse_ratio::Float64 = 0.05: the ratio to decide whether a matrix is sparse
  • verbose::Bool = false: if true, more information will be printed out

Examples

# additive mode
 JudiLing.make_transform_matrix(
     C,
     S,
     ...
     output_format = :auto,
     sparse_ratio = 0.05,
    ...)
source
JudiLing.make_transform_matrixMethod
make_transform_matrix(X::SparseMatrixCSC, Y::SparseMatrixCSC)

Use the Cholesky decomposition to calculate the transformation matrix from X to Y, where X is a sparse matrix and Y is a sparse matrix.

Obligatory Arguments

  • X::SparseMatrixCSC: the X matrix, where X is a sparse matrix
  • Y::SparseMatrixCSC: the Y matrix, where Y is a sparse matrix

Optional Arguments

  • method::Symbol = :additive: whether :additive or :multiplicative decomposition is required
  • shift::Float64 = 0.02: shift value for :additive decomposition
  • multiplier::Float64 = 1.01: multiplier value for :multiplicative decomposition
  • output_format::Symbol = :auto: force the output format to dense (:dense) or sparse (:sparse); with :auto the format is determined by the program
  • sparse_ratio::Float64 = 0.05: the ratio to decide whether a matrix is sparse
  • verbose::Bool = false: if true, more information will be printed out

Examples

# additive mode
 JudiLing.make_transform_matrix(
     C,
     S,
     ...
     output_format = :auto,
     sparse_ratio = 0.05,
    ...)
source
JudiLing.format_matrixFunction
format_matrix(M::Union{SparseMatrixCSC, Matrix}, output_format=:auto)

Convert output matrix format to either a dense matrix or a sparse matrix.

source
Deep learning · JudiLing.jl

Deep learning in JudiLing

JudiLing.predict_from_deep_modelMethod
predict_from_deep_model(model::Chain,
                        X::Union{SparseMatrixCSC,Matrix})

Generates output of a model given input X.

Obligatory arguments

  • model::Chain: Model of type Flux.Chain, as generated by get_and_train_model
  • X::Union{SparseMatrixCSC,Matrix}: Input matrix of size (number_of_samples, inp_dim) where inp_dim is the input dimension of model
source
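A minimal usage sketch (assuming res is the named tuple returned by get_and_train_model below, and cue_obj_val and S_val are the validation cue object and semantic matrix; not from the original docstring):

Shat_val = JudiLing.predict_from_deep_model(res.model, cue_obj_val.C)
JudiLing.eval_SC(Shat_val, S_val)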
JudiLing.predict_shatMethod
predict_shat(model::Chain,
+             ci::Vector{Int})

Predicts the semantic vector shat, given a deep learning comprehension model (model) and a list of ngram indices (ci).

Obligatory arguments

  • model::Chain: Deep learning comprehension model as generated by get_and_train_model
  • ci::Vector{Int}: Vector of indices of ngrams in c vector. Essentially, this is a vector indicating which ngrams in a c vector are absent and which are present.
source
JudiLing.get_and_train_modelMethod
get_and_train_model(X_train::Union{SparseMatrixCSC,Matrix},
                     Y_train::Union{SparseMatrixCSC,Matrix},
                     X_val::Union{SparseMatrixCSC,Matrix,Missing},
                     Y_val::Union{SparseMatrixCSC,Matrix,Missing},
                     ...kargs
                     )

Trains a deep learning model from X_train to Y_train, saving the model with either the highest validation accuracy or lowest validation loss (depending on optimise_for_acc) to model_outpath.

The default model looks like this:

inp_dim = size(X_train, 2)
 out_dim = size(Y_train, 2)
Chain(Dense(inp_dim => hidden_dim, relu), Dense(hidden_dim => out_dim))

Any other model with the same input and output dimensions can be provided to the function with the model argument. The default loss function is mean squared error, but any other loss function can be provided, as long as it fits with the model architecture.

By default the adam optimizer (Kingma and Ba, 2015) with learning rate 0.001 is used. You can provide any other optimizer. If you want to use a different learning rate, e.g. 0.01, provide optimizer=Flux.Adam(0.01). If you do not want to use an optimizer at all, and simply use normal gradient descent, provide optimizer=Descent(0.001), again replacing the learning rate with the learning rate of your preference.

Returns a named tuple with the following values:

  • model: the trained model
  • data_train: the training data, including any measures if computed by measures_func
  • data_val: the validation data, including any measures if computed by measures_func
  • losses_train: The losses of the training data for each epoch.
  • losses_val: The losses of the validation data after each epoch.
  • accs_train: The accuracies of the training data after each epoch, if return_train_acc=true.
  • accs_val: The accuracies of the validation data after each epoch.

Obligatory arguments

  • X_train::Union{SparseMatrixCSC,Matrix}: training input matrix of dimension m x n
  • Y_train::Union{SparseMatrixCSC,Matrix}: training output/target matrix of dimension m x k
  • X_val::Union{SparseMatrixCSC,Matrix,Missing}: validation input matrix of dimension l x n
  • Y_val::Union{SparseMatrixCSC,Matrix,Missing}: validation output/target matrix of dimension l x k
  • data_train::DataFrame: training data
  • data_val::DataFrame: validation data
  • target_col::Union{Symbol, String}: column with target wordforms in data_train and data_val
  • model_outpath::String: filepath to where final model should be stored (in .bson format)

Optional arguments

  • hidden_dim::Int=1000: hidden dimension of the model
  • n_epochs::Int=100: number of epochs for which the model should be trained
  • batchsize::Int=64: batchsize during training
  • loss_func::Function=Flux.mse: Loss function. Per default this is the mse loss, but other options might be a crossentropy loss (Flux.crossentropy). Make sure the model makes sense with the loss function!
  • optimizer=Flux.Adam(0.001): optimizer to use for training
  • model::Union{Missing, Chain} = missing: A custom model can be provided for training. Its requirements are that it has to correspond to the input and output size of the training and validation data
  • early_stopping::Union{Missing, Int}=missing: If missing, no early stopping is used. Otherwise early_stopping indicates how many epochs have to pass without improvement in validation accuracy before the training is stopped.
  • optimise_for_acc::Bool=false: if true, keep model with highest validation accuracy. If false, keep model with lowest validation loss.
  • return_losses::Bool=false: whether, in addition to the model, the per-epoch losses for the training and validation data as well as the per-epoch accuracies on the validation data should be returned
  • verbose::Bool=true: Turn on verbose mode
  • measures_func::Union{Missing, Function}=missing: A measures function which is run at the end of every epoch. For more information see The measures_func argument. If a measure is tagged for each epoch, the one tagged with "final" will be the one for the finally returned model.
  • return_train_acc::Bool=false: If true, a vector with training accuracies is returned at the end of the training.
  • ...kargs: any additional keyword arguments are passed to the measures_func
source
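A usage sketch with a train/validation split (the matrix and data names follow the quick start; the hyperparameter values are illustrative, not recommendations from the original docstring):

using Flux, JudiLing

res = JudiLing.get_and_train_model(
    cue_obj_train.C, S_train, cue_obj_val.C, S_val,
    latin_train, latin_val, :Word, "latin_model.bson",
    n_epochs = 100, early_stopping = 10, optimise_for_acc = true)

res.model     # the trained Flux chain
res.accs_val  # per-epoch validation accuracies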
JudiLing.get_and_train_modelMethod
get_and_train_model(X_train::Union{SparseMatrixCSC,Matrix},
                     Y_train::Union{SparseMatrixCSC,Matrix},
                     model_outpath::String;
                     data_train::Union{Missing, DataFrame}=missing,
                     return_train_acc::Bool=false,
                     ...kargs)

Trains a deep learning model from X_train to Y_train, saving the model after n_epochs epochs. The default model looks like this:

inp_dim = size(X_train, 2)
 out_dim = size(Y_train, 2)
Chain(Dense(inp_dim => hidden_dim, relu), Dense(hidden_dim => out_dim))

Any other model with the same input and output dimensions can be provided to the function with the model argument. The default loss function is mean squared error, but any other loss function can be provided, as long as it fits with the model architecture.

By default the adam optimizer (Kingma and Ba, 2015) with learning rate 0.001 is used. You can provide any other optimizer. If you want to use a different learning rate, e.g. 0.01, provide optimizer=Flux.Adam(0.01). If you do not want to use an optimizer at all, and simply use normal gradient descent, provide optimizer=Descent(0.001), again replacing the learning rate with the learning rate of your preference.

Returns a named tuple with the following values:

  • model: the trained model
  • data_train: the data, including any measures if computed by measures_func
  • data_val: missing for this function
  • losses_train: The losses of the training data for each epoch.
  • losses_val: missing for this function
  • accs_train: The accuracies of the training data after each epoch, if return_train_acc=true.
  • accs_val: missing for this function

Obligatory arguments

  • X_train::Union{SparseMatrixCSC,Matrix}: training input matrix of dimension m x n
  • Y_train::Union{SparseMatrixCSC,Matrix}: training output/target matrix of dimension m x k
  • model_outpath::String: filepath to where final model should be stored (in .bson format)

Optional arguments

  • data_train::Union{Missing, DataFrame}=missing: The training data. Only necessary if a measures_func is included or return_train_acc=true.
  • target_col::Union{Missing, Symbol, String}=missing: The column with target word forms in the training data. Only necessary if a measures_func is included or return_train_acc=true.
  • hidden_dim::Int=1000: hidden dimension of the model
  • n_epochs::Int=100: number of epochs for which the model should be trained
  • batchsize::Int=64: batchsize during training
  • loss_func::Function=Flux.mse: Loss function. Per default this is the mse loss, but other options might be a crossentropy loss (Flux.crossentropy). Make sure the model makes sense with the loss function!
  • optimizer=Flux.Adam(0.001): optimizer to use for training
  • model::Union{Missing, Chain} = missing: A custom model can be provided for training. Its requirements are that it has to correspond to the input and output size of the training and validation data
  • return_losses::Bool=false: whether, in addition to the model, the per-epoch losses for the training and validation data as well as the per-epoch accuracies on the validation data should be returned
  • verbose::Bool=true: Turn on verbose mode
  • measures_func::Union{Missing, Function}=missing: A measures function which is run at the end of every epoch. For more information see The measures_func argument.
  • return_train_acc::Bool=false: If true, a vector with training accuracies is returned at the end of the training.
  • ...kargs: any additional keyword arguments are passed to the measures_func
source
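A usage sketch for this train-only method (cue_obj, S and latin as in the quick start; the hyperparameter values are illustrative):

using Flux, JudiLing

res = JudiLing.get_and_train_model(cue_obj.C, S, "latin_model.bson",
    data_train = latin, target_col = :Word,
    hidden_dim = 500, n_epochs = 50, batchsize = 32)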
JudiLing.fiddlMethod
fiddl(X_train::Union{SparseMatrixCSC,Matrix},
         Y_train::Union{SparseMatrixCSC,Matrix},
         learn_seq::Vector,
         data::DataFrame,
         n_batch_eval::Int=100,
         compute_accuracy::Bool=true,
         measures_func::Union{Function, Missing}=missing,
        kargs...)

Trains a deep learning model using the FIDDL method (frequency-informed deep discriminative learning). Optionally, measures_func can be run every n_batch_eval batches to compute measures which are then added to the data.

Note

If you get an OutOfMemory error, chances are that this is due to the eval_SC function being evaluated every n_batch_eval batches. Setting compute_accuracy=false disables computing the mapping accuracy.

Returns a named tuple with the following values:

  • model: the trained model
  • data: the data, including any measures if computed by measures_func
  • losses_train: The losses of the data the model is trained on within each n_batch_eval batches.
  • losses: The losses of the full dataset after each n_batch_eval batches.
  • accs: The accuracies of the full dataset after each n_batch_eval batches.

Obligatory arguments

  • X_train::Union{SparseMatrixCSC,Matrix}: training input matrix of dimension m x n
  • Y_train::Union{SparseMatrixCSC,Matrix}: training output/target matrix of dimension m x k
  • learn_seq::Vector: List of indices in the order that the vectors in X_train and Y_train should be presented to the model for training.
  • data::DataFrame: The full data.
  • target_col::Union{Symbol, String}: The column with target word forms in the data.
  • model_outpath::String: filepath to where final model should be stored (in .bson format)

Optional arguments

  • hidden_dim::Int=1000: hidden dimension of the model
  • n_epochs::Int=100: number of epochs for which the model should be trained
  • batchsize::Int=64: batchsize during training
  • loss_func::Function=Flux.mse: Loss function. Per default this is the mse loss, but other options might be a crossentropy loss (Flux.crossentropy). Make sure the model makes sense with the loss function!
  • optimizer=Flux.Adam(0.001): optimizer to use for training
  • model::Union{Missing, Chain} = missing: A custom model can be provided for training. Its requirements are that it has to correspond to the input and output size of the training and validation data
  • return_losses::Bool=false: whether, in addition to the model, the per-epoch losses for the training and validation data as well as the per-epoch accuracies on the validation data should be returned
  • verbose::Bool=true: Turn on verbose mode
  • n_batch_eval::Int=100: Loss, accuracy and measures_func are evaluated every n_batch_eval batches.
  • compute_accuracy::Bool=true: Whether accuracy should be computed every n_batch_eval batches.
  • measures_func::Union{Missing, Function}=missing: A measures function which is run every n_batch_eval batches. For more information see The measures_func argument.
source
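A usage sketch (the learn_seq construction assumes a hypothetical Frequency column; build the learning sequence however your data defines it):

using Random, DataFrames

# present each word as often as its (hypothetical) frequency, in random order
learn_seq = shuffle(vcat([fill(i, latin.Frequency[i]) for i in 1:nrow(latin)]...))

res_fiddl = JudiLing.fiddl(cue_obj.C, S, learn_seq, latin, :Word,
    "latin_fiddl.bson", n_batch_eval = 50)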
Display · JudiLing.jl


Display

JudiLing.display_matrixMethod
display_matrix(data, target_col, cue_pS_obj, M, M_type)

Display a matrix with row names and column names.

Obligatory Arguments

  • data::DataFrame: the dataset
  • target_col::Union{String, Symbol}: the target column name
  • cue_pS_obj::Union{Cue_Matrix_Struct,PS_Matrix_Struct}: the cue matrix or pS matrix structure
  • M::Union{SparseMatrixCSC, Matrix}: the matrix
  • M_type::Union{String, Symbol}: the type of the matrix; currently supported are :C, :S, :F, :G, :Chat, :Shat, :A, :R and :pS

Optional Arguments

  • nrow::Int64 = 6: the number of rows to display
  • ncol::Int64 = 6: the number of columns to display
  • return_matrix::Bool = false: whether the created dataframe should be returned (and not only displayed)

Examples

JudiLing.display_matrix(latin, :Word, cue_obj, cue_obj.C, :C)
 JudiLing.display_matrix(latin, :Word, cue_obj, S, :S)
 JudiLing.display_matrix(latin, :Word, cue_obj, G, :G)
 JudiLing.display_matrix(latin, :Word, cue_obj, Chat, :Chat)
 JudiLing.display_matrix(latin, :Word, cue_obj, Shat, :Shat)
 JudiLing.display_matrix(latin, :Word, cue_obj, A, :A)
 JudiLing.display_matrix(latin, :Word, cue_obj, R, :R)
JudiLing.display_matrix(latin, :Word, pS_obj, pS_obj.pS, :pS)
source
Evaluation · JudiLing.jl


Evaluation

JudiLing.eval_SCFunction

Assess model accuracy on the basis of the correlations of row vectors of Chat and C or Shat and S. Ideally the target words have highest correlations on the diagonal of the pertinent correlation matrices. A homophone-support option is implemented.

source
JudiLing.eval_SC_looseFunction

Assess model accuracy on the basis of the correlations of row vectors of Chat and C or Shat and S. Count it as correct if one of the top k candidates is correct. A homophone-support option is implemented.

source
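For instance, a top-5 evaluation might look like this (a sketch, with Shat_val and S_val as in the quick start and the third positional argument taken to be k):

JudiLing.eval_SC_loose(Shat_val, S_val, 5)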
JudiLing.accuracy_comprehensionMethod
accuracy_comprehension(S, Shat, data)

Evaluate comprehension accuracy for training data.

Note

In case of homophones/homographs in the dataset, the correct/incorrect values for base and inflections may be misleading! See below for more information.

Obligatory Arguments

  • S::Matrix: the (gold standard) S matrix
  • Shat::Matrix: the (predicted) Shat matrix
  • data::DataFrame: the dataset

Optional Arguments

  • target_col::Union{String, Symbol}=:Words: the column name for target strings
  • base::Vector=nothing: base features (typically a lexeme)
  • inflections::Union{Nothing, Vector}=nothing: other features (typically in inflectional features)

Examples

accuracy_comprehension(
     S_train,
     Shat_train,
     latin_val,
     target_col=:Words,
     base=[:Lexeme],
     inflections=[:Person, :Number, :Tense, :Voice, :Mood]
    )

Note

In case of homophones/homographs in the dataset, the correct/incorrect values for base and inflections may be misleading! Consider the following example: The wordform "Äpfel" in German can be nominative plural, genitive plural and accusative plural. Let's assume we have a dataset in which "Äpfel" occurs in all three case/number combinations (i.e. there are homographs). If all these wordforms have the same semantic vectors (e.g. because they are derived from word2vec or fasttext which typically have a single vector per unique wordform), the predicted semantic vector of the wordform "Äpfel" will be equally correlated with all three case/number combinations in the dataset. In such cases, while the algorithm in this function can unambiguously conclude that the correct surface form "Äpfel" was comprehended, which of the three possible rows is the correct one will be picked somewhat non-deterministically (see https://docs.julialang.org/en/v1/base/collections/#Base.argmax). It is thus possible that the algorithm will then use the genitive plural instead of the intended nominative plural as the ground truth, and will report that "case" was comprehended incorrectly.

source
JudiLing.accuracy_comprehensionMethod
accuracy_comprehension(
     S_val,
     S_train,
     Shat_val,
     target_col=:Words,
     base=[:Lexeme],
     inflections=[:Person, :Number, :Tense, :Voice, :Mood]
    )

Note

In case of homophones/homographs in the dataset, the correct/incorrect values for base and inflections may be misleading! Consider the following example: The wordform "Äpfel" in German can be nominative plural, genitive plural and accusative plural. Let's assume we have a dataset in which "Äpfel" occurs in all three case/number combinations (i.e. there are homographs). If all these wordforms have the same semantic vectors (e.g. because they are derived from word2vec or fasttext which typically have a single vector per unique wordform), the predicted semantic vector of the wordform "Äpfel" will be equally correlated with all three case/number combinations in the dataset. In such cases, while the algorithm in this function can unambiguously conclude that the correct surface form "Äpfel" was comprehended, which of the three possible rows is the correct one will be picked somewhat non-deterministically (see https://docs.julialang.org/en/v1/base/collections/#Base.argmax). It is thus possible that the algorithm will then use the genitive plural instead of the intended nominative plural as the ground truth, and will report that "case" was comprehended incorrectly.

source
JudiLing.eval_SCMethod
eval_SC(SChat::AbstractArray, SC::AbstractArray)

Assess model accuracy on the basis of the correlations (or Euclidean distances or Cosine Similarities) of row vectors of Chat and C or Shat and S. Ideally the target words have highest correlations (lowest distance/highest similarity) on the diagonal of the pertinent correlation (distance/similarity) matrices.

If freq is added, token-based accuracy is computed. Token-based accuracy weighs accuracy values according to words' frequency, i.e. if a word has a frequency of 30 and overall there are 3000 tokens (the frequencies of all types sum to 3000), this token's accuracy will contribute 30/3000.

Note

If there are homophones/homographs in the dataset, this evaluation method may be misleading: the predicted vector will be equally correlated with the target vector of both words and the one on the diagonal will not necessarily be selected as the most correlated. In such cases, supplying the dataset and target_col is recommended, as this enables taking homophones/homographs into account.

Obligatory Arguments

  • SChat::Union{SparseMatrixCSC, Matrix}: the Chat or Shat matrix
  • SC::Union{SparseMatrixCSC, Matrix}: the C or S matrix

Optional Arguments

  • digits: the specified number of digits after the decimal place (or before if negative)
  • R::Bool=false: if true, the pairwise correlation/distance/similarity matrix R is returned
  • freq::Union{Missing, Array{Int64, 1}, Array{Float64,1}}=missing: list of frequencies of the wordforms in X and Y
  • method::Union{Symbol, String}=:correlation: Method for computing similarities, one of {:correlation, :euclidean, :cosine}.
Examples

eval_SC(Chat_train, cue_obj_train.C)
 eval_SC(Chat_val, cue_obj_val.C)
 eval_SC(Shat_train, S_train)
eval_SC(Shat_val, S_val)
source
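As an illustration of token-based accuracy (assuming a hypothetical Frequency column in the dataset):

JudiLing.eval_SC(Shat_train, S_train, freq = latin.Frequency)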
JudiLing.eval_SCMethod
eval_SC(SChat::AbstractArray, SC::AbstractArray, SC_rest::AbstractArray)

Assess model accuracy on the basis of the correlations (or Euclidean distances or Cosine Similarities) of row vectors of Chat and C or Shat and S. Ideally the target words have highest correlations (lowest distance/highest similarity) on the diagonal of the pertinent correlation (distance/similarity) matrices.

If freq is added, token-based accuracy is computed. Token-based accuracy weighs accuracy values according to words' frequency, i.e. if a word has a frequency of 30 and overall there are 3000 tokens (the frequencies of all types sum to 3000), this token's accuracy will contribute 30/3000.

Note

The order is important. The first gold standard matrix has to correspond to the SChat matrix, as in eval_SC(Shat_train, S_train, S_val) or eval_SC(Shat_val, S_val, S_train).

Note

If there are homophones/homographs in the dataset, this evaluation method may be misleading: the predicted vector will be equally correlated with the target vector of both words and the one on the diagonal will not necessarily be selected as the most correlated. In such cases, supplying the dataset and target_col is recommended, as this enables taking homophones/homographs into account.

Obligatory Arguments

  • SChat::Union{SparseMatrixCSC, Matrix}: the Chat or Shat matrix
  • SC::Union{SparseMatrixCSC, Matrix}: the training/validation C or S matrix
  • SC_rest::Union{SparseMatrixCSC, Matrix}: the validation/training C or S matrix

Optional Arguments

  • digits: the specified number of digits after the decimal place (or before if negative)
  • R::Bool=false: if true, pairwise correlation/distance/similarity matrix R is return
  • freq::Union{Missing, Array{Int64, 1}, Array{Float64,1}}=missing: list of frequencies of the wordforms in X and Y
  • method::Union{Symbol, String}=:correlation: Method for computing similarities, one of {:correlation, :euclidean, :cosine}.
eval_SC(Chat_train, cue_obj_train.C, cue_obj_val.C)
 eval_SC(Chat_val, cue_obj_val.C, cue_obj_train.C)
 eval_SC(Shat_train, S_train, S_val)
-eval_SC(Shat_val, S_val, S_train)
source
JudiLing.eval_SCMethod
eval_SC(SChat::AbstractArray, SC::AbstractArray, data::DataFrame, target_col::Union{String, Symbol})

Assess model accuracy on the basis of the correlations (or Euclidean distances or Cosine Similarities) of row vectors of Chat and C or Shat and S. Ideally the target words have highest correlations (lowest distance/highest similarity) on the diagonal of the pertinent correlation (distance/similarity) matrices. Support for homophones.

If freq is added, token-based accuracy is computed. Token-based accuracy weighs accuracy values according to words' frequency, i.e. if a word has a frequency of 30 and overall there are 3000 tokens (the frequencies of all types sum to 3000), this token's accuracy will contribute 30/3000.

Obligatory Arguments

  • SChat::Union{SparseMatrixCSC, Matrix}: the Chat or Shat matrix
  • SC::Union{SparseMatrixCSC, Matrix}: the C or S matrix
  • data::DataFrame: datasets
  • target_col::Union{String, Symbol}: target column name

Optional Arguments

  • digits: the specified number of digits after the decimal place (or before if negative)
  • R::Bool=false: if true, pairwise correlation/distance/similarity matrix R is return
  • freq::Union{Missing, Array{Int64, 1}, Array{Float64,1}}=missing: list of frequencies of the wordforms in X and Y
  • method::Union{Symbol, String}=:correlation: Method for computing similarities, one of {:correlation, :euclidean, :cosine}.
eval_SC(Chat_train, cue_obj_train.C, latin, :Word)
+eval_SC(Shat_val, S_val, S_train)
source
JudiLing.eval_SCMethod
eval_SC(SChat::AbstractArray, SC::AbstractArray, data::DataFrame, target_col::Union{String, Symbol})

Assess model accuracy on the basis of the correlations (or Euclidean distances or Cosine Similarities) of row vectors of Chat and C or Shat and S. Ideally the target words have highest correlations (lowest distance/highest similarity) on the diagonal of the pertinent correlation (distance/similarity) matrices. Support for homophones.

If freq is provided, token-based accuracy is computed. Token-based accuracy weights each word's accuracy by its frequency: if a word has a frequency of 30 and overall there are 3000 tokens (the frequencies of all types sum to 3000), this word contributes with weight 30/3000.

Obligatory Arguments

  • SChat::Union{SparseMatrixCSC, Matrix}: the Chat or Shat matrix
  • SC::Union{SparseMatrixCSC, Matrix}: the C or S matrix
  • data::DataFrame: datasets
  • target_col::Union{String, Symbol}: target column name

Optional Arguments

  • digits: the specified number of digits after the decimal place (or before if negative)
  • R::Bool=false: if true, the pairwise correlation/distance/similarity matrix R is returned
  • freq::Union{Missing, Array{Int64, 1}, Array{Float64,1}}=missing: list of frequencies of the wordforms in SChat and SC
  • method::Union{Symbol, String}=:correlation: Method for computing similarities, one of {:correlation, :euclidean, :cosine}.
eval_SC(Chat_train, cue_obj_train.C, latin, :Word)
 eval_SC(Chat_val, cue_obj_val.C, latin, :Word)
 eval_SC(Shat_train, S_train, latin, :Word)
eval_SC(Shat_val, S_val, latin, :Word)
source
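
The following toy sketch (hypothetical data, not JudiLing internals) illustrates why comparing predicted word forms rather than row indices handles homographs: rows 2 and 3 share the form "bank" and have identical target vectors, so either row may win the argmax, yet both count as correct when forms are compared.

using Statistics

words = ["walk", "bank", "bank", "run"]
S    = [1.0 0.0 0.2; 0.3 0.7 0.1; 0.3 0.7 0.1; 0.0 0.2 1.0]
Shat = [0.9 0.1 0.2; 0.3 0.6 0.2; 0.25 0.65 0.15; 0.1 0.2 0.9]

acc = mean(1:length(words)) do i
    best = argmax([cor(Shat[i, :], S[j, :]) for j in 1:size(S, 1)])
    words[best] == words[i]   # compare word forms, not row indices
end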
JudiLing.eval_SCMethod
eval_SC(SChat::AbstractArray, SC::AbstractArray, SC_rest::AbstractArray, data::DataFrame, data_rest::DataFrame, target_col::Union{String, Symbol})

Assess model accuracy on the basis of the correlations (or Euclidean distances or Cosine Similarities) of row vectors of Chat and C or Shat and S. Ideally the target words have highest correlations (lowest distance/highest similarity) on the diagonal of the pertinent correlation (distance/similarity) matrices.

If freq is provided, token-based accuracy is computed. Token-based accuracy weights each word's accuracy by its frequency: if a word has a frequency of 30 and overall there are 3000 tokens (the frequencies of all types sum to 3000), this word contributes with weight 30/3000.

Note

The order is important: the first gold standard matrix has to correspond to the SChat matrix, as in eval_SC(Shat_train, S_train, S_val, latin, :Word) or eval_SC(Shat_val, S_val, S_train, latin, :Word).

Obligatory Arguments

  • SChat::Union{SparseMatrixCSC, Matrix}: the Chat or Shat matrix
  • SC::Union{SparseMatrixCSC, Matrix}: the training/validation C or S matrix
  • SC_rest::Union{SparseMatrixCSC, Matrix}: the validation/training C or S matrix
  • data::DataFrame: the training/validation datasets
  • data_rest::DataFrame: the validation/training datasets
  • target_col::Union{String, Symbol}: target column name

Optional Arguments

  • digits: the specified number of digits after the decimal place (or before if negative)
  • R::Bool=false: if true, the pairwise correlation/distance/similarity matrix R is returned
  • freq::Union{Missing, Array{Int64, 1}, Array{Float64,1}}=missing: list of frequencies of the wordforms in SChat and SC
  • method::Union{Symbol, String}=:correlation: Method for computing similarities, one of {:correlation, :euclidean, :cosine}.
eval_SC(Chat_train, cue_obj_train.C, cue_obj_val.C, latin, :Word)
 eval_SC(Chat_val, cue_obj_val.C, cue_obj_train.C, latin, :Word)
 eval_SC(Shat_train, S_train, S_val, latin, :Word)
eval_SC(Shat_val, S_val, S_train, latin, :Word)
source
JudiLing.eval_SCMethod
eval_SC(SChat::AbstractArray, SC::AbstractArray, batch_size::Int64)

Assess model accuracy on the basis of the correlations of row vectors of Chat and C or Shat and S. Ideally the target words have highest correlations on the diagonal of the pertinent correlation matrices. For large datasets, pass batch_size to process evaluation in chunks.

Note

If there are homophones/homographs in the dataset, this evaluation method may be misleading: the predicted vector will be equally correlated with the target vector of both words and the one on the diagonal will not necessarily be selected as the most correlated. In such cases, supplying the dataset and target_col is recommended which enables taking into account homophones/homographs.

Note

Currently only available for correlation.

Obligatory Arguments

  • SChat: the Chat or Shat matrix
  • SC: the C or S matrix
  • batch_size: batch size

Optional Arguments

  • digits: the specified number of digits after the decimal place (or before if negative)
  • verbose::Bool=false: if true, more information is printed
eval_SC(Chat_train, cue_obj_train.C, 5000)
eval_SC(Chat_val, cue_obj_val.C, 5000)
eval_SC(Shat_train, S_train, 5000)
eval_SC(Shat_val, S_val, 5000)
source
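
A rough sketch of the batching idea (hypothetical helper, not JudiLing's implementation): correlations are computed block by block, so the full n-by-n correlation matrix never has to be held in memory at once.

using Statistics

function chunked_accuracy(Shat::Matrix, S::Matrix; batch_size::Int=1000)
    n = size(Shat, 1)
    correct = 0
    for lo in 1:batch_size:n
        hi = min(lo + batch_size - 1, n)
        R = cor(Shat[lo:hi, :]', S')   # block of correlations: (hi-lo+1) x n
        for (k, i) in enumerate(lo:hi)
            correct += argmax(R[k, :]) == i
        end
    end
    correct / n
end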
JudiLing.eval_SCMethod
eval_SC(SChat::AbstractArray, SC::AbstractArray, data::DataFrame, target_col::Union{String, Symbol}, batch_size::Int64)

Assess model accuracy on the basis of the correlations of row vectors of Chat and C or Shat and S. Ideally the target words have highest correlations on the diagonal of the pertinent correlation matrices. For large datasets, pass batch_size to process evaluation in chunks. Supports homophones.

Note

Currently only available for correlation.

Obligatory Arguments

  • SChat::AbstractArray: the Chat or Shat matrix
  • SC::AbstractArray: the C or S matrix
  • data::DataFrame: datasets
  • target_col::Union{String, Symbol}: target column name
  • batch_size::Int64: batch size

Optional Arguments

  • digits: the specified number of digits after the decimal place (or before if negative)
  • verbose::Bool=false: if true, more information is printed
eval_SC(Chat_train, cue_obj_train.C, latin, :Word, 5000)
 eval_SC(Chat_val, cue_obj_val.C, latin, :Word, 5000)
 eval_SC(Shat_train, S_train, latin, :Word, 5000)
eval_SC(Shat_val, S_val, latin, :Word, 5000)
source
JudiLing.eval_SC_looseMethod
eval_SC_loose(SChat, SC, k)

Assess model accuracy on the basis of the correlations (or Euclidean distances or Cosine Similarities) of row vectors of Chat and C or Shat and S. Ideally the target words have highest correlations (lowest distance/highest similarity) on the diagonal of the pertinent correlation (distance/similarity) matrices. A prediction is counted as correct if the target is among the top k candidates.

Note

If there are homophones/homographs in the dataset, this evaluation method may be misleading: the predicted vector will be equally correlated with the target vector of both words and it is not guaranteed that the target on the diagonal will be among the k neighbours. In particular, eval_SC and eval_SC_loose with k=1 are not guaranteed to give the same result. In such cases, supplying the dataset and target_col is recommended which enables taking into account homophones/homographs.

Obligatory Arguments

  • SChat::Union{SparseMatrixCSC, Matrix}: the Chat or Shat matrix
  • SC::Union{SparseMatrixCSC, Matrix}: the C or S matrix
  • k: top k candidates

Optional Arguments

  • digits: the specified number of digits after the decimal place (or before if negative)
  • method::Union{Symbol, String}=:correlation: Method for computing similarities, one of {:correlation, :euclidean, :cosine}.
eval_SC_loose(Chat, cue_obj.C, k)
eval_SC_loose(Shat, S, k)
source
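
A minimal sketch of the loose criterion (hypothetical helper, not the library code): a row counts as correct when its own index is among the k highest-correlated candidates.

using Statistics

function loose_accuracy(Shat::Matrix, S::Matrix, k::Int)
    R = cor(Shat', S')   # n x n: row i holds correlations of prediction i with all targets
    n = size(R, 1)
    count(i -> i in partialsortperm(R[i, :], 1:k, rev=true), 1:n) / n
end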
JudiLing.eval_SC_looseMethod
eval_SC_loose(SChat, SC, k, data, target_col)

Assess model accuracy on the basis of the correlations (or Euclidean distances or Cosine Similarities) of row vectors of Chat and C or Shat and S. Ideally the target words have highest correlations (lowest distance/highest similarity) on the diagonal of the pertinent correlation (distance/similarity) matrices. A prediction is counted as correct if the target is among the top k candidates. Supports homophones.

Obligatory Arguments

  • SChat::Union{SparseMatrixCSC, Matrix}: the Chat or Shat matrix
  • SC::Union{SparseMatrixCSC, Matrix}: the C or S matrix
  • k: top k candidates
  • data: datasets
  • target_col: target column name

Optional Arguments

  • digits: the specified number of digits after the decimal place (or before if negative)
  • method::Union{Symbol, String}=:correlation: Method for computing similarities, one of {:correlation, :euclidean, :cosine}.
eval_SC_loose(Chat, cue_obj.C, k, latin, :Word)
eval_SC_loose(Shat, S, k, latin, :Word)
source
JudiLing.eval_manualMethod
eval_manual(res, data, i2f)

Create extensive reports for the outputs from build_paths and learn_paths.

source
JudiLing.eval_accMethod
eval_acc(res, gold_inds::Array)

Evaluate the accuracy of the results from learn_paths or build_paths.

Obligatory Arguments

  • res::Array: the results from learn_paths or build_paths
  • gold_inds::Array: the gold paths' indices

Optional Arguments

  • digits: the specified number of digits after the decimal place (or before if negative)
  • verbose::Bool=false: if true, more information is printed

Examples

# evaluation on training data
acc_train = JudiLing.eval_acc(
    res_train,
    cue_obj_train.gold_ind,
    verbose=false
)

# evaluation on validation data
acc_val = JudiLing.eval_acc(
    res_val,
    cue_obj_val.gold_ind,
    verbose=false
)
source
JudiLing.eval_accMethod
eval_acc(res, cue_obj::Cue_Matrix_Struct)

Evaluate the accuracy of the results from learn_paths or build_paths.

Obligatory Arguments

  • res::Array: the results from learn_paths or build_paths
  • cue_obj::Cue_Matrix_Struct: the C matrix object

Optional Arguments

  • digits: the specified number of digits after the decimal place (or before if negative)
  • verbose::Bool=false: if true, more information is printed

Examples

acc = JudiLing.eval_acc(res, cue_obj)
source
JudiLing.eval_acc_looseMethod
eval_acc_loose(res, gold_inds)

Lenient evaluation of the accuracy of the results from learn_paths or build_paths, counting a prediction as correct when the correlation of the predicted and gold standard semantic vectors is among the n top correlations, where n is equal to max_can in the learn_paths or build_paths function.

Obligatory Arguments

  • res::Array: the results from learn_paths or build_paths
  • gold_inds::Array: the gold paths' indices

Optional Arguments

  • digits: the specified number of digits after the decimal place (or before if negative)
  • verbose::Bool=false: if true, more information is printed

Examples

# evaluation on training data
acc_train_loose = JudiLing.eval_acc_loose(
    res_train,
    cue_obj_train.gold_ind,
    verbose=false
)

# evaluation on validation data
acc_val_loose = JudiLing.eval_acc_loose(
    res_val,
    cue_obj_val.gold_ind,
    verbose=false
)
source
JudiLing.extract_gpiFunction

extract_gpi(gpi, threshold=0.1, tolerance=(-1000.0))

Extract, using gold paths' information, how many n-grams for a gold path are below the threshold but above the tolerance.

source
diff --git a/dev/man/find_path/index.html b/dev/man/find_path/index.html index 709e836..ffc0ea4 100644 --- a/dev/man/find_path/index.html +++ b/dev/man/find_path/index.html @@ -1,5 +1,5 @@ -Find Paths · JudiLing.jl

Find Paths

Structures

JudiLing.Gold_Path_Info_StructType

Store gold paths' information, including indices, the indices' support, and total support. It can be used to evaluate how low the threshold needs to be set in order to find most of the correct paths or, if set very low, all of the correct paths.

source
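
A hypothetical usage sketch (names assumed; see extract_gpi in the evaluation section): running learn_paths with check_gold_path=true yields this gold-path information as the second return value, which can then be screened against a candidate threshold.

res, gpi = JudiLing.learn_paths(latin, cue_obj, S, F, Chat, check_gold_path=true)
gpi_stat = JudiLing.extract_gpi(gpi, 0.1, -1000.0)  # gold-path n-grams between tolerance and threshold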

Build paths

JudiLing.build_pathsFunction

The build_paths function constructs paths by only considering those n-grams that are close to the target. It first takes the predicted c-hat vector and finds the closest n neighbors in the C matrix. Then it selects all n-grams of these neighbors, and constructs all valid paths with those n-grams. The path producing the best correlation with the target semantic vector (through synthesis by analysis) is selected.

source
JudiLing.build_pathsMethod
build_paths(
     data_val,
     C_train,
     S_val,
    # ... (elided) ...
    pca_eval_M=Fo,
    n_neighbors=3,
    verbose=true
    )
source

Learn paths

JudiLing.learn_pathsFunction

A sequence finding algorithm using discrimination learning to predict, for a given word, which n-grams are best supported for a given position in the sequence of n-grams.

source

JudiLing.learn_pathsMethod
learn_paths(
     data::DataFrame,
     cue_obj::Cue_Matrix_Struct,
     S_val::Union{SparseMatrixCSC, Matrix},
    F_train::Union{SparseMatrixCSC, Matrix, Chain},
    Chat_val::Union{SparseMatrixCSC, Matrix};
    Shat_val::Union{Nothing, Matrix} = nothing,
    check_gold_path::Bool = false,
    threshold::Float64 = 0.1,
    is_tolerant::Bool = false,
    tolerance::Float64 = (-1000.0),
     max_tolerance::Int = 3,
     activation::Union{Nothing, Function} = nothing,
     ignore_nan::Bool = true,
    verbose::Bool = true)


A high-level wrapper function for learn_paths with much less control. It is aimed at users who are new to JudiLing and to the learn_paths function.

Obligatory Arguments

  • data::DataFrame: the training dataset
  • cue_obj::Cue_Matrix_Struct: the C matrix object containing all information associated with the C matrix
  • S_val::Union{SparseMatrixCSC, Matrix}: the S matrix for validation dataset
  • F_train::Union{SparseMatrixCSC, Matrix, Chain}: either the F matrix for training dataset, or a deep learning comprehension model trained on the training set
  • Chat_val::Union{SparseMatrixCSC, Matrix}: the Chat matrix for validation dataset

Optional Arguments

  • Shat_val::Union{Nothing, Matrix}=nothing: the Shat matrix for the validation dataset
  • check_gold_path::Bool=false: if true, return a list of support values for the gold path; this information is returned as second output value
  • threshold::Float64=0.1: the value set for the support such that if the support of an n-gram is higher than this value, the n-gram will be taken into consideration
  • is_tolerant::Bool=false: if true, select a specified number (given by max_tolerance) of n-grams whose supports are below threshold but above a second tolerance threshold to be added to the path
  • tolerance::Float64=(-1000.0): the value set for the second threshold (in tolerant mode) such that if the support for an n-gram is in between this value and the threshold and the max_tolerance number has not been reached, then allow this n-gram to be added to the path
  • max_tolerance::Int64=4: maximum number of n-grams allowed in a path
  • activation::Function=nothing: the activation function you want to pass
  • ignore_nan::Bool=true: whether to ignore NaN when comparing correlations; otherwise NaN will be selected as the max correlation value
  • verbose::Bool=false: if true, more information is printed

Examples

res = learn_paths(latin, cue_obj, S, F, Chat)
source
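
For orientation, a hedged end-to-end sketch of where the wrapper's inputs might come from (latin and the column names are assumptions; make_transform_matrix estimates the linear mappings):

cue_obj = JudiLing.make_cue_matrix(latin, grams=3, target_col=:Word)
S = JudiLing.make_S_matrix(latin, ["Lexeme"], ["Person","Number","Tense","Voice","Mood"])
F = JudiLing.make_transform_matrix(cue_obj.C, S)   # comprehension mapping: C -> S
G = JudiLing.make_transform_matrix(S, cue_obj.C)   # production mapping: S -> C
Chat = S * G
res = JudiLing.learn_paths(latin, cue_obj, S, F, Chat)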
JudiLing.learn_pathsMethod
learn_paths(
     data_train::DataFrame,
     data_val::DataFrame,
     C_train::Union{Matrix, SparseMatrixCSC},
    # ... (elided) ...
    if_pca=true,
    pca_eval_M=Fo,
    verbose=true);
source
JudiLing.learn_paths_rpiMethod
learn_paths_rpi(
     data_train::DataFrame,
     data_val::DataFrame,
     C_train::Union{Matrix, SparseMatrixCSC},
    # ... (elided) ...
     ignore_nan::Bool = true,
     check_threshold_stat::Bool = false,
     verbose::Bool = false
)


Calculate learn_paths, returning the supports for the result indices as well.

Obligatory Arguments

  • data::DataFrame: the training dataset
  • data_val::DataFrame: the validation dataset
  • C_train::Union{SparseMatrixCSC, Matrix}: the C matrix for training dataset
  • S_val::Union{SparseMatrixCSC, Matrix}: the S matrix for validation dataset
  • F_train::Union{SparseMatrixCSC, Matrix, Chain}: the F matrix for training dataset, or a deep learning comprehension model trained on the training data
  • Chat_val::Union{SparseMatrixCSC, Matrix}: the Chat matrix for validation dataset
  • A::SparseMatrixCSC: the adjacency matrix
  • i2f::Dict: the dictionary returning features given indices
  • f2i::Dict: the dictionary returning indices given features

Optional Arguments

  • gold_ind::Union{Nothing, Vector}=nothing: gold paths' indices
  • Shat_val::Union{Nothing, Matrix}=nothing: the Shat matrix for the validation dataset
  • check_gold_path::Bool=false: if true, return a list of support values for the gold path; this information is returned as second output value
  • max_t::Int64=15: maximum timestep
  • max_can::Int64=10: maximum number of candidates to consider
  • threshold::Float64=0.1: the value set for the support such that if the support of an n-gram is higher than this value, the n-gram will be taken into consideration
  • is_tolerant::Bool=false: if true, select a specified number (given by max_tolerance) of n-grams whose supports are below threshold but above a second tolerance threshold to be added to the path
  • tolerance::Float64=(-1000.0): the value set for the second threshold (in tolerant mode) such that if the support for an n-gram is in between this value and the threshold and the max_tolerance number has not been reached, then allow this n-gram to be added to the path
  • max_tolerance::Int64=4: maximum number of n-grams allowed in a path
  • grams::Int64=3: the number n of grams that make up an n-gram
  • tokenized::Bool=false: if true, the dataset target is tokenized
  • sep_token::Union{Nothing, String, Char}=nothing: separator token
  • keep_sep::Bool=false: if true, keep separators in cues
  • target_col::Union{String, Symbol}=:Words: the column name for target strings
  • start_end_token::Union{String, Char}="#": start and end token in boundary cues
  • issparse::Union{Symbol, Bool}=:auto: controls whether the output Mt matrix is a dense matrix or a sparse matrix
  • sparse_ratio::Float64=0.05: the ratio to decide whether a matrix is sparse
  • if_pca::Bool=false: turn on to enable pca mode
  • pca_eval_M::Matrix=nothing: pass original F for pca mode
  • activation::Function=nothing: the activation function you want to pass
  • ignore_nan::Bool=true: whether to ignore NaN when comparing correlations, otherwise NaN will be selected as the max correlation value
  • check_threshold_stat::Bool=false: if true, return a threshold and tolerance proportion for each timestep
  • verbose::Bool=false: if true, more information is printed
source

Utility functions

JudiLing.eval_canMethod
eval_can(candidates, S, F::Union{Matrix,SparseMatrixCSC, Chain}, i2f, max_can, if_pca, pca_eval_M)

Calculate for each candidate path the correlation between predicted semantic vector and the gold standard semantic vector, and select as target for production the path with the highest correlation.

source
JudiLing.predict_shatMethod
predict_shat(F::Union{Matrix, SparseMatrixCSC},
             ci::Vector{Int})

Predicts semantic vector shat given a comprehension matrix F and a list of indices of ngrams ci.

Obligatory arguments

  • F::Union{Matrix, SparseMatrixCSC}: Comprehension matrix F.
  • ci::Vector{Int}: Vector of indices of ngrams in c vector. Essentially, this is a vector indicating which ngrams in a c vector are absent and which are present.
source
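
The underlying idea, as a toy sketch (values hypothetical): with a binary cue vector that is 1 exactly at the indices in ci, the predicted semantic vector amounts to the sum of the corresponding rows of F.

F  = [0.1 0.2; 0.3 0.4; 0.5 0.6]   # 3 cues, 2 semantic dimensions
ci = [1, 3]                        # cues 1 and 3 are present
shat = vec(sum(F[ci, :], dims=1))  # [0.6, 0.8]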
diff --git a/dev/man/input/index.html b/dev/man/input/index.html index 7aff442..a9141d6 100644 --- a/dev/man/input/index.html +++ b/dev/man/input/index.html @@ -2,7 +2,7 @@ Loading data · JudiLing.jl

Loading data

JudiLing.load_datasetMethod
load_dataset(filepath::String;
             delim::String=",",
             kargs...)

Load a dataset from file, usually comma- or tab-separated. Returns a DataFrame.

Obligatory arguments

  • filepath::String: Path to file to be loaded.

Optional arguments

  • delim::String=",": Delimiter in the file (usually either "," or "\t").
  • kargs...: Further keyword arguments are passed to CSV.File().

Example

latin = JudiLing.load_dataset("latin.csv")
first(latin, 10)
source
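
For a tab-separated file, the delimiter can be passed through (file name hypothetical):

latin = JudiLing.load_dataset("latin.tsv", delim="\t")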
JudiLing.loading_data_randomly_splitMethod
loading_data_randomly_split(
     data_path::String,
     output_dir_path::String,
     data_prefix::String;
    # ... (elided) ...
    "careful",
    "latin",
    ["Lexeme","Person","Number","Tense","Voice","Mood"]
)
source
JudiLing.loading_data_careful_splitMethod
loading_data_careful_split(
     data_path::String,
     data_prefix::String,
     output_dir_path::String,
    # ... (elided) ...
    "latin",
    "careful",
    ["Lexeme","Person","Number","Tense","Voice","Mood"]
)
source
diff --git a/dev/man/make_adjacency_matrix/index.html b/dev/man/make_adjacency_matrix/index.html index f016f8d..99ac529 100644 --- a/dev/man/make_adjacency_matrix/index.html +++ b/dev/man/make_adjacency_matrix/index.html
JudiLing.make_full_adjacency_matrixMethod
make_adjacency_matrix(i2f)

Make full adjacency matrix based only on the form of n-grams regardless of whether they are seen in the training data. This usually takes hours for large datasets, as all possible combinations are considered.

Obligatory Arguments

  • i2f::Dict: the dictionary returning features given indices

Optional Arguments

  • tokenized::Bool=false: if true, the dataset target is assumed to be tokenized
  • sep_token::Union{Nothing, String, Char}=nothing: separator token
  • verbose::Bool=false: if true, more information will be printed

Examples

# without tokenization
 i2f = Dict([(1, "#ab"), (2, "abc"), (3, "bc#"), (4, "#bc"), (5, "ab#")])
 JudiLing.make_adjacency_matrix(i2f)
 
# ... (elided) ...
JudiLing.make_adjacency_matrix(
    i2f,
    tokenized=true,
    sep_token="-")
source
JudiLing.make_combined_adjacency_matrixMethod
make_combined_adjacency_matrix(data_train, data_val)

Make combined adjacency matrix.

Obligatory Arguments

  • data_train::DataFrame: training dataset
  • data_val::DataFrame: validation dataset

Optional Arguments

  • grams=3: the number of grams for cues
  • target_col=:Words: the column name for target strings
  • tokenized=false: if true, the dataset target is assumed to be tokenized
  • sep_token=nothing: separator
  • keep_sep=false: if true, keep separators in cues
  • start_end_token="#": start and end token in boundary cues
  • verbose=false: if true, more information is printed

Examples

JudiLing.make_combined_adjacency_matrix(
     latin_train,
     latin_val,
     grams=3,
     target_col=:Word,
     tokenized=false,
     keep_sep=false
    )
source
diff --git a/dev/man/make_cue_matrix/index.html b/dev/man/make_cue_matrix/index.html index 38af40f..3e8bc69 100644 --- a/dev/man/make_cue_matrix/index.html +++ b/dev/man/make_cue_matrix/index.html


Make Cue Matrix

JudiLing.Cue_Matrix_StructType

A structure that stores information created by make_cue_matrix: C is the cue matrix; f2i is a dictionary returning the indices for features; i2f is a dictionary returning the features for indices; gold_ind is a list of indices of gold paths; A is the adjacency matrix; grams is the number of grams for cues; target_col is the column name for target strings; tokenized indicates whether the dataset target is tokenized; sep_token is the separator; keep_sep indicates whether to keep separators in cues; start_end_token is the start and end token in boundary cues.

source
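
For example (assuming cue_obj was created by one of the make_cue_matrix methods below; the cue string is hypothetical):

cue_obj.C              # the cue matrix
cue_obj.f2i["#ab"]     # index of the cue "#ab"
cue_obj.i2f[1]         # cue stored at index 1
cue_obj.gold_ind       # gold-path indices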
JudiLing.make_cue_matrixMethod
make_cue_matrix(data::DataFrame)

Make the cue matrix for training datasets and corresponding indices as well as the adjacency matrix and gold paths given a dataset in a form of dataframe.

Obligatory Arguments

  • data::DataFrame: the dataset

Optional Arguments

  • grams::Int64=3: the number of grams for cues
  • target_col::Union{String, Symbol}=:Words: the column name for target strings
  • tokenized::Bool=false: if true, the dataset target is assumed to be tokenized
  • sep_token::Union{Nothing, String, Char}=nothing: separator
  • keep_sep::Bool=false: if true, keep separators in cues
  • start_end_token::Union{String, Char}="#": start and end token in boundary cues
  • verbose::Bool=false: if true, more information is printed

Examples

# make cue matrix without tokenization
cue_obj_train = JudiLing.make_cue_matrix(
    latin_train,
    grams=3,
    # ... (elided) ...
    start_end_token="#",
    keep_sep=true,
    verbose=false
    )
source
JudiLing.make_cue_matrixMethod
make_cue_matrix(data::DataFrame, cue_obj::Cue_Matrix_Struct)

Make the cue matrix for validation datasets and corresponding indices as well as the adjacency matrix and gold paths given a dataset in a form of dataframe.

Obligatory Arguments

  • data::DataFrame: the dataset
  • cue_obj::Cue_Matrix_Struct: training cue object

Optional Arguments

  • grams::Int64=3: the number of grams for cues
  • target_col::Union{String, Symbol}=:Words: the column name for target strings
  • tokenized::Bool=false: if true, the dataset target is assumed to be tokenized
  • sep_token::Union{Nothing, String, Char}=nothing: separator
  • keep_sep::Bool=false: if true, keep separators in cues
  • start_end_token::Union{String, Char}="#": start and end token in boundary cues
  • verbose::Bool=false: if true, more information is printed

Examples

# make cue matrix without tokenization
cue_obj_val = JudiLing.make_cue_matrix(
    latin_val,
    cue_obj_train,
    # ... (elided) ...
    keep_sep=true,
    start_end_token="#",
    verbose=false
    )
source
JudiLing.make_cue_matrixMethod
make_cue_matrix(data_train::DataFrame, data_val::DataFrame)

Make the cue matrix for training and validation datasets at the same time.

Obligatory Arguments

  • data_train::DataFrame: the training dataset
  • data_val::DataFrame: the validation dataset

Optional Arguments

  • grams::Int64=3: the number of grams for cues
  • target_col::Union{String, Symbol}=:Words: the column name for target strings
  • tokenized::Bool=false: if true, the dataset target is assumed to be tokenized
  • sep_token::Union{Nothing, String, Char}=nothing: separator
  • keep_sep::Bool=false: if true, keep separators in cues
  • start_end_token::Union{String, Char}="#": start and end token in boundary cues
  • verbose::Bool=false: if true, more information is printed

Examples

# make cue matrix without tokenization
cue_obj_train, cue_obj_val = JudiLing.make_cue_matrix(
    latin_train,
    latin_val,
    # ... (elided) ...
    keep_sep=true,
    start_end_token="#",
    verbose=false
    )
source
JudiLing.make_combined_cue_matrixMethod
make_combined_cue_matrix(data_train, data_val)

Make the cue matrix for training and validation datasets at the same time, where the features and adjacencies are combined.

Obligatory Arguments

  • data_train::DataFrame: the training dataset
  • data_val::DataFrame: the validation dataset

Optional Arguments

  • grams::Int64=3: the number of grams for cues
  • target_col::Union{String, Symbol}=:Words: the column name for target strings
  • tokenized::Bool=false: if true, the dataset target is assumed to be tokenized
  • sep_token::Union{Nothing, String, Char}=nothing: separator
  • keep_sep::Bool=false: if true, keep separators in cues
  • start_end_token::Union{String, Char}="#": start and end token in boundary cues
  • verbose::Bool=false: if true, more information is printed

Examples

# make cue matrix without tokenization
cue_obj_train, cue_obj_val = JudiLing.make_combined_cue_matrix(
    latin_train,
    latin_val,
    # ... (elided) ...
    keep_sep=true,
    start_end_token="#",
    verbose=false
    )
source
JudiLing.make_cue_matrix_from_CFBSMethod
make_cue_matrix_from_CFBS(features::Vector{Vector{T}};
                           pad_val::T = 0.,
                          ncol::Union{Missing,Int}=missing) where {T}

Create a cue matrix from a vector of feature vectors (usually CFBS vectors). It is expected (though of course not necessary) that the vectors have varying lengths. They are consequently padded on the right with the provided pad_val.

Obligatory arguments

  • features::Vector{Vector{T}}: vector of vectors containing C-FBS features

Optional arguments

  • pad_val::T = 0.: Value with which the feature vectors will be padded
  • ncol::Union{Missing,Int}=missing: Number of columns of the C matrix. If not set, will be set to the maximum number of features

Examples

C = JudiLing.make_cue_matrix_from_CFBS(features)
source
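
A small sketch with toy feature vectors of unequal length, padded to a rectangular matrix:

features = [[0.2, 0.5, 0.1], [0.7], [0.3, 0.9]]
C = JudiLing.make_cue_matrix_from_CFBS(features)
size(C)   # (3, 3): padded on the right with pad_val to the longest vector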
JudiLing.make_combined_cue_matrix_from_CFBSMethod
make_combined_cue_matrix_from_CFBS(features_train::Vector{Vector{T}},
                                    features_test::Vector{Vector{T}};
                                    pad_val::T = 0.,
                                   ncol::Union{Missing,Int}=missing) where {T}

Create cue matrices from two vectors of feature vectors (usually CFBS vectors). It is expected (though of course not necessary) that the vectors have varying lengths. They are consequently padded on the right with the provided pad_val. The cue matrices are set to the size of the maximum number of feature values in features_train and features_test.

Obligatory arguments

  • features_train::Vector{Vector{T}}: vector of vectors containing C-FBS features
  • features_test::Vector{Vector{T}}: vector of vectors containing C-FBS features

Optional arguments

  • pad_val::T = 0.: Value with which the feature vectors will be padded
  • ncol::Union{Missing,Int}=missing: Number of columns of the C matrices. If not set, will be set to the maximum number of features in features_train and features_test

Examples

C_train, C_test = JudiLing.make_combined_cue_matrix_from_CFBS(features_train, features_test)
source
JudiLing.make_ngramsMethod
make_ngrams(tokens, grams, keep_sep, sep_token, start_end_token)

Given a list of string tokens return a list of all n-grams for these tokens.

source
diff --git a/dev/man/make_semantic_matrix/index.html b/dev/man/make_semantic_matrix/index.html index 78e322e..b26d69d 100644 --- a/dev/man/make_semantic_matrix/index.html +++ b/dev/man/make_semantic_matrix/index.html @@ -1,12 +1,12 @@ -Make Semantic Matrix · JudiLing.jl


Make Semantic Matrix

Make binary semantic vectors

JudiLing.PS_Matrix_StructType

A structure that stores the discrete semantic vectors: pS is the discrete semantic matrix; f2i is a dictionary returning the indices for features; i2f is a dictionary returning the features for indices.

source
JudiLing.make_pS_matrixMethod
make_pS_matrix(data)

Create a discrete semantic matrix given a dataframe.

Obligatory Arguments

  • data::DataFrame: the dataset

Optional Arguments

  • features_col::Symbol=:CommunicativeIntention: the column name for target
  • sep_token::String="_": separator

Examples

s_obj_train = JudiLing.make_pS_matrix(
     utterance,
     features_col=:CommunicativeIntention,
    sep_token="_")
source
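
A toy illustration (hypothetical dataframe) of the resulting binary matrix:

using DataFrames
utterance = DataFrame(CommunicativeIntention = ["greet_polite", "greet_informal"])
s_obj = JudiLing.make_pS_matrix(utterance)
# s_obj.pS is a 2 x 3 binary matrix over the features greet, polite and informal;
# s_obj.f2i and s_obj.i2f map between features and column indices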
JudiLing.make_pS_matrixMethod
make_pS_matrix(data_val, pS_obj)

Construct a discrete semantic matrix for a validation dataset, given the features in the dataframe and the pS object of the training dataset.

Obligatory Arguments

  • data_val::DataFrame: the dataset
  • pS_obj::PS_Matrix_Struct: training PS object

Optional Arguments

  • features_col::Symbol=:CommunicativeIntention: the column name for the target features
  • sep_token::String="_": separator

Examples

s_obj_val = JudiLing.make_pS_matrix(
     data_val,
     s_obj_train,
     features_col=:CommunicativeIntention,
    sep_token="_")
source
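A minimal end-to-end sketch with a made-up dataframe (the column values are hypothetical): building the validation matrix with the training pS object guarantees that both matrices use the same feature-to-index mapping.

using DataFrames
utterance = DataFrame(CommunicativeIntention = ["greet_polite", "request", "greet"])
utterance_val = DataFrame(CommunicativeIntention = ["request_polite"])

s_obj_train = JudiLing.make_pS_matrix(utterance)
s_obj_val = JudiLing.make_pS_matrix(utterance_val, s_obj_train)

# s_obj_train.pS: one row per utterance, one binary column per feature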
JudiLing.make_combined_pS_matrixMethod
make_combined_pS_matrix(
     data_train,
     data_val;
     features_col = :CommunicativeIntention,
    sep_token = "_")

Create discrete semantic matrices for a training and a validation dataset. The features are collected from both datasets, so the two matrices share the same feature-to-index mapping.

Obligatory Arguments

  • data_train::DataFrame: the training dataset
  • data_val::DataFrame: the validation dataset

Optional Arguments

  • features_col::Symbol=:CommunicativeIntention: the column name for the target features
  • sep_token::String="_": separator

Examples

s_obj_train, s_obj_val = JudiLing.make_combined_pS_matrix(
     data_train,
     data_val,
     features_col=:CommunicativeIntention,
    sep_token="_")
source
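The combined variant above collects the feature set from both dataframes before building the matrices, so validation-only features still receive a column. A minimal sketch with made-up data:

using DataFrames
data_train = DataFrame(CommunicativeIntention = ["greet_polite", "request"])
data_val = DataFrame(CommunicativeIntention = ["greet", "request_polite"])

s_obj_train, s_obj_val = JudiLing.make_combined_pS_matrix(data_train, data_val)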

Simulate semantic vectors

JudiLing.L_Matrix_StructType

A structure that stores lexome semantic vectors: L is the lexome semantic matrix; f2i is a dictionary returning the indices for features; i2f is a dictionary returning the features for indices.

source
JudiLing.make_S_matrixMethod
make_S_matrix(data::DataFrame, base::Vector, inflections::Vector)

Create a simulated semantic matrix for the training dataset, given a vector specifying the context lexemes and a vector specifying the grammatical lexemes. The semantic vector of a word form is constructed by summing the semantic vectors of its context and grammatical lexemes.

Obligatory Arguments

  • data::DataFrame: the dataset
  • base::Vector: context lexemes
  • inflections::Vector: grammatical lexemes

Optional Arguments

  • ncol::Int64=200: the dimension of the semantic vectors, usually the same as that of the cue vectors
  • sd_base_mean::Int64=1: the sd of the means of the base features
  • sd_inflection_mean::Int64=1: the sd of the means of the inflectional features
  • sd_base::Int64=4: the sd of the base features
  • sd_inflection::Int64=4: the sd of the inflectional features
  • seed::Int64=314: the random seed
  • isdeep::Bool=true: if true, the mean of each feature is also randomized
  • add_noise::Bool=true: if true, additional Gaussian noise is added
  • sd_noise::Int64=1: the sd of the Gaussian noise
  • normalized::Bool=false: if true, most values fall between -1 and 1; they may slightly exceed -1 or 1 depending on the sd

Examples

# basic usage
 S_train = JudiLing.make_S_matrix(
     french,
     ["Lexeme"],
    ...
     sd_base=4,
     sd_inflection=4,
     sd_noise=1,
    ...)
source
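A fuller call spelling out the keyword arguments documented above; french and the inflectional column names are assumptions standing in for a real dataset:

S_train = JudiLing.make_S_matrix(
    french,
    ["Lexeme"],
    ["Tense", "Person", "Number"],  # hypothetical inflectional columns
    ncol=200,
    sd_base=4,
    sd_inflection=4,
    add_noise=true,
    sd_noise=1,
    seed=314)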
JudiLing.make_S_matrixMethod
make_S_matrix(data_train::DataFrame, data_val::DataFrame, base::Vector, inflections::Vector)

Create a simulated semantic matrix for the validation dataset, given a vector specifying the context lexemes and a vector specifying the grammatical lexemes. The semantic vector of a word form is constructed by summing the semantic vectors of its context and grammatical lexemes.

Obligatory Arguments

  • data_train::DataFrame: the training dataset
  • data_val::DataFrame: the validation dataset
  • base::Vector: context lexemes
  • inflections::Vector: grammatical lexemes

Optional Arguments

  • ncol::Int64=200: the dimension of the semantic vectors, usually the same as that of the cue vectors
  • sd_base_mean::Int64=1: the sd of the means of the base features
  • sd_inflection_mean::Int64=1: the sd of the means of the inflectional features
  • sd_base::Int64=4: the sd of the base features
  • sd_inflection::Int64=4: the sd of the inflectional features
  • seed::Int64=314: the random seed
  • isdeep::Bool=true: if true, the mean of each feature is also randomized
  • add_noise::Bool=true: if true, additional Gaussian noise is added
  • sd_noise::Int64=1: the sd of the Gaussian noise
  • normalized::Bool=false: if true, most values fall between -1 and 1; they may slightly exceed -1 or 1 depending on the sd

Examples

# basic usage
 S_train, S_val = JudiLing.make_S_matrix(
     french,
     french_val,
    ...
     sd_base=4,
     sd_inflection=4,
     sd_noise=1,
    ...)
source
JudiLing.make_S_matrixMethod
make_S_matrix(data::DataFrame, base::Vector)

Create a simulated semantic matrix for the training dataset using only base features, given a vector specifying the context lexemes. The semantic vector of a word form is constructed by summing the semantic vectors of its context lexemes.

Obligatory Arguments

  • data::DataFrame: the dataset
  • base::Vector: context lexemes

Optional Arguments

  • ncol::Int64=200: the dimension of the semantic vectors, usually the same as that of the cue vectors
  • sd_base_mean::Int64=1: the sd of the means of the base features
  • sd_base::Int64=4: the sd of the base features
  • seed::Int64=314: the random seed
  • isdeep::Bool=true: if true, the mean of each feature is also randomized
  • add_noise::Bool=true: if true, additional Gaussian noise is added
  • sd_noise::Int64=1: the sd of the Gaussian noise
  • normalized::Bool=false: if true, most values fall between -1 and 1; they may slightly exceed -1 or 1 depending on the sd

Examples

# basic usage
 S_train = JudiLing.make_S_matrix(
     french,
     ["Lexeme"],
    ...
     sd_base=4,
     sd_inflection=4,
     sd_noise=1,
    ...)
source
JudiLing.make_S_matrixMethod
make_S_matrix(data_train::DataFrame, data_val::DataFrame, base::Vector)

Create a simulated semantic matrix for the validation dataset using only base features, given a vector specifying the context lexemes. The semantic vector of a word form is constructed by summing the semantic vectors of its context lexemes.

Obligatory Arguments

  • data_train::DataFrame: the training dataset
  • data_val::DataFrame: the validation dataset
  • base::Vector: context lexemes

Optional Arguments

  • ncol::Int64=200: the dimension of the semantic vectors, usually the same as that of the cue vectors
  • sd_base_mean::Int64=1: the sd of the means of the base features
  • sd_base::Int64=4: the sd of the base features
  • seed::Int64=314: the random seed
  • isdeep::Bool=true: if true, the mean of each feature is also randomized
  • add_noise::Bool=true: if true, additional Gaussian noise is added
  • sd_noise::Int64=1: the sd of the Gaussian noise
  • normalized::Bool=false: if true, most values fall between -1 and 1; they may slightly exceed -1 or 1 depending on the sd

Examples

# basic usage
 S_train, S_val = JudiLing.make_S_matrix(
     french,
     french_val,
    ...
     sd_base=4,
     sd_inflection=4,
     sd_noise=1,
    ...)
source
JudiLing.make_S_matrixMethod
make_S_matrix(data_train::DataFrame, base::Vector, inflections::Vector, L::L_Matrix_Struct)

Create a simulated semantic matrix, given an existing lexome matrix.

Obligatory Arguments

  • data::DataFrame: the dataset
  • base::Vector: context lexemes
  • inflections::Vector: grammatical lexemes
  • L::L_Matrix_Struct: the lexome matrix

Optional Arguments

  • add_noise::Bool=true: if true, additional Gaussian noise is added
  • sd_noise::Int64=1: the sd of the Gaussian noise
  • normalized::Bool=false: if true, most values fall between -1 and 1; they may slightly exceed -1 or 1 depending on the sd

Examples

# basic usage
 S1 = JudiLing.make_S_matrix(
     latin,
     ["Lexeme"],
    ...
      add_noise=true,
     sd_noise=1,
     normalized=false
    )
source
JudiLing.make_S_matrixMethod
make_S_matrix(data_train::DataFrame, data_val::Union{DataFrame, Nothing}, base::Vector, L::L_Matrix_Struct)

Create a simulated semantic matrix, given an existing lexome matrix.

Obligatory Arguments

  • data_train::DataFrame: the training dataset
  • data_val::DataFrame: the validation dataset
  • base::Vector: context lexemes
  • L::L_Matrix_Struct: the lexome matrix

Optional Arguments

  • add_noise::Bool=true: if true, additional Gaussian noise is added
  • sd_noise::Int64=1: the sd of the Gaussian noise
  • normalized::Bool=false: if true, most values fall between -1 and 1; they may slightly exceed -1 or 1 depending on the sd

Examples

# basic usage
 S1, S2 = JudiLing.make_S_matrix(
      latin,
     latin_val,
    ...
     add_noise=true,
     sd_noise=1,
     normalized=false
    )
source
JudiLing.make_S_matrixMethod
make_S_matrix(data::DataFrame, base::Vector, L::L_Matrix_Struct)

Create a simulated semantic matrix, given an existing lexome matrix.

Obligatory Arguments

  • data::DataFrame: the dataset
  • base::Vector: context lexemes
  • L::L_Matrix_Struct: the lexome matrix

Optional Arguments

  • add_noise::Bool=true: if true, additional Gaussian noise is added
  • sd_noise::Int64=1: the sd of the Gaussian noise
  • normalized::Bool=false: if true, most values fall between -1 and 1; they may slightly exceed -1 or 1 depending on the sd

Examples

# basic usage
 S1 = JudiLing.make_S_matrix(
     latin,
     ["Lexeme"],
    ...
     add_noise=true,
     sd_noise=1,
     normalized=false
    )
source
JudiLing.make_S_matrixMethod
make_S_matrix(data_train::DataFrame, data_val::DataFrame, base::Vector, inflections::Vector, L::L_Matrix_Struct)

Create a simulated semantic matrix, given an existing lexome matrix.

Obligatory Arguments

  • data_train::DataFrame: the training dataset
  • data_val::DataFrame: the validation dataset
  • base::Vector: context lexemes
  • inflections::Vector: grammatical lexemes
  • L::L_Matrix_Struct: the lexome matrix

Optional Arguments

  • add_noise::Bool=true: if true, additional Gaussian noise is added
  • sd_noise::Int64=1: the sd of the Gaussian noise
  • normalized::Bool=false: if true, most values fall between -1 and 1; they may slightly exceed -1 or 1 depending on the sd

Examples

# basic usage
 S1, S2 = JudiLing.make_S_matrix(
     latin,
     latin_val,
    ...
     add_noise=true,
     sd_noise=1,
     normalized=false
    )
source
JudiLing.make_L_matrixMethod
make_L_matrix(data::DataFrame, base::Vector)

Create a lexome matrix with simulated semantic vectors, using only base features.

Obligatory Arguments

  • data::DataFrame: the dataset
  • base::Vector: context lexemes

Optional Arguments

  • ncol::Int64=200: the dimension of the semantic vectors, usually the same as that of the cue vectors
  • sd_base_mean::Int64=1: the sd of the means of the base features
  • sd_base::Int64=4: the sd of the base features
  • seed::Int64=314: the random seed
  • isdeep::Bool=true: if true, the mean of each feature is also randomized

Examples

# basic usage
 L = JudiLing.make_L_matrix(
     latin,
     ["Lexeme"],
    ncol=200)
source
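Because the lexome matrix fixes the semantic vector of each feature, reusing one L across several calls keeps the simulated vectors comparable. A small sketch, assuming latin is already loaded:

L = JudiLing.make_L_matrix(
    latin,
    ["Lexeme"],
    ncol=200)

# S1 draws its lexeme vectors from L, so repeated calls stay consistent
S1 = JudiLing.make_S_matrix(latin, ["Lexeme"], L, add_noise=true, sd_noise=1)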
JudiLing.make_combined_S_matrixMethod
make_combined_S_matrix(data_train::DataFrame, data_val::DataFrame, base::Vector, inflections::Vector, L::L_Matrix_Struct)

Create simulated semantic matrices for the training and validation datasets from an existing lexome matrix, where the features are combined from both datasets.

Obligatory Arguments

  • data_train::DataFrame: the training dataset
  • data_val::DataFrame: the validation dataset
  • base::Vector: context lexemes
  • inflections::Vector: grammatical lexemes
  • L::L_Matrix_Struct: the lexome matrix

Optional Arguments

  • add_noise::Bool=true: if true, additional Gaussian noise is added
  • sd_noise::Int64=1: the sd of the Gaussian noise
  • normalized::Bool=false: if true, most values fall between -1 and 1; they may slightly exceed -1 or 1 depending on the sd

Examples

# basic usage
 S_train, S_val = JudiLing.make_combined_S_matrix(
     latin_train,
     latin_val,
     ["Lexeme"],
     ["Person","Number","Tense","Voice","Mood"],
    L)
source
JudiLing.make_combined_S_matrixMethod
make_combined_S_matrix(data_train::DataFrame, data_val::Union{DataFrame, Nothing}, base::Vector, L::L_Matrix_Struct)

Create simulated semantic matrices for the training and validation datasets from an existing lexome matrix, where the features are combined from both datasets.

Obligatory Arguments

  • data_train::DataFrame: the training dataset
  • data_val::DataFrame: the validation dataset
  • base::Vector: context lexemes
  • L::L_Matrix_Struct: the lexome matrix

Optional Arguments

  • add_noise::Bool=true: if true, additional Gaussian noise is added
  • sd_noise::Int64=1: the sd of the Gaussian noise
  • normalized::Bool=false: if true, most values fall between -1 and 1; they may slightly exceed -1 or 1 depending on the sd

Examples

# basic usage
 S_train, S_val = JudiLing.make_combined_S_matrix(
     latin_train,
     latin_val,
     ["Lexeme"],
     ["Person","Number","Tense","Voice","Mood"],
    L)
source
JudiLing.make_combined_S_matrixMethod
make_combined_S_matrix(  data_train::DataFrame, data_val::DataFrame, base::Vector, inflections::Vector)

Create simulated semantic matrices for the training and validation datasets, where the features are combined from both datasets.

Obligatory Arguments

  • data_train::DataFrame: the training dataset
  • data_val::DataFrame: the validation dataset
  • base::Vector: context lexemes
  • inflections::Vector: grammatical lexemes

Optional Arguments

  • ncol::Int64=200: the dimension of the semantic vectors, usually the same as that of the cue vectors
  • sd_base_mean::Int64=1: the sd of the means of the base features
  • sd_inflection_mean::Int64=1: the sd of the means of the inflectional features
  • sd_base::Int64=4: the sd of the base features
  • sd_inflection::Int64=4: the sd of the inflectional features
  • seed::Int64=314: the random seed
  • isdeep::Bool=true: if true, the mean of each feature is also randomized
  • add_noise::Bool=true: if true, additional Gaussian noise is added
  • sd_noise::Int64=1: the sd of the Gaussian noise
  • normalized::Bool=false: if true, most values fall between -1 and 1; they may slightly exceed -1 or 1 depending on the sd

Examples

# basic usage
 S_train, S_val = JudiLing.make_combined_S_matrix(
     latin_train,
     latin_val,
     ["Lexeme"],
     ["Person","Number","Tense","Voice","Mood"],
    ncol=n_features)
source
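A typical pipeline first builds a combined lexome matrix and then derives aligned semantic matrices from it; latin_train and latin_val are assumed to be a train/validation split:

L = JudiLing.make_combined_L_matrix(
    latin_train,
    latin_val,
    ["Lexeme"],
    ["Person","Number","Tense","Voice","Mood"],
    ncol=200)

S_train, S_val = JudiLing.make_combined_S_matrix(
    latin_train,
    latin_val,
    ["Lexeme"],
    ["Person","Number","Tense","Voice","Mood"],
    L)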
JudiLing.make_combined_S_matrixMethod
make_combined_S_matrix(data_train::DataFrame, data_val::DataFrame, base::Vector)

Create simulated semantic matrices for the training and validation datasets, where the features are combined from both datasets.

Obligatory Arguments

  • data_train::DataFrame: the training dataset
  • data_val::DataFrame: the validation dataset
  • base::Vector: context lexemes

Optional Arguments

  • ncol::Int64=200: the dimension of the semantic vectors, usually the same as that of the cue vectors
  • sd_base_mean::Int64=1: the sd of the means of the base features
  • sd_inflection_mean::Int64=1: the sd of the means of the inflectional features
  • sd_base::Int64=4: the sd of the base features
  • sd_inflection::Int64=4: the sd of the inflectional features
  • seed::Int64=314: the random seed
  • isdeep::Bool=true: if true, the mean of each feature is also randomized
  • add_noise::Bool=true: if true, additional Gaussian noise is added
  • sd_noise::Int64=1: the sd of the Gaussian noise
  • normalized::Bool=false: if true, most values fall between -1 and 1; they may slightly exceed -1 or 1 depending on the sd

Examples

# basic usage
 S_train, S_val = JudiLing.make_combined_S_matrix(
     latin_train,
     latin_val,
     ["Lexeme"],
     ["Person","Number","Tense","Voice","Mood"],
    ncol=n_features)
source
JudiLing.make_combined_L_matrixMethod
make_combined_L_matrix(data_train::DataFrame, data_val::DataFrame, base::Vector, inflections::Vector)

Create a lexome matrix with simulated semantic vectors, where the features are combined from both the training and validation datasets.

Obligatory Arguments

  • data_train::DataFrame: the training dataset
  • data_val::DataFrame: the validation dataset
  • base::Vector: context lexemes
  • inflections::Vector: grammatical lexemes

Optional Arguments

  • ncol::Int64=200: the dimension of the semantic vectors, usually the same as that of the cue vectors
  • sd_base_mean::Int64=1: the sd of the means of the base features
  • sd_inflection_mean::Int64=1: the sd of the means of the inflectional features
  • sd_base::Int64=4: the sd of the base features
  • sd_inflection::Int64=4: the sd of the inflectional features
  • seed::Int64=314: the random seed
  • isdeep::Bool=true: if true, the mean of each feature is also randomized

Examples

# basic usage
 L = JudiLing.make_combined_L_matrix(
     latin_train,
     latin_val,
     ["Lexeme"],
     ["Person","Number","Tense","Voice","Mood"],
    ncol=n_features)
source
JudiLing.make_combined_L_matrixMethod
make_combined_L_matrix(data_train::DataFrame, data_val::DataFrame, base::Vector)

Create a lexome matrix with simulated semantic vectors, where the features are combined from both the training and validation datasets.

Obligatory Arguments

  • data_train::DataFrame: the training dataset
  • data_val::DataFrame: the validation dataset
  • base::Vector: context lexemes

Optional Arguments

  • ncol::Int64=200: the dimension of the semantic vectors, usually the same as that of the cue vectors
  • sd_base_mean::Int64=1: the sd of the means of the base features
  • sd_inflection_mean::Int64=1: the sd of the means of the inflectional features
  • sd_base::Int64=4: the sd of the base features
  • sd_inflection::Int64=4: the sd of the inflectional features
  • seed::Int64=314: the random seed
  • isdeep::Bool=true: if true, the mean of each feature is also randomized

Examples

# basic usage
 L = JudiLing.make_combined_L_matrix(
     latin_train,
     latin_val,
     ["Lexeme"],
    ncol=n_features)
source
JudiLing.L_Matrix_StructMethod
L_Matrix_Struct(L, sd_base, sd_base_mean, sd_inflection, sd_inflection_mean, base_f, infl_f, base_f2i, infl_f2i, n_base_f, n_infl_f, ncol)

Construct L_Matrix_Struct in deep mode.

source
JudiLing.L_Matrix_StructMethod
L_Matrix_Struct(L, sd_base, sd_inflection, base_f, infl_f, base_f2i, infl_f2i, n_base_f, n_infl_f, ncol)

Construct L_Matrix_Struct without deep mode.

source

Load from word2vec, fasttext or similar

JudiLing.load_S_matrix_from_fasttextMethod
load_S_matrix_from_fasttext(data::DataFrame,
                                  language::Symbol;
                                  target_col=:Word,
                                  default_file::Int=1)

Load a semantic matrix from fasttext, using the Embeddings.jl package. The fasttext vectors are subset to include only the words in target_col of data, and data is subset to include only the words in target_col for which a semantic vector is available.

The last parameter, default_file, specifies which vectors are loaded. To learn about all available vectors, use the following commands:

using Embeddings
 language_files(FastText_Text{:nl})

replacing the language code (here :nl) with the language you are interested in. In general, for all languages other than English, these files are available:

  • default_file=1 loads from https://fasttext.cc/docs/en/crawl-vectors.html, paper: E. Grave, P. Bojanowski, P. Gupta, A. Joulin, T. Mikolov, Learning Word Vectors for 157 Languages License: CC BY-SA 3.0 https://creativecommons.org/licenses/by-sa/3.0/
  • default_file=2 loads from https://fasttext.cc/docs/en/pretrained-vectors.html paper: P. Bojanowski, E. Grave, A. Joulin, T. Mikolov, Enriching Word Vectors with Subword Information License: CC BY-SA 3.0 https://creativecommons.org/licenses/by-sa/3.0/

Obligatory Arguments

  • data::DataFrame: the dataset
  • language::Symbol: the language of the words in the dataset. Officially this is an ISO 639-2 code (see https://github.com/JuliaText/Embeddings.jl/issues/34#issuecomment-782604523), but in practice ISO 639-1 codes appear to be used, with ISO 639-2 codes used only when no ISO 639-1 code is available (see https://en.wikipedia.org/wiki/List_of_ISO_639-2_codes)

Optional Arguments

  • target_col=:Word: column with orthographic representation of words in data
  • default_file::Int=1: source of vectors, for more information see above and here: https://github.com/JuliaText/Embeddings.jl#loading-different-embeddings

Examples

# basic usage
latin_small, S = JudiLing.load_S_matrix_from_fasttext(latin, :la, target_col=:Word)
source
JudiLing.load_S_matrix_from_fasttextMethod
load_S_matrix_from_fasttext(data_train::DataFrame,
                                  data_val::DataFrame,
                                  language::Symbol;
                                  target_col=:Word,
                                  default_file::Int=1)

Load semantic matrices for a training and a validation dataset from fasttext, loaded using the Embeddings.jl package. The fasttext vectors are subset to include only the words in target_col of data_train and data_val, and both datasets are subset to include only the words for which a semantic vector is available. For the language codes and the default_file argument, see the method above.

Examples

# basic usage
 latin_small_train, latin_small_val, S_train, S_val = JudiLing.load_S_matrix_from_fasttext(latin_train,
                                                       latin_val,
                                                       :la,
                                                      target_col=:Word)
source
JudiLing.load_S_matrix_from_word2vec_fileMethod
load_S_matrix_from_word2vec_file(data::DataFrame,
                             filepath::String;
                            target_col=:Word)

Load a semantic matrix from a word2vec file at filepath. The word2vec vectors are subset to include only the words in target_col of data, and data is subset to include only the words for which a semantic vector is available. Returns the subsetted data and the semantic matrix.

Obligatory Arguments

  • data::DataFrame: the training dataset
  • filepath::String: path to file with word2vec vectors in .txt (not compressed in any way)

Optional Arguments

  • target_col=:Word: column with orthographic representation of words in data
source
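A minimal sketch; the filepath below is a placeholder for a plain-text word2vec vector file, and latin is assumed to be loaded:

latin_subset, S = JudiLing.load_S_matrix_from_word2vec_file(
    latin,
    "/path/to/word2vec_vectors.txt",
    target_col=:Word)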
JudiLing.load_S_matrix_from_word2vec_fileMethod
load_S_matrix_from_word2vec_file(data_train::DataFrame,
                             data_val::DataFrame,
                             filepath::String;
                            target_col=:Word)

Load semantic matrices from a word2vec file at filepath. The word2vec vectors are subset to include only the words in target_col of data_train and data_val, and both datasets are subset to include only the words for which a semantic vector is available. Returns the subsetted train and val data and the train and val semantic matrices.

Obligatory Arguments

  • data_train::DataFrame: the training dataset
  • data_val::DataFrame: the validation dataset
  • filepath::String: path to file with word2vec vectors in .txt (not compressed in any way)

Optional Arguments

  • target_col=:Word: column with orthographic representation of words in data
source
JudiLing.load_S_matrix_from_fasttext_fileMethod
load_S_matrix_from_fasttext_file(data::DataFrame,
                             filepath::String;
                            target_col=:Word)

Load a semantic matrix from a fasttext file at filepath. The fasttext vectors are subset to include only the words in target_col of data, and data is subset to include only the words for which a semantic vector is available. Returns the subsetted data and the semantic matrix.

Obligatory Arguments

  • data::DataFrame: the training dataset
  • filepath::String: path to file with fasttext vectors in .txt or .vec (not compressed in any way)

Optional Arguments

  • target_col=:Word: column with orthographic representation of words in data
source
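A minimal sketch; "cc.la.300.vec" is a placeholder for a downloaded fasttext vector file, and latin is assumed to be loaded:

latin_subset, S = JudiLing.load_S_matrix_from_fasttext_file(
    latin,
    "cc.la.300.vec",
    target_col=:Word)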
JudiLing.load_S_matrix_from_fasttext_fileMethod
load_S_matrix_from_fasttext_file(data_train::DataFrame,
                             data_val::DataFrame,
                             filepath::String;
                            target_col=:Word)

Load semantic matrices from a fasttext file at filepath. The fasttext vectors are subset to include only the words in target_col of data_train and data_val, and both datasets are subset to include only the words for which a semantic vector is available. Returns the subsetted train and val data and the train and val semantic matrices.

Obligatory Arguments

  • data_train::DataFrame: the training dataset
  • data_val::DataFrame: the validation dataset
  • filepath::String: path to file with fasttext vectors in .txt or .vec (not compressed in any way)

Optional Arguments

  • target_col=:Word: column with orthographic representation of words in data
source

Utility functions

JudiLing.merge_f2iMethod
merge_f2i(base_f2i, infl_f2i, n_base_f, n_infl_f)

Merge base f2i dictionary and inflectional f2i dictionary.

source
JudiLing.make_StMethod
make_St(L, n, data, base, inflections)

Make S transpose matrix with inflections.

source
Make Yt Matrix · JudiLing.jl

Make Yt Matrix

JudiLing.make_Yt_matrixMethod
make_Yt_matrix(t, data, f2i)

Make Yt matrix for timestep t. A given column of the Yt matrix specifies the support for the corresponding n-gram predicted for timestep t for each of the observations (rows of Yt).

Obligatory Arguments

  • t::Int64: the timestep t
  • data::DataFrame: the dataset
  • f2i::Dict: the dictionary returning indices given features

Optional Arguments

  • tokenized::Bool=false: if true, the dataset target is assumed to be tokenized
  • sep_token::Union{Nothing, String, Char}=nothing: separator token
  • verbose::Bool=false: if verbose, more information will be printed

Examples

latin = DataFrame(CSV.File(joinpath("data", "latin_mini.csv")))
JudiLing.make_Yt_matrix(2, latin, f2i)
source
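A fuller sketch of where the f2i dictionary comes from: it is part of the cue object, so the columns of the Yt matrix line up with the columns of the C matrix. The file path and keyword values follow the examples elsewhere in this manual:

latin = DataFrame(CSV.File(joinpath("data", "latin_mini.csv")))
cue_obj = JudiLing.make_cue_matrix(latin, grams=3, target_col=:Word)

# support for each n-gram at timestep 2, one row per word form
Yt2 = JudiLing.make_Yt_matrix(2, latin, cue_obj.f2i)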
Measures func · JudiLing.jl

Note: the `kargs` are just keyword arguments that are passed on from the parameters of `get_and_train_model` to the `measures_func`. For example, this could be a suffix that should be added to each added column in `measures_func`.

## Output

The function has to return the dataset.
Output · JudiLing.jl

Output

JudiLing.write2csvFunction

Write results into a csv file. This function takes as input the results from the learn_paths and build_paths functions, including the information on gold paths that is optionally returned as second output result.

source
JudiLing.write2dfFunction

Reformat results into a dataframe. This function takes as input the results from the learn_paths and build_paths functions, including the information on gold paths that is optionally returned as second output result.

source
JudiLing.write2csvMethod
write2csv(res, data, cue_obj_train, cue_obj_val, filename)

Write results into csv file for the results from learn_paths and build_paths.

Obligatory Arguments

  • res::Array{Array{Result_Path_Info_Struct,1},1}: the results from learn_paths or build_paths
  • data::DataFrame: the dataset
  • cue_obj_train::Cue_Matrix_Struct: the cue object for training dataset
  • cue_obj_val::Cue_Matrix_Struct: the cue object for validation dataset
  • filename::String: the filename

Optional Arguments

  • grams::Int64=3: the number n in n-gram cues
  • tokenized::Bool=false: if true, the dataset target is tokenized
  • sep_token::Union{Nothing, String, Char}=nothing: separator
  • start_end_token::Union{String, Char}="#": start and end token in boundary cues
  • output_sep_token::Union{String, Char}="": output separator
  • path_sep_token::Union{String, Char}=":": path separator
  • target_col::Union{String, Symbol}=:Words: the column name for target strings
  • root_dir::String=".": dir path for project root dir
  • output_dir::String=".": output dir inside root dir

Examples

# writing results for training data
 JudiLing.write2csv(
     res_train,
     latin_train,
    ...
     path_sep_token=":",
     target_col=:Word,
     root_dir=".",
    output_dir="test_out")
source
JudiLing.write2csvMethod
write2csv(gpi::Vector{Gold_Path_Info_Struct}, filename)

Write results into csv file for the gold paths' information optionally returned by learn_paths and build_paths.

Obligatory Arguments

  • gpi::Vector{Gold_Path_Info_Struct}: the gold paths' information
  • filename::String: the filename

Optional Arguments

  • root_dir::String=".": dir path for project root dir
  • output_dir::String=".": output dir inside root dir

Examples

# write gold standard paths to csv for training data
 JudiLing.write2csv(
     gpi_train,
     "gpi_latin_train.csv",
    root_dir=".",
    output_dir="test_out"
    )

# write gold standard paths to csv for validation data
JudiLing.write2csv(
    gpi_val,
     "gpi_latin_val.csv",
     root_dir=".",
     output_dir="test_out"
    )
source
JudiLing.write2csvMethod
write2csv(ts::Threshold_Stat_Struct, filename)

Write threshold and tolerance proportions for each timestep into a csv file.

Obligatory Arguments

  • ts::Threshold_Stat_Struct: the threshold and tolerance proportions
  • filename::String: the filename

Optional Arguments

  • root_dir::String=".": dir path for project root dir
  • output_dir::String=".": output dir inside root dir

Examples

JudiLing.write2csv(ts, "ts.csv", root_dir = @__DIR__, output_dir="out")
source
JudiLing.write2dfMethod
write2df(res, data, cue_obj_train, cue_obj_val)

Reformat results into a dataframe for the results from the learn_paths and build_paths functions.

Obligatory Arguments

  • res: output of learn_paths or build_paths
  • data::DataFrame: the dataset
  • cue_obj_train: cue object of the training data set
  • cue_obj_val: cue object of the validation data set

Optional Arguments

  • grams::Int64=3: the number n in n-gram cues
  • tokenized::Bool=false: if true, the dataset target is tokenized
  • sep_token::Union{Nothing, String, Char}=nothing: separator
  • start_end_token::Union{String, Char}="#": start and end token in boundary cues
  • output_sep_token::Union{String, Char}="": output separator
  • path_sep_token::Union{String, Char}=":": path separator
  • target_col::Union{String, Symbol}=:Words: the column name for target strings

Examples

# writing results for training data
 JudiLing.write2df(
     res_train,
     latin_train,
    ...
     start_end_token="#",
     output_sep_token="",
     path_sep_token=":",
    target_col=:Word)
source
JudiLing.write2dfMethod
write2df(gpi::Vector{Gold_Path_Info_Struct})

Write results into a dataframe for the gold paths' information optionally returned by learn_paths and build_paths.

Obligatory Arguments

  • gpi::Vector{Gold_Path_Info_Struct}: the gold paths' information

Examples

# write gold standard paths to df for training data
JudiLing.write2df(gpi_train)
 
 # write gold standard paths to df for validation data
JudiLing.write2df(gpi_val)
source
JudiLing.write2dfMethod
write2df(ts::Threshold_Stat_Struct)

Write threshold and tolerance proportions for each timestep into a dataframe.

Obligatory Arguments

  • ts::Threshold_Stat_Struct: the threshold and tolerance proportion

Examples

JudiLing.write2df(ts)
source
JudiLing.write_comprehension_evalMethod
write_comprehension_eval(SChat, SC, data, target_col, filename)

Write comprehension evaluation into a CSV file, include target and predicted ids and indentifiers and their correlations.

Obligatory Arguments

  • SChat::Matrix: the Shat/Chat matrix
  • SC::Matrix: the S/C matrix
  • data::DataFrame: the data
  • target_col::Symbol: the name of target column
  • filename::String: the filename/filepath

Optional Arguments

  • k: top k candidates
  • root_dir::String=".": dir path for project root dir
  • output_dir::String=".": output dir inside root dir

Examples

JudiLing.write_comprehension_eval(Chat, cue_obj.C, latin, :Word, "output.csv",
+    k=10, root_dir=@__DIR__, output_dir="out")
source
JudiLing.write_comprehension_evalMethod
write_comprehension_eval(SChat, SC, SC_rest, data, data_rest, target_col, filename)

Write comprehension evaluation into a CSV file for both training and validation datasets, include target and predicted ids and indentifiers and their correlations.

Obligatory Arguments

  • SChat::Matrix: the Shat/Chat matrix
  • SC::Matrix: the S/C matrix
  • SC_rest::Matrix: the rest S/C matrix
  • data::DataFrame: the data
  • data_rest::DataFrame: the rest data
  • target_col::Symbol: the name of target column
  • filename::String: the filename/filepath

Optional Arguments

  • k: top k candidates
  • root_dir::String=".": dir path for project root dir
  • output_dir::String=".": output dir inside root dir

Examples

JudiLing.write_comprehension_eval(Shat_val, S_val, S_train, latin_val, latin_train,
+    :Word, "all_output.csv", k=10, root_dir=@__DIR__, output_dir="out")
source
JudiLing.save_L_matrixMethod
save_L_matrix(L, filename)

Save lexome matrix into csv file.

Obligatory Arguments

  • L::L_Matrix_Struct: the lexome matrix struct
  • filename::String: the filename/filepath

Examples

JudiLing.save_L_matrix(L, joinpath(@__DIR__, "L.csv"))
source
JudiLing.load_L_matrixMethod
load_L_matrix(filename)

Load lexome matrix from csv file.

Obligatory Arguments

  • filename::String: the filename/filepath

Optional Arguments

  • header::Bool=false: header in csv

Examples

L_load = JudiLing.load_L_matrix(joinpath(@__DIR__, "L.csv"))
source
JudiLing.save_S_matrixMethod
save_S_matrix(S, filename, data, target_col)

Save S matrix into a csv file.

Obligatory Arguments

  • S::Matrix: the S matrix
  • filename::String: the filename/filepath
  • data::DataFrame: the data
  • target_col::Symbol: the name of target column

Optional Arguments

  • sep::Bool=" ": separator in CSV file

Examples

JudiLing.save_S_matrix(S, joinpath(@__DIR__, "S.csv"), latin, :Word)
source
JudiLing.load_S_matrixMethod
load_S_matrix(filename)

Load S matrix from a csv file.

Obligatory Arguments

  • filename::String: the filename/filepath

Optional Arguments

  • header::Bool=false: header in csv
  • sep::Bool=" ": separator in CSV file

Examples

JudiLing.load_S_matrix(joinpath(@__DIR__, "S.csv"))
source
diff --git a/dev/man/pickle/index.html b/dev/man/pickle/index.html
Pickle · JudiLing.jl

diff --git a/dev/man/preprocess/index.html b/dev/man/preprocess/index.html
Preprocess · JudiLing.jl

Preprocess
diff --git a/dev/man/pyndl/index.html b/dev/man/pyndl/index.html
Pyndl · JudiLing.jl

Note: For pyndl to be available in JudiLing, PyCall has to be imported before JudiLing:

using PyCall
using JudiLing

Calling pyndl from JudiLing

JudiLing.Pyndl_Weight_StructType
Pyndl_Weight_Struct
     cues::Vector{String}
     outcomes::Vector{String}
     weight::Matrix{Float64}
  • cues::Vector{String}: Vector of cues, in the order that they appear in the weight matrix.
  • outcomes::Vector{String}: Vector of outcomes, in the order that they appear in the weight matrix.
  • weight::Matrix{Float64}: Weight matrix.
source
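For quick inspection, the fields of the struct can be accessed directly once weights have been computed with pyndl (documented below; the path is a placeholder):

weights = JudiLing.pyndl("data/latin_train_events.tab.gz")
length(weights.cues), length(weights.outcomes), size(weights.weight)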
JudiLing.pyndlMethod
pyndl(
     data_path::String;
     alpha::Float64 = 0.1,
     betas::Tuple{Float64,Float64} = (0.1, 0.1),
     method::String = "openmp"
)

Compute weights using pyndl. See the documentation of pyndl for more information: https://pyndl.readthedocs.io/en/latest/

Obligatory arguments

  • data_path::String: Path to an events file as generated by pyndl's preprocess.create_event_file

Optional arguments

  • alpha::Float64 = 0.1: α learning rate.
  • betas::Tuple{Float64,Float64} = (0.1, 0.1): β1 and β2 learning rates
  • method::String = "openmp": One of {"openmp", "threading"}. "openmp" only works on Linux.

Example

weights = JudiLing.pyndl("data/latin_train_events.tab.gz")
source

Translating output of pyndl to cue and semantic matrices in JudiLing

With the weights in hand, the cue and semantic matrices can be computed:

JudiLing.make_cue_matrixMethod
make_cue_matrix(
     data::DataFrame,
     pyndl_weights::Pyndl_Weight_Struct;
     grams = 3,
     target_col = "Words",
     tokenized = false,
     sep_token = nothing,
     keep_sep = false,
     start_end_token = "#",
     verbose = false,
 )

Make the cue matrix based on a dataframe and weights computed with pyndl. Practically this means that the cues are extracted from the weights object and translated to the JudiLing format.

Obligatory arguments

  • data::DataFrame: Dataset with all the word types on which the weights were trained.
  • pyndl_weights::Pyndl_Weight_Struct: Weights trained with JudiLing.pyndl

Optional arguments

  • grams = 3: N-gram size (has to match the n-gram granularity of the cues on which the weights were trained).
  • target_col = "Words": Column with target words.
  • tokenized = false: Whether the target words are already tokenized
  • sep_token = nothing: The string separating the tokens (only used if tokenized=true).
  • keep_sep = false: Whether the sep_token should be retained in the cues.
  • start_end_token = "#": The string with which to mark word boundaries.
  • verbose = false: Verbose mode.

Example

weights = JudiLing.pyndl("data/latin_train_events.tab.gz")
latin_train = DataFrame(CSV.File("latin_train.csv"))
cue_obj = JudiLing.make_cue_matrix(latin_train, weights,
                                   grams = 3,
                                   target_col = "Word")
source
JudiLing.make_S_matrixMethod
make_S_matrix(
     data::DataFrame,
     pyndl_weights::Pyndl_Weight_Struct,
     n_features_columns::Vector;
     tokenized::Bool=false,
     sep_token::String="_"
 )

Create a semantic matrix based on a dataframe and weights computed with pyndl. Practically this means that the semantic features are extracted from the weights object and translated to the JudiLing format.

Obligatory arguments

  • data::DataFrame: The dataset with word types.
  • pyndl_weights::Pyndl_Weight_Struct: Weights trained with JudiLing.pyndl.
  • n_features_columns::Vector: Vector of columns with the features in the dataset.

Optional arguments

  • tokenized=false: Whether the features in n_features_columns columns are already tokenized (e.g. "feature1_feature2_feature3")
  • sep_token="_": The string with which the features are separated (only used if tokenized=false).

Example

weights = JudiLing.pyndl("data/latin_train_events.tab.gz")
 S = JudiLing.make_S_matrix(data,
                             weights_latin,
                             ["Lexeme", "Person", "Number", "Tense", "Voice", "Mood"],
                            tokenized=false)
source
JudiLing.make_S_matrixMethod
make_S_matrix(
     data_train::DataFrame,
     data_val::DataFrame,
     pyndl_weights::Pyndl_Weight_Struct,
     n_features_columns::Vector;
     tokenized::Bool=false,
     sep_token::String="_"
 )

Create a semantic matrix based on a training and validation dataframe and weights computed with pyndl. Practically this means that the semantic features are extracted from the weights object and translated to the JudiLing format.

Obligatory arguments

  • data_train::DataFrame: The training dataset.
  • data_val::DataFrame: The validation dataset.
  • pyndl_weights::Pyndl_Weight_Struct: Weights trained with JudiLing.pyndl.
  • n_features_columns::Vector: Vector of columns with the features in the training and validation datasets.

Optional arguments

  • tokenized=false: Whether the features in n_features_columns columns are already tokenized (e.g. "feature1_feature2_feature3")
  • sep_token="_": The string with which the features are separated (only used if tokenized=false).

Example

weights = JudiLing.pyndl("data/latin_train_events.tab.gz")
S_train, S_val = JudiLing.make_S_matrix(train,
                             val,
                             weights_latin,
                             ["Lexeme", "Person", "Number", "Tense", "Voice", "Mood"],
                            tokenized=false)
source
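Putting this page together, a minimal end-to-end sketch (file paths are placeholders, and PyCall must be loaded before JudiLing for pyndl to be available):

using PyCall                 # must come before JudiLing for pyndl support
using JudiLing, CSV, DataFrames

weights = JudiLing.pyndl("data/latin_train_events.tab.gz")
latin_train = DataFrame(CSV.File("latin_train.csv"))
cue_obj = JudiLing.make_cue_matrix(latin_train, weights,
                                   grams = 3, target_col = "Word")
S = JudiLing.make_S_matrix(latin_train, weights,
                           ["Lexeme", "Person", "Number", "Tense", "Voice", "Mood"],
                           tokenized = false)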
diff --git a/dev/man/test_combo/index.html b/dev/man/test_combo/index.html
Test Combo · JudiLing.jl

Test Combo

JudiLing.test_comboMethod
test_combo(test_mode;kwargs...)

A wrapper function for a full model for a specific combination of parameters. A detailed introduction is in Test Combo Introduction

Note

test_combo is deprecated. While it will remain in the package, it is no longer actively maintained.

Obligatory Arguments

  • test_mode::Symbol: which test mode, currently supports :train_only, :pre_split, :careful_split and :random_split.

Optional Arguments

  • train_sample_size::Int64=0: the desired number of training data
  • val_sample_size::Int64=0: the desired number of validation data
  • val_ratio::Float64=0.0: the desired proportion of validation data; works only if val_sample_size is 0
  • extension::String=".csv": the extension for data files
  • n_grams_target_col::Union{String, Symbol}=:Word: the column name for target strings
  • n_grams_tokenized::Boolean=false: if true, the dataset target is assumed to be tokenized
  • n_grams_sep_token::String=nothing: separator
  • grams::Int64=3: the number of grams for cues
  • n_grams_keep_sep::Boolean=false: if true, keep separators in cues
  • start_end_token::String=":": start and end token in boundary cues
  • path_sep_token::String=":": path separator in the assembled path
  • random_seed::Int64=314: the random seed
  • sd_base_mean::Int64=1: the sd mean of base features
  • sd_inflection_mean::Int64=1: the sd mean of inflectional features
  • sd_base::Int64=4: the sd of base features
  • sd_inflection::Int64=4: the sd of inflectional features
  • isdeep::Boolean=true: if true, mean of each feature is also randomized
  • add_noise::Boolean=true: if true, add additional Gaussian noise
  • sd_noise::Int64=1: the sd of the Gaussian noise
  • normalized::Boolean=false: if true, most of the values range between -1 and 1; they may slightly exceed this range depending on the sd
  • if_combined::Boolean=false: if true, then features are combined with both training and validation data
  • learn_mode::Symbol=:cholesky: which learning mode, currently supports :cholesky and :wh
  • method::Symbol=:additive: whether :additive or :multiplicative decomposition is required
  • shift::Float64=0.02: shift value for :additive decomposition
  • multiplier::Float64=1.01: multiplier value for :multiplicative decomposition
  • output_format::Symbol=:auto: force the output format to dense (:dense) or sparse (:sparse), or use :auto to let the program decide
  • sparse_ratio::Float64=0.05: the ratio to decide whether a matrix is sparse
  • wh_freq::Vector=nothing: the learning sequence
  • init_weights::Matrix=nothing: the initial weights
  • eta::Float64=0.1: the learning rate
  • n_epochs::Int64=1: the number of epochs to be trained
  • max_t::Int64=0: the maximum timestep
  • A::Matrix=nothing: the adjacency matrix
  • A_mode::Symbol=:combined: the adjacency matrix mode, currently supports :combined or :train_only
  • max_can::Int64=10: the max number of candidate paths to keep in the output
  • threshold_train::Float64=0.1: the value set for the support such that if the support of an n-gram is higher than this value, the n-gram will be taken into consideration for training data
  • is_tolerant_train::Bool=false: if true, select a specified number (given by max_tolerance) of n-grams whose supports are below threshold but above a second tolerance threshold to be added to the path for training data
  • tolerance_train::Float64=-0.1: the value set for the second threshold (in tolerant mode) such that if the support for an n-gram is in between this value and the threshold and the max_tolerance number has not been reached, then allow this n-gram to be added to the path for training data
  • max_tolerance_train::Int64=2: maximum number of n-grams allowed in a path for training data
  • threshold_val::Float64=0.1: the value set for the support such that if the support of an n-gram is higher than this value, the n-gram will be taken into consideration for validation data
  • is_tolerant_val::Bool=false: if true, select a specified number (given by max_tolerance) of n-grams whose supports are below threshold but above a second tolerance threshold to be added to the path for validation data
  • tolerance_val::Float64=-0.1: the value set for the second threshold (in tolerant mode) such that if the support for an n-gram is in between this value and the threshold and the max_tolerance number has not been reached, then allow this n-gram to be added to the path for validation data
  • max_tolerance_val::Int64=2: maximum number of n-grams allowed in a path for validation data
  • n_neighbors_train::Int64=10: the top n form neighbors to be considered for training data
  • n_neighbors_val::Int64=20: the top n form neighbors to be considered for validation data
  • issparse::Bool=false: if true, keep sparse matrix format when learning paths
  • output_dir::String="out": the output directory
  • verbose::Bool=false: if true, more information will be printed
source
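The docstring gives no usage example, so here is a hypothetical minimal invocation using only keywords documented above; a real run additionally needs the dataset-specific arguments for your data files:

JudiLing.test_combo(
    :train_only,
    n_grams_target_col = :Word,
    grams = 3,
    learn_mode = :cholesky,
    output_dir = "out",
    verbose = true,
)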
diff --git a/dev/man/utils/index.html b/dev/man/utils/index.html
Utils · JudiLing.jl

Utils

JudiLing.is_truly_sparseFunction

Check whether a matrix is truly sparse regardless of its format, where M is originally in a sparse matrix format.

source

Check whether a matrix is truly sparse regardless of its format, where M is originally in a dense matrix format.

source
JudiLing.cal_max_timestepFunction
function cal_max_timestep(
     data_train::DataFrame,
     data_val::DataFrame,
     target_col::Union{String, Symbol};
     tokenized::Bool = false,
     sep_token::Union{Nothing, String, Char} = "",
)

Calculate the max timestep given training and validation datasets.

Obligatory Arguments

  • data_train::DataFrame: the training dataset
  • data_val::DataFrame: the validation dataset
  • target_col::Union{String, Symbol}: the column with the target word forms

Optional Arguments

  • tokenized::Bool = false: Whether the word forms in the target_col are already tokenized
  • sep_token::Union{Nothing, String, Char} = "": The token with which the word forms are tokenized

Examples

JudiLing.cal_max_timestep(latin_train, latin_val, :Word)
source
function cal_max_timestep(
     data::DataFrame,
     target_col::Union{String, Symbol};
     tokenized::Bool = false,
     sep_token::Union{Nothing, String, Char} = "",
)

Calculate the max timestep given training dataset.

Obligatory Arguments

  • data::DataFrame: the dataset
  • target_col::Union{String, Symbol}: the column with the target word forms

Optional Arguments

  • tokenized::Bool = false: Whether the word forms in the target_col are already tokenized
  • sep_token::Union{Nothing, String, Char} = "": The token with which the word forms are tokenized

Examples

JudiLing.cal_max_timestep(latin, :Word)
source
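A sketch of typical usage: the returned value is forwarded as the max_t argument of functions that iterate over timesteps, such as learn_paths:

max_t = JudiLing.cal_max_timestep(latin, :Word)
# e.g. learn_paths(..., max_t = max_t, ...)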
diff --git a/dev/man/wh/index.html b/dev/man/wh/index.html
Widrow-Hoff Learning · JudiLing.jl

JudiLing.wh_learnMethod
wh_learn(
    X,
    Y;
    eta = 0.01,
    n_epochs = 1,
    weights = nothing,
    learn_seq = nothing,
    save_history = false,
    history_cols = nothing,
    history_rows = nothing,
    verbose = false,
)

Widrow-Hoff Learning.

Obligatory Arguments

  • X: the input matrix
  • Y: the output/target matrix

Optional Arguments

  • eta::Float64=0.01: the learning rate
  • n_epochs::Int64=1: the number of epochs to be trained
  • weights::Matrix=nothing: the initial weights
  • learn_seq::Vector=nothing: the learning sequence
  • save_history::Bool=false: if true, a partial training history will be saved
  • history_cols::Vector=nothing: the column indices to save in the history, e.g. [1,32,42] or [2]
  • history_rows::Vector=nothing: the row indices to save in the history, e.g. [1,32,42] or [2]
  • verbose::Bool=false: if true, more information will be printed out

source
JudiLing.make_learn_seqMethod
make_learn_seq(freq; random_seed = 314)

Make a Widrow-Hoff learning sequence from frequencies. Creates a randomly ordered sequence of indices in which each index appears according to its frequency.

Obligatory arguments

  • freq: Vector with frequencies.

Optional arguments

  • random_seed = 314: Random seed to control randomness.

Example

learn_seq = JudiLing.make_learn_seq(data.frequency)
source
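make_learn_seq is typically paired with wh_learn (documented above): the frequency-ordered sequence determines the order in which rows of X and Y are presented. A sketch, assuming a dataset latin with a Frequency column and an existing cue/semantic matrix pair:

learn_seq = JudiLing.make_learn_seq(latin.Frequency)
F = JudiLing.wh_learn(cue_obj.C, S,
                      eta = 0.01,
                      n_epochs = 1,
                      learn_seq = learn_seq)
Shat = cue_obj.C * F    # comprehension predictions with the learned mapping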
      diff --git a/dev/search_index.js b/dev/search_index.js index ff02ab3..7fdce6e 100644 --- a/dev/search_index.js +++ b/dev/search_index.js @@ -1,3 +1,3 @@ var documenterSearchIndex = {"docs": -[{"location":"man/deep_learning/","page":"Deep learning","title":"Deep learning","text":"CurrentModule = JudiLing","category":"page"},{"location":"man/deep_learning/#Deep-learning-in-JudiLing","page":"Deep learning","title":"Deep learning in JudiLing","text":"","category":"section"},{"location":"man/deep_learning/","page":"Deep learning","title":"Deep learning","text":"predict_from_deep_model(model::Flux.Chain,\n X::Union{SparseMatrixCSC,Matrix})\npredict_shat(model::Flux.Chain,\n ci::Vector{Int})\nget_and_train_model(X_train::Union{SparseMatrixCSC,Matrix},\n Y_train::Union{SparseMatrixCSC,Matrix},\n X_val::Union{SparseMatrixCSC,Matrix,Missing},\n Y_val::Union{SparseMatrixCSC,Matrix,Missing},\n data_train::Union{DataFrame,Missing},\n data_val::Union{DataFrame,Missing},\n target_col::Union{Symbol, String,Missing},\n model_outpath::String;\n hidden_dim::Int=1000,\n n_epochs::Int=100,\n batchsize::Int=64,\n loss_func::Function=Flux.mse,\n optimizer=Flux.Adam(0.001),\n model::Union{Missing, Flux.Chain} = missing,\n early_stopping::Union{Missing, Int}=missing,\n optimise_for_acc::Bool=false,\n return_losses::Bool=false,\n verbose::Bool=true,\n measures_func::Union{Missing, Function}=missing,\n return_train_acc::Bool=false,\n kargs...)\nget_and_train_model(X_train::Union{SparseMatrixCSC,Matrix},\n Y_train::Union{SparseMatrixCSC,Matrix},\n model_outpath::String;\n data_train::Union{Missing, DataFrame}=missing,\n target_col::Union{Missing, Symbol, String}=missing,\n hidden_dim::Int=1000,\n n_epochs::Int=100,\n batchsize::Int=64,\n loss_func::Function=Flux.mse,\n optimizer=Flux.Adam(0.001),\n model::Union{Missing, Flux.Chain} = missing,\n return_losses::Bool=false,\n verbose::Bool=true,\n measures_func::Union{Missing, Function}=missing,\n return_train_acc::Bool=false,\n kargs...)\nfiddl(X_train::Union{SparseMatrixCSC,Matrix},\n Y_train::Union{SparseMatrixCSC,Matrix},\n learn_seq::Vector,\n data::DataFrame,\n target_col::Union{Symbol, String},\n model_outpath::String;\n hidden_dim::Int=1000,\n batchsize::Int=64,\n loss_func::Function=Flux.mse,\n optimizer=Flux.Adam(0.001),\n model::Union{Missing, Chain} = missing,\n return_losses::Bool=false,\n verbose::Bool=true,\n n_batch_eval::Int=100,\n measures_func::Union{Function, Missing}=missing,\n kargs...)\n","category":"page"},{"location":"man/deep_learning/#JudiLing.predict_from_deep_model-Tuple{Chain, Union{SparseArrays.SparseMatrixCSC, Matrix}}","page":"Deep learning","title":"JudiLing.predict_from_deep_model","text":"predict_from_deep_model(model::Chain,\n X::Union{SparseMatrixCSC,Matrix})\n\nGenerates output of a model given input X.\n\nObligatory arguments\n\nmodel::Chain: Model of type Flux.Chain, as generated by get_and_train_model\nX::Union{SparseMatrixCSC,Matrix}: Input matrix of size (numberofsamples, inpdim) where inpdim is the input dimension of model\n\n\n\n\n\n","category":"method"},{"location":"man/deep_learning/#JudiLing.predict_shat-Tuple{Chain, Vector{Int64}}","page":"Deep learning","title":"JudiLing.predict_shat","text":"predict_shat(model::Chain,\n ci::Vector{Int})\n\nPredicts semantic vector shat given a deep learning comprehension model model and a list of indices of ngrams ci.\n\nObligatory arguments\n\nmodel::Chain: Deep learning comprehension model as generated by get_and_train_model\nci::Vector{Int}: Vector of indices of ngrams in c 
vector. Essentially, this is a vector indicating which ngrams in a c vector are absent and which are present.\n\n\n\n\n\n","category":"method"},{"location":"man/deep_learning/#JudiLing.get_and_train_model-Tuple{Union{SparseArrays.SparseMatrixCSC, Matrix}, Union{SparseArrays.SparseMatrixCSC, Matrix}, Union{Missing, SparseArrays.SparseMatrixCSC, Matrix}, Union{Missing, SparseArrays.SparseMatrixCSC, Matrix}, Union{Missing, DataFrames.DataFrame}, Union{Missing, DataFrames.DataFrame}, Union{Missing, String, Symbol}, String}","page":"Deep learning","title":"JudiLing.get_and_train_model","text":"get_and_train_model(X_train::Union{SparseMatrixCSC,Matrix},\n Y_train::Union{SparseMatrixCSC,Matrix},\n X_val::Union{SparseMatrixCSC,Matrix,Missing},\n Y_val::Union{SparseMatrixCSC,Matrix,Missing},\n data_train::Union{DataFrame,Missing},\n data_val::Union{DataFrame,Missing},\n target_col::Union{Symbol,String,Missing},\n model_outpath::String;\n hidden_dim::Int=1000,\n n_epochs::Int=100,\n batchsize::Int=64,\n loss_func::Function=Flux.mse,\n optimizer=Flux.Adam(0.001)\n model::Union{Missing, Chain}=missing,\n early_stopping::Union{Missing, Int}=missing,\n optimise_for_acc::Bool=false\n return_losses::Bool=false,\n verbose::Bool=true,\n measures_func::Union{Missing, Function}=missing,\n return_train_acc::Bool=false,\n ...kargs\n )\n\nTrains a deep learning model from X_train to Y_train, saving the model with either the highest validation accuracy or lowest validation loss (depending on optimise_for_acc) to outpath.\n\nThe default model looks like this:\n\ninp_dim = size(X_train, 2)\nout_dim = size(Y_train, 2)\nChain(Dense(inp_dim => hidden_dim, relu), Dense(hidden_dim => out_dim))\n\nAny other model with the same input and output dimensions can be provided to the function with the model argument. The default loss function is mean squared error, but any other loss function can be provded, as long as it fits with the model architecture.\n\nBy default the adam optimizer (Kingma and Ba, 2015) with learning rate 0.001 is used. You can provide any other optimizer. If you want to use a different learning rate, e.g. 0.01, provide optimizer=Flux.Adam(0.01). 
If you do not want to use an optimizer at all, and simply use normal gradient descent, provide optimizer=Descent(0.001), again replacing the learning rate with the learning rate of your preference.\n\nReturns a named tuple with the following values:\n\nmodel: the trained model\ndata_train: the training data, including any measures if computed by measures_func\ndata_val: the validation data, including any measures if computed by measures_func\nlosses_train: The losses of the training data for each epoch.\nlosses_val: The losses of the validation data after each epoch.\naccs_train: The accuracies of the training data after each epoch, if return_train_acc=true.\naccs_val: The accuracies of the validation data after each epoch.\n\nObligatory arguments\n\nX_train::Union{SparseMatrixCSC,Matrix}: training input matrix of dimension m x n\nY_train::Union{SparseMatrixCSC,Matrix}: training output/target matrix of dimension m x k\nX_train::Union{SparseMatrixCSC,Matrix}: validation input matrix of dimension l x n\nY_train::Union{SparseMatrixCSC,Matrix}: validation output/target matrix of dimension l x k\ndata_train::DataFrame: training data\ndata_val::DataFrame: validation data\ntarget_col::Union{Symbol, String}: column with target wordforms in datatrain and dataval\nmodel_outpath::String: filepath to where final model should be stored (in .bson format)\n\nOptional arguments\n\nhidden_dim::Int=1000: hidden dimension of the model\nn_epochs::Int=100: number of epochs for which the model should be trained\nbatchsize::Int=64: batchsize during training\nloss_func::Function=Flux.mse: Loss function. Per default this is the mse loss, but other options might be a crossentropy loss (Flux.crossentropy). Make sure the model makes sense with the loss function!\noptimizer=Flux.Adam(0.001): optimizer to use for training\nmodel::Union{Missing, Chain} = missing: A custom model can be provided for training. Its requirements are that it has to correspond to the input and output size of the training and validation data\nearly_stopping::Union{Missing, Int}=missing: If missing, no early stopping is used. Otherwise early_stopping indicates how many epochs have to pass without improvement in validation accuracy before the training is stopped.\noptimise_for_acc::Bool=false: if true, keep model with highest validation accuracy. If false, keep model with lowest validation loss.\nreturn_losses::Bool=false: whether additional to the model per-epoch losses for the training and test data as well as per-epoch accuracy on the validation data should be returned\nverbose::Bool=true: Turn on verbose mode\nmeasures_func::Union{Missing, Function}=missing: A measures function which is run at the end of every epoch. For more information see The measures_func argument. 
If a measure is tagged for each epoch, the one tagged with \"final\" will be the one for the finally returned model.\nreturn_train_acc::Bool=false: If true, a vector with training accuracies is returned at the end of the training.\n...kargs: any additional keyword arguments are passed to the measures_func\n\n\n\n\n\n","category":"method"},{"location":"man/deep_learning/#JudiLing.get_and_train_model-Tuple{Union{SparseArrays.SparseMatrixCSC, Matrix}, Union{SparseArrays.SparseMatrixCSC, Matrix}, String}","page":"Deep learning","title":"JudiLing.get_and_train_model","text":"get_and_train_model(X_train::Union{SparseMatrixCSC,Matrix},\n Y_train::Union{SparseMatrixCSC,Matrix},\n model_outpath::String;\n data_train::Union{Missing, DataFrame}=missing,\n target_col::Union{Missing, Symbol, String}=missing,\n hidden_dim::Int=1000,\n n_epochs::Int=100,\n batchsize::Int=64,\n loss_func::Function=Flux.mse,\n optimizer=Flux.Adam(0.001),\n model::Union{Missing, Chain} = missing,\n return_losses::Bool=false,\n verbose::Bool=true,\n measures_func::Union{Missing, Function}=missing,\n return_train_acc::Bool=false,\n ...kargs)\n\nTrains a deep learning model from X_train to Y_train, saving the model after n_epochs epochs. The default model looks like this:\n\ninp_dim = size(X_train, 2)\nout_dim = size(Y_train, 2)\nChain(Dense(inp_dim => hidden_dim, relu), Dense(hidden_dim => out_dim))\n\nAny other model with the same input and output dimensions can be provided to the function with the model argument. The default loss function is mean squared error, but any other loss function can be provded, as long as it fits with the model architecture.\n\nBy default the adam optimizer (Kingma and Ba, 2015) with learning rate 0.001 is used. You can provide any other optimizer. If you want to use a different learning rate, e.g. 0.01, provide optimizer=Flux.Adam(0.01). If you do not want to use an optimizer at all, and simply use normal gradient descent, provide optimizer=Descent(0.001), again replacing the learning rate with the learning rate of your preference.\n\nReturns a named tuple with the following values:\n\nmodel: the trained model\ndata_train: the data, including any measures if computed by measures_func\ndata_val: missing for this function\nlosses_train: The losses of the training data for each epoch.\nlosses_val: missing for this function\naccs_train: The accuracies of the training data after each epoch, if return_train_acc=true.\naccs_val: missing for this function\n\nObligatory arguments\n\nX_train::Union{SparseMatrixCSC,Matrix}: training input matrix of dimension m x n\nY_train::Union{SparseMatrixCSC,Matrix}: training output/target matrix of dimension m x k\nmodel_outpath::String: filepath to where final model should be stored (in .bson format)\n\nOptional arguments\n\ndata_train::Union{Missing, DataFrame}=missing: The training data. Only necessary if a measuresfunc is included or returntrain_acc=true.\ntarget_col::Union{Missing, Symbol, String}=missing: The column with target word forms in the training data. Only necessary if a measuresfunc is included or returntrain_acc=true.\nhidden_dim::Int=1000: hidden dimension of the model\nn_epochs::Int=100: number of epochs for which the model should be trained\nbatchsize::Int=64: batchsize during training\nloss_func::Function=Flux.mse: Loss function. Per default this is the mse loss, but other options might be a crossentropy loss (Flux.crossentropy). 
Make sure the model makes sense with the loss function!\noptimizer=Flux.Adam(0.001): optimizer to use for training\nmodel::Union{Missing, Chain} = missing: A custom model can be provided for training. Its requirements are that it has to correspond to the input and output size of the training and validation data\nreturn_losses::Bool=false: whether additional to the model per-epoch losses for the training and test data as well as per-epoch accuracy on the validation data should be returned\nverbose::Bool=true: Turn on verbose mode\nmeasures_func::Union{Missing, Function}=missing: A measures function which is run at the end of every epoch. For more information see The measures_func argument.\nreturn_train_acc::Bool=false: If true, a vector with training accuracies is returned at the end of the training.\n...kargs: any additional keyword arguments are passed to the measures_func\n\n\n\n\n\n","category":"method"},{"location":"man/deep_learning/#JudiLing.fiddl-Tuple{Union{SparseArrays.SparseMatrixCSC, Matrix}, Union{SparseArrays.SparseMatrixCSC, Matrix}, Vector, DataFrames.DataFrame, Union{String, Symbol}, String}","page":"Deep learning","title":"JudiLing.fiddl","text":"fiddl(X_train::Union{SparseMatrixCSC,Matrix},\n Y_train::Union{SparseMatrixCSC,Matrix},\n learn_seq::Vector,\n data::DataFrame,\n target_col::Union{Symbol, String},\n model_outpath::String;\n hidden_dim::Int=1000,\n batchsize::Int=64,\n loss_func::Function=Flux.mse,\n optimizer=Flux.Adam(0.001),\n model::Union{Missing, Chain} = missing,\n return_losses::Bool=false,\n verbose::Bool=true,\n n_batch_eval::Int=100,\n compute_accuracy::Bool=true,\n measures_func::Union{Function, Missing}=missing,\n kargs...)\n\nTrains a deep learning model using the FIDDL method (frequency-informed deep discriminative learning). Optionally, after each n_batch_eval batches measures_func can be run to compute any measures which are then added to the data.\n\nnote: Note\nIf you get an OutOfMemory error, chances are that this is due to the eval_SC function being evaluated after each n_batch_eval batches. Setting compute_accuracy=false disables computing the mapping accuracy.\n\nReturns a named tuple with the following values:\n\nmodel: the trained model\ndata: the data, including any measures if computed by measures_func\nlosses_train: The losses of the data the model is trained on within each n_batch_eval batches.\nlosses: The losses of the full dataset after each n_batch_eval batches.\naccs: The accuracies of the full dataset after each n_batch_eval batches.\n\nObligatory arguments\n\nX_train::Union{SparseMatrixCSC,Matrix}: training input matrix of dimension m x n\nY_train::Union{SparseMatrixCSC,Matrix}: training output/target matrix of dimension m x k\nlearn_seq::Vector: List of indices in the order that the vectors in Xtrain and Ytrain should be presented to the model for training.\ndata::DataFrame: The full data.\ntarget_col::Union{Symbol, String}: The column with target word forms in the data.\nmodel_outpath::String: filepath to where final model should be stored (in .bson format)\n\nOptional arguments\n\nhidden_dim::Int=1000: hidden dimension of the model\nn_epochs::Int=100: number of epochs for which the model should be trained\nbatchsize::Int=64: batchsize during training\nloss_func::Function=Flux.mse: Loss function. Per default this is the mse loss, but other options might be a crossentropy loss (Flux.crossentropy). 
Make sure the model makes sense with the loss function!\noptimizer=Flux.Adam(0.001): optimizer to use for training\nmodel::Union{Missing, Chain} = missing: A custom model can be provided for training. Its requirements are that it has to correspond to the input and output size of the training and validation data\nreturn_losses::Bool=false: whether additional to the model per-epoch losses for the training and test data as well as per-epoch accuracy on the validation data should be returned\nverbose::Bool=true: Turn on verbose mode\nn_batch_eval::Int=100: Loss, accuracy and measures_func are evaluated every n_batch_eval batches.\ncompute_accuracy::Bool=true: Whether accuracy should be computed every n_batch_eval batches.\nmeasures_func::Union{Missing, Function}=missing: A measures function which is run each n_batch_eval batches. For more information see The measures_func argument.\n\n\n\n\n\n","category":"method"},{"location":"man/pickle/","page":"Pickle","title":"Pickle","text":"CurrentModule = JudiLing","category":"page"},{"location":"man/pickle/#Utils","page":"Pickle","title":"Utils","text":"","category":"section"},{"location":"man/pickle/","page":"Pickle","title":"Pickle","text":" save_pickle\n load_pickle","category":"page"},{"location":"man/pickle/#JudiLing.save_pickle","page":"Pickle","title":"JudiLing.save_pickle","text":"Save pickle from python pickle file.\n\n\n\n\n\n","category":"function"},{"location":"man/pickle/#JudiLing.load_pickle","page":"Pickle","title":"JudiLing.load_pickle","text":"Load pickle from python pickle file.\n\n\n\n\n\n","category":"function"},{"location":"man/measures_func/#The-measures_func-argument","page":"Measures function","title":"The measures_func argument","text":"","category":"section"},{"location":"man/measures_func/","page":"Measures function","title":"Measures function","text":"The deep learning functions get_and_train_model and fiddl take a measures_func as one of their arguments. This helps computing measures during the training. For this to work, the measures_func has to conform to the following format.","category":"page"},{"location":"man/measures_func/#For-get_and_train_model","page":"Measures function","title":"For get_and_train_model","text":"","category":"section"},{"location":"man/measures_func/","page":"Measures function","title":"Measures function","text":"data_train, data_val = measures_func(X_train,\n Y_train,\n X_val,\n Y_val,\n Yhat_train,\n Yhat_val,\n data_train,\n data_val,\n target_col,\n model,\n epoch;\n kargs...)\n\n## Input\n\n- `X_train`: The input training matrix.\n- `Y_train`: The target training matrix\n- `X_val`: The input validation matrix.\n- `Y_val`: The target validation matrix.\n- `Yhat_train`: The predicted training matrix.\n- `Yhat_val`: The predicted validation matrix.\n- `data_train`: The training dataset.\n- `data_val`: The validation dataset.\n- `target_col`: The name of the column with the target wordforms in the datasets.\n- `model`: The trained model.\n- `epoch`: The epoch the training is currently in.\n- `kargs...`: Any other keyword arguments that should be passed to the function.\n\nNote: the `kargs` are just keyword arguments that are passed on from the parameters of `get_and_train_model` to the `measures_func`. 
For example, this could be a suffix that should be added to each added column in `measures_func`.\n\n## Output\nThe function has to return the training and validation dataframes.","category":"page"},{"location":"man/measures_func/#Example","page":"Measures function","title":"Example","text":"","category":"section"},{"location":"man/measures_func/","page":"Measures function","title":"Measures function","text":"Define a measures_func. This one computes target correlations for both training and validation datasets.","category":"page"},{"location":"man/measures_func/","page":"Measures function","title":"Measures function","text":"function compute_target_corr(X_train, Y_train, X_val, Y_val,\n Yhat_train, Yhat_val, data_train,\n data_val, target_col, model, epoch)\n _, corr = JudiLing.eval_SC(Yhat_train, Y_train, R=true)\n data_train[!, string(\"target_corr_\", epoch)] = diag(corr)\n _, corr = JudiLing.eval_SC(Yhat_val, Y_val, R=true)\n data_val[!, string(\"target_corr_\", epoch)] = diag(corr)\n return(data_train, data_val)\nend","category":"page"},{"location":"man/measures_func/","page":"Measures function","title":"Measures function","text":"Train a model for 100 epochs, call compute_target_corr after each epoch.","category":"page"},{"location":"man/measures_func/","page":"Measures function","title":"Measures function","text":"res = JudiLing.get_and_train_model(cue_obj_train.C,\n S_train,\n cue_obj_val.C,\n S_val,\n train, val,\n :Word,\n \"test.bson\",\n return_losses=true,\n batchsize=3,\n measures_func=compute_target_corr)\n","category":"page"},{"location":"man/measures_func/#For-fiddl","page":"Measures function","title":"For fiddl","text":"","category":"section"},{"location":"man/measures_func/","page":"Measures function","title":"Measures function","text":"data = measures_func(X_train,\n Y_train,\n Yhat_train,\n data,\n target_col,\n model,\n step;\n kargs...)\n\n## Input\n\n- `X_train`: The input matrix of the full dataset.\n- `Y_train`: The target matrix of the full dataset.\n- `Yhat_train`: The predicted matrix of the full dataset at current step.\n- `data_train`: The full dataset.\n- `target_col`: The name of the column with the target wordforms in the dataset.\n- `model`: The trained model.\n- `step`: The step the training is currently in.\n- `kargs...`: Any other keyword arguments that should be passed to the function.\n\nNote: the `kargs` are just keyword arguments that are passed on from the parameters of `get_and_train_model` to the `measures_func`. For example, this could be a suffix that should be added to each added column in `measures_func`.\n\n## Output\nThe function has to return the dataset.","category":"page"},{"location":"man/pyndl/","page":"Pyndl","title":"Pyndl","text":"CurrentModule = JudiLing","category":"page"},{"location":"man/pyndl/","page":"Pyndl","title":"Pyndl","text":"JudiLing is able to call the python package pyndl internally to compute NDL models. pyndl uses event files to compute the mapping matrices, which have to be generated manually or by using pyndl in Python, see documentation here. 
The advantage of calling pyndl from JudiLing is that the resulting weights, cue and semantic matrices can be directly translated into JudiLing format and further processing can be done in JudiLing.","category":"page"},{"location":"man/pyndl/","page":"Pyndl","title":"Pyndl","text":"note: Note\nFor pyndl to be available in JudiLing, PyCall has to be imported before JudiLing:using PyCall\nusing JudiLing","category":"page"},{"location":"man/pyndl/#Calling-pyndl-from-JudiLing","page":"Pyndl","title":"Calling pyndl from JudiLing","text":"","category":"section"},{"location":"man/pyndl/","page":"Pyndl","title":"Pyndl","text":" Pyndl_Weight_Struct\n pyndl(\n data_path::String;\n alpha::Float64 = 0.1,\n betas::Tuple{Float64,Float64} = (0.1, 0.1),\n method::String = \"openmp\"\n )","category":"page"},{"location":"man/pyndl/#JudiLing.Pyndl_Weight_Struct","page":"Pyndl","title":"JudiLing.Pyndl_Weight_Struct","text":"Pyndl_Weight_Struct\n cues::Vector{String}\n outcomes::Vector{String}\n weight::Matrix{Float64}\n\ncues::Vector{String}: Vector of cues, in the order that they appear in the weight matrix.\noutcomes::Vector{String}: Vector of outcomes, in the order that they appear in the weight matrix.\nweight::Matrix{Float64}: Weight matrix.\n\n\n\n\n\n","category":"type"},{"location":"man/pyndl/#JudiLing.pyndl-Tuple{String}","page":"Pyndl","title":"JudiLing.pyndl","text":"pyndl(\n data_path::String;\n alpha::Float64 = 0.1,\n betas::Tuple{Float64,Float64} = (0.1, 0.1),\n method::String = \"openmp\"\n)\n\nCompute weights using pyndl. See the documentation of pyndl for more information: https://pyndl.readthedocs.io/en/latest/\n\nObligatory arguments\n\ndata_path::String: Path to an events file as generated by pyndl's preprocess.createeventfile\n\nOptional arguments\n\nalpha::Float64 = 0.1: α learning rate.\nbetas::Tuple{Float64,Float64} = (0.1, 0.1): β1 and β2 learning rates\nmethod::String = \"openmp\": One of {\"openmp\", \"threading\"}. 
\"openmp\" only works on Linux.\n\nExample\n\nweights = JudiLing.pyndl(\"data/latin_train_events.tab.gz\")\n\n\n\n\n\n","category":"method"},{"location":"man/pyndl/#Translating-output-of-pyndl-to-cue-and-semantic-matrices-in-JudiLing","page":"Pyndl","title":"Translating output of pyndl to cue and semantic matrices in JudiLing","text":"","category":"section"},{"location":"man/pyndl/","page":"Pyndl","title":"Pyndl","text":"With the weights in hand, the cue and semantic matrices can be computed:","category":"page"},{"location":"man/pyndl/","page":"Pyndl","title":"Pyndl","text":" make_cue_matrix(\n data::DataFrame,\n pyndl_weights::Pyndl_Weight_Struct;\n grams = 3,\n target_col = \"Words\",\n tokenized = false,\n sep_token = nothing,\n keep_sep = false,\n start_end_token = \"#\",\n verbose = false,\n )\n make_S_matrix(\n data::DataFrame,\n pyndl_weights::Pyndl_Weight_Struct,\n n_features_columns::Vector;\n tokenized::Bool=false,\n sep_token::String=\"_\"\n )\n make_S_matrix(\n data_train::DataFrame,\n data_val::DataFrame,\n pyndl_weights::Pyndl_Weight_Struct,\n n_features_columns::Vector;\n tokenized::Bool=false,\n sep_token::String=\"_\"\n )","category":"page"},{"location":"man/pyndl/#JudiLing.make_cue_matrix-Tuple{DataFrames.DataFrame, JudiLing.Pyndl_Weight_Struct}","page":"Pyndl","title":"JudiLing.make_cue_matrix","text":"make_cue_matrix(\n data::DataFrame,\n pyndl_weights::Pyndl_Weight_Struct;\n grams = 3,\n target_col = \"Words\",\n tokenized = false,\n sep_token = nothing,\n keep_sep = false,\n start_end_token = \"#\",\n verbose = false,\n)\n\nMake the cue matrix based on a dataframe and weights computed with pyndl. Practically this means that the cues are extracted from the weights object and translated to the JudiLing format.\n\nObligatory arguments\n\ndata::DataFrame: Dataset with all the word types on which the weights were trained.\npyndl_weights::Pyndl_Weight_Struct: Weights trained with JudiLing.pyndl\n\nOptional argyments\n\ngrams = 3: N-gram size (has to match the n-gram granularity of the cues on which the weights were trained).\ntarget_col = \"Words\": Column with target words.\ntokenized = false: Whether the target words are already tokenized\nsep_token = nothing: The string separating the tokens (only used if tokenized=true).\nkeep_sep = false: Whether the sep_token should be retained in the cues.\nstart_end_token = \"#\": The string with which to mark word boundaries.\nverbose = false: Verbose mode.\n\nExample\n\nweights = JudiLing.pyndl(\"data/latin_train_events.tab.gz\")\ncue_obj = JudiLing.make_cue_matrix(\"latin_train.csv\", weights,\n grams = 3,\n target_col = \"Word\")\n\n\n\n\n\n","category":"method"},{"location":"man/pyndl/#JudiLing.make_S_matrix-Tuple{DataFrames.DataFrame, JudiLing.Pyndl_Weight_Struct, Vector}","page":"Pyndl","title":"JudiLing.make_S_matrix","text":"make_S_matrix(\n data::DataFrame,\n pyndl_weights::Pyndl_Weight_Struct,\n n_features_columns::Vector;\n tokenized::Bool=false,\n sep_token::String=\"_\"\n)\n\nCreate semantic matrix based on a dataframe and weights computed with pyndl. Practically this means that the semantic features are extracted from the weights object and translated to the JudiLing format.\n\nObligatory arguments\n\ndata::DataFrame: The dataset with word types.\npyndl_weights::Pyndl_Weight_Struct: Weights trained with JudiLing.pyndl.\nn_features_columns::Vector: Vector of columns with the features in the dataset.\n\nOptional arguments\n\ntokenized=false: Whether the features in n_features_columns columns are already tokenized (e.g. 
\"feature1_feature2_feature3\")\nsep_token=\"_\": The string with which the features are separated (only used if tokenized=false).\n\nExample\n\nweights = JudiLing.pyndl(\"data/latin_train_events.tab.gz\")\nS = JudiLing.make_S_matrix(data,\n weights_latin,\n [\"Lexeme\", \"Person\", \"Number\", \"Tense\", \"Voice\", \"Mood\"],\n tokenized=false)\n\n\n\n\n\n","category":"method"},{"location":"man/pyndl/#JudiLing.make_S_matrix-Tuple{DataFrames.DataFrame, DataFrames.DataFrame, JudiLing.Pyndl_Weight_Struct, Vector}","page":"Pyndl","title":"JudiLing.make_S_matrix","text":"make_S_matrix(\n data_train::DataFrame,\n data_val::DataFrame,\n pyndl_weights::Pyndl_Weight_Struct,\n n_features_columns::Vector;\n tokenized::Bool=false,\n sep_token::String=\"_\"\n)\n\nCreate semantic matrix based on a training and validation dataframe and weights computed with pyndl. Practically this means that the semantic features are extracted from the weights object and translated to the JudiLing format.\n\nObligatory arguments\n\ndata_train::DataFrame: The training dataset.\ndata_val::DataFrame: The validation dataset.\npyndl_weights::Pyndl_Weight_Struct: Weights trained with JudiLing.pyndl.\nn_features_columns::Vector: Vector of columns with the features in the training and validation datasets.\n\nOptional arguments\n\ntokenized=false: Whether the features in n_features_columns columns are already tokenized (e.g. \"feature1_feature2_feature3\")\nsep_token=\"_\": The string with which the features are separated (only used if tokenized=false).\n\nExample\n\nweights = JudiLing.pyndl(\"data/latin_train_events.tab.gz\")\nS_train, S_val = JudiLing.make_S_matrix(train,\n val,\n weights_latin,\n [\"Lexeme\", \"Person\", \"Number\", \"Tense\", \"Voice\", \"Mood\"],\n tokenized=false)\n\n\n\n\n\n","category":"method"},{"location":"man/wh/","page":"Widrow-Hoff Learning","title":"Widrow-Hoff Learning","text":"CurrentModule = JudiLing","category":"page"},{"location":"man/wh/#Utils","page":"Widrow-Hoff Learning","title":"Utils","text":"","category":"section"},{"location":"man/wh/","page":"Widrow-Hoff Learning","title":"Widrow-Hoff Learning","text":" wh_learn(\n X,\n Y;\n eta = 0.01,\n n_epochs = 1,\n weights = nothing,\n learn_seq = nothing,\n save_history = false,\n history_cols = nothing,\n history_rows = nothing,\n verbose = false,\n )\n make_learn_seq(freq; random_seed = 314)","category":"page"},{"location":"man/wh/#JudiLing.wh_learn-Tuple{Any, Any}","page":"Widrow-Hoff Learning","title":"JudiLing.wh_learn","text":"wh_learn(\n X,\n Y;\n eta = 0.01,\n n_epochs = 1,\n weights = nothing,\n learn_seq = nothing,\n save_history = false,\n history_cols = nothing,\n history_rows = nothing,\n verbose = false,\n )\n\nWidrow-Hoff Learning.\n\nObligatory Arguments\n\ntest_mode::Symbol: which test mode, currently supports :trainonly, :presplit, :carefulsplit and :randomsplit.\n\nOptional Arguments\n\neta::Float64=0.1: the learning rate\nn_epochs::Int64=1: the number of epochs to be trained\nweights::Matrix=nothing: the initial weights\nlearn_seq::Vector=nothing: the learning sequence\nsave_history::Bool=false: if true, a partical training history will be saved\nhistory_cols::Vector=nothing: the list of column indices you want to saved in history, e.g. [1,32,42] or [2]\nhistory_rows::Vector=nothing: the list of row indices you want to saved in history, e.g. 
[1,32,42] or [2]\nverbose::Bool = false: if true, more information will be printed out\n\n\n\n\n\n","category":"method"},{"location":"man/wh/#JudiLing.make_learn_seq-Tuple{Any}","page":"Widrow-Hoff Learning","title":"JudiLing.make_learn_seq","text":"make_learn_seq(freq; random_seed = 314)\n\nMake Widrow-Hoff learning sequence from frequencies. Creates a randomly ordered sequences of indices where each index appears according to its frequncy.\n\nObligatory arguments\n\nfreq: Vector with frequencies.\n\nOptional arguments\n\nrandom_seed = 314: Random seed to control randomness.\n\nExample\n\nlearn_seq = JudiLing.make_learn_seq(data.frequency)\n\n\n\n\n\n","category":"method"},{"location":"man/utils/","page":"Utils","title":"Utils","text":"CurrentModule = JudiLing","category":"page"},{"location":"man/utils/#Utils","page":"Utils","title":"Utils","text":"","category":"section"},{"location":"man/utils/","page":"Utils","title":"Utils","text":" iscorrect\n display_pred\n translate\n translate_path\n is_truly_sparse\n isattachable\n iscomplete\n isstart\n isnovel\n check_used_token\n cal_max_timestep","category":"page"},{"location":"man/utils/#JudiLing.iscorrect","page":"Utils","title":"JudiLing.iscorrect","text":"Check whether the predictions are correct.\n\n\n\n\n\n","category":"function"},{"location":"man/utils/#JudiLing.display_pred","page":"Utils","title":"JudiLing.display_pred","text":"Display prediction nicely.\n\n\n\n\n\n","category":"function"},{"location":"man/utils/#JudiLing.translate","page":"Utils","title":"JudiLing.translate","text":"Translate indices into words or utterances\n\n\n\n\n\n","category":"function"},{"location":"man/utils/#JudiLing.translate_path","page":"Utils","title":"JudiLing.translate_path","text":"Append indices together to form a path\n\n\n\n\n\n","category":"function"},{"location":"man/utils/#JudiLing.is_truly_sparse","page":"Utils","title":"JudiLing.is_truly_sparse","text":"Check whether a matrix is truly sparse regardless its format, where M is originally a sparse matrix format.\n\n\n\n\n\nCheck whether a matrix is truly sparse regardless its format, where M is originally a dense matrix format.\n\n\n\n\n\n","category":"function"},{"location":"man/utils/#JudiLing.isattachable","page":"Utils","title":"JudiLing.isattachable","text":"Check whether a gram can attach to another gram.\n\n\n\n\n\nCheck whether a gram can attach to another gram.\n\n\n\n\n\n","category":"function"},{"location":"man/utils/#JudiLing.iscomplete","page":"Utils","title":"JudiLing.iscomplete","text":"Check whether a path is complete.\n\n\n\n\n\n","category":"function"},{"location":"man/utils/#JudiLing.isstart","page":"Utils","title":"JudiLing.isstart","text":"Check whether a gram can start a path.\n\n\n\n\n\n","category":"function"},{"location":"man/utils/#JudiLing.isnovel","page":"Utils","title":"JudiLing.isnovel","text":"Check whether a predicted path is in training data.\n\n\n\n\n\n","category":"function"},{"location":"man/utils/#JudiLing.check_used_token","page":"Utils","title":"JudiLing.check_used_token","text":"Check whether there are tokens already used in dataset as n-gram components.\n\n\n\n\n\n","category":"function"},{"location":"man/utils/#JudiLing.cal_max_timestep","page":"Utils","title":"JudiLing.cal_max_timestep","text":"function cal_max_timestep(\n data_train::DataFrame,\n data_val::DataFrame,\n target_col::Union{String, Symbol};\n tokenized::Bool = false,\n sep_token::Union{Nothing, String, Char} = \"\",\n)\n\nCalculate the max timestep given training and validation 
datasets.\n\nObligatory Arguments\n\ndata_train::DataFrame: the training dataset\ndata_val::DataFrame: the validation dataset\ntarget_col::Union{String, Symbol}: the column with the target word forms\n\nOptional Arguments\n\ntokenized::Bool = false: Whether the word forms in the target_col are already tokenized\nsep_token::Union{Nothing, String, Char} = \"\": The token with which the word forms are tokenized\n\nExamples\n\nJudiLing.cal_max_timestep(latin_train, latin_val, :Word)\n\n\n\n\n\nfunction cal_max_timestep(\n data::DataFrame,\n target_col::Union{String, Symbol};\n tokenized::Bool = false,\n sep_token::Union{Nothing, String, Char} = \"\",\n)\n\nCalculate the max timestep given the training dataset.\n\nObligatory Arguments\n\ndata::DataFrame: the dataset\ntarget_col::Union{String, Symbol}: the column with the target word forms\n\nOptional Arguments\n\ntokenized::Bool = false: Whether the word forms in the target_col are already tokenized\nsep_token::Union{Nothing, String, Char} = \"\": The token with which the word forms are tokenized\n\nExamples\n\nJudiLing.cal_max_timestep(latin, :Word)\n\n\n\n\n\n","category":"function"},{"location":"man/make_adjacency_matrix/","page":"Make Adjacency Matrix","title":"Make Adjacency Matrix","text":"CurrentModule = JudiLing","category":"page"},{"location":"man/make_adjacency_matrix/#Make-Adjacency-Matrix","page":"Make Adjacency Matrix","title":"Make Adjacency Matrix","text":"","category":"section"},{"location":"man/make_adjacency_matrix/","page":"Make Adjacency Matrix","title":"Make Adjacency Matrix","text":" make_full_adjacency_matrix\n make_full_adjacency_matrix(i2f)\n make_combined_adjacency_matrix(data_train, data_val)","category":"page"},{"location":"man/make_adjacency_matrix/#JudiLing.make_full_adjacency_matrix","page":"Make Adjacency Matrix","title":"JudiLing.make_full_adjacency_matrix","text":"make_adjacency_matrix(i2f)\n\nMake full adjacency matrix based only on the form of n-grams regardless of whether they are seen in the training data. This usually takes hours for large datasets, as all possible combinations are considered.\n\nObligatory Arguments\n\ni2f::Dict: the dictionary returning features given indices\n\nOptional Arguments\n\ntokenized::Bool=false: if true, the dataset target is assumed to be tokenized\nsep_token::Union{Nothing, String, Char}=nothing: separator token\nverbose::Bool=false: if true, more information will be printed\n\nExamples\n\n# without tokenization\ni2f = Dict([(1, \"#ab\"), (2, \"abc\"), (3, \"bc#\"), (4, \"#bc\"), (5, \"ab#\")])\nJudiLing.make_adjacency_matrix(i2f)\n\n# with tokenization\ni2f = Dict([(1, \"#-a-b\"), (2, \"a-b-c\"), (3, \"b-c-#\"), (4, \"#-b-c\"), (5, \"a-b-#\")])\nJudiLing.make_adjacency_matrix(\n i2f,\n tokenized=true,\n sep_token=\"-\")\n\n\n\n\n\n","category":"function"},
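In practice the i2f dictionary is rarely built by hand; it is usually taken from a cue object. A minimal sketch under that assumption, using a latin dataframe with a :Word column (the cue_obj.i2f field follows the cue object returned by make_cue_matrix; treat the exact field access as an assumption):

cue_obj = JudiLing.make_cue_matrix(latin, grams=3, target_col=:Word)
# full adjacency matrix over all n-gram cues stored in the cue object
A = JudiLing.make_adjacency_matrix(cue_obj.i2f)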
{"location":"man/make_adjacency_matrix/#JudiLing.make_full_adjacency_matrix-Tuple{Any}","page":"Make Adjacency Matrix","title":"JudiLing.make_full_adjacency_matrix","text":"make_adjacency_matrix(i2f)\n\nMake full adjacency matrix based only on the form of n-grams regardless of whether they are seen in the training data. 
This usually takes hours for large datasets, as all possible combinations are considered.\n\nObligatory Arguments\n\ni2f::Dict: the dictionary returning features given indices\n\nOptional Arguments\n\ntokenized::Bool=false: if true, the dataset target is assumed to be tokenized\nsep_token::Union{Nothing, String, Char}=nothing: separator token\nverbose::Bool=false: if true, more information will be printed\n\nExamples\n\n# without tokenization\ni2f = Dict([(1, \"#ab\"), (2, \"abc\"), (3, \"bc#\"), (4, \"#bc\"), (5, \"ab#\")])\nJudiLing.make_adjacency_matrix(i2f)\n\n# with tokenization\ni2f = Dict([(1, \"#-a-b\"), (2, \"a-b-c\"), (3, \"b-c-#\"), (4, \"#-b-c\"), (5, \"a-b-#\")])\nJudiLing.make_adjacency_matrix(\n i2f,\n tokenized=true,\n sep_token=\"-\")\n\n\n\n\n\n","category":"method"},{"location":"man/make_adjacency_matrix/#JudiLing.make_combined_adjacency_matrix-Tuple{Any, Any}","page":"Make Adjacency Matrix","title":"JudiLing.make_combined_adjacency_matrix","text":"make_combined_adjacency_matrix(data_train, data_val)\n\nMake combined adjacency matrix.\n\nObligatory Arguments\n\ndata_train::DataFrame: training dataset\ndata_val::DataFrame: validation dataset\n\nOptional Arguments\n\ngrams=3: the number of grams for cues\ntarget_col=:Words: the column name for target strings\ntokenized=false: if true, the dataset target is assumed to be tokenized\nsep_token=nothing: separator token\nkeep_sep=false: if true, keep separators in cues\nstart_end_token=\"#\": start and end token in boundary cues\nverbose=false: if true, more information is printed\n\nExamples\n\nJudiLing.make_combined_adjacency_matrix(\n latin_train,\n latin_val,\n grams=3,\n target_col=:Word,\n tokenized=false,\n keep_sep=false\n )\n\n\n\n\n\n","category":"method"},{"location":"man/cholesky/","page":"Cholesky","title":"Cholesky","text":"CurrentModule = JudiLing","category":"page"},{"location":"man/cholesky/#Cholesky","page":"Cholesky","title":"Cholesky","text":"","category":"section"},{"location":"man/cholesky/","page":"Cholesky","title":"Cholesky","text":" make_transform_fac\n make_transform_matrix\n make_transform_fac(X::SparseMatrixCSC)\n make_transform_fac(X::Matrix)\n make_transform_matrix(fac::Union{LinearAlgebra.Cholesky, SuiteSparse.CHOLMOD.Factor}, X::Union{SparseMatrixCSC, Matrix}, Y::Union{SparseMatrixCSC, Matrix})\n make_transform_matrix(X::SparseMatrixCSC, Y::Matrix)\n make_transform_matrix(X::Matrix, Y::Union{SparseMatrixCSC, Matrix})\n make_transform_matrix(X::SparseMatrixCSC, Y::SparseMatrixCSC)\n format_matrix(M::Union{SparseMatrixCSC, Matrix}, output_format=:auto)","category":"page"},{"location":"man/cholesky/#JudiLing.make_transform_fac","page":"Cholesky","title":"JudiLing.make_transform_fac","text":"The first part of make_transform_matrix, usually used by the learn_paths function to save time and computing resources.\n\n\n\n\n\n","category":"function"},{"location":"man/cholesky/#JudiLing.make_transform_matrix","page":"Cholesky","title":"JudiLing.make_transform_matrix","text":"Use the Cholesky decomposition to calculate the transformation matrix from S to C or from C to S.\n\n\n\n\n\n","category":"function"},{"location":"man/cholesky/#JudiLing.make_transform_fac-Tuple{SparseArrays.SparseMatrixCSC}","page":"Cholesky","title":"JudiLing.make_transform_fac","text":"make_transform_fac(X::SparseMatrixCSC)\n\nCalculate the first step of Cholesky decomposition for sparse 
matrices.\n\n\n\n\n\n","category":"method"},{"location":"man/cholesky/#JudiLing.make_transform_fac-Tuple{Matrix}","page":"Cholesky","title":"JudiLing.make_transform_fac","text":"make_transform_fac(X::Matrix)\n\nCalculate the first step of Cholesky decomposition for dense matrices.\n\n\n\n\n\n","category":"method"},{"location":"man/cholesky/#JudiLing.make_transform_matrix-Tuple{Union{SparseArrays.CHOLMOD.Factor, LinearAlgebra.Cholesky}, Union{SparseArrays.SparseMatrixCSC, Matrix}, Union{SparseArrays.SparseMatrixCSC, Matrix}}","page":"Cholesky","title":"JudiLing.make_transform_matrix","text":"make_transform_matrix(fac::Union{LinearAlgebra.Cholesky, SuiteSparse.CHOLMOD.Factor}, X::Union{SparseMatrixCSC, Matrix}, Y::Union{SparseMatrixCSC, Matrix})\n\nSecond step in calculating the Cholesky decomposition for the transformation matrix.\n\n\n\n\n\n","category":"method"},
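The two-step interface is useful when the same X matrix has to be mapped onto several targets: the factorization is computed once and then reused. A minimal sketch, assuming a cue matrix C and a semantic matrix S built earlier; the variable names are illustrative:

# step 1: factorize X = C once
fac = JudiLing.make_transform_fac(C)
# step 2: solve for the comprehension mapping F such that C * F approximates S
F = JudiLing.make_transform_matrix(fac, C, S)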
{"location":"man/cholesky/#JudiLing.make_transform_matrix-Tuple{SparseArrays.SparseMatrixCSC, Matrix}","page":"Cholesky","title":"JudiLing.make_transform_matrix","text":"make_transform_matrix(X::SparseMatrixCSC, Y::Matrix)\n\nUse Cholesky decomposition to calculate the transformation matrix from X to Y, where X is a sparse matrix and Y is a dense matrix.\n\nObligatory Arguments\n\nX::SparseMatrixCSC: the X matrix, where X is a sparse matrix\nY::Matrix: the Y matrix, where Y is a dense matrix\n\nOptional Arguments\n\nmethod::Symbol = :additive: whether :additive or :multiplicative decomposition is required\nshift::Float64 = 0.02: shift value for :additive decomposition\nmultiplier::Float64 = 1.01: multiplier value for :multiplicative decomposition\noutput_format::Symbol = :auto: force the output format to dense (:dense) or sparse (:sparse), or set it to :auto to let the program decide\nsparse_ratio::Float64 = 0.05: the ratio to decide whether a matrix is sparse\nverbose::Bool = false: if true, more information will be printed out\n\nExamples\n\n# additive 
mode\nJudiLing.make_transform_matrix(\n C,\n S,\n method = :additive,\n shift = 0.02,\n verbose = false)\n\n# multiplicative mode\nJudiLing.make_transform_matrix(\n C,\n S,\n method = :multiplicative,\n multiplier = 1.01,\n verbose = false)\n\n# further control of sparsity ratio\nJudiLing.make_transform_matrix(\n ...\n output_format = :auto,\n sparse_ratio = 0.05,\n ...)\n\n\n\n\n\n","category":"method"},{"location":"man/cholesky/#JudiLing.make_transform_matrix-Tuple{Matrix, Union{SparseArrays.SparseMatrixCSC, Matrix}}","page":"Cholesky","title":"JudiLing.make_transform_matrix","text":"make_transform_matrix(X::Matrix, Y::Union{SparseMatrixCSC, Matrix})\n\nUse the Cholesky decomposition to calculate the transformation matrix from X to Y, where X is a dense matrix and Y is either a dense matrix or a sparse matrix.\n\nObligatory Arguments\n\nX::Matrix: the X matrix, where X is a dense matrix\nY::Union{SparseMatrixCSC, Matrix}: the Y matrix, where Y is either a sparse or a dense matrix\n\nOptional Arguments\n\nmethod::Symbol = :additive: whether :additive or :multiplicative decomposition is required\nshift::Float64 = 0.02: shift value for :additive decomposition\nmultiplier::Float64 = 1.01: multiplier value for :multiplicative decomposition\noutput_format::Symbol = :auto: force the output format to dense (:dense) or sparse (:sparse), or set it to :auto to let the program decide\nsparse_ratio::Float64 = 0.05: the ratio to decide whether a matrix is sparse\nverbose::Bool = false: if true, more information will be printed out\n\nExamples\n\n# additive mode\nJudiLing.make_transform_matrix(\n C,\n S,\n method = :additive,\n shift = 0.02,\n verbose = false)\n\n# multiplicative mode\nJudiLing.make_transform_matrix(\n C,\n S,\n method = :multiplicative,\n multiplier = 1.01,\n verbose = false)\n\n# further control of sparsity ratio\nJudiLing.make_transform_matrix(\n ...\n output_format = :auto,\n sparse_ratio = 0.05,\n ...)\n\n\n\n\n\n","category":"method"},{"location":"man/cholesky/#JudiLing.make_transform_matrix-Tuple{SparseArrays.SparseMatrixCSC, SparseArrays.SparseMatrixCSC}","page":"Cholesky","title":"JudiLing.make_transform_matrix","text":"make_transform_matrix(X::SparseMatrixCSC, Y::SparseMatrixCSC)\n\nUse the Cholesky decomposition to calculate the transformation matrix from X to Y, where X is a sparse matrix and Y is a sparse matrix.\n\nObligatory Arguments\n\nX::SparseMatrixCSC: the X matrix, where X is a sparse matrix\nY::SparseMatrixCSC: the Y matrix, where Y is a sparse matrix\n\nOptional Arguments\n\nmethod::Symbol = :additive: whether :additive or :multiplicative decomposition is required\nshift::Float64 = 0.02: shift value for :additive decomposition\nmultiplier::Float64 = 1.01: multiplier value for :multiplicative decomposition\noutput_format::Symbol = :auto: force the output format to dense (:dense) or sparse (:sparse), or set it to :auto to let the program decide\nsparse_ratio::Float64 = 0.05: the ratio to decide whether a matrix is sparse\nverbose::Bool = false: if true, more information will be printed out\n\nExamples\n\n# additive mode\nJudiLing.make_transform_matrix(\n C,\n S,\n method = :additive,\n shift = 0.02,\n verbose = false)\n\n# multiplicative mode\nJudiLing.make_transform_matrix(\n C,\n S,\n method = :multiplicative,\n multiplier = 1.01,\n verbose = false)\n\n# further control of sparsity ratio\nJudiLing.make_transform_matrix(\n ...\n output_format = :auto,\n sparse_ratio = 0.05,\n ...)\n\n\n\n\n\n","category":"method"},{"location":"man/cholesky/#JudiLing.format_matrix","page":"Cholesky","title":"JudiLing.format_matrix","text":"format_matrix(M::Union{SparseMatrixCSC, Matrix}, output_format=:auto)\n\nConvert output matrix format to either a dense matrix or a sparse matrix.\n\n\n\n\n\n","category":"function"},{"location":"man/make_semantic_matrix/","page":"Make Semantic Matrix","title":"Make Semantic Matrix","text":"CurrentModule = JudiLing","category":"page"},{"location":"man/make_semantic_matrix/#Make-Semantic-Matrix","page":"Make Semantic Matrix","title":"Make Semantic Matrix","text":"","category":"section"},{"location":"man/make_semantic_matrix/#Make-binary-semantic-vectors","page":"Make Semantic Matrix","title":"Make binary semantic vectors","text":"","category":"section"},{"location":"man/make_semantic_matrix/","page":"Make Semantic Matrix","title":"Make Semantic Matrix","text":" PS_Matrix_Struct\n make_pS_matrix\n make_pS_matrix(data)\n make_pS_matrix(data_val, pS_obj)\n make_combined_pS_matrix(\n data_train,\n data_val;\n features_col = :CommunicativeIntention,\n sep_token = \"_\",\n )","category":"page"},{"location":"man/make_semantic_matrix/#JudiLing.PS_Matrix_Struct","page":"Make Semantic Matrix","title":"JudiLing.PS_Matrix_Struct","text":"A structure that stores the discrete semantic vectors: pS is the discrete semantic matrix; f2i is a dictionary returning the indices for features; i2f is a dictionary returning the features for indices.\n\n\n\n\n\n","category":"type"},{"location":"man/make_semantic_matrix/#JudiLing.make_pS_matrix","page":"Make Semantic 
Matrix","title":"JudiLing.make_pS_matrix","text":"Make discrete semantic matrix.\n\n\n\n\n\n","category":"function"},{"location":"man/make_semantic_matrix/#JudiLing.make_pS_matrix-Tuple{Any}","page":"Make Semantic Matrix","title":"JudiLing.make_pS_matrix","text":"make_pS_matrix(data)\n\nCreate a discrete semantic matrix given a dataframe.\n\nObligatory Arguments\n\ndata::DataFrame: the dataset\n\nOptional Arguments\n\nfeatures_col::Symbol=:CommunicativeIntention: the column name for target\nsep_token::String=\"_\": separator\n\nExamples\n\ns_obj_train = JudiLing.make_pS_matrix(\n utterance,\n features_col=:CommunicativeIntention,\n sep_token=\"_\")\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.make_pS_matrix-Tuple{Any, Any}","page":"Make Semantic Matrix","title":"JudiLing.make_pS_matrix","text":"make_pS_matrix(data_val, pS_obj)\n\nConstruct discrete semantic matrix for the validation datasets given by the exemplar in the dataframe, and given the S matrix for the training datasets.\n\nObligatory Arguments\n\ndata_val::DataFrame: the dataset\npS_obj::PS_Matrix_Struct: training PS object\n\nOptional Arguments\n\nfeatures_col::Symbol=:CommunicativeIntention: the column name for target\nsep_token::String=\"_\": separator\n\nExamples\n\ns_obj_val = JudiLing.make_pS_matrix(\n data_val,\n s_obj_train,\n features_col=:CommunicativeIntention,\n sep_token=\"_\")\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.make_combined_pS_matrix-Tuple{Any, Any}","page":"Make Semantic Matrix","title":"JudiLing.make_combined_pS_matrix","text":"make_combined_pS_matrix(\n data_train,\n data_val;\n features_col = :CommunicativeIntention,\n sep_token = \"_\",\n)\n\nCreate discrete semantic matrices for a train and validation dataframe.\n\nObligatory Arguments\n\ndata_train::DataFrame: the training dataset\ndata_val::DataFrame: the validation dataset\n\nOptional Arguments\n\nfeatures_col::Symbol=:CommunicativeIntention: the column name for target\nsep_token::String=\"_\": separator\n\nExamples\n\ns_obj_train, s_obj_val = JudiLing.make_combined_pS_matrix(\n data_train,\n data_val,\n features_col=:CommunicativeIntention,\n sep_token=\"_\")\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#Simulate-semantic-vectors","page":"Make Semantic Matrix","title":"Simulate semantic vectors","text":"","category":"section"},{"location":"man/make_semantic_matrix/","page":"Make Semantic Matrix","title":"Make Semantic Matrix","text":" L_Matrix_Struct\n make_S_matrix\n make_L_matrix\n make_combined_S_matrix\n make_combined_L_matrix\n make_S_matrix(data::DataFrame, base::Vector, inflections::Vector)\n make_S_matrix(data_train::DataFrame, data_val::DataFrame, base::Vector, inflections::Vector)\n make_S_matrix(data::DataFrame, base::Vector)\n make_S_matrix(data_train::DataFrame, data_val::DataFrame, base::Vector)\n make_S_matrix(data_train::DataFrame, base::Vector, inflections::Vector, L::L_Matrix_Struct)\n make_S_matrix(data_train::DataFrame, data_val::Union{DataFrame, Nothing}, base::Vector, L::L_Matrix_Struct)\n make_S_matrix(data::DataFrame, base::Vector, L::L_Matrix_Struct)\n make_S_matrix(data_train::DataFrame, data_val::DataFrame, base::Vector, inflections::Vector, L::L_Matrix_Struct)\n make_L_matrix(data::DataFrame, base::Vector)\n make_combined_S_matrix(data_train::DataFrame, data_val::DataFrame, base::Vector, inflections::Vector, L::L_Matrix_Struct)\n make_combined_S_matrix(data_train::DataFrame, data_val::Union{DataFrame, Nothing}, 
base::Vector, L::L_Matrix_Struct)\n make_combined_S_matrix( data_train::DataFrame, data_val::DataFrame, base::Vector, inflections::Vector)\n make_combined_S_matrix(data_train::DataFrame, data_val::DataFrame, base::Vector)\n make_combined_L_matrix(data_train::DataFrame, data_val::DataFrame, base::Vector, inflections::Vector)\n make_combined_L_matrix(data_train::DataFrame, data_val::DataFrame, base::Vector)\n L_Matrix_Struct(L, sd_base, sd_base_mean, sd_inflection, sd_inflection_mean, base_f, infl_f, base_f2i, infl_f2i, n_base_f, n_infl_f, ncol)\n L_Matrix_Struct(L, sd_base, sd_inflection, base_f, infl_f, base_f2i, infl_f2i, n_base_f, n_infl_f, ncol)","category":"page"},{"location":"man/make_semantic_matrix/#JudiLing.L_Matrix_Struct","page":"Make Semantic Matrix","title":"JudiLing.L_Matrix_Struct","text":"A structure that stores lexome semantic vectors: L is the lexome semantic matrix; f2i is a dictionary returning the indices for features; i2f is a dictionary returning the features for indices.\n\n\n\n\n\n","category":"type"},{"location":"man/make_semantic_matrix/#JudiLing.make_S_matrix","page":"Make Semantic Matrix","title":"JudiLing.make_S_matrix","text":"Make simulated semantic matrix.\n\n\n\n\n\n","category":"function"},{"location":"man/make_semantic_matrix/#JudiLing.make_L_matrix","page":"Make Semantic Matrix","title":"JudiLing.make_L_matrix","text":"Make simulated lexome matrix.\n\n\n\n\n\n","category":"function"},{"location":"man/make_semantic_matrix/#JudiLing.make_combined_S_matrix","page":"Make Semantic Matrix","title":"JudiLing.make_combined_S_matrix","text":"Make combined simulated S matrices, with features combined from both training and validation datasets.\n\n\n\n\n\n","category":"function"},{"location":"man/make_semantic_matrix/#JudiLing.make_combined_L_matrix","page":"Make Semantic Matrix","title":"JudiLing.make_combined_L_matrix","text":"Make combined simulated Lexome matrix, with features combined from both training and validation datasets.\n\n\n\n\n\n","category":"function"},{"location":"man/make_semantic_matrix/#JudiLing.make_S_matrix-Tuple{DataFrames.DataFrame, Vector, Vector}","page":"Make Semantic Matrix","title":"JudiLing.make_S_matrix","text":"make_S_matrix(data::DataFrame, base::Vector, inflections::Vector)\n\nCreate simulated semantic matrix for the training dataset, given the input data, a vector specifying context lexemes, and a vector specifying grammatical lexemes. 
The semantic vector of a word form is constructed by summing the semantic vectors of content and grammatical lexemes.\n\nObligatory Arguments\n\ndata::DataFrame: the dataset\nbase::Vector: context lexemes\ninflections::Vector: grammatical lexemes\n\nOptional Arguments\n\nncol::Int64=200: dimension of semantic vectors, usually the same as that of cue vectors\nsd_base_mean::Int64=1: the sd of the mean of base features\nsd_inflection_mean::Int64=1: the sd of the mean of inflectional features\nsd_base::Int64=4: the sd of base features\nsd_inflection::Int64=4: the sd of inflectional features\nseed::Int64=314: the random seed\nisdeep::Bool=true: if true, the mean of each feature is also randomized\nadd_noise::Bool=true: if true, add additional Gaussian noise\nsd_noise::Int64=1: the sd of the Gaussian noise\nnormalized::Bool=false: if true, most of the values range between -1 and 1; they may slightly exceed these bounds depending on the sd\n\nExamples\n\n# basic usage\nS_train = JudiLing.make_S_matrix(\n french,\n [\"Lexeme\"],\n [\"Tense\",\"Aspect\",\"Person\",\"Number\",\"Gender\",\"Class\",\"Mood\"],\n ncol=200)\n\n# deep mode\nS_train = JudiLing.make_S_matrix(\n ...\n sd_base_mean=1,\n sd_inflection_mean=1,\n isdeep=true,\n ...)\n\n# non-deep mode\nS_train = JudiLing.make_S_matrix(\n ...\n isdeep=false,\n ...)\n\n# add additional Gaussian noise\nS_train = JudiLing.make_S_matrix(\n ...\n add_noise=true,\n sd_noise=1,\n ...)\n\n# further control of means and standard deviations\nS_train = JudiLing.make_S_matrix(\n ...\n sd_base_mean=1,\n sd_inflection_mean=1,\n sd_base=4,\n sd_inflection=4,\n sd_noise=1,\n ...)\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.make_S_matrix-Tuple{DataFrames.DataFrame, DataFrames.DataFrame, Vector, Vector}","page":"Make Semantic Matrix","title":"JudiLing.make_S_matrix","text":"make_S_matrix(data_train::DataFrame, data_val::DataFrame, base::Vector, inflections::Vector)\n\nCreate simulated semantic matrices for the training and validation datasets, given the input data, a vector specifying context lexemes, and a vector specifying grammatical lexemes. 
The semantic vector of a word form is constructed by summing the semantic vectors of content and grammatical lexemes.\n\nObligatory Arguments\n\ndata_train::DataFrame: the training dataset\ndata_val::DataFrame: the validation dataset\nbase::Vector: context lexemes\ninflections::Vector: grammatical lexemes\n\nOptional Arguments\n\nncol::Int64=200: dimension of semantic vectors, usually the same as that of cue vectors\nsd_base_mean::Int64=1: the sd of the mean of base features\nsd_inflection_mean::Int64=1: the sd of the mean of inflectional features\nsd_base::Int64=4: the sd of base features\nsd_inflection::Int64=4: the sd of inflectional features\nseed::Int64=314: the random seed\nisdeep::Bool=true: if true, the mean of each feature is also randomized\nadd_noise::Bool=true: if true, add additional Gaussian noise\nsd_noise::Int64=1: the sd of the Gaussian noise\nnormalized::Bool=false: if true, most of the values range between -1 and 1; they may slightly exceed these bounds depending on the sd\n\nExamples\n\n# basic usage\nS_train, S_val = JudiLing.make_S_matrix(\n french,\n french_val,\n [\"Lexeme\"],\n [\"Tense\",\"Aspect\",\"Person\",\"Number\",\"Gender\",\"Class\",\"Mood\"],\n ncol=200)\n\n# deep mode\nS_train, S_val = JudiLing.make_S_matrix(\n ...\n sd_base_mean=1,\n sd_inflection_mean=1,\n isdeep=true,\n ...)\n\n# non-deep mode\nS_train, S_val = JudiLing.make_S_matrix(\n ...\n isdeep=false,\n ...)\n\n# add additional Gaussian noise\nS_train, S_val = JudiLing.make_S_matrix(\n ...\n add_noise=true,\n sd_noise=1,\n ...)\n\n# further control of means and standard deviations\nS_train, S_val = JudiLing.make_S_matrix(\n ...\n sd_base_mean=1,\n sd_inflection_mean=1,\n sd_base=4,\n sd_inflection=4,\n sd_noise=1,\n ...)\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.make_S_matrix-Tuple{DataFrames.DataFrame, Vector}","page":"Make Semantic Matrix","title":"JudiLing.make_S_matrix","text":"make_S_matrix(data::DataFrame, base::Vector)\n\nCreate simulated semantic matrix for the training dataset with only base features, given the input data and a vector specifying context lexemes. 
The semantic vector of a word form is constructed by summing the semantic vectors of content lexemes.\n\nObligatory Arguments\n\ndata::DataFrame: the dataset\nbase::Vector: context lexemes\n\nOptional Arguments\n\nncol::Int64=200: dimension of semantic vectors, usually the same as that of cue vectors\nsd_base_mean::Int64=1: the sd of the mean of base features\nsd_base::Int64=4: the sd of base features\nseed::Int64=314: the random seed\nisdeep::Bool=true: if true, the mean of each feature is also randomized\nadd_noise::Bool=true: if true, add additional Gaussian noise\nsd_noise::Int64=1: the sd of the Gaussian noise\nnormalized::Bool=false: if true, most of the values range between -1 and 1; they may slightly exceed these bounds depending on the sd\n\nExamples\n\n# basic usage\nS_train = JudiLing.make_S_matrix(\n french,\n [\"Lexeme\"],\n ncol=200)\n\n# deep mode\nS_train = JudiLing.make_S_matrix(\n ...\n sd_base_mean=1,\n sd_inflection_mean=1,\n isdeep=true,\n ...)\n\n# non-deep mode\nS_train = JudiLing.make_S_matrix(\n ...\n isdeep=false,\n ...)\n\n# add additional Gaussian noise\nS_train = JudiLing.make_S_matrix(\n ...\n add_noise=true,\n sd_noise=1,\n ...)\n\n# further control of means and standard deviations\nS_train = JudiLing.make_S_matrix(\n ...\n sd_base_mean=1,\n sd_inflection_mean=1,\n sd_base=4,\n sd_inflection=4,\n sd_noise=1,\n ...)\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.make_S_matrix-Tuple{DataFrames.DataFrame, DataFrames.DataFrame, Vector}","page":"Make Semantic Matrix","title":"JudiLing.make_S_matrix","text":"make_S_matrix(data_train::DataFrame, data_val::DataFrame, base::Vector)\n\nCreate simulated semantic matrices for the training and validation datasets with only base features, given the input data and a vector specifying context lexemes. 
The semantic vector of a word form is constructed by summing the semantic vectors of content lexemes.\n\nObligatory Arguments\n\ndata_train::DataFrame: the training dataset\ndata_val::DataFrame: the validation dataset\nbase::Vector: context lexemes\n\nOptional Arguments\n\nncol::Int64=200: dimension of semantic vectors, usually the same as that of cue vectors\nsd_base_mean::Int64=1: the sd of the mean of base features\nsd_base::Int64=4: the sd of base features\nseed::Int64=314: the random seed\nisdeep::Bool=true: if true, the mean of each feature is also randomized\nadd_noise::Bool=true: if true, add additional Gaussian noise\nsd_noise::Int64=1: the sd of the Gaussian noise\nnormalized::Bool=false: if true, most of the values range between -1 and 1; they may slightly exceed these bounds depending on the sd\n\nExamples\n\n# basic usage\nS_train, S_val = JudiLing.make_S_matrix(\n french,\n french_val,\n [\"Lexeme\"],\n ncol=200)\n\n# deep mode\nS_train, S_val = JudiLing.make_S_matrix(\n ...\n sd_base_mean=1,\n sd_inflection_mean=1,\n isdeep=true,\n ...)\n\n# non-deep mode\nS_train, S_val = JudiLing.make_S_matrix(\n ...\n isdeep=false,\n ...)\n\n# add additional Gaussian noise\nS_train, S_val = JudiLing.make_S_matrix(\n ...\n add_noise=true,\n sd_noise=1,\n ...)\n\n# further control of means and standard deviations\nS_train, S_val = JudiLing.make_S_matrix(\n ...\n sd_base_mean=1,\n sd_inflection_mean=1,\n sd_base=4,\n sd_inflection=4,\n sd_noise=1,\n ...)\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.make_S_matrix-Tuple{DataFrames.DataFrame, Vector, Vector, JudiLing.L_Matrix_Struct}","page":"Make Semantic Matrix","title":"JudiLing.make_S_matrix","text":"make_S_matrix(data_train::DataFrame, base::Vector, inflections::Vector, L::L_Matrix_Struct)\n\nCreate simulated semantic matrix where a lexome matrix is available.\n\nObligatory Arguments\n\ndata::DataFrame: the dataset\nbase::Vector: context lexemes\ninflections::Vector: grammatical lexemes\nL::L_Matrix_Struct: the lexome matrix\n\nOptional Arguments\n\nadd_noise::Bool=true: if true, add additional Gaussian noise\nsd_noise::Int64=1: the sd of the Gaussian noise\nnormalized::Bool=false: if true, most of the values range between -1 and 1; they may slightly exceed these bounds depending on the sd\n\nExamples\n\n# basic usage\nS1 = JudiLing.make_S_matrix(\n latin,\n [\"Lexeme\"],\n [\"Person\",\"Number\",\"Tense\",\"Voice\",\"Mood\"],\n L1,\n add_noise=true,\n sd_noise=1,\n normalized=false\n )\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.make_S_matrix-Tuple{DataFrames.DataFrame, Union{Nothing, DataFrames.DataFrame}, Vector, JudiLing.L_Matrix_Struct}","page":"Make Semantic Matrix","title":"JudiLing.make_S_matrix","text":"make_S_matrix(data_train::DataFrame, data_val::Union{DataFrame, Nothing}, base::Vector, L::L_Matrix_Struct)\n\nCreate simulated semantic matrix where a lexome matrix is available.\n\nObligatory Arguments\n\ndata_train::DataFrame: the training dataset\ndata_val::DataFrame: the validation dataset\nbase::Vector: context lexemes\nL::L_Matrix_Struct: the lexome matrix\n\nOptional Arguments\n\nadd_noise::Bool=true: if true, add additional Gaussian noise\nsd_noise::Int64=1: the sd of the Gaussian noise\nnormalized::Bool=false: if true, most of the values range between -1 and 1; they may slightly exceed these bounds depending on the sd\n\nExamples\n\n# basic usage\nS1, S2 = JudiLing.make_S_matrix(\n latin,\n latin_val,\n [\"Lexeme\"],\n L1,\n add_noise=true,\n sd_noise=1,\n 
normalized=false\n )\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.make_S_matrix-Tuple{DataFrames.DataFrame, Vector, JudiLing.L_Matrix_Struct}","page":"Make Semantic Matrix","title":"JudiLing.make_S_matrix","text":"make_S_matrix(data::DataFrame, base::Vector, L::L_Matrix_Struct)\n\nCreate simulated semantic matrix where a lexome matrix is available.\n\nObligatory Arguments\n\ndata::DataFrame: the dataset\nbase::Vector: context lexemes\nL::L_Matrix_Struct: the lexome matrix\n\nOptional Arguments\n\nadd_noise::Bool=true: if true, add additional Gaussian noise\nsd_noise::Int64=1: the sd of the Gaussian noise\nnormalized::Bool=false: if true, most of the values range between -1 and 1; they may slightly exceed these bounds depending on the sd\n\nExamples\n\n# basic usage\nS1 = JudiLing.make_S_matrix(\n latin,\n [\"Lexeme\"],\n L1,\n add_noise=true,\n sd_noise=1,\n normalized=false\n )\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.make_S_matrix-Tuple{DataFrames.DataFrame, DataFrames.DataFrame, Vector, Vector, JudiLing.L_Matrix_Struct}","page":"Make Semantic Matrix","title":"JudiLing.make_S_matrix","text":"make_S_matrix(data_train::DataFrame, data_val::DataFrame, base::Vector, inflections::Vector, L::L_Matrix_Struct)\n\nCreate simulated semantic matrix where a lexome matrix is available.\n\nObligatory Arguments\n\ndata_train::DataFrame: the training dataset\ndata_val::DataFrame: the validation dataset\nbase::Vector: context lexemes\ninflections::Vector: grammatical lexemes\nL::L_Matrix_Struct: the lexome matrix\n\nOptional Arguments\n\nadd_noise::Bool=true: if true, add additional Gaussian noise\nsd_noise::Int64=1: the sd of the Gaussian noise\nnormalized::Bool=false: if true, most of the values range between -1 and 1; they may slightly exceed these bounds depending on the sd\n\nExamples\n\n# basic usage\nS1, S2 = JudiLing.make_S_matrix(\n latin,\n latin_val,\n [\"Lexeme\"],\n [\"Person\",\"Number\",\"Tense\",\"Voice\",\"Mood\"],\n L1,\n add_noise=true,\n sd_noise=1,\n normalized=false\n )\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.make_L_matrix-Tuple{DataFrames.DataFrame, Vector}","page":"Make Semantic Matrix","title":"JudiLing.make_L_matrix","text":"make_L_matrix(data::DataFrame, base::Vector)\n\nCreate a Lexome Matrix with simulated semantic vectors, where there are only base features.\n\nObligatory Arguments\n\ndata::DataFrame: the dataset\nbase::Vector: context lexemes\n\nOptional Arguments\n\nncol::Int64=200: dimension of semantic vectors, usually the same as that of cue vectors\nsd_base_mean::Int64=1: the sd of the mean of base features\nsd_base::Int64=4: the sd of base features\nseed::Int64=314: the random seed\nisdeep::Bool=true: if true, the mean of each feature is also randomized\n\nExamples\n\n# basic usage\nL = JudiLing.make_L_matrix(\n latin,\n [\"Lexeme\"],\n ncol=200)\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.make_combined_S_matrix-Tuple{DataFrames.DataFrame, DataFrames.DataFrame, Vector, Vector, JudiLing.L_Matrix_Struct}","page":"Make Semantic Matrix","title":"JudiLing.make_combined_S_matrix","text":"make_combined_S_matrix(data_train::DataFrame, data_val::DataFrame, base::Vector, inflections::Vector, L::L_Matrix_Struct)\n\nCreate simulated semantic matrices for the training and validation datasets with an existing Lexome matrix, where features are combined from both training and validation datasets.\n\nObligatory 
Arguments\n\ndata_train::DataFrame: the training dataset\ndata_val::DataFrame: the validation dataset\nbase::Vector: context lexemes\ninflections::Vector: grammatical lexemes\nL::L_Matrix_Struct: the Lexome Matrix\n\nOptional Arguments\n\nadd_noise::Bool=true: if true, add additional Gaussian noise\nsd_noise::Int64=1: the sd of the Gaussian noise\nnormalized::Bool=false: if true, most of the values range between -1 and 1; they may slightly exceed these bounds depending on the sd\n\nExamples\n\n# basic usage\nS_train, S_val = JudiLing.make_combined_S_matrix(\n latin_train,\n latin_val,\n [\"Lexeme\"],\n [\"Person\",\"Number\",\"Tense\",\"Voice\",\"Mood\"],\n L)\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.make_combined_S_matrix-Tuple{DataFrames.DataFrame, Union{Nothing, DataFrames.DataFrame}, Vector, JudiLing.L_Matrix_Struct}","page":"Make Semantic Matrix","title":"JudiLing.make_combined_S_matrix","text":"make_combined_S_matrix(data_train::DataFrame, data_val::Union{DataFrame, Nothing}, base::Vector, L::L_Matrix_Struct)\n\nCreate simulated semantic matrices for the training and validation datasets with an existing Lexome matrix, where features are combined from both training and validation datasets.\n\nObligatory Arguments\n\ndata_train::DataFrame: the training dataset\ndata_val::DataFrame: the validation dataset\nbase::Vector: context lexemes\nL::L_Matrix_Struct: the Lexome Matrix\n\nOptional Arguments\n\nadd_noise::Bool=true: if true, add additional Gaussian noise\nsd_noise::Int64=1: the sd of the Gaussian noise\nnormalized::Bool=false: if true, most of the values range between -1 and 1; they may slightly exceed these bounds depending on the sd\n\nExamples\n\n# basic usage\nS_train, S_val = JudiLing.make_combined_S_matrix(\n latin_train,\n latin_val,\n [\"Lexeme\"],\n L)\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.make_combined_S_matrix-Tuple{DataFrames.DataFrame, DataFrames.DataFrame, Vector, Vector}","page":"Make Semantic Matrix","title":"JudiLing.make_combined_S_matrix","text":"make_combined_S_matrix( data_train::DataFrame, data_val::DataFrame, base::Vector, inflections::Vector)\n\nCreate simulated semantic matrices for the training and validation datasets, where features are combined from both training and validation datasets.\n\nObligatory Arguments\n\ndata_train::DataFrame: the training dataset\ndata_val::DataFrame: the validation dataset\nbase::Vector: context lexemes\ninflections::Vector: grammatical lexemes\n\nOptional Arguments\n\nncol::Int64=200: dimension of semantic vectors, usually the same as that of cue vectors\nsd_base_mean::Int64=1: the sd of the mean of base features\nsd_inflection_mean::Int64=1: the sd of the mean of inflectional features\nsd_base::Int64=4: the sd of base features\nsd_inflection::Int64=4: the sd of inflectional features\nseed::Int64=314: the random seed\nisdeep::Bool=true: if true, the mean of each feature is also randomized\nadd_noise::Bool=true: if true, add additional Gaussian noise\nsd_noise::Int64=1: the sd of the Gaussian noise\nnormalized::Bool=false: if true, most of the values range between -1 and 1; they may slightly exceed these bounds depending on the sd\n\nExamples\n\n# basic usage\nS_train, S_val = JudiLing.make_combined_S_matrix(\n latin_train,\n latin_val,\n [\"Lexeme\"],\n [\"Person\",\"Number\",\"Tense\",\"Voice\",\"Mood\"],\n 
ncol=n_features)\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.make_combined_S_matrix-Tuple{DataFrames.DataFrame, DataFrames.DataFrame, Vector}","page":"Make Semantic Matrix","title":"JudiLing.make_combined_S_matrix","text":"make_combined_S_matrix(data_train::DataFrame, data_val::DataFrame, base::Vector)\n\nCreate simulated semantic matrices for the training and validation datasets, where features are combined from both training and validation datasets.\n\nObligatory Arguments\n\ndata_train::DataFrame: the training dataset\ndata_val::DataFrame: the validation dataset\nbase::Vector: context lexemes\n\nOptional Arguments\n\nncol::Int64=200: dimension of semantic vectors, usually the same as that of cue vectors\nsd_base_mean::Int64=1: the sd of the mean of base features\nsd_inflection_mean::Int64=1: the sd of the mean of inflectional features\nsd_base::Int64=4: the sd of base features\nsd_inflection::Int64=4: the sd of inflectional features\nseed::Int64=314: the random seed\nisdeep::Bool=true: if true, the mean of each feature is also randomized\nadd_noise::Bool=true: if true, add additional Gaussian noise\nsd_noise::Int64=1: the sd of the Gaussian noise\nnormalized::Bool=false: if true, most of the values range between -1 and 1; they may slightly exceed these bounds depending on the sd\n\nExamples\n\n# basic usage\nS_train, S_val = JudiLing.make_combined_S_matrix(\n latin_train,\n latin_val,\n [\"Lexeme\"],\n ncol=n_features)\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.make_combined_L_matrix-Tuple{DataFrames.DataFrame, DataFrames.DataFrame, Vector, Vector}","page":"Make Semantic Matrix","title":"JudiLing.make_combined_L_matrix","text":"make_combined_L_matrix(data_train::DataFrame, data_val::DataFrame, base::Vector, inflections::Vector)\n\nCreate a Lexome Matrix with simulated semantic vectors, where features are combined from both training and validation datasets.\n\nObligatory Arguments\n\ndata_train::DataFrame: the training dataset\ndata_val::DataFrame: the validation dataset\nbase::Vector: context lexemes\ninflections::Vector: grammatical lexemes\n\nOptional Arguments\n\nncol::Int64=200: dimension of semantic vectors, usually the same as that of cue vectors\nsd_base_mean::Int64=1: the sd of the mean of base features\nsd_inflection_mean::Int64=1: the sd of the mean of inflectional features\nsd_base::Int64=4: the sd of base features\nsd_inflection::Int64=4: the sd of inflectional features\nseed::Int64=314: the random seed\nisdeep::Bool=true: if true, the mean of each feature is also randomized\n\nExamples\n\n# basic usage\nL = JudiLing.make_combined_L_matrix(\n latin_train,\n latin_val,\n [\"Lexeme\"],\n [\"Person\",\"Number\",\"Tense\",\"Voice\",\"Mood\"],\n ncol=n_features)\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.make_combined_L_matrix-Tuple{DataFrames.DataFrame, DataFrames.DataFrame, Vector}","page":"Make Semantic Matrix","title":"JudiLing.make_combined_L_matrix","text":"make_combined_L_matrix(data_train::DataFrame, data_val::DataFrame, base::Vector)\n\nCreate a Lexome Matrix with simulated semantic vectors, where features are combined from both training and validation datasets.\n\nObligatory Arguments\n\ndata_train::DataFrame: the training dataset\ndata_val::DataFrame: the validation dataset\nbase::Vector: context lexemes\n\nOptional Arguments\n\nncol::Int64=200: dimension of semantic vectors, usually the same as that of cue 
vectors\nsd_base_mean::Int64=1: the sd of the mean of base features\nsd_inflection_mean::Int64=1: the sd of the mean of inflectional features\nsd_base::Int64=4: the sd of base features\nsd_inflection::Int64=4: the sd of inflectional features\nseed::Int64=314: the random seed\nisdeep::Bool=true: if true, the mean of each feature is also randomized\n\nExamples\n\n# basic usage\nL = JudiLing.make_combined_L_matrix(\n latin_train,\n latin_val,\n [\"Lexeme\"],\n ncol=n_features)\n\n\n\n\n\n","category":"method"},
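The combined L and S constructors are designed to be chained: first build a lexome matrix over the union of features in both datasets, then derive the train and validation S matrices from it. A minimal sketch, assuming latin_train and latin_val dataframes as in the examples above:

L = JudiLing.make_combined_L_matrix(
    latin_train,
    latin_val,
    ["Lexeme"],
    ["Person","Number","Tense","Voice","Mood"],
    ncol=200)
S_train, S_val = JudiLing.make_combined_S_matrix(
    latin_train,
    latin_val,
    ["Lexeme"],
    ["Person","Number","Tense","Voice","Mood"],
    L)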
{"location":"man/make_semantic_matrix/#JudiLing.L_Matrix_Struct-NTuple{12, Any}","page":"Make Semantic Matrix","title":"JudiLing.L_Matrix_Struct","text":"L_Matrix_Struct(L, sd_base, sd_base_mean, sd_inflection, sd_inflection_mean, base_f, infl_f, base_f2i, infl_f2i, n_base_f, n_infl_f, ncol)\n\nConstruct L_Matrix_Struct with deep mode.\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.L_Matrix_Struct-NTuple{10, Any}","page":"Make Semantic Matrix","title":"JudiLing.L_Matrix_Struct","text":"L_Matrix_Struct(L, sd_base, sd_inflection, base_f, infl_f, base_f2i, infl_f2i, n_base_f, n_infl_f, ncol)\n\nConstruct L_Matrix_Struct without deep mode.\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#Load-from-word2vec,-fasttext-or-similar","page":"Make Semantic Matrix","title":"Load from word2vec, fasttext or similar","text":"","category":"section"},{"location":"man/make_semantic_matrix/","page":"Make Semantic Matrix","title":"Make Semantic Matrix","text":"load_S_matrix_from_fasttext(data::DataFrame,\n language::Symbol;\n target_col=:Word,\n default_file::Int=1)\n load_S_matrix_from_fasttext(data_train::DataFrame,\n data_val::DataFrame,\n language::Symbol;\n target_col=:Word,\n default_file::Int=1)\n load_S_matrix_from_word2vec_file(data::DataFrame,\n filepath::String;\n target_col=:Word)\n load_S_matrix_from_word2vec_file(data_train::DataFrame,\n data_val::DataFrame,\n filepath::String;\n target_col=:Word)\n load_S_matrix_from_fasttext_file(data::DataFrame,\n filepath::String;\n target_col=:Word)\n load_S_matrix_from_fasttext_file(data_train::DataFrame,\n data_val::DataFrame,\n filepath::String;\n target_col=:Word)","category":"page"},{"location":"man/make_semantic_matrix/#JudiLing.load_S_matrix_from_fasttext-Tuple{DataFrames.DataFrame, Symbol}","page":"Make Semantic Matrix","title":"JudiLing.load_S_matrix_from_fasttext","text":"load_S_matrix_from_fasttext(data::DataFrame,\n language::Symbol;\n target_col=:Word,\n default_file::Int=1)\n\nLoad semantic matrix from fasttext, using the Embeddings.jl package. Subset fasttext vectors to include only words in target_col of data, and subset data to only include words in target_col for which a semantic vector is available.\n\nThe last parameter, default_file, specifies which vectors are loaded. To learn about all available vectors, use the following commands:\n\nusing Embeddings\nlanguage_files(FastText_Text{:nl})\n\nreplacing the language code (here :nl) with the language you are interested in. In general, for all languages other than English, these files are available:\n\ndefault_file=1 loads from https://fasttext.cc/docs/en/crawl-vectors.html, paper: E. Grave, P. Bojanowski, P. Gupta, A. Joulin, T. Mikolov, Learning Word Vectors for 157 Languages License: CC BY-SA 3.0 https://creativecommons.org/licenses/by-sa/3.0/\ndefault_file=2 loads from https://fasttext.cc/docs/en/pretrained-vectors.html paper: P. Bojanowski, E. Grave, A. Joulin, T. 
Mikolov, Enriching Word Vectors with Subword Information License: CC BY-SA 3.0 https://creativecommons.org/licenses/by-sa/3.0/\n\nObligatory Arguments\n\ndata::DataFrame: the dataset\nlanguage::Symbol: the language of the words in the dataset, officially ISO 639-2 (see https://github.com/JuliaText/Embeddings.jl/issues/34#issuecomment-782604523), though in practice it behaves more like ISO 639-1, with ISO 639-2 only being used if ISO 639-1 isn't available (see https://en.wikipedia.org/wiki/List_of_ISO_639-2_codes)\n\nOptional Arguments\n\ntarget_col=:Word: column with orthographic representation of words in data\ndefault_file::Int=1: source of vectors, for more information see above and here: https://github.com/JuliaText/Embeddings.jl#loading-different-embeddings\n\nExamples\n\n# basic usage\nlatin_small, S = JudiLing.load_S_matrix_from_fasttext(latin, :la, target_col=:Word)\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.load_S_matrix_from_fasttext-Tuple{DataFrames.DataFrame, DataFrames.DataFrame, Symbol}","page":"Make Semantic Matrix","title":"JudiLing.load_S_matrix_from_fasttext","text":"load_S_matrix_from_fasttext(data_train::DataFrame,\n data_val::DataFrame,\n language::Symbol;\n target_col=:Word,\n default_file::Int=1)\n\nLoad semantic matrix from fasttext, using the Embeddings.jl package. Subset fasttext vectors to include only words in target_col of data_train and data_val, and subset data to only include words in target_col for which a semantic vector is available. Returns subsetted train and val data and train and val semantic matrices.\n\nThe last parameter, default_file, specifies which vectors are loaded. To learn about all available vectors, use the following commands:\n\nusing Embeddings\nlanguage_files(FastText_Text{:nl})\n\nreplacing the language code (here :nl) with the language you are interested in. In general, for all languages other than English, these files are available:\n\ndefault_file=1 loads from https://fasttext.cc/docs/en/crawl-vectors.html, paper: E. Grave, P. Bojanowski, P. Gupta, A. Joulin, T. Mikolov, Learning Word Vectors for 157 Languages License: CC BY-SA 3.0 https://creativecommons.org/licenses/by-sa/3.0/\ndefault_file=2 loads from https://fasttext.cc/docs/en/pretrained-vectors.html paper: P. Bojanowski, E. Grave, A. Joulin, T. 
Mikolov, Enriching Word Vectors with Subword Information License: CC BY-SA 3.0 https://creativecommons.org/licenses/by-sa/3.0/\n\nObligatory Arguments\n\ndata_train::DataFrame: the training dataset\ndata_val::DataFrame: the validation dataset\nlanguage::Symbol: the language of the words in the dataset, officially ISO 639-2 (see https://github.com/JuliaText/Embeddings.jl/issues/34#issuecomment-782604523), though in practice it behaves more like ISO 639-1, with ISO 639-2 only being used if ISO 639-1 isn't available (see https://en.wikipedia.org/wiki/List_of_ISO_639-2_codes)\n\nOptional Arguments\n\ntarget_col=:Word: column with orthographic representation of words in data\ndefault_file::Int=1: source of vectors, for more information see above and here: https://github.com/JuliaText/Embeddings.jl#loading-different-embeddings\n\nExamples\n\n# basic usage\nlatin_small_train, latin_small_val, S_train, S_val = JudiLing.load_S_matrix_from_fasttext(latin_train,\n latin_val,\n :la,\n target_col=:Word)\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.load_S_matrix_from_word2vec_file-Tuple{DataFrames.DataFrame, String}","page":"Make Semantic Matrix","title":"JudiLing.load_S_matrix_from_word2vec_file","text":"load_S_matrix_from_word2vec_file(data::DataFrame,\n filepath::String;\n target_col=:Word)\n\nLoad semantic matrix from a word2vec file at filepath. Subset word2vec vectors to include only words in target_col of data, and subset data to only include words in target_col for which a semantic vector is available. Returns subsetted data and semantic matrix.\n\nObligatory Arguments\n\ndata::DataFrame: the dataset\nfilepath::String: path to file with word2vec vectors in .txt (not compressed in any way)\n\nOptional Arguments\n\ntarget_col=:Word: column with orthographic representation of words in data\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.load_S_matrix_from_word2vec_file-Tuple{DataFrames.DataFrame, DataFrames.DataFrame, String}","page":"Make Semantic Matrix","title":"JudiLing.load_S_matrix_from_word2vec_file","text":"load_S_matrix_from_word2vec_file(data_train::DataFrame,\n data_val::DataFrame,\n filepath::String;\n target_col=:Word)\n\nLoad semantic matrix from a word2vec file at filepath. Subset word2vec vectors to include only words in target_col of data_train and data_val, and subset data to only include words in target_col for which a semantic vector is available. Returns subsetted train and val data and train and val semantic matrices.\n\nObligatory Arguments\n\ndata_train::DataFrame: the training dataset\ndata_val::DataFrame: the validation dataset\nfilepath::String: path to file with word2vec vectors in .txt (not compressed in any way)\n\nOptional Arguments\n\ntarget_col=:Word: column with orthographic representation of words in data\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.load_S_matrix_from_fasttext_file-Tuple{DataFrames.DataFrame, String}","page":"Make Semantic Matrix","title":"JudiLing.load_S_matrix_from_fasttext_file","text":"load_S_matrix_from_fasttext_file(data::DataFrame,\n filepath::String;\n target_col=:Word)\n\nLoad semantic matrix from a fasttext file at filepath. Subset fasttext vectors to include only words in target_col of data, and subset data to only include words in target_col for which a semantic vector is available. 
Returns subsetted data and semantic matrix.\n\nObligatory Arguments\n\ndata::DataFrame: the dataset\nfilepath::String: path to file with fasttext vectors in .txt or .vec (not compressed in any way)\n\nOptional Arguments\n\ntarget_col=:Word: column with orthographic representation of words in data\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.load_S_matrix_from_fasttext_file-Tuple{DataFrames.DataFrame, DataFrames.DataFrame, String}","page":"Make Semantic Matrix","title":"JudiLing.load_S_matrix_from_fasttext_file","text":"load_S_matrix_from_fasttext_file(data_train::DataFrame,\n data_val::DataFrame,\n filepath::String;\n target_col=:Word)\n\nLoad semantic matrix from a fasttext file at filepath. Subset fasttext vectors to include only words in target_col of data_train and data_val, and subset data to only include words in target_col for which a semantic vector is available. Returns subsetted train and val data and train and val semantic matrices.\n\nObligatory Arguments\n\ndata_train::DataFrame: the training dataset\ndata_val::DataFrame: the validation dataset\nfilepath::String: path to file with fasttext vectors in .txt (not compressed in any way)\n\nOptional Arguments\n\ntarget_col=:Word: column with orthographic representation of words in data\n\n\n\n\n\n","category":"method"},
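Since the file-based loaders have no Examples section, here is a minimal sketch of loading pretrained vectors from disk. The file paths are placeholders, not files shipped with JudiLing; the return pattern mirrors the fasttext example above:

# word2vec text format
latin_small, S = JudiLing.load_S_matrix_from_word2vec_file(
    latin,
    "data/word2vec_vectors.txt",
    target_col=:Word)

# fasttext .vec format, train/validation version
latin_small_train, latin_small_val, S_train, S_val = JudiLing.load_S_matrix_from_fasttext_file(
    latin_train,
    latin_val,
    "data/cc.la.300.vec",
    target_col=:Word)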
{"location":"man/make_semantic_matrix/#Utility-functions","page":"Make Semantic Matrix","title":"Utility functions","text":"","category":"section"},{"location":"man/make_semantic_matrix/","page":"Make Semantic Matrix","title":"Make Semantic Matrix","text":" process_features(data, feature_cols)\n comp_f_M!(L, sd, sd_mean, n_f, ncol, n_b)\n comp_f_M!(L, sd, n_f, ncol, n_b)\n merge_f2i(base_f2i, infl_f2i, n_base_f, n_infl_f)\n lexome_sum(L, features)\n make_St(L, n, data, base, inflections)\n make_St(L, n, data, base)\n add_St_noise!(St, sd_noise)\n normalize_St!(St, n_base, n_infl)\n normalize_St!(St, n_base)","category":"page"},{"location":"man/make_semantic_matrix/#JudiLing.process_features-Tuple{Any, Any}","page":"Make Semantic Matrix","title":"JudiLing.process_features","text":"process_features(data, feature_cols)\n\nCollect all features given datasets and feature column names.\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.comp_f_M!-NTuple{6, Any}","page":"Make Semantic Matrix","title":"JudiLing.comp_f_M!","text":"comp_f_M!(L, sd, sd_mean, n_f, ncol, n_b)\n\nCompose the feature matrix with deep mode.\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.comp_f_M!-NTuple{5, Any}","page":"Make Semantic Matrix","title":"JudiLing.comp_f_M!","text":"comp_f_M!(L, sd, n_f, ncol, n_b)\n\nCompose the feature matrix without deep mode.\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.merge_f2i-NTuple{4, Any}","page":"Make Semantic Matrix","title":"JudiLing.merge_f2i","text":"merge_f2i(base_f2i, infl_f2i, n_base_f, n_infl_f)\n\nMerge the base f2i dictionary and the inflectional f2i dictionary.\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.lexome_sum-Tuple{Any, Any}","page":"Make Semantic Matrix","title":"JudiLing.lexome_sum","text":"lexome_sum(L, features)\n\nSum up the semantic vector, given the lexome vector.\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.make_St-NTuple{5, Any}","page":"Make Semantic Matrix","title":"JudiLing.make_St","text":"make_St(L, n, data, base, inflections)\n\nMake S transpose matrix with 
inflections.\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.make_St-NTuple{4, Any}","page":"Make Semantic Matrix","title":"JudiLing.make_St","text":"make_St(L, n, data, base)\n\nMake S transpose matrix without inflections.\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.add_St_noise!-Tuple{Any, Any}","page":"Make Semantic Matrix","title":"JudiLing.add_St_noise!","text":"add_St_noise!(St, sd_noise)\n\nAdd noise.\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.normalize_St!-Tuple{Any, Any, Any}","page":"Make Semantic Matrix","title":"JudiLing.normalize_St!","text":"normalize_St!(St, n_base, n_infl)\n\nNormalize S transpose with inflections.\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.normalize_St!-Tuple{Any, Any}","page":"Make Semantic Matrix","title":"JudiLing.normalize_St!","text":"normalize_St!(St, n_base)\n\nNormalize S transpose without inflections.\n\n\n\n\n\n","category":"method"},{"location":"man/eval/","page":"Evaluation","title":"Evaluation","text":"CurrentModule = JudiLing","category":"page"},{"location":"man/eval/#Evaluation","page":"Evaluation","title":"Evaluation","text":"","category":"section"},{"location":"man/eval/","page":"Evaluation","title":"Evaluation","text":" Comp_Acc_Struct\n eval_SC\n eval_SC_loose\n accuracy_comprehension(S, Shat, data)\n accuracy_comprehension(\n S_val,\n S_train,\n Shat_val,\n data_val,\n data_train;\n target_col = :Words,\n base = nothing,\n inflections = nothing,\n )\n eval_SC(SChat::AbstractArray, SC::AbstractArray)\n eval_SC(SChat::AbstractArray, SC::AbstractArray, SC_rest::AbstractArray)\n eval_SC(SChat::AbstractArray, SC::AbstractArray, data::DataFrame, target_col::Union{String, Symbol})\n eval_SC(SChat::AbstractArray, SC::AbstractArray, SC_rest::AbstractArray, data::DataFrame, data_rest::DataFrame, target_col::Union{String, Symbol})\n eval_SC(SChat::AbstractArray, SC::AbstractArray, batch_size::Int64)\n eval_SC(SChat::AbstractArray, SC::AbstractArray, data::DataFrame, target_col::Union{String, Symbol}, batch_size::Int64)\n eval_SC_loose(SChat, SC, k)\n eval_SC_loose(SChat, SC, k, data, target_col)\n eval_manual(res, data, i2f)\n eval_acc(res, gold_inds::Array)\n eval_acc(res, cue_obj::Cue_Matrix_Struct)\n eval_acc_loose(res, gold_inds)\n extract_gpi(gpi, threshold=0.1, tolerance=(-1000.0))","category":"page"},{"location":"man/eval/#JudiLing.Comp_Acc_Struct","page":"Evaluation","title":"JudiLing.Comp_Acc_Struct","text":"A structure that stores information about comprehension accuracy.\n\n\n\n\n\n","category":"type"},{"location":"man/eval/#JudiLing.eval_SC","page":"Evaluation","title":"JudiLing.eval_SC","text":"Assess model accuracy on the basis of the correlations of row vectors of Chat and C or Shat and S. Ideally, the target words have the highest correlations on the diagonal of the pertinent correlation matrices. Support for homophones is implemented.\n\n\n\n\n\n","category":"function"},{"location":"man/eval/#JudiLing.eval_SC_loose","page":"Evaluation","title":"JudiLing.eval_SC_loose","text":"Assess model accuracy on the basis of the correlations of row vectors of Chat and C or Shat and S. Count it as correct if one of the top k candidates is correct. 
Support for homophones is implemented.\n\n\n\n\n\n","category":"function"},
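A minimal sketch contrasting strict and loose evaluation, assuming predicted and gold-standard semantic matrices as in the eval_SC examples below; with k = 5 a prediction counts as correct if the target is among the five most correlated candidates:

acc_strict = JudiLing.eval_SC(Shat_val, S_val)
acc_loose = JudiLing.eval_SC_loose(Shat_val, S_val, 5)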
See below for more information.\n\nObligatory Arguments\n\nS_val::Matrix: the (gold standard) S matrix of the validation data\nS_train::Matrix: the (gold standard) S matrix of the training data\nShat_val::Matrix: the (predicted) Shat matrix of the validation data\ndata_val::DataFrame: the validation dataset\ndata_train::DataFrame: the training dataset\n\nOptional Arguments\n\ntarget_col::Union{String, Symbol}=:Words: the column name for target strings\nbase::Vector=nothing: base features (typically a lexeme)\ninflections::Union{Nothing, Vector}=nothing: other features (typically in inflectional features)\n\nExamples\n\naccuracy_comprehension(\n S_val,\n S_train,\n Shat_val,\n latin_val,\n latin_train,\n target_col=:Words,\n base=[:Lexeme],\n inflections=[:Person, :Number, :Tense, :Voice, :Mood]\n )\n\nNote\n\nIn case of homophones/homographs in the dataset, the correct/incorrect values for base and inflections may be misleading! Consider the following example: The wordform \"Äpfel\" in German can be nominative plural, genitive plural and accusative plural. Let's assume we have a dataset in which \"Äpfel\" occurs in all three case/number combinations (i.e. there are homographs). If all these wordforms have the same semantic vectors (e.g. because they are derived from word2vec or fasttext which typically have a single vector per unique wordform), the predicted semantic vector of the wordform \"Äpfel\" will be equally correlated with all three case/number combinations in the dataset. In such cases, while the algorithm in this function can unambiguously conclude that the correct surface form \"Äpfel\" was comprehended, which of the three possible rows is the correct one will be picked somewhat non-deterministically (see https://docs.julialang.org/en/v1/base/collections/#Base.argmax). It is thus possible that the algorithm will then use the genitive plural instead of the intended nominative plural as the ground plural, and will report that \"case\" was comprehended incorrectly.\n\n\n\n\n\n","category":"method"},{"location":"man/eval/#JudiLing.eval_SC-Tuple{AbstractArray, AbstractArray}","page":"Evaluation","title":"JudiLing.eval_SC","text":"eval_SC(SChat::AbstractArray, SC::AbstractArray)\n\nAssess model accuracy on the basis of the correlations (or Euclidean distances or Cosine Similarities) of row vectors of Chat and C or Shat and S. Ideally the target words have highest correlations (lowest distance/highest similarity) on the diagonal of the pertinent correlation (distance/similarity) matrices.\n\nIf freq is added, token-based accuracy is computed. Token-based accuracy weighs accuracy values according to words' frequency, i.e. if a word has a frequency of 30 and overall there are 3000 tokens (the frequencies of all types sum to 3000), this token's accuracy will contribute 30/3000.\n\nnote: Note\nIf there are homophones/homographs in the dataset, this evaluation method may be misleading: the predicted vector will be equally correlated with the target vector of both words and the one on the diagonal will not necessarily be selected as the most correlated. 
In such cases, supplying the dataset and target_col is recommended which enables taking into account homophones/homographs.\n\nObligatory Arguments\n\nSChat::Union{SparseMatrixCSC, Matrix}: the Chat or Shat matrix\nSC::Union{SparseMatrixCSC, Matrix}: the C or S matrix\n\nOptional Arguments\n\ndigits: the specified number of digits after the decimal place (or before if negative)\nR::Bool=false: if true, pairwise correlation/distance/similarity matrix R is return\nfreq::Union{Missing, Array{Int64, 1}, Array{Float64,1}}=missing: list of frequencies of the wordforms in X and Y\nmethod::Union{Symbol, String}=:correlation: Method for computing similarities, one of {:correlation, :euclidean, :cosine}.\n\neval_SC(Chat_train, cue_obj_train.C)\neval_SC(Chat_val, cue_obj_val.C)\neval_SC(Shat_train, S_train)\neval_SC(Shat_val, S_val)\n\n\n\n\n\n","category":"method"},{"location":"man/eval/#JudiLing.eval_SC-Tuple{AbstractArray, AbstractArray, AbstractArray}","page":"Evaluation","title":"JudiLing.eval_SC","text":"eval_SC(SChat::AbstractArray, SC::AbstractArray, SC_rest::AbstractArray)\n\nAssess model accuracy on the basis of the correlations (or Euclidean distances or Cosine Similarities) of row vectors of Chat and C or Shat and S. Ideally the target words have highest correlations (lowest distance/highest similarity) on the diagonal of the pertinent correlation (distance/similarity) matrices.\n\nIf freq is added, token-based accuracy is computed. Token-based accuracy weighs accuracy values according to words' frequency, i.e. if a word has a frequency of 30 and overall there are 3000 tokens (the frequencies of all types sum to 3000), this token's accuracy will contribute 30/3000.\n\nnote: Note\nThe order is important. The fist gold standard matrix has to be corresponing to the SChat matrix, such as eval_SC(Shat_train, S_train, S_val) or eval_SC(Shat_val, S_val, S_train)\n\nnote: Note\nIf there are homophones/homographs in the dataset, this evaluation method may be misleading: the predicted vector will be equally correlated with the target vector of both words and the one on the diagonal will not necessarily be selected as the most correlated. 
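{"location":"man/eval/","page":"Evaluation","title":"Evaluation","text":"A sketch of token-based evaluation with eval_SC, assuming a hypothetical Frequency column in the Latin dataset; any vector with one frequency per wordform can be passed as freq. A word with frequency 30 out of 3000 tokens in total then contributes 30/3000 = 0.01 to the overall accuracy.\n\n# token-based accuracy, weighting each wordform by its frequency\nacc_token = JudiLing.eval_SC(Shat_train, S_train, freq = latin.Frequency)","category":"page"},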
{"location":"man/eval/#JudiLing.eval_SC-Tuple{AbstractArray, AbstractArray, AbstractArray}","page":"Evaluation","title":"JudiLing.eval_SC","text":"eval_SC(SChat::AbstractArray, SC::AbstractArray, SC_rest::AbstractArray)\n\nAssess model accuracy on the basis of the correlations (or Euclidean distances or Cosine Similarities) of row vectors of Chat and C or Shat and S. Ideally the target words have highest correlations (lowest distance/highest similarity) on the diagonal of the pertinent correlation (distance/similarity) matrices.\n\nIf freq is added, token-based accuracy is computed. Token-based accuracy weighs accuracy values according to words' frequency, i.e. if a word has a frequency of 30 and overall there are 3000 tokens (the frequencies of all types sum to 3000), this token's accuracy will contribute 30/3000.\n\nnote: Note\nThe order is important. The first gold standard matrix has to correspond to the SChat matrix, such as eval_SC(Shat_train, S_train, S_val) or eval_SC(Shat_val, S_val, S_train)\n\nnote: Note\nIf there are homophones/homographs in the dataset, this evaluation method may be misleading: the predicted vector will be equally correlated with the target vector of both words and the one on the diagonal will not necessarily be selected as the most correlated. In such cases, supplying the dataset and target_col is recommended, which enables taking homophones/homographs into account.\n\nObligatory Arguments\n\nSChat::Union{SparseMatrixCSC, Matrix}: the Chat or Shat matrix\nSC::Union{SparseMatrixCSC, Matrix}: the training/validation C or S matrix\nSC_rest::Union{SparseMatrixCSC, Matrix}: the validation/training C or S matrix\n\nOptional Arguments\n\ndigits: the specified number of digits after the decimal place (or before if negative)\nR::Bool=false: if true, the pairwise correlation/distance/similarity matrix R is returned\nfreq::Union{Missing, Array{Int64, 1}, Array{Float64,1}}=missing: list of frequencies of the wordforms in X and Y\nmethod::Union{Symbol, String}=:correlation: Method for computing similarities, one of {:correlation, :euclidean, :cosine}.\n\neval_SC(Chat_train, cue_obj_train.C, cue_obj_val.C)\neval_SC(Chat_val, cue_obj_val.C, cue_obj_train.C)\neval_SC(Shat_train, S_train, S_val)\neval_SC(Shat_val, S_val, S_train)\n\n\n\n\n\n","category":"method"},{"location":"man/eval/#JudiLing.eval_SC-Tuple{AbstractArray, AbstractArray, DataFrames.DataFrame, Union{String, Symbol}}","page":"Evaluation","title":"JudiLing.eval_SC","text":"eval_SC(SChat::AbstractArray, SC::AbstractArray, data::DataFrame, target_col::Union{String, Symbol})\n\nAssess model accuracy on the basis of the correlations (or Euclidean distances or Cosine Similarities) of row vectors of Chat and C or Shat and S. Ideally the target words have highest correlations (lowest distance/highest similarity) on the diagonal of the pertinent correlation (distance/similarity) matrices. Support for homophones.\n\nIf freq is added, token-based accuracy is computed. Token-based accuracy weighs accuracy values according to words' frequency, i.e. if a word has a frequency of 30 and overall there are 3000 tokens (the frequencies of all types sum to 3000), this token's accuracy will contribute 30/3000.\n\nObligatory Arguments\n\nSChat::Union{SparseMatrixCSC, Matrix}: the Chat or Shat matrix\nSC::Union{SparseMatrixCSC, Matrix}: the C or S matrix\ndata::DataFrame: datasets\ntarget_col::Union{String, Symbol}: target column name\n\nOptional Arguments\n\ndigits: the specified number of digits after the decimal place (or before if negative)\nR::Bool=false: if true, the pairwise correlation/distance/similarity matrix R is returned\nfreq::Union{Missing, Array{Int64, 1}, Array{Float64,1}}=missing: list of frequencies of the wordforms in X and Y\nmethod::Union{Symbol, String}=:correlation: Method for computing similarities, one of {:correlation, :euclidean, :cosine}.\n\neval_SC(Chat_train, cue_obj_train.C, latin, :Word)\neval_SC(Chat_val, cue_obj_val.C, latin, :Word)\neval_SC(Shat_train, S_train, latin, :Word)\neval_SC(Shat_val, S_val, latin, :Word)\n\n\n\n\n\n","category":"method"},{"location":"man/eval/#JudiLing.eval_SC-Tuple{AbstractArray, AbstractArray, AbstractArray, DataFrames.DataFrame, DataFrames.DataFrame, Union{String, Symbol}}","page":"Evaluation","title":"JudiLing.eval_SC","text":"eval_SC(SChat::AbstractArray, SC::AbstractArray, SC_rest::AbstractArray, data::DataFrame, data_rest::DataFrame, target_col::Union{String, Symbol})\n\nAssess model accuracy on the basis of the correlations (or Euclidean distances or Cosine Similarities) of row vectors of Chat and C or Shat and S. Ideally the target words have highest correlations (lowest distance/highest similarity) on the diagonal of the pertinent correlation (distance/similarity) matrices.\n\nIf freq is added, token-based accuracy is computed. Token-based accuracy weighs accuracy values according to words' frequency, i.e. if a word has a frequency of 30 and overall there are 3000 tokens (the frequencies of all types sum to 3000), this token's accuracy will contribute 30/3000.\n\nnote: Note\nThe order is important. The first gold standard matrix has to correspond to the SChat matrix, such as eval_SC(Shat_train, S_train, S_val, latin, :Word) or eval_SC(Shat_val, S_val, S_train, latin, :Word)\n\nObligatory Arguments\n\nSChat::Union{SparseMatrixCSC, Matrix}: the Chat or Shat matrix\nSC::Union{SparseMatrixCSC, Matrix}: the training/validation C or S matrix\nSC_rest::Union{SparseMatrixCSC, Matrix}: the validation/training C or S matrix\ndata::DataFrame: the training/validation datasets\ndata_rest::DataFrame: the validation/training datasets\ntarget_col::Union{String, Symbol}: target column name\n\nOptional Arguments\n\ndigits: the specified number of digits after the decimal place (or before if negative)\nR::Bool=false: if true, the pairwise correlation/distance/similarity matrix R is returned\nfreq::Union{Missing, Array{Int64, 1}, Array{Float64,1}}=missing: list of frequencies of the wordforms in X and Y\nmethod::Union{Symbol, String}=:correlation: Method for computing similarities, one of {:correlation, :euclidean, :cosine}.\n\neval_SC(Chat_train, cue_obj_train.C, cue_obj_val.C, latin, :Word)\neval_SC(Chat_val, cue_obj_val.C, cue_obj_train.C, latin, :Word)\neval_SC(Shat_train, S_train, S_val, latin, :Word)\neval_SC(Shat_val, S_val, S_train, latin, :Word)\n\n\n\n\n\n","category":"method"},{"location":"man/eval/#JudiLing.eval_SC-Tuple{AbstractArray, AbstractArray, Int64}","page":"Evaluation","title":"JudiLing.eval_SC","text":"eval_SC(SChat::AbstractArray, SC::AbstractArray, batch_size::Int64)\n\nAssess model accuracy on the basis of the correlations of row vectors of Chat and C or Shat and S. Ideally the target words have highest correlations on the diagonal of the pertinent correlation matrices. For large datasets, pass batch_size to process evaluation in chunks.\n\nnote: Note\nIf there are homophones/homographs in the dataset, this evaluation method may be misleading: the predicted vector will be equally correlated with the target vector of both words and the one on the diagonal will not necessarily be selected as the most correlated. In such cases, supplying the dataset and target_col is recommended, which enables taking homophones/homographs into account.\n\nnote: Note\nCurrently only available for correlation.\n\nObligatory Arguments\n\nSChat: the Chat or Shat matrix\nSC: the C or S matrix\nbatch_size: batch size\n\nOptional Arguments\n\ndigits: the specified number of digits after the decimal place (or before if negative)\nverbose::Bool=false: if true, more information is printed\n\neval_SC(Chat_train, cue_obj_train.C, 5000)\neval_SC(Chat_val, cue_obj_val.C, 5000)\neval_SC(Shat_train, S_train, 5000)\neval_SC(Shat_val, S_val, 5000)\n\n\n\n\n\n","category":"method"},{"location":"man/eval/#JudiLing.eval_SC-Tuple{AbstractArray, AbstractArray, DataFrames.DataFrame, Union{String, Symbol}, Int64}","page":"Evaluation","title":"JudiLing.eval_SC","text":"eval_SC(SChat::AbstractArray, SC::AbstractArray, data::DataFrame, target_col::Union{String, Symbol}, batch_size::Int64)\n\nAssess model accuracy on the basis of the correlations of row vectors of Chat and C or Shat and S. Ideally the target words have highest correlations on the diagonal of the pertinent correlation matrices. For large datasets, pass batch_size to process evaluation in chunks. Supports homophones.\n\nnote: Note\nCurrently only available for correlation.\n\nObligatory Arguments\n\nSChat::AbstractArray: the Chat or Shat matrix\nSC::AbstractArray: the C or S matrix\ndata::DataFrame: datasets\ntarget_col::Union{String, Symbol}: target column name\nbatch_size::Int64: batch size\n\nOptional Arguments\n\ndigits: the specified number of digits after the decimal place (or before if negative)\nverbose::Bool=false: if true, more information is printed\n\neval_SC(Chat_train, cue_obj_train.C, latin, :Word, 5000)\neval_SC(Chat_val, cue_obj_val.C, latin, :Word, 5000)\neval_SC(Shat_train, S_train, latin, :Word, 5000)\neval_SC(Shat_val, S_val, latin, :Word, 5000)\n\n\n\n\n\n","category":"method"},{"location":"man/eval/#JudiLing.eval_SC_loose-Tuple{Any, Any, Any}","page":"Evaluation","title":"JudiLing.eval_SC_loose","text":"eval_SC_loose(SChat, SC, k)\n\nAssess model accuracy on the basis of the correlations (or Euclidean distances or Cosine Similarities) of row vectors of Chat and C or Shat and S. Ideally the target words have highest correlations (lowest distance/highest similarity) on the diagonal of the pertinent correlation (distance/similarity) matrices. Count it as correct if one of the top k candidates is correct.\n\nnote: Note\nIf there are homophones/homographs in the dataset, this evaluation method may be misleading: the predicted vector will be equally correlated with the target vector of both words and it is not guaranteed that the target on the diagonal will be among the k neighbours. In particular, eval_SC and eval_SC_loose with k=1 are not guaranteed to give the same result. In such cases, supplying the dataset and target_col is recommended, which enables taking homophones/homographs into account.\n\nObligatory Arguments\n\nSChat::Union{SparseMatrixCSC, Matrix}: the Chat or Shat matrix\nSC::Union{SparseMatrixCSC, Matrix}: the C or S matrix\nk: top k candidates\n\nOptional Arguments\n\ndigits: the specified number of digits after the decimal place (or before if negative)\nmethod::Union{Symbol, String}=:correlation: Method for computing similarities, one of {:correlation, :euclidean, :cosine}.\n\neval_SC_loose(Chat, cue_obj.C, k)\neval_SC_loose(Shat, S, k)\n\n\n\n\n\n","category":"method"},
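{"location":"man/eval/","page":"Evaluation","title":"Evaluation","text":"A usage sketch contrasting strict and lenient evaluation, with matrix names following the Latin example above. With k=5 a prediction counts as correct whenever the target is among the five best candidates, so the lenient score is typically at least as high as the strict one (up to the homophone caveat above).\n\nacc_strict = JudiLing.eval_SC(Shat_val, S_val)\nacc_top5 = JudiLing.eval_SC_loose(Shat_val, S_val, 5)","category":"page"},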
{"location":"man/eval/#JudiLing.eval_SC_loose-NTuple{5, Any}","page":"Evaluation","title":"JudiLing.eval_SC_loose","text":"eval_SC_loose(SChat, SC, k, data, target_col)\n\nAssess model accuracy on the basis of the correlations (or Euclidean distances or Cosine Similarities) of row vectors of Chat and C or Shat and S. Ideally the target words have highest correlations (lowest distance/highest similarity) on the diagonal of the pertinent correlation (distance/similarity) matrices. Count it as correct if one of the top k candidates is correct. Support for homophones.\n\nObligatory Arguments\n\nSChat::Union{SparseMatrixCSC, Matrix}: the Chat or Shat matrix\nSC::Union{SparseMatrixCSC, Matrix}: the C or S matrix\nk: top k candidates\ndata: datasets\ntarget_col: target column name\n\nOptional Arguments\n\ndigits: the specified number of digits after the decimal place (or before if negative)\nmethod::Union{Symbol, String}=:correlation: Method for computing similarities, one of {:correlation, :euclidean, :cosine}.\n\neval_SC_loose(Chat, cue_obj.C, k, latin, :Word)\neval_SC_loose(Shat, S, k, latin, :Word)\n\n\n\n\n\n","category":"method"},{"location":"man/eval/#JudiLing.eval_manual-Tuple{Any, Any, Any}","page":"Evaluation","title":"JudiLing.eval_manual","text":"eval_manual(res, data, i2f)\n\nCreate extensive reports for the outputs from build_paths and learn_paths.\n\n\n\n\n\n","category":"method"},{"location":"man/eval/#JudiLing.eval_acc-Tuple{Any, Array}","page":"Evaluation","title":"JudiLing.eval_acc","text":"eval_acc(res, gold_inds::Array)\n\nEvaluate the accuracy of the results from learn_paths or build_paths.\n\nObligatory Arguments\n\nres::Array: the results from learn_paths or build_paths\ngold_inds::Array: the gold paths' indices\n\nOptional Arguments\n\ndigits: the specified number of digits after the decimal place (or before if negative)\nverbose::Bool=false: if true, more information is printed\n\nExamples\n\n# evaluation on training data\nacc_train = JudiLing.eval_acc(\n res_train,\n cue_obj_train.gold_ind,\n verbose=false\n)\n\n# evaluation on validation data\nacc_val = JudiLing.eval_acc(\n res_val,\n cue_obj_val.gold_ind,\n verbose=false\n)\n\n\n\n\n\n","category":"method"},{"location":"man/eval/#JudiLing.eval_acc-Tuple{Any, JudiLing.Cue_Matrix_Struct}","page":"Evaluation","title":"JudiLing.eval_acc","text":"eval_acc(res, cue_obj::Cue_Matrix_Struct)\n\nEvaluate the accuracy of the results from learn_paths or build_paths.\n\nObligatory Arguments\n\nres::Array: the results from learn_paths or build_paths\ncue_obj::Cue_Matrix_Struct: the C matrix object\n\nOptional Arguments\n\ndigits: the specified number of digits after the decimal place (or before if negative)\nverbose::Bool=false: if true, more information is printed\n\nExamples\n\nacc = JudiLing.eval_acc(res, cue_obj)\n\n\n\n\n\n","category":"method"},{"location":"man/eval/#JudiLing.eval_acc_loose-Tuple{Any, Any}","page":"Evaluation","title":"JudiLing.eval_acc_loose","text":"eval_acc_loose(res, gold_inds)\n\nLenient evaluation of the accuracy of the results from learn_paths or build_paths, counting a prediction as correct when the correlation of the predicted and gold standard semantic vectors is among the n top correlations, where n is equal to max_can in the learn_paths or build_paths function.\n\nObligatory Arguments\n\nres::Array: the results from learn_paths or build_paths\ngold_inds::Array: the gold paths' indices\n\nOptional Arguments\n\ndigits: the specified number of digits after the decimal place (or before if negative)\nverbose::Bool=false: if true, more information is printed\n\nExamples\n\n# evaluation on training data\nacc_train_loose = JudiLing.eval_acc_loose(\n res_train,\n cue_obj_train.gold_ind,\n verbose=false\n)\n\n# evaluation on validation data\nacc_val_loose = JudiLing.eval_acc_loose(\n res_val,\n cue_obj_val.gold_ind,\n verbose=false\n)\n\n\n\n\n\n","category":"method"},{"location":"man/eval/#JudiLing.extract_gpi","page":"Evaluation","title":"JudiLing.extract_gpi","text":"extract_gpi(gpi, threshold=0.1, tolerance=(-1000.0))\n\nExtract, using gold paths' information, how many n-grams for a gold path are below the threshold but above the tolerance.\n\n\n\n\n\n","category":"function"},
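{"location":"man/eval/","page":"Evaluation","title":"Evaluation","text":"A sketch of how extract_gpi might be called, assuming gpi_val was obtained from learn_paths with gold_ind supplied and check_gold_path=true (see the Find Paths section):\n\n# how many n-grams of each gold path fall below the threshold 0.1\n# but above the tolerance -1000.0\ngpi_stat = JudiLing.extract_gpi(gpi_val, 0.1, -1000.0)","category":"page"},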
{"location":"man/find_path/","page":"Find Paths","title":"Find Paths","text":"CurrentModule = JudiLing","category":"page"},{"location":"man/find_path/#Find-Paths","page":"Find Paths","title":"Find Paths","text":"","category":"section"},{"location":"man/find_path/#Structures","page":"Find Paths","title":"Structures","text":"","category":"section"},{"location":"man/find_path/","page":"Find Paths","title":"Find Paths","text":" Result_Path_Info_Struct\n Gold_Path_Info_Struct\n Threshold_Stat_Struct","category":"page"},{"location":"man/find_path/#JudiLing.Result_Path_Info_Struct","page":"Find Paths","title":"JudiLing.Result_Path_Info_Struct","text":"Store paths' information built by learn_paths or build_paths\n\n\n\n\n\n","category":"type"},{"location":"man/find_path/#JudiLing.Gold_Path_Info_Struct","page":"Find Paths","title":"JudiLing.Gold_Path_Info_Struct","text":"Store gold paths' information including indices and indices' support and total support. It can be used to evaluate how low the threshold needs to be set in order to find most of the correct paths or if set very low, all of the correct paths.\n\n\n\n\n\n","category":"type"},{"location":"man/find_path/#JudiLing.Threshold_Stat_Struct","page":"Find Paths","title":"JudiLing.Threshold_Stat_Struct","text":"Store threshold and tolerance proportions for each timestep.\n\n\n\n\n\n","category":"type"},{"location":"man/find_path/#Build-paths","page":"Find Paths","title":"Build paths","text":"","category":"section"},{"location":"man/find_path/","page":"Find Paths","title":"Find Paths","text":" build_paths\n build_paths(\n data_val,\n C_train,\n S_val,\n F_train,\n Chat_val,\n A,\n i2f,\n C_train_ind;\n rC = nothing,\n max_t = 15,\n max_can = 10,\n n_neighbors = 10,\n grams = 3,\n tokenized = false,\n sep_token = nothing,\n target_col = :Words,\n start_end_token = \"#\",\n if_pca = false,\n pca_eval_M = nothing,\n ignore_nan = true,\n verbose = false,\n )","category":"page"},{"location":"man/find_path/#JudiLing.build_paths","page":"Find Paths","title":"JudiLing.build_paths","text":"The build_paths function constructs paths by only considering those n-grams that are close to the target. It first takes the predicted c-hat vector and finds the closest n neighbors in the C matrix. Then it selects all n-grams of these neighbors, and constructs all valid paths with those n-grams. The path producing the best correlation with the target semantic vector (through synthesis by analysis) is selected.\n\n\n\n\n\n","category":"function"},{"location":"man/find_path/#JudiLing.build_paths-NTuple{8, Any}","page":"Find Paths","title":"JudiLing.build_paths","text":"build_paths(\n data_val,\n C_train,\n S_val,\n F_train,\n Chat_val,\n A,\n i2f,\n C_train_ind;\n rC = nothing,\n max_t = 15,\n max_can = 10,\n n_neighbors = 10,\n grams = 3,\n tokenized = false,\n sep_token = nothing,\n target_col = :Words,\n start_end_token = \"#\",\n if_pca = false,\n pca_eval_M = nothing,\n ignore_nan = true,\n verbose = false,\n)\n\nThe build_paths function constructs paths by only considering those n-grams that are close to the target. It first takes the predicted c-hat vector and finds the closest n neighbors in the C matrix. Then it selects all n-grams of these neighbors, and constructs all valid paths with those n-grams. The path producing the best correlation with the target semantic vector (through synthesis by analysis) is selected.\n\nObligatory Arguments\n\ndata_val::DataFrame: the validation dataset\nC_train::SparseMatrixCSC: the C matrix for the training dataset\nS_val::Union{SparseMatrixCSC, Matrix}: the S matrix for the validation dataset\nF_train::Union{SparseMatrixCSC, Matrix}: the F matrix for the training dataset\nChat_val::Matrix: the Chat matrix for the validation dataset\nA::SparseMatrixCSC: the adjacency matrix\ni2f::Dict: the dictionary returning features given indices\nC_train_ind::Array: the gold paths' indices for the training dataset\n\nOptional Arguments\n\nrC::Union{Nothing, Matrix}=nothing: correlation Matrix of C and Chat, specify to save computing time\nmax_t::Int64=15: maximum number of timesteps\nmax_can::Int64=10: maximum number of candidates to consider\nn_neighbors::Int64=10: the top n form neighbors to be considered\ngrams::Int64=3: the number n of grams that make up n-grams\ntokenized::Bool=false: if true, the dataset target is tokenized\nsep_token::Union{Nothing, String, Char}=nothing: separator\ntarget_col::Union{String, Symbol}=:Words: the column name for target strings\nif_pca::Bool=false: turn on to enable pca mode\npca_eval_M::Matrix=nothing: pass original F for pca mode\nverbose::Bool=false: if true, more information will be printed\n\nExamples\n\n# training dataset\nJudiLing.build_paths(\n latin_train,\n cue_obj_train.C,\n S_train,\n F_train,\n Chat_train,\n A,\n cue_obj_train.i2f,\n cue_obj_train.gold_ind,\n max_t=max_t,\n n_neighbors=10,\n verbose=false\n )\n\n# validation dataset\nJudiLing.build_paths(\n latin_val,\n cue_obj_train.C,\n S_val,\n F_train,\n Chat_val,\n A,\n cue_obj_train.i2f,\n cue_obj_train.gold_ind,\n max_t=max_t,\n n_neighbors=10,\n verbose=false\n )\n\n# pca mode\nres_build = JudiLing.build_paths(\n korean,\n Array(Cpcat),\n S,\n F,\n ChatPCA,\n A,\n cue_obj.i2f,\n cue_obj.gold_ind,\n max_t=max_t,\n if_pca=true,\n pca_eval_M=Fo,\n n_neighbors=3,\n verbose=true\n )\n\n\n\n\n\n","category":"method"},{"location":"man/find_path/#Learn-paths","page":"Find Paths","title":"Learn paths","text":"","category":"section"},{"location":"man/find_path/","page":"Find Paths","title":"Find Paths","text":" learn_paths\n learn_paths(\n data::DataFrame,\n cue_obj::Cue_Matrix_Struct,\n S_val::Union{SparseMatrixCSC, Matrix},\n F_train,\n Chat_val::Union{SparseMatrixCSC, Matrix};\n Shat_val::Union{Nothing, Matrix} = nothing,\n check_gold_path::Bool = false,\n threshold::Float64 = 0.1,\n is_tolerant::Bool = false,\n tolerance::Float64 = (-1000.0),\n max_tolerance::Int = 3,\n activation::Union{Nothing, Function} = nothing,\n ignore_nan::Bool = true,\n verbose::Bool = true)\n learn_paths(\n data_train::DataFrame,\n data_val::DataFrame,\n C_train::Union{Matrix, SparseMatrixCSC},\n S_val::Union{Matrix, SparseMatrixCSC},\n F_train,\n Chat_val::Union{Matrix, SparseMatrixCSC},\n A::SparseMatrixCSC,\n i2f::Dict,\n f2i::Dict;\n gold_ind::Union{Nothing, Vector} = nothing,\n Shat_val::Union{Nothing, Matrix} = nothing,\n check_gold_path::Bool = false,\n max_t::Int = 15,\n max_can::Int = 10,\n threshold::Float64 = 0.1,\n is_tolerant::Bool = false,\n tolerance::Float64 = (-1000.0),\n max_tolerance::Int = 3,\n grams::Int = 3,\n tokenized::Bool = false,\n sep_token::Union{Nothing, String} = nothing,\n keep_sep::Bool = false,\n target_col::Union{Symbol, String} = \"Words\",\n start_end_token::String = \"#\",\n issparse::Union{Symbol, Bool} = :auto,\n sparse_ratio::Float64 = 0.05,\n if_pca::Bool = false,\n pca_eval_M::Union{Nothing, Matrix} = nothing,\n activation::Union{Nothing, Function} = nothing,\n ignore_nan::Bool = true,\n check_threshold_stat::Bool = false,\n verbose::Bool = false\n )\n learn_paths_rpi(\n data_train::DataFrame,\n data_val::DataFrame,\n C_train::Union{Matrix, SparseMatrixCSC},\n S_val::Union{Matrix, SparseMatrixCSC},\n F_train,\n Chat_val::Union{Matrix, SparseMatrixCSC},\n A::SparseMatrixCSC,\n i2f::Dict,\n f2i::Dict;\n gold_ind::Union{Nothing, Vector} = nothing,\n Shat_val::Union{Nothing, Matrix} = nothing,\n check_gold_path::Bool = false,\n max_t::Int = 15,\n max_can::Int = 10,\n threshold::Float64 = 0.1,\n is_tolerant::Bool = false,\n tolerance::Float64 = (-1000.0),\n max_tolerance::Int = 3,\n grams::Int = 3,\n tokenized::Bool = false,\n sep_token::Union{Nothing, String} = nothing,\n keep_sep::Bool = false,\n target_col::Union{Symbol, String} = \"Words\",\n start_end_token::String = \"#\",\n issparse::Union{Symbol, Bool} = :auto,\n sparse_ratio::Float64 = 0.05,\n if_pca::Bool = false,\n pca_eval_M::Union{Nothing, Matrix} = nothing,\n activation::Union{Nothing, Function} = nothing,\n ignore_nan::Bool = true,\n check_threshold_stat::Bool = false,\n verbose::Bool = false\n )","category":"page"},{"location":"man/find_path/#JudiLing.learn_paths","page":"Find Paths","title":"JudiLing.learn_paths","text":"A sequence finding algorithm using discrimination learning to predict, for a given word, which n-grams are best supported for a given position in the sequence of n-grams.\n\n\n\n\n\n","category":"function"},{"location":"man/find_path/#JudiLing.learn_paths-Tuple{DataFrames.DataFrame, JudiLing.Cue_Matrix_Struct, Union{SparseArrays.SparseMatrixCSC, Matrix}, Any, Union{SparseArrays.SparseMatrixCSC, Matrix}}","page":"Find Paths","title":"JudiLing.learn_paths","text":"learn_paths(\n data::DataFrame,\n cue_obj::Cue_Matrix_Struct,\n S_val::Union{SparseMatrixCSC, Matrix},\n F_train,\n Chat_val::Union{SparseMatrixCSC, Matrix};\n Shat_val::Union{Nothing, Matrix} = nothing,\n check_gold_path::Bool = false,\n threshold::Float64 = 0.1,\n is_tolerant::Bool = false,\n tolerance::Float64 = (-1000.0),\n max_tolerance::Int = 3,\n activation::Union{Nothing, Function} = nothing,\n ignore_nan::Bool = true,\n verbose::Bool = true)\n\nA high-level wrapper function for learn_paths with much less control. It is aimed at users who are new to JudiLing and the learn_paths function.\n\nObligatory Arguments\n\ndata::DataFrame: the training dataset\ncue_obj::Cue_Matrix_Struct: the C matrix object containing all information with C\nS_val::Union{SparseMatrixCSC, Matrix}: the S matrix for the validation dataset\nF_train::Union{SparseMatrixCSC, Matrix, Chain}: either the F matrix for the training dataset, or a deep learning comprehension model trained on the training set\nChat_val::Union{SparseMatrixCSC, Matrix}: the Chat matrix for the validation dataset\n\nOptional Arguments\n\nShat_val::Union{Nothing, Matrix}=nothing: the Shat matrix for the validation dataset\ncheck_gold_path::Bool=false: if true, return a list of support values for the gold path; this information is returned as second output value\nthreshold::Float64=0.1: the value set for the support such that if the support of an n-gram is higher than this value, the n-gram will be taken into consideration\nis_tolerant::Bool=false: if true, select a specified number (given by max_tolerance) of n-grams whose supports are below threshold but above a second tolerance threshold to be added to the path\ntolerance::Float64=(-1000.0): the value set for the second threshold (in tolerant mode) such that if the support for an n-gram is in between this value and the threshold and the max_tolerance number has not been reached, then allow this n-gram to be added to the path\nmax_tolerance::Int64=3: maximum number of below-threshold n-grams allowed in a path\nactivation::Function=nothing: the activation function you want to pass\nignore_nan::Bool=true: whether to ignore NaN when comparing correlations, otherwise NaN will be selected as the max correlation value\nverbose::Bool=true: if true, more information is printed\n\nExamples\n\nres = learn_paths(latin, cue_obj, S, F, Chat)\n\n\n\n\n\n","category":"method"},
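{"location":"man/find_path/","page":"Find Paths","title":"Find Paths","text":"One way the inputs of this wrapper might be assembled for the Latin dataset, sketched under the assumption that the column names follow the examples elsewhere in this manual (make_cue_matrix, make_S_matrix and make_transform_matrix are documented on their own manual pages):\n\ncue_obj = JudiLing.make_cue_matrix(latin, grams=3, target_col=:Word)\nS = JudiLing.make_S_matrix(latin, [\"Lexeme\"], [\"Person\",\"Number\",\"Tense\",\"Voice\",\"Mood\"])\n# comprehension mapping F: C -> S, production mapping G: S -> C\nF = JudiLing.make_transform_matrix(cue_obj.C, S)\nG = JudiLing.make_transform_matrix(S, cue_obj.C)\nChat = S * G\nres = JudiLing.learn_paths(latin, cue_obj, S, F, Chat)","category":"page"},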
{"location":"man/find_path/#JudiLing.learn_paths-Tuple{DataFrames.DataFrame, DataFrames.DataFrame, Union{SparseArrays.SparseMatrixCSC, Matrix}, Union{SparseArrays.SparseMatrixCSC, Matrix}, Any, Union{SparseArrays.SparseMatrixCSC, Matrix}, SparseArrays.SparseMatrixCSC, Dict, Dict}","page":"Find Paths","title":"JudiLing.learn_paths","text":"learn_paths(\n data_train::DataFrame,\n data_val::DataFrame,\n C_train::Union{Matrix, SparseMatrixCSC},\n S_val::Union{Matrix, SparseMatrixCSC},\n F_train,\n Chat_val::Union{Matrix, SparseMatrixCSC},\n A::SparseMatrixCSC,\n i2f::Dict,\n f2i::Dict;\n gold_ind::Union{Nothing, Vector} = nothing,\n Shat_val::Union{Nothing, Matrix} = nothing,\n check_gold_path::Bool = false,\n max_t::Int = 15,\n max_can::Int = 10,\n threshold::Float64 = 0.1,\n is_tolerant::Bool = false,\n tolerance::Float64 = (-1000.0),\n max_tolerance::Int = 3,\n grams::Int = 3,\n tokenized::Bool = false,\n sep_token::Union{Nothing, String} = nothing,\n keep_sep::Bool = false,\n target_col::Union{Symbol, String} = \"Words\",\n start_end_token::String = \"#\",\n issparse::Union{Symbol, Bool} = :auto,\n sparse_ratio::Float64 = 0.05,\n if_pca::Bool = false,\n pca_eval_M::Union{Nothing, Matrix} = nothing,\n activation::Union{Nothing, Function} = nothing,\n ignore_nan::Bool = true,\n check_threshold_stat::Bool = false,\n verbose::Bool = false\n)\n\nA sequence finding algorithm using discrimination learning to predict, for a given word, which n-grams are best supported for a given position in the sequence of n-grams.\n\nObligatory Arguments\n\ndata_train::DataFrame: the training dataset\ndata_val::DataFrame: the validation dataset\nC_train::Union{SparseMatrixCSC, Matrix}: the C matrix for the training dataset\nS_val::Union{SparseMatrixCSC, Matrix}: the S matrix for the validation dataset\nF_train::Union{SparseMatrixCSC, Matrix, Chain}: the F matrix for the training dataset, or a deep learning comprehension model trained on the training data\nChat_val::Union{SparseMatrixCSC, Matrix}: the Chat matrix for the validation dataset\nA::SparseMatrixCSC: the adjacency matrix\ni2f::Dict: the dictionary returning features given indices\nf2i::Dict: the dictionary returning indices given features\n\nOptional Arguments\n\ngold_ind::Union{Nothing, Vector}=nothing: gold paths' indices\nShat_val::Union{Nothing, Matrix}=nothing: the Shat matrix for the validation dataset\ncheck_gold_path::Bool=false: if true, return a list of support values for the gold path; this information is returned as second output value\nmax_t::Int64=15: maximum timestep\nmax_can::Int64=10: maximum number of candidates to consider\nthreshold::Float64=0.1: the value set for the support such that if the support of an n-gram is higher than this value, the n-gram will be taken into consideration\nis_tolerant::Bool=false: if true, select a specified number (given by max_tolerance) of n-grams whose supports are below threshold but above a second tolerance threshold to be added to the path\ntolerance::Float64=(-1000.0): the value set for the second threshold (in tolerant mode) such that if the support for an n-gram is in between this value and the threshold and the max_tolerance number has not been reached, then allow this n-gram to be added to the path\nmax_tolerance::Int64=3: maximum number of below-threshold n-grams allowed in a path\ngrams::Int64=3: the number n of grams that make up an n-gram\ntokenized::Bool=false: if true, the dataset target is tokenized\nsep_token::Union{Nothing, String, Char}=nothing: separator token\nkeep_sep::Bool=false: if true, keep separators in cues\ntarget_col::Union{String, Symbol}=:Words: the column name for target strings\nstart_end_token::Union{String, Char}=\"#\": start and end token in boundary cues\nissparse::Union{Symbol, Bool}=:auto: control of whether output of Mt matrix is a dense matrix or a sparse matrix\nsparse_ratio::Float64=0.05: the ratio to decide whether a matrix is sparse\nif_pca::Bool=false: turn on to enable pca mode\npca_eval_M::Matrix=nothing: pass original F for pca mode\nactivation::Function=nothing: the activation function you want to pass\nignore_nan::Bool=true: whether to ignore NaN when comparing correlations, otherwise NaN will be selected as the max correlation value\ncheck_threshold_stat::Bool=false: if true, return a threshold and tolerance proportion for each timestep\nverbose::Bool=false: if true, more information is printed\n\nExamples\n\n# basic usage without tokenization\nres = JudiLing.learn_paths(\nlatin,\nlatin,\ncue_obj.C,\nS,\nF,\nChat,\nA,\ncue_obj.i2f,\ncue_obj.f2i,\nmax_t=max_t,\nmax_can=10,\ngrams=3,\nthreshold=0.1,\ntokenized=false,\nkeep_sep=false,\ntarget_col=:Word,\nverbose=true)\n\n# basic usage with tokenization\nres = JudiLing.learn_paths(\nfrench,\nfrench,\ncue_obj.C,\nS,\nF,\nChat,\nA,\ncue_obj.i2f,\ncue_obj.f2i,\nmax_t=max_t,\nmax_can=10,\ngrams=3,\nthreshold=0.1,\ntokenized=true,\nsep_token=\"-\",\nkeep_sep=true,\ntarget_col=:Syllables,\nverbose=true)\n\n# basic usage for validation data\nres_val = JudiLing.learn_paths(\nlatin_train,\nlatin_val,\ncue_obj_train.C,\nS_val,\nF_train,\nChat_val,\nA,\ncue_obj_train.i2f,\ncue_obj_train.f2i,\nmax_t=max_t,\nmax_can=10,\ngrams=3,\nthreshold=0.1,\ntokenized=false,\nkeep_sep=false,\ntarget_col=:Word,\nverbose=true)\n\n# turn on tolerance mode\nres_val = JudiLing.learn_paths(\n...\nthreshold=0.1,\nis_tolerant=true,\ntolerance=-0.1,\nmax_tolerance=4,\n...)\n\n# turn on check gold paths mode\nres_train, gpi_train = JudiLing.learn_paths(\n...\ngold_ind=cue_obj_train.gold_ind,\nShat_val=Shat_train,\ncheck_gold_path=true,\n...)\n\nres_val, gpi_val = JudiLing.learn_paths(\n...\ngold_ind=cue_obj_val.gold_ind,\nShat_val=Shat_val,\ncheck_gold_path=true,\n...)\n\n# control over sparsity\nres_val = JudiLing.learn_paths(\n...\nissparse=:auto,\nsparse_ratio=0.05,\n...)\n\n# pca mode\nres_learn = JudiLing.learn_paths(\nkorean,\nkorean,\nArray(Cpcat),\nS,\nF,\nChatPCA,\nA,\ncue_obj.i2f,\ncue_obj.f2i,\ncheck_gold_path=false,\ngold_ind=cue_obj.gold_ind,\nShat_val=Shat,\nmax_t=max_t,\nmax_can=10,\ngrams=3,\nthreshold=0.1,\ntokenized=true,\nsep_token=\"_\",\nkeep_sep=true,\ntarget_col=:Verb_syll,\nif_pca=true,\npca_eval_M=Fo,\nverbose=true);\n\n\n\n\n\n\n","category":"method"},{"location":"man/find_path/#JudiLing.learn_paths_rpi-Tuple{DataFrames.DataFrame, DataFrames.DataFrame, Union{SparseArrays.SparseMatrixCSC, Matrix}, Union{SparseArrays.SparseMatrixCSC, Matrix}, Any, Union{SparseArrays.SparseMatrixCSC, Matrix}, SparseArrays.SparseMatrixCSC, Dict, Dict}","page":"Find Paths","title":"JudiLing.learn_paths_rpi","text":"learn_paths_rpi(\n data_train::DataFrame,\n data_val::DataFrame,\n C_train::Union{Matrix, SparseMatrixCSC},\n S_val::Union{Matrix, SparseMatrixCSC},\n F_train,\n Chat_val::Union{Matrix, SparseMatrixCSC},\n A::SparseMatrixCSC,\n i2f::Dict,\n f2i::Dict;\n gold_ind::Union{Nothing, Vector} = nothing,\n Shat_val::Union{Nothing, Matrix} = nothing,\n check_gold_path::Bool = false,\n max_t::Int = 15,\n max_can::Int = 10,\n threshold::Float64 = 0.1,\n is_tolerant::Bool = false,\n tolerance::Float64 = (-1000.0),\n max_tolerance::Int = 3,\n grams::Int = 3,\n tokenized::Bool = false,\n sep_token::Union{Nothing, String} = nothing,\n keep_sep::Bool = false,\n target_col::Union{Symbol, String} = \"Words\",\n start_end_token::String = \"#\",\n issparse::Union{Symbol, Bool} = :auto,\n sparse_ratio::Float64 = 0.05,\n if_pca::Bool = false,\n pca_eval_M::Union{Nothing, Matrix} = nothing,\n activation::Union{Nothing, Function} = nothing,\n ignore_nan::Bool = true,\n check_threshold_stat::Bool = false,\n verbose::Bool = false\n)\n\nCalculate learn_paths, returning the supports for the result indices as well.\n\nObligatory Arguments\n\ndata_train::DataFrame: the training dataset\ndata_val::DataFrame: the validation dataset\nC_train::Union{SparseMatrixCSC, Matrix}: the C matrix for the training dataset\nS_val::Union{SparseMatrixCSC, Matrix}: the S matrix for the validation dataset\nF_train::Union{SparseMatrixCSC, Matrix, Chain}: the F matrix for the training dataset, or a deep learning comprehension model trained on the training data\nChat_val::Union{SparseMatrixCSC, Matrix}: the Chat matrix for the validation dataset\nA::SparseMatrixCSC: the adjacency matrix\ni2f::Dict: the dictionary returning features given indices\nf2i::Dict: the dictionary returning indices given features\n\nOptional Arguments\n\ngold_ind::Union{Nothing, Vector}=nothing: gold paths' indices\nShat_val::Union{Nothing, Matrix}=nothing: the Shat matrix for the validation dataset\ncheck_gold_path::Bool=false: if true, return a list of support values for the gold path; this information is returned as second output value\nmax_t::Int64=15: maximum timestep\nmax_can::Int64=10: maximum number of candidates to consider\nthreshold::Float64=0.1: the value set for the support such that if the support of an n-gram is higher than this value, the n-gram will be taken into consideration\nis_tolerant::Bool=false: if true, select a specified number (given by max_tolerance) of n-grams whose supports are below threshold but above a second tolerance threshold to be added to the path\ntolerance::Float64=(-1000.0): the value set for the second threshold (in tolerant mode) such that if the support for an n-gram is in between this value and the threshold and the max_tolerance number has not been reached, then allow this n-gram to be added to the path\nmax_tolerance::Int64=3: maximum number of below-threshold n-grams allowed in a path\ngrams::Int64=3: the number n of grams that make up an n-gram\ntokenized::Bool=false: if true, the dataset target is tokenized\nsep_token::Union{Nothing, String, Char}=nothing: separator token\nkeep_sep::Bool=false: if true, keep separators in cues\ntarget_col::Union{String, Symbol}=:Words: the column name for target strings\nstart_end_token::Union{String, Char}=\"#\": start and end token in boundary cues\nissparse::Union{Symbol, Bool}=:auto: control of whether output of Mt matrix is a dense matrix or a sparse matrix\nsparse_ratio::Float64=0.05: the ratio to decide whether a matrix is sparse\nif_pca::Bool=false: turn on to enable pca mode\npca_eval_M::Matrix=nothing: pass original F for pca mode\nactivation::Function=nothing: the activation function you want to pass\nignore_nan::Bool=true: whether to ignore NaN when comparing correlations, otherwise NaN will be selected as the max correlation value\ncheck_threshold_stat::Bool=false: if true, return a threshold and tolerance proportion for each timestep\nverbose::Bool=false: if true, more information is printed\n\n\n\n\n\n","category":"method"},
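{"location":"man/find_path/","page":"Find Paths","title":"Find Paths","text":"learn_paths_rpi takes the same arguments as learn_paths; a hypothetical call mirroring the basic learn_paths example above:\n\nres_rpi = JudiLing.learn_paths_rpi(\nlatin,\nlatin,\ncue_obj.C,\nS,\nF,\nChat,\nA,\ncue_obj.i2f,\ncue_obj.f2i,\nmax_t=max_t,\nmax_can=10,\ngrams=3,\nthreshold=0.1,\ntarget_col=:Word,\nverbose=true)","category":"page"},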
{"location":"man/find_path/#Utility-functions","page":"Find Paths","title":"Utility functions","text":"","category":"section"},{"location":"man/find_path/","page":"Find Paths","title":"Find Paths","text":" eval_can(candidates, S, F, i2f, max_can, if_pca, pca_eval_M)\n find_top_feature_indices(rC, C_train_ind)\n make_ngrams_ind(res, n)\n predict_shat(F::Union{Matrix, SparseMatrixCSC},\n ci::Vector{Int})","category":"page"},{"location":"man/find_path/#JudiLing.eval_can-NTuple{7, Any}","page":"Find Paths","title":"JudiLing.eval_can","text":"eval_can(candidates, S, F::Union{Matrix,SparseMatrixCSC, Chain}, i2f, max_can, if_pca, pca_eval_M)\n\nCalculate for each candidate path the correlation between predicted semantic vector and the gold standard semantic vector, and select as target for production the path with the highest correlation.\n\n\n\n\n\n","category":"method"},{"location":"man/find_path/#JudiLing.find_top_feature_indices-Tuple{Any, Any}","page":"Find Paths","title":"JudiLing.find_top_feature_indices","text":"find_top_feature_indices(rC, C_train_ind)\n\nFind all indices for the n-grams of the top n closest neighbors of a given target.\n\n\n\n\n\n","category":"method"},{"location":"man/find_path/#JudiLing.make_ngrams_ind-Tuple{Any, Any}","page":"Find Paths","title":"JudiLing.make_ngrams_ind","text":"make_ngrams_ind(res, n)\n\nConstruct ngrams indices.\n\n\n\n\n\n","category":"method"},{"location":"man/find_path/#JudiLing.predict_shat-Tuple{Union{SparseArrays.SparseMatrixCSC, Matrix}, Vector{Int64}}","page":"Find Paths","title":"JudiLing.predict_shat","text":"predict_shat(F::Union{Matrix, SparseMatrixCSC},\n ci::Vector{Int})\n\nPredicts semantic vector shat given a comprehension matrix F and a list of indices of ngrams ci.\n\nObligatory arguments\n\nF::Union{Matrix, SparseMatrixCSC}: Comprehension matrix F.\nci::Vector{Int}: Vector of indices of ngrams in the c vector. Essentially, this is a vector indicating which ngrams in the c vector are absent and which are present.\n\n\n\n\n\n","category":"method"},
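{"location":"man/find_path/","page":"Find Paths","title":"Find Paths","text":"A minimal sketch of predict_shat; the index vector is made up for illustration. Conceptually, the call corresponds to multiplying a binary c vector that has ones at the positions in ci with F:\n\n# semantic vector supported by the n-grams with indices 3, 17 and 42\nshat = JudiLing.predict_shat(F, [3, 17, 42])","category":"page"},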
{"location":"man/display/","page":"Display","title":"Display","text":"CurrentModule = JudiLing","category":"page"},{"location":"man/display/#Cholesky","page":"Display","title":"Cholesky","text":"","category":"section"},{"location":"man/display/","page":"Display","title":"Display","text":" display_matrix(M, rownames, colnames)\n display_matrix(data, target_col, cue_obj, M, M_type)","category":"page"},{"location":"man/display/#JudiLing.display_matrix-Tuple{Any, Any, Any}","page":"Display","title":"JudiLing.display_matrix","text":"display_matrix(M, rownames, colnames)\n\nDisplay matrix with rownames and colnames.\n\n\n\n\n\n","category":"method"},{"location":"man/display/#JudiLing.display_matrix-NTuple{5, Any}","page":"Display","title":"JudiLing.display_matrix","text":"display_matrix(data, target_col, cue_pS_obj, M, M_type)\n\nDisplay matrix with rownames and colnames.\n\nObligatory Arguments\n\ndata::DataFrame: the dataset\ntarget_col::Union{String, Symbol}: the target column name\ncue_pS_obj::Union{Cue_Matrix_Struct,PS_Matrix_Struct}: the cue matrix or pS matrix structure\nM::Union{SparseMatrixCSC, Matrix}: the matrix\nM_type::Union{String, Symbol}: the type of the matrix; currently supported are :C, :S, :F, :G, :Chat, :Shat, :A, :R and :pS\n\nOptional Arguments\n\nnrow::Int64 = 6: the number of rows to display\nncol::Int64 = 6: the number of columns to display\nreturn_matrix::Bool = false: whether the created dataframe should be returned (and not only displayed)\n\nExamples\n\nJudiLing.display_matrix(latin, :Word, cue_obj, cue_obj.C, :C)\nJudiLing.display_matrix(latin, :Word, cue_obj, S, :S)\nJudiLing.display_matrix(latin, :Word, cue_obj, G, :G)\nJudiLing.display_matrix(latin, :Word, cue_obj, Chat, :Chat)\nJudiLing.display_matrix(latin, :Word, cue_obj, F, :F)\nJudiLing.display_matrix(latin, :Word, cue_obj, Shat, :Shat)\nJudiLing.display_matrix(latin, :Word, cue_obj, A, :A)\nJudiLing.display_matrix(latin, :Word, cue_obj, R, :R)\nJudiLing.display_matrix(latin, :Word, pS_obj, pS_obj.pS, :pS)\n\n\n\n\n\n","category":"method"},{"location":"man/input/","page":"Loading data","title":"Loading data","text":"CurrentModule = JudiLing","category":"page"},{"location":"man/input/#Loading-data","page":"Loading data","title":"Loading data","text":"","category":"section"},{"location":"man/input/","page":"Loading data","title":"Loading data","text":"load_dataset(filepath::String;\n delim::String=\",\",\n kargs...)\nloading_data_randomly_split(\n data_path::String,\n output_dir_path::String,\n data_prefix::String;\n val_sample_size::Int = 0,\n val_ratio::Float64 = 0.0,\n random_seed::Int = 314)\nloading_data_careful_split(\n data_path::String,\n data_prefix::String,\n output_dir_path::String,\n n_features_columns::Union{Vector{Symbol},Vector{String}};\n train_sample_size::Int = 0,\n val_sample_size::Int = 0,\n val_ratio::Float64 = 0.0,\n n_grams_target_col::Union{Symbol, String} = :Word,\n n_grams_tokenized::Bool = false,\n n_grams_sep_token::Union{Nothing, String} = nothing,\n grams::Int = 3,\n n_grams_keep_sep::Bool = false,\n start_end_token::String = \"#\",\n random_seed::Int = 314,\n verbose::Bool = false)","category":"page"},{"location":"man/input/#JudiLing.load_dataset-Tuple{String}","page":"Loading data","title":"JudiLing.load_dataset","text":"load_dataset(filepath::String;\n delim::String=\",\",\n kargs...)\n\nLoad a dataset from file, usually comma- or tab-separated. Returns a DataFrame.\n\nObligatory arguments\n\nfilepath::String: Path to file to be loaded.\n\nOptional arguments\n\ndelim::String=\",\": Delimiter in the file (usually either \",\" or \"\\t\").\nkargs...: Further keyword arguments are passed to CSV.File().\n\nExample\n\nlatin = JudiLing.load_dataset(\"latin.csv\")\nfirst(latin, 10)\n\n\n\n\n\n","category":"method"},{"location":"man/input/#JudiLing.loading_data_randomly_split-Tuple{String, String, String}","page":"Loading data","title":"JudiLing.loading_data_randomly_split","text":"loading_data_randomly_split(\n data_path::String,\n output_dir_path::String,\n data_prefix::String;\n val_sample_size::Int = 0,\n val_ratio::Float64 = 0.0,\n random_seed::Int = 314)\n\nRead in a dataframe, splitting the dataframe into a training and validation dataset. The two are also written to output_dir_path at the same time.\n\nnote: Note\nThe order of data_prefix and output_dir_path is exactly reversed compared to loading_data_careful_split.\n\nObligatory arguments\n\ndata_path::String: Path to where the dataset is stored.\noutput_dir_path::String: Path to where the new dataframes should be stored.\ndata_prefix::String: Prefix of the two new files, will be called data_prefix_train.csv and data_prefix_val.csv.\n\nOptional arguments\n\nval_sample_size::Int = 0: Size of the validation dataset (only val_sample_size or val_ratio may be used).\nval_ratio::Float64 = 0.0: Fraction of the data that should be in the validation dataset (only val_sample_size or val_ratio may be used).\nrandom_seed::Int = 314: Random seed for controlling random split.\n\nExample\n\ndata_train, data_val = JudiLing.loading_data_randomly_split(\n \"latin.csv\",\n \"careful\",\n \"latin\"\n)\n\n\n\n\n\n","category":"method"},{"location":"man/input/#JudiLing.loading_data_careful_split-Tuple{String, String, String, Union{Vector{String}, Vector{Symbol}}}","page":"Loading data","title":"JudiLing.loading_data_careful_split","text":"loading_data_careful_split(\n data_path::String,\n data_prefix::String,\n output_dir_path::String,\n n_features_columns::Union{Vector{Symbol},Vector{String}};\n train_sample_size::Int = 0,\n val_sample_size::Int = 0,\n val_ratio::Float64 = 0.0,\n n_grams_target_col::Union{Symbol, String} = :Word,\n n_grams_tokenized::Bool = false,\n n_grams_sep_token::Union{Nothing, String} = nothing,\n grams::Int = 3,\n n_grams_keep_sep::Bool = false,\n start_end_token::String = \"#\",\n random_seed::Int = 314,\n verbose::Bool = false)\n\nRead in a dataframe, splitting the dataframe into a training and validation dataset. The split is done such that all features in the columns specified in n_features_columns occur both in the training and validation data. It is also ensured that the unique grams resulting from splitting the strings in column n_grams_target_col into grams-grams occur in both datasets. The two are also written to output_dir_path at the same time.\n\nnote: Note\nThe order of data_prefix and output_dir_path is exactly reversed compared to loading_data_randomly_split.\n\nObligatory arguments\n\ndata_path::String: Path to where the dataset is stored.\ndata_prefix::String: Prefix of the two new files, will be called data_prefix_train.csv and data_prefix_val.csv.\noutput_dir_path::String: Path to where the new dataframes should be stored.\nn_features_columns::Vector{Union{Symbol, String}}: Vector with columns whose features have to occur in both the training and validation data.\n\nOptional arguments\n\nval_sample_size::Int = 0: Size of the validation dataset (only val_sample_size or val_ratio may be used).\nval_ratio::Float64 = 0.0: Fraction of the data that should be in the validation dataset (only val_sample_size or val_ratio may be used).\nn_grams_target_col::Union{Symbol, String} = :Word: Column with target words.\nn_grams_tokenized::Bool = false: Whether the words in n_grams_target_col are already tokenized.\nn_grams_sep_token::Union{Nothing, String} = nothing: String with which tokens in n_grams_target_col are separated (only used if n_grams_tokenized=true).\ngrams::Int = 3: Granularity of the n-grams.\nn_grams_keep_sep::Bool = false: Whether the token separators should be kept in the ngrams (this is useful e.g. when working with syllables).\nstart_end_token::String = \"#\": Token with which the start and end of words should be marked.\nrandom_seed::Int = 314: Random seed for controlling random split.\n\nExample\n\ndata_train, data_val = JudiLing.loading_data_careful_split(\n \"latin.csv\",\n \"latin\",\n \"careful\",\n [\"Lexeme\",\"Person\",\"Number\",\"Tense\",\"Voice\",\"Mood\"]\n)\n\n\n\n\n\n","category":"method"},
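{"location":"man/input/","page":"Loading data","title":"Loading data","text":"A side-by-side sketch of the two loaders (file and directory names as in the examples above), highlighting the reversed order of the second and third arguments:\n\ndata_train, data_val = JudiLing.loading_data_randomly_split(\n \"latin.csv\", \"careful\", \"latin\")\ndata_train, data_val = JudiLing.loading_data_careful_split(\n \"latin.csv\", \"latin\", \"careful\",\n [\"Lexeme\",\"Person\",\"Number\",\"Tense\",\"Voice\",\"Mood\"])","category":"page"},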
This function takes as input the results from the learn_paths and build_paths functions, including the information on gold paths that is optionally returned as second output result.\n\n\n\n\n\n","category":"function"},{"location":"man/output/#JudiLing.write_comprehension_eval","page":"Output","title":"JudiLing.write_comprehension_eval","text":"Write comprehension evaluation into a CSV file, include target and predicted ids and indentifiers and their correlations.\n\n\n\n\n\n","category":"function"},{"location":"man/output/#JudiLing.write2csv-NTuple{5, Any}","page":"Output","title":"JudiLing.write2csv","text":"write2csv(res, data, cue_obj_train, cue_obj_val, filename)\n\nWrite results into csv file for the results from learn_paths and build_paths.\n\nObligatory Arguments\n\nres::Array{Array{Result_Path_Info_Struct,1},1}: the results from learn_paths or build_paths\ndata::DataFrame: the dataset\ncue_obj_train::Cue_Matrix_Struct: the cue object for training dataset\ncue_obj_val::Cue_Matrix_Struct: the cue object for validation dataset\nfilename::String: the filename\n\nOptional Arguments\n\ngrams::Int64=3: the number n in n-gram cues\ntokenized::Bool=false: if true, the dataset target is tokenized\nsep_token::Union{Nothing, String, Char}=nothing: separator\nstart_end_token::Union{String, Char}=\"#\": start and end token in boundary cues\noutput_sep_token::Union{String, Char}=\"\": output separator\npath_sep_token::Union{String, Char}=\":\": path separator\ntarget_col::Union{String, Symbol}=:Words: the column name for target strings\nroot_dir::String=\".\": dir path for project root dir\noutput_dir::String=\".\": output dir inside root dir\n\nExamples\n\n# writing results for training data\nJudiLing.write2csv(\n res_train,\n latin_train,\n cue_obj_train,\n cue_obj_train,\n \"res_latin_train.csv\",\n grams=3,\n tokenized=false,\n sep_token=nothing,\n start_end_token=\"#\",\n output_sep_token=\"\",\n path_sep_token=\":\",\n target_col=:Word,\n root_dir=\".\",\n output_dir=\"test_out\")\n\n# writing results for validation data\nJudiLing.write2csv(\n res_val,\n latin_val,\n cue_obj_train,\n cue_obj_val,\n \"res_latin_val.csv\",\n grams=3,\n tokenized=false,\n sep_token=nothing,\n start_end_token=\"#\",\n output_sep_token=\"\",\n path_sep_token=\":\",\n target_col=:Word,\n root_dir=\".\",\n output_dir=\"test_out\")\n\n\n\n\n\n","category":"method"},{"location":"man/output/#JudiLing.write2csv-Tuple{Vector{JudiLing.Gold_Path_Info_Struct}, Any}","page":"Output","title":"JudiLing.write2csv","text":"write2csv(gpi::Vector{Gold_Path_Info_Struct}, filename)\n\nWrite results into csv file for the gold paths' information optionally returned by learn_paths and build_paths.\n\nObligatory Arguments\n\ngpi::Vector{Gold_Path_Info_Struct}: the gold paths' information\nfilename::String: the filename\n\nOptional Arguments\n\nroot_dir::String=\".\": dir path for project root dir\noutput_dir::String=\".\": output dir inside root dir\n\nExamples\n\n# write gold standard paths to csv for training data\nJudiLing.write2csv(\n gpi_train,\n \"gpi_latin_train.csv\",\n root_dir=\".\",\n output_dir=\"test_out\"\n )\n\n# write gold standard paths to csv for validation data\nJudiLing.write2csv(\n gpi_val,\n \"gpi_latin_val.csv\",\n root_dir=\".\",\n output_dir=\"test_out\"\n )\n\n\n\n\n\n","category":"method"},{"location":"man/output/#JudiLing.write2csv-Tuple{JudiLing.Threshold_Stat_Struct, Any}","page":"Output","title":"JudiLing.write2csv","text":"write2csv(ts::Threshold_Stat_Struct, filename)\n\nWrite results into csv file for 
threshold and tolerance proportion for each timestep.\n\nObligatory Arguments\n\ngpi::Vector{Gold_Path_Info_Struct}: the gold paths' information\nfilename::String: the filename\n\nOptional Arguments\n\nroot_dir::String=\".\": dir path for project root dir\noutput_dir::String=\".\": output dir inside root dir\n\nExamples\n\nJudiLing.write2csv(ts, \"ts.csv\", root_dir = @__DIR__, output_dir=\"out\")\n\n\n\n\n\n","category":"method"},{"location":"man/output/#JudiLing.write2df-NTuple{4, Any}","page":"Output","title":"JudiLing.write2df","text":"write2df(res, data, cue_obj_train, cue_obj_val)\n\nReformat results into a dataframe for the results form learn_paths and build_paths functions.\n\nObligatory Arguments\n\nres: output of learn_paths or build_paths\ndata::DataFrame: the dataset\ncue_obj_train: cue object of the training data set\ncue_obj_val: cue object of the validation data set\n\nOptional Arguments\n\ngrams::Int64=3: the number n in n-gram cues\ntokenized::Bool=false: if true, the dataset target is tokenized\nsep_token::Union{Nothing, String, Char}=nothing: separator\nstart_end_token::Union{String, Char}=\"#\": start and end token in boundary cues\noutput_sep_token::Union{String, Char}=\"\": output separator\npath_sep_token::Union{String, Char}=\":\": path separator\ntarget_col::Union{String, Symbol}=:Words: the column name for target strings\n\nExamples\n\n# writing results for training data\nJudiLing.write2df(\n res_train,\n latin_train,\n cue_obj_train,\n cue_obj_train,\n grams=3,\n tokenized=false,\n sep_token=nothing,\n start_end_token=\"#\",\n output_sep_token=\"\",\n path_sep_token=\":\",\n target_col=:Word)\n\n# writing results for validation data\nJudiLing.write2df(\n res_val,\n latin_val,\n cue_obj_train,\n cue_obj_val,\n grams=3,\n tokenized=false,\n sep_token=nothing,\n start_end_token=\"#\",\n output_sep_token=\"\",\n path_sep_token=\":\",\n target_col=:Word)\n\n\n\n\n\n","category":"method"},{"location":"man/output/#JudiLing.write2df-Tuple{Vector{JudiLing.Gold_Path_Info_Struct}}","page":"Output","title":"JudiLing.write2df","text":"write2df(gpi::Vector{Gold_Path_Info_Struct})\n\nWrite results into a dataframe for the gold paths' information optionally returned by learn_paths and build_paths.\n\nObligatory Arguments\n\ngpi::Vector{Gold_Path_Info_Struct}: the gold paths' information\n\nExamples\n\n# write gold standard paths to df for training data\nJudiLing.write2csv(gpi_train)\n\n# write gold standard paths to df for validation data\nJudiLing.write2csv(gpi_val)\n\n\n\n\n\n","category":"method"},{"location":"man/output/#JudiLing.write2df-Tuple{JudiLing.Threshold_Stat_Struct}","page":"Output","title":"JudiLing.write2df","text":"write2df(ts::Threshold_Stat_Struct)\n\nWrite results into a dataframe for threshold and tolerance proportion for each timestep.\n\nObligatory Arguments\n\nts::Threshold_Stat_Struct: the threshold and tolerance proportion\n\nExamples\n\nJudiLing.write2df(ts)\n\n\n\n\n\n","category":"method"},{"location":"man/output/#JudiLing.write_comprehension_eval-NTuple{5, Any}","page":"Output","title":"JudiLing.write_comprehension_eval","text":"write_comprehension_eval(SChat, SC, data, target_col, filename)\n\nWrite comprehension evaluation into a CSV file, include target and predicted ids and indentifiers and their correlations.\n\nObligatory Arguments\n\nSChat::Matrix: the Shat/Chat matrix\nSC::Matrix: the S/C matrix\ndata::DataFrame: the data\ntarget_col::Symbol: the name of target column\nfilename::String: the filename/filepath\n\nOptional Arguments\n\nk: top k 
candidates\nroot_dir::String=\".\": dir path for project root dir\noutput_dir::String=\".\": output dir inside root dir\n\nExamples\n\nJudiLing.write_comprehension_eval(Chat, cue_obj.C, latin, :Word, \"output.csv\",\n k=10, root_dir=@__DIR__, output_dir=\"out\")\n\n\n\n\n\n","category":"method"},{"location":"man/output/#JudiLing.write_comprehension_eval-NTuple{7, Any}","page":"Output","title":"JudiLing.write_comprehension_eval","text":"write_comprehension_eval(SChat, SC, SC_rest, data, data_rest, target_col, filename)\n\nWrite the comprehension evaluation into a CSV file for both training and validation datasets, including target and predicted ids and identifiers and their correlations.\n\nObligatory Arguments\n\nSChat::Matrix: the Shat/Chat matrix\nSC::Matrix: the S/C matrix\nSC_rest::Matrix: the rest S/C matrix\ndata::DataFrame: the data\ndata_rest::DataFrame: the rest data\ntarget_col::Symbol: the name of target column\nfilename::String: the filename/filepath\n\nOptional Arguments\n\nk: top k candidates\nroot_dir::String=\".\": dir path for project root dir\noutput_dir::String=\".\": output dir inside root dir\n\nExamples\n\nJudiLing.write_comprehension_eval(Shat_val, S_val, S_train, latin_val, latin_train,\n :Word, \"all_output.csv\", k=10, root_dir=@__DIR__, output_dir=\"out\")\n\n\n\n\n\n","category":"method"},{"location":"man/output/#JudiLing.save_L_matrix-Tuple{Any, Any}","page":"Output","title":"JudiLing.save_L_matrix","text":"save_L_matrix(L, filename)\n\nSave the lexome matrix into a csv file.\n\nObligatory Arguments\n\nL::L_Matrix_Struct: the lexome matrix struct\nfilename::String: the filename/filepath\n\nExamples\n\nJudiLing.save_L_matrix(L, joinpath(@__DIR__, \"L.csv\"))\n\n\n\n\n\n","category":"method"},{"location":"man/output/#JudiLing.load_L_matrix-Tuple{Any}","page":"Output","title":"JudiLing.load_L_matrix","text":"load_L_matrix(filename)\n\nLoad the lexome matrix from a csv file.\n\nObligatory Arguments\n\nfilename::String: the filename/filepath\n\nOptional Arguments\n\nheader::Bool=false: header in csv\n\nExamples\n\nL_load = JudiLing.load_L_matrix(joinpath(@__DIR__, \"L.csv\"))\n\n\n\n\n\n","category":"method"},{"location":"man/output/#JudiLing.save_S_matrix-NTuple{4, Any}","page":"Output","title":"JudiLing.save_S_matrix","text":"save_S_matrix(S, filename, data, target_col)\n\nSave the S matrix into a csv file.\n\nObligatory Arguments\n\nS::Matrix: the S matrix\nfilename::String: the filename/filepath\ndata::DataFrame: the data\ntarget_col::Symbol: the name of target column\n\nOptional Arguments\n\nsep::String=\" \": separator in CSV file\n\nExamples\n\nJudiLing.save_S_matrix(S, joinpath(@__DIR__, \"S.csv\"), latin, :Word)\n\n\n\n\n\n","category":"method"},{"location":"man/output/#JudiLing.load_S_matrix-Tuple{Any}","page":"Output","title":"JudiLing.load_S_matrix","text":"load_S_matrix(filename)\n\nLoad the S matrix from a csv file.\n\nObligatory Arguments\n\nfilename::String: the filename/filepath\n\nOptional Arguments\n\nheader::Bool=false: header in csv\nsep::String=\" \": separator in CSV file\n\nExamples\n\nJudiLing.load_S_matrix(joinpath(@__DIR__, \"S.csv\"))\n\n\n\n\n\n","category":"method"},{"location":"man/make_yt_matrix/","page":"Make Yt Matrix","title":"Make Yt Matrix","text":"CurrentModule = JudiLing","category":"page"},{"location":"man/make_yt_matrix/#Make-Yt-Matrix","page":"Make Yt Matrix","title":"Make Yt Matrix","text":"","category":"section"},{"location":"man/make_yt_matrix/","page":"Make Yt Matrix","title":"Make Yt Matrix","text":" make_Yt_matrix\n make_Yt_matrix(t, data, 
f2i)","category":"page"},{"location":"man/make_yt_matrix/#JudiLing.make_Yt_matrix","page":"Make Yt Matrix","title":"JudiLing.make_Yt_matrix","text":"Make Yt matrix for timestep t.\n\n\n\n\n\n","category":"function"},{"location":"man/make_yt_matrix/#JudiLing.make_Yt_matrix-Tuple{Any, Any, Any}","page":"Make Yt Matrix","title":"JudiLing.make_Yt_matrix","text":"make_Yt_matrix(t, data, f2i)\n\nMake Yt matrix for timestep t. A given column of the Yt matrix specifies the support for the corresponding n-gram predicted for timestep t for each of the observations (rows of Yt).\n\nObligatory Arguments\n\nt::Int64: the timestep t\ndata::DataFrame: the dataset\nf2i::Dict: the dictionary returning indices given features\n\nOptional Arguments\n\ntokenized::Bool=false: if true, the dataset target is assumed to be tokenized\nsep_token::Union{Nothing, String, Char}=nothing: separator token\nverbose::Bool=false: if verbose, more information will be printed\n\nExamples\n\nlatin = DataFrame(CSV.File(joinpath(\"data\", \"latin_mini.csv\")))\nJudiLing.make_Yt_matrix(2, latin)\n\n\n\n\n\n","category":"method"},{"location":"man/preprocess/","page":"Preprocess","title":"Preprocess","text":"CurrentModule = JudiLing","category":"page"},{"location":"man/preprocess/#Preprocess","page":"Preprocess","title":"Preprocess","text":"","category":"section"},{"location":"man/preprocess/","page":"Preprocess","title":"Preprocess","text":" SplitDataException\n lpo_cv_split(p, data_path)\n loo_cv_split(data_path)\n train_val_random_split(data_path, output_dir_path, data_prefix)\n train_val_careful_split(data_path, output_dir_path, data_prefix, n_features_columns)","category":"page"},{"location":"man/preprocess/#JudiLing.SplitDataException","page":"Preprocess","title":"JudiLing.SplitDataException","text":"Split Data Exception\n\n\n\n\n\n","category":"type"},{"location":"man/preprocess/#JudiLing.lpo_cv_split-Tuple{Any, Any}","page":"Preprocess","title":"JudiLing.lpo_cv_split","text":"lpo_cv_split(p, data_path)\n\nLeave p out cross-validation.\n\n\n\n\n\n","category":"method"},{"location":"man/preprocess/#JudiLing.loo_cv_split-Tuple{Any}","page":"Preprocess","title":"JudiLing.loo_cv_split","text":"loo_cv_split(data_path)\n\nLeave one out cross-validation.\n\n\n\n\n\n","category":"method"},{"location":"man/preprocess/#JudiLing.train_val_random_split-Tuple{Any, Any, Any}","page":"Preprocess","title":"JudiLing.train_val_random_split","text":"train_val_random_split(data_path, output_dir_path, data_prefix)\n\nRandomly split dataset.\n\n\n\n\n\n","category":"method"},{"location":"man/preprocess/#JudiLing.train_val_careful_split-NTuple{4, Any}","page":"Preprocess","title":"JudiLing.train_val_careful_split","text":"train_val_careful_split(data_path, output_dir_path, data_prefix, n_features_columns)\n\nCarefully split dataset.\n\n\n\n\n\n","category":"method"},{"location":"man/test_combo/","page":"Test Combo","title":"Test Combo","text":"CurrentModule = JudiLing","category":"page"},{"location":"man/test_combo/#Test-Combo","page":"Test Combo","title":"Test Combo","text":"","category":"section"},{"location":"man/test_combo/","page":"Test Combo","title":"Test Combo","text":" test_combo(test_mode;kwargs...)","category":"page"},{"location":"man/test_combo/#JudiLing.test_combo-Tuple{Any}","page":"Test Combo","title":"JudiLing.test_combo","text":"test_combo(test_mode;kwargs...)\n\nA wrapper function for a full model for a specific combination of parameters. 
A detailed introduction is in the Test Combo Introduction\n\nObligatory Arguments\n\ntest_mode::Symbol: which test mode, currently supports :train_only, :pre_split, :careful_split and :random_split.\n\nOptional Arguments\n\ntrain_sample_size::Int64=0: the desired number of training data\nval_sample_size::Int64=0: the desired number of validation data\nval_ratio::Float64=0.0: the desired proportion of validation data; it takes effect only if val_sample_size is 0\nextension::String=\".csv\": the extension of the data files\nn_grams_target_col::Union{String, Symbol}=:Word: the column name for target strings\nn_grams_tokenized::Bool=false: if true, the dataset target is assumed to be tokenized\nn_grams_sep_token::Union{Nothing, String}=nothing: separator\ngrams::Int64=3: the number of grams for cues\nn_grams_keep_sep::Bool=false: if true, keep separators in cues\nstart_end_token::String=\":\": start and end token in boundary cues\npath_sep_token::String=\":\": path separator in the assembled path\nrandom_seed::Int64=314: the random seed\nsd_base_mean::Int64=1: the sd of the means of base features\nsd_inflection_mean::Int64=1: the sd of the means of inflectional features\nsd_base::Int64=4: the sd of base features\nsd_inflection::Int64=4: the sd of inflectional features\nisdeep::Bool=true: if true, the mean of each feature is also randomized\nadd_noise::Bool=true: if true, add additional Gaussian noise\nsd_noise::Int64=1: the sd of the Gaussian noise\nnormalized::Bool=false: if true, most of the values range between -1 and 1; they may slightly exceed these bounds depending on the sd\nif_combined::Bool=false: if true, features are created from the training and validation data combined\nlearn_mode::Symbol=:cholesky: which learning mode, currently supports :cholesky and :wh\nmethod::Symbol=:additive: whether :additive or :multiplicative decomposition is required\nshift::Float64=0.02: shift value for :additive decomposition\nmultiplier::Float64=1.01: multiplier value for :multiplicative decomposition\noutput_format::Symbol=:auto: force the output format to dense (:dense) or sparse (:sparse), or leave it at :auto to let the program decide\nsparse_ratio::Float64=0.05: the ratio to decide whether a matrix is sparse\nwh_freq::Vector=nothing: the learning sequence\ninit_weights::Matrix=nothing: the initial weights\neta::Float64=0.1: the learning rate\nn_epochs::Int64=1: the number of epochs to be trained\nmax_t::Int64=0: the maximum timestep\nA::Matrix=nothing: the adjacency matrix\nA_mode::Symbol=:combined: the adjacency matrix mode, currently supports :combined or :train_only\nmax_can::Int64=10: the max number of candidate paths to keep in the output\nthreshold_train::Float64=0.1: the value set for the support such that if the support of an n-gram is higher than this value, the n-gram will be taken into consideration for training data\nis_tolerant_train::Bool=false: if true, select a specified number (given by max_tolerance) of n-grams whose supports are below threshold but above a second tolerance threshold to be added to the path for training data\ntolerance_train::Float64=-0.1: the value set for the second threshold (in tolerant mode) such that if the support for an n-gram is in between this value and the threshold and the max_tolerance number has not been reached, then allow this n-gram to be added to the path for training data\nmax_tolerance_train::Int64=2: maximum number of below-threshold n-grams allowed in a path for training data\nthreshold_val::Float64=0.1: the value set for the support such that if the support of 
an n-gram is higher than this value, the n-gram will be taken into consideration for validation data\nis_tolerant_val::Bool=false: if true, select a specified number (given by max_tolerance) of n-grams whose supports are below threshold but above a second tolerance threshold to be added to the path for validation data\ntolerance_val::Float64=-0.1: the value set for the second threshold (in tolerant mode) such that if the support for an n-gram is in between this value and the threshold and the max_tolerance number has not been reached, then allow this n-gram to be added to the path for validation data\nmax_tolerance_val::Int64=2: maximum number of below-threshold n-grams allowed in a path for validation data\nn_neighbors_train::Int64=10: the top n form neighbors to be considered for training data\nn_neighbors_val::Int64=20: the top n form neighbors to be considered for validation data\nissparse::Bool=false: if true, keep sparse matrix format when learning paths\noutput_dir::String=\"out\": the output directory\nverbose::Bool=false: if true, more information will be printed\n\n\n\n\n\n","category":"method"},{"location":"#JudiLing","page":"Home","title":"JudiLing","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"JudiLing: An implementation for Linear Discriminative Learning in Julia","category":"page"},{"location":"","page":"Home","title":"Home","text":"Maintainer: Maria Heitmeier @MariaHei\nOriginal codebase: Xuefeng Luo @MegamindHenry","category":"page"},{"location":"#Installation","page":"Home","title":"Installation","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"You can install JudiLing with the following commands:","category":"page"},{"location":"","page":"Home","title":"Home","text":"using Pkg\nPkg.add(\"JudiLing\")","category":"page"},{"location":"","page":"Home","title":"Home","text":"For brave adventurers, install the test version of JudiLing with:","category":"page"},{"location":"","page":"Home","title":"Home","text":"julia> Pkg.add(url=\"https://github.com/quantling/JudiLing.jl.git\")","category":"page"},{"location":"","page":"Home","title":"Home","text":"Or from the Julia REPL, type ] to enter the Pkg REPL mode and run","category":"page"},{"location":"","page":"Home","title":"Home","text":"pkg> add https://github.com/quantling/JudiLing.jl.git","category":"page"},{"location":"#Running-Julia-with-multiple-threads","page":"Home","title":"Running Julia with multiple threads","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"JudiLing supports the use of multiple threads. Simply start up Julia in your terminal as follows:","category":"page"},{"location":"","page":"Home","title":"Home","text":"$ julia -t your_num_of_threads","category":"page"},{"location":"","page":"Home","title":"Home","text":"For detailed information on using Julia with threads, see this link.","category":"page"},{"location":"#Include-packages","page":"Home","title":"Include packages","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"Before we start, we first need to load the JudiLing package:","category":"page"},{"location":"","page":"Home","title":"Home","text":"using JudiLing","category":"page"},{"location":"","page":"Home","title":"Home","text":"Note: As of JudiLing 0.8.0, PyCall and Flux have become optional dependencies. 
This means that all code in JudiLing which requires calls to Python is only available if PyCall is loaded first, like this:","category":"page"},{"location":"","page":"Home","title":"Home","text":"using PyCall\nusing JudiLing","category":"page"},{"location":"","page":"Home","title":"Home","text":"Likewise, the code involving deep learning is only available if Julia's deep learning library Flux is loaded first, like this:","category":"page"},{"location":"","page":"Home","title":"Home","text":"using Flux\nusing JudiLing","category":"page"},{"location":"","page":"Home","title":"Home","text":"Note that Flux and PyCall have to be installed separately, and the newest version of Flux requires at least Julia 1.9. If you want to run deep learning on a GPU, make sure to also install and import CUDA.","category":"page"},{"location":"#Quick-start-example","page":"Home","title":"Quick start example","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"The Latin dataset latin.csv contains lexemes and inflectional features for 672 inflected Latin verb forms for 8 lexemes from 4 conjugation classes. Word forms are inflected for person, number, tense, voice and mood.","category":"page"},{"location":"","page":"Home","title":"Home","text":"\"\",\"Word\",\"Lexeme\",\"Person\",\"Number\",\"Tense\",\"Voice\",\"Mood\"\n\"1\",\"vocoo\",\"vocare\",\"p1\",\"sg\",\"present\",\"active\",\"ind\"\n\"2\",\"vocaas\",\"vocare\",\"p2\",\"sg\",\"present\",\"active\",\"ind\"\n\"3\",\"vocat\",\"vocare\",\"p3\",\"sg\",\"present\",\"active\",\"ind\"\n\"4\",\"vocaamus\",\"vocare\",\"p1\",\"pl\",\"present\",\"active\",\"ind\"\n\"5\",\"vocaatis\",\"vocare\",\"p2\",\"pl\",\"present\",\"active\",\"ind\"\n\"6\",\"vocant\",\"vocare\",\"p3\",\"pl\",\"present\",\"active\",\"ind\"","category":"page"},{"location":"","page":"Home","title":"Home","text":"We first download and read the csv file into Julia:","category":"page"},{"location":"","page":"Home","title":"Home","text":"download(\"https://osf.io/2ejfu/download\", \"latin.csv\")\n\nlatin = JudiLing.load_dataset(\"latin.csv\");","category":"page"},{"location":"","page":"Home","title":"Home","text":"and we can inspect the latin dataframe:","category":"page"},{"location":"","page":"Home","title":"Home","text":"display(latin)","category":"page"},{"location":"","page":"Home","title":"Home","text":"672×8 DataFrame. 
Omitted printing of 2 columns\n│ Row │ Column1 │ Word │ Lexeme │ Person │ Number │ Tense │\n│ │ Int64 │ String │ String │ String │ String │ String │\n├─────┼─────────┼────────────────┼─────────┼────────┼────────┼────────────┤\n│ 1 │ 1 │ vocoo │ vocare │ p1 │ sg │ present │\n│ 2 │ 2 │ vocaas │ vocare │ p2 │ sg │ present │\n│ 3 │ 3 │ vocat │ vocare │ p3 │ sg │ present │\n│ 4 │ 4 │ vocaamus │ vocare │ p1 │ pl │ present │\n│ 5 │ 5 │ vocaatis │ vocare │ p2 │ pl │ present │\n│ 6 │ 6 │ vocant │ vocare │ p3 │ pl │ present │\n│ 7 │ 7 │ clamoo │ clamare │ p1 │ sg │ present │\n│ 8 │ 8 │ clamaas │ clamare │ p2 │ sg │ present │\n⋮\n│ 664 │ 664 │ carpsisseemus │ carpere │ p1 │ pl │ pluperfect │\n│ 665 │ 665 │ carpsisseetis │ carpere │ p2 │ pl │ pluperfect │\n│ 666 │ 666 │ carpsissent │ carpere │ p3 │ pl │ pluperfect │\n│ 667 │ 667 │ cuccurissem │ currere │ p1 │ sg │ pluperfect │\n│ 668 │ 668 │ cuccurissees │ currere │ p2 │ sg │ pluperfect │\n│ 669 │ 669 │ cuccurisset │ currere │ p3 │ sg │ pluperfect │\n│ 670 │ 670 │ cuccurisseemus │ currere │ p1 │ pl │ pluperfect │\n│ 671 │ 671 │ cuccurisseetis │ currere │ p2 │ pl │ pluperfect │\n│ 672 │ 672 │ cuccurissent │ currere │ p3 │ pl │ pluperfect │","category":"page"},{"location":"","page":"Home","title":"Home","text":"For the production model, we want to predict correct forms given their lexemes and inflectional features. For example, given the lexeme vocare and its inflectional features p1, sg, present, active and ind, the model should produce the form vocoo. On the other hand, the comprehension model takes forms as input and tries to predict their lexemes and inflectional features.","category":"page"},{"location":"","page":"Home","title":"Home","text":"We use letter trigrams to encode our forms. For the word vocoo, for example, we use the trigrams #vo, voc, oco, coo and oo#. Here, # is used as a start/end token to encode the initial and final trigram of a word. The row vectors of the C matrix specify for each word which of the trigrams are realized in that word.","category":"page"},{"location":"","page":"Home","title":"Home","text":"To make the C matrix, we use the make_cue_matrix function:","category":"page"},{"location":"","page":"Home","title":"Home","text":"cue_obj = JudiLing.make_cue_matrix(\n latin,\n grams=3,\n target_col=:Word,\n tokenized=false,\n keep_sep=false\n )","category":"page"},{"location":"","page":"Home","title":"Home","text":"Next, we simulate the semantic matrix S using the make_S_matrix function:","category":"page"},{"location":"","page":"Home","title":"Home","text":"n_features = size(cue_obj.C, 2)\nS = JudiLing.make_S_matrix(\n latin,\n [\"Lexeme\"],\n [\"Person\",\"Number\",\"Tense\",\"Voice\",\"Mood\"],\n ncol=n_features)","category":"page"},{"location":"","page":"Home","title":"Home","text":"For this simulation, first random vectors are assigned to every lexeme and inflectional feature, and next the vectors of those features are summed up to obtain the semantic vector of the inflected form. Similar dimensions for C and S work best. Therefore, we retrieve the number of columns from the C matrix and pass it to make_S_matrix when constructing S.","category":"page"},{"location":"","page":"Home","title":"Home","text":"The next step is to calculate a mapping from S to C by solving the equation C = SG. 
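In the least-squares sense, the solution is G = inv(S'S) * (S'C); a minimal sketch of this idea in plain Julia (not JudiLing's internal implementation, which additionally supports regularized variants):\n\nusing LinearAlgebra\n# normal equations for C = S * G; S'S is symmetric positive (semi-)definite,\n# which is exactly the structure a Cholesky-based solver exploits\nG_ls = (S' * S) \\ (S' * Matrix(cue_obj.C))\n\n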
We use Cholesky decomposition to solve this equation:","category":"page"},{"location":"","page":"Home","title":"Home","text":"G = JudiLing.make_transform_matrix(S, cue_obj.C)","category":"page"},{"location":"","page":"Home","title":"Home","text":"Then, we can make our predicted C matrix Chat:","category":"page"},{"location":"","page":"Home","title":"Home","text":"Chat = S * G","category":"page"},{"location":"","page":"Home","title":"Home","text":"and evaluate the model's prediction accuracy:","category":"page"},{"location":"","page":"Home","title":"Home","text":"@show JudiLing.eval_SC(Chat, cue_obj.C)","category":"page"},{"location":"","page":"Home","title":"Home","text":"Output:","category":"page"},{"location":"","page":"Home","title":"Home","text":"JudiLing.eval_SC(Chat, cue_obj.C) = 0.9926","category":"page"},{"location":"","page":"Home","title":"Home","text":"NOTE: Accuracy may be different depending on the simulated semantic matrix.","category":"page"},{"location":"","page":"Home","title":"Home","text":"Similar to G and Chat, we can solve S = CF:","category":"page"},{"location":"","page":"Home","title":"Home","text":"F = JudiLing.make_transform_matrix(cue_obj.C, S)","category":"page"},{"location":"","page":"Home","title":"Home","text":"and we then calculate the Shat matrix and evaluate comprehension accuracy:","category":"page"},{"location":"","page":"Home","title":"Home","text":"Shat = cue_obj.C * F\n@show JudiLing.eval_SC(Shat, S)","category":"page"},{"location":"","page":"Home","title":"Home","text":"Output:","category":"page"},{"location":"","page":"Home","title":"Home","text":"JudiLing.eval_SC(Shat, S) = 0.9911","category":"page"},{"location":"","page":"Home","title":"Home","text":"NOTE: Accuracy may be different depending on the simulated semantic matrix.","category":"page"},{"location":"","page":"Home","title":"Home","text":"To model speech production, the proper triphones have to be selected and put into the right order. We have two algorithms that accomplish this. Both algorithms construct paths in a triphone space that start with word-initial triphones and end with word-final triphones.","category":"page"},{"location":"","page":"Home","title":"Home","text":"The first step is to construct an adjacency matrix that specifies which triphones can follow each other. In this example, we use the adjacency matrix constructed by make_cue_matrix, but we can also make use of an independently constructed adjacency matrix if required.","category":"page"},{"location":"","page":"Home","title":"Home","text":"A = cue_obj.A","category":"page"},{"location":"","page":"Home","title":"Home","text":"For our sequencing algorithms, we calculate the number of timesteps they need. For the Latin dataset, the max timestep is equal to the length of the longest word. The argument :Word specifies the column in the Latin dataset that lists the words' forms.","category":"page"},{"location":"","page":"Home","title":"Home","text":"max_t = JudiLing.cal_max_timestep(latin, :Word)","category":"page"},{"location":"","page":"Home","title":"Home","text":"One sequence-finding algorithm uses discriminative learning for the position of triphones. 
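This algorithm is implemented in the learn_paths function. 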
This function returns two lists, one with candidate triphone paths and their positional learning support (res) and one with the semantic supports for the gold paths (gpi).","category":"page"},{"location":"","page":"Home","title":"Home","text":"res_learn, gpi_learn = JudiLing.learn_paths(\n latin,\n latin,\n cue_obj.C,\n S,\n F,\n Chat,\n A,\n cue_obj.i2f,\n cue_obj.f2i, # api changed in 0.3.1\n check_gold_path = true,\n gold_ind = cue_obj.gold_ind,\n Shat_val = Shat,\n max_t = max_t,\n max_can = 10,\n grams = 3,\n threshold = 0.05,\n tokenized = false,\n keep_sep = false,\n target_col = :Word,\n verbose = true\n)","category":"page"},{"location":"","page":"Home","title":"Home","text":"We evaluate the accuracy on the training data as follows:","category":"page"},{"location":"","page":"Home","title":"Home","text":"acc_learn = JudiLing.eval_acc(res_learn, cue_obj.gold_ind, verbose = false)\n\nprintln(\"Acc for learn: $acc_learn\")","category":"page"},{"location":"","page":"Home","title":"Home","text":"Acc for learn: 0.9985","category":"page"},{"location":"","page":"Home","title":"Home","text":"The second sequence-finding algorithm is usually faster than the first, but does not provide positional learnability estimates.","category":"page"},{"location":"","page":"Home","title":"Home","text":"res_build = JudiLing.build_paths(\n latin,\n cue_obj.C,\n S,\n F,\n Chat,\n A,\n cue_obj.i2f,\n cue_obj.gold_ind,\n max_t=max_t,\n n_neighbors=3,\n verbose=true\n )\n\nacc_build = JudiLing.eval_acc(\n res_build,\n cue_obj.gold_ind,\n verbose=false\n)","category":"page"},{"location":"","page":"Home","title":"Home","text":"Acc for build: 0.9955","category":"page"},{"location":"","page":"Home","title":"Home","text":"After having obtained results from the sequencing functions learn_paths or build_paths, we can save them either to a CSV file or to a dataframe; the dataframe can be loaded into R with the rput command of the RCall package.","category":"page"},{"location":"","page":"Home","title":"Home","text":"JudiLing.write2csv(\n res_learn,\n latin,\n cue_obj,\n cue_obj,\n \"latin_learn_res.csv\",\n grams = 3,\n tokenized = false,\n sep_token = nothing,\n start_end_token = \"#\",\n output_sep_token = \"\",\n path_sep_token = \":\",\n target_col = :Word,\n root_dir = @__DIR__,\n output_dir = \"latin_out\"\n)\n\ndf_learn = JudiLing.write2df(\n res_learn,\n latin,\n cue_obj,\n cue_obj,\n grams = 3,\n tokenized = false,\n sep_token = nothing,\n start_end_token = \"#\",\n output_sep_token = \"\",\n path_sep_token = \":\",\n target_col = :Word\n)\n\nJudiLing.write2csv(\n res_build,\n latin,\n cue_obj,\n cue_obj,\n \"latin_build_res.csv\",\n grams = 3,\n tokenized = false,\n sep_token = nothing,\n start_end_token = \"#\",\n output_sep_token = \"\",\n path_sep_token = \":\",\n target_col = :Word,\n root_dir = @__DIR__,\n output_dir = \"latin_out\"\n)\n\ndf_build = JudiLing.write2df(\n res_build,\n latin,\n cue_obj,\n cue_obj,\n grams = 3,\n tokenized = false,\n sep_token = nothing,\n start_end_token = \"#\",\n output_sep_token = \"\",\n path_sep_token = \":\",\n target_col = :Word\n)\n\ndisplay(df_learn)\ndisplay(df_build)","category":"page"},{"location":"","page":"Home","title":"Home","text":"3805×9 DataFrame. Omitted printing of 5 columns\n│ Row │ utterance │ identifier │ path │ pred │\n│ │ Int64? │ String? │ Union{Missing, String} │ String? 
│\n├──────┼───────────┼────────────────┼─────────────────────────────────────────────────────────┼────────────────┤\n│ 1 │ 1 │ vocoo │ #vo:voc:oco:coo:oo# │ vocoo │\n│ 2 │ 2 │ vocaas │ #vo:voc:oca:caa:aas:as# │ vocaas │\n│ 3 │ 2 │ vocaas │ #vo:voc:oca:caa:aab:aba:baa:aas:as# │ vocaabaas │\n│ 4 │ 2 │ vocaas │ #vo:voc:oca:caa:aat:ati:tis:is# │ vocaatis │\n│ 5 │ 2 │ vocaas │ #vo:voc:oca:caa:aav:avi:vis:ist:sti:tis:is# │ vocaavistis │\n│ 6 │ 2 │ vocaas │ #vo:voc:oca:caa:aam:amu:mus:us# │ vocaamus │\n│ 7 │ 2 │ vocaas │ #vo:voc:oca:caa:aab:abi:bit:it# │ vocaabit │\n│ 8 │ 2 │ vocaas │ #vo:voc:oca:caa:aam:amu:mur:ur# │ vocaamur │\n│ 9 │ 2 │ vocaas │ #vo:voc:oca:caa:aar:are:ret:et# │ vocaaret │\n⋮\n│ 3796 │ 671 │ cuccurisseetis │ #cu:cuc:ucc:ccu:cur:ure:ree:eet:eti:tis:is# │ cuccureetis │\n│ 3797 │ 671 │ cuccurisseetis │ #cu:cuc:ucc:ccu:cur:uri:ris:ist:sti:tis:is# │ cuccuristis │\n│ 3798 │ 671 │ cuccurisseetis │ #cu:cuc:ucc:ccu:cur:uri:ris:iss:sse:set:et# │ cuccurisset │\n│ 3799 │ 671 │ cuccurisseetis │ #cu:cur:urr:rri:rim:imi:min:ini:nii:ii# │ curriminii │\n│ 3800 │ 672 │ cuccurissent │ #cu:cuc:ucc:ccu:cur:uri:ris:iss:sse:sen:ent:nt# │ cuccurissent │\n│ 3801 │ 672 │ cuccurissent │ #cu:cur:urr:rre:rer:ere:ren:ent:nt# │ currerent │\n│ 3802 │ 672 │ cuccurissent │ #cu:cuc:ucc:ccu:cur:uri:ris:iss:sse:see:eem:emu:mus:us# │ cuccurisseemus │\n│ 3803 │ 672 │ cuccurissent │ #cu:cuc:ucc:ccu:cur:uri:ris:iss:sse:see:eet:eti:tis:is# │ cuccurisseetis │\n│ 3804 │ 672 │ cuccurissent │ #cu:cur:urr:rre:rer:ere:ren:ent:ntu:tur:ur# │ currerentur │\n│ 3805 │ 672 │ cuccurissent │ #cu:cuc:ucc:ccu:cur:uri:ris:iss:sse:see:ees:es# │ cuccurissees │\n2519×9 DataFrame. Omitted printing of 4 columns\n│ Row │ utterance │ identifier │ path │ pred │ num_tolerance │\n│ │ Int64? │ String? │ Union{Missing, String} │ String? │ Int64? 
│\n├──────┼───────────┼────────────────┼─────────────────────────────────────────────────┼──────────────┼───────────────┤\n│ 1 │ 1 │ vocoo │ #vo:voc:oco:coo:oo# │ vocoo │ 0 │\n│ 2 │ 1 │ vocoo │ #vo:voc:oca:caa:aab:abo:boo:oo# │ vocaaboo │ 0 │\n│ 3 │ 1 │ vocoo │ #vo:voc:oca:caa:aab:aba:bam:am# │ vocaabam │ 0 │\n│ 4 │ 2 │ vocaas │ #vo:voc:oca:caa:aas:as# │ vocaas │ 0 │\n│ 5 │ 2 │ vocaas │ #vo:voc:oca:caa:aab:abi:bis:is# │ vocaabis │ 0 │\n│ 6 │ 2 │ vocaas │ #vo:voc:oca:caa:aat:ati:tis:is# │ vocaatis │ 0 │\n│ 7 │ 3 │ vocat │ #vo:voc:oca:cat:at# │ vocat │ 0 │\n│ 8 │ 3 │ vocat │ #vo:voc:oca:caa:aab:aba:bat:at# │ vocaabat │ 0 │\n│ 9 │ 3 │ vocat │ #vo:voc:oca:caa:aas:as# │ vocaas │ 0 │\n⋮\n│ 2510 │ 671 │ cuccurisseetis │ #cu:cur:uri:ris:iss:sse:see:ees:es# │ curissees │ 0 │\n│ 2511 │ 671 │ cuccurisseetis │ #cu:cur:uri:ris:iss:sse:see:eem:emu:mus:us# │ curisseemus │ 0 │\n│ 2512 │ 671 │ cuccurisseetis │ #cu:cur:uri:ris:is# │ curis │ 0 │\n│ 2513 │ 671 │ cuccurisseetis │ #cu:cuc:ucc:ccu:cur:uri:ris:is# │ cuccuris │ 0 │\n│ 2514 │ 672 │ cuccurissent │ #cu:cuc:ucc:ccu:cur:uri:ris:iss:sse:sen:ent:nt# │ cuccurissent │ 0 │\n│ 2515 │ 672 │ cuccurissent │ #cu:cur:uri:ris:iss:sse:sen:ent:nt# │ curissent │ 0 │\n│ 2516 │ 672 │ cuccurissent │ #cu:cuc:ucc:ccu:cur:uri:ris:iss:sse:set:et# │ cuccurisset │ 0 │\n│ 2517 │ 672 │ cuccurissent │ #cu:cur:uri:ris:iss:sse:set:et# │ curisset │ 0 │\n│ 2518 │ 672 │ cuccurissent │ #cu:cuc:ucc:ccu:cur:uri:ris:iss:sse:sem:em# │ cuccurissem │ 0 │\n│ 2519 │ 672 │ cuccurissent │ #cu:cur:uri:ris:iss:sse:sem:em# │ curissem │ 0 │","category":"page"},{"location":"#Cross-validation","page":"Home","title":"Cross-validation","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"The model also provides functionality for cross-validation. Here, we first split the dataset randomly into 90% training and 10% validation data:","category":"page"},{"location":"","page":"Home","title":"Home","text":"latin_train, latin_val = JudiLing.loading_data_randomly_split(\"latin.csv\",\n \"data\",\n \"latin\",\n val_ratio=0.1,\n random_seed=42)","category":"page"},{"location":"","page":"Home","title":"Home","text":"Then, we make the C matrix by passing both training and validation datasets to the make_combined_cue_matrix function which ensures that the C matrix contains columns for both training and validation data.","category":"page"},{"location":"","page":"Home","title":"Home","text":"cue_obj_train, cue_obj_val = JudiLing.make_combined_cue_matrix(\n latin_train,\n latin_val,\n grams = 3,\n target_col = :Word,\n tokenized = false,\n keep_sep = false\n)","category":"page"},{"location":"","page":"Home","title":"Home","text":"Next, we simulate semantic vectors, again for both the training and validation data, using make_combined_S_matrix:","category":"page"},{"location":"","page":"Home","title":"Home","text":"n_features = size(cue_obj_train.C, 2)\nS_train, S_val = JudiLing.make_combined_S_matrix(\n latin_train,\n latin_val,\n [\"Lexeme\"],\n [\"Person\", \"Number\", \"Tense\", \"Voice\", \"Mood\"],\n ncol = n_features\n)","category":"page"},{"location":"","page":"Home","title":"Home","text":"After that, we make the transformation matrices, but this time we only use the training dataset. 
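This mirrors a real generalization test: the validation words must be predicted with matrices estimated from the training data alone. 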
We use these transformation matrices to predict the validation dataset.","category":"page"},{"location":"","page":"Home","title":"Home","text":"G_train = JudiLing.make_transform_matrix(S_train, cue_obj_train.C)\nF_train = JudiLing.make_transform_matrix(cue_obj_train.C, S_train)\n\nChat_train = S_train * G_train\nChat_val = S_val * G_train\nShat_train = cue_obj_train.C * F_train\nShat_val = cue_obj_val.C * F_train\n\n@show JudiLing.eval_SC(Chat_train, cue_obj_train.C)\n@show JudiLing.eval_SC(Chat_val, cue_obj_val.C)\n@show JudiLing.eval_SC(Shat_train, S_train)\n@show JudiLing.eval_SC(Shat_val, S_val)","category":"page"},{"location":"","page":"Home","title":"Home","text":"Output:","category":"page"},{"location":"","page":"Home","title":"Home","text":"JudiLing.eval_SC(Chat_train, cue_obj_train.C) = 0.995\nJudiLing.eval_SC(Chat_val, cue_obj_val.C) = 0.403\nJudiLing.eval_SC(Shat_train, S_train) = 0.9917\nJudiLing.eval_SC(Shat_val, S_val) = 1.0","category":"page"},{"location":"","page":"Home","title":"Home","text":"Finally, we can find possible paths through build_paths or learn_paths. Since validation datasets are harder to predict, we turn on tolerant mode, which allows the algorithms to find more paths, but at the cost of more computation time.","category":"page"},{"location":"","page":"Home","title":"Home","text":"A = cue_obj_train.A\nmax_t = JudiLing.cal_max_timestep(latin_train, latin_val, :Word)\n\nres_learn_train, gpi_learn_train = JudiLing.learn_paths(\n latin_train,\n latin_train,\n cue_obj_train.C,\n S_train,\n F_train,\n Chat_train,\n A,\n cue_obj_train.i2f,\n cue_obj_train.f2i, # api changed in 0.3.1\n gold_ind = cue_obj_train.gold_ind,\n Shat_val = Shat_train,\n check_gold_path = true,\n max_t = max_t,\n max_can = 10,\n grams = 3,\n threshold = 0.05,\n tokenized = false,\n sep_token = \"_\",\n keep_sep = false,\n target_col = :Word,\n issparse = :dense,\n verbose = true,\n)\n\nres_learn_val, gpi_learn_val = JudiLing.learn_paths(\n latin_train,\n latin_val,\n cue_obj_train.C,\n S_val,\n F_train,\n Chat_val,\n A,\n cue_obj_train.i2f,\n cue_obj_train.f2i, # api changed in 0.3.1\n gold_ind = cue_obj_val.gold_ind,\n Shat_val = Shat_val,\n check_gold_path = true,\n max_t = max_t,\n max_can = 10,\n grams = 3,\n threshold = 0.05,\n is_tolerant = true,\n tolerance = -0.1,\n max_tolerance = 2,\n tokenized = false,\n sep_token = \"-\",\n keep_sep = false,\n target_col = :Word,\n issparse = :dense,\n verbose = true,\n)\n\nacc_learn_train =\n JudiLing.eval_acc(res_learn_train, cue_obj_train.gold_ind, verbose = false)\nacc_learn_val = JudiLing.eval_acc(res_learn_val, cue_obj_val.gold_ind, verbose = false)\n\nres_build_train = JudiLing.build_paths(\n latin_train,\n cue_obj_train.C,\n S_train,\n F_train,\n Chat_train,\n A,\n cue_obj_train.i2f,\n cue_obj_train.gold_ind,\n max_t = max_t,\n n_neighbors = 3,\n verbose = true,\n)\n\nres_build_val = JudiLing.build_paths(\n latin_val,\n cue_obj_train.C,\n S_val,\n F_train,\n Chat_val,\n A,\n cue_obj_train.i2f,\n cue_obj_train.gold_ind,\n max_t = max_t,\n n_neighbors = 20,\n verbose = true,\n)\n\nacc_build_train =\n JudiLing.eval_acc(res_build_train, cue_obj_train.gold_ind, verbose = false)\nacc_build_val = JudiLing.eval_acc(res_build_val, cue_obj_val.gold_ind, verbose = false)\n\n@show acc_learn_train\n@show acc_learn_val\n@show acc_build_train\n@show acc_build_val","category":"page"},{"location":"","page":"Home","title":"Home","text":"Output:","category":"page"},{"location":"","page":"Home","title":"Home","text":"acc_learn_train = 0.9983\nacc_learn_val = 
0.6866\nacc_build_train = 1.0\nacc_build_val = 0.3284","category":"page"},{"location":"","page":"Home","title":"Home","text":"Alternatively, we provide a wrapper function incorporating all of the above functionality. With this function, you can quickly explore datasets with different parameter settings. More details can be found in the Test Combo Introduction.","category":"page"},{"location":"#Supports","page":"Home","title":"Supports","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"There are two types of support in the output: an utterance-level support and a set of supports for each cue. The former is also called \"synthesis-by-analysis\" support; it is calculated from the predicted and original S vectors and is used to select the best paths. Cue-level supports are slices of the Yt matrices at each timestep; they are used to determine whether a cue is eligible for constructing paths.","category":"page"},{"location":"#Acknowledgments","page":"Home","title":"Acknowledgments","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"This project was supported by the ERC advanced grant WIDE-742545 and by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy - EXC number 2064/1 - Project number 390727645.","category":"page"},{"location":"#Citation","page":"Home","title":"Citation","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"If you find this package helpful, please cite it as follows:","category":"page"},{"location":"","page":"Home","title":"Home","text":"Luo, X., Heitmeier, M., Chuang, Y. Y., Baayen, R. H. JudiLing: an implementation of the Discriminative Lexicon Model in Julia. Eberhard Karls Universität Tübingen, Seminar für Sprachwissenschaft.","category":"page"},{"location":"","page":"Home","title":"Home","text":"The following studies made use of several algorithms that are now implemented in JudiLing rather than in WpmWithLdl:","category":"page"},{"location":"","page":"Home","title":"Home","text":"Baayen, R. H., Chuang, Y. Y., Shafaei-Bajestan, E., and Blevins, J. P. (2019). The discriminative lexicon: A unified computational model for the lexicon and lexical processing in comprehension and production grounded not in (de)composition but in linear discriminative learning. Complexity, 2019, 1-39.\nBaayen, R. H., Chuang, Y. Y., and Blevins, J. P. (2018). Inflectional morphology with linear mappings. The Mental Lexicon, 13 (2), 232-270.\nChuang, Y.-Y., Lõo, K., Blevins, J. P., and Baayen, R. H. (2020). Estonian case inflection made simple. A case study in Word and Paradigm morphology with Linear Discriminative Learning. In Körtvélyessy, L., and Štekauer, P. (Eds.) Complex Words: Advances in Morphology, 1-19.\nChuang, Y-Y., Bell, M. J., Banke, I., and Baayen, R. H. (2020). Bilingual and multilingual mental lexicon: a modeling study with Linear Discriminative Learning. Language Learning, 1-55.\nHeitmeier, M., Chuang, Y-Y., Baayen, R. H. (2021). Modeling morphology with Linear Discriminative Learning: considerations and design choices. 
Frontiers in Psychology, 12, 4929.\nDenistia, K., and Baayen, R. H. (2022). The morphology of Indonesian: Data and quantitative modeling. In Shei, C., and Li, S. (Eds.) The Routledge Handbook of Asian Linguistics, (pp. 605-634). Routledge, London.\nHeitmeier, M., Chuang, Y.-Y., and Baayen, R. H. (2023). How trial-to-trial learning shapes mappings in the mental lexicon: Modelling lexical decision with linear discriminative learning. Cognitive Psychology, 1-30.\nChuang, Y. Y., Kang, M., Luo, X. F. and Baayen, R. H. (2023). Vector Space Morphology with Linear Discriminative Learning. In Crepaldi, D. (Ed.) Linguistic morphology in the mind and brain.\nHeitmeier, M., Chuang, Y. Y., Axen, S. D., & Baayen, R. H. (2024). Frequency effects in linear discriminative learning. Frontiers in Human Neuroscience, 17, 1242720.\nPlag, I., Heitmeier, M. & Domahs, F. (to appear). German nominal number interpretation in an impaired mental lexicon: A naive discriminative learning perspective. The Mental Lexicon.","category":"page"},{"location":"man/make_cue_matrix/","page":"Make Cue Matrix","title":"Make Cue Matrix","text":"CurrentModule = JudiLing","category":"page"},{"location":"man/make_cue_matrix/#Make-Cue-Matrix","page":"Make Cue Matrix","title":"Make Cue Matrix","text":"","category":"section"},{"location":"man/make_cue_matrix/","page":"Make Cue Matrix","title":"Make Cue Matrix","text":" Cue_Matrix_Struct\n make_cue_matrix\n make_combined_cue_matrix\n make_ngrams\n make_cue_matrix(data::DataFrame)\n make_cue_matrix(data::DataFrame, cue_obj::Cue_Matrix_Struct)\n make_cue_matrix(data_train::DataFrame, data_val::DataFrame)\n make_combined_cue_matrix(data_train, data_val)\n make_cue_matrix_from_CFBS(features::Vector{Vector{T}};\n pad_val::T = 0.,\n ncol::Union{Missing,Int}=missing) where {T}\n make_combined_cue_matrix_from_CFBS(features_train::Vector{Vector{T}},\n features_test::Vector{Vector{T}};\n pad_val::T = 0.,\n ncol::Union{Missing,Int}=missing) where {T}\n make_ngrams(tokens, grams, keep_sep, sep_token, start_end_token)","category":"page"},{"location":"man/make_cue_matrix/#JudiLing.Cue_Matrix_Struct","page":"Make Cue Matrix","title":"JudiLing.Cue_Matrix_Struct","text":"A structure that stores information created by make_cue_matrix: C is the cue matrix; f2i is a dictionary returning the indices for features; i2f is a dictionary returning the features for indices; gold_ind is a list of indices of gold paths; A is the adjacency matrix; grams is the number of grams for cues; target_col is the column name for target strings; tokenized is whether the dataset target is tokenized; sep_token is the separator; keep_sep is whether to keep separators in cues; start_end_token is the start and end token in boundary cues.\n\n\n\n\n\n","category":"type"},{"location":"man/make_cue_matrix/#JudiLing.make_cue_matrix","page":"Make Cue Matrix","title":"JudiLing.make_cue_matrix","text":"Construct the cue matrix.\n\n\n\n\n\n","category":"function"},{"location":"man/make_cue_matrix/#JudiLing.make_combined_cue_matrix","page":"Make Cue Matrix","title":"JudiLing.make_combined_cue_matrix","text":"Construct the cue matrix, combining features and adjacencies for both training and validation datasets.\n\n\n\n\n\n","category":"function"},{"location":"man/make_cue_matrix/#JudiLing.make_ngrams","page":"Make Cue Matrix","title":"JudiLing.make_ngrams","text":"Given a list of string tokens, extract their 
n-grams.\n\n\n\n\n\n","category":"function"},{"location":"man/make_cue_matrix/#JudiLing.make_cue_matrix-Tuple{DataFrames.DataFrame}","page":"Make Cue Matrix","title":"JudiLing.make_cue_matrix","text":"make_cue_matrix(data::DataFrame)\n\nMake the cue matrix for training datasets and corresponding indices as well as the adjacency matrix and gold paths given a dataset in a form of dataframe.\n\nObligatory Arguments\n\ndata::DataFrame: the dataset\n\nOptional Arguments\n\ngrams::Int64=3: the number of grams for cues\ntarget_col::Union{String, Symbol}=:Words: the column name for target strings\ntokenized::Bool=false:if true, the dataset target is assumed to be tokenized\nsep_token::Union{Nothing, String, Char}=nothing: separator\nkeep_sep::Bool=false: if true, keep separators in cues\nstart_end_token::Union{String, Char}=\"#\": start and end token in boundary cues\nverbose::Bool=false: if true, more information is printed\n\nExamples\n\n# make cue matrix without tokenization\ncue_obj_train = JudiLing.make_cue_matrix(\n latin_train,\n grams=3,\n target_col=:Word,\n tokenized=false,\n sep_token=\"-\",\n start_end_token=\"#\",\n keep_sep=false,\n verbose=false\n )\n\n# make cue matrix with tokenization\ncue_obj_train = JudiLing.make_cue_matrix(\n french_train,\n grams=3,\n target_col=:Syllables,\n tokenized=true,\n sep_token=\"-\",\n start_end_token=\"#\",\n keep_sep=true,\n verbose=false\n )\n\n\n\n\n\n","category":"method"},{"location":"man/make_cue_matrix/#JudiLing.make_cue_matrix-Tuple{DataFrames.DataFrame, JudiLing.Cue_Matrix_Struct}","page":"Make Cue Matrix","title":"JudiLing.make_cue_matrix","text":"make_cue_matrix(data::DataFrame, cue_obj::Cue_Matrix_Struct)\n\nMake the cue matrix for validation datasets and corresponding indices as well as the adjacency matrix and gold paths given a dataset in a form of dataframe.\n\nObligatory Arguments\n\ndata::DataFrame: the dataset\ncue_obj::Cue_Matrix_Struct: training cue object\n\nOptional Arguments\n\ngrams::Int64=3: the number of grams for cues\ntarget_col::Union{String, Symbol}=:Words: the column name for target strings\ntokenized::Bool=false:if true, the dataset target is assumed to be tokenized\nsep_token::Union{Nothing, String, Char}=nothing: separator\nkeep_sep::Bool=false: if true, keep separators in cues\nstart_end_token::Union{String, Char}=\"#\": start and end token in boundary cues\nverbose::Bool=false: if true, more information is printed\n\nExamples\n\n# make cue matrix without tokenization\ncue_obj_val = JudiLing.make_cue_matrix(\n latin_val,\n cue_obj_train,\n grams=3,\n target_col=:Word,\n tokenized=false,\n sep_token=\"-\",\n keep_sep=false,\n start_end_token=\"#\",\n verbose=false\n )\n\n# make cue matrix with tokenization\ncue_obj_val = JudiLing.make_cue_matrix(\n french_val,\n cue_obj_train,\n grams=3,\n target_col=:Syllables,\n tokenized=true,\n sep_token=\"-\",\n keep_sep=true,\n start_end_token=\"#\",\n verbose=false\n )\n\n\n\n\n\n","category":"method"},{"location":"man/make_cue_matrix/#JudiLing.make_cue_matrix-Tuple{DataFrames.DataFrame, DataFrames.DataFrame}","page":"Make Cue Matrix","title":"JudiLing.make_cue_matrix","text":"make_cue_matrix(data_train::DataFrame, data_val::DataFrame)\n\nMake the cue matrix for traiing and validation datasets at the same time.\n\nObligatory Arguments\n\ndata_train::DataFrame: the training dataset\ndata_val::DataFrame: the validation dataset\n\nOptional Arguments\n\ngrams::Int64=3: the number of grams for cues\ntarget_col::Union{String, Symbol}=:Words: the column name for target 
strings\ntokenized::Bool=false: if true, the dataset target is assumed to be tokenized\nsep_token::Union{Nothing, String, Char}=nothing: separator\nkeep_sep::Bool=false: if true, keep separators in cues\nstart_end_token::Union{String, Char}=\"#\": start and end token in boundary cues\nverbose::Bool=false: if true, more information is printed\n\nExamples\n\n# make cue matrix without tokenization\ncue_obj_train, cue_obj_val = JudiLing.make_cue_matrix(\n latin_train,\n latin_val,\n grams=3,\n target_col=:Word,\n tokenized=false,\n keep_sep=false\n )\n\n# make cue matrix with tokenization\ncue_obj_train, cue_obj_val = JudiLing.make_cue_matrix(\n french_train,\n french_val,\n grams=3,\n target_col=:Syllables,\n tokenized=true,\n sep_token=\"-\",\n keep_sep=true,\n start_end_token=\"#\",\n verbose=false\n )\n\n\n\n\n\n","category":"method"},{"location":"man/make_cue_matrix/#JudiLing.make_combined_cue_matrix-Tuple{Any, Any}","page":"Make Cue Matrix","title":"JudiLing.make_combined_cue_matrix","text":"make_combined_cue_matrix(data_train, data_val)\n\nMake the cue matrix for training and validation datasets at the same time, where the features and adjacencies are combined.\n\nObligatory Arguments\n\ndata_train::DataFrame: the training dataset\ndata_val::DataFrame: the validation dataset\n\nOptional Arguments\n\ngrams::Int64=3: the number of grams for cues\ntarget_col::Union{String, Symbol}=:Words: the column name for target strings\ntokenized::Bool=false: if true, the dataset target is assumed to be tokenized\nsep_token::Union{Nothing, String, Char}=nothing: separator\nkeep_sep::Bool=false: if true, keep separators in cues\nstart_end_token::Union{String, Char}=\"#\": start and end token in boundary cues\nverbose::Bool=false: if true, more information is printed\n\nExamples\n\n# make cue matrix without tokenization\ncue_obj_train, cue_obj_val = JudiLing.make_combined_cue_matrix(\n latin_train,\n latin_val,\n grams=3,\n target_col=:Word,\n tokenized=false,\n keep_sep=false\n )\n\n# make cue matrix with tokenization\ncue_obj_train, cue_obj_val = JudiLing.make_combined_cue_matrix(\n french_train,\n french_val,\n grams=3,\n target_col=:Syllables,\n tokenized=true,\n sep_token=\"-\",\n keep_sep=true,\n start_end_token=\"#\",\n verbose=false\n )\n\n\n\n\n\n","category":"method"},{"location":"man/make_cue_matrix/#JudiLing.make_cue_matrix_from_CFBS-Union{Tuple{Array{Vector{T}, 1}}, Tuple{T}} where T","page":"Make Cue Matrix","title":"JudiLing.make_cue_matrix_from_CFBS","text":"make_cue_matrix_from_CFBS(features::Vector{Vector{T}};\n pad_val::T = 0.,\n ncol::Union{Missing,Int}=missing) where {T}\n\nCreate a cue matrix from a vector of feature vectors (usually CFBS vectors). The vectors may (and typically do) have varying lengths; they are consequently padded on the right with the provided pad_val.\n\nObligatory arguments\n\nfeatures::Vector{Vector{T}}: vector of vectors containing C-FBS features\n\nOptional arguments\n\npad_val::T = 0.: Value with which the feature vectors will be padded\nncol::Union{Missing,Int}=missing: Number of columns of the C matrix. 
If not set, it will be set to the maximum number of features\n\nExamples\n\nC = JudiLing.make_cue_matrix_from_CFBS(features)\n\n\n\n\n\n","category":"method"},{"location":"man/make_cue_matrix/#JudiLing.make_combined_cue_matrix_from_CFBS-Union{Tuple{T}, Tuple{Array{Vector{T}, 1}, Array{Vector{T}, 1}}} where T","page":"Make Cue Matrix","title":"JudiLing.make_combined_cue_matrix_from_CFBS","text":"make_combined_cue_matrix_from_CFBS(features_train::Vector{Vector{T}},\n features_test::Vector{Vector{T}};\n pad_val::T = 0.,\n ncol::Union{Missing,Int}=missing) where {T}\n\nCreate cue matrices from two vectors of feature vectors (usually CFBS vectors). The vectors may (and typically do) have varying lengths; they are consequently padded on the right with the provided pad_val. The cue matrices are set to the size of the maximum number of feature values in features_train and features_test.\n\nObligatory arguments\n\nfeatures_train::Vector{Vector{T}}: vector of vectors containing C-FBS features\nfeatures_test::Vector{Vector{T}}: vector of vectors containing C-FBS features\n\nOptional arguments\n\npad_val::T = 0.: Value with which the feature vectors will be padded\nncol::Union{Missing,Int}=missing: Number of columns of the C matrices. If not set, it will be set to the maximum number of features in features_train and features_test\n\nExamples\n\nC_train, C_test = JudiLing.make_combined_cue_matrix_from_CFBS(features_train, features_test)\n\n\n\n\n\n","category":"method"},{"location":"man/make_cue_matrix/#JudiLing.make_ngrams-NTuple{5, Any}","page":"Make Cue Matrix","title":"JudiLing.make_ngrams","text":"make_ngrams(tokens, grams, keep_sep, sep_token, start_end_token)\n\nGiven a list of string tokens, return a list of all n-grams for these tokens.\n\n\n\n\n\n","category":"method"}]
+[{"location":"man/deep_learning/","page":"Deep learning","title":"Deep learning","text":"CurrentModule = JudiLing","category":"page"},{"location":"man/deep_learning/#Deep-learning-in-JudiLing","page":"Deep learning","title":"Deep learning in JudiLing","text":"","category":"section"},{"location":"man/deep_learning/","page":"Deep learning","title":"Deep learning","text":"predict_from_deep_model(model::Flux.Chain,\n X::Union{SparseMatrixCSC,Matrix})\npredict_shat(model::Flux.Chain,\n ci::Vector{Int})\nget_and_train_model(X_train::Union{SparseMatrixCSC,Matrix},\n Y_train::Union{SparseMatrixCSC,Matrix},\n X_val::Union{SparseMatrixCSC,Matrix,Missing},\n Y_val::Union{SparseMatrixCSC,Matrix,Missing},\n data_train::Union{DataFrame,Missing},\n data_val::Union{DataFrame,Missing},\n target_col::Union{Symbol, String,Missing},\n model_outpath::String;\n hidden_dim::Int=1000,\n n_epochs::Int=100,\n batchsize::Int=64,\n loss_func::Function=Flux.mse,\n optimizer=Flux.Adam(0.001),\n model::Union{Missing, Flux.Chain} = missing,\n early_stopping::Union{Missing, Int}=missing,\n optimise_for_acc::Bool=false,\n return_losses::Bool=false,\n verbose::Bool=true,\n measures_func::Union{Missing, Function}=missing,\n return_train_acc::Bool=false,\n kargs...)\nget_and_train_model(X_train::Union{SparseMatrixCSC,Matrix},\n Y_train::Union{SparseMatrixCSC,Matrix},\n model_outpath::String;\n data_train::Union{Missing, DataFrame}=missing,\n target_col::Union{Missing, Symbol, String}=missing,\n hidden_dim::Int=1000,\n n_epochs::Int=100,\n batchsize::Int=64,\n loss_func::Function=Flux.mse,\n optimizer=Flux.Adam(0.001),\n model::Union{Missing, Flux.Chain} = missing,\n return_losses::Bool=false,\n verbose::Bool=true,\n 
measures_func::Union{Missing, Function}=missing,\n return_train_acc::Bool=false,\n kargs...)\nfiddl(X_train::Union{SparseMatrixCSC,Matrix},\n Y_train::Union{SparseMatrixCSC,Matrix},\n learn_seq::Vector,\n data::DataFrame,\n target_col::Union{Symbol, String},\n model_outpath::String;\n hidden_dim::Int=1000,\n batchsize::Int=64,\n loss_func::Function=Flux.mse,\n optimizer=Flux.Adam(0.001),\n model::Union{Missing, Chain} = missing,\n return_losses::Bool=false,\n verbose::Bool=true,\n n_batch_eval::Int=100,\n measures_func::Union{Function, Missing}=missing,\n kargs...)\n","category":"page"},{"location":"man/deep_learning/#JudiLing.predict_from_deep_model-Tuple{Chain, Union{SparseArrays.SparseMatrixCSC, Matrix}}","page":"Deep learning","title":"JudiLing.predict_from_deep_model","text":"predict_from_deep_model(model::Chain,\n X::Union{SparseMatrixCSC,Matrix})\n\nGenerates the output of a model given input X.\n\nObligatory arguments\n\nmodel::Chain: Model of type Flux.Chain, as generated by get_and_train_model\nX::Union{SparseMatrixCSC,Matrix}: Input matrix of size (number_of_samples, inp_dim) where inp_dim is the input dimension of model\n\n\n\n\n\n","category":"method"},{"location":"man/deep_learning/#JudiLing.predict_shat-Tuple{Chain, Vector{Int64}}","page":"Deep learning","title":"JudiLing.predict_shat","text":"predict_shat(model::Chain,\n ci::Vector{Int})\n\nPredicts the semantic vector shat given a deep learning comprehension model model and a list of indices of ngrams ci.\n\nObligatory arguments\n\nmodel::Chain: Deep learning comprehension model as generated by get_and_train_model\nci::Vector{Int}: Vector of indices of ngrams in c vector. Essentially, this is a vector indicating which ngrams in a c vector are absent and which are present.\n\n\n\n\n\n","category":"method"},{"location":"man/deep_learning/#JudiLing.get_and_train_model-Tuple{Union{SparseArrays.SparseMatrixCSC, Matrix}, Union{SparseArrays.SparseMatrixCSC, Matrix}, Union{Missing, SparseArrays.SparseMatrixCSC, Matrix}, Union{Missing, SparseArrays.SparseMatrixCSC, Matrix}, Union{Missing, DataFrames.DataFrame}, Union{Missing, DataFrames.DataFrame}, Union{Missing, String, Symbol}, String}","page":"Deep learning","title":"JudiLing.get_and_train_model","text":"get_and_train_model(X_train::Union{SparseMatrixCSC,Matrix},\n Y_train::Union{SparseMatrixCSC,Matrix},\n X_val::Union{SparseMatrixCSC,Matrix,Missing},\n Y_val::Union{SparseMatrixCSC,Matrix,Missing},\n data_train::Union{DataFrame,Missing},\n data_val::Union{DataFrame,Missing},\n target_col::Union{Symbol,String,Missing},\n model_outpath::String;\n hidden_dim::Int=1000,\n n_epochs::Int=100,\n batchsize::Int=64,\n loss_func::Function=Flux.mse,\n optimizer=Flux.Adam(0.001),\n model::Union{Missing, Chain}=missing,\n early_stopping::Union{Missing, Int}=missing,\n optimise_for_acc::Bool=false,\n return_losses::Bool=false,\n verbose::Bool=true,\n measures_func::Union{Missing, Function}=missing,\n return_train_acc::Bool=false,\n kargs...\n )\n\nTrains a deep learning model from X_train to Y_train, saving the model with either the highest validation accuracy or lowest validation loss (depending on optimise_for_acc) to model_outpath.\n\nThe default model looks like this:\n\ninp_dim = size(X_train, 2)\nout_dim = size(Y_train, 2)\nChain(Dense(inp_dim => hidden_dim, relu), Dense(hidden_dim => out_dim))\n\nAny other model with the same input and output dimensions can be provided to the function with the model argument. 
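For example, a deeper network with two hidden layers could be passed in via the model argument (a hypothetical sketch; the layer sizes are illustrative, only the input and output dimensions are fixed):\n\nusing Flux\ninp_dim = size(X_train, 2)\nout_dim = size(Y_train, 2)\n# any Chain works as long as it maps inp_dim inputs to out_dim outputs\ncustom_model = Chain(Dense(inp_dim => 500, relu),\n Dense(500 => 500, relu),\n Dense(500 => out_dim))\n\n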
The default loss function is mean squared error, but any other loss function can be provided, as long as it fits with the model architecture.\n\nBy default, the Adam optimizer (Kingma and Ba, 2015) with learning rate 0.001 is used. You can provide any other optimizer. If you want to use a different learning rate, e.g. 0.01, provide optimizer=Flux.Adam(0.01). If you do not want to use an optimizer at all, and simply use normal gradient descent, provide optimizer=Descent(0.001), again replacing the learning rate with the learning rate of your preference.\n\nReturns a named tuple with the following values:\n\nmodel: the trained model\ndata_train: the training data, including any measures if computed by measures_func\ndata_val: the validation data, including any measures if computed by measures_func\nlosses_train: The losses of the training data for each epoch.\nlosses_val: The losses of the validation data after each epoch.\naccs_train: The accuracies of the training data after each epoch, if return_train_acc=true.\naccs_val: The accuracies of the validation data after each epoch.\n\nObligatory arguments\n\nX_train::Union{SparseMatrixCSC,Matrix}: training input matrix of dimension m x n\nY_train::Union{SparseMatrixCSC,Matrix}: training output/target matrix of dimension m x k\nX_val::Union{SparseMatrixCSC,Matrix}: validation input matrix of dimension l x n\nY_val::Union{SparseMatrixCSC,Matrix}: validation output/target matrix of dimension l x k\ndata_train::DataFrame: training data\ndata_val::DataFrame: validation data\ntarget_col::Union{Symbol, String}: column with target word forms in data_train and data_val\nmodel_outpath::String: filepath to where final model should be stored (in .bson format)\n\nOptional arguments\n\nhidden_dim::Int=1000: hidden dimension of the model\nn_epochs::Int=100: number of epochs for which the model should be trained\nbatchsize::Int=64: batchsize during training\nloss_func::Function=Flux.mse: Loss function. Per default this is the mse loss, but other options might be a crossentropy loss (Flux.crossentropy). Make sure the model makes sense with the loss function!\noptimizer=Flux.Adam(0.001): optimizer to use for training\nmodel::Union{Missing, Chain} = missing: A custom model can be provided for training. It must correspond to the input and output size of the training and validation data\nearly_stopping::Union{Missing, Int}=missing: If missing, no early stopping is used. Otherwise early_stopping indicates how many epochs have to pass without improvement in validation accuracy before the training is stopped.\noptimise_for_acc::Bool=false: if true, keep model with highest validation accuracy. If false, keep model with lowest validation loss.\nreturn_losses::Bool=false: whether, in addition to the model, per-epoch losses for the training and test data as well as per-epoch accuracy on the validation data should be returned\nverbose::Bool=true: Turn on verbose mode\nmeasures_func::Union{Missing, Function}=missing: A measures function which is run at the end of every epoch. For more information see The measures_func argument. 
If a measure is tagged for each epoch, the one tagged with \"final\" refers to the model that is finally returned.\nreturn_train_acc::Bool=false: If true, a vector with training accuracies is returned at the end of the training.\nkargs...: any additional keyword arguments are passed to the measures_func\n\n\n\n\n\n","category":"method"},{"location":"man/deep_learning/#JudiLing.get_and_train_model-Tuple{Union{SparseArrays.SparseMatrixCSC, Matrix}, Union{SparseArrays.SparseMatrixCSC, Matrix}, String}","page":"Deep learning","title":"JudiLing.get_and_train_model","text":"get_and_train_model(X_train::Union{SparseMatrixCSC,Matrix},\n Y_train::Union{SparseMatrixCSC,Matrix},\n model_outpath::String;\n data_train::Union{Missing, DataFrame}=missing,\n target_col::Union{Missing, Symbol, String}=missing,\n hidden_dim::Int=1000,\n n_epochs::Int=100,\n batchsize::Int=64,\n loss_func::Function=Flux.mse,\n optimizer=Flux.Adam(0.001),\n model::Union{Missing, Chain} = missing,\n return_losses::Bool=false,\n verbose::Bool=true,\n measures_func::Union{Missing, Function}=missing,\n return_train_acc::Bool=false,\n kargs...)\n\nTrains a deep learning model from X_train to Y_train, saving the model after n_epochs epochs. The default model looks like this:\n\ninp_dim = size(X_train, 2)\nout_dim = size(Y_train, 2)\nChain(Dense(inp_dim => hidden_dim, relu), Dense(hidden_dim => out_dim))\n\nAny other model with the same input and output dimensions can be provided to the function with the model argument. The default loss function is mean squared error, but any other loss function can be provided, as long as it fits with the model architecture.\n\nBy default, the Adam optimizer (Kingma and Ba, 2015) with learning rate 0.001 is used. You can provide any other optimizer. If you want to use a different learning rate, e.g. 0.01, provide optimizer=Flux.Adam(0.01). If you do not want to use an optimizer at all, and simply use normal gradient descent, provide optimizer=Descent(0.001), again replacing the learning rate with the learning rate of your preference.\n\nReturns a named tuple with the following values:\n\nmodel: the trained model\ndata_train: the data, including any measures if computed by measures_func\ndata_val: missing for this function\nlosses_train: The losses of the training data for each epoch.\nlosses_val: missing for this function\naccs_train: The accuracies of the training data after each epoch, if return_train_acc=true.\naccs_val: missing for this function\n\nObligatory arguments\n\nX_train::Union{SparseMatrixCSC,Matrix}: training input matrix of dimension m x n\nY_train::Union{SparseMatrixCSC,Matrix}: training output/target matrix of dimension m x k\nmodel_outpath::String: filepath to where final model should be stored (in .bson format)\n\nOptional arguments\n\ndata_train::Union{Missing, DataFrame}=missing: The training data. Only necessary if a measures_func is included or return_train_acc=true.\ntarget_col::Union{Missing, Symbol, String}=missing: The column with target word forms in the training data. Only necessary if a measures_func is included or return_train_acc=true.\nhidden_dim::Int=1000: hidden dimension of the model\nn_epochs::Int=100: number of epochs for which the model should be trained\nbatchsize::Int=64: batch size during training\nloss_func::Function=Flux.mse: Loss function. Per default this is the mse loss, but other options might be a crossentropy loss (Flux.crossentropy). 
Make sure the model makes sense with the loss function!\noptimizer=Flux.Adam(0.001): optimizer to use for training\nmodel::Union{Missing, Chain} = missing: A custom model can be provided for training. It has to correspond to the input and output sizes of the training data\nreturn_losses::Bool=false: whether, in addition to the model, the per-epoch losses for the training data should be returned\nverbose::Bool=true: Turn on verbose mode\nmeasures_func::Union{Missing, Function}=missing: A measures function which is run at the end of every epoch. For more information see The measures_func argument.\nreturn_train_acc::Bool=false: If true, a vector with training accuracies is returned at the end of the training.\nkargs...: any additional keyword arguments are passed to the measures_func\n\n\n\n\n\n","category":"method"},{"location":"man/deep_learning/#JudiLing.fiddl-Tuple{Union{SparseArrays.SparseMatrixCSC, Matrix}, Union{SparseArrays.SparseMatrixCSC, Matrix}, Vector, DataFrames.DataFrame, Union{String, Symbol}, String}","page":"Deep learning","title":"JudiLing.fiddl","text":"fiddl(X_train::Union{SparseMatrixCSC,Matrix},\n Y_train::Union{SparseMatrixCSC,Matrix},\n learn_seq::Vector,\n data::DataFrame,\n target_col::Union{Symbol, String},\n model_outpath::String;\n hidden_dim::Int=1000,\n batchsize::Int=64,\n loss_func::Function=Flux.mse,\n optimizer=Flux.Adam(0.001),\n model::Union{Missing, Chain} = missing,\n return_losses::Bool=false,\n verbose::Bool=true,\n n_batch_eval::Int=100,\n compute_accuracy::Bool=true,\n measures_func::Union{Function, Missing}=missing,\n kargs...)\n\nTrains a deep learning model using the FIDDL method (frequency-informed deep discriminative learning). Optionally, after every n_batch_eval batches measures_func can be run to compute any measures which are then added to the data.\n\nnote: Note\nIf you get an OutOfMemory error, chances are that this is due to the eval_SC function being evaluated after every n_batch_eval batches. Setting compute_accuracy=false disables computing the mapping accuracy.\n\nReturns a named tuple with the following values:\n\nmodel: the trained model\ndata: the data, including any measures if computed by measures_func\nlosses_train: The losses of the data the model is trained on, computed over each window of n_batch_eval batches.\nlosses: The losses of the full dataset after every n_batch_eval batches.\naccs: The accuracies of the full dataset after every n_batch_eval batches.\n\nObligatory arguments\n\nX_train::Union{SparseMatrixCSC,Matrix}: training input matrix of dimension m x n\nY_train::Union{SparseMatrixCSC,Matrix}: training output/target matrix of dimension m x k\nlearn_seq::Vector: List of indices in the order that the vectors in X_train and Y_train should be presented to the model for training.\ndata::DataFrame: The full data.\ntarget_col::Union{Symbol, String}: The column with target word forms in the data.\nmodel_outpath::String: filepath to where final model should be stored (in .bson format)\n\nOptional arguments\n\nhidden_dim::Int=1000: hidden dimension of the model\nbatchsize::Int=64: batch size during training\nloss_func::Function=Flux.mse: Loss function. Per default this is the mse loss, but other options might be a crossentropy loss (Flux.crossentropy). 
Make sure the model makes sense with the loss function!\noptimizer=Flux.Adam(0.001): optimizer to use for training\nmodel::Union{Missing, Chain} = missing: A custom model can be provided for training. It has to correspond to the input and output sizes of the training data\nreturn_losses::Bool=false: whether, in addition to the model, the losses and accuracies computed every n_batch_eval batches should be returned\nverbose::Bool=true: Turn on verbose mode\nn_batch_eval::Int=100: Loss, accuracy and measures_func are evaluated every n_batch_eval batches.\ncompute_accuracy::Bool=true: Whether accuracy should be computed every n_batch_eval batches.\nmeasures_func::Union{Missing, Function}=missing: A measures function which is run every n_batch_eval batches. For more information see The measures_func argument.\n\n\n\n\n\n","category":"method"},
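A minimal usage sketch for fiddl (assuming a cue matrix C, a semantic matrix S and a dataset data with a frequency column, as in the quick start; the output path "latin_fiddl.bson" is illustrative):

learn_seq = JudiLing.make_learn_seq(data.frequency)
res = JudiLing.fiddl(C, S, learn_seq, data, :Word, "latin_fiddl.bson",
                     n_batch_eval=50, compute_accuracy=true)

Here make_learn_seq (see Widrow-Hoff Learning) turns token frequencies into the frequency-informed learning sequence that fiddl expects; the call above trains a comprehension mapping from C to S.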
{"location":"man/pickle/","page":"Pickle","title":"Pickle","text":"CurrentModule = JudiLing","category":"page"},{"location":"man/pickle/#Utils","page":"Pickle","title":"Utils","text":"","category":"section"},{"location":"man/pickle/","page":"Pickle","title":"Pickle","text":" save_pickle\n load_pickle","category":"page"},{"location":"man/pickle/#JudiLing.save_pickle","page":"Pickle","title":"JudiLing.save_pickle","text":"Save an object to a python pickle file.\n\n\n\n\n\n","category":"function"},{"location":"man/pickle/#JudiLing.load_pickle","page":"Pickle","title":"JudiLing.load_pickle","text":"Load an object from a python pickle file.\n\n\n\n\n\n","category":"function"},{"location":"man/measures_func/#The-measures_func-argument","page":"Measures function","title":"The measures_func argument","text":"","category":"section"},{"location":"man/measures_func/","page":"Measures function","title":"Measures function","text":"The deep learning functions get_and_train_model and fiddl take a measures_func as one of their arguments. This allows measures to be computed during training. For this to work, the measures_func has to conform to the following format.","category":"page"},{"location":"man/measures_func/#For-get_and_train_model","page":"Measures function","title":"For get_and_train_model","text":"","category":"section"},{"location":"man/measures_func/","page":"Measures function","title":"Measures function","text":"data_train, data_val = measures_func(X_train,\n Y_train,\n X_val,\n Y_val,\n Yhat_train,\n Yhat_val,\n data_train,\n data_val,\n target_col,\n model,\n epoch;\n kargs...)\n\n## Input\n\n- `X_train`: The input training matrix.\n- `Y_train`: The target training matrix.\n- `X_val`: The input validation matrix.\n- `Y_val`: The target validation matrix.\n- `Yhat_train`: The predicted training matrix.\n- `Yhat_val`: The predicted validation matrix.\n- `data_train`: The training dataset.\n- `data_val`: The validation dataset.\n- `target_col`: The name of the column with the target wordforms in the datasets.\n- `model`: The trained model.\n- `epoch`: The epoch the training is currently in.\n- `kargs...`: Any other keyword arguments that should be passed to the function.\n\nNote: the `kargs` are just keyword arguments that are passed on from the parameters of `get_and_train_model` to the `measures_func`. For example, this could be a suffix that should be added to each added column in `measures_func`.\n\n## Output\nThe function has to return the training and validation dataframes.","category":"page"},{"location":"man/measures_func/#Example","page":"Measures function","title":"Example","text":"","category":"section"},{"location":"man/measures_func/","page":"Measures function","title":"Measures function","text":"Define a measures_func. This one computes target correlations for both training and validation datasets.","category":"page"},{"location":"man/measures_func/","page":"Measures function","title":"Measures function","text":"function compute_target_corr(X_train, Y_train, X_val, Y_val,\n Yhat_train, Yhat_val, data_train,\n data_val, target_col, model, epoch)\n _, corr = JudiLing.eval_SC(Yhat_train, Y_train, R=true)\n data_train[!, string(\"target_corr_\", epoch)] = diag(corr)\n _, corr = JudiLing.eval_SC(Yhat_val, Y_val, R=true)\n data_val[!, string(\"target_corr_\", epoch)] = diag(corr)\n return(data_train, data_val)\nend","category":"page"},{"location":"man/measures_func/","page":"Measures function","title":"Measures function","text":"Train a model for 100 epochs, calling compute_target_corr after each epoch.","category":"page"},{"location":"man/measures_func/","page":"Measures function","title":"Measures function","text":"res = JudiLing.get_and_train_model(cue_obj_train.C,\n S_train,\n cue_obj_val.C,\n S_val,\n train, val,\n :Word,\n \"test.bson\",\n return_losses=true,\n batchsize=3,\n measures_func=compute_target_corr)\n","category":"page"},{"location":"man/measures_func/#For-fiddl","page":"Measures function","title":"For fiddl","text":"","category":"section"},{"location":"man/measures_func/","page":"Measures function","title":"Measures function","text":"data = measures_func(X_train,\n Y_train,\n Yhat_train,\n data,\n target_col,\n model,\n step;\n kargs...)\n\n## Input\n\n- `X_train`: The input matrix of the full dataset.\n- `Y_train`: The target matrix of the full dataset.\n- `Yhat_train`: The predicted matrix of the full dataset at the current step.\n- `data`: The full dataset.\n- `target_col`: The name of the column with the target wordforms in the dataset.\n- `model`: The trained model.\n- `step`: The step the training is currently in.\n- `kargs...`: Any other keyword arguments that should be passed to the function.\n\nNote: the `kargs` are just keyword arguments that are passed on from the parameters of `fiddl` to the `measures_func`. For example, this could be a suffix that should be added to each added column in `measures_func`.\n\n## Output\nThe function has to return the dataset.","category":"page"},
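By analogy with the compute_target_corr example above, a measures_func for fiddl could look like this (a sketch following the signature just shown; the column name is illustrative):

function compute_target_corr_fiddl(X_train, Y_train, Yhat_train, data,
                                   target_col, model, step; kargs...)
    # correlation of each predicted row with its target row
    _, corr = JudiLing.eval_SC(Yhat_train, Y_train, R=true)
    data[!, string("target_corr_", step)] = diag(corr)
    return data
end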
{"location":"man/pyndl/","page":"Pyndl","title":"Pyndl","text":"CurrentModule = JudiLing","category":"page"},{"location":"man/pyndl/","page":"Pyndl","title":"Pyndl","text":"JudiLing is able to call the python package pyndl internally to compute NDL models. pyndl uses event files to compute the mapping matrices, which have to be generated manually or by using pyndl in Python, see the documentation here. The advantage of calling pyndl from JudiLing is that the resulting weights, cue and semantic matrices can be directly translated into JudiLing format and further processing can be done in JudiLing.","category":"page"},{"location":"man/pyndl/","page":"Pyndl","title":"Pyndl","text":"note: Note\nFor pyndl to be available in JudiLing, PyCall has to be imported before JudiLing:\nusing PyCall\nusing JudiLing","category":"page"},{"location":"man/pyndl/#Calling-pyndl-from-JudiLing","page":"Pyndl","title":"Calling pyndl from JudiLing","text":"","category":"section"},{"location":"man/pyndl/","page":"Pyndl","title":"Pyndl","text":" Pyndl_Weight_Struct\n pyndl(\n data_path::String;\n alpha::Float64 = 0.1,\n betas::Tuple{Float64,Float64} = (0.1, 0.1),\n method::String = \"openmp\"\n )","category":"page"},{"location":"man/pyndl/#JudiLing.Pyndl_Weight_Struct","page":"Pyndl","title":"JudiLing.Pyndl_Weight_Struct","text":"Pyndl_Weight_Struct\n cues::Vector{String}\n outcomes::Vector{String}\n weight::Matrix{Float64}\n\ncues::Vector{String}: Vector of cues, in the order that they appear in the weight matrix.\noutcomes::Vector{String}: Vector of outcomes, in the order that they appear in the weight matrix.\nweight::Matrix{Float64}: Weight matrix.\n\n\n\n\n\n","category":"type"},{"location":"man/pyndl/#JudiLing.pyndl-Tuple{String}","page":"Pyndl","title":"JudiLing.pyndl","text":"pyndl(\n data_path::String;\n alpha::Float64 = 0.1,\n betas::Tuple{Float64,Float64} = (0.1, 0.1),\n method::String = \"openmp\"\n)\n\nCompute weights using pyndl. See the documentation of pyndl for more information: https://pyndl.readthedocs.io/en/latest/\n\nObligatory arguments\n\ndata_path::String: Path to an events file as generated by pyndl's preprocess.create_event_file\n\nOptional arguments\n\nalpha::Float64 = 0.1: α learning rate.\nbetas::Tuple{Float64,Float64} = (0.1, 0.1): β1 and β2 learning rates.\nmethod::String = \"openmp\": One of {\"openmp\", \"threading\"}. 
\"openmp\" only works on Linux.\n\nExample\n\nweights = JudiLing.pyndl(\"data/latin_train_events.tab.gz\")\n\n\n\n\n\n","category":"method"},{"location":"man/pyndl/#Translating-output-of-pyndl-to-cue-and-semantic-matrices-in-JudiLing","page":"Pyndl","title":"Translating output of pyndl to cue and semantic matrices in JudiLing","text":"","category":"section"},{"location":"man/pyndl/","page":"Pyndl","title":"Pyndl","text":"With the weights in hand, the cue and semantic matrices can be computed:","category":"page"},{"location":"man/pyndl/","page":"Pyndl","title":"Pyndl","text":" make_cue_matrix(\n data::DataFrame,\n pyndl_weights::Pyndl_Weight_Struct;\n grams = 3,\n target_col = \"Words\",\n tokenized = false,\n sep_token = nothing,\n keep_sep = false,\n start_end_token = \"#\",\n verbose = false,\n )\n make_S_matrix(\n data::DataFrame,\n pyndl_weights::Pyndl_Weight_Struct,\n n_features_columns::Vector;\n tokenized::Bool=false,\n sep_token::String=\"_\"\n )\n make_S_matrix(\n data_train::DataFrame,\n data_val::DataFrame,\n pyndl_weights::Pyndl_Weight_Struct,\n n_features_columns::Vector;\n tokenized::Bool=false,\n sep_token::String=\"_\"\n )","category":"page"},{"location":"man/pyndl/#JudiLing.make_cue_matrix-Tuple{DataFrames.DataFrame, JudiLing.Pyndl_Weight_Struct}","page":"Pyndl","title":"JudiLing.make_cue_matrix","text":"make_cue_matrix(\n data::DataFrame,\n pyndl_weights::Pyndl_Weight_Struct;\n grams = 3,\n target_col = \"Words\",\n tokenized = false,\n sep_token = nothing,\n keep_sep = false,\n start_end_token = \"#\",\n verbose = false,\n)\n\nMake the cue matrix based on a dataframe and weights computed with pyndl. Practically this means that the cues are extracted from the weights object and translated to the JudiLing format.\n\nObligatory arguments\n\ndata::DataFrame: Dataset with all the word types on which the weights were trained.\npyndl_weights::Pyndl_Weight_Struct: Weights trained with JudiLing.pyndl\n\nOptional argyments\n\ngrams = 3: N-gram size (has to match the n-gram granularity of the cues on which the weights were trained).\ntarget_col = \"Words\": Column with target words.\ntokenized = false: Whether the target words are already tokenized\nsep_token = nothing: The string separating the tokens (only used if tokenized=true).\nkeep_sep = false: Whether the sep_token should be retained in the cues.\nstart_end_token = \"#\": The string with which to mark word boundaries.\nverbose = false: Verbose mode.\n\nExample\n\nweights = JudiLing.pyndl(\"data/latin_train_events.tab.gz\")\ncue_obj = JudiLing.make_cue_matrix(\"latin_train.csv\", weights,\n grams = 3,\n target_col = \"Word\")\n\n\n\n\n\n","category":"method"},{"location":"man/pyndl/#JudiLing.make_S_matrix-Tuple{DataFrames.DataFrame, JudiLing.Pyndl_Weight_Struct, Vector}","page":"Pyndl","title":"JudiLing.make_S_matrix","text":"make_S_matrix(\n data::DataFrame,\n pyndl_weights::Pyndl_Weight_Struct,\n n_features_columns::Vector;\n tokenized::Bool=false,\n sep_token::String=\"_\"\n)\n\nCreate semantic matrix based on a dataframe and weights computed with pyndl. Practically this means that the semantic features are extracted from the weights object and translated to the JudiLing format.\n\nObligatory arguments\n\ndata::DataFrame: The dataset with word types.\npyndl_weights::Pyndl_Weight_Struct: Weights trained with JudiLing.pyndl.\nn_features_columns::Vector: Vector of columns with the features in the dataset.\n\nOptional arguments\n\ntokenized=false: Whether the features in n_features_columns columns are already tokenized (e.g. 
\"feature1_feature2_feature3\")\nsep_token=\"_\": The string with which the features are separated (only used if tokenized=false).\n\nExample\n\nweights = JudiLing.pyndl(\"data/latin_train_events.tab.gz\")\nS = JudiLing.make_S_matrix(data,\n weights_latin,\n [\"Lexeme\", \"Person\", \"Number\", \"Tense\", \"Voice\", \"Mood\"],\n tokenized=false)\n\n\n\n\n\n","category":"method"},{"location":"man/pyndl/#JudiLing.make_S_matrix-Tuple{DataFrames.DataFrame, DataFrames.DataFrame, JudiLing.Pyndl_Weight_Struct, Vector}","page":"Pyndl","title":"JudiLing.make_S_matrix","text":"make_S_matrix(\n data_train::DataFrame,\n data_val::DataFrame,\n pyndl_weights::Pyndl_Weight_Struct,\n n_features_columns::Vector;\n tokenized::Bool=false,\n sep_token::String=\"_\"\n)\n\nCreate semantic matrix based on a training and validation dataframe and weights computed with pyndl. Practically this means that the semantic features are extracted from the weights object and translated to the JudiLing format.\n\nObligatory arguments\n\ndata_train::DataFrame: The training dataset.\ndata_val::DataFrame: The validation dataset.\npyndl_weights::Pyndl_Weight_Struct: Weights trained with JudiLing.pyndl.\nn_features_columns::Vector: Vector of columns with the features in the training and validation datasets.\n\nOptional arguments\n\ntokenized=false: Whether the features in n_features_columns columns are already tokenized (e.g. \"feature1_feature2_feature3\")\nsep_token=\"_\": The string with which the features are separated (only used if tokenized=false).\n\nExample\n\nweights = JudiLing.pyndl(\"data/latin_train_events.tab.gz\")\nS_train, S_val = JudiLing.make_S_matrix(train,\n val,\n weights_latin,\n [\"Lexeme\", \"Person\", \"Number\", \"Tense\", \"Voice\", \"Mood\"],\n tokenized=false)\n\n\n\n\n\n","category":"method"},{"location":"man/wh/","page":"Widrow-Hoff Learning","title":"Widrow-Hoff Learning","text":"CurrentModule = JudiLing","category":"page"},{"location":"man/wh/#Utils","page":"Widrow-Hoff Learning","title":"Utils","text":"","category":"section"},{"location":"man/wh/","page":"Widrow-Hoff Learning","title":"Widrow-Hoff Learning","text":" wh_learn(\n X,\n Y;\n eta = 0.01,\n n_epochs = 1,\n weights = nothing,\n learn_seq = nothing,\n save_history = false,\n history_cols = nothing,\n history_rows = nothing,\n verbose = false,\n )\n make_learn_seq(freq; random_seed = 314)","category":"page"},{"location":"man/wh/#JudiLing.wh_learn-Tuple{Any, Any}","page":"Widrow-Hoff Learning","title":"JudiLing.wh_learn","text":"wh_learn(\n X,\n Y;\n eta = 0.01,\n n_epochs = 1,\n weights = nothing,\n learn_seq = nothing,\n save_history = false,\n history_cols = nothing,\n history_rows = nothing,\n verbose = false,\n )\n\nWidrow-Hoff Learning.\n\nObligatory Arguments\n\ntest_mode::Symbol: which test mode, currently supports :trainonly, :presplit, :carefulsplit and :randomsplit.\n\nOptional Arguments\n\neta::Float64=0.1: the learning rate\nn_epochs::Int64=1: the number of epochs to be trained\nweights::Matrix=nothing: the initial weights\nlearn_seq::Vector=nothing: the learning sequence\nsave_history::Bool=false: if true, a partical training history will be saved\nhistory_cols::Vector=nothing: the list of column indices you want to saved in history, e.g. [1,32,42] or [2]\nhistory_rows::Vector=nothing: the list of row indices you want to saved in history, e.g. 
{"location":"man/wh/","page":"Widrow-Hoff Learning","title":"Widrow-Hoff Learning","text":"CurrentModule = JudiLing","category":"page"},{"location":"man/wh/#Utils","page":"Widrow-Hoff Learning","title":"Utils","text":"","category":"section"},{"location":"man/wh/","page":"Widrow-Hoff Learning","title":"Widrow-Hoff Learning","text":" wh_learn(\n X,\n Y;\n eta = 0.01,\n n_epochs = 1,\n weights = nothing,\n learn_seq = nothing,\n save_history = false,\n history_cols = nothing,\n history_rows = nothing,\n verbose = false,\n )\n make_learn_seq(freq; random_seed = 314)","category":"page"},{"location":"man/wh/#JudiLing.wh_learn-Tuple{Any, Any}","page":"Widrow-Hoff Learning","title":"JudiLing.wh_learn","text":"wh_learn(\n X,\n Y;\n eta = 0.01,\n n_epochs = 1,\n weights = nothing,\n learn_seq = nothing,\n save_history = false,\n history_cols = nothing,\n history_rows = nothing,\n verbose = false,\n )\n\nWidrow-Hoff Learning.\n\nObligatory Arguments\n\nX: the input matrix\nY: the output/target matrix\n\nOptional Arguments\n\neta::Float64=0.01: the learning rate\nn_epochs::Int64=1: the number of epochs to be trained\nweights::Matrix=nothing: the initial weights\nlearn_seq::Vector=nothing: the learning sequence\nsave_history::Bool=false: if true, a partial training history will be saved\nhistory_cols::Vector=nothing: the list of column indices you want to save in history, e.g. [1,32,42] or [2]\nhistory_rows::Vector=nothing: the list of row indices you want to save in history, e.g. [1,32,42] or [2]\nverbose::Bool = false: if true, more information will be printed out\n\n\n\n\n\n","category":"method"},
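A minimal sketch of frequency-informed Widrow-Hoff learning (C, S and data.frequency as in earlier examples; it is assumed here that wh_learn returns the learned mapping matrix):

learn_seq = JudiLing.make_learn_seq(data.frequency)
F = JudiLing.wh_learn(C, S,
                      eta = 0.01,
                      n_epochs = 1,
                      learn_seq = learn_seq)

make_learn_seq, documented directly below, creates the frequency-informed ordering of learning events.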
{"location":"man/wh/#JudiLing.make_learn_seq-Tuple{Any}","page":"Widrow-Hoff Learning","title":"JudiLing.make_learn_seq","text":"make_learn_seq(freq; random_seed = 314)\n\nMake Widrow-Hoff learning sequence from frequencies. Creates a randomly ordered sequence of indices in which each index appears according to its frequency.\n\nObligatory arguments\n\nfreq: Vector with frequencies.\n\nOptional arguments\n\nrandom_seed = 314: Random seed to control randomness.\n\nExample\n\nlearn_seq = JudiLing.make_learn_seq(data.frequency)\n\n\n\n\n\n","category":"method"},{"location":"man/utils/","page":"Utils","title":"Utils","text":"CurrentModule = JudiLing","category":"page"},{"location":"man/utils/#Utils","page":"Utils","title":"Utils","text":"","category":"section"},{"location":"man/utils/","page":"Utils","title":"Utils","text":" iscorrect\n display_pred\n translate\n translate_path\n is_truly_sparse\n isattachable\n iscomplete\n isstart\n isnovel\n check_used_token\n cal_max_timestep","category":"page"},{"location":"man/utils/#JudiLing.iscorrect","page":"Utils","title":"JudiLing.iscorrect","text":"Check whether the predictions are correct.\n\n\n\n\n\n","category":"function"},{"location":"man/utils/#JudiLing.display_pred","page":"Utils","title":"JudiLing.display_pred","text":"Display prediction nicely.\n\n\n\n\n\n","category":"function"},{"location":"man/utils/#JudiLing.translate","page":"Utils","title":"JudiLing.translate","text":"Translate indices into words or utterances.\n\n\n\n\n\n","category":"function"},{"location":"man/utils/#JudiLing.translate_path","page":"Utils","title":"JudiLing.translate_path","text":"Append indices together to form a path.\n\n\n\n\n\n","category":"function"},{"location":"man/utils/#JudiLing.is_truly_sparse","page":"Utils","title":"JudiLing.is_truly_sparse","text":"Check whether a matrix is truly sparse regardless of its format, where M is originally a sparse matrix format.\n\n\n\n\n\nCheck whether a matrix is truly sparse regardless of its format, where M is originally a dense matrix format.\n\n\n\n\n\n","category":"function"},{"location":"man/utils/#JudiLing.isattachable","page":"Utils","title":"JudiLing.isattachable","text":"Check whether a gram can attach to another gram.\n\n\n\n\n\nCheck whether a gram can attach to another gram.\n\n\n\n\n\n","category":"function"},{"location":"man/utils/#JudiLing.iscomplete","page":"Utils","title":"JudiLing.iscomplete","text":"Check whether a path is complete.\n\n\n\n\n\n","category":"function"},{"location":"man/utils/#JudiLing.isstart","page":"Utils","title":"JudiLing.isstart","text":"Check whether a gram can start a path.\n\n\n\n\n\n","category":"function"},{"location":"man/utils/#JudiLing.isnovel","page":"Utils","title":"JudiLing.isnovel","text":"Check whether a predicted path is novel, i.e. not seen in the training data.\n\n\n\n\n\n","category":"function"},{"location":"man/utils/#JudiLing.check_used_token","page":"Utils","title":"JudiLing.check_used_token","text":"Check whether there are tokens already used in dataset as n-gram components.\n\n\n\n\n\n","category":"function"},{"location":"man/utils/#JudiLing.cal_max_timestep","page":"Utils","title":"JudiLing.cal_max_timestep","text":"function cal_max_timestep(\n data_train::DataFrame,\n data_val::DataFrame,\n target_col::Union{String, Symbol};\n tokenized::Bool = false,\n sep_token::Union{Nothing, String, Char} = \"\",\n)\n\nCalculate the max timestep given training and validation datasets.\n\nObligatory Arguments\n\ndata_train::DataFrame: the training dataset\ndata_val::DataFrame: the validation dataset\ntarget_col::Union{String, Symbol}: the column with the target word forms\n\nOptional Arguments\n\ntokenized::Bool = false: Whether the word forms in the target_col are already tokenized\nsep_token::Union{Nothing, String, Char} = \"\": The token with which the word forms are tokenized\n\nExamples\n\nJudiLing.cal_max_timestep(latin_train, latin_val, :Word)\n\n\n\n\n\nfunction cal_max_timestep(\n data::DataFrame,\n target_col::Union{String, Symbol};\n tokenized::Bool = false,\n sep_token::Union{Nothing, String, Char} = \"\",\n)\n\nCalculate the max timestep given a training dataset.\n\nObligatory Arguments\n\ndata::DataFrame: the dataset\ntarget_col::Union{String, Symbol}: the column with the target word forms\n\nOptional Arguments\n\ntokenized::Bool = false: Whether the word forms in the target_col are already tokenized\nsep_token::Union{Nothing, String, Char} = \"\": The token with which the word forms are tokenized\n\nExamples\n\nJudiLing.cal_max_timestep(latin, :Word)\n\n\n\n\n\n","category":"function"},{"location":"man/make_adjacency_matrix/","page":"Make Adjacency Matrix","title":"Make Adjacency Matrix","text":"CurrentModule = JudiLing","category":"page"},{"location":"man/make_adjacency_matrix/#Make-Adjacency-Matrix","page":"Make Adjacency Matrix","title":"Make Adjacency Matrix","text":"","category":"section"},{"location":"man/make_adjacency_matrix/","page":"Make Adjacency Matrix","title":"Make Adjacency Matrix","text":" make_full_adjacency_matrix\n make_full_adjacency_matrix(i2f)\n make_combined_adjacency_matrix(data_train, data_val)","category":"page"},{"location":"man/make_adjacency_matrix/#JudiLing.make_full_adjacency_matrix","page":"Make Adjacency Matrix","title":"JudiLing.make_full_adjacency_matrix","text":"make_adjacency_matrix(i2f)\n\nMake full adjacency matrix based only on the form of n-grams regardless of whether they are seen in the training data. This usually takes hours for large datasets, as all possible combinations are considered.\n\nObligatory Arguments\n\ni2f::Dict: the dictionary returning features given indices\n\nOptional Arguments\n\ntokenized::Bool=false: if true, the dataset target is assumed to be tokenized\nsep_token::Union{Nothing, String, Char}=nothing: separator token\nverbose::Bool=false: if true, more information will be printed\n\nExamples\n\n# without tokenization\ni2f = Dict([(1, \"#ab\"), (2, \"abc\"), (3, \"bc#\"), (4, \"#bc\"), (5, \"ab#\")])\nJudiLing.make_adjacency_matrix(i2f)\n\n# with tokenization\ni2f = Dict([(1, \"#-a-b\"), (2, \"a-b-c\"), (3, \"b-c-#\"), (4, \"#-b-c\"), (5, \"a-b-#\")])\nJudiLing.make_adjacency_matrix(\n i2f,\n tokenized=true,\n sep_token=\"-\")\n\n\n\n\n\n","category":"function"},{"location":"man/make_adjacency_matrix/#JudiLing.make_full_adjacency_matrix-Tuple{Any}","page":"Make Adjacency Matrix","title":"JudiLing.make_full_adjacency_matrix","text":"make_adjacency_matrix(i2f)\n\nMake full adjacency matrix based only on the form of n-grams regardless of whether they are seen in the training data. 
This usually takes hours for large datasets, as all possible combinations are considered.\n\nObligatory Arguments\n\ni2f::Dict: the dictionary returning features given indices\n\nOptional Arguments\n\ntokenized::Bool=false: if true, the dataset target is assumed to be tokenized\nsep_token::Union{Nothing, String, Char}=nothing: separator token\nverbose::Bool=false: if true, more information will be printed\n\nExamples\n\n# without tokenization\ni2f = Dict([(1, \"#ab\"), (2, \"abc\"), (3, \"bc#\"), (4, \"#bc\"), (5, \"ab#\")])\nJudiLing.make_adjacency_matrix(i2f)\n\n# with tokenization\ni2f = Dict([(1, \"#-a-b\"), (2, \"a-b-c\"), (3, \"b-c-#\"), (4, \"#-b-c\"), (5, \"a-b-#\")])\nJudiLing.make_adjacency_matrix(\n i2f,\n tokenized=true,\n sep_token=\"-\")\n\n\n\n\n\n","category":"method"},{"location":"man/make_adjacency_matrix/#JudiLing.make_combined_adjacency_matrix-Tuple{Any, Any}","page":"Make Adjacency Matrix","title":"JudiLing.make_combined_adjacency_matrix","text":"make_combined_adjacency_matrix(data_train, data_val)\n\nMake combined adjacency matrix.\n\nObligatory Arguments\n\ndata_train::DataFrame: training dataset\ndata_val::DataFrame: validation dataset\n\nOptional Arguments\n\ngrams=3: the number of grams for cues\ntarget_col=:Words: the column name for target strings\ntokenized=false: if true, the dataset target is assumed to be tokenized\nsep_token=nothing: separator\nkeep_sep=false: if true, keep separators in cues\nstart_end_token=\"#\": start and end token in boundary cues\nverbose=false: if true, more information is printed\n\nExamples\n\nJudiLing.make_combined_adjacency_matrix(\n latin_train,\n latin_val,\n grams=3,\n target_col=:Word,\n tokenized=false,\n keep_sep=false\n )\n\n\n\n\n\n","category":"method"},{"location":"man/cholesky/","page":"Cholesky","title":"Cholesky","text":"CurrentModule = JudiLing","category":"page"},{"location":"man/cholesky/#Cholesky","page":"Cholesky","title":"Cholesky","text":"","category":"section"},{"location":"man/cholesky/","page":"Cholesky","title":"Cholesky","text":" make_transform_fac\n make_transform_matrix\n make_transform_fac(X::SparseMatrixCSC)\n make_transform_fac(X::Matrix)\n make_transform_matrix(fac::Union{LinearAlgebra.Cholesky, SuiteSparse.CHOLMOD.Factor}, X::Union{SparseMatrixCSC, Matrix}, Y::Union{SparseMatrixCSC, Matrix})\n make_transform_matrix(X::SparseMatrixCSC, Y::Matrix)\n make_transform_matrix(X::Matrix, Y::Union{SparseMatrixCSC, Matrix})\n make_transform_matrix(X::SparseMatrixCSC, Y::SparseMatrixCSC)\n format_matrix(M::Union{SparseMatrixCSC, Matrix}, output_format=:auto)","category":"page"},{"location":"man/cholesky/#JudiLing.make_transform_fac","page":"Cholesky","title":"JudiLing.make_transform_fac","text":"The first part of make_transform_matrix, usually used by the learn_paths function to save time and computing resources.\n\n\n\n\n\n","category":"function"},{"location":"man/cholesky/#JudiLing.make_transform_matrix","page":"Cholesky","title":"JudiLing.make_transform_matrix","text":"Using Cholesky decomposition to calculate the transformation matrix from S to C or from C to S.\n\n\n\n\n\n","category":"function"},{"location":"man/cholesky/#JudiLing.make_transform_fac-Tuple{SparseArrays.SparseMatrixCSC}","page":"Cholesky","title":"JudiLing.make_transform_fac","text":"make_transform_fac(X::SparseMatrixCSC)\n\nCalculate the first step of Cholesky decomposition for sparse 
matrices.\n\n\n\n\n\n","category":"method"},{"location":"man/cholesky/#JudiLing.make_transform_fac-Tuple{Matrix}","page":"Cholesky","title":"JudiLing.make_transform_fac","text":"make_transform_fac(X::Matrix)\n\nCalculate the first step of Cholesky decomposition for dense matrices.\n\n\n\n\n\n","category":"method"},{"location":"man/cholesky/#JudiLing.make_transform_matrix-Tuple{Union{SparseArrays.CHOLMOD.Factor, LinearAlgebra.Cholesky}, Union{SparseArrays.SparseMatrixCSC, Matrix}, Union{SparseArrays.SparseMatrixCSC, Matrix}}","page":"Cholesky","title":"JudiLing.make_transform_matrix","text":"make_transform_matrix(fac::Union{LinearAlgebra.Cholesky, SuiteSparse.CHOLMOD.Factor}, X::Union{SparseMatrixCSC, Matrix}, Y::Union{SparseMatrixCSC, Matrix})\n\nSecond step in calculating the Cholesky decomposition for the transformation matrix.\n\n\n\n\n\n","category":"method"},
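A short sketch of this two-step use (C and S as cue and semantic matrices from the quick start; the factorization can be reused across several solves, which is what learn_paths does to save time and computing resources):

fac = JudiLing.make_transform_fac(C)
F = JudiLing.make_transform_matrix(fac, C, S)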
{"location":"man/cholesky/#JudiLing.make_transform_matrix-Tuple{SparseArrays.SparseMatrixCSC, Matrix}","page":"Cholesky","title":"JudiLing.make_transform_matrix","text":"make_transform_matrix(X::SparseMatrixCSC, Y::Matrix)\n\nUse Cholesky decomposition to calculate the transformation matrix from X to Y, where X is a sparse matrix and Y is a dense matrix.\n\nObligatory Arguments\n\nX::SparseMatrixCSC: the X matrix, where X is a sparse matrix\nY::Matrix: the Y matrix, where Y is a dense matrix\n\nOptional Arguments\n\nmethod::Symbol = :additive: whether :additive or :multiplicative decomposition is required\nshift::Float64 = 0.02: shift value for :additive decomposition\nmultiplier::Float64 = 1.01: multiplier value for :multiplicative decomposition\noutput_format::Symbol = :auto: force the output format to dense (:dense) or sparse (:sparse), or leave it as :auto to let the program decide\nsparse_ratio::Float64 = 0.05: the ratio to decide whether a matrix is sparse\nverbose::Bool = false: if true, more information will be printed out\n\nExamples\n\n# additive mode\nJudiLing.make_transform_matrix(\n C,\n S,\n method = :additive,\n shift = 0.02,\n verbose = false)\n\n# multiplicative mode\nJudiLing.make_transform_matrix(\n C,\n S,\n method = :multiplicative,\n multiplier = 1.01,\n verbose = false)\n\n# further control of sparsity ratio\nJudiLing.make_transform_matrix(\n ...\n output_format = :auto,\n sparse_ratio = 0.05,\n ...)\n\n\n\n\n\n","category":"method"},{"location":"man/cholesky/#JudiLing.make_transform_matrix-Tuple{Matrix, Union{SparseArrays.SparseMatrixCSC, Matrix}}","page":"Cholesky","title":"JudiLing.make_transform_matrix","text":"make_transform_matrix(X::Matrix, Y::Union{SparseMatrixCSC, Matrix})\n\nUse the Cholesky decomposition to calculate the transformation matrix from X to Y, where X is a dense matrix and Y is either a dense matrix or a sparse matrix.\n\nObligatory Arguments\n\nX::Matrix: the X matrix, where X is a dense matrix\nY::Union{SparseMatrixCSC, Matrix}: the Y matrix, where Y is either a sparse or a dense matrix\n\nOptional Arguments\n\nmethod::Symbol = :additive: whether :additive or :multiplicative decomposition is required\nshift::Float64 = 0.02: shift value for :additive decomposition\nmultiplier::Float64 = 1.01: multiplier value for :multiplicative decomposition\noutput_format::Symbol = :auto: force the output format to dense (:dense) or sparse (:sparse), or leave it as :auto to let the program decide\nsparse_ratio::Float64 = 0.05: the ratio to decide whether a matrix is sparse\nverbose::Bool = false: if true, more information will be printed out\n\nExamples\n\n# additive mode\nJudiLing.make_transform_matrix(\n C,\n S,\n method = :additive,\n shift = 0.02,\n verbose = false)\n\n# multiplicative mode\nJudiLing.make_transform_matrix(\n C,\n S,\n method = :multiplicative,\n multiplier = 1.01,\n verbose = false)\n\n# further control of sparsity ratio\nJudiLing.make_transform_matrix(\n ...\n output_format = :auto,\n sparse_ratio = 0.05,\n ...)\n\n\n\n\n\n","category":"method"},{"location":"man/cholesky/#JudiLing.make_transform_matrix-Tuple{SparseArrays.SparseMatrixCSC, SparseArrays.SparseMatrixCSC}","page":"Cholesky","title":"JudiLing.make_transform_matrix","text":"make_transform_matrix(X::SparseMatrixCSC, Y::SparseMatrixCSC)\n\nUse the Cholesky decomposition to calculate the transformation matrix from X to Y, where X is a sparse matrix and Y is a sparse matrix.\n\nObligatory Arguments\n\nX::SparseMatrixCSC: the X matrix, where X is a sparse matrix\nY::SparseMatrixCSC: the Y matrix, where Y is a sparse matrix\n\nOptional Arguments\n\nmethod::Symbol = :additive: whether :additive or :multiplicative decomposition is required\nshift::Float64 = 0.02: shift value for :additive decomposition\nmultiplier::Float64 = 1.01: multiplier value for :multiplicative decomposition\noutput_format::Symbol = :auto: force the output format to dense (:dense) or sparse (:sparse), or leave it as :auto to let the program decide\nsparse_ratio::Float64 = 0.05: the ratio to decide whether a matrix is sparse\nverbose::Bool = false: if true, more information will be printed out\n\nExamples\n\n# additive mode\nJudiLing.make_transform_matrix(\n C,\n S,\n method = :additive,\n shift = 0.02,\n verbose = false)\n\n# multiplicative mode\nJudiLing.make_transform_matrix(\n C,\n S,\n method = :multiplicative,\n multiplier = 1.01,\n verbose = false)\n\n# further control of sparsity ratio\nJudiLing.make_transform_matrix(\n ...\n output_format = :auto,\n sparse_ratio = 0.05,\n ...)\n\n\n\n\n\n","category":"method"},{"location":"man/cholesky/#JudiLing.format_matrix","page":"Cholesky","title":"JudiLing.format_matrix","text":"format_matrix(M::Union{SparseMatrixCSC, Matrix}, output_format=:auto)\n\nConvert output matrix format to either a dense matrix or a sparse matrix.\n\n\n\n\n\n","category":"function"},{"location":"man/make_semantic_matrix/","page":"Make Semantic Matrix","title":"Make Semantic Matrix","text":"CurrentModule = JudiLing","category":"page"},{"location":"man/make_semantic_matrix/#Make-Semantic-Matrix","page":"Make Semantic Matrix","title":"Make Semantic Matrix","text":"","category":"section"},{"location":"man/make_semantic_matrix/#Make-binary-semantic-vectors","page":"Make Semantic Matrix","title":"Make binary semantic vectors","text":"","category":"section"},{"location":"man/make_semantic_matrix/","page":"Make Semantic Matrix","title":"Make Semantic Matrix","text":" PS_Matrix_Struct\n make_pS_matrix\n make_pS_matrix(data)\n make_pS_matrix(data_val, pS_obj)\n make_combined_pS_matrix(\n data_train,\n data_val;\n features_col = :CommunicativeIntention,\n sep_token = \"_\",\n )","category":"page"},{"location":"man/make_semantic_matrix/#JudiLing.PS_Matrix_Struct","page":"Make Semantic Matrix","title":"JudiLing.PS_Matrix_Struct","text":"A structure that stores the discrete semantic vectors: pS is the discrete semantic matrix; f2i is a dictionary returning the indices for features; i2f is a dictionary returning the features for indices.\n\n\n\n\n\n","category":"type"},{"location":"man/make_semantic_matrix/#JudiLing.make_pS_matrix","page":"Make Semantic 
Matrix","title":"JudiLing.make_pS_matrix","text":"Make discrete semantic matrix.\n\n\n\n\n\n","category":"function"},{"location":"man/make_semantic_matrix/#JudiLing.make_pS_matrix-Tuple{Any}","page":"Make Semantic Matrix","title":"JudiLing.make_pS_matrix","text":"make_pS_matrix(data)\n\nCreate a discrete semantic matrix given a dataframe.\n\nObligatory Arguments\n\ndata::DataFrame: the dataset\n\nOptional Arguments\n\nfeatures_col::Symbol=:CommunicativeIntention: the column name for target\nsep_token::String=\"_\": separator\n\nExamples\n\ns_obj_train = JudiLing.make_pS_matrix(\n utterance,\n features_col=:CommunicativeIntention,\n sep_token=\"_\")\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.make_pS_matrix-Tuple{Any, Any}","page":"Make Semantic Matrix","title":"JudiLing.make_pS_matrix","text":"make_pS_matrix(data_val, pS_obj)\n\nConstruct discrete semantic matrix for the validation datasets given by the exemplar in the dataframe, and given the S matrix for the training datasets.\n\nObligatory Arguments\n\ndata_val::DataFrame: the dataset\npS_obj::PS_Matrix_Struct: training PS object\n\nOptional Arguments\n\nfeatures_col::Symbol=:CommunicativeIntention: the column name for target\nsep_token::String=\"_\": separator\n\nExamples\n\ns_obj_val = JudiLing.make_pS_matrix(\n data_val,\n s_obj_train,\n features_col=:CommunicativeIntention,\n sep_token=\"_\")\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.make_combined_pS_matrix-Tuple{Any, Any}","page":"Make Semantic Matrix","title":"JudiLing.make_combined_pS_matrix","text":"make_combined_pS_matrix(\n data_train,\n data_val;\n features_col = :CommunicativeIntention,\n sep_token = \"_\",\n)\n\nCreate discrete semantic matrices for a train and validation dataframe.\n\nObligatory Arguments\n\ndata_train::DataFrame: the training dataset\ndata_val::DataFrame: the validation dataset\n\nOptional Arguments\n\nfeatures_col::Symbol=:CommunicativeIntention: the column name for target\nsep_token::String=\"_\": separator\n\nExamples\n\ns_obj_train, s_obj_val = JudiLing.make_combined_pS_matrix(\n data_train,\n data_val,\n features_col=:CommunicativeIntention,\n sep_token=\"_\")\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#Simulate-semantic-vectors","page":"Make Semantic Matrix","title":"Simulate semantic vectors","text":"","category":"section"},{"location":"man/make_semantic_matrix/","page":"Make Semantic Matrix","title":"Make Semantic Matrix","text":" L_Matrix_Struct\n make_S_matrix\n make_L_matrix\n make_combined_S_matrix\n make_combined_L_matrix\n make_S_matrix(data::DataFrame, base::Vector, inflections::Vector)\n make_S_matrix(data_train::DataFrame, data_val::DataFrame, base::Vector, inflections::Vector)\n make_S_matrix(data::DataFrame, base::Vector)\n make_S_matrix(data_train::DataFrame, data_val::DataFrame, base::Vector)\n make_S_matrix(data_train::DataFrame, base::Vector, inflections::Vector, L::L_Matrix_Struct)\n make_S_matrix(data_train::DataFrame, data_val::Union{DataFrame, Nothing}, base::Vector, L::L_Matrix_Struct)\n make_S_matrix(data::DataFrame, base::Vector, L::L_Matrix_Struct)\n make_S_matrix(data_train::DataFrame, data_val::DataFrame, base::Vector, inflections::Vector, L::L_Matrix_Struct)\n make_L_matrix(data::DataFrame, base::Vector)\n make_combined_S_matrix(data_train::DataFrame, data_val::DataFrame, base::Vector, inflections::Vector, L::L_Matrix_Struct)\n make_combined_S_matrix(data_train::DataFrame, data_val::Union{DataFrame, Nothing}, 
base::Vector, L::L_Matrix_Struct)\n make_combined_S_matrix( data_train::DataFrame, data_val::DataFrame, base::Vector, inflections::Vector)\n make_combined_S_matrix(data_train::DataFrame, data_val::DataFrame, base::Vector)\n make_combined_L_matrix(data_train::DataFrame, data_val::DataFrame, base::Vector, inflections::Vector)\n make_combined_L_matrix(data_train::DataFrame, data_val::DataFrame, base::Vector)\n L_Matrix_Struct(L, sd_base, sd_base_mean, sd_inflection, sd_inflection_mean, base_f, infl_f, base_f2i, infl_f2i, n_base_f, n_infl_f, ncol)\n L_Matrix_Struct(L, sd_base, sd_inflection, base_f, infl_f, base_f2i, infl_f2i, n_base_f, n_infl_f, ncol)","category":"page"},{"location":"man/make_semantic_matrix/#JudiLing.L_Matrix_Struct","page":"Make Semantic Matrix","title":"JudiLing.L_Matrix_Struct","text":"A structure that stores Lexome semantic vectors: L is the Lexome semantic matrix; f2i is a dictionary returning the indices for features; i2f is a dictionary returning the features for indices.\n\n\n\n\n\n","category":"type"},{"location":"man/make_semantic_matrix/#JudiLing.make_S_matrix","page":"Make Semantic Matrix","title":"JudiLing.make_S_matrix","text":"Make simulated semantic matrix.\n\n\n\n\n\n","category":"function"},{"location":"man/make_semantic_matrix/#JudiLing.make_L_matrix","page":"Make Semantic Matrix","title":"JudiLing.make_L_matrix","text":"Make simulated lexome matrix.\n\n\n\n\n\n","category":"function"},{"location":"man/make_semantic_matrix/#JudiLing.make_combined_S_matrix","page":"Make Semantic Matrix","title":"JudiLing.make_combined_S_matrix","text":"Make combined simulated S matrices, where features are combined from both the training and validation datasets.\n\n\n\n\n\n","category":"function"},{"location":"man/make_semantic_matrix/#JudiLing.make_combined_L_matrix","page":"Make Semantic Matrix","title":"JudiLing.make_combined_L_matrix","text":"Make a combined simulated Lexome matrix, where features are combined from both the training and validation datasets.\n\n\n\n\n\n","category":"function"},{"location":"man/make_semantic_matrix/#JudiLing.make_S_matrix-Tuple{DataFrames.DataFrame, Vector, Vector}","page":"Make Semantic Matrix","title":"JudiLing.make_S_matrix","text":"make_S_matrix(data::DataFrame, base::Vector, inflections::Vector)\n\nCreate a simulated semantic matrix for the training dataset, given the input data, a vector specifying the context lexemes and a vector specifying the grammatical lexemes. 
The semantic vector of a word form is constructed by summing the semantic vectors of its context and grammatical lexemes.\n\nObligatory Arguments\n\ndata::DataFrame: the dataset\nbase::Vector: context lexemes\ninflections::Vector: grammatical lexemes\n\nOptional Arguments\n\nncol::Int64=200: dimension of semantic vectors, usually the same as that of cue vectors\nsd_base_mean::Int64=1: the sd mean of base features\nsd_inflection_mean::Int64=1: the sd mean of inflectional features\nsd_base::Int64=4: the sd of base features\nsd_inflection::Int64=4: the sd of inflectional features\nseed::Int64=314: the random seed\nisdeep::Bool=true: if true, the mean of each feature is also randomized\nadd_noise::Bool=true: if true, add additional Gaussian noise\nsd_noise::Int64=1: the sd of the Gaussian noise\nnormalized::Bool=false: if true, most of the values range between -1 and 1; they may slightly exceed this range depending on the sd\n\nExamples\n\n# basic usage\nS_train = JudiLing.make_S_matrix(\n french,\n [\"Lexeme\"],\n [\"Tense\",\"Aspect\",\"Person\",\"Number\",\"Gender\",\"Class\",\"Mood\"],\n ncol=200)\n\n# deep mode\nS_train = JudiLing.make_S_matrix(\n ...\n sd_base_mean=1,\n sd_inflection_mean=1,\n isdeep=true,\n ...)\n\n# non-deep mode\nS_train = JudiLing.make_S_matrix(\n ...\n isdeep=false,\n ...)\n\n# add additional Gaussian noise\nS_train = JudiLing.make_S_matrix(\n ...\n add_noise=true,\n sd_noise=1,\n ...)\n\n# further control of means and standard deviations\nS_train = JudiLing.make_S_matrix(\n ...\n sd_base_mean=1,\n sd_inflection_mean=1,\n sd_base=4,\n sd_inflection=4,\n sd_noise=1,\n ...)\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.make_S_matrix-Tuple{DataFrames.DataFrame, DataFrames.DataFrame, Vector, Vector}","page":"Make Semantic Matrix","title":"JudiLing.make_S_matrix","text":"make_S_matrix(data_train::DataFrame, data_val::DataFrame, base::Vector, inflections::Vector)\n\nCreate simulated semantic matrices for the training and validation datasets, given the input data, a vector specifying the context lexemes and a vector specifying the grammatical lexemes. 
The semantic vector of a word form is constructed by summing the semantic vectors of its context and grammatical lexemes.\n\nObligatory Arguments\n\ndata_train::DataFrame: the training dataset\ndata_val::DataFrame: the validation dataset\nbase::Vector: context lexemes\ninflections::Vector: grammatical lexemes\n\nOptional Arguments\n\nncol::Int64=200: dimension of semantic vectors, usually the same as that of cue vectors\nsd_base_mean::Int64=1: the sd mean of base features\nsd_inflection_mean::Int64=1: the sd mean of inflectional features\nsd_base::Int64=4: the sd of base features\nsd_inflection::Int64=4: the sd of inflectional features\nseed::Int64=314: the random seed\nisdeep::Bool=true: if true, the mean of each feature is also randomized\nadd_noise::Bool=true: if true, add additional Gaussian noise\nsd_noise::Int64=1: the sd of the Gaussian noise\nnormalized::Bool=false: if true, most of the values range between -1 and 1; they may slightly exceed this range depending on the sd\n\nExamples\n\n# basic usage\nS_train, S_val = JudiLing.make_S_matrix(\n french,\n french_val,\n [\"Lexeme\"],\n [\"Tense\",\"Aspect\",\"Person\",\"Number\",\"Gender\",\"Class\",\"Mood\"],\n ncol=200)\n\n# deep mode\nS_train, S_val = JudiLing.make_S_matrix(\n ...\n sd_base_mean=1,\n sd_inflection_mean=1,\n isdeep=true,\n ...)\n\n# non-deep mode\nS_train, S_val = JudiLing.make_S_matrix(\n ...\n isdeep=false,\n ...)\n\n# add additional Gaussian noise\nS_train, S_val = JudiLing.make_S_matrix(\n ...\n add_noise=true,\n sd_noise=1,\n ...)\n\n# further control of means and standard deviations\nS_train, S_val = JudiLing.make_S_matrix(\n ...\n sd_base_mean=1,\n sd_inflection_mean=1,\n sd_base=4,\n sd_inflection=4,\n sd_noise=1,\n ...)\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.make_S_matrix-Tuple{DataFrames.DataFrame, Vector}","page":"Make Semantic Matrix","title":"JudiLing.make_S_matrix","text":"make_S_matrix(data::DataFrame, base::Vector)\n\nCreate a simulated semantic matrix for the training dataset with only base features, given the input data and a vector specifying the context lexemes. 
The semantic vector of a word form is constructed by summing the semantic vectors of its context lexemes.\n\nObligatory Arguments\n\ndata::DataFrame: the dataset\nbase::Vector: context lexemes\n\nOptional Arguments\n\nncol::Int64=200: dimension of semantic vectors, usually the same as that of cue vectors\nsd_base_mean::Int64=1: the sd mean of base features\nsd_base::Int64=4: the sd of base features\nseed::Int64=314: the random seed\nisdeep::Bool=true: if true, the mean of each feature is also randomized\nadd_noise::Bool=true: if true, add additional Gaussian noise\nsd_noise::Int64=1: the sd of the Gaussian noise\nnormalized::Bool=false: if true, most of the values range between -1 and 1; they may slightly exceed this range depending on the sd\n\nExamples\n\n# basic usage\nS_train = JudiLing.make_S_matrix(\n french,\n [\"Lexeme\"],\n ncol=200)\n\n# deep mode\nS_train = JudiLing.make_S_matrix(\n ...\n sd_base_mean=1,\n sd_inflection_mean=1,\n isdeep=true,\n ...)\n\n# non-deep mode\nS_train = JudiLing.make_S_matrix(\n ...\n isdeep=false,\n ...)\n\n# add additional Gaussian noise\nS_train = JudiLing.make_S_matrix(\n ...\n add_noise=true,\n sd_noise=1,\n ...)\n\n# further control of means and standard deviations\nS_train = JudiLing.make_S_matrix(\n ...\n sd_base_mean=1,\n sd_inflection_mean=1,\n sd_base=4,\n sd_inflection=4,\n sd_noise=1,\n ...)\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.make_S_matrix-Tuple{DataFrames.DataFrame, DataFrames.DataFrame, Vector}","page":"Make Semantic Matrix","title":"JudiLing.make_S_matrix","text":"make_S_matrix(data_train::DataFrame, data_val::DataFrame, base::Vector)\n\nCreate simulated semantic matrices for the training and validation datasets with only base features, given the input data and a vector specifying the context lexemes. 
The semantic vector of a word form is constructed by summing the semantic vectors of its context lexemes.\n\nObligatory Arguments\n\ndata_train::DataFrame: the training dataset\ndata_val::DataFrame: the validation dataset\nbase::Vector: context lexemes\n\nOptional Arguments\n\nncol::Int64=200: dimension of semantic vectors, usually the same as that of cue vectors\nsd_base_mean::Int64=1: the sd mean of base features\nsd_base::Int64=4: the sd of base features\nseed::Int64=314: the random seed\nisdeep::Bool=true: if true, the mean of each feature is also randomized\nadd_noise::Bool=true: if true, add additional Gaussian noise\nsd_noise::Int64=1: the sd of the Gaussian noise\nnormalized::Bool=false: if true, most of the values range between -1 and 1; they may slightly exceed this range depending on the sd\n\nExamples\n\n# basic usage\nS_train, S_val = JudiLing.make_S_matrix(\n french,\n french_val,\n [\"Lexeme\"],\n ncol=200)\n\n# deep mode\nS_train, S_val = JudiLing.make_S_matrix(\n ...\n sd_base_mean=1,\n sd_inflection_mean=1,\n isdeep=true,\n ...)\n\n# non-deep mode\nS_train, S_val = JudiLing.make_S_matrix(\n ...\n isdeep=false,\n ...)\n\n# add additional Gaussian noise\nS_train, S_val = JudiLing.make_S_matrix(\n ...\n add_noise=true,\n sd_noise=1,\n ...)\n\n# further control of means and standard deviations\nS_train, S_val = JudiLing.make_S_matrix(\n ...\n sd_base_mean=1,\n sd_inflection_mean=1,\n sd_base=4,\n sd_inflection=4,\n sd_noise=1,\n ...)\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.make_S_matrix-Tuple{DataFrames.DataFrame, Vector, Vector, JudiLing.L_Matrix_Struct}","page":"Make Semantic Matrix","title":"JudiLing.make_S_matrix","text":"make_S_matrix(data_train::DataFrame, base::Vector, inflections::Vector, L::L_Matrix_Struct)\n\nCreate a simulated semantic matrix where a lexome matrix is available.\n\nObligatory Arguments\n\ndata_train::DataFrame: the dataset\nbase::Vector: context lexemes\ninflections::Vector: grammatical lexemes\nL::L_Matrix_Struct: the lexome matrix\n\nOptional Arguments\n\nadd_noise::Bool=true: if true, add additional Gaussian noise\nsd_noise::Int64=1: the sd of the Gaussian noise\nnormalized::Bool=false: if true, most of the values range between -1 and 1; they may slightly exceed this range depending on the sd\n\nExamples\n\n# basic usage\nS1 = JudiLing.make_S_matrix(\n latin,\n [\"Lexeme\"],\n [\"Person\",\"Number\",\"Tense\",\"Voice\",\"Mood\"],\n L1,\n add_noise=true,\n sd_noise=1,\n normalized=false\n )\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.make_S_matrix-Tuple{DataFrames.DataFrame, Union{Nothing, DataFrames.DataFrame}, Vector, JudiLing.L_Matrix_Struct}","page":"Make Semantic Matrix","title":"JudiLing.make_S_matrix","text":"make_S_matrix(data_train::DataFrame, data_val::Union{DataFrame, Nothing}, base::Vector, L::L_Matrix_Struct)\n\nCreate simulated semantic matrices where a lexome matrix is available.\n\nObligatory Arguments\n\ndata_train::DataFrame: the training dataset\ndata_val::DataFrame: the validation dataset\nbase::Vector: context lexemes\nL::L_Matrix_Struct: the lexome matrix\n\nOptional Arguments\n\nadd_noise::Bool=true: if true, add additional Gaussian noise\nsd_noise::Int64=1: the sd of the Gaussian noise\nnormalized::Bool=false: if true, most of the values range between -1 and 1; they may slightly exceed this range depending on the sd\n\nExamples\n\n# basic usage\nS1, S2 = JudiLing.make_S_matrix(\n latin,\n latin_val,\n [\"Lexeme\"],\n L1,\n add_noise=true,\n sd_noise=1,\n 
normalized=false\n )\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.make_S_matrix-Tuple{DataFrames.DataFrame, Vector, JudiLing.L_Matrix_Struct}","page":"Make Semantic Matrix","title":"JudiLing.make_S_matrix","text":"make_S_matrix(data::DataFrame, base::Vector, L::L_Matrix_Struct)\n\nCreate a simulated semantic matrix where a lexome matrix is available.\n\nObligatory Arguments\n\ndata::DataFrame: the dataset\nbase::Vector: context lexemes\nL::L_Matrix_Struct: the lexome matrix\n\nOptional Arguments\n\nadd_noise::Bool=true: if true, add additional Gaussian noise\nsd_noise::Int64=1: the sd of the Gaussian noise\nnormalized::Bool=false: if true, most of the values range between -1 and 1; they may slightly exceed this range depending on the sd\n\nExamples\n\n# basic usage\nS1 = JudiLing.make_S_matrix(\n latin,\n [\"Lexeme\"],\n L1,\n add_noise=true,\n sd_noise=1,\n normalized=false\n )\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.make_S_matrix-Tuple{DataFrames.DataFrame, DataFrames.DataFrame, Vector, Vector, JudiLing.L_Matrix_Struct}","page":"Make Semantic Matrix","title":"JudiLing.make_S_matrix","text":"make_S_matrix(data_train::DataFrame, data_val::DataFrame, base::Vector, inflections::Vector, L::L_Matrix_Struct)\n\nCreate simulated semantic matrices where a lexome matrix is available.\n\nObligatory Arguments\n\ndata_train::DataFrame: the training dataset\ndata_val::DataFrame: the validation dataset\nbase::Vector: context lexemes\ninflections::Vector: grammatical lexemes\nL::L_Matrix_Struct: the lexome matrix\n\nOptional Arguments\n\nadd_noise::Bool=true: if true, add additional Gaussian noise\nsd_noise::Int64=1: the sd of the Gaussian noise\nnormalized::Bool=false: if true, most of the values range between -1 and 1; they may slightly exceed this range depending on the sd\n\nExamples\n\n# basic usage\nS1, S2 = JudiLing.make_S_matrix(\n latin,\n latin_val,\n [\"Lexeme\"],\n [\"Person\",\"Number\",\"Tense\",\"Voice\",\"Mood\"],\n L1,\n add_noise=true,\n sd_noise=1,\n normalized=false\n )\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.make_L_matrix-Tuple{DataFrames.DataFrame, Vector}","page":"Make Semantic Matrix","title":"JudiLing.make_L_matrix","text":"make_L_matrix(data::DataFrame, base::Vector)\n\nCreate a Lexome Matrix with simulated semantic vectors where there are only base features.\n\nObligatory Arguments\n\ndata::DataFrame: the dataset\nbase::Vector: context lexemes\n\nOptional Arguments\n\nncol::Int64=200: dimension of semantic vectors, usually the same as that of cue vectors\nsd_base_mean::Int64=1: the sd mean of base features\nsd_base::Int64=4: the sd of base features\nseed::Int64=314: the random seed\nisdeep::Bool=true: if true, the mean of each feature is also randomized\n\nExamples\n\n# basic usage\nL = JudiLing.make_L_matrix(\n latin,\n [\"Lexeme\"],\n ncol=200)\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.make_combined_S_matrix-Tuple{DataFrames.DataFrame, DataFrames.DataFrame, Vector, Vector, JudiLing.L_Matrix_Struct}","page":"Make Semantic Matrix","title":"JudiLing.make_combined_S_matrix","text":"make_combined_S_matrix(data_train::DataFrame, data_val::DataFrame, base::Vector, inflections::Vector, L::L_Matrix_Struct)\n\nCreate simulated semantic matrices for the training and validation datasets with an existing Lexome matrix, where features are combined from both the training and validation datasets.\n\nObligatory 
Arguments\n\ndata_train::DataFrame: the training dataset\ndata_val::DataFrame: the validation dataset\nbase::Vector: context lexemes\ninflections::Vector: grammatical lexemes\nL::L_Matrix_Struct: the lexome matrix\n\nOptional Arguments\n\nadd_noise::Bool=true: if true, add additional Gaussian noise\nsd_noise::Int64=1: the sd of the Gaussian noise\nnormalized::Bool=false: if true, most of the values range between -1 and 1; they may slightly exceed -1 or 1 depending on the sd\n\nExamples\n\n# basic usage\nS_train, S_val = JudiLing.make_combined_S_matrix(\n    latin_train,\n    latin_val,\n    [\"Lexeme\"],\n    [\"Person\",\"Number\",\"Tense\",\"Voice\",\"Mood\"],\n    L)\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.make_combined_S_matrix-Tuple{DataFrames.DataFrame, Union{Nothing, DataFrames.DataFrame}, Vector, JudiLing.L_Matrix_Struct}","page":"Make Semantic Matrix","title":"JudiLing.make_combined_S_matrix","text":"make_combined_S_matrix(data_train::DataFrame, data_val::Union{DataFrame, Nothing}, base::Vector, L::L_Matrix_Struct)\n\nCreate simulated semantic matrices for the training and validation datasets from an existing lexome matrix, where features are combined from both the training and validation datasets.\n\nObligatory Arguments\n\ndata_train::DataFrame: the training dataset\ndata_val::DataFrame: the validation dataset\nbase::Vector: context lexemes\nL::L_Matrix_Struct: the lexome matrix\n\nOptional Arguments\n\nadd_noise::Bool=true: if true, add additional Gaussian noise\nsd_noise::Int64=1: the sd of the Gaussian noise\nnormalized::Bool=false: if true, most of the values range between -1 and 1; they may slightly exceed -1 or 1 depending on the sd\n\nExamples\n\n# basic usage\nS_train, S_val = JudiLing.make_combined_S_matrix(\n    latin_train,\n    latin_val,\n    [\"Lexeme\"],\n    [\"Person\",\"Number\",\"Tense\",\"Voice\",\"Mood\"],\n    L)\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.make_combined_S_matrix-Tuple{DataFrames.DataFrame, DataFrames.DataFrame, Vector, Vector}","page":"Make Semantic Matrix","title":"JudiLing.make_combined_S_matrix","text":"make_combined_S_matrix( data_train::DataFrame, data_val::DataFrame, base::Vector, inflections::Vector)\n\nCreate simulated semantic matrices for the training and validation datasets, where features are combined from both the training and validation datasets.\n\nObligatory Arguments\n\ndata_train::DataFrame: the training dataset\ndata_val::DataFrame: the validation dataset\nbase::Vector: context lexemes\ninflections::Vector: grammatical lexemes\n\nOptional Arguments\n\nncol::Int64=200: dimension of semantic vectors, usually the same as that of cue vectors\nsd_base_mean::Int64=1: the sd of the means of base features\nsd_inflection_mean::Int64=1: the sd of the means of inflectional features\nsd_base::Int64=4: the sd of base features\nsd_inflection::Int64=4: the sd of inflectional features\nseed::Int64=314: the random seed\nisdeep::Bool=true: if true, the mean of each feature is also randomized\nadd_noise::Bool=true: if true, add additional Gaussian noise\nsd_noise::Int64=1: the sd of the Gaussian noise\nnormalized::Bool=false: if true, most of the values range between -1 and 1; they may slightly exceed -1 or 1 depending on the sd\n\nExamples\n\n# basic usage\nS_train, S_val = JudiLing.make_combined_S_matrix(\n    latin_train,\n    latin_val,\n    [\"Lexeme\"],\n    [\"Person\",\"Number\",\"Tense\",\"Voice\",\"Mood\"],\n    
ncol=n_features)\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.make_combined_S_matrix-Tuple{DataFrames.DataFrame, DataFrames.DataFrame, Vector}","page":"Make Semantic Matrix","title":"JudiLing.make_combined_S_matrix","text":"make_combined_S_matrix(data_train::DataFrame, data_val::DataFrame, base::Vector)\n\nCreate simulated semantic matrices for the training and validation datasets, where features are combined from both the training and validation datasets.\n\nObligatory Arguments\n\ndata_train::DataFrame: the training dataset\ndata_val::DataFrame: the validation dataset\nbase::Vector: context lexemes\n\nOptional Arguments\n\nncol::Int64=200: dimension of semantic vectors, usually the same as that of cue vectors\nsd_base_mean::Int64=1: the sd of the means of base features\nsd_inflection_mean::Int64=1: the sd of the means of inflectional features\nsd_base::Int64=4: the sd of base features\nsd_inflection::Int64=4: the sd of inflectional features\nseed::Int64=314: the random seed\nisdeep::Bool=true: if true, the mean of each feature is also randomized\nadd_noise::Bool=true: if true, add additional Gaussian noise\nsd_noise::Int64=1: the sd of the Gaussian noise\nnormalized::Bool=false: if true, most of the values range between -1 and 1; they may slightly exceed -1 or 1 depending on the sd\n\nExamples\n\n# basic usage\nS_train, S_val = JudiLing.make_combined_S_matrix(\n    latin_train,\n    latin_val,\n    [\"Lexeme\"],\n    [\"Person\",\"Number\",\"Tense\",\"Voice\",\"Mood\"],\n    ncol=n_features)\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.make_combined_L_matrix-Tuple{DataFrames.DataFrame, DataFrames.DataFrame, Vector, Vector}","page":"Make Semantic Matrix","title":"JudiLing.make_combined_L_matrix","text":"make_combined_L_matrix(data_train::DataFrame, data_val::DataFrame, base::Vector, inflections::Vector)\n\nCreate a lexome matrix with simulated semantic vectors, where features are combined from both the training and validation datasets.\n\nObligatory Arguments\n\ndata_train::DataFrame: the training dataset\ndata_val::DataFrame: the validation dataset\nbase::Vector: context lexemes\ninflections::Vector: grammatical lexemes\n\nOptional Arguments\n\nncol::Int64=200: dimension of semantic vectors, usually the same as that of cue vectors\nsd_base_mean::Int64=1: the sd of the means of base features\nsd_inflection_mean::Int64=1: the sd of the means of inflectional features\nsd_base::Int64=4: the sd of base features\nsd_inflection::Int64=4: the sd of inflectional features\nseed::Int64=314: the random seed\nisdeep::Bool=true: if true, the mean of each feature is also randomized\n\nExamples\n\n# basic usage\nL = JudiLing.make_combined_L_matrix(\n    latin_train,\n    latin_val,\n    [\"Lexeme\"],\n    [\"Person\",\"Number\",\"Tense\",\"Voice\",\"Mood\"],\n    ncol=n_features)\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.make_combined_L_matrix-Tuple{DataFrames.DataFrame, DataFrames.DataFrame, Vector}","page":"Make Semantic Matrix","title":"JudiLing.make_combined_L_matrix","text":"make_combined_L_matrix(data_train::DataFrame, data_val::DataFrame, base::Vector)\n\nCreate a lexome matrix with simulated semantic vectors, where features are combined from both the training and validation datasets.\n\nObligatory Arguments\n\ndata_train::DataFrame: the training dataset\ndata_val::DataFrame: the validation dataset\nbase::Vector: context lexemes\n\nOptional Arguments\n\nncol::Int64=200: dimension of semantic vectors, usually the same as that of cue 
vectors\nsd_base_mean::Int64=1: the sd of the means of base features\nsd_inflection_mean::Int64=1: the sd of the means of inflectional features\nsd_base::Int64=4: the sd of base features\nsd_inflection::Int64=4: the sd of inflectional features\nseed::Int64=314: the random seed\nisdeep::Bool=true: if true, the mean of each feature is also randomized\n\nExamples\n\n# basic usage\nL = JudiLing.make_combined_L_matrix(\n    latin_train,\n    latin_val,\n    [\"Lexeme\"],\n    ncol=n_features)\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.L_Matrix_Struct-NTuple{12, Any}","page":"Make Semantic Matrix","title":"JudiLing.L_Matrix_Struct","text":"L_Matrix_Struct(L, sd_base, sd_base_mean, sd_inflection, sd_inflection_mean, base_f, infl_f, base_f2i, infl_f2i, n_base_f, n_infl_f, ncol)\n\nConstruct L_Matrix_Struct with deep mode.\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.L_Matrix_Struct-NTuple{10, Any}","page":"Make Semantic Matrix","title":"JudiLing.L_Matrix_Struct","text":"L_Matrix_Struct(L, sd_base, sd_inflection, base_f, infl_f, base_f2i, infl_f2i, n_base_f, n_infl_f, ncol)\n\nConstruct L_Matrix_Struct without deep mode.\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#Load-from-word2vec,-fasttext-or-similar","page":"Make Semantic Matrix","title":"Load from word2vec, fasttext or similar","text":"","category":"section"},{"location":"man/make_semantic_matrix/","page":"Make Semantic Matrix","title":"Make Semantic Matrix","text":"load_S_matrix_from_fasttext(data::DataFrame,\n                            language::Symbol;\n                            target_col=:Word,\n                            default_file::Int=1)\n load_S_matrix_from_fasttext(data_train::DataFrame,\n                            data_val::DataFrame,\n                            language::Symbol;\n                            target_col=:Word,\n                            default_file::Int=1)\n load_S_matrix_from_word2vec_file(data::DataFrame,\n                            filepath::String;\n                            target_col=:Word)\n load_S_matrix_from_word2vec_file(data_train::DataFrame,\n                            data_val::DataFrame,\n                            filepath::String;\n                            target_col=:Word)\n load_S_matrix_from_fasttext_file(data::DataFrame,\n                            filepath::String;\n                            target_col=:Word)\n load_S_matrix_from_fasttext_file(data_train::DataFrame,\n                            data_val::DataFrame,\n                            filepath::String;\n                            target_col=:Word)","category":"page"},{"location":"man/make_semantic_matrix/#JudiLing.load_S_matrix_from_fasttext-Tuple{DataFrames.DataFrame, Symbol}","page":"Make Semantic Matrix","title":"JudiLing.load_S_matrix_from_fasttext","text":"load_S_matrix_from_fasttext(data::DataFrame,\n                            language::Symbol;\n                            target_col=:Word,\n                            default_file::Int=1)\n\nLoad semantic matrix from fasttext, loaded using the Embeddings.jl package. Subset fasttext vectors to include only words in target_col of data, and subset data to only include words in target_col for which a semantic vector is available.\n\nThe last parameter, default_file, specifies which vectors are loaded. To learn about all available vectors, use the following commands:\n\nusing Embeddings\nlanguage_files(FastText_Text{:nl})\n\nreplacing the language code (here :nl) with the language you are interested in. In general, for all languages other than English, these files are available:\n\ndefault_file=1 loads from https://fasttext.cc/docs/en/crawl-vectors.html, paper: E. Grave, P. Bojanowski, P. Gupta, A. Joulin, T. Mikolov, Learning Word Vectors for 157 Languages. License: CC BY-SA 3.0 https://creativecommons.org/licenses/by-sa/3.0/\ndefault_file=2 loads from https://fasttext.cc/docs/en/pretrained-vectors.html, paper: P. Bojanowski, E. Grave, A. Joulin, T. 
Mikolov, Enriching Word Vectors with Subword Information. License: CC BY-SA 3.0 https://creativecommons.org/licenses/by-sa/3.0/\n\nObligatory Arguments\n\ndata::DataFrame: the dataset\nlanguage::Symbol: the language of the words in the dataset, officially ISO 639-2 (see https://github.com/JuliaText/Embeddings.jl/issues/34#issuecomment-782604523), but in practice it behaves more like ISO 639-1, with ISO 639-2 only being used if ISO 639-1 isn't available (see https://en.wikipedia.org/wiki/List_of_ISO_639-2_codes)\n\nOptional Arguments\n\ntarget_col=:Word: column with orthographic representation of words in data\ndefault_file::Int=1: source of vectors, for more information see above and here: https://github.com/JuliaText/Embeddings.jl#loading-different-embeddings\n\nExamples\n\n# basic usage\nlatin_small, S = JudiLing.load_S_matrix_from_fasttext(latin, :la, target_col=:Word)\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.load_S_matrix_from_fasttext-Tuple{DataFrames.DataFrame, DataFrames.DataFrame, Symbol}","page":"Make Semantic Matrix","title":"JudiLing.load_S_matrix_from_fasttext","text":"load_S_matrix_from_fasttext(data_train::DataFrame,\n                            data_val::DataFrame,\n                            language::Symbol;\n                            target_col=:Word,\n                            default_file::Int=1)\n\nLoad semantic matrix from fasttext, loaded using the Embeddings.jl package. Subset fasttext vectors to include only words in target_col of data_train and data_val, and subset the data to only include words in target_col for which a semantic vector is available. Returns the subsetted train and val data and the train and val semantic matrices.\n\nThe last parameter, default_file, specifies which vectors are loaded. To learn about all available vectors, use the following commands:\n\nusing Embeddings\nlanguage_files(FastText_Text{:nl})\n\nreplacing the language code (here :nl) with the language you are interested in. In general, for all languages other than English, these files are available:\n\ndefault_file=1 loads from https://fasttext.cc/docs/en/crawl-vectors.html, paper: E. Grave, P. Bojanowski, P. Gupta, A. Joulin, T. Mikolov, Learning Word Vectors for 157 Languages. License: CC BY-SA 3.0 https://creativecommons.org/licenses/by-sa/3.0/\ndefault_file=2 loads from https://fasttext.cc/docs/en/pretrained-vectors.html, paper: P. Bojanowski, E. Grave, A. Joulin, T. 
Mikolov, Enriching Word Vectors with Subword Information. License: CC BY-SA 3.0 https://creativecommons.org/licenses/by-sa/3.0/\n\nObligatory Arguments\n\ndata_train::DataFrame: the training dataset\ndata_val::DataFrame: the validation dataset\nlanguage::Symbol: the language of the words in the dataset, officially ISO 639-2 (see https://github.com/JuliaText/Embeddings.jl/issues/34#issuecomment-782604523), but in practice it behaves more like ISO 639-1, with ISO 639-2 only being used if ISO 639-1 isn't available (see https://en.wikipedia.org/wiki/List_of_ISO_639-2_codes)\n\nOptional Arguments\n\ntarget_col=:Word: column with orthographic representation of words in data\ndefault_file::Int=1: source of vectors, for more information see above and here: https://github.com/JuliaText/Embeddings.jl#loading-different-embeddings\n\nExamples\n\n# basic usage\nlatin_small_train, latin_small_val, S_train, S_val = JudiLing.load_S_matrix_from_fasttext(latin_train,\n        latin_val,\n        :la,\n        target_col=:Word)\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.load_S_matrix_from_word2vec_file-Tuple{DataFrames.DataFrame, String}","page":"Make Semantic Matrix","title":"JudiLing.load_S_matrix_from_word2vec_file","text":"load_S_matrix_from_word2vec_file(data::DataFrame,\n                            filepath::String;\n                            target_col=:Word)\n\nLoad semantic matrix from word2vec filepath. Subset word2vec vectors to include only words in target_col of data, and subset data to only include words in target_col for which a semantic vector is available. Returns the subsetted data and semantic matrix.\n\nObligatory Arguments\n\ndata::DataFrame: the dataset\nfilepath::String: path to file with word2vec vectors in .txt (not compressed in any way)\n\nOptional Arguments\n\ntarget_col=:Word: column with orthographic representation of words in data\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.load_S_matrix_from_word2vec_file-Tuple{DataFrames.DataFrame, DataFrames.DataFrame, String}","page":"Make Semantic Matrix","title":"JudiLing.load_S_matrix_from_word2vec_file","text":"load_S_matrix_from_word2vec_file(data_train::DataFrame,\n                            data_val::DataFrame,\n                            filepath::String;\n                            target_col=:Word)\n\nLoad semantic matrix from word2vec filepath. Subset word2vec vectors to include only words in target_col of data_train and data_val, and subset the data to only include words in target_col for which a semantic vector is available. Returns the subsetted train and val data and the train and val semantic matrices.\n\nObligatory Arguments\n\ndata_train::DataFrame: the training dataset\ndata_val::DataFrame: the validation dataset\nfilepath::String: path to file with word2vec vectors in .txt (not compressed in any way)\n\nOptional Arguments\n\ntarget_col=:Word: column with orthographic representation of words in data\n\n\n\n\n\n","category":"method"},
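All of the file-based loaders above follow the same pattern: they drop dataset rows without a vector and drop vectors without a dataset row, so the returned objects stay row-aligned. A minimal hedged sketch of that round trip; the CSV path, the vector file embeddings/la_vectors.txt and the :Word column are placeholders, not files shipped with JudiLing:

```julia
using JudiLing, CSV, DataFrames

# hypothetical inputs; substitute your own dataset and (uncompressed) vector file
latin = DataFrame(CSV.File(joinpath("data", "latin.csv")))

# returns the dataset restricted to words that have a vector, plus the
# matching semantic matrix (row i of S belongs to row i of latin_small)
latin_small, S = JudiLing.load_S_matrix_from_word2vec_file(
    latin,
    joinpath("embeddings", "la_vectors.txt"),
    target_col = :Word,
)

@assert size(S, 1) == nrow(latin_small)  # one semantic vector per remaining row
```

The train/val variants behave the same way but subset both datasets jointly, so the training and validation rows remain aligned with their respective matrices.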
{"location":"man/make_semantic_matrix/#JudiLing.load_S_matrix_from_fasttext_file-Tuple{DataFrames.DataFrame, String}","page":"Make Semantic Matrix","title":"JudiLing.load_S_matrix_from_fasttext_file","text":"load_S_matrix_from_fasttext_file(data::DataFrame,\n                            filepath::String;\n                            target_col=:Word)\n\nLoad semantic matrix from fasttext filepath. Subset fasttext vectors to include only words in target_col of data, and subset data to only include words in target_col for which a semantic vector is available. Returns the subsetted data and semantic matrix.\n\nObligatory Arguments\n\ndata::DataFrame: the dataset\nfilepath::String: path to file with fasttext vectors in .txt or .vec (not compressed in any way)\n\nOptional Arguments\n\ntarget_col=:Word: column with orthographic representation of words in data\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.load_S_matrix_from_fasttext_file-Tuple{DataFrames.DataFrame, DataFrames.DataFrame, String}","page":"Make Semantic Matrix","title":"JudiLing.load_S_matrix_from_fasttext_file","text":"load_S_matrix_from_fasttext_file(data_train::DataFrame,\n                            data_val::DataFrame,\n                            filepath::String;\n                            target_col=:Word)\n\nLoad semantic matrix from fasttext filepath. Subset fasttext vectors to include only words in target_col of data_train and data_val, and subset the data to only include words in target_col for which a semantic vector is available. Returns the subsetted train and val data and the train and val semantic matrices.\n\nObligatory Arguments\n\ndata_train::DataFrame: the training dataset\ndata_val::DataFrame: the validation dataset\nfilepath::String: path to file with fasttext vectors in .txt (not compressed in any way)\n\nOptional Arguments\n\ntarget_col=:Word: column with orthographic representation of words in data\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#Utility-functions","page":"Make Semantic Matrix","title":"Utility functions","text":"","category":"section"},{"location":"man/make_semantic_matrix/","page":"Make Semantic Matrix","title":"Make Semantic Matrix","text":" process_features(data, feature_cols)\n comp_f_M!(L, sd, sd_mean, n_f, ncol, n_b)\n comp_f_M!(L, sd, n_f, ncol, n_b)\n merge_f2i(base_f2i, infl_f2i, n_base_f, n_infl_f)\n lexome_sum(L, features)\n make_St(L, n, data, base, inflections)\n make_St(L, n, data, base)\n add_St_noise!(St, sd_noise)\n normalize_St!(St, n_base, n_infl)\n normalize_St!(St, n_base)","category":"page"},{"location":"man/make_semantic_matrix/#JudiLing.process_features-Tuple{Any, Any}","page":"Make Semantic Matrix","title":"JudiLing.process_features","text":"process_features(data, feature_cols)\n\nCollect all features given datasets and feature column names.\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.comp_f_M!-NTuple{6, Any}","page":"Make Semantic Matrix","title":"JudiLing.comp_f_M!","text":"comp_f_M!(L, sd, sd_mean, n_f, ncol, n_b)\n\nCompose the feature matrix with deep mode.\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.comp_f_M!-NTuple{5, Any}","page":"Make Semantic Matrix","title":"JudiLing.comp_f_M!","text":"comp_f_M!(L, sd, n_f, ncol, n_b)\n\nCompose the feature matrix without deep mode.\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.merge_f2i-NTuple{4, Any}","page":"Make Semantic Matrix","title":"JudiLing.merge_f2i","text":"merge_f2i(base_f2i, infl_f2i, n_base_f, n_infl_f)\n\nMerge the base f2i dictionary and the inflectional f2i dictionary.\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.lexome_sum-Tuple{Any, Any}","page":"Make Semantic Matrix","title":"JudiLing.lexome_sum","text":"lexome_sum(L, features)\n\nSum up the semantic vector, given a lexome vector.\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.make_St-NTuple{5, Any}","page":"Make Semantic Matrix","title":"JudiLing.make_St","text":"make_St(L, n, data, base, inflections)\n\nMake the S transpose matrix with 
inflections.\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.make_St-NTuple{4, Any}","page":"Make Semantic Matrix","title":"JudiLing.make_St","text":"make_St(L, n, data, base)\n\nMake the S transpose matrix without inflections.\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.add_St_noise!-Tuple{Any, Any}","page":"Make Semantic Matrix","title":"JudiLing.add_St_noise!","text":"add_St_noise!(St, sd_noise)\n\nAdd Gaussian noise to the S transpose matrix.\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.normalize_St!-Tuple{Any, Any, Any}","page":"Make Semantic Matrix","title":"JudiLing.normalize_St!","text":"normalize_St!(St, n_base, n_infl)\n\nNormalize the S transpose matrix with inflections.\n\n\n\n\n\n","category":"method"},{"location":"man/make_semantic_matrix/#JudiLing.normalize_St!-Tuple{Any, Any}","page":"Make Semantic Matrix","title":"JudiLing.normalize_St!","text":"normalize_St!(St, n_base)\n\nNormalize the S transpose matrix without inflections.\n\n\n\n\n\n","category":"method"},{"location":"man/eval/","page":"Evaluation","title":"Evaluation","text":"CurrentModule = JudiLing","category":"page"},{"location":"man/eval/#Evaluation","page":"Evaluation","title":"Evaluation","text":"","category":"section"},{"location":"man/eval/","page":"Evaluation","title":"Evaluation","text":" Comp_Acc_Struct\n eval_SC\n eval_SC_loose\n accuracy_comprehension(S, Shat, data)\n accuracy_comprehension(\n    S_val,\n    S_train,\n    Shat_val,\n    data_val,\n    data_train;\n    target_col = :Words,\n    base = nothing,\n    inflections = nothing,\n )\n eval_SC(SChat::AbstractArray, SC::AbstractArray)\n eval_SC(SChat::AbstractArray, SC::AbstractArray, SC_rest::AbstractArray)\n eval_SC(SChat::AbstractArray, SC::AbstractArray, data::DataFrame, target_col::Union{String, Symbol})\n eval_SC(SChat::AbstractArray, SC::AbstractArray, SC_rest::AbstractArray, data::DataFrame, data_rest::DataFrame, target_col::Union{String, Symbol})\n eval_SC(SChat::AbstractArray, SC::AbstractArray, batch_size::Int64)\n eval_SC(SChat::AbstractArray, SC::AbstractArray, data::DataFrame, target_col::Union{String, Symbol}, batch_size::Int64)\n eval_SC_loose(SChat, SC, k)\n eval_SC_loose(SChat, SC, k, data, target_col)\n eval_manual(res, data, i2f)\n eval_acc(res, gold_inds::Array)\n eval_acc(res, cue_obj::Cue_Matrix_Struct)\n eval_acc_loose(res, gold_inds)\n extract_gpi(gpi, threshold=0.1, tolerance=(-1000.0))","category":"page"},{"location":"man/eval/#JudiLing.Comp_Acc_Struct","page":"Evaluation","title":"JudiLing.Comp_Acc_Struct","text":"A structure that stores information about comprehension accuracy.\n\n\n\n\n\n","category":"type"},{"location":"man/eval/#JudiLing.eval_SC","page":"Evaluation","title":"JudiLing.eval_SC","text":"Assess model accuracy on the basis of the correlations of row vectors of Chat and C or Shat and S. Ideally the target words have the highest correlations on the diagonal of the pertinent correlation matrices. A homophone-support option is implemented.\n\n\n\n\n\n","category":"function"},{"location":"man/eval/#JudiLing.eval_SC_loose","page":"Evaluation","title":"JudiLing.eval_SC_loose","text":"Assess model accuracy on the basis of the correlations of row vectors of Chat and C or Shat and S. Count it as correct if one of the top k candidates is correct. A homophone-support option is implemented.\n\n\n\n\n\n","category":"function"},
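The contrast between the strict and the loose evaluator is easiest to see on toy data. A hedged sketch with random matrices (names and sizes are arbitrary; both functions accept plain matrices, as per the method signatures documented below):

```julia
using JudiLing, Random

Random.seed!(314)
S    = randn(10, 20)              # toy gold-standard semantic matrix
Shat = S .+ 0.5 .* randn(10, 20)  # noisy "predicted" counterpart

acc_strict = JudiLing.eval_SC(Shat, S)          # correct only if the target ranks first
acc_top3   = JudiLing.eval_SC_loose(Shat, S, 3) # correct if the target is in the top 3

# barring homophone edge cases (see the notes below), acc_strict <= acc_top3
@show acc_strict acc_top3
```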
{"location":"man/eval/#JudiLing.accuracy_comprehension-Tuple{Any, Any, Any}","page":"Evaluation","title":"JudiLing.accuracy_comprehension","text":"accuracy_comprehension(S, Shat, data)\n\nEvaluate comprehension accuracy for training data.\n\nnote: Note\nIn case of homophones/homographs in the dataset, the correct/incorrect values for base and inflections may be misleading! See below for more information.\n\nObligatory Arguments\n\nS::Matrix: the (gold standard) S matrix\nShat::Matrix: the (predicted) Shat matrix\ndata::DataFrame: the dataset\n\nOptional Arguments\n\ntarget_col::Union{String, Symbol}=:Words: the column name for target strings\nbase::Vector=nothing: base features (typically a lexeme)\ninflections::Union{Nothing, Vector}=nothing: other features (typically inflectional features)\n\nExamples\n\naccuracy_comprehension(\n    S_train,\n    Shat_train,\n    latin_train,\n    target_col=:Words,\n    base=[:Lexeme],\n    inflections=[:Person, :Number, :Tense, :Voice, :Mood]\n    )\n\nNote\n\nIn case of homophones/homographs in the dataset, the correct/incorrect values for base and inflections may be misleading! Consider the following example: The wordform \"Äpfel\" in German can be nominative plural, genitive plural and accusative plural. Let's assume we have a dataset in which \"Äpfel\" occurs in all three case/number combinations (i.e. there are homographs). If all these wordforms have the same semantic vectors (e.g. because they are derived from word2vec or fasttext, which typically have a single vector per unique wordform), the predicted semantic vector of the wordform \"Äpfel\" will be equally correlated with all three case/number combinations in the dataset. In such cases, while the algorithm in this function can unambiguously conclude that the correct surface form \"Äpfel\" was comprehended, which of the three possible rows is the correct one will be picked somewhat non-deterministically (see https://docs.julialang.org/en/v1/base/collections/#Base.argmax). It is thus possible that the algorithm will then use the genitive plural instead of the intended nominative plural as the ground truth, and will report that \"case\" was comprehended incorrectly.\n\n\n\n\n\n","category":"method"},{"location":"man/eval/#JudiLing.accuracy_comprehension-NTuple{5, Any}","page":"Evaluation","title":"JudiLing.accuracy_comprehension","text":"accuracy_comprehension(\n    S_val,\n    S_train,\n    Shat_val,\n    data_val,\n    data_train;\n    target_col = :Words,\n    base = nothing,\n    inflections = nothing,\n)\n\nEvaluate comprehension accuracy for validation data.\n\nnote: Note\nIn case of homophones/homographs in the dataset, the correct/incorrect values for base and inflections may be misleading! 
See below for more information.\n\nObligatory Arguments\n\nS_val::Matrix: the (gold standard) S matrix of the validation data\nS_train::Matrix: the (gold standard) S matrix of the training data\nShat_val::Matrix: the (predicted) Shat matrix of the validation data\ndata_val::DataFrame: the validation dataset\ndata_train::DataFrame: the training dataset\n\nOptional Arguments\n\ntarget_col::Union{String, Symbol}=:Words: the column name for target strings\nbase::Vector=nothing: base features (typically a lexeme)\ninflections::Union{Nothing, Vector}=nothing: other features (typically inflectional features)\n\nExamples\n\naccuracy_comprehension(\n    S_val,\n    S_train,\n    Shat_val,\n    latin_val,\n    latin_train,\n    target_col=:Words,\n    base=[:Lexeme],\n    inflections=[:Person, :Number, :Tense, :Voice, :Mood]\n    )\n\nNote\n\nIn case of homophones/homographs in the dataset, the correct/incorrect values for base and inflections may be misleading! Consider the following example: The wordform \"Äpfel\" in German can be nominative plural, genitive plural and accusative plural. Let's assume we have a dataset in which \"Äpfel\" occurs in all three case/number combinations (i.e. there are homographs). If all these wordforms have the same semantic vectors (e.g. because they are derived from word2vec or fasttext, which typically have a single vector per unique wordform), the predicted semantic vector of the wordform \"Äpfel\" will be equally correlated with all three case/number combinations in the dataset. In such cases, while the algorithm in this function can unambiguously conclude that the correct surface form \"Äpfel\" was comprehended, which of the three possible rows is the correct one will be picked somewhat non-deterministically (see https://docs.julialang.org/en/v1/base/collections/#Base.argmax). It is thus possible that the algorithm will then use the genitive plural instead of the intended nominative plural as the ground truth, and will report that \"case\" was comprehended incorrectly.\n\n\n\n\n\n","category":"method"},{"location":"man/eval/#JudiLing.eval_SC-Tuple{AbstractArray, AbstractArray}","page":"Evaluation","title":"JudiLing.eval_SC","text":"eval_SC(SChat::AbstractArray, SC::AbstractArray)\n\nAssess model accuracy on the basis of the correlations (or Euclidean distances or cosine similarities) of row vectors of Chat and C or Shat and S. Ideally the target words have the highest correlations (lowest distance/highest similarity) on the diagonal of the pertinent correlation (distance/similarity) matrices.\n\nIf freq is added, token-based accuracy is computed. Token-based accuracy weights accuracy values according to the words' frequencies, i.e. if a word has a frequency of 30 and overall there are 3000 tokens (the frequencies of all types sum to 3000), this word's accuracy will contribute 30/3000.\n\nnote: Note\nIf there are homophones/homographs in the dataset, this evaluation method may be misleading: the predicted vector will be equally correlated with the target vector of both words and the one on the diagonal will not necessarily be selected as the most correlated. 
In such cases, supplying the dataset and target_col is recommended, which enables taking homophones/homographs into account.\n\nObligatory Arguments\n\nSChat::Union{SparseMatrixCSC, Matrix}: the Chat or Shat matrix\nSC::Union{SparseMatrixCSC, Matrix}: the C or S matrix\n\nOptional Arguments\n\ndigits: the specified number of digits after the decimal place (or before if negative)\nR::Bool=false: if true, the pairwise correlation/distance/similarity matrix R is returned\nfreq::Union{Missing, Array{Int64, 1}, Array{Float64,1}}=missing: list of frequencies of the wordforms in X and Y\nmethod::Union{Symbol, String}=:correlation: method for computing similarities, one of {:correlation, :euclidean, :cosine}.\n\neval_SC(Chat_train, cue_obj_train.C)\neval_SC(Chat_val, cue_obj_val.C)\neval_SC(Shat_train, S_train)\neval_SC(Shat_val, S_val)\n\n\n\n\n\n","category":"method"},{"location":"man/eval/#JudiLing.eval_SC-Tuple{AbstractArray, AbstractArray, AbstractArray}","page":"Evaluation","title":"JudiLing.eval_SC","text":"eval_SC(SChat::AbstractArray, SC::AbstractArray, SC_rest::AbstractArray)\n\nAssess model accuracy on the basis of the correlations (or Euclidean distances or cosine similarities) of row vectors of Chat and C or Shat and S. Ideally the target words have the highest correlations (lowest distance/highest similarity) on the diagonal of the pertinent correlation (distance/similarity) matrices.\n\nIf freq is added, token-based accuracy is computed. Token-based accuracy weights accuracy values according to the words' frequencies, i.e. if a word has a frequency of 30 and overall there are 3000 tokens (the frequencies of all types sum to 3000), this word's accuracy will contribute 30/3000.\n\nnote: Note\nThe order is important. The first gold standard matrix has to correspond to the SChat matrix, as in eval_SC(Shat_train, S_train, S_val) or eval_SC(Shat_val, S_val, S_train)\n\nnote: Note\nIf there are homophones/homographs in the dataset, this evaluation method may be misleading: the predicted vector will be equally correlated with the target vector of both words and the one on the diagonal will not necessarily be selected as the most correlated. 
In such cases, supplying the dataset and target_col is recommended, which enables taking homophones/homographs into account.\n\nObligatory Arguments\n\nSChat::Union{SparseMatrixCSC, Matrix}: the Chat or Shat matrix\nSC::Union{SparseMatrixCSC, Matrix}: the training/validation C or S matrix\nSC_rest::Union{SparseMatrixCSC, Matrix}: the validation/training C or S matrix\n\nOptional Arguments\n\ndigits: the specified number of digits after the decimal place (or before if negative)\nR::Bool=false: if true, the pairwise correlation/distance/similarity matrix R is returned\nfreq::Union{Missing, Array{Int64, 1}, Array{Float64,1}}=missing: list of frequencies of the wordforms in X and Y\nmethod::Union{Symbol, String}=:correlation: method for computing similarities, one of {:correlation, :euclidean, :cosine}.\n\neval_SC(Chat_train, cue_obj_train.C, cue_obj_val.C)\neval_SC(Chat_val, cue_obj_val.C, cue_obj_train.C)\neval_SC(Shat_train, S_train, S_val)\neval_SC(Shat_val, S_val, S_train)\n\n\n\n\n\n","category":"method"},{"location":"man/eval/#JudiLing.eval_SC-Tuple{AbstractArray, AbstractArray, DataFrames.DataFrame, Union{String, Symbol}}","page":"Evaluation","title":"JudiLing.eval_SC","text":"eval_SC(SChat::AbstractArray, SC::AbstractArray, data::DataFrame, target_col::Union{String, Symbol})\n\nAssess model accuracy on the basis of the correlations (or Euclidean distances or cosine similarities) of row vectors of Chat and C or Shat and S. Ideally the target words have the highest correlations (lowest distance/highest similarity) on the diagonal of the pertinent correlation (distance/similarity) matrices. Supports homophones.\n\nIf freq is added, token-based accuracy is computed. Token-based accuracy weights accuracy values according to the words' frequencies, i.e. if a word has a frequency of 30 and overall there are 3000 tokens (the frequencies of all types sum to 3000), this word's accuracy will contribute 30/3000.\n\nObligatory Arguments\n\nSChat::Union{SparseMatrixCSC, Matrix}: the Chat or Shat matrix\nSC::Union{SparseMatrixCSC, Matrix}: the C or S matrix\ndata::DataFrame: the dataset\ntarget_col::Union{String, Symbol}: target column name\n\nOptional Arguments\n\ndigits: the specified number of digits after the decimal place (or before if negative)\nR::Bool=false: if true, the pairwise correlation/distance/similarity matrix R is returned\nfreq::Union{Missing, Array{Int64, 1}, Array{Float64,1}}=missing: list of frequencies of the wordforms in X and Y\nmethod::Union{Symbol, String}=:correlation: method for computing similarities, one of {:correlation, :euclidean, :cosine}.\n\neval_SC(Chat_train, cue_obj_train.C, latin, :Word)\neval_SC(Chat_val, cue_obj_val.C, latin, :Word)\neval_SC(Shat_train, S_train, latin, :Word)\neval_SC(Shat_val, S_val, latin, :Word)\n\n\n\n\n\n","category":"method"},
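To make the freq keyword concrete, a small hedged sketch; the frequencies are invented, and the weighting follows the 30/3000 description above:

```julia
using JudiLing, Random

Random.seed!(314)
S    = randn(6, 12)
Shat = S .+ 0.3 .* randn(6, 12)

# hypothetical token frequencies, one per row of S
freq = [30, 5, 5, 100, 10, 50]

acc_type  = JudiLing.eval_SC(Shat, S)               # every word counts equally
acc_token = JudiLing.eval_SC(Shat, S, freq = freq)  # word i weighted by freq[i]/sum(freq)
@show acc_type acc_token
```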
{"location":"man/eval/#JudiLing.eval_SC-Tuple{AbstractArray, AbstractArray, AbstractArray, DataFrames.DataFrame, DataFrames.DataFrame, Union{String, Symbol}}","page":"Evaluation","title":"JudiLing.eval_SC","text":"eval_SC(SChat::AbstractArray, SC::AbstractArray, SC_rest::AbstractArray, data::DataFrame, data_rest::DataFrame, target_col::Union{String, Symbol})\n\nAssess model accuracy on the basis of the correlations (or Euclidean distances or cosine similarities) of row vectors of Chat and C or Shat and S. Ideally the target words have the highest correlations (lowest distance/highest similarity) on the diagonal of the pertinent correlation (distance/similarity) matrices.\n\nIf freq is added, token-based accuracy is computed. Token-based accuracy weights accuracy values according to the words' frequencies, i.e. if a word has a frequency of 30 and overall there are 3000 tokens (the frequencies of all types sum to 3000), this word's accuracy will contribute 30/3000.\n\nnote: Note\nThe order is important. The first gold standard matrix has to correspond to the SChat matrix, as in eval_SC(Shat_train, S_train, S_val, latin, :Word) or eval_SC(Shat_val, S_val, S_train, latin, :Word)\n\nObligatory Arguments\n\nSChat::Union{SparseMatrixCSC, Matrix}: the Chat or Shat matrix\nSC::Union{SparseMatrixCSC, Matrix}: the training/validation C or S matrix\nSC_rest::Union{SparseMatrixCSC, Matrix}: the validation/training C or S matrix\ndata::DataFrame: the training/validation dataset\ndata_rest::DataFrame: the validation/training dataset\ntarget_col::Union{String, Symbol}: target column name\n\nOptional Arguments\n\ndigits: the specified number of digits after the decimal place (or before if negative)\nR::Bool=false: if true, the pairwise correlation/distance/similarity matrix R is returned\nfreq::Union{Missing, Array{Int64, 1}, Array{Float64,1}}=missing: list of frequencies of the wordforms in X and Y\nmethod::Union{Symbol, String}=:correlation: method for computing similarities, one of {:correlation, :euclidean, :cosine}.\n\neval_SC(Chat_train, cue_obj_train.C, cue_obj_val.C, latin, :Word)\neval_SC(Chat_val, cue_obj_val.C, cue_obj_train.C, latin, :Word)\neval_SC(Shat_train, S_train, S_val, latin, :Word)\neval_SC(Shat_val, S_val, S_train, latin, :Word)\n\n\n\n\n\n","category":"method"},{"location":"man/eval/#JudiLing.eval_SC-Tuple{AbstractArray, AbstractArray, Int64}","page":"Evaluation","title":"JudiLing.eval_SC","text":"eval_SC(SChat::AbstractArray, SC::AbstractArray, batch_size::Int64)\n\nAssess model accuracy on the basis of the correlations of row vectors of Chat and C or Shat and S. Ideally the target words have the highest correlations on the diagonal of the pertinent correlation matrices. For large datasets, pass batch_size to process the evaluation in chunks.\n\nnote: Note\nIf there are homophones/homographs in the dataset, this evaluation method may be misleading: the predicted vector will be equally correlated with the target vector of both words and the one on the diagonal will not necessarily be selected as the most correlated. In such cases, supplying the dataset and target_col is recommended, which enables taking homophones/homographs into account.\n\nnote: Note\nCurrently only available for correlation.\n\nObligatory Arguments\n\nSChat: the Chat or Shat matrix\nSC: the C or S matrix\nbatch_size: batch size\n\nOptional Arguments\n\ndigits: the specified number of digits after the decimal place (or before if negative)\nverbose::Bool=false: if true, more information is printed\n\neval_SC(Chat_train, cue_obj_train.C, 5000)\neval_SC(Chat_val, cue_obj_val.C, 5000)\neval_SC(Shat_train, S_train, 5000)\neval_SC(Shat_val, S_val, 5000)\n\n\n\n\n\n","category":"method"},{"location":"man/eval/#JudiLing.eval_SC-Tuple{AbstractArray, AbstractArray, DataFrames.DataFrame, Union{String, Symbol}, Int64}","page":"Evaluation","title":"JudiLing.eval_SC","text":"eval_SC(SChat::AbstractArray, SC::AbstractArray, data::DataFrame, target_col::Union{String, Symbol}, batch_size::Int64)\n\nAssess model accuracy on the basis of the correlations of row vectors of Chat and C or Shat and S. 
Ideally the target words have the highest correlations on the diagonal of the pertinent correlation matrices. For large datasets, pass batch_size to process the evaluation in chunks. Supports homophones.\n\nnote: Note\nCurrently only available for correlation.\n\nObligatory Arguments\n\nSChat::AbstractArray: the Chat or Shat matrix\nSC::AbstractArray: the C or S matrix\ndata::DataFrame: the dataset\ntarget_col::Union{String, Symbol}: target column name\nbatch_size::Int64: batch size\n\nOptional Arguments\n\ndigits: the specified number of digits after the decimal place (or before if negative)\nverbose::Bool=false: if true, more information is printed\n\neval_SC(Chat_train, cue_obj_train.C, latin, :Word, 5000)\neval_SC(Chat_val, cue_obj_val.C, latin, :Word, 5000)\neval_SC(Shat_train, S_train, latin, :Word, 5000)\neval_SC(Shat_val, S_val, latin, :Word, 5000)\n\n\n\n\n\n","category":"method"},{"location":"man/eval/#JudiLing.eval_SC_loose-Tuple{Any, Any, Any}","page":"Evaluation","title":"JudiLing.eval_SC_loose","text":"eval_SC_loose(SChat, SC, k)\n\nAssess model accuracy on the basis of the correlations (or Euclidean distances or cosine similarities) of row vectors of Chat and C or Shat and S. Ideally the target words have the highest correlations (lowest distance/highest similarity) on the diagonal of the pertinent correlation (distance/similarity) matrices. Count it as correct if one of the top k candidates is correct.\n\nnote: Note\nIf there are homophones/homographs in the dataset, this evaluation method may be misleading: the predicted vector will be equally correlated with the target vector of both words and it is not guaranteed that the target on the diagonal will be among the k neighbours. In particular, eval_SC and eval_SC_loose with k=1 are not guaranteed to give the same result. In such cases, supplying the dataset and target_col is recommended, which enables taking homophones/homographs into account.\n\nObligatory Arguments\n\nSChat::Union{SparseMatrixCSC, Matrix}: the Chat or Shat matrix\nSC::Union{SparseMatrixCSC, Matrix}: the C or S matrix\nk: top k candidates\n\nOptional Arguments\n\ndigits: the specified number of digits after the decimal place (or before if negative)\nmethod::Union{Symbol, String}=:correlation: method for computing similarities, one of {:correlation, :euclidean, :cosine}.\n\neval_SC_loose(Chat, cue_obj.C, k)\neval_SC_loose(Shat, S, k)\n\n\n\n\n\n","category":"method"},
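The homograph caveat above can be made concrete. In this hedged sketch, two rows share a wordform and a gold vector; supplying the dataset and target column lets either row count as correct. The final, batched call only demonstrates the signature documented above; a batch size this small is pointless in practice:

```julia
using JudiLing, DataFrames, Random

Random.seed!(314)
data = DataFrame(Word = ["vis", "vis", "canis", "equus"])  # "vis" is a homograph

S = randn(4, 10)
S[2, :] = S[1, :]               # identical gold vectors for the two "vis" rows
Shat = S .+ 0.1 .* randn(4, 10)

acc_plain = JudiLing.eval_SC(Shat, S)               # row 2 may "lose" to row 1
acc_homog = JudiLing.eval_SC(Shat, S, data, :Word)  # homographs counted as correct
acc_top2  = JudiLing.eval_SC_loose(Shat, S, 2, data, :Word)
acc_batch = JudiLing.eval_SC(Shat, S, data, :Word, 2)  # same, evaluated in chunks of 2
@show acc_plain acc_homog acc_top2 acc_batch
```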
{"location":"man/eval/#JudiLing.eval_SC_loose-NTuple{5, Any}","page":"Evaluation","title":"JudiLing.eval_SC_loose","text":"eval_SC_loose(SChat, SC, k, data, target_col)\n\nAssess model accuracy on the basis of the correlations (or Euclidean distances or cosine similarities) of row vectors of Chat and C or Shat and S. Ideally the target words have the highest correlations (lowest distance/highest similarity) on the diagonal of the pertinent correlation (distance/similarity) matrices. Count it as correct if one of the top k candidates is correct. Supports homophones.\n\nObligatory Arguments\n\nSChat::Union{SparseMatrixCSC, Matrix}: the Chat or Shat matrix\nSC::Union{SparseMatrixCSC, Matrix}: the C or S matrix\nk: top k candidates\ndata: the dataset\ntarget_col: target column name\n\nOptional Arguments\n\ndigits: the specified number of digits after the decimal place (or before if negative)\nmethod::Union{Symbol, String}=:correlation: method for computing similarities, one of {:correlation, :euclidean, :cosine}.\n\neval_SC_loose(Chat, cue_obj.C, k, latin, :Word)\neval_SC_loose(Shat, S, k, latin, :Word)\n\n\n\n\n\n","category":"method"},{"location":"man/eval/#JudiLing.eval_manual-Tuple{Any, Any, Any}","page":"Evaluation","title":"JudiLing.eval_manual","text":"eval_manual(res, data, i2f)\n\nCreate extensive reports for the outputs from build_paths and learn_paths.\n\n\n\n\n\n","category":"method"},{"location":"man/eval/#JudiLing.eval_acc-Tuple{Any, Array}","page":"Evaluation","title":"JudiLing.eval_acc","text":"eval_acc(res, gold_inds::Array)\n\nEvaluate the accuracy of the results from learn_paths or build_paths.\n\nObligatory Arguments\n\nres::Array: the results from learn_paths or build_paths\ngold_inds::Array: the gold paths' indices\n\nOptional Arguments\n\ndigits: the specified number of digits after the decimal place (or before if negative)\nverbose::Bool=false: if true, more information is printed\n\nExamples\n\n# evaluation on training data\nacc_train = JudiLing.eval_acc(\n    res_train,\n    cue_obj_train.gold_ind,\n    verbose=false\n)\n\n# evaluation on validation data\nacc_val = JudiLing.eval_acc(\n    res_val,\n    cue_obj_val.gold_ind,\n    verbose=false\n)\n\n\n\n\n\n","category":"method"},{"location":"man/eval/#JudiLing.eval_acc-Tuple{Any, JudiLing.Cue_Matrix_Struct}","page":"Evaluation","title":"JudiLing.eval_acc","text":"eval_acc(res, cue_obj::Cue_Matrix_Struct)\n\nEvaluate the accuracy of the results from learn_paths or build_paths.\n\nObligatory Arguments\n\nres::Array: the results from learn_paths or build_paths\ncue_obj::Cue_Matrix_Struct: the C matrix object\n\nOptional Arguments\n\ndigits: the specified number of digits after the decimal place (or before if negative)\nverbose::Bool=false: if true, more information is printed\n\nExamples\n\nacc = JudiLing.eval_acc(res, cue_obj)\n\n\n\n\n\n","category":"method"},{"location":"man/eval/#JudiLing.eval_acc_loose-Tuple{Any, Any}","page":"Evaluation","title":"JudiLing.eval_acc_loose","text":"eval_acc_loose(res, gold_inds)\n\nLenient evaluation of the accuracy of the results from learn_paths or build_paths, counting a prediction as correct when the correlation of the predicted and gold standard semantic vectors is among the n top correlations, where n is equal to max_can in the learn_paths or build_paths function.\n\nObligatory Arguments\n\nres::Array: the results from learn_paths or build_paths\ngold_inds::Array: the gold paths' indices\n\nOptional Arguments\n\ndigits: the specified number of digits after the decimal place (or before if negative)\nverbose::Bool=false: if true, more information is printed\n\nExamples\n\n# evaluation on training data\nacc_train_loose = JudiLing.eval_acc_loose(\n    res_train,\n    cue_obj_train.gold_ind,\n    verbose=false\n)\n\n# evaluation on validation data\nacc_val_loose = JudiLing.eval_acc_loose(\n    res_val,\n    cue_obj_val.gold_ind,\n    verbose=false\n)\n\n\n\n\n\n","category":"method"},{"location":"man/eval/#JudiLing.extract_gpi","page":"Evaluation","title":"JudiLing.extract_gpi","text":"extract_gpi(gpi, threshold=0.1, tolerance=(-1000.0))\n\nExtract, using 
gold paths' information, how many n-grams for a gold path are below the threshold but above the tolerance.\n\n\n\n\n\n","category":"function"},{"location":"man/find_path/","page":"Find Paths","title":"Find Paths","text":"CurrentModule = JudiLing","category":"page"},{"location":"man/find_path/#Find-Paths","page":"Find Paths","title":"Find Paths","text":"","category":"section"},{"location":"man/find_path/#Structures","page":"Find Paths","title":"Structures","text":"","category":"section"},{"location":"man/find_path/","page":"Find Paths","title":"Find Paths","text":" Result_Path_Info_Struct\n Gold_Path_Info_Struct\n Threshold_Stat_Struct","category":"page"},{"location":"man/find_path/#JudiLing.Result_Path_Info_Struct","page":"Find Paths","title":"JudiLing.Result_Path_Info_Struct","text":"Store paths' information built by learn_paths or build_paths.\n\n\n\n\n\n","category":"type"},{"location":"man/find_path/#JudiLing.Gold_Path_Info_Struct","page":"Find Paths","title":"JudiLing.Gold_Path_Info_Struct","text":"Store gold paths' information including indices and indices' support and total support. It can be used to evaluate how low the threshold needs to be set in order to find most of the correct paths or, if set very low, all of the correct paths.\n\n\n\n\n\n","category":"type"},{"location":"man/find_path/#JudiLing.Threshold_Stat_Struct","page":"Find Paths","title":"JudiLing.Threshold_Stat_Struct","text":"Store the threshold and tolerance proportions for each timestep.\n\n\n\n\n\n","category":"type"},{"location":"man/find_path/#Build-paths","page":"Find Paths","title":"Build paths","text":"","category":"section"},{"location":"man/find_path/","page":"Find Paths","title":"Find Paths","text":" build_paths\n build_paths(\n    data_val,\n    C_train,\n    S_val,\n    F_train,\n    Chat_val,\n    A,\n    i2f,\n    C_train_ind;\n    rC = nothing,\n    max_t = 15,\n    max_can = 10,\n    n_neighbors = 10,\n    grams = 3,\n    tokenized = false,\n    sep_token = nothing,\n    target_col = :Words,\n    start_end_token = \"#\",\n    if_pca = false,\n    pca_eval_M = nothing,\n    ignore_nan = true,\n    verbose = false,\n )","category":"page"},{"location":"man/find_path/#JudiLing.build_paths","page":"Find Paths","title":"JudiLing.build_paths","text":"The build_paths function constructs paths by only considering those n-grams that are close to the target. It first takes the predicted c-hat vector and finds the closest n neighbors in the C matrix. Then it selects all n-grams of these neighbors, and constructs all valid paths with those n-grams. The path producing the best correlation with the target semantic vector (through synthesis by analysis) is selected.\n\n\n\n\n\n","category":"function"},
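A hedged sketch of the pipeline this function presupposes, using make_cue_matrix, make_S_matrix and make_transform_matrix as elsewhere in this manual. The CSV path is a placeholder, and cue_obj.A is assumed to hold the adjacency matrix built alongside the cue matrix; if you construct the adjacency matrix separately, pass that instead:

```julia
using JudiLing, CSV, DataFrames

latin = DataFrame(CSV.File(joinpath("data", "latin.csv")))  # placeholder path

cue_obj = JudiLing.make_cue_matrix(latin, grams = 3, target_col = :Word)
S = JudiLing.make_S_matrix(latin, ["Lexeme"],
    ["Person", "Number", "Tense", "Voice", "Mood"], ncol = 200)

F = JudiLing.make_transform_matrix(cue_obj.C, S)  # comprehension: C -> S
G = JudiLing.make_transform_matrix(S, cue_obj.C)  # production:    S -> C
Chat = S * G                                      # predicted cue vectors

res = JudiLing.build_paths(
    latin, cue_obj.C, S, F, Chat,
    cue_obj.A,                         # assumed adjacency-matrix field
    cue_obj.i2f, cue_obj.gold_ind,
    max_t = 15, n_neighbors = 10, verbose = false)

acc = JudiLing.eval_acc(res, cue_obj.gold_ind)
```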
{"location":"man/find_path/#JudiLing.build_paths-NTuple{8, Any}","page":"Find Paths","title":"JudiLing.build_paths","text":"build_paths(\n    data_val,\n    C_train,\n    S_val,\n    F_train,\n    Chat_val,\n    A,\n    i2f,\n    C_train_ind;\n    rC = nothing,\n    max_t = 15,\n    max_can = 10,\n    n_neighbors = 10,\n    grams = 3,\n    tokenized = false,\n    sep_token = nothing,\n    target_col = :Words,\n    start_end_token = \"#\",\n    if_pca = false,\n    pca_eval_M = nothing,\n    ignore_nan = true,\n    verbose = false,\n)\n\nThe build_paths function constructs paths by only considering those n-grams that are close to the target. It first takes the predicted c-hat vector and finds the closest n neighbors in the C matrix. Then it selects all n-grams of these neighbors, and constructs all valid paths with those n-grams. The path producing the best correlation with the target semantic vector (through synthesis by analysis) is selected.\n\nObligatory Arguments\n\ndata_val::DataFrame: the validation dataset\nC_train::SparseMatrixCSC: the C matrix for the training dataset\nS_val::Union{SparseMatrixCSC, Matrix}: the S matrix for the validation dataset\nF_train::Union{SparseMatrixCSC, Matrix}: the F matrix for the training dataset\nChat_val::Matrix: the Chat matrix for the validation dataset\nA::SparseMatrixCSC: the adjacency matrix\ni2f::Dict: the dictionary returning features given indices\nC_train_ind::Array: the gold paths' indices for the training dataset\n\nOptional Arguments\n\nrC::Union{Nothing, Matrix}=nothing: the correlation matrix of C and Chat, specify to save computing time\nmax_t::Int64=15: maximum number of timesteps\nmax_can::Int64=10: maximum number of candidates to consider\nn_neighbors::Int64=10: the top n form neighbors to be considered\ngrams::Int64=3: the number n of grams that make up n-grams\ntokenized::Bool=false: if true, the dataset target is tokenized\nsep_token::Union{Nothing, String, Char}=nothing: separator\ntarget_col::Union{String, Symbol}=:Words: the column name for target strings\nif_pca::Bool=false: turn on to enable pca mode\npca_eval_M::Matrix=nothing: pass the original F for pca mode\nverbose::Bool=false: if true, more information will be printed\n\nExamples\n\n# training dataset\nJudiLing.build_paths(\n    latin_train,\n    cue_obj_train.C,\n    S_train,\n    F_train,\n    Chat_train,\n    A,\n    cue_obj_train.i2f,\n    cue_obj_train.gold_ind,\n    max_t=max_t,\n    n_neighbors=10,\n    verbose=false\n    )\n\n# validation dataset\nJudiLing.build_paths(\n    latin_val,\n    cue_obj_train.C,\n    S_val,\n    F_train,\n    Chat_val,\n    A,\n    cue_obj_train.i2f,\n    cue_obj_train.gold_ind,\n    max_t=max_t,\n    n_neighbors=10,\n    verbose=false\n    )\n\n# pca mode\nres_build = JudiLing.build_paths(\n    korean,\n    Array(Cpcat),\n    S,\n    F,\n    ChatPCA,\n    A,\n    cue_obj.i2f,\n    cue_obj.gold_ind,\n    max_t=max_t,\n    if_pca=true,\n    pca_eval_M=Fo,\n    n_neighbors=3,\n    verbose=true\n    )\n\n\n\n\n\n","category":"method"},{"location":"man/find_path/#Learn-paths","page":"Find Paths","title":"Learn paths","text":"","category":"section"},{"location":"man/find_path/","page":"Find Paths","title":"Find Paths","text":" learn_paths\n learn_paths(\n    data::DataFrame,\n    cue_obj::Cue_Matrix_Struct,\n    S_val::Union{SparseMatrixCSC, Matrix},\n    F_train,\n    Chat_val::Union{SparseMatrixCSC, Matrix};\n    Shat_val::Union{Nothing, Matrix} = nothing,\n    check_gold_path::Bool = false,\n    threshold::Float64 = 0.1,\n    is_tolerant::Bool = false,\n    tolerance::Float64 = (-1000.0),\n    max_tolerance::Int = 3,\n    activation::Union{Nothing, Function} = nothing,\n    ignore_nan::Bool = true,\n    verbose::Bool = true)\n learn_paths(\n    data_train::DataFrame,\n    data_val::DataFrame,\n    C_train::Union{Matrix, SparseMatrixCSC},\n    S_val::Union{Matrix, SparseMatrixCSC},\n    F_train,\n    Chat_val::Union{Matrix, SparseMatrixCSC},\n    A::SparseMatrixCSC,\n    i2f::Dict,\n    f2i::Dict;\n    gold_ind::Union{Nothing, Vector} = nothing,\n    Shat_val::Union{Nothing, Matrix} = nothing,\n    check_gold_path::Bool = false,\n    max_t::Int = 15,\n    max_can::Int = 10,\n    threshold::Float64 = 0.1,\n    is_tolerant::Bool = false,\n    tolerance::Float64 = (-1000.0),\n    max_tolerance::Int = 3,\n    grams::Int = 3,\n    tokenized::Bool = false,\n    sep_token::Union{Nothing, String} = nothing,\n    keep_sep::Bool = false,\n    target_col::Union{Symbol, String} = \"Words\",\n    start_end_token::String = \"#\",\n    issparse::Union{Symbol, Bool} = 
:auto,\n sparse_ratio::Float64 = 0.05,\n if_pca::Bool = false,\n pca_eval_M::Union{Nothing, Matrix} = nothing,\n activation::Union{Nothing, Function} = nothing,\n ignore_nan::Bool = true,\n check_threshold_stat::Bool = false,\n verbose::Bool = false\n )\n learn_paths_rpi(\n data_train::DataFrame,\n data_val::DataFrame,\n C_train::Union{Matrix, SparseMatrixCSC},\n S_val::Union{Matrix, SparseMatrixCSC},\n F_train,\n Chat_val::Union{Matrix, SparseMatrixCSC},\n A::SparseMatrixCSC,\n i2f::Dict,\n f2i::Dict;\n gold_ind::Union{Nothing, Vector} = nothing,\n Shat_val::Union{Nothing, Matrix} = nothing,\n check_gold_path::Bool = false,\n max_t::Int = 15,\n max_can::Int = 10,\n threshold::Float64 = 0.1,\n is_tolerant::Bool = false,\n tolerance::Float64 = (-1000.0),\n max_tolerance::Int = 3,\n grams::Int = 3,\n tokenized::Bool = false,\n sep_token::Union{Nothing, String} = nothing,\n keep_sep::Bool = false,\n target_col::Union{Symbol, String} = \"Words\",\n start_end_token::String = \"#\",\n issparse::Union{Symbol, Bool} = :auto,\n sparse_ratio::Float64 = 0.05,\n if_pca::Bool = false,\n pca_eval_M::Union{Nothing, Matrix} = nothing,\n activation::Union{Nothing, Function} = nothing,\n ignore_nan::Bool = true,\n check_threshold_stat::Bool = false,\n verbose::Bool = false\n )","category":"page"},{"location":"man/find_path/#JudiLing.learn_paths","page":"Find Paths","title":"JudiLing.learn_paths","text":"A sequence finding algorithm using discrimination learning to predict, for a given word, which n-grams are best supported for a given position in the sequence of n-grams.\n\n\n\n\n\n","category":"function"},{"location":"man/find_path/#JudiLing.learn_paths-Tuple{DataFrames.DataFrame, JudiLing.Cue_Matrix_Struct, Union{SparseArrays.SparseMatrixCSC, Matrix}, Any, Union{SparseArrays.SparseMatrixCSC, Matrix}}","page":"Find Paths","title":"JudiLing.learn_paths","text":"learn_paths(\n data::DataFrame,\n cue_obj::Cue_Matrix_Struct,\n S_val::Union{SparseMatrixCSC, Matrix},\n F_train,\n Chat_val::Union{SparseMatrixCSC, Matrix};\n Shat_val::Union{Nothing, Matrix} = nothing,\n check_gold_path::Bool = false,\n threshold::Float64 = 0.1,\n is_tolerant::Bool = false,\n tolerance::Float64 = (-1000.0),\n max_tolerance::Int = 3,\n activation::Union{Nothing, Function} = nothing,\n ignore_nan::Bool = true,\n verbose::Bool = true)\n\nA high-level wrapper function for learn_paths with much less control. 
It is aimed at users who are new to JudiLing and the learn_paths function.\n\nObligatory Arguments\n\ndata::DataFrame: the training dataset\ncue_obj::Cue_Matrix_Struct: the C matrix object containing all information with C\nS_val::Union{SparseMatrixCSC, Matrix}: the S matrix for the validation dataset\nF_train::Union{SparseMatrixCSC, Matrix, Chain}: either the F matrix for the training dataset, or a deep learning comprehension model trained on the training set\nChat_val::Union{SparseMatrixCSC, Matrix}: the Chat matrix for the validation dataset\n\nOptional Arguments\n\nShat_val::Union{Nothing, Matrix}=nothing: the Shat matrix for the validation dataset\ncheck_gold_path::Bool=false: if true, return a list of support values for the gold path; this information is returned as second output value\nthreshold::Float64=0.1: the value set for the support such that if the support of an n-gram is higher than this value, the n-gram will be taken into consideration\nis_tolerant::Bool=false: if true, select a specified number (given by max_tolerance) of n-grams whose supports are below threshold but above a second tolerance threshold to be added to the path\ntolerance::Float64=(-1000.0): the value set for the second threshold (in tolerant mode) such that if the support for an n-gram is in between this value and the threshold and the max_tolerance number has not been reached, then allow this n-gram to be added to the path\nmax_tolerance::Int64=3: maximum number of n-grams with support below threshold allowed in a path\nactivation::Union{Nothing, Function}=nothing: the activation function you want to pass\nignore_nan::Bool=true: whether to ignore NaN when comparing correlations, otherwise NaN will be selected as the max correlation value\nverbose::Bool=true: if true, more information is printed\n\nExamples\n\nres = learn_paths(latin, cue_obj, S, F, Chat)\n\n\n\n\n\n","category":"method"},{"location":"man/find_path/#JudiLing.learn_paths-Tuple{DataFrames.DataFrame, DataFrames.DataFrame, Union{SparseArrays.SparseMatrixCSC, Matrix}, Union{SparseArrays.SparseMatrixCSC, Matrix}, Any, Union{SparseArrays.SparseMatrixCSC, Matrix}, SparseArrays.SparseMatrixCSC, Dict, Dict}","page":"Find Paths","title":"JudiLing.learn_paths","text":"learn_paths(\n    data_train::DataFrame,\n    data_val::DataFrame,\n    C_train::Union{Matrix, SparseMatrixCSC},\n    S_val::Union{Matrix, SparseMatrixCSC},\n    F_train,\n    Chat_val::Union{Matrix, SparseMatrixCSC},\n    A::SparseMatrixCSC,\n    i2f::Dict,\n    f2i::Dict;\n    gold_ind::Union{Nothing, Vector} = nothing,\n    Shat_val::Union{Nothing, Matrix} = nothing,\n    check_gold_path::Bool = false,\n    max_t::Int = 15,\n    max_can::Int = 10,\n    threshold::Float64 = 0.1,\n    is_tolerant::Bool = false,\n    tolerance::Float64 = (-1000.0),\n    max_tolerance::Int = 3,\n    grams::Int = 3,\n    tokenized::Bool = false,\n    sep_token::Union{Nothing, String} = nothing,\n    keep_sep::Bool = false,\n    target_col::Union{Symbol, String} = \"Words\",\n    start_end_token::String = \"#\",\n    issparse::Union{Symbol, Bool} = :auto,\n    sparse_ratio::Float64 = 0.05,\n    if_pca::Bool = false,\n    pca_eval_M::Union{Nothing, Matrix} = nothing,\n    activation::Union{Nothing, Function} = nothing,\n    ignore_nan::Bool = true,\n    check_threshold_stat::Bool = false,\n    verbose::Bool = false\n)\n\nA sequence finding algorithm using discrimination learning to predict, for a given word, which n-grams are best supported for a given position in the sequence of n-grams.\n\nObligatory Arguments\n\ndata_train::DataFrame: the training dataset\ndata_val::DataFrame: the validation dataset\nC_train::Union{SparseMatrixCSC, Matrix}: the C matrix for the 
training dataset\nS_val::Union{SparseMatrixCSC, Matrix}: the S matrix for validation dataset\nF_train::Union{SparseMatrixCSC, Matrix, Chain}: the F matrix for training dataset, or a deep learning comprehension model trained on the training data\nChat_val::Union{SparseMatrixCSC, Matrix}: the Chat matrix for validation dataset\nA::SparseMatrixCSC: the adjacency matrix\ni2f::Dict: the dictionary returning features given indices\nf2i::Dict: the dictionary returning indices given features\n\nOptional Arguments\n\ngold_ind::Union{Nothing, Vector}=nothing: gold paths' indices\nShat_val::Union{Nothing, Matrix}=nothing: the Shat matrix for the validation dataset\ncheck_gold_path::Bool=false: if true, return a list of support values for the gold path; this information is returned as the second output value\nmax_t::Int64=15: maximum timestep\nmax_can::Int64=10: maximum number of candidates to consider\nthreshold::Float64=0.1: the value set for the support such that if the support of an n-gram is higher than this value, the n-gram will be taken into consideration\nis_tolerant::Bool=false: if true, select a specified number (given by max_tolerance) of n-grams whose supports are below threshold but above a second tolerance threshold to be added to the path\ntolerance::Float64=(-1000.0): the value set for the second threshold (in tolerant mode) such that if the support for an n-gram is in between this value and the threshold and the max_tolerance number has not been reached, then allow this n-gram to be added to the path\nmax_tolerance::Int64=3: maximum number of n-grams allowed in a path\ngrams::Int64=3: the number n of grams that make up an n-gram\ntokenized::Bool=false: if true, the dataset target is tokenized\nsep_token::Union{Nothing, String, Char}=nothing: separator token\nkeep_sep::Bool=false: if true, keep separators in cues\ntarget_col::Union{String, Symbol}=:Words: the column name for target strings\nstart_end_token::Union{String, Char}=\"#\": start and end token in boundary cues\nissparse::Union{Symbol, Bool}=:auto: controls whether the output Mt matrix is a dense matrix or a sparse matrix\nsparse_ratio::Float64=0.05: the ratio to decide whether a matrix is sparse\nif_pca::Bool=false: turn on to enable pca mode\npca_eval_M::Matrix=nothing: pass the original F for pca mode\nactivation::Function=nothing: the activation function you want to pass\nignore_nan::Bool=true: whether to ignore NaN when comparing correlations, otherwise NaN will be selected as the max correlation value\ncheck_threshold_stat::Bool=false: if true, return the threshold and tolerance proportions for each timestep\nverbose::Bool=false: if true, more information is printed\n\nExamples\n\n# basic usage without tokenization\nres = JudiLing.learn_paths(\nlatin,\nlatin,\ncue_obj.C,\nS,\nF,\nChat,\nA,\ncue_obj.i2f,\ncue_obj.f2i,\nmax_t=max_t,\nmax_can=10,\ngrams=3,\nthreshold=0.1,\ntokenized=false,\nkeep_sep=false,\ntarget_col=:Word,\nverbose=true)\n\n# basic usage with tokenization\nres = JudiLing.learn_paths(\nfrench,\nfrench,\ncue_obj.C,\nS,\nF,\nChat,\nA,\ncue_obj.i2f,\ncue_obj.f2i,\nmax_t=max_t,\nmax_can=10,\ngrams=3,\nthreshold=0.1,\ntokenized=true,\nsep_token=\"-\",\nkeep_sep=true,\ntarget_col=:Syllables,\nverbose=true)\n\n# basic usage for validation data\nres_val = JudiLing.learn_paths(\nlatin_train,\nlatin_val,\ncue_obj_train.C,\nS_val,\nF_train,\nChat_val,\nA,\ncue_obj_train.i2f,\ncue_obj_train.f2i,\nmax_t=max_t,\nmax_can=10,\ngrams=3,\nthreshold=0.1,\ntokenized=false,\nkeep_sep=false,\ntarget_col=:Word,\nverbose=true)\n\n# turn on tolerance mode\nres_val = 
JudiLing.learn_paths(\n...\nthreshold=0.1,\nis_tolerant=true,\ntolerance=-0.1,\nmax_tolerance=4,\n...)\n\n# turn on check gold paths mode\nres_train, gpi_train = JudiLing.learn_paths(\n...\ngold_ind=cue_obj_train.gold_ind,\nShat_val=Shat_train,\ncheck_gold_path=true,\n...)\n\nres_val, gpi_val = JudiLing.learn_paths(\n...\ngold_ind=cue_obj_val.gold_ind,\nShat_val=Shat_val,\ncheck_gold_path=true,\n...)\n\n# control over sparsity\nres_val = JudiLing.learn_paths(\n...\nissparse=:auto,\nsparse_ratio=0.05,\n...)\n\n# pca mode\nres_learn = JudiLing.learn_paths(\nkorean,\nkorean,\nArray(Cpcat),\nS,\nF,\nChatPCA,\nA,\ncue_obj.i2f,\ncue_obj.f2i,\ncheck_gold_path=false,\ngold_ind=cue_obj.gold_ind,\nShat_val=Shat,\nmax_t=max_t,\nmax_can=10,\ngrams=3,\nthreshold=0.1,\ntokenized=true,\nsep_token=\"_\",\nkeep_sep=true,\ntarget_col=:Verb_syll,\nif_pca=true,\npca_eval_M=Fo,\nverbose=true);\n\n\n\n\n\n\n","category":"method"},{"location":"man/find_path/#JudiLing.learn_paths_rpi-Tuple{DataFrames.DataFrame, DataFrames.DataFrame, Union{SparseArrays.SparseMatrixCSC, Matrix}, Union{SparseArrays.SparseMatrixCSC, Matrix}, Any, Union{SparseArrays.SparseMatrixCSC, Matrix}, SparseArrays.SparseMatrixCSC, Dict, Dict}","page":"Find Paths","title":"JudiLing.learn_paths_rpi","text":"learn_paths_rpi(\n data_train::DataFrame,\n data_val::DataFrame,\n C_train::Union{Matrix, SparseMatrixCSC},\n S_val::Union{Matrix, SparseMatrixCSC},\n F_train,\n Chat_val::Union{Matrix, SparseMatrixCSC},\n A::SparseMatrixCSC,\n i2f::Dict,\n f2i::Dict;\n gold_ind::Union{Nothing, Vector} = nothing,\n Shat_val::Union{Nothing, Matrix} = nothing,\n check_gold_path::Bool = false,\n max_t::Int = 15,\n max_can::Int = 10,\n threshold::Float64 = 0.1,\n is_tolerant::Bool = false,\n tolerance::Float64 = (-1000.0),\n max_tolerance::Int = 3,\n grams::Int = 3,\n tokenized::Bool = false,\n sep_token::Union{Nothing, String} = nothing,\n keep_sep::Bool = false,\n target_col::Union{Symbol, String} = \"Words\",\n start_end_token::String = \"#\",\n issparse::Union{Symbol, Bool} = :auto,\n sparse_ratio::Float64 = 0.05,\n if_pca::Bool = false,\n pca_eval_M::Union{Nothing, Matrix} = nothing,\n activation::Union{Nothing, Function} = nothing,\n ignore_nan::Bool = true,\n check_threshold_stat::Bool = false,\n verbose::Bool = false\n)\n\nCalculate learn_paths, additionally returning the indices and supports of the results.\n\nObligatory Arguments\n\ndata::DataFrame: the training dataset\ndata_val::DataFrame: the validation dataset\nC_train::Union{SparseMatrixCSC, Matrix}: the C matrix for training dataset\nS_val::Union{SparseMatrixCSC, Matrix}: the S matrix for validation dataset\nF_train::Union{SparseMatrixCSC, Matrix, Chain}: the F matrix for training dataset, or a deep learning comprehension model trained on the training data\nChat_val::Union{SparseMatrixCSC, Matrix}: the Chat matrix for validation dataset\nA::SparseMatrixCSC: the adjacency matrix\ni2f::Dict: the dictionary returning features given indices\nf2i::Dict: the dictionary returning indices given features\n\nOptional Arguments\n\ngold_ind::Union{Nothing, Vector}=nothing: gold paths' indices\nShat_val::Union{Nothing, Matrix}=nothing: the Shat matrix for the validation dataset\ncheck_gold_path::Bool=false: if true, return a list of support values for the gold path; this information is returned as the second output value\nmax_t::Int64=15: maximum timestep\nmax_can::Int64=10: maximum number of candidates to consider\nthreshold::Float64=0.1: the value set for the support such that if the support of an n-gram is higher than this value, the 
n-gram will be taken into consideration\nis_tolerant::Bool=false: if true, select a specified number (given by max_tolerance) of n-grams whose supports are below threshold but above a second tolerance threshold to be added to the path\ntolerance::Float64=(-1000.0): the value set for the second threshold (in tolerant mode) such that if the support for an n-gram is in between this value and the threshold and the max_tolerance number has not been reached, then allow this n-gram to be added to the path\nmax_tolerance::Int64=3: maximum number of n-grams allowed in a path\ngrams::Int64=3: the number n of grams that make up an n-gram\ntokenized::Bool=false: if true, the dataset target is tokenized\nsep_token::Union{Nothing, String, Char}=nothing: separator token\nkeep_sep::Bool=false: if true, keep separators in cues\ntarget_col::Union{String, Symbol}=:Words: the column name for target strings\nstart_end_token::Union{String, Char}=\"#\": start and end token in boundary cues\nissparse::Union{Symbol, Bool}=:auto: controls whether the output Mt matrix is a dense matrix or a sparse matrix\nsparse_ratio::Float64=0.05: the ratio to decide whether a matrix is sparse\nif_pca::Bool=false: turn on to enable pca mode\npca_eval_M::Matrix=nothing: pass the original F for pca mode\nactivation::Function=nothing: the activation function you want to pass\nignore_nan::Bool=true: whether to ignore NaN when comparing correlations, otherwise NaN will be selected as the max correlation value\ncheck_threshold_stat::Bool=false: if true, return the threshold and tolerance proportions for each timestep\nverbose::Bool=false: if true, more information is printed\n\n\n\n\n\n","category":"method"},{"location":"man/find_path/#Utility-functions","page":"Find Paths","title":"Utility functions","text":"","category":"section"},{"location":"man/find_path/","page":"Find Paths","title":"Find Paths","text":" eval_can(candidates, S, F, i2f, max_can, if_pca, pca_eval_M)\n find_top_feature_indices(rC, C_train_ind)\n make_ngrams_ind(res, n)\n predict_shat(F::Union{Matrix, SparseMatrixCSC},\n ci::Vector{Int})","category":"page"},{"location":"man/find_path/#JudiLing.eval_can-NTuple{7, Any}","page":"Find Paths","title":"JudiLing.eval_can","text":"eval_can(candidates, S, F::Union{Matrix,SparseMatrixCSC, Chain}, i2f, max_can, if_pca, pca_eval_M)\n\nCalculate for each candidate path the correlation between predicted semantic vector and the gold standard semantic vector, and select as target for production the path with the highest correlation.\n\n\n\n\n\n","category":"method"},{"location":"man/find_path/#JudiLing.find_top_feature_indices-Tuple{Any, Any}","page":"Find Paths","title":"JudiLing.find_top_feature_indices","text":"find_top_feature_indices(rC, C_train_ind)\n\nFind all indices for the n-grams of the top n closest neighbors of a given target.\n\n\n\n\n\n","category":"method"},{"location":"man/find_path/#JudiLing.make_ngrams_ind-Tuple{Any, Any}","page":"Find Paths","title":"JudiLing.make_ngrams_ind","text":"make_ngrams_ind(res, n)\n\nConstruct ngrams indices.\n\n\n\n\n\n","category":"method"},{"location":"man/find_path/#JudiLing.predict_shat-Tuple{Union{SparseArrays.SparseMatrixCSC, Matrix}, Vector{Int64}}","page":"Find Paths","title":"JudiLing.predict_shat","text":"predict_shat(F::Union{Matrix, SparseMatrixCSC},\n ci::Vector{Int})\n\nPredicts the semantic vector shat given a comprehension matrix F and a list of indices of ngrams ci.\n\nObligatory arguments\n\nF::Union{Matrix, SparseMatrixCSC}: Comprehension matrix F.\nci::Vector{Int}: Vector of 
indices of ngrams in the c vector. Essentially, this vector indicates which ngrams in a c vector are present and which are absent.\n\n\n\n\n\n","category":"method"},{"location":"man/display/","page":"Display","title":"Display","text":"CurrentModule = JudiLing","category":"page"},{"location":"man/display/#Cholesky","page":"Display","title":"Cholesky","text":"","category":"section"},{"location":"man/display/","page":"Display","title":"Display","text":" display_matrix(M, rownames, colnames)\n display_matrix(data, target_col, cue_obj, M, M_type)","category":"page"},{"location":"man/display/#JudiLing.display_matrix-Tuple{Any, Any, Any}","page":"Display","title":"JudiLing.display_matrix","text":"display_matrix(M, rownames, colnames)\n\nDisplay a matrix with rownames and colnames.\n\n\n\n\n\n","category":"method"},{"location":"man/display/#JudiLing.display_matrix-NTuple{5, Any}","page":"Display","title":"JudiLing.display_matrix","text":"display_matrix(data, target_col, cue_pS_obj, M, M_type)\n\nDisplay a matrix with rownames and colnames.\n\nObligatory Arguments\n\ndata::DataFrame: the dataset\ntarget_col::Union{String, Symbol}: the target column name\ncue_pS_obj::Union{Cue_Matrix_Struct,PS_Matrix_Struct}: the cue matrix or pS matrix structure\nM::Union{SparseMatrixCSC, Matrix}: the matrix\nM_type::Union{String, Symbol}: the type of the matrix; currently supported are :C, :S, :F, :G, :Chat, :Shat, :A, :R and :pS\n\nOptional Arguments\n\nnrow::Int64 = 6: the number of rows to display\nncol::Int64 = 6: the number of columns to display\nreturn_matrix::Bool = false: whether the created dataframe should be returned (and not only displayed)\n\nExamples\n\nJudiLing.display_matrix(latin, :Word, cue_obj, cue_obj.C, :C)\nJudiLing.display_matrix(latin, :Word, cue_obj, S, :S)\nJudiLing.display_matrix(latin, :Word, cue_obj, G, :G)\nJudiLing.display_matrix(latin, :Word, cue_obj, Chat, :Chat)\nJudiLing.display_matrix(latin, :Word, cue_obj, F, :F)\nJudiLing.display_matrix(latin, :Word, cue_obj, Shat, :Shat)\nJudiLing.display_matrix(latin, :Word, cue_obj, A, :A)\nJudiLing.display_matrix(latin, :Word, cue_obj, R, :R)\nJudiLing.display_matrix(latin, :Word, pS_obj, pS_obj.pS, :pS)\n\n\n\n\n\n","category":"method"},{"location":"man/input/","page":"Loading data","title":"Loading data","text":"CurrentModule = JudiLing","category":"page"},{"location":"man/input/#Loading-data","page":"Loading data","title":"Loading data","text":"","category":"section"},{"location":"man/input/","page":"Loading data","title":"Loading data","text":"load_dataset(filepath::String;\n delim::String=\",\",\n kargs...)\nloading_data_randomly_split(\n data_path::String,\n output_dir_path::String,\n data_prefix::String;\n val_sample_size::Int = 0,\n val_ratio::Float64 = 0.0,\n random_seed::Int = 314)\nloading_data_careful_split(\n data_path::String,\n data_prefix::String,\n output_dir_path::String,\n n_features_columns::Union{Vector{Symbol},Vector{String}};\n train_sample_size::Int = 0,\n val_sample_size::Int = 0,\n val_ratio::Float64 = 0.0,\n n_grams_target_col::Union{Symbol, String} = :Word,\n n_grams_tokenized::Bool = false,\n n_grams_sep_token::Union{Nothing, String} = nothing,\n grams::Int = 3,\n n_grams_keep_sep::Bool = false,\n start_end_token::String = \"#\",\n random_seed::Int = 314,\n verbose::Bool = false)","category":"page"},{"location":"man/input/#JudiLing.load_dataset-Tuple{String}","page":"Loading data","title":"JudiLing.load_dataset","text":"load_dataset(filepath::String;\n delim::String=\",\",\n kargs...)\n\nLoad a dataset from file, usually 
comma- or tab-separated. Returns a DataFrame.\n\nObligatory arguments\n\nfilepath::String: Path to file to be loaded.\n\nOptional arguments\n\ndelim::String=\",\": Delimiter in the file (usually either \",\" or \"\\t\").\nkargs...: Further keyword arguments are passed to CSV.File().\n\nExample\n\nlatin = JudiLing.load_dataset(\"latin.csv\")\nfirst(latin, 10)\n\n\n\n\n\n","category":"method"},{"location":"man/input/#JudiLing.loading_data_randomly_split-Tuple{String, String, String}","page":"Loading data","title":"JudiLing.loading_data_randomly_split","text":"loading_data_randomly_split(\n data_path::String,\n output_dir_path::String,\n data_prefix::String;\n val_sample_size::Int = 0,\n val_ratio::Float64 = 0.0,\n random_seed::Int = 314)\n\nRead in a dataset and split it into a training and a validation dataset. The two are also written to output_dir_path at the same time.\n\nnote: Note\nThe order of data_prefix and output_dir_path is exactly reversed compared to loading_data_careful_split.\n\nObligatory arguments\n\ndata_path::String: Path to where the dataset is stored.\noutput_dir_path::String: Path to where the new dataframes should be stored.\ndata_prefix::String: Prefix of the two new files, which will be called data_prefix_train.csv and data_prefix_val.csv.\n\nOptional arguments\n\nval_sample_size::Int = 0: Size of the validation dataset (only val_sample_size or val_ratio may be used).\nval_ratio::Float64 = 0.0: Fraction of the data that should be in the validation dataset (only val_sample_size or val_ratio may be used).\nrandom_seed::Int = 314: Random seed for controlling the random split.\n\nExample\n\ndata_train, data_val = JudiLing.loading_data_randomly_split(\n \"latin.csv\",\n \"data\",\n \"latin\",\n val_ratio=0.1,\n random_seed=42\n)\n\n\n\n\n\n","category":"method"},{"location":"man/input/#JudiLing.loading_data_careful_split-Tuple{String, String, String, Union{Vector{String}, Vector{Symbol}}}","page":"Loading data","title":"JudiLing.loading_data_careful_split","text":"loading_data_careful_split(\n data_path::String,\n data_prefix::String,\n output_dir_path::String,\n n_features_columns::Union{Vector{Symbol},Vector{String}};\n train_sample_size::Int = 0,\n val_sample_size::Int = 0,\n val_ratio::Float64 = 0.0,\n n_grams_target_col::Union{Symbol, String} = :Word,\n n_grams_tokenized::Bool = false,\n n_grams_sep_token::Union{Nothing, String} = nothing,\n grams::Int = 3,\n n_grams_keep_sep::Bool = false,\n start_end_token::String = \"#\",\n random_seed::Int = 314,\n verbose::Bool = false)\n\nRead in a dataset and split it into a training and a validation dataset. The split is done such that all features in the columns specified in n_features_columns occur both in the training and validation data. It is also ensured that the unique grams resulting from splitting the strings in column n_grams_target_col into n-grams of size grams occur in both datasets (the sketch below illustrates this guarantee). 
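To see the guarantee in action, one can check after the split that every feature value attested in the validation data also occurs in the training data (a minimal sketch, assuming latin.csv and the argument order of the example further below):

```julia
using JudiLing
using DataFrames

data_train, data_val = JudiLing.loading_data_careful_split(
    "latin.csv", "latin", "careful",
    ["Lexeme","Person","Number","Tense","Voice","Mood"])

# every feature value in the validation data must also occur in the training data
for col in ["Lexeme","Person","Number","Tense","Voice","Mood"]
    @assert issubset(Set(data_val[!, col]), Set(data_train[!, col]))
end
```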
The two are also written to output_dir_path at the same time.\n\nnote: Note\nThe order of data_prefix and output_dir_path is exactly reversed compared to loading_data_randomly_split.\n\nObligatory arguments\n\ndata_path::String: Path to where the dataset is stored.\noutput_dir_path::String: Path to where the new dataframes should be stored.\ndata_prefix::String: Prefix of the two new files, which will be called data_prefix_train.csv and data_prefix_val.csv.\nn_features_columns::Vector{Union{Symbol, String}}: Vector with columns whose features have to occur in both the training and validation data.\n\nOptional arguments\n\nval_sample_size::Int = 0: Size of the validation dataset (only val_sample_size or val_ratio may be used).\nval_ratio::Float64 = 0.0: Fraction of the data that should be in the validation dataset (only val_sample_size or val_ratio may be used).\nn_grams_target_col::Union{Symbol, String} = :Word: Column with target words.\nn_grams_tokenized::Bool = false: Whether the words in n_grams_target_col are already tokenized.\nn_grams_sep_token::Union{Nothing, String} = nothing: String with which tokens in n_grams_target_col are separated (only used if n_grams_tokenized=true).\ngrams::Int = 3: Granularity of the n-grams.\nn_grams_keep_sep::Bool = false: Whether the token separators should be kept in the ngrams (this is useful e.g. when working with syllables).\nstart_end_token::String = \"#\": Token with which the start and end of words should be marked.\nrandom_seed::Int = 314: Random seed for controlling the random split.\n\nExample\n\ndata_train, data_val = JudiLing.loading_data_careful_split(\n \"latin.csv\",\n \"latin\",\n \"careful\",\n [\"Lexeme\",\"Person\",\"Number\",\"Tense\",\"Voice\",\"Mood\"]\n)\n\n\n\n\n\n","category":"method"},{"location":"man/all_manual/","page":"All Manual index","title":"All Manual index","text":"CurrentModule = JudiLing","category":"page"},{"location":"man/all_manual/","page":"All Manual index","title":"All Manual index","text":"","category":"page"},{"location":"man/output/","page":"Output","title":"Output","text":"CurrentModule = JudiLing","category":"page"},{"location":"man/output/#Output","page":"Output","title":"Output","text":"","category":"section"},{"location":"man/output/","page":"Output","title":"Output","text":" write2csv\n write2df\n write_comprehension_eval\n write2csv(res, data, cue_obj_train, cue_obj_val, filename)\n write2csv(gpi::Vector{Gold_Path_Info_Struct}, filename)\n write2csv(ts::Threshold_Stat_Struct, filename)\n write2df(res, data, cue_obj_train, cue_obj_val)\n write2df(gpi::Vector{Gold_Path_Info_Struct})\n write2df(ts::Threshold_Stat_Struct)\n write_comprehension_eval(SChat, SC, data, target_col, filename)\n write_comprehension_eval(SChat, SC, SC_rest, data, data_rest, target_col, filename)\n save_L_matrix(L, filename)\n load_L_matrix(filename)\n save_S_matrix(S, filename, data, target_col)\n load_S_matrix(filename)","category":"page"},{"location":"man/output/#JudiLing.write2csv","page":"Output","title":"JudiLing.write2csv","text":"Write results into a csv file. This function takes as input the results from the learn_paths and build_paths functions, including the information on gold paths that is optionally returned as a second output.\n\n\n\n\n\n","category":"function"},{"location":"man/output/#JudiLing.write2df","page":"Output","title":"JudiLing.write2df","text":"Reformat results into a dataframe. 
This function takes as input the results from the learn_paths and build_paths functions, including the information on gold paths that is optionally returned as a second output.\n\n\n\n\n\n","category":"function"},{"location":"man/output/#JudiLing.write_comprehension_eval","page":"Output","title":"JudiLing.write_comprehension_eval","text":"Write comprehension evaluation into a CSV file, including target and predicted ids and identifiers and their correlations.\n\n\n\n\n\n","category":"function"},{"location":"man/output/#JudiLing.write2csv-NTuple{5, Any}","page":"Output","title":"JudiLing.write2csv","text":"write2csv(res, data, cue_obj_train, cue_obj_val, filename)\n\nWrite results into a csv file for the results from learn_paths and build_paths.\n\nObligatory Arguments\n\nres::Array{Array{Result_Path_Info_Struct,1},1}: the results from learn_paths or build_paths\ndata::DataFrame: the dataset\ncue_obj_train::Cue_Matrix_Struct: the cue object for training dataset\ncue_obj_val::Cue_Matrix_Struct: the cue object for validation dataset\nfilename::String: the filename\n\nOptional Arguments\n\ngrams::Int64=3: the number n in n-gram cues\ntokenized::Bool=false: if true, the dataset target is tokenized\nsep_token::Union{Nothing, String, Char}=nothing: separator\nstart_end_token::Union{String, Char}=\"#\": start and end token in boundary cues\noutput_sep_token::Union{String, Char}=\"\": output separator\npath_sep_token::Union{String, Char}=\":\": path separator\ntarget_col::Union{String, Symbol}=:Words: the column name for target strings\nroot_dir::String=\".\": dir path for project root dir\noutput_dir::String=\".\": output dir inside root dir\n\nExamples\n\n# writing results for training data\nJudiLing.write2csv(\n res_train,\n latin_train,\n cue_obj_train,\n cue_obj_train,\n \"res_latin_train.csv\",\n grams=3,\n tokenized=false,\n sep_token=nothing,\n start_end_token=\"#\",\n output_sep_token=\"\",\n path_sep_token=\":\",\n target_col=:Word,\n root_dir=\".\",\n output_dir=\"test_out\")\n\n# writing results for validation data\nJudiLing.write2csv(\n res_val,\n latin_val,\n cue_obj_train,\n cue_obj_val,\n \"res_latin_val.csv\",\n grams=3,\n tokenized=false,\n sep_token=nothing,\n start_end_token=\"#\",\n output_sep_token=\"\",\n path_sep_token=\":\",\n target_col=:Word,\n root_dir=\".\",\n output_dir=\"test_out\")\n\n\n\n\n\n","category":"method"},{"location":"man/output/#JudiLing.write2csv-Tuple{Vector{JudiLing.Gold_Path_Info_Struct}, Any}","page":"Output","title":"JudiLing.write2csv","text":"write2csv(gpi::Vector{Gold_Path_Info_Struct}, filename)\n\nWrite results into a csv file for the gold paths' information optionally returned by learn_paths and build_paths.\n\nObligatory Arguments\n\ngpi::Vector{Gold_Path_Info_Struct}: the gold paths' information\nfilename::String: the filename\n\nOptional Arguments\n\nroot_dir::String=\".\": dir path for project root dir\noutput_dir::String=\".\": output dir inside root dir\n\nExamples\n\n# write gold standard paths to csv for training data\nJudiLing.write2csv(\n gpi_train,\n \"gpi_latin_train.csv\",\n root_dir=\".\",\n output_dir=\"test_out\"\n )\n\n# write gold standard paths to csv for validation data\nJudiLing.write2csv(\n gpi_val,\n \"gpi_latin_val.csv\",\n root_dir=\".\",\n output_dir=\"test_out\"\n )\n\n\n\n\n\n","category":"method"},{"location":"man/output/#JudiLing.write2csv-Tuple{JudiLing.Threshold_Stat_Struct, Any}","page":"Output","title":"JudiLing.write2csv","text":"write2csv(ts::Threshold_Stat_Struct, filename)\n\nWrite results into a csv file for 
the threshold and tolerance proportions for each timestep.\n\nObligatory Arguments\n\nts::Threshold_Stat_Struct: the threshold and tolerance proportions\nfilename::String: the filename\n\nOptional Arguments\n\nroot_dir::String=\".\": dir path for project root dir\noutput_dir::String=\".\": output dir inside root dir\n\nExamples\n\nJudiLing.write2csv(ts, \"ts.csv\", root_dir = @__DIR__, output_dir=\"out\")\n\n\n\n\n\n","category":"method"},{"location":"man/output/#JudiLing.write2df-NTuple{4, Any}","page":"Output","title":"JudiLing.write2df","text":"write2df(res, data, cue_obj_train, cue_obj_val)\n\nReformat results into a dataframe for the results from the learn_paths and build_paths functions.\n\nObligatory Arguments\n\nres: output of learn_paths or build_paths\ndata::DataFrame: the dataset\ncue_obj_train: cue object of the training data set\ncue_obj_val: cue object of the validation data set\n\nOptional Arguments\n\ngrams::Int64=3: the number n in n-gram cues\ntokenized::Bool=false: if true, the dataset target is tokenized\nsep_token::Union{Nothing, String, Char}=nothing: separator\nstart_end_token::Union{String, Char}=\"#\": start and end token in boundary cues\noutput_sep_token::Union{String, Char}=\"\": output separator\npath_sep_token::Union{String, Char}=\":\": path separator\ntarget_col::Union{String, Symbol}=:Words: the column name for target strings\n\nExamples\n\n# writing results for training data\nJudiLing.write2df(\n res_train,\n latin_train,\n cue_obj_train,\n cue_obj_train,\n grams=3,\n tokenized=false,\n sep_token=nothing,\n start_end_token=\"#\",\n output_sep_token=\"\",\n path_sep_token=\":\",\n target_col=:Word)\n\n# writing results for validation data\nJudiLing.write2df(\n res_val,\n latin_val,\n cue_obj_train,\n cue_obj_val,\n grams=3,\n tokenized=false,\n sep_token=nothing,\n start_end_token=\"#\",\n output_sep_token=\"\",\n path_sep_token=\":\",\n target_col=:Word)\n\n\n\n\n\n","category":"method"},{"location":"man/output/#JudiLing.write2df-Tuple{Vector{JudiLing.Gold_Path_Info_Struct}}","page":"Output","title":"JudiLing.write2df","text":"write2df(gpi::Vector{Gold_Path_Info_Struct})\n\nWrite results into a dataframe for the gold paths' information optionally returned by learn_paths and build_paths.\n\nObligatory Arguments\n\ngpi::Vector{Gold_Path_Info_Struct}: the gold paths' information\n\nExamples\n\n# write gold standard paths to df for training data\nJudiLing.write2df(gpi_train)\n\n# write gold standard paths to df for validation data\nJudiLing.write2df(gpi_val)\n\n\n\n\n\n","category":"method"},{"location":"man/output/#JudiLing.write2df-Tuple{JudiLing.Threshold_Stat_Struct}","page":"Output","title":"JudiLing.write2df","text":"write2df(ts::Threshold_Stat_Struct)\n\nWrite results into a dataframe for the threshold and tolerance proportions for each timestep.\n\nObligatory Arguments\n\nts::Threshold_Stat_Struct: the threshold and tolerance proportions\n\nExamples\n\nJudiLing.write2df(ts)\n\n\n\n\n\n","category":"method"},{"location":"man/output/#JudiLing.write_comprehension_eval-NTuple{5, Any}","page":"Output","title":"JudiLing.write_comprehension_eval","text":"write_comprehension_eval(SChat, SC, data, target_col, filename)\n\nWrite comprehension evaluation into a CSV file, including target and predicted ids and identifiers and their correlations.\n\nObligatory Arguments\n\nSChat::Matrix: the Shat/Chat matrix\nSC::Matrix: the S/C matrix\ndata::DataFrame: the data\ntarget_col::Symbol: the name of target column\nfilename::String: the filename/filepath\n\nOptional Arguments\n\nk: top k 
candidates\nroot_dir::String=\".\": dir path for project root dir\noutput_dir::String=\".\": output dir inside root dir\n\nExamples\n\nJudiLing.write_comprehension_eval(Chat, cue_obj.C, latin, :Word, \"output.csv\",\n k=10, root_dir=@__DIR__, output_dir=\"out\")\n\n\n\n\n\n","category":"method"},{"location":"man/output/#JudiLing.write_comprehension_eval-NTuple{7, Any}","page":"Output","title":"JudiLing.write_comprehension_eval","text":"write_comprehension_eval(SChat, SC, SC_rest, data, data_rest, target_col, filename)\n\nWrite comprehension evaluation into a CSV file for both training and validation datasets, including target and predicted ids and identifiers and their correlations.\n\nObligatory Arguments\n\nSChat::Matrix: the Shat/Chat matrix\nSC::Matrix: the S/C matrix\nSC_rest::Matrix: the rest S/C matrix\ndata::DataFrame: the data\ndata_rest::DataFrame: the rest data\ntarget_col::Symbol: the name of target column\nfilename::String: the filename/filepath\n\nOptional Arguments\n\nk: top k candidates\nroot_dir::String=\".\": dir path for project root dir\noutput_dir::String=\".\": output dir inside root dir\n\nExamples\n\nJudiLing.write_comprehension_eval(Shat_val, S_val, S_train, latin_val, latin_train,\n :Word, \"all_output.csv\", k=10, root_dir=@__DIR__, output_dir=\"out\")\n\n\n\n\n\n","category":"method"},{"location":"man/output/#JudiLing.save_L_matrix-Tuple{Any, Any}","page":"Output","title":"JudiLing.save_L_matrix","text":"save_L_matrix(L, filename)\n\nSave the lexome matrix into a csv file.\n\nObligatory Arguments\n\nL::L_Matrix_Struct: the lexome matrix struct\nfilename::String: the filename/filepath\n\nExamples\n\nJudiLing.save_L_matrix(L, joinpath(@__DIR__, \"L.csv\"))\n\n\n\n\n\n","category":"method"},{"location":"man/output/#JudiLing.load_L_matrix-Tuple{Any}","page":"Output","title":"JudiLing.load_L_matrix","text":"load_L_matrix(filename)\n\nLoad the lexome matrix from a csv file.\n\nObligatory Arguments\n\nfilename::String: the filename/filepath\n\nOptional Arguments\n\nheader::Bool=false: header in csv\n\nExamples\n\nL_load = JudiLing.load_L_matrix(joinpath(@__DIR__, \"L.csv\"))\n\n\n\n\n\n","category":"method"},{"location":"man/output/#JudiLing.save_S_matrix-NTuple{4, Any}","page":"Output","title":"JudiLing.save_S_matrix","text":"save_S_matrix(S, filename, data, target_col)\n\nSave the S matrix into a csv file.\n\nObligatory Arguments\n\nS::Matrix: the S matrix\nfilename::String: the filename/filepath\ndata::DataFrame: the data\ntarget_col::Symbol: the name of target column\n\nOptional Arguments\n\nsep::String=\" \": separator in the CSV file\n\nExamples\n\nJudiLing.save_S_matrix(S, joinpath(@__DIR__, \"S.csv\"), latin, :Word)\n\n\n\n\n\n","category":"method"},{"location":"man/output/#JudiLing.load_S_matrix-Tuple{Any}","page":"Output","title":"JudiLing.load_S_matrix","text":"load_S_matrix(filename)\n\nLoad the S matrix from a csv file.\n\nObligatory Arguments\n\nfilename::String: the filename/filepath\n\nOptional Arguments\n\nheader::Bool=false: header in csv\nsep::String=\" \": separator in the CSV file\n\nExamples\n\nJudiLing.load_S_matrix(joinpath(@__DIR__, \"S.csv\"))\n\n\n\n\n\n","category":"method"},{"location":"man/make_yt_matrix/","page":"Make Yt Matrix","title":"Make Yt Matrix","text":"CurrentModule = JudiLing","category":"page"},{"location":"man/make_yt_matrix/#Make-Yt-Matrix","page":"Make Yt Matrix","title":"Make Yt Matrix","text":"","category":"section"},{"location":"man/make_yt_matrix/","page":"Make Yt Matrix","title":"Make Yt Matrix","text":" make_Yt_matrix\n make_Yt_matrix(t, data, 
f2i)","category":"page"},{"location":"man/make_yt_matrix/#JudiLing.make_Yt_matrix","page":"Make Yt Matrix","title":"JudiLing.make_Yt_matrix","text":"Make Yt matrix for timestep t.\n\n\n\n\n\n","category":"function"},{"location":"man/make_yt_matrix/#JudiLing.make_Yt_matrix-Tuple{Any, Any, Any}","page":"Make Yt Matrix","title":"JudiLing.make_Yt_matrix","text":"make_Yt_matrix(t, data, f2i)\n\nMake Yt matrix for timestep t. A given column of the Yt matrix specifies the support for the corresponding n-gram predicted for timestep t for each of the observations (rows of Yt).\n\nObligatory Arguments\n\nt::Int64: the timestep t\ndata::DataFrame: the dataset\nf2i::Dict: the dictionary returning indices given features\n\nOptional Arguments\n\ntokenized::Bool=false: if true, the dataset target is assumed to be tokenized\nsep_token::Union{Nothing, String, Char}=nothing: separator token\nverbose::Bool=false: if verbose, more information will be printed\n\nExamples\n\nlatin = DataFrame(CSV.File(joinpath(\"data\", \"latin_mini.csv\")))\nJudiLing.make_Yt_matrix(2, latin)\n\n\n\n\n\n","category":"method"},{"location":"man/preprocess/","page":"Preprocess","title":"Preprocess","text":"CurrentModule = JudiLing","category":"page"},{"location":"man/preprocess/#Preprocess","page":"Preprocess","title":"Preprocess","text":"","category":"section"},{"location":"man/preprocess/","page":"Preprocess","title":"Preprocess","text":" SplitDataException\n lpo_cv_split(p, data_path)\n loo_cv_split(data_path)\n train_val_random_split(data_path, output_dir_path, data_prefix)\n train_val_careful_split(data_path, output_dir_path, data_prefix, n_features_columns)","category":"page"},{"location":"man/preprocess/#JudiLing.SplitDataException","page":"Preprocess","title":"JudiLing.SplitDataException","text":"Split Data Exception\n\n\n\n\n\n","category":"type"},{"location":"man/preprocess/#JudiLing.lpo_cv_split-Tuple{Any, Any}","page":"Preprocess","title":"JudiLing.lpo_cv_split","text":"lpo_cv_split(p, data_path)\n\nLeave p out cross-validation.\n\n\n\n\n\n","category":"method"},{"location":"man/preprocess/#JudiLing.loo_cv_split-Tuple{Any}","page":"Preprocess","title":"JudiLing.loo_cv_split","text":"loo_cv_split(data_path)\n\nLeave one out cross-validation.\n\n\n\n\n\n","category":"method"},{"location":"man/preprocess/#JudiLing.train_val_random_split-Tuple{Any, Any, Any}","page":"Preprocess","title":"JudiLing.train_val_random_split","text":"train_val_random_split(data_path, output_dir_path, data_prefix)\n\nRandomly split dataset.\n\n\n\n\n\n","category":"method"},{"location":"man/preprocess/#JudiLing.train_val_careful_split-NTuple{4, Any}","page":"Preprocess","title":"JudiLing.train_val_careful_split","text":"train_val_careful_split(data_path, output_dir_path, data_prefix, n_features_columns)\n\nCarefully split dataset.\n\n\n\n\n\n","category":"method"},{"location":"man/test_combo/","page":"Test Combo","title":"Test Combo","text":"CurrentModule = JudiLing","category":"page"},{"location":"man/test_combo/#Test-Combo","page":"Test Combo","title":"Test Combo","text":"","category":"section"},{"location":"man/test_combo/","page":"Test Combo","title":"Test Combo","text":" test_combo(test_mode;kwargs...)","category":"page"},{"location":"man/test_combo/#JudiLing.test_combo-Tuple{Any}","page":"Test Combo","title":"JudiLing.test_combo","text":"test_combo(test_mode;kwargs...)\n\nA wrapper function for a full model for a specific combination of parameters. 
A detailed introduction can be found in the Test Combo Introduction.\n\nnote: Note\ntest_combo is deprecated. While it will remain in the package, it is no longer actively maintained.\n\nObligatory Arguments\n\ntest_mode::Symbol: which test mode, currently supports :train_only, :pre_split, :careful_split and :random_split.\n\nOptional Arguments\n\ntrain_sample_size::Int64=0: the desired number of training data\nval_sample_size::Int64=0: the desired number of validation data\nval_ratio::Float64=0.0: the desired proportion of validation data; this only takes effect if val_sample_size is 0\nextension::String=\".csv\": the extension for data files\nn_grams_target_col::Union{String, Symbol}=:Word: the column name for target strings\nn_grams_tokenized::Bool=false: if true, the dataset target is assumed to be tokenized\nn_grams_sep_token::String=nothing: separator\ngrams::Int64=3: the number of grams for cues\nn_grams_keep_sep::Bool=false: if true, keep separators in cues\nstart_end_token::String=\":\": start and end token in boundary cues\npath_sep_token::String=\":\": path separator in the assembled path\nrandom_seed::Int64=314: the random seed\nsd_base_mean::Int64=1: the sd mean of base features\nsd_inflection_mean::Int64=1: the sd mean of inflectional features\nsd_base::Int64=4: the sd of base features\nsd_inflection::Int64=4: the sd of inflectional features\nisdeep::Bool=true: if true, the mean of each feature is also randomized\nadd_noise::Bool=true: if true, add additional Gaussian noise\nsd_noise::Int64=1: the sd of the Gaussian noise\nnormalized::Bool=false: if true, most of the values range between 1 and -1; they may slightly exceed 1 or -1 depending on the sd\nif_combined::Bool=false: if true, then features are combined with both training and validation data\nlearn_mode::Symbol=:cholesky: which learning mode, currently supports :cholesky and :wh\nmethod::Symbol=:additive: whether :additive or :multiplicative decomposition is required\nshift::Float64=0.02: shift value for :additive decomposition\nmultiplier::Float64=1.01: multiplier value for :multiplicative decomposition\noutput_format::Symbol=:auto: force the output format to dense (:dense) or sparse (:sparse), or leave it as :auto to let the program decide\nsparse_ratio::Float64=0.05: the ratio to decide whether a matrix is sparse\nwh_freq::Vector=nothing: the learning sequence\ninit_weights::Matrix=nothing: the initial weights\neta::Float64=0.1: the learning rate\nn_epochs::Int64=1: the number of epochs to be trained\nmax_t::Int64=0: the maximum timestep\nA::Matrix=nothing: the adjacency matrix\nA_mode::Symbol=:combined: the adjacency matrix mode, currently supports :combined or :train_only\nmax_can::Int64=10: the max number of candidate paths to keep in the output\nthreshold_train::Float64=0.1: the value set for the support such that if the support of an n-gram is higher than this value, the n-gram will be taken into consideration for training data\nis_tolerant_train::Bool=false: if true, select a specified number (given by max_tolerance) of n-grams whose supports are below threshold but above a second tolerance threshold to be added to the path for training data\ntolerance_train::Float64=-0.1: the value set for the second threshold (in tolerant mode) such that if the support for an n-gram is in between this value and the threshold and the max_tolerance number has not been reached, then allow this n-gram to be added to the path for training data\nmax_tolerance_train::Int64=2: maximum number of 
n-grams allowed in a path for training data\nthreshold_val::Float64=0.1: the value set for the support such that if the support of an n-gram is higher than this value, the n-gram will be taken into consideration for validation data\nis_tolerant_val::Bool=false: if true, select a specified number (given by max_tolerance) of n-grams whose supports are below threshold but above a second tolerance threshold to be added to the path for validation data\ntolerance_val::Float64=-0.1: the value set for the second threshold (in tolerant mode) such that if the support for an n-gram is in between this value and the threshold and the max_tolerance number has not been reached, then allow this n-gram to be added to the path for validation data\nmax_tolerance_val::Int64=2: maximum number of n-grams allowed in a path for validation data\nn_neighbors_train::Int64=10: the top n form neighbors to be considered for training data\nn_neighbors_val::Int64=20: the top n form neighbors to be considered for validation data\nissparse::Bool=false: if true, keep sparse matrix format when learning paths\noutput_dir::String=\"out\": the output directory\nverbose::Bool=false: if true, more information will be printed\n\n\n\n\n\n","category":"method"},{"location":"#JudiLing","page":"Home","title":"JudiLing","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"JudiLing: An implementation for Linear Discriminative Learning in Julia","category":"page"},{"location":"","page":"Home","title":"Home","text":"Maintainer: Maria Heitmeier @MariaHei\nOriginal codebase: Xuefeng Luo @MegamindHenry","category":"page"},{"location":"#Installation","page":"Home","title":"Installation","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"You can install JudiLing by the following commands:","category":"page"},{"location":"","page":"Home","title":"Home","text":"using Pkg\nPkg.add(\"JudiLing\")","category":"page"},{"location":"","page":"Home","title":"Home","text":"For brave adventurers, install the test version of JudiLing by:","category":"page"},{"location":"","page":"Home","title":"Home","text":"julia> Pkg.add(url=\"https://github.com/quantling/JudiLing.jl.git\")","category":"page"},{"location":"","page":"Home","title":"Home","text":"Or from the Julia REPL, type ] to enter the Pkg REPL mode and run","category":"page"},{"location":"","page":"Home","title":"Home","text":"pkg> add https://github.com/quantling/JudiLing.jl.git","category":"page"},{"location":"#Running-Julia-with-multiple-threads","page":"Home","title":"Running Julia with multiple threads","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"JudiLing supports the use of multiple threads. Simply start up Julia in your terminal as follows:","category":"page"},{"location":"","page":"Home","title":"Home","text":"$ julia -t your_num_of_threads","category":"page"},{"location":"","page":"Home","title":"Home","text":"For detailed information on using Julia with threads, see this link.","category":"page"},{"location":"#Include-packages","page":"Home","title":"Include packages","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"Before we start, we first need to load the JudiLing package:","category":"page"},{"location":"","page":"Home","title":"Home","text":"using JudiLing","category":"page"},{"location":"","page":"Home","title":"Home","text":"Note: As of JudiLing 0.8.0, PyCall and Flux have become optional dependencies. 
This means that all code in JudiLing which requires calls to python is only available if PyCall is loaded first, like this:","category":"page"},{"location":"","page":"Home","title":"Home","text":"using PyCall\nusing JudiLing","category":"page"},{"location":"","page":"Home","title":"Home","text":"Likewise, the code involving deep learning is only available if Julia's deep learning library Flux is loaded first, like this:","category":"page"},{"location":"","page":"Home","title":"Home","text":"using Flux\nusing JudiLing","category":"page"},{"location":"","page":"Home","title":"Home","text":"Note that Flux and PyCall have to be installed separately, and the newest version of Flux requires at least Julia 1.9. If you want to run deep learning on a GPU, make sure to also install and import CUDA.","category":"page"},{"location":"#Quick-start-example","page":"Home","title":"Quick start example","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"The Latin dataset latin.csv contains lexemes and inflectional features for 672 inflected Latin verb forms for 8 lexemes from 4 conjugation classes. Word forms are inflected for person, number, tense, voice and mood.","category":"page"},{"location":"","page":"Home","title":"Home","text":"\"\",\"Word\",\"Lexeme\",\"Person\",\"Number\",\"Tense\",\"Voice\",\"Mood\"\n\"1\",\"vocoo\",\"vocare\",\"p1\",\"sg\",\"present\",\"active\",\"ind\"\n\"2\",\"vocaas\",\"vocare\",\"p2\",\"sg\",\"present\",\"active\",\"ind\"\n\"3\",\"vocat\",\"vocare\",\"p3\",\"sg\",\"present\",\"active\",\"ind\"\n\"4\",\"vocaamus\",\"vocare\",\"p1\",\"pl\",\"present\",\"active\",\"ind\"\n\"5\",\"vocaatis\",\"vocare\",\"p2\",\"pl\",\"present\",\"active\",\"ind\"\n\"6\",\"vocant\",\"vocare\",\"p3\",\"pl\",\"present\",\"active\",\"ind\"","category":"page"},{"location":"","page":"Home","title":"Home","text":"We first download and read the csv file into Julia:","category":"page"},{"location":"","page":"Home","title":"Home","text":"download(\"https://osf.io/2ejfu/download\", \"latin.csv\")\n\nlatin = JudiLing.load_dataset(\"latin.csv\");","category":"page"},{"location":"","page":"Home","title":"Home","text":"and we can inspect the latin dataframe:","category":"page"},{"location":"","page":"Home","title":"Home","text":"display(latin)","category":"page"},{"location":"","page":"Home","title":"Home","text":"672×8 DataFrame. 
Omitted printing of 2 columns\n│ Row │ Column1 │ Word │ Lexeme │ Person │ Number │ Tense │\n│ │ Int64 │ String │ String │ String │ String │ String │\n├─────┼─────────┼────────────────┼─────────┼────────┼────────┼────────────┤\n│ 1 │ 1 │ vocoo │ vocare │ p1 │ sg │ present │\n│ 2 │ 2 │ vocaas │ vocare │ p2 │ sg │ present │\n│ 3 │ 3 │ vocat │ vocare │ p3 │ sg │ present │\n│ 4 │ 4 │ vocaamus │ vocare │ p1 │ pl │ present │\n│ 5 │ 5 │ vocaatis │ vocare │ p2 │ pl │ present │\n│ 6 │ 6 │ vocant │ vocare │ p3 │ pl │ present │\n│ 7 │ 7 │ clamoo │ clamare │ p1 │ sg │ present │\n│ 8 │ 8 │ clamaas │ clamare │ p2 │ sg │ present │\n⋮\n│ 664 │ 664 │ carpsisseemus │ carpere │ p1 │ pl │ pluperfect │\n│ 665 │ 665 │ carpsisseetis │ carpere │ p2 │ pl │ pluperfect │\n│ 666 │ 666 │ carpsissent │ carpere │ p3 │ pl │ pluperfect │\n│ 667 │ 667 │ cuccurissem │ currere │ p1 │ sg │ pluperfect │\n│ 668 │ 668 │ cuccurissees │ currere │ p2 │ sg │ pluperfect │\n│ 669 │ 669 │ cuccurisset │ currere │ p3 │ sg │ pluperfect │\n│ 670 │ 670 │ cuccurisseemus │ currere │ p1 │ pl │ pluperfect │\n│ 671 │ 671 │ cuccurisseetis │ currere │ p2 │ pl │ pluperfect │\n│ 672 │ 672 │ cuccurissent │ currere │ p3 │ pl │ pluperfect │","category":"page"},{"location":"","page":"Home","title":"Home","text":"For the production model, we want to predict correct forms given their lexemes and inflectional features. For example, given the lexeme vocare and its inflectional features p1, sg, present, active and ind, the model should produce the form vocoo. On the other hand, the comprehension model takes forms as input and tries to predict their lexemes and inflectional features.","category":"page"},{"location":"","page":"Home","title":"Home","text":"We use letter trigrams to encode our forms. For the word vocoo, for example, we use the trigrams #vo, voc, oco, coo and oo#. Here, # is used as start/end token to encode the initial and final trigrams of a word. The row vectors of the C matrix specify for each word which of the trigrams are realized in that word.","category":"page"},{"location":"","page":"Home","title":"Home","text":"To make the C matrix, we use the make_cue_matrix function:","category":"page"},{"location":"","page":"Home","title":"Home","text":"cue_obj = JudiLing.make_cue_matrix(\n latin,\n grams=3,\n target_col=:Word,\n tokenized=false,\n keep_sep=false\n )","category":"page"},{"location":"","page":"Home","title":"Home","text":"Next, we simulate the semantic matrix S using the make_S_matrix function:","category":"page"},{"location":"","page":"Home","title":"Home","text":"n_features = size(cue_obj.C, 2)\nS = JudiLing.make_S_matrix(\n latin,\n [\"Lexeme\"],\n [\"Person\",\"Number\",\"Tense\",\"Voice\",\"Mood\"],\n ncol=n_features)","category":"page"},{"location":"","page":"Home","title":"Home","text":"For this simulation, first random vectors are assigned to every lexeme and inflectional feature, and next the vectors of those features are summed up to obtain the semantic vector of the inflected form. Similar dimensions for C and S work best. Therefore, we retrieve the number of columns from the C matrix and pass it to make_S_matrix when constructing S.","category":"page"},{"location":"","page":"Home","title":"Home","text":"Then, the next step is to calculate a mapping from S to C by solving the equation C = SG. 
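Written out, the production mapping G is the familiar least-squares solution of C = SG (a standard identity, stated here for orientation; the Cholesky-based solver below computes an equivalent solution via the normal equations):

```math
G = (S^\top S)^{-1} S^\top C, \qquad \hat{C} = S G
```

Production accuracy then amounts to comparing the rows of the predicted matrix Chat with those of C, which is what eval_SC reports further below.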
We use Cholesky decomposition to solve this equation:","category":"page"},{"location":"","page":"Home","title":"Home","text":"G = JudiLing.make_transform_matrix(S, cue_obj.C)","category":"page"},{"location":"","page":"Home","title":"Home","text":"Then, we can make our predicted C matrix Chat:","category":"page"},{"location":"","page":"Home","title":"Home","text":"Chat = S * G","category":"page"},{"location":"","page":"Home","title":"Home","text":"and evaluate the model's prediction accuracy:","category":"page"},{"location":"","page":"Home","title":"Home","text":"@show JudiLing.eval_SC(Chat, cue_obj.C)","category":"page"},{"location":"","page":"Home","title":"Home","text":"Output:","category":"page"},{"location":"","page":"Home","title":"Home","text":"JudiLing.eval_SC(Chat, cue_obj.C) = 0.9926","category":"page"},{"location":"","page":"Home","title":"Home","text":"NOTE: Accuracy may be different depending on the simulated semantic matrix.","category":"page"},{"location":"","page":"Home","title":"Home","text":"Similar to G and Chat, we can solve S = CF:","category":"page"},{"location":"","page":"Home","title":"Home","text":"F = JudiLing.make_transform_matrix(cue_obj.C, S)","category":"page"},{"location":"","page":"Home","title":"Home","text":"and we then calculate the Shat matrix and evaluate comprehension accuracy:","category":"page"},{"location":"","page":"Home","title":"Home","text":"Shat = cue_obj.C * F\n@show JudiLing.eval_SC(Shat, S)","category":"page"},{"location":"","page":"Home","title":"Home","text":"Output:","category":"page"},{"location":"","page":"Home","title":"Home","text":"JudiLing.eval_SC(Shat, S) = 0.9911","category":"page"},{"location":"","page":"Home","title":"Home","text":"NOTE: Accuracy may be different depending on the simulated semantic matrix.","category":"page"},{"location":"","page":"Home","title":"Home","text":"To model speech production, the proper triphones have to be selected and put into the right order. We have two algorithms that accomplish this. Both algorithms construct paths in a triphone space that start with word-initial triphones and end with word-final triphones.","category":"page"},{"location":"","page":"Home","title":"Home","text":"The first step is to construct an adjacency matrix that specifies which triphones can follow each other. In this example, we use the adjacency matrix constructed by make_cue_matrix, but we can also make use of an independently constructed adjacency matrix if required.","category":"page"},{"location":"","page":"Home","title":"Home","text":"A = cue_obj.A","category":"page"},{"location":"","page":"Home","title":"Home","text":"For our sequencing algorithms, we calculate the number of timesteps needed. For the Latin dataset, the max timestep is equal to the length of the longest word. The argument :Word specifies the column in the Latin dataset that lists the words' forms.","category":"page"},{"location":"","page":"Home","title":"Home","text":"max_t = JudiLing.cal_max_timestep(latin, :Word)","category":"page"},{"location":"","page":"Home","title":"Home","text":"One sequence finding algorithm uses discrimination learning for the position of triphones. 
This function returns two lists, one with candidate triphone paths and their positional learning support (res) and one with the semantic supports for the gold paths (gpi).","category":"page"},{"location":"","page":"Home","title":"Home","text":"res_learn, gpi_learn = JudiLing.learn_paths(\n latin,\n latin,\n cue_obj.C,\n S,\n F,\n Chat,\n A,\n cue_obj.i2f,\n cue_obj.f2i, # api changed in 0.3.1\n check_gold_path = true,\n gold_ind = cue_obj.gold_ind,\n Shat_val = Shat,\n max_t = max_t,\n max_can = 10,\n grams = 3,\n threshold = 0.05,\n tokenized = false,\n keep_sep = false,\n target_col = :Word,\n verbose = true\n)","category":"page"},{"location":"","page":"Home","title":"Home","text":"We evaluate the accuracy on the training data as follows:","category":"page"},{"location":"","page":"Home","title":"Home","text":"acc_learn = JudiLing.eval_acc(res_learn, cue_obj.gold_ind, verbose = false)\n\nprintln(\"Acc for learn: $acc_learn\")","category":"page"},{"location":"","page":"Home","title":"Home","text":"Acc for learn: 0.9985","category":"page"},{"location":"","page":"Home","title":"Home","text":"The second sequence finding algorithm is usually faster than the first, but does not provide positional learnability estimates.","category":"page"},{"location":"","page":"Home","title":"Home","text":"res_build = JudiLing.build_paths(\n latin,\n cue_obj.C,\n S,\n F,\n Chat,\n A,\n cue_obj.i2f,\n cue_obj.gold_ind,\n max_t=max_t,\n n_neighbors=3,\n verbose=true\n )\n\nacc_build = JudiLing.eval_acc(\n res_build,\n cue_obj.gold_ind,\n verbose=false\n)","category":"page"},{"location":"","page":"Home","title":"Home","text":"Acc for build: 0.9955","category":"page"},{"location":"","page":"Home","title":"Home","text":"After having obtained the results from the sequence functions learn_paths or build_paths, we can save the results either into a csv file or into a dataframe; the dataframe can be loaded into R with the rput command of the RCall package.","category":"page"},{"location":"","page":"Home","title":"Home","text":"JudiLing.write2csv(\n res_learn,\n latin,\n cue_obj,\n cue_obj,\n \"latin_learn_res.csv\",\n grams = 3,\n tokenized = false,\n sep_token = nothing,\n start_end_token = \"#\",\n output_sep_token = \"\",\n path_sep_token = \":\",\n target_col = :Word,\n root_dir = @__DIR__,\n output_dir = \"latin_out\"\n)\n\ndf_learn = JudiLing.write2df(\n res_learn,\n latin,\n cue_obj,\n cue_obj,\n grams = 3,\n tokenized = false,\n sep_token = nothing,\n start_end_token = \"#\",\n output_sep_token = \"\",\n path_sep_token = \":\",\n target_col = :Word\n)\n\nJudiLing.write2csv(\n res_build,\n latin,\n cue_obj,\n cue_obj,\n \"latin_build_res.csv\",\n grams = 3,\n tokenized = false,\n sep_token = nothing,\n start_end_token = \"#\",\n output_sep_token = \"\",\n path_sep_token = \":\",\n target_col = :Word,\n root_dir = @__DIR__,\n output_dir = \"latin_out\"\n)\n\ndf_build = JudiLing.write2df(\n res_build,\n latin,\n cue_obj,\n cue_obj,\n grams = 3,\n tokenized = false,\n sep_token = nothing,\n start_end_token = \"#\",\n output_sep_token = \"\",\n path_sep_token = \":\",\n target_col = :Word\n)\n\ndisplay(df_learn)\ndisplay(df_build)","category":"page"},{"location":"","page":"Home","title":"Home","text":"3805×9 DataFrame. Omitted printing of 5 columns\n│ Row │ utterance │ identifier │ path │ pred │\n│ │ Int64? │ String? │ Union{Missing, String} │ String? 
│\n├──────┼───────────┼────────────────┼─────────────────────────────────────────────────────────┼────────────────┤\n│ 1 │ 1 │ vocoo │ #vo:voc:oco:coo:oo# │ vocoo │\n│ 2 │ 2 │ vocaas │ #vo:voc:oca:caa:aas:as# │ vocaas │\n│ 3 │ 2 │ vocaas │ #vo:voc:oca:caa:aab:aba:baa:aas:as# │ vocaabaas │\n│ 4 │ 2 │ vocaas │ #vo:voc:oca:caa:aat:ati:tis:is# │ vocaatis │\n│ 5 │ 2 │ vocaas │ #vo:voc:oca:caa:aav:avi:vis:ist:sti:tis:is# │ vocaavistis │\n│ 6 │ 2 │ vocaas │ #vo:voc:oca:caa:aam:amu:mus:us# │ vocaamus │\n│ 7 │ 2 │ vocaas │ #vo:voc:oca:caa:aab:abi:bit:it# │ vocaabit │\n│ 8 │ 2 │ vocaas │ #vo:voc:oca:caa:aam:amu:mur:ur# │ vocaamur │\n│ 9 │ 2 │ vocaas │ #vo:voc:oca:caa:aar:are:ret:et# │ vocaaret │\n⋮\n│ 3796 │ 671 │ cuccurisseetis │ #cu:cuc:ucc:ccu:cur:ure:ree:eet:eti:tis:is# │ cuccureetis │\n│ 3797 │ 671 │ cuccurisseetis │ #cu:cuc:ucc:ccu:cur:uri:ris:ist:sti:tis:is# │ cuccuristis │\n│ 3798 │ 671 │ cuccurisseetis │ #cu:cuc:ucc:ccu:cur:uri:ris:iss:sse:set:et# │ cuccurisset │\n│ 3799 │ 671 │ cuccurisseetis │ #cu:cur:urr:rri:rim:imi:min:ini:nii:ii# │ curriminii │\n│ 3800 │ 672 │ cuccurissent │ #cu:cuc:ucc:ccu:cur:uri:ris:iss:sse:sen:ent:nt# │ cuccurissent │\n│ 3801 │ 672 │ cuccurissent │ #cu:cur:urr:rre:rer:ere:ren:ent:nt# │ currerent │\n│ 3802 │ 672 │ cuccurissent │ #cu:cuc:ucc:ccu:cur:uri:ris:iss:sse:see:eem:emu:mus:us# │ cuccurisseemus │\n│ 3803 │ 672 │ cuccurissent │ #cu:cuc:ucc:ccu:cur:uri:ris:iss:sse:see:eet:eti:tis:is# │ cuccurisseetis │\n│ 3804 │ 672 │ cuccurissent │ #cu:cur:urr:rre:rer:ere:ren:ent:ntu:tur:ur# │ currerentur │\n│ 3805 │ 672 │ cuccurissent │ #cu:cuc:ucc:ccu:cur:uri:ris:iss:sse:see:ees:es# │ cuccurissees │\n2519×9 DataFrame. Omitted printing of 4 columns\n│ Row │ utterance │ identifier │ path │ pred │ num_tolerance │\n│ │ Int64? │ String? │ Union{Missing, String} │ String? │ Int64? 
│\n├──────┼───────────┼────────────────┼─────────────────────────────────────────────────┼──────────────┼───────────────┤\n│ 1 │ 1 │ vocoo │ #vo:voc:oco:coo:oo# │ vocoo │ 0 │\n│ 2 │ 1 │ vocoo │ #vo:voc:oca:caa:aab:abo:boo:oo# │ vocaaboo │ 0 │\n│ 3 │ 1 │ vocoo │ #vo:voc:oca:caa:aab:aba:bam:am# │ vocaabam │ 0 │\n│ 4 │ 2 │ vocaas │ #vo:voc:oca:caa:aas:as# │ vocaas │ 0 │\n│ 5 │ 2 │ vocaas │ #vo:voc:oca:caa:aab:abi:bis:is# │ vocaabis │ 0 │\n│ 6 │ 2 │ vocaas │ #vo:voc:oca:caa:aat:ati:tis:is# │ vocaatis │ 0 │\n│ 7 │ 3 │ vocat │ #vo:voc:oca:cat:at# │ vocat │ 0 │\n│ 8 │ 3 │ vocat │ #vo:voc:oca:caa:aab:aba:bat:at# │ vocaabat │ 0 │\n│ 9 │ 3 │ vocat │ #vo:voc:oca:caa:aas:as# │ vocaas │ 0 │\n⋮\n│ 2510 │ 671 │ cuccurisseetis │ #cu:cur:uri:ris:iss:sse:see:ees:es# │ curissees │ 0 │\n│ 2511 │ 671 │ cuccurisseetis │ #cu:cur:uri:ris:iss:sse:see:eem:emu:mus:us# │ curisseemus │ 0 │\n│ 2512 │ 671 │ cuccurisseetis │ #cu:cur:uri:ris:is# │ curis │ 0 │\n│ 2513 │ 671 │ cuccurisseetis │ #cu:cuc:ucc:ccu:cur:uri:ris:is# │ cuccuris │ 0 │\n│ 2514 │ 672 │ cuccurissent │ #cu:cuc:ucc:ccu:cur:uri:ris:iss:sse:sen:ent:nt# │ cuccurissent │ 0 │\n│ 2515 │ 672 │ cuccurissent │ #cu:cur:uri:ris:iss:sse:sen:ent:nt# │ curissent │ 0 │\n│ 2516 │ 672 │ cuccurissent │ #cu:cuc:ucc:ccu:cur:uri:ris:iss:sse:set:et# │ cuccurisset │ 0 │\n│ 2517 │ 672 │ cuccurissent │ #cu:cur:uri:ris:iss:sse:set:et# │ curisset │ 0 │\n│ 2518 │ 672 │ cuccurissent │ #cu:cuc:ucc:ccu:cur:uri:ris:iss:sse:sem:em# │ cuccurissem │ 0 │\n│ 2519 │ 672 │ cuccurissent │ #cu:cur:uri:ris:iss:sse:sem:em# │ curissem │ 0 │","category":"page"},{"location":"#Cross-validation","page":"Home","title":"Cross-validation","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"The model also provides functionality for cross-validation. Here, we first split the dataset randomly into 90% training and 10% validation data:","category":"page"},{"location":"","page":"Home","title":"Home","text":"latin_train, latin_val = JudiLing.loading_data_randomly_split(\"latin.csv\",\n \"data\",\n \"latin\",\n val_ratio=0.1,\n random_seed=42)","category":"page"},{"location":"","page":"Home","title":"Home","text":"Then, we make the C matrix by passing both training and validation datasets to the make_combined_cue_matrix function which ensures that the C matrix contains columns for both training and validation data.","category":"page"},{"location":"","page":"Home","title":"Home","text":"cue_obj_train, cue_obj_val = JudiLing.make_combined_cue_matrix(\n latin_train,\n latin_val,\n grams = 3,\n target_col = :Word,\n tokenized = false,\n keep_sep = false\n)","category":"page"},{"location":"","page":"Home","title":"Home","text":"Next, we simulate semantic vectors, again for both the training and validation data, using make_combined_S_matrix:","category":"page"},{"location":"","page":"Home","title":"Home","text":"n_features = size(cue_obj_train.C, 2)\nS_train, S_val = JudiLing.make_combined_S_matrix(\n latin_train,\n latin_val,\n [\"Lexeme\"],\n [\"Person\", \"Number\", \"Tense\", \"Voice\", \"Mood\"],\n ncol = n_features\n)","category":"page"},{"location":"","page":"Home","title":"Home","text":"After that, we make the transformation matrices, but this time we only use the training dataset. 
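Note that make_combined_cue_matrix and make_combined_S_matrix construct the training and validation matrices jointly, so both pairs share their column spaces; this is what makes applying the training mappings to the validation data well-defined. A quick sanity check (a sketch, assuming the objects constructed above):

```julia
# cue and semantic dimensions must agree between training and validation data
@assert size(cue_obj_train.C, 2) == size(cue_obj_val.C, 2)
@assert size(S_train, 2) == size(S_val, 2)
```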
We use these transformation matrices to predict the validation dataset.","category":"page"},{"location":"","page":"Home","title":"Home","text":"G_train = JudiLing.make_transform_matrix(S_train, cue_obj_train.C)\nF_train = JudiLing.make_transform_matrix(cue_obj_train.C, S_train)\n\nChat_train = S_train * G_train\nChat_val = S_val * G_train\nShat_train = cue_obj_train.C * F_train\nShat_val = cue_obj_val.C * F_train\n\n@show JudiLing.eval_SC(Chat_train, cue_obj_train.C)\n@show JudiLing.eval_SC(Chat_val, cue_obj_val.C)\n@show JudiLing.eval_SC(Shat_train, S_train)\n@show JudiLing.eval_SC(Shat_val, S_val)","category":"page"},{"location":"","page":"Home","title":"Home","text":"Output:","category":"page"},{"location":"","page":"Home","title":"Home","text":"JudiLing.eval_SC(Chat_train, cue_obj_train.C) = 0.995\nJudiLing.eval_SC(Chat_val, cue_obj_val.C) = 0.403\nJudiLing.eval_SC(Shat_train, S_train) = 0.9917\nJudiLing.eval_SC(Shat_val, S_val) = 1.0","category":"page"},{"location":"","page":"Home","title":"Home","text":"Finally, we can find possible paths through build_paths or learn_paths. Since validation datasets are harder to predict, we turn on tolerant mode, which allows the algorithms to find more paths, at the cost of investing more time.","category":"page"},{"location":"","page":"Home","title":"Home","text":"A = cue_obj_train.A\nmax_t = JudiLing.cal_max_timestep(latin_train, latin_val, :Word)\n\nres_learn_train, gpi_learn_train = JudiLing.learn_paths(\n    latin_train,\n    latin_train,\n    cue_obj_train.C,\n    S_train,\n    F_train,\n    Chat_train,\n    A,\n    cue_obj_train.i2f,\n    cue_obj_train.f2i, # api changed in 0.3.1\n    gold_ind = cue_obj_train.gold_ind,\n    Shat_val = Shat_train,\n    check_gold_path = true,\n    max_t = max_t,\n    max_can = 10,\n    grams = 3,\n    threshold = 0.05,\n    tokenized = false,\n    sep_token = \"_\",\n    keep_sep = false,\n    target_col = :Word,\n    issparse = :dense,\n    verbose = true,\n)\n\nres_learn_val, gpi_learn_val = JudiLing.learn_paths(\n    latin_train,\n    latin_val,\n    cue_obj_train.C,\n    S_val,\n    F_train,\n    Chat_val,\n    A,\n    cue_obj_train.i2f,\n    cue_obj_train.f2i, # api changed in 0.3.1\n    gold_ind = cue_obj_val.gold_ind,\n    Shat_val = Shat_val,\n    check_gold_path = true,\n    max_t = max_t,\n    max_can = 10,\n    grams = 3,\n    threshold = 0.05,\n    is_tolerant = true,\n    tolerance = -0.1,\n    max_tolerance = 2,\n    tokenized = false,\n    sep_token = \"-\",\n    keep_sep = false,\n    target_col = :Word,\n    issparse = :dense,\n    verbose = true,\n)\n\nacc_learn_train =\n    JudiLing.eval_acc(res_learn_train, cue_obj_train.gold_ind, verbose = false)\nacc_learn_val = JudiLing.eval_acc(res_learn_val, cue_obj_val.gold_ind, verbose = false)\n\nres_build_train = JudiLing.build_paths(\n    latin_train,\n    cue_obj_train.C,\n    S_train,\n    F_train,\n    Chat_train,\n    A,\n    cue_obj_train.i2f,\n    cue_obj_train.gold_ind,\n    max_t = max_t,\n    n_neighbors = 3,\n    verbose = true,\n)\n\nres_build_val = JudiLing.build_paths(\n    latin_val,\n    cue_obj_train.C,\n    S_val,\n    F_train,\n    Chat_val,\n    A,\n    cue_obj_train.i2f,\n    cue_obj_train.gold_ind,\n    max_t = max_t,\n    n_neighbors = 20,\n    verbose = true,\n)\n\nacc_build_train =\n    JudiLing.eval_acc(res_build_train, cue_obj_train.gold_ind, verbose = false)\nacc_build_val = JudiLing.eval_acc(res_build_val, cue_obj_val.gold_ind, verbose = false)\n\n@show acc_learn_train\n@show acc_learn_val\n@show acc_build_train\n@show acc_build_val","category":"page"},{"location":"","page":"Home","title":"Home","text":"Output:","category":"page"},{"location":"","page":"Home","title":"Home","text":"acc_learn_train = 0.9983\nacc_learn_val = 
0.6866\nacc_build_train = 1.0\nacc_build_val = 0.3284","category":"page"},{"location":"","page":"Home","title":"Home","text":"Alternatively, we provide a wrapper function that incorporates all of the above functionality. With this function, you can quickly explore datasets under different parameter settings. You can find more details in the Test Combo Introduction.","category":"page"},{"location":"#Supports","page":"Home","title":"Supports","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"The outputs contain two types of support: an utterance-level support and a set of supports for each cue. The former is also called \"synthesis-by-analysis\" support. It is calculated from the predicted S vector and the original S vector, and it is used to select the best paths. Cue-level supports are slices of the Yt matrices at each timestep. They are used to determine whether a cue is eligible for constructing paths.","category":"page"},
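{"location":"","page":"Home","title":"Home","text":"To make the utterance-level support concrete, here is a minimal sketch (not part of the JudiLing API): one way to realise a synthesis-by-analysis support is as the correlation between the predicted semantic vector and the semantic vector of a candidate path. The vectors shat and s_candidate below are hypothetical:","category":"page"},{"location":"","page":"Home","title":"Home","text":"using Statistics\n\n# hypothetical predicted semantic vector for one target word\nshat = [0.2, 0.9, 0.1, 0.5]\n# hypothetical semantic vector of one candidate path\ns_candidate = [0.3, 0.8, 0.0, 0.6]\n\n# synthesis-by-analysis support: the higher the correlation,\n# the better the candidate path matches the predicted semantics\nsupport = cor(shat, s_candidate)","category":"page"},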
{"location":"#Acknowledgments","page":"Home","title":"Acknowledgments","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"This project was supported by the ERC advanced grant WIDE-742545 and by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy - EXC number 2064/1 - Project number 390727645.","category":"page"},{"location":"#Citation","page":"Home","title":"Citation","text":"","category":"section"},{"location":"","page":"Home","title":"Home","text":"If you find this package helpful, please cite it as follows:","category":"page"},{"location":"","page":"Home","title":"Home","text":"Luo, X., Heitmeier, M., Chuang, Y. Y., and Baayen, R. H. JudiLing: an implementation of the Discriminative Lexicon Model in Julia. Eberhard Karls Universität Tübingen, Seminar für Sprachwissenschaft.","category":"page"},{"location":"","page":"Home","title":"Home","text":"The following studies have made use of several algorithms now implemented in JudiLing instead of WpmWithLdl:","category":"page"},{"location":"","page":"Home","title":"Home","text":"Baayen, R. H., Chuang, Y. Y., Shafaei-Bajestan, E., and Blevins, J. P. (2019). The discriminative lexicon: A unified computational model for the lexicon and lexical processing in comprehension and production grounded not in (de)composition but in linear discriminative learning. Complexity, 2019, 1-39.\nBaayen, R. H., Chuang, Y. Y., and Blevins, J. P. (2018). Inflectional morphology with linear mappings. The Mental Lexicon, 13 (2), 232-270.\nChuang, Y.-Y., Lõo, K., Blevins, J. P., and Baayen, R. H. (2020). Estonian case inflection made simple. A case study in Word and Paradigm morphology with Linear Discriminative Learning. In Körtvélyessy, L., and Štekauer, P. (Eds.) Complex Words: Advances in Morphology, 1-19.\nChuang, Y.-Y., Bell, M. J., Banke, I., and Baayen, R. H. (2020). Bilingual and multilingual mental lexicon: a modeling study with Linear Discriminative Learning. Language Learning, 1-55.\nHeitmeier, M., Chuang, Y.-Y., and Baayen, R. H. (2021). Modeling morphology with Linear Discriminative Learning: considerations and design choices. Frontiers in Psychology, 12, 4929.\nDenistia, K., and Baayen, R. H. (2022). The morphology of Indonesian: Data and quantitative modeling. In Shei, C., and Li, S. (Eds.) The Routledge Handbook of Asian Linguistics, (pp. 605-634). Routledge, London.\nHeitmeier, M., Chuang, Y.-Y., and Baayen, R. H. (2023). How trial-to-trial learning shapes mappings in the mental lexicon: Modelling lexical decision with linear discriminative learning. Cognitive Psychology, 1-30.\nChuang, Y. Y., Kang, M., Luo, X. F., and Baayen, R. H. (2023). Vector Space Morphology with Linear Discriminative Learning. In Crepaldi, D. (Ed.) Linguistic morphology in the mind and brain.\nHeitmeier, M., Chuang, Y. Y., Axen, S. D., and Baayen, R. H. (2024). Frequency effects in linear discriminative learning. Frontiers in Human Neuroscience, 17, 1242720.\nPlag, I., Heitmeier, M., and Domahs, F. (to appear). German nominal number interpretation in an impaired mental lexicon: A naive discriminative learning perspective. The Mental Lexicon.","category":"page"},{"location":"man/make_cue_matrix/","page":"Make Cue Matrix","title":"Make Cue Matrix","text":"CurrentModule = JudiLing","category":"page"},{"location":"man/make_cue_matrix/#Make-Cue-Matrix","page":"Make Cue Matrix","title":"Make Cue Matrix","text":"","category":"section"},{"location":"man/make_cue_matrix/","page":"Make Cue Matrix","title":"Make Cue Matrix","text":"    Cue_Matrix_Struct\r\n    make_cue_matrix\r\n    make_combined_cue_matrix\r\n    make_ngrams\r\n    make_cue_matrix(data::DataFrame)\r\n    make_cue_matrix(data::DataFrame, cue_obj::Cue_Matrix_Struct)\r\n    make_cue_matrix(data_train::DataFrame, data_val::DataFrame)\r\n    make_combined_cue_matrix(data_train, data_val)\r\n    make_cue_matrix_from_CFBS(features::Vector{Vector{T}};\r\n                              pad_val::T = 0.,\r\n                              ncol::Union{Missing,Int}=missing) where {T}\r\n    make_combined_cue_matrix_from_CFBS(features_train::Vector{Vector{T}},\r\n                                       features_test::Vector{Vector{T}};\r\n                                       pad_val::T = 0.,\r\n                                       ncol::Union{Missing,Int}=missing) where {T}\r\n    make_ngrams(tokens, grams, keep_sep, sep_token, start_end_token)","category":"page"},{"location":"man/make_cue_matrix/#JudiLing.Cue_Matrix_Struct","page":"Make Cue Matrix","title":"JudiLing.Cue_Matrix_Struct","text":"A structure that stores information created by make_cue_matrix: C is the cue matrix; f2i is a dictionary returning the indices for features; i2f is a dictionary returning the features for indices; gold_ind is a list of indices of gold paths; A is the adjacency matrix; grams is the number of grams for cues; target_col is the column name for target strings; tokenized is whether the dataset target is tokenized; sep_token is the separator; keep_sep is whether to keep separators in cues; start_end_token is the start and end token in boundary cues.\n\n\n\n\n\n","category":"type"},
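{"location":"man/make_cue_matrix/","page":"Make Cue Matrix","title":"Make Cue Matrix","text":"For illustration, a small sketch of accessing some of these fields on a cue object built from the Latin data (assuming latin is loaded as in the quick start; the exact cue strings depend on the data):","category":"page"},{"location":"man/make_cue_matrix/","page":"Make Cue Matrix","title":"Make Cue Matrix","text":"cue_obj = JudiLing.make_cue_matrix(latin, grams=3, target_col=:Word)\n\nsize(cue_obj.C)       # (number of word forms, number of distinct cues)\ncue = cue_obj.i2f[1]  # the cue stored at index 1\ncue_obj.f2i[cue]      # 1, mapping the cue back to its index\ncue_obj.grams         # 3","category":"page"},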
{"location":"man/make_cue_matrix/#JudiLing.make_cue_matrix","page":"Make Cue Matrix","title":"JudiLing.make_cue_matrix","text":"Construct cue matrix.\n\n\n\n\n\n","category":"function"},{"location":"man/make_cue_matrix/#JudiLing.make_combined_cue_matrix","page":"Make Cue Matrix","title":"JudiLing.make_combined_cue_matrix","text":"Construct a cue matrix with combined features and adjacencies for both the training and the validation dataset.\n\n\n\n\n\n","category":"function"},{"location":"man/make_cue_matrix/#JudiLing.make_ngrams","page":"Make Cue Matrix","title":"JudiLing.make_ngrams","text":"Given a list of string tokens, extract their n-grams.\n\n\n\n\n\n","category":"function"},{"location":"man/make_cue_matrix/#JudiLing.make_cue_matrix-Tuple{DataFrames.DataFrame}","page":"Make Cue Matrix","title":"JudiLing.make_cue_matrix","text":"make_cue_matrix(data::DataFrame)\n\nMake the cue matrix for a training dataset and the corresponding indices, as well as the adjacency matrix and gold paths, given a dataset in the form of a dataframe.\n\nObligatory Arguments\n\ndata::DataFrame: the dataset\n\nOptional Arguments\n\ngrams::Int64=3: the number of grams for cues\ntarget_col::Union{String, Symbol}=:Words: the column name for target strings\ntokenized::Bool=false: if true, the dataset target is assumed to be tokenized\nsep_token::Union{Nothing, String, Char}=nothing: separator\nkeep_sep::Bool=false: if true, keep separators in cues\nstart_end_token::Union{String, Char}=\"#\": start and end token in boundary cues\nverbose::Bool=false: if true, more information is printed\n\nExamples\n\n# make cue matrix without tokenization\ncue_obj_train = JudiLing.make_cue_matrix(\n     latin_train,\n    grams=3,\n    target_col=:Word,\n    tokenized=false,\n    sep_token=\"-\",\n    start_end_token=\"#\",\n    keep_sep=false,\n    verbose=false\n    )\n\n# make cue matrix with tokenization\ncue_obj_train = JudiLing.make_cue_matrix(\n    french_train,\n    grams=3,\n    target_col=:Syllables,\n    tokenized=true,\n    sep_token=\"-\",\n    start_end_token=\"#\",\n    keep_sep=true,\n    verbose=false\n    )\n\n\n\n\n\n","category":"method"},{"location":"man/make_cue_matrix/#JudiLing.make_cue_matrix-Tuple{DataFrames.DataFrame, JudiLing.Cue_Matrix_Struct}","page":"Make Cue Matrix","title":"JudiLing.make_cue_matrix","text":"make_cue_matrix(data::DataFrame, cue_obj::Cue_Matrix_Struct)\n\nMake the cue matrix for a validation dataset and the corresponding indices, as well as the adjacency matrix and gold paths, given a dataset in the form of a dataframe.\n\nObligatory Arguments\n\ndata::DataFrame: the dataset\ncue_obj::Cue_Matrix_Struct: training cue object\n\nOptional Arguments\n\ngrams::Int64=3: the number of grams for cues\ntarget_col::Union{String, Symbol}=:Words: the column name for target strings\ntokenized::Bool=false: if true, the dataset target is assumed to be tokenized\nsep_token::Union{Nothing, String, Char}=nothing: separator\nkeep_sep::Bool=false: if true, keep separators in cues\nstart_end_token::Union{String, Char}=\"#\": start and end token in boundary cues\nverbose::Bool=false: if true, more information is printed\n\nExamples\n\n# make cue matrix without tokenization\ncue_obj_val = JudiLing.make_cue_matrix(\n    latin_val,\n    cue_obj_train,\n    grams=3,\n    target_col=:Word,\n    tokenized=false,\n    sep_token=\"-\",\n    keep_sep=false,\n    start_end_token=\"#\",\n    verbose=false\n    )\n\n# make cue matrix with tokenization\ncue_obj_val = JudiLing.make_cue_matrix(\n    french_val,\n    cue_obj_train,\n    grams=3,\n    target_col=:Syllables,\n    tokenized=true,\n    sep_token=\"-\",\n    keep_sep=true,\n    start_end_token=\"#\",\n    verbose=false\n    )\n\n\n\n\n\n","category":"method"},{"location":"man/make_cue_matrix/#JudiLing.make_cue_matrix-Tuple{DataFrames.DataFrame, DataFrames.DataFrame}","page":"Make Cue Matrix","title":"JudiLing.make_cue_matrix","text":"make_cue_matrix(data_train::DataFrame, data_val::DataFrame)\n\nMake the cue matrix for training and validation datasets at the same time.\n\nObligatory Arguments\n\ndata_train::DataFrame: the training dataset\ndata_val::DataFrame: the validation dataset\n\nOptional Arguments\n\ngrams::Int64=3: the number of grams for cues\ntarget_col::Union{String, Symbol}=:Words: the column name for target strings\ntokenized::Bool=false: if true, the dataset target is assumed to be tokenized\nsep_token::Union{Nothing, String, Char}=nothing: separator\nkeep_sep::Bool=false: if true, keep separators in cues\nstart_end_token::Union{String, Char}=\"#\": start and end token in boundary cues\nverbose::Bool=false: if true, more information is printed\n\nExamples\n\n# make cue matrix without tokenization\ncue_obj_train, 
cue_obj_val = JudiLing.make_cue_matrix(\n    latin_train,\n    latin_val,\n    grams=3,\n    target_col=:Word,\n    tokenized=false,\n    keep_sep=false\n    )\n\n# make cue matrix with tokenization\ncue_obj_train, cue_obj_val = JudiLing.make_cue_matrix(\n    french_train,\n    french_val,\n    grams=3,\n    target_col=:Syllables,\n    tokenized=true,\n    sep_token=\"-\",\n    keep_sep=true,\n    start_end_token=\"#\",\n    verbose=false\n    )\n\n\n\n\n\n","category":"method"},{"location":"man/make_cue_matrix/#JudiLing.make_combined_cue_matrix-Tuple{Any, Any}","page":"Make Cue Matrix","title":"JudiLing.make_combined_cue_matrix","text":"make_combined_cue_matrix(data_train, data_val)\n\nMake the cue matrix for training and validation datasets at the same time, where the features and adjacencies are combined.\n\nObligatory Arguments\n\ndata_train::DataFrame: the training dataset\ndata_val::DataFrame: the validation dataset\n\nOptional Arguments\n\ngrams::Int64=3: the number of grams for cues\ntarget_col::Union{String, Symbol}=:Words: the column name for target strings\ntokenized::Bool=false: if true, the dataset target is assumed to be tokenized\nsep_token::Union{Nothing, String, Char}=nothing: separator\nkeep_sep::Bool=false: if true, keep separators in cues\nstart_end_token::Union{String, Char}=\"#\": start and end token in boundary cues\nverbose::Bool=false: if true, more information is printed\n\nExamples\n\n# make cue matrix without tokenization\ncue_obj_train, cue_obj_val = JudiLing.make_combined_cue_matrix(\n    latin_train,\n    latin_val,\n    grams=3,\n    target_col=:Word,\n    tokenized=false,\n    keep_sep=false\n    )\n\n# make cue matrix with tokenization\ncue_obj_train, cue_obj_val = JudiLing.make_combined_cue_matrix(\n    french_train,\n    french_val,\n    grams=3,\n    target_col=:Syllables,\n    tokenized=true,\n    sep_token=\"-\",\n    keep_sep=true,\n    start_end_token=\"#\",\n    verbose=false\n    )\n\n\n\n\n\n","category":"method"},{"location":"man/make_cue_matrix/#JudiLing.make_cue_matrix_from_CFBS-Union{Tuple{Array{Vector{T}, 1}}, Tuple{T}} where T","page":"Make Cue Matrix","title":"JudiLing.make_cue_matrix_from_CFBS","text":"make_cue_matrix_from_CFBS(features::Vector{Vector{T}};\n                          pad_val::T = 0.,\n                          ncol::Union{Missing,Int}=missing) where {T}\n\nCreate a cue matrix from a vector of feature vectors (usually CFBS vectors). The vectors are expected (though of course not required) to have varying lengths; they are consequently padded on the right with the provided pad_val.\n\nObligatory arguments\n\nfeatures::Vector{Vector{T}}: vector of vectors containing C-FBS features\n\nOptional arguments\n\npad_val::T = 0.: value with which the feature vectors will be padded\nncol::Union{Missing,Int}=missing: number of columns of the C matrix. If not set, it will be set to the maximum number of features\n\nExamples\n\nC = JudiLing.make_cue_matrix_from_CFBS(features)\n\n\n\n\n\n","category":"method"},
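{"location":"man/make_cue_matrix/","page":"Make Cue Matrix","title":"Make Cue Matrix","text":"To illustrate the padding behaviour, a minimal sketch with made-up feature vectors (the numbers are hypothetical):","category":"page"},{"location":"man/make_cue_matrix/","page":"Make Cue Matrix","title":"Make Cue Matrix","text":"features = [[0.1, 0.2, 0.3],\n            [0.4, 0.5]]\n\nC = JudiLing.make_cue_matrix_from_CFBS(features)\n# the shorter vector is padded on the right with pad_val (0. by default),\n# so C should correspond to\n# 0.1  0.2  0.3\n# 0.4  0.5  0.0","category":"page"},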
{"location":"man/make_cue_matrix/#JudiLing.make_combined_cue_matrix_from_CFBS-Union{Tuple{T}, Tuple{Array{Vector{T}, 1}, Array{Vector{T}, 1}}} where T","page":"Make Cue Matrix","title":"JudiLing.make_combined_cue_matrix_from_CFBS","text":"make_combined_cue_matrix_from_CFBS(features_train::Vector{Vector{T}},\n                                   features_test::Vector{Vector{T}};\n                                   pad_val::T = 0.,\n                                   ncol::Union{Missing,Int}=missing) where {T}\n\nCreate cue matrices from two vectors of feature vectors (usually CFBS vectors). The vectors are expected (though of course not required) to have varying lengths; they are consequently padded on the right with the provided pad_val. The cue matrices are sized to the maximum number of feature values in features_train and features_test.\n\nObligatory arguments\n\nfeatures_train::Vector{Vector{T}}: vector of vectors containing C-FBS features\nfeatures_test::Vector{Vector{T}}: vector of vectors containing C-FBS features\n\nOptional arguments\n\npad_val::T = 0.: value with which the feature vectors will be padded\nncol::Union{Missing,Int}=missing: number of columns of the C matrices. If not set, it will be set to the maximum number of features in features_train and features_test\n\nExamples\n\nC_train, C_test = JudiLing.make_combined_cue_matrix_from_CFBS(features_train, features_test)\n\n\n\n\n\n","category":"method"},{"location":"man/make_cue_matrix/#JudiLing.make_ngrams-NTuple{5, Any}","page":"Make Cue Matrix","title":"JudiLing.make_ngrams","text":"make_ngrams(tokens, grams, keep_sep, sep_token, start_end_token)\n\nGiven a list of string tokens, return a list of all n-grams for these tokens.\n\n\n\n\n\n","category":"method"},
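{"location":"man/make_cue_matrix/","page":"Make Cue Matrix","title":"Make Cue Matrix","text":"As a sketch of the underlying idea (a toy re-implementation for illustration, not the JudiLing internals), trigrams over the character tokens of #vocat# can be extracted as follows:","category":"page"},{"location":"man/make_cue_matrix/","page":"Make Cue Matrix","title":"Make Cue Matrix","text":"# toy n-gram extraction over a token list: slide a window of length grams\nngrams_sketch(tokens, grams) =\n    [join(tokens[i:i+grams-1]) for i in 1:(length(tokens)-grams+1)]\n\nngrams_sketch([\"#\", \"v\", \"o\", \"c\", \"a\", \"t\", \"#\"], 3)\n# 5-element Vector{String}:\n# \"#vo\", \"voc\", \"oca\", \"cat\", \"at#\"","category":"page"}] }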