diff --git a/docs/information_theory.rst b/docs/information_theory.rst
index 06e55a3..b0adfb1 100644
--- a/docs/information_theory.rst
+++ b/docs/information_theory.rst
@@ -6,14 +6,14 @@ from the text. These include:

 - `{doc/span}._.entropy`: the Shannon entropy of the text using the `token.prob` as the probability of each token.
-  Entropy is defined as :math:`H(X) = -\sum_{i=1}^n p(x_i) \log_e p(x_i)`. Where :math:`p(x_i)` is the probability of the token :math:`x_i`.
+  Entropy is defined as :math:`H(X) = -\sum_{i=1}^n p(x_i) \log_e p(x_i)`, where :math:`p(x_i)` is the probability of the token :math:`x_i`.
-- `{doc/span}._.perplexity`: the perplexity of the text. perplexity is a measurement of how well a
-  probability distribution or probability model predicts a sample. Perplexity is defined as :math:`PPL(X) = e^{-H(X)}`, where :math:`H(X)` is the entropy of the text.
+- `{doc/span}._.perplexity`: the perplexity of the text. Perplexity is a measurement of how well a
+  probability distribution or probability model predicts a sample. Perplexity is defined as :math:`PPL(X) = e^{H(X)}`, where :math:`H(X)` is the entropy of the text.
-- `{doc/span}._.per_word_perplexity`: The perplexity of the text, divided by the number of words. Can se considered the length normalized perplexity.
+- `{doc/span}._.per_word_perplexity`: the perplexity of the text divided by the number of words. It can be considered the length-normalized perplexity.

-These information theoretic measures is for example often used to describe the complexity of a text.
+These information-theoretic measures are, for example, often used to describe the complexity of a text.
 The higher the entropy, the more complex the text is.
-Similarly, one could imagine filtering text based on the per word perplexity given the assumption that
-highly surprising text is in fact non-coherent text pieces.
+Similarly, one could imagine filtering text based on the per-word perplexity, under the assumption that
+highly surprising text pieces are in fact incoherent.

 .. note::
-  The information theory components require an available lexeme prop table from spaCy which is not available for all languages. A warning will be raised and values set to np.nan if the table cannot be found for the language.
+  The information theory components require an available lexeme prob table from spaCy, which is not available for all languages. A warning will be raised and values set to np.nan if the table cannot be found for the language.
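The following is a minimal, standalone sketch of the three measures described in this hunk. It is written in plain Python and is not the library's implementation. It assumes that each token contributes a natural-log probability, as spaCy's `token.prob` does. It takes perplexity to be the exponential of the entropy, e^{H(X)}, and normalizes by the token count. The function names `entropy`, `perplexity`, and `per_word_perplexity` are hypothetical helpers, not the package's API.

```python
import math
from typing import Sequence

# Assumption: inputs are natural-log token probabilities, like spaCy's token.prob.


def entropy(log_probs: Sequence[float]) -> float:
    """Shannon entropy H(X) = -sum p(x_i) * ln p(x_i).

    exp(lp) recovers p(x_i); lp itself is already ln p(x_i).
    """
    return -sum(math.exp(lp) * lp for lp in log_probs)


def perplexity(log_probs: Sequence[float]) -> float:
    """Perplexity taken as e^{H(X)}, the exponential of the entropy."""
    return math.exp(entropy(log_probs))


def per_word_perplexity(log_probs: Sequence[float]) -> float:
    """Perplexity divided by the number of tokens (length normalization)."""
    if not log_probs:
        # Avoid division by zero for an empty span.
        return float("nan")
    return perplexity(log_probs) / len(log_probs)


if __name__ == "__main__":
    # Four tokens, each with probability 0.25 (log-probability ln 0.25).
    lp = [math.log(0.25)] * 4
    print(entropy(lp))              # approximately 1.386 (= ln 4)
    print(perplexity(lp))           # approximately 4.0
    print(per_word_perplexity(lp))  # approximately 1.0
```

Suppose there are four tokens, each with probability 0.25. The entropy is then ln 4, so the perplexity comes out at about 4, which equals the number of equally likely tokens. The per-word perplexity comes out at about 1.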