update gensim wikicorpus doc
fabriciorsf committed Jul 23, 2024
1 parent de027da commit f2b2ce1
Showing 1 changed file with 8 additions and 8 deletions.
16 changes: 8 additions & 8 deletions gensim/corpora/wikicorpus.py
@@ -466,10 +466,10 @@ def process_article(
----------
args : (str, str, int)
Article text, article title, page identificator.
-tokenizer_func : function OR list of function
-Function for tokenization (defaults is :func:`~gensim.corpora.wikicorpus.tokenize`).
-Each function needs to have interface:
-tokenizer_func(text: str, token_min_len: int, token_max_len: int, lower: bool) -> list of str.
+tokenizer_func : function OR list of function, optional
+Function for tokenization (defaults is :func:`~gensim.corpora.wikicorpus.tokenize`).
+Each function needs to have interface:
+`tokenizer_func(text: str, token_min_len: int, token_max_len: int, lower: bool) -> list of str.`
token_min_len : int
Minimal token length.
token_max_len : int
@@ -593,10 +593,10 @@ def __init__(
**IMPORTANT: this needs a really long time**.
filter_namespaces : tuple of str, optional
Namespaces to consider.
-tokenizer_func : function, optional
-Function that will be used for tokenization. By default, use :func:`~gensim.corpora.wikicorpus.tokenize`.
-If you inject your own tokenizer, it must conform to this interface:
-`tokenizer_func(text: str, token_min_len: int, token_max_len: int, lower: bool) -> list of str`
+tokenizer_func : function OR list of function, optional
+Function for tokenization (defaults is :func:`~gensim.corpora.wikicorpus.tokenize`).
+Each function needs to have interface:
+`tokenizer_func(text: str, token_min_len: int, token_max_len: int, lower: bool) -> list of str.`
article_min_tokens : int, optional
Minimum tokens in article. Article will be ignored if number of tokens is less.
token_min_len : int, optional
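
For reference, a minimal sketch of a custom tokenizer that satisfies the interface described in the docstring above. The name `simple_tokenizer` and its regex-based splitting are illustrative assumptions, not gensim's own implementation; only the four-argument signature and the `list of str` return type come from the docstring.

```python
import re


def simple_tokenizer(text, token_min_len, token_max_len, lower):
    """Hypothetical tokenizer following the documented interface:
    (text: str, token_min_len: int, token_max_len: int, lower: bool) -> list of str.
    """
    if lower:
        text = text.lower()
    # Split on runs of word characters; this rule is an assumption for
    # illustration and differs from gensim's default tokenizer.
    tokens = re.findall(r"\w+", text)
    # Keep only tokens whose length falls inside the requested bounds.
    return [t for t in tokens if token_min_len <= len(t) <= token_max_len]


print(simple_tokenizer("Gensim parses Wikipedia dumps.", 2, 15, True))
# → ['gensim', 'parses', 'wikipedia', 'dumps']
```

Per the docstring, a function of this shape would be supplied through the `tokenizer_func` argument of the `__init__` shown in the second hunk.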