
Commit

minor changes to documentation
jeromedockes committed May 24, 2021
1 parent 08c9156 commit 3f5c4e6
Showing 20 changed files with 152 additions and 40 deletions.
1 change: 1 addition & 0 deletions .gitignore
@@ -1,6 +1,7 @@
doxygen_output/
docs/*.html
docs/example_data/*.html
docs/schema/*.html
!docs/docinfo.html
docs/gh-pages-site/

17 changes: 13 additions & 4 deletions CMakeLists.txt
@@ -84,18 +84,27 @@ set(CPACK_RESOURCE_FILE_LICENSE "${CMAKE_SOURCE_DIR}/LICENSE.txt")
set(CPACK_SOURCE_GENERATOR "TGZ")
set(CPACK_SOURCE_IGNORE_FILES
[.]git/
[.]gitignore
[.]gitattributes
cmake_build/
cmake_release_build/
qmake_build/
Makefile
test/labelbuddy_tests
test_build
test_cli/data/newsgroups/
test_cli/__pycache__/
[.]pytest_cache/
test_cli/.*[.]pyc
doxygen_output/
docs/gh-pages-site/
docs/screenshots/
docs/README[.]rst
docs/screenshots/.*[.]png
docs/index[.]html
docs/installation[.]html
docs/manpage[.]html
docs/readme-as-seen-on-github[.]html
docs/screenshots[.]html
docs/example_data/index.html
docs/schema/index.html
moc_.*
src/moc_.*
test/moc_.*
@@ -116,7 +125,7 @@ set(CPACK_SOURCE_IGNORE_FILES
.*[.]smod
.*[.]lai
.*[.]la
.*[.]a
.*[.]a$
.*[.]lib
.*[.]exe
.*[.]out
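The switch from `.*[.]a` to `.*[.]a$` in `CPACK_SOURCE_IGNORE_FILES` matters because these entries are regular expressions, not globs. Assuming CPack tests each pattern as an unanchored search over the file path (as CMake's `MATCHES` does), the unanchored pattern also matches any path containing `.a`, such as the `.adoc` documentation sources. The sketch below uses Python's `re.search` as a stand-in for CMake's regex engine; the path `build/liblabelbuddy.a` is made up for illustration.

```python
import re

# The two versions of the static-library pattern from the CMakeLists.txt hunk.
OLD_PATTERN = r".*[.]a"
NEW_PATTERN = r".*[.]a$"


def is_ignored(path, pattern):
    # Assumption: the pattern may match anywhere in the path (unanchored
    # search), as with CMake's MATCHES; re.search is only a stand-in here.
    return re.search(pattern, path) is not None


# "build/liblabelbuddy.a" is a hypothetical path; the .adoc files exist in docs/.
paths = ["build/liblabelbuddy.a", "docs/README.adoc", "docs/index.adoc"]
for path in paths:
    print(path, is_ignored(path, OLD_PATTERN), is_ignored(path, NEW_PATTERN))
```

With these inputs the old pattern flags all three paths, while the anchored pattern only flags the static library.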
3 changes: 1 addition & 2 deletions docs/Description
@@ -1,6 +1,5 @@
GUI tool for annotating documents
This is an application for annotating parts of documents with labels.
labelbuddy can be used for Part Of Speech tagging,
Named Entity Recognition,
labelbuddy can be used for Named Entity Recognition,
sentiment analysis and document classification, etc.
It depends on Qt5.
23 changes: 19 additions & 4 deletions docs/Makefile
@@ -4,6 +4,8 @@ html += readme-as-seen-on-github.html
webhtml := $(patsubst %,gh-pages-site/%,$(html))
examples :=$(filter-out %.py %annotations.json, $(wildcard example_data/*))
examples := $(patsubst %.adoc,%.html,$(examples))
examples += example_data/wiki_extracts_documents.json
examples += example_data/pos_labels.json
webexamples := $(patsubst %,gh-pages-site/%, $(examples))
schemas := $(wildcard schema/*)
schemas := $(patsubst %.adoc,%.html,$(schemas))
@@ -13,7 +15,7 @@ screenshots := $(wildcard screenshots/*.png)

.PHONY: all clean web

all: $(html) web labelbuddy.1 demo_data/example_documents.json
all: $(html) labelbuddy.1 demo_data/example_documents.json $(examples) $(schemas)

web: $(webhtml) $(webexamples) $(webschemas)

@@ -39,7 +41,7 @@ gh-pages-site/screenshots.html: screenshots.html $(screenshots)
mkdir -p gh-pages-site
xsltproc add_nav.xsl $< > $@
mkdir -p gh-pages-site/screenshots
cp screenshots/* gh-pages-site/screenshots/
cp screenshots/* gh-pages-site/screenshots/ 2>/dev/null || :

example_data/exported_docs.json: example_data/make_examples.py
$<
@@ -51,10 +53,23 @@ readme-as-seen-on-github.html: README.adoc docinfo.html ../data/VERSION.txt exam
asciidoctor -b xhtml -a env-github -a lbversion="$$(cat ../data/VERSION.txt)" $< -o $@


demo_data/example_documents.json: $(examples) demo_data/example_labels.json demo_data/make_example_docs.py
demo_data/example_documents.json: $(demo) demo_data/example_labels.json demo_data/make_example_docs.py
python3 demo_data/make_example_docs.py

demo_data/example_documents_for_website.json: $(demo) demo_data/make_example_docs.py
python3 demo_data/make_example_docs.py

example_data/wiki_extracts_documents.json: demo_data/example_documents_for_website.json
cp $< $@

example_data/pos_labels.json: demo_data/example_labels.json
cp $< $@

clean:
rm -f documentation.html index.html installation.html manpage.html \
demo_data/example_documents.json
demo_data/example_documents.json demo_data/example_documents_for_website.json \
labelbuddy.1 readme-as-seen-on-github.html \
screenshots.html example_data/*.xml example_data/*.json \
example_data/*.jsonl example_data/*.csv example_data/*.txt example_data/*.html \
schema/*.html
rm -rf gh-pages-site
19 changes: 11 additions & 8 deletions docs/README.adoc
@@ -62,17 +62,17 @@ endif::[]
== Introduction

{lb} is an {lblicense}[open-source] desktop Graphical User Interface (GUI) application for annotating documents.
It can be used for example for Named Entity Recognition, Part Of Speech tagging, sentiment analysis and document classification ...
It can be used for example for Named Entity Recognition, sentiment analysis, document classification, etc.

It aims to be easy to {downloadspage}[install] and use, and can efficiently handle many documents and annotations.
It is easy to {downloadspage}[install] and use, and can efficiently handle many documents and annotations.

=== labelbuddy compared to other annotation tools

There exist several tools for annotating documents.
Most of them, such as https://doccano.github.io/doccano/[doccano] and https://labelstud.io/[labelstudio], are meant to run on a web server and be used online.
If you are crowdsourcing annotations and want users to annotate documents online you should turn to one of these tools.

However if you do not plan to host such a tool on a server, it may not be convenient for each annotator to install one of these rather complex programs and run a local server and database management system on their own machine in order to annotate documents.
However if you do not plan to host such a tool on a server, it may not be convenient for each annotator to install a rather complex program and run a local server and database management system on their own machine.
In this case, it may be easier to rely on a desktop application such as {lb}, which is a more lightweight solution.

A <<managing-projects,labelbuddy database>> is just an ordinary file that you can copy, share or delete like any other file.
@@ -131,6 +131,7 @@ Documents, labels and annotations that are already in the database are skipped i
=== Importing documents (and annotations)

In the {ietab}, click btn:[Import docs & annotations] and select a file containing the documents you plan to annotate.
If you want to try this before creating your own, you can download example documents from the link:{examples-url}/[{lb} website].

When importing a new document into {lb}, several attributes can be specified:

@@ -166,12 +167,14 @@ labelbuddy mydatabase.labelbuddy --import-docs mydocs.jsonl
=== Importing or creating labels

To import labels, click btn:[Import labels] in the {ietab} and select a file.
If you want to try this before creating your own labels, you can download example labels from the link:{examples-url}/[{lb} website].

Labels have three attributes: a mandatory `text` (label name), and an optional `color` and `shortcut_key`.
The `shortcut_key` is a lower-case ASCII letter (a-z) that helps quickly <<annotating-documents,annotating text>> with that label.
The `shortcut_key` is a lower-case letter (a-z) that helps quickly <<annotating-documents,annotating text>> with that label.

As for documents, the format is deduced from the filename extension when importing labels, and is described in <<formats-sec>>.

It is also possible to manually enter a new label or to change the labels' color and shortcut key from within the GUI application, in the {dstab}.
It is also possible to manually enter a new label or to change the labels`' color and shortcut key from within the GUI application, in the {dstab}.

Labels can also be imported using the <<command-line-interface,command line>>, for example:

@@ -445,7 +448,7 @@ It is possible to specify these options several times.
To use these options, the database path must be provided explicitly.

Labels are imported first, then documents, then export operations are performed.
Therefore it is possible to export documents and then export them in one execution of {lb}.
Therefore it is possible to import documents and then export them in one execution of {lb}.
As an example, to strip the annotations from previously exported documents you could run:
[source,sh]
----
@@ -1050,8 +1053,8 @@ ifdef::env-github[]
<text>Word</text>
</label>
<label>
<shortcut_key>n</shortcut_key>
<text>Number</text>
<shortcut_key>n</shortcut_key>
<color>orange</color>
</label>
</label_set>
@@ -1157,7 +1160,7 @@ endif::[]
[#compatibility-with-doccano]
=== Compatibility with doccano
Labels exported from {lb} in the JSON format can be imported into {doca}.
Labels exported from {doca} and saved in a `.json` file can be imported into {lb}
Labels exported from {doca} and saved in a `.json` file can be imported into {lb}.

Documents and annotations exported from {doca} can also be imported into a {lb} database.
To do so, when exporting from {doca} select the format "`jsonl (text label)`".
2 changes: 1 addition & 1 deletion docs/changelog
@@ -4,7 +4,7 @@ labelbuddy (0.0.7) unstable; urgency=medium
* CSV formats
* Attaching extra data to annotations

-- jerome dockes <[email protected]> Fri, 26 Mar 2021 13:17:19 -0400
-- jerome dockes <[email protected]> Sun, 23 May 2021 01:25:41 -0400

labelbuddy (0.0.6) unstable; urgency=medium

1 change: 1 addition & 0 deletions docs/demo_data/example_documents_for_website.json

Large diffs are not rendered by default.

38 changes: 27 additions & 11 deletions docs/demo_data/make_example_docs.py
@@ -23,12 +23,14 @@ def get_annotations(doc, annotations):
example_dir = Path(__file__).parent
doc_file_names = [
"hello_annotations.txt",
"wikipedia_language_en.txt",
"wikipedia_language_ar.txt",
"wikipedia_language_el.txt",
"wikipedia_language_zh.txt",
]

all_docs = []
demo_docs = []
website_docs = []

for doc_name in doc_file_names:
doc = example_dir.joinpath(doc_name).read_text(encoding="utf-8")
@@ -62,13 +64,27 @@ def get_annotations(doc, annotations):
"title": doc_name,
"md5": hashlib.md5(body.encode("utf-8")).hexdigest(),
}
all_docs.append(
{
"text": body,
"meta": meta,
"short_title": title,
"long_title": long_title,
"labels": annotations,
}
)
example_dir.joinpath("example_documents.json").write_text(json.dumps(all_docs))
if doc_name != "wikipedia_language_en.txt":
demo_docs.append(
{
"text": body,
"meta": meta,
"short_title": title,
"long_title": long_title,
"labels": annotations,
}
)
if doc_name != "hello_annotations.txt":
website_docs.append(
{
"text": body,
"meta": {"source": "https://en.wikipedia.org"},
"short_title": title,
"long_title": long_title,
}
)

example_dir.joinpath("example_documents.json").write_text(json.dumps(demo_docs))
example_dir.joinpath("example_documents_for_website.json").write_text(
json.dumps(website_docs)
)
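After this change, `make_example_docs.py` builds two lists from the same source files: one for the demo database and one for download from the website. The condensed sketch below restates only the selection rules visible in the hunk; the helper `split_docs` and the placeholder texts are hypothetical, and titles and annotations are left out for brevity.

```python
import hashlib

# File names as listed in make_example_docs.py after this commit.
DOC_FILE_NAMES = [
    "hello_annotations.txt",
    "wikipedia_language_en.txt",
    "wikipedia_language_ar.txt",
    "wikipedia_language_el.txt",
    "wikipedia_language_zh.txt",
]


def split_docs(texts):
    """Hypothetical helper restating the two selection rules.

    `texts` maps each file name to its document body.
    """
    demo_docs, website_docs = [], []
    for doc_name in DOC_FILE_NAMES:
        body = texts[doc_name]
        # Demo database: every document except the English extract,
        # with simplified per-document metadata (annotations omitted).
        if doc_name != "wikipedia_language_en.txt":
            meta = {
                "title": doc_name,
                "md5": hashlib.md5(body.encode("utf-8")).hexdigest(),
            }
            demo_docs.append({"text": body, "meta": meta})
        # Website download: every Wikipedia extract, without annotations
        # and with a fixed metadata dict pointing at the source.
        if doc_name != "hello_annotations.txt":
            website_docs.append(
                {"text": body, "meta": {"source": "https://en.wikipedia.org"}}
            )
    return demo_docs, website_docs


# Usage with placeholder texts standing in for the real files.
texts = {name: f"placeholder text for {name}" for name in DOC_FILE_NAMES}
demo, website = split_docs(texts)
print(len(demo), len(website))  # 4 4
```

Each list drops exactly one of the five files, so both end up with four documents.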
13 changes: 13 additions & 0 deletions docs/demo_data/wikipedia_language_en.txt
@@ -0,0 +1,13 @@
<a href="https://en.wikipedia.org/wiki/Language">Language</a>
Extract from Wikipedia:
https://en.wikipedia.org/wiki/Language

A language is a structured system of communication used by humans, including speech (spoken language), gestures (sign language) and writing. The most widely-spoken languages have writing systems of glyphs that enable sounds or gestures to be inscribed for later reactivation.

The scientific study of language is called linguistics. Critical examinations of languages, such as philosophy of language, the relationships between language and thought, etc, such as how words represent experience, have been debated at least since Gorgias and Plato in ancient Greek civilization. Thinkers such as Rousseau (1712 – 1778) have debated that language originated from emotions, while others like Kant (1724 –1804), have held that languages originated from rational and logical thought. Twentieth century philosophers such as Wittgenstein (1889 – 1951) argued that philosophy is really the study of language itself. Major figures in contemporary linguistics of these times include Ferdinand de Saussure and Noam Chomsky.

Estimates of the number of human languages in the world vary between 5,000 and 7,000. However, any precise estimate depends on the arbitrary distinction (dichotomy) between languages and dialect. Natural languages are spoken or signed, but any language can be encoded into secondary media using auditory, visual, or tactile stimuli – for example, in writing, whistling, signing, or braille. In other words, human language is modality-independent, but written or signed language is the way to inscribe or encode the natural human speech or gestures. Depending on philosophical perspectives regarding the definition of language and meaning, when used as a general concept, "language" may refer to the cognitive ability to learn and use systems of complex communication, or to describe the set of rules that makes up these systems, or the set of utterances that can be produced from those rules. All languages rely on the process of semiosis to relate signs to particular meanings. Oral, manual and tactile languages contain a phonological system that governs how symbols are used to form sequences known as words or morphemes, and a syntactic system that governs how words and morphemes are combined to form phrases and utterances.

Human language has the properties of productivity and displacement, and relies on social convention and learning. Its complex structure affords a much wider range of expressions than any known system of animal communication. Language is thought to have originated when early hominins started gradually changing their primate communication systems, acquiring the ability to form a theory of other minds and a shared intentionality. This development is sometimes thought to have coincided with an increase in brain volume, and many linguists see the structures of language as having evolved to serve specific communicative and social functions. Language is processed in many different locations in the human brain, but especially in Broca's and Wernicke's areas. Humans acquire language through social interaction in early childhood, and children generally speak fluently by approximately three years old. The use of language is deeply entrenched in human culture. Therefore, in addition to its strictly communicative uses, language also has many social and cultural uses, such as signifying group identity, social stratification, as well as social grooming and entertainment.

Languages evolve and diversify over time, and the history of their evolution can be reconstructed by comparing modern languages to determine which traits their ancestral languages must have had in order for the later developmental stages to occur. A group of languages that descend from a common ancestor is known as a language family; in contrast, a language that has been demonstrated to not have any living or non-living relationship with another language is called a language isolate. There are also many unclassified languages whose relationships have not been established, and spurious languages may have not existed at all. Academic consensus holds that between 50% and 90% of languages spoken at the beginning of the 21st century will probably have become extinct by the year 2100.
16 changes: 14 additions & 2 deletions docs/example_data/index.adoc
@@ -12,7 +12,19 @@ The files whose name contains `exported` have been exported from {lb}.
The import and export formats are the same, which means that these exported files can also be imported into {lb}.
All the formats are described in <<../documentation.adoc#,the documentation>>.

== Documents and annotations
== Demo documents and labels

These are copies (in `json` format) of some of the documents and labels used to fill the demo database when you start {lb} with the `--demo` option or select menu:File[Demo] in the GUI.
The documents are short extracts from https://en.wikipedia.org[Wikipedia].

- link:wiki_extracts_documents.json[documents]
- link:pos_labels.json[labels]

== Toy examples in all formats

These are copies of the inline examples shown in <<../documentation.adoc#,the documentation>>.

=== Documents and annotations

The files whose name starts with `exported` contain annotations in addition to the document text and attributes.

@@ -26,7 +38,7 @@ The files whose name starts with `exported` contain annotations in addition to t
- link:exported_docs.csv[`exported_docs.csv`]
- link:docs.txt[`docs.txt`]

== Labels
=== Labels

- link:labels.json[`labels.json`]
- link:exported_labels.json[`exported_labels.json`]
2 changes: 1 addition & 1 deletion docs/example_data/labels.xml
@@ -4,8 +4,8 @@
<text>Word</text>
</label>
<label>
<shortcut_key>n</shortcut_key>
<text>Number</text>
<shortcut_key>n</shortcut_key>
<color>orange</color>
</label>
</label_set>
7 changes: 4 additions & 3 deletions docs/example_data/make_examples.py
@@ -96,9 +96,10 @@
xml_labels = etree.Element("label_set")
for label in all_labels:
label_elem = etree.SubElement(xml_labels, "label")
for key in {"text", "color", "shortcut_key"}.intersection(label.keys()):
elem = etree.SubElement(label_elem, key)
elem.text = label[key]
for key in ("text", "shortcut_key", "color"):
if key in label:
elem = etree.SubElement(label_elem, key)
elem.text = label[key]
data_dir.joinpath("labels.xml").write_bytes(
etree.tostring(
xml_labels, encoding="utf-8", xml_declaration=True, pretty_print=True
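In `make_examples.py`, iteration over a set literal was replaced by a fixed tuple. Presumably this is for reproducibility: the iteration order of a set of strings depends on string hashes, which Python randomizes per process unless `PYTHONHASHSEED` is set, so the order of the child elements in `labels.xml` could change between runs. The sketch below uses the standard-library `xml.etree.ElementTree` rather than the `etree` module the script imports (its `pretty_print` argument suggests lxml); the label data mirrors the `labels.xml` hunk.

```python
import xml.etree.ElementTree as ET

# Label data in the same shape as the labels.xml example.
all_labels = [
    {"text": "Word"},
    {"color": "orange", "shortcut_key": "n", "text": "Number"},
]

# A fixed tuple gives the same element order on every run, whereas a
# set of strings is iterated in a hash-dependent order.
FIELD_ORDER = ("text", "shortcut_key", "color")

xml_labels = ET.Element("label_set")
for label in all_labels:
    label_elem = ET.SubElement(xml_labels, "label")
    for key in FIELD_ORDER:
        # Optional fields are only written when present.
        if key in label:
            ET.SubElement(label_elem, key).text = label[key]

print(ET.tostring(xml_labels, encoding="unicode"))
```

The dictionary's own key order no longer matters: the second label is always written as `text`, `shortcut_key`, `color`.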
37 changes: 37 additions & 0 deletions docs/example_data/pos_labels.json
@@ -0,0 +1,37 @@
[
{
"text": "Word",
"shortcut_key": "e",
"color": "#aec7e8"
},
{
"text": "Mot",
"shortcut_key": "f",
"color": "#ffbb78"
},
{
"text": "Palavra",
"shortcut_key": "p",
"color": "#98df8a"
},
{
"text": "\u0643\u0644\u0645\u0629",
"shortcut_key": "a",
"color": "#ff9896"
},
{
"text": "\u8a5e",
"shortcut_key": "z",
"color": "#c5b0d5"
},
{
"text": "In progress",
"shortcut_key": "n",
"color": "#f7b6d2"
},
{
"text": "Complete",
"shortcut_key": "y",
"color": "#9edae5"
}
]
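The new `pos_labels.json` gives each label a single-letter shortcut, consistent with the documentation's statement that `shortcut_key` is a lower-case letter (a-z). The `\u…` escapes are the non-Latin label names (Arabic and Chinese). A consistency check like the hypothetical one below could catch a duplicated or invalid key before such a file is published; `check_labels` is an invented name, not part of labelbuddy, and labelbuddy's own validation may differ.

```python
import json
import string

# Copy of docs/example_data/pos_labels.json from this commit.
POS_LABELS = r"""
[
  {"text": "Word", "shortcut_key": "e", "color": "#aec7e8"},
  {"text": "Mot", "shortcut_key": "f", "color": "#ffbb78"},
  {"text": "Palavra", "shortcut_key": "p", "color": "#98df8a"},
  {"text": "\u0643\u0644\u0645\u0629", "shortcut_key": "a", "color": "#ff9896"},
  {"text": "\u8a5e", "shortcut_key": "z", "color": "#c5b0d5"},
  {"text": "In progress", "shortcut_key": "n", "color": "#f7b6d2"},
  {"text": "Complete", "shortcut_key": "y", "color": "#9edae5"}
]
"""


def check_labels(labels):
    """Hypothetical check: every label has a text, shortcut keys are
    distinct single lower-case ASCII letters. Returns the keys used."""
    seen = set()
    for label in labels:
        if not label.get("text"):
            raise ValueError(f"label without text: {label!r}")
        key = label.get("shortcut_key")
        if key is None:  # the shortcut key is optional
            continue
        if len(key) != 1 or key not in string.ascii_lowercase:
            raise ValueError(f"invalid shortcut key {key!r}")
        if key in seen:
            raise ValueError(f"duplicate shortcut key {key!r}")
        seen.add(key)
    return seen


labels = json.loads(POS_LABELS)
print(sorted(check_labels(labels)))
```

The raw string keeps the backslashes, so `json.loads` decodes the `\u…` escapes itself, as it would when reading the file.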
1 change: 1 addition & 0 deletions docs/example_data/wiki_extracts_documents.json

Large diffs are not rendered by default.

3 changes: 1 addition & 2 deletions docs/extended-description
@@ -1,5 +1,4 @@
This is an application for annotating parts of documents with labels.
labelbuddy can be used for Part Of Speech tagging,
Named Entity Recognition,
labelbuddy can be used for Named Entity Recognition,
sentiment analysis and document classification, etc.
It depends on Qt5.
2 changes: 1 addition & 1 deletion docs/index.adoc
@@ -19,7 +19,7 @@ Jérôme Dockès <[email protected]>
:doca: pass:q[*doccano*]


{lb} is a Graphical User Interface tool for annotating documents -- for example for Named Entity Recognition or Part Of Speech tagging.
{lb} is a Graphical User Interface tool for annotating documents -- for example for Named Entity Recognition or document classification.

To get started, <<installation.adoc#,install>> {lb}, and have a look at the <<documentation.adoc#,documentation>>.
You can also see some <<screenshots.adoc#,screenshots>>.
7 changes: 6 additions & 1 deletion docs/installation.adoc
@@ -88,7 +88,12 @@ cmake /path/to/labelbuddy
cmake --build .
....

It can also be built with https://doc.qt.io/qt-5/qmake-manual.html[qmake]:
Then {lb} can (optionally) be installed with:
....
sudo make install
....

{lb} can also be built with https://doc.qt.io/qt-5/qmake-manual.html[qmake]:
....
qmake /path/to/labelbuddy
make
Binary file modified docs/screenshots/annotate.png
Binary file modified docs/screenshots/dataset.png
Binary file modified docs/screenshots/import_export.png