minor changes to documentation

jeromedockes · May 24, 2021 · 3f5c4e6
1 parent 08c9156
commit 3f5c4e6

Showing 20 changed files with 152 additions and 40 deletions.
diff --git a/.gitignore b/.gitignore
@@ -1,6 +1,7 @@
 doxygen_output/
 docs/*.html
 docs/example_data/*.html
+docs/schema/*.html
 !docs/docinfo.html
 docs/gh-pages-site/
 

diff --git a/CMakeLists.txt b/CMakeLists.txt
@@ -84,18 +84,27 @@ set(CPACK_RESOURCE_FILE_LICENSE "${CMAKE_SOURCE_DIR}/LICENSE.txt")
 set(CPACK_SOURCE_GENERATOR "TGZ")
 set(CPACK_SOURCE_IGNORE_FILES
   [.]git/
+  [.]gitignore
+  [.]gitattributes
   cmake_build/
+  cmake_release_build/
   qmake_build/
-  Makefile
   test/labelbuddy_tests
   test_build
   test_cli/data/newsgroups/
   test_cli/__pycache__/
+  [.]pytest_cache/
   test_cli/.*[.]pyc
   doxygen_output/
   docs/gh-pages-site/
-  docs/screenshots/
-  docs/README[.]rst
+  docs/screenshots/.*[.]png
+  docs/index[.]html
+  docs/installation[.]html
+  docs/manpage[.]html
+  docs/readme-as-seen-on-github[.]html
+  docs/screenshots[.]html
+  docs/example_data/index.html
+  docs/schema/index.html
   moc_.*
   src/moc_.*
   test/moc_.*
@@ -116,7 +125,7 @@ set(CPACK_SOURCE_IGNORE_FILES
   .*[.]smod
   .*[.]lai
   .*[.]la
-  .*[.]a
+  .*[.]a$
   .*[.]lib
   .*[.]exe
   .*[.]out
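
Note on the `.*[.]a` to `.*[.]a$` change above: a minimal sketch of why the end anchor matters, assuming CPack treats each `CPACK_SOURCE_IGNORE_FILES` entry as a regular expression searched anywhere in the file path. Python's `re.search` only approximates that behavior, and the paths below are hypothetical, chosen for illustration.

```python
import re

# Hypothetical source-tree paths, used only for illustration.
paths = ["build/libfoo.a", "docs/README.adoc", "docs/index.adoc"]

unanchored = re.compile(r".*[.]a")   # pattern before the commit
anchored = re.compile(r".*[.]a$")    # pattern after the commit


def ignored(pattern, candidates):
    """Return the candidates that the pattern matches anywhere in the path."""
    return [p for p in candidates if pattern.search(p)]


# Without the anchor, ".a" also matches the start of ".adoc".
ignored_before = ignored(unanchored, paths)
# With the anchor, only paths that really end in ".a" match.
ignored_after = ignored(anchored, paths)

print(ignored_before)
print(ignored_after)
```

Under that assumption, the unanchored pattern would also exclude the `.adoc` documentation sources from the source package, which the anchored pattern avoids.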

diff --git a/docs/Description b/docs/Description
@@ -1,6 +1,5 @@
 GUI tool for annotating documents
  This is an application for annotating parts of documents with labels.
- labelbuddy can be used for Part Of Speech tagging,
- Named Entity Recognition,
+ labelbuddy can be used for Named Entity Recognition,
  sentiment analysis and document classification, etc.
  It depends on Qt5.
diff --git a/docs/Makefile b/docs/Makefile
@@ -4,6 +4,8 @@ html += readme-as-seen-on-github.html
 webhtml := $(patsubst %,gh-pages-site/%,$(html))
 examples :=$(filter-out %.py %annotations.json, $(wildcard example_data/*))
 examples := $(patsubst %.adoc,%.html,$(examples))
+examples += example_data/wiki_extracts_documents.json
+examples += example_data/pos_labels.json
 webexamples := $(patsubst %,gh-pages-site/%, $(examples))
 schemas := $(wildcard schema/*)
 schemas := $(patsubst %.adoc,%.html,$(schemas))
@@ -13,7 +15,7 @@ screenshots := $(wildcard screenshots/*.png)
 
 .PHONY: all clean web
 
-all: $(html) web labelbuddy.1 demo_data/example_documents.json
+all: $(html) labelbuddy.1 demo_data/example_documents.json $(examples) $(schemas)
 
 web: $(webhtml) $(webexamples) $(webschemas)
 
@@ -39,7 +41,7 @@ gh-pages-site/screenshots.html: screenshots.html $(screenshots)
 	mkdir -p gh-pages-site
 	xsltproc add_nav.xsl $< > $@
 	mkdir -p gh-pages-site/screenshots
-	cp screenshots/* gh-pages-site/screenshots/
+	cp screenshots/* gh-pages-site/screenshots/ 2>/dev/null || :
 
 example_data/exported_docs.json: example_data/make_examples.py
 	$<
@@ -51,10 +53,23 @@ readme-as-seen-on-github.html: README.adoc docinfo.html ../data/VERSION.txt exam
 	asciidoctor -b xhtml -a env-github -a lbversion="$$(cat ../data/VERSION.txt)" $< -o $@
 
 
-demo_data/example_documents.json: $(examples) demo_data/example_labels.json demo_data/make_example_docs.py
+demo_data/example_documents.json: $(demo) demo_data/example_labels.json demo_data/make_example_docs.py
 	python3 demo_data/make_example_docs.py
 
+demo_data/example_documents_for_website.json: $(demo) demo_data/make_example_docs.py
+	python3 demo_data/make_example_docs.py
+
+example_data/wiki_extracts_documents.json: demo_data/example_documents_for_website.json
+	cp $< $@
+
+example_data/pos_labels.json: demo_data/example_labels.json
+	cp $< $@
+
 clean:
 	rm -f documentation.html  index.html  installation.html  manpage.html \
-    demo_data/example_documents.json
+    demo_data/example_documents.json demo_data/example_documents_for_website.json \
+    labelbuddy.1 readme-as-seen-on-github.html \
+    screenshots.html example_data/*.xml example_data/*.json \
+    example_data/*.jsonl example_data/*.csv example_data/*.txt example_data/*.html \
+    schema/*.html
 	rm -rf gh-pages-site
diff --git a/docs/README.adoc b/docs/README.adoc
@@ -62,17 +62,17 @@ endif::[]
 == Introduction
 
 {lb} is an {lblicense}[open-source] desktop Graphical User Interface (GUI) application for annotating documents.
-It can be used for example for Named Entity Recognition, Part Of Speech tagging, sentiment analysis and document classification ...
+It can be used for example for Named Entity Recognition, sentiment analysis, document classification, etc.
 
-It aims to be easy to {downloadspage}[install] and use, and can efficiently handle many documents and annotations.
+It is easy to {downloadspage}[install] and use, and can efficiently handle many documents and annotations.
 
 === labelbuddy compared to other annotation tools
 
 There exist several tools for annotating documents.
 Most of them, such as https://doccano.github.io/doccano/[doccano] and https://labelstud.io/[labelstudio], are meant to run on a web server and be used online.
 If you are crowdsourcing annotations and want users to annotate documents online you should turn to one of these tools.
 
-However if you do not plan to host such a tool on a server, it may not be convenient for each annotator to install one of these rather complex programs and run a local server and database management system on their own machine in order to annotate documents.
+However if you do not plan to host such a tool on a server, it may not be convenient for each annotator to install a rather complex program and run a local server and database management system on their own machine.
 In this case, it may be easier to rely on a desktop application such as {lb}, which is a more lightweight solution.
 
 A <<managing-projects,labelbuddy database>> is just an ordinary file that you can copy, share or delete like any other file.
@@ -131,6 +131,7 @@ Documents, labels and annotations that are already in the database are skipped i
 === Importing documents (and annotations)
 
 In the {ietab}, click btn:[Import docs & annotations] and select a file containing the documents you plan to annotate.
+If you want to try this before creating your own, you can download example documents from the link:{examples-url}/[{lb} website].
 
 When importing a new document into {lb}, several attributes can be specified:
 
@@ -166,12 +167,14 @@ labelbuddy mydatabase.labelbuddy --import-docs mydocs.jsonl
 === Importing or creating labels
 
 To import labels, click btn:[Import labels] in the {ietab} and select a file.
+If you want to try this before creating your own labels, you can download example labels from the link:{examples-url}/[{lb} website].
+
 Labels have three attributes: a mandatory `text` (label name), and an optional `color` and `shortcut_key`.
-The `shortcut_key` is a lower-case ASCII letter (a-z) that helps quickly <<annotating-documents,annotating text>> with that label.
+The `shortcut_key` is a lower-case letter (a-z) that helps quickly <<annotating-documents,annotating text>> with that label.
 
 As for documents, the format is deduced from the filename extension when importing labels, and is described in <<formats-sec>>.
 
-It is also possible to manually enter a new label or to change the labels' color and shortcut key from within the GUI application, in the {dstab}.
+It is also possible to manually enter a new label or to change the labels`' color and shortcut key from within the GUI application, in the {dstab}.
 
 Labels can also be imported using the <<command-line-interface,command line>>, for example:
 
@@ -445,7 +448,7 @@ It is possible to specify these options several times.
 To use these options, the database path must be provided explicitly.
 
 Labels are imported first, then documents, then export operations are performed.
-Therefore it is possible to export documents and then export them in one execution of {lb}.
+Therefore it is possible to import documents and then export them in one execution of {lb}.
 As an example, to strip the annotations from previously exported documents you could run:
 [source,sh]
 ----
@@ -1050,8 +1053,8 @@ ifdef::env-github[]
     <text>Word</text>
   </label>
   <label>
-    <shortcut_key>n</shortcut_key>
     <text>Number</text>
+    <shortcut_key>n</shortcut_key>
     <color>orange</color>
   </label>
 </label_set>
@@ -1157,7 +1160,7 @@ endif::[]
 [#compatibility-with-doccano]
 === Compatibility with doccano
 Labels exported from {lb} in the JSON format can be imported into {doca}.
-Labels exported from {doca} and saved in a `.json` file can be imported into {lb}
+Labels exported from {doca} and saved in a `.json` file can be imported into {lb}.
 
 Documents and annotations exported from {doca} can also be imported into a {lb} database.
 To do so, when exporting from {doca} select the format "`jsonl (text label)`".

diff --git a/docs/changelog b/docs/changelog
@@ -4,7 +4,7 @@ labelbuddy (0.0.7) unstable; urgency=medium
   * CSV formats
   * Attaching extra data to annotations
 
- -- jerome dockes <[email protected]>  Fri, 26 Mar 2021 13:17:19 -0400
+ -- jerome dockes <[email protected]>  Sun, 23 May 2021 01:25:41 -0400
 
 labelbuddy (0.0.6) unstable; urgency=medium
 

diff --git a/docs/demo_data/example_documents_for_website.json b/docs/demo_data/example_documents_for_website.json
diff --git a/docs/demo_data/make_example_docs.py b/docs/demo_data/make_example_docs.py
@@ -23,12 +23,14 @@ def get_annotations(doc, annotations):
 example_dir = Path(__file__).parent
 doc_file_names = [
     "hello_annotations.txt",
+    "wikipedia_language_en.txt",
     "wikipedia_language_ar.txt",
     "wikipedia_language_el.txt",
     "wikipedia_language_zh.txt",
 ]
 
-all_docs = []
+demo_docs = []
+website_docs = []
 
 for doc_name in doc_file_names:
     doc = example_dir.joinpath(doc_name).read_text(encoding="utf-8")
@@ -62,13 +64,27 @@ def get_annotations(doc, annotations):
         "title": doc_name,
         "md5": hashlib.md5(body.encode("utf-8")).hexdigest(),
     }
-    all_docs.append(
-        {
-            "text": body,
-            "meta": meta,
-            "short_title": title,
-            "long_title": long_title,
-            "labels": annotations,
-        }
-    )
-example_dir.joinpath("example_documents.json").write_text(json.dumps(all_docs))
+    if doc_name != "wikipedia_language_en.txt":
+        demo_docs.append(
+            {
+                "text": body,
+                "meta": meta,
+                "short_title": title,
+                "long_title": long_title,
+                "labels": annotations,
+            }
+        )
+    if doc_name != "hello_annotations.txt":
+        website_docs.append(
+            {
+                "text": body,
+                "meta": {"source": "https://en.wikipedia.org"},
+                "short_title": title,
+                "long_title": long_title,
+            }
+        )
+
+example_dir.joinpath("example_documents.json").write_text(json.dumps(demo_docs))
+example_dir.joinpath("example_documents_for_website.json").write_text(
+    json.dumps(website_docs)
+)
diff --git a/docs/demo_data/wikipedia_language_en.txt b/docs/demo_data/wikipedia_language_en.txt
@@ -0,0 +1,13 @@
+<a href="https://en.wikipedia.org/wiki/Language">Language</a>
+Extract from Wikipedia:
+https://en.wikipedia.org/wiki/Language
+
+A language is a structured system of communication used by humans, including speech (spoken language), gestures (sign language) and writing. The most widely-spoken languages have writing systems of glyphs that enable sounds or gestures to be inscribed for later reactivation.
+
+The scientific study of language is called linguistics. Critical examinations of languages, such as philosophy of language, the relationships between language and thought, etc, such as how words represent experience, have been debated at least since Gorgias and Plato in ancient Greek civilization. Thinkers such as Rousseau (1712 – 1778) have debated that language originated from emotions, while others like Kant (1724 –1804), have held that languages originated from rational and logical thought. Twentieth century philosophers such as Wittgenstein (1889 – 1951) argued that philosophy is really the study of language itself. Major figures in contemporary linguistics of these times include Ferdinand de Saussure and Noam Chomsky.
+
+Estimates of the number of human languages in the world vary between 5,000 and 7,000. However, any precise estimate depends on the arbitrary distinction (dichotomy) between languages and dialect. Natural languages are spoken or signed, but any language can be encoded into secondary media using auditory, visual, or tactile stimuli – for example, in writing, whistling, signing, or braille. In other words, human language is modality-independent, but written or signed language is the way to inscribe or encode the natural human speech or gestures. Depending on philosophical perspectives regarding the definition of language and meaning, when used as a general concept, "language" may refer to the cognitive ability to learn and use systems of complex communication, or to describe the set of rules that makes up these systems, or the set of utterances that can be produced from those rules. All languages rely on the process of semiosis to relate signs to particular meanings. Oral, manual and tactile languages contain a phonological system that governs how symbols are used to form sequences known as words or morphemes, and a syntactic system that governs how words and morphemes are combined to form phrases and utterances.
+
+Human language has the properties of productivity and displacement, and relies on social convention and learning. Its complex structure affords a much wider range of expressions than any known system of animal communication. Language is thought to have originated when early hominins started gradually changing their primate communication systems, acquiring the ability to form a theory of other minds and a shared intentionality. This development is sometimes thought to have coincided with an increase in brain volume, and many linguists see the structures of language as having evolved to serve specific communicative and social functions. Language is processed in many different locations in the human brain, but especially in Broca's and Wernicke's areas. Humans acquire language through social interaction in early childhood, and children generally speak fluently by approximately three years old. The use of language is deeply entrenched in human culture. Therefore, in addition to its strictly communicative uses, language also has many social and cultural uses, such as signifying group identity, social stratification, as well as social grooming and entertainment.
+
+Languages evolve and diversify over time, and the history of their evolution can be reconstructed by comparing modern languages to determine which traits their ancestral languages must have had in order for the later developmental stages to occur. A group of languages that descend from a common ancestor is known as a language family; in contrast, a language that has been demonstrated to not have any living or non-living relationship with another language is called a language isolate. There are also many unclassified languages whose relationships have not been established, and spurious languages may have not existed at all. Academic consensus holds that between 50% and 90% of languages spoken at the beginning of the 21st century will probably have become extinct by the year 2100.
diff --git a/docs/example_data/index.adoc b/docs/example_data/index.adoc
@@ -12,7 +12,19 @@ The files whose name contains `exported` have been exported from {lb}.
 The import and export formats are the same, which means that these exported files can also be imported into {lb}.
 All the formats are described in <<../documentation.adoc#,the documentation>>.
 
-== Documents and annotations
+== Demo documents and labels
+
+These are copies (in `json` format) of some of the documents and labels used to fill the demo database when you start {lb} with the `--demo` option or select menu:File[Demo] in the GUI.
+The documents are short extracts from https://en.wikipedia.org[Wikipedia].
+
+- link:wiki_extracts_documents.json[documents]
+- link:pos_labels.json[labels]
+
+== Toy examples in all formats
+
+These are copies of the inline examples shown in <<../documentation.adoc#,the documentation>>.
+
+=== Documents and annotations
 
 The files whose name starts with `exported` contain annotations in addition to the document text and attributes.
 
@@ -26,7 +38,7 @@ The files whose name starts with `exported` contain annotations in addition to t
 - link:exported_docs.csv[`exported_docs.csv`]
 - link:docs.txt[`docs.txt`]
 
-== Labels
+=== Labels
 
 - link:labels.json[`labels.json`]
 - link:exported_labels.json[`exported_labels.json`]

diff --git a/docs/example_data/labels.xml b/docs/example_data/labels.xml
@@ -4,8 +4,8 @@
     <text>Word</text>
   </label>
   <label>
-    <shortcut_key>n</shortcut_key>
     <text>Number</text>
+    <shortcut_key>n</shortcut_key>
     <color>orange</color>
   </label>
 </label_set>
diff --git a/docs/example_data/make_examples.py b/docs/example_data/make_examples.py
@@ -96,9 +96,10 @@
 xml_labels = etree.Element("label_set")
 for label in all_labels:
     label_elem = etree.SubElement(xml_labels, "label")
-    for key in {"text", "color", "shortcut_key"}.intersection(label.keys()):
-        elem = etree.SubElement(label_elem, key)
-        elem.text = label[key]
+    for key in ("text", "shortcut_key", "color"):
+        if key in label:
+            elem = etree.SubElement(label_elem, key)
+            elem.text = label[key]
 data_dir.joinpath("labels.xml").write_bytes(
     etree.tostring(
         xml_labels, encoding="utf-8", xml_declaration=True, pretty_print=True
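
Note on the loop change above: iterating over a Python `set` of strings has no guaranteed order, so the child elements of each `<label>` could come out in a different order from one run to the next. Iterating over a fixed tuple makes the output deterministic. The sketch below shows only the ordering logic, without `lxml`; the helper name `ordered_items` is hypothetical and not part of the commit.

```python
# Fixed key order taken from the new loop in the diff.
KEY_ORDER = ("text", "shortcut_key", "color")


def ordered_items(label, order=KEY_ORDER):
    """Return (key, value) pairs in a fixed order, skipping absent keys."""
    return [(key, label[key]) for key in order if key in label]


# The dict's own key order does not affect the result.
label = {"color": "orange", "text": "Number", "shortcut_key": "n"}
items = ordered_items(label)

# Optional attributes that are missing are simply skipped.
partial = ordered_items({"text": "Word"})

print(items)
print(partial)
```

This matches the reordered `labels.xml` in the same commit, where `<text>` now precedes `<shortcut_key>` and `<color>`.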

diff --git a/docs/example_data/pos_labels.json b/docs/example_data/pos_labels.json
@@ -0,0 +1,37 @@
+[
+  {
+    "text": "Word",
+    "shortcut_key": "e",
+    "color": "#aec7e8"
+  },
+  {
+    "text": "Mot",
+    "shortcut_key": "f",
+    "color": "#ffbb78"
+  },
+  {
+    "text": "Palavra",
+    "shortcut_key": "p",
+    "color": "#98df8a"
+  },
+  {
+    "text": "\u0643\u0644\u0645\u0629",
+    "shortcut_key": "a",
+    "color": "#ff9896"
+  },
+  {
+    "text": "\u8a5e",
+    "shortcut_key": "z",
+    "color": "#c5b0d5"
+  },
+  {
+    "text": "In progress",
+    "shortcut_key": "n",
+    "color": "#f7b6d2"
+  },
+  {
+    "text": "Complete",
+    "shortcut_key": "y",
+    "color": "#9edae5"
+  }
+]
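
Note on the label file above: the README text in this commit describes a label as having a mandatory `text` and an optional `color` and `shortcut_key`, the latter being a lower-case letter (a-z). A minimal sketch of checking a label list against that description before importing it. The function `check_labels` is hypothetical and is not part of labelbuddy, and the rules it applies are only those stated in the README text, not the application's actual validation.

```python
import json
import re

# Excerpt of the label list added in this commit.
labels_json = """
[
  {"text": "Word", "shortcut_key": "e", "color": "#aec7e8"},
  {"text": "Mot", "shortcut_key": "f", "color": "#ffbb78"},
  {"text": "Complete", "shortcut_key": "y", "color": "#9edae5"}
]
"""


def check_labels(labels):
    """Return a list of problems found; an empty list means no problem."""
    problems = []
    for index, label in enumerate(labels):
        # "text" is the only mandatory attribute.
        if not label.get("text"):
            problems.append(f"label {index}: missing 'text'")
        # "shortcut_key" is optional; if present it must be one letter a-z.
        key = label.get("shortcut_key")
        if key is not None:
            if not isinstance(key, str) or not re.fullmatch(r"[a-z]", key):
                problems.append(f"label {index}: invalid 'shortcut_key'")
    return problems


good = check_labels(json.loads(labels_json))
bad = check_labels([{"shortcut_key": "N"}])

print(good)
print(bad)
```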
diff --git a/docs/example_data/wiki_extracts_documents.json b/docs/example_data/wiki_extracts_documents.json
diff --git a/docs/extended-description b/docs/extended-description
@@ -1,5 +1,4 @@
 This is an application for annotating parts of documents with labels.
-labelbuddy can be used for Part Of Speech tagging,
-Named Entity Recognition,
+labelbuddy can be used for Named Entity Recognition,
 sentiment analysis and document classification, etc.
 It depends on Qt5.
diff --git a/docs/index.adoc b/docs/index.adoc
@@ -19,7 +19,7 @@ Jérôme Dockès <[email protected]>
 :doca: pass:q[*doccano*]
 
 
-{lb} is a Graphical User Interface tool for annotating documents -- for example for Named Entity Recognition or Part Of Speech tagging.
+{lb} is a Graphical User Interface tool for annotating documents -- for example for Named Entity Recognition or document classification.
 
 To get started, <<installation.adoc#,install>> {lb}, and have a look at the <<documentation.adoc#,documentation>>.
 You can also see some <<screenshots.adoc#,screenshots>>.

diff --git a/docs/installation.adoc b/docs/installation.adoc
@@ -88,7 +88,12 @@ cmake /path/to/labelbuddy
 cmake --build .
 ....
 
-It can also be built with https://doc.qt.io/qt-5/qmake-manual.html[qmake]:
+Then {lb} can (optionally) be installed with:
+....
+sudo make install
+....
+
+{lb} can also be built with https://doc.qt.io/qt-5/qmake-manual.html[qmake]:
 ....
 qmake /path/to/labelbuddy
 make

diff --git a/docs/screenshots/annotate.png b/docs/screenshots/annotate.png
diff --git a/docs/screenshots/dataset.png b/docs/screenshots/dataset.png
diff --git a/docs/screenshots/import_export.png b/docs/screenshots/import_export.png