OWConcordance: new output #371

Merged
merged 3 commits into from
Sep 11, 2018
22 changes: 16 additions & 6 deletions doc/widgets/concordance.rst
@@ -19,12 +19,16 @@ Signals

- **Selected Documents**

A :ref:`Corpus` instance.
Documents containing the queried word.

- **Concordances**

A table of concordances.

Description
-----------

**Concordance** finds the queried word in a text and displays the context in which this word is used. It can output selected documents for further analysis.
**Concordance** finds the queried word in a text and displays the context in which this word is used. Results in a single color come from the same document. The widget can output selected documents for further analysis or a table of concordances for the queried word. Note that the widget finds only exact matches of a word, which means that if you query the word 'do', the word 'doctor' won't appear in the results.

.. figure:: images/Concordance-stamped.png

@@ -37,11 +41,17 @@ Description
3. Queried word.
4. If *Auto commit is on*, selected documents are communicated automatically. Alternatively press *Commit*.

Example
-------
Examples
--------

*Concordance* can be used for displaying word contexts in a corpus. First, we load *book-excerpts.tab* in :doc:`Corpus <corpus>`. Then we connect **Corpus** to **Concordances** and search for concordances of a word "doctor". The widget displays all documents containing the word "doctor" together with their surrounding (contextual) words. Note that the widget finds only exact matches of a word.
*Concordance* can be used for displaying word contexts in a corpus. First, we load *book-excerpts.tab* in :doc:`Corpus <corpus>`. Then we connect **Corpus** to **Concordance** and search for concordances of the word "doctor". The widget displays all documents containing the word "doctor" together with their surrounding (contextual) words.

Now we can select those documents that contain interesting contexts and output them to :doc:`Corpus Viewer <corpusviewer>` to inspect them further.

.. figure:: images/Concordance-Example.png
.. figure:: images/Concordance-Example1.png

In the second example, we will output concordances instead. We will keep *book-excerpts.tab* in :doc:`Corpus <corpus>` and the connection to **Concordance**. Our queried word remains "doctor".

This time, we will connect **Data Table** to **Concordance** and select the Concordances output instead. In the **Data Table**, we get a list of concordances for the queried word and the corresponding documents. Now, we will save this table with the **Save Data** widget, so we can use it in other projects or for further analysis.

.. figure:: images/Concordance-Example2.png
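
The exact-match behaviour described in the updated documentation (querying 'do' does not return 'doctor') can be illustrated with a minimal sketch. This is not the widget's implementation: the `concordance` function, its `width` parameter, the regex tokenizer, and the case folding are all assumptions made for illustration.

```python
import re


def concordance(text, word, width=3):
    """Return (left context, match, right context) for exact matches of word.

    Hypothetical helper: tokenization and lowercasing are assumptions,
    not taken from the widget's source.
    """
    tokens = re.findall(r"\w+", text.lower())
    hits = []
    for i, token in enumerate(tokens):
        # Compare whole tokens, so "do" never matches inside "doctor".
        if token == word.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits


text = "What did the doctor do? The doctor said do nothing."
for left, match, right in concordance(text, "do"):
    print(left, "|", match, "|", right)
```

A substring test (`word in token`) would also return the two occurrences of "doctor" here, which is exactly what the documentation says the widget does not do.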
Binary file added doc/widgets/images/Concordance-Example2.png
21 changes: 21 additions & 0 deletions orangecontrib/text/widgets/owconcordance.py
@@ -1,11 +1,14 @@
from typing import Optional

from itertools import chain
import numpy as np

from AnyQt.QtCore import Qt, QAbstractTableModel, QSize, QItemSelectionModel, \
    QItemSelection, QModelIndex
from AnyQt.QtWidgets import QSizePolicy, QApplication, QTableView, \
    QStyledItemDelegate
from AnyQt.QtGui import QColor
from Orange.data import Domain, StringVariable, Table

from Orange.widgets import gui
from Orange.widgets.settings import Setting, ContextSetting, PerfectDomainContextHandler
@@ -151,6 +154,21 @@ def matching_docs(self):
        else:
            return 0

    def get_data(self):
        domain = Domain([], metas=[StringVariable("Conc. {}".format(
            self.word)), StringVariable("Document")])
        data = []
        docs = []
        for row in range(self.rowCount()):
            txt = []
            for column in range(self.columnCount()):
                index = self.index(row, column)
                txt.append(str(self.data(index)))
            data.append([" ".join(txt)])
            docs.append([self.corpus.titles[self.word_index[row][0]]])
        conc = np.array(np.hstack((data, docs)), dtype=object)
        return Corpus(domain, metas=conc, text_features=[domain.metas[1]])
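
For readers without Orange installed, the row-flattening step in `get_data` can be sketched in plain Python. The `rows_to_table` helper and its sample data are hypothetical; plain lists stand in for the Qt model cells and for the NumPy object array that the method passes to `Corpus`.

```python
def rows_to_table(rows, doc_indices, titles):
    """Pair each concordance line, joined into one string, with its document title.

    Hypothetical stand-in for get_data: `rows` plays the role of the model's
    cells, `doc_indices` the first element of each word_index entry.
    """
    table = []
    for cells, doc_index in zip(rows, doc_indices):
        # Join left context, queried word and right context with spaces.
        line = " ".join(str(cell) for cell in cells)
        table.append([line, titles[doc_index]])
    return table


rows = [("went to the", "doctor", "this morning"),
        ("said the", "doctor", "quietly")]
doc_indices = [0, 2]
titles = ["Document 1", "Document 2", "Document 3"]

table = rows_to_table(rows, doc_indices, titles)
for line, title in table:
    print(title, "->", line)
```

Each output row has two string columns, mirroring the two meta variables ("Conc. <word>" and "Document") declared in the domain above.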


class OWConcordance(OWWidget):
    name = "Concordance"
@@ -164,6 +182,7 @@ class Inputs:

    class Outputs:
        selected_documents = Output("Selected Documents", Corpus)
        concordances = Output("Concordances", Corpus)

    settingsHandler = PerfectDomainContextHandler(
        match_values = PerfectDomainContextHandler.MATCH_VALUES_ALL
@@ -314,11 +333,13 @@ def update_widget(self):
    def commit(self):
        selected_docs = sorted(set(self.model.word_index[row][0]
                                   for row in self.selected_rows))
        concordance = self.model.get_data()
        if selected_docs:
            selected = self.corpus[selected_docs]
            self.Outputs.selected_documents.send(selected)
        else:
            self.Outputs.selected_documents.send(None)
        self.Outputs.concordances.send(concordance)
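
The selection logic in `commit` maps selected rows back to document indices and removes duplicates, since several concordance rows can come from the same document. A minimal sketch, assuming each `word_index` entry is a tuple whose first element is the document index (the second element here is a made-up token position):

```python
def selected_document_indices(word_index, selected_rows):
    """Return the sorted, de-duplicated document indices for the selected rows."""
    return sorted(set(word_index[row][0] for row in selected_rows))


# Hypothetical data: (document index, token position) per concordance row.
word_index = [(0, 4), (0, 17), (2, 3), (5, 9)]
selected_rows = [2, 0, 1]

print(selected_document_indices(word_index, selected_rows))
```

Rows 0 and 1 both belong to document 0, so that document appears once in the result. An empty selection yields an empty list, which is the case where the widget sends `None` on the Selected Documents output.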

    def send_report(self):
        view = self.conc_view
6 changes: 6 additions & 0 deletions orangecontrib/text/widgets/tests/test_owconcordances.py
@@ -137,6 +137,12 @@ def test_matching_docs(self):
        model.set_corpus(self.corpus)
        self.assertEqual(model.matching_docs(), 6)

    def test_concordance_output(self):
        model = ConcordanceModel()
        model.set_word("of")
        model.set_corpus(self.corpus)
        self.assertEqual(len(model.get_data()), 7)


class TestConcordanceWidget(WidgetTest):
    def setUp(self):