Introduce tagging to sort office documents #58

dcaliste · 2015-11-12T22:35:12Z

I've been thinking about this PR for a long time, and I kept it in my branch for a while to test it extensively, including the implementation details.

The idea is to tag documents so that I can better tidy the list of PDFs and other office documents I'm storing on my phone. As usual, tags are one or several keywords chosen by the user. The document list can then be filtered to show only documents with a certain list of tags. A common feature indeed, but very convenient when one needs to get all the electricity bills of last year, for instance. I currently have 390 office documents, and just filtering by date is not enough (because I don't remember exactly when the plumber came), and names are often not convenient.

The list filtering is done by extending the existing FilterModel. The tags for every document are stored in the DocumentListModel. The DocumentListModel also stores a QAbstractListModel exposing all known tags to QML (taglistmodel.cpp). Based mainly on the work of François Kubler in TJC, and with his agreement, I've added a tag selector QML page (TagsSelector.qml and Tag.qml).

When the user changes the tags of a document, the tags are synchronized with a backend in a separate thread (tagsthread.cpp, done in the same philosophy as pdfjob.cpp). Currently the backend to store and retrieve tags is an SQLite file, local to sailfish-office, but the actual thread implementation may be used to access a system-wide tag database if one exists one day.

In this first PR, one can change the tags of one document at a time (long press on an item), but I've tried to design the API so that minor additions will allow adding or removing a tag for several files at a time: tag storage in memory is a QHash of QSet of filenames, indexed by tags, so finding the set of files for a tag is O(1) on average, and adding or removing each file in a bunch is O(1) on average as well.
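A minimal sketch of the in-memory layout described above, not the code from this PR: `std::unordered_map` and `std::unordered_set` stand in for QHash and QSet, and the `TagStore` class with its method names is hypothetical.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for the tag storage: a hash of file sets, indexed by
// tag. std::unordered_map / std::unordered_set play the role of QHash / QSet.
class TagStore {
public:
    using FileSet = std::unordered_set<std::string>;

    // Attach `tag` to every file in `files`: one tag lookup, then one
    // average O(1) insertion per file.
    void addTag(const std::string& tag, const std::vector<std::string>& files) {
        assert(!tag.empty());              // tags are non-empty keywords
        FileSet& set = m_filesByTag[tag];  // creates the entry if needed
        set.insert(files.begin(), files.end());
    }

    // Detach `tag` from every file in `files`; drop the tag once unused.
    void removeTag(const std::string& tag, const std::vector<std::string>& files) {
        auto it = m_filesByTag.find(tag);
        if (it == m_filesByTag.end())
            return;
        for (const std::string& file : files)
            it->second.erase(file);
        if (it->second.empty())
            m_filesByTag.erase(it);
    }

    // True if `file` carries every tag in `tags` (the filter condition).
    bool hasAllTags(const std::string& file, const std::vector<std::string>& tags) const {
        for (const std::string& tag : tags) {
            auto it = m_filesByTag.find(tag);
            if (it == m_filesByTag.end() || it->second.count(file) == 0)
                return false;
        }
        return true;
    }

    std::size_t tagCount() const { return m_filesByTag.size(); }

private:
    std::unordered_map<std::string, FileSet> m_filesByTag;
};
```

With this layout, tagging several files at once is a single call such as `store.addTag("bill", {"a.pdf", "b.pdf"})`, and a filter can call `hasAllTags` for each row of the document list.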

What do you think about it, sailors? Please comment; I will try to answer or improve as I can. I would really like to have this PR in!

pvuorela · 2015-11-13T13:40:51Z

Showed this to Vappu from our design team, and she'll look at how this kind of functionality should be done throughout the operating system, following some common pattern. As is, the UI has some shortcomings, but I don't think it makes sense to proceed with polishing until there's more feedback from design.

pvuorela · 2015-11-13T13:42:58Z

And thanks for the initial version already!

dcaliste · 2015-11-13T13:52:30Z

Thanks for reviewing this. I understand that it should be discussed with design, and I agree that polishing can wait until after the design decisions, of course.

I try to keep the step toward system-wide tags in mind when implementing this first version, delegating tag storage to a dedicated thread backend, for instance. The part that is not so well done is the tag list, which currently resides inside DocumentListModel (but already as an independent object); it would not be too much work to move it out into a separate TagManager instance in QML, for instance.

Anyway, thanks again for taking this into consideration.

pvuorela · 2015-11-13T14:21:39Z

Not my area of expertise, but I'm wondering whether Tracker should be used for tags, as it's already the file source and afaik offers such functionality. The dedicated thread now also feels a bit excessive, but I haven't looked into the details yet.

dcaliste · 2015-11-13T15:10:45Z

Yes, I've also thought about Tracker to do the tag storage for me. For a first implementation, I decided not to go in that direction, because I don't know anything about Tracker (bad excuse) and I wanted to have something functional in a reasonable amount of time (I started thinking about this six months ago), to sort out the mess of documents on my phone and also to test UI and implementation ideas on a real basis…

As for the thread stuff, I was afraid of the startup time when the system needs to retrieve tags for my whole list of 390 documents. Afterwards, when accessing the tags of one document (to add or remove one), it's largely overkill, I admit. But it also makes sure that the UI stays responsive if we need to scale up in the future (having selected 50 documents and changing a bunch of tags at once).
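The idea of keeping the startup lookup off the UI thread can be sketched as follows. This is only an illustration using `std::async`, not the tagsthread.cpp implementation; `loadTagsBlocking` is a hypothetical placeholder for the real backend query and returns empty tag lists.

```cpp
#include <future>
#include <map>
#include <string>
#include <utility>
#include <vector>

using TagsByPath = std::map<std::string, std::vector<std::string>>;

// Hypothetical stand-in for the storage backend: it returns an empty tag
// list for every path instead of querying a database.
TagsByPath loadTagsBlocking(const std::vector<std::string>& paths) {
    TagsByPath result;
    for (const std::string& path : paths)
        result[path];  // default-constructs an empty tag list
    return result;
}

// Start the lookup on another thread and return immediately; the caller
// collects the result later through the future, so it is not blocked while
// the backend is being read.
std::future<TagsByPath> loadTagsAsync(std::vector<std::string> paths) {
    return std::async(std::launch::async,
                      [paths = std::move(paths)] { return loadTagsBlocking(paths); });
}
```

In a Qt application the result would more likely be delivered through a signal than through a blocking `get()`, which is where a dedicated job thread comes in.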

dcaliste · 2016-02-12T21:58:10Z

I've pushed a commit to the PR that replaces the local SQL backend for storing tags with a Tracker backend. It is much better integrated with the system now and can share tags with other applications.

What do you think? Does it improve the PR and give it a chance to go in? If so, I'll change the commits to remove the first SQL backend (and the thread machinery, since SPARQL is already async) and review the code for coding rule mistakes.


dcaliste · 2016-03-11T08:12:59Z

May I bump this a bit? What do you think of the last modification, using the Tracker tagging system instead of a homebrew local SQL backend?

pvuorela · 2016-03-11T12:23:35Z

Sorry, this has been pending for quite some time.

We had a change of designer for Office and needed to start this discussion again. Unfortunately there seemed to be a sentiment that tagging would be a bit oldish kind of approach for finding documents and generally search should be enhanced to better find stuff. Not sure how that plays along with this kind of file. Search possibly could use more meta data provided by files: title, keywords, author and such, though it seems many files just have some placeholder values in practice.

Whether this means we won't do tag support, I don't know yet, but as of now it seemed a bit of a hard idea to sell. User defined key words might be an alternative that's similar but more compatible with searching, but better to discuss that a bit more before proceeding.

dcaliste · 2016-03-11T13:44:37Z

> tagging would be a bit oldish kind of approach for finding documents and generally search should be enhanced to better find stuff

I fully agree with this statement. Ideally, I would like to type facture (bill in French) in the search entry (or have a meaningful list of suggested keywords) and get all my PDF bills sorted out. But currently, there is no association between my PDF bills and the word that goes with them (whether made automagically by Tracker or manually).

> Search possibly could use more meta data provided by files: title, keywords, author

Sadly, as you said, this metadata is not used by the producers of documents (or is badly used: Tracker found tags in my PDF collection saying that this or that PDF was created by Java, great!).

> User defined key words might be an alternative that's similar but more compatible with searching

What do you mean? That's what I call tagging myself: I define tags (or keywords) and associate them with my documents (now using the system-wide Tracker) so I can then search my document collection using these keywords.

To come back to the first quoted sentence and summarize where we can go to accommodate both points of view:

- There should be a miner that can extract significant keywords from a document (I think that the Poppler miner for Tracker can do this). The difficult point is how we define « significant ». Should it be a predefined set of words / names (which needs to be made for each language…)? Should it be something else?
- There should be a way to search through the document collection by entering keywords or selecting them (it's part of what is implemented in this PR, after all).
- There should be a way for the user to teach the miner which keywords are significant and which are not, for example by adding the keywords entered by the user to the predefined keyword list. For instance, I enter once that a given PDF is tagged « Enalp » (my electricity provider), and for each subsequent PDF document containing the word Enalp, the tag will automatically be added by the miner.
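The "teaching" idea in the last point can be sketched as a plain substring match between the text extracted from a document and the tags the user has already entered. The function names are hypothetical, the matching is ASCII case-insensitive only, and it presumes the document text can be extracted at all.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Lower-case an ASCII string so that matching ignores case
// (bytes outside ASCII are left untouched in the default "C" locale).
std::string toLower(std::string text) {
    std::transform(text.begin(), text.end(), text.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return text;
}

// Return every user-defined tag that occurs somewhere in the document text.
std::vector<std::string> suggestTags(const std::string& documentText,
                                     const std::vector<std::string>& knownTags) {
    const std::string haystack = toLower(documentText);
    std::vector<std::string> matches;
    for (const std::string& tag : knownTags) {
        if (!tag.empty() && haystack.find(toLower(tag)) != std::string::npos)
            matches.push_back(tag);
    }
    return matches;
}
```

Once « Enalp » has been entered for one PDF, calling `suggestTags` on the text of each newly indexed document would propose the same tag whenever the word appears.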

Do you think this is a better way to go? Because currently, if I look for my gas bills, I need to remember that my provider decided to name them something like « 02032015-N°507508091490.pdf », which is anything but convenient.

Besides, a first, very simplified approach would be to add a rename capability to the interface!

pvuorela · 2016-03-21T11:58:34Z

> > User defined key words might be an alternative that's similar but more compatible with searching

> What do you mean? That's what I call tagging myself: I define tags (or keywords) and associate them with my documents (now using the system-wide Tracker) so I can then search my document collection using these keywords.

I was thinking of making it more of a free-text type of property: the UI would not create tag items that can be attached to files, nor show a list of available ones when searching. I don't know if this would be something that passes design, either directly on the file list or by allowing the user to edit the PDF meta data's keyword property.

> There should be a miner that can extract significant keywords from a document (I think that the Poppler miner for Tracker can do this). The difficult point is how we define « significant ». Should it be a predefined set of words / names (which needs to be made for each language…)? Should it be something else?

I don't think we should have any AI type of miner deciding what is important for the user. Significant information at minimum would be the attached meta data: author, keywords, title and so on, even if they sometimes contain garbage (we can think of filtering out a few of the most common ones if such exist and can be identified).

> There should be a way to search through the document collection by entering keywords or selecting them (it's part of what is implemented in this PR, after all).

I think this part should be the search field. No UI changes, just searching better.

> There should be a way for the user to teach the miner which keywords are significant and which are not, for example by adding the keywords entered by the user to the predefined keyword list. For instance, I enter once that a given PDF is tagged « Enalp » (my electricity provider), and for each subsequent PDF document containing the word Enalp, the tag will automatically be added by the miner.

That's one way to teach, but it sounds a bit technical. It will only work for documents that can be searched for specific strings. I have doubts about whether this will work out.

Btw, I also checked the annotation branch and have been talking with design. Looked quite nice! Martin started sketching how UI could be streamlined a bit. Talked about perhaps reusing the toolbar for annotation actions. Might attach some spec parts under this project once those are ready.

dcaliste · 2016-03-23T09:01:21Z

> I think this part should be the search field. No UI changes, just searching better.

This is the key to reconciling our points of view, I think.

On the assumption that documents are correctly decorated with proper (quoting) "meta data: author, keywords, title and so on", typing words in the search field should filter the document list accordingly. I can easily change the PR to do this based on Tracker results, as implemented in the latest commit of the PR. This would satisfy my use case of filtering my documents to find my bills, train tickets…
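As an illustration of "just searching better", here is a rough sketch of a search-field match over document metadata. The `DocumentInfo` struct and `matchesSearch` function are hypothetical, the fields are assumed to have been filled from Tracker results beforehand, and the comparison is ASCII case-insensitive only.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical per-document record, assumed to be filled from the indexer.
struct DocumentInfo {
    std::string fileName;
    std::string title;
    std::string author;
    std::vector<std::string> keywords;  // embedded or user-defined
};

static std::string lowered(std::string text) {
    std::transform(text.begin(), text.end(), text.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return text;
}

// True if every whitespace-separated word of `query` occurs in at least one
// of the document's searchable fields.
bool matchesSearch(const DocumentInfo& doc, const std::string& query) {
    // Concatenate all searchable fields into one lower-case haystack.
    std::string haystack = doc.fileName + '\n' + doc.title + '\n' + doc.author;
    for (const std::string& keyword : doc.keywords)
        haystack += '\n' + keyword;
    haystack = lowered(haystack);

    std::istringstream words(lowered(query));
    std::string word;
    while (words >> word) {
        if (haystack.find(word) == std::string::npos)
            return false;
    }
    return true;  // an empty query matches every document
}
```

With a keyword such as "facture" attached to a bill, typing that word in the search field would keep the document in the list even though its file name is only a number.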

The hard part remains: how to define this proper meta data, given that document providers (train companies…) are not setting it (often not at all)? In my opinion, internally this should be left to Tracker, which stores these as tags (or keywords as you said, but the ontology uses the name tag, whatever). What do you think would be a good UI for the user to set or change these keywords (or even other metadata) easily? I was thinking that choosing from a list of already defined keywords (with the possibility to add new ones) was a simple and efficient way. It seems that you disagree. What do you suggest? Changing DetailsPage.qml, for instance, to be able to define keywords there (and change other metadata such as the author) as free-typed text?

dcaliste · 2016-03-23T09:15:49Z

> Martin started sketching how UI could be streamlined a bit.

Great news! I'm happy with this. I'm waiting for these and in the meantime will continue to test my different use cases for annotation (I have to read articles and comment on and modify them in my work). I will also work on partial rendering, to avoid having to redraw the whole page when an annotation is modified.

> Talked about perhaps reusing the toolbar for annotation actions.

Yep, currently, we can only support annotations that are related to a portion of text (through text selection), like highlights or comments. In my opinion, we're lacking geometry annotations, which would enable us to draw arrows, circles… freely on the document. These are also useful when commenting on a document, to explain some modifications we would like the authors to make.

Last word, if you have some ideas with design, on how to deal with document saving, that would be great. I'm not satisfied at all with how I deal with it currently (namely adding a pull menu entry to export and having a text field to type a new file name…). Ideally, I was thinking of something like that (but not implemented yet):

- As soon as a modification is done on a document (new or modified annotation), the document is saved as annotated_<i>_<original filename>.pdf in the same directory as the original one.
- As long as the document is not discarded in the UI, each modification is saved in this same annotated file. That way, the file on disk is always in sync with the UI display.
- When an (annotated) document is opened and modified again, the <i> part of the filename is increased.

That way, saving is transparent to the user and documents are always in sync with the disk. In the file list, there will be a bunch of annotated_… documents, but they are already sorted by date, so it's fine: older annotated documents will be pushed further down and won't clutter the list too much (hopefully).

What's your opinion?
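The naming rule proposed above could look like the sketch below. It is not implemented in the PR; `nextAnnotatedName` is a hypothetical helper that assumes the input name already carries its .pdf extension and that numbering starts at 1.

```cpp
#include <string>

// Compute the file name for the next annotated copy of `fileName`:
//   report.pdf             -> annotated_1_report.pdf
//   annotated_1_report.pdf -> annotated_2_report.pdf
std::string nextAnnotatedName(const std::string& fileName) {
    const std::string prefix = "annotated_";
    if (fileName.compare(0, prefix.size(), prefix) == 0) {
        // Parse the digits between the prefix and the next underscore.
        std::size_t pos = prefix.size();
        unsigned long index = 0;
        while (pos < fileName.size() && fileName[pos] >= '0' && fileName[pos] <= '9') {
            index = index * 10 + static_cast<unsigned long>(fileName[pos] - '0');
            ++pos;
        }
        const bool hasIndex =
            pos > prefix.size() && pos < fileName.size() && fileName[pos] == '_';
        if (hasIndex)
            return prefix + std::to_string(index + 1) + fileName.substr(pos);
    }
    // Not an annotated copy yet (or the prefix is just part of the name).
    return prefix + "1_" + fileName;
}
```

A name that merely starts with "annotated_" without a number, such as annotated_notes.pdf, is treated as an original and becomes annotated_1_annotated_notes.pdf.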

pvuorela · 2016-04-05T15:19:38Z

> Last word, if you have some ideas with design, on how to deal with document saving, that would be great. I'm not satisfied at all with how I deal with it currently (namely adding a pull menu entry to export and having a text field to type a new file name…). Ideally, I was thinking of something like that (but not implemented yet):

One option could be like Adobe Acrobat on Android: annotations are automatically stored to the same file. No questions asked.

> As soon as a modification is done on a document (new or modified annotation), the document is saved as annotated_<i>_<original filename>.pdf in the same directory as the original one.

If choosing to keep the original file intact, I'd make "annotated" a postfix so that sorting by name keeps the files grouped. I don't think it's necessarily worthwhile to keep different annotation versions around; just the original and the annotated file should be enough.

dcaliste force-pushed the tags branch from a553f55 to 0ed7c91 Compare November 13, 2015 12:02

dcaliste force-pushed the tags branch 3 times, most recently from 7622253 to 623085a Compare November 19, 2015 09:32

dcaliste force-pushed the tags branch from 623085a to 6ff9cea Compare December 2, 2015 20:48

dcaliste force-pushed the tags branch 2 times, most recently from 99307a7 to 7b840db Compare January 25, 2016 22:23

dcaliste force-pushed the tags branch from 8e8bcdd to 201db9e Compare February 23, 2016 18:15

dcaliste added 7 commits March 10, 2016 07:48

Add thread infrastructure to retrieve a list of tags on every document returned by tracker.

4b511c4

Begin to implement tag filtering.

d104124

Add the filter for tags.

a33fa99

Add SQL backend for tag storage.

cae1899

Use path to identify jobs in threads.

8b37852

Update tag memory storage in DocumentListModel.

fa38554

[office] Use Tracker as a tag provider.

03bde3d

dcaliste force-pushed the tags branch from 201db9e to 03bde3d Compare March 10, 2016 06:48

dcaliste mentioned this pull request May 10, 2016

Add initial support for annotations #80

Closed
