Introduce tagging to sort office documents #58

Open
wants to merge 7 commits into
base: master

dcaliste
Contributor

I have been thinking about this PR for a long time, and I kept it in my branch for a while to test it extensively, including the implementation details.

The idea is to tag documents so I can better organise the PDFs and other office documents stored on my phone. As usual, tags are one or more keywords chosen by the user. The document list can then be filtered to show only documents carrying a given set of tags. A common feature indeed, but very convenient when one needs to get all the electricity bills of last year, for instance. I currently have 390 office documents; filtering by date alone is not enough (I don't remember exactly when the plumber came), and file names are often unhelpful.

The list filtering is done by extending the existing FilterModel. The tags for every document are stored in the DocumentListModel. The DocumentListModel also stores a QAbstractListModel exposing all known tags to QML (taglistmodel.cpp). Mainly based on the work of François Kubler in TJC, and with his agreement, I've added a tag selector QML page (TagsSelector.qml and Tag.qml).

When the user changes the tags of a document, the tags are synchronized with a backend in a separate thread (tagsthread.cpp, written in the same spirit as pdfjob.cpp). Currently the backend that stores and retrieves tags is an SQLite file, local to sailfish-office, but the thread implementation could later be used to access a system-wide tag database, should one exist one day.

In this first PR, one can change the tags of one document at a time (long press on an item), but I've tried to design the API so that minor additions will allow adding or removing a tag from several files at once: tags are stored in memory as a QHash of QSets of file names, indexed by tag, so finding the files for a tag is an O(1) lookup on average, and so is adding or removing each file.
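
A minimal sketch of the in-memory layout described above, not the PR's actual code. It assumes standard-library containers as stand-ins for QHash and QSet, and the class and method names (`TagIndex`, `addTag`, `removeTag`, `matches`, `hasTag`) are hypothetical.

```cpp
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Stand-in for a QHash<QString, QSet<QString>>:
// each tag maps to the set of file names carrying it.
class TagIndex {
public:
    using FileSet = std::unordered_set<std::string>;

    // Attach one tag to many files: a single hash lookup for the tag,
    // then one average-O(1) insertion per file.
    void addTag(const std::string &tag, const std::vector<std::string> &files) {
        FileSet &set = m_filesByTag[tag];
        set.insert(files.begin(), files.end());
    }

    // Detach one tag from many files; forget the tag once no file uses it.
    void removeTag(const std::string &tag, const std::vector<std::string> &files) {
        auto it = m_filesByTag.find(tag);
        if (it == m_filesByTag.end())
            return;
        for (const std::string &file : files)
            it->second.erase(file);
        if (it->second.empty())
            m_filesByTag.erase(it);
    }

    // True when the file carries every tag of the filter
    // (an empty filter matches every file).
    bool matches(const std::string &file, const std::vector<std::string> &filter) const {
        for (const std::string &tag : filter) {
            auto it = m_filesByTag.find(tag);
            if (it == m_filesByTag.end() || it->second.count(file) == 0)
                return false;
        }
        return true;
    }

    bool hasTag(const std::string &tag) const { return m_filesByTag.count(tag) != 0; }

private:
    std::unordered_map<std::string, FileSet> m_filesByTag;
};
```

With this layout, a filter model only has to call something like `matches(fileName, selectedTags)` per row, and tagging a batch of files costs one tag lookup plus one set insertion per file.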

What do you think about it, sailors? Please comment; I will try to answer or improve as I can. I would really like to have this PR in!

@pvuorela
Contributor

Showed this to Vappu from our design team, and she'll look at how this kind of functionality should be done throughout the operating system, following some common pattern. As is, the UI has some shortcomings, but I don't think it makes sense to keep polishing until there's more feedback from design.

@pvuorela
Contributor

And thanks for the initial version already!

@dcaliste
Contributor Author

Thanks for reviewing this. I understand that it should be discussed with design, and I agree that polishing can wait until after the design decisions, of course.

I try to keep the step toward system-wide tags in mind in this first implementation, for instance by delegating tag storage to a dedicated thread backend. The less well-done part is the tag list, which currently resides inside DocumentListModel (though already as an independent object), but it would not be too much work to move it out into a separate TagManager instance in QML, for instance.

Anyway, thanks again for taking this into consideration.

@pvuorela
Contributor

Not my area of expertise, but I'm wondering whether Tracker should be used for tags, as it's already the file source and afaik offers such functionality. The dedicated thread also feels a bit excessive now, but I haven't looked into the details yet.

@dcaliste
Contributor Author

Yes, I've also thought about letting Tracker do the tag storage for me. For a first implementation I decided not to go in that direction, because I don't know anything about Tracker (bad excuse) and I wanted to have something functional in a reasonable amount of time (I started thinking about this six months ago), to sort out the mess of documents on my phone and also to test UI and implementation ideas on a real basis…

For the thread, I was afraid of the startup time when the system needs to retrieve tags for my whole list of 390 documents. Afterwards, when accessing the tags of one document (to add or remove), it's largely overkill, I admit. But it also makes sure that the UI stays responsive if we need to scale up in the future (having selected 50 documents and changing a bunch of tags at once).

@dcaliste dcaliste force-pushed the tags branch 3 times, most recently from 7622253 to 623085a Compare November 19, 2015 09:32
@dcaliste dcaliste force-pushed the tags branch 2 times, most recently from 99307a7 to 7b840db Compare January 25, 2016 22:23
@dcaliste
Contributor Author

I've pushed a commit to the PR that replaces the local SQL backend for storing tags with a Tracker backend. It is now much better integrated into the system and can share tags with other applications.
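
For illustration only, here is roughly what a tag query against Tracker could look like. This is a sketch and not the code in the commit: it assumes documents are typed `nfo:Document` and tagged through `nao:hasTag` / `nao:prefLabel` from the Nepomuk ontologies that Tracker ships, and the helper names `escapeSparqlLiteral` and `buildTagQuery` are hypothetical.

```cpp
#include <string>
#include <vector>

// Escape a value for use inside a double-quoted SPARQL string literal.
// Simplified: only backslashes and double quotes are handled here.
std::string escapeSparqlLiteral(const std::string &value) {
    std::string out;
    for (char c : value) {
        if (c == '\\' || c == '"')
            out += '\\';
        out += c;
    }
    return out;
}

// Build a query listing the URLs of documents that carry every given tag.
std::string buildTagQuery(const std::vector<std::string> &tags) {
    std::string query = "SELECT ?url WHERE { ?doc a nfo:Document ; nie:url ?url";
    for (const std::string &tag : tags)
        query += " ; nao:hasTag [ nao:prefLabel \"" + escapeSparqlLiteral(tag) + "\" ]";
    query += " . }";
    return query;
}
```

The resulting string would then be handed to whatever asynchronous SPARQL connection the application already uses.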

What do you think? Does it improve the PR and give it a chance to go in? If so, I'll rework the commits to remove the first SQL backend (and the thread machinery, since SPARQL is already async) and review the code for coding-rule mistakes.

@dcaliste
Contributor Author

May I bump this? What do you think of the last modification, which uses the Tracker tagging system instead of a homebrew local SQL database?

@pvuorela
Contributor

Sorry, this has been pending for quite some time.

We had a designer change for Office and needed to start this discussion again. Unfortunately there seemed to be a sentiment that tagging would be a bit oldish kind of approach for finding documents and generally search should be enhanced to better find stuff. Not sure how that plays along with this kind of files. Search possibly could use more meta data provided by files: title, keywords, author and such, though it seems many files just have some placeholder values in practice.

Whether this means we won't do tag support, I don't know yet, but as of now it seemed a hard idea to sell. User defined key words might be an alternative that's similar but more compatible with searching, but it's better to discuss that a bit more before proceeding.

@dcaliste
Contributor Author

> tagging would be a bit oldish kind of approach for finding documents and generally search should be enhanced to better find stuff

I fully agree with this statement. Ideally, I would like to type facture (bill in French) in the search entry (or have a meaningful list of suggested keywords) and get all my PDF bills listed. But currently there is no association between my PDF bills and the word that goes with them (whether made automagically by Tracker or manually).

> Search possibly could use more meta data provided by files: title, keywords, author

Sadly, as you said, these metadata are not used by producers of documents (or are badly used: Tracker found tags in my PDF collection saying that this or that PDF was created by Java, great!).

> User defined key words might be an alternative that's similar but more compatible with searching

What do you mean? That's what I call tagging: I define tags (or keywords) and associate them with my documents (now using the system-wide Tracker) so I can then search my document collection using these keywords.

To come back to the first quoted sentence and summarize where we can go to accommodate both points of view:

  • there should be a miner that can extract significant keywords from a document (I think that the Poppler miner for Tracker can do this). The difficult point is how we define « significant ». Should it be a predefined set of words / names (which would need to be made for each language…)? Should it be something else?
  • there should be a way to search through the document collection by entering keywords or selecting them (it's part of what is implemented here in this PR after all).
  • there should be a way for the user to teach the miner which keywords are significant and which are not, for example by adding the user's keywords to the predefined keyword list. For instance, I enter once that a given PDF is tagged « Enalp » (my electricity provider), and for each subsequent PDF document containing the word Enalp, the tag will automatically be added by the miner.
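
The last point could start out very simply. The sketch below is only an illustration under stated assumptions: the document text is assumed to be already extracted (by the miner or otherwise), matching is a plain case-insensitive substring search that only folds ASCII letters, and the names `toLower` and `suggestTags` are hypothetical.

```cpp
#include <algorithm>
#include <cctype>
#include <set>
#include <string>

// Lower-case a string (ASCII only; accented letters are left untouched).
std::string toLower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Return the user-defined tags whose text appears somewhere in the document.
std::set<std::string> suggestTags(const std::string &documentText,
                                  const std::set<std::string> &userTags) {
    const std::string haystack = toLower(documentText);
    std::set<std::string> found;
    for (const std::string &tag : userTags) {
        if (!tag.empty() && haystack.find(toLower(tag)) != std::string::npos)
            found.insert(tag);
    }
    return found;
}
```

A substring search like this would also match a tag inside a longer word, so a real miner would probably want word boundaries and proper Unicode case folding.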

Do you think this is a better way to go? Because currently, if I look for my gas bills, I need to remember that my provider decided to name them something like « 02032015-N°507508091490.pdf », which is anything but convenient.

Besides, a first, very simplified approach would be to add a rename capability to the interface!

@pvuorela
Contributor

> User defined key words might be an alternative that's similar but more compatible with searching
>
> What do you mean? That's what I call tagging: I define tags (or keywords) and associate them with my documents (now using the system-wide Tracker) so I can then search my document collection using these keywords.

I was thinking of making it more of a free-text property: the UI would not create tag items that can be attached to files, nor show a list of available ones when searching. I don't know whether this would pass design, either directly in the file list or by allowing the user to edit the keywords property of the PDF metadata.

> there should be a miner that can extract significant keywords from a document (I think that the Poppler miner for Tracker can do this). The difficult point is how we define « significant ». Should it be a predefined set of words / names (which would need to be made for each language…)? Should it be something else?

I don't think we should have any AI type of miner deciding what is important for the user. The significant information would at minimum be the attached meta data: author, keywords, title and so on, even if they sometimes contain garbage (we can think of filtering out a few of the most common values if such exist and can be identified).

> there should be a way to search through the document collection by entering keywords or selecting them (it's part of what is implemented here in this PR after all).

I think this part should be the search field. No UI changes, just searching better.

> there should be a way for the user to teach the miner which keywords are significant and which are not, for example by adding the user's keywords to the predefined keyword list. For instance, I enter once that a given PDF is tagged « Enalp » (my electricity provider), and for each subsequent PDF document containing the word Enalp, the tag will automatically be added by the miner.

That's one way to teach, but it sounds a bit technical. It will only work for documents that can be searched for specific strings. I have doubts whether this will work out.

Btw, I also checked the annotation branch and have been talking with design. Looked quite nice! Martin started sketching how UI could be streamlined a bit. Talked about perhaps reusing the toolbar for annotation actions. Might attach some spec parts under this project once those are ready.

@dcaliste
Contributor Author

> I think this part should be the search field. No UI changes, just searching better.

This is the key to reconciling our points of view, I think.

On the assumption that documents are correctly decorated with proper (quoting) "meta data: author, keywords, title and so on", typing words in the search field should filter the document list accordingly. I can easily change the PR to do this based on Tracker results, as implemented in the latest commit of the PR. This would satisfy my use case of filtering my documents to find my bills, train tickets…

What remains is the hard part: how to define this proper metadata, assuming that document providers (train companies…) are not setting it (often not at all)? In my opinion, internally this should be left to Tracker, storing these as tags (or keywords as you said, but the ontology uses the name tag, whatever). What do you think would be a good UI for the user to set or change these keywords (or even other metadata) easily? I was thinking that choosing from a list of already defined keywords (with the possibility to add new ones) was a simple and efficient way. It seems that you disagree. What do you suggest? Changing DetailsPage.qml, for instance, to be able to define keywords there (and change other metadata such as the author) as free text?

@dcaliste
Contributor Author

> Martin started sketching how UI could be streamlined a bit.

Great news! I'm happy with this. I'll wait for these and in the meantime continue to test my different use cases for annotation (for my work I have to read articles, then comment on and modify them). I'll also work on partial rendering, to avoid redrawing the whole page when an annotation is modified.

> Talked about perhaps reusing the toolbar for annotation actions.

Yep, currently we can only support annotations that relate to a portion of text (through text selection), like highlights or comments. In my opinion we're lacking geometry annotations, which would let us freely draw arrows, circles… on the document. These are also useful when commenting on a document to explain modifications we would like the authors to make.

Last word, if you have some ideas with design, on how to deal with document saving, that would be great. I'm not satisfied at all with how I deal with it currently (namely adding a pull menu entry to export and having a text field to type a new file name…). Ideally, I was thinking of something like that (but not implemented yet):

  • as soon as a modification is made to a document (new or modified annotation), the document is saved as annotated_<i>_<original filename>.pdf in the same directory as the original one.
  • as long as the document is not discarded in the UI, each modification is saved in this same annotated file, so the file on disk is always in sync with the UI display.
  • when an (annotated) document is opened and modified again, the <i> part of the filename is incremented.

This way, saving is transparent to the user and documents are always in sync with the disk. In the file list, there will be a bunch of annotated_… documents, but they are already sorted by date, so it's fine: older annotated documents will be pushed further down and won't clutter the list too much (hopefully).

What's your opinion?
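
To make the naming scheme concrete, here is a small sketch (nothing of this is implemented yet). It works on bare file names without the directory part, keeps the original extension as part of the name, and the helpers `parseAnnotatedName` and `nextAnnotatedName` are hypothetical.

```cpp
#include <cctype>
#include <string>

struct AnnotatedName {
    unsigned index;        // 0 means "not an annotated copy yet"
    std::string original;  // file name of the untouched document
};

// Parse a bare file name: "annotated_3_bill.pdf" gives {3, "bill.pdf"};
// any name that does not follow the scheme gives {0, name}.
AnnotatedName parseAnnotatedName(const std::string &name) {
    const std::string prefix = "annotated_";
    if (name.compare(0, prefix.size(), prefix) != 0)
        return {0, name};
    std::string::size_type pos = prefix.size();
    unsigned index = 0;
    while (pos < name.size() && std::isdigit(static_cast<unsigned char>(name[pos]))) {
        index = index * 10 + static_cast<unsigned>(name[pos] - '0');
        ++pos;
    }
    // Require at least one digit followed by an underscore.
    if (pos == prefix.size() || pos >= name.size() || name[pos] != '_')
        return {0, name};
    return {index, name.substr(pos + 1)};
}

// Name to save under when a document is opened and modified again:
// the <i> part grows by one, the original name is kept.
std::string nextAnnotatedName(const std::string &name) {
    const AnnotatedName parsed = parseAnnotatedName(name);
    return "annotated_" + std::to_string(parsed.index + 1) + "_" + parsed.original;
}
```

Opening `bill.pdf` and annotating it would then write `annotated_1_bill.pdf`, and reopening that copy later would write `annotated_2_bill.pdf`.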

@pvuorela
Contributor

pvuorela commented Apr 5, 2016

> Last word, if you have some ideas with design, on how to deal with document saving, that would be great. I'm not satisfied at all with how I deal with it currently (namely adding a pull menu entry to export and having a text field to type a new file name…). Ideally, I was thinking of something like that (but not implemented yet):

One option could be like Adobe Acrobat on Android: annotations are automatically stored to the same file. No questions asked.

> as soon as a modification is made to a document (new or modified annotation), the document is saved as annotated_<i>_<original filename>.pdf in the same directory as the original one.

If we choose to keep the original file intact, I'd make "annotated" a suffix so that sorting by name keeps the files grouped. I don't think it's necessarily worthwhile to keep different annotation versions around; just the original and the annotated copy should be enough.
