Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cxx: extract reference tags #3535

Draft
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

masatake
Copy link
Member

@masatake masatake commented Nov 15, 2022

See also #569.

Signed-off-by: Masatake YAMATO [email protected]

TODO:

@codecov
Copy link

codecov bot commented Nov 15, 2022

Codecov Report

Patch coverage: 95.96% and project coverage change: +0.05 🎉

Comparison is base (7e5c8dc) 82.83% compared to head (ef7e809) 82.88%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3535      +/-   ##
==========================================
+ Coverage   82.83%   82.88%   +0.05%     
==========================================
  Files         226      227       +1     
  Lines       54754    54870     +116     
==========================================
+ Hits        45353    45478     +125     
+ Misses       9401     9392       -9     
Impacted Files Coverage Δ
parsers/cxx/cxx_parser_namespace.c 80.29% <ø> (ø)
parsers/cxx/cxx_parser_template.c 74.71% <ø> (ø)
parsers/cxx/cxx_token_chain.c 82.41% <ø> (ø)
parsers/cxx/cxx_reftag.c 91.17% <91.17%> (ø)
parsers/cxx/cxx_parser_variable.c 92.30% <96.42%> (+0.61%) ⬆️
parsers/cxx/cxx_parser.c 86.60% <100.00%> (ø)
parsers/cxx/cxx_parser_function.c 93.09% <100.00%> (+<0.01%) ⬆️
parsers/cxx/cxx_parser_lambda.c 93.33% <100.00%> (ø)
parsers/cxx/cxx_parser_tokenizer.c 92.60% <100.00%> (+0.12%) ⬆️
parsers/cxx/cxx_parser_typedef.c 92.90% <100.00%> (ø)
... and 3 more

... and 1 file with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@masatake
Copy link
Member Author

For the latest linux kernel source, ctags with this pull request generates 8GB tags file.
It is not small, however, citre works fine.
The option I used:

time ~/bin/ctags -o ~/.citre/upstream-linux.tags --kinds-all='*' --fields='*' --fields-all='*' --extras='*' --extras-all='*' --roles-all.'*'='*' -R ~/var/linux

It takes about 10 minutes for generating a tags file.
I used a PC with 256GB RAM + HDD.

Citre works fine but I think we can improve UI for reference tags.

This unknown kind +ref role approach can be applied to all token-based parsers like python and javascript parsers.

@masatake masatake requested a review from pragmaware November 15, 2022 05:05
@masatake
Copy link
Member Author

image

  • I pressed M-. on a definition, I expected reference tags are listed first.
  • I want more emphasized representation for ref tags. Just is not enough.
  • in the future, roles must be displayed.

makefiles/testing.mak Outdated Show resolved Hide resolved
@masatake
Copy link
Member Author

No built-in parser users Y nor W as a kind letter.
Let's use of one them instead of X for representing "unknown" kind.

@AmaiKinono
Copy link
Member

Citre works fine but I think we can improve UI for reference tags.

Citre now takes a pluggable backend design, citre-find-reference-backends supports the following functionalities:

  • xref-find-references
  • citre-jump-to-reference
  • citre-peek-references, citre-ace-peek-references, citre-peek-through-references

Currently the only "find reference" backend is global backend. I could create a tags backend.

@masatake
Copy link
Member Author

@AmaiKinono Thank you for your response.

I have used this branch + citre (87e2cbf3b2ae6d59ec919a2dcb38e56ccfa5ec14, a bit othreeder) for 3 days to read Linux kernel. M-. becomes slower maybe because readtags reports too many tags.
This slowness is acceptable if the command I use is M-?. However, I would like to keep M-. faster.

The question is whether M-. of citre users becomes slower or not if I merge this branch today.

--roles-C.{unknown}=ref (--roles-all.{unknown}=ref, --roles-all.{unknown}=*, or --roles-all.*=*) is needed to make ctags emit the reference tags newly introduced in this pull request. Just adding --extras=+r --fields=+r is not enough. I intentionally made ctgas require the extra option so that users don't make too large tags file unexpectedly.

The instructions (https://github.com/universal-ctags/citre/blob/master/docs/user-manual/about-tags-file.md#create-informative-tags-file) about the way to run ctags, do not enable the "ref" role of "unknown" kind. So I can say merging this branch today doesn't get citre users in trouble.

masatake added a commit to masatake/ctags that referenced this pull request Nov 18, 2022
masatake added a commit to masatake/ctags that referenced this pull request Nov 18, 2022
@AmaiKinono
Copy link
Member

M-. becomes slower maybe because readtags reports too many tags.

The problem is until now, Citre assumes tags files don't contain "reference tags" in the general sense. u-ctags only emits reference tags that point to external modules/files (let's call them dependency references for now). They are useful and aren't too many, so Citre doesn't filter them out when finding definitions.

But after this PR, uctags emits reference tags pointing to general entities (let's call them general reference for now). Citre should exclude them when finding definitions, and include them when finding references. We could do this in 2 ways:

  • Only exclude general reference tags. This has the advantage of keeping M-. behavior of Citre unchanged. We could do this by excluding tags that have "unknown" kind and "ref" role, but if in the future, they are tagged with the correct kind (variable, function, etc.), this technique won't work anymore.
  • Exclude all reference tags. In this way, M-. on a module/file name won't list its references. You have to use M-? for that.

Personally I feel the second way is better.

The question is whether M-. of citre users becomes slower or not if I merge this branch today.

It won't become slower as you already pointed out.

P.S. Actually I think tagsfile is not the best format for references. If we create a tagline for every symbol in a codebase, the tagsfile could become larger than the codebase itself. A binary database (like GNU global does) could save much space. But this is only my own thoughts. It may be unrealistic to make necessary changes to uctags for solving this problem.

@masatake
Copy link
Member Author

masatake commented Nov 20, 2022

@AmaiKinono You wrote what I thought but I could not write.

P.S. Actually I think tagsfile is not the best format for references. If we create a tagline for every symbol in a codebase, the tagsfile could become larger than the codebase itself. A binary database (like GNU global does) could save much space. But this is only my own thoughts. It may be unrealistic to make necessary changes to uctags for solving this problem.

I agree with you. The "space" is one of the biggest reasons why I have not implemented reference tags.

@masatake
Copy link
Member Author

let's call them dependency references for now
let's call them general reference for now

Good ideas. These proposals help me when extending our man pages.

@masatake masatake force-pushed the cxx--reference-tags branch from c85f0a5 to ef9e6b4 Compare November 20, 2022 14:30
@masatake
Copy link
Member Author

Python may extract too many 'self'.

@masatake masatake changed the title CXX: extract reference tags *: extract reference tags Nov 20, 2022
@masatake masatake force-pushed the cxx--reference-tags branch 4 times, most recently from 8f52d02 to 678461c Compare November 21, 2022 21:58
@masatake
Copy link
Member Author

EmacsLisp            u      unknown               yes     no      0      NONE   unknown type of definitions
Go                   u      unknown               yes     yes     1      NONE   unknown
Lisp                 u      unknown               yes     no      0      NONE   unknown type of definitions
Python               x      unknown               yes     no      2      NONE   name referring a class/variable/function/module defined in other module

Though breaking compatibilities, I will change the letters for unknown to Y.

@masatake masatake force-pushed the cxx--reference-tags branch 3 times, most recently from 82c5627 to 78fde67 Compare December 1, 2022 05:08
@masatake
Copy link
Member Author

masatake commented Dec 3, 2022

4615250..fe0c93a can be merged in advance.

@masatake masatake force-pushed the cxx--reference-tags branch 3 times, most recently from 25a5d78 to ddf135e Compare December 3, 2022 17:20
@masatake
Copy link
Member Author

masatake commented Dec 3, 2022

As I expected, the new idea, "assigning a cork index to a token", allows us to implement reference tags more than the "unknown" kind.

input1.c:

struct ops file_ops = {
    .readfn = file_readfn,
    .writefn = file_writefn,
};

output1.tags

ops ...         kind:unknown ...  roles:vardef
file_ops ...    kind:variable ... roles:def                ... typeref:struct:ops
readfn ...      kind:member ...   roles:initialized        ... scope:variable:file_ops
file_readfn ... kind:unknown ...  roles:value              ... scope:variable:file_ops
writefn ...     kind:member ....  roles:initialized        ... scope:variable:file_ops
file_writefn ...kind:unknown ...  roles:value              ... scope:variable:file_ops

@masatake masatake force-pushed the cxx--reference-tags branch from ddf135e to e47cb40 Compare December 3, 2022 18:43
@masatake masatake changed the title *: extract reference tags Cxx: extract reference tags Dec 5, 2022
@masatake masatake force-pushed the cxx--reference-tags branch from e47cb40 to cda022b Compare January 3, 2023 20:36
@masatake
Copy link
Member Author

masatake commented Jan 3, 2023

$ cat /tmp/foo.c
#include <stdio.h>
int main(void)
{
  return puts("hello, world\n");
}
$ ./ctags --kinds-C=+Y --roles-C.Y=+'{ref}'  --extras=+r --fields=+r -o - /tmp/foo.c
main	/tmp/foo.c	/^int main(void)$/;"	f	typeref:typename:int	roles:def
puts	/tmp/foo.c	/^  return puts("hello, world\\n");$/;"	Y	function:main	roles:applied
stdio.h	/tmp/foo.c	/^#include <stdio.h>/;"	h	roles:system

There are still many TODO items, ctags running locally successfully extracted puts, a macro expanded or a function called in main. See the "roles:" field of "puts". The scope was also filled. As I wrote ago, citre works well.

We should add the rule, "unknown+ref should have lower priority" to Citre's sorting engine.

@masatake masatake force-pushed the cxx--reference-tags branch 2 times, most recently from 1921b66 to 21ddab6 Compare January 9, 2023 16:15
@masatake
Copy link
Member Author

@AmaiKinono

When I looked up a rather generic name like "inode", M-. took very long time.
For looking up reference tags, it is acceptable; I will just do C-g. However, when I was interested only in its definition, taking too long time was not acceptable.

So for evaluation of this pull request, I added some code to Citre:

diff --git a/citre-tags.el b/citre-tags.el
index 4ce793e..bdcc520 100644
--- a/citre-tags.el
+++ b/citre-tags.el
@@ -375,6 +375,7 @@ completion can't be done."
         (file-path (citre-get-property 'file-path symbol)))
     `(not
       (or
+       ,(citre-readtags-filter 'extras "reference" 'csv-contain)
        ;; Don't excluded "anonymous" here as such symbols can appear in typeref
        ;; or scope fields of other tags, which may be shown in an xref buffer,
        ;; so we should be able to find their definitions.
@@ -413,6 +414,40 @@ is found."
      :require '(name ext-abspath pattern)
      :optional '(ext-kind-full line typeref scope extras))))
 
+;;;; Find references
+
+(defun citre-tags-reference-default-filter (symbol)
+  (let ((tags-file (citre-get-property 'tags-file symbol))
+        (file-path (citre-get-property 'file-path symbol)))
+    `(not
+      (or
+       ;; Don't excluded "anonymous" here as such symbols can appear in typeref
+       ;; or scope fields of other tags, which may be shown in an xref buffer,
+       ;; so we should be able to find their definitions.
+       ,(if file-path
+            (citre-tags-filter-local-symbol-in-other-file file-path tags-file)
+          'false)))))
+
+(defvar citre-tags-reference-default-sorter
+  (citre-readtags-sorter
+   citre-tags-sorter-arg-put-references-below
+   'input '(length name +) 'name
+   citre-tags-sorter-arg-size-order))
+
+(defun citre-tags-get-references (&optional symbol tagsfile)
+  (when-let* ((symbol (or symbol (citre-tags-get-symbol tagsfile)))
+              (tagsfile (or tagsfile (citre-get-property 'tags-file symbol))))
+    (citre-tags-get-tags
+     tagsfile symbol 'exact
+     :filter (or (citre-tags--get-value-in-language-alist
+                  :reference-filter symbol)
+                 (citre-tags-reference-default-filter symbol))
+     :sorter (or (citre-tags--get-value-in-language-alist
+                  :reference-sorter symbol)
+                 citre-tags-reference-default-sorter)
+     :require '(name ext-abspath pattern)
+     :optional '(ext-kind-full line typeref scope extras))))
+
 ;;;; Completion backend
 
 (defvar citre-tags--completion-cache
@@ -483,7 +518,9 @@ The result is a list (BEG END TAGS), see
     (citre-tags-get-definitions symbol tagsfile)))
 
 (defvar citre-tags--find-definition-for-id-filter
-  `(not ,(citre-readtags-filter 'extras "anonymous" 'csv-contain))
+  `(not (or
+         ,(citre-readtags-filter 'extras "reference" 'csv-contain)         
+         ,(citre-readtags-filter 'extras "anonymous" 'csv-contain)))
   "Filter for finding definitions when the symbol is inputted by user.")
 
 (defvar citre-tags--id-list-cache
@@ -546,6 +583,16 @@ simple tag name matching.  This function is for it."
  :identifier-list-func #'citre-tags-get-identifiers
  :get-definitions-for-id-func #'citre-tags--get-definition-for-id)
 
+;;;; Find refernces backend
+(defun citre-tags-get-references-at-point ()
+  "Get definitions of symbol at point."
+  (when-let ((tagsfile (citre-tags-file-path))
+             (symbol (citre-tags-get-symbol)))
+    (citre-tags-get-references symbol tagsfile)))
+
+(citre-register-find-reference-backend
+ 'tags #'citre-tags-get-references-at-point)
+
 ;;;; Tags in buffer backend
 
 (declare-function tramp-get-remote-tmpdir "tramp" (vec))

@AmaiKinono
Copy link
Member

So for evaluation of this pull request, I added some code to Citre:

Thanks! This is really helpful.

May I ask you for a PR to Citre, or can I use your code directly?

@masatake
Copy link
Member Author

May I ask you for a PR to Citre, or can I use your code directly?

This is a quick hack. Can you please add a new code for M-?.
I'd be happy if my code could be used as a base for the new code.

What I didn't add was code for 'C-u M-?'. I want to have it.

You wrote:

But after this PR, uctags emits reference tags pointing to general entities (let's call them general reference for now). Citre should exclude them when finding definitions, and include them when finding references. We could do this in 2 ways:

  • Only exclude general reference tags. This has the advantage of keeping M-. behavior of Citre unchanged. We could do this by excluding tags that have "unknown" kind and "ref" role, but if in the future, they are tagged with the correct kind (variable, function, etc.), this technique won't work anymore.

  • Exclude all reference tags. In this way, M-. on a module/file name won't list its references. You have to use M-? for that.

Personally I feel the second way is better.

I agree with you.

P.S. Actually I think tagsfile is not the best format for references. If we create a tagline for every symbol in a codebase, the tagsfile could become larger than the codebase itself. A binary database (like GNU global does) could save much space. But this is only my own thoughts. It may be unrealistic to make necessary changes to uctags for solving this problem.

I agree with you. However, I would like to use the tags output till I understand what I want.

@AmaiKinono
Copy link
Member

What I didn't add was code for 'C-u M-?'. I want to have it.

This one is hard.

When finding references of a user-inputted symbol, what xref does is get the symbol by xref-backend-identifier-completion-table, then pass it to xref-backend-references, with the text property stripped.

Citre is an xref backend and itself also has backends. Suppose a Citre backend A can give us all symbols in a project (so it could support xref-backend-identifier-completion-table), but doesn't support finding references, while another Citre backend B supports finding references of a symbol. When I press C-u M-?, Citre tries the backends in turn, and this could happen: we get the symbol name from backend A, and feed it to backend B to find the references. This is wrong, we should get the symbol from backend B.

Before passing a symbol to xref-backend-references, xref removes its text properties, so we could not know which Citre backend gives us the symbol. This makes designing a "xref backend with multiple backends" (like Citre) very hard.

This may be solved by some jump wire code but it needs some design... We are too away from the theme of this PR. Now I know what you want and I'll see how to implement it ;)

Signed-off-by: Masatake YAMATO <[email protected]>
    struct opt file_ops {
      .read  = file_read,
      .write = file_write,
    };

The parser extracts "read" and "write" with "initialized"
role of "member" kind.

Signed-off-by: Masatake YAMATO <[email protected]>
    struct opt file_ops {
      .read  = file_read,
      .write = file_write,
    };

The parser with this change extracts "file_read" and
"file_write" with "value" role of "unknown" kind.

Signed-off-by: Masatake YAMATO <[email protected]>
    struct opt file_ops {
      .read  = file_read,
      .write = file_write,
    };

The parser with this change extracts "opt" with "defvar"
role of "unknown" kind.

Signed-off-by: Masatake YAMATO <[email protected]>
    #include <stdio.h>
    int main(void)
    {
        return puts("hello, world\n");
    }

The parser with this change extracts "puts" with "applied"
role of "unknown" kind.

TODO: consider sub parsers.

Signed-off-by: Masatake YAMATO <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants