Readline-based parser of markdown #3233

Closed
wants to merge 3 commits into from

techee
Contributor

@techee techee commented Dec 22, 2021

We have a rather bad markdown parser in Geany, and when looking at uctags, I realized there's only a regex-based parser. From past experience, these tend to be rather slow, so for us a hand-written parser would be better.

I created a simple readline-based parser based on the asciidoc parser and tried to preserve all the features of the regex-based parser (all kinds, full scope, sectionMarker field, running subparsers for code). Would such a parser be interesting for uctags or is the regex-based one the preferred solution?

This parser is based on the asciidoc parser and tries to preserve all
features of the regex-based parser (all kinds, full scope,
sectionMarker field, running subparsers for code).
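
For readers unfamiliar with the approach, here is a minimal sketch of what line-based heading detection can look like without regular expressions. It is not the code from this pull request: `atxHeadingLevel` and `setextUnderlineLevel` are hypothetical helpers, the rules are simplified, and each line is assumed to have its trailing newline already stripped.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of ctags: returns the level (1..6) of an
 * ATX heading such as "## Title", or 0 if the line is not a heading. */
static int atxHeadingLevel(const char *line)
{
    int level = 0;

    assert(line != NULL);
    while (line[level] == '#')
        level++;
    /* A heading needs 1..6 '#' characters ... */
    if (level < 1 || level > 6)
        return 0;
    /* ... followed by a space or the end of the line. */
    if (line[level] != ' ' && line[level] != '\0')
        return 0;
    return level;
}

/* Hypothetical helper: returns 1 for an "===" underline, 2 for a "---"
 * underline (two-line headings), or 0 for any other line. */
static int setextUnderlineLevel(const char *line)
{
    size_t n;
    char c;

    assert(line != NULL);
    c = line[0];
    if (c != '=' && c != '-')
        return 0;
    n = strspn(line, c == '=' ? "=" : "-");
    /* Simplification: at least two markers and nothing else on the line. */
    if (n < 2 || line[n] != '\0')
        return 0;
    return c == '=' ? 1 : 2;
}
```

A parser built this way reads one line at a time, remembers the previous line, and treats that previous line as the heading text when the current line is an underline. That is the two-line case referred to later in the thread.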
@codecov

codecov bot commented Dec 22, 2021

Codecov Report

Merging #3233 (48f431f) into master (05e6ab4) will increase coverage by 0.27%.
The diff coverage is 95.12%.

@@            Coverage Diff             @@
##           master    #3233      +/-   ##
==========================================
+ Coverage   85.01%   85.28%   +0.27%     
==========================================
  Files         206      206              
  Lines       49127    49084      -43     
==========================================
+ Hits        41765    41862      +97     
+ Misses       7362     7222     -140     
Impacted Files Coverage Δ
parsers/markdown.c 95.12% <95.12%> (ø)
main/lregex.c 81.94% <0.00%> (-1.14%) ⬇️
main/field.c 92.73% <0.00%> (-0.29%) ⬇️
optlib/markdown.c

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 05e6ab4...48f431f.

@masatake
Member

Thank you. Of course, I will merge this C implementation.

Let's fill the end: fields.

diff --git a/parsers/markdown.c b/parsers/markdown.c
index 173130e06..89358d89a 100644
--- a/parsers/markdown.c
+++ b/parsers/markdown.c
@@ -69,17 +69,22 @@ static NestingLevels *nestingLevels = NULL;
 *   FUNCTION DEFINITIONS
 */
 
-static NestingLevel *getNestingLevel(const int kind)
+static NestingLevel *getNestingLevel(const int kind, int adjustment_when_pop)
 {
        NestingLevel *nl;
        tagEntryInfo *e;
+       unsigned long line = getInputLineNumber();
 
        while (1)
        {
                nl = nestingLevelsGetCurrent(nestingLevels);
                e = getEntryOfNestingLevel (nl);
                if ((nl && (e == NULL)) || (e && (e->kindIndex >= kind)))
+               {
+                       if (e && line > adjustment_when_pop)
+                               e->extensionFields.endLine = line - adjustment_when_pop;
                        nestingLevelsPop(nestingLevels);
+               }
                else
                        break;
        }
@@ -88,7 +93,7 @@ static NestingLevel *getNestingLevel(const int kind)
 
 static int makeMarkdownTag (const vString* const name, const int kind, const bool two_line)
 {
-       const NestingLevel *const nl = getNestingLevel(kind);
+       const NestingLevel *const nl = getNestingLevel(kind, two_line? 2: 1);
        int r = CORK_NIL;
 
        if (vStringLength (name) > 0)

Could you include this?
It seems that my change is not perfect.
I will work more.
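
To make the idea in the diff above easier to follow, here is a self-contained sketch of the same bookkeeping with stand-in types. `struct Entry`, `struct Stack`, `closeSections`, and `openSection` are hypothetical names, not ctags API. The assumption is that a smaller `kind` value means a higher-level section, and that `adjustment` is the distance from the current input line back to the last line of the section being closed (2 for a two-line heading, 1 otherwise, as in the diff).

```c
#include <assert.h>

#define MAX_DEPTH 16

/* Hypothetical stand-in for a tag entry (the real type is tagEntryInfo). */
struct Entry {
    int kind;               /* smaller value = higher-level section */
    unsigned long line;     /* line where the heading is defined */
    unsigned long endLine;  /* 0 until the section is closed */
};

/* Hypothetical stand-in for NestingLevels: a fixed-size stack of entries. */
struct Stack {
    struct Entry *items[MAX_DEPTH];
    int n;
};

/* Close every open section at the same level as, or deeper than, the
 * heading of the given kind that was just found at `line`. */
static void closeSections(struct Stack *s, int kind,
                          unsigned long line, unsigned long adjustment)
{
    while (s->n > 0 && s->items[s->n - 1]->kind >= kind)
    {
        struct Entry *e = s->items[s->n - 1];

        /* Guard against underflow near the top of the file. */
        if (line > adjustment)
            e->endLine = line - adjustment;
        s->n--;
    }
}

/* Push a newly found section onto the stack. */
static void openSection(struct Stack *s, struct Entry *e)
{
    assert(s->n < MAX_DEPTH);
    s->items[s->n++] = e;
}
```

With a chapter on line 1 and a section on line 3 open, finding a new single-line chapter heading on line 10 closes both at line 9. If that heading were a two-line one detected on its underline at line 11, the adjustment of 2 would give the same end line.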

@masatake
Member

masatake commented Dec 23, 2021

Instead of nestingLevelsNew, let's use nestingLevelsNewFull.

We can add a callback function called when a level is popped.

We can fill the end: fields in the callback function.

deleteBlockData is an example of such a callback function.

I will work on this topic tonight, JST.

@masatake
Member

The nesting level API must be extended to pass the line adjustment data.

diff --git a/main/nestlevel.c b/main/nestlevel.c
index d3403f78f..3ce87df09 100644
--- a/main/nestlevel.c
+++ b/main/nestlevel.c
@@ -29,7 +29,7 @@
 */
 
 extern NestingLevels *nestingLevelsNewFull(size_t userDataSize,
-										   void (* deleteUserData)(NestingLevel *))
+										   void (* deleteUserData)(NestingLevel *, void *))
 {
 	NestingLevels *nls = xCalloc (1, NestingLevels);
 	nls->userDataSize = userDataSize;
@@ -42,7 +42,7 @@ extern NestingLevels *nestingLevelsNew(size_t userDataSize)
 	return nestingLevelsNewFull (userDataSize, NULL);
 }
 
-extern void nestingLevelsFree(NestingLevels *nls)
+extern void nestingLevelsFreeFull(NestingLevels *nls, void *ctxData)
 {
 	int i;
 	NestingLevel *nl;
@@ -51,7 +51,7 @@ extern void nestingLevelsFree(NestingLevels *nls)
 	{
 		nl = NL_NTH(nls, i);
 		if (nls->deleteUserData)
-			nls->deleteUserData (nl);
+			nls->deleteUserData (nl, ctxData);
 		nl->corkIndex = CORK_NIL;
 	}
 	if (nls->levels) eFree(nls->levels);
@@ -89,13 +89,13 @@ extern NestingLevel *nestingLevelsTruncate(NestingLevels *nls, int depth, int co
 }
 
 
-extern void nestingLevelsPop(NestingLevels *nls)
+extern void nestingLevelsPopFull(NestingLevels *nls, void *ctxData)
 {
 	NestingLevel *nl = nestingLevelsGetCurrent(nls);
 
 	Assert (nl != NULL);
 	if (nls->deleteUserData)
-		nls->deleteUserData (nl);
+		nls->deleteUserData (nl, ctxData);
 	nl->corkIndex = CORK_NIL;
 	nls->n--;
 }
diff --git a/main/nestlevel.h b/main/nestlevel.h
index 18ac9927e..3154ae833 100644
--- a/main/nestlevel.h
+++ b/main/nestlevel.h
@@ -35,7 +35,7 @@ struct NestingLevels
 	int n;					/* number of levels in use */
 	int allocated;
 	size_t userDataSize;
-	void (* deleteUserData) (NestingLevel *);
+	void (* deleteUserData) (NestingLevel *, void *);
 };
 
 /*
@@ -43,11 +43,13 @@ struct NestingLevels
 */
 extern NestingLevels *nestingLevelsNew(size_t userDataSize);
 extern NestingLevels *nestingLevelsNewFull(size_t userDataSize,
-										   void (* deleteUserData)(NestingLevel *));
-extern void nestingLevelsFree(NestingLevels *nls);
+										   void (* deleteUserData)(NestingLevel *, void *));
+#define nestingLevelsFree(NLS) nestingLevelsFreeFull(NLS, NULL)
+extern void nestingLevelsFreeFull(NestingLevels *nls, void *ctxData);
 extern NestingLevel *nestingLevelsPush(NestingLevels *nls, int corkIndex);
 extern NestingLevel * nestingLevelsTruncate(NestingLevels *nls, int depth, int corkIndex);
-extern void nestingLevelsPop(NestingLevels *nls);
+#define nestingLevelsPop(NLS) nestingLevelsPopFull(NLS, NULL)
+extern void nestingLevelsPopFull(NestingLevels *nls, void *ctxData);
 #define nestingLevelsGetCurrent(NLS) nestingLevelsGetNthParent((NLS), 0)
 extern NestingLevel *nestingLevelsGetNthFromRoot(const NestingLevels *nls, int n);
 extern NestingLevel *nestingLevelsGetNthParent(const NestingLevels *nls, int n);
diff --git a/parsers/ruby.c b/parsers/ruby.c
index 2aab8d94d..2b2fd2594 100644
--- a/parsers/ruby.c
+++ b/parsers/ruby.c
@@ -695,7 +695,7 @@ static void attachMixinField (int corkIndex, stringList *mixinSpec)
 								  vStringValue (mixinField));
 }
 
-static void deleteBlockData (NestingLevel *nl)
+static void deleteBlockData (NestingLevel *nl, void *data CTAGS_ATTR_UNUSED)
 {
 	struct blockData *bdata = nestingLevelGetUserData (nl);
 

This way we can pass two_line to the callback function.
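
The pattern behind this API change can be sketched in isolation: a pop function takes an opaque context pointer and forwards it to the callback, while a macro keeps the old one-argument call working. The names below (`struct Level`, `struct Levels`, `levelsPopFull`, `levelsPop`, `fillEndLine`) are hypothetical and only mirror the shape of the diff. They are not the ctags implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one nesting level. */
struct Level {
    unsigned long endLine;   /* filled in when the level is popped */
};

/* Hypothetical stand-in for the stack of levels. */
struct Levels {
    struct Level items[8];
    int n;
    /* Called for a level just before it is removed; ctx is caller-supplied. */
    void (*onPop)(struct Level *, void *ctx);
};

/* Pop the current level, handing ctx through to the callback. */
static void levelsPopFull(struct Levels *ls, void *ctx)
{
    assert(ls->n > 0);
    if (ls->onPop)
        ls->onPop(&ls->items[ls->n - 1], ctx);
    ls->n--;
}

/* Convenience wrapper for callers that have no context to pass. */
#define levelsPop(LS) levelsPopFull((LS), NULL)

/* Example callback: ctx, when given, points at the end line to record. */
static void fillEndLine(struct Level *l, void *ctx)
{
    if (ctx != NULL)
        l->endLine = *(unsigned long *)ctx;
}
```

Callers that need the end line pass a pointer to it. Callers that do not can keep using the one-argument form, and the callback must then tolerate a NULL context.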

@masatake
Member

@techee, let me take over this pull request.

@techee
Contributor Author

techee commented Dec 23, 2021

> @techee, let me take over this pull request.

Sure, no problem, less work for me :-).

@techee
Contributor Author

techee commented Dec 23, 2021

What's the end: field, by the way? I thought it was something regex-specific.

@masatake
Member

end: is the name of a field. line: represents where the tag is defined; end: represents where the scope established by the tag ends.

$ cat -n /tmp/foo.c
     1	struct point 
     2	{
     3	  int x;
     4	  int y;
     5	};
     6	
     7	int
     8	main(void)
     9	{
    10	  return 0;
    11	}
    12	
$ ctags -o - --fields=+ne /tmp/foo.c
main	/tmp/foo.c	/^main(void)$/;"	f	line:8	typeref:typename:int	end:11
point	/tmp/foo.c	/^struct point $/;"	s	line:1	file:	end:5
x	/tmp/foo.c	/^  int x;$/;"	m	line:3	struct:point	typeref:typename:int	file:	end:3
y	/tmp/foo.c	/^  int y;$/;"	m	line:4	struct:point	typeref:typename:int	file:	end:4
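
A client such as an editor could read these fields back from a tags line. The following is only a sketch: `tagNumericField` is a hypothetical helper, and it assumes the usual layout in which extension fields follow the `;"` marker and are separated by tabs. It does not handle a search pattern that itself contains `;"`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: look up a numeric extension field such as "line"
 * or "end" in one line of a tags file. Returns 0 if the field is absent. */
static unsigned long tagNumericField(const char *tagLine, const char *name)
{
    size_t nameLen;
    const char *p;

    assert(tagLine != NULL && name != NULL);
    nameLen = strlen(name);

    /* Extension fields start after the `;"` that ends the pattern. */
    p = strstr(tagLine, ";\"");
    if (p == NULL)
        return 0;

    /* Walk the tab-separated fields and match "<name>:". */
    for (p = strchr(p, '\t'); p != NULL; p = strchr(p, '\t'))
    {
        p++;  /* step past the tab to the start of the field */
        if (strncmp(p, name, nameLen) == 0 && p[nameLen] == ':')
            return strtoul(p + nameLen + 1, NULL, 10);
    }
    return 0;
}
```

Applied to the `main` line of the output above, looking up `line` and `end` gives the two line numbers that bound the function.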

@techee
Contributor Author

techee commented Dec 23, 2021

Nice, I can imagine this information could be interesting for Geany too. I assume this is currently available only for some parsers, not all parsers reporting scope, right?

@masatake
Member

> Nice, I can imagine this information could be interesting for Geany too. I assume this is currently available only for some parsers, not all parsers reporting scope, right?

No, not all parsers. `grep endLine parsers/*.c` may report what you want to know :-).

@masatake
Member

See #3235.


static int makeMarkdownTag (const vString* const name, const int kind, const bool two_line)
{
const NestingLevel *const nl = getNestingLevel(kind);
Member

In #3235, I moved the getNestingLevel() call to...

int r = CORK_NIL;

if (vStringLength (name) > 0)
{
Member

@masatake masatake Dec 23, 2021

...here. The nesting level should be popped when making a tag for the name.

@masatake
Member

masatake commented Jan 2, 2022

The changes were merged via #3236.

@masatake masatake closed this Jan 2, 2022
2 participants