Law Grammar

Warning

This page covers syntax only, not semantics!

Referring to several articles

In Backus–Naur Form:

<lawArticles>	  ::= <lawName> <severalArticles>
<lawName>	  ::= "民法" | "電信法" | "中華民國刑法" | ...
<severalArticles> ::= <element>
		    | <element> <conj> <severalArticles>
<conj>		  ::= "、" | "至" | "及" | "且" | "或"
<element>	  ::= "第" <number> <type> <subElem>
		    | "第" <number> <type> "之" <number> <subElem>
<number>	  ::= <digit> | <digit> <number>
<digit>		  ::= "零" | "一" | ... | "十" | "百" | "千" | "甲" | ... | "子" | ... | "a" | ...
<type>		  ::= "條" | "類" | "項" | "款" | "目" | "小目"
<subElem>	  ::= "" | <element>

In <conj>, I am not sure whether I have ever seen "或" used in practice.
In <digit>, the Heavenly Stems (天干), the Earthly Branches (地支) and Latin letters are only used when referring to international or foreign laws. I do not recommend handling them, since the sources of those laws are much harder to locate.
In <type>, "類" and "小目" are not listed in 中央法規標準法, but they are still used in 所得稅法.
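The grammar above can be turned into a small hand-written recognizer. The following is only a sketch under a few assumptions: the function names `parseElement` and `parseSeveralArticles` are made up for this illustration, only Chinese numerals are accepted as digits (following the recommendation above), "及" is accepted as a conjunction because the regular expression later on this page accepts it, and the law name is assumed to have been stripped already.

```javascript
// Types from the grammar; "小目" comes first so it is tried before "目".
const TYPES = ["小目", "條", "類", "項", "款", "目"];
// Conjunctions from the grammar, plus "及" (assumption, see the note above).
const CONJ = ["、", "至", "及", "且", "或"];
// Chinese numerals only; stems, branches and letters are left out on purpose.
const NUMBER = /^[零一二三四五六七八九十百千]+/;

// <element> ::= "第" <number> <type> ["之" <number>] <subElem>
// The recursion through <subElem> is handled with a loop.
function parseElement(text, pos) {
  const parts = [];
  while (text[pos] === "第") {
    const num = NUMBER.exec(text.slice(pos + 1));
    if (!num) break;
    let next = pos + 1 + num[0].length;
    const type = TYPES.find((t) => text.startsWith(t, next));
    if (!type) break;
    next += type.length;
    const part = { number: num[0], type: type };
    if (text[next] === "之") {
      const sub = NUMBER.exec(text.slice(next + 1));
      if (sub) {
        part.sub = sub[0];
        next += 1 + sub[0].length;
      }
    }
    parts.push(part);
    pos = next;
  }
  return parts.length > 0 ? { parts: parts, end: pos } : null;
}

// <severalArticles> ::= <element> | <element> <conj> <severalArticles>
// Returns an array of elements, or null if the text does not fit the grammar.
function parseSeveralArticles(text) {
  const elements = [];
  let pos = 0;
  for (;;) {
    const el = parseElement(text, pos);
    if (!el) return null;
    elements.push(el.parts);
    pos = el.end;
    if (pos === text.length) return elements;
    if (!CONJ.includes(text[pos])) return null;
    pos += 1;
  }
}

console.log(parseSeveralArticles("第十五條之一第二項、第三項"));
```

Each returned element is a list of parts such as `{ number: "十五", type: "條", sub: "一" }`. The numbers are kept as strings, in line with the warning that this page is about syntax only.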

Regular Expression in JavaScript

The code is more readable when the pattern is assembled with string operations.

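// Law names to match; the list is abbreviated here.
const lawNames = ["民事訴訟法", "刑事訴訟法", "行政訴訟法", "軍事審判法", "少年事件處理法" /* , ... */];
const number = "[零一二三四五六七八九十百千]+";
// replace() with a string pattern only substitutes the first occurrence,
// so a global regular expression is used for the placeholders.
const element = "第%number%[條類項款目](之%number%)?".replace(/%number%/g, number);
const elements = "%element%([、至及或]%element%)*".replace(/%element%/g, element);
// Group the law names; otherwise "|" would split the whole pattern.
const re = new RegExp("(" + lawNames.join("|") + ")" + elements, "g");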
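As a usage illustration, here is a minimal sketch of running such a pattern over a sentence. It assumes the placeholders are substituted globally and that the law names are wrapped in a group; the shortened `lawNames` list and the sample sentence are made up for this example.

```javascript
// Shortened list of law names, for illustration only.
const lawNames = ["民法", "電信法", "中華民國刑法"];
const number = "[零一二三四五六七八九十百千]+";
const element = "第%number%[條類項款目](之%number%)?".replace(/%number%/g, number);
const elements = "%element%([、至及或]%element%)*".replace(/%element%/g, element);
const re = new RegExp("(" + lawNames.join("|") + ")" + elements, "g");

// Hypothetical sample sentence, not taken from any real document.
const sampleText = "依民法第一百八十四條、第一百八十五條及電信法第五條之一辦理。";

// With the "g" flag, match() returns every full match and ignores the groups.
const found = sampleText.match(re) || [];
console.log(found);
// → ["民法第一百八十四條、第一百八十五條", "電信法第五條之一"]
```

The "及" before "電信法" is not followed by "第", so the first match stops there and the second law is found as a separate match.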

I use "element" instead of "article" because some texts read like "第五款至第七款". It is still possible to write a stricter pattern such as 第%number%條(之%number%)?(第%number%項(第%number%款(第%number%目)?)?)?, but I don't think it is necessary for now. (Patterns without the asterisk are still useful if we want to process the numbers later, because a repeated group only keeps its last match, while each number in the stricter pattern sits in its own group.)
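To illustrate what "processing the numbers later" could look like, the sketch below uses the stricter pattern with named groups and converts each captured numeral to an integer. The group names (`article`, `sub`, `paragraph`, `subparagraph`, `item`) and the helpers `toInteger` and `parseReference` are assumptions made for this example, not part of any existing library, and the conversion only covers the numerals 零 to 九 with 十, 百 and 千.

```javascript
const number = "[零一二三四五六七八九十百千]+";
// Stricter pattern from the text; named groups need ES2018 or later.
const template =
  "第(?<article>%number%)條(?:之(?<sub>%number%))?" +
  "(?:第(?<paragraph>%number%)項" +
  "(?:第(?<subparagraph>%number%)款" +
  "(?:第(?<item>%number%)目)?)?)?";
const hierarchical = new RegExp(template.replace(/%number%/g, number));

const DIGITS = { "零": 0, "一": 1, "二": 2, "三": 3, "四": 4,
                 "五": 5, "六": 6, "七": 7, "八": 8, "九": 9 };
const UNITS = { "十": 10, "百": 100, "千": 1000 };

// Convert a Chinese numeral such as "一百八十四" to 184.
// A unit with no digit in front of it (as in "十五") counts as one unit.
function toInteger(numeral) {
  let total = 0;
  let current = 0;
  for (const ch of numeral) {
    if (ch in UNITS) {
      total += (current || 1) * UNITS[ch];
      current = 0;
    } else {
      current = current * 10 + DIGITS[ch];
    }
  }
  return total + current;
}

// Return the levels found in the first reference in text, or null if none.
function parseReference(text) {
  const m = hierarchical.exec(text);
  if (!m) return null;
  const result = {};
  for (const [name, value] of Object.entries(m.groups)) {
    if (value !== undefined) result[name] = toInteger(value);
  }
  return result;
}

console.log(parseReference("第十五條之一第二項第三款"));
// → { article: 15, sub: 1, paragraph: 2, subparagraph: 3 }
```

Levels that do not appear in the text (here, 目) are simply left out of the result.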
