kong0107 edited this page Jun 7, 2013 · 2 revisions

Warning

These rules describe syntax only, not semantics!

Referring to several articles

<lawArticles>	  ::= <lawName> <severalArticles>
<lawName>	  ::= "民法" | "電信法" | "中華民國刑法" | ...
<severalArticles> ::= <element>
		    | <element> <conj> <severalArticles>
<conj>		  ::= "、" | "至" | "及" | "或"
<element>	  ::= "第" <number> <type> <subElem>
		    | "第" <number> <type> "之" <number> <subElem>
<number>	  ::= <digit> | <digit> <number>
<digit>		  ::= "零" | "一" | ... | "十" | "百" | "千" | "甲" | ... | "子" | ... | "a" | ...
<type>		  ::= "條" | "類" | "項" | "款" | "目" | "小目"
<subElem>	  ::= "" | <element>
  • In <conj>, I'm not sure whether "或" ever actually appears.
  • In <digit>, the Heavenly Stems (天干), the Earthly Branches (地支), and Latin letters are used only when referring to international or foreign laws. I don't recommend handling them, since the source texts of those laws are much harder to locate.
  • In <type>, "類" and "小目" are not listed in 中央法規標準法 (Central Regulation Standard Act), but they are still used in 所得稅法 (Income Tax Act).
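
As a rough illustration of the grammar above, here is a minimal sketch of a recognizer for <severalArticles>. It makes several assumptions: digits are restricted to Chinese numerals (no 天干, 地支, or letters), the conjunction set is the one used in the regular expression on this page, <lawName> is not handled, and the names parseElement and parseSeveralArticles are hypothetical.

```javascript
// Sticky ("y") regexes match only at lastIndex, which suits a hand-written parser.
const NUMBER = "[零一二三四五六七八九十百千]+";
const LEVEL = new RegExp(
  "第(" + NUMBER + ")(條|類|項|款|目|小目)(?:之(" + NUMBER + "))?", "y");
const CONJ = /[、至及或]/y; // assumed conjunction set

// Parses one <element>: a chain of levels such as 第十四條之一第二項.
function parseElement(text, pos) {
  const levels = [];
  for (;;) {
    LEVEL.lastIndex = pos;
    const m = LEVEL.exec(text);
    if (!m) break;
    levels.push({ number: m[1], type: m[2], sub: m[3] || null });
    pos = LEVEL.lastIndex;
  }
  return levels.length ? { levels, end: pos } : null;
}

// Parses <severalArticles>: <element> followed by any number of <conj> <element>.
function parseSeveralArticles(text, pos = 0) {
  const first = parseElement(text, pos);
  if (!first) return null;
  const items = [{ conj: null, levels: first.levels }];
  pos = first.end;
  for (;;) {
    CONJ.lastIndex = pos;
    const c = CONJ.exec(text);
    if (!c) break;
    const next = parseElement(text, pos + 1);
    if (!next) break; // a trailing conjunction is not part of the reference
    items.push({ conj: c[0], levels: next.levels });
    pos = next.end;
  }
  return { items, end: pos };
}

console.log(JSON.stringify(parseSeveralArticles("第五款至第七款"), null, 2));
```

Like the grammar, this only checks the shape of a reference. It does not check that, say, a 項 actually belongs under the preceding 條.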

Regular Expression in JavaScript

The code is more readable when the pattern is assembled with string operations.

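var lawNames = ["民事訴訟法", "刑事訴訟法", "行政訴訟法", "軍事審判法", "少年事件處理法" /* , ... */];
var number = "[零一二三四五六七八九十百千]+";
// replace() with a string pattern substitutes only the first occurrence, so use /.../g.
var element = "第%number%[條類項款目](之%number%)?".replace(/%number%/g, number);
var elements = "%element%([、至及或]%element%)*".replace(/%element%/g, element);
// Group the law names; otherwise "|" would attach the elements to the last name only.
var re = new RegExp("(" + lawNames.join("|") + ")" + elements, "g");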
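
To see how such a pattern behaves, the sketch below builds it inside a function and runs it on a sample sentence. The function name buildLawArticleRegExp and the sample sentence are made up for illustration, and the groups are written as non-capturing so that only the whole match matters.

```javascript
// Hypothetical helper: assembles the pattern for a given list of law names.
function buildLawArticleRegExp(lawNames) {
  const number = "[零一二三四五六七八九十百千]+";
  // Global regexes make sure every placeholder is filled in, not just the first.
  const element = "第%number%[條類項款目](?:之%number%)?".replace(/%number%/g, number);
  const elements = "%element%(?:[、至及或]%element%)*".replace(/%element%/g, element);
  // The law names are grouped so the alternation does not absorb the elements.
  return new RegExp("(?:" + lawNames.join("|") + ")" + elements, "g");
}

const re = buildLawArticleRegExp(["民事訴訟法", "刑事訴訟法"]);
// Made-up sample sentence, used only to exercise the pattern.
const text = "依民事訴訟法第二百四十九條及第二百五十條，並參照刑事訴訟法第三百七十六條之一。";
const found = text.match(re) || [];
console.log(found);
// → [ '民事訴訟法第二百四十九條及第二百五十條', '刑事訴訟法第三百七十六條之一' ]
```

Note that this pattern has no counterpart to <subElem>: in a text such as 民法第五條第二項 it matches only 民法第五條. Wrapping the element in a repeated group would be one way to cover that case.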

I use "element" instead of "article" because some texts read "第五款至第七款" (subparagraphs 5 to 7), where the items being joined are not articles (條). A more precise pattern is possible, such as 第%number%條(之%number%)?(第%number%項(第%number%款(第%number%目)?)?)?, but I don't think it's necessary for now. (Patterns without the asterisk are still worthwhile if we want to process the numbers later: in JavaScript a repeated group keeps only its last match, whereas a fixed sequence of groups captures every number.)
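
As an illustration of processing the numbers later, the following sketch uses a stricter pattern of the kind mentioned above, with one capture group per number, and converts each captured numeral to an integer. The names chineseToInt and parseReference and the result field names are hypothetical, and the converter assumes ordinary numerals built from 十, 百, and 千 (as in 一百八十四), not digit-by-digit notation.

```javascript
const DIGITS = { "零": 0, "一": 1, "二": 2, "三": 3, "四": 4,
                 "五": 5, "六": 6, "七": 7, "八": 8, "九": 9 };
const UNITS = { "十": 10, "百": 100, "千": 1000 };

// Converts a numeral such as 二百四十九 to 249.
function chineseToInt(s) {
  let total = 0;
  let current = 0;
  for (const ch of s) {
    if (ch in UNITS) {
      // A bare unit, as in 十四, counts as one of that unit.
      total += (current || 1) * UNITS[ch];
      current = 0;
    } else {
      current = DIGITS[ch];
    }
  }
  return total + current;
}

// One capture group per number: 條, 之, 項, 款, 目 (in that order).
const N = "([零一二三四五六七八九十百千]+)";
const strict = new RegExp(
  "第" + N + "條(?:之" + N + ")?" +
  "(?:第" + N + "項(?:第" + N + "款(?:第" + N + "目)?)?)?");

// Returns the numbers of the first reference found in text, or null.
function parseReference(text) {
  const m = strict.exec(text);
  if (!m) return null;
  const [article, sub, paragraph, subparagraph, item] =
    m.slice(1).map((g) => (g === undefined ? null : chineseToInt(g)));
  return { article, sub, paragraph, subparagraph, item };
}

console.log(parseReference("第十五條之一第二項第三款"));
// → { article: 15, sub: 1, paragraph: 2, subparagraph: 3, item: null }
```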