forked from kong0107/zhLawEasyRead
Law Grammar
kong0107 edited this page Jun 7, 2013
These rules cover syntax only, not semantics!
In Backus–Naur Form:
<lawArticles> ::= <lawName> <severalArticles>
<lawName> ::= "民法" | "電信法" | "中華民國刑法" | ...
<severalArticles> ::= <element>
| <element> <conj> <severalArticles>
<conj> ::= "、" | "至" | "且" | "或"
<element> ::= "第" <number> <type> <subElem>
| "第" <number> <type> "之" <number> <subElem>
<number> ::= <digit> | <digit> <number>
<digit> ::= "零" | "一" | ... | "十" | "百" | "千" | "甲" | ... | "子" | ... | "a" | ...
<type> ::= "條" | "類" | "項" | "款" | "目" | "小目"
<subElem> ::= "" | <element>
- In <conj>, I don't remember whether I have ever seen "或" used.
- In <digit>, 天干 (Heavenly Stems), 地支 (Earthly Branches) and Latin letters are used only when referring to international or foreign laws. I don't recommend handling them, since their sources are much harder to locate.
- In <type>, "類" and "小目" are not listed in 中央法規標準法 (Standard Act for Central Regulations and Rules), but are still used in 所得稅法 (Income Tax Act).
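Apart from picking the law name, every choice in the grammar is decided by the next character (a digit, "之", "第" or a conjunction), so it maps directly onto a recursive-descent recognizer. The sketch below is illustrative only: `isLawArticles` and the short `LAW_NAMES` list are hypothetical, only the Chinese numerals from <digit> are accepted (following the note above about 天干, 地支 and letters), and both "且" and "及" are accepted as conjunctions. It checks syntax only and does not interpret the numbers.

```javascript
// Hypothetical sample of law names; the real list is open-ended.
// Names are matched without backtracking, so none here is a prefix of another.
const LAW_NAMES = ["中華民國刑法", "電信法", "民法"];
// Only the Chinese numerals from <digit>; 天干, 地支 and letters are deliberately skipped.
const DIGITS = "零一二三四五六七八九十百千";
// "小目" is tried first because it is the only two-character <type>.
const TYPES = ["小目", "條", "類", "項", "款", "目"];
// Union of the <conj> alternatives and the character class [、至及或] in the regex on this page.
const CONJS = "、至且及或";

// Returns true if `text` can be derived from <lawArticles>.
// All characters involved are in the BMP, so indexing by code unit is safe.
function isLawArticles(text) {
  let pos = 0;

  // <number> ::= <digit> | <digit> <number>
  function number() {
    const start = pos;
    while (pos < text.length && DIGITS.includes(text[pos])) pos++;
    return pos > start;
  }

  // <type> ::= "條" | "類" | "項" | "款" | "目" | "小目"
  function type() {
    const t = TYPES.find((candidate) => text.startsWith(candidate, pos));
    if (t === undefined) return false;
    pos += t.length;
    return true;
  }

  // <element> ::= "第" <number> <type> <subElem>
  //             | "第" <number> <type> "之" <number> <subElem>
  function element() {
    if (text[pos] !== "第") return false;
    pos++;
    if (!number() || !type()) return false;
    if (text[pos] === "之") {
      pos++;
      if (!number()) return false;
    }
    // <subElem> ::= "" | <element>; take the <element> branch only if "第" follows.
    return text[pos] === "第" ? element() : true;
  }

  // <severalArticles> ::= <element> | <element> <conj> <severalArticles>
  function severalArticles() {
    if (!element()) return false;
    if (pos < text.length && CONJS.includes(text[pos])) {
      pos++;
      return severalArticles();
    }
    return true;
  }

  // <lawArticles> ::= <lawName> <severalArticles>, and the whole input must be consumed.
  const name = LAW_NAMES.find((n) => text.startsWith(n));
  if (name === undefined) return false;
  pos = name.length;
  return severalArticles() && pos === text.length;
}
```

For example, "民法第一百八十四條第一項" and "民法第五款至第七款" are accepted, while "民法第一條、" is rejected because a conjunction must be followed by another element.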
In code, the pattern is more readable when it is assembled with string operations:
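// Law names to match; the list is open-ended.
const lawNames = ["民事訴訟法", "刑事訴訟法", "行政訴訟法", "軍事審判法", "少年事件處理法" /* , ... */];
const number = "[零一二三四五六七八九十百千]+";
const element = "第%number%(?:小目|[條類項款目])(之%number%)?";
const elements = "%element%([、至及或]%element%)*";
// split/join substitutes every placeholder; replace() with a string pattern substitutes only the first.
const elementPattern = element.split("%number%").join(number);
// Group the alternation so the citation part applies to every law name, not only the last one.
const re = new RegExp("(?:" + lawNames.join("|") + ")" + elements.split("%element%").join(elementPattern), "g");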
I use "element" instead of "article" because some citations start below the article level, such as "第五款至第七款" (subparagraphs 5 through 7). A more accurate grammar is still possible, for example 第%number%條(之%number%)?(第%number%項(第%number%款(第%number%目)?)?)?, but I don't think it is necessary for now. (A pattern without the asterisk is still worthwhile if we want to process the numbers later, since each capture group then corresponds to a fixed level; a group inside a repetition keeps only its last match.)
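If the numbers are to be processed later, the article-rooted pattern above can be given one capture group per level and paired with a numeral converter. This is a sketch under assumptions: `CITATION`, `chineseToInt` and `parseCitations` are hypothetical names, the converter handles only values below 10000 (no 萬), and citations rooted below the article level are ignored.

```javascript
// One capture group per level; the optional wrappers are non-capturing so indices stay fixed:
// 1 = 條, 2 = number after 之, 3 = 項, 4 = 款, 5 = 目.
const NUM = "([零一二三四五六七八九十百千]+)";
const CITATION = new RegExp(
  `第${NUM}條(?:之${NUM})?(?:第${NUM}項(?:第${NUM}款(?:第${NUM}目)?)?)?`,
  "g"
);

// Converts a Chinese numeral such as "一百八十四" or "一千零三" to an integer (below 10000 only).
function chineseToInt(s) {
  const DIGIT = { "零": 0, "一": 1, "二": 2, "三": 3, "四": 4, "五": 5, "六": 6, "七": 7, "八": 8, "九": 9 };
  const UNIT = { "十": 10, "百": 100, "千": 1000 };
  let total = 0;
  let current = 0;
  for (const ch of s) {
    if (ch in DIGIT) {
      current = DIGIT[ch];
    } else if (ch in UNIT) {
      // A bare unit, such as the leading "十" in "十一", counts as one of that unit.
      total += (current || 1) * UNIT[ch];
      current = 0;
    } else {
      return NaN;
    }
  }
  return total + current;
}

// Hypothetical helper: lists every article-rooted citation in `text` with its numbers.
function parseCitations(text) {
  const num = (s) => (s === undefined ? undefined : chineseToInt(s));
  return [...text.matchAll(CITATION)].map((m) => ({
    text: m[0],
    article: num(m[1]),
    articleSuffix: num(m[2]), // the number after 之
    paragraph: num(m[3]),
    subparagraph: num(m[4]),
    item: num(m[5]),
  }));
}
```

For "依民法第一百八十四條第一項及第十條之一規定" this yields two citations: article 184 paragraph 1, and article 10 with suffix 1. A citation such as "第五款至第七款" yields no match here, which is the case the looser "element" pattern covers.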