[Notes] learning-regular-expressions

REF:


Useful:


Introduction:

\ :

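A backslash in front of an ordinary character means the character is not taken literally; the pair has a special meaning instead. Conversely, a backslash in front of a special character (for example '\*') makes that character literal.
For example:
'b' without a preceding '\' matches a lowercase b, but '\b' matches a word boundary.
/\bter\b/.test("interest") // false
/\bter\b/.test("in ter est") // true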

^ :

Start of string, or start of line in multi-line pattern.
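Matches the beginning of the input. If the multiline flag is set to true, it also matches immediately after a line break character.
For example, /^A/ does not match the 'A' in "an A", but does match the 'A' in "An E".
Note: '^' means something different when it is the first character of a character set pattern: it negates the set.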
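A minimal JavaScript sketch of how the multiline flag changes '^'. The sample string and variable names are made up for illustration.

```javascript
// Hypothetical sample input: two lines, both starting with "A".
const sample = "An E\nA second line";

// Without the m flag, ^ only matches at the very start of the input.
const startOnly = sample.match(/^A/g);

// With the m flag, ^ also matches right after each line break.
const everyLine = sample.match(/^A/gm);

console.log(startOnly.length);  // 1
console.log(everyLine.length);  // 2
console.log(/^A/.test("an A")); // false
```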

$ :

End of string, or end of line in multi-line pattern.
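Matches the end of the input. If the multiline flag is set to true, it also matches immediately before a line break character.
For example, /t$/ does not match the 't' in "eater", but does match the 't' in "eat".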
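The same idea for '$' can be sketched in JavaScript as follows. The two-line string is an assumed example, not taken from any reference.

```javascript
// Hypothetical input: "eat" ends the first line, "eater" ends the input.
const twoLines = "eat\neater";

// Without the m flag, $ only matches at the end of the whole input ("r").
const atInputEnd = /t$/.test(twoLines);

// With the m flag, $ also matches just before the line break after "eat".
const atLineEnd = /t$/m.test(twoLines);

console.log(atInputEnd, atLineEnd); // false true
```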

* :

Matches the preceding expression 0 or more times. Equivalent to {0,}.
In other words, the preceding item may be absent or may repeat any number of times.
For example, /bo*/ matches ‘boooo’ in “A ghost booooed” and ‘b’ in “A bird warbled” but nothing in “A goat grunted”.
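The three cases above can be checked with a short JavaScript sketch (variable names are illustrative only):

```javascript
const boStar = /bo*/;

// "b" followed by as many "o" characters as possible.
const ghost = "A ghost booooed".match(boStar);
// "b" followed by zero "o" characters still matches.
const bird = "A bird warbled".match(boStar);
// No "b" at all, so there is no match.
const goat = "A goat grunted".match(boStar);

console.log(ghost[0]); // "boooo"
console.log(bird[0]);  // "b"
console.log(goat);     // null
```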

+ :

Matches the preceding expression 1 or more times. Equivalent to {1,}.
In other words, the preceding item must appear at least once.
For example, /a+/ matches the ‘a’ in “candy” and all the a’s in “caaaaaaandy”, but nothing in “cndy”.

? :

Matches the preceding expression 0 or 1 time. Equivalent to {0,1}.
In other words, the preceding item is optional.
For example, /e?le?/ matches the ‘el’ in “angel” and the ‘le’ in “angle” and also the ‘l’ in “oslo”.
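It can also be placed right after another quantifier (*, +, ?, or {}) to make that quantifier non-greedy, so that it matches as few characters as possible.
For example, /\d+/ applied to "123abc" matches "123",
but /\d+?/ matches only "1".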
Also used in lookahead assertions, as described in the x(?=y) and x(?!y) entries of this table.
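A small JavaScript sketch of both uses of '?': as an "optional" quantifier and as the non-greedy modifier. The variable names are assumptions for illustration.

```javascript
// "?" as a quantifier: each "e" in /e?le?/ is optional.
const optional = ["angel", "angle", "oslo"].map((word) => word.match(/e?le?/)[0]);

// "?" after another quantifier makes it lazy (match as little as possible).
const greedy = "123abc".match(/\d+/)[0];
const lazy = "123abc".match(/\d+?/)[0];

console.log(optional); // ["el", "le", "l"]
console.log(greedy);   // "123"
console.log(lazy);     // "1"
```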

. :

(The decimal point) matches any single character except a line break.
For example, /.n/ matches 'an' and 'on' in "nay, an apple is on the tree", but not 'nay'.
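As a rough check in JavaScript (the second test string is a made-up example), the dot needs one character in front of "n" and does not cross a line break:

```javascript
// Collect every "any character followed by n" in the sentence.
const dotMatches = "nay, an apple is on the tree".match(/.n/g);

// The leading "n" of "nay" has nothing before it, so it is not matched.
console.log(dotMatches); // ["an", "on"]

// By default the dot does not match a line break.
const acrossNewline = /a.b/.test("a\nb");
console.log(acrossNewline); // false
```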

(x) :

Matches ‘x’ and remembers the match, as the following example shows. The parentheses are called capturing parentheses.
The ‘(foo)’ and ‘(bar)’ in the pattern /(foo) (bar) \1 \2/ match and remember the first two words in the string “foo bar foo bar”. The \1 and \2 in the pattern match the string’s last two words.
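A minimal JavaScript sketch of capturing parentheses. The `replace` call with `$2 $1` is added here as an extra illustration of reusing captured groups.

```javascript
// exec returns the full match at index 0 and each captured group after it.
const captured = /(foo) (bar) \1 \2/.exec("foo bar foo bar");

console.log(captured[0]); // "foo bar foo bar"
console.log(captured[1]); // "foo"
console.log(captured[2]); // "bar"

// Captured groups can also be reused in a replacement string.
const swapped = "foo bar".replace(/(foo) (bar)/, "$2 $1");
console.log(swapped); // "bar foo"
```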

(?:x) :

Matches 'x' but does not remember the match. The parentheses are called non-capturing parentheses, and let you define subexpressions for regular expression operators to work with. Consider the sample expression /(?:foo){1,2}/. Without the parentheses, the {1,2} would apply only to the last 'o' in 'foo'. With the non-capturing parentheses, the {1,2} applies to the entire word 'foo'.
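The difference can be sketched in JavaScript. The anchors '^' and '$' are added here only to make the comparison strict.

```javascript
// With the group, {1,2} repeats the whole word "foo".
const grouped = /^(?:foo){1,2}$/.test("foofoo");

// Without the group, {1,2} repeats only the last "o".
const ungrouped = /^foo{1,2}$/.test("foofoo");
const extraO = /^foo{1,2}$/.test("fooo");

// A non-capturing group does not add an entry to the match result.
const result = /(?:foo)(bar)/.exec("foobar");

console.log(grouped, ungrouped, extraO); // true false true
console.log(result.length, result[1]);   // 2 "bar"
```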

x(?=y) :

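Matches 'x' only if 'x' is followed by 'y'. This is called a lookahead.
For example, /Jack(?=Sprat)/ matches 'Jack' only if it is followed by 'Sprat'.
/Jack(?=Sprat|Frost)/ matches 'Jack' only if it is followed by 'Sprat' or 'Frost'. In both cases the match itself is just 'Jack'; the lookahead only acts as a filter condition and is not part of the match result.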
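A short JavaScript sketch of the lookahead. "JackSmith" is a made-up counterexample.

```javascript
const jack = /Jack(?=Sprat|Frost)/;

// The lookahead must succeed, but it is not included in the match.
const withSprat = jack.exec("JackSprat");
const withFrost = jack.exec("JackFrost");
const withSmith = jack.exec("JackSmith");

console.log(withSprat[0]); // "Jack"
console.log(withFrost[0]); // "Jack"
console.log(withSmith);    // null
```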

x(?!y) :

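Matches 'x' only if 'x' is not followed by 'y'. This is called a negated lookahead: if what follows is 'y', the whole match fails.
For example, /\d+(?!\.)/ matches one or more digits only if they are not followed by a decimal point.
var result = /\d+(?!\.)/.exec("3.141");
result is ['141', index: 2, input: '3.141'],
where index: 2 means that '141' starts at index 2.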

x|y :

Matches either 'x' or 'y'.
For example, /green|red/ matches 'green' in "green apple" and 'red' in "red apple."
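A minimal JavaScript sketch of alternation. The second pattern, /gr(?:a|e)y/, is an added illustration of limiting the alternation to part of a pattern with a group.

```javascript
const colour = /green|red/;

console.log("green apple".match(colour)[0]); // "green"
console.log("red apple.".match(colour)[0]);  // "red"

// A group restricts the alternation to a single position.
const grey = /gr(?:a|e)y/;
const spellings = ["gray", "grey", "groy"].map((word) => grey.test(word));
console.log(spellings); // [true, true, false]
```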

{n} :

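Matches exactly n occurrences of the preceding expression, where n is a positive integer.
For example, /a{2}/ finds nothing in "candy" but matches in "caandy"; if the string has more than two a's, as in "caaandy", only the first two are matched.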

{n,m} :

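Matches at least n and at most m occurrences of the preceding expression, where n and m are non-negative integers and n <= m. If m is smaller than n (for example {1,0}), the pattern is an invalid regular expression.
For example, /a{1,3}/ matches nothing in "cndy", the 'a' in "candy", and the first three a's in "caaaaaaandy". Although the last string contains many a's, only the first three are matched.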
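The counted quantifiers can be sketched in JavaScript as follows. The out-of-order range {3,1} is an assumed example of an invalid pattern.

```javascript
// {2}: exactly two; extra a's are left unmatched.
const exactlyTwo = "caaandy".match(/a{2}/)[0];
const noPair = "candy".match(/a{2}/);

// {1,3}: between one and three, taking as many as allowed.
const upToThree = "caaaaaaandy".match(/a{1,3}/)[0];

// A range whose upper bound is below its lower bound is rejected.
let rejected = false;
try {
  new RegExp("a{3,1}");
} catch (err) {
  rejected = err instanceof SyntaxError;
}

console.log(exactlyTwo, noPair, upToThree, rejected); // "aa" null "aaa" true
```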

[xyz] :

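A character set. Matches any one of the characters inside the brackets, including escape sequences. Special characters such as the dot (.) and the asterisk (*) have no special meaning inside a character set, so they do not need to be escaped. To specify a range of characters, use a hyphen (-), as the following example shows:
[a-d] is the same as [abcd]. It matches the 'b' in "brisket", the 'c' in "city", and so on. Both /[a-z.]+/ and /[\w.]+/ match the whole string "test.i.ng".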
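A short JavaScript sketch of character sets. The "a*b" string is a made-up example showing that '*' is literal inside brackets.

```javascript
// A range inside a set matches any single character in that range.
console.log("brisket".match(/[a-d]/)[0]); // "b"
console.log("city".match(/[a-d]/)[0]);    // "c"

// The dot is literal inside a set, so both patterns cover the whole string.
const lettersAndDots = "test.i.ng".match(/[a-z.]+/)[0];
const wordCharsAndDots = "test.i.ng".match(/[\w.]+/)[0];
console.log(lettersAndDots, wordCharsAndDots); // "test.i.ng" "test.i.ng"

// "*" is also literal inside a set.
const literalStar = /[.*]/.test("a*b");
const noStar = /[.*]/.test("ab");
console.log(literalStar, noStar); // true false
```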

[^xyz] :

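A negated character set. Matches any single character that is not listed inside the brackets.
A hyphen (-) can be used to specify a range of characters. Everything that works in a normal character set also works here.
[^abc] can be written as [^a-c]. It finds the 'r' in "brisket" and the 'h' in "chop."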
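In JavaScript, the negated set can be checked like this (a minimal sketch; the variable names are illustrative):

```javascript
// The first character that is NOT a, b, or c.
const firstInBrisket = "brisket".match(/[^abc]/)[0];

// The same set written as a range.
const firstInChop = "chop.".match(/[^a-c]/)[0];

console.log(firstInBrisket, firstInChop); // "r" "h"
```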

[\b] :

Matches a backspace (U+0008). (Not to be confused with \b.)
Inside a character set, \b means the backspace character, not whitespace and not a word boundary.

\b :

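Matches a word boundary: a position where a word character is not preceded or not followed by another word character, such as the start or the end of a word. The boundary itself has zero length and is not included in the match.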
Examples:
/\bm/ matches the ‘m’ in “moon” ;
/oo\b/ does not match the ‘oo’ in “moon”, because ‘oo’ is followed by ‘n’ which is a word character;
/oon\b/ matches the ‘oon’ in “moon”, because ‘oon’ is the end of the string, thus not followed by a word character;
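The examples above can be checked with a small JavaScript sketch. The "cat concat cat" string is an added example showing whole-word matching.

```javascript
// A boundary exists before "m" (start of the word) and after "oon" (end).
console.log(/\bm/.exec("moon")[0]);    // "m"
console.log(/oo\b/.test("moon"));      // false: "oo" is followed by "n"
console.log(/oon\b/.exec("moon")[0]);  // "oon"

// \b on both sides matches whole words only.
const wholeWords = "cat concat cat".match(/\bcat\b/g);
console.log(wholeWords.length); // 2
```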

\B :

The opposite of \b: matches a position that is not a word boundary.
Matches a non-word boundary. This matches a position where the previous and next character are of the same type: Either both must be words, or both must be non-words. The beginning and end of a string are considered non-words.
For example, /\B../ matches 'oo' in "noonday", and /y\B./ matches 'ye' in "possibly yesterday."
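A minimal JavaScript sketch of \B, assuming the first pattern is /\B../ (a non-boundary followed by any two characters):

```javascript
// The position before "n" is a boundary, so the match starts one character later.
const insideWord = "noonday".match(/\B../)[0];

// The "y" of "possibly" is followed by a space (a boundary), so it is skipped;
// the "y" of "yesterday" is followed by "e" (no boundary).
const afterY = "possibly yesterday.".match(/y\B./)[0];

console.log(insideWord, afterY); // "oo" "ye"
```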

\cX :

\c stands for control, and X is a letter from A to Z.
Matches a control character, control-A through control-Z, in a string.
For example, /\cM/ matches control-M (U+000D) in a string.

\d :

Matches a digit character. Equivalent to [0-9].
For example, /\d/ or /[0-9]/ matches '2' in "B2 is the suite number."

\D :

Matches any non-digit character. Equivalent to [^0-9].
For example, /\D/ or /[^0-9]/ matches 'B' in "B2 is the suite number."
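A short JavaScript sketch of \d and \D. The "Order 66, aisle 7" string is a made-up example.

```javascript
const suite = "B2 is the suite number.";

// First digit and first non-digit in the string.
const firstDigit = suite.match(/\d/)[0];
const firstNonDigit = suite.match(/\D/)[0];
console.log(firstDigit, firstNonDigit); // "2" "B"

// Combined with + and the g flag, \d extracts every run of digits.
const numbers = "Order 66, aisle 7".match(/\d+/g);
console.log(numbers); // ["66", "7"]
```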

\f :

Matches a form feed (U+000C).

\n :

Matches a line feed (U+000A).

\r :

Matches a carriage return (U+000D).

\s :

Matches a single white space character, including space, tab, form feed, line feed. In current engines this is equivalent to [ \f\n\r\t\v\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff].
For example, /\s\w*/ matches ’ bar’ in “foo bar.”

\S :

Matches a single character other than white space. In current engines this is equivalent to [^ \f\n\r\t\v\u00a0\u1680\u2000-\u200a\u2028\u2029\u202f\u205f\u3000\ufeff].
For example, /\S\w*/ matches ‘foo’ in “foo bar.”
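The \s and \S examples can be checked in JavaScript. The `split` call is an added illustration of a common use of \s+.

```javascript
// \s\w* starts at the space; \S\w* starts at the first non-space character.
const fromSpace = "foo bar".match(/\s\w*/)[0];
const fromNonSpace = "foo bar".match(/\S\w*/)[0];
console.log(JSON.stringify(fromSpace), fromNonSpace); // " bar" foo

// Splitting on runs of mixed whitespace (space, tab, line feed).
const parts = "a \t b\nc".split(/\s+/);
console.log(parts); // ["a", "b", "c"]
```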

\v :

Matches a vertical tab (U+000B).

Vertical tab was used to speed up printer vertical movement. Some printers used special tab belts with various tab spots. This helped align content on forms. VT to header space, fill in header, VT to body area, fill in lines, VT to form footer. Generally it was coded in the program as a character constant. From the keyboard, it would be CTRL-K.
I don’t believe anyone would have a reason to use it any more. Most forms are generated in a printer control language like postscript.

\w :

Matches any word character (not a special symbol): a letter, a digit, or the underscore. Equivalent to [A-Za-z0-9_].
For example, /\w/ matches 'a' in "apple," '5' in "$5.28," and '3' in "3D."

\W :

Matches any non-word character, that is, a special symbol. Equivalent to [^A-Za-z0-9_].
For example, /\W/ or /[^A-Za-z0-9_]/ matches ‘%’ in “50%.”

\n :

Where n is a positive integer, a back reference to the last substring matching the n-th parenthesized group in the regular expression (counting left parentheses).
For example, /apple(,)\sorange\1/ matches ‘apple, orange,’ in “apple, orange, cherry, peach.”
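A minimal JavaScript sketch of back references. The "balloon" example is added here to show a common use, finding a doubled character.

```javascript
// \1 must match the same text that group 1 captured (the comma).
const fruit = /apple(,)\sorange\1/.exec("apple, orange, cherry, peach.");
console.log(fruit[0]); // "apple, orange,"

// (\w)\1 matches any word character immediately followed by itself.
const doubled = /(\w)\1/.exec("balloon");
console.log(doubled[0]); // "ll"
```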

\0 :

Matches a NULL (U+0000) character. Do not follow this with another digit, because \0<digits> is an octal escape sequence.

\xhh :

Matches the character with the code hh (two hexadecimal digits)

\uhhhh :

Matches the character with the code hhhh (four hexadecimal digits).

Pattern Modifiers

g : Global match
i* : Case-insensitive
m* : Multiple lines
s* : Treat string as single line
x* : Allow comments and whitespace in pattern
e* : Evaluate replacement
U* : Ungreedy pattern
(* marks a PCRE modifier)
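As a rough illustration in JavaScript, which supports g, i, m and s as RegExp flags but not the PCRE-only modifiers x, e and U, the flags can be combined like this (the test strings are made up):

```javascript
const mixed = "Aa aA";

// g: find all matches; i: ignore case.
const lowerOnly = mixed.match(/a/g).length;
const anyCase = mixed.match(/a/gi).length;
console.log(lowerOnly, anyCase); // 2 4

// m: ^ and $ work per line; s: the dot also matches a line break.
const secondLine = /^b/m.test("a\nb");
const dotAll = /a.b/s.test("a\nb");
console.log(secondLine, dotAll); // true true
```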

PHP Email:

/^(?!(?:(?:\x22?\x5C[\x00-\x7E]\x22?)|(?:\x22?[^\x5C\x22]\x22?)){255,})(?!(?:(?:\x22?\x5C[\x00-\x7E]\x22?)|(?:\x22?[^\x5C\x22]\x22?)){65,}@)(?:(?:[\x21\x23-\x27\x2A\x2B\x2D\x2F-\x39\x3D\x3F\x5E-\x7E]+)|(?:\x22(?:[\x01-\x08\x0B\x0C\x0E-\x1F\x21\x23-\x5B\x5D-\x7F]|(?:\x5C[\x00-\x7F]))*\x22))(?:\.(?:(?:[\x21\x23-\x27\x2A\x2B\x2D\x2F-\x39\x3D\x3F\x5E-\x7E]+)|(?:\x22(?:[\x01-\x08\x0B\x0C\x0E-\x1F\x21\x23-\x5B\x5D-\x7F]|(?:\x5C[\x00-\x7F]))*\x22)))*@(?:(?:(?!.*[^.]{64,})(?:(?:(?:xn--)?[a-z0-9]+(?:-[a-z0-9]+)*\.){1,126}){1,}(?:(?:[a-z][a-z0-9]*)|(?:(?:xn--)[a-z0-9]+))(?:-[a-z0-9]+)*)|(?:\[(?:(?:IPv6:(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){7})|(?:(?!(?:.*[a-f0-9][:\]]){7,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?)))|(?:(?:IPv6:(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){5}:)|(?:(?!(?:.*[a-f0-9]:){5,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3}:)?)))?(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))(?:\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))){3}))\]))$/iD

MySQL Email:

SELECT * FROM `users` WHERE `email` NOT REGEXP '^[A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$';

HTML5 Email:

<input type="email" placeholder="Enter your email" />
type="email" is equivalent to
/^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$/

JavaScript Email:

/^(([^<>()\[\]\\.,;:\s@"]+(\.[^<>()\[\]\\.,;:\s@"]+)*)|(".+"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/
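A minimal sketch of using the JavaScript pattern above. The helper name `looksLikeEmail` and the sample addresses are hypothetical, and a regular expression can only check the shape of an address, not whether it exists.

```javascript
// The JavaScript email pattern from above, stored in a constant.
const emailPattern = /^(([^<>()\[\]\\.,;:\s@"]+(\.[^<>()\[\]\\.,;:\s@"]+)*)|(".+"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/;

// Hypothetical helper: returns true when the string has the shape of an email address.
function looksLikeEmail(candidate) {
  return emailPattern.test(candidate);
}

console.log(looksLikeEmail("user@example.com"));  // true
console.log(looksLikeEmail("user@@example.com")); // false
console.log(looksLikeEmail("user@localhost"));    // false: this pattern requires a dot in the domain
console.log(looksLikeEmail("no-at-sign"));        // false
```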
