Patterns
From GMod Wiki
Lua: Patterns
Description: Shows people how to use Lua patterns
Original Author: Brian Nevec
Created: 30th June, 2009
What's this article for?
This article teaches you how to use Lua's pattern matching language. The pattern matching language (or just patterns for short) gives you fairly advanced tools for searching and replacing recurring patterns in strings. These tools can be used for writing text data parsers, custom formatters and many other things that would otherwise take hundreds of lines of code.
A lot of the theory in this article is either copied or rewritten from the Lua reference manual, which has its own section on patterns.
Getting started
An average pattern looks like this: [%w_]+. That specific pattern could be used for finding variable names (such as "hi_there", "h0w_are_you", etc.). I will go over what each character in the pattern does later in this article.
These functions can be used together with patterns:
- string.find
- string.match
- string.gmatch
- string.gsub
I will try to use all of these functions and explain how each of them works in detail.
Special characters
There are a bunch of special characters that either escape other characters or modify the pattern in some way. These characters are: "^$()%.[]*+-?". They can also be used in the pattern as normal characters by prefixing them with a "%" character, so "%%" matches "%", "%[" matches "[", etc.
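As a rough sketch of the escaping rule above, the snippet below compares an unescaped and an escaped dot, and then escapes arbitrary text before using it as a pattern. The helper name escapePattern is made up for this example and is not part of Lua's string library.

```lua
-- Hypothetical helper: prefix every special pattern character with "%".
local function escapePattern( text )
    -- The extra parentheses drop gsub's second return value (the match count).
    return ( string.gsub( text, "[%^%$%(%)%%%.%[%]%*%+%-%?]", "%%%0" ) );
end

local s = "version 1.5";
print( string.find( s, "." ) );  -- an unescaped dot matches any character: 1 1
print( string.find( s, "%." ) ); -- an escaped dot matches a literal dot: 10 10

print( escapePattern( "1.5 (beta)" ) ); -- 1%.5 %(beta%)
print( string.find( "price: 1.5 (beta)", escapePattern( "1.5 (beta)" ) ) ); -- 8 17
```

Escaping like this is mostly useful when the text to search for comes from a user or a file and may contain special characters by accident.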
Character classes
Character classes represent a set of characters. They can be either predefined sets or custom sets that can consist of the same predefined sets, ranges or any single characters.
Available character classes( custom and predefined ):
- . (a dot): represents all characters (will match any character),
- %a: represents all letters (from a to z, upper and lower case),
- %c: represents all control characters (special characters "\t", "\n", etc.),
- %d: represents all digits (from 0 to 9),
- %l: represents all lowercase letters (any letter that is lower case),
- %p: represents all punctuation characters (".", ",", etc.),
- %s: represents all space characters (a normal space, tab, etc.),
- %u: represents all uppercase letters (any letter that is upper case),
- %w: represents all alphanumeric characters (all letters and digits),
- %x: represents all hexadecimal digits (digits 0-9, letters a-f, and letters A-F),
- %z: represents the character with representation 0 (the null character "\0"),
- %x (where x is any non-alphanumeric character): represents the character x itself. This is how the special characters are escaped,
- [s]: represents all characters in s as a union. You can already see this used in the "Getting started" section: [%w_] will match any letter, any digit and the underscore,
- [^s]: represents the opposite of the union s, so [^%w_] matches everything that is not a letter, a digit or an underscore,
- an upper case version of a predefined character class represents the opposite of that class, so "%A" will match anything that is not a letter,
- inside a set, the starting and ending points of a range are separated with a "-", so [0-5] will match a digit from zero to five and [a-c] will match a, b and c.
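To make the list above more concrete, here is a small sketch that pulls the first match of several classes out of one string. The sample string is made up for the example; string.match is explained in detail further below.

```lua
local s = "Room 42b, floor_3!";

print( string.match( s, "%d" ) );     -- first digit: 4
print( string.match( s, "%u" ) );     -- first uppercase letter: R
print( string.match( s, "%p" ) );     -- first punctuation character: ,
print( string.match( s, "[%w_]+" ) ); -- first run of letters, digits and underscores: Room
print( string.match( s, "[0-3]" ) );  -- first digit in the range 0 to 3: 2
print( string.match( s, "[a-c]" ) );  -- first letter in the range a to c: b

-- Both of these find the space after "Room": it is the first character that is
-- not a letter, and also the first one outside the set [%w_].
print( "[" .. string.match( s, "%A" ) .. "]" );      -- [ ]
print( "[" .. string.match( s, "[^%w_]" ) .. "]" );  -- [ ]
```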
Repetition and anchoring
Characters in a string match a pattern in the following ways:
- a single class will match a single character,
- a single class followed by "+" will match one or more repetitions of characters in the class and will always match the longest possible sequence,
- a single class followed by "-" will match zero or more repetitions of characters in the class and will always match the shortest possible sequence,
- a single class followed by "*" will match zero or more repetitions of characters in the class and will always match the longest possible sequence,
- a single class followed by "?" will match zero or one occurrence of a character in the class,
- %n (n is a digit between 1 and 9) will match a substring equal to the nth captured string (see next section),
- %bxy (x and y are two different characters) will match a string that starts with x, ends with y, and in which the x and y are balanced, so "%b()" will match a string that starts with "(" and ends with the ")" that closes it.
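The difference between the longest and the shortest match is easiest to see side by side. The following sketch runs each repetition item from the list above on small, made-up strings.

```lua
local html = "<b>bold</b>";
print( string.match( html, "<.+>" ) ); -- "+" takes the longest sequence: <b>bold</b>
print( string.match( html, "<.->" ) ); -- "-" takes the shortest sequence: <b>

print( string.match( "123abc", "%d*" ) );              -- 123
print( "[" .. string.match( "123abc", "%d-" ) .. "]" ); -- [] (zero digits is the shortest match)

print( string.match( "colour", "colou?r" ) ); -- colour
print( string.match( "color", "colou?r" ) );  -- color

-- %b() stops at the parenthesis that closes the first one, not at the first ")".
print( string.match( "f( a( b ) c ) d", "%b()" ) ); -- ( a( b ) c )

-- %1 must repeat whatever the first capture matched, here the opening quote.
print( string.match( "say 'hi' or \"yo\"", "(['\"])(.-)%1" ) ); -- '	hi
```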
Patterns can be anchored like so:
- starting the pattern with "^" will match a string at the beginning,
- ending the pattern with "$" will match a string at the end,
- not anchoring the pattern will match a string at any position.
These two characters only have a meaning if positioned as stated above. At any other position, these characters have no meaning and represent themselves.
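A short sketch of the anchoring rules, using string.find on made-up strings. The last two lines show "^" and "$" being treated as ordinary characters when they are not at the start or end of the pattern.

```lua
local s = "hello world";

print( string.find( s, "^hello" ) ); -- 1 5
print( string.find( s, "^world" ) ); -- nil ("world" is not at the beginning)
print( string.find( s, "world$" ) ); -- 7 11
print( string.find( s, "hello$" ) ); -- nil ("hello" is not at the end)

-- Using both anchors forces the pattern to cover the whole string.
print( string.match( "  42 ", "^%s*(%d+)%s*$" ) );    -- 42
print( string.match( "  42 px", "^%s*(%d+)%s*$" ) );  -- nil

-- In the middle of a pattern, "^" and "$" represent themselves.
print( string.find( "a^b", "a^b" ) ); -- 1 3
print( string.find( "a$b", "a$b" ) ); -- 1 3
```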
Captures
Patterns can also contain sub-patterns enclosed in "()"; these are called captures. Captures are used in functions like string.match and string.gsub to return or substitute a specific part of the match. Examples of how to use these can be found below.
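As a quick preview, the sketch below shows how captures are numbered and returned. The date and keyvalue strings are made up for the example. Note that captures are always returned as strings, even when they contain only digits.

```lua
-- Each pair of parentheses returns one value, in order.
local year, month, day = string.match( "2009-06-30", "(%d+)%-(%d+)%-(%d+)" );
print( year, month, day ); -- 2009	06	30
print( type( year ) );     -- string

-- Captures can be nested; they are numbered by their opening parenthesis.
local pair, value = string.match( "key=value", "(%w+=(%w+))" );
print( pair, value ); -- key=value	value

-- In a gsub replacement string, %1 and %2 refer to the captures.
print( ( string.gsub( "hello world", "(%w+) (%w+)", "%2 %1" ) ) ); -- world hello
```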
Usage
Now I'm going to show you how to actually use all that above stuff. The examples below explain how to use the four functions listed above.
string.find
string.find( string str, string pattern, [number start, [boolean plain]] );
Str is the string to search, pattern is the pattern string to find, start is the start index and plain is a boolean indicating whether to use a pattern search or just plain text search. The function returns the start and end indices( not start index and length ) of the matching substring. If the pattern has captures, they will be returned after the indices. If a match couldn't be found, the function returns nil.
The following code will find the first word in the string.
local str = "1. Don't spam!";
local pattern = "([%a']+)"; -- will match a substring that has one or more letters or apostrophes( ' )
local start, endpos, word = string.find( str, pattern );
print( start, endpos, word );
Output:
4 8 Don't
You probably think that could be done with string.Explode and a few loops, but look, we did it in three lines.
The following code will check if the string is safe to be used as a file name.
local str = "cry|*to";
local pattern = '[\\/:%*%?"<>|]'; -- a set of all restricted characters
local start = string.find( str, pattern );
print( "String is "..( ( start ~= nil ) and "unsafe" or "safe" ) );
Output:
String is unsafe
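The start and plain arguments are not used in the examples above, so here is a minimal sketch of both. With plain set to true the dot is searched for as ordinary text, and moving the start index past each hit finds every occurrence. The variable names are only for illustration.

```lua
local s = "a.b.c";

print( string.find( s, ".", 1, true ) ); -- plain search from index 1: 2 2
print( string.find( s, ".", 3, true ) ); -- plain search from index 3: 4 4

-- Collect the position of every dot by restarting the search after each hit.
local positions = { };
local from = 1;
while true do
    local first, last = string.find( s, ".", from, true );
    if not first then break end
    positions[ #positions + 1 ] = first;
    from = last + 1;
end
print( table.concat( positions, ", " ) ); -- 2, 4
```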
string.match
string.match( string str, string pattern, [number start] );
Str is the string to search, pattern is the pattern to find and start is the start position. If there is a match, the function returns the captures from the pattern; if the pattern has no captures, it returns the whole match. If a match couldn't be found, the function will return nil.
The following code will parse a simple keyvalue line.
local str = "key= value";
local pattern = "([%w_]+)%s*=%s*([%w_]+)"; -- will match a "variable name, 0 or more spaces, equal, 0 or more spaces, variable name"
local key, val = string.match( str, pattern );
print( key, val );
Output:
key value
The following code will check if the string ends with a .lua extension.
local str = "teel.lua";
local pattern = ".+%.lua$"; -- anything until a dot and "lua" at the end of the string
local match = string.match( str, pattern );
print( "String ends with "..( ( match ) and ".lua" or "something else" ) );
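The return rules of string.match can be sketched with the same file name. Without captures the whole match comes back, with a capture only the captured part does, and a failed match gives nil. The last line is a made-up example of the start argument.

```lua
print( string.match( "teel.lua", ".+%.lua$" ) );   -- no captures, whole match: teel.lua
print( string.match( "teel.lua", "(.+)%.lua$" ) ); -- one capture, only the name: teel
print( string.match( "teel.txt", "(.+)%.lua$" ) ); -- no match: nil

-- Starting at index 2 skips the first "id=", so the second one is found.
print( string.match( "id=7 id=9", "id=(%d+)", 2 ) ); -- 9
```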
string.gmatch
string.gmatch( string str, string pattern );
Str is the string to search and pattern is the string to search for. The function returns an iterator function( special functions used by loops ) that goes through every match in the string and returns the pattern's captures, if there are any, or the whole match if there are no captures. The function will not return nil in the case where a match couldn't be found, but an 'empty' iterator function that will not start a loop.
The following code goes through every word in the string.
local str = "This is PATTERNS!";
local pattern = "[%a']+";
for word in string.gmatch( str, pattern ) do
    print( word );
end
Output:
This
is
PATTERNS
Any pattern that you use in string.match can also be used in gmatch, but, instead of finding only the first match, it will find every match in the string.
The following code uses the keyvalue parsing pattern but can now read a list of keyvalues.
local str = "key = value; key2 = value2";
local pattern = "([%w_]+)%s*=%s*([%w_]+)"; -- same pattern as above
local tbl = { };
for key, value in string.gmatch( str, pattern ) do
    tbl[ key ] = value;
end
-- print every pair; pairs() does not guarantee any particular order
for key, value in pairs( tbl ) do
    print( key, value );
end
Output:
key value
key2 value2
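The interesting thing is that the string can have almost any characters as separators between keyvalue pairs, as long as they are not letters, digits or underscores, which the pattern would treat as part of a key or value.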
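A common use of string.gmatch is collecting every match into a table. The sketch below does that with a small helper (the name collect is made up for this example) and also shows that the loop body simply never runs when nothing matches.

```lua
-- Hypothetical helper: returns an array with every match of pattern in str.
local function collect( str, pattern )
    local out = { };
    for match in string.gmatch( str, pattern ) do
        out[ #out + 1 ] = match;
    end
    return out;
end

local numbers = collect( "10, 20 and 30", "%d+" );
print( #numbers, table.concat( numbers, " " ) ); -- 3	10 20 30

-- No match: gmatch still returns an iterator, but the loop runs zero times.
print( #collect( "no digits here", "%d+" ) ); -- 0
```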
string.gsub
string.gsub( string str, string pattern, string/table/function repl );
Str is the string to search in, pattern is the pattern to search for and repl is the value to replace with. The function returns a copy of str where all occurrences of pattern have been replaced with the value given by repl and, as the second return value, the total number of matches.
Repl can be the following things:
- a string - in which case all occurrences of pattern are replaced with this string. The "%n" item is also supported, with the special case of "%0" representing the whole match,
- a function - in which case the function gets called with the match/captures as its argument(s) each time a match occurs and the match is replaced with the value the function returns,
- a table - in which case the table is indexed with the first capture (or the whole match if there are no captures) and the match is replaced with the value found there.
If the function or table returns nil or false, the original match is kept in the string and nothing gets replaced.
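The three kinds of repl can be sketched as follows, on made-up strings. Note how "$other" and the harmless words stay untouched because the table lookup and the function return nil for them, while the match count still includes them.

```lua
-- String replacement: %0 stands for the whole match.
print( string.gsub( "hello world", "%w+", "<%0>" ) ); -- <hello> <world>	2

-- Table replacement: the first capture is used as the key.
local names = { name = "Brian", game = "GMod" };
local out, count = string.gsub( "$name plays $game and $other", "%$(%w+)", names );
print( out, count ); -- Brian plays GMod and $other	3

-- Function replacement: returning nothing (nil) keeps the original match.
local censored = string.gsub( "a bad word", "%a+", function( word )
    if word == "bad" then
        return "***";
    end
end );
print( censored ); -- a *** word
```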
The following code formats a keyvalue pair as an xml node.
local str = "key = value";
local pattern = "([%w_]+)%s*=%s*([%w_]+)";
local replacement = "<%1>%2</%1>";
local output = string.gsub( str, pattern, replacement );
print( output );
Output:
<key>value</key>
The following code creates a function that works like the .net formatting feature.
function string.format2( fmt, ... )
    local args = { ... }; -- collect the ... arguments in a table
    return fmt:gsub( "{(%d+)}", function( i )
        -- the capture is a string, so convert it; {0} maps to the first argument
        return args[ tonumber( i ) + 1 ];
    end );
end

local str = "This is {0}, oh {1}..";
local repl1 = "PATTERNS";
local repl2 = "YEAH";
local output = string.format2( str, repl1, repl2 );
print( output );
Output:
This is PATTERNS, oh YEAH..
Conclusion
The article is finally over! I hope you learned something new from all of this. Lua's patterns are very powerful when used right. When making an addon that relies heavily on strings, patterns will most likely come in handy. You can find more examples in either the Lua reference manual or Programming in Lua (PiL).
Good day!