Updated April 6, 2023
Introduction to Regular Expression in C
A regular expression is a sequence of characters that defines a search pattern. It is used for pattern matching against strings, such as finding a piece of subtext in a given text. The C language itself has no built-in regular expression support, but on POSIX systems the regex.h header provides a regular expression library, while other programming languages ship their own regular expression libraries. Metacharacters such as “*” (zero or more repetitions of the preceding item) and “?” (zero or one occurrence, in extended syntax) are among the most commonly used. Regular expressions are mainly used for text manipulation tasks such as searching, validating, and extracting text.
Working of Regular Expressions in C with Examples
C does not support regular expressions natively, but the POSIX regex library fills the gap. Some of the POSIX expressions used in C programs are: [] matches any one of the characters or digits written inside the brackets, [:digit:] matches any digit, and [:lower:] matches any lowercase letter. These character classes are written inside a bracket expression, for example [[:digit:]]. Note that [:word:] is not one of the standard POSIX classes; a word made up of letters, digits, or underscores can be matched with [[:alnum:]_]+ instead.
The regex.h header provides the following functions for compiling and using regular expressions in C:
1. regcomp()
This function compiles a regular expression. It takes three parameters: the first is a pointer to a regex_t structure in which the compiled pattern is stored, the second is a pointer to the pattern string, and the third is a set of flags that controls the type of compilation. It returns 0 if the compilation succeeds and a nonzero error code if it fails.
Example
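#include <stdio.h>
#include <regex.h>

/* Returns 1 if string contains a match for pattern, 0 otherwise. */
int match(const char *string, const char *pattern)
{
    regex_t re;
    /* Compile the pattern; a nonzero result means the pattern is invalid */
    if (regcomp(&re, pattern, REG_EXTENDED|REG_NOSUB) != 0) return 0;
    /* REG_NOSUB was used, so no match positions are requested */
    int status = regexec(&re, string, 0, NULL, 0);
    /* Release the memory held by the compiled pattern */
    regfree(&re);
    if (status != 0) return 0;
    return 1;
}

int main(void)
{
    const char* s1 = "abc";
    const char* s2 = "123";
    const char* re = "[1-9]+";
    printf("%s Given string matches %s? %s\n", s1, re, match(s1, re) ? "true" : "false");
    printf("%s Given string matches %s? %s\n", s2, re, match(s2, re) ? "true" : "false");
    return 0;
}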
Output:
abc Given string matches [1-9]+? false
123 Given string matches [1-9]+? true
The above program includes the regex.h header file. This header defines the structures and constants used by the functions it declares, namely regcomp(), regexec(), regerror(), and regfree(). It also defines the structure type regex_t, which holds the compiled pattern and has at least the member re_nsub, of type size_t, giving the number of parenthesized subexpressions in the pattern. The program calls regcomp() to compile the regular expression and passes flags as its third parameter: REG_EXTENDED selects extended regular expression syntax, and REG_NOSUB makes regexec() report only success or failure, without match positions. Other flags include REG_ICASE (ignore case) and REG_NEWLINE (treat newlines in the string as line boundaries).
2. regexec()
This function matches a string against a compiled pattern. It takes five arguments: the first is the pattern precompiled by regcomp(), the second is the string to be searched, the third is the number of entries in the match array, the fourth is an array of regmatch_t structures that receives the locations of the matches, and the fifth is a set of flags that changes the matching behavior. regexec() returns 0 if a match is found and REG_NOMATCH if the string does not match.
Example
From the above example,
int status = regexec(&re, string, 0, NULL, 0);
Here regexec() executes the regular expression: it compares the null-terminated string given by string with the compiled regular expression re, which must have been initialized by an earlier call to regcomp(), and looks for a match anywhere in that string. Because the pattern was compiled with REG_NOSUB, the match count is 0 and the match array is NULL. The function accepts flags such as REG_NOTBOL, which stops the “^” anchor from matching at the beginning of the string, and REG_NOTEOL, which stops the “$” anchor from matching at the end of the string. If there is no match, regexec() returns REG_NOMATCH. Error codes such as REG_BADPAT (invalid regular expression) and REG_ERANGE (invalid endpoint in a range expression) are returned by regcomp() when the pattern cannot be compiled.
3. regfree()
This function frees the memory that regcomp() allocated for the compiled regular expression pointed to by preg. Once preg has been passed to regfree(), it no longer holds a compiled regular expression and must not be used with regexec() until it is compiled again.
Example
From the above example,
regfree(&re);
The above statement frees the memory that regcomp() allocated for the compiled regular expression “re”.
4. regerror()
When regcomp() or regexec() returns an error code, this function translates that code into a human-readable error message. It copies the message into a buffer supplied by the caller, and the stored string is always terminated with a null character.
Example
Let us see another example, which compiles a pattern once and tests it against several strings:
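#include <regex.h>
#include <stdio.h>

#define MAX_MATCHES 1

/* Run the compiled pattern against sz and print where the match was found. */
void match(regex_t *pexp, const char *sz) {
    regmatch_t matches[MAX_MATCHES];
    if (regexec(pexp, sz, MAX_MATCHES, matches, 0) == 0) {
        /* rm_so and rm_eo have type regoff_t, so cast them for %d */
        printf("\"%s\" matches characters %d - %d\n", sz,
               (int)matches[0].rm_so, (int)matches[0].rm_eo);
    } else {
        printf("\"%s\" does not match\n", sz);
    }
}

int main(void) {
    int rv;
    regex_t exp;
    /* 1. Compile the pattern: optional minus sign, digits, optional fraction */
    rv = regcomp(&exp, "-?[0-9]+(\\.[0-9]+)?", REG_EXTENDED);
    if (rv != 0) {
        char msg[128];
        regerror(rv, &exp, msg, sizeof msg);
        printf("regcomp failed with %d: %s\n", rv, msg);
        return 1;
    }
    /* 2. Now run some tests on it */
    match(&exp, "0");
    match(&exp, "0.0");
    match(&exp, "-10.1");
    match(&exp, "a");
    match(&exp, "a.1");
    match(&exp, "hello");
    /* 3. Free the compiled pattern */
    regfree(&exp);
    return 0;
}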
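Output:
"0" matches characters 0 - 1
"0.0" matches characters 0 - 3
"-10.1" matches characters 0 - 5
"a" does not match
"a.1" matches characters 2 - 3
"hello" does not match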
Conclusion – Regular Expression in C
In this article, we saw that regular expressions are used across programming languages to find text patterns in large amounts of text. The C language does not support regular expressions directly, but POSIX systems provide the regex.h header file, which lets C programs compile and run regular expressions much as other programming languages do. This header file provides the functions regcomp(), regexec(), regfree(), and regerror().