Regex Guide 1 by anonymous

Regular Expressions

1. What is a Regular Expression?

A regular expression (regex) is a pattern that describes a set of strings. It works like a wildcard search, but is far more expressive. At its core, it is a way to search, match, and manipulate text.

2. Basic Building Blocks:

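Literals: The most basic elements; ordinary characters match themselves. If you search for the regex apple, it matches the text "apple" wherever it appears, including inside "pineapple".
Dot (.): Matches any single character except a newline (in most engines, unless a "dot-all" or single-line flag is set).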
Example: h.t will match "hat", "hit", "hot", etc.
Character Sets ([]): Matches any one of the characters inside the square brackets.
Example: h[aei]t will match "hat", "hit", but not "hot".
Negated Character Sets ([^...]): Matches any single character that is not listed inside the square brackets.
Example: h[^aei]t will match "hot", but not "hat" or "hit".
Quantifiers:
*: Matches 0 or more of the preceding token.
+: Matches 1 or more of the preceding token.
?: Matches 0 or 1 of the preceding token.
{n}: Matches exactly n of the preceding token.
{n,}: Matches n or more of the preceding token.
{n,m}: Matches between n and m (inclusive) of the preceding token.
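A quick way to check these rules is to try them in code. The sketch below assumes Python and its built-in re module (the guide itself is not tied to any language); matches() is a hypothetical helper that tests whether the entire string fits the pattern.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """True if the whole of `text` matches `pattern`."""
    return re.fullmatch(pattern, text) is not None

# Dot, character sets, and negated sets
print(matches(r"h.t", "hot"))       # True
print(matches(r"h.t", "h\nt"))      # False: the dot does not match a newline by default
print(matches(r"h[aei]t", "hit"))   # True
print(matches(r"h[aei]t", "hot"))   # False
print(matches(r"h[^aei]t", "hot"))  # True
print(matches(r"h[^aei]t", "hat"))  # False

# Quantifiers applied to a single token ("a" or "u")
print(matches(r"ba*", "b"))          # True: * allows zero
print(matches(r"ba+", "b"))          # False: + needs at least one
print(matches(r"colou?r", "color"))  # True: ? makes "u" optional
print(matches(r"a{3}", "aaa"))       # True: exactly three
print(matches(r"a{2,}", "aaaaa"))    # True: two or more
print(matches(r"a{2,3}", "aaaa"))    # False: at most three
```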

3. Some Special Characters:

Anchors:
^: Start of a string. (e.g., ^apple matches any string that starts with "apple")
$: End of a string. (e.g., apple$ matches any string that ends with "apple")
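Shorthand Character Classes (escape sequences):
\d: Matches any digit (equivalent to [0-9] in ASCII mode; Unicode-aware engines such as Python 3's re module also match digits from other scripts).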
\D: Matches any non-digit.
\w: Matches any word character (alphanumeric or underscore).
\W: Matches any non-word character.
\s: Matches any whitespace character (space, tab, newline, etc.).
\S: Matches any non-whitespace.
Grouping and Capturing:
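(): Groups several tokens together so that a quantifier or alternation applies to all of them; for example, (ha)+ matches "hahaha". A group also captures the text it matched, so you can refer to it later, either inside the pattern with a backreference such as \1 or from your code. Use (?:...) when you need grouping without capturing.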
Alternation (|): It acts like a logical OR. Matches either the expression before or the expression after it.
Example: apple|banana will match either "apple" or "banana".
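The same re module (again assuming Python) shows anchors, shorthand classes, groups, and alternation in action. re.search() scans the whole string, so the anchors are what pin a match to the start or end.

```python
import re

# Anchors: re.search() looks anywhere, so ^ and $ decide where "apple" may sit.
print(re.search(r"^apple", "apple pie") is not None)    # True
print(re.search(r"^apple", "green apple") is not None)  # False
print(re.search(r"apple$", "green apple") is not None)  # True

# Shorthand character classes
print(re.findall(r"\d", "a1b2c3"))       # ['1', '2', '3']
print(re.findall(r"\w+", "hi, you_2!"))  # ['hi', 'you_2']
print(re.split(r"\s+", "a  b\tc"))       # ['a', 'b', 'c']

# Capturing groups: each (...) can be read back by number
m = re.search(r"(\d{4})-(\d{2})", "Released 2023-07")
print(m.group(1), m.group(2))            # 2023 07

# Grouping for repetition, and alternation
print(re.fullmatch(r"(ab)+", "ababab") is not None)         # True
print(re.fullmatch(r"apple|banana", "banana") is not None)  # True
```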

4. Tips:

Start Small: Begin with small patterns and test them. Gradually build up your regex pattern.
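Use Tools: Online tools such as regex101 help you test and debug regular expressions. They give real-time feedback, explain each part of the pattern, and let you pick the regex flavor (PCRE, JavaScript, Python, and others) your language uses, which matters because syntax differs slightly between engines.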
Be Specific: The more specific your pattern, the less likely you are to get unwanted matches.
Practice: Like any other skill, the more you use and practice regex, the more proficient you'll become.

5. Practice:

Here are some simple exercises to try:

  1. Write a regex that matches email addresses.
  2. Write a regex that matches URLs.
  3. Write a regex that matches phone numbers in the format (123) 456-7890.

Possible solutions:

1. Matching Email Addresses:

^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

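^[a-zA-Z0-9._%+-]+: Matches the username (local part) before the "@" symbol: letters, digits, and the characters ., _, %, +, and -.
@[a-zA-Z0-9.-]+: Matches the "@" and the domain name after it (letters, digits, dots, and hyphens).
\.[a-zA-Z]{2,}$: Matches the top-level domain, such as .com or .net, at the end of the string.
This is a practical approximation rather than a full implementation of the address rules in RFC 5322, so it accepts some invalid addresses and rejects some unusual valid ones.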
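A minimal Python sketch that wraps the email pattern in a hypothetical is_email() helper and tries a few inputs. The last input shows how loose the pattern is: it accepts consecutive dots, which RFC 5322 does not allow in an unquoted local part.

```python
import re

# The email pattern from above, compiled once for reuse.
EMAIL_RE = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")

def is_email(text: str) -> bool:
    # fullmatch() also stops Python's $ from accepting a trailing "\n"
    return EMAIL_RE.fullmatch(text) is not None

for candidate in ["jane.doe+news@example.co.uk", "user@localhost",
                  "no-at-sign.com", "a..b@example.com"]:
    print(candidate, is_email(candidate))
```

Using fullmatch() rather than match() is a deliberate choice: in Python, $ also matches just before a final newline, so "jane@example.com\n" would otherwise slip through.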

2. Matching URLs:

A very basic pattern for http and https URLs (the scheme itself is optional) might look like this:

^(https?://)?(www\.)?[^ ]+\.[a-zA-Z]{2,}(/[^ ]*)?$

^(https?://)?: Matches the optional scheme at the start, "http://" or "https://" (s? makes the "s" optional).
(www\.)?: Matches the optional "www." part.
[^ ]+\.[a-zA-Z]{2,}: Matches the domain and top-level domain, ensuring no spaces are present in the URL.
(/[^ ]*)?: Matches an optional path. Inside the group, a forward slash is followed by zero or more characters that aren't spaces, so it can match just the slash or the slash plus a path. The trailing ? makes the whole group optional, so URLs with or without a path both match.
Note that URLs come in many formats and can contain query parameters, paths, and fragments (#anchors). This regex is quite basic: it will miss some valid URLs and accept some strings that are not URLs.
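Running the URL pattern against a few inputs (a Python sketch; URL_RE is just an illustrative name) makes its looseness concrete. Because [^ ]+ accepts any non-space character, a scheme other than http or https slips through as part of the domain.

```python
import re

URL_RE = re.compile(r"^(https?://)?(www\.)?[^ ]+\.[a-zA-Z]{2,}(/[^ ]*)?$")

samples = [
    "https://www.example.com/docs/page",  # scheme, www, and path
    "example.com",                        # bare domain
    "http://localhost",                   # rejected: no dot before a TLD
    "ftp://example.com",                  # accepted: [^ ]+ swallows "ftp://example"
    "not a url",                          # rejected: spaces are never allowed
]
for s in samples:
    print(f"{s!r:40} {URL_RE.fullmatch(s) is not None}")
```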

3. Matching Phone Numbers in the Format (123) 456-7890:

^\(\d{3}\) \d{3}-\d{4}$

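^\(: Matches the start of the string and the opening parenthesis (escaped, because a bare ( would start a group).
\d{3}: Matches the three-digit area code.
\) : Matches the closing parenthesis followed by a space.
\d{3}: Matches the next three digits.
-\d{4}$: Matches a dash, the last four digits, and the end of the string.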
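A short Python check of the phone pattern, plus a hypothetical variant (PHONE_PARTS_RE, not part of the original answer) that wraps each part in a capturing group so the pieces can be pulled out:

```python
import re

PHONE_RE = re.compile(r"^\(\d{3}\) \d{3}-\d{4}$")

print(PHONE_RE.fullmatch("(123) 456-7890") is not None)  # True
print(PHONE_RE.fullmatch("123-456-7890") is not None)    # False: no parentheses
print(PHONE_RE.fullmatch("(123)456-7890") is not None)   # False: missing space

# Hypothetical variant: same format, but each part is captured.
PHONE_PARTS_RE = re.compile(r"^\((\d{3})\) (\d{3})-(\d{4})$")
m = PHONE_PARTS_RE.fullmatch("(123) 456-7890")
print(m.groups())  # ('123', '456', '7890')
```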

6. Commonly Used Regex Patterns:

Date (YYYY-MM-DD)
Matches the YYYY-MM-DD format, but it also accepts impossible dates such as 2023-19-39. Full validation needs a more complex regex or, more simply, parsing the date with a date library.
^\d{4}-\d{2}-\d{2}$
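One way to combine the two checks, sketched in Python: the regex enforces the shape, and datetime.strptime rejects dates that do not exist on the calendar. is_valid_date() is a hypothetical helper name.

```python
import re
from datetime import datetime

DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")

def is_valid_date(text: str) -> bool:
    """Regex checks the shape; datetime.strptime checks the calendar."""
    if DATE_RE.fullmatch(text) is None:
        return False
    try:
        datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        return False
    return True

print(DATE_RE.fullmatch("2023-19-39") is not None)  # True: the shape is right
print(is_valid_date("2023-19-39"))  # False: month 19 does not exist
print(is_valid_date("2024-02-29"))  # True: 2024 is a leap year
print(is_valid_date("2023-02-29"))  # False
```

The regex step is still useful here: strptime on its own accepts single-digit months and days such as "2023-1-5", so the pattern is what enforces the two-digit format.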

Time (HH:MM with 24-hour clock)
Matches times from 00:00 to 23:59. The hour must have two digits, so 9:30 is rejected.
^([01]\d|2[0-3]):[0-5]\d$
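A quick Python check of the time pattern at its boundaries:

```python
import re

TIME_RE = re.compile(r"^([01]\d|2[0-3]):[0-5]\d$")

# 24:00 and 12:60 fall outside the allowed ranges; 9:30 lacks a leading zero.
for t in ["00:00", "09:30", "23:59", "24:00", "9:30", "12:60"]:
    print(t, TIME_RE.fullmatch(t) is not None)
```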

IP Address (IPv4)
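Matches dotted-decimal IPv4 addresses with each octet from 0 to 255. It also accepts zero-padded octets such as 010, which some software (for example the classic C inet_aton) reads as octal.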
^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$
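Testing the IPv4 pattern in Python shows both the 0 to 255 range check and the zero-padding caveat:

```python
import re

IPV4_RE = re.compile(
    r"^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$"
)

# 256 is out of range, 1.2.3 has only three octets,
# and the zero-padded last address is still accepted.
for addr in ["192.168.1.1", "255.255.255.255", "256.1.1.1", "1.2.3", "192.168.001.010"]:
    print(addr, IPV4_RE.fullmatch(addr) is not None)
```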

MAC Address
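Matches six pairs of hex digits separated by : or -. It does not require the separators to be consistent, and it does not cover other notations such as Cisco's 0123.4567.89ab.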
^([0-9A-Fa-f]{2}[:-]){5}([0-9A-Fa-f]{2})$
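Capturing groups "for future reference" can tighten the MAC pattern. The Python sketch below compares the original with a hypothetical stricter variant that captures the first separator and requires every later one to repeat it through the backreference \1.

```python
import re

MAC_RE = re.compile(r"^([0-9A-Fa-f]{2}[:-]){5}([0-9A-Fa-f]{2})$")

# Hypothetical variant: ([:-]) captures the first separator, and \1 must
# repeat it; (?:...) groups the repeated pairs without capturing them.
MAC_SAME_SEP_RE = re.compile(
    r"^[0-9A-Fa-f]{2}([:-])(?:[0-9A-Fa-f]{2}\1){4}[0-9A-Fa-f]{2}$"
)

for mac in ["00:1A:2B:3C:4D:5E", "00-1a-2b-3c-4d-5e", "00:1A-2B:3C-4D:5E"]:
    print(mac, MAC_RE.fullmatch(mac) is not None,
          MAC_SAME_SEP_RE.fullmatch(mac) is not None)
```

The mixed-separator address on the last line passes the original pattern but fails the variant.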

Hexadecimal Color Code
Matches 3- or 6-digit hex color codes, with or without the leading #. The 4- and 8-digit CSS forms that include an alpha channel are not covered.
^#?([a-fA-F0-9]{6}|[a-fA-F0-9]{3})$
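Because the hex digits sit in a capturing group, the match can also be used to normalize a color. This Python sketch uses a hypothetical normalize_hex() helper and the CSS shorthand rule that each digit of a 3-digit code is doubled.

```python
import re
from typing import Optional

HEX_RE = re.compile(r"^#?([a-fA-F0-9]{6}|[a-fA-F0-9]{3})$")

def normalize_hex(color: str) -> Optional[str]:
    """Return the color as lowercase '#rrggbb', or None if it is not a hex code."""
    m = HEX_RE.fullmatch(color)
    if m is None:
        return None
    digits = m.group(1).lower()
    if len(digits) == 3:
        # Shorthand doubles each digit: f0a -> ff00aa
        digits = "".join(ch * 2 for ch in digits)
    return "#" + digits

print(normalize_hex("F0A"))      # #ff00aa
print(normalize_hex("#1A2B3C"))  # #1a2b3c
print(normalize_hex("#12345"))   # None
```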

Username (8-20 alphanumeric characters)
Matches usernames of 8 to 20 characters made up of ASCII letters and digits only.
^[a-zA-Z0-9]{8,20}$

Password
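Matches passwords of 8 to 20 characters that contain at least one digit, one lowercase letter, one uppercase letter, and one of the listed special characters. Each (?=.*...) is a lookahead: it checks one requirement from the start of the string without consuming any characters, and .{8,20} then enforces the length.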
^(?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[!@#$%^&*()_+{}:"<>?|\[\]\/\\-]).{8,20}$
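A Python check of the password pattern, with one input per failed requirement:

```python
import re

# Each (?=...) lookahead checks one requirement; .{8,20} checks the length.
PASSWORD_RE = re.compile(
    r'^(?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[!@#$%^&*()_+{}:"<>?|\[\]\/\\-]).{8,20}$'
)

# Only "Passw0rd!" passes; the others lack an uppercase letter, a lowercase
# letter, a digit, enough length, or a special character, respectively.
for pw in ["Passw0rd!", "password1!", "PASSWORD1!", "Password!", "Pa1!", "Passw0rd"]:
    print(pw, PASSWORD_RE.fullmatch(pw) is not None)
```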

Postal/ZIP Code (for U.S.)
Matches both 5-digit ZIP codes and ZIP+4 formats for the U.S.
^\d{5}(-\d{4})?$

Credit Card Number
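Matches 16-digit card numbers with optional - separators. It checks the format only: it accepts inconsistent separators such as 1234-56781234-5678, rejects spaces, does not cover cards of other lengths (American Express numbers have 15 digits), and does not verify the check digit.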
^\d{4}-?\d{4}-?\d{4}-?\d{4}$
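A quick Python check of the card pattern (the numbers are made-up test strings):

```python
import re

CARD_RE = re.compile(r"^\d{4}-?\d{4}-?\d{4}-?\d{4}$")

# Plain digits, dashed groups, unevenly dashed groups, space-separated groups
for card in ["1234567812345678", "1234-5678-1234-5678",
             "1234-56781234-5678", "1234 5678 1234 5678"]:
    print(card, CARD_RE.fullmatch(card) is not None)
```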

Social Security Number (U.S. format)
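Matches the U.S. SSN format (123-45-6789). It checks the format only, not whether the number could actually be issued; for example, it accepts 000-00-0000.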
^\d{3}-\d{2}-\d{4}$

UUID
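Matches UUIDs in the canonical 8-4-4-4-12 hex format, restricted to versions 1 to 5 and the RFC 4122 variant. The nil UUID and the newer versions 6 to 8 (RFC 9562) are rejected.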
^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[1-5][0-9a-fA-F]{3}-[89abAB][0-9a-fA-F]{3}-[0-9a-fA-F]{12}$
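In Python, the standard uuid module can generate a version-4 UUID to test the pattern against:

```python
import re
import uuid

UUID_RE = re.compile(
    r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[1-5][0-9a-fA-F]{3}-[89abAB][0-9a-fA-F]{3}-[0-9a-fA-F]{12}$"
)

random_id = str(uuid.uuid4())  # canonical lowercase form, version 4
print(UUID_RE.fullmatch(random_id) is not None)  # True

# The nil UUID has version digit 0, so the [1-5] check rejects it.
print(UUID_RE.fullmatch("00000000-0000-0000-0000-000000000000") is not None)  # False
```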

File Path (Windows format)
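Matches basic Windows file paths with an optional drive letter (C:\). Folder and file names may contain any character except \ / : * ? " < > |, which Windows does not allow in names. Because every folder name must end in a backslash, the pattern stays unambiguous and avoids the catastrophic backtracking that nested quantifiers such as (a+)* can cause. It is still a simplification: it ignores UNC paths (\\server\share) and forward-slash separators, and it also matches an empty string.
^([a-zA-Z]:\\)?([^\\/:*?"<>|]+\\)*[^\\/:*?"<>|]*$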

File Path (Mac/Unix format)
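Matches basic absolute Mac and Unix paths. It rejects paths that contain spaces (which are legal on these systems), relative paths, and the root directory / on its own.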
^(/[^/ ]+)+/?$
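A Python check of the Unix path pattern against absolute, trailing-slash, root-only, relative, and space-containing paths:

```python
import re

UNIX_PATH_RE = re.compile(r"^(/[^/ ]+)+/?$")

# Only the first two are accepted: "/" alone, relative paths,
# and names containing spaces all fail.
for path in ["/usr/local/bin", "/home/alice/", "/", "relative/path", "/Users/alice/My Documents"]:
    print(repr(path), UNIX_PATH_RE.fullmatch(path) is not None)
```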

HTML Tags
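Matches opening and closing HTML tags, including tags with attributes, and captures everything between < and >. It breaks when an attribute value contains >, it also matches comments and <!DOCTYPE>, and it does not check that tags are properly nested, so use an HTML parser for real documents.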
<(/?[^>]+)>
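With re.findall(), the single capturing group returns just the text inside each tag. This Python sketch also shows the failure case where an attribute value contains >:

```python
import re

TAG_RE = re.compile(r"<(/?[^>]+)>")

html = '<p class="intro">Hello <b>world</b></p>'
print(TAG_RE.findall(html))
# ['p class="intro"', 'b', '/b', '/p']

# The > inside the attribute value ends the first match too early.
print(TAG_RE.findall('<a title="1 > 0">x</a>'))
# ['a title="1 ', '/a']
```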