Regex breakdown by anonymous

1 month ago
  • Regular Expressions #regex #[[regular expression]] #Coding #programming

    • 1. What is a Regular Expression?

      A regular expression is a pattern that specifies a set of strings. It's like a wildcard search, but more powerful. At its core, it's a way to search, match, and manipulate text.

    • 2. Basic Building Blocks:

    • Literals: These are the most basic elements. If you search for the regex apple, it will match the string "apple".

    • Dot (.): Matches any single character, except for a newline. Example: h.t will match "hat", "hit", "hot", etc.

    • Character Sets ([]): Matches any one of the characters inside the square brackets. Example: h[aei]t will match "hat", "hit", but not "hot".

    • Negated Character Sets ([^]): Matches any single character not listed inside the square brackets. Example: h[^aei]t will match "hot", but not "hat" or "hit".

    • Quantifiers:

      • *: Matches 0 or more of the preceding token.
      • +: Matches 1 or more of the preceding token.
      • ?: Matches 0 or 1 of the preceding token.
      • {n}: Matches exactly n of the preceding token.
      • {n,}: Matches n or more of the preceding token.
      • {n,m}: Matches between n and m (inclusive) of the preceding token.
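The building blocks above can be tried out directly. The following is a minimal sketch using Python's built-in `re` module; the helper `matching` and the sample word lists are made up for illustration. It uses `re.fullmatch`, so a candidate counts only if the whole string fits the pattern.

```python
import re

def matching(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    return [c for c in candidates if re.fullmatch(pattern, c)]

words = ["hat", "hit", "hot"]

# Literal: the whole string must be exactly "apple" because of fullmatch.
literal_hits = matching(r"apple", ["apple", "apples"])

# Dot: any single character except a newline, so "h\nt" is rejected.
dot_hits = matching(r"h.t", words + ["h\nt"])

# Character set and negated character set.
set_hits = matching(r"h[aei]t", words)
negated_hits = matching(r"h[^aei]t", words)

# Quantifiers applied to the letter "b" between "a" and "c".
runs = ["ac", "abc", "abbc", "abbbc"]
quantifier_hits = {
    pattern: matching(pattern, runs)
    for pattern in [r"ab*c", r"ab+c", r"ab?c", r"ab{2}c", r"ab{2,}c", r"ab{1,2}c"]
}

print(literal_hits, dot_hits, set_hits, negated_hits)
for pattern, hits in quantifier_hits.items():
    print(f"{pattern:10} {hits}")
```

Printing the results side by side makes it easy to see how each quantifier widens or narrows the set of accepted strings.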
    • 3. Some Special Characters:

    • Anchors:

      • ^: Start of a string, or of each line in multiline mode. (e.g., ^apple matches any string that starts with "apple")
      • $: End of a string, or of each line in multiline mode. (e.g., apple$ matches any string that ends with "apple")
    • Escape Sequences:

      • \d: Matches any digit (equivalent to [0-9] for ASCII text; some Unicode-aware engines also match other digit characters).
      • \D: Matches any non-digit.
      • \w: Matches any word character (alphanumeric or underscore).
      • \W: Matches any non-word character.
      • \s: Matches any whitespace (spaces, tabs, etc.).
      • \S: Matches any non-whitespace.
    • Grouping and Capturing:

      • (): Groups several tokens together. You can also use this to capture specific parts of a matched string for future reference.
    • Alternation (|): It acts like a logical OR. Matches either the expression before or the expression after it. Example: apple|banana will match either "apple" or "banana".
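A short sketch of the special characters above, again using Python's `re` module. The sample strings are invented for illustration, and other regex engines may differ in small details.

```python
import re

# Anchors: ^ ties the match to the start, $ to the end.
starts_with_apple = re.search(r"^apple", "apple pie") is not None
not_at_start = re.search(r"^apple", "pineapple") is not None
ends_with_apple = re.search(r"apple$", "pineapple") is not None

# Escape sequences: digits, word characters, whitespace.
numbers = re.findall(r"\d+", "Order 66 shipped 2 boxes")
words = re.findall(r"\w+", "snake_case, kebab-case")
fields = re.split(r"\s+", "a  b\tc")

# Grouping and capturing: each pair of parentheses captures one part.
found = re.search(r"(\d{4})-(\d{2})", "Released 2023-09")
year, month = found.group(1), found.group(2)

# Alternation: either side of the | may match.
fruits = re.findall(r"apple|banana", "apple, cherry, banana")

print(starts_with_apple, not_at_start, ends_with_apple)
print(numbers, words, fields, year, month, fruits)
```

Note that `\w` treats the underscore as a word character but not the hyphen, which is why "kebab-case" is split into two words.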

    • 4. Tips:

    • Start Small: Begin with small patterns and test them. Gradually build up your regex pattern.

    • Use Tools: There are numerous online tools like regex101 which can help you test and debug your regular expressions. These tools often provide real-time feedback, which is invaluable.

    • Be Specific: The more specific your pattern, the less likely you are to get unwanted matches.

    • Practice: Like any other skill, the more you use and practice regex, the more proficient you'll become.

    • 5. Practice:

      Here are some simple exercises to try:

      1. Write a regex that matches email addresses.
      2. Write a regex that matches URLs.
      3. Write a regex that matches phone numbers in the format (123) 456-7890.

Remember, regex patterns can vary based on specific needs. There might be multiple correct solutions.

	- 1. **Matching Email Addresses**:
	      This is a basic regex that matches most common email formats. Fully validating an email address with a regex is considerably more complex.  
	      ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
		- `^[a-zA-Z0-9._%+-]+`: Matches the username part before the "@" symbol. Allows alphanumeric characters as well as some special characters like `.`, `_`, `%`, `+`, and `-`.
		- `@[a-zA-Z0-9.-]+`: Matches the domain name after the "@" symbol.
		- `\.[a-zA-Z]{2,}$`: Matches the top-level domain, like `.com`, `.net`, etc.
	- 2. **Matching URLs**:
	  A very basic example that matches http and https URLs might look like this:  
	  ^(https?://)?(www\.)?[^ ]+\.[a-zA-Z]{2,}(/[^ ]*)?$
		- `^(https?://)?`: Matches the start of the URL which might be "http://" or "https://".
		- `(www\.)?`: Matches the optional "www." part.
		- `[^ ]+\.[a-zA-Z]{2,}`: Matches the domain and top-level domain, ensuring no spaces are present in the URL.
		- `/[^ ]*`: This will match a forward slash followed by zero or more characters that aren't spaces. The `*` quantifier means it can match just the slash, or the slash plus a path.
		- `?`: The question mark after the closing parenthesis makes the entire path group optional, so the regex matches URLs with or without a path.
		- Please note that URLs can have various formats and can contain parameters, paths, and anchors. The above regex is quite basic and may not catch all possible URLs.
	- 3. **Matching Phone Numbers in the Format `(123) 456-7890`**:
	  ^\(\d{3}\) \d{3}-\d{4}$
		- `^\(`: Matches the opening parenthesis.
		- `\d{3}`: Matches three digits.
		- `\)`: Matches the closing parenthesis.
		- ` \d{3}`: Matches a space followed by three digits.
		- `-\d{4}$`: Matches a dash followed by the last four digits.
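The three exercise patterns can be checked against a few inputs. This is a minimal sketch in Python; the email pattern is assembled by joining the three pieces explained above, and the sample strings and the helper `is_match` are made up for illustration.

```python
import re

EMAIL = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")
URL = re.compile(r"^(https?://)?(www\.)?[^ ]+\.[a-zA-Z]{2,}(/[^ ]*)?$")
PHONE = re.compile(r"^\(\d{3}\) \d{3}-\d{4}$")

def is_match(pattern, text):
    """True if the compiled pattern matches the text."""
    return pattern.search(text) is not None

email_results = {
    "user.name+tag@example.com": is_match(EMAIL, "user.name+tag@example.com"),
    "user@localhost": is_match(EMAIL, "user@localhost"),  # no top-level domain
}

url_results = {
    "https://www.example.com/path?q=1": is_match(URL, "https://www.example.com/path?q=1"),
    "example.org": is_match(URL, "example.org"),
    "not a url": is_match(URL, "not a url"),
    # The pattern is loose: other schemes slip through via [^ ]+.
    "ftp://example.com": is_match(URL, "ftp://example.com"),
}

phone_results = {
    "(123) 456-7890": is_match(PHONE, "(123) 456-7890"),
    "123-456-7890": is_match(PHONE, "123-456-7890"),
    "(123)456-7890": is_match(PHONE, "(123)456-7890"),  # missing space
}

for results in (email_results, url_results, phone_results):
    for text, ok in results.items():
        print(f"{text!r}: {ok}")
```

The `ftp://example.com` case shows why the URL pattern should be treated as a starting point: it rejects strings with spaces but does not actually restrict the scheme.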
- ### Commonly used regex searches:
	- 1. **Date (YYYY-MM-DD)**
	   This regex is correct for the given format, but it will also match invalid dates like `2023-19-39`. For complete validation, a more complex regex or another form of date validation would be necessary.  
	- 2. **Time (HH:MM with 24-hour clock)**
	   This is correct. It matches times from `00:00` to `23:59`.  
	- 3. **IP Address (IPv4)**
	   This regex accurately matches IPv4 addresses.  
	- 4. **MAC Address**
	   Accurate for common MAC address formats with `:` or `-` separators.  
	- 5. **Hexadecimal Color Code**
	   This matches 3- or 6-character hex color codes with or without the leading `#`.  
	- 6. **Username (8-20 alphanumeric characters)**
	   Correct for the specified criteria.  
	- 7. **Password**
	   Matches passwords of 8-20 characters that contain at least one digit, lowercase letter, uppercase letter, and special character.  
	- 8. **Postal/ZIP Code (for U.S.)**
	   Matches both 5-digit ZIP codes and ZIP+4 formats for the U.S.  
	- 9. **Credit Card Number**
	   Matches 16-digit credit card numbers with optional `-` separators.  
	- 10. **Social Security Number (U.S. format)**
	   Accurate for the U.S. SSN format.  
	- 11. **UUID**
	   Matches UUIDs in the canonical format.  
	- 12. **File Path (Windows format)**
	   Matches basic Windows file paths, but it's a simplification and may not capture all valid paths.  
	- 13. **File Path (Mac/Unix format)**
		  ^(/[^/ ]+)+/?$
		  Matches basic absolute Mac and Unix-style file paths that contain no spaces.  
	- 14. **HTML Tags**
	   This matches simple opening and closing HTML tags, but won't handle all edge cases, especially for tags with attributes.
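Most entries in the list above describe a pattern without showing one. The sketch below gives one possible pattern per entry, written to fit the descriptions. Apart from the Mac/Unix path pattern, which is taken from the notes, these are illustrative assumptions rather than the patterns the notes were originally written about, and the sample strings are made up.

```python
import re

# Illustrative patterns only; all but "unix_path" are assumptions.
PATTERNS = {
    "date": r"^\d{4}-\d{2}-\d{2}$",
    "time": r"^([01]\d|2[0-3]):[0-5]\d$",
    "ipv4": r"^((25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)\.){3}"
            r"(25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)$",
    "mac": r"^([0-9A-Fa-f]{2}[:-]){5}[0-9A-Fa-f]{2}$",
    "hex_color": r"^#?([0-9A-Fa-f]{3}|[0-9A-Fa-f]{6})$",
    "username": r"^[a-zA-Z0-9]{8,20}$",
    # Lookaheads require a digit, lowercase, uppercase, and special character.
    "password": r"^(?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[^a-zA-Z0-9]).{8,20}$",
    "zip": r"^\d{5}(-\d{4})?$",
    "credit_card": r"^\d{4}-?\d{4}-?\d{4}-?\d{4}$",
    "ssn": r"^\d{3}-\d{2}-\d{4}$",
    "uuid": r"^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$",
    "windows_path": r'^[a-zA-Z]:\\([^\\/:*?"<>|\r\n]+\\)*[^\\/:*?"<>|\r\n]*$',
    "unix_path": r"^(/[^/ ]+)+/?$",
    "html_tag": r"^</?[a-zA-Z][a-zA-Z0-9]*>$",
}

# One string that should match and one that should not, per pattern.
SAMPLES = {
    "date": ("2023-09-15", "15/09/2023"),
    "time": ("23:59", "24:00"),
    "ipv4": ("192.168.0.1", "256.1.1.1"),
    "mac": ("00:1A:2B:3C:4D:5E", "00:1A:2B:3C:4D"),
    "hex_color": ("#1A2B3C", "#12345"),
    "username": ("codewriter42", "short"),
    "password": ("Passw0rd!", "password"),
    "zip": ("12345-6789", "1234"),
    "credit_card": ("1234-5678-9012-3456", "1234-5678-9012"),
    "ssn": ("123-45-6789", "123456789"),
    "uuid": ("123e4567-e89b-12d3-a456-426614174000",
             "123e4567e89b12d3a456426614174000"),
    "windows_path": (r"C:\Users\me\notes.txt", "/usr/local/bin"),
    "unix_path": ("/usr/local/bin", r"C:\Users\me"),
    "html_tag": ("</div>", '<a href="x">'),
}

results = {
    name: (re.search(pattern, SAMPLES[name][0]) is not None,
           re.search(pattern, SAMPLES[name][1]) is not None)
    for name, pattern in PATTERNS.items()
}

# Format-only check: an impossible date still passes the date pattern.
invalid_date_matches = re.search(PATTERNS["date"], "2023-19-39") is not None

for name, (good, bad) in results.items():
    print(f"{name:13} good={good} bad={bad}")
```

As the notes point out for dates, a pattern like this checks shape, not meaning: `2023-19-39` passes the date pattern, so real validation would need an additional step such as parsing the value.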