Skip to content

Commit

Permalink
Replace basic HTML tags with a space when deobfuscating
Browse files Browse the repository at this point in the history
autopull
  • Loading branch information
makyen committed Oct 30, 2024
1 parent 13a9473 commit 0a3c799
Showing 1 changed file with 5 additions and 0 deletions.
5 changes: 5 additions & 0 deletions number_homoglyphs.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,8 @@
# which is Copyright (c) 2015 Rob Dawson under an MIT license:
# https://github.com/codebox/homoglyph/blob/master/LICENSE

BASIC_HTML_TAG_REGEX = regex.compile(r'</?[A-Za-z]+\d?>')

# Hex numbers are primarily used below, due to the possibility of the characters becoming corrupted when the file
# is edited in editors which don't fully support Unicode, or even just on different operating systems.
equivalents = {
Expand Down Expand Up @@ -144,6 +146,9 @@


def normalize(text):
if len(text) > 3:
# Replace things that look like basic HTML tags with a space.
text = BASIC_HTML_TAG_REGEX.sub(' ', text)
return text.translate(translate_table)


Expand Down

0 comments on commit 0a3c799

Please sign in to comment.