
fix(ip.py): ignore non-utf format encoding #288

Merged 1 commit on Oct 12, 2021


chimaoshu
Contributor

Some websites that report your IP address serve content in a non-UTF-8 encoding such as GBK, so the script raises an error when it decodes their response as UTF-8. The bytes that fail to decode (for example, GBK-encoded Chinese characters) do not matter for our purposes, because we only need to match the IP address in the decoded text. They can therefore be ignored during decoding.

@chimaoshu
Contributor Author

Here is an example:
https://raw.githubusercontent.com/chimaoshu/chimaoshu.github.io/master/iptest.txt
This is a txt file containing GBK-encoded characters and an IP address. The index4 entry in the config file is:
"index4": "url:https://raw.githubusercontent.com/chimaoshu/chimaoshu.github.io/master/iptest.txt"

Calling decode('utf8') directly results in:
ERROR:root:'utf-8' codec can't decode byte 0xb8 in position 3: invalid start byte
ERROR:root:Fail to get ipv4 address!

decode('utf8', 'ignore'), on the other hand, skips the GBK bytes and the IP address is matched.
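
To illustrate the behaviour described above, here is a minimal, self-contained sketch. It is not the actual code in ip.py: the helper name extract_ipv4, the regex IPV4_PATTERN, and the sample bytes are all assumptions made up for this example, and the project's real pattern and fetch logic may differ.

```python
import re

# Hypothetical pattern for illustration only; the regex used in ip.py may differ.
IPV4_PATTERN = re.compile(r"(?:\d{1,3}\.){3}\d{1,3}")


def extract_ipv4(raw):
    """Decode bytes as UTF-8, dropping undecodable bytes, and return the
    first IPv4-looking substring, or None if nothing matches."""
    text = raw.decode("utf8", "ignore")
    match = IPV4_PATTERN.search(text)
    return match.group(0) if match else None


# Made-up sample: a GBK-encoded Chinese label followed by an ASCII address
# from the documentation range 203.0.113.0/24.
sample = "本机IP地址: ".encode("gbk") + b"203.0.113.7\n"

try:
    sample.decode("utf8")  # strict decoding rejects the GBK bytes
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

print("strict decode succeeded:", strict_ok)
print("address found with 'ignore':", extract_ipv4(sample))
```

The trade-off is that 'ignore' silently discards (or occasionally misreads) the non-UTF-8 bytes, which is acceptable here only because the address itself is plain ASCII.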

Owner

@NewFuture NewFuture left a comment


Thanks

@NewFuture NewFuture merged commit 914f3c0 into NewFuture:master Oct 12, 2021
2 participants