fix(ip.py): ignore non-utf format encoding #288

chimaoshu · 2021-10-08T13:51:58Z

Some websites that report your IP serve their content in a non-UTF-8 encoding such as GBK, so the script raises an error when it decodes their responses as UTF-8. The offending characters (e.g. Chinese text) are irrelevant to our job, because we only need to match the IP address in the decoded text, so they can safely be ignored during decoding.

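A minimal sketch of the decoding change, assuming the lookup downloads the page with urllib and then regex-matches the first dotted-quad in the text. The names `find_ipv4`, `get_ipv4_from_url` and `IPV4_PATTERN` are hypothetical and the regex is deliberately simple; this is an illustration, not the actual ip.py code.

```python
import re
from typing import Optional
from urllib.request import Request, urlopen

# Hypothetical, deliberately simple dotted-quad pattern; it does not check that octets are <= 255.
IPV4_PATTERN = re.compile(r"(?:[0-9]{1,3}\.){3}[0-9]{1,3}")


def find_ipv4(raw: bytes) -> Optional[str]:
    """Decode a response body as UTF-8, dropping undecodable bytes, and return the first IPv4 match."""
    text = raw.decode("utf8", "ignore")  # 'ignore' silently skips GBK (or other non-UTF-8) bytes
    match = IPV4_PATTERN.search(text)
    return match.group() if match else None


def get_ipv4_from_url(url: str, timeout: float = 60) -> Optional[str]:
    """Hypothetical fetch helper: download url and extract the first IPv4 address from the body."""
    with urlopen(Request(url), timeout=timeout) as resp:
        return find_ipv4(resp.read())
```

Only the `decode` call differs from a strict version; the regex then runs on the same text minus the bytes that could not be decoded.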

chimaoshu · 2021-10-08T15:24:24Z

Here is an example:
https://raw.githubusercontent.com/chimaoshu/chimaoshu.github.io/master/iptest.txt
This is a txt file containing both GBK-encoded characters and an IP address. The index4 entry in the config file is:
"index4": "url:https://raw.githubusercontent.com/chimaoshu/chimaoshu.github.io/master/iptest.txt"

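For anyone reproducing this, a sketch of how the index4 value above might be read and split. It assumes (this is not confirmed in the PR) that the text before the first colon names the lookup method and the rest is its argument; `parse_index` and `CONFIG_TEXT` are hypothetical.

```python
import json
from typing import Optional, Tuple

# The index4 entry from the comment above, wrapped in a minimal JSON object.
CONFIG_TEXT = """{
    "index4": "url:https://raw.githubusercontent.com/chimaoshu/chimaoshu.github.io/master/iptest.txt"
}"""


def parse_index(value: str) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: split 'method:argument' at the first colon.

    Assumption: a value without a colon is a bare method name with no argument.
    """
    method, sep, arg = value.partition(":")
    return (method, arg) if sep else (value, None)


config = json.loads(CONFIG_TEXT)
method, url = parse_index(config["index4"])
```

Splitting at the first colon only matters because the URL itself contains `:` after `https`; `str.partition` keeps everything after the first separator intact.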
Decoding with a plain decode('utf8') fails with:
ERROR:root:'utf-8' codec can't decode byte 0xb8 in position 3: invalid start byte
ERROR:root:Fail to get ipv4 address!
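The first log line is simply `str()` of the `UnicodeDecodeError` passed to `logging.error`. The contents of iptest.txt aren't reproduced here, so the payload below is a hypothetical one chosen so that 0xB8 (a byte UTF-8 only allows as a continuation byte, never as a start byte) sits at position 3, which yields the same message.

```python
import logging

# Same layout as Python's default basicConfig format, so output looks like the quoted log lines.
logging.basicConfig(format="%(levelname)s:%(name)s:%(message)s")

# Hypothetical payload: three ASCII bytes ("IP:"), then 0xB8 at index 3.
payload = b"IP:\xb8 1.2.3.4"


def decode_strict(raw: bytes):
    """Decode as strict UTF-8; on failure, log the error and return (None, exc)."""
    try:
        return raw.decode("utf8"), None
    except UnicodeDecodeError as exc:
        logging.error(exc)  # emits ERROR:root:'utf-8' codec can't decode byte 0xb8 ...
        return None, exc


text, err = decode_strict(payload)
```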

Using decode('utf8', 'ignore') instead skips the GBK characters, and the IP is matched.
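A self-contained comparison of the error handlers, using an in-memory GBK body instead of the real URL (the sample text is an assumption). Strict decoding fails on the first GBK byte, while `'ignore'`, the handler this PR uses, drops the undecodable bytes and leaves the IP intact; `'replace'` is shown only for contrast.

```python
import re
from typing import Optional

IPV4 = re.compile(r"(?:[0-9]{1,3}\.){3}[0-9]{1,3}")


def try_decode(raw: bytes, errors: str) -> Optional[str]:
    """Decode raw as UTF-8 with the given error handler; None if decoding raises."""
    try:
        return raw.decode("utf8", errors)
    except UnicodeDecodeError:
        return None


# Assumed sample body: "中文" in GBK (bytes D6 D0 CE C4) followed by ASCII text.
body = "中文 IP 1.2.3.4".encode("gbk")

strict = try_decode(body, "strict")     # None: 0xD6 is not followed by a continuation byte
ignored = try_decode(body, "ignore")    # the GBK bytes are dropped
replaced = try_decode(body, "replace")  # undecodable bytes become U+FFFD

found = [IPV4.search(text).group() for text in (ignored, replaced)]
```

Ignoring cannot damage the address: GBK is ASCII-compatible, so the IP is stored as plain ASCII bytes, and Python's UTF-8 decoder never absorbs an ASCII byte into an invalid multi-byte sequence. GBK trail bytes in the 0x40-0x7E range do survive as ASCII letters or symbols, but never as digits or `.`, so they cannot forge part of an address.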

NewFuture

Thanks

fix(ip.py): ignore non-utf format encoding

d6dfb59

NewFuture approved these changes Oct 12, 2021


NewFuture merged commit 914f3c0 into NewFuture:master Oct 12, 2021
