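I want to mask every kind of sensitive data that appears in a flat log file (usernames, passwords, API keys, database connection strings, endpoints, secrets, and even any custom variable that holds a secret).
Here is the script I am currently using: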
import re

def mask_secrets(log_file):
    # Read the log file
    with open(log_file, 'r') as file:
        log_data = file.read()
    # Define the pattern to search for sensitive data
    pattern = r'\b(\w+)\b:\s*(\w+)'
    # Mask the sensitive data in the log data
    log_data = re.sub(pattern, r'\1: ********', log_data)
    # Write the masked log data back to the file
    with open(log_file, 'w') as file:
        file.write(log_data)

# Usage example
log_file = 'path/to/your/log/file.txt'
mask_secrets(log_file)
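But it masks the time field of the timestamp, masks only part of a value such as a dotted username, and fails to mask some secrets, such as the database connection string, the database password, and custom variables that contain secrets: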
2022-01-01 12: ********:00 - User login successful - username: ********.doe, password: ********
2022-01-02 09: ********:15 - API request made - endpoint: /api/data, api_key: ********
2022-01-03 14: ********:22 - User login failed - username: ********.smith, password: ********
2022-01-04 18: ********:10 - API request made - endpoint: /api/data, api_key: ********
2022-01-06 17: ********:22 - DB Connection failed - DB String=guad8b237d7$vu87s, DB password=isbdihkaw978vw8a783wgfb
2022-01-07 19: ********:10 - API request made - endpoint: /api/data, api_key= xyz789s87dv7ghs
2022-01-07 19: ********:10 - User login failed - foo=uyai6d3ibdqi%*^^@%, bar=862479dhb7656%^&^%%^))_=
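The regular expression used in the script needs to be modified accordingly. Ideally, I want to mask any value that appears to the right of an "=" sign. Is this possible?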
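
For reference, here is a minimal sketch of the behavior I am after, not a tested solution. It assumes that every key starts with a letter or underscore (so the digits in a timestamp are never treated as a key), that the separator is either "=" or ":", and that a value runs up to the next comma or the end of the line. The names KEY_VALUE and mask_values are hypothetical, chosen only for this illustration.

```python
import re

# Assumption: a key is a word starting with a letter or underscore, followed
# by "=" or ":" (optionally padded with spaces or tabs), and the value is
# everything up to the next comma or the end of the line.
KEY_VALUE = re.compile(r'\b([A-Za-z_]\w*)([ \t]*[:=][ \t]*)([^,\s][^,\n]*)')

def mask_values(text):
    """Replace every value after a key and separator with a fixed mask."""
    # \g<1> is the key, \g<2> the separator with its original spacing.
    return KEY_VALUE.sub(r'\g<1>\g<2>********', text)

if __name__ == '__main__':
    sample = (
        "2022-01-06 17:45:22 - DB Connection failed - "
        "DB String=guad8b237d7$vu87s, DB password=isbdihkaw978vw8a783wgfb"
    )
    print(mask_values(sample))
```

Because the key must start with a letter, "12:34:00" is left alone, while "username: john.doe" is masked as a whole. A secret that itself contains a comma would only be masked up to that comma, and anything shaped like "word:" in the free-text part of a message (a URL scheme, for example) would be masked as well, so this is a starting point rather than a complete answer.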