
Python Django Middleware: Limiting Request Frequency and Detecting Search Engine Crawlers by IP

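This article covers a public-facing Python Django website that receives malicious requests from some users. It shows how to use a middleware to limit how often a client can access the site within a given time window. It also shows how to determine whether an IP address belongs to a search engine spider (crawler).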

1. Middleware code

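```python
import time

from django.utils.deprecation import MiddlewareMixin

MAX_REQUEST_PER_SECOND = 2  # maximum number of requests allowed per second


class RequestBlockingMiddleware(MiddlewareMixin):
    def process_request(self, request):
        now = time.time()
        # Timestamps of this client's recent requests. They are kept in the
        # session (cookie-based), so SessionMiddleware must run first.
        request_queue = request.session.get('request_queue', [])
        if len(request_queue) < MAX_REQUEST_PER_SECOND:
            # Still under the limit: just record this request
            request_queue.append(now)
            request.session['request_queue'] = request_queue
        else:
            time0 = request_queue[0]
            # The oldest of the last MAX_REQUEST_PER_SECOND requests was less
            # than 1 second ago, so the client is too fast: delay it by 5 s.
            # Note: time.sleep() blocks the worker serving this request.
            if (now - time0) < 1:
                time.sleep(5)
            # Record this request and drop the oldest timestamp
            request_queue.append(time.time())
            request.session['request_queue'] = request_queue[1:]
```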
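The session-based queue above can also be written without Django as a sliding-window check. The sketch below is a variant under stated assumptions, not the author's code:

- SlidingWindowLimiter and allow are hypothetical names.
- Requests are keyed by an arbitrary string such as the client IP. This matches the title better than a session does, because a client that discards cookies gets a fresh, empty session on every request.
- It reports a rejection instead of sleeping, and rejected requests are not recorded.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Hypothetical helper: allow at most max_requests per `window` seconds per key."""

    def __init__(self, max_requests=2, window=1.0):
        self.max_requests = max_requests
        self.window = window
        # key (assumed here to be a client IP) -> timestamps of recent accepted requests
        self.history = defaultdict(deque)

    def allow(self, key, now=None):
        """Return True and record the request if `key` is under the limit, else False."""
        now = time.time() if now is None else now
        queue = self.history[key]
        # Drop timestamps that have fallen out of the window
        while queue and now - queue[0] >= self.window:
            queue.popleft()
        if len(queue) >= self.max_requests:
            # Over the limit; unlike the session version, this request is not recorded
            return False
        queue.append(now)
        return True


limiter = SlidingWindowLimiter(max_requests=2, window=1.0)
print(limiter.allow('203.0.113.5', now=100.0))  # True
print(limiter.allow('203.0.113.5', now=100.2))  # True
print(limiter.allow('203.0.113.5', now=100.5))  # False: third request within 1 s
```

Inside a middleware, one could call limiter.allow(request.META['REMOTE_ADDR']) and return HttpResponse(status=429) when it is False. That frees the worker immediately instead of holding it in time.sleep(5).

Keep these limits in mind:
- The dictionary lives in one process's memory and never evicts keys. With several worker processes, each keeps its own counts; a site-wide limit would need a shared store such as Django's cache framework.
- Behind a reverse proxy, REMOTE_ADDR is the proxy's address.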

2. Register the middleware in settings.py (with the code above saved as app/middleware.py, its path is app.middleware.RequestBlockingMiddleware)

```python
# Enable the RequestBlocking middleware
MIDDLEWARE = [
    'django.middleware.security.SecurityMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.middleware.common.CommonMiddleware',
    'django.middleware.csrf.CsrfViewMiddleware',
    'app.middleware.RequestBlockingMiddleware',  # after sessions, before auth
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'django.contrib.messages.middleware.MessageMiddleware',
    'django.middleware.clickjacking.XFrameOptionsMiddleware',
]
```
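The position matters because Django runs each middleware's process_request from the top of the list down. request.session therefore only exists once SessionMiddleware has run. Below is a minimal sketch of a check for the placement described in the comment. check_blocking_position is a hypothetical helper, and the dotted path assumes the middleware lives in app/middleware.py.

```python
SESSION_MW = 'django.contrib.sessions.middleware.SessionMiddleware'
AUTH_MW = 'django.contrib.auth.middleware.AuthenticationMiddleware'


def check_blocking_position(middleware, blocking='app.middleware.RequestBlockingMiddleware'):
    """Hypothetical check: True if `blocking` comes after sessions and before auth."""
    try:
        pos = middleware.index(blocking)
        return middleware.index(SESSION_MW) < pos < middleware.index(AUTH_MW)
    except ValueError:
        # One of the three entries is missing from the list
        return False
```

Such a check could run in a test suite against settings.MIDDLEWARE, so a later reordering of the list is caught early.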

3. Determine whether an IP belongs to a search engine

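```python
import socket


def getHost(ip):
    """Reverse DNS lookup: return the hostname for `ip`, or None if there is none."""
    try:
        result = socket.gethostbyaddr(ip)
        if result:
            return result[0]
        return None
    except (socket.herror, socket.gaierror):
        # No PTR record for this address, or the address is invalid
        pass
    return None
```

```python
>>> getHost("203.208.60.11")
'crawl-203-208-60-11.googlebot.com'
```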

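The returned hostname tells you whether the IP belongs to a search engine crawler. Googlebot addresses, for example, resolve to names under googlebot.com. However, whoever controls an IP range also controls its reverse DNS, so the hostname alone can be faked. Confirm it with a forward lookup that maps the name back to the same IP.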
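The forward-confirmation step can be sketched as follows. is_search_engine_ip is a hypothetical name, and the suffix list is an assumption to be checked against each search engine's own documentation. The lookup functions can be passed in so the logic can be tested without network access. socket.gethostbyname_ex resolves IPv4 addresses only.

```python
import socket

# Assumed crawler hostname suffixes; verify against each engine's published guidance
CRAWLER_SUFFIXES = ('.googlebot.com', '.google.com', '.search.msn.com')


def is_search_engine_ip(ip, reverse=socket.gethostbyaddr, forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS: the PTR name must be a crawler domain
    and must resolve back to the same IP."""
    try:
        hostname = reverse(ip)[0]
    except (socket.herror, socket.gaierror):
        return False  # no PTR record
    if not hostname.endswith(CRAWLER_SUFFIXES):
        return False  # not a known crawler domain
    try:
        addresses = forward(hostname)[2]  # (name, aliases, ip_list)
    except (socket.herror, socket.gaierror):
        return False  # name does not resolve forward
    return ip in addresses
```

Because DNS lookups are slow, a site would typically cache the result per IP rather than run this check on every request.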

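Note: Python 2 and Python 3 use slightly different syntax in the except clause. Python 2.6 and 2.7 also accept the "as" form, so that is the form that works in both.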

1) Python 2

```python
try:
    print (1/0)
except ZeroDivisionError, err:      # comma, then the name bound to the exception
    print ('Exception: ', err)
```

2) Python 3

```python
try:
    print (1/0)
except ZeroDivisionError as err:    # "as", then the name bound to the exception
    print ('Exception: ', err)
```
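Python 3 also changed what happens to the bound name, not just the syntax. Under PEP 3110, the variable is deleted when the except clause ends, so it cannot be used after the block. The small check below demonstrates this; name_survives_except_block is a hypothetical name used only for illustration.

```python
def name_survives_except_block():
    """Return True if the name bound by 'except ... as err' is still usable afterwards."""
    try:
        1 / 0
    except ZeroDivisionError as err:
        pass  # Python 3 deletes `err` when this block ends
    try:
        err  # referencing the deleted name
    except NameError:  # UnboundLocalError is a subclass of NameError
        return False
    return True


print(name_survives_except_block())  # False under Python 3
```

To keep the exception for later use, assign it to another variable inside the except block.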