What to do about a Python memory leak? Troubleshooting tips from a real pitfall
2022-02-02 05:47:24 【Charlie is not a dog】
Pitfall write-up: Python memory leak troubleshooting tips
Abstract: I recently ran into a memory leak at work, and the operations team was urgently calling for a fix. While solving it, I also systematically recorded the common approaches to troubleshooting memory leaks.
First, we pinned down the symptoms:
1. The service was released on the 13th, and memory began to climb from the 23rd. Instances were restarted when the alert threshold was reached, after which memory climbed even faster.
2. The service is deployed on two chip types, A and B. Apart from model inference, almost all preprocessing and post-processing code is shared. Yet only the B-chip deployment raised memory leak alerts; the A-chip deployment showed nothing abnormal.
Approach 1: Compare the source code and second-party library dependencies of the old and new versions
Given the two symptoms above, the first suspect was the update released on the 13th. A problem could have come in from two places:
1. Our own code
2. Second-party dependencies
We investigated both:
- On the one hand, we compared the source code of the two versions using the Git history and the BeyondCompare tool, and read closely the code paths that handle the A and B chips separately. Nothing abnormal was found.
- On the other hand, we compared the packages in the two images with the pip list command. Only the version of pytz, the time zone library, had changed.
After some analysis, we judged it unlikely that this package was causing the leak, so we set it aside for the time being.
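The package comparison can be sketched as follows. This is a minimal illustration, assuming each image's package list has been exported in `name==version` form (for example with `pip freeze`); the helper names and the sample versions are made up for the example and are not taken from the incident.

```python
def parse_freeze(text):
    """Parse `name==version` lines into a {name: version} dict."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything not pinned with "=="
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        packages[name.lower()] = version
    return packages


def diff_packages(old_text, new_text):
    """Return {name: (old_version, new_version)} for every package that differs."""
    old, new = parse_freeze(old_text), parse_freeze(new_text)
    changed = {}
    for name in sorted(set(old) | set(new)):
        if old.get(name) != new.get(name):
            # None marks a package that exists in only one image
            changed[name] = (old.get(name), new.get(name))
    return changed


# Made-up listings, for illustration only
old_image = "numpy==1.19.5\npytz==2021.1\n"
new_image = "numpy==1.19.5\npytz==2021.3\n"
changes = diff_packages(old_image, new_image)
print(changes)
```

A package present in only one image shows up with `None` on the other side, which is often the more interesting kind of difference.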
At this point, trying to find the leak by studying the source changes between the old and new versions seemed to be a dead end.
Approach 2: Monitor memory changes and compare the old and new versions
Commonly used Python memory analysis tools include pympler, objgraph, and tracemalloc.
First, we used objgraph to observe and tally the 50 most common object types in the old and new services.
Common objgraph calls are as follows:
```python
import objgraph

# Counts of the most common object types
objgraph.show_most_common_types(limit=50)
# Types whose counts have grown since the previous call
objgraph.show_growth(limit=30)
```
To make the trend easier to follow, I wrapped this in a small helper that writes the counts straight to a CSV file.
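```python
import os
import time

import objgraph
import pandas as pd

stats_path = "./types_stats.csv"

# One row per snapshot: a column for each type, plus a timestamp
stats = objgraph.most_common_types(limit=50)
tmp_dict = dict(stats)
tmp_dict['req_time'] = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
df = pd.DataFrame.from_dict(tmp_dict, orient='index').T

if os.path.exists(stats_path):
    # Append without repeating the header row; columns line up only
    # while the set of top-50 types stays the same between snapshots
    df.to_csv(stats_path, mode='a', header=False, index=False)
else:
    df.to_csv(stats_path, index=False)
```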
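The CSV can also be checked without plotting. The sketch below uses only the standard library and assumes the layout described above (one column per type plus a `req_time` column); the function name and the sample rows are invented for illustration.

```python
import csv
import io


def growing_types(csv_text, min_growth=1):
    """Return {type_name: increase} between the first and last snapshot rows."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if len(rows) < 2:
        return {}
    first, last = rows[0], rows[-1]
    growth = {}
    for name in first:
        if name == "req_time":
            continue
        try:
            delta = int(float(last[name])) - int(float(first[name]))
        except (TypeError, ValueError):
            # Missing or non-numeric cell: ignore this column
            continue
        if delta >= min_growth:
            growth[name] = delta
    return growth


# Made-up snapshot data, for illustration only
sample = (
    "dict,tuple,traceback,req_time\n"
    "1000,500,0,2022-01-01 10:00:00\n"
    "1040,500,12,2022-01-01 10:05:00\n"
)
result = growing_types(sample)
print(result)
```

Comparing only the first and last rows is a deliberate simplification; a type that fluctuates but ends higher would also be reported, so the result is a starting point rather than proof of a leak.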
As the figure below shows, we ran a batch of pictures against the old and new versions for one hour. Everything stayed rock steady: the count of every type barely moved.
Then it occurred to me that, before handing a build over to testing or releasing it, I usually run a batch of malformed pictures as a boundary check.
The testers had surely verified these abnormal cases before release, but with nothing to lose, I ran them anyway.
The calm was broken. As the red box in the figure below shows, the counts of several important types (dict, function, method, tuple, traceback) began to rise.
Meanwhile, the memory used by the container kept growing with no sign of leveling off.
So although we could not yet confirm that this was the online problem, we had at least found a bug. Going back to the logs, we noticed something strange:
Normally, when a special picture triggers an exception, the log should contain the following output, with the check_image_type method appearing only once in the exception stack.
In practice, however, check_image_type was printed many times, and the number of repetitions grew with the number of test requests.
So I went back and studied the exception handling code.
The exception is declared as follows:
The exception is raised as follows:
After some thought, I worked out the likely root cause:
Each exception instance here is effectively defined as a global variable, and it is this global instance that gets raised. Because the global stays referenced after the exception has been handled, neither it nor the traceback attached to it is ever reclaimed.
So as the number of calls with malformed pictures grows, the exception stack keeps growing too. And because the stack also references the request's picture data, memory grows by megabytes at a time.
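The effect can be reproduced in a few lines. This is a minimal sketch of the pattern described above, not the service's actual code: ImageFormatError, SHARED_ERROR, the two check functions, and handle are hypothetical names. It counts the entries in the exception's traceback chain after each request, once for a shared module-level instance and once for a fresh instance per raise.

```python
class ImageFormatError(Exception):
    """Hypothetical exception type standing in for the service's own."""


# Anti-pattern: a single instance created at import time and raised repeatedly
SHARED_ERROR = ImageFormatError("unsupported image format")


def check_image_type_shared(image):
    raise SHARED_ERROR


def check_image_type_fresh(image):
    # A new instance per raise starts with an empty traceback
    raise ImageFormatError("unsupported image format")


def traceback_depth(exc):
    """Count the entries in the traceback chain attached to exc."""
    depth = 0
    tb = exc.__traceback__
    while tb is not None:
        depth += 1
        tb = tb.tb_next
    return depth


def handle(check, image):
    try:
        check(image)
    except ImageFormatError as exc:
        return traceback_depth(exc)


shared_depths = [handle(check_image_type_shared, b"bad") for _ in range(3)]
fresh_depths = [handle(check_image_type_fresh, b"bad") for _ in range(3)]
print("shared instance:", shared_depths)
print("fresh instance: ", fresh_depths)
```

With the shared instance, every raise adds new entries on top of the traceback already stored on the object, which matches check_image_type showing up more and more often in the logged stack. Raising a new instance each time keeps the depth constant, and is one straightforward way to avoid the problem.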
But this code had been online for a long time. If it really was the cause of the online problem, why had nothing gone wrong before, and why was the A chip unaffected?
With these two questions in mind, we ran two checks:
First, we confirmed that the problem also occurs on the previous version and on the A chip.
Second, we looked at the online call records and found that a new customer had recently been calling the service at one site with a large number of similarly malformed pictures (most sites run on B chips). We pulled some online instances and saw the same pattern in their logs.
That largely answered both questions. After this bug was fixed, the memory leak no longer occurred.
In fairness, the job could have ended there. But I asked myself: if that log line had never been printed, or a lazy developer had not logged the full exception stack, how would we have located the problem?
With that question in mind, I kept digging into objgraph and pympler.
We already knew that malformed pictures trigger the leak, so let's focus on what changes in that case:
The following calls show, each time an exception occurs, which objects were added to memory and how much memory they take.
1. Using objgraph:

```python
objgraph.show_growth(limit=20)
```

2. Using pympler:

```python
from pympler import tracker

tr = tracker.SummaryTracker()
tr.print_diff()
```
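To make the idea behind these growth reports concrete, here is a standard-library sketch that counts gc-tracked objects by type name before and after some work and lists the types that grew. It is only an approximation of the idea, not objgraph's or pympler's implementation; LeakyRecord and _cache are hypothetical stand-ins for objects a service might leak.

```python
import gc
from collections import Counter


def type_counts():
    """Count gc-tracked objects by type name."""
    gc.collect()
    return Counter(type(obj).__name__ for obj in gc.get_objects())


def growth(before, after, limit=20):
    """Return (type_name, current_count, increase) for types that grew."""
    grown = [
        (name, after[name], after[name] - before.get(name, 0))
        for name in after
        if after[name] > before.get(name, 0)
    ]
    grown.sort(key=lambda item: item[2], reverse=True)
    return grown[:limit]


class LeakyRecord:
    """Hypothetical object that accumulates in a module-level cache."""


_cache = []

before = type_counts()
_cache.extend(LeakyRecord() for _ in range(100))  # simulate the leak
after = type_counts()

report = growth(before, after)
for name, count, delta in report[:5]:
    print(f"{name:<20} {count:>8} +{delta}")
```

Only objects tracked by the garbage collector are counted, so simple values such as bytes or integers do not appear; that is enough to spot leaked containers, instances, frames, and tracebacks.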
The following code writes out which references keep these new objects alive, for further analysis.
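```python
import objgraph

# logger is the service's existing logger; the ./dots directory must already exist
gth = objgraph.growth(limit=20)
for gt in gth:
    # Each entry is (type name, current count, increase since the previous call)
    logger.info("growth type:%s, count:%s, growth:%s" % (gt[0], gt[1], gt[2]))
    # Skip types with too many instances; their graphs are unreadable
    if gt[2] > 100 or gt[1] > 300:
        continue
    # What refers to one sample object of this type
    objgraph.show_backrefs(objgraph.by_type(gt[0])[0], max_depth=10, too_many=5,
                           filename="./dots/%s_backrefs.dot" % gt[0])
    # What that sample object refers to
    objgraph.show_refs(objgraph.by_type(gt[0])[0], max_depth=10, too_many=5,
                       filename="./dots/%s_refs.dot" % gt[0])
    # Reference chain from a module down to the sample object
    objgraph.show_chain(
        objgraph.find_backref_chain(objgraph.by_type(gt[0])[0], objgraph.is_proper_module),
        filename="./dots/%s_chain.dot" % gt[0]
    )
```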
Graphviz's dot tool converts the generated graph files into pictures like the ones below:

```
dot -Tpng xxx.dot -o xxx.png
```
Because there are too many instances of basic types such as dict, list, frame, tuple, and method to inspect usefully, they are filtered out here.
Reference chain of the newly added ImageReqWrapper objects:
Reference chain of the newly added traceback objects:
Admittedly, prior knowledge naturally draws our attention to traceback and the IMAGE_FORMAT_EXCEPTION it belongs to.
But even without it, asking why objects that should have been reclaimed after the service call were not, and in particular why every traceback remains unreclaimable once IMAGE_FORMAT_EXCEPTION has been raised, together with a few small experiments, should locate the root cause quickly.
From this we can draw the following conclusion:
Because the raised exception is never reclaimed, the exception stack, the request body, and the other objects it references cannot be reclaimed either. Since the request body contains the picture data, every such request leaks memory on the order of megabytes.
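One of the small experiments mentioned earlier might look like the sketch below. It uses a weak reference to check whether a request object is still alive after the call returns. All names here (ImageFormatError, SHARED_ERROR, ImageRequest, handle) are hypothetical stand-ins, and the 1 MB payload is arbitrary.

```python
import gc
import weakref


class ImageFormatError(Exception):
    """Hypothetical exception type standing in for the service's own."""


# Shared module-level instance, as in the leaking pattern
SHARED_ERROR = ImageFormatError("unsupported image format")


class ImageRequest:
    """Hypothetical request body carrying picture bytes."""

    def __init__(self, size):
        self.data = bytearray(size)


def check_image_type(request):
    raise SHARED_ERROR


def handle(request):
    try:
        check_image_type(request)
    except ImageFormatError:
        return "rejected"


request = ImageRequest(1024 * 1024)
probe = weakref.ref(request)  # lets us observe the object without keeping it alive
handle(request)
del request
gc.collect()
# The traceback stored on SHARED_ERROR references the frames of handle() and
# check_image_type(), and their local variables still point at the request
held_by_traceback = probe() is not None

SHARED_ERROR.__traceback__ = None  # drop the stored traceback
gc.collect()
released_after_clear = probe() is None
print(held_by_traceback, released_after_clear)
```

Clearing `__traceback__` is used here only to show where the reference comes from. In a real service the more natural remedy is to stop raising a shared instance in the first place.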
In addition, along the way I found that Python 3 ships with its own memory analysis tool, tracemalloc. The following code relates memory growth to lines of code; it may not be precise, but it can still provide some clues.
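```python
import gc
import tracemalloc

# Keep up to 25 frames per allocation, then take a baseline snapshot
tracemalloc.start(25)
snapshot = tracemalloc.take_snapshot()


# The original listing omits the enclosing function; the name here is illustrative.
# logger is the service's existing logger.
def log_memory_diff():
    global snapshot
    gc.collect()
    snapshot1 = tracemalloc.take_snapshot()
    top_stats = snapshot1.compare_to(snapshot, 'lineno')
    logger.warning("[ Top 20 differences ]")
    for stat in top_stats[:20]:
        # Only report lines whose allocated size has grown
        if stat.size_diff < 0:
            continue
        logger.warning(stat)
    # The next call compares against this fresh baseline
    snapshot = tracemalloc.take_snapshot()
```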
Author: Charlie is not a dog. Please include a link to the original when reprinting. Thank you.