Lessons Learned: Python Memory Leak Troubleshooting Tips
2022-02-01 07:41:48 【Huawei cloud developer community】
Abstract: A service recently ran into a memory leak, and the operations team urgently asked for a fix. While solving the problem, I also wrote up the common approaches to troubleshooting memory leaks.
This article is shared from the Huawei Cloud community post "Python Memory Leak Troubleshooting Tips", author: lutianfei.
First, let's pin down the symptoms:
The service was released on the 13th, and from the 23rd onward its memory usage kept rising. The instance was restarted whenever the alert threshold was reached, after which memory climbed even faster.
The service is deployed on two kinds of chips, A and B. Apart from model inference, almost all of the preprocessing and postprocessing code is shared. Yet only the B-chip deployment raised memory leak alerts; the A-chip deployment showed nothing abnormal.
Approach 1: Compare the old and new source code and the second-party library dependencies
Given the two observations above, the first suspicion was that the release on the 13th had introduced the problem. Such a change could come from two places:
Our own code
Second-party dependencies
Checking each in turn:
On the one hand, I compared the source code of the two versions using both the Git history and BeyondCompare, and read closely the code paths that handle the A and B chips separately. Nothing abnormal turned up.
On the other hand, I compared the packages in the two images with the pip list command. Only the version of pytz, the time zone library pulled in as a dependency, had changed.
After some investigation, it seemed unlikely that this package was causing the leak, so I set it aside for the time being.
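The package comparison above can also be scripted. The following is a minimal sketch, assuming each image's package list has been saved with `pip list --format=freeze`; the function names and the version numbers in the example are made up for illustration and are not taken from the original investigation.

```python
def parse_freeze(text):
    """Parse `pip list --format=freeze` output into {package_name: version}."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and entries without a pinned version
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        packages[name.lower()] = version
    return packages


def diff_packages(old_text, new_text):
    """Return {name: (old_version, new_version)} for every package that differs."""
    old, new = parse_freeze(old_text), parse_freeze(new_text)
    changes = {}
    for name in sorted(set(old) | set(new)):
        if old.get(name) != new.get(name):
            # None means the package is absent from that image
            changes[name] = (old.get(name), new.get(name))
    return changes


if __name__ == "__main__":
    # Hypothetical package lists; the versions are illustrative only
    old_image = "numpy==1.19.5\npytz==2021.1\n"
    new_image = "numpy==1.19.5\npytz==2021.3\n"
    for name, (before, after) in diff_packages(old_image, new_image).items():
        print("%s: %s -> %s" % (name, before, after))
```

A diff like this only shows which versions changed; deciding whether a changed package could plausibly leak memory still requires reading its changelog or code.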
At this point, trying to find the leak by studying the source changes between the old and new versions had hit a dead end.
Approach 2: Monitor memory changes and the differences between the old and new versions
Commonly used Python memory analysis tools currently include pympler, objgraph, and tracemalloc.
First, I used objgraph to observe and collect statistics on the 50 most common object types in both the old and new services.
Commonly used objgraph commands:
```python
import objgraph

# Object counts for the most common types, process-wide
objgraph.show_most_common_types(limit=50)
# Types whose counts grew since the previous call
objgraph.show_growth(limit=30)
```
To make the trend easier to observe, I wrote a small wrapper that writes the data straight to a CSV file.
```python
import os
import time

import objgraph
import pandas as pd

stats_path = "./types_stats.csv"

# Snapshot the 50 most common types as {type_name: count}
stats = objgraph.most_common_types(limit=50)
tmp_dict = dict(stats)
req_time = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())
tmp_dict['req_time'] = req_time

# One row per snapshot, one column per type
df = pd.DataFrame.from_dict(tmp_dict, orient='index').T
if os.path.exists(stats_path):
    # Append without repeating the header row.
    # Columns are not realigned, so rows can drift if the top-50 set changes.
    df.to_csv(stats_path, mode='a', header=False, index=False)
else:
    df.to_csv(stats_path, index=False)
```
As shown in the figure below, after running a batch of images against both the old and new versions for 1 hour, everything was rock steady: the count of every type stayed flat.
Then it occurred to me that before handing a build over to testing or going live, I usually run a batch of malformed images for boundary validation.
The testers had surely verified these abnormal cases before release, but with nothing to lose I ran them anyway.
That broke the calm. As shown in the red box in the figure below, the counts of important types such as dict, function, method, tuple, and traceback started to rise.
At the same time, the memory used by the running image kept growing with no sign of leveling off.
At this point I could not yet confirm that this was the production problem, but it was at least a bug. Going back to the logs, I found something odd: normally, for an exception caused by a special image, the log should output the following, that is, the check_image_type method should appear only once in the exception stack.
In reality, check_image_type was printed multiple times, and the number of repetitions grew with the number of test requests.
So I re-read the exception handling code.
The exception is declared as follows:
The exception is raised as follows:
After some thought, I roughly worked out the root cause:
Each exception instance here is effectively defined as a global variable, and it is this same global instance that gets raised every time. Because the global instance stays referenced after the exception has been handled, it is never garbage-collected, and neither is the exception stack (traceback) attached to it.
So as more and more requests with malformed images arrive, the information in the exception stack keeps growing. And because the stack also references the requested image data, memory grows by megabytes at a time.
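The effect described above can be reproduced in a few lines. This is a minimal sketch, not the service's actual code: `ImageFormatError`, the two `check_image_type_*` functions, and the helpers are hypothetical stand-ins. It compares raising one shared module-level instance with raising a fresh instance per call, and measures how many entries the exception's traceback chain holds.

```python
class ImageFormatError(Exception):
    """Hypothetical stand-in for the service's image format exception."""


# Anti-pattern: one module-level instance that every failing request raises
IMAGE_FORMAT_EXCEPTION = ImageFormatError("unsupported image format")


def check_image_type_shared(image_bytes):
    # Raises the same global instance each time
    raise IMAGE_FORMAT_EXCEPTION


def check_image_type_fresh(image_bytes):
    # Raises a new instance each time
    raise ImageFormatError("unsupported image format")


def traceback_depth(exc):
    """Count the entries in the traceback chain attached to an exception."""
    depth = 0
    tb = exc.__traceback__
    while tb is not None:
        depth += 1
        tb = tb.tb_next
    return depth


def depths_after_requests(check, n):
    """Call `check` n times and record the traceback depth after each failure."""
    depths = []
    for _ in range(n):
        try:
            check(b"not an image")
        except ImageFormatError as exc:
            depths.append(traceback_depth(exc))
    return depths


if __name__ == "__main__":
    print("shared instance:", depths_after_requests(check_image_type_shared, 5))
    print("fresh instance: ", depths_after_requests(check_image_type_fresh, 5))
```

With the shared instance, each raise extends the traceback already stored on the object, which would match the repeated check_image_type entries seen in the log. The article does not show its actual fix; raising a new instance per call is one common way to avoid the accumulation.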
But this code had been in production for a long time. If it really was the cause of the production problem, why had nothing gone wrong before, and why was the A-chip deployment unaffected? With these two questions in mind, we ran two checks:
First, we confirmed that the previous version and the A-chip deployment show the same problem.
Second, we looked at the production call records and found a new customer who had recently been calling the service at one site (most sites run on B chips) with a large number of images that trigger this problem. We pulled some production instances and saw the same pattern in their logs.
That largely answers both questions. After this bug was fixed, the memory leak no longer occurred.
To be fair, with the problem solved the work could have ended here. But I asked myself a question: if that log line had not been printed, or if a developer had been lazy and not logged the full exception stack, how would I have located the problem?
With that question in mind, I kept studying objgraph and pympler.
We already know that the leak occurs with malformed images, so let's focus on what is different in that case:
With the following commands, we can see which objects are added to memory each time an exception occurs, and how much memory they take.
Using objgraph:
```python
objgraph.show_growth(limit=20)
```
Using pympler:
```python
from pympler import tracker

tr = tracker.SummaryTracker()
tr.print_diff()
```
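With the following code, you can render which references keep these newly added objects alive, for further analysis.
```python
import logging
import os

import objgraph

logger = logging.getLogger(__name__)
os.makedirs("./dots", exist_ok=True)

# objgraph.growth() returns (type_name, total_count, growth) tuples
gth = objgraph.growth(limit=20)
for name, count, delta in gth:
    logger.info("growth type:%s, count:%s, growth:%s" % (name, count, delta))
    # Assumed reading of the thresholds: skip types with growth > 100 or
    # count > 300, whose graphs are too large to read
    if delta > 100 or count > 300:
        continue
    objs = objgraph.by_type(name)
    if not objs:
        continue
    objgraph.show_backrefs(objs, max_depth=10, too_many=5, filename="./dots/%s_backrefs.dot" % name)
    objgraph.show_refs(objs, max_depth=10, too_many=5, filename="./dots/%s_refs.dot" % name)
    # find_backref_chain() takes a single object, so use one instance
    objgraph.show_chain(
        objgraph.find_backref_chain(objs[0], objgraph.is_proper_module),
        filename="./dots/%s_chain.dot" % name,
    )
```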
Graphviz's dot tool converts the graph data generated above into images:
```shell
dot -Tpng xxx.dot -o xxx.png
```
Because basic types such as dict, list, frame, tuple, and method are so numerous that they are hard to inspect, they are filtered out here.
Reference chain of the newly added ImageReqWrapper objects:
Reference chain of the newly added traceback objects:
Admittedly, prior knowledge naturally draws our attention to traceback and the IMAGE_FORMAT_EXCEPTION exception it belongs to.
But even without it, by asking why objects that should have been reclaimed after the service call were not, in particular why every traceback object referenced by IMAGE_FORMAT_EXCEPTION could not be reclaimed, and by running a few small experiments, I believe the root cause could be located quickly.
Also, on memory leaks caused by caching exceptions in Python 3, there is a Zhihu article that explains the issue fairly clearly: zhuanlan.zhihu.com/p/38600861
From this we can draw the following conclusion: because the raised exception cannot be reclaimed, the corresponding exception stack and the objects it references, such as the request body, cannot be reclaimed either. Since the request body contains image data, each such request leaks memory on the order of megabytes.
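One of the "small experiments" mentioned earlier might look like the following. It is a sketch under stated assumptions: `Request`, `handle`, `serve_once`, and `CACHED_ERROR` are hypothetical names, and a 1 MB bytearray stands in for the image payload. A weak reference shows that the request object outlives the call for as long as the cached exception holds its traceback.

```python
import gc
import weakref


class Request:
    """Hypothetical request wrapper carrying an image payload."""

    def __init__(self, payload):
        self.payload = payload


# Cached (module-level) exception instance, as in the leaking service
CACHED_ERROR = ValueError("bad image format")


def handle(req):
    # The frame of this call, including `req`, is recorded in the traceback
    raise CACHED_ERROR


def serve_once():
    """Handle one failing request and return a weak reference to it."""
    req = Request(bytearray(1024 * 1024))  # roughly 1 MB per request
    ref = weakref.ref(req)
    try:
        handle(req)
    except ValueError:
        pass  # handled, but CACHED_ERROR still holds the traceback
    return ref


if __name__ == "__main__":
    ref = serve_once()
    gc.collect()
    print("request alive after the call:", ref() is not None)

    # Dropping the stored traceback releases the frames and their locals
    CACHED_ERROR.__traceback__ = None
    gc.collect()
    print("request alive after clearing the traceback:", ref() is not None)
```

Clearing `__traceback__` is used here only to show where the reference comes from. In a real service, not reusing the exception instance is the simpler remedy.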
In addition, during this investigation I also found that Python 3 ships with its own memory analysis tool, tracemalloc. The following code shows the relationship between lines of code and memory. It may not be precise, but it can still provide some clues.
```python
import gc
import logging
import tracemalloc

logger = logging.getLogger(__name__)

# At service startup: begin tracing, keeping up to 25 frames per allocation
tracemalloc.start(25)
snapshot = tracemalloc.take_snapshot()


def log_memory_diff():  # hypothetical name; call it after each request
    """Log the code lines whose allocations grew since the last snapshot."""
    global snapshot
    gc.collect()
    snapshot1 = tracemalloc.take_snapshot()
    top_stats = snapshot1.compare_to(snapshot, 'lineno')
    logger.warning("[ Top 20 differences ]")
    for stat in top_stats[:20]:
        if stat.size_diff < 0:
            continue
        logger.warning(stat)
    # Take a new baseline for the next comparison
    snapshot = tracemalloc.take_snapshot()
```