On Debugging of Multi-threaded Embedded Software and Call Stack Unwinding Methods

Source: Computer Technology and Application
  Abstract: Multi-core processor platforms are now widely used even in embedded devices. Debugging multi-threaded embedded software is more complicated than debugging on desktop platforms because of the limitations of embedded platforms: their resources are sufficient to run only a pre-defined set of applications, not to debug them. Most known debugging solutions for parallel applications are intended for desktops or high-performance computers rather than embedded systems. In addition, most debugging solutions give no information on the system-wide behavior of an application. Thread Visualizer addresses these problems and helps developers debug their multi-threaded embedded applications. The tool was developed at the Samsung Research Center in Moscow and the Samsung Advanced Institute of Technology. Thread Visualizer supports ARM-based platforms running Linux.
  Key words: Debug, multi-thread, embedded, stack, unwinding.
  1. Introduction
  Multi-core processor platforms are now widely used even in embedded devices, and the complexity of software for these platforms is rising dramatically. The software is becoming more complicated and multi-threaded: in some modern embedded applications, hundreds of threads are created and run simultaneously. Debugging such applications becomes harder because different threads run on different cores, share resources, and face synchronization problems, race conditions, etc. Another problem is the limitations of embedded platforms. Embedded resources are sufficient to run only a pre-defined set of applications, not to debug them: only about 1-5% of CPU time and several megabytes of RAM are available.
  Most known debugging solutions for parallel applications are intended for desktops or high-performance computers rather than embedded systems, and most of them give no information on the system-wide behavior of an application. The Thread Visualizer tool addresses these problems and helps developers debug their multi-threaded embedded applications. It was developed at the Samsung Research Center in Moscow and the Samsung Advanced Institute of Technology. Thread Visualizer supports ARM-based platforms running Linux.
  Thread Visualizer visualizes the hierarchy between the main process and its threads and the synchronization dependencies between them, identifies every thread uniquely (including the full backtrace from the thread creation call), and offers other useful features. This substantially simplifies debugging of complex multi-threaded applications on embedded systems.
  2. Thread Visualizer
  Thread Visualizer is a tool for debugging multi-threaded embedded applications. It supports ARM-based platforms running Linux. Its target-host architecture makes it possible to overcome the resource limitations of embedded devices: a lightweight target part collects data that describes application behavior and sends it to the host over a network connection, while all heavyweight operations such as data storage, analysis and visualization run on the host. To collect data, Thread Visualizer uses the System-Wide Analyzer of Performance (SWAP) engine [1].
  SWAP is a profiler and performance analyzer for embedded applications, also developed at the Samsung Research Center in Moscow and the Samsung Advanced Institute of Technology. It is based on the kprobes technique [2] and provides dynamic instrumentation of kernel and user-space functions. SWAP does not require modification of the application’s source code or re-compilation.
  Using the SWAP engine, Thread Visualizer instruments the necessary functions and collects data on each instrumented call: the function name, the Process IDentifier (PID)/Thread IDentifier (TID) of the caller, the number of the CPU on which the function was executed, the time stamp of the call, the function arguments, etc. The collected data and the executed binary files are then processed, and the final visualization is produced.
  Thread Visualizer visualizes the hierarchy between the main process and its threads and the synchronization dependencies, and provides unique thread identifying, source code mapping, a timing view, statistics and other features.
  With Thread Visualizer, a developer can examine the system-wide behavior of an application rather than only perform a number of specific operations on parallel threads, as conventional debuggers allow. The developer can see the main process, threads and synchronization objects of the application and the relations between them, such as parent-child relations between processes and threads and synchronization dependencies. The unique thread identifier combines the numerical identifier generally used in Linux with the thread function name and the full backtrace from the thread creation point, so Thread Visualizer shows where and how every thread was created, including source code mapping.
  Additionally, the timing view shows a time line with the execution segments of instrumented functions for every thread. Statistics on calls of instrumented functions are also provided for every thread.
  Some modern embedded applications create hundreds of threads and synchronization objects. Thread Visualizer is extremely useful for analysis of such applications.
  Thread Visualizer’s thread hierarchy, synchronization dependencies, thread identifying and source code mapping visualization are shown in Fig. 1.
  Timing view visualization is shown in Fig. 2.
  The thread identifying feature of Thread Visualizer, the development barriers related to stack unwinding limitations, and the proposed solution are described in detail below.
  3. Thread Identifying
  3.1 Standard Linux Thread Identifying
  In Linux, every thread has a unique numerical identifier: the Process IDentifier (PID)/Thread IDentifier (TID). Such identification is not informative, however, because it gives no clue about the source code of a particular thread.
  3.2 Thread Identifying in Intel Thread Profiler
  To make thread identifying more transparent, Intel Thread Profiler [3] includes in the thread identifier, together with the PID/TID, the name of the function that is started on thread creation (the thread function name). For example, the pthread_create POSIX API [4] accepts the address of the thread function as its third argument. See an example of thread creation source code in Fig. 3.
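  As a hedged illustration of the call described above (a minimal sketch, not the code of Fig. 3), the following C fragment passes a thread function to pthread_create as the third argument. The names worker and run_worker_once are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Thread function: its address is the third argument of pthread_create. */
static void *worker(void *arg)
{
    int *value = arg;
    *value += 1;   /* trivial work so that the effect is observable */
    return NULL;
}

/* Hypothetical helper: starts one thread that runs worker() and waits
 * for it. Returns 0 on success, otherwise the pthread error number. */
int run_worker_once(int *value)
{
    pthread_t thread;
    int err;

    assert(value != NULL);
    err = pthread_create(&thread, NULL, worker, value);
    if (err != 0)
        return err;
    return pthread_join(thread, NULL);
}
```

  A tool that intercepts this call can resolve the third argument to the symbolic name worker and attach it to the PID/TID of the new thread.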
  But an application can create many threads with the same thread function. In this case, the thread identifier does not contain enough information to identify the thread. The next section considers a solution to this problem.
  3.3 Unique Thread Identifying in Thread Visualizer
  To identify threads uniquely, let’s include in the thread identifier, together with the PID/TID and the thread function name, the full call backtrace from the thread creation point. To make clear what a call backtrace is, look at the C code example shown in Fig. 4. In this example, the full call backtrace from the thread creation point is a chain of symbolic function names: func2, func1 and main. Including the full call backtrace from the thread creation point in the thread identifier gives full and unique information about every created thread.
  Thread Visualizer uses the SWAP engine to collect stack snapshots and register values. It then unwinds the stack snapshots to restore the full call backtraces from the thread creation points.
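  One possible way to represent such an identifier is sketched below in C. This is an assumption made for illustration, not Thread Visualizer’s actual data structure: the type thread_identity and the function same_creation_point are hypothetical. The sketch shows why the backtrace helps: two threads that share a thread function are still told apart when they were created through different call chains.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_BACKTRACE_DEPTH 8

/* Hypothetical record of the identifier: TID, thread function name and
 * the call chain that led to thread creation (innermost caller first). */
struct thread_identity {
    int tid;
    const char *thread_function;
    const char *backtrace[MAX_BACKTRACE_DEPTH];
    int depth;
};

/* Returns true when both threads run the same thread function and were
 * created through the same chain of calls. The TID is ignored on
 * purpose: it says nothing about the source code of the thread. */
bool same_creation_point(const struct thread_identity *a,
                         const struct thread_identity *b)
{
    assert(a != NULL && b != NULL);
    if (strcmp(a->thread_function, b->thread_function) != 0 ||
        a->depth != b->depth)
        return false;
    for (int i = 0; i < a->depth; i++) {
        if (strcmp(a->backtrace[i], b->backtrace[i]) != 0)
            return false;
    }
    return true;
}
```

  For example, a thread created via func2, func1 and main and a thread created by a different (hypothetical) function func3 called from main compare as different even if both run the same thread function.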
  Known stack unwinding methods, their limitations and Thread Visualizer’s method are described below.
  4. Stack Unwinding Methods
  Well-known stack unwinding methods are based on the use of a frame pointer register [5] or on debug information in the binary file that describes the stack frame layout. Both have limitations, which are described below. To overcome them, Thread Visualizer proposes a new method, also described below.
  4.1 Method Based on Frame Pointer Register Using
  Let’s consider the method based on the frame pointer register first. It is the simplest and best-known method of call stack unwinding. To use it, the application should be built by the gcc/g++ compiler with the -fno-omit-frame-pointer option. The option has to be given explicitly, because code optimization (options -O ... -O3) otherwise typically omits the frame pointer.
  The values of the frame pointer register are stored in the stack frames during the application’s execution. See the example of ARM assembly instructions that store the frame pointer and return address values on the stack and restore them:
  Here, fp is the frame pointer register, lr is the link register (it holds the return address of the called function), and pc is the program counter register, into which the return address of the called function is restored. The list of consecutive return addresses is a backtrace.
  An example of the stack of an executing process for the code shown in Fig. 4 at the thread creation point, with stored frame pointer values, is shown in Fig. 5.
  A stack unwinding code example for a stack with stored frame pointer values is shown in Fig. 6.
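  The walk along the chain of stored frame pointer values can be sketched in C as follows. This is a minimal illustration over a copied stack snapshot, not the code of Fig. 6. It assumes 32-bit words and a frame layout in which fp points at the saved return address and the caller’s fp is stored one word below it; other compilers and options may lay frames out differently. The names stack_snapshot, read_word and unwind_by_frame_pointer are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy of a thread's stack: words[0] is the word stored at address base. */
struct stack_snapshot {
    uint32_t base;            /* lowest address covered by the snapshot */
    const uint32_t *words;
    size_t count;
};

/* Reads the word at addr. Returns 0 if addr is unaligned or outside
 * the snapshot, 1 on success. */
static int read_word(const struct stack_snapshot *s, uint32_t addr,
                     uint32_t *out)
{
    size_t index;

    if (addr < s->base || (addr - s->base) % 4 != 0)
        return 0;
    index = (addr - s->base) / 4;
    if (index >= s->count)
        return 0;
    *out = s->words[index];
    return 1;
}

/* Assumed layout: [fp] holds the saved lr, [fp - 4] the caller's fp.
 * Fills backtrace with return addresses, innermost first, and returns
 * how many were collected. */
size_t unwind_by_frame_pointer(const struct stack_snapshot *s, uint32_t fp,
                               uint32_t *backtrace, size_t max_depth)
{
    size_t depth = 0;

    assert(s != NULL && backtrace != NULL);
    while (depth < max_depth) {
        uint32_t return_address, caller_fp;

        if (!read_word(s, fp, &return_address) ||
            !read_word(s, fp - 4, &caller_fp))
            break;
        backtrace[depth++] = return_address;
        if (caller_fp <= fp)   /* caller frames lie at higher addresses */
            break;
        fp = caller_fp;
    }
    return depth;
}
```

  The check that every caller frame lies at a higher address than the current one stops the walk on a zero or corrupted frame pointer instead of looping.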
  The limitation of this method is that storing extra register values (the frame pointer) decreases application performance, which rules the method out on embedded platforms. Other drawbacks are that in some cases the application can’t be re-compiled, and that some components of the application (e.g., libraries) omit the frame pointer.
  4.2 Method Based on Binary File Debug Information on Stack Frame Layout
  Now let’s consider the method based on debug information in the binary file that describes the stack frame layout. To use this approach with Linux and the gcc/g++ compilers, the application should be built with the -g option. The binary file will then have a .debug_frame DWARF debug section [6], which contains information on the stack frame layout.
  In this case, storing frame pointer values is not necessary. Every stack frame is described by the Canonical Frame Address (CFA), the start address of the stack frame; the base register, from which the CFA offset is calculated (the stack pointer in most cases; the frame pointer, instruction pointer or others in rare cases); and the offsets from the CFA of the stored register values (link register, frame pointer, instruction pointer, etc.).
  An example of the stack of an executing process for the code shown in Fig. 4 at the thread creation point, together with the DWARF information essential for stack unwinding, is shown in Fig. 7.
  The DWARF standard also defines the method for unwinding the stack from this stack frame layout information.
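  How a CFA and a return address offset drive the unwinding can be sketched in C as follows. The sketch is deliberately simplified and rests on stated assumptions: one fixed rule per procedure (real DWARF call frame information can change from instruction to instruction), the stack pointer as the only base register, and rules supplied in a hypothetical frame_rule table instead of being parsed from .debug_frame. All type and function names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy of a thread's stack: words[0] is the word stored at address base. */
struct stack_snapshot {
    uint32_t base;
    const uint32_t *words;
    size_t count;
};

/* Hypothetical, simplified stand-in for one frame description entry. */
struct frame_rule {
    uint32_t start, end;   /* code range [start, end) of the procedure */
    uint32_t cfa_offset;   /* CFA = sp + cfa_offset */
    int32_t ra_offset;     /* return address is stored at CFA + ra_offset */
};

static int read_word(const struct stack_snapshot *s, uint32_t addr,
                     uint32_t *out)
{
    size_t index;

    if (addr < s->base || (addr - s->base) % 4 != 0)
        return 0;
    index = (addr - s->base) / 4;
    if (index >= s->count)
        return 0;
    *out = s->words[index];
    return 1;
}

static const struct frame_rule *find_rule(const struct frame_rule *rules,
                                          size_t count, uint32_t pc)
{
    for (size_t i = 0; i < count; i++) {
        if (pc >= rules[i].start && pc < rules[i].end)
            return &rules[i];
    }
    return NULL;
}

/* Collects return addresses, innermost first, until pc leaves the code
 * covered by the rules or the snapshot is exhausted. */
size_t unwind_by_cfa(const struct stack_snapshot *s,
                     const struct frame_rule *rules, size_t rule_count,
                     uint32_t pc, uint32_t sp,
                     uint32_t *backtrace, size_t max_depth)
{
    size_t depth = 0;

    assert(s != NULL && rules != NULL && backtrace != NULL);
    while (depth < max_depth) {
        const struct frame_rule *rule = find_rule(rules, rule_count, pc);
        uint32_t cfa, return_address;

        if (rule == NULL)
            break;
        cfa = sp + rule->cfa_offset;
        if (!read_word(s, cfa + (uint32_t)rule->ra_offset, &return_address))
            break;
        backtrace[depth++] = return_address;
        pc = return_address;   /* continue in the caller */
        sp = cfa;              /* the caller's sp at the call site */
    }
    return depth;
}
```

  Taking the CFA as the caller’s stack pointer is what lets the walk proceed without any stored frame pointer.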
  This approach has some limitations: in some cases the binary file doesn’t have a .debug_frame section and can’t be recompiled; some components (e.g., libraries) do not have a .debug_frame section; or .debug_frame (or part of it) is corrupted.
  Thread Visualizer uses the .debug_frame-based approach when possible and its own method when that approach can’t be used because of the limitations listed above.
  4.3 Thread Visualizer’s Method
  To unwind the stack, it is necessary to analyze the stack (or a copy of the stack) frame by frame, from the stack pointer to the stack start address (because the stack grows towards lower addresses), and to collect the return address from every frame. To do so, the size of every frame and the position of the return address in every frame must be determined. The frame size can be determined by analyzing the code of the corresponding procedure: finding all instructions that decrease the stack pointer and summing the decrease values.
  The binary code of a procedure usually includes three parts: a prologue, a body and an epilogue. The prologue begins at the start address of the procedure and contains the instructions that establish the stack frame size and store the necessary register values (including the return address) in the stack frame. Thread Visualizer’s method is therefore based on analysis of the prologue code of procedures that have no debug information on the stack frame layout (or whose debug information is broken or can’t be used for other reasons).
  According to the proposed method, the prologue instructions that decrease the stack pointer (pushing register values onto the stack, allocating space for local variables, etc.) are located and processed; the frame size is calculated by summing the decrease values, which gives the frame start address. The instruction that stores the return address of the called procedure is then located, which gives the offset of the return address from the frame start address. The backtrace can thus be formed as an array of return addresses.
  The number of prologue instructions to analyze should be set, depending on the CPU architecture, to at least the maximum prologue size for that architecture. Strictly speaking, “prologue” is a compiler-dependent concept. If the compiler doesn’t support the prologue notion, some reasonable number of first instructions should be analyzed. Note that the start address of the last analyzed instruction must be less than the address of the last executed instruction (at the moment the stack snapshot was made or execution of the process was stopped) for the first procedure in the backtrace, and less than the respective return address for the other procedures in the backtrace.
  The start address of the code of the first procedure can be found from the known address of the last executed instruction, which lies inside the first procedure. The start address of the code of each next procedure can be found from the return address of the previous procedure, which lies inside the next procedure. The last procedure is detected in one of the following ways: its frame is the last one in the stack snapshot, or the return address of the called procedure is outside the text block of the binary file. The start address of a procedure’s code can be determined from an address inside it by using the block of the binary file that contains the procedures’ start addresses, sizes, symbolic names, etc. (the .symtab section of a binary file in ELF format).
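  The lookup of a procedure by an address inside it can be sketched in C as follows. The sketch assumes that the procedure entries have already been extracted from the symbol table into a hypothetical proc_symbol array; reading the ELF file itself is not shown, and the names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory form of one procedure entry of the symbol
 * table: start address, size in bytes and symbolic name. */
struct proc_symbol {
    uint32_t start;
    uint32_t size;
    const char *name;
};

/* Returns the procedure whose code range [start, start + size) contains
 * addr, or NULL when addr lies outside every known procedure. A NULL
 * result for a return address is one way to detect the end of the
 * backtrace. */
const struct proc_symbol *find_procedure(const struct proc_symbol *table,
                                         size_t count, uint32_t addr)
{
    assert(table != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (addr >= table[i].start && addr - table[i].start < table[i].size)
            return &table[i];
    }
    return NULL;
}
```

  A linear scan keeps the sketch short; since this work runs on the host, a table sorted by start address with binary search would be a natural refinement for large binaries.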
  Finally, let’s review an example of analyzing a procedure’s prologue with the proposed method. See an example of a procedure’s prologue in ARM assembly:
  In the given prologue example, the instructions that decrease the stack pointer are push {lr}, which decreases the stack pointer by 4, and sub sp, sp, #12, which decreases the stack pointer by 12. The total frame size is therefore 4 + 12 = 16. The instruction that stores the return address is push {lr}, which places the return address in the first word of the frame, where it can be found and read.
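  The same calculation can be sketched in C over raw instruction words. This is a minimal illustration under stated assumptions, not Thread Visualizer’s implementation: it handles only 32-bit ARM (not Thumb) code, and only three unconditional instruction forms, namely push of a register list, push of a single register and sub sp, sp, #imm. The names frame_layout and analyze_prologue are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct frame_layout {
    uint32_t frame_size;  /* total bytes by which the prologue lowers sp */
    uint32_t ra_offset;   /* bytes from the frame start down to saved lr */
    int ra_found;         /* nonzero once an instruction storing lr is seen */
};

static uint32_t rotate_right(uint32_t value, unsigned bits)
{
    bits &= 31;
    return bits ? (value >> bits) | (value << (32 - bits)) : value;
}

static unsigned count_bits(uint32_t value)
{
    unsigned n = 0;
    for (; value != 0; value >>= 1)
        n += value & 1;
    return n;
}

/* Scans count instruction words from the start of a procedure. The
 * caller limits count so that no instruction beyond the last executed
 * one (or beyond the return address) is analyzed. */
struct frame_layout analyze_prologue(const uint32_t *code, size_t count)
{
    struct frame_layout f = {0, 0, 0};

    assert(code != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        uint32_t insn = code[i];

        if ((insn & 0xFFFF0000u) == 0xE92D0000u) {
            /* push {reglist}, encoded as stmdb sp!, {reglist} */
            uint32_t regs = insn & 0xFFFFu;
            if ((regs & (1u << 14)) && !f.ra_found) {
                /* lr is stored above all lower-numbered registers;
                 * only pc (bit 15) could be stored above it */
                f.ra_offset = f.frame_size + 4 * (1 + ((regs >> 15) & 1));
                f.ra_found = 1;
            }
            f.frame_size += 4 * count_bits(regs);
        } else if ((insn & 0xFFFF0FFFu) == 0xE52D0004u) {
            /* push {reg}, encoded as str reg, [sp, #-4]! */
            f.frame_size += 4;
            if (((insn >> 12) & 0xFu) == 14 && !f.ra_found) {
                f.ra_offset = f.frame_size;
                f.ra_found = 1;
            }
        } else if ((insn & 0xFFFFF000u) == 0xE24DD000u) {
            /* sub sp, sp, #imm: 8-bit value rotated right by 2 * rot */
            f.frame_size += rotate_right(insn & 0xFFu,
                                         2 * ((insn >> 8) & 0xFu));
        }
    }
    return f;
}
```

  For the two-instruction prologue discussed above (words 0xE52DE004 and 0xE24DD00C), the sketch yields a frame size of 16 and places the return address 4 bytes below the frame start, that is, in the first word of the frame. With the stack pointer taken after the prologue, the return address is read at sp + frame_size - ra_offset.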
  5. Results and Discussion
  Thread Visualizer helps to debug multi-threaded embedded applications and shows the developer the system-wide behavior of an application. One direction of further development is adding new features helpful for debugging multi-threaded applications, such as kernel thread visualization, concurrency level checking, etc. Another direction is further research on backtracing techniques to extend the backtracing approach to Thumb [7] code and to other CPU architectures and compilers.
  6. Conclusions
  Thread Visualizer’s method of call stack unwinding can be used when the frame pointer is omitted; when the binary file has no debug information on the stack frame layout; when some components of the binary file (e.g., libraries) have no debug information; or when the debug information or part of it is corrupted or can’t be used for other reasons and the application can’t be recompiled. In other words, the method allows stack unwinding without any debug information. This is especially significant for embedded applications.
  References
  [1] A.A. Gerenkov, E.A. Gorelkina, S.S. Grekhov, S.Y. Dianov, J. Jeong, O. Kokachev, L.V. Komkov, S.B. Lee, M.P. Levin, System-wide analyzer of performance: performance analysis of multi-core computing systems with limited resources, in: Proceedings of Eurocon 2009 International IEEE Conference Devoted to the 150-Anniversary of Alexander S. Popov, Saint-Petersburg, Russia, May 18-23, 2009, pp. 1302-1307.
  [2] P. Panchamukhi, Kernel Debugging with Kprobes, Linux Technology Center, IBM India Software Labs, available online at: http://www.ibm.com/developerworks/library/lkprobes/index.html, Aug 19, 2004.
  [3] Boost Performance Optimization and Multicore Scalability on Windows and Linux, available online at: http://software.intel.com/en-us/intel-vtune.
  [4] pthread_create(3), Linux Man Page, available online at: http://linux.die.net/man/3/pthread_create.
  [5] Call Stack, Frame Pointer Structure, Wikipedia, available online at: http://en.wikipedia.org/wiki/Frame_pointer#Structure.
  [6] DWARF Debugging Format Standards (See Call Frame Information), available online at: http://www.dwarfstd.org/Download.php.
  [7] ARM Architecture, Thumb, available online at: http://en.wikipedia.org/wiki/ARM_architecture#Thumb.