The Architecture of Recon3

@(#)r3uarch.htm 1.2 - 04/11/00

A Tool Set for Dynamic Analysis of Software

Norman Wilde
Laura White
Aaron Tarnosky

University of West Florida


Table of Contents:


1. Introduction
1.1 Purpose
1.2 Scope
1.3 Definitions, Acronyms and Abbreviations
1.4 References
1.5 Overview
2. General Description
2.1 Product Perspective
2.2 Product Background
2.2.1 Trace Architectures
2.2.2 Trace Events
2.2.3 Trace Modes
2.2.4 Optional Trace Data Items
3. Design Issues and Notes
3.1 Tool Communication
3.2 Trace Control
3.3 Trace Record and Trace File Format
3.4 Distribution and Documentation
3.5 Anticipated Changes
3.6 Implementation Languages
3.7 File Names

1. Introduction

1.1 Purpose

This document describes the objectives, architecture, and terminology of the Recon3 tool set. Recon3 represents the results of several years' experience with the Recon2 Software Reconnaissance tool, including trial use at several Software Engineering Research Center (SERC) affiliates [WILD.92, WILD.95A, WILD.96]. It also reflects interviews with developers in SERC and elsewhere concerning the ways they use instrumentation and dynamic analysis to help understand software systems [WILD.99].

The expected audience for this document is the developers of Recon3.

1.2 Scope

Recon3 is a collection of tools to facilitate dynamic software analysis through instrumentation. The different tools provide for (see Figure 1):
  1. Instrumenting a user's target program by inserting instrumentation statements into the source code. The user may also insert instrumentation statements by hand. When executed, each instrumentation statement triggers a trace event that the user wants to monitor, causing a trace record to be written.
  2. Generating and managing the trace records produced as the user's instrumented target program executes.
  3. Analyzing and displaying the trace records.
  4. Controlling the above process with a "user friendly" Graphical User Interface (GUI).

The main objective of Recon3 is to allow Software Engineers to analyze existing software more effectively for reuse, maintenance (evolution) and debugging activities. A specific application of Recon3 is to allow the user to perform Software Reconnaissance, a technique for locating where particular features of an unfamiliar target program are implemented by comparing traces of execution ([WILD.95A]).
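
The trace comparison behind Software Reconnaissance can be sketched as a set difference: the components that appear in a trace taken with the feature but not in a trace taken without it are candidates for the feature's implementation. The sketch below is illustrative only. It assumes each trace event has already been reduced to an identifier string such as "file.c:120", and the function name recon_difference is hypothetical; it is not the algorithm used by the Recon2 or Recon3 analyzers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if `id` appears in the array `set` of `n` event identifiers. */
static int contains(const char *const *set, size_t n, const char *id)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(set[i], id) == 0)
            return 1;
    }
    return 0;
}

/*
 * Hypothetical helper: copies into `out` every identifier that occurs in the
 * trace taken WITH the feature but not in the trace taken WITHOUT it.
 * Duplicates are reported once. Returns the number of identifiers written
 * (never more than `out_cap`).
 */
size_t recon_difference(const char *const *with_feature, size_t n_with,
                        const char *const *without_feature, size_t n_without,
                        const char **out, size_t out_cap)
{
    size_t found = 0;

    assert(out != NULL || out_cap == 0);
    for (size_t i = 0; i < n_with && found < out_cap; i++) {
        const char *id = with_feature[i];
        /* Skip events also seen without the feature, and repeats. */
        if (contains(without_feature, n_without, id) ||
            contains(out, found, id))
            continue;
        out[found++] = id;
    }
    return found;
}
```

A linear search keeps the sketch short; a real analyzer working on large traces would more likely sort the identifiers or use a hash table.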

Experience with Recon2 has indicated that the following criteria are important if Recon3 is to be easily developed and widely used:

  1. Flexible. Recon3 must be usable across as wide a range of target programs as possible. To achieve this, Recon3 must be:
    • Portable across a wide range of commonly used platforms. This portability is afforded by providing the alternative Recon3 architectures described in section 2.2.1.
    • Multi-Language to work with target programs in different languages, and where possible with programs that use combinations of languages.
    • User Tailorable so that the user can adapt Recon3 as needed. This tailorability is provided by distributing the code in source form and by providing options so that the user can select the trace events and trace modes that suit his system, and can control tracing as needed. (Section 3.2)
    • Modular so that the user can choose just the tools he needs. Modularity is provided by implementing Recon3 as a series of command-line tools, which may be run independently or executed from the Recon3 GUI. (Section 3.1)
  2. Easy to Use. The tools, GUI and documentation should be designed:
    • To minimize the amount of learning needed to start using Recon3.
    • To minimize the number of steps needed to instrument, compile, run tests, and analyze the traces from a large target program consisting of hundreds of files in multiple directories.
    The Recon3 GUI is the main tool provided to enhance ease of use, but all components and documentation need to be carefully designed to simplify use as much as possible.
  3. Incremental. The initial versions of Recon3 should make use of the mature components of Recon2 as much as possible.

1.3 Definitions, Acronyms and Abbreviations

Architectures: Basic, Extended, Distributed
    Different versions of Recon3 described in Section 2.2.1.
Instrumentation Statements
    Source code statements introduced into a program to produce trace output. When executed, each instrumentation statement triggers a trace event that the user wants to monitor, causing a trace record to be written.
Instrumented Target Program
    The original target program with instrumentation statements inserted.
Software Reconnaissance
    A method of locating a feature in an unfamiliar program by comparing a trace from a test that exhibits the feature with a trace from a test that does not. See [WILD.95A].
Target Program
    The original program that the user wishes to analyze.
Trace Event
    The execution of an instrumentation statement. The trace event is usually the execution of a particular code component of the target program. See Section 2.2.2 for a description of the different kinds of trace event.
Trace Mode: immediate, subroutine, order, count
    The trace mode defines the data content and interpretation of each trace record. See Section 2.2.3.
Trace Frame
    A period of time during which a sequence of trace events occurs. Many methods of analysis, including Software Reconnaissance, require the comparison of events in different trace frames. A trace frame may cover a particular test case, the time a particular target program feature was operating, a period of target program idle time, etc.
Trace Record
    A data record containing information about a single trace event. See Section 3.3.
User
    The programmer or software engineer who is using Recon3 to analyze the target program.

1.4 References

[WILD.92] Norman Wilde, Juan A. Gomez, Thomas Gust, Douglas Strasburg, "Locating User Functionality in Old Code", Proc. IEEE Conf. on Software Maintenance - 1992, Orlando, Florida, November 1992, pp. 200 - 205.
[WILD.95A] Norman Wilde and Michael Scully, "Software Reconnaissance: Mapping Program Features to Code", Journal of Software Maintenance: Research and Practice, Vol. 7, No. 1, January-February 1995, pp. 49 - 62.
[WILD.96] Norman Wilde and Christopher Casey, "Early Field Experience with the Software Reconnaissance Technique for Program Comprehension", Proc. IEEE Conf. on Software Maintenance - 1996, Monterey, November 1996, pp. 312 - 318.
[WILD.98] Norman Wilde, Christopher Casey, Joe Vandeville, Gary Trio, Dick Hotz, "Reverse Engineering of Software Threads: A Design Recovery Technique for Large Multi-Process Systems", Journal of Systems and Software, Vol. 43, October 1998, pp. 11 - 17.
[WILD.99] Norman Wilde and Dean Knudson, "Understanding Embedded Software Through Instrumentation: Preliminary Results from a Survey of Techniques", report SERC-TR-85-F, Software Engineering Research Center, Purdue University, 1398 Dept. of Computer Science, West Lafayette, IN 47906, February 1999.

1.5 Overview

Section 2 of this document provides a general description of the toolset. Section 2.1 outlines the main tools and gives a data flow diagram showing how they are related. Section 2.2 gives background on the tools and explains the concepts of architectures, trace events, and trace modes. Section 3 sketches several design issues important for Recon3 including: how the tools communicate with each other; how tracing is controlled; how a trace record is formatted; how Recon3 will be distributed and documented; changes anticipated in Recon3; the implementation languages chosen for Recon3; file naming conventions for Recon3.

Terms defined in section 1.3 will appear in emphasis, like this.

Where appropriate, program names, file names, and C/C++ language constructs will appear in CODE format like this.

[Brief explanations and/or status notes are given in block quotes like this one.]

2. General Description

2.1 Product Perspective

Figure 1 provides an overview of the chief components of Recon3 and of the flow of information between the components.

Figure 1 - Recon3 Product Overview DFD

The main Recon3 components are instrumentors, trace managers, trace display and analysis tools, and a graphical user interface (GUI).

[A Software Requirements Specification (SRS) document will be prepared for each component (except the GUI) as it is developed. See these individual SRS documents for tool details.]

Instrumentors

An instrumentor is a Recon3 component that reads the target program and produces a version with inserted instrumentation statements. Manual instrumentation is also possible and may sometimes be desirable [WILD.99]. Instrumentors are language dependent.
[Aaron Tarnosky has prepared a slightly modified version of the r2inst instrumentor for C for use in a transition period.]

Trace Managers

A trace manager is a collection of Recon3 components that collect trace events from the user's instrumented target program and output trace records, normally to trace files. The implementation of the trace manager varies depending on the architecture, and the target program's source language. Typically, it may consist of a trace manager interface (tmi) which is linked with the instrumented target program and called from that program on each trace event, and possibly additional components (e.g. trace manager client) which actually write the trace records.

Instrumentors and Trace Managers are language dependent; a different tool is needed for each source language in the user's target program. Table 1 lists the components that are currently planned and gives the tentative tool name.

                          Trace Manager
Language    Instrumentor  Basic Architecture  Extended Architecture  Distributed Architecture
----------  ------------  ------------------  ---------------------  ------------------------
C/C++       r3cinst       r3ctmi              r3ctmi, r3ctmc         r3ctmi, r3ctmc, r3ctms
FORTRAN 77  r3finst       r3ftmi              (Note 1)               (Note 1)
Ada         (Note 2)      (Note 2)            (Note 2)               (Note 2)
Java        (Note 3)      (Note 3)            (Note 3)               (Note 3)

Note 1: It has not yet been decided if FORTRAN trace managers will be developed for the extended and distributed architectures.

Note 2: The Pensacola project team will be responsible for determining when and how the Recon3 Ada tools will be developed.

Note 3: Norman Wilde is currently researching possible designs for Java Recon3 tools.

Table 1 - Currently Planned Instrumentors and Trace Managers

Trace Display and Analysis Tools

Trace display and analysis tools are Recon3 components that analyze and/or display to the programmer information about events that occurred over one or more trace frames.
[Initially the r2analyz trace analyzer is available to do Software Reconnaissance analysis. Research is underway on other possible display and analysis tools. Aaron Tarnosky has prepared a rough temporary conversion tool called r3uanc to convert trace records from the Recon3 format to the Recon2 format so that r2analyz can be used.]

Graphical User Interface (GUI)

A graphical user interface (GUI) will be provided to facilitate the steps of instrumenting, compiling, running tests, and analyzing the results. Not all steps can be handled by the GUI. For example, compilation of the instrumented target program and the running of tests are highly dependent on the programmer's environment and cannot be automated in general.

2.2 Product Background

2.2.1 Trace Architectures

Three architectures will be provided for the Recon3 trace manager: basic, extended, and distributed. The different architectures provide the user with environmental flexibility; the more complicated architectures will allow Recon3 to be used with a wider range of target programs. However, the more complicated architectures also require the use of operating system facilities and thus are less portable. Table 2 gives the target programs and platform requirements for each architecture type.

Target Programs

  Basic Architecture:
    Handles only a single process/thread running on a single machine. Writes directly and immediately to a file.
    This architecture may be used directly for simple programs or customized by the user to handle special needs such as embedded systems.

  Extended Architecture:
    Handles multiple processes/threads running on a single machine.
    It is anticipated that this architecture will be the one most widely used.

  Distributed Architecture:
    Handles multiple processes/threads running on multiple machines.

Platform Requirements

  Basic Architecture:
    Requires only standard compiler(s) (e.g. ANSI standard C compiler for C source language, ANSI standard C and FORTRAN 77 for FORTRAN, etc.)
    Requires a file system to which the trace records may be written.

  Extended Architecture:
    Requires either:
      • Unix POSIX compliant system, or
      • Unix SVR4 compliant system, or
      • Win32 compliant system
    to provide interprocess communications support.

  Distributed Architecture:
    Requires either:
      • Unix POSIX compliant system, or
      • Unix SVR4 compliant system, or
      • Win32 compliant system
    to provide interprocess communications and network support.

Table 2 - Platform Requirements for each Recon3 Architecture

2.2.2 Trace Events

The instrumentation statements in the instrumented target program call the trace manager to inform it of trace events. Seven categories of trace events will be supported: comment, block, decision, subroutine, task, user, and exit. Some of these also have subcategories as described in the following sections.
[See Section 3.3 for the data items in the trace record for each event type]
2.2.2.1 Comment Event
The COMMENT (#) event indicates that a user specified message is to be included in the trace output.
2.2.2.2 Block Event
The BLOCK (B) event indicates that a basic block has been executed. A basic block is a sequence of statements in the target program that contains no branches, so if one statement is executed then all other statements in the block are executed.
2.2.2.3 Decision Event
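The decision event indicates that a boolean or multiple-choice branch decision has been made by the target program. A decision event may be either a boolean decision, recorded as true (T) or false (F), or a multiple-choice (switch) decision (S); see Table 4 in Section 3.3.2.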
2.2.2.4 Subroutine Event
2.2.2.5 Task Event
The generic term "task" is used for tasks, threads, processes, etc.
2.2.2.6 User Event
The Recon3 user may define additional events of type USER (U) to record information about other occurrences in the target program. For example, programmers have sometimes wanted to record messages sent between processes, data base accesses, the values of key variables at certain points in the processing, queue sizes, etc. To record such events, the user may insert calls to Recon3 into the target program by hand.
2.2.2.7 Exit Event
The EXIT (X) event indicates termination of the target program or of one of its processes. (For example, in C, by a call to the exit() function or by a return from main().)
[Each trace manager interface shall provide support for all trace events. However each Recon3 language-specific instrumentor automatically instruments only a subset of the events. The others must be instrumented manually by the programmer if they are to be monitored. The event types supported for each language will be defined in the SRS for its instrumentor.]
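
To make the event categories concrete, the following sketch shows what a small C function might look like after instrumentation. The event IDs (E, T, F, B, R) are those listed in Table 4; everything else is an assumption made for illustration. In particular, trace_event() is a hypothetical stand-in that stores events in memory, not the actual trace manager interface, and the placement of the calls is only one plausible choice.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_EVENTS 64

/* Hypothetical in-memory event log used only by this sketch. */
struct sketch_event {
    char id;    /* event ID as listed in Table 4 */
    int line;   /* source line of the instrumentation statement */
};

struct sketch_event events[MAX_EVENTS];
size_t n_events = 0;

/* Stand-in for a call to the trace manager interface. */
void trace_event(char id, int line)
{
    assert(line > 0);
    if (n_events < MAX_EVENTS) {
        events[n_events].id = id;
        events[n_events].line = line;
        n_events++;
    }
}

/* A small target function as it might look after instrumentation. */
int absolute_value(int x)
{
    trace_event('E', __LINE__);      /* subroutine entry */
    if (x < 0) {
        trace_event('T', __LINE__);  /* decision evaluated true */
        trace_event('B', __LINE__);  /* basic block executed */
        x = -x;
    } else {
        trace_event('F', __LINE__);  /* decision evaluated false */
    }
    trace_event('R', __LINE__);      /* subroutine return */
    return x;
}
```

Calling absolute_value(-3) logs the sequence E, T, B, R, while absolute_value(5) logs E, F, R, which is the kind of difference that trace analysis relies on.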

2.2.3 Trace Modes

Four modes of trace operation will be provided: immediate, subroutine, order and count.
2.2.3.1 Immediate Mode
In immediate mode, a trace record for each trace event is written to the trace file as soon as the event occurs. This mode is available in all architectures.
2.2.3.2 Subroutine Mode
Subroutine mode is a special mode used for tracing subroutine events only. As in immediate mode, subroutine events are written immediately to the trace file as they occur. A simplified trace record format is used. This mode is available in the extended and in the distributed architectures.
[This mode has been found useful to allow a software engineer to monitor the call stack as the program executes.]
2.2.3.3 Order Mode
In order mode, trace events are collected over a trace frame and output at the end of the frame in the order in which they occurred. An overflow record (see section 3.3.3) is written at the beginning of each frame. Only the extended and distributed architectures provide this mode of operation.
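
A minimal sketch of order mode buffering follows. It assumes a fixed-capacity buffer, so "lost" here counts events dropped because the buffer was full, whereas the overflow record of Section 3.3.3 counts records lost to memory allocation failures. The type and function names are hypothetical, and the per-event output is abbreviated to an ID and a line number rather than the full record layout.

```c
#include <assert.h>
#include <stdio.h>

#define FRAME_CAPACITY 4   /* deliberately tiny so overflow is easy to see */

struct frame_buffer {
    int lines[FRAME_CAPACITY]; /* events kept in arrival order */
    long total;                /* records stored during the frame */
    long lost;                 /* records dropped (assumption: buffer full) */
};

/* Starts a new trace frame with an empty buffer. */
void frame_begin(struct frame_buffer *fb)
{
    assert(fb != NULL);
    fb->total = 0;
    fb->lost = 0;
}

/* Records one event, or counts it as lost when no room remains. */
void frame_add(struct frame_buffer *fb, int line)
{
    assert(fb != NULL);
    if (fb->total < FRAME_CAPACITY)
        fb->lines[fb->total++] = line;
    else
        fb->lost++;
}

/* Writes the overflow record first, then the stored events in order. */
void frame_end(const struct frame_buffer *fb, FILE *out)
{
    assert(fb != NULL && out != NULL);
    fprintf(out, "> %ld %ld\n", fb->total, fb->lost);
    for (long i = 0; i < fb->total; i++)
        fprintf(out, "B %d\n", fb->lines[i]);   /* abbreviated record */
}
```

Writing the overflow record first lets an analysis tool learn, before reading any events, whether the frame that follows is complete.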
2.2.3.4 Count Mode
In count mode, the trace manager records the number of times an event takes place during a trace frame. Identical events are stored only once. A trace record thus really provides a summary of, possibly, many identical trace events. Therefore, this mode may significantly reduce the amount of data output by the trace manager, but the exact sequence of events is lost. Only the extended and distributed architectures provide this mode of operation.
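
The data reduction offered by count mode can be sketched as a table keyed on the event itself. The sketch assumes that two events are "identical" when their ID, source file, and line number match; that definition, the table size, and all names are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_DISTINCT 128

struct counted_event {
    char id;            /* event ID from Table 4, e.g. 'B' */
    const char *file;   /* source file name (pointer is stored, not copied) */
    int line;           /* source line number */
    long count;         /* occurrences during the current trace frame */
};

struct count_table {
    struct counted_event rows[MAX_DISTINCT];
    size_t used;
};

/*
 * Records one event: bumps the count of an identical stored event, or adds
 * a new row with a count of one. Returns 0 on success, -1 if the table is
 * full and the event could not be stored.
 */
int count_event(struct count_table *t, char id, const char *file, int line)
{
    assert(t != NULL && file != NULL);
    for (size_t i = 0; i < t->used; i++) {
        struct counted_event *e = &t->rows[i];
        if (e->id == id && e->line == line && strcmp(e->file, file) == 0) {
            e->count++;
            return 0;
        }
    }
    if (t->used == MAX_DISTINCT)
        return -1;
    t->rows[t->used] = (struct counted_event){ id, file, line, 1 };
    t->used++;
    return 0;
}
```

A loop body executed a thousand times therefore yields one row with a count of 1000 instead of a thousand trace records, at the cost of losing the interleaving with other events.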

2.2.4 Optional Trace Data Items

There are several data items in the trace record which the user may decide to omit since they may slow down tracing. These are the time, process id, and thread id fields (see Table 4 in Section 3.3.2). These data items are provided in the extended and distributed architectures only, since they require system calls. In the basic architecture these items default to zero (0).

3. Design Issues and Notes

This section provides brief notes on some of the issues in the design of Recon3 and their resolution. It is not intended to be a comprehensive design document but rather to provide background and rationale about the toolset.
[It is anticipated that a brief design document will also be prepared for each Recon3 component as it is developed.]

3.1 Tool Communication

Each of the Recon3 tools will be executable from the command line of the operating system for maximum flexibility. Each tool should avoid operating system dependencies as much as possible. The user may run the tools independently or execute them from the Recon3 GUI (see Section 1.2).
[Recon3 developers should keep the command line interfaces as similar as possible to facilitate ease of use. Pay particular attention to the ordering and meaning of parameters, use of upper or lower case, assumptions about file path names, etc.]

3.2 Trace Control

Our interviews have indicated that the user should be given as much flexibility and control as possible over the trace configuration. The trace configuration includes the kinds of events to be traced, the source files to be traced, the trace mode, the start and end of trace frames, the name of the trace file, etc. The user controls the tracing process in several different ways.
  1. When the instrumentor is run, the user selects which types of trace events should be instrumented. Other trace events can also be added manually. Commands to suspend and resume tracing can also be placed around tight loops to reduce the trace size and the performance impact.
  2. When the instrumented target program is compiled, a trace manager header file is included. This file contains default values for the trace configuration.
  3. When the instrumented target program is run, it may read a trace configuration file to override the default trace configuration.
  4. When the extended or distributed architecture is in use, the programmer can send trace commands to the trace manager at run-time to change the trace configuration. The basic architecture does not support run-time trace commands.
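
Steps 2 and 3 above amount to compiled-in defaults that a configuration file may override. The sketch below illustrates that layering under stated assumptions: this document does not define the trace configuration file format, so the "key value" line syntax, the key names "mode" and "file", the default file name, and all identifiers are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum trace_mode { MODE_IMMEDIATE, MODE_SUBROUTINE, MODE_ORDER, MODE_COUNT };

struct trace_config {
    enum trace_mode mode;
    char trace_file[64];
};

/* Step 2: defaults as they might be fixed in a trace manager header file. */
static const struct trace_config default_config = { MODE_IMMEDIATE, "trace.out" };

/*
 * Step 3: applies one line of a (hypothetical) "key value" configuration
 * file to `cfg`. Returns 1 if the line was recognized and applied, 0 if it
 * was ignored; an ignored line leaves `cfg` unchanged.
 */
int apply_config_line(struct trace_config *cfg, const char *line)
{
    static const char *const mode_names[] = {
        "immediate", "subroutine", "order", "count"
    };
    char key[16], value[64];

    assert(cfg != NULL && line != NULL);
    if (sscanf(line, "%15s %63s", key, value) != 2)
        return 0;
    if (strcmp(key, "file") == 0) {
        strcpy(cfg->trace_file, value);   /* value holds at most 63 chars */
        return 1;
    }
    if (strcmp(key, "mode") == 0) {
        for (int i = 0; i < 4; i++) {
            if (strcmp(value, mode_names[i]) == 0) {
                cfg->mode = (enum trace_mode)i;
                return 1;
            }
        }
    }
    return 0;
}
```

A program would start from a copy of default_config and call apply_config_line() once per line read from the configuration file, so that a missing or partial file still leaves a usable configuration.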

3.3 Trace Record and Trace File Format

The trace file format of Recon3 will be different from that used in Recon2. The file will be limited to ASCII characters and will contain one trace record per line. Each trace record will consist of several fields delimited by a space.
[Note that certain fields containing a variable length string are preceded by an integer field with the string's length. This design was adopted to facilitate processing of strings with embedded blanks, such as file names on some operating systems.]
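
The benefit of the length prefix is easiest to see from the reader's side: the parser takes exactly the stated number of characters and so never has to guess where a string containing blanks ends. The following is a sketch of such a reader, assuming single-space delimiters as described above; the function name and error conventions are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Reads one "<length> <string>" pair starting at *cursor.
 * On success, copies the string into `out` (NUL-terminated), advances
 * *cursor past the string and one trailing space if present, and returns 0.
 * Returns -1 on malformed input or if `out` is too small; *cursor is then
 * left unchanged.
 */
int read_counted_string(const char **cursor, char *out, size_t out_cap)
{
    char *end;
    long len;

    assert(cursor != NULL && *cursor != NULL && out != NULL);
    len = strtol(*cursor, &end, 10);
    if (end == *cursor || len < 0 || *end != ' ')
        return -1;
    end++;                                  /* skip the field delimiter */
    if ((size_t)len >= out_cap || strlen(end) < (size_t)len)
        return -1;
    memcpy(out, end, (size_t)len);          /* blanks are copied verbatim */
    out[len] = '\0';
    end += len;
    if (*end == ' ')
        end++;                              /* step over the next delimiter */
    *cursor = end;
    return 0;
}
```

Given the text "9 my file.c 4 main", two successive calls yield "my file.c" and "main", even though the first string contains a blank.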

3.3.1 Comment Trace Record

A comment trace record contains two fields. The order, format, and description of each field are shown in Table 3.

Field    Type       Description
-------  ---------  ---------------------------------------------
ID       character  This field indicates the type of trace event.
                      # - Comment
message  string     This field contains the comment text.
Table 3 - Comment Trace Record

3.3.2 Trace Event Record

A trace record for a trace event contains thirteen fields. The order, format, and description of each field are shown in Table 4.

Field             Type       Description
----------------  ---------  ---------------------------------------------
ID                character  This field indicates the type of trace event.
                               B - Block
                               T - Decision (True)
                               F - Decision (False)
                               S - Decision (Switch)
                               E - Subroutine (Entry)
                               R - Subroutine (Return)
                               A - Task (Start)
                               V - Task (Rendezvous)
                               O - Task (Stop)
                               U - User
                               X - Exit
count             integer    When count mode is set, this field contains the number of times an event occurred during a trace frame. Otherwise, this field contains a value of one.
time              integer    Optional trace data item, see 2.2.4. The format is: HHMMSSmmm where HH is the hour, MM is the minute, SS is the second and mmm is thousandths of a second. Precision is operating system dependent.
process id        integer    Optional trace data item, see 2.2.4.
thread id         integer    Optional trace data item, see 2.2.4.
line              integer    This field contains the source file line number at which the trace event occurred.
value             integer    This field contains additional information that describes a trace event.
                               Block             - Ending line number.
                               Decision (Switch) - Value of the switch control expression.
                               Subroutine        - Zero (field is ignored).
                               Task              - Zero (field is ignored).
                               User              - Zero (field is ignored).
                               Exit              - Value of exit code.
host name length  integer    This field indicates the number of characters in the host name field that follows.
host name         string     The trace manager interface initializes this field with an asterisk. The trace manager server initializes this field with the name of the machine on which the trace event occurred.
file name length  integer    This field indicates the number of characters in the file name field that follows.
file name         string     This field contains the name of the source file in which the trace event occurred.
message length    integer    This field indicates the number of characters in the message field that follows.
message           string     This field contains additional information that describes a trace event.
                               Block      - Asterisk (field is ignored).
                               Decision   - Asterisk (field is ignored).
                               Subroutine - Name of the subroutine.
                               Task       - User specified text (must be at least one printable character).
                               User       - User specified text (must be at least one printable character).
                               Exit       - Asterisk (field is ignored).
Table 4 - Trace Event Record
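
As an illustration of the layout in Table 4, the sketch below packs a time of day into the HHMMSSmmm form and formats one record with space-delimited fields and length-prefixed strings. The struct and function names are hypothetical. The sketch also assumes that the time is printed as a plain integer without leading zeros, which this document does not specify.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* In-memory form of one trace event; field order follows Table 4. */
struct event_record {
    char id;
    long count;
    long time;          /* HHMMSSmmm, or 0 when omitted */
    long process_id;
    long thread_id;
    int line;
    long value;
    const char *host;
    const char *file;
    const char *message;
};

/* Packs a time of day into the HHMMSSmmm integer layout. */
long pack_time(int hour, int min, int sec, int msec)
{
    assert(hour >= 0 && hour < 24 && min >= 0 && min < 60);
    assert(sec >= 0 && sec < 60 && msec >= 0 && msec < 1000);
    return ((hour * 100L + min) * 100L + sec) * 1000L + msec;
}

/*
 * Formats `r` as one line of trace text (without a newline) into `buf`.
 * Returns the value of snprintf: the length the full record requires,
 * so a result >= `cap` means the record was truncated.
 */
int format_event_record(char *buf, size_t cap, const struct event_record *r)
{
    assert(buf != NULL && r != NULL);
    return snprintf(buf, cap,
                    "%c %ld %ld %ld %ld %d %ld %zu %s %zu %s %zu %s",
                    r->id, r->count, r->time, r->process_id, r->thread_id,
                    r->line, r->value,
                    strlen(r->host), r->host,
                    strlen(r->file), r->file,
                    strlen(r->message), r->message);
}
```

For a subroutine entry into main at line 12 of main.c, recorded at 13:05:09.042 with an asterisk as host name, this produces the line "E 1 130509042 0 0 12 0 1 * 6 main.c 4 main".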

3.3.3 Overflow Record

An overflow record is used to mark the beginning of each trace frame and to indicate if any records were lost during that frame.

The record contains three fields as shown in Table 5.

Field  Type       Description
-----  ---------  ---------------------------------------------
ID     character  This field indicates the type of trace event.
                    > - Overflow
total  integer    The total number of records stored during the trace frame.
lost   integer    The number of records that were lost due to memory allocation failures.
Table 5 - Overflow Record

3.3.4 Subroutine Mode Record

In subroutine mode the output format is a simplified trace record with four fields as shown in Table 6.

Field          Type       Description
-------------  ---------  ---------------------------------------------
ID             character  This field indicates the type of trace event.
                            E - Subroutine Entry
                            R - Subroutine Return
function name  string     Name of the subroutine.
line           integer    This field contains the source file line number at which the trace event occurred.
file name      string     This field contains the name of the source file in which the trace event occurred.
Table 6 - Subroutine Mode Record
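
A tool that follows the call stack from subroutine mode output might parse each record and adjust a depth counter, as sketched below. Table 6 shows no length prefixes, so the sketch assumes that fields are separated by single spaces, that function names contain no blanks, and that the file name runs to the end of the line. These assumptions and all identifiers are illustrative only.

```c
#include <assert.h>
#include <stdio.h>

struct sub_record {
    char id;            /* 'E' for entry, 'R' for return */
    char name[64];      /* subroutine name */
    int line;           /* source line number */
    char file[128];     /* source file name, may contain blanks */
};

/* Parses one subroutine mode line; returns 0 on success, -1 otherwise. */
int parse_sub_record(const char *text, struct sub_record *r)
{
    int n;

    assert(text != NULL && r != NULL);
    n = sscanf(text, " %c %63s %d %127[^\n]",
               &r->id, r->name, &r->line, r->file);
    if (n != 4 || (r->id != 'E' && r->id != 'R'))
        return -1;
    return 0;
}

/* Returns the call depth after applying a record with the given ID. */
int next_depth(int depth, char id)
{
    if (id == 'E')
        return depth + 1;
    if (id == 'R' && depth > 0)
        return depth - 1;
    return depth;   /* unmatched return or unknown ID: depth unchanged */
}
```

A display tool could indent each subroutine name by the current depth to show the call stack as the target program runs.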

3.4 Distribution and Documentation

Recon3 will be distributed online through a web site. Several different "packages" will be available, for different platforms and languages. The source code will be made freely available.
[The web site design, the different "packages", and the user documentation to be distributed are all TBD.]

3.5 Anticipated Changes

Recon3 should be designed to facilitate the following kinds of change:

3.6 Implementation Languages

It is anticipated that Recon3 will be maintained by students in the Software Engineering project class. To keep training to a minimum, the number of different technologies used in Recon3 should be kept to a minimum.

3.7 File Names

Recon3 file names should be all lower case, and in "8.3" format for maximum compatibility. The file naming convention will be as follows:
r3LCxxxx.yyy
where "L" is a lower case letter indicating a source language and "C" is a lower case letter indicating the component. For example, this document is r3uarch.htm. Its figures are stored in a Rich Text Format file r3uarchf.rtf and generated from that file as GIFs with names like r3uarch1.gif.

An exception to the naming convention is the file r3.h which contains constants and other information used system-wide.
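
The naming rules above are simple enough to check mechanically. The sketch below tests a name against the parts of the convention that this document states: the "r3" prefix, a language letter and a component letter, all lower case, the "8.3" length limits, and the r3.h exception. The function name is hypothetical, and the sketch does not validate which specific letters are permitted, since those are not listed here.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Returns 1 if `name` follows the r3LCxxxx.yyy convention, else 0. */
int is_recon3_file_name(const char *name)
{
    const char *dot;
    size_t base_len, ext_len;

    assert(name != NULL);
    if (strcmp(name, "r3.h") == 0)
        return 1;                       /* documented exception */

    dot = strchr(name, '.');
    if (dot == NULL || strchr(dot + 1, '.') != NULL)
        return 0;                       /* exactly one dot is required */

    base_len = (size_t)(dot - name);
    ext_len = strlen(dot + 1);
    if (base_len < 4 || base_len > 8 || ext_len < 1 || ext_len > 3)
        return 0;                       /* "8.3" limits, room for r3LC */

    if (name[0] != 'r' || name[1] != '3')
        return 0;
    if (!islower((unsigned char)name[2]) || !islower((unsigned char)name[3]))
        return 0;                       /* language and component letters */

    for (const char *p = name; *p != '\0'; p++) {
        if (isupper((unsigned char)*p))
            return 0;                   /* names are all lower case */
    }
    return 1;
}
```

Such a check could be run over a distribution package before release to catch names that would not survive on file systems limited to the "8.3" format.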