Jin Wang edited this page Mar 27, 2015 · 1 revision

Introduction

This section describes how to produce verbose output from Ocelot to diagnose problems with your application.

The PTX Debugger

Ocelot includes an interactive assembly-level debugger that can be used to step through kernels and inspect the complete machine state. It can be enabled in the configure.ocelot file.

Execution Logs

Ocelot comes with built-in logging capabilities that we use to verify that it is functioning correctly. They can also be used to inspect exactly what is being done in the CUDA Runtime, the PTX Emulator, and the Code Analysis modules.

Enabling Logging

Ocelot is built using Hydrazine as a helper library, which provides basic debugging macros that are used liberally throughout Ocelot. These macros print messages to stdout when the preprocessor macro NDEBUG is not defined and the log level is set appropriately. The log level is set by the macro REPORT_BASE, which is defined in each file. If it is defined to be 0, the log messages are not compiled; otherwise they are. An example of a set of log messages:

report( " Block name " << _label );

RegisterIdSet previousIn = std::move( _aliveIn );
assert( _aliveIn.empty() );
		
report( "  Scanning targets: " );
report( "   " << _fallthrough->label() );		

In this example from the Ocelot DataflowGraph class, each report() message is printed only if REPORT_BASE is defined to a nonzero value in that file and NDEBUG is not defined.

Modules can enable additional log messages using module specific macro definitions.
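
To illustrate the gating described above, here is a minimal sketch of how such a macro could be written. It is not Hydrazine's actual definition: the macro name sketch_report and the functions logStream() and scanBlock() are hypothetical, and the sketch writes to a string stream instead of stdout so that the result can be inspected.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Per-file log level, following the convention described above:
// 0 compiles the messages out, any nonzero value compiles them in.
#ifndef REPORT_BASE
#define REPORT_BASE 1
#endif

// Hypothetical sink used only by this sketch (the real macros print to stdout).
inline std::ostringstream& logStream()
{
	static std::ostringstream stream;
	return stream;
}

// Hypothetical stand-in for report(): active only when NDEBUG is not
// defined and REPORT_BASE is nonzero, otherwise it expands to nothing.
#if !defined(NDEBUG) && REPORT_BASE != 0
#define sketch_report(message) \
	do { logStream() << message << "\n"; } while(false)
#else
#define sketch_report(message) do { } while(false)
#endif

// Emits two messages in the style of the DataflowGraph example and
// returns everything that was logged.
inline std::string scanBlock(const std::string& label)
{
	assert(!label.empty());
	logStream().str("");
	sketch_report(" Block name " << label);
	sketch_report("  Scanning targets: ");
	return logStream().str();
}
```

Because the disabled form of the macro discards its argument, the stream expression is never evaluated when logging is off, so disabled messages cost nothing at run time.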

CUDA Runtime

The CUDA runtime for Ocelot is implemented in ocelot/cuda/implementation/CudaRuntime.cpp and ocelot/cuda/implementation/OcelotRuntimeApi.cpp.

Logging for the CUDA runtime can be enabled by editing these files and setting REPORT_BASE to 1.

PTX Emulator

The main driver for the PTX Emulator in Ocelot is the CooperativeThreadArray class in the ocelot/executive/implementation/CooperativeThreadArray.cpp file. Logging in this class lets you view the exact sequence of instructions executed by the emulator, which makes it possible to do fine-grained debugging and trace data values through registers. Use it sparingly, though, because even simple programs can produce very large traces. The relevant macros are defined at the top of the file:

// global control for enabling reporting within the emulator
#define REPORT_BASE 0

// if 0, only reconverge warps at syncthreads
#define IDEAL_RECONVERGENCE 1

// watchdog instruction timer. Set this to zero to turn this off completely
#define WATCHDOG_TIMER 0

// reporting for kernel instructions
#define REPORT_STATIC_INSTRUCTIONS 1
#define REPORT_DYNAMIC_INSTRUCTIONS 1

// reporting for register accesses
#define REPORT_FIRST_THREAD_ONLY 1
#define REPORT_REGISTER_READS 1
#define REPORT_REGISTER_WRITES 1

// individually turn on or off reporting for particular instructions
#define REPORT_ABS 1
#define REPORT_ADD 1
.
.
.
  • REPORT_BASE turns on reporting for the emulator.

  • REPORT_STATIC_INSTRUCTIONS will print out the entire program before executing a kernel. This may not necessarily correspond exactly to the PTX generated by NVCC since we do a few very basic program transformations before execution.

  • REPORT_DYNAMIC_INSTRUCTIONS prints out each instruction as it is executed as well as the number of threads and the current warpId.

  • REPORT_REGISTER_READS and REPORT_REGISTER_WRITES print out the value of each register read and write performed by each instruction. REPORT_FIRST_THREAD_ONLY limits this to the first thread in a CTA.

  • The additional instruction macros like REPORT_ABS turn on more verbose output for specific instructions.
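
As a rough illustration of how these flags interact, the sketch below combines a global switch, a register-write switch, and the first-thread-only filter. It is not the emulator's code: the helpers shouldReportWrite() and traceRegisterWrite() and the message format are hypothetical, and only the macro names are taken from the listing above.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Flag values chosen for this sketch; in Ocelot they are edited in the source file.
#define REPORT_BASE 1
#define REPORT_REGISTER_WRITES 1
#define REPORT_FIRST_THREAD_ONLY 1

// Hypothetical helper: a register write is reported only if reporting is
// enabled globally, register writes are enabled, and the thread passes
// the first-thread-only filter.
constexpr bool shouldReportWrite(unsigned int threadId)
{
	return REPORT_BASE != 0
		&& REPORT_REGISTER_WRITES != 0
		&& (REPORT_FIRST_THREAD_ONLY == 0 || threadId == 0);
}

// Hypothetical trace of one register write performed by every thread in a CTA.
inline std::string traceRegisterWrite(unsigned int threads,
	const std::string& reg, int value)
{
	std::ostringstream log;
	for(unsigned int tid = 0; tid < threads; ++tid)
	{
		if(shouldReportWrite(tid))
		{
			log << "thread " << tid << ": " << reg << " <- " << value << "\n";
		}
	}
	return log.str();
}
```

With the values above, a write performed by a four-thread CTA yields a single line for thread 0. Setting REPORT_FIRST_THREAD_ONLY to 0 in this sketch would yield one line per thread, which is why traces grow so quickly.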

PTX Parser

Ocelot uses a parser to convert a PTX assembly file into an internal representation. PTX is constantly changing and the space of all possible valid PTX programs is greater than what is currently produced by NVCC, so you may run into bugs here.

The parser is divided into two main components: the Lexer, which is written in Flex (ocelot/parser/implementation/ptx.lpp) with a C++ wrapper (ocelot/parser/implementation/PTXLexer.cpp), and the Parser, which is written in Bison (ocelot/parser/implementation/ptxgrammer.ypp and ocelot/parser/implementation/ptx1_4grammer.ypp) with a C++ wrapper (ocelot/parser/implementation/PTXParser.cpp). Logging here can be enabled in the C++ wrappers.

Others

Most of the other modules in Ocelot also contain report macros. If you experience a problem or want to know how our implementation works, try enabling logging in the relevant modules as well.

Trace Generation

Ocelot is built around the idea of collecting dynamic program information from PTX kernels as they are executing. To enable this, we allow attaching trace generator objects to each kernel, which are passed a reference to the entire system state after each instruction is executed.
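
The idea can be sketched as an observer that the emulator notifies after every instruction. The code below is a self-contained illustration, not Ocelot's interface: the types InstructionEvent, SketchTraceGenerator, and OpcodeCounter and the functions runKernel() and countAdds() are all hypothetical, and the "kernel" is just a list of opcode names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical snapshot of the state handed to generators after each instruction.
struct InstructionEvent
{
	std::size_t pc;
	std::string opcode;
	unsigned int activeThreads;
};

// Hypothetical base class for objects attached to a kernel.
class SketchTraceGenerator
{
public:
	virtual ~SketchTraceGenerator() {}
	// Called once after each executed instruction.
	virtual void event(const InstructionEvent& e) = 0;
};

// Example generator: counts how often one opcode is executed.
class OpcodeCounter : public SketchTraceGenerator
{
public:
	explicit OpcodeCounter(const std::string& opcode)
		: _opcode(opcode), _count(0) {}

	void event(const InstructionEvent& e) override
	{
		if(e.opcode == _opcode) ++_count;
	}

	std::size_t count() const { return _count; }

private:
	std::string _opcode;
	std::size_t _count;
};

// Stand-in for the emulator loop: "executes" each instruction, then
// notifies every attached generator.
inline void runKernel(const std::vector<std::string>& instructions,
	unsigned int threads,
	const std::vector<SketchTraceGenerator*>& generators)
{
	for(std::size_t pc = 0; pc < instructions.size(); ++pc)
	{
		// A real emulator would execute the instruction here.
		InstructionEvent e{pc, instructions[pc], threads};
		for(SketchTraceGenerator* generator : generators)
		{
			generator->event(e);
		}
	}
}

// Usage: attach a counter to a four-instruction kernel and count the adds.
inline std::size_t countAdds()
{
	OpcodeCounter counter("add");
	runKernel({"ld", "add", "add", "st"}, 32, {&counter});
	return counter.count();
}
```

Keeping the generators behind a small virtual interface means new analyses can be added without touching the execution loop, which is the property the trace generator design relies on.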

There are examples of this in ocelot/executive/test/TestTrace.cpp.

A new trace generator interface was recently added to Ocelot. See the file ocelot/api/interface/ocelot.h for details.